# ADVANCES IN OIL CROPS RESEARCH – CLASSICAL AND NEW APPROACHES TO ACHIEVE SUSTAINABLE PRODUCTIVITY

EDITED BY: Dragana Miladinović, Johann Vollmann, Leire Molinero-Ruiz and Mariela Torres

PUBLISHED IN: Frontiers in Plant Science

#### Frontiers Copyright Statement

© Copyright 2007-2019 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

ISSN 1664-8714

ISBN 978-2-88963-068-4

DOI 10.3389/978-2-88963-068-4


# ADVANCES IN OIL CROPS RESEARCH – CLASSICAL AND NEW APPROACHES TO ACHIEVE SUSTAINABLE PRODUCTIVITY

#### Topic Editors:

Dragana Miladinović, Institute of Field and Vegetable Crops, Serbia

Johann Vollmann, University of Natural Resources and Life Sciences Vienna, Austria

Leire Molinero-Ruiz, Instituto de Agricultura Sostenible, Spain

Mariela Torres, Instituto Nacional de Tecnología Agropecuaria, Argentina

Image: Ilija Radeka, IFVCNS.

Citation: Miladinović, D., Vollmann, J., Molinero-Ruiz, L., Torres, M., eds. (2019). Advances in Oil Crops Research – Classical and New Approaches to Achieve Sustainable Productivity. Lausanne: Frontiers Media. doi: 10.3389/978-2-88963-068-4

# Table of Contents

*Editorial: Advances in Oil Crops Research—Classical and New Approaches to Achieve Sustainable Productivity*

Dragana Miladinović, Johann Vollmann, Leire Molinero-Ruiz and Mariela Torres

*Allelic Variation and Distribution of the Major Maturity Genes in Different Soybean Collections*

Jegor Miladinović, Marina Ćeran, Vuk Đorđević, Svetlana Balešević-Tubić, Kristina Petrović, Vojin Đukić and Dragana Miladinović

*OliveCan: A Process-Based Model of Development, Growth and Yield of Olive Orchards*

Álvaro López-Bernal, Alejandro Morales, Omar García-Tejera, Luca Testi, Francisco Orgaz, J. P. De Melo-Abreu and Francisco J. Villalobos

Alberto Martín-Sanz, Sandra Rueda, Ana B. García-Carneros, Sara González-Fernández, Pedro Miranda-Fuentes, Sandra Castuera-Santacruz and Leire Molinero-Ruiz

*Fruit Phenolic Profiling: A New Selection Criterion in Olive Breeding Programs*

Ana G. Pérez, Lorenzo León, Carlos Sanz and Raúl de la Rosa

*Using Wild Olives in Breeding Programs: Implications on Oil Quality Composition*

Lorenzo León, Raúl de la Rosa, Leonardo Velasco and Angjelina Belaj

Cloe S. Pogoda, Silas Tittes and Jarrad R. Prasifka

*An Oleuropein ß-Glucosidase From Olive Fruit is Involved in Determining the Phenolic Composition of Virgin Olive Oil*

David Velázquez-Palmero, Carmen Romero-Segura, Rosa García-Rodríguez, María L. Hernández, Fabián E. Vaistij, Ian A. Graham, Ana G. Pérez and José M. Martínez-Rivas

*Variability in Susceptibility to Anthracnose in the World Collection of Olive Cultivars of Cordoba (Spain)*

Juan Moral, Carlos J. Xaviér, José R. Viruega, Luis F. Roca, Juan Caballero and Antonio Trapero

*Olive Cultivation in the Southern Hemisphere: Flowering, Water Requirements and Oil Quality Responses to New Crop Environments*

Mariela Torres, Pierluigi Pierantozzi, Peter Searles, M. Cecilia Rousseaux, Georgina García-Inza, Andrea Miserere, Romina Bodoira, Cibeles Contreras and Damián Maestri

*Genomic Prediction of Sunflower Hybrids Oil Content*

Brigitte Mangin, Fanny Bonnafous, Nicolas Blanchet, Marie-Claude Boniface, Emmanuelle Bret-Mestries, Sébastien Carrère, Ludovic Cottret, Ludovic Legrand, Gwenola Marage, Prune Pegot-Espagnet, Stéphane Munos, Nicolas Pouilly, Felicity Vear, Patrick Vincourt and Nicolas B. Langlade

*Identification and Functional Annotation of Genes Differentially Expressed in the Reproductive Tissues of the Olive Tree (*Olea europaea *L.) Through the Generation of Subtractive Libraries*

Adoración Zafra, Rosario Carmona, José A. Traverso, John T. Hancock, Maria H. S. Goldman, M. Gonzalo Claros, Simon J. Hiscock and Juan D. Alche

*Genetic Tracing of* Jatropha curcas *L. From its Mesoamerican Origin to the World*

Haiyan Li, Suguru Tsuchimoto, Kyuya Harada, Masanori Yamasaki, Hiroe Sakai, Naoki Wada, Atefeh Alipour, Tomohiro Sasai, Atsushi Tsunekawa, Hisashi Tsujimoto, Takayuki Ando, Hisashi Tomemori, Shusei Sato, Hideki Hirakawa, Victor P. Quintero, Alfredo Zamarripa, Primitivo Santos, Adel Hegazy, Abdalla M. Ali and Kiichi Fukui

*Development of Highly Informative Genome-Wide Single Sequence Repeat Markers for Breeding Applications in Sesame and Construction of a Web Resource: SisatBase*

Komivi Dossa, Jingyin Yu, Boshou Liao, Ndiaga Cisse and Xiurong Zhang

*The First Molecular Identification of an Olive Collection Applying Standard Simple Sequence Repeats and Novel Expressed Sequence Tag Markers*

Soraya Mousavi, Roberto Mariotti, Luca Regni, Luigi Nasini, Marina Bufacchi, Saverio Pandolfi, Luciana Baldoni and Primo Proietti

*Use of Blue-Green Fluorescence and Thermal Imaging in the Early Detection of Sunflower Infection by the Root Parasitic Weed* Orobanche cumana *Wallr.*

Carmen M. Ortiz-Bustos, María L. Pérez-Bueno, Matilde Barón and Leire Molinero-Ruiz


# Editorial: Advances in Oil Crops Research—Classical and New Approaches to Achieve Sustainable Productivity

#### Dragana Miladinović<sup>1</sup>\*, Johann Vollmann<sup>2</sup>, Leire Molinero-Ruiz<sup>3</sup> and Mariela Torres<sup>4</sup>

<sup>1</sup> Sunflower Department, Institute of Field and Vegetable Crops, Novi Sad, Serbia, <sup>2</sup> Department of Crop Sciences, University of Natural Resources and Life Sciences Vienna, Vienna, Austria, <sup>3</sup> Departamento de Protección de Cultivos, Instituto de Agricultura Sostenible, Córdoba, Spain, <sup>4</sup> Estación Experimental Agropecuaria San Juan, Instituto Nacional de Tecnología Agropecuaria, Buenos Aires, Argentina

Keywords: oil crops, breeding, quality, production, diversity

#### **Editorial on the Research Topic**

#### **Advances in Oil Crops Research—Classical and New Approaches to Achieve Sustainable Productivity**

World production of the main oil crops is steadily increasing, mainly due to population growth and the increased use of oil crops in biofuel production and in edible vegetable oils. In terms of sowing area worldwide, oil crops are second only to cereals. Edible or industrial oils are extracted from seeds, fruits or mesocarp, and nuts of both annual and perennial species. Oil can be obtained from about 40 different crops, but soybean, sunflower, olive tree, and rapeseed are of major importance in total world trade. The purpose of the Research Topic "Advances in Oil Crops Research—Classical and New Approaches to Achieve Sustainable Productivity" is to provide the reader with a compilation of the latest research results on different aspects of oil crops.

This research topic incorporates 23 publications including 19 research papers, three review articles, and one perspective.

#### DIVERSITY CHARACTERIZATION

Germplasm collections are very important for the conservation of oil crop diversity. In olive, only a few collections have been fully genotyped and characterized. Mousavi et al. screened previously uncharacterized olive trees from the olive collection of Perugia University with standard simple sequence repeats (SSR) and new expressed sequence tag (EST)-SSR markers. They found the new OLea Expressed Sequence Tags (OLEST) SSR markers to be as effective as other commonly used markers. OLEST-SSR markers are very useful for creating a common database from worldwide collections of olive cultivars.

Jatropha has been recognized as one of the best candidates for future biodiesel production. Attempts at commercial cultivation of the species in Africa and Asia have failed due to low productivity. Thus, better-yielding commercial cultivars must be developed on the basis of increased knowledge of the genetic diversity of the species. Li et al. analyzed the genetic diversity of 246 Jatropha accessions collected in Africa, Asia, and Mesoamerica by means of SSR and retrotransposon-based insertion polymorphism markers. The authors identified two accessions from Veracruz (Mesoamerica) as the source of all Jatropha of Africa and Asia, and hypothesized that human selection caused low Jatropha productivity. They also suggested that Jatropha of Africa and Asia can be improved through the implementation of breeding strategies.

Edited and reviewed by: Sergio J. Ochatt, INRA UMR1347 Agroécologie, France

> \*Correspondence: Dragana Miladinović draganavas@yahoo.com

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 28 February 2019 Accepted: 31 May 2019 Published: 18 June 2019

#### Citation:

Miladinović D, Vollmann J, Molinero-Ruiz L and Torres M (2019) Editorial: Advances in Oil Crops Research—Classical and New Approaches to Achieve Sustainable Productivity. Front. Plant Sci. 10:791. doi: 10.3389/fpls.2019.00791


#### ENVIRONMENT AND AGRONOMY

Olive is a crop very well adapted to the temperature and precipitation regimes typical of regions in the Mediterranean Basin. The increasing international demand for olive oil has led to the expansion of olive cultivation into new production regions where environmental conditions differ from those of the Mediterranean Basin, and there is little information about olive adaptation to these new environments. Torres et al. reviewed the scientific literature on olive cultivation in non-Mediterranean environments, with a focus on chilling requirements for flowering, water requirements, and irrigation management, as well as environmental effects on fruit oil content and quality. This information could be useful for determining whether environmental conditions in new growing regions are appropriate for achieving sustainable olive and oil production. Several simulation models of the olive crop have been formulated so far, but none of them is capable of accounting for the impacts of environmental conditions and management practices on growth and productivity in the absence of nutrient deficiencies, diseases, and pests. López-Bernal et al. described OliveCan, a process-oriented model built on previous models by the group, that enables the assessment of the combined effects of management operations, soil traits, and weather on crop performance under both unstressed and water deficit conditions.

Functional-structural plant modeling (FSPM) is a dynamic method for predicting plant growth under varying environmental conditions. As temperature is a primary factor affecting the growth and development of rapeseed, Tian et al. tested three different temperature treatments using an FSPM of seedlings, with a growth function for leaf extension and biomass accumulation implemented by combining measurements with literature data.

#### BIOTIC STRESSES

Effective, economically viable, and environmentally friendly methods for pest control are needed in the framework of sustainable agriculture in Europe. These novel methods are particularly important when dealing with insect pests that have developed insecticide resistance, such as olive fruit fly. Yousef et al. tested the efficacy of soil treatment with the entomopathogenic fungus Metarhizium brunneum for controlling the olive fly and found that, during spring, the density of pest population emerging from the soil was reduced by up to 70%. They also studied the retention of conidia as a function of soil type and rain amount.

Several secondary metabolites can be produced by capitate glandular trichomes (CGT). Some of these secondary metabolites provide durable resistance to insect pests. The specialist pest sunflower moth is combated by host resistance based on CGT. Gao et al. identified two major QTLs controlling CGT density in sunflower florets. They constructed a genetic linkage map from genotyping-by-sequencing data using SNP markers. The results will advance the understanding of CGT-mediated resistance to insect pests in sunflower. The authors also provide a resource for marker-assisted selection for insect resistance in this oil crop species.

Anthracnose is an important disease that causes fruit rot and branch dieback of olive trees. The appearance of symptoms and their evolution over time are highly dependent on cultivar susceptibility and environmental conditions. Moral et al. looked for potential sources of resistance to Colletotrichum acutatum, the causal agent of anthracnose, in the World Olive Germplasm Bank located in Córdoba (Spain). The authors also described an original methodology that could also be used to evaluate olive cultivar responses to other aerial fungi affecting leaves and fruit.

Verticillium dahliae is a soil-borne fungus that causes wilt and leaf mottle in sunflower. The incidence of this disease has increased in recent years, consequently affecting sunflower oil production. Martín-Sanz et al. described the population structure of V. dahliae affecting sunflowers in Europe. According to the results of genetic, molecular, and pathogenic analyses of particular traits of the isolates, they identified two groups. One was diverse and included V. dahliae isolates from Eastern Europe. The other was highly homogeneous and clustered isolates from Western Europe. The authors recommend taking into account the existence of these two groups of V. dahliae when looking for sources of resistance to this disease in European environments.

Orobanche cumana Wallr. is one of the main limiting factors in sunflower production. Infection by this parasitic weed occurs early after sowing and affects host physiology from that moment and throughout the underground parasite stages. Ortiz-Bustos et al. used blue-green fluorescence (BGF) emission and thermal imaging of leaves to detect the infection of sunflower by O. cumana during underground parasite development. Lower BGF emissions and higher temperatures were detected in leaves of infected plants than in those of healthy plants, indicating that BGF imaging and thermography could be used for fast and non-destructive screening of sunflower lines from breeding programs for resistance to this parasitic weed.

#### PRODUCT QUALITY

The high genetic diversity of wild olives can be used for introduction of some important agronomic traits into cultivated varieties. However, some of the traits introduced from wild relatives can have negative effects on some other desirable traits such as the composition of virgin olive oil (VOO). As VOO from olive cultivars is highly appreciated for its fatty acid composition, León et al. compared the fatty acid profiles of VOO from wild olives and olive cultivars. They found that the use of wild germplasm in olive breeding programs does not have a negative impact on fatty acid composition, tocopherol content, and tocopherol and phytosterol profiles if the selection for these traits is conducted from early generations of crossings.

Phenolic compounds present in olive oil have strong antioxidant activity and are one of the factors responsible for the health benefits associated with VOO consumption. However, phenolic composition is not a common selection criterion in olive breeding programs. Pérez et al. developed a simplified procedure for phenolic profiling of olive fruits that avoids the prior step of oil extraction. When they applied it to detect variability in phenolic content among different olive genotypes, they found a high genotypic variance of fruit phenol content. The authors concluded that the observed high genotypic variance, together with the simplified method for fruit phenol evaluation, can be useful in olive tree breeding for an improved phenolic profile of the oil. Velázquez-Palmero et al. studied genes and enzymes responsible for the phenolic composition of VOO, such as β-glucosidases. They found that olive GLU gene expression is cultivar-dependent and regulated by temperature, light, and water regime, and that its transcription in olive fruit is spatially and temporally regulated. Their study is a further step in elucidating the factors involved in the biosynthesis of the major phenolic compounds in VOO. It can also help to develop molecular markers for the selection of cultivars with improved oil quality characteristics.

#### PLANT NUTRITION

Nitrogen fertilization allows farmers to improve plant productivity. However, it has to be carefully managed to avoid harmful environmental impacts due to nitrogen loss. Increasing nitrogen use efficiency (NUE), or the amount of N fertilizer taken up and used by the crop, is vital to solve the conflict between productivity and the protection of the environment. The NUE of rapeseed is low, although it has a high uptake capacity for inorganic N. The development of cultivars with improved NUE is a major goal in rapeseed breeding. A collection of 30 elite rapeseed varieties registered between 1989 and 2014 was tested by Stahl et al. under two different fertilization regimes in a 2-year experiment in 10 different environments for changes in seed yield and seed quality traits. Their data revealed that genetic improvement in combination with reduced N fertilizer inputs has a tremendous potential to increase NUE of rapeseed.

Using a transcriptome approach, Safavi-Rizi et al. found an opposing effect of N depletion on gene regulation: some genes were up- or downregulated in N-depleted lower canopy leaves, contrary to their regulation in upper leaves under ample N supply. In addition, some genes showed senescence-associated expression in rapeseed, but not in Arabidopsis. The authors hypothesized that some of these genes may have rapeseed-specific functions in nitrogen remobilization during N-deficiency-induced leaf senescence. Some genes also seem to contribute to differences in senescence and mobilization of N resources between upper and lower canopy leaves.

#### PLANT GROWTH AND DEVELOPMENT

Production of olive trees relies on the successful achievement of sexual reproduction. Due to self-incompatibility in the species, successful fertilization is highly favored by the presence of pollen grains from a different cultivar. Zafra et al. studied the reproductive biology of the olive through identification of key gene products involved in pollen and pistil physiology. Aiming at elucidating the biological processes occurring during the courtship, pollen grain germination, and fertilization in olive, they constructed SSH libraries using pistil and pollen that allowed the identification of transcripts with important roles in stigma physiology.

In the MADS-box gene family, the MADS intervening keratin-like and C-terminal (MIKC)-type genes are found only in plants. They encode transcription factors that have major roles in plant growth and developmental processes. However, a comprehensive analysis of this gene family in G. hirsutum, concerning characterization and functions, has not yet been reported. Ren et al. analyzed MIKC-type MADS genes in this tetraploid cotton, which is the most widely cultivated cotton species. A total of 110 GhMIKC genes were identified, located in the genome, and phylogenetically classified into 13 subfamilies. Since most of them were highly expressed in floral organs, the study provides useful information on the involvement of GhMIKCs in the regulation of cotton flowering.

#### MOLECULAR TOOLS FOR BREEDING

In their review, Dimitrijević and Horn elaborated on present and past achievements in sunflower breeding and biotechnology, as well as the future prospects of using modern molecular tools in this research area. They emphasized that the key to further success in sunflower breeding lies in integrative approaches to the genetics of complex quantitative traits and the physiological and biochemical mechanisms involved, a challenge that must be met with new high-throughput technologies in combination with new genomic-based breeding strategies.

Genomic selection (GS) models can predict trait performance after being trained on phenotypic and genotypic information for those traits, and their most important advantage is that they do not have to include all hybrid parents. Mangin et al. compared the accuracy of sunflower oil content predictions based on the commonly used general combining ability (GCA) with that of genomic predictions. Their results show that GS provides more accurate predictions than the classical GCA-based predictions when at least one parent is untested or not well characterized, thus increasing breeding speed and efficiency.

The lack of identified marker-trait associations is a major limitation to the development of successful marker-assisted breeding programs in safflower. Ambreen et al. tested a safflower panel (CartAP) comprising 124 accessions for its suitability for association mapping. They detected several marker-trait associations, some in the majority of environments and others in only some. The results of this study will facilitate wider use of marker-assisted breeding in safflower, as well as the identification of genetic determinants of trait variability.

Soybean flowering time and maturity are controlled by E genes whose different allelic combinations determine soybean adaptation to a certain latitude. Miladinović et al. described the first attempt to assess the adaptation of soybean genotypes developed at the Institute of Field and Vegetable Crops, Novi Sad, Serbia based on E gene variation, as well as to comparatively assess E gene variation in North-American, Chinese and European genotypes. As e1-as/e2/E3/E4 was the most common genotype and was present in the top performing genotypes, this specific allele combination was proposed as optimal for the environments of Central-Eastern Europe.

Despite its economic and nutritional importance, genetic improvement of sesame lags behind that of the other oil crops. In their review, Dossa et al. described the evolution of research in sesame genetics and highlighted the recent advances in the -omics area. They also discussed the future prospects for genetic improvement and wider expansion of sesame. In another study, Dossa et al. developed SisatBase, a user-friendly database providing free access to SSR data of sesame. It is also an integrated platform for functional analyses of this oil crop.

#### CONCLUSION

The paraphrased last sentence from the review of Dimitrijević and Horn is a perfect concluding message for the topic and a recommendation for future work on oil crops:

"... Only a combined effort of the oil crop research community can make oil crops more competitive to other crops. The new high-throughput technologies combined with new genomic-based breeding strategies give us the opportunity, as never before, to understand and mine genetic variation and to use it for improvement of oil crops ...".

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Miladinović, Vollmann, Molinero-Ruiz and Torres. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Allelic Variation and Distribution of the Major Maturity Genes in Different Soybean Collections

Jegor Miladinović<sup>1</sup>\*, Marina Ćeran<sup>1</sup>, Vuk Đorđević<sup>1</sup>, Svetlana Balešević-Tubić<sup>1</sup>, Kristina Petrović<sup>1</sup>, Vojin Đukić<sup>1</sup> and Dragana Miladinović<sup>2</sup>

<sup>1</sup> Soybean Department, Institute of Field and Vegetable Crops, Novi Sad, Serbia, <sup>2</sup> Industrial Crops Department, Institute of Field and Vegetable Crops, Novi Sad, Serbia

Soybean time of flowering and maturity are genetically controlled by E genes. Different allelic combinations of these genes determine soybean adaptation to a specific latitude. The paper describes the first attempt to assess the adaptation of soybean genotypes developed and realized at the Institute of Field and Vegetable Crops, Novi Sad, Serbia [Novi Sad (NS) varieties and breeding lines] based on E gene variation, as well as to comparatively assess E gene variation in North-American (NA), Chinese, and European genotypes, as most of the studies published so far deal with North-American and Chinese cultivars and breeding material. Allelic variation and distribution of the major maturity genes (E1, E2, E3, and E4) were determined in 445 genotypes from soybean collections of NA ancestral lines, Chinese germplasm, and European varieties, as well as NS varieties and breeding lines. The study showed that allelic combinations of E1–E4 genes significantly determined the adaptation of varieties to different geographical regions, although they have different impacts on maturity. In general, each collection had one major E genotype haplogroup, comprising over 50% of the lines. The exceptions were European varieties, which had two predominant haplogroups, and NA ancestral lines, which were distributed almost evenly among several haplogroups. As e1-as/e2/E3/E4 was the most common genotype in the NS population and was present in the best-performing genotypes in terms of yield, this specific allele combination was proposed as the optimal combination for the environments of Central-Eastern Europe.

#### Keywords: photoperiodism, E genes, soybean, maturity, allelic variation, SNP

#### INTRODUCTION

Photoperiodism is the reaction of a plant to the length of day and night. The discovery of photoperiodism in soybean and tobacco (Garner and Allard, 1930), and subsequently in numerous other plant species, paved the way for the investigation of mechanisms that control seasonal rhythms in plants, such as flowering, growth and reproduction (Putterill et al., 2009). Besides plant growth, photoperiod also affects other aspects of plant development, including changes of developmental phases and the overall length of the vegetation period. Knowledge of the photoperiodic response of cultivated crops is one of the vital aspects of modern-day plant production, as shifting the seasonal timing of reproduction became a major goal of plant breeding programs in their effort to produce novel varieties better adapted to local environments and variable climate conditions. This also stands for soybean [Glycine max (L.) Merr.] since it is a photoperiod-sensitive short-day plant, i.e., soybean transition from the vegetative to the reproductive stage directly depends on the length of day (Miladinović and Đorđević, 2011).

Edited by:

Rafael Lozano, University of Almería, Spain

#### Reviewed by:

Gunvant Baliram Patil, University of Minnesota, United States

Matthew Nicholas Nelson, Royal Botanic Gardens, Kew, United Kingdom

> \*Correspondence: Jegor Miladinović jegor.miladinovic@ifvcns.ns.ac.rs

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 09 October 2017 Accepted: 16 August 2018 Published: 04 September 2018

#### Citation:

Miladinović J, Ćeran M, Đorđević V, Balešević-Tubić S, Petrović K, Đukić V and Miladinović D (2018) Allelic Variation and Distribution of the Major Maturity Genes in Different Soybean Collections. Front. Plant Sci. 9:1286. doi: 10.3389/fpls.2018.01286

#### SOYBEAN AND PHOTOPERIODICITY

Estimation of the plant phenological stage of development, as a function of a specific environment, is the key factor in any attempt at modifying plant growth, adaptation, or productivity (Jones and Laing, 1978; Davidson and Campbell, 1983; Hodges and French, 1985; Wang et al., 1987; Colson et al., 1995; Miladinović and Đorđević, 2011). Soybean phenology is difficult to predict, as it depends on photoperiod (Garner and Allard, 1930; Seddigh et al., 1989), temperature (Egli and Wardlaw, 1980; Board and Hall, 1984), and amount of plant-available water (Meckel et al., 1984; Miladinović et al., 2006).

Dependence of soybean on day length resulted in its geographic distribution into 13 maturity group (MG) zones, from MG000, comprising varieties which can thrive at higher geographic latitudes, to MGX, which includes varieties grown at lower latitudes (Hartwig, 1973). By definition, MGs are the result of classification of soybeans based on their growth and development. The differences between MGs in a given area are caused by the photoperiodic requirements of a variety, while the difference in maturity date between two adjacent MGs ranges from 10 to 18 days. In soybeans, the critical photoperiod, i.e., the day length above or below which the plant never blooms, decreases progressively from higher to lower geographic latitudes. Photoperiodic requirements thus restrict the growing area of a soybean variety to a narrow latitude range of around 200 km (Scott and Aldrich, 1983). If grown at higher geographic latitudes than its optimal growing area, a soybean variety will flower and mature later, while a variety grown at lower geographic latitudes will flower and mature earlier, resulting in lower vegetative mass and lower yields. Hence, there is an optimal MG for each soybean growing region (Miladinović and Đorđević, 2011).

# SOYBEAN AND E GENES

Time of flowering and maturity in soybean are genetically controlled by E genes that have different roles in maturity and photoperiod sensitivity. Their allelic combinations contribute to fine adaptation of genotype to certain latitude and climate conditions.

To date, 11 major loci (E1–E10 and J) affecting time of flowering and maturity have been identified in soybean (Samanfar et al., 2017). The dominant allele of E genes usually confers later flowering and later maturity, except for E6, E9, and J (Ray et al., 1995; Bonato and Vello, 1999; Kong et al., 2014), while photoperiod sensitivity decreases with the number of recessive alleles. Owen (1927) described the effect of the E1 gene in soybean in his inheritance studies, while Bernard (1971) described two major genes (E1 and E2) which control flowering time and maturity. Buzzell (1971) and Kilen and Hartwig (1971) investigated soybean flowering response to fluorescent daylength conditions and concluded that the E3 gene determines late flowering, while e3 causes insensitivity to fluorescent light. Buzzell and Voldeng (1980) studied inheritance of insensitivity to extended daylength and found that the E4 allele confers late flowering and sensitivity to extended daylength, while the e4 allele confers early flowering and insensitivity to extended daylength. McBlain and Bernard (1987) described the E5 gene, whose genetic effect is similar to that of E2. However, in a study conducted to map the E5 locus, the results obtained from different F2 mapping populations expected to segregate for E5 were not consistent, and no candidate QTL was found, indicating that a unique E5 gene may not exist (Dissanayaka et al., 2016). Bonato and Vello (1999) described the dominant allele of the E6 gene, which confers early flowering and maturity in soybeans. Cober and Voldeng (2001) described E7, a soybean maturity and photoperiod-sensitivity locus, and its linkage to E1 and T. Cober et al. (2010) mapped the E8 gene, with the dominant allele resulting in later maturity and the recessive allele conferring early maturity. The E9 locus was subjected to detailed molecular analysis and the gene FT2a was identified, with the recessive allele delaying flowering (Kong et al., 2014; Zhao et al., 2016). Samanfar et al. (2017) identified E10, a new maturity locus in soybean, with FT4 being the predicted functional gene underlying this locus.

Maturity genes E1 (Xia et al., 2012), E2 (Watanabe et al., 2011), E3 (Watanabe et al., 2009), E4 (Liu et al., 2008), E9 (Zhao et al., 2016), and J (Yue et al., 2017) have recently been characterized at the molecular level. Tsubokura et al. (2014) estimated that maturity genes E1–E4 contribute to 62–66% of the flowering time variation. Association analysis and the study of genetic architecture and networks underlying agronomical traits revealed that E1 and E2 loci have pleiotropic effects across the traits related to yield and seed quality, as the key nodes in the regulation of different traits (Fang et al., 2017). E1 gene was located on chromosome 6 and identified as legume-specific transcription factor which functions as the flowering repressor with putative nuclear localization signal and B3-related domain (Xia et al., 2012). E1 allele is functional, e1-as allele with missense mutation is not fully functional, while e1-fs with a frameshift mutation and e1-nl with deletion of entire E1 gene are both nonfunctional (Xia et al., 2012). Mutant e1-nl allele was identified in MG000 European (Swedish) cultivar Fiskeby III, along with the major mutant alleles for other maturity genes (E2 and E3) (Valliyodan et al., 2016). Out of the known E loci in soybean, E1 gene is considered to have the largest effect on determination of flowering time under field conditions (Xia et al., 2012). For improvement of varietal adaptability, the function of E1 gene should be reduced due to its strong impact on delaying maturity. Besides E1, allelic variability of other E genes is also important in adaptation and can provide significant genetic plasticity that could enable cultivation of soybeans in wider geographic areas (Cober et al., 1996). Located on chromosome 10, E2 gene is an ortholog of Arabidopsis flowering gene GIGANTEA involved in the circadian rhythm and flowering time pathway (Watanabe et al., 2011). 
The dominant E2 allele is functional, while e2, which carries a nonsense mutation, is the non-functional allele with an early-flowering phenotype (Watanabe et al., 2011). E3 (GmPhyA3) and E4 (GmPhyA2) are located on chromosomes 19 and 20, respectively, and they represent phytochrome A genes. E3-Ha and E3-Mi are functional alleles, while e3-tr, e3-ns, and e3-fs are non-functional with mutations causing truncated proteins (Watanabe et al., 2009). E3 delays flowering under long-day conditions, which ultimately affects maturity. The E4 gene has a functional allele E4 and five non-functional alleles (e4-SORE-1, e4-oto, e4-tsu, e4-kam, and e4-kes) (Liu et al., 2008; Tsubokura et al., 2013). Being photoperiodically insensitive, varieties with non-functional alleles are mostly adapted to high latitudes, which indicates the importance of the e4 allele for high-latitude adaptation. The E1 and E2 genes have a significant impact on vegetative development, while loci E3 and E4 might affect post-flowering reproductive development by increasing pod filling duration, and the number of nodes and pods by up-regulating the expression of growth habit gene Dt1 (Xu et al., 2013).

Higher temperatures at reproductive stages were found to affect plant vigor and overall yield (Egli et al., 2005; Ren et al., 2009). It has also been reported that N and P concentrations in mature seeds increase with increasing day/night temperatures (Thomas, 2001). This observation suggests that there might be some loci which show optimum activity of N uptake and transport at elevated temperatures (Patil et al., 2017). Protein content was also positively correlated with higher temperature (Wolf et al., 1982). Higher protein content was observed in late maturity group lines (MGV–MGX) compared to early maturity group lines (MG000–MGII). Since genetic variation in seed composition is more affected by geographic regions (∼5%) than by MG (∼2%) (Bandillo et al., 2015), it can be concluded that the effect of lower temperatures on seed composition at higher latitudes might be overcome by the selection of more diverse parental lines in early MGs (Piper and Boote, 1999). Hence, higher protein content without a penalty in yield or oil content might be achieved by exploring genetic variants of E genes.

## SOYBEAN AND E GENES ALLELIC VARIATION

Study of allelic variation and diversity of soybean flowering-time and maturity genes may enhance soybean breeding for particular environments, since achieving appropriate maturity in a target environment maximizes crop yield potential (Langewisch et al., 2017). However, there are not many studies focused on soybean maturity and genes affecting it. This especially holds for European soybean varieties, as most of the studies published so far deal with North-American and Chinese genotypes (Tsubokura et al., 2013, 2014; Jiang et al., 2014; Langewisch et al., 2014, 2017; Zhai et al., 2014; Abugalieva et al., 2016; Valliyodan et al., 2016; Kurasch et al., 2017; Li et al., 2017; Wolfgang and An, 2017).

The soybean collection maintained at the Institute of Field and Vegetable Crops Novi Sad, Serbia (IFVCNS), consists of more than 1200 cultivars and lines originating from America (50.1%), Asia (15.6%), and Europe (34.4%). Most of the genotypes belong to MG0 and MGI, but the collection also includes genotypes from MG000 to MGV (Hrustić and Miladinović, 2011). In order to characterize allelic variation and distribution of the major maturity genes (E1–E4) we conducted the study on 445 genotypes from soybean collections of North-American (NA) ancestral lines (27), Chinese germplasm (14), European varieties (12), genotypes developed and released at IFVCNS, i.e., NS varieties (56) and NS breeding lines (229), as well as 57 high-protein and 50 high-oil genotypes from the NS collection. Our research was mainly focused on identification of the most prominent E allelic combinations in NS varieties and breeding lines, with the aim of applying the obtained results for the improvement and targeted breeding of soybean varieties adapted to the environments of Central-Eastern Europe. MGs of the analyzed soybean genotypes ranged from MG000 to MGIII, including several genotypes from the NA ancestral population which belong to later MGs (MGV–MGVII). All 445 soybean genotypes were characterized using a genotyping-by-sequencing approach (Elshire et al., 2011). Sequenced libraries produced a total of 145 × 10<sup>7</sup> raw 100-bp DNA reads that were aligned against the soybean reference genome Williams 82. Reads aligned in the same genomic region were used for SNP calling, producing initially more than 85,000 SNP markers, where mean sequencing depth per SNP locus was 5. After filtering and application of a combination of criteria, such as minor allele frequency (MAF < 0.05) and the percentage of missing values (PMV < 20%), the obtained marker dataset was used for prediction of the allelic variation of four maturity genes (E1/e1-as, E2/e2, E3/e3-tr, and E4/e4).
Genes E1 and E2 each possess a SNP that gives rise to the alleles e1-as (G/C; Glyma.W82.a2: 20,207,322 bp) and e2 (A/T; Glyma.W82.a2: 45,310,781 bp), respectively, while the E3 and E4 genes do not have a functional SNP underlying the variant alleles e3-tr and e4. The determinant SNP markers for genes E1 and E2 were not present in our dataset, as marker analysis was performed with a reduced-representation genotyping approach. We therefore examined groups of markers positioned in the genomic regions which include the entire E genes (positions given in **Supplementary Table S1**), characterizing haplotypes in order to identify the causative haplotype for each allele, functional or variant (**Supplementary Table S1**). The SNPViz tool was used for haplotype visualization and comparison, in order to provide information on the genetic neighborhood in which variant alleles appeared, as described by Langewisch et al. (2014). By comparing the obtained haplotypes with previously published profiles of E genes for NA ancestral lines and European varieties (Kurasch et al., 2017; Langewisch et al., 2017), we managed to identify the groups of haplotypes for each allele. Within these regions, lines with known different alleles separated distinctly into specific haplotypes, which allowed assignment of the alleles of each gene to specific haplotypes. Moreover, it was possible to further specify the identified haplotypes and select groups of diagnostic SNP markers able to determine different alleles of E genes (**Supplementary Table S1** and **Supplementary Figures S1**–**S4**). Four SNP markers were identified for the E1 gene which could distinguish among the alleles E1, e1-as, and e1-nl (**Supplementary Figure S1**). Two SNP markers were defined for genes E2 and E3, differentiating between the E2 and e2 alleles, as well as between E3 and e3-tr (**Supplementary Figures S2**, **S3**). According to the five identified SNP markers, lines could be classified as carrying the E4 or e4 allele (**Supplementary Figure S4**). The obtained results were applied for further allele scoring of E genes of previously non-genotyped lines, such as NS varieties and NS breeding lines. Predictions did not include the non-functional e1-fs allele, and no distinction could be made between the E3-Ha and E3-Mi alleles, or among e4-oto, e4-tsu, e4-kam, and e4-kes.
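The marker filtering step mentioned above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function name, the 0/1/2 genotype coding and the reading of the thresholds (markers are discarded when MAF is below 0.05 or when 20% or more of the calls are missing) are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def passes_filters(calls: Sequence[Optional[int]],
                   min_maf: float = 0.05,
                   max_missing: float = 0.20) -> bool:
    """Return True if one SNP marker passes the MAF and missing-data filters.

    calls holds one entry per genotype: the count of the alternate allele
    (0, 1 or 2), or None for a missing call. This coding is an assumption.
    """
    n_total = len(calls)
    observed = [c for c in calls if c is not None]
    if n_total == 0 or not observed:
        return False
    # Fraction of genotypes without a call at this marker.
    n_missing = n_total - len(observed)
    if n_missing / n_total >= max_missing:
        return False
    # Alternate-allele frequency over the observed (diploid) genotypes.
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf
```

A marker that is monomorphic in the panel, or one called in only a minority of lines, is rejected by this check; the surviving markers would then feed the haplotype comparison described in the text.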

All examined genotypes were classified into 12 E haplogroups, each of which represents a unique combination of E1, E2, E3, and E4 alleles (**Supplementary Table S2**). Each collection had one major E haplogroup which included most of the lines (≥50%) (**Figure 1**). The exceptions were European varieties, with two predominant groups, and NA ancestral lines, which were distributed almost evenly among several haplogroups. The most abundant NA ancestral haplogroup was e1-as/e2/E3/E4 with 26% of genotypes. Over 20% of genotypes had E1/E2/E3/E4, which coincides with the high number of late-maturing genotypes included in the NA collection. This is concordant with the molecular maturity model proposed by Langewisch et al. (2017), where early-maturing lines had non-functional e1/e2/e3-tr and late-maturing lines had E1/E2/E3. None of the lines from the Chinese collection carried the functional photoperiod-sensitive E1 allele, while two photoperiod-insensitive alleles (e1-nl and e1-as) were identified at the E1 locus. Contrary to previous reports, where most of the tested accessions of breeding collections from China were E1/e2/E3/E4 followed by e1-as/e2/e3/E4 (Jia et al., 2014; Jiang et al., 2014), the most abundant haplogroup in Chinese germplasm in our collection was e1-as/e2/e3-tr/E4, including 57% of lines. The remaining lines mostly contained the e1-nl allele with e1-nl/e2/e3-tr/E4 and e1-nl/e2/e3-tr/e4, covering 21 and 14% of genotypes, respectively. Of all examined genotypes, only three lines from the NA ancestral and Chinese germplasm had the non-functional e4 allele, which is in agreement with the fact that e4 alleles were found in small geographical locations with high latitudes and low temperature (Tsubokura et al., 2013; Langewisch et al., 2014). European varieties from our collection differed from the varieties studied by Kurasch et al. (2017) in their E haplotype, as more than 30% of varieties from our collection were e1-as/E2/E3/E4 and the same number of genotypes had e1-as/E2/e3-tr/E4, compared to 1.3 and 12%, respectively, in the study of Kurasch et al. (2017). The haplogroup e1-as/e2/E3/E4 comprised 17% of lines in European varieties. Similar to the results of Kurasch et al. (2017), NS varieties primarily had the e1-as/e2/E3/E4 haplotype, identified in 63% of genotypes. The same haplotype was found to be predominant in the Kazakh breeding collection (Abugalieva et al., 2016), which was attributed to the frequent exchange of germplasm with partners from former USSR countries. The second largest haplotype in NS varieties is e1-as/e2/e3-tr/E4 with 11% of genotypes. All other allelic combinations were found in less than 10% of genotypes. For the E1 locus, allele e1-as was detected in 88% of NS varieties classified into MG000–MGIII. Nearly half of the NS breeding lines were predicted to be e1-as/e2/E3/E4, with an additional 13% of genotypes predicted as e1-as/e2/e3-tr/E4 and 13% as e1-as/E2/e3-tr/E4.
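The haplogroup frequencies reported above amount to a simple tally of four-allele combinations per collection. The following sketch shows one way such a tally could be computed; the function names and the toy input are hypothetical and do not reproduce the study data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Alleles = Tuple[str, str, str, str]  # alleles at E1, E2, E3, E4 (in that order)


def haplogroup(e1: str, e2: str, e3: str, e4: str) -> str:
    """Label a line by its E1/E2/E3/E4 allele combination."""
    return "/".join((e1, e2, e3, e4))


def haplogroup_shares(lines: Iterable[Alleles]) -> Dict[str, float]:
    """Return the percentage of lines in each haplogroup, most common first."""
    counts = Counter(haplogroup(*alleles) for alleles in lines)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.most_common()}


# Toy collection of four lines (invented for illustration only).
toy_collection = [("e1-as", "e2", "E3", "E4")] * 3 + [("E1", "E2", "E3", "E4")]
shares = haplogroup_shares(toy_collection)
# shares == {"e1-as/e2/E3/E4": 75.0, "E1/E2/E3/E4": 25.0}
```

Keeping the allele order fixed (E1, E2, E3, E4) in the label is what makes haplogroups comparable across collections.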

The Novi Sad collection exhibited a high diversity of allelic combinations of E genes. The most abundant haplogroup in NS varieties and breeding lines was e1-as/e2/E3/E4. The same haplogroup comprised 26% of genotypes in NA ancestral lines, which represent more than 85% of the pedigree of NS varieties, indicating that the e1-as/e2/E3/E4 haplotype was favored by artificial selection during soybean breeding under the environmental conditions of Central-Eastern Europe. NS varieties and breeding lines had predominantly the e1-as allele, indicating that they might have been purposely selected for similar maturity and high-yield capacity. This finding is compatible with the genomic regions identified as affected by positive selection during breeding of NS soybean varieties, where two loci (Satt357 and Satt557) on chromosome 6 were identified as strong positive selection candidates (Tomičić et al., 2015). Similar results were obtained during evaluation of the allelic variants at the E genes in 75 European cultivars from five MGs (MG000–MGII), where none of the cultivars carried the functional E1 allele (Kurasch et al., 2017). It was concluded that the E1 locus plays a major role in early flowering and maturity, and that photoperiod insensitivity at this locus is a probable requirement for soybeans adapted to Central or Northern Europe.

FIGURE 2 | Clustering tree and haplotypes of NA ancestral lines, Chinese germplasm, European varieties, and NS varieties, based on the diagnostic SNP markers for four analyzed E genes (EM, early maturing; LM, late maturing).

Based on the SNP markers used, two major clusters can be identified and described as early material (EM), consisting of MG000–MGI genotypes, and late material (LM), consisting mainly of MGII–MGIV genotypes (**Figure 2**). The first subgroup of EM included Chinese germplasm, NA ancestral lines and NS varieties. The recessive haplotype e1-as/e2/e3-tr/E4 was the most frequent haplotype in this subgroup. The majority of NS varieties belonged to the second subgroup, along with a few NA and European varieties. The most abundant allelic combination in this subcluster was e1-as/e2/E3/E4, which represents the major haplogroup within the environmental conditions of Central-Eastern Europe. The second group mostly comprised LM genotypes, including NA ancestral lines, European and NS varieties, from MGI to MGIV, with the exception of NS Princeza which belongs to MG0. Most lines from the LM cluster had three or four dominant alleles. The observed grouping of genotypes into late and early maturing clusters indicates that alleles of the E1–E4 genes can be used in the assessment and classification of the assortment according to maturity. However, the four genes used in this study are not sufficient to make a precise distinction between different MGs, as there is no unique allelic combination specific for each group. Further work is needed to identify the molecular mechanisms behind soybean photoperiod sensitivity, and to link specific allelic combinations with a certain MG.

#### FUTURE PROSPECTS

Flowering time and maturity are agronomically important traits, which affect soybean adaptation, quality traits, and yield. Complete understanding of their regulation will therefore allow breeding of varieties with optimal flowering or maturity for particular geographic regions. Considering this, we have studied the distribution of different haplogroups of E genes in soybean collections of NA ancestral lines, Chinese germplasm, European varieties, NS varieties, and NS breeding lines. Overall, the observed allelic combinations of E1–E4 genes significantly determined the adaptation of varieties to different geographic regions, although they have different impacts on maturity.

The study was also the first attempt to assess the adaptation of NS soybean varieties and breeding lines based on E gene variation, as well as to compare E gene variation among NA, Chinese and European genotypes. The study aimed to identify the most prominent E allelic combinations in NS genotypes and to apply the obtained results in targeted breeding and introgression of yield and quality traits for the environments of Central-Eastern Europe. As e1-as/e2/E3/E4 was the most common haplotype in the NS population, which comprises high-yielding varieties grown throughout Central and Eastern Europe, this allele combination was proposed as the optimal combination for this environment.

The obtained data provide useful information about the selection of parental genotypes most appropriate for use in local breeding programs for improved soybean productivity. This allows plant breeders to transfer traits more effectively into different MGs, enhance germplasm exploitation, and increase overall efficiency of soybean breeding. Furthermore, the obtained knowledge will significantly contribute to the marker-assisted selection and molecular breeding of soybean, and increase breeding efficiency by enhancing genomic prediction models.

#### AUTHOR CONTRIBUTIONS

JM, MĆ, SB-T, and VuĐ planned and designed the experiment. MĆ, KP, VuĐ, and VoĐ performed the research. JM, DM, MĆ, and VuĐ contributed to the interpretation and analysis of results. JM, MĆ, and DM wrote and approved the manuscript.

# FUNDING

This research was part of the project TR31022, which is supported by Ministry of Education, Science and Technological Development of the Republic of Serbia.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01286/full#supplementary-material

FIGURE S1 | Diagnostic haplotypes of NA ancestral lines, Chinese germplasm, European varieties and NS varieties, based on selected SNP markers for E1 gene.

FIGURE S2 | Diagnostic haplotypes of NA ancestral lines, Chinese germplasm, European varieties and NS varieties, based on selected SNP markers for E2 gene.

FIGURE S3 | Diagnostic haplotypes of NA ancestral lines, Chinese germplasm, European varieties and NS varieties, based on selected SNP markers for E3 gene.

FIGURE S4 | Diagnostic haplotypes of NA ancestral lines, Chinese germplasm, European varieties and NS varieties, based on selected SNP markers for E4 gene.

TABLE S1 | SNP calls located in the genomic regions of E1, E2, E3, and E4 genes in soybean collections of North-American (NA) ancestral lines, Chinese germplasm, European varieties, and genotypes developed and released at Institute of Field and Vegetable Crops – NS (Novi Sad) varieties, NS breeding lines, high-protein and high-oil genotypes from NS collection.

TABLE S2 | Allelic variation on the E1, E2, E3, and E4 maturity genes in soybean collections of North-American (NA) ancestral lines, Chinese germplasm, European varieties, and genotypes developed and released at Institute of Field and Vegetable Crops – NS (Novi Sad) varieties, NS breeding lines, high-protein and high-oil genotypes from NS collection.

#### REFERENCES



from high-latitude cold regions. PLoS One 9:e94139. doi: 10.1371/journal.pone.0094139



allelic variation at four maturity loci. Mol. Breed. 37:8. doi: 10.1007/s11032-016-0611-7


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Miladinović, Ćeran, Đorđević, Balešević-Tubić, Petrović, Đukić and Miladinović. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# OliveCan: A Process-Based Model of Development, Growth and Yield of Olive Orchards

Álvaro López-Bernal<sup>1</sup>, Alejandro Morales<sup>2</sup>, Omar García-Tejera<sup>3</sup>, Luca Testi<sup>3</sup>\*, Francisco Orgaz<sup>3</sup>, J. P. De Melo-Abreu<sup>4</sup> and Francisco J. Villalobos<sup>1,3</sup>

<sup>1</sup> Departamento de Agronomía, ETSIAM, Universidad de Córdoba, Córdoba, Spain, <sup>2</sup> Centre for Crop Systems Analysis, Wageningen University & Research, Wageningen, Netherlands, <sup>3</sup> Department of Agronomy, Institute for Sustainable Agriculture, Spanish National Research Council (CSIC), Córdoba, Spain, <sup>4</sup> Departamento de Ciencias e Engenharia de Biossistemas, Instituto Superior de Agronomia, Universidade de Lisboa, Lisbon, Portugal

#### Edited by:

Mariela Torres, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

#### Reviewed by:

Mukhtar Ahmed, Pir Mehr Ali Shah Arid Agriculture University, Pakistan Hartmut Stützel, Leibniz University of Hanover, Germany Peter S. Searles, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina

> \*Correspondence: Luca Testi lucatesti@ias.csic.es

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 04 September 2017 Accepted: 23 April 2018 Published: 09 May 2018

#### Citation:

López-Bernal Á, Morales A, García-Tejera O, Testi L, Orgaz F, De Melo-Abreu JP and Villalobos FJ (2018) OliveCan: A Process-Based Model of Development, Growth and Yield of Olive Orchards. Front. Plant Sci. 9:632. doi: 10.3389/fpls.2018.00632

Several simulation models of the olive crop have been formulated so far, but none of them is capable of analyzing the impact of environmental conditions and management practices on water relations, growth and productivity under both well-irrigated and water-limiting irrigation strategies. This paper presents and tests OliveCan, a process-oriented model conceived for those purposes. In short, OliveCan is composed of three main model components simulating the principal elements of the water and carbon balances of olive orchards and the impacts of some management operations. To assess its predictive power, OliveCan was tested against independent data collected in two 3-year field experiments conducted in Córdoba, Spain, each of them applying different irrigation treatments. An acceptable level of agreement was found between measured and simulated values of seasonal evapotranspiration (ET, range 393 to 1016 mm year<sup>−1</sup>; RMSE of 89 mm year<sup>−1</sup>), daily transpiration (Ep, range 0.14–3.63 mm d<sup>−1</sup>; RMSE of 0.32 mm d<sup>−1</sup>) and oil yield (Yoil, range 13–357 g m<sup>−2</sup>; RMSE of 63 g m<sup>−2</sup>). Finally, knowledge gaps identified during the formulation of the model and further testing needs are discussed, highlighting that there is additional room for improving its robustness. It is concluded that OliveCan has a strong potential as a simulation platform for a variety of research applications.

Keywords: carbon assimilation, crop model, Olea europaea L., SPAC model, water stress, water uptake

# INTRODUCTION

Olive orchards represent the main component of agricultural systems in many semiarid regions with Mediterranean climate, reaching 10.1 Mha worldwide in 2011 (FAOSTAT, 2014). In countries where this tree species is cultivated over extensive areas, olive cropping systems have become highly relevant not only from an economic perspective, but also from an ecological one. Olive orchards have been traditionally cultivated at low planting densities under low-input rainfed conditions. However, the increase in the demand for oil of recognized and consistently high quality in recent years has triggered the development and adoption of farming techniques aimed at improving productivity, such as localized irrigation, fertigation and mechanical pruning and harvesting. As a result, traditional rainfed olive orchards (<200 trees ha<sup>−1</sup>) coexist nowadays with new intensive (250–850 trees ha<sup>−1</sup>) or super-intensive (1200–3000 trees ha<sup>−1</sup>) irrigated plantations. The rapid changes in olive farming have raised questions on the economic and environmental sustainability of the different olive cropping systems under present and future climate scenarios. Given that an olive orchard is a complex system, its quantitative study via modeling is a crucial step in understanding its behavior in response to climatic and management factors.

To our knowledge, Abdel-Razik (1989) was the first researcher to describe a model for estimating the productivity of olive orchards. The model describes the growth of different organs by simulating radiation interception, photosynthesis, respiration, and applying simple allocation rules, but it does not consider the effects of planting density, canopy structure or pruning, and many of its equations lack a consistent theoretical basis. Villalobos et al. (2006) proposed a simpler approach to estimate biomass production and yield in olive canopies, based on the concept of annual radiation use efficiency and partitioning coefficients, yet this approach does not give insight about the dynamics of the system, its response to climatic variables (besides solar radiation) or the effect of management. More recently, Morales et al. (2016) presented a mechanistic model of olive oil production in the absence of any biotic or abiotic stress, based on a three-dimensional model of canopy photosynthesis and respiration and dynamic distribution of assimilates among organs. However, water stress is the main limiting factor for biomass production in rainfed and deficit-irrigated olive orchards (Moriana et al., 2003; Iniesta et al., 2009).

Simulating the water balance of an irrigated olive orchard is a particularly challenging task as the trees are typically watered by point-source emitters that keep a small fraction of the surface frequently wet while the remaining area remains dry, unless it rains. This fact results in differences between these two soil areas in relation to soil water content, the water fluxes determining the water balance (i.e., runoff, drainage, redistribution along the soil profile, soil evaporation, and root water uptake) and root length density (Fernández et al., 1991). Therefore, traditional modeling approaches based on the use of the average soil water content can lead to large errors, besides giving a poor insight into the system. One alternative consists of using a two-compartment model that solves the water balance separately for each zone of the soil. In this regard, Testi et al. (2006) proposed a model capable of simulating potential transpiration, separately calculating runoff, drainage and soil evaporation from the wet and dry fractions of the soil surface under localized irrigation. The model was developed to determine the potential irrigation needs of olive orchards, so its use is unfortunately limited to unstressed conditions. Lately, García-Tejera et al. (2017a) have formulated a soil-plant-atmosphere-continuum (SPAC) model capable of calculating root water uptake from soils with spatially heterogeneous distributions of water content and root length densities. Such a model also discretizes the soil into different soil zones and layers and, for the canopy, it considers two leaf classes (i.e., sunlit and shaded). Furthermore, the model by García-Tejera et al. (2017a) provides estimates of gross assimilation (A), offering an opportunity to link the water and carbon balances of olive trees.

The goal of this study is to present and test a process-oriented model integrating existing knowledge on the growth and development processes of olive orchards and capable of accounting for the impacts of water stress, management and climate on their productivity, in the absence of nutrient deficiencies, diseases and pests. The model, hereafter named 'OliveCan' (from 'Olive Canopy'), was formulated using the models by Testi et al. (2006), Morales et al. (2016) and García-Tejera et al. (2017a) as a starting point.

#### MATERIALS AND METHODS

#### Model Description

This section provides an overview of the main features and processes within OliveCan. An in-depth description of the model, along with its equations and scientific rationale, is given as Supplementary Material. The code of OliveCan was written in Visual Basic 6.0.

OliveCan is subdivided into three main components (Supplementary Figure S1) that are devoted to computing the water and carbon balances of the olive orchard and to simulating the impacts of some management operations. The water and carbon balance components are interdependent (i.e., each one needs data provided by the other) and both of them require information on soil traits and weather data.

Although most processes in the model run at daily time steps, others (i.e., root water uptake, photosynthesis, maintenance respiration and chilling accumulation) are computed over the diurnal course and integrated to yield daily values. The number of sub-day periods per day is customizable through a user-defined parameter (N). The meteorological input data required consist of daily values of the following variables: maximum and minimum air temperatures (Tmax and Tmin, respectively), average vapor pressure (ea), solar radiation (IG,D), average wind speed (U) and precipitation (P). For those processes simulated at sub-day intervals, OliveCan incorporates routines that disaggregate the daily values of weather data into theoretical diurnal time series.

#### Water Balance Component

This modeling component simulates different physical and physiological processes relevant to olive water use (**Figure 1**). The model solves the water balance separately for two soil zones representing the dry and wetted (by localized irrigation) surface fractions. This approach enables the model to simulate the spatial heterogeneities in soil water content dynamics associated with the use of point-source emitters. Hence, the model considers that the water supplied by irrigation (Irr) is only applied to the wetted soil zone, whereas the dry soil zone is only watered by rainfall. On the other hand, runoff (Rf), soil evaporation (Es), root water uptake (RWU), and drainage (D) represent the water effluxes for both soil zones and are computed independently for each of them. In addition, each soil zone is subdivided into a user-defined number of soil layers (n), which are also customizable in thickness. Vertical soil water fluxes between adjacent soil layers are simulated, but no lateral flow between soil zones is considered.

Daily effective precipitation (Peff) is calculated by discounting rainfall interception by the canopy (Pint) from total daily precipitation (P). Pint is calculated using a simplified version of the model of Gómez et al. (2001) and the resulting Peff is distributed proportionally between the two soil zones as a function of the surface fractions that remain rainfed or are wetted by localized irrigation. With regard to Pint, the canopy is treated as a capacitor capable of storing rain water up to a certain limit determined by canopy dimensions and leaf area index (LAI), according to Gómez et al. (2001). The stored water is subsequently lost by direct evaporation, which is simulated based on the Penman–Monteith equation assuming a null canopy resistance. As in Testi et al. (2006), the aerodynamic resistance is deduced from the model proposed by Raupach (1994), parametrized and validated specifically for olive orchards following Verhoef et al. (1997). The direct evaporation from wet foliage prevents tree transpiration (Ep), until the intercepted water is totally lost.
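The capacitor idea described above can be illustrated with a minimal bucket sketch. This is not the formulation of Gómez et al. (2001) nor the OliveCan routine: the function name, the fixed storage capacity and the externally supplied wet-canopy evaporation are all assumptions made for illustration only.

```python
def effective_precipitation(rain_mm, stored_mm, capacity_mm, wet_evap_mm):
    """One daily step of a bucket-type canopy interception store (all values in mm).

    capacity_mm stands in for the storage limit that the source derives from
    canopy dimensions and LAI; wet_evap_mm stands in for the Penman-Monteith
    evaporation with null canopy resistance. Both are hypothetical inputs here.
    """
    # Rain first fills whatever storage room is left on the canopy (Pint).
    intercepted = min(rain_mm, max(capacity_mm - stored_mm, 0.0))
    # The rest reaches the ground as effective precipitation (Peff = P - Pint).
    p_eff = rain_mm - intercepted
    # Stored water is then lost by direct evaporation from the wet foliage.
    evaporated = min(stored_mm + intercepted, wet_evap_mm)
    new_store = stored_mm + intercepted - evaporated
    return p_eff, new_store


# Illustrative values only.
p_eff, store = effective_precipitation(rain_mm=10.0, stored_mm=0.2,
                                       capacity_mm=1.0, wet_evap_mm=0.5)
```

In this sketch, transpiration would be suppressed on any day for which the returned store is still positive, mirroring the statement that wet foliage prevents Ep until the intercepted water is lost.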

Runoff and infiltration are calculated following a Soil Conservation Service curve number (CN) methodology that was specifically calibrated and validated for different typologies of olive orchards (Romero et al., 2007). The approach requires information on the canopy ground cover (GC) and the soil hydrological condition (SHC), i.e., an indicator of the infiltration capacity of the soil when it is wet. The water content at field capacity (θUL), wilting point (θLL) and saturation (θsat) are also needed for the computation of infiltration and all the remaining simulated processes.
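For orientation, the generic curve number calculation can be sketched as follows. The equation is the standard SCS form with the conventional initial abstraction of 0.2 S; the curve number used in the example is an arbitrary illustrative value, not one of the olive-specific values calibrated by Romero et al. (2007), and the function name is hypothetical.

```python
def scs_runoff(p_mm, cn):
    """Daily runoff (mm) from the standard SCS curve number equation."""
    s = 25400.0 / cn - 254.0   # potential maximum retention (mm)
    ia = 0.2 * s               # initial abstraction, conventional 0.2 * S
    if p_mm <= ia:
        return 0.0             # all the rain is retained or infiltrates
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


# Illustrative values only: 40 mm of rain reaching the soil, CN = 80.
runoff = scs_runoff(p_mm=40.0, cn=80.0)
infiltration = 40.0 - runoff
```

In a two-zone model the same function would simply be called once per soil zone, each with its own water input and curve number.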

Drainage and soil water redistribution processes are simulated by CERES-type sub-models (Jones and Kiniry, 1986), while soil evaporation rates (Es) are estimated using the model of Bonachela et al. (2001). The latter calculates E<sub>s</sub> with a modified Penman-FAO equation for stage-one evaporation and uses the model of Ritchie (1972) for the soil-limited evaporation stage. For the wetted soil zone, microadvection effects are considered (Bonachela et al., 2001).

The model by García-Tejera et al. (2017a) is used to compute root water uptake (RWU) from each layer in the two soil zones, canopy transpiration (Ep) and gross assimilation (A′). By analogy with Ohm's law for electric circuits, the model assumes that water transport through the SPAC is driven by differences in water potential and limited by hydraulic resistances. In this regard, three hydraulic resistances are considered: from the soil to the soil-root interface (R<sub>s</sub>), from the soil-root interface to the root xylem (R<sub>r</sub>) and from the root xylem to the canopy (R<sub>x</sub>). R<sub>s</sub> depends on soil texture, root length density (L<sub>v</sub>) and soil water content (θ) (Gardner, 1960). R<sub>r</sub> is a function of L<sub>v</sub> and root permeability, the latter being mediated by θ (Bristow et al., 1984) and temperature (García-Tejera et al., 2016). Finally, R<sub>x</sub> is calculated from xylem anatomical traits and tree height. In the canopy, two leaf populations are considered (i.e., sunlit and shaded). For each one, gross assimilation (A′), stomatal conductance (g<sub>s</sub>), intercellular CO<sub>2</sub> concentration (C<sub>i</sub>) and leaf water potential (Ψ<sub>l</sub>) are calculated iteratively, considering both the models by Farquhar et al. (1980) and Tuzet et al. (2003). In doing so, the environmental CO<sub>2</sub> concentration (C<sub>a</sub>) is explicitly taken into account for calculating both A′ and g<sub>s</sub>. The model also requires information on the intercepted photosynthetically active radiation (IPAR) as well as the sunlit and shaded fractions of the canopy. These inputs are provided by a simple geometric model of radiation interception which assumes a spheroidal shape for the crown and accounts for the shadowing from neighboring trees. Finally, E<sub>p</sub> is estimated from the imposed evaporation equation assuming that the canopy is coupled to the atmosphere, whereas RWU is deduced in each layer of each soil zone from the corresponding water potential differences and hydraulic resistances.
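The Ohm's law analogy can be made concrete with a small resistance network. The sketch below is only an illustration under simplifying assumptions: leaf water potential is taken as a known input (the source solves it iteratively together with stomatal conductance and assimilation), each soil compartment contributes one lumped resistance R<sub>s</sub> + R<sub>r</sub>, and all compartments feed a single xylem path. The function and variable names are hypothetical.

```python
def root_water_uptake(psi_soil, r_soil_root, psi_leaf, r_xylem):
    """Solve a resistance network with parallel soil compartments and one xylem path.

    psi_soil    : water potential of each soil compartment (MPa)
    r_soil_root : lumped Rs + Rr of each compartment (MPa per unit flow)
    psi_leaf    : leaf water potential (MPa), assumed known in this sketch
    r_xylem     : Rx, from root xylem to canopy (MPa per unit flow)
    Returns (uptake per compartment, total uptake), in flow units.
    """
    conductances = [1.0 / r for r in r_soil_root]
    g_x = 1.0 / r_xylem
    # Mass balance at the root xylem node gives its water potential.
    weighted = sum(g * p for g, p in zip(conductances, psi_soil))
    psi_x = (weighted + g_x * psi_leaf) / (sum(conductances) + g_x)
    # Flow from each compartment is driven by its potential difference.
    uptake = [g * (p - psi_x) for g, p in zip(conductances, psi_soil)]
    return uptake, sum(uptake)


# Illustrative values: a wet and a dry compartment with equal resistances.
uptake, total = root_water_uptake(psi_soil=[-0.05, -1.0],
                                  r_soil_root=[2.0, 2.0],
                                  psi_leaf=-1.6, r_xylem=1.0)
```

With these numbers most of the flow comes from the wet compartment, which is the kind of behavior an average-soil-water approach cannot represent. A negative value for a compartment would indicate water moving from roots to soil, which this sketch does not constrain.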

#### Carbon Balance Component

This modeling component aims to simulate the growth and development of the trees and the carbon exchange of the orchard (**Figure 2**). First, the model calculates the daily pool of assimilates from the rate of A′ determined by the SPAC model. Based on experimental evidence, OliveCan also assumes that this pool can be increased to some extent by reserve remobilization (Bustan et al., 2011) or fruit photosynthesis (Proietti et al., 1999). The available assimilates are allocated to growth and respiration of the different organs, the latter being segregated into maintenance and growth respiration (RESP<sub>M</sub> and RESP<sub>g</sub>, respectively). In this regard, OliveCan considers six organ types: leaves, shoots (i.e., stems of up to 3 years old), branches (including the trunk), coarse roots (i.e., roots with secondary growth), fine roots (i.e., absorbing roots) and fruits.

RESP<sub>M</sub> is calculated as a function of temperature and biomass, and it is subtracted directly from the pool of assimilates. Whenever maintenance respiration exceeds the pool of assimilates, the deficit is discounted from the reserve pool. The remaining assimilates are distributed among the different organs with partitioning rules being mediated by phenology. The loss of carbon during the synthesis of new biomass was included by calculating a production value (PV) (Penning de Vries et al., 1974) for each type of organ according to its biochemical composition.
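The order of operations just described (maintenance respiration first, reserves covering any deficit, then partitioning with a production value per organ) can be sketched as below. This is an illustrative outline, not the OliveCan code: the function name, the dictionary-based interface and all numbers are assumptions, and fluxes are treated as if expressed on a common substrate basis.

```python
def allocate_daily_carbon(assimilates, resp_m, reserves, partition, pv):
    """Sketch of one day of carbon allocation.

    assimilates : daily pool of assimilates
    resp_m      : maintenance respiration for the day
    reserves    : size of the reserve pool before allocation
    partition   : {organ: partitioning coefficient}, assumed to sum to 1
    pv          : {organ: production value, biomass formed per unit substrate}
    Returns (growth per organ, substrate lost in biosynthesis, updated reserves).
    """
    available = assimilates - resp_m
    if available < 0.0:
        # Maintenance respiration exceeds the daily pool: draw on reserves.
        reserves += available
        available = 0.0
    # New biomass per organ after accounting for synthesis costs through PV.
    growth = {organ: available * frac * pv[organ]
              for organ, frac in partition.items()}
    # Substrate not recovered as biomass is attributed to growth respiration.
    growth_loss = available - sum(growth.values())
    return growth, growth_loss, reserves


# Illustrative values only.
growth, loss, reserves = allocate_daily_carbon(
    assimilates=10.0, resp_m=4.0, reserves=50.0,
    partition={"leaves": 0.5, "fruits": 0.5},
    pv={"leaves": 0.6, "fruits": 0.5})
```

In the model the partitioning coefficients would change with phenology, so the `partition` argument would differ between the dormant and active growth periods.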

Two phenological stages are considered for the vegetative organs: (i) a dormant stage characterized by an absence of growth that is induced by chilling accumulation during autumn and (ii) a phase of active growth that starts in late winter, by the time average temperature is above a threshold. In relation to the reproductive growth, the date of flowering is determined with the two-phase model by De Melo-Abreu et al. (2004). Fruit growth is assumed to start after a given amount of thermal time is accumulated from the date of flowering and ceases when either maturity or the harvest date is reached.

During the vegetative rest period and provided that fruits are not present, all the available assimilates after discounting maintenance respiration are allocated to a virtual pool of reserves. This reserve pool is subsequently used for the growth of vegetative organs and fruits during the growth season. Fruit growth can either be source-limited or sink-limited. In the former case, the associated partitioning coefficient is fixed whereas in the latter, it is calculated as a function of the number of fruits (FN), which in turn is modeled as a function of the number of fruits and nodes produced in the previous year. In doing so, the model may be prone to errors in the estimates of productivity and vegetative growth for a given year when performing long runs, but such errors are expected to compensate each other if those model outputs are averaged over biennia. With regard to the vegetative organs, fixed partitioning coefficients are adopted. Whenever fruits are present, the model considers that they become the priority sink of assimilates, so the vegetative partitioning coefficients are applied after discounting the fruit demand from the daily pool of assimilates. Therefore, partitioning coefficients to vegetative organs are assumed to be independent of tree size, management factors and environmental conditions, as in the model of Morales et al. (2016). As a final remark, inspired by the CERES-type models (Jones and Kiniry, 1986), the growth of fine roots is distributed among the different layers in the two soil zones as a function of the size and water content of each soil compartment.

Senescence of leaves and fine roots is simulated using an approach similar to that in the model by Morales et al. (2016). OliveCan also takes into account the conversion of shoots into branches when they exceed 3 years of age. In addition, the model considers some of the effects of frost events and heat stress. Frost damage is simulated by assuming that a fraction of the standing leaves is defoliated when minimum air temperature falls below a certain temperature threshold. A similar approach is used for simulating the effect of extremely high temperatures during flowering on fruit set: when maximum air temperature exceeds a given threshold, a reduction in the final FN is triggered.
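The two threshold rules can be written down in a few lines. The sketch below only illustrates the logic stated in the text; the threshold temperatures and the fractions lost are placeholders, not the values parametrized from Barranco et al. (2005) or Koubouris et al. (2009), and the function names are hypothetical.

```python
def frost_defoliation(leaf_biomass, t_min, t_frost, fraction_lost):
    """Remove a fraction of standing leaves when Tmin drops below a threshold."""
    if t_min < t_frost:
        return leaf_biomass * (1.0 - fraction_lost)
    return leaf_biomass


def heat_fruit_set(fruit_number, t_max, t_heat, fraction_lost, is_flowering):
    """Reduce the final fruit number when Tmax exceeds a threshold at flowering."""
    if is_flowering and t_max > t_heat:
        return fruit_number * (1.0 - fraction_lost)
    return fruit_number


# Placeholder thresholds and fractions, for illustration only.
leaves_after = frost_defoliation(leaf_biomass=100.0, t_min=-9.0,
                                 t_frost=-7.0, fraction_lost=0.25)
fruits_after = heat_fruit_set(fruit_number=2000.0, t_max=39.0,
                              t_heat=35.0, fraction_lost=0.5,
                              is_flowering=True)
```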

Variables related to canopy characteristics such as leaf area index (LAI) or GC are updated from the estimates of leaf biomass, assuming that the crowns have a spheroidal shape with constant leaf area density (LAD) and a constant ratio of vertical to horizontal canopy radii (Rzx). Similarly, the biomass of fine roots in each soil compartment is used to compute root length density (L<sub>v</sub>) by adopting a constant specific root length (SRL).
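The geometric bookkeeping implied here can be sketched as follows. The spheroid volume formula is standard geometry; everything else is an assumption for illustration: leaf area per tree is taken as an input (the conversion from leaf biomass would need a specific leaf area, which is not given in this section), ground cover is approximated as the projected crown area and capped at 1, and the function names are hypothetical.

```python
import math


def canopy_geometry(leaf_area_m2, lad, rzx, trees_per_ha):
    """Derive crown radius, ground cover and LAI from leaf area per tree.

    lad : leaf area density (m2 leaf per m3 crown), assumed constant
    rzx : ratio of vertical to horizontal crown radius, assumed constant
    """
    volume = leaf_area_m2 / lad                      # crown volume (m3)
    # Spheroid volume: V = 4/3 * pi * Rx^2 * Rz, with Rz = rzx * Rx.
    rx = (3.0 * volume / (4.0 * math.pi * rzx)) ** (1.0 / 3.0)
    gc = min(1.0, math.pi * rx ** 2 * trees_per_ha / 10000.0)
    lai = leaf_area_m2 * trees_per_ha / 10000.0
    return rx, gc, lai


def root_length_density(fine_root_biomass_g, srl_m_per_g, soil_volume_m3):
    """Root length density (m root per m3 soil) from biomass and a constant SRL."""
    return fine_root_biomass_g * srl_m_per_g / soil_volume_m3


# Illustrative values only (408 trees per hectare as in Experiment I).
rx, gc, lai = canopy_geometry(leaf_area_m2=20.0, lad=2.0, rzx=1.0,
                              trees_per_ha=408)
lv = root_length_density(fine_root_biomass_g=50.0, srl_m_per_g=10.0,
                         soil_volume_m3=0.5)
```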

Finally, the soil carbon balance and heterotrophic respiration (RESPH) are computed with an adaptation of the model proposed by Huang et al. (2009) and modified to take into account the effect of soil moisture on the rate of decomposition according to Verstraeten et al. (2006). Then, by considering the different computed fluxes of assimilation and respiration within the orchard, OliveCan provides estimates of the ecosystem respiration (RESPeco) and net ecosystem exchange (NEE).

#### Management Component

Four management operations are considered in OliveCan: tillage, irrigation, harvest and pruning. In the model, tillage operations have an impact on CN whereas irrigation provides an additional water input for the wetted soil zone. Irrigation amounts and dates can either be defined explicitly by the users or implicitly calculated through a dedicated routine that, at customizable intervals, applies a fraction of the maximum ET lost since the last irrigation. Harvesting takes place on a user-defined day of the year and results in the removal of fruits. At harvest, the model provides an estimate of oil yield (Yoil) by multiplying the dry biomass of fruits by a fixed coefficient representing the ratio of oil content to dry matter. Finally, pruning is simulated by setting a customizable fraction of LAI to be removed (Fprune) and an interval between pruning operations. The model also reduces the biomasses of shoots and branches by the same fraction Fprune. The user should indicate whether pruning residues are incorporated into the soil or exported.
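Two of these rules lend themselves to a short sketch: the automatic irrigation routine and the oil yield estimate at harvest. The code below is a minimal illustration of the logic as stated in the text; the function names, the list-based interface and the numbers are assumptions and do not come from the OliveCan code.

```python
def scheduled_irrigation(daily_et_max, interval_days, fraction):
    """Daily irrigation amounts (mm) for a fixed-interval replacement rule.

    Every `interval_days`, a fraction of the maximum ET accumulated since
    the last irrigation is applied; no water is applied on other days.
    """
    amounts, accumulated = [], 0.0
    for day, et in enumerate(daily_et_max, start=1):
        accumulated += et
        if day % interval_days == 0:
            amounts.append(fraction * accumulated)
            accumulated = 0.0
        else:
            amounts.append(0.0)
    return amounts


def oil_yield(fruit_dry_biomass, oil_to_dry_matter):
    """Oil yield as dry fruit biomass times a fixed oil-to-dry-matter ratio."""
    return fruit_dry_biomass * oil_to_dry_matter


# Illustrative values only.
irrigation = scheduled_irrigation([5.0, 5.0, 5.0, 5.0],
                                  interval_days=2, fraction=0.5)
y_oil = oil_yield(fruit_dry_biomass=600.0, oil_to_dry_matter=0.4)
```

Under this rule a deficit irrigation treatment corresponds to `fraction` below 1, while full replacement of maximum ET corresponds to `fraction` equal to 1.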

#### Initialization Requirements

Apart from the weather dataset and some basic orchard (e.g., planting density, age, and latitude) and soil (e.g., depth, θUL, θLL) traits, the user is required to enter the initial values of GC and L<sub>v</sub>, from which the biomasses of the different organs are deduced following simple criteria (see Supplementary Material). For the computation of FN in the first season, an estimate of dry yield for the year preceding the start of the simulation is also needed. To initialize the state variables related to phenology, simulations must start at the beginning of a year and the temperature records of the preceding 3 months must be provided. Some simulation settings such as the number of years to simulate and N must also be provided. Finally, the user must indicate the management operations to be implemented and provide values for their parameters.

#### Model Parameterization

When available, the values of the different parameters were taken from the literature. Supplementary Table S2 provides a complete list with the parameter values used for the simulations and the source from which they were taken. In short, the parameters of the SPAC model were taken from García-Tejera et al. (2017a,b), who, in turn, gathered most of the parameter values from different sources. Parameters related to phenology were obtained from reports by De Melo-Abreu et al. (2004) and López-Bernal et al. (2014, 2017). The studies by Mariscal et al. (2000) and Pérez-Priego et al. (2014) were used for setting the maintenance respiration and PV coefficients, respectively. Parameters related to the calculation of fruit number and yield were taken from several sources, including experimental data (see section "Number of Fruits and Alternate Bearing" in Supplementary Material). The coefficient of oil yield to dry fruit matter was taken from experimental data collected in a hedgerow cv. 'Arbequina' orchard (López-Bernal et al., 2015). Partitioning coefficients were based on findings by Mariscal et al. (2000), Villalobos et al. (2006) and Scariano et al. (2008). Reports from Barranco et al. (2005) and Koubouris et al. (2009) were used to parametrize the routines modeling the impacts of frost damage and heat stress, respectively. Coefficients modulating fine root growth distribution were directly taken from Jones and Kiniry (1986). Finally, parameters involved in the soil carbon balance were taken from Verstraeten et al. (2006), Huang et al. (2009) and, to a lesser extent, from other studies.

#### Model Testing

Experimental measurements conducted in two mature olive orchards located in the Alameda del Obispo Research Station, Córdoba, Spain (37.8°N, 4.8°W, 110 m) were used for assessing the reliability of OliveCan. The climate in the area is typically Mediterranean, with around 600 mm of average annual rainfall and an average annual ET<sub>0</sub> of 1390 mm (Testi et al., 2004). The soil for both orchards is classified as a Typic Xerofluvent of sandy loam texture and exceeds 2 m in depth, with field capacity (θUL) and permanent wilting point (θLL) water contents of 0.23 m<sup>3</sup> m<sup>−3</sup> and 0.07 m<sup>3</sup> m<sup>−3</sup>, respectively (Testi et al., 2004). Weather data were collected using a station placed 500 m away from the orchards. Within both orchards, irrigation experiments comprising several irrigation treatments were performed. Each irrigation treatment was simulated separately with OliveCan.

#### Experiment I

Extensive information on the orchard characteristics and dataset of Experiment I is provided by Iniesta et al. (2009). In short, the experiment was performed between 2004 and 2006 in a high density cv. 'Arbequina' olive orchard (tree spacing was 7 m × 3.5 m, i.e., 408 trees ha<sup>−1</sup>) planted in 1997. Irrigation was applied 5 days a week by drip, with seven emitters of 4 L h<sup>−1</sup> per tree. A randomized complete-block design was used with three replications of 12 trees each, and the following irrigation treatments:


The measurements (only performed for the central trees of the replicates) used for testing the model were oil yield (Yoil, g m<sup>−2</sup>), seasonal ET and daily transpiration (E<sub>p</sub>, mm d<sup>−1</sup>). With regard to Yoil, each tree was manually harvested and the fresh yield weighed in the field. Yoil was subsequently determined from sub-samples of 5 kg of fresh fruits. Cumulative ET was determined by water balance for the whole 2005 and 2006 seasons by measuring soil water content with a neutron probe (model 503, Campbell Pacific Nuclear Corp, Pacheco, CA, United States). Eight access tubes were installed between two trees per replicate, normal to tree rows. Measurements were taken at several depths (from 0.075 to 2.65 m deep). Finally, E<sub>p</sub> was measured in 2006 with a sap-flow system developed and assembled at the IAS-CSIC in Córdoba and described by Testi and Villalobos (2009). The system uses the Compensated Heat Pulse (CHP) method in combination with the Calibrated Average Gradient (CAG) procedure. The probes performed readings every 15 min at 4 depths in the xylem, spaced 10 mm apart. Six RDI, six CDI and four CON trees were instrumented with two probes per tree, at a height of 30 cm. The outputs of each probe were integrated first along the trunk radius and then around the azimuth angle. Average sap flow records for each treatment were calibrated against the estimates of E<sub>p</sub> deduced from the difference between the measured ET and soil evaporation in a period of several weeks with no rainfall events during the summer. The model of Orgaz et al. (2006) was used to calculate soil evaporation. The calibrated sap flow data have not been published so far.

Values of GC, LAD, and Rzx required to initialize the model were taken from measurements of tree silhouettes. A record of Ydry of the year preceding simulations was also considered. Initial L<sub>v</sub> values were taken from records measured by Moriana (2001) for the trees of Experiment II.

#### Experiment II

Extensive information on the orchard characteristics and dataset of Experiment II is provided by Moriana et al. (2003). In short, the experiment was performed between 1997 and 1999 in a high density cv. 'Picual' olive orchard (tree spacing was 6 m × 6 m, i.e., 278 trees ha<sup>−1</sup>) of 18 years of age. Irrigation was applied 5 days a week by drip, with four emitters of 4 L h<sup>−1</sup> per tree. A randomized complete-block design was used with three replications of 16 trees each, and the following irrigation treatments:


The measurements (only performed for the central trees of the replicates) used for testing the model were Yoil and seasonal ET. On the one hand, trees were harvested between December 15th and January 15th for the 3 years. Individual fruit weight of each tree was measured and a subsample of 150 fruits from each tree was used for determining oil content. On the other hand, cumulative ET was determined by water balance for each season by measuring soil water content with a neutron probe (model 503, Campbell Pacific Nuclear Corp, Pacheco, CA, United States). Eight access tubes were installed between two trees per replicate in the four irrigation treatments and six tubes were placed in the rainfed treatment. Measurements were taken at several depths (from 0.075 to 2.4 m deep).

Values of GC, LAD, and Rzx required to initialize the model were taken from dedicated measurements. A record of Ydry of the year preceding simulations was also considered. Initial L<sub>v</sub> values were taken from records measured by Moriana (2001).

#### Statistical Analysis

Model performance in reproducing measured data was assessed using mean absolute error (MAE, from 0 to +∞, optimum 0), root mean square error (RMSE, from 0 to +∞, optimum 0), coefficient of residual mass (CRM, from −∞ to +∞, optimum 0) and modeling efficiency (EF, from −∞ to 1, optimum 1):

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|S_{i} - M_{i}\right| \tag{1}$$

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(S_{i} - M_{i}\right)^{2}} \tag{2}$$

$$\text{CRM} = 1 - \frac{\sum_{i=1}^{n} S_{i}}{\sum_{i=1}^{n} M_{i}} \tag{3}$$

$$\text{EF} = \frac{\sum_{i=1}^{n} \left(M_{i} - \bar{M}\right)^{2} - \sum_{i=1}^{n} \left(S_{i} - M_{i}\right)^{2}}{\sum_{i=1}^{n} \left(M_{i} - \bar{M}\right)^{2}} \tag{4}$$

where M<sub>i</sub> is the ith measured value, M̄ is the average of all measurements, S<sub>i</sub> is the ith simulated value and n is the number of measured values. In addition, the slope, intercept and coefficient of determination (r<sup>2</sup>) obtained by regressing the simulated on the measured values were also used.
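For readers who wish to reproduce these indices, Equations 1 to 4 translate directly into a few lines of code. The sketch below is not part of OliveCan; the function name and the example data are illustrative assumptions.

```python
import math


def model_statistics(sim, obs):
    """Compute MAE, RMSE, CRM and EF (Equations 1 to 4) for paired values.

    sim : simulated values S_i
    obs : measured values M_i (must not all be equal, or EF is undefined)
    """
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_err = sum((s - m) ** 2 for s, m in zip(sim, obs))
    ss_obs = sum((m - mean_obs) ** 2 for m in obs)
    return {
        "MAE": sum(abs(s - m) for s, m in zip(sim, obs)) / n,
        "RMSE": math.sqrt(ss_err / n),
        "CRM": 1.0 - sum(sim) / sum(obs),
        "EF": (ss_obs - ss_err) / ss_obs,
    }


# Made-up numbers, for illustration only.
stats = model_statistics(sim=[2.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0])
```

With CRM defined as in Equation 3, a positive value indicates that the model underestimates the measurements on aggregate, and a negative value indicates overestimation. An EF of 0 means the model predicts no better than the mean of the measurements.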

#### RESULTS

Measured and simulated values of Yoil for the two experiments are presented in **Table 1** in relation to the year and the irrigation treatment. Field measurements showed two common patterns, irrespective of the experiment. On the one hand, there were consistent differences in Yoil between treatments, with the fully irrigated treatments (i.e., CON) exhibiting higher values than the deficit or rainfed ones. On the other hand, measurements revealed "high" Yoil in the first and third experimental seasons and "low" Yoil in the intermediate one. The only exception to this rule occurred in the second biennium of Experiment II for the DRY and RDI treatments, probably as a consequence of the high level of water stress reached in 1999 (Moriana et al., 2003). OliveCan reproduced well both the differences between treatments and the general alternating trend in Yoil for both experiments. Pooling all the data together, MAE and CRM were close to zero, RMSE was 63.2 g m<sup>−2</sup> and EF was 0.48. The regression analysis yielded an r<sup>2</sup> of 0.51 and a slope of 0.65. Slightly better results in terms of regression parameters, RMSE and EF were found when analyzing the data grouped in biennia (**Table 2**).

TABLE 1 | Observed (Obs.) and simulated (Sim.) values of seasonal evapotranspiration (ET) and oil yield (Yoil) in the model tests using the experimental data from Iniesta et al. (2009) (Experiment I) and Moriana et al. (2003) (Experiment II).

<sup>∗</sup>Not measured.

MAE is mean absolute error, RMSE is root mean square error, CRM is coefficient of residual mass, EF is modeling efficiency and r<sup>2</sup> is coefficient of determination.

Good agreement was found between model estimates of seasonal ET and the measurements by Moriana et al. (2003) and Iniesta et al. (2009) (**Table 1**). Simulated values were close to experimental data and tracked the differences between treatments reasonably well. MAE and RMSE were 36.04 mm and 89.4 mm, respectively. The statistical analysis revealed no systematic bias, with CRM close to 0 and an EF of 0.62 (**Table 2**). Nevertheless, the model was found to underestimate seasonal ET for the treatments receiving more irrigation, which led to a rather low slope for the linear regression fit (0.59) between simulated and observed values.

Beyond the seasonal scale, OliveCan also provided good estimates of daily E<sub>p</sub>, as evidenced by comparing the model outputs with the calibrated sap flow measurements recorded in Experiment I (**Figure 3**). The model was found to slightly underestimate E<sub>p</sub> in winter and late autumn, particularly for the CON treatment, which led to a CRM of 0.14. In addition, E<sub>p</sub> was slightly overestimated during the RDI midsummer non-irrigated period. However, simulated and observed values were generally very close in the three treatments. Considering the three treatments together (n = 1095), the comparison between simulated and observed E<sub>p</sub> values yielded an MAE of 0.17 mm d<sup>−1</sup>, an RMSE of 0.32 mm d<sup>−1</sup>, an EF of 0.84 and a satisfactory linear regression fit (**Table 2**).

**Figure 4** illustrates the time course of measured and modeled E<sub>p</sub> for the CON and CDI treatments over a typical sunny summer day of Experiment I (July 21st 2006, DOY 202). The four plotted E<sub>p</sub> curves follow a bell shape, the CON ones exhibiting higher values than those of CDI. Apart from that, **Figure 4** shows a remarkable lag between the observed and the simulated daily course of E<sub>p</sub>, regardless of the treatment, with the diurnal trends of E<sub>p</sub> being anticipated by the model. Hence, OliveCan tended to overestimate E<sub>p</sub> in the hours following sunrise and to underestimate it around sunset.

#### DISCUSSION

#### Model Performance

Model tests generally revealed a high level of agreement between simulations and experimental measurements. Given the variety of the simulated treatments and the many assumptions that a model like OliveCan must make, we found the results satisfactory. Notwithstanding that, there were situations in which model estimates departed from observations. For example, discrepancies were found for some of the simulations of Yoil, ET (**Table 1**) and E<sub>p</sub> (**Figure 3**), but, considering that the general trends and differences between treatments were captured by the model, we believe that the results are highly acceptable. Some of the divergences between measured and simulated Yoil might be attributed to the fact that the approach followed by OliveCan to simulate alternate bearing is limited, insofar as the physiological bases of alternate bearing are not completely understood yet (Connor, 2005; Dag et al., 2010). However, biennial comparisons (**Table 2**) only slightly improved the results. Apart from that, the remarkable lag between the simulated and measured diurnal courses of E<sub>p</sub> (**Figure 4**) was to be expected: measurements were performed in the trunk with sap flow sensors and OliveCan does not simulate the buffering effect of the water stored in aboveground organs (Cermák et al., 2007). Also, the model assumes that stomatal conductance responds instantaneously to changes in environmental conditions, but the slow dynamics of stomatal opening and closing can cause lags in diurnal transpiration (Vialet-Chabrand et al., 2013).

Considering all the simulations together, the maximum simulated oil yield was 358 g m<sup>−2</sup> (**Table 1**), which is comparable to the maximum values estimated by the model of Morales et al. (2016) and to available experimental data (Villalobos et al., 2006; Pastor et al., 2007). Simulated values of radiation use efficiency for oil production (i.e., the amount of oil produced per unit of intercepted PAR) averaged over biennia ranged between 0.10 and 0.17 g MJ<sup>−1</sup>. These estimates are within the range of variation found by Villalobos et al. (2006) across a wide range of commercial orchards in Southern Spain.

Overall, the results of all the aforementioned comparisons suggest that model performance is fairly satisfactory. However, further testing against experimental data taken from different environmental conditions and orchard characteristics seems highly desirable. This would help to provide additional evidence on the predictive power of OliveCan, or else to identify situations for which model accuracy could be improved through either better calibrations or reformulation of some routines. Apart from that, it should be noted that the reliability of OliveCan for estimating certain output parameters (e.g., NEE, RESPH) has not been tested specifically in the present study, which should also be the focus of future research efforts.

#### Model Applicability

Considering its mechanistic approach, the large number of simulated processes and its potential uses, OliveCan represents a momentous step forward in relation to previous olive growth simulation models. In this regard, OliveCan enables one to assess the combined effects of management operations and weather on crop performance for different olive orchard and soil typologies, under both unstressed and water deficit conditions. Thus, the model shows potential for a broad range of research applications. For instance, OliveCan seems particularly suitable for assessing the performance of olive orchards under future climatic scenarios, as the model explicitly accounts for the multiple effects of reduced rainfall and increased environmental CO<sub>2</sub> and temperature on the water and carbon balances of the orchard and the development of trees.

Obviously, the comprehensive nature and the wide range of simulated processes come at the expense of both model complexity and high input requirements. The latter is likely to be its main limitation, insofar as some of the inputs (e.g., soil depth, L<sub>v</sub> distribution) are not easy to measure in the field. In any case, it is worth emphasizing that OliveCan has not been primarily conceived as a decision support system for farmers, but as a research tool.

#### Further Research

During the development of the model, it became apparent that our current understanding of some of the physiological processes to be simulated was limited. For example, timing of vegetative bud break, dynamics of leaf senescence, fruit photosynthesis and the use of reserves are among the phenomena that have received less attention in the literature. Also, OliveCan is missing a sub-model to properly simulate the dynamics of oil accumulation during the fruit growth period. Further research on these and other topics (e.g., alternate bearing) is clearly needed and might result in model improvements through either a more consistent parametrization or the formulation of better equations for simulating such processes.

Further research regarding genetic variability in model parameters is also desirable. With the exception of those related to the simulation of flowering date (De Melo-Abreu et al., 2004) and frost damage (Barranco et al., 2005), all parameters have been taken from past experiments carried out either with only one cultivar each ('Arbequina' being the most frequent) or averaging the results obtained for a few of them. Although the scarce literature does not allow us to disentangle how many of these crop parameters are cultivar-specific, it is clear that exploring their genetic variability might be important for enhancing model reliability. Moreover, the quantification of such cultivar variability may be used for evaluating its impact on tree physiology and productivity under different management, weather or orchard characteristics using OliveCan, which may be useful for breeding purposes.

Finally, future improvements of OliveCan might include additional sub-models for simulating nutrient uptake and the impact of pests and diseases. Apart from that, the model shows potential for being adapted to other tree species, so its interest may not be only restricted to olive researchers.

#### CONCLUSION

The model presented here targets the simulation of the interactions between olive trees and their environment through a detailed characterization of the water and carbon balances of the orchard as affected by weather variables, soil attributes and management operations. The generally high level of agreement found between measured and simulated data supports the suitability of OliveCan for estimating olive orchard dynamics. These results encourage the application of the model to simulate the growth, carbon exchange and water relations of olive orchards in a wide range of research contexts, including studies on the performance of olive trees under climate change scenarios. The development of OliveCan has also highlighted significant knowledge gaps in relation to some physiological processes and the cultivar specificity of some of the parameters. Further research on these aspects may contribute to improving the reliability of the model.

#### AUTHOR CONTRIBUTIONS

All authors played a significant role in the conception and development of the model. FV led the coding, with contributions from ÁL-B, AM, OG-T, and LT. ÁL-B, LT, and FV gathered the datasets for testing the model. ÁL-B led the writing with significant contributions from all co-authors.


#### FUNDING

The research leading to these results has received funding from Ministerio de Economía y Competitividad (Grant Nos. AGL-2010-20766 and AGL2015-69822), from Junta de Andalucía (Grant No. P08-AGR-04202), from the European Community's Seventh Framework Programme-FP7 (KBBE.2013.1.4-09) under Grant Agreement No. 613817, 2013–2016, "MODelling vegetation response to EXTREMe Events" (MODEXTREME, modextreme.org), and from ERA-NET FACCE SURPLUS (Grant No. 652615, project OLIVE-MIRACLE), the latter co-funded by INIA (PCIN-2015-259). In addition, ÁL-B was funded by a postdoctoral fellowship ('Juan de la Cierva-Formación 2015' Programme, FJCI-2015-24109) from Ministerio de Economía y Competitividad.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.00632/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 López-Bernal, Morales, García-Tejera, Testi, Orgaz, De Melo-Abreu and Villalobos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Association Mapping for Important Agronomic Traits in Safflower (Carthamus tinctorius L.) Core Collection Using Microsatellite Markers

Heena Ambreen†, Shivendra Kumar†, Amar Kumar, Manu Agarwal, Arun Jagannath\* and Shailendra Goel\*

Department of Botany, University of Delhi, New Delhi, India

#### Edited by:

Johann Vollmann, Universität für Bodenkultur Wien, Austria

#### Reviewed by:

Leonardo Velasco, Instituto de Agricultura Sostenible (CSIC), Spain; Ahmad Arzani, Isfahan University of Technology, Iran; Ankica Kondic-Spika, Institute of Field and Vegetable Crops, Serbia

#### \*Correspondence:

Arun Jagannath, jagannatharun@yahoo.co.in; Shailendra Goel, shailendragoel@gmail.com

†These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 14 October 2017; Accepted: 13 March 2018; Published: 29 March 2018

#### Citation:

Ambreen H, Kumar S, Kumar A, Agarwal M, Jagannath A and Goel S (2018) Association Mapping for Important Agronomic Traits in Safflower (Carthamus tinctorius L.) Core Collection Using Microsatellite Markers. Front. Plant Sci. 9:402. doi: 10.3389/fpls.2018.00402

Carthamus tinctorius L. (safflower) is an important oilseed crop producing seed oil rich in unsaturated fatty acids. Scarcity of identified marker-trait associations is a major limitation toward development of successful marker-assisted breeding programs in safflower. In the present study, a safflower panel (CartAP) comprising 124 accessions derived from two core collections was assayed for its suitability for association mapping. Genotyping of CartAP using microsatellite markers revealed significant genetic diversity indicated by Shannon information index (H = 0.7537) and Nei's expected heterozygosity (I = 0.4432). In Principal Coordinate Analysis, the CartAP accessions were distributed homogeneously in all quadrants indicating their diverse nature. Distance-based Neighbor Joining analysis did not delineate the CartAP accessions in consonance with their geographical origin. Bayesian analysis of population structure of CartAP demonstrated the unstructured nature of the association panel. Kinship analysis at population (Gij) and individual level (Fij) revealed absence of or weak relatedness between the CartAP accessions. The above parameters established the suitability of CartAP for association mapping. We performed association mapping using phenotypic data for eight traits of agronomic value (viz., seed oil content, oleic acid, linoleic acid, plant height, number of primary branches, number of capitula per plant, 100-seed weight and days to 50% flowering) available for two growing seasons (2011–2012 and 2012–2013) through General Linear Model and Mixed Linear Model. Our study identified ninety-six significant marker-trait associations (MTAs; P < 0.05), of which several MTAs with correlation coefficient (R<sup>2</sup>) > 10% were consistently represented in both models and in both seasons for traits viz., oil content, oleic acid content, linoleic acid content and number of primary branches. Several MTAs with high R<sup>2</sup>-values were detected either in a majority or in some environments (models and/or seasons). Many MTAs were also common between traits (viz., oleic/linoleic acid content; plant height/days to 50% flowering; number of primary branches/number of capitula per plant) that showed positive or negative correlation in their phenotypic values. The marker-trait associations identified in this study will facilitate marker-assisted breeding and identification of genetic determinants of trait variability.

Keywords: safflower, association mapping, SSR markers, core collections, population structure, kinship analysis

# INTRODUCTION

Carthamus tinctorius L., commonly known as "safflower," contains seed oil with significantly high levels of nutritionally desirable unsaturated fatty acids, which is unique among oilseed crops (Fernandez-Martinez et al., 1993). Safflower is also used for extraction of dyes, several medicinal applications and as a plant factory for production of pharmaceuticals (Dajue and Mündel, 1996; Flider, 2013; Carlsson et al., 2014; Zhou et al., 2014). Additionally, its ability to grow under low moisture and high salinity conditions would give it a competitive edge over other oilseed crops in arid zones (Kaya et al., 2011; Bahrami et al., 2014; Yeilaghi et al., 2015; Ebrahimi et al., 2016). Currently, safflower is cultivated in around 20 countries on a total area of 1,140,002 hectares with a production of 948,516 tons (FAOSTAT)<sup>1</sup>. The major producers of safflower are the Russian Federation (286,351 tons), Kazakhstan (167,243 tons), Mexico (121,767 tons), USA (99,830 tons), Turkey (58,000 tons), and India (53,000 tons) accounting for ∼71% of total production (FAOSTAT)<sup>1</sup>. In spite of significant fluctuations in acreage under safflower cultivation, India was the highest average producer of the crop from 1994 to 2016. Nevertheless, safflower has not been able to create a niche for itself as a major oilseed crop. The primary factors that deter its cultivation are low yield and oil content, susceptibility to several biotic stresses, presence of spines, and lack of seed dormancy (Nimbkar, 2008). Genetic improvement of safflower is, therefore, essential to increase its acceptability and utility as an oilseed crop of global importance.

A primary requisite for crop improvement is the presence of molecular and phenotypic diversity and identification of genes or quantitative trait loci (QTL) governing traits of agronomic importance. Genetic diversity in safflower has been analyzed using several molecular markers viz., Random Amplified Polymorphic DNA (RAPD), Inter Simple Sequence Repeat (ISSR), Amplified Fragment Length Polymorphism (AFLP), Simple Sequence Repeats (SSRs), and Single Nucleotide Polymorphisms (SNPs) (Johnson et al., 2007; Yang et al., 2007; Amini et al., 2008; Khan et al., 2009; Sehgal et al., 2009; Chapman et al., 2010; Lee et al., 2014; Pearl and Burke, 2014; Ambreen et al., 2015; Kumar et al., 2015). These studies revealed high genetic differentiation among global safflower accessions and validated a few "centers of similarity" of safflower that were identified based on morphological traits (Johnson et al., 2007; Chapman et al., 2010; Pearl and Burke, 2014; Kumar et al., 2015). Studies on development of linkage maps and tagging of agronomic traits using RAPDs, Restriction Fragment Length Polymorphism (RFLP), SSRs and SNP markers have been initiated in safflower (Hamdan et al., 2008, 2012; Mayerhofer et al., 2010; García-Moreno et al., 2011; Pearl et al., 2014). Linkage analysis and/or QTL mapping is a conventional method of identifying genomic regions governing simple and/or complex traits but requires development of bi-parental mapping populations, which is a time-consuming process. Moreover, the allelic variation captured in QTL mapping is limited due to the use of bi-parental crosses, and fewer recombination events are tested, leading to low mapping resolution (Flint-Garcia et al., 2005). In contrast, association mapping (AM) is a faster and more efficient approach for high-resolution evaluation of complex traits and appears as a promising tool to circumvent limitations of linkage mapping (Yu and Buckler, 2006; Oraguzie and Wilcox, 2007; Abdurakhmonov and Abdukarimov, 2008). AM identifies relationships between traits and genetic polymorphisms in a heterogeneous assemblage of unrelated individuals using naturally occurring recombination events, thus enabling fine-scale mapping of traits. It has emerged as a useful approach for identifying marker-trait associations for several agronomic traits in various crop species (Abdurakhmonov and Abdukarimov, 2008; Zhu et al., 2008; Blair et al., 2009; Yang et al., 2010; Li et al., 2011; Upadhyaya et al., 2012, 2013; Xu et al., 2013; Zhang et al., 2014). To date, Ebrahimi et al. (2017) is the only study that has explored marker-trait associations in safflower through association mapping using AFLP markers.

The choice of germplasm used for association mapping is critical and should incorporate a wide range of diversity capturing the maximum number of historical recombination events (Flint-Garcia et al., 2005; Yan et al., 2011). Core collections have performed well as association mapping panels in various crop species (Blair et al., 2009; Li et al., 2011; Upadhyaya et al., 2013; Soto-Cerda et al., 2014; Zhang et al., 2014) since they are enriched with the maximum possible range of diversity (genetic and phenotypic) existing in the crop germplasm. Another advantage offered by core collections is that they often include unrelated individuals (accessions), which decreases the chances of identifying spurious marker-trait associations due to pre-existing population structure. In safflower, evaluation of global germplasm collections identified significant diversity for most of the desirable traits (oil content, fatty acid composition, resistance to biotic and abiotic stresses, and spines) (Ashri, 1975; Dwivedi et al., 2005; Kumar et al., 2016). To facilitate genetic dissection of complex traits, small, operational core collections have been developed in safflower (Johnson et al., 1993; Dwivedi et al., 2005; Kumar et al., 2016). These core collections can be effectively used for association mapping provided they demonstrate high genetic variance (for better mapping resolution) and possess weak population structure and low kinship among individuals (to avoid spurious associations between molecular markers and functional loci during association analysis) (Pritchard et al., 2000).

The present study describes evaluation of a safflower panel of 124 accessions derived from two core collections (reported earlier from our group; Kumar et al., 2016) for its suitability for association mapping and identification of SSR loci associated with eight traits of agronomic value (plant height, number of primary branches, number of capitula per plant, 100-seed weight, days to 50% flowering, seed oil content, oleic acid content and linoleic acid content) in the crop. The current work will facilitate mining of elite genes or loci for safflower breeding and conservation.


<sup>1</sup>www.fao.org/faostat/en/#data/QD

# MATERIALS AND METHODS

# Plant Material and Phenotypic Evaluation

The present work utilizes C. tinctorius L. accessions selected from two composite core collections of safflower, CartC1 (57 accessions) and CartC2 (106 accessions; Kumar et al., 2016). We merged accessions from these two core collections, removed redundancy (44 accessions were common to the two core collections) and included five Indian cultivars (Sharda, Manjira, Annigeri, PBNS-12 and TSF-1), assembling a final association panel with a non-redundant set of 124 accessions, which will hereafter be referred to as the "CartAP" collection. Seed material of the accessions was obtained from USDA\_ARS, WRPIS, Pullman, WA, USA while the Indian cultivars were procured from National Bureau of Plant Genetic Resources, New Delhi, India and University of Agricultural Sciences, Dharwad, India. Details of the plant material used, including their plant identification (PI) number, geographical origin and information regarding their distribution in the two core collections, are provided in Supplementary Table 1. Phenotypic evaluation of the eight agronomic traits utilized in the present work viz., plant height, number of primary branches, number of capitula per plant, 100-seed weight, days to 50% flowering, seed oil content, oleic acid and linoleic acid content was described by Kumar et al. (2016). The safflower accessions were evaluated in field conditions for two consecutive growing seasons (2011–2012 and 2012–2013) for the studied traits. The climatic variable data for the two growing seasons are provided in Supplementary Table 2.

# Isolation of Genomic DNA and SSR Genotyping

Genomic DNA was extracted from leaf tissues of 10-week-old seedlings of each accession as described by Ambreen et al. (2015). Ninety-three polymorphic SSR markers developed earlier in safflower (Ambreen et al., 2015) were used for genotyping the accessions. A three-primer PCR protocol (Perry, 2004) was followed for genotyping and the reaction mixture included 50 ng of template DNA, 1.5 mM MgCl<sub>2</sub>, 0.2 mM of each dNTP, 0.05 µM IR700-labeled M13 primer, 0.05 µM M13-tailed forward primer, 0.05 µM reverse primer and 0.75 units of Taq DNA polymerase (Biotools, Spain). PCR amplifications were conducted on a Veriti thermal cycler (Applied Biosystems, USA) under the following cycling conditions: initial denaturation at 96°C for 5 min followed by 28–30 cycles of 96°C for 45 s, the appropriate annealing temperature (T<sub>a</sub>, ranging from 55 to 65°C) for 30 s and extension at 72°C for 1 min, with a final extension at 72°C for 7 min. The IR700-labeled amplification products were size fractionated on 6.5% PAGE using the 4300 DNA Analyzer system and analyzed following instructions provided in the user manual (LI-COR, USA).

#### Genetic Analysis

Genetic diversity statistics including number of alleles (Na), major allele frequency (MAF), observed heterozygosity (Ho), expected heterozygosity (He) and degree of polymorphism generated by each marker (computed as polymorphism information content; PIC) were calculated using PowerMarker version 3.25 (Liu and Muse, 2005). Rare/private allelic richness per locus (Rp) was determined based on the rarefaction approach using HP-Rare v.1.0 (Kalinowski, 2005). Rarefaction allows evaluation of allelic richness independent of the sample size (Kalinowski, 2004). Deviation from Hardy-Weinberg equilibrium was tested, and genetic diversity was assessed through the Shannon information index (H) and Nei's expected gene diversity (I), using POPGENE version 1.32 (Yeh et al., 1999). Information about the genetic location of these SSR markers on safflower linkage groups was derived from Bowers et al. (2016) through BLASTN search using standalone BLAST+ version 2.2.26 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.26/). The genomic sequences of the 93 SSR loci were used as queries for search against the genetic map assembled into twelve linkage groups of safflower by Bowers et al. (2016).
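The per-locus statistics named above can be illustrated with a minimal sketch. It uses the textbook definitions of expected heterozygosity, PIC and the Shannon index computed from allele frequencies, without any small-sample correction that PowerMarker or POPGENE may apply; the function names and the genotype encoding (tuples of two alleles, `None` for missing data) are assumptions made for illustration only.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from diploid genotypes (None = missing)."""
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """Gene diversity: He = 1 - sum(p_i^2), uncorrected for sample size."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    cross = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return expected_heterozygosity(freqs) - cross


def shannon_index(freqs):
    """Shannon information index: -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)


def observed_heterozygosity(genotypes):
    """Share of called genotypes that carry two different alleles."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for a, b in called if a != b) / len(called)
```

For a locus with two equally frequent alleles, these definitions give He = 0.5 and PIC = 0.375, which shows why PIC is always somewhat lower than He for the same locus.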

# Distance-Based Genetic Analysis

Genetic relationships between sampled accessions were elucidated through construction of an unrooted Neighbor Joining (NJ) dendrogram based on simple matching coefficient using Darwin version 6.0 (Perrier and Jacquemoud-Collet, 2006). A bootstrap value of 1,000 replicates was used to test the reliability of the NJ dendrogram. Principal Coordinate Analysis (PCoA) was performed using Darwin version 6.0.
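As a rough illustration of the distance underlying the dendrogram, the sketch below computes one common form of the simple matching dissimilarity for diploid allelic data: one minus the average proportion of matching alleles per locus. It is not taken from the Darwin source code; the function name, the genotype encoding and the handling of missing data (loci missing in either accession are skipped) are assumptions.

```python
from collections import Counter


def simple_matching_dissimilarity(ind_a, ind_b):
    """Dissimilarity between two accessions genotyped at the same loci.

    ind_a, ind_b: lists of diploid genotypes, each a tuple of two alleles,
    or None where the genotype is missing (assumed encoding).
    """
    shared = 0.0
    loci = 0
    for ga, gb in zip(ind_a, ind_b):
        if ga is None or gb is None:
            continue  # skip loci that are missing in either accession
        # number of alleles in common, counted with multiplicity (0, 1 or 2)
        matches = sum((Counter(ga) & Counter(gb)).values())
        shared += matches / 2  # divide by ploidy
        loci += 1
    return 1.0 - shared / loci
```

Two accessions that are identical at every scored locus receive a dissimilarity of 0, while accessions sharing no alleles receive 1.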

# Analysis of Population Structure Using Bayesian Method

The population genetic structure was studied using the Bayesian clustering method implemented in STRUCTURE version 2.3.4 (Pritchard et al., 2000). An admixture model and correlated allele frequencies were chosen for estimating the proportion of ancestral contribution in each accession. We tested K-values ranging from 1 to 25 with 10 independent replications at each K, a burn-in period of 100,000 generations and 200,000 Markov Chain Monte Carlo (MCMC) repetitions. The analysis was performed independent of the geographical origin of accessions. The optimal K-value for the dataset was obtained by assessment of the results using STRUCTURE HARVESTER (Earl, 2012). Accessions with membership proportions (Q-value) >80% were considered as pure and part of their corresponding cluster while accessions with membership proportions less than 80% were adjudged as admixtures. Calculation of pairwise FST and analysis of molecular variance (AMOVA) between the sub-populations of STRUCTURE were performed using GenAlEx version 6.5 (Peakall and Smouse, 2012) with 1,000 permutations.
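The 80% membership rule described above can be sketched as a simple post-processing step on the Q matrix. The function name, the dictionary layout and the cluster labels are hypothetical, and the treatment of a Q-value of exactly 0.80 (here counted as admixed) is an assumption.

```python
def assign_clusters(q_matrix, threshold=0.80):
    """Label each accession as a cluster member or as admixed.

    q_matrix: dict mapping accession name -> list of membership
    proportions, one per inferred cluster (assumed layout).
    """
    labels = {}
    for accession, q in q_matrix.items():
        best = max(range(len(q)), key=lambda k: q[k])
        if q[best] > threshold:
            labels[accession] = f"cluster_{best + 1}"
        else:
            # no single cluster exceeds the threshold
            labels[accession] = "admixed"
    return labels
```

For example, an accession with membership proportions of 0.9 and 0.1 would be assigned to the first cluster, whereas one with 0.55 and 0.45 would be treated as an admixture.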

# Kinship Coefficient

Relatedness between individuals was estimated through computation of kinship coefficients (Fij; individual level) following Loiselle et al. (1995). Fij measures the relative probabilities of identity in descent for random genes between two individuals, i and j. Kinship coefficient matrix between all pairs of accessions of the association panel was generated using SPAGeDi v 1.5 (Hardy and Vekemans, 2002). All the negative Fij-values were treated as zero (Hardy and Vekemans, 2002; Yu et al., 2006). The same program was used for evaluation of average kinship coefficient (Gij; population level) between sub-populations inferred using STRUCTURE.
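The treatment of negative kinship estimates can be sketched as follows, assuming the pairwise coefficients have already been exported as a square symmetric matrix. The function names and the nested-list representation are illustrative assumptions and do not reflect the SPAGeDi output format.

```python
def truncate_negative(kinship):
    """Replace negative pairwise kinship estimates with zero."""
    return [[max(0.0, value) for value in row] for row in kinship]


def summarize_pairs(kinship):
    """Mean kinship and share of zero estimates over all distinct pairs."""
    n = len(kinship)
    pairs = [kinship[i][j] for i in range(n) for j in range(i + 1, n)]
    mean = sum(pairs) / len(pairs)
    zero_share = sum(1 for value in pairs if value == 0.0) / len(pairs)
    return mean, zero_share
```

Only the off-diagonal upper triangle is summarized, so each pair of accessions is counted once and self-comparisons are excluded.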

#### Association Mapping

Association mapping was performed for eight quantitative traits viz., seed oil content, oleic acid content, linoleic acid content, 100-seed weight, plant height, number of primary branches, days to 50% flowering and number of capitula per plant for the two seasons independently. Only SSR markers with known positions on linkage groups were used for association mapping. Both the General linear model (GLM) and the Mixed linear model (MLM) were applied for assessment of marker-trait associations (MTAs) in TASSEL v 5.0 (Bradbury et al., 2007) following the user manual. The GLM includes population structure (Q matrix) but does not include kinship relationships (K matrix) and can often identify false positives in association mapping. To overcome this limitation, the MLM was proposed, which incorporates both population structure (Q) and kinship (K) information to decrease spurious associations (Yu et al., 2006). The association between marker and trait was considered significant at P < 0.05. The phenotypic variation explained by each marker-trait association was studied through the correlation coefficient (R<sup>2</sup>).
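The idea behind the GLM with a Q matrix can be illustrated with a minimal ordinary least squares sketch: the trait is regressed on the population structure covariates with and without the marker, and the marker R<sup>2</sup> is taken as the share of the total sum of squares explained by adding the marker. This is an illustrative simplification, not TASSEL's implementation; all function names are hypothetical, and the kinship random effect of the MLM is not modeled.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_ss(design, y):
    """Residual sum of squares of an OLS fit of y on the design columns."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum(
        (v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(design, y)
    )


def marker_r2(y, q_cov, marker):
    """Share of trait variation explained by the marker after fitting Q.

    q_cov: per-accession rows of Q covariates; drop one Q column, because
    the memberships sum to 1 and an intercept is included.
    marker: per-accession rows of numeric marker codes (assumed encoding).
    """
    n = len(y)
    reduced = [[1.0] + list(q_cov[i]) for i in range(n)]
    full = [reduced[i] + list(marker[i]) for i in range(n)]
    mean = sum(y) / n
    total_ss = sum((v - mean) ** 2 for v in y)
    return (residual_ss(reduced, y) - residual_ss(full, y)) / total_ss
```

When the trait differences follow population structure rather than the marker, the reduced model already explains them and the marker R<sup>2</sup> approaches zero, which is the mechanism by which the Q matrix limits spurious associations.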

#### RESULTS

#### SSR-Based Genotyping and Statistical Analysis

Genotyping of 124 safflower accessions of the CartAP collection was performed using 93 polymorphic SSR markers. The SSR markers generated a total of 311 alleles with an average of 3.34 alleles per locus. The number of alleles per locus ranged from 2 to 8 (Supplementary Table 3). The highest number of alleles (8) was detected for loci NGSaf\_265 and NGSaf\_282 (**Table 1**). The major allele frequency (MAF) was high and varied from 0.342 to 0.97 with an average of 0.66 (Supplementary Table 3). The observed heterozygosity (Ho) had a low average value of 0.112 and ranged from 0 to 0.96 (Supplementary Table 3). The gene diversity or expected heterozygosity (He) of studied markers ranged from 0.016 for NGSaf\_117 to 0.76 for NGSaf\_281 with an average of 0.438 (Supplementary Table 3). Information on genetic variability at the 93 SSR loci among accessions of the CartAP collection is summarized in **Table 1**.

The relative informativeness of these markers was estimated by measurement of polymorphic information content (PIC) for each locus. The PIC-value averaged 0.38 and ranged from 0.02 for NGSaf\_117 to 0.73 for NGSaf\_281. Nineteen microsatellite loci were highly polymorphic, with PIC-values > 0.5 ranging from 0.502 to 0.736 (Supplementary Table 3). A major proportion of microsatellite loci (63) were moderately polymorphic with PIC-values between 0.25 and 0.5 while 12 loci generated low polymorphism with PIC < 0.25. Based on marker statistics, the NGSaf\_281 locus was the most informative and had the highest utility value among the 93 SSR loci. We were able to assign map positions to 48 out of the 93 SSR loci based on the genetic map of safflower reported by Bowers et al. (2016). SSR locus-specific details (Na, MAF, He, Ho, and PIC) and their map positions are provided in Supplementary Table 3. Mean allelic richness and private allelic richness (number of unique alleles) were measured based on the rarefaction approach implemented in HP-Rare v 1.0. The mean allelic richness per locus was estimated as 1.287 while private allelic richness was 0.0186. At a significance level of P < 0.05, all loci except NGSaf\_300 deviated from Hardy-Weinberg equilibrium (HWE) while at P < 0.01, all loci deviated from HWE (Supplementary Table 3).

TABLE 1 | Summary statistics for genetic variability at 93 SSR loci.

He, gene diversity or expected heterozygosity; Ho, observed heterozygosity; PIC, polymorphic information content. # Values for each parameter are provided in parenthesis.

#### Genetic Diversity

High genetic differentiation was observed among CartAP accessions based on the genetic diversity indices: Shannon information index (H; 0.7537) and Nei's expected heterozygosity (I; 0.4432). Genetic diversity in the 10 proposed regional gene pools and regions of secondary introduction of safflower (America and Australia) was also evaluated (**Table 2**). Among the regional gene pools, the Shannon information index varied from 0.4158 to 0.7211 while Nei's expected heterozygosity ranged from 0.2595 to 0.4265. Based on diversity indices, accessions from the Indian subcontinent and American region were the most diverse with an estimated Shannon information index and Nei's expected heterozygosity value of H = 0.7028 and 0.7211 and I = 0.4265 and 0.4227, respectively. Near East (H = 0.4158; I = 0.2595) and Turkey (H = 0.4650; I = 0.2981) were genetically least diverse based on our assessment using SSR markers. Within-region segregation assessed using SSR markers revealed the highest number of polymorphic loci among accessions from the Indian subcontinent (91) followed by America (90) and Iran-Afghanistan (85). The lowest number of polymorphic loci was recorded for accessions from Near East (60) and Turkey (60; **Table 2**).

# Genetic Relationships and Population Structure

#### Neighbor Joining (NJ) Dendrogram and Principal Coordinate Analysis (PCoA)

The genetic relatedness between safflower accessions of CartAP was analyzed by construction of a Neighbor-Joining (NJ) dendrogram based on simple matching coefficient (**Figure 1A**). The 124 accessions were grouped into three major clusters (named as NJ I–NJ III) with 7, 48, and 69 accessions, respectively. However, no distinction could be drawn among these groups based on their geographical origin, with each of the major clusters exhibiting heterogeneous clustering of accessions (**Figure 1A**). We also obtained low bootstrap support (< 50%) for the majority of the nodes of the NJ dendrogram and only very few internal nodes exhibited high bootstrap values > 50%. All the five Indian cultivars analyzed in the present study clustered together in NJ II, indicating their narrow genetic base. Distance-based NJ analysis indicated lack of genetic relationships among these accessions. In Principal Coordinate Analysis (PCoA) based on SSR data, the principal axes 1 and 2 explained 29 and 25% of the total molecular variance, respectively (**Figure 1B**). In agreement with the clustering pattern of the NJ dendrogram, aggregation of accessions into discrete groups relative to their geographical origin was not observed in the PCoA plot. The CartAP accessions tended to disperse homogeneously in all the four quadrants, indicating the diverse nature of sampled accessions (**Figure 1B**).

TABLE 2 | Genetic diversity indices of CartAP accessions based on their regional gene pool distribution.

N, Number of accessions; P<sup>l</sup>, Number of polymorphic loci; PP<sup>l</sup>, Percent polymorphism; H, Shannon information index; I, Nei's gene diversity.

#### Population Structure and Differentiation

We also analyzed the underlying population structure of CartAP through a Bayesian-based approach using STRUCTURE v 2.3.4. Log mean probability and change in log probability (ΔK) were determined using STRUCTURE HARVESTER, which generated the highest peak at K = 2 (**Figure 2A**). Based on membership proportions, safflower accessions with ≥ 80% of shared ancestry were delineated into two major clusters (STR I and STR II) with 46 and 62 accessions, respectively (**Figure 2B**). Each of these clusters included diverse accessions originating from different regional pools. Sixteen accessions with < 80% of shared ancestry were categorized as admixtures. Additional smaller peaks observed at K = 4 and K = 6 implied presence of subgroups within the two major groups (**Figure 2A**). Therefore, an independent STRUCTURE run was performed for STR I and STR II with K-values ranging from 1 to 10. Sub-clustering of STR I identified the highest peak at K = 2 (**Figure 3A**). The two sub-clusters, designated as STR I(a) and STR I(b), contained 32 and 14 accessions, respectively (**Figure 3B**). Sub-clustering of the second major group, STR II, yielded the highest peak at K = 5 (**Figure 3C**). The five sub-clusters were designated as STR II(a), STR II(b), STR II(c), STR II(d), and STR II(e) and contained 12, 18, 13, 5, and 14 accessions, respectively (**Figure 3D**). Clustering of accessions in STRUCTURE analysis was heterogeneous and in concordance with the results of NJ and PCoA. The regional distribution of accessions in the seven sub-populations is given in **Table 3**. Among the sub-populations, a substantial number of accessions were admixtures (**Figures 3B,D**).

The expected heterozygosity for the seven sub-clusters varied from 0.1726 for STR II(d) to 0.4352 for STR I(a) (**Table 4**). The genetic divergence among the seven sub-populations was measured through pair-wise FST, which ranged from 0.0799 for STR II(b) and STR II(a) to 0.2888 for STR II(d) and STR I(a) (P < 0.001; **Figure 4**). A hierarchical AMOVA analysis was performed among and within regional pools of CartAP accessions, which provided an estimated molecular variance of 7 and 93%, respectively (**Table 5**). A similar evaluation was conducted for the seven sub-clusters derived through STRUCTURE analysis of safflower accessions. It provided a molecular variance of 16 and 84% among and within sub-clusters, respectively (**Table 5**). Genetic differentiation among the sub-clusters was highly significant (P < 0.0001).

FIGURE 1 | (A) Neighbor Joining dendrogram elucidating the genetic relationships between 124 CartAP accessions using 93 SSR loci based on simple matching coefficient. NJ clusters (NJ I–NJ III) are demarcated through dashed lines. (B) Scatter plot for Principal Coordinate Analysis (PCoA) exhibiting the distribution of CartAP accessions on two main axes. Primary axes 1 and 2 captured 29 and 25% of the total variance, respectively. Color codes used for different regional pools are provided.

FIGURE 3 | Hierarchical population structure analysis. (A) Estimation of sub-populations for STR I. The maximum number of sub-populations was inferred at K = 2. (B) Population structure for STR I at K = 2 based on Q matrix. The sub-populations were designated as STR I(a) and STR I(b). (C) Estimation of sub-populations for STR II. The maximum number of sub-populations was inferred at K = 5. (D) Population structure for STR II at K = 5 based on Q matrix. The sub-populations were named as STR II(a)–STR II(e). Color codes for each sub-population are provided.

TABLE 3 | Distribution of CartAP accessions in different sub-populations derived from STRUCTURE analysis.

TABLE 4 | Expected heterozygosity of the sub-populations derived through STRUCTURE analysis.

# Kinship Analysis at Individual and Population Level

Based on 93 SSR markers, the co-ancestry or kinship coefficient (Fij) between any two safflower accessions of CartAP averaged 0.043. A major proportion of the pairs of safflower accessions (∼54.3%) had zero pairwise kinship estimates, followed by 16.59% of pairs having kinship estimates between 0.001 and 0.05, 12.24% between 0.051 and 0.1, 7.8% between 0.11 and 0.15, 4.5% between 0.151 and 0.20 and 2.4% between 0.201 and 0.25 (**Figure 5A**). A minimal fraction of safflower accession pairs (0.022%) exhibited kinship estimates >0.25. At the population level, Gij-values were estimated to assess kinship associations between the seven inferred sub-populations derived from STRUCTURE analysis. The mean kinship coefficients (Gij) ranged from 0 to 0.0783 (**Figure 5B**) between the sub-populations. Negative values of Gij were treated as zero for the current analysis. The kinship analysis indicated that most CartAP accessions and the inferred sub-populations exhibited no or weak genetic relatedness at the individual and population level, respectively.

inferred through hierarchical STRUCTURE analysis. Color codes are provided for each sub-population. \*Denotes FST-values between same sub-population.

#### Association Mapping

The 48 mapped SSR markers were distributed across all the 12 linkage groups of safflower (Supplementary Table 3) with the maximum number of SSR loci (9) on linkage group (LG) 6. Identification of markers linked with each of the eight agronomic traits of safflower (seed oil content, oleic acid content, linoleic acid content, 100-seed weight, plant height, number of primary branches, days to 50% flowering and number of capitula per plant) was done through association analysis and performed independently for the two growing seasons. The range and mean values of the eight phenotypic traits evaluated for CartAP accessions are provided in **Table 6**. Association mapping using GLM and MLM models identified 96 significant marker-trait associations (P < 0.05) for the eight traits across two growing seasons (2011–2012 and 2012–2013). Twenty-eight out of 48 SSR loci accounted for these 96 significant associations. A comparatively higher number of significant marker-trait associations were observed through GLM (60) than MLM (36). Out of the 96 significant marker-trait associations (MTAs) detected based on P-value, 63 associations exhibited correlation value (R<sup>2</sup>) ≥ 10%. Those MTAs which were detected in more than one season/model and exhibited R<sup>2</sup> ≥ 10% in at least one model and/or season were retained for further analysis (**Table 7**). As an exception, we have included the MTA for NGSaf\_279 with number of capitula per plant, since it was consistently detected in all the seasons and models with R<sup>2</sup> ranging from 7.4 to 9.3%.

TABLE 5 | Analysis of molecular variance (AMOVA) between and within regional gene pools and between and within sub-populations derived through hierarchical STRUCTURE analysis.

\*Estimated variance.

<sup>a</sup>With 999 data permutations.

For the 2011–2012 season, 19 and 14 significant MTAs with R<sup>2</sup> ≥ 10% were detected using GLM and MLM analysis, respectively (**Table 7**). Twelve MTAs were common between both the models for the studied traits. In GLM, the lowest number of significant associations (R<sup>2</sup> ≥ 10%) was recorded for oleic acid, number of capitula per plant and number of primary branches (1), while the maximum number of MTAs (6) was recorded for plant height. The correlation value (R<sup>2</sup>) for these MTAs ranged from 10 to 24.1%. The MTA between 100-seed weight and NGSaf\_306 displayed the highest R<sup>2</sup> (24.1%; **Table 7**). In MLM analysis, the highest number of significant loci was associated with oleic


TABLE 6 | Mean and range of different phenotypic traits for CartAP accessions.

FIGURE 5 | Kinship estimates at individual and population level. (A) Frequency distribution for global pairwise kinship estimates (Fij) for CartAP accessions. (B) Matrix showing pairwise Gij-values between sub-populations inferred through hierarchical STRUCTURE analysis. Color codes are provided for each sub-population. \*Denotes Gij-values within the same sub-population.

TABLE 7 | Marker-trait associations identified through General linear model (GLM) and Mixed linear model (MLM) using phenotypic data of season 2011–2012 and 2012–2013.


@Linkage group; R<sup>2</sup> , correlation coefficient.

acid and linoleic acid (3), while the lowest number (1) of MTAs was detected for number of capitula per plant and number of primary branches. MLM analysis could not identify any MTA with R<sup>2</sup> ≥ 10% for plant height. Among the MTAs detected through MLM, the R<sup>2</sup>-values ranged from 10.4 to 34.1%. The MTA between oleic/linoleic acid and NGSaf\_309 displayed the highest R<sup>2</sup> (34%).

For the 2012–2013 season, we detected 17 and 13 significant MTAs with R<sup>2</sup> ≥ 10% for the eight agronomic traits using GLM and MLM analysis, respectively (**Table 7**). Twelve MTAs were common between both the models for the studied traits. In GLM, the number of significant SSR loci associated with each trait ranged from 1 (for oleic acid) to 4 (for oil content) with R<sup>2</sup>-values ranging from 10.5 to 23.4%. The MTA between oil content and NGSaf\_15 had the highest R<sup>2</sup>-value of 23.4% (**Table 7**). In the MLM model, the lowest number of loci (1) was observed for 100-seed weight and days to 50% flowering, while the maximum number of loci (3) was found to be linked with oleic acid. These 13 MTAs had R<sup>2</sup>-values ranging from 10.5 to 34.1%. Among these MTAs, we observed the highest R<sup>2</sup>-value of 34% between oleic acid and NGSaf\_309.

A cumulative assessment of MTAs for each trait identified several significant (P < 0.05; R <sup>2</sup> > 10) associations, which were consistently represented in both models and in both seasons. For example, NGSaf\_15 and NGSaf\_300 for oil content, NGSaf\_67 for oleic acid content, NGSaf\_210 for linoleic acid content, NGSaf\_279 for number of primary branches and number of capitula per plant were detected (**Table 7**). We also observed MTAs with high correlation values, which were detected either in a majority or only in some environments (models and/or seasons). Many MTAs were also common between traits that showed positive or negative correlation in their phenotypic values (**Table 7**).

#### DISCUSSION

Our earlier work on analysis of a representative global germplasm collection of ∼531 safflower accessions identified significant genetic and phenotypic variability in the crop (Kumar et al., 2015, 2016) that could be effectively used to improve traits of agronomic value to enhance crop yield and productivity. Since identification of desirable alleles from large germplasm collections is tedious and time-consuming, core collections encompassing the prevailing genetic and phenotypic diversity of the crop are often developed. In safflower, three studies have reported development of core collections (Johnson et al., 1993; Dwivedi et al., 2005; Kumar et al., 2016). Core collections described by Johnson et al. (1993) and Dwivedi et al. (2005) were based only on morphological and geographical parameters and did not include information on molecular diversity of the crop. They were also comparatively larger, with 210 and 570 accessions, respectively. In contrast, the two core collections described by Kumar et al. (2016), CartC1 and CartC2, were constructed based on the Maximization (M) strategy using genetic (AFLP), phenotypic and geographical data and were much smaller (57 and 106 accessions, respectively). Additionally, CartC1 and CartC2 include accessions from eight and ten regional gene pools of safflower (Ashri, 1975), respectively, and also have representation from secondary regions of introduction. The present study analyzed the suitability of the above core collections (CartC1 and CartC2) for association mapping through assessment of their genetic diversity and population structure using SSR markers, followed by establishment of marker-trait associations for eight agronomically important traits.

# CartAP Collection Is an Effective Panel for Association Mapping

#### High Genetic Variation in CartAP Accessions

SSR markers have been extensively used for assessment of association mapping panels and marker-trait associations (Breseghello and Sorrells, 2006; Agrama et al., 2007; Jun et al., 2008; Yang et al., 2010; Rezaeizad et al., 2011). Being co-dominant markers, they are considered more robust for estimation of population structure and kinship relatedness than dominant markers (Zhu et al., 2008). A large proportion of our tested microsatellite loci (88%) generated a moderate to high level of polymorphism, which indicates their efficiency in differentiating C. tinctorius L. accessions of CartAP. The mean PIC-value (0.38) obtained in our study using SSR markers on the CartAP collection agreed well with previous SSR-based studies in safflower (**Table 8**; Hamdan et al., 2011; Barati and Arzani, 2012; Derakhshan et al., 2014; Lee et al., 2014). The allelic range and average number of alleles per locus observed in the present study were lower than those observed by Chapman et al. (2009; **Table 8**), probably due to the inclusion of wild species in their study. High genetic variability among CartAP accessions is also illustrated by the diversity indices, H = 0.7537 and I = 0.4432. This wide genetic diversity is particularly advantageous for association studies as it offers greater allelic diversity with inclusion of rare variants and higher mapping resolution due to representation of historical recombination events from a global population. Thus, the CartAP panel fulfilled the prerequisite of a highly differentiated panel for association mapping (Flint-Garcia et al., 2005).
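For readers unfamiliar with the PIC statistic quoted above, the sketch below computes it from allele frequencies at a single locus. It assumes the widely used definition PIC = 1 − Σp<sub>i</sub><sup>2</sup> − Σ<sub>i&lt;j</sub> 2p<sub>i</sub><sup>2</sup>p<sub>j</sub><sup>2</sup>; this section does not state which formula or software the study used, so the function and its name are illustrative only.

```python
def pic(allele_frequencies, tolerance=1e-9):
    """Polymorphism information content for one locus.

    allele_frequencies: frequencies of all alleles at the locus (must sum to 1).
    Uses PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2,
    an assumed definition, since the text does not state the formula.
    """
    freqs = list(allele_frequencies)
    if abs(sum(freqs) - 1.0) > tolerance:
        raise ValueError("allele frequencies must sum to 1")
    n = len(freqs)
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygosity - cross_term


# Two equally frequent alleles give the maximum PIC of a biallelic locus.
biallelic_max = pic([0.5, 0.5])  # 0.375
```

Under this definition a locus needs more than two alleles to exceed 0.375, which gives a rough sense of scale for a panel-wide mean of 0.38.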


\*Not available.

#### Weak Population Structure and Low Molecular Relatedness in CartAP Accessions

Investigation of the population structure and genetic relatedness between accessions of an association panel is critical to circumvent spurious marker-trait associations (Yu and Buckler, 2006; Zhu et al., 2008; Yang et al., 2010; Nachimuthu et al., 2015). Structure analysis predicts the number of hypothetical sub-populations and identifies admixed individuals among the studied panel, while clustering analysis displays the genetic relationships among accessions. In our study, we observed complete concordance between distance-based NJ and PCoA and Bayesian-based STRUCTURE analysis (**Figures 6A,B**). NJ clusters NJ I and NJ II corresponded with STRUCTURE cluster STR I, while NJ III coincided with STR II. Neither distance-based NJ analysis nor Bayesian-based STRUCTURE analysis could delimit the accessions on the basis of their geographic origin and/or distribution of studied traits. The lack of geographical structuring among CartAP accessions was expected since the Maximization strategy, which emphasizes maximization of the allelic diversity (both genetic and phenotypic) with minimum redundancy, was used for development of these core collections (Kumar et al., 2016). To increase representation of allelic diversity in the core collections, accessions with unique allelic combinations would have been selected, and this might have assembled heterogeneous accessions from different regional pools, resulting in an unstructured population. Additionally, an extensive number of genetic admixtures were identified with an ancestry share of < 80% in STRUCTURE analysis (**Figures 3B,D**). The presence of a large number of admixed individuals can be attributed to incomplete lineage sorting during safflower diversification.

The range of Fst-values (0.0799–0.2888) suggests low to moderate differentiation between the seven sub-populations derived from hierarchical population structure analysis (**Figure 4**). At the population level, mean kinship coefficient (Gij) values (0–0.075) indicate low genetic similarity between these sub-populations of the CartAP collection (**Figure 5B**). Additionally, the high proportion of admixed individuals and the low pairwise kinship or co-ancestry values (Fij) at the individual level reflect no or weak relatedness between the safflower accessions (**Figure 5A**). The weak population structure coupled with low molecular relatedness between the sampled accessions of CartAP would prevent spurious marker-trait associations and confirms the suitability of the CartAP panel for association mapping.

#### Association Mapping Identifies Significant Marker-Trait Associations

Identification of loci influencing agronomic traits facilitates marker-assisted breeding to increase crop productivity. In safflower, efforts to identify molecular markers linked to traits of agronomic value are limited (Hamdan et al., 2008, 2012; Mayerhofer et al., 2010; García-Moreno et al., 2011; Pearl et al., 2014; Ebrahimi et al., 2017). In the present study, we performed association mapping and identified markers associated with eight traits of agronomic importance (seed oil content, oleic acid content, linoleic acid content, 100-seed weight, plant height, number of primary branches, days to 50% flowering and number of capitula per plant), many of which have been reported to influence crop yield (Patil, 1998). Of these, several

FIGURE 6 | Demarcation of two major STRUCTURE populations (STR I and STR II) and seven sub-populations obtained by further hierarchical structure analysis on (A) PCoA scatter plot. (B) Neighbor Joining dendrogram. Color codes are provided for each sub-population.

marker-trait associations were found to be stable over the two growing seasons. Since MTAs are influenced by variations in environmental conditions, the strength of these associations needs to be analyzed through multi-location trials.

Low seed oil content is a serious impediment to adoption of safflower as a major oilseed crop globally. Oil content is a quantitative trait controlled by several genomic loci imparting small to moderate genetic effects, which are also influenced by environmental conditions (Hwang et al., 2014). Until now, no attempts have been made to map QTLs for oil content in safflower. Seed oil content in the CartAP accessions ranges from 16 to 50% (**Table 6**) indicating that the association panel includes diverse accessions which can be used in breeding programs. Our association study identified 11 MTAs correlated with oil content and supported by high correlation coefficient (R <sup>2</sup> ≥ 10%; **Table 7**). Among these, two SSR loci, NGSaf\_15 (R <sup>2</sup> = 10.4–23.4%) and NGSaf\_300 (R <sup>2</sup> = 13.8–17.9%) were strongly associated with oil content in both GLM and MLM models and in both seasons (**Table 7**) and can be used to increase oil content of safflower cultivars.

The fatty acid composition of edible oil determines its utility and market value. Ever since the importance of oleic acid-rich dietary fats was elucidated, the demand for edible oil with high oleic acid content has increased substantially. Safflower is unique among oilseed crops due to the wide range of seed oil compositions available in its global germplasm (Velasco et al., 2005). Two major oil types are found in safflower: one with high linoleic acid and the other rich in oleic acid (Fernandez-Martinez et al., 1993). In addition, several lines with similar levels of linoleic and oleic acid have also been identified (Fernandez-Martinez et al., 1993). Modifying the fatty acid composition of safflower oil is particularly significant from the Indian perspective since the majority of Indian cultivars are high in linoleic acid in spite of the availability of high oleic lines in the Indian germplasm. Only one major QTL associated with oleic acid has been identified, and it mapped on linkage group 3 using a cross between CL-1 (oleic acid: 14–22%) and CL-9 (oleic acid >84%; Hamdan et al., 2012). In addition to the major QTL associated with oleic acid, several modifying genes, which either enhance or repress the expression of major QTLs, have been postulated to determine oleic acid content in safflower (Hamdan et al., 2008, 2012). In CartAP accessions, the oleic acid content ranged from 10 to 79% while linoleic acid varied from 13 to 87%. By association mapping, we identified three SSR loci (NGSaf\_67, NGSaf\_210 and NGSaf\_309) linked with oleic acid content in safflower (**Table 7**). Among these loci, NGSaf\_67 (R<sup>2</sup> = 11.4–11.6%) was consistently associated with oleic acid content in both seasons and in both models, while NGSaf\_210 (displaying a phenotypic variation of ∼14%) and NGSaf\_309 (with the highest R<sup>2</sup>-value, ∼34%, among all the recorded marker-trait associations) were associated with oleic acid through the MLM approach in both seasons.
We identified 10 significant marker-trait associations for linoleic acid content with R <sup>2</sup> ≥ 10%. The NGSaf\_210 locus (R <sup>2</sup> = 14.3–17.9%) was associated with linoleic acid in both the models and seasons. In addition, NGSaf\_67 (R <sup>2</sup> = 10.9–16.7%) and NGSaf\_309 (R <sup>2</sup> = 34%) loci detected for oleic acid were also linked with linoleic acid content. Linoleic acid is synthesized from oleic acid by activity of fatty acid desaturase 2 enzyme encoded by the FAD2 gene. A strong negative correlation has been reported between oleic acid and linoleic acid content in safflower (Kumar et al., 2016). Identification of common SSR loci (NGSaf\_67, NGSaf\_210 and NGSaf\_309) for both these fatty acids indicates their strong linkage with the corresponding gene/s of the fatty acid biosynthetic pathway. These MTAs can serve as a useful resource for achieving the desired oil composition in the crop.

Plant height determines plant architecture and also influences crop yield. It is quantitatively inherited and a large number of QTLs associated with plant height have been reported in different crop systems (Wu et al., 2010; Morris et al., 2013; Zanke et al., 2014). In previous studies, safflower germplasm has revealed significant variation for plant height (Yeilaghi et al., 2015; Kumar et al., 2016). However, genetic loci influencing this trait have not yet been identified. The CartAP association panel harbors wide variation for plant height ranging from 94 to 226 cm. Association analysis identified 11 marker-trait associations with an R<sup>2</sup> ≥ 10%. Two loci, NGSaf\_156 (R<sup>2</sup>-value from 11.8 to 13.9%) and NGSaf\_296 (R<sup>2</sup>-value from 12.7 to 14.7%), were associated with plant height in all environments except in the MLM model of the 2011–2012 season. Plant height and flowering time are correlated traits as the onset of the reproductive phase marks the termination of apical growth in safflower. Early maturing cultivars are desirable and several accessions with early flowering time were identified in CartAP (Kumar et al., 2016). Association analysis identified 8 marker-trait associations with R<sup>2</sup> ≥ 10% for days to 50% flowering. The NGSaf\_92 locus (R<sup>2</sup> = 7.9–11.6%) was associated with days to 50% flowering for all models and seasons except GLM of the 2011–2012 season. Interestingly, we detected two markers (NGSaf\_101 and NGSaf\_296), which were associated with both plant height and days to 50% flowering in either one of the models or seasons (**Table 7**). Plant height and flowering time are traits that are highly influenced by environmental effects. Therefore, seasonal variations between the two growing periods could have led to differences in detection of these MTAs in the studied seasons and models.
Nevertheless, these common MTAs provide an initial platform to study the genetic relationships between the two traits.

The number of primary branches also defines plant architecture and influences yield since primary branches produce secondary to quaternary branches, each of which terminates in the characteristic globular head of safflower. Association analysis for number of primary branches detected 6 marker-trait associations, which showed correlation coefficient (R<sup>2</sup>-value) ≥ 10%. The locus NGSaf\_279 (R<sup>2</sup>-value ranging from 8 to 10.8%) was associated with primary branches using both the models in both growing seasons. The number of capitula in accessions of the CartAP collection ranged from 16 to 203 (Kumar et al., 2016). The number of capitula per plant exhibited 2 significant associations which had R<sup>2</sup> ≥ 10%. The NGSaf\_279 locus (R<sup>2</sup> = 7.4–9.3%) was common among all models and for both seasons. A positive correlation was reported by Kumar et al. (2016) between the number of capitula per plant and number of primary branches, which is in agreement with our identification of the SSR locus NGSaf\_279 as a common marker for both traits.

A mature safflower achene consists of 33–60% hull and 40–67% kernel (Dajue and Mündel, 1996). It has been suggested that reducing hull content can significantly enhance seed oil content in safflower (Dajue and Mündel, 1996). Association analysis of 100-seed weight among CartAP accessions identified 7 significant marker-trait associations with R<sup>2</sup> ≥ 10%. With the exception of season 2012–2013 MLM approach, two loci NGSaf\_306 (R<sup>2</sup> = 13–24.5%) and NGSaf\_309 (R<sup>2</sup> = 15.5–24.1%), were consistently associated with 100-seed weight (**Table 7**). These loci may play an important role in deciphering the molecular relationship between hull thickness and oil content.

Although safflower is an important oilseed crop with global distribution, there have been limited attempts to identify marker-trait associations for crop improvement. The present study established the suitability of CartAP collection for association mapping due to the absence of population structure and presence of significant genetic and phenotypic diversity. Using association mapping we were able to identify significant marker-trait associations for eight important agronomic traits. Many of these associations were consistently detected over two growing seasons and multiple models. The stability and utility of these marker-trait associations need to be analyzed further under additional environments through multi-location trials. The potential marker-trait associations identified in this study will be useful in facilitating marker-assisted breeding for crop improvement and identification of candidate genes for trait variability in safflower.

# AUTHOR CONTRIBUTIONS

SG and AJ conceived and designed the experiments; HA and SK conducted the experiments, collected and analyzed the data; HA, SK, AJ, and SG wrote the manuscript; MA, AK, AJ, and SG provided facilities for completion of experiments and reviewed the manuscript.

### ACKNOWLEDGMENTS

This work was supported by the DST-PURSE grant of Department of Science and Technology, Government of India [grant no. Dean(R)/2009/868] provided to the University of Delhi. HA was supported by a research fellowship from University Grants Commission, India. We are thankful for the support of our technical and field staff in completion of the work.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.00402/full#supplementary-material

#### REFERENCES

- a global reference collection of 531 accessions of Carthamus tinctorius L. (Safflower) using AFLP markers. Plant Mol. Biol. Rep. 33, 1299–1313. doi: 10.1007/s11105-014-0828-8


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Ambreen, Kumar, Kumar, Agarwal, Jagannath and Goel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetics, Host Range, and Molecular and Pathogenic Characterization of Verticillium dahliae From Sunflower Reveal Two Differentiated Groups in Europe

Alberto Martín-Sanz<sup>1</sup> , Sandra Rueda<sup>1</sup> , Ana B. García-Carneros<sup>2</sup> , Sara González-Fernández<sup>2</sup> , Pedro Miranda-Fuentes<sup>2</sup> , Sandra Castuera-Santacruz<sup>2</sup> and Leire Molinero-Ruiz<sup>2</sup> \*

#### Edited by:

Thomas Miedaner, University of Hohenheim, Germany

#### Reviewed by:

Thomas John Gulya, Agricultural Research Service (USDA), United States Matthew Denton-Giles, Curtin University, Australia

> \*Correspondence: Leire Molinero-Ruiz leire.molinero@csic.es

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 05 December 2017 Accepted: 19 February 2018 Published: 09 March 2018

#### Citation:

Martín-Sanz A, Rueda S, García-Carneros AB, González-Fernández S, Miranda-Fuentes P, Castuera-Santacruz S and Molinero-Ruiz L (2018) Genetics, Host Range, and Molecular and Pathogenic Characterization of Verticillium dahliae From Sunflower Reveal Two Differentiated Groups in Europe. Front. Plant Sci. 9:288. doi: 10.3389/fpls.2018.00288

<sup>1</sup> Pioneer Hi-Bred International, Inc., La Rinconada, Spain, <sup>2</sup> Department of Crop Protection, Institute for Sustainable Agriculture, Spanish National Research Council, Córdoba, Spain

Verticillium wilt and leaf mottle of sunflower, caused by the fungus Verticillium dahliae (Vd), has become a major constraint to sunflower oil production in temperate European countries. Information about Vd from sunflower is very scarce, even though the genetics, molecular traits and pathogenic abilities of fungal strains affecting many other crops are widely known. Understanding and characterizing the diversity of Vd populations in those countries where sunflowers are frequently and severely affected by the fungus are essential for efficient breeding for resistance. In this study, we have analyzed genetic, molecular and pathogenic traits of Vd isolates affecting sunflower in European countries. When their genetics was investigated, almost all the isolates from France, Italy, Spain, Argentina, and Ukraine were assigned to vegetative compatibility group (VCG) 2B. In Bulgaria, Turkey, Romania, and Ukraine, some isolates were assigned to VCG6, but some others could not be assigned to any VCG. Genotyping markers used for Vd affecting crops other than sunflower showed that all the isolates were molecularly identified as race 2 and that markers of defoliating (D) and non-defoliating (ND) pathotypes distinguished two well-differentiated clusters, one (E) grouping those isolates from Eastern Europe and the other (W) all those from Western Europe and Argentina. All the isolates in cluster W were VCG2B, while the isolates in cluster E belonged to an unknown VCG or to VCG6. When the host range was investigated in the greenhouse, the fungus was highly pathogenic to artichoke, showing the importance of farming alternatives in the management of Verticillium attacks. Sunflower genotypes were inoculated with a selection of isolates in two experiments. Two groups were identified, one including the isolates from Western Europe, Argentina, and Ukraine, and the other including isolates from Bulgaria, Romania, and Turkey. Three pathogenic races were differentiated: V1, V2-EE (Eastern Europe) and V2-WE (Western Europe). Similarly, three differentials are proposed for race identification: HA 458 (universal susceptible), HA 89 (resistant to V2-EE, susceptible to V2-WE) and INRA2603 (susceptible to V2-EE, resistant to V2-WE). The diversity found in Vd affecting sunflower must be taken into account in the search for resistance to the pathogen for European environments of sunflower production.

Keywords: control strategies, crop rotation, genetic resistance, molecular markers, races of V. dahliae, pathotypes of V. dahliae, soilborne fungus
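The differential set proposed in the abstract amounts to a lookup from reaction patterns to races. The sketch below encodes only the reactions stated there; the function name and the "S"/"R" coding are hypothetical conveniences, and because the abstract gives no reaction pattern for race V1 on these three lines, V1 is deliberately left out rather than guessed.

```python
# Reactions stated in the abstract: "S" = susceptible, "R" = resistant.
# HA 458 is the universal susceptible, so it is "S" for both races.
# No pattern for race V1 is given in the abstract, so it is omitted here.
DIFFERENTIAL_PATTERNS = {
    "V2-EE": {"HA 458": "S", "HA 89": "R", "INRA2603": "S"},
    "V2-WE": {"HA 458": "S", "HA 89": "S", "INRA2603": "R"},
}


def identify_race(reactions):
    """Return the race whose stated pattern matches all three differentials.

    reactions: dict mapping differential line name -> "S" or "R".
    Returns None when the observed pattern matches neither encoded race.
    """
    for race, pattern in DIFFERENTIAL_PATTERNS.items():
        if all(reactions.get(line) == expected for line, expected in pattern.items()):
            return race
    return None


# Hypothetical isolate: HA 89 resistant, INRA2603 susceptible.
example = identify_race({"HA 458": "S", "HA 89": "R", "INRA2603": "S"})
```

Returning `None` for unmatched patterns keeps the sketch from assigning a race on incomplete evidence, for example when one differential was not scored.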

#### INTRODUCTION

Verticillium wilt and leaf mottle (VWLM), caused by the fungus Verticillium dahliae Kleb. (Vd), has traditionally been a major disease of sunflower in Argentina and the United States (Gulya et al., 1997; Radi and Gulya, 2007; Galella et al., 2012) as well as in temperate European countries (Harveson and Markell, 2016). However, disease incidence in France, Italy, Spain, and countries around the Black Sea has dramatically increased in recent years and in some regions, like southern France, it is becoming a major constraint to sunflower oil production (Harveson and Markell, 2016; Debaeke et al., 2017). Verticillium dahliae is a soilborne ascomycete with a wide range of host crops. Besides sunflower, it causes important yield losses in artichoke (Cynara cardunculus L. var. scolymus), cauliflower (Brassica oleracea var. botrytis L.), cotton (Gossypium hirsutum L.), eggplant (Solanum melongena L.), lettuce (Lactuca sativa L.), olive tree (Olea europaea L.), potato (Solanum tuberosum L.), tobacco (Nicotiana tabacum L.), and tomato (Solanum lycopersicum L.), among others (Pegg and Brady, 2002). In Spain, Verticillium constitutes an important constraint for the production of cotton, artichoke, and, particularly, of olive tree (Bejarano-Alcázar et al., 1996; Berbegal et al., 2010; Jiménez-Díaz et al., 2011; López-Escudero and Mercado-Blanco, 2011). Also in Spain, VWLM outbreaks have repeatedly been observed in the last few years in sunflower fields of Cadiz province (García-Ruiz et al., 2014), where it is grown in alternation with other crops, particularly cotton and/or tomato. Some fields have in fact even been turned into olive tree groves. Host specialization occurs in Vd, meaning that isolates from a given host may be pathogenic on other hosts but, generally, they are more virulent (symptoms are more severe) on the hosts from which they were obtained. In some Vd isolates, host specialization is more pronounced (Bhat and Subbarao, 1999; Douhan and Johnson, 2001).
In areas where sunflower is grown in alternation with other crop species, determination of the host range specificity of Vd affecting sunflowers is important for the correct management of the whole cropping system.

Clonality in Vd is described by means of vegetative compatibility, which refers to the genetically controlled ability of individual fungal strains to undergo hyphal anastomosis and form stable heterokaryons. Vegetatively compatible isolates are placed in the same vegetative compatibility group (VCG). Although there are many studies on the genetic diversity of Vd from artichoke (Jiménez-Díaz et al., 2006), cotton (Dervis et al., 2008; Korolev et al., 2008), eggplant (Dervis et al., 2009b), olive tree (Navas-Cortés et al., 2009; Dervis et al., 2010), potato and mint (Dung et al., 2013) or sugarbeet (Strausbaugh et al., 2016), among other crops, the genetic characterization of Vd isolates from sunflower has scarcely been addressed. Isolates of Vd affecting sunflowers in Canada showed weak reactions with testers from VCG4A and 4B; one isolate was identified as VCG3 and another one was compatible with all VCG groups except VCG2A (El-Bebany et al., 2013). In previous works by our research group, isolates from Argentina and Spain were assigned to VCG2B (García-Carneros et al., 2014). Since Vd reproduces asexually, isolates in the same VCG could be genetically distinct populations with a similarity in a number of physiological, ecological, pathogenic, and host-range traits (Jiménez-Díaz et al., 2006; Collado-Romero et al., 2008; Korolev et al., 2009). Thus, genetic diversity of Vd isolates from sunflower could be intimately associated with disease occurrence and severity as a consequence of particular interactions between Vd isolate and sunflower genotype.

In some crops, such as cotton or olive tree, isolates of Vd causing infection are pathogenically characterized by assignment to defoliating (D) or non-defoliating (ND) pathotypes, which are identified on the basis of their capacity to cause, or not, the complete fall of green leaves (Rodríguez-Jurado et al., 1993; Bejarano-Alcázar et al., 1996). Molecular analyses using SCAR markers differentiated a genetically homogeneous group of D isolates belonging to VCG1A (Mercado-Blanco et al., 2002). In contrast, these markers showed a high molecular diversity of ND pathotypes belonging to 2A, 2B, and 4B VCGs (Pérez-Artés et al., 2000; Mercado-Blanco et al., 2001, 2002, 2003). Preliminary results from our research group showed that the molecular pattern of Vd isolates infecting sunflower in Argentina and Spain matched that of the ND pathotype of artichoke and/or cotton, pointing to the closeness between ND isolates affecting these three crops and, therefore, suggesting that any of them could serve as a carrier and source of inoculum for Verticillium outbreaks (García-Carneros et al., 2014).

While isolates of Vd infecting crops like cotton and olive tree are assigned to D or ND pathotypes, races of Vd pathogenic to tomato, lettuce, and sunflower are distinguished depending on the genes of resistance that they overcome. Race 1 and race 2 have been described for isolates of Vd pathogenic to tomato (Alexander, 1962) and lettuce (Vallad et al., 2006; Hayes et al., 2007). Moreover, the races of Vd on tomato and lettuce have been reported as being correlated (Maruthachalam et al., 2010). Sequence similarity between the resistance gene of lettuce (Vr1) and that of tomato (Ve1) suggests that they share similar race 1-specific genes for resistance to Vd (Hayes et al., 2011). Also, race 1 is characterized by the presence of the effector gene Ave1, conferring avirulence to lettuce or tomato that carry the resistance genes Vr1 or Ve1, respectively (de Jonge et al., 2012). Conversely, race 2 Vd isolates lack Ave1 and they are, therefore, potentially virulent on plants carrying resistance to race 1 (de Jonge et al., 2012; Short et al., 2014). Race 1 seems to have arisen once by horizontal gene transfer and is genetically much less diverse than race 2 (de Jonge et al., 2012; Jiménez-Díaz et al., 2017). Race 2 occurs worldwide and it causes disease on cultivars from a range of crops for which effective resistance has not been reported (Maruthachalam et al., 2010; Short et al., 2014; Sandoya et al., 2017). Regarding Vd on sunflower, the first race (NA-1) was detected in the United States, and it was controlled by the resistance in HA 89, which was associated with a single major gene (Fick and Zimmer, 1974; Gulya et al., 1997). New races overcoming this resistance, and apparently different from each other, have been reported later on the basis of phenotypic characterization: one (NA-Vd2) in the United States (Gulya, 2007), four in Argentina (Bertero de Romano and Vázquez, 1982; Galella et al., 2004; Clemente et al., 2017) and one in Spain (García-Ruiz et al., 2014). No comparative studies of these proposed races of Vd affecting sunflower have so far been conducted, and neither are any relationships with races 1 and 2 on tomato and lettuce known.

From a practical point of view, genetic resistance has been the most effective method for controlling VWLM in sunflower for nearly 50 years. Initial sources of resistance were identified in Canada in the 1950s (Putt, 1958). The inheritance of resistance in some inbred lines was found to be qualitative, or of complete dominance, and was designated as V<sup>1</sup> (Putt, 1964). The same type of resistance was found 10 years later in certain inbred lines from the USDA collection, such as HA 89 (Fick and Zimmer, 1974), which is a recurrent parent in the development of resistant hybrids, particularly in public sunflower breeding programs. The new races of Vd are not controlled by the resistance in HA 89 (see above). Instead, some of them seem to be controlled by the resistance of some entries of the USDA sunflower collection, such as PI507901 (Radi and Gulya, 2007) or the inbred lines HA 300 and HA 371 (Gulya et al., 1997). Moreover, the inheritance of resistance to race NA-Vd2 appears to be recessive or additive in some lines, and the breeding alternative of pyramiding quantitative resistance is being explored in Argentina (Galella et al., 2012). Frequent outbreaks of VWLM in sunflower-growing countries suggest that the resistance in commercial hybrids has been overcome by the pathogen, which makes it urgent to identify plant material that could serve as a donor of resistance against the current races of the fungus worldwide.

This work was conceived from a holistic perspective, since a bewildering amount of scientific information is available for Vd affecting many crops, whereas Vd pathogenic on sunflower remains largely unknown. Here we describe the population structure of the Vd affecting sunflowers in countries of Europe where VWLM recurrently threatens oil production: Bulgaria, France, Italy, Romania, Spain, Ukraine, and Turkey. Genetic, molecular and pathogenic traits of the fungal collection were studied and, because of its epidemiological significance, we also addressed to what extent Vd from sunflower can be pathogenic on other crops.

# MATERIALS AND METHODS

# Isolates of Verticillium dahliae From Sunflower

All the isolates of the fungus were recovered from affected sunflowers that were collected between 2009 and 2016 in Argentina, Bulgaria, France, Italy, Romania, Spain, Turkey, and Ukraine. Because of the importance of Verticillium wilt in sunflower in Argentina, one isolate from that country was included as a representative, although we did not have a set of isolates representing the race diversity of Vd there. The plants showed interveinal chlorosis and yellowing, as well as wilt symptoms (see **Supplementary Figure S1A**). The isolate references, together with the year of collection and geographical location of the samples, are presented in **Table 1**. Cross sections of the stem base and petiole tissues of all the plants were analyzed. Each section was divided into two to six pieces that were surface-disinfested for 3 min by immersion in 10% household bleach (40 g of active chlorine per liter), rinsed in deionized water for 3 min and air dried in a vertical laminar flow cabinet. Segments 2 to 4 mm long of sunflower tissue were aseptically transferred to petri plates containing potato dextrose agar (PDA). Plates were incubated at 25°C for 72 h in darkness. Colonies were morphologically confirmed by observation under the stereoscope. Only one colony among all those recovered from the same field was selected for further studies, and a minimum of two monoconidial cultures were obtained by the following procedure. Each isolated colony was transferred to PDA and incubated in the laboratory at 25°C. After 8–10 days, the plates were flooded with 5 ml of sterile deionized water each and swirled gently. The conidial suspension was filtered through two layers of sterile gauze. Five serial 1:10 dilutions were prepared from the initial suspension and, from each of them, a small volume was streaked onto Water Agar (WA) medium following a zigzag distribution. Plates were incubated at 28°C for 24 h in darkness. Germinating conidia were then identified under the stereoscope and individually transferred to PDA. The final colonies were confirmed as being Vd based on morphological characters, and labeled as monoconidial isolates. Monoconidial and original isolates were stored in PDA as part of the fungal collection of the Laboratory of Field Crop Diseases at the Institute for Sustainable Agriculture, Córdoba, Spain.

# Genetic Diversity: Determination of Vegetative Compatibility Groups

The VCGs of 38 monoconidial isolates, including those previously characterized by our research group (García-Carneros et al., 2014), were determined by generating and characterizing nitrate non-utilizing (nit) mutants of each isolate and testing their vegetative compatibility. Nit mutants were generated on Water Agar Chlorate medium, recognized as colonies presenting faint growth with no aerial mycelium on Czapek-Dox Agar (CDA) (Korolev and Katan, 1997), and phenotyped on CDA amended with hypoxanthine as described by Correll et al. (1987). Complementation tests were done by pairing nit mutants of the isolates with the complementary mutants of the international OARDC (The Ohio State University, Wooster, OH, United States) reference testers, Israeli nit testers and testers of Washington State University (Pullman, WA, United States): T9 isolate (VCG1A), Ep8M and Ep52 isolates (VCG2A), Cot200 and Cot254 isolates (VCG2B), 70-21 isolate (VCG3), 131M isolate (VCG4A), Pt15M isolate (VCG4B), and MT isolate (VCG6). Pairings were done following the methodology of Collado-Romero et al. (2006). Mycelial plugs of nit testers and nit mutants of test (unknown) isolates were placed 1.5 cm apart on CDA in petri plates and incubated at 25°C in the dark. Plates were scored for prototrophic growth every 7 days until 28–35 days of incubation. Positive complementation was indicated by the formation of dense, aerial growth where mycelia from the tester and the test nit mutant had met and formed a prototrophic heterokaryon (**Supplementary Figure S2**). The test nit mutant was thus considered vegetatively compatible with the tester strain and was assigned to its VCG.

TABLE 1 | Isolates of Verticillium dahliae used in this work, listed by year of collection, geographic origin and host, and genetic characterization by means of assignment to Vegetative Compatibility Group (VCG).

a Isolate identified as a new race of Vd overcoming V<sup>1</sup> gene in HA89 by García-Ruiz et al. (2014).

∗ Isolates included in the pathogenic characterization with sunflower genotypes.

#### Molecular Characterization

The molecular characterization of all the isolates was performed using diagnostic primers of race 1 (Usami et al., 2007; de Jonge et al., 2012) and race 2 (Short et al., 2014), as well as with the diagnostic markers of D and ND pathotypes described for Vd infecting olive tree and artichoke (Carder et al., 1994; Mercado-Blanco et al., 2001, 2002, 2003; Collado-Romero et al., 2009). Isolates of the D pathotype of Vd are pathogenic on sunflower among other crop species (Jiménez-Díaz et al., 2017), but previous findings by our research group showed that those affecting sunflower are molecularly similar to ND isolates of the fungus that are pathogenic to artichoke or cotton (García-Carneros et al., 2014). Besides, races of Vd from sunflower have been determined on the basis of the reaction of particular host genotypes (or differentials) carrying genes of resistance from different sources, but racial characterization by means of molecular markers for race has not been addressed so far. Total genomic DNA from each isolate was purified using the i-Genomic Plant DNA Extraction Minikit (Intron Biotechnology, Sangdaewon-Dong, South Korea) according to the manufacturer's instructions. Quality and concentration of DNA samples were determined using a Qubit™ 3.0 Fluorometer (Invitrogen™, Carlsbad, CA, United States). Finally, DNA samples were adjusted to a final concentration of 10 ng/µL and stored at −20°C until required.

The primer pairs used for the diagnosis of D and ND pathotypes were: DB19/DB22, DB19/espdef01, INTD2f/INTD2r, INTND2f/INTND2r, INTNDf/INTNDr and INTND2f/INTND3r. Optimized PCR assays were carried out in a final volume of 25 µL containing 0.4 µM each primer, 800 µM dNTPs, 2.5 µL 10x PCR buffer (800 mM Tris–HCl, pH 8.3–8.4 at 25°C, 0.2% Tween 20 wt/V), 0.75 U Taq-DNA Polymerase (Dominion MBL, Córdoba, Spain), and 1.5 mM (DB19/DB22 primers) or 2 mM (rest of primers) MgCl2. Amplification conditions were as follows: 4 min denaturation at 94°C; followed by 35 cycles of 1 min denaturation at 94°C, 1 min of annealing at 54°C (DB19/DB22), 62°C (DB19/espdef01), 64°C (INTD2f/INTD2r, INTND2f/INTND2r, INTNDf/INTNDr) or 60°C (INTND2f/INTND3r), and 1 min of extension at 72°C; and a final extension step of 6 min at 72°C. Determination of races 1 and/or 2 was conducted using diagnostic primer pairs Tr1/Tr2, VdAve1F/VdAve1R and VdR2F/VdR2R. Optimized PCR assays were carried out in a final volume of 25 µL containing 10 µM each primer, 400 µM dNTPs, 2.5 µL 10x PCR buffer (800 mM Tris–HCl, pH 8.3–8.4 at 25°C, 0.2% Tween 20 wt/V), 0.75 U Taq-DNA Polymerase (Dominion MBL, Córdoba, Spain), and 3 mM MgCl2. The following profiles were set for amplifications: 2 min initial denaturation at 94°C; 35 cycles of 1 min denaturation at 94°C, 1 min annealing at 64°C (Tr1/Tr2 and VdR2F/VdR2R) or 62°C (VdAve1F/VdAve1R), and 1 min of extension at 72°C; and a final extension step of 10 min at 72°C.

All reactions were done in a T1 Thermocycler (Whatman Biometra, Göttingen, Germany). Amplification products were separated by horizontal electrophoresis in 1.5 or 2% agarose gels containing 0.05 µl/ml GoldView Nucleic Acid Stain (SBS Genetech, Beijing, China) and visualized over a UV light source. A 100- to 2,000-bp or 100- to 1,000-bp ladder (Dominion MBL, Córdoba, Spain) was included in the electrophoresis.

A binary matrix based on presence (1) or absence (0) of PCR product was generated. Cluster analysis with the unweighted pair group method with arithmetic averages (UPGMA) and Jaccard's similarity coefficient (Jaccard, 1908) was used to classify the isolates and determine genetic similarities among them. Analyses were performed with InfoStat Software® v. 2010 (Di Rienzo et al., 2010).
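As an illustration of the similarity measure named above, the following minimal sketch computes Jaccard's coefficient between binary marker profiles. The isolate labels and 0/1 scores are hypothetical and are not data from this study, and treating two all-absent profiles as identical is an assumption of the sketch. The analyses reported here were run in InfoStat, not with this code.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard similarity between two binary (0/1) marker profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same set of markers")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: two profiles with no amplified marker are treated as identical.
    return 1.0 if either == 0 else shared / either


# Hypothetical presence (1) / absence (0) scores for five markers.
profiles = {
    "isolate_A": [1, 1, 0, 0, 0],
    "isolate_B": [1, 1, 1, 0, 0],
    "isolate_C": [1, 0, 0, 1, 1],
}

# Pairwise similarities; 1 - similarity would be the distance used for clustering.
similarity = {
    (p, q): jaccard_similarity(profiles[p], profiles[q])
    for p, q in combinations(profiles, 2)
}

for pair, value in similarity.items():
    print(pair, round(value, 3))
```

The corresponding distances (1 minus similarity) could then be passed to any average-linkage clustering routine, which is what UPGMA denotes.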

# Host Range: Pathogenicity of Isolates of Verticillium dahliae From Sunflower on Herbaceous Crop Species

An experiment was conducted under greenhouse conditions. Six crop species were inoculated with three isolates of Vd from sunflower (VdS0112, VdS0113, and VdS0213) and two from olive tree (one from the D pathotype, VdO0913, and another from the ND pathotype, VdO1113, both of them belonging to the fungal collection of the Laboratory of Field Crop Diseases at the Institute for Sustainable Agriculture, Córdoba, Spain).

Seeds of genetically susceptible artichoke ('Talpiot'), cotton ('Avangard'), eggplant ('Cristal'), lettuce ('Maravilla de verano'), tomato ('Manacor') and sunflower (HA 89 hybrid) were surface-sterilized by immersing them in 10% sodium hypochlorite for 10 min, then thoroughly rinsed in deionized water and incubated in the dark at saturation humidity in a germinator at 26 ± 2°C until radicles were 2–5 mm long. Seedlings were then transplanted into vermiculite and incubated in the greenhouse at 15–25°C with a photoperiod of 14 h light per day for 1 month. Fertilization was applied weekly using a commercial solution (COMPO Universal Fertilizer) following the manufacturer's recommendation. Then, six plants (replications) of each crop species were uprooted and inoculated by root immersion in conidial suspensions of the Vd isolates (10<sup>6</sup> conidia/ml). Roots of control plants were immersed in deionized water. The experiment was carried out in a completely randomized 6 × 6 factorial design. Plants were incubated for 8 weeks in the greenhouse under the same conditions previously described. Severity of symptoms (SS) in each plant was assessed weekly as a percentage of the foliar tissue affected. Sequential SS values were used to calculate the area under the disease progress curve (AUDPC) by the trapezoidal integration method (Campbell and Madden, 1990). The experiment was performed twice and, as no significant differences between the two replicates were found for AUDPC (McIntosh, 1983), data were pooled and analyzed using analysis of variance (ANOVA). Mean values of AUDPC were compared using Fisher's protected least significant difference (LSD) tests (P = 0.05). Statistical analyses of data were performed using STATISTIX 10.0 software (Analytical Software, Tallahassee, FL, United States).
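The trapezoidal integration used for AUDPC can be sketched as follows. This is a minimal illustration only: the weekly severity values are hypothetical, and expressing time in weeks is an assumption, since the time unit used for the reported AUDPC values is not stated.

```python
def audpc(times, severities):
    """Area under the disease progress curve by trapezoidal integration.

    times: assessment times in increasing order (assumed here to be weeks).
    severities: severity of symptoms (%) recorded at each assessment.
    """
    if len(times) != len(severities):
        raise ValueError("times and severities must have the same length")
    if len(times) < 2:
        raise ValueError("at least two assessments are needed")
    total = 0.0
    for i in range(len(times) - 1):
        interval = times[i + 1] - times[i]
        if interval <= 0:
            raise ValueError("times must be strictly increasing")
        # Mean severity over the interval multiplied by the interval length.
        total += (severities[i] + severities[i + 1]) / 2 * interval
    return total


# Hypothetical weekly severity scores (%) for one plant over 8 assessments.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
severity = [0, 0, 10, 20, 40, 60, 80, 100]
print(audpc(weeks, severity))
```

Each plant yields one AUDPC value, which then serves as the response variable in the ANOVA.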

# Pathogenic Characterization of Verticillium dahliae in Sunflower Genotypes

The pathogenic characterization of Vd from sunflower was conducted by means of two phenotyping experiments carried out under greenhouse conditions (20–25°C day and 15–20°C night, with 14 h light). Both experiments were replicated and similar results were obtained. In the first experiment, seven genotypes of sunflower, inbred lines and commercial hybrids with different responses to Verticillium wilt according to previous unpublished data (from field and greenhouse experiments), were independently inoculated with 21 isolates (**Table 1**). Inbred lines were Pioneer 1 and the public lines HA 458, HA 89, and INRA2603. The hybrids included were Pioneer 2, Pioneer 3, and Pioneer 4. Four-week-old plants grown as previously described (experiment of host range in previous subheading) were uprooted and inoculated by immersing the roots in a suspension of 10<sup>6</sup> conidia per ml for 30 min. Roots of the control plants were immersed in water. Inoculated plants were individually transplanted to 0.75 l pots filled with peat:sand (2:1). Four replications (pots) were used for each genotype and Vd isolate. Plants were incubated for 25 days and, at the end of the experiment, VWLM was assessed by means of a Disease Index (DI) that was calculated as: DI = AN × SS, where AN is the percent of affected nodes and SS represents the severity of symptoms according to a 0–5 scale based on chlorosis and necrosis of leaves proposed by Alkher et al. (2009) (0 = no chlorosis or necrosis, 1 = visible chlorosis with <1% necrosis, 2 = up to 40% chlorosis and 1–20% necrosis, 3 = up to 65% chlorosis and 20–35% necrosis, 4 = 100% chlorosis and 35–70% necrosis and 5 = 100% chlorosis and 70–100% necrosis). This index was used because it represents, with only one value, both the area of the plant with symptoms and the severity of those symptoms. The experiment was performed twice and, as no significant difference between the two replicates was found for DI (McIntosh, 1983), data were pooled and VWLM assessed using ANOVA. Hierarchical cluster analysis with the UPGMA algorithm and Euclidean distance was performed to classify the isolates into different pathotypes. A principal component analysis (PCA) was also carried out to visualize the distribution of the variability found. Statistical, clustering and PCA analyses were performed with InfoStat Software® v. 2010.
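The Disease Index defined above (DI = AN × SS) can be illustrated with a short sketch. The function name, the way AN is derived from node counts, and the example values are assumptions made for illustration; only the formula and the 0–5 severity scale come from the text.

```python
def disease_index(affected_nodes, total_nodes, severity_score):
    """Disease Index, DI = AN x SS.

    AN is the percentage of affected nodes (0-100) and SS the severity
    score on the 0-5 scale, so DI ranges from 0 to 500.
    Assumption: AN is obtained as affected nodes / total nodes x 100.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= affected_nodes <= total_nodes:
        raise ValueError("affected_nodes must lie between 0 and total_nodes")
    if severity_score not in range(0, 6):
        raise ValueError("severity_score must be an integer from 0 to 5")
    an_percent = 100.0 * affected_nodes / total_nodes
    return an_percent * severity_score


# Hypothetical plant: 6 of 12 nodes affected, severity score 4.
print(disease_index(6, 12, 4))
```

With these bounds, a symptomless plant scores 0 and a plant with all nodes affected at the maximum severity scores 500.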

A second phenotyping experiment was established in order to confirm the existence of pathogenic variants and to identify differentials of races of Vd. This experiment was performed with inbred lines and a selection of those isolates representing the diversity found after UPGMA analysis in the first experiment. Thus, 12 isolates of Vd were inoculated into HA 458, HA 89, INRA2603, and Pioneer 1. The experiment was conducted like experiment 1, with slight modifications. Three-week-old sunflowers were grown and inoculated as described. With the aim of obtaining disease data similar to those occurring under field conditions, sunflowers were transplanted into 3.5 l pots for 6 weeks. At the end of the experiment, VWLM in the plants was evaluated using the DI explained above. The experiment was conducted twice and the data pooled since no significant difference between the two replicates was found for DI (McIntosh, 1983). The DI results were analyzed using ANOVA and, when significant effects were obtained, Fisher's protected LSD tests (P = 0.05) were used for comparisons of genotypes, Vd isolates, and their interaction. Statistical analyses of data were performed using InfoStat Software® v. 2010.

# RESULTS

# Genetic Diversity: Determination of Vegetative Compatibility Groups

Of the 36 isolates of Vd from sunflower characterized for VCG, 17 were assigned to VCG2B and 6 were assigned to VCG6. The remaining isolates could not be assigned to any VCG. The two isolates from olive tree belonged to VCG1A (VdO0913 isolate, D pathotype) and VCG2A (VdO1113 isolate, ND pathotype). Surprisingly, VCGs of Vd from sunflower were related to their geographical origin. All the isolates from Argentina, France, Italy, and Spain were assigned to VCG2B, while isolates from Bulgaria, Turkey, Romania, and Ukraine were assigned only to VCG6. In this latter group, 13 of the 19 isolates failed to form stable heterokaryons with any of the nit mutant testers (**Table 1**).

#### Molecular Characterization

With regard to molecular characterization, all the isolates amplified, as expected, the 543- or 526-bp marker specific to Vd (DB19/DB22 primers).

When amplified using primer pairs INTD2f/INTD2r (462 bp marker) and DB19/espdef01 (334-bp marker), all the isolates from Argentina and Western Europe showed the 462 (−), 334 (−) pattern. When isolates of Vd from Eastern European countries were amplified using the same pairs of primers, they had the following patterns: 462 (+), 334 (+); 462 (−), 334 (+); and 462 (−), 334 (−). This same group of isolates had a single molecular pattern after amplifications with INTNDf/INTNDr (1,163-bp marker), INTND2f/INTND2r (824-bp marker), and INTND2f/INTND3r (688-bp marker): 1,163 (−), 824 (−), 688 (−). Conversely, PCR assays of Vd isolates from Argentina and Western European countries using these three pairs of primers amplified either the three markers, only the 688-bp marker, or any combination of two out of the three of them (**Supplementary Table S1**).

When amplified with race-specific primers, all 38 isolates yielded 256-bp amplicons with VdR2F/VdR2R and failed to amplify with Tr1/Tr2 and VdAve1F/VdAve1R (**Supplementary Table S1**). Since no polymorphisms were detected in our Vd isolates when using race-specific primers, these data were omitted for the molecular analysis.

The dendrogram resulting from the UPGMA analysis of the molecular data set for pathogenic characterization distinguished three well-differentiated clusters among the 38 isolates of Vd (**Figure 1**). The first cluster (cluster E) grouped the 16 isolates collected in countries from Eastern Europe as well as the isolate VdO0913 from olive tree collected in Spain, which shared about a 50% similarity. Moreover, all the isolates grouped in cluster E belonged to an unknown VCG or to VCG6. All isolates of Vd from sunflower from Argentina and Western European countries, as well as one isolate from Ukraine and the isolate VdO1113 from olive tree from Spain, shared a 21% similarity and were grouped in a second cluster (cluster W) irrespective of their country of origin. Interestingly, all the isolates in cluster W were assigned to VCG2B. Finally, two isolates from Turkey (VdS0614 and VdS0414) and one from Romania (VdS1114) were genetically very distant from the rest of the isolates.

### Host Range: Pathogenicity of Isolates of Verticillium dahliae From Sunflower in Herbaceous Crop Species

The disease caused by the three Vd isolates from sunflower and the two from olive tree on artichoke, cotton, eggplant, sunflower, lettuce, and tomato, expressed as the AUDPC, is depicted in **Figure 2**. Data for non-inoculated control plants are not shown since they were zero. Statistical analyses showed that both main factors, crop species and isolate of Vd, had a significant effect (P < 0.0001) on the disease. The crop species × isolate of Vd interaction was also significant (P < 0.0001), indicating that the AUDPC of each crop species was influenced by the particular Vd isolate infecting it. Overall, the susceptibility of the crop species studied varied from low in the case of lettuce and tomato (36 and 90 AUDPC across isolates, respectively), to high and very high in that of sunflower (326 AUDPC across isolates) and artichoke (548 AUDPC across isolates), respectively (**Figure 2**). Likewise, the pathogenic ability of Vd isolates was dependent on the crop species that they were infecting. None of them were pathogenic to lettuce or tomato, since disease values did not significantly differ from those of the non-inoculated controls. On the contrary, all the isolates were highly pathogenic to artichoke, with AUDPC values ranging from 590 for 112 to 475 for 1333 (**Figure 2**).

The most interesting reactions to Vd were those of cotton and sunflower. Cotton was highly susceptible to the isolate VdO0913 (551 AUDPC), and was moderately susceptible to the rest of the isolates (average 159 AUDPC across them). Sunflower was highly susceptible to those isolates recovered from sunflower samples (450 AUDPC averaged across VdS0112, VdS0113, and VdS0213) but not to those from olive tree (140 AUDPC averaged across them). Finally, eggplant displayed moderate susceptibility to VdO1113 isolate (192 AUDPC in comparison to 129 AUDPC averaged across the remaining four isolates).

# Pathogenic Characterization: Identification of Sunflower Genotypes as Differentials of Races of Verticillium dahliae From Sunflower

The VdS1116 isolate was pathogenic to HA 458 and Pioneer 3 genotypes but it did not cause any symptom in the rest of the genotypes. All the isolates except VdS1116 induced symptoms in those two genotypes but also in at least one of the others. The UPGMA dendrogram generated with the phenotypic data of experiment 1 shows a first approach to the pathogenic diversity of Vd isolates (**Figure 3**). Individual phenotypic information for each isolate is presented in **Supplementary Table S2**. The UPGMA dendrogram shows two big clusters, one including isolates from Western Europe (France, Spain, and Italy), the isolate from Argentina and the two isolates from Ukraine. The other big cluster grouped isolates from Eastern Europe (Turkey, Bulgaria, and Romania). This clustering reflects the reactions of the sunflower genotypes: in the Eastern Europe group, Pioneer 4, HA 89 and Pioneer 1 were the most resistant genotypes, while in the case of isolates from Western Europe, Pioneer 1, Pioneer 2, and INRA2603 were the most resistant ones. In both groups, HA 458 and Pioneer 3 were the most susceptible genotypes. The PCA biplot represented in **Supplementary Figure S3** shows these relationships.

**Figure 4** shows the results of experiment 2 in which four sunflower inbred lines (HA 458, HA 89, INRA2603, and Pioneer 1) and 12 isolates representing the diversity observed in experiment 1 were included. Characteristic and differential reactions of sunflower genotypes in experiment 1 were observed again in this experiment. Thus, INRA2603 was resistant to isolates from Western Europe (including those from Argentina and Ukraine) while it was susceptible to isolates from Eastern Europe, and the opposite situation was observed for HA 89 genotype. Pioneer 1 was resistant to both groups of isolates and HA 458 was the most susceptible to all of them. There were significant differences for DI between genotypes, isolates of Vd and their interaction (P < 0.0001 for all). In general, genotype × isolate combinations resulting in values of DI between 0 and 100 did not significantly differ from those of the non-inoculated plants, so this criterion of <100 was used to determine resistance interactions. Pioneer 1 line was resistant to all isolates (31 DI averaged across isolates) and HA 458 presented DI values of 200 or higher with all the isolates. The INRA2603 and HA 89 responses depended on the isolates, showing resistance reactions for some of them and susceptible ones for others. As in experiment 1, the VdS1116 isolate was pathogenic only to HA 458 (230 DI). Isolates from Turkey, Bulgaria, and Romania caused DI values higher than 100 in INRA2603 (144 DI averaged across isolates). This same inbred line was resistant to the rest of the isolates. By contrast, HA 89 was resistant to VdS0714, VdS0516, VdS0616, VdS0816, and VdS0916 (Turkey, Bulgaria, and Romania) (68 DI averaged across isolates) and susceptible to the remaining seven Vd isolates (195 DI averaged across them). 
**Supplementary Figure S1B** shows the symptoms in HA 458, HA 89, INRA2603, and Pioneer 1 inbred lines after inoculation with isolate VdS0316 which corresponds to the typical profile observed for Western Europe Vd isolates.

# DISCUSSION

In this study, a low genetic diversity of Vd from sunflower was found, with only VCG2B and VCG6 being identified among the isolates. One important finding has been the identification of sunflower as the second host, after pepper (Bhat et al., 2003), in which Vd is assigned to VCG6. Moreover, isolates belonging to VCG6 were restricted to Eastern Europe, where we also found a high proportion of Vd isolates not assigned to any VCG. This could be related to a genetic diversity of Vd that is unidentifiable with the available VCG testers. In contrast, VCG2B was the only group identified for all the Vd isolates from Western Europe and Argentina, in agreement with previous reports from our group (García-Carneros et al., 2014). VCG2B has been identified for isolates of Vd pathogenic to other herbaceous crop species, such as mint (Douhan and Johnson, 2001), spinach (Iglesias-Garcia et al., 2013), cotton (Göre et al., 2014), watermelon (Dervis et al., 2009a), or eggplant (Dervis et al., 2009b) among others. Our host range results are consistent with host adaptation (Jiménez-Díaz et al., 2017) in Vd from sunflower, which was clearly pathogenic to sunflower but not to the other crop species, with the exception of artichoke. Host adaptation means that isolates may be pathogenic on multiple hosts but are usually more virulent on some hosts, typically, but not exclusively, on those from which they were recovered (Bhat and Subbarao, 1999; Douhan and Johnson, 2001; Jiménez-Díaz et al., 2006). Furthermore, the finding that Vd from sunflower is pathogenic and highly virulent on artichoke is in agreement with the results of its genetic characterization. All three isolates from sunflower included in the host range study were identified as VCG2B, a frequent VCG in Vd from artichoke (Berbegal et al., 2010). Additionally, it was not unexpected to find that isolates of Vd from olive tree affected cotton, since cross pathogenicity of the fungus in both crops has been reported (López-Escudero and Mercado-Blanco, 2011). However, given our results on the molecular and pathogenic diversity of Vd in Europe, our conclusions on host range might not be applicable to all isolates of Vd in Europe. The host range of Vd from sunflower could be defined more precisely if isolates of the fungus from Eastern Europe and from VCGs other than VCG2B were considered. From the phytopathological point of view, root tissues and plant debris of any crop species infected by Vd strains from sunflower can serve as carriers and sources of inoculum. Studies on cross pathogenicity in Vd belonging to VCG6 and infecting sunflower and pepper, as well as in Vd belonging to VCG2B and pathogenic to herbaceous hosts such as sunflower, mint, spinach, watermelon, or eggplant, are needed to better understand the risk that growing these crops as farming alternatives poses for severe outbreaks or increased severity of Verticillium wilt.

FIGURE 3 | UPGMA dendrogram based on Disease Index values for seven sunflower genotypes inoculated with 21 isolates of Verticillium dahliae from sunflower. Blue and red colors are used to indicate Clusters W and E, respectively.

Molecular markers revealed a haplotype diversity that suggests a clear divergence between Vd from the east and west of Europe. An important finding of this study is that molecular differences in Vd from sunflower were mostly related to ND and D pathotypes, since all the isolates were race 2. Since Ave1 was not amplified from any of the haplotypes of Vd from sunflower nor from either of the two from olive tree, all of them lack this gene (de Jonge et al., 2012). Hu et al. (2015) found that ND and D isolates of Vd from cotton correlated with races 1 and 2. In our research ND and D pathotypes of Vd from sunflower were identified irrespective of race. Geographical differences were found instead: the ND pathotype was identified for haplotypes of Vd from Western Europe (cluster W) and the D pathotype for haplotypes from Eastern Europe (cluster E). Another interesting outcome of our study is the unexpectedly strong agreement between haplotype clustering and genetic characterization, with only VCG6 identified in cluster E (east of Europe) and VCG2B the only genetic group in haplotypes from cluster W (west of Europe). Although race identification using molecular biological methods is more useful than time-consuming inoculation experiments, the molecular identification of those pathogenic differences in Vd from sunflower that are clearly distinguished on the basis of phenotypic data is still not possible. Differential pathogenicity within race 2 has also recently been reported by Usami et al. (2017) for Vd from tomato.

Little is known about the pathogenic diversity of Vd in sunflower. Understanding this diversity of Vd populations in Europe and determining the pathogenic races that are present in the area is an essential requirement for efficient resistance breeding. In this study we found that the new race overcoming the V<sup>1</sup> gene in HA 89 (VdS0113 isolate) (García-Ruiz et al., 2014) can be effectively controlled by the resistance in the public line INRA2603. Moreover, the reactions of INRA2603 to the Vd isolates were frequently the reverse of those of HA 89. Overall, those Vd isolates effectively controlled by INRA2603 were not controlled by HA 89. This was the case of Vd isolates from Argentina, France, Italy, and Spain. On the contrary, INRA2603 was susceptible to isolates from Bulgaria, Turkey, and Romania to which HA 89 was resistant. These results suggest that the nature of the resistance of INRA2603 to Vd, and, probably, its associated resistance mechanism/s, is different from that in HA 89. On the other hand, we propose to name these pathogenic races of Vd as: (a) V2-EE ("Verticillium race 2 East Europe," pathogenic on INRA2603 but not on HA 89), (b) V2-WE ("Verticillium race 2 West Europe," pathogenic on HA 89 but not on INRA2603), and (c) V1 (the race controlled by both HA 89 and INRA2603). Another finding of our study is that race V2-WE of Vd is not only present in Spain, but also in Argentina (VdS0112), France (VdS1414 and VdS1714), Italy (VdS0316), and Ukraine (VdS1016). Whether or not race V2-WE has the same pathogenic abilities as isolates of Vd overcoming V<sup>1</sup> (HA 89) in Argentina (Bertero de Romano and Vázquez, 1982; Galella et al., 2004) and/or in the United States (Gulya, 2007) remains unknown.
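The proposed race nomenclature can be expressed as a simple decision rule on the reactions of the two differentials. The sketch below is illustrative only: it assumes the DI < 100 resistance criterion described in the Results, the function name and the DI values in the example are hypothetical, and the "unclassified" label for an isolate pathogenic on both lines is an assumption, since that case is not named in the text.

```python
RESISTANCE_THRESHOLD = 100  # DI below this value is read as a resistant reaction


def assign_race(di_ha89, di_inra2603, threshold=RESISTANCE_THRESHOLD):
    """Assign a proposed race from Disease Index values on HA 89 and INRA2603."""
    pathogenic_on_ha89 = di_ha89 >= threshold
    pathogenic_on_inra2603 = di_inra2603 >= threshold
    if pathogenic_on_inra2603 and not pathogenic_on_ha89:
        return "V2-EE"
    if pathogenic_on_ha89 and not pathogenic_on_inra2603:
        return "V2-WE"
    if not pathogenic_on_ha89 and not pathogenic_on_inra2603:
        return "V1"
    # Assumption: pathogenicity on both differentials is not covered by the
    # proposed nomenclature, so it is flagged rather than named.
    return "unclassified"


# Hypothetical DI values (HA 89, INRA2603) for four isolates.
for di_pair in [(70, 150), (190, 40), (20, 30), (150, 150)]:
    print(di_pair, assign_race(*di_pair))
```

In practice, the reaction of the universal susceptible line HA 458 would be checked first to confirm that the inoculation was successful.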

The most useful differentials for race characterization of plant pathogens, including those of sunflower, are public inbred lines, since their genetic background is known and they can easily be exchanged between research groups (Molinero-Ruiz et al., 2015). The presence of four pathogenic races has recently been reported in Argentina (Clemente et al., 2017) on the basis of the use of a set of differentials that is not public. Since INRA2603 and HA 89 are public lines and differentially resistant and/or susceptible to Vd in Europe, we propose that they should be used as differentials for pathogenic races of Vd. Thus, the set for identification of pathogenic races of Vd would be: HA 458 (universal susceptible), HA 89 and INRA2603.

# CONCLUSION

The current study constitutes the first research work focused on the characterization of Vd on sunflower in Europe. Its findings provide new insights into Vd populations affecting sunflower and a preliminary description of three genotypes on which to establish a universal set of race differentials like, for example, that for sunflower downy mildew (Tourvieille de Labrouhe et al., 2000), and they have fundamental implications for resistance breeding. First, we found that the Vd isolates from sunflower lack the Ave1 gene and fall into two molecularly distinct groups, Western Europe and Eastern Europe, whose differences are associated with the ND and D pathotypes, respectively. Genetic differences were also found between the two groups: VCG2B was described in Vd from the west of Europe, whereas VCG6 was assigned only to isolates from the east of Europe. With respect to the pathogenic characterization of Vd from sunflower, and in addition to race V1, races V2-EE and V2-WE were determined according to the sources of resistance that they overcome (HA 89 and INRA2603 inbred lines). Secondly, any search for resistance to Vd for European environments of sunflower production should take this diversity into account in order to find donors with a broad resistance that is effective against both V2-EE and V2-WE races. Otherwise, this pathogenic variability must be properly managed through the development of hybrids with resistance targeted to specific geographical areas (Western and Eastern Europe). This research constitutes a milestone in analyzing the diversity of Vd in the European countries where sunflower is grown. Collaborations between the public and private sectors similar to that of this work would be advisable in other areas where Verticillium poses a threat to this oil crop.

# AUTHOR CONTRIBUTIONS

AM-S and LM-R analyzed the data, interpreted the results, conceived and designed the experiments, and contributed materials, equipment, and analysis tools. AM-S, SR, AG-C, SG-F, PM-F, and SC-S conducted the experiments. AM-S, AG-C, PM-F, SC-S, and LM-R wrote the manuscript. All authors reviewed the manuscript and approved the final version.

# FUNDING

Financial support for this research was partially provided by the Spanish Ministry of Economy, Industry and Competitiveness (AGL2010-17909 and AGL2016-80483-R grants) and the European Regional Development Fund (ERDF). Nit testers were kindly provided by Prof. R. Jiménez-Díaz (University of Córdoba, Córdoba, Spain) and by Dr. D. A. Johnson and Mr. D. L. Wheeler (Washington State University, Pullman, WA, United States). The VdO0913 and VdO1113 isolates of V. dahliae from olive tree were kindly provided by Ms. M. Herrera (Laboratorio de Producción y Sanidad Vegetal de Jaén, AGAPA, Consejería de Agricultura, Pesca y Desarrollo Rural, Junta de Andalucía, Spain). Seeds of artichoke, cotton, eggplant, lettuce, and tomato were provided by Semillas Fitó (Barcelona, Spain).

# ACKNOWLEDGMENTS

We sincerely thank Mr. D. M. Martínez-Rosales (Dow Dupont) for his support in data analysis.



# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.00288/full#supplementary-material

FIGURE S1 | (A) Severe infection by Verticillium dahliae (Vd) in sunflower. (B) Symptoms produced by Vd isolate VdS0316 in the four sunflower genotypes used for race characterization.

FIGURE S2 | Example of the formation of a prototrophic heterokaryon.

FIGURE S3 | Principal coordinates analysis of the seven sunflower genotypes used in the pathogenic characterization and the 21 Verticillium dahliae isolates used in this study.

TABLE S1 | Molecular characterization of isolates of Verticillium dahliae (Vd) from sunflower using markers diagnostic of defoliating (D) and non-defoliating (ND) pathotypes and of races 1 and 2 of the fungal species.

TABLE S2 | Phenotypic data of experiment 1 of the pathogenic characterization of 21 Verticillium dahliae isolates inoculated on seven sunflower genotypes.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Martín-Sanz, Rueda, García-Carneros, González-Fernández, Miranda-Fuentes, Castuera-Santacruz and Molinero-Ruiz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Fruit Phenolic Profiling: A New Selection Criterion in Olive Breeding Programs

#### Ana G. Pérez<sup>1</sup>\*, Lorenzo León<sup>2</sup>, Carlos Sanz<sup>1</sup> and Raúl de la Rosa<sup>2</sup>

<sup>1</sup> Department of Biochemistry and Molecular Biology of Plant Products, Instituto de la Grasa, CSIC, Seville, Spain, <sup>2</sup> Instituto Andaluz de Investigación y Formación Agraria, Pesquera, Alimentaria y de la Producción Ecológica (IFAPA), Centro Alameda del Obispo, Córdoba, Spain

#### Edited by:

Mariela Torres, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

#### Reviewed by:

Agnese Taticchi, University of Perugia, Italy; Primo Proietti, University of Perugia, Italy; Arnon Dag, Agricultural Research Organization (Israel), Israel

> \*Correspondence: Ana G. Pérez agracia@cica.es

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 11 October 2017; Accepted: 12 February 2018; Published: 27 February 2018

#### Citation:

Pérez AG, León L, Sanz C and de la Rosa R (2018) Fruit Phenolic Profiling: A New Selection Criterion in Olive Breeding Programs. Front. Plant Sci. 9:241. doi: 10.3389/fpls.2018.00241

Olive growing is mainly based on traditional varieties selected by growers over the centuries. The few attempts reported so far to obtain new varieties by systematic breeding have mainly focused on improving the adaptation of olive to different growing systems, productivity and oil content. However, the improvement of oil quality has rarely been considered as a selection criterion, and then only in the later stages of breeding programs. Due to their health-promoting and organoleptic properties, phenolic compounds are one of the most important quality markers for virgin olive oil (VOO), although they are not commonly used as quality traits in olive breeding programs. This is mainly due to the difficulty of evaluating oil phenolic composition in large numbers of samples and the limited knowledge of the genetic and environmental factors that may influence phenolic composition. In the present work, we propose a high-throughput methodology to include phenolic composition as a selection criterion in olive breeding programs. For that purpose, the phenolic profile has been determined in fruits and oils of several breeding selections and two varieties ("Picual" and "Arbequina") used as controls. The effect of three different environments, typical for olive growing in Andalusia, Southern Spain, was also evaluated. A high genetic effect was observed on both fruit and oil phenolic profiles. In particular, the breeding selection UCI2-68 showed an optimum phenolic profile, which adds to the good agronomic performance previously reported. A high correlation was found between fruit and oil total phenolic content, as well as between some individual phenols from the two matrices. The environmental effect on phenolic compounds was also significant in both fruit and oil, although the low genotype × environment interaction allowed a similar ranking of genotypes in the different environments. In summary, the high genotypic variance and the simplified procedure of the proposed methodology for fruit phenol evaluation make it suitable for breeding programs aiming to obtain new cultivars with improved phenolic profiles.

Keywords: Olea europaea, olive breeding, virgin olive oil, phenolic compounds, genotype, genotype × environment interaction

# INTRODUCTION

Virgin olive oil (VOO) is a key food within the Mediterranean diet whose daily intake has well-known benefits for human health (Estruch et al., 2013). Olive oil has traditionally been produced and consumed in the Mediterranean countries. Thus, Europe is responsible for 78% of world olive oil production, and Spain is the largest producer, with an average production of 1.3 million tons over the past 7 years, "Picual" and "Arbequina" being the two most important Spanish varieties in terms of oil production (FAOSTAT, 2016<sup>1</sup>). The olive possesses substantial genetic diversity (Belaj et al., 2012), with more than 1,200 olive varieties cataloged (Bartolini et al., 2005). However, most of those cultivars are traditional, as few attempts have been made to produce new varieties by systematic breeding (Bellini et al., 2002; Rallo et al., 2008; Lavee, 2013). This may be due to the limited knowledge of the real variability of olive germplasm for many of the most important agronomic and oil quality traits. In addition, the long juvenile phase, high heterozygosity and scarce information on trait heritability have been important limiting factors for olive breeding (de la Rosa et al., 2016).

In the past few years, olive growing and olive oil production have shown an exponential increase in non-Mediterranean countries (FAOSTAT, 2016<sup>1</sup>). The emergence of these new olive producing areas, with edaphoclimatic conditions very different from those of the Mediterranean countries, and the necessary adaptation to intensive production systems and mechanical harvesting have driven the demand for new olive varieties. All these factors have significantly boosted the development of new and more ambitious olive breeding programs (Lavee, 2013). Thus, the objectives of the most recent breeding programs are not only agronomic. In this sense, two different marketing strategies need to be fulfilled: (i) producing standard quality extra-virgin olive oil at lower prices, and (ii) offering consumers a variety of extra-virgin olive oils with high quality standards and different sensory profiles. The first approach is usually related to super-intensive cultivation and highly mechanized harvesting methods, while the second is associated with preserving olive tree biodiversity and traditional methods as part of the extraordinary food traditions associated with the Mediterranean diet (Ilarioni and Proietti, 2014).

The increasing importance of the nutritional quality of olive oil for consumers and markets has led to olive-breeding programs with new nutritional targets (El Riachy et al., 2012; Rugini and De Pace, 2016). In this sense, although VOO contains a number of minor compounds with interesting biological activities, it is generally accepted that the phenolic compounds are the oil components most directly associated with its health-related properties (Servili et al., 2014; Bernardini and Visioli, 2017). In addition to their nutritional properties, the phenolic compounds of VOO also have important organoleptic implications, since they are the main contributors to the bitter and pungent sensory descriptors. The secoiridoid compounds, containing in their molecules the phenolic alcohol tyrosol (p-HPEA) or its hydroxyl derivative hydroxytyrosol (3,4-DHPEA), are the most abundant class of phenolics in all olive products. Thus, the main phenolic glucosides present in the olive fruit are the secoiridoids oleuropein, ligstroside and demethyloleuropein, whereas their hydrolytic derivatives, the dialdehydic forms of decarboxymethyloleuropein and decarboxymethylligstroside aglucones (3,4-DHPEA-EDA and p-HPEA-EDA, respectively) and the aldehydic forms of oleuropein and ligstroside aglucones (3,4-DHPEA-EA and p-HPEA-EA, respectively), are the main phenolic components in most olive oils (Montedoro et al., 2002; Pérez et al., 2014). Many studies reporting the ability of VOO phenolics to reduce chronic inflammation and oxidative damage relate these beneficial effects of VOO to the level of 3,4-DHPEA in plasma (Mateos et al., 2011; Bernardini and Visioli, 2017). This scientific evidence has led the European Union to approve a health claim on olive oil polyphenols, which may be applied only to oils containing at least 250 mg/kg of hydroxytyrosol and its derivatives (European Commission, 2012).

The metabolism of phenolic compounds in the olive tree is very complex, and is modulated by genetic (Talhaoui et al., 2016) and environmental factors (Romero and Motilva, 2010) that determine the final phenolic composition of olive fruits. Similarly, the influence of agricultural practices such as limited irrigation (Cirilli et al., 2017), optimization of pruning to increase light availability (Proietti et al., 2012) or selection of the optimum harvest date (Famiani et al., 2000) has also been described. The phenolic glycosides present in the olive fruit are later transformed and modified by endogenous hydrolytic and oxidative enzymes that are activated during the oil extraction process. In this way, the phenolic profile of VOO is directly related to the phenolic content of the olive fruit (Gómez-Rico et al., 2008) and the activity of hydrolytic and oxidative enzymes during milling and malaxation (Romero-Segura et al., 2011). Any breeding program that aims to improve the phenolic composition of VOO should consider an experimental design that allows the evaluation of all these factors. Likewise, it is important to have accurate analytical tools to determine the influence of each specific factor on the final phenolic composition of VOO. Olive breeding programs focusing on oil quality have limitations in addition to those previously mentioned (de la Rosa et al., 2016) because of the very large number of genotypes and the very small amount of oil produced in the early stages of breeding. For this reason, it is vital to have reliable analytical methodologies to predict the composition of the oil from the analysis of the fruits. In this sense, recent studies have described significant correlations between the composition of olive fruits and oils for components such as fatty acids, sterols, tocopherols, or squalene (Velasco et al., 2014; de la Rosa et al., 2016). To the best of our knowledge, fruit phenolic profiling has not previously been used as a selection criterion in olive breeding. Moreover, there are no previous reports of olive comparative trials devoted to studying the interaction of genetic and environmental factors that may influence fruit phenolic composition, which in turn determines the phenolic profile of VOO.

The objective of this work is to describe the use of new analytical tools that allow the phenolic composition of the oils to be predicted from the phenolic profiling of the olive fruits, without the previous step of oil extraction, thereby facilitating the screening of large seedling populations. The predictive method developed has been used to select a new olive breeding selection producing VOO with an optimum phenolic composition.

<sup>1</sup>http://faostat.fao.org/site/567/DesktopDefault.aspx?PageID=567#ancor

# MATERIALS AND METHODS

#### Plant Material

The selection process in the olive breeding program of Córdoba (Spain) is divided into three steps: seedling stage, intermediate selection and final comparative trials (León et al., 2015). In each step, the number of genotypes is reduced and the number of replications per genotype increases (de la Rosa et al., 2016). The final comparative trials are planted in different environments with a reduced number of selections, chosen according to their potential adaptability to each set of edaphoclimatic conditions.

In the present work, phenolic evaluation was carried out in three comparative trials planted in typical olive growing areas of Andalusia, Southern Spain: Córdoba, Morón, and Ubeda. Ubeda has lower winter temperatures and rainfall than the other two locations, while Córdoba has the highest rainfall (**Table 1**). Soils differ mainly in clay percentage (42.0% in Morón, 31.9% in Ubeda and 22.3% in Córdoba). Two breeding selections (UCI-2-68 and UCI-5-65) were planted in the selected locations together with their parents ("Picual" and "Arbequina"), following an unbalanced design. These two selections showed high productivity, oil content and oleic acid percentage in previous breeding selection stages (León et al., 2004a,b, 2007, 2011). In addition, four more breeding selections were included in the Ubeda trial (UCI-12-85, UCI-12-104, UCI-19-79, and UCI-19-60) because of their good productivity and high oil content (unpublished data). All the breeding selections come from crosses between "Picual" and "Arbequina" performed from 1992 to 1997. All the trials have a randomized complete design with 3–4 blocks and 3–4 trees per elementary plot. Trees were planted in 2011 at 6 × 7 m spacing, and standard fertilization and irrigation practices were carried out. Irrigation (1,500 m<sup>3</sup> per year and ha) was supplied by in-line drippers to avoid water stress.

Fruit samples of 2 kg were harvested for each genotype and elementary plot in 3 blocks in each of the three trials (Córdoba, Morón, and Ubeda). Breeding selections and parents were collected on a common date in mid-November 2016, typical for olive harvesting in southern Spain, when most fruits were at the turning color stage (de la Rosa et al., 2013). Random subsamples were taken for both direct fruit analysis and oil extraction.

#### Chemicals

Reagents for extraction and other measurements were supplied by Sigma-Aldrich (St. Louis, MO). Oleuropein, verbascoside, luteolin-7-glucoside, apigenin-7-glucoside, rutin, apigenin, luteolin, tyrosol, hydroxytyrosol, vanillic acid, vanillin, p-coumaric acid, and ferulic acid were obtained from Sigma Chemical (St. Louis, MO, USA) and Extrasynthese (Genay, France). Non-commercial phenolic standards such as ligstroside, hydroxytyrosol-1-glucoside, or the main secoiridoid derivatives were obtained from olive leaves, fruits and oils using a preparative high performance liquid chromatography (HPLC) system.

#### Olive Oil Extraction

Olive oil was extracted using an Abencor analyzer (Comercial Abengoa, S.A., Seville, Spain) that simulates the industrial process of VOO production on a laboratory scale (Martínez et al., 1975). Processing parameters have been precisely described in a previous study (Pérez et al., 2014).

#### Extraction and Analysis of Fruit and VOO Phenolic Compounds

Fruit phenolic compounds were extracted according to a previously developed protocol (García-Rodríguez et al., 2011).

TABLE 1 | Mean temperatures (maximum, minimum, and average) and monthly rainfall during 2016 in the three locations studied.


Total rainfall 615.2 562.0 404.8

Longitudinal pieces of mesocarp tissue were cut from 20 olive fruits and kept at 4 °C for 24 h in DMSO (6 ml/g of fruit) containing syringic acid (24 mg/ml) as internal standard. The extracts were filtered through a 0.45 µm nylon mesh and kept at −20 °C until HPLC analysis.

VOO phenolics were isolated by solid phase extraction (SPE) on a diol-bonded phase cartridge (Supelco, Bellefonte, PA) following a previously described procedure (Mateos et al., 2001). A methanol solution (0.5 ml) containing p-hydroxyphenyl-acetic acid and o-coumaric acid as internal standards was added to each oil sample (2.5 g) before the extraction.

Phenolic extracts from fruits and oils were analyzed by HPLC on a Beckman Coulter liquid chromatography system equipped with a System Gold 168 detector, a solvent module 126, an autosampler module 508 and a Waters column heater module, following a previously described methodology (Pérez et al., 2014). A Superspher RP 18 column (4.6 mm i.d. × 250 mm, particle size 4 µm; Dr Maisch GmbH, Germany) at a flow rate of 1 mL min<sup>−1</sup> and a temperature of 35 °C was used for all the analyses. A total of 15 phenolic compounds were analyzed in fruit phenolic extracts: hydroxytyrosol-4-glucoside, hydroxytyrosol-1-glucoside, demethyloleuropein, verbascoside, luteolin-7-glucoside, demethylligstroside, rutin, oleuropein, comselogoside, ligstroside, luteolin, 3,4-DHPEA-EA, apigenin, and p-HPEA-EA. The last four compounds were also analyzed in VOO extracts, in which 12 other phenolic compounds were also detected: hydroxytyrosol, tyrosol, vanillic acid, vanillin, p-coumaric acid, hydroxytyrosol acetate, 3,4-DHPEA-EDA, p-HPEA-EDA, pinoresinol, acetoxypinoresinol, and ferulic acid. Flavones and ferulic acid were quantified at 335 nm, while the rest of the phenolic components were quantified at 280 nm. Response factors were calculated for each phenolic compound as described previously (García-Rodríguez et al., 2011).
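As a rough illustration of how a response factor and an internal standard turn peak areas into contents, the sketch below applies the generic internal-standard calculation. The exact formula of the cited protocol (García-Rodríguez et al., 2011) is not reproduced in this article, so the function names, the definition of the relative response factor and all numbers are assumptions chosen only for illustration.

```python
def relative_response_factor(area_std, conc_std, area_is, conc_is):
    """Detector response of a pure standard relative to the internal standard (IS).

    Both solutions are injected at known concentrations given in the same units.
    """
    return (area_std / conc_std) / (area_is / conc_is)


def content_ug_per_g(area_analyte, area_is, is_ug_per_g_sample, rrf):
    """Analyte content (ug per g of sample) from one chromatogram.

    is_ug_per_g_sample: amount of IS added per gram of sample (hypothetical input).
    rrf: relative response factor of the analyte versus the IS.
    """
    return (area_analyte / area_is) * is_ug_per_g_sample / rrf


# Hypothetical calibration: analyte and IS respond equally per unit concentration.
rrf = relative_response_factor(area_std=500.0, conc_std=50.0,
                               area_is=1000.0, conc_is=100.0)

# Hypothetical sample: analyte peak twice the IS peak, 100 ug of IS per g of sample.
content = content_ug_per_g(area_analyte=2000.0, area_is=1000.0,
                           is_ug_per_g_sample=100.0, rrf=rrf)
print(f"RRF = {rrf:.2f}, content = {content:.1f} ug/g")
```

In practice one response factor would be needed per compound and per detection wavelength (280 or 335 nm), since the detector response differs between phenolic classes.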

The tentative identification of compounds by their UV-vis spectra was confirmed by HPLC/ESI-qTOF-HRMS. The liquid chromatograph was a Dionex Ultimate 3000 RS U-HPLC system (Thermo Fisher Scientific, Waltham, MA, USA) equipped with a similar Superspher RP 18 column but with formic acid (1%) instead of phosphoric acid (0.5%) in the mobile phase. A post-column split of 0.4 mL/min was introduced directly into the electrospray ion source of the mass spectrometer. Mass analysis was performed on a micrOTOF-QII High Resolution Time-of-Flight mass spectrometer (UHRTOF) with qQ-TOF geometry (Bruker Daltonics, Bremen, Germany) equipped with an electrospray ionization (ESI) interface. Mass spectra were acquired in MS full-scan mode and data were processed using TargetAnalysis 1.2 software (Bruker Daltonics, Bremen, Germany).

#### Statistical Analysis

Fruit samples (2 kg) were harvested for each cultivar and elementary plot in 3 blocks in each trial, and random subsamples were taken for both direct fruit analysis and oil extraction. All the data were statistically evaluated using STATISTICA 5.0 (Statsoft Inc., Tulsa, OK, USA). Descriptive statistics and variability plots were obtained for the whole dataset of genotypes and environments. Correlations among phenols or groups of phenols were analyzed for the whole dataset using Pearson's correlations (at p ≤ 0.05; p ≤ 0.01; p ≤ 0.001). A subset of data (4 genotypes and 3 environments) was used to evaluate the relative contribution of genetic and environmental factors to the phenolic variability by means of ANOVA, and means were separated at p ≤ 0.05 by the least significant difference (LSD). Principal component analysis (PCA) was used to evaluate the levels of association among the phenolic compounds from the cultivars and advanced breeding selections under study. PCA was applied to the same subset of data used for ANOVA (4 genotypes and 3 environments) and to a second subset of data containing all the genotypes in a single environment.
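The mean separation step can be illustrated with the textbook LSD formula. The sketch below is a minimal example that assumes equal replication per mean, which may not hold for the unbalanced trials described above; the error mean square, the number of observations and the compared means are hypothetical, and the critical t value has to be supplied from a table or a statistics package.

```python
import math


def least_significant_difference(t_crit, mse, n_per_mean):
    """LSD for comparing two means that each average n_per_mean observations.

    t_crit: two-sided critical t value for the error degrees of freedom,
            taken from a t table or a statistics package (not computed here).
    mse:    error mean square from the ANOVA table.
    """
    return t_crit * math.sqrt(2.0 * mse / n_per_mean)


def means_differ(mean_a, mean_b, lsd):
    """Two means are declared different when their gap exceeds the LSD."""
    return abs(mean_a - mean_b) > lsd


# Hypothetical ANOVA output: MSE = 8.0 with 24 error df, 4 observations per mean.
# 2.064 is the tabulated two-sided 5% critical t value for 24 df.
lsd = least_significant_difference(t_crit=2.064, mse=8.0, n_per_mean=4)
print(f"LSD = {lsd:.3f}")
print(means_differ(30.0, 24.0, lsd))  # gap of 6.0 compared with the LSD
```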

# RESULTS

#### Identification and Quantitation of the Main Phenolic Components of Olive Fruits and Oils from New Advanced Breeding Selections

The analysis of the fruit phenolic extracts revealed a great variability in the profile of phenolic glycosides of the eight olive genotypes and three environments under study. Demethyloleuropein was the most abundant phenolic compound, with a mean content of 8,787.4 µg/g fruit and a range of variability from 26.1 to 23,937.9 µg/g fruit. This was followed by oleuropein, with a significantly lower mean value (4,966.2 µg/g) and a narrower range (175.2–16,542.2 µg/g). The mean contents of verbascoside, ligstroside and luteolin-7-glucoside were considerably lower than those of demethyloleuropein and oleuropein: 825.2, 497.2, and 391.06 µg/g, respectively.

Similarly, a great variability was also observed in the phenolic components of the oils. Tyrosol (p-HPEA) and hydroxytyrosol (3,4-DHPEA) derived compounds were the most important class of phenolic compounds found in all VOOs. Among them, 3,4-DHPEA-EDA was the most abundant phenolic component in the studied oils, with a mean content of 234.1 µg/g of oil and a range of 27.1–576.81 µg/g. The tyrosol derivative p-HPEA-EDA was the second most abundant component (mean value 148.2 µg/g and range 23.3–444.11 µg/g), followed by another hydroxytyrosol derivative, 3,4-DHPEA-EA (mean value 85.3 and range 14.2–410.1 µg/g), while lower contents were found for p-HPEA-EA (mean value 15.87 µg/g) and 3,4-DHPEA-acetate (mean value 11.06 µg/g). Significant amounts of lignans (acetoxypinoresinol and pinoresinol), flavones (luteolin and apigenin), and phenolic acids (cinnamic acid, p-coumaric acid, vanillic acid, and ferulic acid) were also found in the oils obtained from the eight genotypes and three environments analyzed. The highest variability ranges found among them, although less important from a quantitative point of view than those previously mentioned for tyrosol derived compounds, were those of acetoxypinoresinol, which possesses promising anticancer activity (mean value 23.7 and range 10.2–74.33,4µg/g) (Menéndez et al., 2008), and luteolin (mean value 7.32 µg/g).

#### Genetic and Environmental Effects on the Phenolic Composition of Olive Fruit and Oil

The well-known influence of edaphoclimatic parameters on the quality of VOO makes it extremely important to include different environments in the evaluation of advanced olive breeding selections (León et al., 2016). In the present work, the phenolic evaluation was carried out in three comparative trials (Morón, Córdoba, and Ubeda), representing different edaphoclimatic areas in Southern Spain. The variability specifically induced by both factors, genotype and environment, on the phenolic profile of fruits and oils was analyzed. **Figure 1** shows the variability observed in the total phenolic content of the fruits and in the content of the three main secoiridoid glycosides. Among the genotypes grown in different environments, the highest phenolic content was always found in fruits grown in Ubeda, while the lowest content was generally associated with fruits from Córdoba. However, the range of this environmental variability was different for each genotype. **Figure 2** shows the variability plots of the main phenolic components found in VOO. The influence of the environmental factor on the oils was in good agreement with that observed in the fruits. As observed for fruit phenolics, in those genotypes grown in the three different environments, higher phenolic contents were always associated with oils from Ubeda. The highest phenolic contents were found in oils obtained from breeding selections UCI 12-85 and UCI 12-104 grown in Ubeda. Besides their extremely high phenolic content, these two genotypes also possess very high levels of p-HPEA-EDA (320 and 450 µg/g oil, respectively), which is closely related to the pungency of VOO (Andrewes et al., 2003).

In order to evaluate the relative contribution of genetic and environmental factors to the variability observed in the phenolic content of fruits and oils, a subset of data from the genotypes "Arbequina," "Picual," UCI 5-65, and UCI 2-68 was subjected to analysis of variance. Minor phenolic components were excluded and only 15 phenolic variables were included in the analysis. Four variables were selected in fruits (total phenolics, demethyloleuropein, oleuropein and ligstroside) and 11 variables were selected in VOOs (total phenolics; 3,4-DHPEA-EDA; p-HPEA-EDA; 3,4-DHPEA-EA; p-HPEA-EA; 3,4-DHPEA-acetate; pinoresinol; acetoxypinoresinol; luteolin; apigenin; and the sum of the main phenolic acids (cinnamic acid, p-coumaric acid, vanillic acid, and ferulic acid)). The influence of each factor was estimated as its percentage of the total variance (**Figure 3**). Analysis of variance shows a significant effect (p ≤ 0.05) of genotype for all the phenolic compounds analyzed in fruits and VOOs. The environment was the major contributor to the variance of the total phenolic content of the fruits (66.9%), fruit secoiridoids (59.7%) and oleuropein content (46.8%), and contributed to a lesser extent to the variance of the total phenolic content of the oils (29.99%). By contrast, the genotype × environment effect was only significant (p ≤ 0.05) for demethyloleuropein and 3,4-DHPEA-EDA. To analyze this information in detail, the mean values of the 15 selected variables in the four genotypes and three environments analyzed were compared (**Table 2**). Significant differences were found for key phenolic components among the four genotypes. UCI 2-68 displayed the highest phenolic content in both fruit and VOO. The content of demethyloleuropein, the most abundant phenolic glycoside found in the olive fruits analyzed, was significantly higher in UCI 2-68 and "Arbequina," with moderate levels found in UCI 5-65 and very low amounts detected in "Picual" fruits. The genotype UCI 2-68 also had the highest contents of 3,4-DHPEA-EDA, p-HPEA-EDA, 3,4-DHPEA-acetate, and pinoresinol.
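One common way to express the influence of each factor as a percentage of the total variance is to divide each sum of squares of a two-way ANOVA by the total sum of squares. The sketch below shows this for a balanced genotype × environment layout. The estimator actually used by the statistical package is not stated in the text, so this approach, the function name `variance_shares` and the toy values are all assumptions for illustration.

```python
def variance_shares(data):
    """Percent of the total sum of squares due to each source.

    data[genotype][environment] is a list of replicate values; every cell
    must hold the same number of replicates (balanced layout).
    """
    genotypes = list(data)
    environments = list(data[genotypes[0]])
    reps = len(data[genotypes[0]][environments[0]])

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    everything = [v for g in genotypes for e in environments for v in data[g][e]]
    grand = mean(everything)
    g_mean = {g: mean(v for e in environments for v in data[g][e]) for g in genotypes}
    e_mean = {e: mean(v for g in genotypes for v in data[g][e]) for e in environments}
    cell = {(g, e): mean(data[g][e]) for g in genotypes for e in environments}

    ss = {
        "genotype": reps * len(environments)
        * sum((g_mean[g] - grand) ** 2 for g in genotypes),
        "environment": reps * len(genotypes)
        * sum((e_mean[e] - grand) ** 2 for e in environments),
        "genotype x environment": reps * sum(
            (cell[g, e] - g_mean[g] - e_mean[e] + grand) ** 2
            for g in genotypes for e in environments),
        "error": sum((v - cell[g, e]) ** 2
                     for g in genotypes for e in environments for v in data[g][e]),
    }
    ss_total = sum((v - grand) ** 2 for v in everything)
    return {source: 100.0 * value / ss_total for source, value in ss.items()}


# Hypothetical total phenolic contents: 2 genotypes x 2 environments x 2 replicates.
toy = {
    "A": {"E1": [10.0, 12.0], "E2": [20.0, 22.0]},
    "B": {"E1": [14.0, 16.0], "E2": [24.0, 26.0]},
}
shares = variance_shares(toy)
for source, pct in shares.items():
    print(f"{source}: {pct:.1f}%")
```

In this toy layout the environment dominates and the interaction term is zero, the kind of pattern under which genotypes keep the same ranking across environments.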

#### Relationship between Fruit and VOO Phenolic Components

To investigate the possibility of predicting the phenolic composition of the VOO from the analysis of the phenolic profiles of the fruits, Pearson's correlation coefficients were computed using the data obtained from all the fruits and oils analyzed in this study (**Table 3**). A significant positive correlation was found between total fruit phenolics and total VOO phenolic content (r = 0.685), and a slightly higher coefficient was calculated for fruit secoiridoid compounds and the total phenolic content of the oil (r = 0.735). The highest correlation between an individual fruit phenolic compound and VOO phenolic content was found for demethyloleuropein (r = 0.650). Curiously, oleuropein content correlated poorly with total fruit phenolics (r = 0.289), and no correlation at all was found between oleuropein content and the total phenolic content of the oil (r = −0.057). The data shown in **Table 3** also provide relevant information on the relationship between the different phenolic components of VOO. Thus, 3,4-DHPEA-EDA was highly correlated with total VOO phenolics (r = 0.850), followed by p-HPEA-EDA (r = 0.759), while significantly lower correlation coefficients were found for the secoiridoids with monoaldehyde structure, 3,4-DHPEA-EA and p-HPEA-EA (r = 0.217 and r = 0.144, respectively).
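To gauge how much predictive value these coefficients carry, each r can be squared to obtain the share of variation in total VOO phenolics that a simple linear relationship with the fruit trait would account for. The short sketch below does this for the coefficients reported above; the linear-model reading is an assumption added here, not an analysis from the study.

```python
# Correlation coefficients reported in the text (fruit trait vs. total VOO phenolics).
reported_r = {
    "total fruit phenolics": 0.685,
    "fruit secoiridoids": 0.735,
    "demethyloleuropein": 0.650,
    "oleuropein": -0.057,
}


def shared_variance(r):
    """Coefficient of determination for a simple linear relationship."""
    return r * r


r_squared = {trait: shared_variance(r) for trait, r in reported_r.items()}
for trait, r2 in r_squared.items():
    print(f"{trait}: r = {reported_r[trait]:+.3f}, r^2 = {r2:.3f}")
```

Read this way, fruit secoiridoids would account for roughly half (about 54%) of the variation in total VOO phenolics, whereas oleuropein alone would account for practically none.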

#### Phenolic Profiling as a Selection Criterion in Olive Breeding

PCA was first applied to the same subset of data used for the variance analysis (four genotypes and three environments). The first and second principal components described 60% of the total variability (PC1 35.66% and PC2 23.65%). PC1 was strongly linked to demethyloleuropein (r = 0.90), 3,4-DHPEA-EDA (r = 0.82), and p-HPEA-EDA (r = 0.78), while it was negatively correlated with oleuropein (r = −0.67), 3,4-DHPEA-EA (r = −0.71), and p-HPEA-EA (r = −0.67). PC2 was positively correlated with total VOO phenolic content (r = 0.72), pinoresinol (r = 0.633) and, to a lesser extent, acetoxypinoresinol (r = 0.46), and negatively correlated with luteolin and apigenin content (r = −0.57 and r = −0.78, respectively). The PCA bi-plot (**Figure 4**) shows the contribution of the PCA to sample profiling by genotype and environment. The distribution of phenolic profiles in the bi-plot suggests a greater influence of genotype than of environment. In this sense, the different environments within each genotype could not be clearly distinguished in the four genotypes studied. Thus, the profiles of UCI 5-65 and UCI 2-68 grown in different environments were only partially segregated in the scatter plot. On the contrary, when PCA was applied to all the breeding selections grown in the Ubeda environment, the eight genotypes were clearly separated in the scatter plot (**Figure 5**).
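The variability described by the first two components is simply the running sum of the individual component shares. The short sketch below, with a hypothetical helper name, applies this to the two percentages quoted in the text; the sum is 59.31%, which is reported as 60% after rounding.

```python
def cumulative_explained(shares):
    """Running total of the variance shares (in %) of successive components."""
    running_total = 0.0
    cumulative = []
    for share in shares:
        running_total += share
        cumulative.append(running_total)
    return cumulative


# Shares of PC1 and PC2 as quoted in the text (percent of total variability)
pc_shares = [35.66, 23.65]

cumulative = cumulative_explained(pc_shares)
print(f"PC1 + PC2 = {cumulative[-1]:.2f}%")
```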

# DISCUSSION

#### Phenolic Composition of New Advanced Olive Breeding Selections. Influence of Genetic and Environmental Factors

The major aim of the olive breeding program of Cordoba is to select new olive cultivars that, together with good agronomic characteristics, are able to produce oils with optimum nutritional and organoleptic quality. The genotypes included in this study are advanced selections from this program which, in previous breeding selection stages, have shown good productivity and high oil content (León et al., 2004a,b, 2007, 2011). The phenolic profiles of fruits and oils of these new breeding selections, together with those of their parents ("Picual" and "Arbequina"), showed great variability in the three environments studied (**Figures 1**, **2**). The greatest variability in the phenolic profile of the fruits was associated with demethyloleuropein, oleuropein and verbascoside. The mean value found for demethyloleuropein (8787.4 µg/g) is lower than the mean content analyzed in the "Arbequina" fruits grown in the three environments selected in this study shown in **Table 1** (11,771.5 µg/g) but quite similar to other values recently reported for "Arbequina" fruits also grown in south Spain (Romero et al., 2017). However, the significant content of demethyloleuropein found in all the breeding selections analyzed is quite remarkable, considering that this phenolic glycoside is only present in a very small number of traditional olive cultivars (Gómez-Rico et al., 2008). The mean content of oleuropein found among the advanced breeding selections (**Figure 1**) was lower than the mean value found for "Picual" fruits in the environments selected in this study (**Table 2**) and also lower than those recently reported for "Picual" fruits grown in seven orchards representative of southeast Spain (Romero et al., 2017). The high content of verbascoside found in the breeding selections analyzed might also be of interest from a nutritional point of view given the biological properties and clinical potential described for this compound (Alipieva et al., 2014). 
However, the content of this glucoside is not so relevant in relation to VOO phenolic composition since verbascoside, due to its chemical structure, is not hydrolysable by olive β-glucosidase, so that no significant hydrolytic derivatives of this glucoside are found in VOO (Romero-Segura et al., 2012). The two most relevant phenolic compounds found in the oils of the eight genotypes were 3,4-DHPEA-EDA and p-HPEA-EDA. The variability range of 3,4-DHPEA-EDA was quite similar to that previously analyzed in 136 olive seedlings from a single cross "Picual" × "Arbequina" (Pérez et al., 2014) and significantly higher than those found among the crosses "Arbequina" × "Arbosana" and "Sikitita" × "Arbosana" (El Riachy et al., 2012). The well-known biological activity of 3,4-DHPEA-EDA (Grasso et al., 2007; Bernardini and Visioli, 2017) and its high content suggest a key contribution of this compound to the antioxidant activity of the oils. Similarly, the high content of p-HPEA-EDA found among the analyzed breeding selections may also have important quality implications, both nutritional and organoleptic, due to its relation to oil pungency and to its anti-inflammatory properties (Lucas et al., 2011).

Different letters indicate LSD between genotypes and environments ANOVA (P≤0.05).

One of the main goals of this study was to estimate the relative influence of genetic and environmental factors on the phenolic composition of olive breeding selections. The variability plots obtained for the main phenolic components of the fruits and oils (**Figures 1**, **2**) and the comparison of means (**Table 2**) show the specific contribution of environments and genotypes to the phenolic variability found in this study. The rainfall and/or irrigation regime of the olive tree is probably the most studied environmental factor influencing the composition of VOOs (Romero and Motilva, 2010). In this sense, given that the water applied by irrigation was similar in the three environments, the major differences in water availability correspond to rainfall (**Table 1**), which was higher in Cordoba than in the other two locations, with the lowest value associated with Ubeda. The overall higher phenolic content found in fruits and oils from Ubeda could be related to the lower rainfall at this location and, similarly, the lower phenolic content of genotypes grown in Cordoba could be explained by its highest rainfall. The increase of phenolic content in VOO with reduced water availability has been well documented for olive (Marra et al., 2016; Cirilli et al., 2017), although this relation has not always been clearly observed (Pierantozzi et al., 2014). However, taking into account that the three environments differ in many aspects (soil type, temperature regime, rainfall, etc.), differences in phenolic composition among the three environments cannot be directly attributed to a single environmental factor. In relation to the contribution of the genotype, the variability pattern of UCI 2-68 in total VOO phenolics is in good agreement with that observed for demethyloleuropein content (**Figure 1B**), which is the precursor of 3,4-DHPEA-EDA, the most abundant phenolic compound analyzed in the oils of UCI 2-68 (**Figure 2B**). 
Similarly, the highest variability of "Picual" oils for the 3,4-DHPEA-EA and p-HPEA-EA contents (**Figures 2E,F**) correlates with that previously mentioned for oleuropein and ligstroside contents in the same fruits (**Figures 1C,D**). The major contribution of the genotype to the phenolic variability was clearly demonstrated after calculating the percent of the total variance (**Figure 3**). Thus, while a significant effect of the genotype was found for all the phenolic compounds, the environment was only a major contributor to the variance of the total phenolic content of the fruits (66.9%). According to variance components, the strongest genotypic effects were observed in the contents of apigenin (86.8%), 3,4-DHPEA-EA (84.2%), demethyloleuropein (73.4%), and pinoresinol (66.6%). As shown in **Figure 3**, the genotype × environment effect was only significant (p ≤ 0.05) for demethyloleuropein and 3,4-DHPEA-EDA, although in both cases the relative contribution to total variance is very low compared to the main effects.

TABLE 2 | Phenolic components (µg g<sup>−1</sup>) of fruits (lowercase) and oils (uppercase letters) with regards to genotype and environment.
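The partition of variability between genotype, environment and their interaction discussed above can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it assumes a balanced design with replicates in every genotype × environment cell and expresses each source as a share of the total sum of squares, which approximates but is not identical to an estimate of variance components. The function name and the data are hypothetical.

```python
from statistics import mean


def variance_shares(data):
    """Percent of the total sum of squares per source in a balanced two-way layout.

    data maps (genotype, environment) to a list of replicate values; every
    cell is assumed to hold the same number of replicates.
    """
    genotypes = sorted({g for g, _ in data})
    environments = sorted({e for _, e in data})
    n_rep = len(next(iter(data.values())))
    grand = mean(v for cell in data.values() for v in cell)

    g_mean = {g: mean(v for e in environments for v in data[(g, e)])
              for g in genotypes}
    e_mean = {e: mean(v for g in genotypes for v in data[(g, e)])
              for e in environments}

    ss = {
        "genotype": n_rep * len(environments)
        * sum((g_mean[g] - grand) ** 2 for g in genotypes),
        "environment": n_rep * len(genotypes)
        * sum((e_mean[e] - grand) ** 2 for e in environments),
        "genotype x environment": n_rep * sum(
            (mean(data[(g, e)]) - g_mean[g] - e_mean[e] + grand) ** 2
            for g in genotypes for e in environments
        ),
        "residual": sum((v - mean(cell)) ** 2
                        for cell in data.values() for v in cell),
    }
    total = sum(ss.values())  # assumed non-zero (data are not all identical)
    return {source: 100.0 * value / total for source, value in ss.items()}


# Hypothetical contents for two genotypes in two environments, two replicates each
example = {
    ("G1", "E1"): [10.0, 11.0], ("G1", "E2"): [12.0, 13.0],
    ("G2", "E1"): [20.0, 21.0], ("G2", "E2"): [22.0, 23.0],
}

for source, share in variance_shares(example).items():
    print(f"{source}: {share:.1f}%")
```

In a data set dominated by genotype, as reported here for most compounds, the genotype share would be by far the largest and the interaction share close to zero.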

The comparison of the mean values of the 15 selected variables in the four genotypes and three environments (**Table 2**) showed statistically significant differences among the phenolic profiles. Selection UCI 2-68 had the highest phenolic content in fruits and oils, and the highest levels of 3,4-DHPEA-EDA. In contrast to previous literature on olive oil, the mean phenolic contents analyzed in the oils from "Picual" and "Arbequina" cultivars (482.2 and 502 µg/g, respectively) were not significantly different. In this sense, some studies have also reported similarities between the phenolic contents of VOOs obtained from fruits of these two cultivars harvested at specific ripening stages (García González et al., 2010; Pérez et al., 2014). The comparison of the mean values determined in each environment indicates that, with the exception of 3,4-DHPEA-EDA and pinoresinol, the significantly lowest phenolic contents were always found in the fruits and oils from Cordoba. On the contrary, although the values found in fruits and oils from Ubeda were usually higher than in the other two environments, for most traits the differences between Ubeda and Moron were not statistically significant.

According to the data obtained, the phenolic profile seems to be dependent on the genotype, with only quantitative, but not qualitative, differences associated with the environmental factor. Although it is clear that variation among years might add to that observed between environments, considering that the climatic conditions (maximum, minimum, and average temperature and rainfall) of 2016 were in the average range of the last 10 years, the results could be extended to other seasons.

#### Relationship between Fruit and VOO Phenolic Components

One of the main objectives of this study was to investigate the suitability of fruit phenolic profiling to predict the phenolic composition of VOO from olive breeding selections. The hydrolysis of secoiridoid glycosides seems to be the key step in the biosynthesis of phenolic components during the extraction of VOO (Obied et al., 2008). In previous studies we have demonstrated that during the milling of the olive fruits, cell integrity is disrupted and phenolic glycosides are transformed into their corresponding aglycones (3,4-DHPEA-EDA, p-HPEA-EDA, 3,4-DHPEA-EA, p-HPEA-EA, luteolin or apigenin) by a highly specific olive β-glucosidase (Romero-Segura et al., 2012). The secoiridoid derivatives formed may be further hydrolyzed to simple phenolic compounds such as 3,4-DHPEA or p-HPEA. Other compounds such as the lignans pinoresinol and 1-acetoxypinoresinol, not detected in the olive pulp, are presumably formed in the olive seed and only liberated and transferred to the oil after olive stone crushing (Klen et al., 2015). However, a number of studies have reported that only a minimal amount of the phenolic compounds formed during the milling of the olive fruits and the subsequent malaxation of olive pastes is transferred to the oil. This transfer is cultivar dependent (Talhaoui et al., 2016) and significantly different for each class of compounds, with the highest transfer rate corresponding to secoiridoid compounds, followed by flavonoids and simple phenols. The Pearson's correlation coefficients computed with all the phenolic components analyzed in fruits and oils could provide useful information on this issue (**Table 3**). The correlation coefficient between the total phenolic content of fruits and oils was r = 0.685, and it was slightly higher between total secoiridoids from fruits and oils (r = 0.740). Higher correlations have been reported for fatty acids (r = 0.98), tocopherols (r = 0.96), and other compounds analyzed in olive fruit and VOO (Velasco et al., 2014). 
However, those compounds are already present in the olive fruit tissue, while the biosynthesis of oil phenolic components may involve a number of complex biochemical reactions (Obied et al., 2008; Klen et al., 2015) which are also affected by the oxidative degradation catalyzed by olive polyphenol oxidase and peroxidase (García-Rodríguez et al., 2011). The significant positive correlation found between demethyloleuropein and the VOO phenolic content (r = 0.650) contrasts with the low value found for oleuropein (r = −0.057). However, the very different correlation coefficients found for both secoiridoid glycosides are in good concordance with the higher mean value found for demethyloleuropein compared to that of oleuropein in the fruits of the eight genotypes analyzed (**Figure 1**). Similarly, the highest Pearson's coefficient between individual fruit and oil phenolic components was obtained for demethyloleuropein and 3,4-DHPEA-EDA (r = 0.900), which in turn is the most abundant secoiridoid compound in the oils analyzed in this study (**Figure 2**). These data also corroborate previous findings on the high specificity of the olive β-glucosidase, which forms 3,4-DHPEA-EDA as the unique product of demethyloleuropein hydrolysis, while 3,4-DHPEA-EA is the main, but not the unique, product found after oleuropein hydrolysis (Romero-Segura et al., 2012). In this sense, the correlation coefficient found for 3,4-DHPEA-EA and oleuropein was significantly lower (r = 0.611) than that mentioned for 3,4-DHPEA-EDA and demethyloleuropein. Positive correlation coefficients were also found between demethyloleuropein and p-HPEA-EDA (r = 0.663) and between demethyloleuropein and 3,4-DHPEA acetate (r = 0.758), although no conclusive data have been obtained so far on their biosynthesis. In a similar way, ligstroside content positively correlated with p-HPEA-EA (r = 0.435), which suggests a biosynthetic pathway for the latter compound similar to that described for 3,4-DHPEA-EA (Romero-Segura et al., 2012). The similarly high correlation coefficient found between ligstroside and oleuropein (r = 0.765) points to a common biosynthesis for both glycosides that has not been fully demonstrated yet (Obied et al., 2008).

Data shown in **Table 3** also reveal significant differences between the major phenolic components of the VOO, with very high correlation values between the total phenolic content of the oils and both 3,4-DHPEA-EDA (r = 0.850) and p-HPEA-EDA (r = 0.759), but very low coefficients for 3,4-DHPEA-EA and p-HPEA-EA. Similar correlations were reported by El Riachy et al. (2012) for these monoaldehydic compounds in the analysis of two different segregating populations, although a non-significant correlation was found for the major secoiridoid component of VOO, 3,4-DHPEA-EDA. An interesting relation was also found between 3,4-DHPEA-EDA and the non-secoiridoid 3,4-DHPEA acetate (r = 0.612). This correlation value, and that previously mentioned between demethyloleuropein and 3,4-DHPEA acetate, seem to support the formation of 3,4-DHPEA acetate not by simple hydroxytyrosol acetylation but through a more complex biochemical pathway involving the cleavage of the aglycone formed after demethyloleuropein deglucosylation. The elucidation of 3,4-DHPEA acetate biosynthesis could be of great interest from a biotechnological point of view given the enhanced bioavailability of this hydroxytyrosol derivative (Mateos et al., 2011). Pinoresinol was the non-secoiridoid compound which best correlated with VOO phenolic content (r = 0.630), while a very low correlation coefficient was calculated for acetoxypinoresinol (r = 0.122).

Among all the information obtained from the Pearson correlation coefficients computed, it is important to emphasize that the significant correlations found between specific phenolic compounds, or groups of phenolic compounds, in the olive fruit and the phenolic content of VOO may have an important predictive value.

#### Phenolic Profiling as a Selection Criterion in Olive Breeding

The first and second principal components from the PCA applied to the data set used for analysis of variance described 60% of the total variability found in the four genotypes and three environments (**Figure 4**). This value is quite similar to that found in previous studies on the phenolic variability of olive progenies (El Riachy et al., 2012; Pérez et al., 2014). The distribution of phenolic profiles in the PCA bi-plot seems to confirm the conclusion drawn from the analysis of variance, since most of the variability shown in **Figure 4** corresponds to genotype rather than to environment. In this sense, while the four genotypes are clearly segregated, the distinction among the different environments within each genotype is not always possible. Thus, the profiles of UCI 5-65 and UCI 2-68 grown in different environments were only partially segregated in the scatter plot.

The genotype "Picual", located in the second quadrant of the plot (**Figure 4B**), is clearly separated from the other three genotypes. According to the vector distribution (**Figure 4A**), oleuropein, 3,4-DHPEA-EA, p-HPEA-EA, and acetoxypinoresinol are related and located in the second quadrant, in which the genotype "Picual" is also included (**Figure 4B**). The other three genotypes are mainly located in the right part of the scatter plot, mostly in the first and fourth quadrants. According to the scatter plot, the phenolic profiles of the two analyzed breeding selections (UCI 5-65 and UCI 2-68) seem to be closer to the "Arbequina" than to the "Picual" cultivar. In this sense, UCI 2-68, located in the first quadrant, has a phenolic composition quite similar to that of the "Arbequina" cultivar but a higher phenolic content. Selection UCI 5-65 also exhibits a similar phenolic pattern, with higher luteolin and apigenin levels that increase its potential antioxidant properties (Rice-Evans et al., 1995), but with a significantly lower total phenolic content. As shown in **Figure 4B**, the overall higher phenolic content of fruits and oils from Ubeda is confirmed by the upper position of 5-65-U and 2-68-U in their respective groups. On the contrary, the influence of environmental factors seems to be less important in "Arbequina," located in the central area of the plot, and in the "Picual" cultivar, located along the second quadrant.

The effect of environmental factors on the phenolic profile of new breeding selections may provide very relevant information within an olive breeding program. In this sense, the total phenolic content of oils from UCI 5-65 grown in Ubeda is 482.2 µg/g oil, but this content is significantly lower in other environments (**Figure 2A**). Thus, the very low phenolic content found in the oils obtained from fruits grown in Moron (253.0 µg/g oil) could negatively affect the nutritional quality of these oils and could exclude them from the European health claim (European Commission, 2012). Taking into account the minimum values established by the EFSA for the VOO phenolic health claim (250 µg/g oil), and the additional benefits of a medium-high phenolic content for the organoleptic properties and the stability of VOO, the breeding selection UCI 2-68 is clearly superior to UCI 5-65 in terms of phenolic composition.
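The comparison against the health-claim minimum can be made explicit with a small calculation of the margin between an oil's total phenolic content and the 250 µg/g figure quoted above. The sketch below is only illustrative: the function and constant names are hypothetical, and it does not model how the regulation defines or measures the qualifying phenolic compounds.

```python
# Minimum total phenolic content quoted in the text (µg/g oil)
HEALTH_CLAIM_MIN_UG_PER_G = 250.0


def health_claim_margin(total_phenolics_ug_per_g,
                        minimum=HEALTH_CLAIM_MIN_UG_PER_G):
    """Difference between an oil's phenolic content and the quoted minimum."""
    return total_phenolics_ug_per_g - minimum


# Values quoted in the text for oils from two environments (µg/g oil)
oils = {"Ubeda": 482.2, "Moron": 253.0}

for environment, content in oils.items():
    margin = health_claim_margin(content)
    status = "at or above" if margin >= 0 else "below"
    print(f"{environment}: {content:.1f} µg/g, {status} the minimum "
          f"(margin {margin:+.1f} µg/g)")
```

A margin of only a few µg/g, as for the Moron oils, leaves little room for seasonal variation or for the losses of phenolic compounds that occur during storage, which is why such oils are considered at risk with respect to the claim.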

When PCA was applied to breeding selections grown in the same environment, the genotypes were clearly segregated (**Figure 5**). The selection UCI 12-85, with the highest oil phenolic content (999 µg/g oil), and the selection UCI 12-104 are located in the upper part of the first quadrant, above the selection UCI 2-68. The high phenolic content of the breeding selection UCI 12-85 may have potential in terms of nutritional quality. However, it is important to point out that the very high levels of secoiridoid compounds in its oil may also have a negative impact on its organoleptic properties. In this sense, the VOO from this selection has the highest content of p-HPEA-EDA (321 µg/g oil), which is highly related to the pungency of VOO (Andrewes et al., 2003), a trait that greatly affects consumer acceptance. According to their respective positions in the scatter plot, the breeding selections UCI 19-60 and UCI 19-19 may be categorized as medium and low phenolic content genotypes. The breeding selection UCI 19-79 had a slightly higher phenolic content than UCI 5-65 but lower than the "Arbequina" cultivar. The localization of this selection in the center of the fourth quadrant (**Figure 5B**) matches that of luteolin and 3,4-DHPEA acetate in the vector distribution of the variables (**Figure 5A**). These two compounds are the most significant phenolic components in the VOO of UCI 19-79, which possesses the highest content of luteolin and 3,4-DHPEA acetate among all the genotypes analyzed (14.4 and 27.6 µg/g oil, respectively).

# CONCLUSION

The high correlation found between the contents of fruit and oil phenolic components, as well as their high genotypic variance, indicate that the analysis of fruit phenolic compounds, without the previous step of oil extraction, is a useful tool for olive breeding which could facilitate the selection of olive genotypes with potential interest in terms of oil phenolic composition. In this sense, fruit phenolic profiling could be used as a selection criterion in the early stages of olive breeding programs to avoid the selection of genotypes whose oils will never reach an optimum phenolic content (European Commission, 2012). Moreover, the low genotype × environment interaction for phenolic composition, leading to a similar ranking of genotypes in the different environments, could also facilitate the evaluation of new selections from breeding works. The analytical methodology reported in this study allowed the identification of the new selection UCI 2-68, characterized by an optimum phenolic profile together with the good agronomic performance previously reported (León et al., 2004a,b, 2007, 2011), which represents a new olive cultivar producing superior quality oil.

# AUTHOR CONTRIBUTIONS

RdlR and LL: Conceived and designed the breeding experiments; AP and CS: Designed and performed the analytical studies; AP: Wrote the manuscript. All authors discussed and commented on the manuscript.

#### FUNDING

This work was partly funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 645595, by Research Project AVA201601.2 from IFAPA (partially funded by European Regional Development fund) and by Programa Nacional de Recursos y Tecnologías Agroalimentarias financed by the Spanish Government, project AGL2015-67652.

#### ACKNOWLEDGMENTS

We thank Juan Luis Tortosa (Instituto de la Grasa, CSIC) for technical assistance.

#### REFERENCES

virgin olive oils. Food Res. Int. 41, 433–440. doi: 10.1016/j.foodres.2008.02.003


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Pérez, León, Sanz and de la Rosa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Using Wild Olives in Breeding Programs: Implications on Oil Quality Composition

#### Lorenzo León<sup>1</sup> \*, Raúl de la Rosa<sup>1</sup> , Leonardo Velasco<sup>2</sup> and Angjelina Belaj<sup>1</sup>

<sup>1</sup> IFAPA Centro Alameda del Obispo, Córdoba, Spain, <sup>2</sup> Instituto de Agricultura Sostenible – Consejo Superior de Investigaciones Científicas, Córdoba, Spain

A wide genetic diversity has been reported for wild olives, which could be particularly interesting for the introgression of some agronomic traits and resistance to biotic and abiotic stresses in breeding programs. However, the introgression of some beneficial wild traits may be paralleled by negative effects on some other important agronomic and quality traits. From the quality point of view, virgin olive oil (VOO) from olive cultivars is highly appreciated for its fatty acid composition (high monounsaturated oleic acid content) and the presence of several minor components. However, the composition of VOO from wild origin and its comparison with VOO from olive cultivars has been scarcely studied. In this work, the variability for fruit characters (fruit weight and oil content, OC), fatty acid composition, and minor quality components (squalene, sterols and tocopherols content and composition) was studied in a set of plant materials involving three different origins: wild genotypes (n = 32), cultivars (n = 62) and genotypes belonging to cultivar × wild progenies (n = 62). As expected, values for fruit size and OC in wild olives were lower than those obtained in cultivated materials, with intermediate values for cultivar × wild progenies. Wild olives showed a remarkably higher C16:0 percentage and tocopherol content in comparison to the cultivars. Contrarily, lower C18:1 percentage, squalene and sterol content were found in the wild genotypes, while no clear differences were found among the different plant materials regarding composition of the tocopherol and phytosterol fractions. Some common highly significant correlations among components of the same chemical family were found in all groups of plant materials. However, some other correlations were specific for one of the groups. 
The results of the study suggested that the use of wild germplasm in olive breeding programs will not have a negative impact on fatty acid composition, tocopherol content, and tocopherol and phytosterol profiles provided that selection for these compounds is conducted from early generations. Important traits such as tocopherol content could be even improved by using wild parents.

Keywords: breeding, fatty acid composition, minor components, Olea europaea, oleasters

#### Edited by:

Mariela Torres, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

#### Reviewed by:

Rosario Muleo, Università degli Studi della Tuscia, Italy

Daniela Farinelli, University of Perugia, Italy

#### \*Correspondence:

Lorenzo León lorenzo.leon@juntadeandalucia.es

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 06 October 2017 Accepted: 09 February 2018 Published: 27 February 2018

#### Citation:

León L, de la Rosa R, Velasco L and Belaj A (2018) Using Wild Olives in Breeding Programs: Implications on Oil Quality Composition. Front. Plant Sci. 9:232. doi: 10.3389/fpls.2018.00232

# INTRODUCTION

The use of novel genetic diversity in plant breeding is considered of paramount importance to obtain new cultivars adapted to high productive, resilient, and sustainable growing systems. Crop wild relatives represent potential new sources of genetic diversity so that global conservation priorities for crop wild genetic resources are encouraged (Castañeda-Álvarez et al., 2016).


According to recent revisions, the species to which cultivated olive belongs, Olea europaea, includes six subspecies based on morphology and geographical distribution (Green, 2002). Among them, the subsp. europaea, which can be found throughout the whole Mediterranean basin, is represented by two botanical varieties: cultivated olive (var. europaea) and wild olive (var. sylvestris). A wide genetic diversity has been reported for wild olives, even higher than that observed for cultivated ones (Baldoni et al., 2006; Breton et al., 2006; Belaj et al., 2010; Erre et al., 2010; Boucheffa et al., 2016). This could be particularly interesting for the introgression of some agronomic characters in breeding programs, such as resistance to biotic and abiotic stresses, as already reported for many crops (Hajjar and Hodgkin, 2007). For instance, resistance to verticillium wilt, scarcely found in current cultivars, has been observed in wild germplasm (Colella et al., 2008; Arias-Calderón et al., 2015), which suggests its potential use as a parent in olive breeding programs. Progenies involving wild parents have also shown a shorter juvenile period and more abundant flowering than progenies from cultivated parents, which can represent important advantages for olive breeding (Klepo et al., 2013, 2014). However, it is not fully understood whether the introgression of some beneficial characters could be accompanied by a parallel negative effect on some other important agronomic and quality characters.

From the quality point of view, virgin olive oil (VOO) is highly appreciated due to its fatty acid composition (high monounsaturated oleic acid content) and its minor compounds. All these components are responsible for the well-known healthy properties of VOO (Pérez-Jiménez et al., 2007). High variability for most olive oil quality components has been reported in progenies from breeding programs (Sánchez de Medina et al., 2014, 2015; De la Rosa et al., 2016). However, the compositional quality of VOOs of wild origin has been scarcely studied. Some works comparing oil composition between cultivars and wild olives indicate overlapping results between the two groups (Hannachi et al., 2008; Dabbou et al., 2011; Boucheffa et al., 2014), while fruit size and oil content (OC) seem to be determinant traits to discriminate between wild and cultivated forms (Hannachi et al., 2008; Belaj et al., 2011). Wild olives yielding high quality oils were suggested to be commercially useful (Baccouri et al., 2008; Dabbou et al., 2011), although a possible feral origin of the genotypes reported to have high OC cannot be excluded (Baccouri et al., 2008).

Until recently, most olive breeding programs were based on intra-specific crosses between cultivars of well-known merit (De la Rosa et al., 2013), with the selection work performed in Australia representing a rare case of the use of wild olive material for olive breeding. As a result, based on oil yield and quality, some interesting genotypes well adapted to the conditions of Australia were selected, although feral rather than genuine wild olives were used (Sedgley, 2004). In this sense, in the framework of the olive breeding program of Córdoba, Spain, many efforts have been dedicated in the last decade to the collection, ex situ conservation and evaluation of wild genotypes from different origins (Belaj et al., 2007, 2011), as well as to the initial characterization of some olive progenies involving wild genotypes as parents (Klepo et al., 2013, 2014).

In this work, the variability for oil quality components including fatty acid composition and minor components such as tocopherols, phytosterols, and squalene was studied in a set of plant materials involving wild genotypes, cultivars, and crosses between them. The main objective of the research was to study the usefulness and implications for oil quality of using wild genetic resources in olive breeding. Additionally, we intended to gain some insights about the genetic determinism for these characters.

#### MATERIALS AND METHODS

#### Plant Materials

The olive germplasm under study includes genotypes from three different origins: wild genotypes (n = 32), cultivars (n = 62) and genotypes belonging to cultivar × wild progenies (n = 62). All the genotypes were grown under the same conditions at the IFAPA Centro Alameda del Obispo, Córdoba (Spain). Maintained in an ex situ wild olive collection (De la Rosa et al., 2014), the wild genotypes came from prospecting surveys in Andalusia and the Balearic Islands (21 and 9, respectively), and the rest (2) belonged to Olea europaea subsp. guanchica from the Canary Islands. The cultivated plant material under study comprised 62 olive cultivars from 13 different countries maintained at the World Olive Germplasm Collection (Supplementary Table 1). In both cases, each genotype is represented by two trees which have been evaluated for the traits under study during two harvesting seasons. In addition, 62 genotypes from crosses between the cultivar 'Picual' and two wild trees (W1 and W2), including 44 and 18 genotypes, respectively, were also evaluated. In this case, each seedling is represented by a single tree. Most of the plant materials were evaluated in two harvest seasons. However, average values per genotype were used in all analyses, as the genotype effect represented the main source of variation, accounting for a major proportion of the sums of squares for the evaluated traits compared to the harvest season effect (Supplementary Table 2).

#### Traits Evaluated

Fruit samples of around 0.5 kg were randomly collected from each plant on a common date (mid-November), typical for olive harvesting. Previous works suggest that it is more efficient to compare genotypes on a common date than to harvest the olive samples at a fixed ripening index, which is also quite difficult to achieve for a large number of genotypes (De la Rosa et al., 2013; Bodoira et al., 2016). From each sample, three subsamples of around 25 g were randomly selected to produce dried samples of a size suitable for the NMR sample holder. Fruit fresh weight (FFW) was measured and, after drying in a forced-air oven at 105 °C for 42 h to ensure dehydration, OC was determined using an NMR fat analyzer (Minispec MQone, Bruker Optik GmbH, Ettlingen, Germany) and expressed as a percentage on a dry weight basis.

Twenty additional fruits were randomly chosen from each sample, stored at −80 °C and then lyophilized. After lyophilization, the stones were removed and the flesh was milled in a laboratory ball mill. The samples were then stored at −20 °C until analysis, usually within 48–72 h. All the analyses were performed in duplicate following procedures previously developed in our breeding program for direct analysis of fruit flesh (Velasco et al., 2014). In short, fatty acid composition was analyzed by simultaneous oil extraction and fatty acid methylation followed by gas-liquid chromatography (GLC) on a Perkin Elmer Clarus 600 GC (Perkin Elmer Inc., Waltham, MA, United States) equipped with a BPX70 30 m × 0.25 mm internal diameter × 0.25 µm film thickness capillary column (SGE Analytical Science Pty Ltd., Ringwood, VIC, Australia). Fatty acids were named according to the C:D nomenclature (number of carbon atoms:number of double bonds in the chain). Tocopherol extraction, separation by high-performance liquid chromatography (HPLC), and quantification were done on around 100 mg of lyophilized olive flesh using a fluorescence detector (Waters 474) at 295 nm excitation and 330 nm emission, with iso-octane/tert-butylmethylether (94:6) as eluent at an isocratic flow rate of 0.8 ml/min. Tocopherols were quantified using rac-5,7-dimethyltocol (Matreya LLC, Pleasant Gap, PA, United States) as internal standard, and total tocopherol content (TO) was calculated as the sum of α-, β-, γ- and δ-tocopherol contents, expressed as mg kg<sup>−1</sup> lyophilized fruit flesh. Finally, sterol (ST) and squalene (SQ) contents in lyophilized olive flesh samples were analyzed by GLC of the unsaponifiable fraction following silylation, using a Perkin Elmer Clarus 600 Gas Chromatograph equipped with a ZB-5 capillary column (id = 0.25 mm, length = 30 m, film thickness = 0.10 µm; Phenomenex, Torrance, CA, United States). Total sterol content was expressed as mg kg<sup>−1</sup> lyophilized fruit flesh.

TABLE 1 | Descriptive statistics for fruit traits and main oil quality components evaluated in the three groups of samples (wilds, cultivars, and crosses).

<sup>∗</sup>For each trait, different letters indicate significant differences between groups at P < 0.05.

TABLE 2 | Number of genotypes in each group exceeding threshold values for the main fatty acids according to IOC trade standards (International Olive Council [IOC], 2016).

Values of the parents for each fatty acid are indicated.

#### Data Analysis

Principal components analysis (PCA) was used to investigate the variability between and within the different groups of samples evaluated and the relationship among traits, and analysis of variance and box and whiskers plots were performed for the most important variables selected from PCA to analyze differences between groups. Pearson correlation and linear regression were used to test the relations among the traits measured. Unscrambler (CAMO A/S, Trondheim, Norway) and Statistix (Analytical Software, Tallahassee, FL, United States) software were used for the statistical analysis.
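The exploratory workflow described above can be illustrated with a minimal sketch. It is not the Unscrambler or Statistix procedure used in the study: the autoscaling step, the function name `pca`, and all numeric values are assumptions introduced purely for illustration.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a genotypes-by-traits matrix via SVD of the autoscaled data."""
    X = np.asarray(X, dtype=float)
    # Autoscaling (zero mean, unit variance per trait) is an assumption here;
    # traits such as FFW (g) and C18:1 (%) are on very different scales.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)                   # variance fraction per PC
    scores = U[:, :n_components] * s[:n_components]   # genotype coordinates
    loadings = Vt[:n_components].T                    # trait contributions
    return scores, loadings, explained[:n_components]

# Hypothetical values for six genotypes: FFW (g), OC (%), C16:0 (%), C18:1 (%)
traits = ["FFW", "OC", "C16:0", "C18:1"]
data = np.array([
    [0.4, 28.0, 19.5, 62.0],
    [0.6, 31.0, 18.7, 64.5],
    [3.2, 45.0, 13.1, 72.3],
    [4.1, 48.5, 12.4, 70.8],
    [1.9, 39.0, 16.0, 75.2],
    [2.3, 41.5, 15.2, 76.9],
])

scores, loadings, explained = pca(data)
# Trait-by-trait Pearson correlation matrix (columns are traits)
pearson_r = np.corrcoef(data, rowvar=False)
```

Plotting the two score columns against each other, colored by group, would give a score plot of the kind used to compare wilds, cultivars, and crosses; the loadings indicate which traits drive each component.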

#### RESULTS AND DISCUSSION

PCA was used for a preliminary exploratory analysis of variability between and within groups of samples (**Figure 1**). The score biplot of PC1 (26.4% of total variation) and PC2 (14.8%) showed clear separation of the three groups of samples (wilds, cultivars, and crosses), although wide variability was also observed within each of the three groups. PC1 was positively correlated mainly with C16:0, C16:1, and TO, and negatively with FFW, OC, and SQ content. PC2 was positively associated mainly with C18:1 and negatively with C18:2 and ST. The main separation between groups was obtained along PC1, i.e., higher values for C16:0, C16:1, and TO are expected for wilds, and higher values for FFW, OC, and SQ content for cultivars, with intermediate values in progenies from crosses, in accordance with the relative position of the parents in the score plot.

The general results inferred from PCA could be extended to the original data, with significant differences in average values between groups of samples for the most important variables selected from PCA (**Table 1**). A wide and similar variability was observed in all three groups, but clear differences in the range of variation could be inferred from box and whiskers plots (**Figure 2**). High variability for most of the components has been reported in other collections of cultivars and wilds (Aparicio and Luna, 2002; Baccouri et al., 2008; Hannachi et al., 2013; Beltrán et al., 2015; Kyçyk et al., 2016), as well as in progenies from breeding programs (De la Rosa et al., 2016). From the commercial point of view, this wide variability could impose some restrictions according to international trade standards (International Olive Council [IOC], 2016) (**Table 2**). For instance, C16:0 values for most of the wild genotypes would be higher than the established limit (20% maximum). Some of them could also face problems due to current limits for C18:1 and C18:2. It should be noted, however, that the same problem is currently faced by traditional cultivars, some of which exceed the threshold values for the main fatty acids set by the International Olive Council (International Olive Council [IOC], 2016). Interestingly, genotypes from crosses exceeded maximum permitted values for C16:0, probably due to the high content of the wild parents, but they complied with the other thresholds. Similar results have been reported in previous evaluations. For instance, Baccouri et al. (2011) found that only 85 out of 150 studied wild olives showed an oil fatty acid composition within IOC trade standards.

Pearson correlation coefficients and significance levels are indicated above and below the diagonal, respectively.

As expected from PCA and previous results, wild olives were characterized by low average values of fruit size and OC, significantly lower than those of cultivated materials (Hannachi et al., 2008, 2009; Belaj et al., 2011), while intermediate values were obtained in progenies from crosses. For all groups, C18:1 was clearly the most abundant fatty acid, followed by C16:0 and C18:2 (**Table 1**). Wilds showed the highest mean content of C16:0 and C16:1, cultivars the lowest, and progenies from crosses intermediate values. Less clear results were obtained for C18:1 and C18:2, for which progenies from crosses showed the highest and lowest mean values, respectively. However, it should be noted that the high-oleic cultivar 'Picual' was used as a parent for these crosses, which may have affected these results given the high heritability reported for this character (De la Rosa et al., 2016). Similar average composition has been previously reported in the evaluation of wild olives from different origins, such as Tunisia (Baccouri et al., 2011), Turkey (Matthäus et al., 2014), and Pakistan (Anwar et al., 2013). In contrast with our results, much lower values of C16:0 were found in wild olives from Algeria (Boucheffa et al., 2014) and Tunisia (Dabbou et al., 2011). Besides, no differentiation in the distribution of fatty acid composition was observed between wild and cultivated olives in Tunisia (Hannachi et al., 2008; Dabbou et al., 2011). In addition, extreme values have been reported for subsp. cuspidata specimens from Kenya, with lower C18:1 content (up to 44.3%) and consequently higher C18:2 content (up to 33.3%) than those identified in the present research for wilds. Such contrasting results in wild materials may be attributed to the different environmental factors linked to the geographical position where the genotypes were evaluated in situ (Hannachi et al., 2009), to the representativeness of the oleasters included in the studies (Dabbou et al., 2011), and to a possible feral origin of some of them (Hannachi et al., 2008; Dabbou et al., 2011).

Significant differences between groups were also obtained for total amounts of minor components, with cultivars showing the highest total values for SQ and ST and the lowest values for TO. Wilds showed the lowest average value for SQ and the highest average value for TO. High TO content has been previously reported in samples from wild olives (Baccouri et al., 2008; Boucheffa et al., 2014). The lowest average value for ST was found in the progenies from crosses. Again, the low ST content of the 'Picual' parent could have affected these results given the high heritability also reported for this character (De la Rosa et al., 2016). No or only minimal differences were obtained in the composition of minor components, with α-tocopherol and β-sitosterol being the predominant TO and ST forms, respectively, in all groups. The ST profile in wild olives agrees with the results obtained by Hannachi et al. (2013), although those authors did not quantify total ST content. To the best of our knowledge, no information on total SQ content in wild olive germplasm is available in the literature.

Pearson's correlation coefficients among individual components reflected relationships previously inferred from PCA (**Table 3**). Some highly significant values among components of the same chemical family were found in all groups of plant materials, for instance between α-tocopherol and γ-tocopherol or between Δ5-avenasterol and β-sitosterol (data not shown), as previously reported for other materials in our breeding program (De la Rosa et al., 2016). However, no significant correlations were found among components of different chemical families.

In relation to the fatty acid profile, the most significant correlations were found between C16:0 and C16:1, which were positively correlated, and between C18:1 and both C16:0 and C18:2, in both cases negatively correlated. The evaluation of linear regressions between C18:1 and both C16:0 and C18:2 in the three groups of trees revealed marked differences in the coefficients of determination and/or the slope of the regression line (**Figure 3**). Thus, in the regression between C16:0 and C18:1, crosses showed a lower slope than cultivars and wilds. A similar situation was observed in the regression between C18:1 and C18:2 (**Figure 3**). In the latter case, the coefficient of determination was considerably lower in the crosses than in the parents and wilds, attributable to the lower variability for both traits in the crosses than in the cultivated and wild parents (**Figure 3** and **Table 1**). The fatty acid composition of olive oil depends on both genetic and environmental factors, with genotypic variance having been reported as the main contributor to total variance in studies involving high genetic variability (León et al., 2008; Rondanini et al., 2011), as is the case for the present study. A strong negative correlation between oleic acid and both palmitic and linoleic acid has been reported previously in olive and has been attributed to the relationship of these fatty acids in the biosynthetic pathway (Dabbou et al., 2012). Biosynthesis of C16 and C18 fatty acids in olive is located in plastids, where the fatty acid synthase complex catalyzes a series of enzymatic reactions that elongate an acyl chain bound to an acyl carrier protein (ACP) by the stepwise addition of two-carbon units. The last elongation step produces C18:0-ACP from C16:0-ACP. Then, C18:0-ACP can be desaturated by the C18:0-ACP desaturase to produce C18:1-ACP. The fatty acids C16:0, C18:0, and C18:1 can be exported from the plastid after hydrolysis of the acyl-ACP by acyl-ACP thioesterases. Outside the plastid, C18:1 can be desaturated to C18:2 by the microsomal enzyme oleoyl-phosphatidylcholine desaturase (FAD2) (Somerville et al., 2000). This biosynthetic pathway is complex due to the existence of several rate-limiting reactions controlling carbon flux through the pathway, so that the level of one fatty acid may alter the activity of other enzymes within the pathway and, subsequently, the levels of other fatty acids (Salas et al., 2014).
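The group-wise comparison of slopes and coefficients of determination discussed above can be sketched as follows. This is only an illustration: regressing C18:1 on C16:0 (rather than the reverse) is an assumption, the helper `slope_and_r2` is hypothetical, and the values are invented and do not come from Table 1 or Figure 3.

```python
import numpy as np

def slope_and_r2(x, y):
    """Least-squares slope, intercept and coefficient of determination of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    # For simple linear regression, R^2 equals the squared Pearson r.
    return slope, intercept, r**2

# Hypothetical (C16:0 %, C18:1 %) values per group, for illustration only
groups = {
    "wilds":     ([16.0, 18.0, 20.0, 22.0], [72.0, 68.5, 64.0, 61.0]),
    "cultivars": ([10.0, 12.0, 14.0, 16.0], [78.0, 75.0, 70.5, 68.0]),
    "crosses":   ([13.0, 14.0, 15.0, 16.0], [77.0, 76.5, 75.0, 74.8]),
}

# One (slope, intercept, R^2) tuple per group
fits = {name: slope_and_r2(c16, c18) for name, (c16, c18) in groups.items()}
```

A narrow range of the predictor within a group, as in the hypothetical crosses above, tends to lower the coefficient of determination even when the underlying relationship is unchanged, which is the effect invoked for the crosses.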

In summary, the results of the study indicate that the use of wild germplasm in olive breeding programs need not have a negative impact on fatty acid composition, tocopherol content, or tocopherol and phytosterol profiles, provided that selection for these compounds is conducted from early generations. Important traits such as TO content can even be improved by using wild parents. Conversely, our results indicated a putative negative impact on both total phytosterol and squalene contents, although this needs to be confirmed with studies involving larger populations.

### AUTHOR CONTRIBUTIONS

All authors conceived and designed the experiment. AB was in charge of plant material selection and sample collection. LL, RdlR, and LV performed fruit trait and oil quality analyses. LL prepared the first draft of the manuscript. All authors critically reviewed the manuscript prior to submission, read and approved the final version, and agreed to be accountable for the accuracy, integrity, and appropriateness of the manuscript.

# FUNDING

CICE project P11-AGR-7301, IFAPA projects AVA201601.2 and PR.PEI.IDF201601.2 partially funded by European Regional Development Fund, and European Union's Horizon 2020 RISE project BeFOre funded under the Marie Skłodowska-Curie grant agreement No. 645595. The conservation and management of olive cultivars at WOGB IFAPA Córdoba has been financially supported by INIA Projects (RFP 2012-00005; RFP 2013-00005).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.00232/full#supplementary-material





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 León, de la Rosa, Velasco and Belaj. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Divergent N Deficiency-Dependent Senescence and Transcriptome Response in Developmentally Old and Young Brassica napus Leaves

Vajiheh Safavi-Rizi<sup>1</sup> , Jürgen Franzaring<sup>2</sup> , Andreas Fangmeier<sup>2</sup> and Reinhard Kunze<sup>1</sup> \*

<sup>1</sup> Institute of Biology, Dahlem Centre of Plant Sciences, Free University Berlin, Berlin, Germany, <sup>2</sup> Institute of Landscape and Plant Ecology, University of Hohenheim, Stuttgart, Germany

In the spring oilseed rape (OSR) cultivar 'Mozart' grown under optimal N supply (N<sup>O</sup>) or mild N deficiency (N<sup>L</sup>), the transcriptome changes associated with progressing age until early senescence in developmentally old lower canopy leaves (leaf #4) and younger higher canopy leaves (leaf #8) were investigated. Twelve-week-old N<sup>O</sup> and N<sup>L</sup> plants appeared phenotypically and transcriptomically identical, but thereafter distinct nutrition-dependent differences in gene expression patterns in lower and upper canopy leaves emerged. In N<sup>O</sup> leaves #4 of 14-week-old compared to 13-week-old plants, ∼600 genes were up- or downregulated, whereas in N<sup>L</sup> leaves #4 ∼3000 genes were up- or downregulated. In contrast, in 15-week-old compared to 13-week-old upper canopy leaves #8, more genes were up- or downregulated in optimally N-supplied plants (∼2000 genes) than in N-depleted plants (∼750 genes). This opposing effect of N depletion on gene regulation was even more prominent among photosynthesis-related genes (PSGs). Between week 13 and 14 in leaves #4, 99 of 110 PSGs were downregulated in N<sup>L</sup> plants, but none in N<sup>O</sup> plants. In contrast, from weeks 13 to 16 in leaves #8 of N<sup>L</sup> plants only 11 PSGs were downregulated, compared with 66 PSGs in N<sup>O</sup> plants. Different effects of N depletion in lower versus upper canopy leaves were also apparent in the upregulation of autophagy genes and NAC transcription factors. More than half of the regulated NAC and WRKY transcription factor, autophagy and protease genes were specifically regulated in N<sup>L</sup> leaves #4 or N<sup>O</sup> leaves #8 and thus may contribute to differences in senescence and nutrient mobilization in these leaves. We suggest that in N-deficient plants the upper leaves retain their N resources longer than in amply fertilized plants and remobilize them only after shedding of the lower leaves.

Keywords: autophagy, Brassica napus, leaf senescence, N remobilization, N-deficiency, oilseed rape, transcriptome, transcription factor

# INTRODUCTION

In the past three decades the worldwide oilseed rape acreage has expanded nearly threefold to 36 million ha and production has increased fivefold to 73 million tons in 2013 (Food and Agriculture Organization of the United Nations)<sup>1</sup>. In winter oilseed rape production, fertilization with up to 200 kg nitrogen (N) ha<sup>−1</sup> year<sup>−1</sup> is common practice. Although oilseed rape (OSR) has a high uptake capacity for inorganic N, its nitrogen use efficiency (NUE; for definitions see Masclaux-Daubresse et al., 2010; Xu et al., 2012) is low. Only 50–60% of the applied N is recovered in the plants, and at the time of harvest 80% of the total plant N is localized in the seeds (Schjoerring et al., 1995; Jensen et al., 1997; Malagoli et al., 2005a; Rathke et al., 2006). Accordingly, winter OSR production has a high N balance surplus that often exceeds the limit of 60 kg ha<sup>−1</sup> year<sup>−1</sup> that has been in effect since 2009 in Germany (Düngeverordnung)<sup>2</sup> and the European Union (Nitrates Directive)<sup>3</sup>. To meet these requirements without compromising seed yield, the development of cultivars with improved NUE at reduced fertilizer input is an important agricultural goal in OSR breeding.

<sup>1</sup>http://faostat3.fao.org

#### Edited by:

Dragana Miladinović, Institute of Field and Vegetable Crops, Serbia

#### Reviewed by:

Frédéric Marsolais, Agriculture and Agri-Food Canada (AAFC), Canada

Astrid Wingler, University College Cork, Ireland

> \*Correspondence: Reinhard Kunze reinhard.kunze@fu-berlin.de

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 02 October 2017 Accepted: 10 January 2018 Published: 01 February 2018

#### Citation:

Safavi-Rizi V, Franzaring J, Fangmeier A and Kunze R (2018) Divergent N Deficiency-Dependent Senescence and Transcriptome Response in Developmentally Old and Young Brassica napus Leaves. Front. Plant Sci. 9:48. doi: 10.3389/fpls.2018.00048

Two factors determining the NUE are the N-uptake ability of the plants and the N-remobilization efficiency from old, senescing leaves during pod development and seed ripening. N-uptake increases in young plants approximately until flowering, but stagnates or even decreases during pod ripening and contributes only a minor fraction of the N in the seeds (Schjoerring et al., 1995; Rossato et al., 2001; Malagoli et al., 2005a; Gombert et al., 2010). Indeed, the majority of N required for seed filling and pod ripening is mobilized from senescing leaves and stems (Malagoli et al., 2005a; Gombert et al., 2010). Although the N-efficiency of winter OSR can also be enhanced by breeding cultivars with enhanced N-uptake ability (Schulte auf'm Erley et al., 2007), strengthening the N remobilization activity of leaves during the vegetative phase is a promising approach for improving the NUE of oilseed rape (Gironde et al., 2015). In a simulation model of N partitioning, Malagoli et al. (2005b) came to the conclusion that by optimizing N remobilization from leaves at lower nodes and N retranslocation from vegetative to reproductive tissues, OSR yield could be increased by 15%.

Yet, the shed leaves from lower nodes still have a high N content of up to 3.5% whereas leaves from upper nodes contain at the time of abscission only 1% residual N (Malagoli et al., 2005a). What limits N remobilization from early senescing leaves? Phloem loading of amino acids from degraded leaf proteins appears not to be the limiting step (Tilsner et al., 2005). In many winter OSR cultivars the onset of senescence and abscission of lower node leaves occurs already during the vegetative stages before the development of pods and seeds. This lack of sink organs supposedly leads to a low N remobilization rate from early leaves (Schjoerring et al., 1995; Rossato et al., 2001; Noquet et al., 2004; Malagoli et al., 2005a). Accordingly, winter cultivars with a delayed leaf senescence phenotype ('functional stay-green'; reviewed in Thomas and Ougham, 2014) tend to have a higher N-efficiency (Schulte auf'm Erley et al., 2007; Gregersen et al., 2013; Koeslin-Findeklee et al., 2015a,b).

The onset of leaf senescence is regulated by multiple endogenous and environmental factors, among them N deficiency (Gregory, 1937; Mei and Thimann, 1984; Masclaux-Daubresse et al., 2007; Bieker and Zentgraf, 2013; Koeslin-Findeklee et al., 2015b). However, the developmental response to N deficiency and the timing of senescence initiation are not uniform throughout the plant body. During development of winter OSR plants, senescence progresses sequentially from the bottom toward the top, and the sink leaves of young plants later turn into source leaves during pod ripening (reviewed by Avice and Etienne, 2014). N-deprivation triggers earlier onset of senescence in older leaves, whereas in young leaves at higher nodes senescence is delayed (Etienne et al., 2007; Desclos et al., 2008). Thus, the spatially and temporally concerted modulation of senescence initiation is a promising target for improving the N-efficiency of OSR, but it requires a deeper understanding of the metabolic and transcriptional changes associated with leaf senescence initiation and progression in different parts of the plant. Koeslin-Findeklee et al. (2015b) studied transcriptomic changes following senescence induction by N-depletion in leaves from a lower node of two B. napus winter cultivars differing in their stay-green properties and N-efficiency. They identified a large number of cultivar-specifically regulated, senescence-associated genes, but did not address leaf-rank-specific expression differences.

In this study we report that in the doubled haploid OSR spring cultivar 'Mozart' senescence progression and the effect of N-limitation are similar to those in winter OSR cultivars, and we present a genome-wide developmental transcription analysis of plants grown under standard or reduced N-supply. The developmental transcription changes in lower and upper canopy leaves of plants grown under low N-fertilization indicated that in old (source) leaves senescence was initiated earlier and that this onset was accompanied by extensive transcriptional reprogramming. In contrast, in young (sink) leaves at a node below the inflorescence, transcriptional reprogramming was delayed in N-depleted plants. We identified transcription regulator, autophagy and protease genes that were specifically regulated in N-depleted lower canopy leaves or in upper leaves under ample N supply, and genes whose expression was senescence-associated in oilseed rape, but not in Arabidopsis. We hypothesize that some of these genes may have OSR-specific functions in N-remobilization during N-deficiency-induced leaf senescence and contribute to differences in senescence execution and nutrient mobilization in upper and lower canopy leaves.

#### MATERIALS AND METHODS

#### Plant Material and Growth Conditions

Oilseed rape spring cultivar Brassica napus cv. 'Mozart' plants (BSA Nr. RAS 502, supplied by Norddeutsche Pflanzenzucht Hans-Georg Lembke KG – NPZ, Hohenlieth, Germany) were cultivated in solid medium in growth chambers that simulated the daylight length and average daily temperature profile between 1991 and 2005 in South–West Germany from March 15th (day 0: sowing) onward (Supplementary Table 1). Light intensity (photon flux density) during daylight phases was approximately 1000 µmol m<sup>−2</sup> s<sup>−1</sup>. The average CO<sub>2</sub> concentration during illumination was 396 ppm, which approximates ambient atmospheric conditions. During the dark phase the CO<sub>2</sub> concentration increased by approximately 100 ppm. A more detailed description of nursing, growth and physiological parameters of the plants analyzed in this study is presented in Franzaring et al. (2011). Leaf disks from the early developing leaf #4 (at 78, 85, 92, and 99 days after sowing, DAS) and from leaf #8 (at 92 and 106 DAS) were collected from plants grown at optimal (N<sup>O</sup>) or low (N<sup>L</sup>) N supply. For optimal N nutrition, NH<sub>4</sub>NO<sub>3</sub> was supplied in three equal doses to each pot at germination (0 DAS; extended BBCH-scale stage GS0; Meier, 2001), 72 DAS (GS35) and 79 DAS (GS59) at an equivalent of 150 kg N ha<sup>−1</sup> t. For N<sup>L</sup> plants fertilizer doses were reduced by half (75 kg N ha<sup>−1</sup> t). For each leaf sample, three biological replicates from different plants were collected. Before harvesting, relative chlorophyll levels of the leaves were determined using a Konica Minolta SPAD-502 chlorophyll meter. For each leaf, SPAD values from two positions were measured and averaged.

<sup>2</sup>https://www.gesetze-im-internet.de/d\_v\_2017/

<sup>3</sup>http://ec.europa.eu/environment/water/water-nitrates/index\_en.html

#### RNA Isolation

After freezing and grinding the samples in liquid nitrogen, total RNA was isolated by a hot phenol method as described (Drechsler et al., 2015). Total RNA was purified further using the RNeasy Mini Kit (Qiagen, Hilden, Germany). RNA quality was monitored on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, United States).

#### Brassica napus Custom Microarray Design and Functional Annotation

The Brassica napus custom microarray was designed and processed as described in Koeslin-Findeklee et al. (2015b). Briefly, after a probe-preselection strategy established by ImaGenes GmbH (Berlin, Germany; now Source BioScience)<sup>4</sup> (Weltmeier et al., 2011), 60,955 probes representing 59,577 targets (EST clusters, termed B. napus 'unigenes' in this paper) were selected for the production of microarrays in the Agilent 8 × 60k format. Microarray design (GPL19044) and expression data (Series entry GSE97653) are deposited in the NCBI Gene Expression Omnibus (GEO) repository. To assign putative functions to the 59,577 B. napus 'unigenes', they were locally BLASTed (RRID:SCR\_004870) against the TAIR10 Arabidopsis thaliana cDNA collection (Lamesch et al., 2012) using the BioEdit alignment editor<sup>5</sup> (RRID:SCR\_007361). Putative functions were attributed to B. napus unigenes based on the annotation of the most homologous Arabidopsis thaliana genes with a BLAST E-value ≤ 10<sup>−6</sup>. When multiple B. napus unigenes had the same Arabidopsis homolog, the unigene with the lowest E-value that is significantly regulated in any one sample was selected for further analysis.
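The selection rule described above can be sketched in a few lines. This is a hedged illustration, not the authors' script: the tuple layout of the BLAST hits, the function names `best_hits` and `representatives`, and all identifiers and E-values are hypothetical.

```python
E_VALUE_CUTOFF = 1e-6  # threshold stated in the text

def best_hits(blast_hits):
    """Keep, per unigene, the Arabidopsis hit with the lowest E-value (<= cutoff)."""
    best = {}
    for unigene, at_gene, evalue in blast_hits:
        if evalue > E_VALUE_CUTOFF:
            continue  # too weak to transfer an annotation
        if unigene not in best or evalue < best[unigene][1]:
            best[unigene] = (at_gene, evalue)
    return best

def representatives(best, regulated):
    """Per Arabidopsis homolog, pick the regulated unigene with the lowest E-value."""
    rep = {}
    for unigene, (at_gene, evalue) in best.items():
        if unigene not in regulated:
            continue  # only unigenes regulated in at least one sample qualify
        if at_gene not in rep or evalue < rep[at_gene][1]:
            rep[at_gene] = (unigene, evalue)
    return {at_gene: unigene for at_gene, (unigene, _) in rep.items()}

# Hypothetical hits: (unigene, Arabidopsis gene, E-value)
hits = [
    ("BnU1", "AT1G01010", 1e-50),
    ("BnU2", "AT1G01010", 1e-80),
    ("BnU3", "AT1G01010", 1e-120),  # best E-value, but not regulated below
    ("BnU4", "AT2G02020", 1e-3),    # fails the E-value cutoff
    ("BnU1", "AT3G03030", 1e-10),   # weaker secondary hit of BnU1
]
regulated_unigenes = {"BnU1", "BnU2"}

selected = representatives(best_hits(hits), regulated_unigenes)
```

In this toy input, BnU3 has the best E-value for AT1G01010 but is not regulated, so BnU2 is chosen as the representative of that homolog.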

<sup>4</sup>www.sourcebioscience.com

#### Processing and Bioinformatic Analysis of Microarray Data

Microarray expression data readouts were generated by the Agilent Feature Extraction software. The raw data files were processed, normalized and analyzed with the Bioconductor package LIMMA<sup>6</sup> (RRID:SCR\_006442; Smyth, 2004). The read.maimages function was used to load the data into an RGList object. Background subtraction and quantile normalization were performed, followed by statistical analysis (moderated t-test). The average of replicated spots was calculated using the avereps function. A design matrix was built for the linear modeling function, and a linear model was fitted to the intensity values with the lmFit function. Contrast matrices representing comparisons between different harvest time points and N treatments were created and applied to the modeled data to compute statistical significance. Regulated unigenes (≥3-fold expression change and Benjamini-Hochberg-corrected p-value Padj < 0.05) were clustered by their temporal expression profiles with the Short Time-series Expression Miner (STEM) software (RRID:SCR\_005016) using default settings (Ernst and Bar-Joseph, 2006). Grouping of unigenes into functional categories was performed with the BAR Classification SuperViewer Tool w/Bootstrap<sup>7</sup> (RRID:SCR\_006748) using MapMan (RRID:SCR\_003543) categories as annotation source (Provart and Zhu, 2003; Provart et al., 2003). Enriched GO-terms were identified with the DAVID Bioinformatics Resources 6.8 in the GOTERM\_BP\_DIRECT term compilation using default settings<sup>8</sup> (RRID:SCR\_003033; Huang et al., 2009a,b). Heat maps of differentially regulated genes were created using MultiExperiment Viewer (MeV)<sup>9</sup> (RRID:SCR\_001915; Saeed et al., 2006). Arabidopsis thaliana transcription factors were compiled from the AGRIS AtTFDB<sup>10</sup> (RRID:SCR\_006928; Yilmaz et al., 2011), autophagy-related genes from the Autophagy database<sup>11</sup> (RRID:SCR\_002671; Homma et al., 2011), and peptidases from the MEROPS database<sup>12</sup> (RRID:SCR\_002671; Rawlings et al., 2016).
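The thresholding step for regulated unigenes (≥3-fold change, Benjamini-Hochberg-adjusted P < 0.05) can be sketched as below. This is not the LIMMA pipeline itself: the moderated t-test is not reproduced, the log<sub>2</sub> scale of the fold changes is an assumption, and the function names and input values are hypothetical.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def regulated_mask(log2_fc, pvalues, min_fold=3.0, alpha=0.05):
    """True where |fold change| >= min_fold and the BH-adjusted p-value < alpha."""
    padj = bh_adjust(pvalues)
    big_change = np.abs(np.asarray(log2_fc, dtype=float)) >= np.log2(min_fold)
    return big_change & (padj < alpha)

# Hypothetical per-unigene results for one contrast (log2 scale assumed)
log2_fold_changes = [2.0, 1.0, -1.8, 0.5]
raw_pvalues = [0.01, 0.04, 0.03, 0.005]

padj = bh_adjust(raw_pvalues)
mask = regulated_mask(log2_fold_changes, raw_pvalues)
```

Note that a 3-fold change corresponds to |log<sub>2</sub> fold change| ≥ log<sub>2</sub>(3) ≈ 1.58, so the second and fourth hypothetical unigenes fail the fold-change criterion regardless of their p-values.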

#### qPCR Primer Design and Assay for B. napus 'Unigenes'

Primers for quantitative real-time PCR (qPCR) were calculated by QuantPrime<sup>13</sup> (Arvidsson et al., 2008) after importing the B. napus unigene assemblies (Supplementary Table 13). One µg DNase I-digested total RNA was used for cDNA synthesis using SuperScript III Reverse Transcriptase (ThermoFisher Scientific). qPCR reactions were performed in 5 µl total volume including 2.5 µl Power SYBR Green Master Mix (ThermoFisher Scientific), 0.5 µM forward and reverse primers and 0.5 µl cDNA. UP1 and UBC9 were used as reference genes (Chen et al., 2010). The thermal profile used for all qPCRs was: 2 min 50 °C > 10 min 95 °C > 40 × (15 s 95 °C > 1 min 60 °C). Data were analyzed by the 2<sup>−ΔΔCt</sup> method (Schmittgen and Livak, 2008).
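As a worked illustration of the 2<sup>−ΔΔCt</sup> calculation, the sketch below computes a relative expression value. Averaging the Ct values of the two reference genes and assuming 100% amplification efficiency are assumptions made here; the function name and all Ct values are hypothetical.

```python
from statistics import mean

def relative_expression(ct_target_sample, ct_refs_sample,
                        ct_target_control, ct_refs_control):
    """Fold change of a target gene (sample vs. control) by the 2^-ddCt method."""
    # dCt: target Ct normalized to the mean Ct of the reference genes
    d_ct_sample = ct_target_sample - mean(ct_refs_sample)
    d_ct_control = ct_target_control - mean(ct_refs_control)
    # ddCt: normalized sample value relative to the normalized control value
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values; the two reference Cts stand in for UP1 and UBC9
fold_change = relative_expression(
    ct_target_sample=22.0, ct_refs_sample=[18.0, 20.0],
    ct_target_control=25.0, ct_refs_control=[18.0, 20.0],
)
```

Here ΔCt is 3 in the sample and 6 in the control, so ΔΔCt = −3 and the target is estimated to be 2<sup>3</sup> = 8-fold more abundant in the sample.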

<sup>5</sup>www.mbio.ncsu.edu/BioEdit/bioedit.html

<sup>6</sup>www.bioconductor.org/packages/release/bioc/html/limma.html

<sup>7</sup>http://bar.utoronto.ca/ntools/cgi-bin/ntools\_classification\_superviewer.cgi

<sup>8</sup>https://david.ncifcrf.gov

<sup>9</sup>www.tm4.org/

<sup>10</sup>http://arabidopsis.med.ohio-state.edu

<sup>11</sup>www.tanpaku.org/autophagy/index.html

<sup>12</sup>http://merops.sanger.ac.uk

<sup>13</sup>http://quantprime.mpimp-golm.mpg.de

# RESULTS

# The Transcriptome Response to Reduced N Supply Differs in Early and Late Oilseed Rape Leaves

The aim of this study was to investigate whether a mild N deficiency can be detected at the transcriptomic level in spring oilseed rape (OSR), whether the transcriptome response differs between developmentally older (source) leaves at a lower node and younger (sink) leaves at a higher node, and whether the developmental response to N-deficiency resembles that in winter OSR cultivars. Brassica napus cv. 'Mozart' plants were raised under controlled conditions in a growth chamber under optimal N supply (N<sup>O</sup>) or N supply reduced by 50% (N<sup>L</sup>). Morphological, physiological and performance data of the same plants investigated in this study were previously reported by Franzaring et al. (2011). The N<sup>L</sup> conditions caused only subtle developmental and growth phenotypes (Supplementary Figure 1): flowering started on average only 2 days later than in N<sup>O</sup> plants (Figure 3 in Franzaring et al., 2011), but seed yield was reduced (Table 1 in Franzaring et al., 2011). The early developing leaf #4 and the late developing leaf #8, located at the base of a flower-developing side shoot, were harvested as representatives of old (source) leaves and young (sink) leaves, respectively. Leaf #4 was harvested at four different time points during development (78, 85, 92, and 99 days after sowing, DAS). Leaf #8 was harvested at two time points, 92 DAS and 106 DAS. Under both N treatments, leaves #4 were still alive and attached to the stem at 92 DAS, whereas at 106 DAS they were dead and shed on most plants (**Figure 1** and Supplementary Figure 1).

Transcription analysis was performed using a B. napus custom microarray representing 59,577 'unigenes' (Koeslin-Findeklee et al., 2015b). For 54,095 (91%) of the B. napus 'unigenes' 19,185 homologs were identified in Arabidopsis thaliana (Supplementary Table 2). For only 5,522 of these Arabidopsis genes is one single B. napus unigene represented on the microarray, whereas for 71% more than one B. napus unigene exists (Supplementary Table 3). These unigenes include representatives in each of the 66 biological categories in the MapMan metabolic pathway visualizer (Thimm et al., 2004). In 41 (sub-)categories more than 70% of the corresponding genes are represented by a homologous B. napus unigene (Supplementary Table 4).

Under optimal N supply in leaf #4 the number of B. napus unigenes up- or downregulated relative to the previous harvest time point with significant (Padj < 0.05) and ≥3-fold expression changes progressively increased from week 1 (26 genes) to week 2 (78 genes) to week 3 (579 genes) of the observation period (**Figure 1** and Supplementary Table 5). The same trend, but with a much steeper increment, was observed in plants that were grown under reduced N fertilization. In the first week the transcriptome did not change at all, whereas 1 week later 169 regulated genes appeared and in week 3 the number of regulated genes jumped to 2,985. In both growth conditions and all time intervals the downregulated genes outnumbered the upregulated genes.
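The selection rule used throughout this section (Padj < 0.05 and a ≥3-fold change relative to the previous harvest) can be illustrated with a minimal Python sketch. The function names and the example values are hypothetical and are not taken from the study's data or analysis pipeline; the sketch assumes that fold changes are available as log2 ratios.

```python
import math

# Thresholds as stated in the text; everything else is illustrative.
FOLD_CUTOFF = 3.0
PADJ_CUTOFF = 0.05


def classify(log2_fc, padj):
    """Return 'up', 'down' or None for one unigene in one time interval."""
    if padj >= PADJ_CUTOFF:
        return None
    threshold = math.log2(FOLD_CUTOFF)
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return None


def count_regulated(rows):
    """Count up- and downregulated entries in (log2_fc, padj) pairs."""
    counts = {"up": 0, "down": 0}
    for log2_fc, padj in rows:
        label = classify(log2_fc, padj)
        if label is not None:
            counts[label] += 1
    return counts


# Hypothetical (log2 fold change, adjusted P) pairs for five unigenes.
example = [(2.0, 0.01), (-1.7, 0.03), (1.0, 0.001), (-3.2, 0.20), (-2.5, 0.04)]
print(count_regulated(example))
```

Working on the log2 scale makes the cutoff symmetric, so a threefold decrease is treated in the same way as a threefold increase.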

In the upper canopy leaf #8, 1,950 genes were up- or downregulated in N<sup>O</sup> plants and 744 in N<sup>L</sup> plants. Thus, the relation of regulated genes in N<sup>O</sup> and N<sup>L</sup> plants was inverse compared to leaf #4. Also in leaf #8 the downregulated genes outnumbered the upregulated genes under both N fertilization regimes.

#8 were harvested between 78 and 106 days after sowing (DAS) as indicated below the sketches of the oilseed rape plants. The Venn diagrams display the numbers of upregulated and downregulated genes highlighted in light red and lime green, respectively. The examined developmental intervals are termed week 1 (W1), week 2 (W2), week 3 (W3) and weeks 3 + 4 (W3–4). Depicted are genes with significant, ≥3-fold expression changes (Padj < 0.05, n = 3).

# Reduced N Supply Correlates with Differential Senescence Progression in Lower and Upper Canopy Leaves

The observed massive increase in gene regulation might indicate the onset of leaf senescence, which is known to be accompanied by transcriptome reorganization (Buchanan-Wollaston et al., 2005; van der Graaff et al., 2006; Koeslin-Findeklee et al., 2015b). We therefore tracked senescence initiation by measuring chlorophyll content and, by qPCR, the expression of the chlorophyll a/b binding protein gene BnCAB1, the Brassica napus drought 22 kD protein gene BnD22 and the senescence associated genes BnSAG12-1 and BnSAG2.

In leaves #4 of N<sup>O</sup> plants, BnSAG12-1 (**Figure 2A**) and BnSAG2 (**Figure 2B**) expression tended to increase already in week 1 and continued to increase throughout weeks 2 and 3. In N<sup>L</sup> leaves, upregulation of these two genes started only in week 2. Under both N treatments BnCAB1 transcription appeared to decline in week 1 (**Figure 2C**), but the expression change was not significant (Supplementary Table 6). BnD22, whose expression level at the beginning of the observation period was approximately 3.5-fold higher under low N conditions than under optimal N supply (Supplementary Table 5), as has also been reported by Desclos et al. (2008), displayed no significant expression change in weeks 1 and 2 (Supplementary Table 6), but a rapid decline in week 3 (**Figure 2D**). Expression of both SAGs and CAB, but not BnD22, indicated upcoming senescence 1 to 2 weeks before a decline in chlorophyll was measurable (**Figure 2E**).

In leaf #8, the equal upregulation of BnSAG12-1 and BnSAG2 and downregulation of BnD22 in N<sup>O</sup> and in N<sup>L</sup> leaves indicates that senescence had started during weeks 3–4. The regulation of BnCAB1 was strikingly different in N<sup>O</sup> and N<sup>L</sup> leaves #8. Under low N supply BnCAB1 expression was maintained, whereas under optimal N supply its transcription declined.

In neither the lower nor the upper leaves were the differences between N<sup>L</sup> and N<sup>O</sup> plants in senescence marker gene expression and chlorophyll content significant, whereas the expression pattern of BnCAB1 was very different in N<sup>L</sup> and N<sup>O</sup> leaves #8. We therefore compared the expression levels of the 110 OSR homologs of A. thaliana photosynthesis-related genes (PSG) on the microarray (Supplementary Table 7). In the optimally N-supplied leaves #4, none of these genes were significantly up- or downregulated between 78 DAS and 99 DAS (except one downregulated gene in week 2). In striking contrast, in leaves #4 grown under reduced N fertilization, 94% of the PSGs were downregulated (4 PSGs in week 2 and another 99 PSGs in week 3). In the upper canopy leaves #8 the pattern was opposite: in N<sup>O</sup> leaves 66 PSGs were downregulated compared to 11 downregulated PSGs in N<sup>L</sup> leaves.

In summary, these data and the higher total number of up- and downregulated genes in N<sup>L</sup> leaves #4 (**Figure 1**) indicate that a mild N deficiency leads not to a significantly earlier initiation, but to a more rapid progression of senescence once it has started. In contrast, in the upper canopy leaf #8 mild N deficiency causes a delay in senescence progression. Thus, in spring OSR the effect of N deprivation on senescence in older and younger leaves is similar to that in winter OSR (Etienne et al., 2007; Desclos et al., 2008).

FIGURE 2 | Senescence marker gene expression and chlorophyll content in Brassica napus leaves of plants grown under optimal or reduced N supply. B. napus leaves #4 and #8 of plants grown under optimal N supply (N<sup>O</sup>; dark gray columns) or under reduced N supply (N<sup>L</sup>; light gray columns) were harvested at the indicated days after sowing (DAS). Relative expression levels were determined by qPCR. For leaf #4, expression changes relative to the level at 78 DAS, and for leaf #8, expression changes relative to the level at 92 DAS are shown of (A) senescence associated gene BnSAG12-1, (B) senescence associated gene BnSAG2, (C) chlorophyll a/b binding protein gene BnCAB1 and (D) B. napus gene BnD22. (E) Relative chlorophyll contents at each harvest time point are shown as SPAD values. Error bars indicate the standard error of the means (n = 3). Significant expression differences by N treatment, over time and by interaction of the two parameters were calculated by two-way ANOVA (∗P < 0.05; ∗∗P < 0.01; ∗∗∗P < 0.001). Subsequently a Tukey's HSD post hoc test was done to identify significant differences between all different harvest time points (Supplementary Table 6).

FIGURE 3 | Clustering of regulated genes with similar temporal expression profiles in leaf #4. The leaf #4 genes regulated during development under optimal and reduced N supply were clustered according to their temporal expression profiles. The mean expression values (log2) of all differentially regulated genes (≥3-fold expression change in W1, W2, or W3 and Padj < 0.05) were grouped by STEM using default settings. Each box represents one of 50 predefined expression profiles. Depicted are only profiles with a statistically significant number of genes assigned. Profiles with the same color are similar and defined as one cluster. The number of genes belonging to each model expression profile is shown in the top left corner of each box. (A) Of the 655 regulated genes under optimal N supply (N<sup>O</sup>), 484 genes are allocated to four clusters. Two hundred and ninety one genes are allocated to one 'downregulated' cluster that is subdivided in four profiles (green boxes) and 140 genes are allocated to one 'upregulated' cluster subdivided in two profiles (red boxes). To the right the numbers of up- and downregulated N<sup>O</sup> leaf #8 genes are shown. (B) Of the 3111 regulated genes under reduced N supply (N<sup>L</sup>), 2771 genes are allocated to seven clusters. 1174 genes are allocated to one 'downregulated' cluster that is subdivided in four profiles (green boxes) and 987 genes are allocated to one 'upregulated' cluster subdivided in three profiles (red boxes). To the right the numbers of up- and downregulated N<sup>O</sup> leaf #8 genes are shown.

# N Fertilization-Dependent Gene Expression

To identify genes with similar expression change profiles during development under optimal and reduced N supply in the lower canopy leaf #4, the regulated genes were clustered by their expression profiles and assigned to 50 predefined model temporal expression profiles (Supplementary Table 8). For the 665 genes up- or downregulated in N<sup>O</sup> leaves #4, four clusters comprising eight profiles had a statistically significant number of genes assigned (**Figure 3A**). Overall, 291 of the genes (44%) are allocated to a cluster of four downregulated profiles and 140 genes (21%) fall into a cluster of two upregulated profiles. In leaves #4 of N<sup>L</sup> plants, 2941 genes (94% of the 3111 regulated genes) are assigned to 15 model temporal expression profiles with a statistically significant number of genes (**Figure 3B**). Of these, 1174 genes (38%) are allotted to a cluster of four downregulated profiles and 987 genes (32%) to a cluster of three upregulated profiles. A conspicuous contrast between the transcriptomes of the developmentally younger upper canopy leaves #8 and the older leaves #4 is that in leaves #8 a higher number of genes is regulated during the 2-week observation interval in plants grown under optimal than under reduced N supply (**Figure 1**).
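As an illustration of how a temporal expression series can be matched to a predefined model profile, the following Python sketch assigns a gene to the candidate profile with the highest Pearson correlation. It is a simplified stand-in and not the STEM implementation: it enumerates all profiles that start at 0 and change by at most one unit per interval instead of selecting 50 representative profiles, and it omits any test of whether a profile has more genes assigned than expected by chance. All names and example values are hypothetical.

```python
import math
from itertools import product


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def model_profiles(n_intervals, max_step=1):
    """Enumerate profiles that start at 0 and move by -max_step..max_step per interval."""
    steps = range(-max_step, max_step + 1)
    profiles = []
    for combo in product(steps, repeat=n_intervals):
        profile = [0]
        for step in combo:
            profile.append(profile[-1] + step)
        profiles.append(tuple(profile))
    return profiles


def assign(log2_series, profiles):
    """Shift the series to start at 0 and return the best-correlated profile."""
    shifted = [value - log2_series[0] for value in log2_series]
    return max(profiles, key=lambda p: pearson(shifted, p))


# Four harvests (78, 85, 92, 99 DAS) give three intervals.
profiles = model_profiles(3)
# Hypothetical log2 expression values of a gene that drops only in week 3.
late_drop = [10.0, 10.0, 9.8, 7.0]
print(assign(late_drop, profiles))
```

Because correlation ignores the magnitude of change, a sketch like this groups genes by the shape of their time course only; the fold-change filter has to be applied beforehand.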

### Functional Classification of N Fertilization-Dependently Regulated Genes

To identify the most highly regulated N deficiency-responsive and senescence-associated pathways in leaves #4 and #8, we performed a Gene Ontology term enrichment analysis with the up- and downregulated genes in the two leaf #4 STEM clusters and in leaf #8 (**Figure 3**). To visualize considerably regulated biological processes in N-deficient leaves #4, which show the most progressed senescence symptoms, all significantly enriched (P < 0.05) GO terms for the up- and downregulated genes are displayed in **Figures 4**, **5**, respectively, and listed in Supplementary Table 9. The most significantly upregulated processes in N<sup>L</sup> leaves #4 include cell wall weakening, cellular response to N starvation, intracellular bulk degradation of cytoplasmic components like chloroplasts and mitochondria (autophagy, mitophagy) and general leaf senescence activities. N<sup>L</sup> leaves #4 shared almost 40% of upregulated GO terms with N<sup>O</sup> leaves #8, but only 19 and 13% with N<sup>O</sup> leaves #4 and N<sup>L</sup> leaves #8, respectively. Of the 73 significantly downregulated GO terms in N<sup>L</sup> leaves #4, approximately one quarter are associated with photosynthesis and related pathways, and another ∼20% encompass biosynthesis of chlorophyll, amino acids, fatty acids, glucose, amylopectin, alkanes, and plastoquinone. In optimally N-supplied leaves #8, 40 of these GO terms (55%) were also downregulated, but only ∼30% in N<sup>O</sup> leaves #4 and in N-deficient leaves #8. In agreement with photosynthetic gene and senescence marker gene expression levels (**Figure 2** and Supplementary Table 7), the GO term analysis suggests that senescence was most advanced in N<sup>L</sup> leaves #4 at 99 DAS, followed by N<sup>O</sup> leaves #8 at 106 DAS, N<sup>L</sup> leaves #8 at 106 DAS and N<sup>O</sup> leaves #4 at 99 DAS. Accordingly, in developmentally old leaves of the spring OSR cultivar 'Mozart', N deficiency induced an accelerated progression of senescence and remobilization of nutrients, whereas in developmentally younger leaves in the upper canopy it led to a delay in senescence progression.

upregulated genes in a GO term is not significantly higher than the overall fraction of upregulated genes. The numbers in brackets denote the count of enriched GO terms.
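The idea behind a term enrichment test can be sketched with a plain one-sided hypergeometric test: given a background of genes, how likely is it to find at least the observed number of term members among the regulated genes by chance? This is an illustrative assumption and not necessarily the exact statistic computed by DAVID; the function name and all gene counts in the usage lines are invented.

```python
from math import comb


def enrichment_p(n_background, n_term_background, n_selected, n_term_selected):
    """One-sided hypergeometric P value: P(X >= n_term_selected).

    n_background      -- genes in the background set
    n_term_background -- background genes annotated with the term
    n_selected        -- regulated genes drawn from the background
    n_term_selected   -- regulated genes annotated with the term
    """
    total = comb(n_background, n_selected)
    upper = min(n_term_background, n_selected)
    tail = sum(
        comb(n_term_background, k)
        * comb(n_background - n_term_background, n_selected - k)
        for k in range(n_term_selected, upper + 1)
    )
    return tail / total


# Invented toy numbers: 20 background genes, 5 carry the term,
# 4 genes are regulated and 3 of those carry the term.
p_value = enrichment_p(20, 5, 4, 3)
print(round(p_value, 4))
```

A term would be called enriched when this probability falls below the chosen cutoff (P < 0.05 in the text), ideally after correcting for the number of terms tested.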

In an independent approach to identify up- or downregulated biological pathways, the temporally regulated genes in N<sup>O</sup> and N<sup>L</sup> leaves #4 were grouped by their putative biological functions according to the MapMan classification (Thimm et al., 2004), and categories enriched or depleted for regulated genes were identified. Although the total number of regulated genes was five times higher in N<sup>L</sup> compared to N<sup>O</sup> plants, the majority of functional gene categories showed no significant differences in the fractions of regulated genes (Supplementary Figure 2). This is consistent with the weak phenotypic differences between N<sup>O</sup> and N<sup>L</sup> plants (Supplementary Figure 1). However, major differences are apparent in the categories tetrapyrrole synthesis and photosynthesis, oxidative pentose phosphate pathway (OPP) and C1-metabolism. In leaves #4 of N<sup>L</sup> plants, half of the genes associated with photosynthesis (97 of 206 genes in this category) and 25% of the genes involved in chlorophyll biosynthesis (12 of 48 genes in the tetrapyrrole category) were downregulated, whereas in N<sup>O</sup> plants only 5% of the photosynthesis and no chlorophyll biosynthesis genes were downregulated. Thus, the gene expression data reflect the reduced chlorophyll content in N<sup>L</sup> plants (**Figure 2**). Also downregulated in N<sup>L</sup>, but not in N<sup>O</sup> plants, were the oxidative pentose phosphate pathway, which generates reductants required for various biosynthetic processes, including fatty acid synthesis and inorganic N and S assimilation (Kruger and von Schaewen, 2003; Bussell et al., 2013), and the one-carbon (C1) metabolism pathway. The latter pathway is also connected to the S-assimilation pathway by supplying C1 units for the synthesis of S-methylmethionine (SMM), which is transported from source leaves via the phloem to sink organs (Hanson and Roje, 2001). The MapMan classification of up- and downregulated leaf #8 genes revealed overall fewer differences in significantly regulated categories between N<sup>O</sup> and N<sup>L</sup> plants compared to the older leaves #4 (Supplementary Figure 3). However, noticeable differences are apparent in the categories tetrapyrrole synthesis and photosynthesis. Opposite to leaves #4, in leaves #8 in both categories a large fraction of the genes was downregulated in N<sup>O</sup>, but not in N<sup>L</sup> plants, suggesting that, although the chlorophyll content had not yet declined much between 92 DAS and 106 DAS, senescence was more advanced in leaves #8 of plants grown under optimal N supply than in plants grown under reduced N fertilization.

FIGURE 5 | Enriched Gene Ontology terms among downregulated leaf #4 genes. The downregulated genes in all samples (green pictograms in Figure 2) were analyzed for enriched Gene Ontology terms using the DAVID Bioinformatics Resources 6.8. The blue bars represent all significantly enriched GO\_BP\_DIRECT terms in N<sup>L</sup> leaves #4 (P < 0.05). Those terms that were also enriched in other samples are shown as green, red, and yellow bars. The lack of a bar indicates that the fraction of downregulated genes in a GO term is not significantly higher than the overall fraction of downregulated genes. The numbers in brackets denote the count of enriched GO terms.

#### Senescence- and N Deficiency-Associated Transcription Factor Genes

A major process during leaf senescence is remobilization of N and other nutritional degradation products from source to sink organs. The initiation and progression of senescence is orchestrated by transcription factors (TFs) and thus the identification of senescence-associated TFs that are responsive to N deficiency conditions is crucial for understanding the parameters that determine the N-efficiency of oilseed rape. We therefore investigated the range of B. napus homologs of Arabidopsis TFs that were differentially regulated upon N-deprivation in source leaves #4 and sink leaves #8. In total, 271 regulated OSR homologs of Arabidopsis TFs were found in 37 of the 51 Arabidopsis TF families (Yilmaz et al., 2011), and in most families more genes were down- than upregulated (Supplementary Table 10). Under both N-regimes, almost all TF genes in leaf #4 were regulated exclusively in week 3; only nine genes showed regulation during weeks 1 or 2 (Supplementary Table 11). Analogous to the frequencies of total regulated genes (**Figure 1**), between 92 and 99 DAS in N-deficient leaves #4, 3.6-fold more putative TF genes were transcriptionally regulated (78 genes up and 104 genes down) than in optimally N-supplied leaves #4 (22 genes up, 29 genes down). In leaf #8, between 92 and 106 DAS threefold more TF genes were regulated in N<sup>O</sup> (48 genes up, 92 genes down) than in N<sup>L</sup> plants (15 genes up, 21 genes down). None of the TF genes were oppositely regulated in N<sup>O</sup> and N<sup>L</sup> leaves #4 or leaves #8, or in N<sup>L</sup> leaf #4 and N<sup>O</sup> leaf #8. Twelve genes were upregulated exclusively in the senescing N<sup>L</sup> #4 and N<sup>O</sup> #8 leaves, among them the NAP/NAC029 homolog, and thus are leaf rank-independent senescence-associated TFs. Twenty-four TF genes were downregulated solely in the N<sup>L</sup> #4 and N<sup>O</sup> #8 leaves, among them the WRKY53 homolog, suggesting that they are controlling pathways that are downregulated during senescence. The WRKY, Whirly and NAC families deserve special attention, because members of these families were reported to play key roles in controlling leaf senescence and plastid stability in Arabidopsis. In contrast to most other TF families, the NAC genes were predominantly upregulated in senescing leaves #4 and #8 (**Figure 6** and Supplementary Table 11). With the only exception of JUB1, in senescing Arabidopsis leaves the corresponding genes are also upregulated (Breeze et al., 2011). Eleven of the 19 regulated OSR NAC genes were specifically regulated in N<sup>L</sup> leaves #4 or N<sup>O</sup> leaves #8, which indicates differences in the regulation of downstream processes in the two canopy levels. Unlike the NAC factors, most of the regulated WRKY genes were downregulated and it appears that in this TF family more transcriptional reprogramming occurred in N<sup>O</sup> leaves #8 than in N<sup>L</sup> leaves #4 (**Figure 6**).
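Statements such as "upregulated exclusively in the senescing N<sup>L</sup> #4 and N<sup>O</sup> #8 leaves" correspond to a simple set operation on the per-sample gene lists. The following Python sketch shows one way to express it; the sample labels and gene identifiers are hypothetical placeholders and do not reproduce the gene lists in Supplementary Table 11.

```python
def shared_exclusively(samples, targets):
    """Genes present in every target sample and absent from all other samples."""
    in_all_targets = set.intersection(*(samples[name] for name in targets))
    others = [genes for name, genes in samples.items() if name not in targets]
    in_any_other = set().union(*others)
    return in_all_targets - in_any_other


# Placeholder sets of upregulated TF genes per sample (invented identifiers).
upregulated = {
    "NL_leaf4": {"tf_a", "tf_b", "tf_c"},
    "NO_leaf8": {"tf_a", "tf_b", "tf_d"},
    "NO_leaf4": {"tf_b"},
    "NL_leaf8": {"tf_d"},
}

print(shared_exclusively(upregulated, ["NL_leaf4", "NO_leaf8"]))
```

Here only `tf_a` qualifies: `tf_b` is shared by both target samples but is also upregulated in a third sample, so it is not exclusive to the two senescing leaf samples.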

### N-Deficiency Associated Expression of Protein Degradation Genes

The plant-specific developmental process of leaf senescence safeguards the coordinated degradation of proteins, lipids and nucleic acids and remobilization of the resulting low molecular weight nutrients from the senescing leaves to sink organs. Chloroplasts are the most important resource for nitrogen remobilized from senescing source leaves, and autophagy is a crucial process for degradation of chloroplasts during senescence and in response to starvation (Ishida et al., 2014; Michaeli et al., 2014; Izumi et al., 2017). Autophagy mutant plants suffer from premature senescence accompanied by accelerated cell death (reviewed in Minina et al., 2014). Senescence is also accompanied by the activation of various peptidases. To identify the senescence-associated OSR homologs of Arabidopsis autophagy genes in leaves #4 and #8 we matched them against the autophagy database (Homma et al., 2011).

We identified 28 OSR homologs of Arabidopsis autophagy(-related) genes that showed ≥ 3-fold changes in transcription levels (**Figure 7A** and Supplementary Table 12). In N-deficient leaves #4, 19 autophagy gene homologs were upregulated, among them ten ATG core genes that are essential for autophagosome formation (reviewed in Michaeli et al., 2016; Have et al., 2017). Seven of these autophagy core genes and seven autophagy-related genes are also upregulated in senescing Arabidopsis leaves (van der Graaff et al., 2006; Breeze et al., 2011). Remarkably, although also in N<sup>O</sup> leaves #4 senescence initiated during week 3 of the observation period, as was indicated by marker gene expression (**Figure 2**) and enrichment of the GO term 'leaf senescence' (**Figure 4**), except for ATG4a (see below) none of the autophagy core genes were regulated yet in week 3. Also in N-deficient leaves #8, where senescence was delayed, no activation of the autophagy genes was observed. In N<sup>O</sup> leaves #8, ATG7, ATG8a and six autophagy-related genes were upregulated; however, the intensification of the autophagy pathway was clearly lower than in N-deficient leaves #4. Only two genes were regulated in both leaves #4 and #8 independently of the N supply. ATG4a, which encodes a cysteine protease involved in the ATG8 ubiquitination-like pathway and is linked to autophagosome formation, was downregulated in all four samples. In Arabidopsis, ATG4a is transcriptionally induced by sudden N-depletion and carbon-starvation (Yoshimoto et al., 2004; Rose et al., 2006), but it has not been reported in the context of senescence yet. PLDP2 was upregulated in all four samples. This gene is also upregulated during senescence in Arabidopsis and was reported to regulate vesicle trafficking and to play a role in Pi-starvation (Breeze et al., 2011). The downregulation of the salicylic-acid responsive PR1 gene (Ward et al., 1991) in all samples except N<sup>O</sup> leaves #4 is consistent with the corresponding downregulation of the GO term 'response to salicylic acid' in these samples.

or downregulated (green bars) Brassica napus unigene homologs of Arabidopsis thaliana NAC, Whirly and WRKY family transcription factors in leaf #4 between 92 and 99 DAS and in leaf #8 between 92 and 106 DAS as indicated on top of the columns. Depicted are genes with expression ratios ≥ 3 and Padj < 0.05 (n = 3). Green and red dots denote genes that were reported as leaf senescence-associated down- and upregulated in Arabidopsis thaliana by Breeze et al. (2011). Dots in parentheses indicate that this gene was not steadily regulated and WRKY65 was first up- and later downregulated in the course of leaf senescence.

In addition to autophagy-related proteins, transcriptional and proteomic studies identified in various plant species a large number of senescence-associated, mostly upregulated peptidases from diverse families (reviewed in Roberts et al., 2012). Yet, in winter OSR, only a few senescence-associated proteases and protease inhibitors were reported (Etienne et al., 2007; Desclos et al., 2009). We identified in spring OSR 'Mozart' overall 69 up- or downregulated OSR homologs of all A. thaliana peptidases listed in the MEROPS peptidase database (Rawlings et al., 2016) (**Figure 7B**). For 35 of these, Breeze et al. (2011) observed leaf senescence-associated regulation of the corresponding Arabidopsis genes; 30 of them were regulated in the same direction. Unlike the autophagy-related genes, regulation of protease genes was heterogeneous and more genes were down- than upregulated (37 vs. 32). Remarkably, in contrast to all other gene classes, the highest number of regulated protease genes occurred in N<sup>O</sup> leaves #8. It is tempting to speculate that in spite of the onset of senescence under ample N supply in leaves #8 (**Figure 4**), dismantling of chloroplasts and degradation of chlorophyll is still pending (**Figure 2**) and therefore autophagy is not massively upregulated yet (**Figure 7A**). However, at that stage leaves #8 likely act already as source leaves and provide nutrients for pod development and seed filling (Supplementary Figure 1, Franzaring et al., 2011). The prominent regulation of many proteases in these leaves may be associated with an elevated nutrient export activity.

# DISCUSSION

# Transcriptome Reprogramming in N Supply-Dependent Senescence of Lower and Upper Node B. napus Leaves

Since the divergence of the ancestral Brassicaceae into the Arabidopsis and Brassica lineages ∼17 million years ago (Cheung et al., 2009), genome triplication, allopolyploidization of the B. napus parental B. rapa and B. oleracea genomes, and gene loss events occurred, with the consequence that the modern OSR genome contains zero to more than six orthologs of any Arabidopsis gene (Rana et al., 2004). This complicates the identification of orthology relationships between Arabidopsis and B. napus genes and prevents the distinction between multiple related B. napus genes when using Arabidopsis microarrays. We therefore used a microarray with 60 nt-probes based predominantly on three EST libraries from B. napus, B. rapa, and B. oleracea and a smaller number of other publicly available ESTs (Trick et al., 2009).

In previous studies of the transcriptome response to N starvation in Arabidopsis thaliana (Wang et al., 2003; Scheible et al., 2004; Balazadeh et al., 2014) and winter oilseed rape (Koeslin-Findeklee et al., 2015b), plants were grown hydroponically in low N medium and transcription analysis was performed after nitrate re-addition. This treatment may invoke a rapid and temporal plant response to nutrient shock (Wang et al., 2001; Peng et al., 2007) and thus may not fully reflect the plant adaptive responses to long-term low N conditions. Here, we compared the OSR transcriptome in plants of a spring cultivar grown under optimal or low N fertilization in solid medium under seasonal climate simulating conditions, which is more similar to field conditions (Franzaring et al., 2012).

In the developmentally early winter OSR leaves, senescence typically begins during flowering but before the seed filling stage, and the leaves are shed before the developing reproductive organs have reached their maximal sink strength. This is considered as one reason for the relatively inefficient N remobilization and high residual N content in the fallen leaves that, moreover, increases with N fertilization (Schjoerring et al., 1995; Hocking et al., 1997; Rossato et al., 2001; Noquet et al., 2004; Malagoli et al., 2005a). In cauline leaves in the upper canopy, senescence initiates later and the N content of fallen leaves is lower, indicating a more efficient N remobilization from these leaves driven by the higher sink strength of the developing pods during seed filling (Malagoli et al., 2005a; Etienne et al., 2007). In this study we aimed to determine if the chronology of senescence initiation in different canopy levels is similar in a spring OSR cultivar and how differences between early and late leaves are reflected in their transcriptomes.

Taking the expression changes of the senescence marker genes BnSAG12-1, BnSAG2 and BnCAB1 as indicators, under optimal as well as low N supply the first signs of senescence initiation appeared in the lower canopy leaf #4 already in week 1 of the observation period. In this early senescence phase the chlorophyll content is not a useful indicator for senescence or N deprivation (**Figure 2**), as had also been observed by Gombert et al. (2006). In the following 2 weeks senescence progressed under both N regimes, but more rapidly in the N deficient plants as is indicated by the massive downregulation of PSGs. In the upper canopy leaf #8 the BnSAG12-1, BnSAG2 and BnD22 expression changes did not show a difference in the senescence status of N<sup>O</sup> and N<sup>L</sup> leaves. However, downregulation of BnCAB and many PSGs indicated that N deficiency led to a delay of senescence progression in younger leaves. This conclusion was corroborated by the extent of transcriptome reprogramming and the affected metabolic processes. In leaves #4 essentially no change in gene regulation in either N<sup>O</sup> or N<sup>L</sup> leaves was observed in week 1. One week later, at an overall low level, already twice as many genes were regulated in N<sup>L</sup> compared to N<sup>O</sup> leaves, and in week 3 in N-deficient leaves the number of regulated genes increased another 15-fold to almost 3,000 regulated genes, whereas under ample N supply fewer than 600 genes were regulated. The effect of N deprivation was opposite in the upper canopy leaves #8, where 2.5-fold more genes were regulated in N<sup>O</sup> plants. The functional classification of regulated genes revealed that senescence-associated transcriptome reprogramming in spring oilseed rape cv. 'Mozart' comprises largely the same biological processes as in Arabidopsis thaliana (Buchanan-Wollaston et al., 2005; van der Graaff et al., 2006; Breeze et al., 2011).

### Divergent Regulation of Transcription Factors in Senescing Young and Old Leaves

The age-dependent expression of thousands of senescence-associated genes is orchestrated by transcription factors, many of which are themselves transcriptionally regulated during senescence. Several of these TFs are also induced by various biotic or abiotic stresses, indicating that senescence is an integrated response of plants to endogenous developmental signals and environmental cues (Woo et al., 2013). In Arabidopsis, transcriptomic analyses revealed the enrichment of upregulated TF genes of the NAC, WRKY, AP2/EREBP, MYB, C2H2 zinc-finger, bZIP, and GRAS families during leaf senescence (Buchanan-Wollaston et al., 2005; van der Graaff et al., 2006; Balazadeh et al., 2008, 2010; Breeze et al., 2011). We observed also in spring OSR a senescence-associated transcriptional reorganization in all these TF families. The comparison of the expression changes of the 112 senescence-associated TF genes that were identified in Arabidopsis by Breeze et al. (2011) and here in OSR reveals largely congruent transcription increases or decreases (Supplementary Table 11). Interestingly though, we note in OSR that 107 TF genes were more than threefold up- or downregulated only in N<sup>L</sup> leaf #4 and 65 genes only in N<sup>O</sup> leaves #8, which indicates distinct regulation of individual senescence processes and nutrient remobilization in upper and lower canopy leaves. Also noticeable is a virtually perfect congruence of TF gene regulation in certain families between OSR and Arabidopsis (NAC, C2C2, C2H2) and more divergent regulation in others (AP2-EREBP, C3H, homeobox).

The expression profiles of well characterized key regulators of senescence in OSR and Arabidopsis attest that, in spite of their very different sporophyte architectures, the regulatory network controlling senescence is similar in these two Brassicaceae. For example, in both species the positive regulators of chlorophyll degradation NAC046 and NAC055 and the senescence promoting NAP/NAC029 factor are upregulated (**Figure 6**) (Guo and Gan, 2006; Hickman et al., 2013; Oda-Yamamizo et al., 2016). On the other hand, the negative regulator of senescence WRKY70 (Ülker et al., 2007; Zentgraf et al., 2010; Besseau et al., 2012) and the early induced WRKY53 factor, which interacts with other senescence regulators (Hinderhofer and Zentgraf, 2001; Miao et al., 2004; Zentgraf et al., 2010), were downregulated. However, in a few cases also divergent regulation of senescence-controlling factors in Arabidopsis and OSR was observed. The NAC family member JUB1, which was identified as a longevity-promoting factor in Arabidopsis (Wu et al., 2012), was downregulated in N<sup>L</sup> leaves #4. Surprisingly, it was found to be induced in Arabidopsis during leaf senescence (Breeze et al., 2011). Noteworthy is also the downregulation of the OSR homolog of WHY1 in N<sup>L</sup> leaves #4 (**Figure 6**), because this gene is involved in maintaining chloroplast stability. The Arabidopsis WHY1 gene, which is one of only three Whirly family genes in this plant, is required for chloroplast genome stability (Marechal et al., 2009), and the barley WHIRLY1 ortholog is involved in premature senescence induction under photooxidative stress (Kucharewicz et al., 2017).

### Chloroplast Decomposition and Protein Degradation Pathway Activation in Senescing Leaves

The critical role of autophagy in the disassembly of chloroplasts, mitochondria and other cellular structures in the course of senescence has been extensively demonstrated in Arabidopsis (reviewed by Michaeli et al., 2016; Have et al., 2017). During developmental and starvation-induced senescence, entire chloroplasts can be degraded by autophagy (Minamikawa et al., 2001; Wada et al., 2009). Unlike in A. thaliana, where 9 of 15 upregulated autophagy genes were activated in leaves that were not yet fully expanded and showed no signs of senescence (Breeze et al., 2011), in OSR we did not observe activation of autophagy genes in N<sup>O</sup> leaves #4 or N<sup>L</sup> leaves #8, although senescence had been initiated in these leaves. However, in the more advanced senescence stages in N<sup>L</sup> #4 and N<sup>O</sup> #8 leaves, more OSR autophagy genes appear to be upregulated than in Arabidopsis (Breeze et al., 2011), although we did not consider all statistically significantly regulated genes but only those exhibiting ≥ 3-fold transcriptional changes. A possible explanation is that autophagy is a generic, self-cleaning process required to remove obsolete cell components and maintain cellular integrity. It is thus constitutively active at a low level, which might be sufficient during early senescence. Only when senescence progresses may it become necessary to boost the autophagy pathway.

The more pronounced transcriptional activation of autophagy genes in N-deficient leaves #4 compared to N<sup>O</sup> leaves #8 could indicate that the cell death program, the last phase of senescence, has started in the lower canopy leaves, whereas the young upper canopy leaves #8 have to stay alive to serve as source leaves for nutrient remobilization toward the developing pods. This course of events is known from winter OSR genotypes (reviewed in Avice and Etienne, 2014). Consistent with this hypothesis is the much higher number of regulated protease genes in N<sup>O</sup> leaves #8 which may be involved in protein turnover and N remobilization, but not in executing cell death.

Differences between OSR and Arabidopsis are also apparent in the regulation of senescence-associated peptidase genes, which play a crucial role in providing nitrogen transport molecules such as amino acids for developing sink organs (reviewed by Masclaux-Daubresse et al., 2010). As with the autophagy genes, more peptidase genes are differentially regulated in OSR than in Arabidopsis (Breeze et al., 2011), and a larger fraction of these genes is downregulated. These differences might indicate a partly different orchestration of the course of senescence in OSR, which may reflect the more complex architecture and morphological development of OSR plants compared to A. thaliana. Recently, using protease activity profiling, Poret et al. (2016) identified 17 serine and cysteine proteases, with homology to 10 Arabidopsis proteases including SAG12, AALP, and AARE, whose activity increased in senescing B. napus leaves after 23 days of N starvation relative to plants grown with ample N supply. In our study, both AALP and AARE are downregulated only in N-deficient leaves #4 (**Figure 7B**). However, transcription data do not always reflect protein level or activity data, as has also been reported for metabolic flux data (Schwender et al., 2014), and proteases in particular are frequently regulated at the post-transcriptional level.

# CONCLUSION

We found evidence that the sequence of senescence initiation and progression, and also the effects of N limitation, are similar in the spring OSR cultivar 'Mozart' and in winter OSR cultivars. As in winter OSR, long-term, mild N deficiency in spring OSR leads to premature shutdown of PSGs and senescence in lower canopy source leaves, whereas in upper canopy sink leaves senescence progression is delayed. The onset of senescence is accompanied by a massive reprogramming of the transcriptome. The affected regulatory and metabolic pathways are overall similar to those in Arabidopsis, but we identified transcription regulator and protein degradation genes that are specifically regulated in N-depleted lower canopy leaves or in upper leaves under ample N supply, and genes that show senescence-associated expression in oilseed rape but not in Arabidopsis. In future studies it will be interesting to address whether these genes fulfill specific tasks in N remobilization during N deficiency-induced leaf senescence and whether their regulation affects the nitrogen use efficiency of oilseed rape.

# REFERENCES


# AUTHOR CONTRIBUTIONS

VS-R: acquisition, analysis, and interpretation of data; writing the manuscript. JF: acquisition of data and design of the work. AF: design of the work. RK: conception and design of the work; acquisition, analysis, and interpretation of data; writing the manuscript.

#### FUNDING

This research was supported by the Deutsche Forschungsgemeinschaft (Forschergruppe FOR 948 grant no. KU715/10–2 to RK).

#### ACKNOWLEDGMENTS

The authors thank Samuel Arvidsson for help with QuantPrime, Stefan Bieker, and Ulrike Zentgraf for help with plant harvesting, and Christine Rausch for instructing VS-R during the early phase of the project.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.00048/full#supplementary-material

and dark/starvation-induced senescence in Arabidopsis. Plant J. 42, 567–585. doi: 10.1111/j.1365-313X.2005.02399.x


Ernst, J., and Bar-Joseph, Z. (2006). STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics 7:191. doi: 10.1186/1471-2105-7-191

Etienne, P., Desclos, M., Le Goua, L., Gombert, J., Bonnefoy, J., Maurel, K., et al. (2007). N-protein mobilisation associated with the leaf senescence process in oilseed rape is concomitant with the disappearance of trypsin inhibitor activity. Funct. Plant Biol. 34, 895–906. doi: 10.1071/Fp07088


in nitrogen starvation-induced leaf senescence are governed by leaf-inherent rather than root-derived signals. J. Exp. Bot. 66, 3669–3681. doi: 10.1093/jxb/erv170



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Safavi-Rizi, Franzaring, Fangmeier and Kunze. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# *Metarhizium brunneum* (Ascomycota; Hypocreales) Treatments Targeting Olive Fly in the Soil for Sustainable Crop Production

Meelad Yousef<sup>1</sup>, Carmen Alba-Ramírez<sup>1</sup>, Inmaculada Garrido Jurado<sup>1</sup>, Jordi Mateu<sup>2</sup>, Silvia Raya Díaz<sup>1</sup>, Pablo Valverde-García<sup>1</sup> and Enrique Quesada-Moraga<sup>1</sup>\*

<sup>1</sup> Department of Agricultural and Forestry Sciences, ETSIAM, University of Cordoba, Cordoba, Spain, <sup>2</sup> Department of Agriculture, Livestock and Fisheries, Government of Catalonia, Catalonia, Spain

#### *Edited by:*

Leire Molinero-Ruiz, Instituto de Agricultura Sostenible (CSIC), Spain

#### *Reviewed by:*

Fernando E. Vega, Agricultural Research Service (USDA), United States Paula Baptista, Polytechnic Institute of Bragança, Portugal

> *\*Correspondence:* Enrique Quesada-Moraga equesada@uco.es

#### *Specialty section:*

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

*Received:* 29 November 2017 *Accepted:* 01 January 2018 *Published:* 23 January 2018

#### *Citation:*

Yousef M, Alba-Ramírez C, Garrido Jurado I, Mateu J, Raya Díaz S, Valverde-García P and Quesada-Moraga E (2018) Metarhizium brunneum (Ascomycota; Hypocreales) Treatments Targeting Olive Fly in the Soil for Sustainable Crop Production. Front. Plant Sci. 9:1. doi: 10.3389/fpls.2018.00001

Soil treatments with the Metarhizium brunneum EAMa 01/58-Su strain conducted in both Northern and Southern Spain reduced the olive fly (Bactrocera oleae) population density emerging from the soil during spring by up to 70% in treated plots compared with controls. A model to determine the influence of rainfall on conidial wash into different soil types was developed; most of the conidia were retained in the first 5 cm, regardless of soil type, with relative percentages of conidia recovered ranging between 56 and 95%. Furthermore, the possible effect of UV-B exposure time on the pathogenicity of this strain against B. oleae adults coming from surviving preimaginals and carrying conidia from the soil at adult emergence was also evaluated. UV-B irradiance had no significant effect on M. brunneum EAMa 01/58-Su pathogenicity, with B. oleae adult mortalities of 93, 90, 79, and 77% after 0, 2, 4, and 6 h of UV-B exposure, respectively. As a next step toward the use of these M. brunneum EAMa 01/58-Su soil treatments within a B. oleae IPM strategy, the possible effect of the strain on the cosmopolitan B. oleae parasitoid Psyttalia concolor, its compatibility with the herbicide oxyfluorfen 24% commonly used in olive orchards, and the possible presence of the fungus in olive oil produced from olives previously placed in contact with the fungus were investigated. Only the highest conidial concentration (1 × 10<sup>8</sup> conidia ml−<sup>1</sup>) caused significant P. concolor adult mortality (22%), with ensuing mycosis in 13% of the cadavers. There were no fungal propagules in olive oil samples resulting from olives previously contaminated by EAMa 01/58-Su conidia. Finally, the strain was demonstrated to be compatible with the herbicide, since the soil application of the fungus reduced the B. oleae population density by up to 50% even when it was mixed with the herbicide in the same tank. The fungal inoculum reached basal levels 4 months after treatment (1.6 × 10<sup>3</sup> conidia g soil−<sup>1</sup>). These results reveal both the efficacy and the environmental and food safety of this B. oleae control method, which protects olive groves and improves olive oil quality without negative effects on the natural enemy P. concolor.

Keywords: olive oil production, soil treatment, entomopathogenic fungi, microbial control, *Psyttalia concolor*, *Bactrocera oleae*

# INTRODUCTION

There is a need to develop effective, economically viable, and environmentally friendly methods for pest control (Nicolopoulou-Stamati et al., 2016), which has become even more critical for insect pests that have developed insecticide resistance, such as the olive fruit fly Bactrocera oleae Rossi (Diptera: Tephritidae) (Kakani et al., 2010; Hsu et al., 2015). This monophagous and multivoltine species is the most destructive pest of the olive crop worldwide (Daane and Johnson, 2010), reducing not only crop production but, even more importantly, olive oil quality (Mraicha et al., 2010; Medjkouh et al., 2016; Caleca et al., 2017). The importance of this insect pest has been aggravated by irrational repeated aerial spray applications of chemical insecticides targeting B. oleae adults for more than 60 years (Haniotakis, 2005). Although most B. oleae control efforts have targeted the adult stage, from mid-autumn onwards larvae of the last generation of B. oleae fall to the ground to pupate ∼3 cm below the soil surface beneath the tree canopy, which offers a great opportunity for effective control of B. oleae (Dimou et al., 2003; Ekesi et al., 2007). In this scenario, microbial control of soil-dwelling stages of insect pests is among the most promising alternatives to synthetic chemical pesticides (Eilenberg and Hokkanen, 2006). In particular, entomopathogenic fungi have gained importance among the entomopathogenic microorganisms, mainly due to their unique contact mode of action through the cuticle (Quesada-Moraga and Santiago-Álvarez, 2008). In addition, entomopathogenic fungi are naturally distributed in a wide range of habitats and the soil is considered their natural reservoir (Quesada-Moraga et al., 2007; Pell et al., 2010; Garrido-Jurado et al., 2015); therefore, soil application of entomopathogenic fungi to target soil-dwelling stages of insect pests could be a powerful and sustainable pest management strategy (Rogge et al., 2017). Yousef et al. (2017) have demonstrated the efficacy of soil treatments under the olive tree canopy using the M. brunneum EAMa 01/58-Su strain (hereafter referred to as M. brunneum) for B. oleae control, as well as the compatibility of M. brunneum with commercial herbicides under laboratory conditions (Yousef et al., 2015), while Garrido-Jurado et al. (2011a) have demonstrated the lack of negative direct or indirect impact of such treatments on the olive crop soil-dwelling non-target arthropod population.

The use of entomopathogenic fungi has successfully reduced the adult B. oleae spring population by 50–70%, proving to be an effective, economically viable, and environmentally friendly method for B. oleae management. However, several aspects of the method related to efficacy, food safety and environmental sustainability still need to be addressed. In this study, food safety, quality, and sustainability are targeted through (1) the compatibility of the M. brunneum strain with the herbicide oxyfluorfen 24% EC, which is commonly used in olive orchards in Andalusia (Spain); (2) the efficacy of this strain in different climatic conditions (North and South of Spain); (3) the effect of soil type and rainfall on the movement of conidia into the soil; (4) the effect of the exposure time to UV-B radiation on the virulence of this strain against B. oleae adults emerging from surviving preimaginals and carrying conidia from the soil at adult emergence; (5) the effect of the fungus on the cosmopolitan parasitoid Psyttalia concolor (Szépligeti) (Hymenoptera: Braconidae); and (6) the presence of the fungus in the olive oil.

# MATERIALS AND METHODS

#### Fungal Strain, Cultivation, and Inoculum Production

Metarhizium brunneum was obtained from the culture collection at the Agricultural and Forestry Sciences and Resources (AFSR) Department of the University of Cordoba, Spain. This strain was originally isolated from soil in a wheat plantation at Hinojosa del Duque, Cordoba, Spain. The strain was deposited in the Spanish collection of culture types (CECT) with accession number CECT 20764. The cultivation and inoculum production for the laboratory and field experiments were done as described by Yousef et al. (2017).

#### Insects Used in the Laboratory Bioassays

Bactrocera oleae adults used in the laboratory bioassays were obtained from naturally infested fruit collected from September to December in the Cordoba area. The infested olives were maintained as described by Yousef et al. (2013) to obtain the adults. P. concolor were originally obtained from a population at the Technical University of Madrid (UPM, Madrid, Spain), and then a stock colony was maintained at the Department of Agricultural and Forestry Sciences of the University of Cordoba in a rearing chamber set at 25 ± 2°C, 50–60% RH, and 16:8 h (L:D) following the standard procedure developed by Jacas and Viñuela (1994).

#### Fungus-Herbicide Compatibility under Field Conditions

The field experiment was conducted in a commercial olive orchard to evaluate the compatibility of M. brunneum with the herbicide oxyfluorfen 24% EC. This experiment was performed in Castro del Río, Cordoba, Spain (37° 41′ 9.5′′N, 4° 29′ 38.7′′W; 227 masl). The experimental site was divided into six 1-ha square sub-fields (≈98 olive trees each; olive variety: Picual). Two of these sub-fields were the M. brunneum-treated plots, two were the fungus- and oxyfluorfen-treated plots and the others were the control plots. Soil application of the fungus was performed once in autumn (October–November) to target prepupating third-instar olive fruit fly larvae that exit from the fruits to the ground to pupate beneath the tree and spend the winter in the pupal stage (Santiago-Álvarez and Quesada-Moraga, 2007). The soil beneath each tree canopy in the olive orchards was sprayed with 1 l of M. brunneum suspension [which contained 1 g of conidia or 1 × 10<sup>9</sup> conidia and the herbicide at the recommended field concentration (2 l ha−<sup>1</sup>)].

To evaluate the compatibility of M. brunneum with the herbicide oxyfluorfen 24% EC, two types of monitoring were performed after the simultaneous treatment. The first consisted of monitoring the persistence of the fungal strain in the soil of both the fungus-treated plots and the fungus- and herbicide-treated plots. Before the treatment, six completely randomized soil samples were collected using a soil corer (5 cm diam) to a depth of 15 cm to determine the natural presence of indigenous entomopathogenic fungi in the soil according to Goettel and Inglis (1996). After treatment, soil samples from fungus-treated and fungus- and oxyfluorfen-treated plots were collected monthly beneath the canopy for 6 months, following the same procedure. To assess the conidial density in each sample, the number of colony-forming units (CFU) per gram of dry soil was determined using Sabouraud chloramphenicol agar medium in Petri dishes (Goettel and Inglis, 1996). Rainfall data were obtained from the climatological stations operated by Junta de Andalucía (Red de Alerta e Información Fitosanitaria-RAIF).

In the second monitoring, the adult population dynamics were compared in treated and control plots using a combination of pheromone yellow and McPhail traps. A total of eight traps (five yellow traps and three McPhail traps) in each plot (treated and control) were randomly distributed and inspected weekly to count the number of B. oleae adults. McPhail traps were baited with diammonium phosphate 4%. To improve the count accuracy, each plot (both treated and control) was surrounded with 120 Olipe traps (one per tree) (Caballero, 2002; Altolaguirre-Obrero et al., 2003) to reduce as much as possible the entry of adults from other farms. These traps were baited with a protein-based attractant, which is the most effective for B. oleae (Ruiz-Torres, 2010).

#### Fungal Effectiveness in Northern Spain

The second experiment assessed the effectiveness of M. brunneum for B. oleae control under Northern Spain climatic conditions. The experiment was performed in Rodonyà (Tarragona), in Northern Spain (41° 16′ 38.438′′N, 1° 24′ 7.297′′W; 312 masl). The experimental site was divided into four 0.5-ha square sub-fields (ca. 40 olive trees each; olive variety: Menya). Two of these sub-fields were the fungus-treated plots, and the others were the control plots. Soil application of the fungus was performed twice, once in autumn as described above and once in late spring (June–July) to target the emerging adults. At both sites, integrated pest management was applied as authorized by the Spanish Ministry of Agriculture, Food, and Environment (MAGRAMA, 2014). All treatments were performed following the same protocol described by Yousef et al. (2017).

The B. oleae adult monitoring described above was used to evaluate the effectiveness of the M. brunneum strain in soil applications in Northern Spain.

#### Effect of Soil Type and Rain Volume on the Movement of Conidia in Soil

Four soils and one sandy substrate were used. The physicochemical properties of these soils are shown in **Table 1**. Further details of the analytical procedures used in this study have been previously described by Cañasveras et al. (2010). For the substrate preparation, quartz sand of aeolian origin was sieved to 0.2–0.5 mm, washed with a large volume of tap water enriched with Na<sub>2</sub>CO<sub>3</sub> (pH 9.5) to disperse clay, washed with deionized water to remove salts, and finally dried in an oven at 40°C. Beforehand, the field capacity of each soil was estimated by saturation with water. For this purpose, five funnel-like glass columns (**Figure 1**) (140 mm height and 40 mm diameter) were packed with 100 g of soil, with a piece of cotton placed in the bottom to keep the substrate in place. Deionized water was added slowly and at short intervals, ensuring that the water did not reach the lower edge of the funnel. The column was capped with Parafilm® and small holes were made with a needle. After 48 h, a portion of soil was taken from the central part of the funnel, discarding the top. The portion of soil collected was weighed (P2) in a weighing bottle of known weight (P1), placed in an oven at 105°C for 48 h, allowed to cool and reweighed (P3). Field capacity was determined using the following equation:

$$\text{Field capacity}\ (\%) = \frac{P2 - P3}{P3 - P1} \times 100$$
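As a worked illustration of the equation above, the following minimal Python sketch computes field capacity from the three weighings. The function name and the example masses are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def field_capacity_percent(p1: float, p2: float, p3: float) -> float:
    """Water held at field capacity as a percentage of oven-dry soil mass.

    p1: mass of the empty weighing bottle (g)
    p2: mass of the bottle plus moist soil (g)
    p3: mass of the bottle plus oven-dried soil (g)
    """
    dry_soil = p3 - p1  # oven-dry soil mass
    if dry_soil <= 0:
        raise ValueError("p3 must exceed p1 so that dry soil mass is positive")
    water = p2 - p3  # water lost during oven drying
    return water / dry_soil * 100.0


# Hypothetical masses, for illustration only.
print(field_capacity_percent(p1=20.0, p2=32.0, p3=30.0))
```

With these illustrative masses, 2 g of water is held by 10 g of dry soil, i.e., a field capacity of about 20%.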

To determine the influence of the volume of effluent added on the adsorption of conidia, the experiment was set up as shown in **Figure 1**. The five glass columns were packed with 100 g of sterile soil (autoclaved for 60 min twice at an interval of 24 h) representing model combinations of texture (sandy or clay) and pH (acid or alkaline); a piece of cotton was placed in the bottom to keep the substrate in place. Then, 5 ml of a 10<sup>8</sup> conidia/ml suspension was added to the soil surface. The columns were then run with the test volume (i.e., 60, 85, 100, 140, 250, 300, or 400 ml) of 0.002 M CaCl<sub>2</sub>, added dropwise with a glass burette. An Erlenmeyer flask was placed at the bottom to collect the percolated effluent, and the collected volume was recorded.

To determine the number of CFU, each column was divided into three equal sections or depths (A, B, and C; ca. 5 cm each). One gram of soil from each section was suspended in 10 ml of deionized water with Tween 80 (0.1% v/v) and shaken at 12 rpm in a Model 3000445 Orbit rotary stirrer (J.P. Selecta, Barcelona, Spain) for 90 min. Aliquots of 0.1 ml were then spread with a Drigalski spatula on Sabouraud dextrose agar supplemented with 0.5 g l−<sup>1</sup> of chloramphenicol (SDAC, Biolife, Italy). Aliquots of 0.1 ml of the collected effluent (E) were also spread on the SDAC medium. In some cases, dilutions were necessary before spreading. The plates were cultured at 25°C for 3–4 days and CFU were counted. Fungal growth was visually identified: M. brunneum colonies exhibited circular growth, were largely white and contained varying shades of green in the central mycelia (Humber, 1997).
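The plate counts described above can be converted to CFU per gram of soil with a standard back-calculation. The Python sketch below is an assumption about how such a conversion is typically done; the paper does not state the exact formula. The function name, parameter names, and the example colony count are hypothetical, while the default volumes and soil mass follow the protocol (1 g of soil in 10 ml, 0.1 ml plated).

```python
def cfu_per_gram(
    colonies: int,
    dilution_factor: float = 1.0,
    plated_ml: float = 0.1,
    suspension_ml: float = 10.0,
    soil_g: float = 1.0,
) -> float:
    """Back-calculate CFU per gram of soil from a single plate count.

    colonies: colonies counted on the plate
    dilution_factor: fold dilution applied before plating (1.0 = undiluted)
    plated_ml: volume spread on the plate (ml)
    suspension_ml: volume in which the soil sample was suspended (ml)
    soil_g: mass of soil suspended (g)
    """
    if min(dilution_factor, plated_ml, suspension_ml, soil_g) <= 0:
        raise ValueError("volumes, mass and dilution factor must be positive")
    cfu_per_ml = colonies * dilution_factor / plated_ml  # in the undiluted suspension
    return cfu_per_ml * suspension_ml / soil_g


# Hypothetical count: 45 colonies on a plate spread from a 10-fold dilution.
print(cfu_per_gram(colonies=45, dilution_factor=10.0))
```

For the effluent (E), the same first step applies, giving CFU per ml of effluent rather than per gram of soil.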

#### Effect of UV-B Radiation Exposure Time on the Pathogenicity of *M. brunneum* against *B. oleae* Adults

The irradiation experiment was conducted in a temperature-controlled chamber (Fitoclima S600PL, ARALAB, Portugal) at a constant temperature of 25 ± 1°C. To filter out the radiation below 290 nm, the irradiated samples (UV-B treatments) were protected with a 0.13 mm-thick cellulose-diacetate film (Clarifoil, Texas, USA), which allowed passage of most of the UV-B and UV-A radiation but blocked UV-C radiation (<280 nm). In parallel, a "no UV-B treatment" was performed, in which cages were wrapped in aluminum foil to prevent UV-B exposure during the irradiation.

TABLE 1 | Geographical location and physicochemical properties of the soil samples used in this work.


OM, organic matter; EC, electrical conductivity; CEC, cation exchange capacity; Fed, dithionite extractable iron (Mehra and Jackson, 1958); DC, dispersable clay.

An M. brunneum conidial suspension (10<sup>9</sup> conidia ml−<sup>1</sup>) was prepared. Then, newly emerged (24 h) B. oleae adults were cold-anesthetized and treated with 1 ml of conidial suspension using a Potter Spray Tower (Burkard Rickmansworth Co, Rickmansworth, UK). There were two controls: the first was treated with the same volume of a sterile 0.1% Tween 80 aqueous solution and then irradiated, and the other was treated with the fungus and not irradiated. Following treatment, the insects were placed in the above-mentioned methacrylate cages. Then, the UV-B treatment (inoculated with the fungus and covered with cellulose-diacetate film) was irradiated at 1200 mW m−<sup>2</sup> for 2, 4, or 6 h. The no UV-B treatment (inoculated with the fungus and covered with aluminum foil) was irradiated for 6 h. Adult diet and water were provided ad libitum. Three replicates of 10 insects each were used for the UV-B and no UV-B treatments. Mortality was monitored for 10 days. Dead flies were removed daily and processed to assess mycosis.

#### Virulence Assay of *M. brunneum* against *P. concolor* Adults

Virulence of M. brunneum against P. concolor adults was evaluated under laboratory conditions. Four M. brunneum concentrations were prepared in a sterile aqueous solution that contained 0.1% Tween 80 (10<sup>5</sup>, 10<sup>6</sup>, 10<sup>7</sup>, and 10<sup>8</sup> conidia ml−<sup>1</sup>), with the control consisting of no conidia. Newly emerged P. concolor adults that had previously been cold-anesthetized were treated with a Potter Spray Tower. Half a milliliter of conidial suspension was used for each replicate, with three replicates per treatment (10 adults each). After the treatment, adults were placed in methacrylate cages (80 × 80 × 60 mm, Resopal, Madrid, SP) with lids containing a 20 mm diam circular hole covered with a net cloth. Adult (treated and control) parasitoids were fed on honey. Mortality data were recorded for 10 days. Dead parasitoids were removed daily and the development of mycosis on the cadavers was determined.

### Evaluation of the Presence of Fungal Structures in Olive Oil

To assess the possible presence of M. brunneum conidia in olive oil after the treatment, two laboratory experiments were performed. In the first one, the olives were directly exposed to the fungus, creating a "worst case" scenario. For that, three olive samples (1 kg each) were placed on trays (690 cm<sup>2</sup>) and sprayed with 5.75 ml of 10<sup>9</sup> conidia ml−<sup>1</sup> with an Aerograph 27085 (piston compressor of 23 l min−<sup>1</sup>, 15–50 PSI, nozzle diameter of 0.3 mm, China). After treatment, olives remained in the trays for 48 h, and then olives from different trays were mixed and processed in three different groups before milling: (1) olives from the first group were rinsed in water for 1 min; (2) olives from the second group were surface-sterilized with 1% sodium hypochlorite; and (3) olives from the third group remained unwashed. Finally, olives from each group were processed into oil following the extraction process described by Vossen (2007) with some modifications. Olives were crushed for 2 min at 37°C using a Thermomix® TM31 (Vorwerk, Wuppertal, D). After grinding, the paste was warmed in the same Thermomix for 30 min at 27.5°C. Then, the oil was extracted through a combination of pressing and centrifugation at ∼10000 rpm.

In the second experiment, the olives were indirectly exposed to the fungus, creating a "real case" scenario for M. brunneum soil application. The above-mentioned trays were prepared with 500 g of soil that covered the entire base. The soil used in this study was collected from a farm in Córdoba and was characterized as sandy loam (78.0% sand, 17.0% silt, 5.0% clay, and 0.2% organic matter) with a pH of 8.3. The soil was sieved (2-mm mesh) and stored in a dry place at ∼25°C. The soil was then sterilized at 121°C for 20 min and dried in an oven at 105°C for 24 h. Then, soils were sprayed with 5.75 ml of the fungal suspension (10<sup>9</sup> conidia ml−<sup>1</sup>) using the above-described aerograph. After treatment, soil samples were collected from the bioassay trays. The conidial density after treatment was assessed using the CFU per gram of dry soil. Then, olives were dropped onto soil (treated and control). Olive samples were taken from the trays every 4 days for 16 days. Three replicates (trays) per sampling date were used for treatment and control. Finally, olive oil of each sampling date was extracted following the above-described protocol.

For both experiments, the possible presence of the fungus was evaluated according to the CFU method: 100 µl of each oil sample was spread onto medium and incubated at 25°C for 15 days.

#### Statistical Analysis

The area under the B. oleae flight curves (AUBFC) in treated and control plots was calculated using the trapezoidal integration method in SAS (Campbell and Madden, 1990). Then, the values of AUBFC were log<sub>10</sub> transformed and subjected to factorial analysis of variance using Statistix 9.0 (Analytical Software 2008). The same program was used to analyze mortality data. Average survival times were obtained by the Kaplan-Meier method, compared using the log-rank test, and calculated with SPSS 15.0 Software for Windows. Replicates in time, for all experiments, were analyzed as series of experiments with the model y = treatment + experiment + treatment × experiment (Littell et al., 2006). Since the effects of experiment and of the treatment × experiment interaction were not significant, replicates from both experiments were combined in a model with only treatment as a factor (one-way ANOVA). Mortality data were transformed, $y = \arcsin\sqrt{\%\,\text{Mortality}/100}$, to improve normality and homogeneity of variance, both requirements for linear model analysis. Means from different treatments were compared using Tukey's test (P < 0.05).

The effect of soil type and rain volume on the relative percentage of M. brunneum recovered was evaluated with a generalized linear model for ordinal data (proportional odds model). This proportional odds model is the standard generalized linear model for ordinal regression (Stroup, 2012) and is appropriate for this experiment since the response is measured as ordinal data, expressed as the relative or cumulative percentage of conidia in each section of the soil column. The dependent variable or response is the four possible classes or soil sections (A, B, C, and E). This model calculates the cumulative probability or proportion of conidia at each soil section or in the sections above, i.e., P(conidia ≤ A) is the probability or proportion of conidia in section A; P(conidia ≤ B) is the probability or proportion of conidia in sections A and B; P(conidia ≤ C) is the probability or proportion of conidia in sections A, B, and C; and P(conidia ≤ E) is the probability or proportion of conidia in the effluent (E) or in any of the sections, which equals 100%. The proportion of conidia in one specific section can then be calculated as the difference between contiguous cumulative probabilities, e.g., for the proportion of conidia in section B, P(conidia = B) = P(conidia ≤ B) − P(conidia ≤ A). The equation of the model is:

$$\begin{aligned} \eta_{cijk} &= \eta_c + \text{Soil type}_i + \text{Rain volume}_j \\ &\quad + (\text{Soil type} \times \text{Rain volume})_{ij}, \quad c\ (\text{class}) = A, B, C, E \end{aligned}$$

where rain volume is modeled as a continuous factor or covariate, the subindex k refers to the replicate, observations follow a multinomial distribution, and the link function is the cumulative logit.
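The step from cumulative logits to per-section proportions can be sketched in Python as follows. This is only an illustration of the cumulative logit link and the differencing of contiguous cumulative probabilities; it does not fit the model. It assumes that each input is already the full linear predictor for P(conidia ≤ c) under the sign convention used here, and the function name and example values are hypothetical.

```python
import math

SECTIONS = ("A", "B", "C", "E")


def section_proportions(eta_a: float, eta_b: float, eta_c: float) -> dict:
    """Convert cumulative logits for classes A, B, C into proportions per section.

    Each eta is assumed to be the linear predictor for P(conidia <= class).
    The cumulative probability for the last class (E) is 1 by definition.
    """
    if not (eta_a <= eta_b <= eta_c):
        raise ValueError("cumulative logits must be non-decreasing from A to C")
    # Inverse of the cumulative logit link, plus the fixed value 1.0 for class E.
    cumulative = [1.0 / (1.0 + math.exp(-eta)) for eta in (eta_a, eta_b, eta_c)]
    cumulative.append(1.0)
    # Proportion in a section = difference between contiguous cumulative values.
    proportions = [cumulative[0]]
    for previous, current in zip(cumulative, cumulative[1:]):
        proportions.append(current - previous)
    return dict(zip(SECTIONS, proportions))


# Hypothetical linear predictors, for illustration only.
for section, share in section_proportions(1.5, 3.0, 4.5).items():
    print(section, round(100 * share, 1))
```

The four proportions always sum to 1, which mirrors the statement that P(conidia ≤ E) equals 100%.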

For the generalized linear model, the estimation method was maximum likelihood with Laplace approximation. Model significance was evaluated with a χ<sup>2</sup> test and the significance of the fixed effects was evaluated with an approximate F-test (α = 0.05). Estimated cumulative probabilities for soil types were compared with an odds ratio test. If the confidence interval for the ratio includes 1, the two soils are not significantly different (Stroup, 2012).
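The two data transformations used in this section, trapezoidal integration of the flight curve followed by a log<sub>10</sub> transformation and the arcsine square-root transformation of percent mortality, can be illustrated with a short Python sketch. The study itself used SAS, Statistix, and SPSS; this code is a stand-in for illustration, and the function names and trap counts are hypothetical.

```python
import math


def area_under_flight_curve(days, flies_per_trap_day):
    """Trapezoidal area under a curve of captures (flies per trap/day) over time (days)."""
    if len(days) != len(flies_per_trap_day) or len(days) < 2:
        raise ValueError("need two or more paired observations")
    area = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]
        if width <= 0:
            raise ValueError("sampling days must be strictly increasing")
        area += width * (flies_per_trap_day[i] + flies_per_trap_day[i - 1]) / 2.0
    return area


def arcsine_sqrt(percent_mortality):
    """Arcsine square-root transformation of a percentage (result in radians)."""
    if not 0.0 <= percent_mortality <= 100.0:
        raise ValueError("percent mortality must lie between 0 and 100")
    return math.asin(math.sqrt(percent_mortality / 100.0))


# Hypothetical weekly trap inspections, for illustration only.
aubfc = area_under_flight_curve([0, 7, 14, 21], [0.0, 1.0, 2.0, 0.0])
print(aubfc, math.log10(aubfc))
print(arcsine_sqrt(50.0))
```

The log<sub>10</sub> transformation requires a positive area, so plots with no captures at all would need separate handling.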

# RESULTS

#### Fungus-Herbicide Compatibility and Fungal Effectiveness in Northern Spain

The soil applications of M. brunneum in both Northern and Southern Spain reduced the B. oleae population in treated plots compared with controls (**Figure 2**). The reduction in the B. oleae adult population emerging from the treated plots compared with the control plots was significant even when the fungus was mixed with the herbicide (P < 0.05). In the Southern Spain experiment, the maximum captures of adult flies were 0.9 and 1.5 flies per trap/day in the fungus-treated and fungus-plus-herbicide-treated plots, respectively, compared with 2.6 flies per trap/day in the control plots. In Northern Spain, the maximum captures of adult flies were 65 flies per trap/day in the fungus-treated plots compared with 115 flies per trap/day in the control plots. The M. brunneum soil density decreased after the autumn treatments in relation to the monthly rainfall and the date of fungus application (**Figure 3**). However, the fungus persisted in the soil in both the fungus-treated and the fungus-plus-herbicide-treated plots, where the fungal concentrations 4 months after treatment were 1.2 × 10<sup>3</sup> and 2.3 × 10<sup>3</sup> conidia per gram of soil, respectively (**Figure 3**).
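To put the peak captures reported above on a common scale, the short Python sketch below expresses each treated peak as a percentage reduction relative to its control. It is a worked arithmetic example only: it compares maximum captures, whereas population-level reductions derived from whole flight curves (AUBFC) need not give the same figures. The helper name `percent_reduction` is introduced here for illustration.

```python
def percent_reduction(treated, control):
    """Relative reduction of a treated value versus its control, in percent."""
    return 100.0 * (1.0 - treated / control)


# Peak captures (flies per trap/day) as reported in the text: (treated, control).
peaks = {
    "Southern Spain, fungus": (0.9, 2.6),
    "Southern Spain, fungus + herbicide": (1.5, 2.6),
    "Northern Spain, fungus": (65.0, 115.0),
}

reductions = {name: percent_reduction(t, c) for name, (t, c) in peaks.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower peak capture than control")
```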

Oxyfluorfen 24% EC. (B) Fungus efficacy under Northern Spain climatic conditions.

#### Effect of Soil Type and Rain Volume on the Movement of Conidia in Soil

The analysis of the effect of soil type and rain volume on the relative percentage of M. brunneum recovered showed a significant effect of soil type [F(4,130) = 33.64, P < 0.0001], rain volume [F(1,130) = 390.80, P < 0.0001], and the soil type × rain volume interaction [F(4,130) = 8.42, P < 0.0001]. The whole model was significant (χ<sup>2</sup> = 2347.08, 9 df, P < 0.0001).

The effect of rain volume on the relative percentage of conidia recovered for sections from A to E was inversely proportional for the series AG51, AG53, AG35, INM9, and FOCS.

The soils with the highest retention in the first section (A) were the clayey soils AG51 and AG53, with relative percentages of conidia recovered ranging between 83 and 97% over the range of water volumes evaluated. There were no significant differences between the retention profiles of these two soil types (odds ratio test, 95% CI = 0.96–2.77) (**Table 2** and **Figure 4**). Retention in the first section was significantly lower in the sandy soils AG35 and INM9 than in the clayey soils AG51 and AG53. In the sandy soils, the relative percentage of conidia in section A ranged between 55 and 91%, with cumulative values for sections A and B ranging between 85 and 98%. There were no significant differences between the profiles of AG35 and INM9 (odds ratio test, 95% CI = 0.77–1.44). The soil type with significantly lower retention in the first sections was FOCS, with 23.4% of conidia in section C and 9.1% in the effluent at the maximum rain volume of 400 ml.
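The decision rule behind the odds ratio comparisons (two soils are not significantly different when the confidence interval for the odds ratio includes 1) can be written out explicitly. The sketch below applies it to the two 95% confidence intervals reported in the preceding paragraph; the helper name `includes_one` is hypothetical and introduced only for illustration.

```python
def includes_one(lower, upper):
    """True when a confidence interval for an odds ratio contains 1."""
    return lower <= 1.0 <= upper


# 95% confidence intervals reported in the text for the pairwise comparisons.
comparisons = {
    "AG51 vs AG53": (0.96, 2.77),
    "AG35 vs INM9": (0.77, 1.44),
}

for pair, (lower, upper) in comparisons.items():
    if includes_one(lower, upper):
        verdict = "not significantly different"
    else:
        verdict = "significantly different"
    print(f"{pair}: 95% CI {lower}-{upper} -> {verdict}")
```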

### Effect of UV-B Radiation Exposure Time on the Virulence of *M. brunneum* EAMa 01/58-Su Strain against *B. oleae* Adults

The virulence of M. brunneum was slightly higher in non-irradiated insects than in UV-B-irradiated insects, with mortality ranging from 16.6% (non-treated and non-irradiated adults) to 92.9% (treated and non-irradiated adults), and fungal outgrowth from cadavers reaching 73.3% (**Table 3**). However, the average survival time (AST) of fungus-treated and irradiated adults was statistically equal to the AST of fungus-treated and non-irradiated adults, and both were statistically lower than the AST of non-treated and irradiated adults or non-treated and non-irradiated adults (**Table 3**).
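For readers unfamiliar with how an AST limited to a fixed observation window is obtained, the following plain-Python sketch computes a Kaplan-Meier mean survival time restricted to 10 days as the area under the estimated survival curve. It is an illustration under stated assumptions, not the SPSS procedure used in this study; the function name and the survival data are hypothetical, and the log-rank comparison is not shown.

```python
def restricted_mean_survival(times, events, limit):
    """Kaplan-Meier mean survival time restricted to `limit` days.

    times  : day of death or of censoring for each insect
    events : 1 if the insect died on that day, 0 if it was censored
    limit  : upper bound of the observation window (e.g., 10 days)
    """
    at_risk = len(times)
    survival = 1.0
    area = 0.0
    previous_time = 0.0
    for t in sorted(set(times)):
        if t > limit:
            break
        # The survival estimate is constant between consecutive event times.
        area += survival * (t - previous_time)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
        at_risk -= leaving
        previous_time = t
    area += survival * (limit - previous_time)
    return area


# Hypothetical data: four insects die, two are still alive (censored) at day 10.
times = [3, 4, 4, 6, 10, 10]
events = [1, 1, 1, 1, 0, 0]
ast = restricted_mean_survival(times, events, limit=10)
print(f"AST limited to 10 days: {ast:.2f} days")
```

When no insect is censored before the end of the window, as in this example, the restricted mean equals the plain average of the survival times truncated at the limit, which gives a quick sanity check on the calculation.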

#### Virulence Assay for *M. brunneum* against *Psyttalia concolor* Adults

P. concolor adult mortality due to M. brunneum differed significantly among conidial concentrations (P < 0.001). However, only the highest concentration (10<sup>8</sup> conidia ml<sup>−1</sup>) caused significant P. concolor adult mortality (21.6%) (**Figure 5**). P. concolor adult mortality caused by M. brunneum ranged between 1.6 and 6.6% at 10<sup>5</sup> to 10<sup>7</sup> conidia ml<sup>−1</sup>, compared with a control mortality of 1.6%. Furthermore, the fungal treatment significantly influenced mycosis on the cadavers (P < 0.001), with mycosis on 13.3, 1.6, 0, and 0% of the cadavers treated with 10<sup>8</sup>, 10<sup>7</sup>, 10<sup>6</sup>, and 10<sup>5</sup> conidia ml<sup>−1</sup>, respectively. Only the highest concentration (10<sup>8</sup> conidia ml<sup>−1</sup>) was statistically different from the control (**Figure 5**).

### Evaluation of the Presence of Fungal Structures in the Olive Oil

No fungal propagules were detected in any of the olive oil samples obtained from both olive experiments, in which olives were directly or indirectly exposed to the fungus. There were no CFU in any of the Petri plates inoculated with the different olive oil samples.

# DISCUSSION

Over the last decades, the development of biopesticides based on entomopathogenic fungi has become a very active line of work, since these products constitute an environmentally friendly alternative to synthetic chemical pesticides (Sinha et al., 2016). Entomopathogenic fungi have been investigated mainly for aboveground pest control, whereas their potential for the control of soil-dwelling pests has been mostly ignored (Jackson et al., 2000). We had already developed a pioneering method based on soil application of M. brunneum for olive fruit fly control, targeting third-instar larvae in the soil during autumn and adults emerging from the soil during spring (Yousef et al., 2017).

The results of the present work provide additional evidence of the potential of this strategy for developing safe and sustainable IPM for olive pests.

Metarhizium brunneum mixed with the herbicide Oxyfluorfen 24% EC in the atomizer tank and then applied beneath the olive tree canopy reduced the B. oleae population that emerged during spring from the soil of treated plots, compared with control plots, by about 43% (fungus + herbicide treatment) and 65% (fungus-alone treatment). This in vivo compatibility of M. brunneum with the herbicide confirms our previous laboratory results (Yousef et al., 2015) and allows simultaneous application of the fungus and the herbicide, reducing the application costs of both. In addition, it is noteworthy that this reduction in B. oleae population density resulted from a single M. brunneum soil application in autumn. In our previous study, by contrast, the soil application was performed twice, targeting preimaginals in the soil in autumn and adults at emergence from the soil in spring, and provided a 50 to 70% reduction in B. oleae population density (Yousef et al., 2017). The persistence of the fungus in the soil was not affected by mixing it with the herbicide: M. brunneum persisted in the soil for 4 months after both the fungus treatment and the simultaneous fungus-herbicide treatment, at a level near the natural background concentration recorded in bulk soil (Bruck, 2010; Scheepmaker and Butt, 2010). This is noteworthy because the decline of entomopathogenic fungal densities over time to an acceptable background level is required by EU regulations for registration purposes (Scheepmaker and Butt, 2010). This gradual decrease over time in the density of entomopathogenic fungi in the soil may be influenced by edaphic, biotic, climatic, and cultural factors (Quesada-Moraga et al., 2007; Scheepmaker and Butt, 2010; Garrido-Jurado et al., 2011b).
The effect of rainfall on the fate of fungal propagules in soil is the least understood; it has been shown to affect the foliar use of entomopathogenic fungi and their dispersion in soil, albeit in relation to crop residue (Bruck and Lewis, 2002; Jaronski, 2010). Many studies have suggested that rainfall is an important abiotic factor affecting the vertical mobility of conidia in the soil (Inglis et al., 2001; Garrido-Jurado et al., 2011b). However, this is the first time that the fate of the fungal inoculum along the soil profile has been examined after soil treatment with entomopathogenic fungi. The results of the present work may serve as a model for estimating the fungal inoculum contained in each soil section according to soil type and amount of rain. More than 50% of the conidia were retained in the first 5 cm of the soil regardless of soil type and rain amount, which would guarantee contact between the fungal propagules and B. oleae preimaginals in the soil, since around 80% of the third-instar prepupating olive fruit fly larvae pupate ∼3 cm below the soil surface (Dimou et al., 2003).

These results reveal both the high potential of M. brunneum and the key impact of the autumn soil application for the control of B. oleae. Our study validates the effective use of M. brunneum soil applications as a pest management method in Northern Spain, where both the climatic conditions and the biological activity of B. oleae (up to 5 generations per year) differ from those in Southern Spain (Santiago-Álvarez and Quesada-Moraga, 2007). Surprisingly, similar results were obtained, with a reduction of almost 50% in the B. oleae population that emerged from the soil of treated plots during spring compared with controls.

TABLE 2 | Relative percentages (cumulative values) of conidia at different soil sections and soil types.

Values estimated with the generalized linear model for ordinal data for the exemplary cases of rain volume = 140 and 400 ml. Depth of each soil section from A to C is ca. 5 cm.

Our results also reveal that UV-B did not reduce the virulence of M. brunneum against B. oleae when adults were irradiated at 1200 mW m<sup>−2</sup> for 2, 4, and 6 h after being treated with the fungus. Although B. oleae adult mortality was slightly reduced after 6 h of exposure to UV-B, the AST of fungus-treated and irradiated adults was statistically equal to the AST of fungus-treated and non-irradiated adults. Moreover, the fungus applied to the soil during autumn to target B. oleae larvae will be protected from environmental conditions, since the soil is the main natural reservoir for native entomopathogenic fungi and provides protection from extreme conditions (Ekesi et al., 2007; Zimmermann, 2007a,b; Garrido-Jurado et al., 2015). Nevertheless, it is of interest to know how UV-B may affect fungal virulence in B. oleae adults that emerge from the soil during spring from surviving preimaginals. In general, UV-B can be particularly deleterious for entomopathogenic fungi due to its detrimental effects on various fungal biological processes (Inglis et al., 2001; Jaronski, 2010). Fernández-Bravo et al. (2017) have shown that UV-B radiation may potentially affect the virulence of M. brunneum against Ceratitis capitata adults, but the decrease in conidial density due to UV-B exposure was not enough to prevent a viable number of conidia from remaining above the threshold required to cause disease.

Once again, M. brunneum was shown to be safe for P. concolor in olive orchards. The highest P. concolor adult mortality caused by this strain was 21.6% at a concentration of 10<sup>8</sup> conidia ml<sup>−1</sup>, whereas the same concentration has caused more than 95% B. oleae adult mortality (Yousef et al., 2017). Moreover, under field conditions, direct contact between M. brunneum applied to the soil and the parasitoid is not possible. Furthermore, studies by Ekesi et al. (2005) and Daane et al. (2015) have demonstrated that fungal application targeting tephritid fruit fly preimaginals in the soil can be compatible with classical biological control in which Psyttalia spp. larval parasitoids are field released. Our previous studies have shown that soil applications of M. brunneum are safe for soil-dwelling non-target arthropod communities, especially the formicid Tapinoma nigerrimum (Hymenoptera: Formicidae), used as a bioindicator in olive groves (Garrido-Jurado et al., 2011a).

TABLE 3 | Effect of the exposure time to UV-B (1200 mW m<sup>−2</sup>) on the virulence of M. brunneum strain against adult B. oleae.

<sup>a</sup>Control 1, adult B. oleae were sprayed with 1 × 10<sup>9</sup> conidia ml<sup>−1</sup> and placed in methacrylate boxes, which were covered with cellulose-diacetate film (UV-B treatment) and aluminum foil for 6 h; Control 2, treated and non-irradiated insects placed inside the chamber for 6 h; Control 3, non-treated and non-irradiated insects placed inside the chamber for 6 h.

<sup>b</sup>AST (mean ± SE) limited to 10 days. Data in the same column followed by the same letter are not significantly different (α = 0.05) according to the log-rank test.

All olive oils obtained from olives previously exposed to M. brunneum were free of fungal propagules. The possibility of direct contact between the fungus and the olives is minimal under real conditions (soil application of the fungus), and no propagules were found even in the direct contact experiment, which represented a "worst case" scenario. To date, this is the first study that addresses the possible presence of fungal inoculum in oils obtained from olives exposed to entomopathogenic fungi. These results demonstrate the food safety of this control method compared with the chemical insecticides used for more than 60 years for B. oleae control (Haniotakis, 2005), since pesticide residues have been detected in oils obtained from treated olives (Lentza-Rizos and Avramides, 1995). In addition, olive oil extraction after different times of exposure to M. brunneum indicates that no minimum waiting period is needed between the application of the fungus and the harvest of the fruit.

This work highlights the adaptation of the olive fruit fly control method based on soil application of M. brunneum to different climatic conditions and the possibility of a simultaneous soil application of the fungus with the herbicide beneath the tree canopy, which reduces application costs. This, together with the absence of negative effects on P. concolor, a cosmopolitan parasitoid of B. oleae, demonstrates the environmental sustainability of this innovative control method. In addition, the use of entomopathogenic fungi leaves no residues in olive oil, in contrast to the use of chemical insecticides.

#### AUTHOR CONTRIBUTIONS

MY, IG, and EQ-M designed the experiments. MY, CA-R, SR, and JM performed the experiments. MY and PV-G analyzed the data. MY, IG, and EQ-M wrote the manuscript.

#### ACKNOWLEDGMENTS

The authors thank the regional government of Andalusia for grant P11-AGR-7681 (Sustainable strategies for pest control based on the establishment of rhizosphere competent and endophyte entomopathogenic fungi) and the Ministry of Economy and Competitiveness of the Spanish Government for the INNOLIVAR grant. We are also grateful to the Technical University of Madrid, represented by Dra. Elisa Viñuela, Dra. Angeles Adan Del Rio, and Sr. Luis Quiros, for kindly providing P. concolor pupae and the standard rearing procedure. I. Garrido-Jurado thanks the Ministry of Economy and Competitiveness of the Spanish Government for a Juan de la Cierva postdoctoral grant.

#### REFERENCES

Bruck, D. J., and Lewis, L. C. (2002). Rainfall and crop residue effects on soil dispersion and Beauveria bassiana spread to corn. Appl. Soil Ecol. 20, 183–190. doi: 10.1016/S0929-1393(02)00022-7


Mundial de IFOAM Sobre Olivar Ecológico: Producciones y Culturas. Puente de Génave (Jaén), 421–424.


Control: Measures of Success, eds G. Gurr and S. Wratten (Dordrecht: Springer), 271–296.


Metarhizium brunneum and its extracts. J. Econ. Entomol. 106, 1118–1125. doi: 10.1603/EC12489


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Yousef, Alba-Ramírez, Garrido Jurado, Mateu, Raya Díaz, Valverde-García and Quesada-Moraga. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Sunflower Hybrid Breeding: From Markers to Genomic Selection

Aleksandra Dimitrijevic<sup>1</sup> and Renate Horn<sup>2</sup> \*

1 Institute of Field and Vegetable Crops, Novi Sad, Serbia, <sup>2</sup> Institut für Biowissenschaften, Abteilung Pflanzengenetik, Universität Rostock, Rostock, Germany

In sunflower, molecular markers for simple traits such as fertility restoration, high oleic acid content, herbicide tolerance, or resistance to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits such as yield, heterosis, drought tolerance, oil content, or disease resistance, e.g., against Sclerotinia sclerotiorum, have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower, which are being collected and conserved worldwide, represent valuable resources for studying complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming the disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence have made novel approaches at the whole-genome level possible. Genotyping-by-sequencing and whole-genome sequencing based on next-generation sequencing technologies have facilitated the production of large numbers of SNP markers for high-density maps as well as SNP arrays, and have allowed genome-wide association studies and genomic selection in sunflower. Genome-wide or candidate gene-based association studies have been performed for traits such as branching, flowering time, and resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive with other oil crops, higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare.
Integrative approaches combining omic technologies (genomics, transcriptomics, proteomics, metabolomics and phenomics) using bioinformatic tools will facilitate the identification of target genes and markers for complex traits and will give a better insight into the mechanisms behind the traits.

Keywords: association panel, genome-wide association studies, genomic estimated breeding value, genomic selection, genome sequence, marker-assisted selection, sunflower, traits

#### Edited by:

Leire Molinero-Ruiz, Instituto de Agricultura Sostenible (CSIC), Spain

#### Reviewed by:

Felicity Vear, INRA – Auvergne Rhône-Alpes Centre, France

Lili Qi, Agricultural Research Service (USDA), United States

Ruth Amelia Heinz, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

\*Correspondence: Renate Horn, renate.horn@uni-rostock.de

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 15 October 2017 Accepted: 20 December 2017 Published: 17 January 2018

#### Citation:

Dimitrijevic A and Horn R (2018) Sunflower Hybrid Breeding: From Markers to Genomic Selection. Front. Plant Sci. 8:2238. doi: 10.3389/fpls.2017.02238

# INTRODUCTION

Sunflower represents the second most important crop based on hybrid breeding, after maize (Seiler et al., 2017). It is mainly used for its seed oil, even though the seeds of confectionary sunflower also serve as snacks. With up to 12% of the global production of vegetable oils, sunflower ranks fourth after palm, soybean and canola oil (Rauf et al., 2017). Apart from its use for human nutrition, sunflower oil has a number of industrial applications, e.g., as a basic component for polymer synthesis, biofuel, emulsifier or lubricants (Dimitrijevic et al., 2017).

Until the beginning of the 1970s, sunflower production was based on open-pollinated varieties (Vear, 2016). The events that led to the change of sunflower production to hybrid breeding were the discovery of the first cytoplasmic male sterility (CMS) source (Leclercq, 1969) and the identification of corresponding restorer genes (Kinman, 1970; Leclercq, 1971). Soon after, in 1972, the first commercial sunflower hybrid was available for production in the United States (Putt, 1978). Exploitation of heterosis for hybrid development enabled farmers to obtain higher seed and oil yields, as well as increased uniformity (Bohra et al., 2016). The development of sunflower hybrids established sunflower as a major viable crop worldwide and encouraged the founding of numerous public and private breeding centers (Skoric, 2012; Seiler et al., 2017). In recent years, the public and private sectors have contributed to assembling extensive plant genetic resources in sunflower, to identifying markers for marker-assisted selection (MAS) and to establishing the use of new high-throughput technologies in sunflower. Today, the estimated value of global sunflower production reaches \$20 billion per year (FAO, 2016).

Basic directions in sunflower hybrid breeding include developing: (1) high seed and oil yield hybrids resistant to dominant diseases and tolerant to drought, (2) hybrids with changed oil properties, (3) confectionary hybrids, (4) herbicide resistant hybrids and (5) ornamental hybrids (Jocic et al., 2015). In addition, special markets have particular demands such as (1) achene and kernel properties as well as high protein content and lower oil content (lower than 40%) in confectionary sunflower production, (2) specific fatty acid and tocopherol composition in food and non-food industry or (3) plant height, ray and disk flower color, duration of flowering in ornamental sunflower hybrid breeding. The common needs for resistance against abiotic and biotic stress as well as the special needs of the various breeding purposes require the development of markers to facilitate the introduction of different traits.

Botanically, sunflower (Helianthus annuus L.) is a member of the Asteraceae family, one of the largest and most diverse families of flowering plants. Due to the economic importance of the cultivated sunflower and the ecophysiological variability within the genus Helianthus, sunflower became a model plant species for genome studies in the family (Bachlava et al., 2012). At 3.6 Gb, the sunflower genome is quite large (Badouin et al., 2017): three times larger than the rapeseed genome (Chalhoub et al., 2014) and more than eight times larger than that of rice (Arumuganathan and Earle, 1991). Due to its ability to grow in different agroecological conditions and its moderate drought tolerance, sunflower may become the oil crop of preference in the future, especially in the light of global environmental changes. Even though simulations showed an increase of sunflower yield for northern parts of Europe in view of predicted climate changes, negative effects on sunflower yield may occur in southern latitudes (Debaeke et al., 2017). Consequently, more attention should be paid to breeding for better adaptation to climate change. Breeding targets should include not only improved drought tolerance, but also pest resistance, salt tolerance and changes in plant architecture for better adaptation. Exploitation of available plant genetic resources in combination with the use of modern molecular tools for genome-wide association studies (GWAS) and application of genomic selection (GS) could lead to considerable improvements in sunflower. However, only in recent years have plant and genomic resources comparable to those of other crops become available in sunflower (**Figure 1**). In this review, we discuss the long way that sunflower breeders and biotechnologists have to go and the future perspectives of using modern molecular tools in sunflower breeding.

#### PLANT GENETIC RESOURCES IN SUNFLOWER

#### Biparental and Wild Populations

Biparental populations based on crosses between elite breeding, conventional, or introgressed lines (e.g., Berry et al., 1995; Horn et al., 2003; Vera-Ruiz et al., 2006; Kane et al., 2013; Livaja et al., 2016) as well as landraces and wild species (e.g., Quillet et al., 1995; Kim and Rieseberg, 1999; Brouillette et al., 2007; Ma et al., 2017) have been employed in sunflower for gene mapping, marker detection, QTL analyses and gene cloning. In addition, recombinant inbred lines (RILs) have been developed, which, as immortal populations, can be maintained indefinitely by self-propagation (e.g., Berrios et al., 1999; Tang et al., 2002, 2006; Poormohammad Kiani et al., 2007a; Talukder et al., 2016). However, biparental populations have three major disadvantages: (1) they have to be established individually for each research project, which requires time and resources, (2) only two alleles per locus can be evaluated and (3) because of the limited number of recombination events, they show low mapping resolution (Bernardo, 2008; Patrick and Alfonso, 2013). The use of association panels overcomes these problems. To verify the usefulness of association panels, the genetic diversity of wild populations was compared with the genetic diversity fixed in association panels (Mandel et al., 2011; Filippi et al., 2015). Even though some alleles were detected only in the wild populations, the majority of the alleles were present in the investigated association panels.

#### Sunflower Collections

The largest sunflower collection is maintained at the Institute of Field and Vegetable Crops, Novi Sad, Serbia, and consists of over 7,000 sunflower inbred lines developed from different genetic sources as well as 21 perennial and 7 annual species (447 accessions in total)<sup>1</sup> (Atlagic and Terzic, 2014). The next largest collection, with more than 5,000 cultivated and wild Helianthus accessions, is held at the USDA-ARS NPGS in Ames (Marek, 2016). About half of these, 2,519 accessions, represent the world's largest collection of wild sunflower relatives, comprising 53 species (39 perennial and 14 annual) (Seiler et al., 2017). Another large collection of cultivated and wild sunflower is maintained at the Vavilov Institute of Plant Industry; it consists of a total of 2,780 accessions, of which 2,230 are cultivated sunflower accessions and 550 are wild sunflower accessions belonging to 24 species (19 perennials and 5 annuals) (Gavrilova et al., 2014). A smaller set of 585 accessions of H. annuus is available via GRIN-CA, the Plant Gene Resources of Canada<sup>2</sup>, and an additional 613 sunflower accessions of diverse origin are distributed by the IPK Gatersleben<sup>3</sup>. These resources represent mostly uncharacterized plant material. In contrast, a well-defined collection of 400 open-pollinated varieties, landraces and breeding pools has been assembled by INRA to reflect the worldwide diversity present in sunflower (Mangin et al., 2017b). However, conserving the diversity of sunflower populations is a challenge in the maintenance process because of the self-incompatibility of wild sunflowers (Gandhi et al., 2005) and the possibility of genetic drift occurring during the propagation of seed stocks (Mangin et al., 2017b). To study the preservation of genetic variability, a set of 114 cultivated sunflower populations from the INRA collection was genotyped using a 384 Golden Gate SNP Assay. As a result, multiplication in isolation fields or the use of cages is recommended to reduce the loss of genetic variability in cultivated genetic resources.

<sup>1</sup>http://www.nsseme.com/about/inc/oilcrops/wild.php

These collections, which are available worldwide, represent a valuable resource for the sunflower community. It could be of interest to add some accessions from these large collections to the existing association panels described below.

#### Association Panels

Association panels have to be characterized by molecular markers such as SSRs or SNPs to avoid false associations due to population structure and family relationships. This review focuses on association panels that, like the sunflower collections mentioned above, are available online. To analyze the primary gene pool of sunflower, an association panel consisting of 433 cultivated accessions from North America and Europe, in addition to 24 wild sunflower populations distributed across the United States, was characterized by 34 EST-SSRs selected for their presumptive neutrality toward domestication and breeding efforts (Chapman et al., 2008; Mandel et al., 2011). USDA cultivated accessions in this panel were assigned to the following categories: HA and RHA being either non-oil or oil, landrace, open-pollinated variety (OPV), non-oil introgressed, oil introgressed, other non-oil, and other oil. The INRA accessions could only be categorized into INRA-HA and INRA-RHA, as the information on oil and non-oil was not always available. Analyses using the software STRUCTURE (Pritchard et al., 2000) and Principal Coordinates (PCO) analyses (Patterson et al., 2006) did not reveal deep genetic divisions within the germplasm (Mandel et al., 2011). The cultivated and the wild populations separated into two different groups, and within the cultivated accessions the restorer-oil (RHA-oil) category stayed apart from the remaining gene pool (Mandel et al., 2011). This is not unexpected given the hybrid history of sunflower, in which the maintainer and restorer pools have been kept separate on purpose to maximize heterosis (Fick and Miller, 1997). A selection of 288 accessions still covers nearly 90% of the genetic diversity available in the original larger panel. This association panel was named UGA-SAM1 and consists of 259 accessions, which are distributed by the Germplasm Resources Information Network (GRIN<sup>4</sup>) of the USDA National Plant Germplasm System (NPGS), and 29 accessions available through the French National Institute for Agricultural Research (INRA, France). The UGA-SAM1 population, which has been successfully employed in association studies (Mandel et al., 2013; Nambeesan et al., 2015), represents a very valuable tool for future association studies for the whole sunflower community. A minimal core set of 12 accessions (representing HA, RHA, oil and non-oil accessions as well as INRA material) capturing nearly 50% of the total allelic diversity might be ideal for building a MAGIC (Multi-parent advanced generation intercross) population for sunflower. The MAGIC strategy is of interest for studies of multiple alleles because it exploits higher recombination frequencies and provides better mapping resolution (Cavanagh et al., 2008). Development of MAGIC populations is in progress for numerous plant species (Bandillo et al., 2013) and would be interesting for sunflower as well.

<sup>2</sup>http://pgrc3.agr.gc.ca/order-ordre\_e.html

<sup>3</sup>https://gbis.ipk-gatersleben.de/GBIS\_I/

Argentinean germplasm also represents a valuable genetic resource due to the long history of sunflower breeding in Argentina (de Bertero, 2003; Moreno et al., 2013). Filippi et al. (2015) characterized the population structure and calculated the genetic diversity of the association mapping population (AMP-IL), representing 137 INTA (National Institute of Agricultural Technology, Argentina) accessions and 33 accessions of open-pollinated and composite populations. The plant material is maintained by the Active Germplasm Bank of INTA Manfredi (AGB-IM). Using 42 SSR markers and/or SNP markers detected by a 384 Illumina SNP-oligo pool array, estimated and observed heterozygosity as well as clustering using STRUCTURE and Discriminant Analysis of Principal Components (DAPC) were compared for the two marker types. As in other studies (Mandel et al., 2011, 2013), the population structure was dominated by the maintainer/restorer trait (Filippi et al., 2015).
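As a small illustration of the two diversity statistics mentioned above, the Python sketch below computes observed heterozygosity (the fraction of heterozygous individuals) and expected heterozygosity under the standard gene diversity formula, 1 − Σp<sub>i</sub><sup>2</sup>, for a single locus. It assumes that the "estimated" heterozygosity in the text corresponds to this standard expected value; the function names and the genotype data are hypothetical and are not taken from Filippi et al. (2015).

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at a locus."""
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)


def expected_heterozygosity(genotypes):
    """Gene diversity, 1 - sum(p_i^2), computed from allele frequencies."""
    alleles = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(alleles.values())
    return 1.0 - sum((count / total) ** 2 for count in alleles.values())


# Hypothetical SNP genotypes at one locus for six accessions.
locus = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "A"), ("A", "A"), ("G", "G")]
ho = observed_heterozygosity(locus)
he = expected_heterozygosity(locus)
print(f"observed = {ho:.3f}, expected = {he:.3f}")
```

In panels made up largely of inbred lines, observed heterozygosity is typically far below the expected value, as in this toy example, which is one reason both statistics are usually reported side by side.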

A germplasm collection of 196 Spanish confectionary sunflower accessions is maintained at the Centre of Plant Genetic Resources of the National Institute for Agricultural and Food Research and Technology (CRF-INIA)<sup>5</sup> . A large genetic variation was revealed regarding hundred-seed weight, kernel percentage, seed oil content, fatty acid and tocopherol composition, phytosterols and other traits (Velasco et al., 2014; Pérez Vich et al., 2017).

In addition to well characterized association panels, considerable plant genetic resources are nowadays available in sunflower (cultivated as well as wild H. annuus accessions and accessions representing other species in the genus Helianthus).

<sup>4</sup>https://www.ars-grin.gov/

#### Mutagenized Populations

To increase the naturally available genetic variability, sunflower has been mutagenized (Zambelli et al., 2015). Mutant populations have been successfully developed and used to screen for mutant phenotypes interesting for breeding purposes with regard to flowering time, dwarf habitus, oil content, high oleic trait, herbicide resistance and branching (Soldatov, 1976; Gabard and Huby, 2001; Sala et al., 2008a; Cvejic et al., 2011b; Leon et al., 2013). Recently, a TILLING (Targeting Induced Local Lesions IN Genomes) population for high-throughput screening of EMS (ethyl methanesulfonate)-induced mutations in sunflower was established by Sabetta et al. (2011) and used for studies of genes involved in fatty acid biosynthesis. Optimized mutagenesis using EMS was used to develop an additional sunflower TILLING platform (Kumar et al., 2013). Phenotypic characterization of 5,000 M2 lines was performed to estimate the mutation rates and to select interesting mutants. As seed oil biosynthesis is of major importance in sunflower, the FatA and SAD genes were investigated by TILLING, which revealed an overall mutation rate of one mutation every 480 kb (Kumar et al., 2013). Another possibility to develop mutant populations is to apply gamma irradiation or fast neutrons (Cvejic et al., 2011a). Optimal ranges for gamma irradiation and fast neutrons were explored in comparison to EMS concentrations.
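A mutation rate such as "one mutation every 480 kb" is commonly derived by dividing the total number of base pairs screened by the number of mutations found. The sketch below shows this arithmetic under simplifying assumptions (every amplicon is screened in every line, and no correction is made for regions near the primers). The amplicon lengths, line count and mutation count are invented and are not the values of Kumar et al. (2013).

```python
def mutation_density_kb(amplicon_lengths_bp, lines_screened, mutations_found):
    """Average distance between mutations, in kb per mutation.

    Total screened sequence = (sum of amplicon lengths) x (lines screened).
    """
    if mutations_found <= 0:
        raise ValueError("at least one mutation is needed to estimate a density")
    total_bp = sum(amplicon_lengths_bp) * lines_screened
    return total_bp / mutations_found / 1000.0


# Invented example: two amplicons screened across 3,000 lines, 15 mutations.
density = mutation_density_kb([1000, 1500], lines_screened=3000,
                              mutations_found=15)
print(f"one mutation every {density:.0f} kb")
```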

Besides induced mutagenized populations, natural mutations that have occurred in wild sunflower populations have had a significant impact on sunflower hybrid breeding, especially in the area of herbicide resistance. In recent years, intensive use of herbicides has led to the emergence of resistant wild sunflower populations. The first case was a population of common sunflower found in a soybean field in Rossville (KS, United States), in which imazethapyr, which belongs to the group of AHAS (acetohydroxy acid synthase) inhibitors, had been used for weed control over seven consecutive years. This created the first sunflower population, named ANN-PUR, resistant to one of the AHAS inhibitors (Al-Khatib et al., 1998). Resistance from this population was successfully introduced into commercial sunflower hybrids (Miller and Al-Khatib, 2000; Jocić et al., 2004). Sunflower production based on the use of this imidazolinone (IMI) resistance, which provides an efficient and easy control of post-emergence broadleaf weeds in Europe, is called Clearfield® technology. In addition to the discovery of IMI-resistant sunflowers, another population of wild sunflowers (ANN-KAN), tolerant to another AHAS herbicide group called sulfonylurea, was discovered in Kansas (United States) (Al-Khatib et al., 1999). The same tolerance was also obtained by EMS mutagenesis (Gabard and Huby, 2001). Later, more populations of wild sunflowers resistant to AHAS herbicides were found (e.g., White et al., 2002, 2003; Jacob et al., 2017). In addition, a new imidazolinone tolerance, called Clearfield Plus®, was selected from an M2 population of 600,000 plants treated with EMS (Sala et al., 2008a).

Natural genetic diversity and naturally occurring or chemically/gamma-ray induced genetic variability represent a prerequisite for selection in breeding. The wide range of accessions maintained and made available by the germplasm banks for the research community is an extremely valuable starting point for successful breeding programs in sunflower, allowing association studies and the introduction of new traits into existing commercial breeding material. In addition, mutagenesis can create new genetic variability in traits where the natural variability is not sufficient.

<sup>5</sup>http://wwwx.inia.es/coleccionescrf/PasaporteCRF.asp

#### GENETIC MAPS AND SUNFLOWER GENOME SEQUENCE

Different molecular markers, which have been applied in mapping genes and development of sunflower linkage maps (**Figure 1**), set the basis for the assessment of the genetic diversity present in the genus Helianthus as well as in cultivated and wild sunflower accessions. Positioning of desirable genes allowed the identification and development of more specific molecular markers. At the Sunflower CMap database<sup>6</sup> genetic maps available for sunflower have been listed and can be compared with each other by using the program CMAP (Kane et al., 2013).

The first map was developed on wild sunflower using RAPD markers (Rieseberg et al., 1993). A couple of years later, maps were generated and published using non-PCR-based RFLP markers in different crosses of cultivated sunflower (Berry et al., 1995; Gentzbittel et al., 1995; Jan et al., 1998). These maps were published several years later than RFLP maps of, e.g., wheat, maize, barley, rice, and oilseed rape, because companies were involved in the construction of the sunflower map (Hu, 2010). Later on, AFLP markers were added to the maps (Peerbolte and Peleman, 1996; Gedil et al., 2001). Most sunflower linkage maps contained 17 linkage groups (LG), representing the haploid chromosome number of sunflower. These maps were followed by genetic maps based on SSR markers (Tang et al., 2003b; Yu et al., 2003). The first composite genetic SSR map consisted of 278 single-locus SSR markers as well as an additional 379 markers (public and proprietary), covering 1423 cM. This map, which nowadays serves as the reference genetic map for sunflower (Tang et al., 2003b), was then further saturated with additional SSR markers exploring three new mapping populations (Yu et al., 2003). In the meantime, more than 2,000 SSRs have been derived from genomic sequences (gSSR) and ESTs (EST-SSR) and are now available for mapping and genotyping (Brunel, 1994; Dehmer and Friedt, 1998; Paniego et al., 2002; Tang et al., 2003b; Yu et al., 2003; Poormohammad Kiani et al., 2007b; Chapman et al., 2008; Heesacker et al., 2008). Existing sunflower maps were further enriched by these gSSRs and EST-SSRs as well as by INDEL and TRAP markers (Hu et al., 2007; Heesacker et al., 2008). These SSR markers (sequences and primers available through NCBI) represent a very valuable tool as they allow the localization of genes on individual linkage groups (Tang et al., 2003b) as well as on the recently published sunflower genome sequence of HanXRQ<sup>7</sup> (Badouin et al., 2017). About 3 gigabases (Gb), representing 80% of the whole genome size, were assembled and represent an extremely useful tool for all research programs that aim at the improvement of sunflower hybrids.

<sup>6</sup>http://www.sunflower.uga.edu/cmap/

<sup>7</sup>https://www.heliagene.org/HanXRQ-SUNRISE/

Finally, the step toward high-density maps was made possible by using SNP-based markers, starting with Lai et al. (2005), who derived SNPs from an EST database (as part of the Compositae Genome Project) and used them for mapping. An Infinium Beadchip including 9,480 SNPs based on transcriptome data was developed by Bachlava et al. (2012) and employed by Bowers et al. (2012) to obtain four high-density genetic maps. Each of these maps contained 3,500–5,500 loci. Even though the maps were highly colinear, gaps in individual maps were observed. To solve this issue, a consensus map of 10,080 loci was constructed from these data (Bowers et al., 2012). Talukder et al. (2014a) developed a high-density map of 5,019 SNP markers obtained via RAD sequencing. The rust resistance gene R12 was fine-mapped using this SNP-based map. In addition, 118 SSR markers were included in the SNP map to assign and orient the linkage groups according to the sunflower reference genetic map. Celik et al. (2016) pioneered the use of genotyping-by-sequencing for large-scale SNP detection in sunflower and developed a SNP-based linkage map of 817 SNP markers covering all 17 LGs by analyzing an F2 population obtained from the cross RHA 436 × H08 M1. Using the newly developed 25 K SNP array in sunflower, Livaja et al. (2016) were able to construct a linkage map based on 6,355 SNP markers for the RIL population NDBLOSsel × CM625. The connection between genetic linkage maps and the sunflower karyotype was finally made by developing a molecular cytogenetic map for H. annuus (Feng et al., 2013). BAC and BIBAC clones with known genetic locations were used in fluorescence in situ hybridization (FISH) experiments to address the individual chromosomes.

The high resolution of the recently developed high-density maps in sunflower helps to narrow down the regions of interest, which should allow identification and cloning of genes for various relevant traits in the near future. In addition, SNP-based maps deliver markers closely linked to, e.g., resistance genes that can be applied in large-scale marker-assisted breeding programs or can be integrated in SNP arrays.
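Map distances in this section are given in centimorgans (cM), which are obtained from observed recombination fractions through a mapping function. The sketch below implements the two standard functions, Haldane (no crossover interference) and Kosambi (partial interference). Which function each cited map used is not stated here, so the example is only meant to show how the conversion works.

```python
import math


def haldane_cm(r):
    """Haldane map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# For small r both functions are close to 100 * r; they diverge as r grows.
for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):6.2f} cM, "
          f"Kosambi {kosambi_cm(r):6.2f} cM")
```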

#### MARKER DEVELOPMENT BY LINKAGE MAPPING

#### Resistance to Downy Mildew

Developed linkage maps set a good basis for localization and mapping of simply inherited traits. Most of the downy mildew resistance genes, conferring resistance to the oomycete Plasmopara halstedii, have been found to be dominantly inherited and consequently, relatively easy to map by using molecular markers. Identification of closely linked markers also represents a good basis for map-based cloning of the genes.
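A single dominant resistance gene is expected to segregate 3 resistant : 1 susceptible in an F2 population, and this expectation is usually checked with a chi-square goodness-of-fit test before mapping. The following sketch shows the calculation; the plant counts are invented, and 3.841 is the standard critical value for one degree of freedom at the 5% level.

```python
CRITICAL_CHI2_DF1_P05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi_square_3_to_1(resistant, susceptible):
    """Chi-square statistic for a 3:1 (resistant:susceptible) F2 segregation."""
    total = resistant + susceptible
    expected_r = 0.75 * total
    expected_s = 0.25 * total
    return ((resistant - expected_r) ** 2 / expected_r
            + (susceptible - expected_s) ** 2 / expected_s)


# Invented phenotype counts for a hypothetical F2 population of 200 plants.
chi2 = chi_square_3_to_1(resistant=152, susceptible=48)
fits_single_dominant_gene = chi2 < CRITICAL_CHI2_DF1_P05
print(f"chi-square = {chi2:.3f}, consistent with 3:1: {fits_single_dominant_gene}")
```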

Great efforts were put into the examination of downy mildew resistance genes (R genes), designated as Pl genes, that are distributed throughout the sunflower genome: Pl13, Pl14, Pl16, and PlArg on LG1; Pl1, Pl2, Pl6, Pl7, Pl15, and Pl20 on LG8; Pl5, Pl8, and Pl21 on LG13; Pl17 and Pl19 on LG4; while Pl18 is localized on LG2 of the sunflower SSR reference map (Mouzeyar et al., 1995; Roeckel-Drevet et al., 1996; Vear et al., 1997; Bert et al., 2001; Slabaugh et al., 2003; Yu et al., 2003; Mulpuri et al., 2009; Wieckhorst et al., 2010; Bachlava et al., 2011; Vincourt et al., 2012; Qi et al., 2015a; Qi L.L. et al., 2016; Zhang et al., 2016; Ma et al., 2017). Most of the Pl genes are clustered, except for PlArg and Pl18.

The Pl cluster on LG8 was the first to be detected by molecular markers. Mouzeyar et al. (1995) used RAPD and RFLP markers for mapping the first downy mildew resistance gene, Pl1, which is part of a large Pl cluster (Pl1, Pl2, Pl6, Pl7). Of all genes in the cluster, Pl6 was the most intensively examined since it conferred resistance to all the races present for a long time, except race 304. Different marker types were used for the introgression of Pl6 into susceptible sunflower material, including STS (Sequence-Tagged Site) markers belonging to the TIR-NBS-LRR class of RGA (Resistance-Gene Analog) (Bouzidi et al., 2002) and co-dominant CAPS (Cleaved Amplified Polymorphic Sequence) markers (**Table 1**). The developed markers have been successfully used to introduce Pl6 by MAS or to track the introduction of Pl6 in backcrosses during conversion of downy mildew susceptible lines into resistant ones (Dimitrijevic et al., 2010; Jocic et al., 2010).

Two Pl genes originating from H. argophyllus, Pl8 and PlArg, were also the subject of numerous studies. PlArg confers resistance to all present downy mildew races and Pl8 to 96% of all isolates collected in the north-central region of the United States (Gilley et al., 2016). Radwan et al. (2004) developed STS markers for detection of the Pl5/Pl8 locus that were later also explored in other sunflower genotypes (Dimitrijevic et al., 2011). Bachlava et al. (2011) developed two resistance gene candidate (RGC) markers, RGC251 and RGC15/16, closely linked to Pl8 that belong to the group of SSCP (Single-Strand Conformational Polymorphism) markers. However, SSCPs are labor-intensive and time-consuming in MAS. A comprehensive study of Pl8 by Qi et al. (2017) explored previously published SNP markers as well as two SSR markers (Bowers et al., 2012; Talukder et al., 2014a) to genotype an F2 population derived from the cross HA434 × RHA340. The three closest SNP markers, NSA\_000423, NSA\_002220, and NSA\_002251, were then investigated to check the specificity of the identified markers, concluding that NSA\_000423 and NSA\_002220 could serve as diagnostic markers in 87% of the tested sunflower lines when RHA 340 is used as donor for the Pl8 gene. Validation of these markers across 548 sunflower lines proved their usefulness for MAS. However, a larger panel of sunflower lines should be tested.

Unlike the Pl8 gene, PlArg is not clustered. Several authors identified and developed different types of markers (SSRs, SNPs, RGCs) for MAS (Dußle et al., 2004; Wieckhorst et al., 2010; Imerovski et al., 2014b); some of these were also validated across a panel of sunflower lines. ORS716 was identified as the most useful marker in MAS (**Table 1**). Recently, Qi et al. (2017) combined available genomic data for the population obtained from the cross HA 89 × RHA 464 by use of SNP markers (Pegadaraju et al., 2013; Talukder et al., 2014a) with the phenotypic evaluation for resistance. The two nearest SNP markers (NSA\_007595 and NSA\_001835) narrowed the PlArg locus down to an area of 2.83 Mb. The nine identified SNP markers represent valuable diagnostic tools for introgression of PlArg into most genetic backgrounds in sunflower.

Other work on markers for use in MAS for downy mildew resistance includes the identification of the SSR marker ORS1008 tightly linked to the Pl13 gene (Mulpuri et al., 2009), the development of RGC markers tightly linked to the Pl14 gene (Bachlava et al., 2011), and the identification of one dominant co-segregating SSR marker (ORS1008) and one co-dominant tightly linked EST-SSR (HT636) for Pl16. Interestingly, HT636 and ORS1008 were reported to be linked to both Pl13 and Pl16, indicating that these genes are in close vicinity to each other (Liu et al., 2012a). Qi et al. (2015a) used SSRs to place Pl17 onto LG4 and then used SNPs identified by the National Sunflower SNP Consortium (Talukder et al., 2014a) and by Bowers et al. (2012) to saturate the region surrounding the Pl17 gene. The authors identified SNP SFW04052 and ORS963 as the closest flanking markers linked to Pl17. A year later, Qi L.L. et al. (2016) used the same methodology to map Pl18 to LG2 and found two SSRs and 10 SNPs flanking the Pl18 gene. Pl18 represents the first gene mapped to LG2. In 2017, two new Pl genes, Pl19 and Pl20, were reported and mapped to LG4 and LG8, respectively (Ma et al., 2017; Zhang et al., 2017). Two SSRs and two SNPs were mapped in close vicinity to Pl19, while four SNP markers (SFW02745, SFW09076, S8\_11272025, and S8\_11272046) co-segregated with Pl20. All markers can be used in MAS and, most importantly, in pyramiding Pl genes in order to achieve long-lasting resistance toward downy mildew. The development of SNP markers is of special interest because the large number of markers generated increases the likelihood of having markers available for any cross combination.

#### Resistance to Sunflower Rust

Infections of sunflower plants with Puccinia helianthi Schwein lead to rust disease. This fungus, which is mostly spread in North America, Argentina, South Africa, and Australia, can cause significant damage and yield reduction in infected fields. Genetic control of the disease can be effective; however, due to the fast emergence of new races either by sexual or asexual reproduction, the resistance achieved is short-lived. Consequently, a significant effort has been put into discovering rust resistance genes and introducing them into commercial lines and hybrids, with the final goal of pyramiding several resistance genes in order to achieve long-term resistance. Most of the rust resistance genes (R genes) described so far are monogenic dominant. R genes are located on different LGs of the sunflower genome, with the majority being located on LG13 [R4, Pu6, R11, Radv, R13a (RHAR6), and R13b] (Bachlava et al., 2011; Qi et al., 2011b, 2012b; Gong et al., 2013b; Bulos et al., 2014).

#### TABLE 1 | Overview of resistance sources, locations, and markers of disease resistance genes in sunflower.

Numbers in brackets given as superscripts in the columns "Source" and "Markers linked to the resistance gene" refer to the superscript numbers of the author citations listed in the "Reference" column for a specific resistance gene.

The first molecular studies aimed at discovering markers for the R1 and Radv genes by use of RAPD and SCAR markers (Lawson et al., 1996, 1998). While the R1 gene was the first rust resistance gene present in a large number of sunflower lines, Radv is present in the line P2 owned by Pioneer Hi-Bred Australia (Lawson et al., 1998; Qi et al., 2011a). Radv is also present in the USDA line RHA 340, which Bachlava et al. (2011) used for mapping of the gene. Lawson et al. (1998) developed the SCAR marker SCT06<sub>950</sub> linked to the R1 gene, which proved to be useful for detection of R1 in different genetic backgrounds, except for the sunflower line MC29, which carries the R2 and R10 genes. For mapping of R2, Qi et al. (2015c) used a different MC29 line, called MC29 (USDA) as it was cultivated in the USDA-ARS Sunflower Research Unit, Fargo, North Dakota, which differs in terms of resistance to NA race 6 from the MC29 line used by Lawson et al. (1998). Qi et al. (2015c) reported two SNP markers, NSA\_002316 and SFW01272, flanking the R2 gene on LG14. Since the closest marker, SFW01272, can only to a certain extent be used to detect the R2 gene across different genetic backgrounds, the authors recommend the use of two flanking SNP markers in order to minimize selection of false positives in MAS.
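The benefit of selecting with two flanking markers rather than one can be made concrete with a small probability calculation. With a single marker, a selected gamete lacks the gene whenever one crossover occurs between marker and gene. With two flanking markers, a false positive requires a crossover in both intervals. The sketch below assumes no crossover interference and uses invented recombination fractions, not the actual distances of NSA\_002316 and SFW01272 from R2.

```python
def false_positive_single(r):
    """P(gene absent | donor allele at one marker), r = recombination fraction."""
    return r


def false_positive_flanking(r1, r2):
    """P(gene absent | donor alleles at both flanking markers).

    Assumes no crossover interference, so a false positive requires a
    crossover in both marker-gene intervals (a double recombinant).
    """
    double_recombinant = r1 * r2
    non_recombinant = (1.0 - r1) * (1.0 - r2)
    return double_recombinant / (double_recombinant + non_recombinant)


# Invented recombination fractions for the two marker-gene intervals.
r_left, r_right = 0.03, 0.05
single = false_positive_single(r_left)
flanking = false_positive_flanking(r_left, r_right)
print(f"single marker: {single:.4f}, flanking pair: {flanking:.4f}")
```

Under these assumed values the error rate per selected gamete drops from 3% to below 0.2%, which illustrates why flanking markers are preferred when no fully diagnostic marker is available.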

Further molecular studies of R genes include the identification of molecular markers closely linked to the R4, Radv, Pu6, R11, R13a (RHAR6), and R13b genes that are located on LG13. Qi et al. (2011b) identified two markers flanking the R4 gene (ORS581 and ZVG61) in the cross HA 89 × HA-R3, which were later also reported to be linked to the rust resistance genes R13a (RHAR6) and R13b located on the lower end of LG13 (Bulos et al., 2013a; Gong et al., 2013b; Qi et al., 2015b) (**Table 1**). Further on, Gong et al. (2013b) saturated the region flanking the genes by analysis of RGC markers that were present in the vicinity of the downy mildew resistance gene Pl8, which was also mapped to the lower end of LG13. Another R gene that mapped in the vicinity of Pl8 and the fertility restorer gene Rf1 was Radv. A completely co-segregating SCAR marker (Lawson et al., 1998) as well as RGC and SSR markers tightly linked to Radv were identified (Bachlava et al., 2011) (**Table 1**). Recently, Bulos et al. (2014) mapped the Pu6 gene and identified SSRs linked to this gene in the sunflower line P386 on the lower end of LG13. However, these markers are too far away to be useful in MAS (**Table 1**). Pu6 and R4 map 6.25 cM apart from each other. Qi et al. (2012b) examined the R11 gene and mapped it 1.6 cM from the fertility restoration gene Rf5, also on the lower end of LG13, hypothesizing the presence of a large rust R-gene cluster of Radv/R11/R4. The SSR marker ORS45 was the closest to the R11 gene and was mapped 1 cM proximal to the gene, while ORS728 was shown to be a common marker for the R11 and Rf5 genes. The results allow the conclusion that the lower end of LG13 harbors the second largest cluster of NBS-LRR encoding genes: rust resistance and downy mildew resistance genes. Based on SSR and RGC markers used in this area, Gong et al. (2013b) proposed that this big cluster could be sub-divided into two clusters. Radv and R11 form sub-cluster I, while R4, R13a/b, Pl5, and Pl8 form sub-cluster II. Pl21, which was also positioned on LG13, mapped 8 cM proximal to Pl5/Pl8 (Radwan et al., 2004; Vincourt et al., 2012).

Other rust resistance genes investigated by use of molecular markers include R5. This is to date the only R gene discovered on LG2. Qi et al. (2012a, 2015b) identified two SSR and two SNP markers flanking the gene, with the closest being 0.6 cM away (**Table 1**). On LG11, two rust resistance genes have been mapped so far: R12 and R14. Both genes were positioned in the middle of LG11 and were discovered in wild sunflower accessions; however, they have different origins: R12 from PI413047 and R14 from PI413038 (Gong et al., 2013a; Zhang et al., 2016). Both genes were mapped between the markers ORS1227 and ZVG53 (ORS1227 at 3.3 and 1.6 cM and ZVG53 at 9.6 and 6.9 cM from R12 and R14, respectively). Talukder et al. (2014b) performed fine mapping of the R12 gene region by using SNP markers. Five SNP markers (NSA\_000064, NSA\_008884, NSA\_004155, NSA\_003320, and NSA\_003426) were linked to the gene at 0.83 cM, but only two markers (NSA\_003426 and NSA\_004155) proved to have diagnostic quality for R12 (**Table 1**). The nearest SNP marker to R14 was NSA\_000064, which was mapped 0.7 cM from the gene in the F2 mapping population obtained from the cross HA 343 × PH3 (Zhang et al., 2016). However, this marker amplified the same banding pattern in RHA 464 (R12) and PH3 (R14). Zhang et al. (2016) identified thirteen SSR/InDel and two SNP markers that amplified different profiles between the two donors of R12 and R14, indicating polymorphisms between these regions.

One of the latest efforts in saturation mapping of R genes was published by Qi et al. (2015b), who used previously developed SFW and NSA SNP markers in order to saturate the regions surrounding the R4, R5, R13a, and R13b genes and succeeded in identifying markers that are less than 1 cM distant from all analyzed genes, thus raising the efficiency of introducing rust resistance into susceptible material (**Table 1**). The authors used previously developed SSR markers and newly developed SNP markers for identification of homozygous "double-resistant" F2 individuals in a population obtained from a cross combination between a BC3F2 plant harboring R5 and HA-R6 bearing R13a. The F4 progeny obtained from chosen plants showed improved resistance toward races 336 and 777 in comparison to lines that possess only one resistance gene. Qi et al. (2015c) also performed marker-assisted pyramiding of R2 and R13a in confectionary sunflower by use of SSR and SNP markers. Further pyramiding of R genes could lead to long-term improvements in sunflower rust resistance. The process of converting susceptible into resistant forms can be greatly facilitated and accelerated by use of the reported molecular markers.
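The value of markers for identifying homozygous "double-resistant" F2 plants becomes clear from the expected frequencies. For unlinked genes (as is plausible for R5 on LG2 and R13a on LG13), an F2 plant is homozygous for the resistance allele at each gene with probability 1/4, so only 1/16 of plants are homozygous at both. The sketch below, which assumes an F1 heterozygous at every gene and independent segregation, also estimates how many F2 plants must be genotyped to find at least one such plant with a chosen confidence. The function names are illustrative only.

```python
import math


def homozygous_fraction(n_genes):
    """Expected F2 fraction homozygous for the target allele at n unlinked genes."""
    return 0.25 ** n_genes


def plants_needed(fraction, confidence=0.95):
    """Smallest population size giving at least one target plant with the
    stated confidence, assuming plants are independent draws."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


frac_two = homozygous_fraction(2)   # two pyramided genes
n_two = plants_needed(frac_two)
frac_three = homozygous_fraction(3)  # three pyramided genes
n_three = plants_needed(frac_three)
print(f"2 genes: fraction {frac_two:.4f}, genotype at least {n_two} F2 plants")
print(f"3 genes: fraction {frac_three:.4f}, genotype at least {n_three} F2 plants")
```

Because dominant resistance genes do not allow homozygous and heterozygous plants to be told apart phenotypically, co-dominant markers are what make this selection step feasible in a single generation.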

#### Resistance to Broomrape

Another constraint in sunflower production is broomrape (Orobanche cumana), a parasitic flowering plant that can cause significant yield losses of up to 100%. Most of the genes that confer resistance to broomrape were found to be monogenic dominant for broomrape races A to E and G (Vranceanu et al., 1980; Velasco et al., 2012), while resistance to race F was either inherited by a monogenic dominant gene (Pacureanu-Joita et al., 1998; Pérez-Vich et al., 2004) or by two recessive genes (Rodríguez-Ojeda et al., 2001), depending on the genetic background. Broomrape resistance genes are denoted as Or genes. Imerovski et al. (2014a) reported a single recessive resistance gene in the sunflower line HA-267 that carried a resistance gene higher than Or6. The majority of molecular analyses were conducted in investigating and creating different types of molecular markers for detection of Or5, which conveys resistance to broomrape race E or lower (Lu et al., 2000; Tang et al., 2003a) (**Table 1**). The efficiency of RAPD and SSR primers in MAS for Or5 was tested by Iuoras et al. (2004); however, none of the primers proved to be efficient or accurate enough. Imerovski et al. (2013) identified SSR markers associated with the Or2, Or4, and Or6 genes that could be used in converting broomrape susceptible sunflower genotypes into resistant ones. However, O. cumana populations belonging to race F have shown different aggressiveness (Molinero-Ruiz et al., 2009). Imerovski et al. (2016) mapped a newly identified broomrape resistance gene, conferring resistance to broomrape races overcoming race F, from the sunflower inbred line AB-VL-8 on LG3. The authors named the gene Orab-vl-8, which was shown to be recessive, and ORS683 mapped 1.5 cM from the gene. Further molecular analyses are needed in order to develop co-segregating markers for some of the Or genes. In addition, finding novel resistance sources is essential since broomrape races are emerging at a high speed. Recent work of Louarn et al. (2016) involved using 586,985 SNPs from the SUNRISE project<sup>8</sup> on GeneTitan® (Affymetrix) for identification of QTL for resistance to broomrape races F and G. The authors identified 17 QTL spread throughout 9 LGs. Among them was a stable QTL on LG13 that controlled the number of emerged broomrapes and explained 15–30% of the phenotypic variability. This QTL was marked as the one that could be the most rapidly used. A molecular characterization of O. cumana populations in Europe using RAPD-PCR identified four groups (Molinero-Ruiz et al., 2014). These markers might be useful as molecular tools to detect first broomrape appearances in fields that had been free of virulent races (Molinero-Ruiz et al., 2014).

#### Herbicide Tolerance

<sup>8</sup>https://cnrgv.toulouse.inra.fr/fr/Projets/Analyse-de-genome-pour-l-amelioration-des-plantes/SUNRISE-SUNflower-Resources-to-Improve-yield-Stability-in-a-changing-Environment

Different tolerances against herbicides inhibiting the large, catalytic subunit of the acetohydroxyacid synthase (AHASL) have become an essential tool in sunflower hybrid production and cultivation as these facilitate the application of either imidazolinones (IMIs) or sulfonylureas (SUs) against broadleaf weeds (Sala et al., 2012). They also allow a race-independent control of broomrape (Skoric and Pacureanu, 2010). Three AHASL genes were isolated from sunflower: AHASL1 located on LG9, AHASL2 on LG6 and AHASL3 on LG2 (Kolkman et al., 2004). Only mutations in AHASL1 seem to be involved in herbicide tolerance in sunflower. Four different mutated alleles have been explored for commercial use in sunflower hybrid breeding: Imisun/Clearfield®, Clearfield Plus®, Sures and ExpressSun® (Sala et al., 2012). Point mutations from C-T in codon 205 (Ahasl1-1) and in codon 197 (Ahasl1-2) (adopting the Arabidopsis nomenclature) confer moderate tolerance to IMIs and high tolerance to SUs, respectively. The allele Ahasl1-3 is characterized by a G-A mutation in codon 122 and results in high levels of IMI tolerance (Sala et al., 2008b). The broadest range of herbicide tolerance is shown by allele Ahasl1-4, which has a G-T mutation in codon 574 (Sala and Bulos, 2012). A first SNP marker based on the C-T change in codon 205 proved to be very useful as it co-segregated with partially dominant herbicide tolerance for the Imisun/Clearfield® system (Kolkman et al., 2004), even though an additional non-target gene is required for the tolerance (Bruniard and Miller, 2001; Miller and Al-Khatib, 2002). One SSR marker exploiting the differences in the (ACC) repeats present in the AHASL gene allows the differentiation between the wild type Ahasl1 allele and alleles Ahasl1-1 and Ahasl1-2 (Sures and ExpressSun®) (Kolkman et al., 2004; Bulos et al., 2013b). A CAPS marker developed by Bulos et al. (2013b) uses the A-T exchange to detect the Ahasl1-3 allele (Clearfield Plus®) by digesting the PCR product with the restriction enzyme BmgBI. The markers can now help to select for herbicide tolerance. Nevertheless, the development of efficient screening tests for herbicide tolerance is crucial (e.g., Breccia et al., 2011; Vega et al., 2012).
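The principle of a CAPS marker, as described above, is that a point mutation creates or destroys a restriction site, so the alleles differ in the fragment pattern obtained after digesting the PCR product. The following sketch performs such a digest in silico. The two amplicon sequences are invented and do not represent AHASL1. The BmgBI recognition sequence is assumed here to be CACGTC with a blunt cut in the middle and should be checked against the enzyme supplier's documentation. Which allele carries the site in the real assay is not stated in the text, so the alleles are simply labeled by the presence or absence of the site.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def digest(seq, site, cut_offset):
    """Fragment lengths after cutting a linear amplicon with a blunt cutter.

    The site is searched in both orientations. cut_offset counts bases from
    the start of the site on the top strand (3 = middle of a 6-bp site).
    The mirrored offset used for the reverse orientation is only valid for
    blunt-cutting enzymes (an assumption of this sketch).
    """
    seq, site = seq.upper(), site.upper()
    cuts = set()
    for motif, offset in ((site, cut_offset),
                          (reverse_complement(site), len(site) - cut_offset)):
        pos = seq.find(motif)
        while pos != -1:
            cuts.add(pos + offset)
            pos = seq.find(motif, pos + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


ASSUMED_SITE = "CACGTC"  # assumed BmgBI recognition sequence (verify before use)

# Invented 20-bp amplicons differing by one base inside the site.
allele_with_site = "ATGGCTCACGTCTTGACCGA"
allele_without_site = "ATGGCTCACGACTTGACCGA"

fragments_cut = digest(allele_with_site, ASSUMED_SITE, 3)
fragments_uncut = digest(allele_without_site, ASSUMED_SITE, 3)
print(fragments_cut, fragments_uncut)
```

A heterozygous plant would show the fragments of both alleles, which is what makes CAPS markers co-dominant.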

#### Seed Oil Quality

Several oil properties have been characterized as quantitative traits; however, some traits such as oleic acid content (OAC) could, to a certain extent, be considered semi-qualitative since OAC is dependent not only on the environment, but also on the genetic background of the receiver line (Ferfuia et al., 2015; Regitano Neto et al., 2016). A partial duplication of the FAD2-1 allele caused by chemical mutation leads to an increase in OAC by silencing the FAD2-1 gene encoding FAD2 (oleoylphosphatidylcholine desaturase) (Lacombe et al., 2002; Schuppert et al., 2006). This enzyme catalyzes the synthesis of linoleic acid from oleic acid, and when its activity is silenced, oleic acid accumulates. Soldatov (1976) created the Pervenets cultivar with elevated OAC, which has become the main source of elevated OAC in sunflower breeding programs worldwide due to the beneficial properties of high oleic sunflower oil (Allman-Farinelli et al., 2005; Vannozzi, 2006). Inheritance of the OAC trait has been the subject of numerous studies, and different results were reported, from a single dominant gene to several genes influencing OAC (Urie, 1984; Lacombe et al., 2004; Joksimovic et al., 2006; Bervillé, 2010; Ferfuia and Vannozzi, 2015; Premnath et al., 2016; Dimitrijevic et al., 2017). The gene or genes involved in the inheritance of OAC have been denoted as Ol genes. Different markers were employed in mapping and detecting the mutation (Ol mutation) in sunflower. The two RAPD markers, F15-690 and AC10-765, were linked to the Ol1 gene at 7.0 and 7.2 cM, respectively (Dehmer and Friedt, 1998). Later on, the Ol1-FAD2-1 locus was placed onto LG14 (Pérez-Vich et al., 2002; Schuppert et al., 2006). One major QTL identified by Pérez-Vich et al. (2002) explained 84.5% of the variation in OAC. Schuppert et al. (2006) provided dominant INDEL markers for tracking the Ol mutation in addition to identifying 49 SNPs and five INDELs in the 3′ region of FAD2-1. Three years later, a co-dominant SSR marker tightly linked to the Ol mutation and dominant markers specific for the mutation were published (Lacombe et al., 2009). Recently, Premnath et al. (2016) identified, in addition to the QTL on LG14, two additional QTL for OAC on LG8 and LG9. The two markers HO\_Fsp\_b for the QTL on LG14 (Schuppert et al., 2006) and ORS762 for the QTL on LG8 explained about 60% of the phenotypic variation in OAC. Several of the markers have been used for validation across numerous sunflower lines (Nagarathna et al., 2011; Singchai et al., 2013; Bilgen, 2016; Dimitrijevic et al., 2016). Dimitrijevic et al. (2017) reported marker F4-R1 created by Schuppert et al. (2006) as the most efficient in MAS for OAC.

#### Fertility Restoration

Development of reliable tools for detection of cytoplasmic male sterility (CMS) and restorer of fertility (Rf) genes would significantly improve and accelerate the process of developing sunflower hybrids. In sunflower, CMS PET1, originating from an interspecific hybridization of H. petiolaris with H. annuus (Leclercq, 1969), is the only CMS cytoplasm used worldwide for hybrid breeding. Male sterility is caused by the co-transcription of the atpA gene with the new CMS-specific orfH522, leading to the expression of a 16-kDa protein (Horn et al., 1991; Köhler et al., 1991). Fertility restoration suppresses this co-transcription in an anther-specific manner (Monéger et al., 1994). In sunflower, the restorer genes for the PET1 cytoplasm are the best characterized due to the commercial use of this cytoplasm in hybrid breeding. The restorer gene Rf1, which was originally discovered by Kinman (1970) in the line T66006-2-1-B, has since been integrated into a number of USDA/ARS RHA lines such as RHA 271, RHA 272, RHA 273, and others (Korell et al., 1992; Serieys, 2005). A second major dominant restorer gene, Rf2, was discovered in a test cross between T66006-2-1-B and MZ01398. However, Rf2 seems to be present in almost all cultivated sunflower lines, including maintainer lines of CMS PET1 (Serieys, 2005). Only Rf1 is therefore responsible for restoring male fertility in sunflower hybrids (Leclercq, 1984). RAPD markers in combination with AFLP markers were very useful for mapping the restorer gene Rf1 (Horn et al., 2003), which was first positioned on LG6 of the RFLP sunflower map (Gentzbittel et al., 1995). Two RAPD markers, OPK13\_454 and OPY10\_740, which mapped 0.8 and 2.0 cM from Rf1, respectively, were converted into the more reliable and easier to handle SCAR markers HRG01 and HRG02 (Horn et al., 2003).
A recent study of these SCAR markers for breeding practice showed that HRG01 is more efficient for Rf1 detection in perennial species, whereas HRG02 gave better results for annual species (Markin et al., 2017). In addition, a multiplex TaqMan assay was established that allowed the detection of HRG01 and orfH522 at the same time (Markin et al., 2017). Using the SSR marker ORS1030, Rf1 had been mapped to LG13 (Kusterer et al., 2005) of the sunflower reference map (Tang et al., 2003b). In addition, a CAPS marker H13, which mapped 7.7 cM from the Rf1 gene, was developed from the RAPD marker OPH13\_337 by digesting the PCR product with HinfI (Kusterer et al., 2005). The linkage between CAPS H13 and Rf1 was confirmed in the Xenia hybrid combination (Port et al., 2013). An additional SSR marker, ORS511, and a TRAP marker, K11F05Sa12-160, were mapped at distances of 3.7 and 0.4 cM from the Rf1 gene, respectively (Yue et al., 2010). A fertility restorer gene Rf3, which could be shown to be different from Rf1 and Rf2, was identified in the confectionery restorer line RHA 280 (Jan and Vick, 2007). Rf3 was linked to eight markers on LG7, including five known SSR markers (ORS328, ORS331, ORS928, ORS966, and ORS1092) and three new SSR markers, HT-619-1, HT619-2, and HT1013, derived from expressed sequence tags (Liu et al., 2012b). SSR ORS328, which mapped 0.7 cM from Rf3, is so far the closest co-dominant marker to the gene (Liu et al., 2012b). Another restorer gene Rf3 in RHA 340 has also been mapped to LG7 (Abratti et al., 2008). Rf ANN-1742, a restorer line derived from wild H. annuus, showed resistance to rust (Qi et al., 2012b). The new rust resistance gene R11 mapped 1.6 cM from a restorer gene on the lower end of LG13. The SSR marker ORS728 was mapped 1.3 cM proximal to this restorer gene and 0.3 cM distal to R11. Marker analyses using HRG01, HRG02, STS115, and ORS728 indicated that this restorer gene, now called Rf5, might not be allelic to Rf1 (Qi et al., 2012b).
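The marker-gene distances quoted above can be put on a more intuitive scale by converting them into recombination fractions, i.e., the approximate chance per meiosis that a marker allele and the linked restorer allele are separated. The following is a minimal sketch that assumes Haldane's mapping function (no crossover interference); this choice is an assumption made here for illustration and is not stated in the cited studies, and the function name is hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Recombination fraction r for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


# Marker-to-Rf1 distances (cM) as quoted in the text
distances_cm = {
    "HRG01": 0.8,
    "HRG02": 2.0,
    "ORS511": 3.7,
    "CAPS H13": 7.7,
}

for marker, d in distances_cm.items():
    r = haldane_recombination_fraction(d)
    # r approximates the per-meiosis risk that marker and gene become uncoupled
    print(f"{marker}: {d} cM -> r = {r:.4f}")
```

For short distances r is close to the distance in cM divided by 100, which is why markers within about 1 cM of the gene are preferred for marker-assisted selection.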

So far, 72 new CMS sources have been described for sunflower (Serieys, 2005). However, markers linked to the corresponding restorer genes have been detected for only very few of these CMS sources (Horn et al., 2016). Feng and Jan (2008) tagged an additional restorer gene, Rf4, with molecular markers and assigned it to LG3 of the sunflower general reference map (Tang et al., 2003b). Rf4 restores male fertility to a newly identified CMS cytoplasm, GIG2. Schnabel et al. (2008) identified AFLP markers that mapped in close vicinity of the restorer gene Rf\_PEF1, which represents a major restorer gene for the PEF1 CMS cytoplasm, another potentially interesting CMS source for commercial sunflower hybrid breeding. In addition, markers were developed that allowed the distinction between the PET1 cytoplasm and the PEF1 cytoplasm. For CMS 514A, an H. tuberosus-based male sterile cytoplasm, the restorer gene Rf6 was located on LG3 with eight markers (Liu et al., 2013). Two SSR markers, ORS13 and ORS1114, mapped as close as 1.6 cM to Rf6. GISH showed Rf6 to be present on a small translocation introgressed from H. angustifolius.

Further analyses are needed to develop molecular markers more tightly linked to Rf genes, to locate these genes on the genetic map, and to gain insight into the fertility restoration mechanisms in sunflower. In other species, most of the restorer of fertility genes cloned so far belong to the pentatricopeptide repeat (PPR) gene family; however, other types of restorer genes have also been identified (Horn et al., 2014).

#### ASSOCIATION MAPPING

For association mapping, two approaches have been explored: (1) genome-wide association studies (GWAS) and (2) candidate gene approaches. For most plant species, the latter strategy was predominantly applied because whole genome sequences have only recently become available (Fusari et al., 2008). However, high-throughput marker systems nowadays give full genome coverage, which makes approaches such as genome-wide association mapping, QTLSeq mapping and genomic selection possible (Mammadov et al., 2012). As linkage disequilibrium (LD) in sunflower decays rapidly (Liu and Burke, 2006; Kolkman et al., 2007; Fusari et al., 2008), association-based studies could reach a resolution sufficient to detect the genes underlying quantitative trait loci. However, it is important to analyze the population structure of the association mapping population to avoid false associations.
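LD between two SNPs is commonly summarized by the squared allelic correlation r². The sketch below shows how r² can be computed for a pair of biallelic markers; it assumes fully homozygous inbred lines, so that each line contributes one haplotype coded 0/1, and the toy genotypes and the function name are hypothetical. A rapid LD decay means that r² drops toward zero over short physical distances.

```python
import math


def ld_r2(locus_a, locus_b):
    """Squared allelic correlation r^2 between two biallelic loci.

    Each argument is a list of 0/1 alleles, one entry per inbred line.
    Returns NaN if either locus is monomorphic.
    """
    n = len(locus_a)
    p_a = sum(locus_a) / n                      # allele frequency at locus A
    p_b = sum(locus_b) / n                      # allele frequency at locus B
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                        # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan
    return d * d / denom


# Hypothetical alleles of eight inbred lines at three SNPs
snp1 = [0, 0, 0, 0, 1, 1, 1, 1]
snp2 = [0, 0, 0, 1, 1, 1, 1, 1]   # close to snp1: mostly the same allele pattern
snp3 = [0, 1, 0, 1, 0, 1, 0, 1]   # far from snp1: alleles combine at random

print("r2(snp1, snp2) =", round(ld_r2(snp1, snp2), 3))
print("r2(snp1, snp3) =", round(ld_r2(snp1, snp3), 3))
```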

In sunflower, only one of the association mapping studies so far was performed genome-wide (Mandel et al., 2013); all others were candidate gene based (Fusari et al., 2012; Cadic et al., 2013; Talukder et al., 2014b; Nambeesan et al., 2015; McAssey et al., 2016). Genome-wide association mapping was performed in an association population of 271 lines (Mandel et al., 2011), using 5,359 SNP markers from the Illumina Infinium Beadchip (Mandel et al., 2013). Associations were studied regarding flowering time, branching and heterotic groups. LD showed considerable variability across the genome, but significant marker-trait associations were detected. Selection for disease resistance as well as initial domestication might be responsible for the genome-wide differences in the LD profile (Mandel et al., 2013). This first screen was followed by a more detailed, refined association mapping approach based on candidate genes for branching (Nambeesan et al., 2015). Shoot branching was classified as no branching, apical, mid-apical, mid, mid-basal, basal, or whole-plant branching, or as another phenotype. A total of 48 candidate genes described to be involved in branching in other plant species were used to detect homologs to 39 genes in sunflower. Up to eight of the highest BLAST hits for each gene were included in the analyses due to the recent triplication of the sunflower genome (Badouin et al., 2017). For 13 branching candidate genes, co-localization with SNPs associated with branching was observed (Nambeesan et al., 2015). Most of these were found on LG10, where previous QTL mapping had detected the B-locus for recessive branching (Tang et al., 2006; Bachlava et al., 2009). With regard to flowering time, a SNP in HaFT2 was identified that co-localized with a flowering time QTL (McAssey et al., 2016).

Association mapping and linkage mapping were combined with QTL detection to identify mutations responsible for changes in flowering time (Cadic et al., 2013). Associations with flowering time could be demonstrated for 11 regions distributed over 10 LGs. In addition, QTL for flowering time were detected on 11 LGs in a RIL population by linkage mapping. This large number of QTL is consistent with the polygenic pattern of inheritance of flowering time reported before (Leon et al., 2000). SNPs detected by association mapping were then investigated with regard to positional overlaps with QTL identified in the RIL population. The remaining eight regions contained five candidate genes potentially associated with flowering time in other species that showed SNPs in sunflower; one of these genes was the gibberellin receptor GID1B (Cadic et al., 2013). Thirty genes, including this gene, had previously been investigated as candidate genes for flowering time with regard to domestication and improvement in sunflower (Blackman et al., 2011). One major QTL, which was detected on LG14 by linkage mapping (Poormohammad Kiani et al., 2009), was not detected by the association study (Cadic et al., 2013). This can happen if alleles are present at a low frequency in an association panel: one disadvantage of association mapping is that rare alleles are difficult to associate with traits.

Sclerotinia sclerotiorum, a necrotrophic fungal pathogen, causes some of the most devastating diseases in sunflower. The fungus can cause three different types of disease depending on which part of the plant gets infected and whether the infection occurs via ascospores or mycelia (Gulya et al., 1997). These are stalk rot, mid-stalk rot and head rot. In a panel consisting of 94 sunflower lines, 16 candidate genes were screened for associations with Sclerotinia head rot using a Mixed Linear Model (MLM) that considers family relationship as well as population structure (Fusari et al., 2012). These candidate genes had been derived from previous transcript profiling in sunflower (Peluffo, 2010) and Brassica (Zhao et al., 2007) after infecting the plants with S. sclerotiorum. A significant association was detected for haplotype 3 of the gene HaRIC\_B, which represents a truncated gene and accounted for a 20% reduction in Sclerotinia head rot. Candidate gene association mapping for Sclerotinia stalk rot was also performed in another association panel of 260 cultivated sunflower lines (Talukder et al., 2014b). Eight genes, which had been identified in the defense response against S. sclerotiorum in Arabidopsis (Guimaraes and Stotz, 2004; Guo and Stotz, 2007), served as the basis to identify the orthologous genes in sunflower. The panel was divided into two groups representing either the most resistant or the most susceptible lines. Association studies found a strong association of HaCOI1-1 and HaCOI1-2 with resistance against Sclerotinia stalk rot, explaining 7.4% of the observed phenotypic variation (Talukder et al., 2014b).
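For orientation, a mixed linear model that accounts for both population structure and family relationship is often written in the general "Q + K" form shown below. This is the form commonly used in the association mapping literature and is given here as an assumption for illustration; the exact model specification of the cited studies is not reproduced in this review.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha}
           + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\mathbf{u} \sim N\!\left(\mathbf{0},\, 2\mathbf{K}\sigma^{2}_{g}\right),
\qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\, \mathbf{I}\sigma^{2}_{e}\right)
```

Here y is the vector of phenotypes, β contains fixed effects other than the tested marker and population structure, α is the effect of the tested SNP or haplotype, v contains the effects of the subpopulations described by the structure matrix Q, u is the random polygenic effect whose covariance is proportional to the kinship matrix K, and e is the residual. X, S and Z are the corresponding design matrices.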

Association mapping studies in recent years have shown that this approach represents an interesting alternative to linkage mapping, especially for quantitatively inherited traits.

#### TOWARD GENOMIC SELECTION

Genomic selection (GS) has so far mostly been used in animals, e.g., dairy cattle (Van Raden et al., 2009). However, genomic selection has also begun to be applied in plant breeding, e.g., in maize (Massman et al., 2013; Bandeira e Sousa et al., 2017; Cantelmo et al., 2017; Lyra et al., 2017), potato (Habyarimana et al., 2017), soybean (de Azevedo Peixoto et al., 2017), sugar beet (Würschum et al., 2013), and wheat (Bassi et al., 2016). Genomic selection was regarded as promising in hybrid breeding of self-pollinating crops such as wheat (Longin and Reif, 2014; Zhao et al., 2015), especially if little is known about the heterotic pools. To implement GS in sunflower breeding programs, some general aspects of genomic selection need to be emphasized.

Genomic selection selects individuals based on genomic estimated breeding values (GEBVs) (Meuwissen et al., 2001). The idea of GS is to use genome-wide molecular data to effectively select for quantitative trait loci (Bernardo, 2008; Massman et al., 2013; Würschum et al., 2013). More than 10,000 QTL have been detected by traditional mapping approaches across 12 major crop species, but only very few have been successfully applied in marker-assisted breeding programs (Bernardo, 2008). Genomic selection becomes more attractive as high-throughput genotyping becomes feasible due to recent advances in genotyping platforms and considerable price reductions in the last few years. As a first step in GS, a training population has to be established that is both genotyped and phenotyped. This training population is needed to fit the statistical models, which are then applied to predict breeding and genotypic values of individuals that have not been phenotyped (Bassi et al., 2016). The breeding population consists of these individuals, which are genotyped but not phenotyped. Selection is performed in the breeding population. Finally, a validation population serves to estimate the accuracy of the GS models (Bassi et al., 2016). Comparing traditional MAS and GS, three major differences are obvious: (1) in the training phase, MAS identifies markers linked to genes of interest and quantitative traits, whereas GS develops models to predict GEBVs; (2) in the breeding phase, traditional MAS uses only a few markers for genotyping, whereas GS collects genome-wide marker data; and (3) regarding selection in the breeding phase, traditional MAS uses only the identified markers to select individuals by genotype, whereas selection in GS is based on the GEBV (Nakaya and Isobe, 2012). For the success of GS, the accuracy of GEBV prediction is the most important factor.
The accuracy of prediction relies on the characteristics of the training population, such as size, marker density, trait heritability and kinship between training and breeding population, as well as on the ratio of training population to breeding population (Nakaya and Isobe, 2012; Bassi et al., 2016). In traditional MAS, markers tightly linked to a QTL could be applied in most other breeding populations, so that the relationship between the mapping and the breeding population did not have to be considered by the breeder. However, in GS the relationship between training and breeding population is crucial for the predictive power (Nakaya and Isobe, 2012).
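The training and prediction steps described above can be sketched in a few lines. The example below uses ridge regression on marker genotypes, one common way of estimating all marker effects simultaneously; it is an illustration under stated assumptions, not the model used in any of the cited studies. The genotypes (coded 0/2 for homozygous inbred lines), the phenotypes, the penalty `lam` and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(genotypes, phenotypes, lam):
    """Training phase: estimate marker effects from genotyped and phenotyped lines."""
    n, p = len(genotypes), len(genotypes[0])
    mu = sum(phenotypes) / n
    means = [sum(g[j] for g in genotypes) / n for j in range(p)]
    x = [[g[j] - means[j] for j in range(p)] for g in genotypes]   # centred markers
    y = [v - mu for v in phenotypes]                               # centred trait
    xtx = [[sum(x[i][j] * x[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]                  # X'X + lam * I
    xty = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    return mu, means, solve(xtx, xty)


def predict_gebv(model, genotype):
    """Breeding phase: predict the value of a line that was only genotyped."""
    mu, means, effects = model
    return mu + sum((g - m) * e for g, m, e in zip(genotype, means, effects))


# Hypothetical training population: six inbred lines, three markers
train_geno = [[0, 2, 0], [2, 2, 0], [0, 0, 2], [2, 0, 2], [2, 2, 2], [0, 0, 0]]
train_pheno = [11.0, 13.0, 10.0, 12.0, 13.0, 10.0]
model = fit_ridge(train_geno, train_pheno, lam=1.0)

# Hypothetical breeding population: genotyped only, ranked by predicted value
candidates = {"line_x": [0, 2, 2], "line_y": [2, 2, 0], "line_z": [0, 0, 2]}
for name in sorted(candidates, key=lambda k: predict_gebv(model, candidates[k]), reverse=True):
    print(name, round(predict_gebv(model, candidates[name]), 2))
```

In practice the penalty would be chosen from variance components or by cross-validation, and accuracy would be checked in a separate validation population as described above.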

In sunflower, prediction of hybrid performance was based on fingerprinting data in the form of 572 AFLP markers (Reif et al., 2013). Intragroup (133) and intergroup hybrids and the parental lines were evaluated at two locations in 2 years for grain yield, oil content and oil yield. If no information on the General Combining Ability (GCA) of the parental lines was available, prediction of hybrid performance using genomic selection methods was accurate if the parents were closely related, but proved challenging with genetically distant lines (Reif et al., 2013). However, prediction based on GCA could not be improved by genomic selection. In recent years, large sets of markers were generated in sunflower by genotyping-by-sequencing (Baute et al., 2016; Celik et al., 2016; Talukder et al., 2016; Ma et al., 2017; Qi et al., 2017), application of the new 25 K SNP genotyping array (Livaja et al., 2016) and sequencing of parental lines (Mangin et al., 2017a). So far, however, genomic prediction has only been reported for Sclerotinia resistance, using SNP array data (Livaja et al., 2016), and for sunflower hybrid oil content, using sequencing data (Bonnafous et al., 2016; Mangin et al., 2017a). In the latter case, an incomplete factorial design consisting of 36 CMS lines and 36 restorer lines was used to compare the prediction accuracy of GS and classical GCA modeling in sunflower. Multi-environmental field trials were performed to characterize 452 sunflower hybrids of the panel with regard to hybrid performance in oil content, which represents a primarily additive trait with high heritability. In addition, all 72 parental lines were sequenced to obtain genome-wide SNP markers (Mangin et al., 2017a). Genomic predictions were then made for missing hybrids and hybrid combinations lacking information about at least one parental line. In conclusion, GS led to considerable improvement in breeding efficiency compared to conventional GCA modeling if little is known about one or both parental lines (Mangin et al., 2017a). For Sclerotinia midstalk rot, the prediction ability of a genome-based best linear unbiased prediction (GBLUP) model was evaluated in a biparental population genotyped with the 25 K SNP array (Livaja et al., 2016). High predictive abilities were obtained for "stem lesion length" and lower predictive abilities for "leaf lesion length" and "speed of fungal growth," which represent traits with lower heritabilities. These first experimental trials for genomic predictions, using and comparing the results of different models, have shown the potential and the limitations of genomic selection in sunflower.
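To make the contrast with GS concrete, classical GCA modeling predicts an untested hybrid as the overall mean plus the GCA of its female and male parent. The sketch below uses a deliberately naive estimator (parent mean minus overall mean), which is only unbiased in balanced designs; a real analysis of an incomplete factorial would use a mixed model. Line names, oil content values and function names are hypothetical.

```python
from statistics import mean


def fit_gca(observed):
    """Estimate the overall mean and per-parent GCA from tested hybrids.

    `observed` maps (female, male) -> trait value of the hybrid.
    """
    mu = mean(list(observed.values()))
    by_female, by_male = {}, {}
    for (female, male), value in observed.items():
        by_female.setdefault(female, []).append(value)
        by_male.setdefault(male, []).append(value)
    gca_f = {k: mean(v) - mu for k, v in by_female.items()}
    gca_m = {k: mean(v) - mu for k, v in by_male.items()}
    return mu, gca_f, gca_m


def predict_hybrid(model, female, male):
    """Predict an untested hybrid; returns None if a parent has no tested hybrid."""
    mu, gca_f, gca_m = model
    if female not in gca_f or male not in gca_m:
        return None
    return mu + gca_f[female] + gca_m[male]


# Hypothetical incomplete factorial: oil content (%) of three tested hybrids
tested = {
    ("CMS_A", "RF_1"): 44.0,
    ("CMS_A", "RF_2"): 46.0,
    ("CMS_B", "RF_1"): 48.0,
}
model = fit_gca(tested)

print(predict_hybrid(model, "CMS_B", "RF_2"))   # missing hybrid, both parents tested
print(predict_hybrid(model, "CMS_C", "RF_1"))   # new female line: no GCA available
```

The second call illustrates the situation in which genome-wide markers add the most: a parent without any tested hybrid has no GCA estimate, whereas a marker-based model can still produce a prediction from its genotype.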

#### FUTURE PERSPECTIVES

In this review, emphasis was given to plant genetic resources and to molecular tools used to detect and exploit genetic diversity and to facilitate sunflower hybrid breeding. Traditional MAS has been successfully used to introduce monogenic traits into the breeding material, especially disease resistance and herbicide tolerance. Validation of identified molecular markers across different genotypes has also revealed limits to the use of markers in different genetic backgrounds. However, sunflower researchers have put a lot of effort into the identification of markers linked to specific traits without gaining insight into the function of the genes involved, even though this would allow a better understanding of the metabolism and mechanisms behind traits. Breeding for complex polygenic traits is still challenging. In this regard, it is necessary to stress the importance of precise phenotypic evaluation, on which molecular biologists rely to correctly interpret the molecular and phenotypic data. High-throughput phenotyping as applied and tested in other crops would also be of interest for sunflower (Sankaran et al., 2015). There has been a first report on testing remote sensing on sunflower and maize in China with regard to future applications (Yu and Shang, 2017). In recent years, high-throughput genotyping platforms, e.g., SNP arrays, GBS and whole genome sequencing, have been established and successfully used in sunflower (Livaja et al., 2016; Qi L. et al., 2016; Talukder et al., 2016). GWAS (genome-wide association study) and GS (genomic selection), using the large numbers of markers across a wide range of genotypes provided by these techniques, open up new possibilities to address complex traits in sunflower. However, GWAS is still expensive and unavailable for many researchers and breeders.
Some initial steps have been made to create the most appropriate models for prediction of hybrid performance based on GWAS and GS data (Bonnafous et al., 2016; Mangin et al., 2017a), yet there is still a need for further improvement of prediction models, which mostly take additive effects into account, whereas dominance and epistasis also play an important role for heterosis. As in conventional breeding, species-specific strategies will have to be developed for GS, taking into account reproduction system, generation time, genome structure, harvested organs and breeding purposes (Nakaya and Isobe, 2012). However, first empirical GS studies in plants showed the potential of GS in plant breeding as well. It could be demonstrated that the correct choice of population allows successful performance of GS even with lower numbers of markers and reasonable population sizes (Nakaya and Isobe, 2012).

Access to the recently published sunflower genome sequence (Badouin et al., 2017) should allow researchers and breeders to make sunflower breeding more efficient in the coming years. However, exploring the sunflower genome on its own is not enough. Extensive transcriptomics, proteomics and metabolomics data are required, as only the combination of all Omics data will enable us to get to the bottom of some important physiological and molecular mechanisms unique to sunflower. This is especially important for quantitative traits such as drought tolerance or biotic stress resistance (e.g., against Sclerotinia, Phoma, Phomopsis). First results in this direction have been published. Transcriptional profiling has been done with regard to disease reactions of resistant and sensitive genotypes to pathogens such as S. sclerotiorum (Muellenborn et al., 2011), Plasmopara halstedii (Livaja et al., 2013) and Verticillium dahliae (Guo et al., 2017). Identification of the differentially expressed genes now allows a better understanding of the mechanisms behind pathogen attacks and plant reactions. This knowledge will be helpful for developing resistant cultivars. Earlier metabolome data comparing genotypes with different head rot reactions to S. sclerotiorum also pointed to 63 metabolites involved in the pathogen attack (Peluffo et al., 2010). To analyze the response of sunflower to drought, transcriptome analyses of sunflower genotypes under water-limited conditions in comparison to well-watered plants have been performed by RNASeq or microarray analyses (Liang et al., 2017; Moschen et al., 2017; Sarazin et al., 2017). Combining the transcriptomic and metabolic data made it possible to identify drought-relevant transcriptional hubs (Moschen et al., 2017). Leaf senescence is a naturally occurring process, but the onset and progress of senescence play a major role for yield.
Integration of transcriptomic and metabolomic data identified metabolites and transcription factors as applicable biomarkers (Moschen et al., 2016a,b). To explore the potential of other species in the genus Helianthus for sunflower breeding, transcriptome analyses have also been performed on populations of, e.g., perennial sunflowers such as H. maximiliani (Kawakami et al., 2014) and H. tuberosus (Jung et al., 2014), as well as on first-generation interspecific hybrids of annuals (Rowe and Rieseberg, 2013). Proteomic analyses in sunflower have been performed with regard to drought stress (Castillejo et al., 2008; Fulda et al., 2011; Ghaffari et al., 2013, 2017), cold acclimation (Balbuena et al., 2011), response to metal-ion contamination (Garcia et al., 2006; Printz et al., 2013; Lopes Júnior et al., 2015), seed protein composition (De Sousa Barbosa et al., 2013), heterosis performance (Mohayeji et al., 2014), and resistance to O. cumana (Yang et al., 2017). In addition, the sunflower genome database represents a very valuable tool, giving access to a wide range of transcriptome data that have already been successfully used to address flowering time and oil metabolism (Badouin et al., 2017). However, further studies in sunflower are still needed to analyze in detail the responses to different abiotic and biotic stress conditions and to prepare sunflower for future climatic challenges. Combining Omics data will allow systems biology approaches to improve sunflower hybrids. Another aspect is the optimization of plant architecture toward a more compact form, which would influence photosynthesis, lodging, climatic adaptation and possible plant densities. This could also improve sunflower hybrid performance and increase yields per hectare through the use of higher plant densities (Hall et al., 2010). Picheny et al. (2017) used the crop model SUNFLO to design sunflower ideotypes with optimized morphological and physiological traits for certain environments.

However, only a combined effort of the sunflower research community can make sunflower more competitive with other oil crops. The new high-throughput technologies combined with new genomic-based breeding strategies give us the opportunity, as never before, to understand and mine genetic variation and to use it for improvement of sunflower hybrids.

#### AUTHOR CONTRIBUTIONS

Both authors, AD and RH, have made an equal, substantial, direct and intellectual contribution to writing the review and its revision, and approved it for publication. The table was prepared by AD, the figure by RH. The authors complied with the ethical standards.

#### ACKNOWLEDGMENTS

The authors would like to thank the University of Rostock and the DFG for funding part of the research included (HO 1593/5-1, HO 1593/5-2, and HO 1593/6-1). They are also grateful to the University of Rostock and the DFG for funding the open access publication. Another part of the research presented was supported by project TR31025 (Ministry of Education, Science and Technological Development, R. Serbia).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Dimitrijevic and Horn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Architecture of Capitate Glandular Trichome Density in Florets of Domesticated Sunflower (Helianthus annuus L.)

Qing-Ming Gao<sup>1</sup>, Nolan C. Kane<sup>2</sup>, Brent S. Hulke<sup>1</sup>\*, Stephan Reinert<sup>2</sup>, Cloe S. Pogoda<sup>2</sup>, Silas Tittes<sup>2</sup> and Jarrad R. Prasifka<sup>1</sup>

<sup>1</sup> USDA-ARS Red River Valley Agricultural Research Center, Fargo, ND, United States, <sup>2</sup> Ecology and Evolutionary Biology Department, University of Colorado, Boulder, CO, United States

Capitate glandular trichomes (CGT), one type of glandular trichome, are most common in Asteraceae species. CGT can produce various secondary metabolites such as sesquiterpene lactones (STLs) and provide durable resistance to insect pests. In sunflower, CGT-based host resistance is effective against the specialist pest, sunflower moth. However, the genetic basis of CGT density is not well understood in sunflower. In this study, we identified two major QTL controlling CGT density in sunflower florets by using an F<sub>4</sub> mapping population derived from the cross HA 300 × RHA 464, with a genetic linkage map constructed from genotyping-by-sequencing data and composed of 2121 SNP markers. One major QTL is located on chromosome 5 and explained 11.61% of the observed phenotypic variation; the second QTL is located on chromosome 6 and explained 14.06% of the observed phenotypic variation. The QTL effects and the association between CGT density and the QTL support intervals were confirmed in a validation population of 39 sunflower inbred lines with diverse genetic backgrounds. We also identified two strong candidate genes in the QTL support intervals, and the functions of their orthologs in other plant species suggest potential roles in regulating capitate glandular trichome density in sunflower. Our results provide valuable information to the sunflower breeding community for developing host resistance to sunflower insect pests.

Keywords: sesquiterpenes, capitate glandular trichome, glandular trichome, sunflower, Helianthus annuus L., heat shock transcription factor, WRKY transcription factor

#### Edited by:

Dragana Miladinović, Institute of Field and Vegetable Crops, Serbia

#### Reviewed by:

Begoña Pérez Vich, Consejo Superior de Investigaciones Científicas (CSIC), Spain

Jadranka Z. Lukovic, Faculty of Science, University of Novi Sad, Serbia

#### \*Correspondence:

Brent S. Hulke, brent.hulke@ars.usda.gov

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 19 September 2017; Accepted: 18 December 2017; Published: 09 January 2018

#### Citation:

Gao Q-M, Kane NC, Hulke BS, Reinert S, Pogoda CS, Tittes S and Prasifka JR (2018) Genetic Architecture of Capitate Glandular Trichome Density in Florets of Domesticated Sunflower (Helianthus annuus L.). Front. Plant Sci. 8:2227. doi: 10.3389/fpls.2017.02227

#### INTRODUCTION

Plant trichomes, the hair-like structures on above-ground plant surfaces, are key features governing interaction with the environment, including biotic and abiotic factors. Plant trichomes vary greatly in morphology and function, with at least 300 known types of plant trichomes (Wagner, 1991; Spring, 2000; Werker, 2000). Based on their metabolic activity, plant trichomes are classified into two broad groups: glandular trichomes (GTs) and non-glandular trichomes (non-GTs), which co-exist on plant surfaces like leaves, flowers, stems, and bracts (Hare and Elle, 2002; Rautio et al., 2002). A well-studied example of non-GTs is from the model plant Arabidopsis thaliana, which possesses trichomes that are unicellular and either unbranched or with two to five branches (Oppenheimer et al., 1991; Larkin, 1994; Szymanski et al., 2000). In contrast, GTs are usually multicellular and consist of differentiated basal, stalk, and secretory cells. GTs have been found on ∼30% of all vascular plants, particularly in dicots.

Numerous studies have shown that plant trichomes are involved in water protection, absorbing UV radiation, and attracting pollinators by releasing specific chemicals (Skaltsa et al., 1994; Moraes et al., 1998; Paré and Tumlinson, 1999; Benz and Martin, 2006; Lusa et al., 2014). However, the most notable function of plant trichomes is their role in plant defense. Non-GTs serve as physical obstacles that create an unfavorable microenvironment for herbivores, parasites, and pathogens. Some non-GTs form spine- or thorn-like structures that are able to injure and trap insects and other pests during their searching and feeding activities (Pillemer and Tingey, 1976; Riddick and Wu, 2011). On the other hand, GTs are metabolic factories and mainly function in chemical defense in plants (Peiffer et al., 2009; Tian et al., 2012). The biochemical pathways and metabolic profiles in GTs have been studied in great detail in plants (Tissier, 2012). GTs can synthesize and release a wide range of chemicals, including flavonoids, terpenes, alkaloids, acylsugars, methyl-ketones and surface proteins (Glas et al., 2012). Recent molecular studies in different plant species confirm that trichome development is under the control of a conserved regulatory network, and have identified the function of several key genes in the network (Serna and Martin, 2006; Yang and Ye, 2012; Pattanaik et al., 2014). Compared with leaf tissues, more than 200 genes involved in secondary metabolism and defense are differentially expressed in trichomes of Nicotiana tabacum, and some of these are expressed only in trichomes (Cui et al., 2011). The secondary metabolites produced are either effective toxins against the pests or attractants for the natural enemies of the pests (Howe and Jander, 2008; War et al., 2012). In some cases, GTs actively release resinous or sticky chemicals on plant surfaces to limit insect movement and function as a physical defense (Simmons and Gurr, 2005; Glas et al., 2012).
Studies on wild species of Medicago have shown that erect GTs are effective against stem-, leaf-, and fruit-eating pests (Kitch et al., 1985; Danielson et al., 1990), demonstrating their wide role in the defense of many types of tissues.

In Asteraceae, GTs have been the subject of many studies for their ability to synthesize various secondary metabolites of ecological and medical importance (Wang et al., 2009; Chadwick et al., 2013; Lv et al., 2017). GTs on tarweed (Madia elegans) entrap insects by releasing sticky chemicals and provide carrion as a food resource for predators. This indirect defense efficiently decreases herbivore activity and increases plant fitness under natural field conditions (Krimmel and Pearse, 2013). Among all the secondary metabolites produced by GTs, sesquiterpene lactones (STLs) are the most prevalent within Asteraceae species. Chemically, STLs derive from the isoprenoid biosynthetic pathway and contain a basic backbone of 15 carbon atoms (Zidorn, 2008). STLs show great structural variety in the arrangement of the basic skeleton and in the composition of the side chain. More than 7,000 distinct STL molecules have been reported so far (Fraga, 2005). Besides their ecological role in nature, STLs provide benefits to human health (Chadwick et al., 2013). Artemisinin, a unique STL molecule isolated from sweet wormwood (Artemisia annua), is the most effective treatment for malaria (Covello et al., 2007). Other studies have also shown that STLs have anti-carcinogenic and anti-inflammatory effects (Li-Weber et al., 2002; Ghantous et al., 2010; Nasim et al., 2011). Consequently, there is great interest in understanding the anatomy, development, biochemistry and molecular biology of GTs in Asteraceae, and the biosynthetic pathways of artemisinin and other STLs have been well studied (Spring and Bienert, 1987; Covello et al., 2007; Zidorn, 2008; Wang et al., 2009; Duarte and Empinotti, 2012; Menin et al., 2012; Michalska et al., 2013; Eljounaidi et al., 2014; Aschenbrenner et al., 2016; Bombo et al., 2016; Lv et al., 2017).

In sunflower (Helianthus annuus), there are three main types of trichomes: non-GTs, linear glandular trichomes (LGTs), and capitate glandular trichomes (CGTs). As early as 60–70 h post-germination, these three types of trichomes form on the primordia of the first true leaves, and LGTs and CGTs begin active biosynthesis and secretion of secondary metabolites such as flavonoids, terpenoids, and sesquiterpenes at 72–96 h post-germination (Aschenbrenner et al., 2014). The LGTs are often found among non-GTs and are present on most parts of the sunflower plant, while CGTs only occur on the leaf surfaces and the anther appendages (distal ends) of sunflower florets (Aschenbrenner et al., 2013). In fact, CGTs are the only type of trichome found on anther appendages, where several important sunflower pests feed on pollen and florets, especially the most damaging insect pest of North American sunflower, the sunflower moth (Homoeosoma electellum Hulst.). As seen in other Asteraceae, sunflower CGTs synthesize STLs as the primary chemical defense component against herbivores (Göpfert et al., 2005; Rowe et al., 2012).

Since STLs are phytotoxic, they are mainly synthesized in the stalk cells of CGTs and secreted and accumulated in the cuticle globes (Göpfert et al., 2005; Amrehn et al., 2013, 2015). Three sesquiterpene synthases (HaGAS1, HaGAS2, and HaTPS2) and one monooxygenase (HaGAO) involved in the STL biosynthesis pathway have been identified in sunflower plants (Göpfert et al., 2009; Amrehn et al., 2015). The expression pattern of these genes and the STL biosynthesis activity accompanied anther development, and STL levels in CGTs reached a peak before disk flower opening and remained more or less constant for several days (Göpfert et al., 2009; Amrehn et al., 2015). While feeding or visiting, insects rupture the CGTs on the florets, which leads to direct contact with STLs. In vitro studies have shown that both polar and non-polar extracts of CGTs caused high mortality and retarded growth in larvae of the sunflower moth, and the purified STL species showed similar effects (Rossiter et al., 1986; Rogers et al., 1987; Prasifka, 2015; Prasifka et al., 2015). Interestingly, the extracts of CGTs are also effective against other floret-feeding insects like yellowstriped armyworm (Spodoptera ornithogalli) and migratory grasshopper (Melanoplus sanguinipes), indicating a broader spectrum of CGT-mediated defense (Mabry et al., 1977; Isman, 1985).

Three observations have led us to perform this study: (1) The number of plant trichomes is heritable. In plant taxonomy, trichomes are used as a grouping character because they are a heritable trait across different plant species (Spring, 2000; Reis et al., 2002). Studies in other crop plants have also shown that GT density is a quantitatively inherited trait (Kitch et al., 1985; Agren and Schemske, 1992; Maliepaard et al., 1995; Andrade et al., 2017). However, such information is not available in sunflower. (2) Studies in sunflower have suggested the possibility of developing inbreds or hybrids with high CGT density. Previously, Prasifka (2015) showed that public sunflower inbred lines (both male and female) present a large range in mean CGT number (>50-fold differences), but wild sunflower and commercial hybrids showed much less variation in mean CGT number (∼5-fold differences). Thus, breeding a sunflower line with high CGT density is plausible. (3) A positive correlation between CGT number and STL levels in sunflower florets has been documented (Prasifka et al., 2015). Increasing CGT density in sunflower florets could be a strategy for reducing the fitness of insect pests like sunflower moth, thereby reducing the damage they cause. Therefore, the objective of this study is to identify the QTL responsible for CGT density of anther florets in sunflower. We believe this study will advance our understanding of CGT-mediated resistance to insect pests in sunflower and also benefit sunflower breeding programs by providing the first marker-assisted resource for insect resistance.

# MATERIALS AND METHODS

#### Plant Materials

Plants of H. annuus genotypes HA 300 (PI 552938) and RHA 464 (PI 655015; Hulke et al., 2010) were grown in one-gallon plastic pots under greenhouse conditions with a photoperiod of 16-h light and 8-h dark. High CGT number per floret was observed in maintainer inbred HA 300, and low CGT number per floret was observed in restorer inbred RHA 464 (Prasifka, 2015). To map the genetic factors which control CGT number per floret in sunflower, crosses were made between the parents HA 300 (female) and RHA 464 (male) in the 2013 winter greenhouse, and F<sup>1</sup> plants were grown in a field environment near Fargo, ND, in the summer of 2013. The 300 selfed F<sup>2</sup> plants and 280 F<sup>3</sup> plants were grown in the 2014 winter greenhouse and 2014 Fargo field, respectively. A total of 239 F<sup>4</sup> plants were grown in the 2015 winter greenhouse in the conditions described above and subjected to CGT counting and DNA sampling.

To validate the QTL mapping results of the F<sup>4</sup> population, a separate diversity panel of 39 inbred lines was selected, which provided a distinct genetic background based on breeding history and pedigree information. These lines were grown in 2016 under greenhouse conditions for CGT counting.

CGTs were sampled and counted in single environments because of the very high heritability of the trait and the high cost of evaluation. Prasifka (2015) assessed CGT number from public inbred lines and sampled three replicate plants with three subsamples (florets) from each plant in field environments in 2012 and 2013, finding a very high correlation between the two years (environments; R<sup>2</sup> = 0.98). Similarly, Spring and Bienert (1987) found that while compounds within glands on leaves were affected by lighting, the number of glands per unit area was not. These published data suggest this trait does not possess the genotype-by-environment variation generally typical of quantitative traits.

#### CGT Imaging and Counting

Unopened sunflower florets from 239 F<sup>4</sup> plants were removed with forceps from the outermost 1 cm of the capitulum 1 day before anthesis. After storage at −20°C, one floret per plant was dissected by removing the corolla and making a latitudinal cut through the fused anther tube, after which the unfurled anther tube was attached to an aluminum mount with double-sided carbon adhesive tape (Ted Pella Inc., Redding CA, USA). Mounted groups of 15 florets were sputter-coated with a conductive layer of gold/palladium (Balzers SCD 030, Balzers Union Ltd., Liechtenstein). Scanning electron micrographs (SEM) were obtained using a JEOL JSM-6490LV scanning electron microscope (JEOL USA, Inc., Peabody MA, USA) operating at an accelerating voltage of 15 kV and a magnification of 45×. Since the floret areas of HA 300 (mean = 14.5, SE = 0.75, n = 25) and RHA 464 (mean = 13.3, SE = 0.2, n = 25) are similar, micrographs subsequently were used to quantify the total number of CGT per floret. To estimate the counting error between florets on the same plant (natural variation or trichome loss due to handling), a second floret from each of 30 plants in the sampled F<sup>4</sup> population was analyzed by SEM and compared to the initial result.

#### DNA Sampling and Sequencing

Leaf tissues were collected from each F<sup>4</sup> plant, and genomic DNA (gDNA) was extracted from lyophilized leaf material according to the Qiagen DNeasy 96 Plant Kit protocol. Subsequently, library prep was performed using a modified Genotyping by Sequencing (GBS) protocol (Meyer and Kircher, 2010). This protocol used the restriction enzymes MseI and EcoRI to first digest and fragment the gDNA. The result was a pool of fragments with sticky ends from restriction cut sites that provide the template for adaptor ligation. Illumina adaptors and barcodes were ligated to the digested fragments. A subset of fragments was then amplified using Illumina PCR primers. The samples were normalized and pooled following PCR, using a SequalPrep normalization kit (Thermo Fisher Scientific, USA). Next, the pooled, normalized samples were run out on a 2.5% agarose gel. Using a 1,000 bp ladder, we selected 300–400 bp fragments by cutting out this segment from the gel. The gel segment was then purified using a Qiagen gel purification kit according to the accompanying protocol. Samples were sent to the University of Texas, Austin, for sequencing on an Illumina HiSeq 2500 sequencer. The results were multiplexed 150 bp single-end barcoded GBS reads.

For HA 300, RHA 464, and a portion of the validation panel, genomic DNA was extracted from lyophilized leaf, and whole genome sequencing was performed. Genomic libraries were prepared using Nextera® XT DNA library prep kits (Illumina®) according to the protocol. Each gDNA sample was diluted to the appropriate concentration using a Qubit 3.0 fluorometer (Thermo Fisher Scientific, USA). Each sample was barcoded by the unique dual index adapters Nextera® i5 and i7. Resulting libraries were cleaned using solid-phase reversible immobilization (SPRI) to remove fragment sizes less than 300 base pairs via an epMotion 5075TMX automated liquid handling system (Eppendorf North America). Sample quality control (QC) was conducted to ensure appropriate sample concentration and fragment size using a Qubit 3.0 fluorometer and an Agilent 2100 Bioanalyzer prior to normalizing the loading concentration to 1.8–2.1 pM with 1% PhiX control v3 added (Illumina®). Samples that passed QC were processed for paired-end 150 base pair reads on the Illumina NextSeq® sequencer. Whole genome sequencing was conducted at the BioFrontiers Institute Next-Generation Sequencing Facility at University of Colorado. The remainder of the validation panel was sequenced as part of the sunflower SAM population by the Genome Quebec Innovation Centre at McGill University, Montreal, QC, Canada (Mandel et al., 2013; Burke, pers. comm.).

#### SNP Marker Calling

All GBS reads were demultiplexed using the process\_radtags (v1.45) command from the Stacks software suite (Catchen et al., 2013). Default parameters were used with the exception of adding the "r" and "disable\_rad\_check" flags. After demultiplexing, each sample was checked for overrepresented sequences with FastQC (v0.11.5; Andrews, 2010). Trimmomatic (v0.35; Bolger et al., 2014) was used with default parameters to trim and clean the demultiplexed sequences. Reads with overrepresented sequences as identified by FastQC were also trimmed by appending them to the "TruSeq3-SE.fa" adapters file distributed with Trimmomatic. Reads were aligned to the sunflower reference genome (HA 412HO\_v1.1) using the BWA mem (v0.7.12-r1039; Li and Durbin, 2009) algorithm with default parameters, and the duplicates in alignments were marked with Picard (v2.8.1; Broad Institute). Realignment and base call recalibration were performed with GATK (Genome Analysis Toolkit) following the GATK Best Practices. Finally, the alignment quality was checked with Qualimap (v2.2.1; Okonechnikov et al., 2015), which rendered the data ready for variant calling. The variant calling was performed with freebayes (v1.1.0-1-gf15e66e; Garrison and Marth, 2012), and WGS data from the parental lines (HA 300 and RHA 464) were jointly genotyped to improve the variant calling quality. The variant sites were filtered using vcftools (v0.1.13; Danecek et al., 2011). We kept only bi-allelic single nucleotide polymorphic (SNP) sites, and required a minimum site quality of 30 and a minor allele frequency greater than 0.05.
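The three site filters named above (bi-allelic SNPs only, site quality of at least 30, minor allele frequency above 0.05) can be illustrated with a minimal Python sketch. It is not the vcftools implementation; the `Site` record, its field names, and the 0/1 allele coding are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Site:
    """Hypothetical, simplified variant record (not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alts: List[str]                              # alternate alleles
    qual: float                                  # site quality score
    genotypes: List[Optional[Tuple[int, int]]]   # None marks a missing call


def minor_allele_frequency(site: Site) -> float:
    """Frequency of the rarer allele among non-missing calls (0 = ref, 1 = alt)."""
    alleles = [a for gt in site.genotypes if gt is not None for a in gt]
    if not alleles:
        return 0.0
    alt_freq = sum(1 for a in alleles if a == 1) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def keep_site(site: Site, min_qual: float = 30.0, min_maf: float = 0.05) -> bool:
    """Keep bi-allelic SNPs with quality >= min_qual and MAF > min_maf."""
    is_biallelic_snp = (
        len(site.alts) == 1 and len(site.ref) == 1 and len(site.alts[0]) == 1
    )
    return (
        is_biallelic_snp
        and site.qual >= min_qual
        and minor_allele_frequency(site) > min_maf
    )
```

A site with a single heterozygous call among 20 diploid samples has a minor allele frequency of 1/40 = 0.025 and would be dropped by this sketch.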

In addition to filtering sites with vcftools, we applied a protocol to filter sites that are likely to be misassembled multicopy portions of the genome. Repetitive sites are susceptible to misassembly with respect to the reference sequence and may result in several reads erroneously mapping to the same locus. To reduce the number of these sites in our variant calls, we assessed sequencing depth of whole genome shotgun sequence reads from four modern sunflower cultivars and one landrace (Table S1) aligned to the reference genome. We used the samtools depth command to produce the number of mapped sequences per site for each individual. We summed the depths at each position, and calculated the frequency of summed depth values across samples. We then plotted the summed depth values against their frequencies, providing a visual means to choose a range of summed depth values that are most likely to correspond to well-assembled single copy sites of the genome. We then filtered our vcf table to only include the identified single-copy sites. Out of 3,200,466 diagnostic sites differing between the parental genomes, we identified 1,138,669 single-copy diagnostic SNPs. After filtering the vcf table, missing genotypic information from the GBS data of the F<sup>4</sup> population was trio imputed using our own custom phaser2.pl and genotyper7.pl software (Kane, 2017), and leveraging the whole genome sequence of the parent lines HA 300 and RHA 464. These programs are designed to accurately impute for high-quality single-copy, diagnostic SNPs. Because of our extensive filtering, and high quality data, these assumptions were met.
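The summed-depth filter described above can be sketched in a few lines of Python. This is only an illustration of the idea: the per-sample dictionaries stand in for parsed `samtools depth` output, and the `low` and `high` bounds are hypothetical values that would in practice be chosen by inspecting the plotted depth-frequency distribution.

```python
from collections import Counter


def summed_depths(per_sample_depths):
    """Sum read depth at each (chrom, pos) across samples.

    per_sample_depths: list of dicts mapping (chrom, pos) -> depth,
    one dict per sequenced individual (assumed input layout).
    """
    totals = Counter()
    for depths in per_sample_depths:
        totals.update(depths)  # Counter.update adds counts key by key
    return totals


def depth_histogram(totals):
    """Frequency of each summed-depth value, for plotting and range choice."""
    return Counter(totals.values())


def single_copy_sites(totals, low, high):
    """Sites whose summed depth falls inside the chosen single-copy range."""
    return {site for site, depth in totals.items() if low <= depth <= high}
```

Sites with unusually high summed depth, which suggest collapsed multicopy sequence, fall outside the chosen range and are excluded.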

#### QTL Analysis

QTL analyses were carried out using the R/qtl package in R version 3.2.3 (Broman et al., 2003). The genetic map was constructed with the est.map function (assuming a genotyping error rate of 0.001). Single QTL analysis and LOD scores were calculated by a single QTL genome scan (scanone function) with standard interval mapping (0.1 cM steps, assuming a genotyping error rate of 0.001). Pairwise QTL interactions were calculated using a two-dimensional QTL scan (scantwo function) via standard interval mapping (0.1 cM steps, assuming a genotyping error rate of 0.001). LOD significance thresholds for type I error rates of α < 0.05 were determined by running 1,000 permutations on the single- and two-dimensional QTL scan. In addition, the stepwiseqtl function (max.qtl = 4, additive.only = FALSE) was used to perform forward/backward stepwise search, and find the QTL model which has the highest LOD score. The fitqtl function was used to create a QTL project to fit the phenotypic data with the selected model, and one QTL was omitted at a time to obtain an ANOVA table.
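The logic of a permutation-based genome-wide LOD threshold can be sketched as follows. This is a simplified illustration, not R/qtl's interval mapping: it uses single-marker regression on numerically coded genotypes (0/1/2, an assumed coding), and the function names are hypothetical.

```python
import math
import random


def marker_lod(genotypes, phenotypes):
    """LOD for a single-marker linear regression: (n/2) * log10(RSS0 / RSS1)."""
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)   # RSS of the null model
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        return 0.0                                     # uninformative marker or trait
    rss1 = syy - sxy ** 2 / sxx                        # RSS after fitting the marker
    if rss1 <= 0:
        return float("inf")                            # perfect fit
    return (n / 2) * math.log10(syy / rss1)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the genome-wide maximum LOD under permutation."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        max_lods.append(max(marker_lod(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(len(max_lods) - 1, math.ceil((1 - alpha) * len(max_lods)) - 1)
    return max_lods[index]
```

Because each permutation records the maximum LOD across all markers, the resulting threshold controls the genome-wide rather than the per-marker type I error rate.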

#### Validation of the Mapped QTL

To validate the QTL results, the SNP markers were called from the validation population together with HA 300 and RHA 464, as described above. The SNP markers derived from the QTL support interval were extracted. Monomorphic SNPs between the two parent lines were removed, and the heterozygous SNPs and markers with missing data in one of the parent lines were also filtered out. The SNP markers were treated as fixed effects and all possible marker pairs were tested by fitting a regression function in R as y ∼ SNP<sub>m</sub> + SNP<sub>n</sub>, where y is the CGT number, SNP<sub>m</sub> is the effect of the mth marker from chromosome 5, and SNP<sub>n</sub> is the effect of the nth marker from chromosome 6. The significance threshold for multiple comparisons was determined by the adjusted p-values with the false discovery rate at 0.05 (Benjamini and Yekutieli, 2001).
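For readers unfamiliar with the false discovery rate adjustment cited above, the following Python sketch computes Benjamini-Yekutieli adjusted p-values. It assumes that this is the procedure meant by the citation; the function name is hypothetical and the sketch is not taken from the authors' scripts.

```python
def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted p-values in the input order.

    The i-th smallest p-value is scaled by m * c(m) / i, where
    c(m) = 1 + 1/2 + ... + 1/m, and monotonicity is then enforced
    from the largest p-value downward.
    """
    m = len(p_values)
    if m == 0:
        return []
    harmonic = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        value = p_values[idx] * m * harmonic / rank
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted
```

A model would then be declared significant when its adjusted p-value is below 0.05.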

#### Identification and Phylogenetic Analysis of Candidate Genes within the QTL Support Interval

To identify the candidate genes with the highest likelihood of influence on the phenotype, we first manually checked through the QTL support interval region in the sunflower reference genome (INRA Sunflower Bioinformatics Resources, 2014) and listed all the annotated genes (Table S5). We then filtered the list based on two criteria: (1) significant association between a colocated SNP and CGT number in the validation population; and (2) putative gene function. The sequences of selected candidate genes were further characterized by two approaches, NCBI (National Center for Biotechnology Information) ORFfinder search and BLAST homology search. All ORFs within these sequences were identified using NCBI ORFfinder with the following parameters: a minimal ORF length of 150 nucleotides, the standard genetic code, ATG and alternative start codons, and ignore nested ORFs. BLASTp searches were carried out with the given ORFs to obtain the most similar protein sequences in Asteraceae, Solanaceae, Malvaceae and Brassicaceae. The sequences with an E-value less than 10<sup>−5</sup> were selected, combined, and searched for the presence of conserved protein domains using ScanProsite (Sigrist et al., 2012). Comparisons were made among HA 412HO (reference sequence), HA 300, and RHA 464 for genomic sequence differences, alternative splice sites, and changes in protein domains at the selected loci. For phylogenetic analysis, the conserved protein domain sequences with 10–15 additional amino acids on the N-terminal and C-terminal ends were used for alignment. The alignment was done with the L-INS-i strategy implemented in MAFFT (version 7.310; Katoh, 2002). The BLOSUM62 scoring matrix and a gap opening penalty of 1.5 were selected to assess the phylogenetic relationship among the protein sequences. After the alignment, a neighbor-joining tree was constructed with PhyML including the Smart Model Selection approach (version 3; Guindon et al., 2010; Lefort et al., 2017), with bootstrap resampling of 1,000 bootstrap samples. The phylogenetic tree was colorized according to protein families in Archaeopteryx (version 0.9920 beta; Han and Zmasek, 2009).

FIGURE 1 | Frequency distribution of capitate glandular trichomes (CGTs) in the F4 mapping population. The arrowheads indicate the CGT number of the parental lines, HA 300 (red arrow) and RHA 464 (yellow arrow).

# RESULTS

#### Capitate Glandular Trichome Density in Mapping Population

The two parents presented a significant difference in CGT density, with the male parent RHA 464 having ∼2 CGT per floret and the female parent HA 300 having ∼300 CGT per floret (**Figure 1**). The CGT numbers per floret were counted for 239 F<sup>4</sup> plants, and 179 plants with good genotypic data quality were selected as the final mapping population. Based on the frequency distribution, the 179 F<sup>4</sup> plants were classified into three groups (**Figure 1**). Of these 179 F<sup>4</sup> plants, 25 plants (13.9%) had high CGT density (more than 150 CGT per floret), 109 plants (60.9%) had medium CGT density (25–150 CGT per floret), and 45 plants (25.2%) had low CGT density (less than 25 CGT per floret). The Shapiro-Wilk normality test (Shapiro and Wilk, 1965) indicated that the CGT number in the F<sup>4</sup> population was not normally distributed (W = 0.915, p < 1.1 × 10<sup>−8</sup>), and the distribution of CGT number was moderately skewed toward the lower CGT number (Skewness = 0.98). The original mapping population with 239 individuals showed a similar distribution of CGT number (Skewness = 0.97; W = 0.916, p < 1.1 × 10<sup>−10</sup>).
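The grouping and skewness statistic reported above can be illustrated with a short Python sketch. The cut-offs (25 and 150 CGT per floret) come from the text; the moment-based skewness estimator is an assumption, since the estimator used for the reported value is not stated, and the function names are hypothetical.

```python
def classify_cgt(count, low_cutoff=25, high_cutoff=150):
    """Assign a plant to the low, medium, or high CGT density group."""
    if count < low_cutoff:
        return "low"
    if count > high_cutoff:
        return "high"
    return "medium"


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

A positive skewness value indicates that most plants sit at lower CGT counts, with a longer tail toward high counts.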

To estimate the potential error in CGT counting (natural variation or trichome loss due to handling), 30 plants from the F<sup>4</sup> population were randomly selected for recounting. A second floret from each of these 30 plants was counted, and the CGT numbers from the first and second florets were highly correlated (r = 0.98, p < 2.2 × 10<sup>−16</sup>) (Figure S1).
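The repeatability check above rests on the Pearson correlation between first- and second-floret counts. A minimal sketch of that statistic follows; the function name is hypothetical and no data from the study are reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Passing the first-floret counts as `x` and the second-floret counts as `y` for the 30 recounted plants would give the r statistic reported in the text.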

#### Genetic Map Construction

A total of 443.2 M raw GBS reads were generated from 239 F<sup>4</sup> plants with the Illumina HiSeq 2500 sequencing system. After trimming the adapters and overrepresented sequences with Trimmomatic (v0.35; Bolger et al., 2014), the number of remaining reads was 210.1 M. Of these 210.1 M reads, 97.9% (205.7 M) were successfully aligned to the sunflower reference genome HA 412HO\_v1.1, and the mean length of reads was 74.3 bp. Before performing variant calling, five individuals were removed due to poor genotyping quality. A total of 16,028,511 SNPs were produced in the initial variant calling, and 1,138,669 single-copy diagnostic SNPs were retained after applying several filters (minQ > 30, MAF > 0.05, biallelic only). To further control genotypic data quality, we also checked each individual in the F<sup>4</sup> population, and the individuals with low coverage on one or more chromosomes (less than 10 SNPs per chromosome) were dropped. A total of 179 individuals with acceptable sequence quality were kept to conduct imputation. A total of 1.09 million SNPs were generated from the imputation, and monomorphic markers and markers with LD above 0.9 were filtered out first, followed by removing SNPs with a high missing rate and significant distortion from the expected Mendelian ratio. Finally, 2,121 high quality SNPs were kept and used for constructing the genetic linkage map (**Figure 2**).

The 2,121 SNP markers were distributed across all 17 chromosomes, and the number of markers per chromosome varied from 32 on chromosome 7 to 238 on chromosome 10 (**Table 1**). The total length of the map was 1,519.52 cM, and the size of chromosomes varied from 13.48 cM (chromosome 7) to 125.58 cM (chromosome 17). Chromosome 7 had the fewest SNP markers and the shortest map distance. The average density per marker was 0.72 cM, and the largest gap between two adjacent loci was 27.32 cM on chromosome 6.
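The reported average density follows directly from the total map length and the number of mapped markers. The symbols below are introduced here for illustration only.

```latex
\[
\bar{d} \;=\; \frac{L_{\text{total}}}{N_{\text{markers}}}
\;=\; \frac{1519.52\ \text{cM}}{2121}
\;\approx\; 0.72\ \text{cM per marker}
\]
```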

#### QTL Analysis

After performing a single QTL scan on the data collected from the F<sup>4</sup> population, two putative QTL were revealed. One was mapped to chromosome Ha5 at the 14.64 cM position, and the other one was mapped to chromosome Ha6 at the 60.72 cM position (**Figure 3**). Both QTL had LOD scores greater than the LOD threshold (4.70, α = 0.05), which was the 95th percentile of the distribution of genome-wide maximum LOD obtained by 1,000 permutations (Table S2). Therefore, a two-QTL model scan was performed with the scantwo function, and the same pair of QTL on Ha5 and Ha6 was identified (Table S3). In this two-QTL model, the additive effect of these two QTL was strongly supported by the data, but no epistatic interaction was detected. The two-QTL additive model (Ha5@14.6, Ha6@60.5) explained 24.58% of CGT number variation for the F<sup>4</sup> mapping population, while the single QTL models explained 11.61 and 14.06% of the CGT number variation, respectively (**Table 2**). To confirm the QTL model, we also performed a forward/backward stepwise search, allowing four QTL at maximum, and found the two-QTL model had the best LOD score. The closest SNPs to each QTL were identified, and the phenotypes were plotted against genotypes at the putative QTL (Figure S2, Table S4). The 1.5-LOD support interval for SNP marker Ha5\_11356218 was 12.1–15.7 cM, which extended from 8.07 to 12.65 Mbp on the physical map, and the support interval of SNP marker Ha6\_8364901 was 41.3–76.3 cM, which covered from 7.42 to 8.58 Mbp on the physical map (**Table 2**, Figure S3).
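A 1.5-LOD support interval is the region around the LOD peak in which the LOD score stays within 1.5 units of its maximum. The Python sketch below illustrates that idea on a single-chromosome LOD profile. It is a simplification with a hypothetical function name, and the interval routine in R/qtl may treat interval boundaries differently.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (left, peak, right) positions of a LOD-drop support interval.

    positions: map positions (cM) along one chromosome, in order.
    lods: LOD score at each position.
    The interval is the contiguous run around the peak with LOD >= peak - drop.
    """
    peak_index = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak_index] - drop
    left = peak_index
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak_index
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[peak_index], positions[right]
```

A flat LOD profile around the peak produces a wide interval in cM, as seen for the Ha6 QTL, even when the corresponding physical region is small.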

#### Validation of the Mapped QTL

To validate the QTL analysis, we selected 39 inbred lines which were whole genome sequenced with 10× coverage, and had phenotypic data from a previous study (Prasifka, 2015). With the WGS reads, 609,914 SNPs were produced from variant calling and 485,516 good quality SNPs were kept after applying the additional filters (minQ > 30, MAF > 0.05, biallelic only, max missing rate <10%). A phylogenetic tree was constructed from these 485,516 SNPs using the SNPhylo pipeline (Lee et al., 2014; Figure S4). As shown in Figure S4, the two parent lines, RHA 464 and HA 300, were clustered into two distinct subgroups, and the other 37 lines in the validation population presented additional genetic backgrounds.

To test the two-QTL model, which was suggested by the QTL mapping results, we retrieved all the SNP markers from the QTL support intervals. A total of 990 SNP markers were extracted from the QTL support interval on chromosome 5, with a marker density of 0.22 SNP per kb, and 363 SNP markers were extracted from the QTL support interval on chromosome 6, with a marker density of 0.31 SNP per kb. After filtering these SNP markers, as described above, we kept 385 SNP markers from the QTL support interval on chromosome 5 and 33 SNP markers from the QTL support interval on chromosome 6. Next, we tested pairs of SNP markers, one from each QTL, with a linear regression function in R [lm(y ∼ SNP<sub>m</sub> + SNP<sub>n</sub>), where y is the CGT number, SNP<sub>m</sub> is the effect of the mth marker on chromosome 5 and SNP<sub>n</sub> is the effect of the nth marker on chromosome 6]. In total, 12,705 two-QTL models were tested, and 414 of these models were detected as significant with the adjusted p < 0.05 (p = 1.6 × 10<sup>−4</sup>) (**Figures 4A,B**). The best two-QTL model explained 59.9% of total phenotypic variation.
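The marker densities and the number of tested models can be checked from figures given in the text: the physical support intervals (8.07 to 12.65 Mbp on chromosome 5; 7.42 to 8.58 Mbp on chromosome 6) and the filtered marker counts (385 and 33). The short Python sketch below reproduces that arithmetic; the function names are hypothetical.

```python
from itertools import product


def snp_density_per_kb(n_snps, start_mbp, end_mbp):
    """SNPs per kb over a physical interval given in Mbp."""
    return n_snps / ((end_mbp - start_mbp) * 1000.0)


def marker_pairs(chr5_markers, chr6_markers):
    """All (chromosome 5, chromosome 6) marker pairs, one model per pair."""
    return list(product(chr5_markers, chr6_markers))


density_chr5 = snp_density_per_kb(990, 8.07, 12.65)   # about 0.22 SNP per kb
density_chr6 = snp_density_per_kb(363, 7.42, 8.58)    # about 0.31 SNP per kb
n_models = len(marker_pairs(range(385), range(33)))   # 385 * 33 = 12705
```

Both densities and the model count agree with the values reported above.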

#### Identification of Putative Genes Controlling CGT Number

To identify putative genes which control CGT number in sunflower, we manually checked through the physical regions, which included the Ha5\_11356218 marker support interval and the Ha6\_8364901 marker support interval, with sunflower genome JBrowse (INRA Sunflower Bioinformatics Resources, 2014). A total of 105 genes were predicted within the Ha5\_11356218 marker support interval on chromosome 5, and 35 genes were predicted within the Ha6\_8364901 marker support interval on chromosome 6. A list of these genes with SNP marker data and expression profile in flower tissues is shown in Table S5.

TABLE 2 | Two QTL for capitate glandular trichome (CGT) density identified in the HA 300 × RHA 464 mapping population.

\*\*\*Significant at p < 0.001 as determined by permutation testing.

<sup>a</sup>Additive effect of two QTL.

<sup>b</sup>Interaction effect of two QTL.

Since some of the SNP markers in the QTL support intervals were within these genes, we found 16 genes on chromosome 5 and five genes on chromosome 6 with variation that had significant associations with CGT number (**Figures 4A,B**). Based on gene annotations that suggested plausible functional association, the genes co-localized with Ha5\_10149906 and Ha6\_7633946 were selected for further characterization. This pair of markers had strong association with CGT number (adjusted p < 0.05, p = 3.6 × 10<sup>−5</sup>) and the two-QTL model explained 47.2% of phenotypic variation in the validation data. The genotypic data at these loci together with phenotypic data are shown in **Table 3**. A single factor ANOVA was also used to test the association between each SNP marker and CGT number. The marker Ha5\_10149906 showed strong association with CGT number (p = 0.004), explaining 20.3% of the phenotypic variation, and the single SNP marker Ha6\_7633946 also showed significant association with CGT number (p = 0.0004), explaining 31.1% of the phenotypic variation (**Figures 4C,D**).
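The "phenotypic variation explained" by a single marker in a one-factor ANOVA is the ratio of the between-genotype sum of squares to the total sum of squares. The Python sketch below shows that calculation under the assumption that this ratio is the quantity reported; the function name and genotype labels are hypothetical.

```python
from collections import defaultdict


def variance_explained(genotypes, phenotypes):
    """Proportion of phenotypic variation explained by genotype class.

    Computed as SS_between / SS_total from a single-factor ANOVA layout.
    """
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotypes):
        groups[genotype].append(value)
    grand_mean = sum(phenotypes) / len(phenotypes)
    ss_total = sum((y - grand_mean) ** 2 for y in phenotypes)
    if ss_total == 0:
        return 0.0  # no phenotypic variation to explain
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total
```

A marker whose genotype classes have identical mean CGT numbers explains none of the variation, whereas a marker that separates the lines perfectly explains all of it.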

The SNP marker Ha5\_10149906 is located within the second exon of the gene Ha5g003120, which is annotated as a member of heat shock transcription factor (HSF) family (Table S5). The HSF proteins are not only involved in heat stress responses but also participate in many other developmental/physiological activities such as cell division and root growth (Westerheide et al., 2012). As shown in the phylogenetic tree (Figure S5), sunflower Ha5g003120 gene is orthologous to Arabidopsis thaliana gene HSF2A. Furthermore, we obtained the genomic sequence of gene Ha5g003120 from the two parents, HA 300 and RHA 464, and the reference genome HA 412HO, and performed alignments. Interestingly, the genomic sequence in RHA 464 has a 326 bp deletion in the promoter region of Ha5g003120, but this deletion is not present in HA 300 and HA 412HO (**Figure 5A**).

The SNP marker Ha6\_7633946 is located in the 3′-UTR (untranslated region) of the gene Ha6g003560, which is annotated as a member of the WRKY transcription factor family (Table S5). WRKY proteins are key regulators of many biotic, abiotic and physiological responses in plants (Phukan et al., 2016). Similarly, the ORFs within the gene sequence of Ha6g003560 were confirmed with NCBI ORFfinder. The confirmed ORF sequence was used for BLASTp searches to obtain the most similar protein sequences in other phyla (Figure S6). As shown in the phylogenetic tree, Ha6g003560 closely resembles Arabidopsis thaliana gene WRKY44/TTG2. It has been shown that the WRKY44/TTG2 gene regulates trichome development in Arabidopsis (Ishida et al., 2007). In addition, we also performed alignments with the genomic sequence of gene Ha6\_7633946 from HA 300, RHA 273 and reference genome XRQ. Since the sequence quality was poor in this region in RHA 464, we used the sequence from RHA 273 instead: these two inbred lines are phylogenetically close, they share the same phenotype for CGT, and they share the same haplotype of chromosome 6 (**Table 3**, Figure S4). As a result, we found a 51 bp deletion in the intron of gene Ha6\_7633946 in RHA 273, and this deletion results in alternative splicing (**Figures 5B**, **6**). These results make the putative HSF gene Ha5g003120 and WRKY gene Ha6g003560 good candidates as the regulators of CGT density in sunflower florets.

# DISCUSSION

Trichomes, especially glandular trichomes, play an important role in plant defense. CGTs are most common in Asteraceae species (Bombo et al., 2016). CGTs are well-known for their ability to produce various secondary metabolites such as STLs. In sunflower, however, the genetics of CGT density is not well understood. In this study, we report for the first time that CGT density in sunflower florets is a quantitative trait, and two major QTL were identified in a biparental mapping population. The F<sub>4</sub> mapping population, derived from high CGT density parent HA 300 and low CGT density parent RHA 464, showed segregation in CGT number per floret (**Figure 1**). Of 179 F<sub>4</sub> plants, only 25 plants (13.9%) showed high CGT density (greater than 150 CGT per floret) and 45 plants (25.2%) showed low CGT density (less than 50 CGT per floret). The variation of CGT density from CGT counting errors was marginal since the CGT numbers from the first floret and the second floret were highly correlated (Figure S1). As shown in **Figure 1**, the frequency of CGT density in the F<sub>4</sub> mapping population showed a moderately skewed distribution (Skewness = 0.98) toward the low CGT density parent. For this reason, we attempted non-parametric interval mapping, which is an alternative method for analyzing non-normal phenotypic data, but it exhibited less power to detect QTL in our population (Broman, 2003; Fernandes et al., 2007; data not shown). In general, the simple interval mapping method can give reasonable results if the phenotypic data are not highly skewed (Broman and Sen, 2009). We also determined statistical significance on the basis of a genome-wide permutation test, which utilizes the same empirical distribution (Table S3). For these reasons, we used the original CGT numbers to perform QTL analysis without any data transformation.
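A genome-wide permutation threshold of the kind mentioned above can be sketched as follows. This is only a minimal illustration and not the analysis used in the study: it scores each marker with a single-marker regression LOD rather than interval mapping, and the genotype matrix, the effect size, the simulated right-skewed phenotype and the function names (`lod_score`, `permutation_threshold`) are all assumptions made for the example.

```python
import math
import random


def lod_score(genotypes, phenotypes):
    """LOD for one marker from a simple linear regression of trait on genotype."""
    n = len(phenotypes)
    g_mean = sum(genotypes) / n
    y_mean = sum(phenotypes) / n
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    sxy = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotypes, phenotypes))
    rss0 = sum((y - y_mean) ** 2 for y in phenotypes)  # null model: mean only
    if sxx == 0:  # monomorphic marker carries no information
        return 0.0
    rss1 = rss0 - sxy ** 2 / sxx  # residual sum of squares with the marker
    return (n / 2.0) * math.log10(rss0 / rss1)


def permutation_threshold(markers, phenotypes, n_perm=200, alpha=0.05, seed=1):
    """Genome-wide LOD threshold: (1 - alpha) quantile of the maximum LOD
    obtained after shuffling phenotypes relative to all genotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max(lod_score(m, shuffled) for m in markers))
    null_max.sort()
    return null_max[math.ceil((1 - alpha) * n_perm) - 1]


# Hypothetical data: 179 plants, 20 markers coded 0/1/2, and a right-skewed
# trait (gamma noise) with an additive effect placed at marker index 5.
rng = random.Random(42)
n_plants, n_markers = 179, 20
markers = [[rng.choice([0, 1, 2]) for _ in range(n_plants)] for _ in range(n_markers)]
phenotypes = [rng.gammavariate(2.0, 30.0) + 40.0 * markers[5][i] for i in range(n_plants)]

threshold = permutation_threshold(markers, phenotypes)
observed = lod_score(markers[5], phenotypes)
print(f"5% genome-wide threshold: {threshold:.2f}; LOD at simulated QTL: {observed:.2f}")
```

Because the null distribution is built from the observed phenotypes themselves, the threshold reflects their skewed shape, which is the reason a permutation test is a natural companion to mapping on untransformed counts.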

The linkage map in the QTL analysis was constructed with SNP markers generated through GBS. GBS is a high-throughput and highly cost-effective genotyping and SNP discovery approach, and has been applied to many plant species (Elshire et al., 2011; Kumar et al., 2012; Melo et al., 2016; Torkamaneh et al., 2016). To date, two GBS-SNP marker maps have been reported in sunflower (Celik et al., 2016; Talukder et al., 2016). In our study, we successfully mapped 2121 SNP markers, which is more than twice the number of unique SNP markers mapped in previous studies (**Table 1**). There are two possible reasons to explain why we were able to discover more unique SNP markers by GBS. First, we used a customized bioinformatic pipeline to process the GBS data. As described in our methods, we followed the GATK Best Practices to recalibrate the alignment, and used FreeBayes, a haplotype-based variant caller, to detect SNP markers (Garrison and Marth, 2012). The pipeline we used in our study could be valuable for discovering SNP markers, in addition to the TASSEL-GBS pipeline which was used in previous studies (Glaubitz et al., 2014; Celik et al., 2016; Talukder et al., 2016). Second, for our biparental mapping population, we carried out whole genome sequencing for the parent lines (HA 300 and RHA 464). These were included in the SNP marker calling and imputation steps together with their offspring, and this greatly helped to discover more reliable SNP markers. It is also good practice to extract DNA from fresh young leaves or seedlings to obtain high quality DNA for GBS. With these modifications, we think GBS is still a good option for genotyping and SNP discovery in sunflower, especially considering its high throughput and cost-effectiveness.

The length of the genetic linkage map in this study was 1,519.52 cM, which is comparable with a sunflower consensus linkage map of 1,443.84 cM developed from three F<sub>2</sub> mapping populations (Talukder et al., 2014), and also a recent SNP-based linkage map of 1,401.36 cM (Talukder et al., 2016). The marker density in this study (0.72 cM/SNP) is higher than that of the other two SNP marker maps (1.33 cM/SNP; Talukder et al., 2016; 3.03 cM/SNP; Celik et al., 2016). Despite good whole genome coverage, several regions with gaps greater than 10 cM were observed on chromosomes 4, 6, and 14 (**Table 1**). Notably, the largest gap is 27.32 cM on chromosome 6, which is likely due to chromosome structure differences between the parents. Consequently, we were unable to detect SNP markers from these regions.
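The marker density quoted above follows directly from the map length and the number of mapped SNPs. The short sketch below reproduces that arithmetic from the figures given in the text; the variable names are illustrative only.

```python
# Values quoted in the text. Density is expressed as cM per SNP,
# so a smaller number means a denser map.
map_length_cm = 1519.52
mapped_snps = 2121
density_this_study = map_length_cm / mapped_snps

# Densities reported for the two earlier GBS-SNP maps (cM per SNP).
previous_maps = {"Talukder et al., 2016": 1.33, "Celik et al., 2016": 3.03}

print(f"this study: {density_this_study:.2f} cM/SNP")
for study, density in previous_maps.items():
    ratio = density / density_this_study
    print(f"{study}: {density:.2f} cM/SNP ({ratio:.1f}x the average spacing of this study)")
```

An average spacing says nothing about how evenly the markers are distributed, which is why the gaps greater than 10 cM are reported separately.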

Two major QTL controlling CGT density in sunflower were identified in a biparental mapping population (**Figure 3**). One QTL was mapped to chromosome 5 (Ha5@14.6), which explained 11.61% of the CGT number variation. The second QTL was mapped to chromosome 6 (Ha6@60.5), which explained 14.06% of the CGT number variation (**Table 2**). No epistatic interaction was detected, which suggests that these two QTL mediate CGT density in an additive manner. Similarly, two QTL associated with type VI glandular trichome density were identified in cultivated tomato, and seven QTL associated with trichome density were detected in soybean (Maliepaard et al., 1995; Du et al., 2009). These findings suggest that trichome density in plants is regulated by multiple genes. A large portion of the CGT number variation is still unaccounted for in the current study, and this could be due to some error in genotyping, or a confluence of several genes of minor effect.


FIGURE 5 | (A) Alignment of genomic sequence of the heat shock transcription factor-like (HSF) gene from two parental lines and the reference genome HA 412HO. The start codon (ATG) is shaded in yellow, and the deletion in RHA 464 is shaded in red. (B) Alignment of genomic sequence of the WRKY-like gene from two parental lines and the reference genome XRQ. The start codon (ATG) is shaded in yellow, and the deletion in RHA 273 line is shaded in red.

To validate the QTL results, we selected 39 sunflower inbred lines with diverse genetic backgrounds as the validation population (Figure S4). We used multiple linear regression analysis to validate the QTL-trait association and also estimate the QTL effects. A total of 385 SNP markers were selected from the 1.5-LOD support interval on chromosome 5 and 33 SNP markers were selected from the QTL support interval on chromosome 6. Based on annotated gene functions, one SNP marker pair, Ha5\_10149906 and Ha6\_7633946, was identified as the best two-QTL model and corresponded to genes with a plausible effect on the phenotype. This two-QTL model explained 47.2% of phenotypic variation in the validation population.
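The proportion of phenotypic variation explained by a two-marker model can be illustrated with a small sketch. It uses the standard closed form for the R² of a regression on two predictors, computed from pairwise Pearson correlations. The genotypes, effect sizes and trait values below are simulated and hypothetical; they are not the validation data, and the function names are assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_marker_r2(snp1, snp2, trait):
    """R^2 of the additive model trait ~ snp1 + snp2 (ordinary least squares)."""
    r1 = pearson(snp1, trait)
    r2 = pearson(snp2, trait)
    r12 = pearson(snp1, snp2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Hypothetical validation panel: 39 inbred lines, homozygous genotypes coded 0 or 2.
rng = random.Random(7)
n_lines = 39
snp_a = [rng.choice([0, 2]) for _ in range(n_lines)]
snp_b = [rng.choice([0, 2]) for _ in range(n_lines)]
trait = [60 + 20 * a + 15 * b + rng.gauss(0, 30) for a, b in zip(snp_a, snp_b)]

r2_model = two_marker_r2(snp_a, snp_b, trait)
print(f"two-marker model R^2 = {r2_model:.3f}")
print(f"single-marker R^2: {pearson(snp_a, trait) ** 2:.3f}, {pearson(snp_b, trait) ** 2:.3f}")
```

The two-marker R² can never be smaller than either single-marker R², so comparing candidate marker pairs on R² alone favors larger models; in the study the choice of the pair was also guided by annotated gene function.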

Moreover, the significant associations between single SNP markers (Ha5\_10149906 and Ha6\_7633946) and CGT number were also supported by single factor ANOVA (**Figures 4C,D**). Thus, the QTL effects and the association between CGT density and QTL support intervals were confirmed in validation.

The SNP marker Ha5\_10149906 is located within the second exon of the gene Ha5g003120. A 326 bp deletion was observed in the promoter region of the gene Ha5g003120 in RHA 464, and this deletion might lead to a substantial change in gene expression pattern (**Figure 5A**). The gene Ha5g003120 is annotated as a member of the heat shock transcription factor (HSF) family. Although no experimental results have shown that HSF proteins are involved in trichome development, some studies indicate these proteins are required for cell division and root growth (Westerheide et al., 2012). The SNP marker Ha6\_7633946 is located in the 3′-UTR of the gene Ha6g003560 and is associated with an alternative splice site in RHA 273, due to a 51 bp deletion in the second intron. This alternative splicing alters the WRKY domain, which determines DNA-binding specificity (Llorca et al., 2014). The phylogenetic tree showed that Ha6g003560 is grouped together with Arabidopsis thaliana gene WRKY44/TTG2. WRKY44/TTG2 is a key regulatory gene for trichome development, and a mutation in WRKY44/TTG2 causes significantly reduced trichome number and unbranched trichomes in Arabidopsis (Johnson, 2002; Ishida et al., 2007; Pesch et al., 2014). Taken together, the HSF gene Ha5g003120 and WRKY gene Ha6g003560 are strong candidates for regulating CGT density in sunflower florets. It is possible, however, that other adjacent genes are contributing to the phenotype. Further studies are required to characterize gene functions in these regions.

In summary, we successfully detected SNP markers by GBS and constructed a genetic linkage map with these SNP markers. We also identified two major QTL controlling CGT density in sunflower florets by using the F<sub>4</sub> population derived from the cross HA 300 × RHA 464. In addition, we found two plausible candidate genes in the QTL support intervals. Future work will focus on optimizing STL chemical composition in CGT for enhancing host resistance to sunflower insect pests.


# AUTHOR NOTE

Mention of trade names or commercial products in this report is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture. The USDA is an equal opportunity provider and employer.

# AUTHOR CONTRIBUTIONS

Q-MG analyzed the data and wrote the paper. NK, BH, and JP designed and led the genomics analyses, population development and genetics analyses, and trait evaluations, respectively. SR conducted phylogenetic analyses of the candidate genes. CP and ST conducted GBS and whole genome shotgun sequencing and bioinformatic analyses. All authors have read and agree with the contents of the manuscript.

# FUNDING

National Sunflower Association, award # 14-E03.

# ACKNOWLEDGMENTS

The authors would like to acknowledge Jayma Moore and Scott Payne at NDSU Electron Microscopy Center for their help in obtaining SEM images of florets, and Jamie Miller-Dunbar for help in quantifying CGT. We sincerely thank Brady Koehler, Mike Grove and Brian Smart for collecting DNA samples, greenhouse operations and support in the laboratory. We thank Brian Smart and Dr. Gerald Seiler for critically reviewing the manuscript. We also thank Jamie Prior Kershner at the BioFrontiers NGS Core Facility for assistance with Illumina sequencing.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2017.02227/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Gao, Kane, Hulke, Reinert, Pogoda, Tittes and Prasifka. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# An Oleuropein β-Glucosidase from Olive Fruit Is Involved in Determining the Phenolic Composition of Virgin Olive Oil

David Velázquez-Palmero<sup>1,2</sup>, Carmen Romero-Segura<sup>1</sup>, Rosa García-Rodríguez<sup>1</sup>, María L. Hernández<sup>1</sup>, Fabián E. Vaistij<sup>2</sup>, Ian A. Graham<sup>2</sup>, Ana G. Pérez<sup>1</sup> and José M. Martínez-Rivas<sup>1</sup>\*

<sup>1</sup> Department of Biochemistry and Molecular Biology of Plant Products, Instituto de la Grasa (CSIC), Sevilla, Spain, <sup>2</sup> Centre for Novel Agricultural Products, Department of Biology, University of York, York, United Kingdom

#### Edited by:

Mariela Torres, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

#### Reviewed by:

Rosario Muleo, Università degli Studi della Tuscia, Italy

Benedetto Ruperti, Università degli Studi di Padova, Italy

> \*Correspondence: José M. Martínez-Rivas mrivas@cica.es

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 28 July 2017 Accepted: 20 October 2017 Published: 07 November 2017

#### Citation:

Velázquez-Palmero D, Romero-Segura C, García-Rodríguez R, Hernández ML, Vaistij FE, Graham IA, Pérez AG and Martínez-Rivas JM (2017) An Oleuropein β-Glucosidase from Olive Fruit Is Involved in Determining the Phenolic Composition of Virgin Olive Oil. Front. Plant Sci. 8:1902. doi: 10.3389/fpls.2017.01902

Phenolic composition of virgin olive oil is determined by the enzymatic and/or chemical reactions that take place during olive fruit processing. Of these enzymes, β-glucosidase activity plays a relevant role in the transformation of the phenolic glycosides present in the olive fruit, generating different secoiridoid derivatives. The main goal of the present study was to characterize olive fruit β-glucosidase genes and enzymes responsible for the phenolic composition of virgin olive oil. To achieve that, we have isolated an olive β-glucosidase gene from cultivar Picual (OepGLU), expressed it in Nicotiana benthamiana leaves and purified its corresponding recombinant enzyme. Western blot analysis showed that recombinant OepGLU protein is detected by an antibody raised against the purified native olive mesocarp β-glucosidase enzyme, and exhibits a deduced molecular mass of 65.0 kDa. The recombinant OepGLU enzyme showed activity on the major olive phenolic glycosides, with the highest levels with respect to oleuropein, followed by ligstroside and demethyloleuropein. In addition, expression analysis showed that olive GLU transcript level in olive fruit is spatially and temporally regulated in a cultivar-dependent manner. Furthermore, temperature, light and water regime regulate olive GLU gene expression in olive fruit mesocarp. All these data are consistent with the involvement of OepGLU enzyme in the formation of the major phenolic compounds present in virgin olive oil.

Keywords: β-Glucosidase, Olea europaea, oleuropein, olive fruit, phenolic compounds, virgin olive oil

# INTRODUCTION

Olive (Olea europaea L.) is one of the first plants grown as an oil crop. Consequently, olive oil is one of the oldest known plant oils and it can be consumed as virgin olive oil (VOO). In the Mediterranean diet, this oil constitutes the main lipid source and it has been related to several beneficial nutritional properties which are mainly associated with its phenolic components (Konstantinidou et al., 2010; Visioli and Bernardini, 2011). However, phenolic compounds are relevant not only because of their nutritional properties, but also due to their organoleptic characteristics. In fact, phenolic components are involved in the pungent and bitter sensory notes of VOO (Andrewes et al., 2003; Mateos et al., 2004). Phenolic compounds are currently being used as a trait in new cross breeding programs (León et al., 2011), and also as VOO quality markers, because of their health promoting and organoleptic properties.

Oleuropein, demethyloleuropein and ligstroside, the most significant phenolic glycosides detected in the olive fruit, belong to the secoiridoid class, a group of monoterpenoids typical of the Oleaceae family with a cleaved methylcyclopentane skeleton (Obied et al., 2008). In contrast, the main phenolic compounds detected in VOO are the secoiridoid derivatives resulting from the enzymatic hydrolysis of these olive fruit glycosides, specifically the aldehydic forms of oleuropein and ligstroside aglycones (3,4-DHPEA-EA and p-HPEA-EA, respectively), and the dialdehydic forms of decarboxymethyloleuropein and ligstroside aglycones (3,4-DHPEA-EDA and p-HPEA-EDA, respectively) (Montedoro et al., 2002). Oleuropein derivatives exhibit the highest antioxidant activity (Ramos-Escudero et al., 2015), protein-denaturing/protein-cross-linking properties (Konno et al., 1999), cytotoxic effects (Bernini et al., 2011) and effectiveness as chronic disease preventive agents (Pinto et al., 2011).

The phenolic profile of VOO is mainly derived from the amount of phenolic glycosides originally found in the tissues of olive fruit and the activity of oxidative and hydrolytic enzymes operating on these glycosides during VOO processing (García-Rodríguez et al., 2011; Romero-Segura et al., 2012). Although secoiridoid biosynthesis and degradation pathways are still not fully understood (Obied et al., 2008), hydrolysis by highly specific β-glucosidases seems to be critical for the diverse roles attributed to secoiridoid derivatives. In this sense, the wide array of physiological roles assigned to plant β-glucosidases (β-D-glucoside glucohydrolases, EC 3.2.1.21), such as functions in plant secondary metabolism, symbiosis, defense, signaling, and cell wall lignification and catabolism, are determined by their tissue and subcellular localization, and their substrate specificities (Cairns and Esen, 2010).

The existence of various β-glucosidase isoforms in olive was first reported by Mazzuca et al. (2006), who described the localization of two isoforms of oleuropein-degradative β-glucosidases in the oil droplets and in the chloroplasts of mesocarp of green olive fruits. Transcriptomic (Alagna et al., 2012) and proteomic (Bianco et al., 2013) studies confirm that olive, similar to most plants, possesses several distinct β-glucosidases. Recently, the isolation and characterization of a defense-related β-glucosidase gene from olive (cv. Koroneiki) has been described (Koudounas et al., 2015). Nearly all these earlier reports on this enzyme have centered on its physiological function as a defense mechanism which specifically generates oleuropein-derived compounds with established antimicrobial activities. In contrast, no similar studies have been carried out on the β-glucosidase genes/enzymes in relation to VOO quality, even though this knowledge may be very valuable to enhance marker-assisted breeding programs to obtain new varieties with tailored oil quality characteristics.

We have previously isolated and purified to apparent homogeneity a protein with β-glucosidase activity from olive fruit mesocarp which exhibits high activity with the main phenolic glycoside in olive fruit, oleuropein, and gives rise to one of the most important phenolic compounds in VOO (3,4-DHPEA-EA) as the main reaction product (Romero-Segura et al., 2009). Data on the β-glucosidase activity during ripening of olive fruit from cultivars Arbequina and Picual are in good agreement with the phenolic composition of the oils obtained from fruits with different degrees of maturity (Romero-Segura et al., 2012).

The objective of this study was to characterize olive fruit β-glucosidase genes and enzymes responsible for the phenolic composition of VOO. Thus, we have isolated an olive β-glucosidase gene from cultivar Picual, which codes for an enzyme that displays the highest activity toward oleuropein. The immunological and catalytic properties of this olive β-glucosidase enzyme, together with its expression data, are in agreement with its participation in the biosynthesis of the major phenolic compounds found in VOO.

# MATERIALS AND METHODS

#### Plant Material

Olive (Olea europaea L. cv. Picual and Arbequina) trees were cultivated in the experimental orchard of Instituto de la Grasa, Sevilla (Spain), with drip irrigation and fertirrigation from the time of flowering to fruit ripening. In the case of non-irrigated treatment, the olive trees received only natural rainfall.

Young drupes, developing seeds, and mesocarp tissue were harvested at different weeks after flowering (WAF) corresponding to different developmental stages of the olive fruit: green (9, 12, 16, and 19 WAF); yellow-green (23 WAF); turning or veraison (28 and 31 WAF); and mature or fully ripe (35 WAF). Immediately after harvesting, olive tissues were frozen in liquid nitrogen, and stored at −80°C.

Stress treatments were carried out according to Hernández et al. (2011). Olive branches with approximately 100 olive fruit at turning stage (28 WAF) were taken from olive trees and incubated in a growth chamber at 25°C with a 12 h light/12 h dark cycle to imitate physiological conditions of the tree. The light intensity was 11.5 µmol m<sup>−2</sup> s<sup>−1</sup>. For stress experiments, standard conditions were modified according to the effect studied. For low and high temperature treatments, the branches with olive fruit were incubated at the standard light intensity, at 15 or 35°C, respectively. To evaluate the effect of darkness, the standard temperature was maintained, and the light was turned off. To study the effect of wounding, the whole surface of the olive fruit was mechanically damaged with pressure at zero time using forceps with serrated tips, affecting mesocarp tissue. To maintain the natural photoperiod day/night of the olive fruit, the zero time of each experiment was selected 2 h after the start of the light period. When indicated, olive mesocarp tissues were collected, frozen in liquid nitrogen, and kept at −80°C.

#### Isolation of a β-Glucosidase Full-Length cDNA Clone

Candidate sequences for olive β-glucosidases were identified in the olive ESTs database (Muñoz-Mérida et al., 2013) by means of the tblastn algorithm together with amino acid sequences of known plant β-glucosidase proteins. One of them, which showed high expression levels in mesocarp tissue according to in silico expression analysis, was selected for cloning. Based on this sequence, a specific pair of primers, CR3 (5′-AAGAGCACCAAAGTCTGCAATG-3′) and CR4 (5′-GGAGCCCAACTCCTTTATTGG-3′), was designed. These primers, together with an aliquot of an olive Uni-ZAP XR cDNA library constructed with mRNA isolated from 13 WAF olive fruit of cultivar Picual (Haralampidis et al., 1998), were used for PCR amplification. The generated DNA fragment was subcloned into the vector pSpark® I (Canvax, Spain) and sequenced in both directions. DNA sequence determination and analysis was carried out as described in Hernández et al. (2016).

#### Total RNA Extraction and cDNA Synthesis

Frozen olive tissues (1–2 g), harvested from at least three different olive trees, were used for total RNA isolation as described by Hernández et al. (2005). Verification of RNA quality, removal of genomic DNA and cDNA synthesis were performed according to Hernández et al. (2009).

#### Quantitative Real-Time PCR (qRT-PCR)

Gene expression analysis was performed by qRT-PCR using an Mx3000P™ real-time PCR System and the Brilliant® SYBR® Green Q-PCR Master Mix (Stratagene, La Jolla, CA, United States) as previously described (Hernández et al., 2009). The Primer3 program<sup>1</sup> was used to design primers for gene-specific amplification (Supplementary Table S1). The housekeeping olive ubiquitin2 gene (OeUBQ2, AF429430) was used as an endogenous reference for normalization. The real-time PCR data were calibrated relative to the corresponding gene expression level in 12 WAF mesocarp tissue from Picual in the case of tissue and developmental expression studies, whereas for the stress studies the data were calibrated relative to the corresponding gene expression level at zero time for each treatment and cultivar. In both cases, the 2<sup>−ΔΔC<sub>T</sub></sup> method for relative quantification was followed (Livak and Schmittgen, 2001). The data are presented as means ± standard deviation (SD) of three different qRT-PCR reactions carried out in three different 96-well plates. Each reaction was performed in duplicate in each plate.
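The relative quantification described above can be sketched in a few lines. The calculation follows the comparative C<sub>T</sub> formula of Livak and Schmittgen (2001); the function name and every C<sub>T</sub> value below are hypothetical and serve only to show how the reference gene and the calibrator sample enter the computation.

```python
import statistics


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change by the comparative CT method: 2 ** -(ddCT).

    dCT  = CT(target gene) - CT(reference gene), within one sample
    ddCT = dCT(sample of interest) - dCT(calibrator sample)
    """
    delta_ct_sample = ct_target - ct_ref
    delta_ct_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_ct_sample - delta_ct_calibrator)


# Hypothetical CT values: (target gene, reference gene) for one sample on three plates,
# and a single calibrator measurement (e.g., the sample chosen as baseline).
sample_plates = [(22.1, 18.0), (21.9, 18.1), (22.0, 17.9)]
calibrator = (24.0, 18.0)

fold_changes = [
    relative_expression(ct_t, ct_r, calibrator[0], calibrator[1])
    for ct_t, ct_r in sample_plates
]
mean_fc = statistics.mean(fold_changes)
sd_fc = statistics.stdev(fold_changes)
print(f"relative expression: {mean_fc:.2f} ± {sd_fc:.2f} (mean ± SD, n = {len(fold_changes)})")
```

By construction the calibrator sample has a relative expression of 1, and the formula assumes amplification efficiencies close to 100% for both the target and the reference gene.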

#### Transient Expression of OepGLU Gene in Nicotiana benthamiana

For functional Agrobacterium-mediated CaMV35S-driven transient expression, the OepGLU coding sequence was PCR-amplified using the specific primers YDV1F (5′-CACCATGGATATCCAAAGCAAC-3′) and YDV1R+His (5′-CTAGTGATGGTGATGGTGATGCCCGGTGCTGCCTCTAAGCCTTTTAC-3′), and subcloned into the GATEWAY®-compatible binary vector pH2GW7 (Karimi et al., 2002). The resulting purified pH2GW7-OepGLU construct was used to transform Agrobacterium tumefaciens strain GV3101 using the freeze-thaw method described by Höfgen and Willmitzer (1988). Nicotiana benthamiana leaves were pressure infiltrated with A. tumefaciens cultures (OD<sub>600</sub> approximately 1.0) as described by Popescu et al. (2007). Samples were collected 3 days after infiltration, frozen in liquid nitrogen and kept at −80°C.

<sup>1</sup>http://primer3.ut.ee/

#### Purification of OepGLU Recombinant Isoenzyme Expressed in N. benthamiana Leaves

To obtain the crude extract, 4 g of infiltrated leaf tissue were thawed and homogenized in 30 ml of 20 mM Na-phosphate buffer pH 7.4 containing 500 mM NaCl, 20 mM imidazole, 5% (w/v) polyvinyl polypyrrolidone, and 1 mM phenylmethanesulfonyl fluoride using an Ultraturrax homogenizer at 4°C. The resulting homogenate was centrifuged at 27,000 g for 20 min at 4°C. The clear supernatant was filtered through three layers of Miracloth (Calbiochem, United States) and was used as the crude extract.

To purify the recombinant OepGLU protein containing the C-terminal 6xHis tag motif, 30 ml of crude extract was loaded onto a 1-ml His GraviTrap column (GE Healthcare, United Kingdom), and OepGLU protein was eluted with 3 ml of 20 mM Na-phosphate pH 7.4 containing 500 mM NaCl and 500 mM imidazole. Remaining NaCl and imidazole were removed by means of a PD-10 column (GE Healthcare, United Kingdom). The enzymatic solution was concentrated in 30 kDa Vivaspin® microcentrifuge filters (GE Healthcare, United Kingdom) at 2,000 g and 4°C to a final volume of 250 µl. This purified and concentrated preparation was used for OepGLU biochemical characterization.

#### β-Glucosidase Assay

Two methods for assaying β-glucosidase activity in vitro were used in this study (Romero-Segura et al., 2009): a spectrophotometric method, in which the β-glucosidase activity was determined by continuously monitoring the increase in absorbance at 405 nm related to the increasing amount of p-nitrophenol liberated from the synthetic glucoside pNPG, and a second method based on the direct determination of the hydrolyzed natural olive glucoside, oleuropein, by HPLC analysis.

#### HPLC Analysis

Analytical HPLC of phenolic compounds was performed in a Beckman Coulter liquid chromatographic system equipped with a System Gold 168 detector, a solvent module 126 and a Mediterranea Sea 18 column (4.0 mm i.d. × 250 mm, particle size = 5 µm) (Teknokroma, Spain). Quantification and identification of phenolic compounds was performed following a previously described methodology (Luaces et al., 2007).

#### Protein Determination and Electrophoresis

The protein concentration was estimated using the Bio-Rad (United States) Bradford protein reagent dye with BSA as standard. SDS-PAGE was performed as previously described (Romero-Segura et al., 2009).

#### Preparation of Anti-β-Glucosidase Polyclonal Antibodies and Immunoblot Analysis

Polyclonal antibodies against the native olive β-glucosidase protein purified from olive mesocarp according to the method described by Romero-Segura et al. (2009) were prepared in rabbit by Production and Animal Experimentation General Service of the University of Seville.

For Western blotting, 5–6 µg of protein samples were separated by SDS-PAGE as described above and electro-blotted onto nitrocellulose membrane using the Mini Trans-Blot® system (Bio-Rad). Bound anti-β-glucosidase primary antibody was detected using an anti-rabbit alkaline phosphatase-conjugated secondary antibody (Sigma–Aldrich, United States). When a mouse monoclonal anti-6xHis antibody (GE Healthcare) was used as primary antibody, an anti-mouse alkaline phosphatase-conjugated antibody (Invitrogen) was employed as secondary antibody. To detect alkaline phosphatase activity after antibody incubation, the nitrocellulose membrane was submerged in a solution obtained by dissolving a SIGMAFAST™ BCIP®/NBT tablet (Sigma–Aldrich) in 10 ml of distilled water.

# RESULTS

A number of contigs with a high degree of similarity to plant β-glucosidases were selected from the olive EST database (Muñoz-Mérida et al., 2013). Among them, one which showed high expression levels in mesocarp tissue according to in silico expression analysis was chosen for cloning. Two specific primers were designed on the basis of this contig sequence, and used for PCR amplification together with an aliquot of an olive cDNA library of cultivar Picual. We obtained a full-length cDNA clone of 1848 bp, which was designated OepGLU, and contained an open reading frame encoding a predicted protein of 551 amino acids (Supplementary Figure S1), with a calculated molecular mass of 62.8 kDa and a pI of 6.6. The deduced OepGLU amino acid sequence from cultivar Picual displayed a 98% identity to an olive β-glucosidase cDNA clone (AY083162) from cultivar Koroneiki (Koudounas et al., 2015).

Alignment of the deduced amino acid sequence of OepGLU from cultivar Picual with other plant β-glucosidase protein sequences (Supplementary Figure S1) suggests that it codes for a β-glucosidase enzyme because it showed the conserved sequence motifs characteristic of the glycosyl hydrolases family 1 (GH1) T(F/L/M)NEP and Y(I/V)TENG, which include the two glutamic acid residues involved in the catalytic mechanism (Esen, 2003). In addition, the conserved amino acids Gln, His, Asn, Glu and two Trp which have been shown to be essential for the binding of the glucose (Cairns and Esen, 2010) were also found. A putative N-glycosylation site (N83) has also been detected in the OepGLU sequence, and the presence of a conserved GH1 family domain has been identified by NCBI Conserved Domain Search and the Pfam software. Analysis of OepGLU deduced protein sequence with target prediction software such as WolfPSORT and TargetP did not give rise to a clear subcellular localization. In fact, a putative nuclear localization signal (RRKR) could be found at amino acids 543–546 and a 25 amino acid N-terminal signal peptide was also predicted (Supplementary Figure S1), generating after its proteolytic cleavage a mature protein of 526 amino acids, with a calculated molecular mass of 60.3 kDa and a pI of 7.0.
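The two GH1 signature motifs named above translate directly into regular expressions, where each parenthesized alternative becomes a character class. The sketch below is an illustration only: the helper name `find_motifs` is an assumption, and the short peptide is a made-up toy sequence, not the OepGLU protein, whose sequence is given in Supplementary Figure S1.

```python
import re

# Motifs as written in the text, mapped to equivalent regular expressions.
GH1_MOTIFS = {
    "T(F/L/M)NEP": re.compile(r"T[FLM]NEP"),
    "Y(I/V)TENG": re.compile(r"Y[IV]TENG"),
}


def find_motifs(protein_sequence):
    """Return 1-based start positions of each GH1 motif in a protein sequence."""
    sequence = protein_sequence.upper()
    return {
        name: [match.start() + 1 for match in pattern.finditer(sequence)]
        for name, pattern in GH1_MOTIFS.items()
    }


# Toy peptide (hypothetical) containing one copy of each motif.
toy_sequence = "MAAATLNEPWWGGYITENGAARRKR"
hits = find_motifs(toy_sequence)
for motif, positions in hits.items():
    print(f"{motif}: {positions}")
```

A match to both motifs is only a necessary signature of the family; the assignment in the text additionally rests on the sequence alignment and on the conserved-domain searches.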

#### Purification and Immunological Characterization of OepGLU Recombinant Enzyme

To verify the functional identity of the OepGLU gene, Agrobacterium-mediated transient expression in N. benthamiana leaves was carried out. To that end, the OepGLU coding region, including a C-terminal 6xHis tag, was subcloned into the vector pH2GW7 using the GATEWAY™ technology. The resultant plasmid designated pH2GW7-OepGLU+His was used to transform A. tumefaciens GV3101, and tobacco leaves were infiltrated with bacterial cells carrying this plasmid. Expression of the recombinant protein was optimal at 3 days after infiltration. SDS-PAGE analysis of the crude extract did not show the protein band with the expected molecular mass for the recombinant OepGLU (**Figure 1A**). However, when the enzyme preparation purified by affinity chromatography was used, an intense band with a molecular mass of 65.5 kDa was detected (**Figure 1B**, lane 4). In contrast, this protein band was not observed in purified preparations isolated from tobacco leaves infiltrated with untransformed Agrobacterium cells (**Figure 1B**, lane 3).

Purified preparations of recombinant OepGLU were also analyzed by western blot using the anti-6xHis antibody (**Figure 1C**) and the antibody raised against the native olive β-glucosidase (**Figure 1D**). In both cases, a protein band with the same molecular mass as deduced from the SDS-gel was observed.

#### Kinetic Properties of Recombinant OepGLU in Comparison to the Native Enzyme

In the present study, purified preparations of recombinant OepGLU, but not crude extracts, were able to hydrolyze the artificial substrate p-nitrophenyl-β-D-glucopyranoside (pNPG) and the major natural olive phenolic glycoside oleuropein in a time-dependent manner. The enzymatic hydrolysis of oleuropein by purified recombinant OepGLU was monitored for up to 60 min by HPLC analysis (**Figure 2**). More than 50% of the oleuropein initially present was hydrolyzed after 5 min, producing as the first reaction product a mixture of oleuropein aglycone isomers (OA-isomers). After 15 min, the broad peak of OA-isomers was reduced, whereas 3,4-DHPEA-EA and hydroxytyrosol began to accumulate. The purified recombinant olive β-glucosidase exhibited a specific activity of 67.6 U/mg using oleuropein as substrate.

Once the capacity of the purified recombinant olive β-glucosidase to hydrolyze oleuropein was established, its activity was also measured using the synthetic glucoside pNPG as substrate; the specific activity was much lower (1.4 U/mg). Hence, the natural substrate oleuropein was used to perform the biochemical characterization of the recombinant olive β-glucosidase.

5, purified GLU protein according to the method of Romero-Segura et al. (2009). 6 µg of protein were loaded per lane in all cases. The primary antibody dilutions used were 1:1500 (C) and 1:5000 (D) and the secondary antibody dilution were 1:2500 (C) and 1:10000 (D). The band corresponding to the olive β-glucosidase protein is denoted by an arrow.

The purified recombinant olive β-glucosidase displayed an optimum pH of 5.5, with a rapid decrease in activity above this value (**Figure 3A**), and retained > 80% of its maximum activity at 25–45°C, with an optimum at 40°C and a strong decline above 45°C (**Figure 3B**). Thermal inactivation kinetics showed that recombinant OepGLU remained active up to 40°C, with a significant decline above this temperature (**Figure 3C**).

In addition, β-glucosidase activity was assayed with a mixture of oleuropein, demethyloleuropein and ligstroside, the three most important phenolic glycosides identified in olive fruit, in order to mimic what happens during the milling step of the industrial VOO extraction process, when the enzyme and possible substrates meet as olive fruit tissues are disrupted. Substrate selectivity experiments (**Table 1**) showed that the highest activity level was reached using oleuropein as substrate, followed by ligstroside (25.6%) and demethyloleuropein (15.6%). In particular, after a 5 min reaction time it could be observed in the corresponding chromatogram that most of the oleuropein was hydrolyzed followed by ligstroside, with the consequent appearance of OA-isomers and ligstroside aglycone isomers (LA-isomers), respectively (Supplementary Figure S2). Verbascoside was used as negative control, since its chemical structure is significantly different from that of the other three phenolic glucosides and does not contain a non-reducing terminal β–D-glucosyl residue.

To obtain the kinetic parameters of recombinant olive β-glucosidase, enzyme activity was measured over a range of concentrations of oleuropein as substrate (Supplementary Figure S3). The calculated K<sub>m</sub> for oleuropein was 26.8 mM and the V<sub>max</sub> 263.2 U/mg, with a catalytic efficiency (V<sub>max</sub>/K<sub>m</sub>) of 9.8.
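The reported parameters can be explored with a small numerical sketch. It assumes that the enzyme follows simple Michaelis–Menten kinetics, which the reported K<sub>m</sub> and V<sub>max</sub> imply but which is not stated explicitly here; the function names and the chosen substrate concentrations are illustrative only and are not taken from the original study.

```python
# Minimal sketch, assuming simple Michaelis-Menten kinetics.
# Parameter values are those reported in the text for recombinant OepGLU
# with oleuropein; function names are hypothetical.

V_MAX = 263.2  # U/mg
K_M = 26.8     # mM


def michaelis_menten_rate(substrate_mm, vmax=V_MAX, km=K_M):
    """Predicted initial rate (U/mg) at a substrate concentration in mM."""
    if substrate_mm < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_mm / (km + substrate_mm)


def catalytic_efficiency(vmax=V_MAX, km=K_M):
    """Vmax/Km ratio, in U/mg per mM."""
    return vmax / km


if __name__ == "__main__":
    print(f"Vmax/Km = {catalytic_efficiency():.1f} U/mg per mM")
    # Illustrative concentrations: 5 mM, Km itself, and 100 mM.
    for s in (5.0, K_M, 100.0):
        print(f"[S] = {s:6.1f} mM -> v = {michaelis_menten_rate(s):6.1f} U/mg")
```

At a substrate concentration equal to K<sub>m</sub>, the predicted rate is half of V<sub>max</sub>, which is a quick consistency check on any fitted pair of values.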

#### Tissue Specificity and Developmental Expression of Olive GLU Gene Is Cultivar-Dependent

Olive GLU gene expression levels were analyzed in different Picual and Arbequina olive tissues by qRT-PCR using specific primers (**Figure 4A**), with the aim of investigating its physiological role and its possible contribution to the content of the different phenolic compounds present in the VOO. In both cultivars, higher expression levels were detected in young drupes and mesocarp compared to seeds, where transcript levels were negligible. Young drupes of 9 WAF from cultivar Arbequina showed the highest expression level. Interestingly, transcript levels in mesocarp at turning stage (28 WAF) were much higher in Picual than in Arbequina cultivar.

Besides, olive GLU transcript levels were analyzed at different times during olive fruit development and ripening in Picual and Arbequina mesocarp and seed tissues (**Figure 4B**). A maximum transcription level was observed in green mesocarp (16 WAF) from both cultivars. In the case of cultivar Arbequina, the high expression level detected at 16 WAF dramatically decreased after this maximum, reaching constant low levels during the rest of the olive fruit development and ripening periods. On the contrary, in the cultivar Picual, olive GLU expression level showed a second maximum, lower than the first one, once the olive fruit ripening period has started (28 WAF). Unlike mesocarp tissue, olive GLU gene exhibited almost undetectable transcript levels in seeds from Picual and Arbequina (**Figure 4B**), which is consistent with the very low enzyme activity levels observed in both cultivars (Romero-Segura et al., 2011).

The same study was also performed in Picudo, Hojiblanca and Manzanilla cultivars, using olive fruit mesocarp at the three main stages in which olive fruit are harvested for olive oil production: yellow-green (23 WAF), turning or veraison (31 WAF), and mature or fully ripe (35 WAF) (Supplementary Figure S4). The olive GLU gene expression levels in the cultivars Picudo and Hojiblanca remained low, showing no significant changes. On the contrary, the transcript levels in the cultivar Manzanilla showed an increase during fruit ripening, and then decreased at the end of the ripening period.

#### Transcriptional Regulation in Olive Fruit Mesocarp of GLU Gene in Response to Abiotic Stresses

To examine the effect of different abiotic stresses on the expression level of the GLU gene in mesocarp tissue, olive tree branches from Picual and Arbequina cultivars with olive fruit at turning stage (28 WAF) were incubated for 24 h, modifying the standard conditions (25°C with a 12 h light / 12 h dark cycle) according to the effect to be tested. No changes in the olive GLU gene expression levels were observed in olive fruit mesocarp when standard conditions were used (**Figures 5**, **6**).

When low temperature (15°C) was used to incubate the olive fruit, a significant transient increase in the expression levels of olive GLU was observed in both cultivars, with a maximum after 3 or 6 h of treatment for Picual and Arbequina cultivars, respectively (**Figure 5A**). On the contrary, the incubation at high temperature (35°C) of olive fruit brought about a reduction in the olive GLU gene transcript levels in both cultivars especially after 1 h of treatment, reaching almost undetectable transcript levels after 24 h (**Figure 5B**).

To examine the effect of darkness on the transcript levels of the olive GLU gene in Picual and Arbequina mesocarp tissues, branches were incubated for 24 h at 25°C in the darkness. A decrease in the olive GLU gene expression levels was detected in both cultivars, mainly in Picual during the first 3 h of incubation (**Figure 6A**).

In addition, the potential involvement of olive GLU in the transcriptional response to wounding was tested in olive fruit subjected to mechanical damage from olive branches incubated at standard conditions. In this case, olive GLU gene expression levels declined progressively in both cultivars (**Figure 6B**).

On the other hand, since a number of studies point out that different water regimes could affect the phenolics content of VOO, showing a negative correlation between the content of secoiridoid derivatives and the water amount used for olive growing (Gómez-Rico et al., 2006; Servili et al., 2007), the effect of water regime on the transcript levels of olive GLU was investigated in olive fruit mesocarp of Picual and Arbequina cultivars grown with natural rainfall or irrigation. A higher transcript level was detected for the olive GLU gene when Picual and Arbequina were cultivated with natural rainfall only (**Figure 7**), except for late ripening stages (35 WAF) where transcript levels were very low in both watering conditions.

TABLE 1 | Substrate selectivity of purified recombinant OepGLU on a mixture of various natural olive glycosides.


Activity was determined by measuring the corresponding hydrolyzed natural olive glycoside by HPLC. Initial concentration of the substrates in the assay mixture was 5 mM each.

#### DISCUSSION

Several candidate olive β-glucosidase sequences were identified from an olive ESTs database (Muñoz-Mérida et al., 2013). This is consistent with the numerous β-glucosidase genes usually detected in the same plant, as reported in Arabidopsis (Xu et al., 2004). Since higher β-glucosidase activity levels have been observed in olive fruit mesocarp in comparison to seeds (Romero-Segura et al., 2011), one of the contigs which showed high expression levels in mesocarp tissue compared to seeds according to in silico expression analysis was selected for cloning. Sequence analysis of the β-glucosidase gene isolated from olive (cv. Picual) showed that its deduced amino acid sequence contains the conserved sequence motifs and domains characteristic of the glycosyl hydrolases family 1 (GH1) (Esen, 2003; Cairns and Esen, 2010), and suggests that it codes for a β-glucosidase enzyme.

Picual and Arbequina cultivars (A), and in mesocarp tissue (closed squares) or seeds (open squares) during the development and ripening of olive fruit (B). The beginning of fruit ripening, which coincides with the appearance of purple color, is indicated by an arrow.

standard assay conditions after incubation of purified preparation at different temperatures for 60 min. 100% activity was 50.9 U/ml.

In order to characterize the immunological and kinetic properties of the olive β-glucosidase recombinant enzyme, transient expression in N. benthamiana leaves was performed. The band corresponding to the recombinant olive β-glucosidase was observed by SDS-PAGE only when the enzymatic preparation purified by affinity chromatography was applied, but not in the case of the purified preparation isolated from tobacco leaves infiltrated with untransformed Agrobacterium cells used as control, or when the crude extract was loaded onto the gel. In addition, western blot analysis of the purified preparations was performed using two types of antibodies. In the first case, an anti-6xHis antibody detected a band with the same molecular mass (65.0 kDa) as that observed by SDS-PAGE, confirming that it corresponded to the recombinant enzyme. This molecular mass of OepGLU is in the range of 55–65 kDa described for almost all plant β-glucosidase monomers (Esen, 2003), and is very close to the 65.4 kDa reported for the β-glucosidase protein purified from olive fruit mesocarp (Romero-Segura et al., 2009). In the second case, an antibody raised against the native olive β-glucosidase protein purified from olive mesocarp (Romero-Segura et al., 2009) detected a band with identical molecular mass. Furthermore, when a preparation corresponding to native enzyme purified according to the method of Romero-Segura et al. (2009) was used as positive control, an intense band with a similar apparent size was detected. All these data strongly suggest that the recombinant OepGLU corresponds to the native β-glucosidase enzyme previously purified by our group from olive fruit mesocarp, which has been demonstrated to play a key role in shaping the VOO phenolic composition (Romero-Segura et al., 2012). Interestingly, although native and recombinant olive β-glucosidase enzymes exhibit similar molecular masses, the occurrence of post-translational modifications cannot be ruled out. In fact, a unique N-glycosylation site was predicted in the OepGLU amino acid sequence.

FIGURE 6 | Effect of darkness (A) and wounding (B) on the relative expression levels of olive GLU gene in the Picual and Arbequina mesocarp tissues. Branches with approximately 100 olive fruit at turning stage (28 WAF) were incubated using standard conditions (open squares), or incubated at 25°C for 24 h in the dark (A) or subjected to mechanical damage and incubated at standard conditions for 24 h (B) (closed squares). Boxes in the upper part indicate light (open) or dark (closed) periods.

Although the capacity of the recombinant olive β-glucosidase to hydrolyze oleuropein has been previously demonstrated using crude extracts from N. benthamiana leaves (Koudounas et al., 2015), a quantitative in vitro enzymatic assay using purified recombinant protein, which avoids interference from metabolites and enzyme activities present in the crude extract of tobacco leaves, has not been reported so far. Furthermore, a comprehensive characterization of the kinetic properties of the enzyme has not been carried out to date. Purified preparations of recombinant OepGLU exhibited β-glucosidase activity with both the artificial substrate pNPG and the major natural olive phenolic glucoside oleuropein. This result demonstrates that the OepGLU gene codes for a β-glucosidase enzyme, confirming its functional identity. Interestingly, purified recombinant OepGLU showed very low activity levels when pNPG was used as substrate, as previously reported for the native enzyme from olive (Romero-Segura et al., 2009) and privet tree (Konno et al., 1999). For plant β-glucosidases, it has been described that the activity levels obtained with non-physiological substrates such as pNPG do not correspond to those obtained with their natural substrates (Cairns et al., 2015). Comparison of the time-course of oleuropein hydrolysis catalyzed by the purified recombinant OepGLU with that previously reported for the purified native enzyme (Romero-Segura et al., 2012) shows that the native olive β-glucosidase hydrolyzes oleuropein more efficiently. Such discrepancies in the relative activity of β-glucosidase proteins purified from the native plant and the corresponding recombinant proteins have been previously described (Himeno et al., 2013). OA-isomers are the first reaction products formed by the recombinant OepGLU after the hydrolysis of the glucoside. Removal of the glucose molecule could destabilize the phenolic aglycone; as the reaction proceeds, the isomers formed tend to stabilize, yielding 3,4-DHPEA-EA, which in turn produces hydroxytyrosol by chemical hydrolytic reactions, since the recombinant β-glucosidase is the only enzyme present in the reaction mixture.

The purified recombinant OepGLU displays the highest activity at pH 5.5 and 40°C, values similar to those reported for the native olive β-glucosidase enzyme (Romero-Segura et al., 2009). An optimum pH in the range of 4.5–5.5 has been described for other β-glucosidases from plants such as rice (Akiyama et al., 1998) and Citrus sinensis (Cameron et al., 2001). With respect to optimum temperature, values in the interval of 40–50°C have been previously reported for the β-glucosidases from maize (Esen, 1992) and Citrus sinensis (Cameron et al., 2001). The recombinant olive β-glucosidase exhibits a high thermostability up to 40°C, as reported for the native enzyme from olive (Romero-Segura et al., 2009). Taking into account its thermal resistance profile, the OepGLU enzyme could act during the malaxation step of the industrial process to obtain VOO, where temperatures higher than 30°C are not unusual. However, it has been reported that after 15 min of malaxation at this temperature, no β-glucosidase activity could be detected in the paste, likely due to the presence of enzyme inhibitors (García-Rodríguez, 2014). In the same way, the thermal resistance profile of OepGLU could explain why the thermal treatment of olive fruit at temperatures of 56–68°C just before the milling step causes a marked decrease in the content of the secoiridoid derivatives in the VOO (Yousfi et al., 2010), since at those temperatures the olive β-glucosidase enzyme should be inactivated, greatly reducing the degree of hydrolysis of the phenolic glucosides. Substrate selectivity experiments showed that the recombinant olive β-glucosidase exhibits a higher preference for oleuropein as substrate, followed by ligstroside and demethyloleuropein. These results are in agreement with those reported for the native enzyme (Romero-Segura et al., 2009), and demonstrate the capacity of the recombinant OepGLU to hydrolyze the three main phenolic glucosides of olive fruit, which are the precursors of the main secoiridoid derivatives present in the VOO. The recombinant olive β-glucosidase showed a K<sub>m</sub> for oleuropein (26.8 mM) and a V<sub>max</sub> (263.2 U/mg) different from those described for the native enzyme (3.8 mM and 2,500 U/mg, respectively) (Romero-Segura et al., 2009), which indicates a lower catalytic efficiency of the recombinant enzyme, as previously mentioned.
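As a rough worked comparison, the catalytic efficiencies of the two enzyme forms can be computed from the kinetic values quoted above, taking the ratio of maximum velocity to Michaelis constant as the measure of efficiency (as in the Results). The approximately 67-fold difference is derived here from those values only and is not a figure stated in the original study.

$$
\left(\frac{V_{\max}}{K_m}\right)_{\text{recombinant}} = \frac{263.2}{26.8} \approx 9.8,
\qquad
\left(\frac{V_{\max}}{K_m}\right)_{\text{native}} = \frac{2500}{3.8} \approx 657.9,
\qquad
\frac{657.9}{9.8} \approx 67
$$

Both ratios are expressed in U/mg per mM, so the comparison holds only if the two sets of parameters were determined under comparable assay conditions.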

Plant GLU genes are developmentally regulated (Morant et al., 2008), and exhibit different spatial expression patterns depending on their physiological functions. In this sense, the olive GLU gene showed different transcript levels in the studied tissues, indicating that it is spatially regulated. Besides, changes in the GLU transcript level reveal that this gene is also temporally regulated, and correlate moderately with the changes in β-glucosidase activity levels previously reported in Picual and Arbequina (Romero-Segura et al., 2012). This minor discrepancy observed between transcript and activity levels could be explained by the occurrence of post-translational modifications of the olive β-glucosidase enzyme such as N-glycosylation, as previously mentioned. In fact, N-glycosylation of plant β-glucosidases has been widely described (Morant et al., 2008). Furthermore, the contribution of other olive β-glucosidase isoforms to the enzyme activity levels observed cannot be ruled out. Interestingly, previous studies on these two cultivars have demonstrated significant differences not only in terms of β-glucosidase activity but also in their phenolic profiles along fruit ripening, with Picual oils being described as a VOO with medium-high phenolic content at any ripening stage, while Arbequina oils typically have a medium-low concentration of phenolic compounds (Romero-Segura et al., 2012). Moreover, expression data from cultivars Picudo, Hojiblanca and Manzanilla confirm the cultivar-dependent transcriptional regulation of the olive GLU gene during olive fruit development and ripening. Hence, knowledge of the specific GLU expression profile for each olive cultivar is critical to determine the optimum harvesting time in order to obtain VOO with the highest phenolic content.

In our study, we have also found that the olive GLU gene is transcriptionally regulated in olive fruit mesocarp in response to different abiotic stresses. Low and high temperatures brought about the induction and repression of the olive GLU gene, respectively. These data are in agreement with those described for the Arabidopsis β-glucosidase gene AtBG1, since its expression levels increase when leaves are subjected to cold stress (Lee et al., 2006). The decrease in the expression levels of the olive GLU gene observed at 35°C is also consistent with the lower content of secoiridoid derivatives in oils extracted from olive fruit pre-treated for 24 h at 30–50°C (García et al., 2001). Darkness treatment of olive fruit from both cultivars produces a decrease in the expression levels of the olive GLU gene, indicating that light may be implicated in the regulation of its transcription. On the other hand, although plant β-glucosidases have been implicated in the defense against herbivores and pathogen attacks (Minic, 2008), this primary response acts at the enzyme activity level, being regulated by compartmentalization, since enzyme and natural substrates are differentially located at the subcellular level and only meet when cell integrity is disrupted (Morant et al., 2008). Therefore, it is not surprising that the olive GLU gene is not induced after wounding, which indicates that regulation at the transcriptional level is not operating. In contrast to our data, an increase in olive GLU transcript levels in olive fruit mesocarp after olive fruit fly attack has been reported (Corrado et al., 2012), although only one olive fruit developmental stage was used in that study. Finally, higher expression levels of the olive GLU gene were detected under water deficit conditions in Picual and Arbequina cultivars, mainly at turning stage. Similar results have been recently reported for the OeGLU12-like2 gene from cultivar Frantoio (Cirilli et al., 2017). The increase in the olive GLU transcript level detected under water deficit conditions, with the corresponding increase in the enzyme activity, could also contribute significantly to explaining the higher content of secoiridoid derivatives reported in VOO obtained from olive fruit under water stress conditions (Artajo et al., 2006; Stefanoudaki et al., 2009).

#### CONCLUSION

We have purified the recombinant olive β-glucosidase enzyme (OepGLU). Immunological detection, molecular mass determination and kinetic properties of the recombinant OepGLU strongly indicate that it corresponds to the native olive β-glucosidase enzyme previously purified from olive fruit mesocarp (Romero-Segura et al., 2009), which has been shown to be the main enzyme involved in the transformation, during VOO processing, of oleuropein and other phenolic glycosides from olive fruit into their corresponding secoiridoid derivatives present in VOO. However, the contribution of other olive fruit β-glucosidase isoenzymes to oleuropein hydrolysis cannot be ruled out. Our results have also shown that olive GLU gene expression is not only spatially and temporally regulated in olive fruit, but is also cultivar-dependent and regulated by temperature, light and water regime. This study represents a significant step toward elucidating the factors responsible for the phenolic content and profile of VOO. In addition, this information will help in the design of molecular markers for the marker-assisted selection of novel olive cultivars with improved phenolic content and composition in their oils.

# ACCESSION NUMBERS

The nucleotide sequence reported in this paper for OepGLU gene has been submitted to the GenBank/EMBL/DDBJ database with the accession number KX278417.

# AUTHOR CONTRIBUTIONS

AP and JM-R conceived and designed the study, DV-P and CR-S performed the cloning and qRT-PCR experiments, DV-P and RG-R carried out the immunological studies, DV-P performed the transient expression experiments and the analytical and biochemical studies, MH and JM-R supervised the cloning, qRT-PCR and immunological studies, FV and IG supervised the transient expression experiments, AP supervised the analytical and biochemical studies, JM-R wrote the manuscript. All authors discussed and commented on the manuscript.

# FUNDING

This work was supported by the Spanish Ministry of Economy and Competitiveness [grants no. AGL2008-00258, AGL2011-24442]. Fellowship from the JAE-Predoctoral CSIC program to DV-P. Fellowships from the FPI predoctoral program to CR-S and RG-R. Contract from the JAE-Postdoctoral CSIC program to MH.

# ACKNOWLEDGMENT

We thank Rosario Sánchez (Instituto de la Grasa, CSIC) for technical assistance.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2017.01902/full#supplementary-material

# REFERENCES


responsible for the linoleic acid content in virgin olive oil. J. Agric. Food Chem. 57, 6199–6206. doi: 10.1021/jf900678z


secoiridoids in the small intestine. Br. J. Nutr. 105, 1607–1618. doi: 10.1017/S000711451000526X


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Velázquez-Palmero, Romero-Segura, García-Rodríguez, Hernández, Vaistij, Graham, Pérez and Martínez-Rivas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Variability in Susceptibility to Anthracnose in the World Collection of Olive Cultivars of Cordoba (Spain)

Juan Moral<sup>1,2</sup>\*, Carlos J. Xaviér<sup>3</sup>, José R. Viruega<sup>3</sup>, Luis F. Roca<sup>3</sup>, Juan Caballero<sup>4</sup> and Antonio Trapero<sup>3</sup>

<sup>1</sup> Departamento de Agronomía, Universidad de Córdoba, Córdoba, Spain, <sup>2</sup> Department of Plant Pathology, Kearney Agricultural Research and Extension Center, University of California, Davis, Davis, CA, United States, <sup>3</sup> Departamento de Agronomía, ETSIAM, Universidad de Córdoba, Córdoba, Spain, <sup>4</sup> Departamento de Olivicultura, IFAPA Centro Alameda del Obispo, Córdoba, Spain

Anthracnose of olive (Olea europaea ssp. europaea L.), caused by Colletotrichum species, is a serious disease causing fruit rot and branch dieback, whose epidemics are highly dependent on cultivar susceptibility and environmental conditions. Over a period of 10 years, there have been three severe epidemics in Andalusia (southern Spain) that allowed us to complete the assessment of the World Olive Germplasm Bank of Córdoba, one of the most important cultivar collections worldwide. A total of 308 cultivars from 21 countries were evaluated, mainly Spain (174 cvs.), Syria (29 cvs.), Italy (20 cvs.), Turkey (15 cvs.), and Greece (16 cvs.). Disease assessments were performed using a 0–10 rating scale, specifically developed to estimate the incidence of symptomatic fruit in the tree canopy. Also, the susceptibility of five reference cultivars was confirmed by artificial inoculation. Because of the direct relationship between the maturity of the fruit and their susceptibility to the pathogen, evaluations were performed at the end of fruit ripening, which forced coupling assessments according to the maturity state of the trees. By applying the cluster analysis to the 308 cultivars, these were classified as follows: 66 cvs. highly susceptible (21.4%), 83 cvs. susceptible (26.9%), 66 cvs. moderately susceptible (21.4%), 61 cvs. resistant (19.8%), and 32 cvs. highly resistant (10.4%). Representative cultivars of these five categories are "Ocal," "Lechín de Sevilla," "Arbequina," "Picual," and "Frantoio," respectively. With some exceptions, such as cvs. Arbosana, Empeltre and Picual, most of the Spanish cultivars, such as "Arbequina," "Cornicabra," "Hojiblanca," "Manzanilla de Sevilla," "Morisca," "Picudo," "Farga," and "Verdial de Huévar" are included in the categories of moderately susceptible, susceptible or highly susceptible. The phenotypic evaluation of anthracnose reaction is a limiting factor for the selection of olive cultivars by farmers, technicians, and breeders.

Keywords: olive, diseases, anthracnose, Colletotrichum, fruit rot

# INTRODUCTION

#### Edited by:

Leire Molinero-Ruiz, Instituto de Agricultura Sostenible (CSIC), Spain

#### Reviewed by:

Franco Nigro, Università degli studi di Bari Aldo Moro, Italy Natalia Peres, University of Florida, United States

\*Correspondence:

Juan Moral juanmoralmoral@yahoo.es; jmoral@ucdavis.edu

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 31 July 2017 Accepted: 18 October 2017 Published: 06 November 2017

#### Citation:

Moral J, Xaviér CJ, Viruega JR, Roca LF, Caballero J and Trapero A (2017) Variability in Susceptibility to Anthracnose in the World Collection of Olive Cultivars of Cordoba (Spain). Front. Plant Sci. 8:1892. doi: 10.3389/fpls.2017.01892

Olive (Olea europaea ssp. europaea L.) is the most extensively planted fruit crop in the world, covering more than 10.2 million hectares of land, mainly in the Mediterranean Basin. Almost 25% of the total olive trees are grown in Spain, where more than 45% of the world's olive oil is produced (FAO, 2015). The olive industry (oil and table) is a vital sector of the Spanish agro-food system, with a total value of 3 billion euros, more than 385,000 agricultural holdings, and over half a million farmers (INE, 2015)<sup>1</sup>. In fact, the global price of olive oil each year depends to a large extent on Spanish production.

Olive oil has numerous beneficial properties, mainly associated with its high content of monounsaturated oleic acid (Espósito et al., 2004) albeit other minor components, such as phenolic compounds (i.e., hydroxytyrosol, oleocanthal, and squalene) have also shown substantial benefits for consumer health (Beauchamp et al., 2005). For this reason, olive oil consumption and, concomitantly the olive-growing area, have notably increased worldwide in recent years. Projects to expand the olive-growing surface affect new areas for this crop, such as different provinces of China, India, Saudi Arabia, or the States of Florida and Hawaii in the USA. In many of these plantation areas, the adaptation of the different olive cultivars and the pests and diseases of this crop are unknown. Fortunately, the technicians and farmers have a broad range of cultivars that can be selected according to their adaptation to the target agro-environment area.

The olive tree was probably domesticated over 6,000 years ago in the Middle East, from which it spread until it covered the entire Mediterranean Basin. The existence of other diversification centers across the Mediterranean Basin is also accepted (Besnard et al., 2013; Díez et al., 2015). The first farmers selected the most outstanding individuals in each olive-growing area according to their adaptation to the soil and climate and their agronomic characteristics. These original cultivars were subsequently maintained by vegetative propagation and, in general, have remained confined to small areas (Rallo et al., 2005; Díez et al., 2015). The presence of homonymy (different cultivars with the same name in different zones), synonymy (a given cultivar with various names in the areas that it occupies), and wrong denominations is frequent in this crop (Ganino et al., 2006; Trujillo et al., 2013). Therefore, the exact number of olive cultivars is unknown, but it is likely to be around 2000 (Bartolini et al., 1998). For this reason, and due to the absence of systematic studies, the classification of olive cultivars according to their susceptibility to many diseases is limited, particularly in the case of anthracnose (Moral and Trapero, 2009).

Olive anthracnose, caused by the fungal complex species Colletotrichum acutatum sensu lato (s. lat.), C. boninense s. lat., and C. gloeosporioides s. lat. is the most destructive disease of olive fruit and is widely distributed in many olive-growing regions of the world (Martín and García-Figueres, 1999; Talhinhas et al., 2005; Cacciola et al., 2012; Moral et al., 2014; Schena et al., 2014). About 13 Colletotrichum species, belonging to these three complex species have been described affecting this crop (Talhinhas et al., 2005; Schena et al., 2014; Chattaoui et al., 2016). In general, several Colletotrichum species coexist in each olive-growing region with one or two dominant species and several secondary (Faedda et al., 2011; Moral et al., 2014). For example, the species C. godetiae (syn. C. clavatum) and C. acutatum sensu stricto (s. str.) are dominant in olive orchards

Olive anthracnose is also called soapy fruit due to its characteristic fruit-rot syndrome with a profuse production of spores in a gelatinous matrix under wet conditions (Moral et al., 2009). In addition to the direct losses due to the premature fall of affected fruit, phytotoxins produced by the pathogen in the rotten fruit cause a second syndrome, the dieback of shoots and branches (Ballio et al., 1969; Moral et al., 2009). Furthermore, even with low fruit-rot incidence (5%), the olive oil coming from affected orchards shows poor chemical and organoleptic characteristics that restrict or impede its commercialization as extra virgin olive oil (Moral et al., 2014).

Both cultivar susceptibility and weather conditions profoundly influence on the olive anthracnose severity (Moral and Trapero, 2012). For example, in the central provinces of Andalusia, where the susceptible cultivars Hojiblanca and Picudo grow, severe epidemics occur if the weather conditions are conducive during the autumn despite the farmers typically applying a 2–3 copper-based fungicide treatments during this period. In southern Portugal, where autumn-winter is more humid than in central Andalusia, severe epidemics occur in super-high-density (hedgerow system) olive orchards planted with the moderately susceptible cultivar Arbequina (Moral and Trapero, 2012; Moral et al., 2014).

Although satisfactory results controlling olive anthracnose can be obtained using inorganic and organic fungicides, field application is not always effective for different reasons, such as: (i) the number of registered fungicides in post-bloom is very small; (ii) Colletotrichum shows a high tolerance to copper, the basic ingredient of the main fungicides used; (iii) and the optimum period for fungicide application is relatively short (Roca et al., 2007; Cacciola et al., 2012; Moral et al., 2014). Since the unripe fruit are resistant to the pathogen, the use of early harvesting before the fruit reaches full ripening or the selection of late-maturing cultivars are efficient and environmentally friendly control measures (Moral et al., 2008). However, these practices have some agronomical inconveniences: (i) if the fruit is immature (not completely black), it usually shows less oil content than mature fruit; (ii) the immature fruit show a high fruit retention force being difficult its mechanical harvest; and (iii) when the fruit still immature (green) during the winter, it is highly sensitive to frost damage (Rallo et al., 2005). Therefore, the use of resistant olive cultivars to anthracnose is the most effective control method, which does not show the previous inconvenience, and can be combined with other measures, such as biological and chemical methods or cultural practices (Moral et al., 2008; Moral and Trapero, 2009; Preto et al., 2017).

of southern Italy (Schena et al., 2017); while C. acutatum s. str. and C. gloeosporioides s. s. are dominant and secondary species, respectively, in Tunisia (Chattaoui et al., 2016).

<sup>1</sup>http://www.ine.es

The World Olive Germplasm Bank of Córdoba (WOGBC), in the Andalusia region of southern Spain, is an excellent setting to evaluate the susceptibility of olive cultivars to aerial diseases, including anthracnose, for several reasons: (i) the WOGBC is currently one of the largest olive germplasm banks, with more than 900 accessions and 411 cultivars from 24 countries (A. Belaj, unpublished data); (ii) it is located in an endemic anthracnose area (Moral et al., 2015); and (iii) the whole group of olive trees of the WOGBC has been identified using Simple Sequence Repeat (SSR) markers, resolving the discrepancies due to misidentification of the trees (Trujillo et al., 2013). During the last 10 years, we have developed and validated laboratory and field methods to evaluate the susceptibility of olive cultivars to anthracnose (Moral et al., 2008; Moral and Trapero, 2009). These methods have been used to assess the resistance of traditional olive cultivars and of new ones coming from the breeding program of the University of Córdoba (UCO) and the Andalusian Institute for Research and Formation in Agriculture and Fishery (IFAPA) (Moral and Trapero, 2009; Moral et al., 2015). While the disease reaction of some olive cultivars is well-known (Moral and Trapero, 2009; Talhinhas et al., 2015), most cultivars are still unclassified for their resistance to this pathogen (Moral et al., 2014).

Because the olive cultivars from WOGBC have not been systematically screened for resistance to Colletotrichum, the objectives of this study were the following: (i) to assess the susceptibility of cultivars in the WOGBC to anthracnose caused by Colletotrichum spp.; (ii) to identify cultivars representative of each of the susceptibility categories determined in this study; and (iii) to correlate the anthracnose susceptibility with other phenotypic characteristics of cultivars. The phenotypic evaluation of disease reaction is a limiting factor for the selection of olive cultivars by farmers, technicians, and breeders.

#### MATERIALS AND METHODS

#### Plant Material and Orchard

We evaluated 308 accessions of cultivated olive trees from 22 countries of origin (**Table 1**). These accessions are conserved as a live collection in the WOGBC, located in a 5.2-ha flat and uniform field (37.51°N, 4.18°W, altitude 113 m) at the IFAPA Center "Alameda del Obispo." The soil of the orchard was classified as a Typic Xerofluvent with a sandy-loam texture, and the climatic conditions were typical of the Mediterranean area (García-López et al., 2016). The experimental orchard is located ≈1 km from the main river of Andalusia, the Guadalquivir River, in a humid area where anthracnose is an endemic disease (Moral et al., 2015). Because these accessions were fully authenticated by Trujillo et al. (2013), we will refer to them as cultivars. The WOGBC currently contains more than 411 cultivars (Caballero et al., 2006; A. Belaj, unpublished data). The olive trees used were planted between 1982 and 1992, so they were at least 5 years old when evaluated. Because the experimental plot has a completely randomized design, several anthracnose-susceptible cultivars (**Table 1**) are randomly distributed in the WOGBC, resulting in a homogeneous distribution of Colletotrichum s. lat. inoculum, as with other pathogens, such as Venturia oleaginea (López-Doncel, 2003; Moral, 2009). In this study, which covers the period 1997–2008, we present the results of the evaluation of 308 well-identified cultivars according to their reaction to the pathogen. The remaining cultivars were not evaluated for different reasons, such as tree loss caused by Verticillium dahliae, absence of yield, or absence of replicated trees of the same cultivar. Each of the 308 evaluated cultivars has from two to 22 replicated trees randomly distributed in the experimental plot.

The olive trees were planted in a 7 × 7 m square using one trunk per plant. The trees were initially pruned to select three or four main branches to form the canopy structure: a free open vase. After that, the olive trees were periodically pruned to renew branches or to eliminate dead branches. The experimental orchard was irrigated during spring-summer, applying over 2,000 m<sup>3</sup> of water per year using drip irrigation. Three Bordeaux mixture treatments (Caldo Bordelés Vallés, IQV, 2 kg active Cu<sup>2+</sup> per ha and treatment) were applied throughout the year at the end of winter (February–March), in spring (April–May), and in autumn (October) (Roca et al., 2007). Regarding olive pests, it should be noted that the population of the olive fruit fly (Bactrocera oleae) is well established in the WOGBC area, although there are no specific studies about it (Caballero J., unpublished data).

In previous studies, we identified the population of the pathogen in the WOGBC as C. acutatum group A4 (Moral et al., 2015), which is the only group described in the central provinces of Andalusia (Moral et al., 2014). The molecular group A4 was reclassified as C. clavatum species nova (sp. nov.) by Faedda et al. (2011). However, Damm et al. (2012) subsequently showed that C. clavatum is a synonym of the earlier-described species C. godetiae. Because the latter name is rarely used in non-etiology studies, we maintain the generic name C. acutatum s. lat. throughout the present article.

#### Assessment of Disease Incidence in the Field

Fruit-rot incidence was assessed in each olive tree using a previously described and validated 0–10 rating scale (Moral and Trapero, 2009). The rating scale is the logistic transformation of the proportion of symptomatic fruit and is based on the sigmoidal equation:

$$Y = \frac{100}{1 + 3^{(7-X)}} \tag{1}$$

in which Y = percentage of affected fruit and X = scale value (0–10). The fruit-rot incidence data are normalized using this scale, and the scale values can be analyzed directly using parametric methods (Moral and Trapero, 2009; **Table 2**). When the olive trees showed a high percentage (>5%) of symptomatic fruit, the assessor directly quantified the percentage of symptomatic fruit on the tree canopy. The percentage of affected fruit was then transformed into rating-scale values (0–10) by rewriting the above equation as:

$$X = 7 - \frac{\ln\left(\frac{100}{Y} - 1\right)}{\ln 3} = 7 - \log_3\left(\frac{100}{Y} - 1\right) \tag{2}$$
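The two directions of the rating scale (Equations 1 and 2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and are not part of the original method, which was applied by field assessors and not in software.

```python
import math


def scale_to_percent(x: float) -> float:
    """Equation 1: percentage of affected fruit (Y) for a scale value X."""
    return 100.0 / (1.0 + 3.0 ** (7.0 - x))


def percent_to_scale(y: float) -> float:
    """Equation 2: scale value X for a percentage of affected fruit Y.

    The logistic form is only defined for 0 < Y < 100.
    """
    if not 0.0 < y < 100.0:
        raise ValueError("Y must lie strictly between 0 and 100")
    return 7.0 - math.log(100.0 / y - 1.0, 3)


if __name__ == "__main__":
    # Print the percentage of affected fruit for a few scale values.
    for x in (0, 2.5, 5, 7, 10):
        print(f"X = {x:>4}: Y = {scale_to_percent(x):.2f}%")
```

A scale value of 7 corresponds to 50% affected fruit, and each unit on the scale multiplies the odds of a fruit being affected by three.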

Disease incidence was assessed at different times, when most of the fruit showed a value of 4 on the 0–4 rating scale (from green to black, respectively) for olive fruit ripening (Rallo et al., 2005). Disease incidence was assessed from mid-December to the end of January in 1997–1998, 2005–2006, and 2006–2007, when severe epidemics of olive anthracnose occurred in the experimental orchard (Moral and Trapero, 2009). Data from low-level epidemic years were not considered because these data only divide the olive cultivars into two groups: the highly susceptible cultivars, which show a low-medium incidence of fruit rot, and the rest of the cultivars, which show no symptomatic fruit (Moral and Trapero, 2009; Xaviér, 2009).

TABLE 1 | Reaction of olive cultivars to anthracnose, caused by Colletotrichum acutatum, in the World Olive Germplasm Bank of Córdoba (Spain).

<sup>a</sup>Reference cultivars appear in bold.

Because there was an important peak of the dieback syndrome (chlorosis and wilting of leaves and dieback of shoots and branches, i.e., green and lignified shoots, respectively) at the beginning of spring 2007, after two epidemic years, we assessed this second syndrome according to the volume of tree canopy affected using the following rating scale: 0 = <10%, 1 = 10–24%, 2 = 25–49%, 3 = 50–74%, 4 = 75–90%, and 5 = >90% (Moral and Trapero, 2009). In addition, we evaluated the presence of the pathogen on symptomatic tissues (leaves and dieback shoots) by culturing small pieces on acidified Potato Dextrose Agar plus 100 mg of copper sulfate (CuSO<sub>4</sub>·5H<sub>2</sub>O) per liter (Moral et al., 2009) from May to August during 2006 and 2007.

#### Inoculation of Detached Fruit

Apparently asymptomatic yellowish-green fruit (value of 2 on the olive ripening scale; Rallo et al., 2005) of five cultivars ("Arbequina," "Frantoio," "Lechín de Sevilla," "Ocal," and "Picual") were collected at the onset of ripening from olive trees in the experimental orchard during the non-epidemic year 2012. These reference cultivars were selected for their well-known response to anthracnose in the field (Moral et al., 2014, 2015). The fruit were inoculated and incubated according to Moral et al. (2008). Briefly, fruit were washed, disinfested, and sprayed with a conidial suspension (10<sup>5</sup> conidia per ml, or sterile water for the control) of isolate Col-104 of C. acutatum s. lat. Inoculated and control fruit were incubated in humid chambers (plastic containers) at 22–24°C under fluorescent lights (12 h alternating photoperiod, 40 µmol m<sup>−2</sup> s<sup>−1</sup>). Disease severity was periodically assessed for 80 days using a 0–5 rating scale where 0 = no visible symptoms, 1 = visible symptoms affecting <25% of the fruit surface, 2 = 25–49%, 3 = 50–74%, 4 = 75–100%, and 5 = soapy fruit (Moral et al., 2008). There were two replicates (moist chambers) per treatment and 25 fruit per replicate arranged in a completely randomized design. The experiment was repeated once and data analyses were performed on the pooled data from the two replicates.

<sup>a</sup>Values taken from the logistic function Y = 100/(1 + 3<sup>(7−X)</sup>), in which Y = percentage of affected fruit and X = scale value.

<sup>b</sup>Range of affected fruit (%) for each scale value.

<sup>c</sup>Detection limit of visual assessments in the field (one affected fruit from 2,500 observed fruit per tree; Moral and Trapero, 2009).

#### Statistical Analysis

To estimate the phenotypic stability of the cultivar reaction to anthracnose over the three studied seasons, we calculated the proportion of variability explained by the cultivar, the season, and the cultivar-season interaction. For that, analysis of variance (ANOVA) was performed on the rating-scale data of fruit-rot incidence of the cultivars that were evaluated during the three epidemic seasons with a representative number of repetitions (at least two olive trees of the same cultivar). Subsequently, we calculated eta squared (η<sup>2</sup>) as the ratio of the variability associated with an effect to the total variability of the analysis (η<sup>2</sup> = SS<sub>effect</sub>/SS<sub>total</sub>). Because η<sup>2</sup> overestimates the effect size, we also calculated partial omega squared (ω<sup>2</sup>) using the equation (Fritz et al., 2012):

$$
\omega^2 = \frac{\text{SS}_{\text{effect}} - df_{\text{effect}} \times \text{MS}_{\text{error}}}{\text{SS}_{\text{total}} + \text{MS}_{\text{error}}} \tag{3}
$$

in which SS = sum of squares, df = degrees of freedom, and MS = mean square. Likewise, ANOVA was performed on the rating scale data of fruit-rot incidence and branch dieback severity of the five reference cultivars.
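As a minimal sketch, both effect-size measures can be computed from the sums of squares of an ANOVA table as follows. The numbers in the usage example are hypothetical and are not taken from this study; the function names are likewise only illustrative.

```python
def eta_squared(ss_effect: float, ss_total: float) -> float:
    """Eta squared: share of the total sum of squares due to an effect."""
    return ss_effect / ss_total


def omega_squared(ss_effect: float, df_effect: int,
                  ms_error: float, ss_total: float) -> float:
    """Omega squared as written in Equation 3."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


if __name__ == "__main__":
    # Hypothetical ANOVA table (NOT data from the study).
    ss_cultivar, df_cultivar = 600.0, 4
    ss_total = 1000.0
    ms_error = 2.0
    print(f"eta^2   = {eta_squared(ss_cultivar, ss_total):.3f}")
    print(f"omega^2 = "
          f"{omega_squared(ss_cultivar, df_cultivar, ms_error, ss_total):.3f}")
```

Because Equation 3 subtracts the error contribution from the numerator and adds it to the denominator, ω<sup>2</sup> is always somewhat smaller than η<sup>2</sup> for the same effect.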

In previous work, we classified 18 cultivars into three categories of susceptibility by comparing their reaction with the disease response of the moderately resistant cultivar Arbequina (Moral et al., 2015). In this study, because of the high number of cultivars studied and in order to establish groups that are more homogeneous in their reaction to the pathogen, the cultivars were classified into five categories. These five susceptibility categories were determined according to the reaction of the reference cvs. Arbequina, Frantoio, Lechín de Sevilla, Ocal, and Picual, which were selected for their well-known response to anthracnose under field conditions (Moral et al., 2005, 2014, 2015). Subsequently, a non-hierarchical K-Means cluster analysis with the initial cluster center method, under the assumption of five groups (k = 5, one group for each of the reference cultivars), was applied to classify all of the cultivars. The analysis used the average value of the epidemic seasons because it is the best parameter to discriminate the susceptibility/resistance reaction of the cultivars (Moral and Trapero, 2009; **Table 2**). The relationship between fruit-rot incidence and branch-dieback severity was analyzed using the non-parametric Spearman rank correlation. A chi-squared test was used to determine whether the country of origin of the cultivars had any effect on the frequency of resistant or susceptible cultivars.
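The clustering step can be illustrated with a one-dimensional K-means that starts from user-supplied cluster centers. This is a sketch only: the analysis in this study was run in SPSS, whose implementation details may differ, and the scores and initial centers below are hypothetical stand-ins for the cultivar means and the reference-cultivar values.

```python
def kmeans_1d(values, initial_centers, max_iter=100):
    """One-dimensional K-means starting from given initial centers.

    Returns the final centers and the cluster index of each value.
    """
    centers = list(initial_centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels


if __name__ == "__main__":
    # Hypothetical mean rating-scale values (0-10) for ten cultivars.
    scores = [0.5, 1.2, 3.0, 3.6, 5.0, 5.5, 7.0, 7.4, 9.0, 9.8]
    # Hypothetical initial centers, one per reference cultivar (HR to HS).
    start = [1.0, 3.2, 5.3, 7.1, 9.0]
    centers, labels = kmeans_1d(scores, start)
    categories = ["HR", "R", "MS", "S", "HS"]
    for score, lab in zip(scores, labels):
        print(f"{score:4.1f} -> {categories[lab]}")
```

Seeding the algorithm with one center per reference cultivar keeps the resulting groups ordered from HR to HS, which a random initialization would not guarantee.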

In the artificial inoculation test, disease severity values of inoculated fruit were used to calculate the McKinney's Index (MKI; McKinney, 1923), in which disease severity is expressed as a percentage of the maximum possible level according to the following formula:

$$\text{McKinney's Index} = \frac{\sum (n_i \times i)}{5 \times N} \times 100 \tag{4}$$

where i represents the severity of symptoms (0–5), n<sub>i</sub> is the number of fruit with severity i, and N is the total number of evaluated fruit. For each cultivar and replication, the standardized area under the disease progress curve (SAUDPC) was calculated by trapezoidal integration of MKI values over time (80 days), expressed as a percentage of a maximum theoretical curve. The SAUDPC was transformed to arcsin √(SAUDPC/100) when necessary for homogeneity of variance. ANOVA was performed on the SAUDPC data, and treatment means were compared using the LSD test at P = 0.05. Eta squared (η<sup>2</sup>) and omega squared (ω<sup>2</sup>) were computed in MS Excel (Microsoft, Redmond, WA). Data from all experiments were analyzed using Statistix 9 (Analytical Software, Tallahassee, FL), except for the K-Means cluster analysis, which was conducted using SPSS 16.
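The calculations described above can be sketched as follows. The sketch assumes that the maximum theoretical curve is an MKI of 100 throughout the whole assessment period, which the text does not state explicitly; the function names and the example values are hypothetical.

```python
import math


def mckinney_index(severities, max_score=5):
    """Equation 4: McKinney's Index (%) from per-fruit severity scores.

    Summing the individual scores is equivalent to summing n_i * i.
    """
    return 100.0 * sum(severities) / (max_score * len(severities))


def saudpc(days, mki_values):
    """Standardized AUDPC (%) by trapezoidal integration of MKI over time.

    Assumption: the maximum theoretical curve is MKI = 100 on every day.
    """
    area = sum(
        (t1 - t0) * (y0 + y1) / 2.0
        for t0, t1, y0, y1 in zip(days, days[1:], mki_values, mki_values[1:])
    )
    max_area = 100.0 * (days[-1] - days[0])
    return 100.0 * area / max_area


def arcsine_sqrt(saudpc_value):
    """Variance-stabilizing transform arcsin(sqrt(SAUDPC/100)), in radians."""
    return math.asin(math.sqrt(saudpc_value / 100.0))


if __name__ == "__main__":
    # Hypothetical severity scores of six fruit on one assessment date.
    print(mckinney_index([0, 1, 2, 3, 4, 5]))
    # Hypothetical MKI values at 0, 40, and 80 days after inoculation.
    print(saudpc([0, 40, 80], [0.0, 50.0, 100.0]))
```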

#### RESULTS

#### Susceptibility of Cultivars in the Field

The incidence of fruit affected by C. acutatum s. lat. on the trees of the WOGBC was evaluated during three seasons (1997–1998, 2005–2006, and 2006–2007) due to the low fruit-rot incidence during the remaining years. Among these epidemic seasons, the most severe outbreak occurred in 1997–1998, whereas the lowest levels of the disease occurred in 2005–2006. For example, about 50% of the evaluated olive trees had a rating-scale value ≥ 8 (i.e., median ≈8) in 1997–1998 and 2006–2007, while the median value was around 5 in 2005–2006, indicating a higher degree of dispersion (**Figure 1**). In the studied seasons, the fruit-rot incidence varied greatly among cultivars and seasons. According to η<sup>2</sup> and ω<sup>2</sup>, the cultivar was the most important factor explaining the total variance (approximately 60%), followed by the season and the genotype-season interaction. In other words, the differences in severity of symptoms among olive trees during the epidemic seasons were mainly due to the genotype.

Because cultivar reaction to the pathogen ranged continuously from highly susceptible to highly resistant, we selected five cultivars across the susceptibility range. Each of the selected cultivars was represented by at least five repetitions (trees) in the experimental plot. These cultivars were: "Frantoio" (n = 8 trees) for the Highly Resistant cultivars (HR, 0–2 on the rating scale), "Picual" (n = 8 trees) for the Resistant cultivars (R, 2.1–4.3 on the rating scale), "Arbequina" (n = 6 trees) for the Moderately Susceptible cultivars (MS, 4.4–6.2 on the rating scale), "Lechín de Sevilla" (n = 5 trees) for the Susceptible cultivars (S, 6.3–7.9 on the rating scale), and "Ocal" (n = 21 trees) as representative of the Highly Susceptible cultivars (HS, 8–10 on the rating scale). The fruit-rot incidence caused by C. acutatum s. lat. on the five selected cultivars varied significantly (P < 0.05) among them, although these differences depended on the year (**Table 3**). Concurrent with the greatest anthracnose epidemic, in 1997–1998, we detected the main differences among these cultivars in their anthracnose resistance. The cultivars Picual (R) and Arbequina (MS) did not differ significantly from each other in fruit-rot incidence within any of the three epidemic seasons, although the difference was significant when the analysis was applied to the average fruit-rot incidence of the three seasons. Likewise, the branch-dieback severity shown by the reference cultivars differed significantly, being particularly severe in "Ocal" (**Table 3**). There was a significant and positive correlation (Spearman's correlation; r = 0.691, P < 0.0001) between the fruit-rot incidence and the volume (%) of tree canopy affected by the pathogen for these five cultivars.

TABLE 3 | Incidence of anthracnose in five olive cultivars in the World Olive Germplasm Bank of Córdoba during 1997–1998, 2005–2006, and 2006–2007.

<sup>a</sup>Fruit-rot incidence was estimated using a 0–10 rating scale in which binary data (proportion of affected fruit) are normalized by applying the logit transformation of proportion. Scale values were directly subjected to analysis of variance and mean comparison tests.

<sup>b</sup>Volume of olive tree canopy affected by dieback syndrome (chlorosis and wilting of leaves and dieback of shoots and branches) according to the following rating scale: 0 = <10%, 1 = 10–24%, 2 = 25–49%, 3 = 50–74%, 4 = 75–90%, and 5 = >90% (Moral and Trapero, 2009).

<sup>c</sup>Means with the same letter are not significantly different according to Fisher's protected LSD test at P = 0.05.

Due to the large number of cultivars evaluated, the results showed a continuous gradation in the susceptibility of cultivars ranging from completely resistant to extremely susceptible. In other words, the resistance ranged from cultivars with none or very few affected fruit (e.g., "Dolce Agogia," "Frantoio," "Grappolo," or "Mavreya"), to cultivars with all affected fruit (e.g., "Acebuchera," "Picudo Blanco de Estepa," or "Uovo Piccione"). By applying the cluster analysis to the 308 olive cultivars, these were classified as follows: 32 cvs. HR (10.4%), 61 cvs. R (19.8%), 66 cvs. MS (21.4%), 83 cvs. S (26.9%), and 66 cvs. HS (21.4%) (**Table 1**).

The frequency distribution of cultivars according to the categories of fruit-rot incidence was skewed to the left (skew parameter k = −0.436) and had a negative kurtosis (k<sub>u</sub> = −0.694), which highlighted the prevalence of S or HS cultivars in the collection (**Figure 2**). In contrast, the frequency distribution of cultivars according to the categories of branch-dieback severity showed a positive skew (k = 2.01) and a positive kurtosis (k<sub>u</sub> = 4.907), because most cultivars did not show this second disease syndrome; it only occurred in some cultivars that had a high fruit-rot incidence (**Figure 3**). There was also a significant correlation (Spearman rank correlation: r = 0.530, P < 0.001) between the fruit-rot incidence and the volume (%) of tree canopy affected by the pathogen for all of the cultivars. This relationship was weak, although it improved when we only considered the cultivars that showed a fruit-rot incidence higher than 2.5 (0.71% of affected fruit). A simple exponential growth curve fitted well the values of volume (%) of affected canopy over fruit-rot incidence (**Figure 4**). Even so, some cultivars that showed high values of fruit-rot incidence did not show branch dieback, such as "Bouteillan," "Grosal de Cieza," "Imperial," "Manzanilla de Almería," and "Picudo Blanco de Estepa," which had a fruit-rot incidence >9 but had no dieback symptoms. On the other hand, the olive trees of cv. Salonenque showed most of their fruit affected by the pathogen (severity value = 9) and also had 50% of their canopy affected by branch dieback (severity value = 3).

Comparisons for each country between the frequency of resistant cultivars (HR and R categories) and the frequency of susceptible ones (MS, S, and HS categories) were conducted to determine whether the origin of the cultivars had any effect on their susceptibility. The null hypothesis in these tests was that there was no prevalence of any category of susceptibility, while the alternative hypothesis was that there was a prevalence of resistant or susceptible cultivars. Overall, susceptible cultivars were significantly dominant when all of the cultivars were considered together. Conversely, when the comparisons were conducted individually among cultivars from the same country, there was no prevalence of resistant or susceptible cultivars in most cases, probably due to the low number of cultivars available. However, susceptible cultivars were dominant in Spain and Syria, while resistant ones were prevalent in Italy (**Table 4**).

severity caused by Colletotrichum acutatum in 308 olive cultivars in the World Olive Germplasm Bank of Córdoba (Spain). Lines represent the fitted exponential growth equation Y = 0.009 × e<sup>0.492X</sup> and the confidence intervals at 95%.
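For the whole collection, the comparison between resistant and susceptible cultivars can be illustrated with a chi-squared goodness-of-fit test. The sketch below assumes a null hypothesis of a 1:1 split between the two groups, which is our reading of "no prevalence" and may not reproduce the P-values reported in Table 4; the function name is hypothetical. The counts come from the cluster analysis: 32 HR + 61 R = 93 resistant and 66 MS + 83 S + 66 HS = 215 susceptible cultivars.

```python
import math


def chi_square_equal_split(n_resistant: int, n_susceptible: int):
    """Chi-squared goodness-of-fit test against an assumed 1:1 split.

    Returns the test statistic and its P-value (1 degree of freedom).
    """
    expected = (n_resistant + n_susceptible) / 2.0
    chi2 = ((n_resistant - expected) ** 2
            + (n_susceptible - expected) ** 2) / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = chi_square_equal_split(93, 215)
    print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
```

With only a handful of cultivars per country, the same test has little power, which is consistent with the lack of significant dominance reported for most individual countries.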

The pathogen C. acutatum s. lat. was isolated from sampled leaves, shoots, and branches with dieback symptoms albeit with a relatively low frequency (<2.8%).

#### Susceptibility of Cultivars in Artificial Inoculation

All inoculated cultivars developed typical anthracnose (soapy rot), with significant differences among them. None of the non-inoculated fruit showed disease symptoms during the 80 days of incubation in humid chambers. The first symptoms were observed in the fruit of cv. Ocal 5 days after inoculation, whereas the fruit of cv. Frantoio showed the first symptoms 23 days after inoculation. Likewise, the pathogen caused the complete rot (severity = 5) of all the fruit of cvs. Lechín de Sevilla and Ocal within 21 days, while it needed more than 80 days to cause the complete rot of all of the fruit of cv. Frantoio. In this latter cultivar, only 17% of inoculated fruit showed anthracnose symptoms 40 days after inoculation. The SAUDPC analysis significantly separated the five inoculated cultivars (**Figure 5**). Finally, there was a significant correlation (r = 0.989, P = 0.0015) between the SAUDPC of inoculated fruit and the fruit-rot incidence in the field.

#### DISCUSSION

Here, we present the largest evaluation of olive cultivars for their resistance to anthracnose, which is considered the most important fruit disease of this crop (Cacciola et al., 2012; Moral et al., 2014). We evaluated the reaction of 308 well-identified cultivars growing in the WOGBC during three epidemic seasons. Among others, two important reasons why the WOGBC is an excellent experimental plot to evaluate the resistance to C. acutatum s. lat. are: (i) the plot is located in an endemic area for anthracnose due to the proximity of the biggest river of Andalusia (Guadalquivir River); and (ii) the plot has a high number of HS olive cultivars that act as an inoculum source of Colletotrichum (Moral et al., 2015). In a previous evaluation, we presented a limited number of olive cultivars based on a single observation (Moral et al., 2005) and without the correct identification of the trees using molecular markers (Trujillo et al., 2013). In the latter study, Trujillo et al. (2013) clarified numerous problems of homonymy, synonymy, and wrong denominations in the WOGBC. For example, the original cultivars Arauco, Cañivano Blanco, Carrasqueño de Lucena, and Razzola, which were planted in the WOGBC, have been identified as synonyms of the cultivars Azapa, Picholine Marocaine, Picual, and Frantoio, respectively (Trujillo et al., 2013).

<sup>a</sup>HR = highly resistant (0–2 in the 0–10 rating scale), R = resistant (2.1–4.3), MS = intermediate (4.4–6.2), S = susceptible (6.3–7.9), and HS = highly susceptible (8–10).

<sup>b</sup>P-value of the chi-square test used to determine the dominance or non-dominance of resistant or susceptible cultivars.

<sup>c</sup>R = dominance of resistant cultivars, S = dominance of susceptible cultivars, - = no dominance of resistant or susceptible cultivars.

Olive fruit infection occurs at all stages of its development, from flower bud emergence to ripening (Moral et al., 2009). These infections occur mainly as a result of water-splashed conidia. Also, the anthracnose disease cycle can be influenced by the activity of the olive fruit fly (B. oleae), which may increase the fruit's susceptibility by causing wounds or may directly act as a spore carrier (Malacrinò et al., 2015). Fruit ripeness increases fruit susceptibility to anthracnose (Moral et al., 2008), with unripe (or developing) fruit being very resistant to the pathogen, regardless of cultivar (Moral et al., 2008; Cacciola et al., 2012; Moral and Trapero, 2012). Therefore, cultivar resistance is evaluated by inoculating yellowish-green fruit with a spore suspension of the pathogen (Moral et al., 2008; Talhinhas et al., 2015). In any case, the use of the rating scale under field conditions to evaluate the cultivar reaction to the pathogen provides an important (>20-fold) economic saving relative to the artificial inoculation method (Moral, 2009). However, the correct evaluation of olive cultivars under field conditions can only be conducted during epidemic seasons (Moral and Trapero, 2009), which are sporadic in Mediterranean conditions (Cacciola et al., 2012; Moral and Trapero, 2012). In our case, there were only three epidemic seasons during a period of 10 years. The epidemic intensity during the studied years (**Figure 1**) was associated with conducive weather conditions for anthracnose, mainly the annual rainfall. Thus, the greatest epidemic was in 1997, a wet year (888.4 mm) in Córdoba, which had been preceded by a very wet 1996 (951.1 mm). Furthermore, the epidemic years 2005 and 2006 were moderately wet, with an annual rainfall of 548.8 and 560.7 mm, respectively.

In this study, the phenotypic resistance of the cultivars was very stable, as shown by the fact that the experimental error explained only around 10% of the total variance (η<sup>2</sup>), while the cultivar genotype explained over 65% of it. This implies a high stability of the cultivar response to the pathogen (resistance/susceptibility) during the epidemic years. The season (which includes its weather conditions) and the cultivar-season interaction each had an effect similar to that of the experimental error. We did not conduct anthracnose evaluations during non-epidemic seasons due to the almost complete absence of the disease during these years, even in the HS cultivars (Moral, 2009). In other words, anthracnose is a highly weather-dependent disease (Moral and Trapero, 2012).

As previously described, the cultivar reaction to Colletotrichum is a continuous variable ranging from HR to HS cultivars (Moral et al., 2015). Nevertheless, the susceptibility/resistance of the cultivars is more useful to, and more easily understood by, farmers and agronomists if the cultivars are placed into distinct ordinal classes (Pataky et al., 2011). In this study, as in previous studies of olive diseases (Moral et al., 2005; Trapero and López-Doncel, 2005), we classified the cultivars into five categorical groups using a K-means analysis that minimized the within-group variances. The intervals (the range of values) of each categorical group can shift slightly depending on the number of evaluated cultivars, since K-means analysis uses a centroid value (the average of the data of each group) for each categorical group (Jain et al., 1999).

In general, susceptible cultivars (MS, S, and HS; 215 cultivars in total) were more prevalent than resistant cultivars (R and HR; 93 cultivars in total) regardless of country of origin, except for Italy. For Italian cultivars, resistance to the pathogen (C. godetiae) in the WOGBC was prevalent. Since Colletotrichum spp. is endemic in the north of Italy, most of these Italian cultivars could have been selected by farmers according to their resistance to the pathogen (Barranco et al., 2000; Bartolini and Cerreti, 2017). The studied Italian cultivars also show a high degree of genetic homogeneity; for example, they all belong to the chlorotype group E1.1, except the S cultivar Carolea, which belongs to the group E1.2 (Besnard et al., 2013). Likewise, all the Italian cultivars in the WOGBC that belong to the genetic cluster 2 described by Trujillo et al. (2013) are resistant to anthracnose, except "Cipressino."

At the extremes of the resistance/susceptibility range, we found the cultivar Dolce Agogia, which showed complete resistance to the pathogen (i.e., absence of disease symptoms), and the cultivars Acebuchera, Picudo Blanco de Estepa, and Uovo Piccione, in which the fruit-rot incidence was 100%. In this study, we described most of the evaluated cultivars as susceptible (MS, S, or HS) to the pathogen, including the species Olea ferruginea, which was moderately susceptible. The results of the present study agree with previous observations for susceptible cultivars: Ascolana tenera, Barnea, Galega vulgar, Gordal sevillana, Hojiblanca, Manzanilla de Sevilla, Morisca, Ocal, Picudo, Sant Agostino, and Verdial de Badajoz; and for the resistant or moderately resistant cultivars: Bical de Castelo Branco, Coratina, Frantoio, Empeltre, Leccino, Manzanilla cacereña, Mixani, and Picual (Barranco et al., 2000; Rallo et al., 2005; Moral et al., 2015; Talhinhas et al., 2015; Bartolini and Cerreti, 2017). Conversely, the published response to anthracnose of some cultivars does not match our current results. In our study, four of these cultivars ["Abou-Salt Mohazan," "Azapa" (syn. "Arauco"), "Blanqueta," and "Cordovil de Castelo Branco"] were more susceptible and seven of them ("Arbequina," "Cobrançosa," "Itrana," "Moraiolo," "Morrut," and "Picholine") were somewhat less susceptible than in their respective previous classification (Bartolini and Cerreti, 2017).

Errors in the classification of olive cultivars according to their resistance to anthracnose have been extensively discussed by us and are usually associated with: cultivar misidentification, the effect of fruit ripeness at the time of evaluation, low inoculum pressure, unfavorable environmental conditions for disease development, and confusion with other fruit-rot diseases caused by species of Alternaria, Botryosphaeria, Fusarium, or Neofabraea (Moral et al., 2008, 2014, 2015; Moral and Trapero, 2009). Furthermore, the potential interaction between Colletotrichum species (or isolate) and the olive cultivar deserves special mention. In general, this type of interaction has been described with MS and S cultivars, such as "Galega vulgar," "Cobrançosa," or "Hojiblanca." Fortunately, R or HR cultivars, such as "Blanqueta," "Picual," or "Frantoio," show a high degree of resistance to the different isolates (Xaviér, 2009; Talhinhas et al., 2015). Concomitantly, some Colletotrichum species (or isolates) tend to be weakly (e.g., C. acutatum s. str. or C. rhombiforme) or highly virulent (e.g., C. godetiae or C. nymphaeae) against a broad range of olive cultivars (Xaviér, 2009; Schena et al., 2014; Talhinhas et al., 2015).

The histogram of resistance/susceptibility of the olive cultivars according to fruit-rot incidence showed that data fit a normal distribution but with a dominance of the susceptible cultivars. Similarly, a substantial deviation from normal distribution among cultivars has been described for different agronomic characteristics, such as yield, ripening date, and oil content (Caballero et al., 2006; León et al., 2008, 2011). Although our study was not intended to address the resistance mechanisms against the pathogen, it suggests a complex and polygenic control of the resistance to Colletotrichum species. In any case, other types of genetic control, including combinations of minor and major genes, could be involved in the resistance of the olive tree to Colletotrichum species (Geffroy et al., 2000). However, major genes mediating the defense against non-biotrophic pathogens are rare (Poland et al., 2009). In addition, phenolic compounds have an important role in the defense of the fruit against fungal pathogens in different crops (Prusky, 1996), including olive (Moral et al., 2015); in this case, because phenolic acids derive from different metabolic pathways in the plant (Boudet, 2007), there is no clear relationship between genes and phenolic compounds. Fortunately for olive breeding programs, crosses between resistant cultivars produce a high frequency of resistant descendants (Moral et al., 2015).

Likewise, important differences in the severity of branch-dieback have been observed among olive cultivars. This second syndrome of olive anthracnose is associated with the production of phytotoxins by the pathogen on rotten fruit (Moral et al., 2009, 2014). In our study, fruit-rot incidence and branch-dieback severity appear to be correlated, although some of the cultivars that showed high values of fruit-rot incidence did not show dieback symptoms. These results suggest differences in the resistance mechanisms for both syndromes. In the experimental plot, the percentage of isolation of Colletotrichum in semiselective medium from symptomatic tissues of the olive trees was relatively low (<3%) during spring-summer and it was associated with the rainfall events. Schena et al. (2017), using a duplex qPCR, have described a high colonization of these vegetative tissues from May to October. In our conditions, we have never observed fruiting bodies (acervuli) of the pathogen on leaves or shoots under field conditions (Moral et al., 2009), although they can be induced in humid chamber after 1 month of incubation (Moral et al., 2014). For this reason, we think that these tissues have a limited role as inoculum sources in comparison with affected fruit in southern Spain (Moral and Trapero, 2012; Moral et al., 2014). In Southern Italy, the pathogen is able to directly infect leaves and shoots; in addition, these tissues can also be colonized by the pathogen through the peduncles of rotten fruit (Martelli, 1960). Furthermore, acervuli of the pathogen have been described on olive leaves in other countries, such as Australia and Italy (Martelli, 1961; Sergeeva et al., 2008). These differences could be due to variation among pathogen populations, weather conditions, or cultivar resistance.

It is worthy of note that the evaluation of the severity of branch-dieback caused by Colletotrichum sp. may mislead the inexperienced evaluator since other pathogens can cause similar symptoms. In the event of doubt, the evaluator should conduct isolation for other candidate pathogens. For example, we diagnosed Verticillium wilt in more than 20 olive trees belonging to different cultivars in the WOGBC (Morello et al., 2016).

Current trends in planting high-density orchards (which are very conducive for olive anthracnose) and reducing the use of copper-based fungicides are contrary to the high-quality oils demanded by consumers (Moral et al., 2012, 2014; Díez et al., 2016). Therefore, the selection of cultivars that are less susceptible to anthracnose is essential for new plantations. The information about the resistance to olive anthracnose that is presented in this study is fundamental for farmers, technicians, and breeders.

#### CONCLUSIONS

In this paper, we evaluated the resistance of 308 olive cultivars to C. acutatum s. lat. during three epidemic seasons under field conditions. Although there is a clear predominance (69.7%) of susceptible cultivars (MS, S, and HS), we have also identified 32 cultivars (10.4%) highly resistant to the pathogen. The most notable cultivar was "Dolce Agogia," which did not show any anthracnose symptoms during the three seasons. This work constitutes the largest evaluation of olive cultivars according to their resistance to C. acutatum to date, although the response of many other cultivars to the pathogen is not well known and should therefore be evaluated. For future work, the methodology described in this paper should also be used to evaluate the cultivar response to other aerial fungal pathogens affecting leaves and fruit, such as V. oleaginea and Pseudocercospora cladosporioides, causal agents of peacock spot and cercosporiosis, respectively.

#### AUTHOR CONTRIBUTIONS

Conceived and designed the study: AT and JM. Performed the evaluation under field conditions: JM, JRV, CX, and LFR. Curator of the WOGBC: JC. Performed the evaluations in controlled conditions: JM. Analyzed the data: JM and AT. Wrote the paper: JM and AT.

#### ACKNOWLEDGMENTS

This research was funded by the Spanish Ministry of Education and Science (project AGL2004-7495 co-financed by the European Union FEDER Funds) and by the Andalusia Regional Government (Project P08-AGR-03635). JM holds a Marie Skłodowska Curie fellowship launched by the European Union's H2020 (contract number 658579). CX was the holder of a fellowship from the Agencia Española de Cooperación Internacional (AECI). Thanks are due to the IFAPA for the use of experimental orchards. We are especially grateful to C. del Río and A. Belaj, who have been in charge of the WOGBC, and to Francisca Luque for helping under controlled conditions. We also thank T. J. Michailides, W. J. Kaiser, and A. Gordon for critical review of the manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Moral, Xaviér, Viruega, Roca, Caballero and Trapero. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Olive Cultivation in the Southern Hemisphere: Flowering, Water Requirements and Oil Quality Responses to New Crop Environments

Mariela Torres <sup>1</sup> , Pierluigi Pierantozzi <sup>1</sup> , Peter Searles <sup>2</sup> , M. Cecilia Rousseaux <sup>2</sup> , Georgina García-Inza<sup>2</sup> , Andrea Miserere<sup>2</sup> , Romina Bodoira<sup>3</sup> , Cibeles Contreras <sup>1</sup> and Damián Maestri <sup>3</sup> \*

#### Edited by:

Marcello Mastrorilli, Consiglio per la Ricerca in Agricoltura e l'Analisi dell'Economia Agraria (CREA), Italy

#### Reviewed by:

Sergio Tombesi, Università Cattolica del Sacro Cuore, Italy George A Manganaris, Cyprus University of Technology, Cyprus

> \*Correspondence: Damián Maestri dmaestri@unc.edu.ar

#### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 12 July 2017 Accepted: 10 October 2017 Published: 27 October 2017

#### Citation:

Torres M, Pierantozzi P, Searles P, Rousseaux MC, García-Inza G, Miserere A, Bodoira R, Contreras C and Maestri D (2017) Olive Cultivation in the Southern Hemisphere: Flowering, Water Requirements and Oil Quality Responses to New Crop Environments. Front. Plant Sci. 8:1830. doi: 10.3389/fpls.2017.01830

<sup>1</sup> Estación Experimental Agropecuaria San Juan, Instituto Nacional de Tecnología Agropecuaria (INTA), CONICET, San Juan, Argentina, <sup>2</sup> Centro Regional de Investigaciones Científicas y Transferencia Tecnológica de La Rioja (CRILAR, Provincia de La Rioja, UNLaR, SEGEMAR, UNCa, CONICET), La Rioja, Argentina, <sup>3</sup> Instituto Multidisciplinario de Biología Vegetal, Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad Nacional de Córdoba, Córdoba, Argentina

Olive (Olea europaea L.) is a crop well adapted to the environmental conditions prevailing in the Mediterranean Basin. Nevertheless, the increasing international demand for olive oil and table olives in the last two decades has led to expansion of olive cultivation in some countries of the southern hemisphere, notably in Argentina, Chile, Perú and Australia. While the percentage of world production represented by these countries is still low, many of the new production regions do not have typical Mediterranean climates, and some are located at subtropical latitudes where there is relatively little information about crop function. Thus, the primary objective of this review was to assess recently published scientific literature on olive cultivation in these new crop environments. The review focuses on three main aspects: (a) chilling requirements for flowering, (b) water requirements and irrigation management, and (c) environmental effects on fruit oil concentration and quality. In many arid and semiarid regions of South America, temperatures are high and rainfall is low in the winter and early spring months compared to conditions in much of the Mediterranean Basin. High temperatures have often been found to have detrimental effects on olive flowering in many olive cultivars that have been introduced to South America, and a better understanding of chilling requirements is needed. Lack of rainfall in the winter and spring also has resulted in an urgent need to evaluate water requirements from the flower differentiation period in the winter to early fruit bearing. Additionally, in some olive growing areas of South America and Australia, high early season temperatures affect the timing of phenological events such that the onset of oil synthesis occurs sooner than in the Mediterranean Basin with most oil accumulation taking place in the summer when temperatures are very high. 
Increasing mean daily temperatures have been demonstrated to decrease fruit oil concentration (%) and negatively affect some aspects of oil quality based on both correlative field studies and manipulative experiments. From a practical standpoint, current findings could be used as approximate tools to determine whether the temperature conditions in a proposed new growing region are appropriate for achieving sustainable oil productivity and quality.

Keywords: chilling requirements, fatty acids, irrigation, oil concentration, oil yield, water requirements, Olea europaea L.

### INTRODUCTION

The geographic origin of cultivated olive (Olea europaea L.) can be traced to areas along the eastern Mediterranean coast where Turkey, Syria, Lebanon, Palestine, and Israel are currently located. Some records indicate that olive trees have been cultivated in those areas since at least 3000 BC (Connor, 2005). Olive then spread widely around southern Europe, northern Africa, and the Iberian Peninsula. Today, approximately 98% of olives are cultivated in Mediterranean Basin countries. Spain, Italy, and Greece together produce about 77% of the world's olive oil. Portugal, Tunisia, Turkey, Morocco, Syria, and Egypt also have an important amount of production, but oil yields are low per hectare in many instances and modern processing technology is underutilized (El-Kholy, 2012).

In the last 20–30 years, interest in olive oil production and consumption has expanded olive cultivation to regions and countries outside the Mediterranean Basin such as Australia, China, India, and South America. In the southern hemisphere, the biggest hectarage for olive cultivation is located in Argentina. Until 1990, olive cultivation covered a total area of approximately 30,000 ha, most of which corresponded to small orchards (<10 ha) with traditional management (i.e., low planting density and flood irrigation). Subsequent tax exemption laws brought in large investments that included large commercial orchards (>100 ha) with higher planting densities and drip irrigation. Currently, there are about 110,000 ha under cultivation mainly in the central-western and north-western regions bordering the Andes mountain range (27–33◦ S latitude). Spanish and Italian cultivars including "Arbequina," "Manzanilla," "Picual," and "Frantoio" have been extensively planted with about 70% of production being devoted to olive oil. "Arauco" is the only cultivar recognized from Argentina in the World Catalog of Olive Varieties (IOOC, 2000).

In South America, Chile is ranked second in area planted with about 24,000 ha of olive orchards. Production is almost exclusively dedicated to olive oil, and similarly to Argentina, the most important cultivars are Spanish and Italian cultivars with "Arbequina" comprising 50% of the total production area. Other cultivars include "Frantoio," "Arbosana," "Picual," and "Leccino." "Azapa" is a local table olive cultivar with a close resemblance to "Arauco." Perú (approximately 28,000 ha of olive orchards) is not a large producer of olive oil, but table olive production has increased considerably over the last decade. Being located near the equator, the climate conditions in Perú are very different from those found in traditional olive growing regions (Ayerza and Sibbett, 2001). Uruguay and Brazil both have minor, but increasing olive production areas (10,000 and 1,300 ha, respectively), as a result of ongoing expansion projects that are mainly for oil production.

Currently, Australia has about 11 million olive trees spread across approximately 35,000 ha. Although early orchards included a large number of cultivars, about 90% of Australian olive oil is produced from common European cultivars ("Arbequina," "Frantoio," "Coratina," "Corregiola," "Manzanilla," "Picual," and "Koroneiki") and more recently from the Israeli cv. Barnea. Australian areas under olive cultivation include a wide natural diversity of environments from the most southern point of Western Australia to the northern tropical areas of Queensland. Olive production has expanded rapidly in recent years due to the adaptation of intensive and super-high density planting systems in new commercial orchards. As a result of the low rainfall and the unpredictable nature of Australian olive crop environments, almost all Australian olive orchards are irrigated (Mailer, 2012). This is also the case in Argentina because annual rainfall is most often between 100 and 400 mm (Searles et al., 2011).

At this point, it is important to bear in mind that many of the olive growing areas in the southern hemisphere have temperature and precipitation regimes that are very different from those of the Mediterranean Basin where olive trees are traditionally cultivated (**Table 1**). This reality has encouraged, or even forced, both growers and academics to seek new approaches to crop management. Although scientific studies conducted outside the Mediterranean Basin are still limited, it is important to review and synthesize the knowledge currently available on several critical topics. Lavee (2014) provided some general guidelines on olive adaptation to new environments based mostly on knowledge from the Mediterranean Basin, and concluded that there is a strong need for local research in new production areas. For these reasons, the primary objective of this review was to analyze recently published scientific literature on olive cultivation in non-Mediterranean environments in the southern hemisphere. The review focuses on three main aspects: (a) chilling requirements for flowering, (b) water requirements and irrigation management, and (c) environmental effects on fruit oil concentration and quality. The review also helps to identify areas where knowledge is insufficient and to set priorities for further research.

#### Chilling Requirements for Flowering

Olive is a crop that flowers profusely and produces high olive oil yields under the prevailing climatic and agro-ecological conditions of the Mediterranean Basin with most production being confined traditionally to latitudes between 30° and 45° North. Yet, olive trees have the ability to adjust to a wide range of different environments due to a number of specific biological and anatomical characteristics (Gucci and Caruso, 2011). This adaptation often leads to significant effects on several aspects of reproductive performance such as flowering, oil yield, and oil quality that vary depending on the environmental conditions (e.g., Tura et al., 2007; Lazzez et al., 2008; Temine et al., 2008; Torres et al., 2009; Di Vaio et al., 2012; Rondanini et al., 2014). Rapoport (2014) has reviewed many of the reproductive biology responses to drought and temperature in olive under extreme conditions. Thus, our aim in this section is to focus on the specific issue of chilling hours for flowering, which is of great importance in many production areas in the southern hemisphere.

TABLE 1 | Temperature, rainfall, and evapotranspiration (ETo) values from different olive growing areas in South America and Australia compared with those of typical Mediterranean regions in Spain, Italy, and Tunisia.

Tmax and Tmin are the average seasonal maximum and minimum temperatures (°C), respectively. NA, not available. ETo values were calculated using the FAO Penman-Monteith method (Allen et al., 1998).

Flowering is one of the major yield determinants in olive, and although olive trees are capable of producing a large number of flowers, the percentage of flowers that set fruit is usually very low with values of about 2% (Lavee et al., 1996). Flowering occurs once buds induced the previous growing season receive sufficient chilling during the winter dormancy period to end dormancy, differentiate anatomically, and accumulate warmer temperatures adequate for budburst (Rallo and Cuevas, 2008). The accumulation of chilling requirements for flowering during winter dormancy is most often referred to as vernalization, and high temperatures during the winter may adversely affect the number of chilling hours accrued (Malik and Perez, 2011). Thus, flowering and therefore fruiting may be reduced due to insufficient chilling temperatures at low latitudes (<30◦ ).

**Figure 1** shows the chronological sequence of the main phenological stages of olive cultivation in the Mediterranean Basin compared with the main growing regions in Argentina (i.e., central-western and north-western Argentina). In NW Argentina, it has been observed that fairly high winter and spring temperatures lead to earlier flowering, and eventually to earlier oil accumulation, relative to the Mediterranean Basin (Gómez del Campo et al., 2010). Early flowering was also reported in other low latitude South American production areas such as Perú (Lavee, 2014). High temperatures have been shown to result in a lack of chilling hours for flowering in some cultivars growing in NW Argentina (Aybar et al., 2015) and in Tacna, Perú (Castillo-Llanque et al., 2014). In addition, it has been observed that trees exposed to insufficient chilling temperatures and high temperature events can flower, but the flowers are of low quality and have a low set percentage. This phenomenon has been documented in olive growing areas at low latitudes where some olive varieties produce deformed floral buds and fruit (**Figure 2**). This is in accordance with previous suggestions that winter chilling is necessary not only for floral differentiation, but also for proper formation of floral buds (Rallo and Martin, 1991; De Melo-Abreu et al., 2004).

The optimum temperature regime for reproductive development of olive buds has been considered to include fluctuating temperatures from 2 to 19◦C (Denney and McEachern, 1983). However, in greenhouse experiments under controlled temperature conditions, Malik and Bradford (2009) observed maximal flowering in "Arbequina" at minimum temperatures ranging from 4.4 to 7.8◦C and flowering reductions when night temperature was maintained at 2.2◦C. Hence, these authors suggest an inhibitory effect on flowering even at temperatures that were previously considered satisfactory for reaching chilling requirements.

At a large geographical scale, the potential of new sites for olive cultivation in the Arid Chaco Region in northern Argentina was assessed through temperature regime comparisons with more established sites in central Argentina, Italy, Spain, and the USA (Ayerza and Sibbett, 2001). At these sites, the frequencies of minimum (0.0–12.5◦C) and maximum (12.5– 21.1◦C) temperatures during the winter were determined along with those of extreme cold (<0 ◦C) and heat (>37◦C) during the flowering period. Following these criteria, all Italian and Spanish olive growing sites had at least 150 days per year that were adequate for chilling hour accumulation, while some established Argentinean sites (San Juan, 31◦ 34' S; Mendoza, 32◦ 50' S) did not exceed 110 days, and all proposed new Arid Chaco sites had less than 60 days. Another distinctive feature of the climate in the Arid Chaco region during the winter season when chilling accumulation occurs is alternating periods (i.e., several days) of high and low temperatures (Ayerza and Sibbett, 2001). This reality may add an additional drawback for flower development and fruit set because temperatures above 21◦C partially reverse chilling accumulation when they occur before chilling requirements are completed (De Melo-Abreu et al., 2004; Malik and Perez, 2011).
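The day-counting criterion described above can be sketched in Python. This is a minimal illustration and not the procedure of Ayerza and Sibbett (2001): the function name `count_chilling_days`, the input format (daily minimum and maximum temperatures in °C), and the requirement that both ranges be met on the same day are assumptions. Only the two temperature ranges come from the text.

```python
from typing import Iterable, Tuple

# Temperature ranges (°C) taken from the text; everything else is illustrative.
TMIN_RANGE = (0.0, 12.5)
TMAX_RANGE = (12.5, 21.1)


def count_chilling_days(daily_temps: Iterable[Tuple[float, float]]) -> int:
    """Count days whose Tmin and Tmax both fall inside the stated ranges.

    Assumption: a day counts as adequate only if both conditions hold.
    """
    count = 0
    for tmin, tmax in daily_temps:
        tmin_ok = TMIN_RANGE[0] <= tmin <= TMIN_RANGE[1]
        tmax_ok = TMAX_RANGE[0] <= tmax <= TMAX_RANGE[1]
        if tmin_ok and tmax_ok:
            count += 1
    return count


# Hypothetical (Tmin, Tmax) records, not measured data.
example = [(3.0, 18.0), (8.0, 25.0), (-1.0, 15.0), (12.5, 21.1)]
print(count_chilling_days(example))
```

Applied to a full year of station records, such a count could be compared against the thresholds quoted above (150, 110, and 60 days) as a rough first screen of a candidate site.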

Lavee (2014) observed that the cold Humboldt Current along the Pacific coast of Perú lowers the expected temperatures for this latitude (<20◦ S). Interestingly, both an introduced Italian cultivar ("Frantoio") and a local Peruvian cultivar ("Criolla") clearly flower, in spite of the latitude. Castillo-Llanque et al. (2014) reported that this latter cultivar shows greater and more rapid floral bud development suggesting that it may be better adapted than the Italian cultivar to the local climatic conditions. Reductions in light intensity during the winter months due to continuous cloudiness in coastal Perú have also been suggested anecdotally to compensate for insufficient chilling (Lavee, 2014).

Even from casual observations, it seems clear that olive cultivars differ in their chilling requirement. An analytical approach for assessing the potential for flowering occurrence and date in different cultivars involves simulation models based on cultivar-specific thermal requirements (De Melo-Abreu et al., 2004). One of the models proposed by De Melo-Abreu et al. (2004) predicts the date when the chilling requirement is reached as well as the full flowering date after the accumulation of warm temperatures. Recently, this model was validated over several years at eight low latitude sites in north-western Argentina (Aybar et al., 2015). In "Arbequina," normal flowering was observed at almost all sites and in all years, while normal flowering events in "Frantoio" and "Leccino" were uncommon. The results confirmed that these two latter cultivars require a very high number of chilling units in accordance with values from the World Olive Germplasm Bank of Córdoba (Spain), and winter temperatures in NW Argentina do not meet their chilling requirements for normal flowering in most years. In several tropical areas, expansion of fruit trees has been made possible by the use of growth regulators, which replace chilling requirements to overcome endodormancy. For example, hydrogen cyanamide (HC) has been used successfully in apple trees (Mohamed, 2008), grapes (Or et al., 2000), and peaches (Marodin et al., 2002). Benzyladenine (BA), a synthetic cytokinin that acts during dormancy release in apple (Bubán, 2000), is another growth regulator often used. However, BA did not affect flower initiation in the olive cv. Sevillano (Badr and Hartmann, 1972), and a more recent field trial in NW Argentina did not successfully lead to normal flowering in "Frantoio" when BA or HC were applied (Aybar, 2010).
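The two-phase logic of the simulation approach described above (chilling accumulation followed by accumulation of warm temperatures) can be sketched as follows. This is a simplified illustration and not the model of De Melo-Abreu et al. (2004): the daily time step, the chill-unit rule, the base temperature, and every default parameter value are hypothetical placeholders. The only element taken from this review is that temperatures above 21°C partially reverse chilling before the requirement is met.

```python
from typing import Optional, Sequence, Tuple


def predict_flowering(
    daily_mean_temps: Sequence[float],
    chill_requirement: float,
    heat_requirement: float,
    chill_max_temp: float = 12.5,   # hypothetical upper limit for chilling (°C)
    reversal_temp: float = 21.0,    # above this, chilling is partly reversed
    reversal_penalty: float = 0.5,  # hypothetical chill units lost per hot day
    base_temp: float = 9.0,         # hypothetical base for heat accumulation (°C)
) -> Tuple[Optional[int], Optional[int]]:
    """Return (day chilling is met, day of full flowering) as 0-based indices.

    Either value is None if the corresponding requirement is never reached.
    """
    chill = 0.0
    heat = 0.0
    chill_day: Optional[int] = None
    for day, temp in enumerate(daily_mean_temps):
        if chill_day is None:
            # Phase 1: accumulate chilling; hot days remove part of it.
            if 0.0 <= temp <= chill_max_temp:
                chill += 1.0
            elif temp > reversal_temp:
                chill = max(0.0, chill - reversal_penalty)
            if chill >= chill_requirement:
                chill_day = day
        else:
            # Phase 2: accumulate degree-days above the base temperature.
            heat += max(0.0, temp - base_temp)
            if heat >= heat_requirement:
                return chill_day, day
    return chill_day, None


# Hypothetical daily mean temperatures (°C) and requirements.
temps = [5, 6, 22, 7, 8, 15, 19, 19]
print(predict_flowering(temps, chill_requirement=3, heat_requirement=20))
```

In a sketch of this kind, a cultivar with a high chilling requirement grown at a warm site would return `None` for the first value, which corresponds to the absence of normal flowering reported for some cultivars at low latitudes.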

From this section, we conclude that insufficient chilling hours during winter dormancy in many areas of South America and potentially other parts of the southern hemisphere often lead to reductions in flowering in some cultivars. Thus, cultivar-specific simulation models are recommended as approximate tools to predict whether individual cultivars will likely flower in proposed new growing regions.

#### Water Requirements and Irrigation Management

Olive has been cultivated traditionally under rainfed conditions in the Mediterranean Basin without supplemental irrigation (Connor and Fereres, 2005). Good olive oil yields can be obtained there without irrigation in growing areas where annual rainfall is greater than 600 mm (Gucci and Fereres, 2012). Although such a value may serve as a benchmark for olive cultivation in many Mediterranean countries, additional factors such as rainfall seasonality, soil water-holding capacity, and crop evapotranspiration should be carefully considered for specific growing regions.

Since the early pioneering study of Hartmann and Panetsos (1961) in California, several dozen studies have addressed the irrigation needs of olive trees growing under Mediterranean climate conditions based on the sensitivity of the main olive phenological stages to water deficit (e.g., Tognetti et al., 2005; Lavee et al., 2007; Gucci et al., 2009; Rapoport et al., 2012; Gómez-del-Campo et al., 2014). These studies have mostly focused on evaluating irrigation needs under the hot dry summer conditions when rainfall is limited and provide the physiological and agronomic knowledge necessary for the application of irrigation strategies under many commercial growing conditions (Fernández, 2014). However, some recent advances in our understanding of olive tree response to irrigation have been possible due to the different climate regimes found in South America and elsewhere in the southern hemisphere.

In the Mediterranean Basin, irrigation is normally suspended during the winter months because rainfall is more than sufficient to satisfy crop evapotranspiration (ETc) under the fairly cold and cloudy conditions. The soil moisture stored during the winter also may preclude the need to irrigate in the spring during flowering and subsequent fruit set. By contrast, in many southern hemisphere climates where olive is cultivated (e.g., the subtropics of Australia and Argentina), rainfall events occur mostly in the summer with little or no winter rainfall. Somewhat greater temperatures during the winter and spring months at these latitudes compared to those of the Mediterranean also suggest that ETc should be higher. Thus, no irrigation experience from Mediterranean countries is applicable to these regions for this time of the year.

In the last several years, some studies evaluating irrigation needs during the winter have been conducted in the arid and semi-arid regions of central and north-western Argentina. A preliminary study by Rousseaux et al. (2008) examined some physiological and yield responses to the suspension of irrigation for 6–7 weeks during the winter season (July-August) in La Rioja province (NW Argentina). Soil water content decreased after 15 days in non-irrigated plots, and both pre-dawn and midday leaf water potential showed mild reductions during most of the experiment. However, this change in leaf water status was not enough to significantly decrease net photosynthetic rate, and fruit yield at harvest the following growing season only showed a modest percentage reduction (−20%). Estimates from this study suggested that little irrigation would be needed to satisfy crop water demand during the winter. Later sap flow measurements in whole trees combined with soil micro-lysimeter data during the winter also confirmed that ETc was only about 40% of reference transpiration (sensu Allen et al., 1998). Although maximum daily temperature was often around 20°C, transpiration per unit leaf area was minimal when daily average temperature was below 13°C (Rousseaux et al., 2009), a condition frequently observed during July-August due to low night temperatures.

collection at INTA-San Juan (31◦S, Argentina); photographs (B,C) are from semi-tropical regions of Brazil, 22◦

In another study (Pierantozzi et al., 2013), when the deficit irrigation period was extended from early winter until midspring (i.e., mid-June through October), significant reductions in both photosynthetic pigment levels and net photosynthetic rates were observed in trees grown under moderate (50% ETc) and severe (25 and 0% ETc) deficit irrigation. Vegetative shoot growth started toward the end of the winter, and was significantly reduced by deficit irrigation (Pierantozzi et al., 2014). Flowering was also delayed in the trees receiving no irrigation. This delay, as well as reductions in flowering intensity, ultimately decreased oil yield in the treatments under moderate or severe water stress compared to the well-irrigated treatments (100 or 75% ETc).

The greater responses to deficit irrigation in Pierantozzi et al. (2013, 2014) than in Rousseaux et al. (2008) are most likely explained by the duration of the deficit period. When the influence of water availability was partitioned in four periods, water deficit during winter dormancy did not affect either flowering or fruiting parameters, but deficit during inflorescence development reduced many different flowering traits and ovule development (Rapoport et al., 2012). Thus, it appears that a short and mild water deficit during the colder, winter dormancy period (Rousseaux et al., 2008) may not greatly affect reproductive responses, but intensifying water deficit from late fall through mid-spring including winter dormancy, flower differentiation, and flower opening leads to detrimental effects (Pierantozzi et al., 2013, 2014). According to these two latter studies, little irrigation (50% ETc) may be sufficient to maintain adequate plant water potentials for the coldest winter months, but high (75% ETc) or full (100% ETc) irrigation rates could be needed by mid-August, which is approximately 2 months before full flowering, to guarantee adequate oil yields in regions with dry winter and spring seasons.
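The irrigation fractions discussed above can be expressed as a simple schedule. The sketch below is illustrative only: the function name `irrigation_target_mm`, the 15 August switch date (one reading of "mid-August"), and the restriction to a June to October window in the southern hemisphere are assumptions. The fractions of ETc (50, 75, and 100%) are those mentioned in the text.

```python
from datetime import date

# Fractions of ETc mentioned in the text; the switch date is an assumption.
WINTER_FRACTION = 0.50
PRE_FLOWERING_FRACTION = 0.75  # the text also mentions full irrigation (1.00)


def irrigation_target_mm(etc_mm: float, day: date,
                         late_fraction: float = PRE_FLOWERING_FRACTION) -> float:
    """Return the irrigation depth (mm) to apply for a given crop ET (mm).

    Only the winter to mid-spring window (June to October) is covered.
    """
    if not 6 <= day.month <= 10:
        raise ValueError("schedule only covers June through October")
    switch = date(day.year, 8, 15)  # assumed meaning of 'mid-August'
    fraction = WINTER_FRACTION if day < switch else late_fraction
    return etc_mm * fraction


print(irrigation_target_mm(10.0, date(2017, 7, 10)))  # coldest winter months
print(irrigation_target_mm(10.0, date(2017, 9, 1)))   # about 2 months before flowering
```

Passing `late_fraction=1.0` gives the full-irrigation alternative; which of the two late-season rates is appropriate would need to be established locally.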


Many South American olive cultivation regions in Argentina, Chile and Perú are located at subtropical latitudes (<30◦ S) where high temperatures will likely affect annual water requirements. In a field experiment in NW Argentina (28◦ 33' S, province of La Rioja, Argentina), the warm climate facilitated excessive shoot growth when very high irrigation levels were applied (Correa-Tedesco et al., 2010). Based on both vegetative growth and fruit yield, an optimal crop coefficient (Kc) value of approximately 0.70 was determined for the entire growing season in this drip-irrigated, intensively-planted orchard ("Manzanilla fina"). According to these authors, such a Kc value may be sufficient to maintain the vegetative growth at an appropriate level without adversely affecting fruit yield. Based on transpiration and soil evaporation values in the same orchard, Rousseaux et al. (2009) calculated Kc values of about 0.65–0.70 and 0.85–0.90 for either moderately or excessively irrigated olive plots, respectively. The proposed Kc value of 0.7 is similar to reported values for many areas with Mediterranean climates such as California (Goldhamer et al., 1994) and Spain (Girona et al., 2002). However, the reference ET values in NW Argentina (1,600 mm) are higher than in most of the Mediterranean Basin (1,000– 1,400 mm). These high ET values require a greater amount of total annual water in order to satisfy crop demand. Additionally, due to the low annual rainfall (100–400 mm) in this region, a very large proportion of the total water applied must be in the form of irrigation, rather than relying on rainfall.
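The arithmetic behind these statements can be made explicit with a short calculation, using the standard relation ETc = Kc × ETo (Allen et al., 1998). This is a rough sketch: the function names are illustrative, and treating all rainfall as effective (fully available to the crop) is a simplifying assumption not made in the text.

```python
def crop_et_mm(kc: float, eto_mm: float) -> float:
    """Crop evapotranspiration (mm) as the crop coefficient times reference ET."""
    return kc * eto_mm


def irrigation_need_mm(kc: float, eto_mm: float, rainfall_mm: float) -> float:
    """Annual irrigation needed (mm), assuming all rainfall is effective."""
    return max(0.0, crop_et_mm(kc, eto_mm) - rainfall_mm)


KC = 0.70  # seasonal crop coefficient proposed in the text

# NW Argentina: reference ET of 1,600 mm, rainfall of 100-400 mm.
print(round(crop_et_mm(KC, 1600)))
for rain in (100, 400):
    print(rain, round(irrigation_need_mm(KC, 1600, rain)))

# Mediterranean Basin: reference ET of 1,000-1,400 mm with the same Kc.
for eto in (1000, 1400):
    print(eto, round(crop_et_mm(KC, eto)))
```

With the same Kc, the higher reference ET alone raises annual crop water use in NW Argentina above the Mediterranean range, and the low rainfall leaves most of that demand to be met by irrigation.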

Irrigation studies from Australia also provide partial information about olive water requirements and irrigation management (Yunusa et al., 2008; Zeleke et al., 2012; Zeleke, 2014). Estimates of soil water use from neutron probe measurements and canopy transpiration from porometer readings were obtained in four olive orchards in southern Australia (34◦ S) by Yunusa et al. (2008). Similar to NW Argentina, these orchards experienced high annual potential ET conditions (1,600 mm) with little rainfall during much of the growing season. It was observed that crop evapotranspiration in these orchards was fairly low (600 mm; Kc = 0.4) primarily because they were not irrigated during the spring when rainfall was scarce. Based on the results of Pierantozzi et al. (2013, 2014) from Argentina, yield in these orchards might be significantly increased by irrigating during the spring when flowering, fruit set, and most shoot growth occur. In an area with a more evenly distributed rainfall regime, a water balance approach indicated that annual ETc was about 700 mm, but a yearly value for reference evapotranspiration was not given (Zeleke, 2014).

As basic information concerning water requirements has started to accrue in the southern hemisphere, more sophisticated approaches are now being examined. For example, it has been suggested from studies in New Zealand and Argentina that plant-based indicators such as fluctuations in trunk diameter and stem water potential have considerable potential for programming deficit irrigation (Greven et al., 2009; Trentacoste et al., 2015; Agüero Alcaras et al., 2016). In this regard, oil yield was only slightly reduced over three seasons when irrigation was applied below a stem water potential threshold of −2.5 MPa in cv. Arbequina in central-western Argentina (Trentacoste et al., 2015). Thus, this technique appears to be promising for reducing irrigation as competition between sectors for water resources increases. In super-high density olive orchards in Chile, there are also significant advances in determining energy balance components using ground-based measurements and unmanned aerial vehicles that have direct applications for irrigation management (López-Olivari et al., 2016; Ortega-Farías et al., 2016).

To summarize this section, the differences in rainfall distribution between some southern hemisphere sites and the Mediterranean Basin have provided new knowledge about water requirements in olive trees. Unlike the Mediterranean, little rainfall during the winter and early spring occurs in many of the main olive growing areas in Argentina and Australia. Assessments of water requirements from flower differentiation to early fruit growth indicate that irrigation during the preflowering–flowering period is essential to enhance reproductive performance and oil yields in areas with a dry winter-spring season.

#### Oil Concentration and Composition

The various stages of olive fruit growth and oil synthesis occur over a prolonged total period of 5–6 months (**Figure 1**). Under the environmental conditions in the Mediterranean Basin, most oil accumulation coincides with the late summer and fall months when temperatures are decreasing from maximum summer values. In contrast, onset of the olive oil biogenesis period takes place somewhat earlier in southern hemisphere growing regions located at relatively low latitudes, and most of the oil accumulation occurs during the summer when temperatures are higher. For this reason, this section will explore the potential importance of temperature on oil concentration and composition in non-Mediterranean climate growing regions.

In the Mediterranean Basin, the dynamics of oil accumulation are considered to have sigmoidal-type curves, irrespective of cultivar, although the rates and duration of the oil synthesis period may vary for a given cultivar according to local environmental conditions (Allalout et al., 2011; Camposeo et al., 2013). A sigmoidal, S-shaped curve includes little increase in oil concentration early in fruit growth before pit hardening, followed by an extended linear period of oil concentration increase, and then a leveling off of oil concentration late in the season. Consistent with this pattern, recent studies in Argentina focused on modeling fruit oil concentration (%) in a number of cultivars from the beginning of pit hardening until harvest described bilinear relationships where oil concentration increases linearly from pit hardening until it reaches a threshold value, above which oil concentration does not increase further (Trentacoste et al., 2012; Rondanini et al., 2014). In a fairly similar manner, oil synthesis in the local cv. Arauco was low until pit hardening and then followed a saturation curve, first increasing almost linearly, and then appearing to reach a plateau 170 days after full flowering (Bodoira et al., 2015). These results tend to indicate that the overall, basic pattern of oil accumulation is not altered by the climatic conditions in the southern hemisphere.
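The bilinear (linear-plateau) relationship described above can be sketched as follows. This is a minimal illustration only: the function name and all parameter values are hypothetical and are not taken from Trentacoste et al. (2012) or Rondanini et al. (2014).

```python
def oil_concentration(days_after_pit_hardening, initial, rate, plateau):
    """Bilinear model: linear increase from `initial` at `rate` per day, capped at `plateau`."""
    return min(initial + rate * days_after_pit_hardening, plateau)

# Hypothetical parameters (% dry weight and % per day), for illustration only
initial, rate, plateau = 10.0, 0.35, 45.0
days_to_plateau = (plateau - initial) / rate   # breakpoint of the two linear segments

for day in (0, 50, 100, 150):
    print(day, oil_concentration(day, initial, rate, plateau))
print("plateau reached after", days_to_plateau, "days")
```

In such a model, final oil concentration can be limited either by the rate of accumulation or by the time available before the plateau is reached, which is why both quantities are examined separately in field studies.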

In contrast, the amount of oil accumulated does seem to be greatly affected by high temperatures. Based on correlative field studies using data from different cultivars, years, and locations in the warm desert region of NW Argentina, Rondanini et al. (2014) found that final fruit oil concentration on a dry weight basis decreased over a mean temperature range of 23–27◦C. Additionally, a comparison of oil concentration values reported from cooler central-western Argentina (45.5–57.4%; 33◦ S latitude; Trentacoste et al., 2012) with values from NW Argentina (36.5–48.5%; 28–29◦ S; Rondanini et al., 2014) shows that final oil concentrations were lower in the warm NW region. While such apparent differences are fairly clear when considering new crop environments over significant latitudinal gradients, decreases in oil concentration have been reported even over short geographical distances within the Mediterranean Basin. In this regard, final oil concentrations averaged 18% on a fresh weight basis for the cooler coastal plain and only 14% for a warmer interior valley in Israel (Lavee et al., 2012). Maximum daily temperatures were about 5◦C greater during oil accumulation at the interior valley site.

An important approach to directly evaluate the fruit oil concentration response to high temperature has been carried out by heating or cooling fruiting branches (cv. Arauco) in transparent plastic chambers under field conditions in NW Argentina (García-Inza et al., 2014). After 4 months of treatment, this study found a negative linear relationship between oil concentration and mean daily temperature during the oil synthesis period with oil concentration decreasing 1.1% per ◦C across the range of average seasonal temperatures tested (16–32◦C). When temperature was manipulated for shorter periods (i.e., 1 month), temperatures 7◦C above ambient resulted in a negative effect on oil concentration even at final harvest, particularly when the exposure to high temperature took place at the beginning of oil accumulation. Such controlled, experimental results provide further evidence that oil concentration is strongly influenced by temperature.
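To give a sense of scale, the linear response reported above can be written as a simple expression. The notation is introduced here for illustration, and the slope is read as percentage points of oil concentration per degree, which is an assumption about the units of the reported value.

$$\Delta C_{\mathrm{oil}} \approx -1.1 \times \Delta T, \qquad 16\,^{\circ}\mathrm{C} \le T \le 32\,^{\circ}\mathrm{C}$$

Here $\Delta C_{\mathrm{oil}}$ is the change in final oil concentration (%) and $\Delta T$ is the change in mean temperature (◦C) during the oil synthesis period. Under this reading, a 5◦C warmer season would correspond to about $1.1 \times 5 = 5.5$ points less oil, and the full tested range of 16◦C to about $1.1 \times 16 = 17.6$ points. The relationship should not be extrapolated outside the tested temperature range.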

To better understand oil concentration responses to temperature, the influence of temperature on the duration of the fruit oil-filling period and on the rate of oil accumulation needs to be considered. In central-western Argentina, data from several cultivars indicated that the fruit oil-filling period was shortened by about 40 days with increasing maximum daily temperature and solar radiation (Trentacoste et al., 2012). On the other hand, fruit oil concentration was linearly related to the rate of oil accumulation in NW Argentina, but not to the duration of the oil-filling period (Rondanini et al., 2014). Thus, further studies are needed to assess the mechanisms underlying the response of oil accumulation to temperature.

Regarding olive oil composition, increasing evidence shows that some European olive cultivars grown in many regions of South America and Australia produce oils with different fatty acid compositions compared with those obtained from the same cultivars in their original Mediterranean Basin growing areas (Torres and Maestri, 2006; Ceci and Carelli, 2007; Torres et al., 2009; Mailer et al., 2010; Rondanini et al., 2011, 2014; Bodoira et al., 2016). In some cases, the percentages of fatty acids such as oleic acid do not even meet current International Olive Oil Council (IOOC) trade standards. This occurs despite proper agronomic practices, harvesting, and fruit processing standards.

While genotype is considered to be the major source of variability for VOO fatty acid composition (Ripa et al., 2008), the effect of environment and the genotype x environment interaction may also be significant. Many Spanish ("Arbequina," "Manzanilla") and Italian ("Frantoio," "Coratina") cultivars show consistently lower oleic acid content and higher palmitic and linoleic acid contents when grown in Argentina vs. the Mediterranean (Ceci and Carelli, 2007; Torres et al., 2009; Rondanini et al., 2011, 2014). In contrast, tree cluster analysis indicated that the fatty acid composition was fairly similar in "Picual" when grown in Argentina or in the Mediterranean region (Rondanini et al., 2011). Additionally, oleic acid (%) appears to decrease to a much greater degree in some cultivars such as "Arbequina" than in others including "Coratina."

The cv. Arbequina was first introduced to Argentina from Spain about 70 years ago. A study by Torres et al. (2009) using AFLP DNA markers has shown high genetic homogeneity in this cultivar in central Argentina compared to its original Spanish growing region. In central Argentina, oleic acid content in "Arbequina" VOOs is 10–15% lower than in Spanish oils (**Table 2**), but this difference is unlikely to be explained by a founder effect associated with a relatively low number of "Arbequina" individuals being introduced originally to Argentina. Much newer "Arbequina" orchards established in NW Argentina, which originated from cuttings introduced from Europe in the 1990s, also produce oils with much lower oleic acid contents than oils from Spain.

Recently, some studies have evaluated the dynamics of fatty acid accumulation during fruit ontogeny in olive cultivars growing in several environments in Argentina (Rondanini et al., 2014; Bodoira et al., 2015, 2016). In all cultivars tested, the oleic acid content had similar maximum values (about 70% of the total fatty acid content) at early fruit growth stages and then it generally decreased, albeit at different times and rates depending on both the olive cultivar and the environment. This could indicate that the enzymatic activities of fatty acid desaturation metabolism are influenced by genotype x environment interactions. To assess this possibility, enzymatic studies are currently being conducted using fruit collected over a wide latitudinal gradient in western Argentina to obtain relationships between enzymatic activities and the levels of oleic and linoleic acids in two cultivars ("Arbequina" and "Coratina") showing different fatty acid profiles.

Correlation studies of olive oils from the warm valleys of NW Argentina suggest that temperature during the oil synthesis period could be the main environmental factor affecting fatty acid composition of VOOs. In this region, negative relationships between oleic acid concentrations at final harvest and seasonal mean temperatures during oil synthesis have been found for the cv. Arbequina (Rondanini et al., 2011). When the dynamics of fatty acid accumulation were modeled as a function of thermal time, oils from "Arbequina" showed a significant reduction in oleic acid content with thermal time (approximately 0.8% per 100◦Cd), resulting in concentrations of less than 50% at final harvest, which coincided with a thermal time of about 3,500◦Cd (Rondanini et al., 2014).

More direct evidence of the response of oil fatty acid composition to temperature was found by enclosing fruiting branches in transparent plastic chambers during the fruit filling period for 4 months under field conditions (García-Inza et al., 2014). These authors found that fruits (cv. Arauco) developed under temperatures 5 or 10◦C warmer than the seasonal mean ambient temperature (20.6◦C) produced oils with lower oleic acid contents. Across the whole range of temperature explored, the oleic acid concentration decreased linearly 0.7% per ◦C, while palmitic, linoleic and linolenic acids percentages increased with increasing temperature. Interestingly, further study also found that oleic acid content in oils extracted from the seed and the mesocarp showed opposite responses to ambient temperature (García-Inza et al., 2016). In other words, oleic acid content increased with temperature in the seed, but it decreased in the mesocarp. The increase of oleic acid in the seed of olive fruit is consistent with the response observed in oil-seed crops, such as sunflower and soybean (Rondanini et al., 2003; Zuil et al., 2012). Detailed biochemical and molecular research is needed to understand why oleic acid content in the mesocarp decreases with temperature under field conditions. The response of fatty acid composition to temperature has not been a subject of major concern in the Mediterranean Basin. However, it may become of interest with global warming. At the very least, this issue presents a drawback for commercial olive oil production in areas that already have high temperatures.

Similar to Argentina, the wide variations in Australian olive crop environments sometimes result in oils with chemical and sensory attributes being more variable than those observed in oils produced in Mediterranean countries (Ayton et al., 2007; Mailer et al., 2010). Particularly, the fatty acid composition of "Arbequina" is notably influenced by the environmental conditions of the different Australian growing regions (**Table 2**). The oleic acid content of "Arbequina" VOOs is markedly lower in the northern warmer areas from New South Wales and Queensland (54.5% on average) than in the southern cooler region (81% in VOOs from Tasmania). Variations in oleic acid content in other olive cultivars grown in Australia are also significant and follow the same tendency, i.e., decreased oleic acid content in oils from warmer climates (Mailer et al., 2010). The concentrations of the different fatty acids generally fall within the IOOC acceptable limits, but this is not always the case.

TABLE 2 | Fatty acid composition of virgin olive oils from cv. Arbequina cultivated at different growing areas in Spain (Tous et al., 1997; Pardo et al., 2007), Argentina (Ceci and Carelli, 2007; Torres et al., 2009; Rondanini et al., 2011), and Australia (Mailer et al., 2010).

<sup>a</sup>Data from Spain and Australia are presented as a range of mean values. <sup>b</sup>Data from each olive growing region in Argentina were averaged and reported as mean values (± standard deviation). <sup>c</sup>MUFAs, monounsaturated fatty acids; PUFAs, polyunsaturated fatty acids.

Differences in fatty acid composition attributable to geographic variations can also be found in "Arbequina" growing in Chile where expansion has led to olive oil production in regions varying widely in latitude from 18◦ S (Azapa Valley) to 36◦ S (Central Valley). Although there is no evidence of a direct effect of temperature on composition of Chilean "Arbequina" olive oils, Portilla et al. (2014) reported lower oleic acid contents in oils from the most northern latitudes compared with those from the most southern ones (57 and 76% on average, respectively). Interestingly, Brazil and Uruguay are beginning to cultivate olives and to produce VOOs. Brazilian olive production is carried out mainly in the State of Minas Gerais in southeastern Brazil at subtropical latitudes (22–23◦ S), in an area where the mean annual temperature is about 19◦C and fluctuates between 12◦C (minimum) and 26◦C (maximum), with mean annual rainfall being approximately 1,300 mm. Under these mild temperature conditions, the oleic acid content of olive oils from 11 cultivars with different origins was found to be in the range 70.8–84.3%, and concentrations of all individual fatty acids were within the IOOC standards for VOOs (Ballus et al., 2014).

Overall, this section indicates that olive oil from regions with warmer temperatures often has lower fruit oil concentration (%) and oleic acid content than from regions with more moderate temperatures. In addition, the reductions in oleic acid appear to be cultivar-specific. This suggests that a genotype x environment interaction is likely important in olive oil quality responses to temperature. Attempts to modify oleic acid content with agricultural practices such as irrigation management have so far been unsuccessful (e.g., Berenguer et al., 2006; Vita Serman et al., 2011; Caruso et al., 2014). Only very subtle responses to irrigation level have been observed. This suggests that cultivar selection and potentially breeding will be of significant importance in obtaining olive oil with high oleic acid content in warm areas.

### CONCLUSIONS AND FURTHER RESEARCH

Increasing global demand for olive oil has expanded olive cultivation to new growing areas in the southern hemisphere. These new crop environments often do not have typical Mediterranean climates, and some of them are in the subtropics where the response of the crop is relatively unknown. Based on the results of recently published studies, this review has highlighted that: (1) insufficient chilling hours for flowering occur during winter dormancy in some high chilling requirement cultivars, such as "Frantoio" and "Leccino," in specific areas; (2) the lack of winter and spring rainfall in parts of Argentina and Australia illustrates the importance of rainfall distribution and indicates that some amount of irrigation is likely needed throughout the entire year to avoid declines in oil yield in some areas; and (3) reductions in oil concentration and oleic acid content in warm areas indicate what may be expected for cooler regions such as the Mediterranean with global warming.

With respect to selecting specific cultivars for new southern hemisphere environments, the cv. Arbequina, which is the most common cultivar worldwide in modern super-high density orchards, has been shown to flower consistently even in warm subtropical regions, but its oil concentration and oleic acid content are often much lower than when grown in the Mediterranean. Other cultivars have also been shown to have positive and negative attributes in these new environments. Nevertheless, cultivar-specific simulation models are recommended as approximate tools to predict whether individual cultivars will likely flower in proposed new growing areas.

Temperature has emerged as a key variable considering the geographic variability found in the southern hemisphere. A critical aspect of future research may be the response of olive trees to temperature from the biochemical-molecular level to the whole-plant level. It will be important to take into account the considerable genetic variability in olive trees and the apparent genotype x environment interactions that exist for some aspects of olive quality. Thus, the use of many cultivars in studies would be desirable when practical. Lastly, basic studies are not yet available for many growing regions in the southern hemisphere. Such information would enhance our overall understanding of olive cultivation, and reduce the necessity to extrapolate from only a few regions.

### AUTHOR CONTRIBUTIONS

MT, PP, PS, MR, GG, and DM contributed substantially to the conception and design of the review; MT, PP, PS, MR, and DM drafted the text; MT, PP, PS, MR, GG, AM, RB, CC, and DM approved the version to be published; MT, PP, PS, MR, and DM agreed to be accountable for all aspects of the work.

#### FUNDING

This research was supported by grants from the Ministerio de Ciencia, Tecnología e Innovación Productiva de Argentina (ANPCyT, PICT2015 0195) and CONICET (PUE 2016 22920160100125 and PIP 2014-16 N◦ 542).

#### REFERENCES

involve the SNF-like protein kinase GDBRPK. Plant Molec. Biol. 43, 483–494. doi: 10.1023/A:1006450516982

olive oil: influence of geographical origin. Food Chem. 110, 368–374. doi: 10.1016/j.foodchem.2008.02.012


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Torres, Pierantozzi, Searles, Rousseaux, García-Inza, Miserere, Bodoira, Contreras and Maestri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genomic Prediction of Sunflower Hybrids Oil Content

Brigitte Mangin<sup>1</sup>\*, Fanny Bonnafous<sup>1</sup>, Nicolas Blanchet<sup>1</sup>, Marie-Claude Boniface<sup>1</sup>, Emmanuelle Bret-Mestries<sup>2</sup>, Sébastien Carrère<sup>1</sup>, Ludovic Cottret<sup>1</sup>, Ludovic Legrand<sup>1</sup>, Gwenola Marage<sup>1</sup>, Prune Pegot-Espagnet<sup>1</sup>, Stéphane Munos<sup>1</sup>, Nicolas Pouilly<sup>1</sup>, Felicity Vear<sup>3</sup>, Patrick Vincourt<sup>1</sup> and Nicolas B. Langlade<sup>1</sup>

<sup>1</sup> LIPM, Université de Toulouse, INRA, Centre National de la Recherche Scientifique, Castanet-Tolosan, France, <sup>2</sup> Terres Inovia, AGIR, Castanet-Tolosan, France, <sup>3</sup> GDEC, INRA, Université Clermont II Blaise Pascal, Clermont-Ferrand, France

#### Edited by:

Leire Molinero-Ruiz, Instituto de Agricultura Sostenible (CSIC), Spain

#### Reviewed by:

Miguel Perez-Enciso, Universitat Autònoma de Barcelona, Spain

Delin Hong, Nanjing Agricultural University, China

#### \*Correspondence:

Brigitte Mangin brigitte.mangin@inra.fr

#### Specialty section:

This article was submitted to Crop Science and Horticulture, a section of the journal Frontiers in Plant Science

Received: 28 June 2017 Accepted: 06 September 2017 Published: 21 September 2017

#### Citation:

Mangin B, Bonnafous F, Blanchet N, Boniface M-C, Bret-Mestries E, Carrère S, Cottret L, Legrand L, Marage G, Pegot-Espagnet P, Munos S, Pouilly N, Vear F, Vincourt P and Langlade NB (2017) Genomic Prediction of Sunflower Hybrids Oil Content. Front. Plant Sci. 8:1633. doi: 10.3389/fpls.2017.01633

Prediction of hybrid performance using incomplete factorial mating designs is widely used in breeding programs including different heterotic groups. Based on the general combining ability (GCA) of the parents, predictions are accurate only if the genetic variance resulting from the specific combining ability is small and both parents have phenotyped descendants. Genomic selection (GS) can predict performance using a model trained on both phenotyped and genotyped hybrids that do not necessarily include all hybrid parents. Therefore, GS could overcome the issue of unknown parent GCA. Here, we compared the accuracy of classical GCA-based and genomic predictions for oil content of sunflower seeds using several GS models. Our study involved 452 sunflower hybrids from an incomplete factorial design of 36 female and 36 male lines. Re-sequencing of parental lines allowed the identification of 468,194 non-redundant SNPs and the inference of the hybrid genotypes. Oil content was observed in a multi-environment trial (MET) over 3 years, leading to nine different environments. We compared the GCA-based model to different GS models including female and male genomic kinships, with the addition of the female-by-male interaction genomic kinship, the use of functional knowledge in the form of SNPs in genes of oil metabolic pathways, and epistasis modeling. When both parents have descendants in the training set, the predictive ability was high even for GCA-based prediction, with an average MET value of 0.782. GS performed slightly better (+0.2%). Neither the inclusion of the female-by-male interaction, nor functional knowledge of oil metabolism, nor epistasis modeling improved the GS accuracy. GS greatly improved predictive ability when one or both parents were untested in the training set, increasing GCA-based predictive ability by 10.4% from 0.575 to 0.635 in the MET. In this scenario, performing GS only considering SNPs in oil metabolic pathways did not improve whole genome GS prediction but increased GCA-based prediction ability by 6.4%. Our results show that GS is a major improvement to breeding efficiency compared to the classical GCA modeling when either one or both parents are not well-characterized. This finding could therefore accelerate breeding by reducing phenotyping efforts and targeting the most promising crosses more effectively.

Keywords: genomic selection, factorial design, sunflower, oil content, hybrid, GBS

# 1. INTRODUCTION

Sunflower is one of the main oilseed crops worldwide. Although this crop was domesticated in North America, the sunflower was developed as a major crop in Russia in the first half of the twentieth century, when the breeding programs of V.S. Pustovoit increased the seed oil content from 25–30 to 45–50%. This success largely reflected the high heritability of this key breeding trait. Based on two segregating populations, involving wild-type and improved germplasms, Fick (1975) provided the first estimation of the narrow-sense heritability of seed oil content as 0.52–0.61, suggesting that the contribution of genetic additive variance is prominent. As the seed oil content can now be rapidly and inexpensively measured using nuclear magnetic resonance (NMR), selection can be performed in segregating progenies and on a single plant basis, from the F2 generation onwards. Vear et al. (2010) indicated that hybrids generally show heterosis for oil content, which is not typically the case when the parents contain approximately 50% oil.

Mapping of quantitative trait loci (QTLs) for oil content was initiated more than 20 years ago (Leon et al., 1995), providing congruent results across the segregating populations involved (Mestries et al., 1998; Bert et al., 2002; Bachlava et al., 2010; Merah et al., 2012) and confirming both the quantitative nature of the trait (several loci) and its high heritability (mapped QTLs accounted for 10 to 51% of the phenotypic variability). The high level of heritability for oil content suggests that this trait is an easy character to breed for, and the absence of important interactions with environmental conditions makes it feasible to obtain valid general conclusions as to the interest of a genotype for this character based on a small number of measurements under different conditions. Thus, there is no direct requirement for genomic studies to replace phenotypic measurements. However, because robust oil content data are easy to obtain, oil content is a good model trait to test the power of genomic selection models prior to applying these models to explore more complex characters, such as seed yield or quantitative resistance to fungal diseases.

Genomic prediction refers to the prediction of genetic value based on markers spread throughout the entire genome. In this framework, a mathematical model is trained on past genotyped and phenotyped resources, and new individuals that are genotyped but not phenotyped are predicted with this learned model. Among the different models proposed since the work of Meuwissen et al. (2001), the mixed model with genomic best linear unbiased prediction (GBLUP) of unobserved individuals proposed by VanRaden (2008) is the most popular. Originally, the mixed model of Meuwissen et al. (2001) assumed that the haplotype effects of all genomic regions follow the same Gaussian distribution. When limiting each genomic region to a single marker, this model is known as the ridge regression BLUP (RR-BLUP). The RR-BLUP and GBLUP models are equivalent (Endelman, 2011), and Goddard (2009) showed that they are similar to the classical pedigree mixed model when relatedness between individuals is estimated with markers. Mixed models and BLUP have been comprehensively compared to other methods of genomic prediction such as penalized regressions (Li and Sillanpää, 2012, for a review), Bayesian modeling (Kärkkäinen and Sillanpää, 2012, for a review), semiparametric learners such as the reproducing kernel Hilbert space (RKHS) (Gianola et al., 2006) and non-parametric methods, such as random forest (Chen and Ishwaran, 2012). Depending on the trait studied, one or the other of these methods was demonstrated as more reliable, but the best performers provided comparable accuracies (Heslot et al., 2012; Haws et al., 2015). The mixed model framework has consistently produced results comparable to those obtained with more complicated models. The simplicity, efficient computer implementation and flexibility of this model have meant that most novel modeling ideas have been based on this framework.
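The GBLUP idea can be illustrated with a minimal sketch on simulated data. This is not the implementation used in the study: the function names are hypothetical, the variance ratio `lam` is assumed known rather than estimated (for example by REML), and the overall mean is taken as a simple average. The kinship follows the centered cross-product form usually attributed to VanRaden (2008).

```python
import numpy as np

def vanraden_kinship(M):
    """Genomic relationship matrix from an (individuals x SNPs) matrix coded 0/1/2."""
    p = M.mean(axis=0) / 2.0                  # frequency of the counted allele
    Z = M - 2.0 * p                           # center each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train_idx, test_idx, lam):
    """Predict test individuals; lam = residual variance / genetic variance (assumed known)."""
    mu = y_train.mean()                       # simple mean instead of a GLS estimate
    G_tt = G[np.ix_(train_idx, train_idx)]    # kinship among training individuals
    G_st = G[np.ix_(test_idx, train_idx)]     # kinship between test and training individuals
    weights = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_st @ weights

# Simulated toy data (not from the study)
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(60, 500)).astype(float)
y = M @ rng.normal(0.0, 0.1, 500) + rng.normal(0.0, 1.0, 60)
train_idx, test_idx = np.arange(50), np.arange(50, 60)

G = vanraden_kinship(M)
pred = gblup_predict(G, y[train_idx], train_idx, test_idx, lam=1.0)
print("predictive ability:", np.corrcoef(pred, y[test_idx])[0, 1])
```

The test individuals contribute only genotypes, through their kinship with the training set, which is the sense in which unobserved individuals are predicted from a model learned on phenotyped ones.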

As previously described, in GBLUP modeling, effects of genetic markers are assumed to follow the same Gaussian distribution. This unrealistic assumption does not consider the biological mechanisms underlying phenotypic variation. Speed and Balding (2014) proposed an extension of GBLUP, called MultiBLUP, to include multiple random effects allocated to different sets of SNP markers. The close variants are grouped and the relatedness of random genetic effects is determined for each set using a similarity matrix calculated using the SNPs of the region of interest, thus modeling a different effect-size distribution for each set. Using MultiBLUP, Wolfe et al. (2016) predicted disease resistance in cassava. Delimiting a genomic region representing between 30 and 66% of the genetic resistance and modeling it alongside the remaining SNPs increased prediction accuracy from 0.53 to 0.58 compared to GBLUP. Similarly, Sarup et al. (2016), using genomic feature BLUP, which is equivalent to MultiBLUP, showed for several porcine traits that MultiBLUP prediction accuracy is better than that of GBLUP when the set of SNPs linked to previously known QTLs explains more than 10% of trait variability. Other methods for integrating a priori information have also been tested. Zhang et al. (2014) proposed the consideration of QTLs based on assigning a predefined weight to the region of interest in the relatedness. These QTLs could also be included as fixed effects (Bernardo, 2014; Spindel et al., 2016). Prediction accuracy increases up to 30% depending on the trait when SNPs are derived from a GWAS performed with the data used to train the genomic model (Spindel et al., 2016). However, the integration of SNPs from the literature does not show the same ability to improve model accuracy. MultiBLUP is currently included in the framework of multi-kernel mixed models (Weissbrod et al., 2016), as are linear mixed models for complex trait architecture (dominance and epistasis) (de los Campos et al., 2009).
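In general terms, a multi-kernel mixed model of this kind can be written as below. The notation is introduced here for illustration and is not taken from the cited papers.

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \sum_{s=1}^{S} \mathbf{g}_s + \mathbf{e}, \qquad \mathbf{g}_s \sim \mathcal{N}\left(\mathbf{0}, \sigma_s^2 \mathbf{K}_s\right), \qquad \mathbf{e} \sim \mathcal{N}\left(\mathbf{0}, \sigma_e^2 \mathbf{I}\right)$$

Here $\mathbf{y}$ is the vector of phenotypes, $\mathbf{X}\boldsymbol{\beta}$ collects the fixed effects, $\mathbf{g}_s$ is the random genetic effect attached to SNP set $s$, and $\mathbf{K}_s$ is the similarity matrix computed from the SNPs of that set. GBLUP corresponds to $S = 1$ with a single genome-wide kinship, whereas MultiBLUP allows each set its own variance $\sigma_s^2$. Dominance or epistasis can be accommodated in the same form by adding the corresponding kernels.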

Mixed models for hybrid predictions based on GCA and/or specific combining ability (SCA) were applied long before genetic markers came into use. In maize, Bernardo (1996) enhanced this old model by proposing the pedigree BLUP model, which uses co-ancestry coefficients between parents of hybrids. A first attempt to estimate these co-ancestry coefficients using molecular markers was proposed by Schrag et al. (2006) and further generalized by Technow et al. (2014) using whole genomic data. In sunflower, Reif et al. (2013) did not observe any improvement of genomic BLUP compared to the pedigree BLUP. Equality between the two approaches was consistent with the work of Goddard (2009), as Reif et al. (2013) estimated co-ancestry coefficients using the same markers included in the genomic BLUP, so the two predictions are equivalent. In contrast to Reif et al. (2013), we aim to compare the prediction accuracy of hybrid genetic values obtained with a classical mixed model that uses only pedigree information, if available, with that of other mixed models that use genomic data to compute relatedness between hybrids and parents. We make this comparison using seed oil content phenotypes observed in an incomplete factorial design produced in the course of the SUNRISE project.
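The limitation of GCA-based prediction for untested parents can be seen in a small sketch. It is a simplified stand-in for a GCA mixed model, under stated assumptions: parent effects are shrunk with a fixed ridge penalty instead of estimated variance components, SCA is ignored, and the function name and toy data are hypothetical.

```python
import numpy as np

def fit_gca(females, males, y, n_f, n_m, ridge=1.0):
    """Shrunken estimates of the mean, female GCA and male GCA from observed hybrids."""
    mu = y.mean()
    X = np.zeros((len(y), n_f + n_m))         # one indicator column per parent
    X[np.arange(len(y)), females] = 1.0
    X[np.arange(len(y)), n_f + males] = 1.0
    coef = np.linalg.solve(X.T @ X + ridge * np.eye(n_f + n_m), X.T @ (y - mu))
    return mu, coef[:n_f], coef[n_f:]

# Toy incomplete factorial (simulated): female 3 has no phenotyped hybrid
rng = np.random.default_rng(1)
n_f, n_m = 4, 4
true_f = np.array([2.0, -1.0, 0.5, -1.5])
true_m = np.array([1.0, 0.0, -2.0, 1.0])
females = np.array([0, 0, 1, 1, 2, 2, 0, 1])
males = np.array([0, 1, 1, 2, 2, 3, 3, 0])
y = 45.0 + true_f[females] + true_m[males] + rng.normal(0.0, 0.3, len(females))

mu, gca_f, gca_m = fit_gca(females, males, y, n_f, n_m)
pred = mu + gca_f[3] + gca_m[0]               # hybrid of the untested female 3 and male 0
print("GCA of untested female:", gca_f[3], "prediction:", pred)
```

Because the untested female has no observed progeny, her estimated GCA stays at zero and the prediction relies on the male parent alone. Marker-based kinship, by contrast, lets such a parent borrow information from related, phenotyped lines.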

#### 2. MATERIALS AND METHODS

#### 2.1. Plant Materials

Hybrids were obtained as an incomplete factorial design by crossing 36 maintainer lines with 36 restorer lines. The complete hybrid panel contained 452 hybrids. These plants were sown in 11 different environments (5 different environments in 2013, 3 different environments in 2014, and 3 different environments in 2015) (Bonnafous et al., 2017), but for the present study of oil content, we discarded 2 environments due to imperfect randomizations and inaccurate phenotypic observations.

The parents were genotyped by sequencing using the XRQ genome as the reference parent, and their genotypes were imputed by chromosome using Beagle (Browning and Browning, 2009) as described in Badouin et al. (2017). SNPs that were not polymorphic in either the maintainer or the restorer panel were discarded, and each set of redundant SNPs (i.e., SNPs in complete linkage disequilibrium in the 72-parent panel) was represented by a single reference SNP. The final genomic data comprised 468,194 non-redundant SNPs, and hybrid genotypes were deduced from the parent genotypes.
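The filtering steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0/1 allele coding of fully homozygous parental lines, and the reading of the polymorphism rule (a SNP is kept if it is polymorphic in at least one of the two panels) are assumptions made for illustration.

```python
from typing import List


def filter_snps(parents: List[List[int]], n_female: int) -> List[int]:
    """Return the indices of the SNPs kept after filtering.

    parents[i][l] is the 0/1 allele of inbred parent i at SNP l
    (assumed coding); the first n_female rows are the maintainer lines,
    the remaining rows are the restorer lines.
    """
    kept = []
    seen = set()
    for l in range(len(parents[0])):
        column = tuple(row[l] for row in parents)
        female_poly = len(set(column[:n_female])) > 1
        male_poly = len(set(column[n_female:])) > 1
        # Assumed rule: drop SNPs that are polymorphic in neither panel.
        if not (female_poly or male_poly):
            continue
        # Two biallelic SNPs are in complete LD when their columns are
        # identical or complementary, so use a flip-invariant key.
        key = column if column[0] == 0 else tuple(1 - a for a in column)
        if key in seen:
            continue  # redundant with an earlier SNP, which represents the set
        seen.add(key)
        kept.append(l)
    return kept


def hybrid_genotype(female: List[int], male: List[int]) -> List[int]:
    """Allele count of a hybrid deduced from its two homozygous parents."""
    return [f + m for f, m in zip(female, male)]


# Toy panel: two maintainer lines followed by two restorer lines.
toy_parents = [
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1],
]
kept_snps = filter_snps(toy_parents, n_female=2)
```

In this toy panel the second SNP is monomorphic, the third is the complement of the first, and the fourth differs only between panels, so only the first and last SNPs are kept.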

Seed oil content was measured by NMR using a minispec (MQ10H, mq Series, version 1.2, January 2000, Bruker, Germany). Each 20-ml seed sample was first dried for 24 h at 80°C and subsequently analyzed at room temperature.

Genes related to oil metabolism were identified through the metabolic network reconstruction based on the genome annotation of the sunflower (Badouin et al., 2017). The oil metabolism super-pathway was manually constructed from several inferred metabolic pathways. Relations between genes and reactions were automatically inferred and then curated based on the literature. Further details are provided in the online materials of Badouin et al. (2017), and the examined genes are listed in the Supplementary Material (Data sheet S1). An interactive view of the pathway showing the gene/reaction links is available at https://pathway-tools.toulouse.inra.fr/HANXRQ/NEW-IMAGE?type=PATHWAY&object=PWY198A-2. We considered all SNPs identified in the genes listed above, and we added all SNPs located within 1,000 bases upstream or downstream of these genes.
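The SNP pre-selection rule (SNPs inside a gene or within 1,000 bases on either side) can be sketched as follows. The data layout, with SNPs as (chromosome, position) pairs and genes as (chromosome, start, end) triples, and the function name are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Snp = Tuple[str, int]        # (chromosome, position), assumed layout
Gene = Tuple[str, int, int]  # (chromosome, start, end), assumed layout


def snps_near_genes(snps: List[Snp], genes: List[Gene], flank: int = 1000) -> List[int]:
    """Indices of SNPs lying inside a gene or within `flank` bases of it."""
    selected = []
    for idx, (chrom, pos) in enumerate(snps):
        for g_chrom, start, end in genes:
            if chrom == g_chrom and start - flank <= pos <= end + flank:
                selected.append(idx)
                break  # one matching gene is enough
    return selected


# Toy example: one gene spanning positions 5000 to 8000 on "chr1".
toy_genes = [("chr1", 5000, 8000)]
toy_snps = [("chr1", 3999), ("chr1", 4000), ("chr1", 9000), ("chr1", 9001), ("chr2", 6000)]
oil_snps = snps_near_genes(toy_snps, toy_genes)
```

A linear scan over genes is sufficient for a few hundred genes; an interval index would be preferable for much larger gene sets.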

#### 2.2. Predictions of Hybrid Performances

The phenotypes were initially adjusted using a spatial model that included the line and column numbers in the field, the repetition when necessary, and the genotype status (check variety or hybrid) as fixed factors; a random independent effect modeling the genotypic value of observed individuals completed the model, as described in Bonnafous et al. (2017).

Predictions of hybrid performance were computed based on BLUP using several linear mixed models within each environment. Variance components of linear models were estimated using restricted maximum likelihood (REML) with the ASReml-R package (Butler et al., 2007). The models are similar to the progeny models described in Bouvet et al. (2016).

#### 2.2.1. GCA-Based Prediction

The hybrid genetic value of the fm hybrid was predicted using $\widehat{\text{GCA}}_f + \widehat{\text{GCA}}_m$, where $f$ denotes the female line and $m$ the male line. GCA BLUPs were obtained using the following model:

$$y_{fm} = \mu + \text{GCA}_f + \text{GCA}_m + \epsilon_{fm} \quad \text{(GCA model)} \tag{1}$$

where $y_{fm}$ is the adjusted phenotype in an environment, $\mu$ is the mean, $\text{GCA}_f$ and $\text{GCA}_m$ are the random GCA effects of female $f$ and male $m$, respectively, and $\epsilon_{fm}$ denotes error. All random effects are assumed Gaussian and independent, with $\sigma^2_{\text{GCA}_f}$, $\sigma^2_{\text{GCA}_m}$, and $\sigma^2_{\epsilon}$ for the GCA female, GCA male, and residual variances, respectively. When the parent pedigree is known, the relatedness of parents can be included in the variances of the GCA random effects using a coancestry coefficient matrix in this model. However, the pedigree of the parental lines was considered too uncertain to be used in this analysis. Moreover, the parents of the factorial design were chosen to be as unrelated as possible to provide a good representation of the core collection studied in Cadic et al. (2013). Therefore, these parental lines are assumed independent.
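To make the GCA-based prediction concrete, the sketch below solves Henderson's mixed model equations for the GCA model with independent parental effects. It is an illustration only: the study estimated variance components by REML with ASReml-R, whereas here the three variances are assumed known and passed as arguments, and the function names are hypothetical.

```python
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gca_blup(records: List[Tuple[int, int, float]], n_f: int, n_m: int,
             var_f: float, var_m: float, var_e: float):
    """Return (mu, female GCA BLUPs, male GCA BLUPs).

    records holds (female index, male index, adjusted phenotype) for the
    observed hybrids; the variance components are assumed known.
    """
    size = 1 + n_f + n_m
    lhs = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for f, m, y in records:
        active = [0, 1 + f, 1 + n_f + m]  # mean, female, and male columns
        for i in active:
            rhs[i] += y
            for j in active:
                lhs[i][j] += 1.0
    # Shrinkage terms: residual variance over each GCA variance.
    for f in range(n_f):
        lhs[1 + f][1 + f] += var_e / var_f
    for m in range(n_m):
        lhs[1 + n_f + m][1 + n_f + m] += var_e / var_m
    sol = solve(lhs, rhs)
    return sol[0], sol[1:1 + n_f], sol[1 + n_f:]


# Toy 2 x 2 factorial in which only the females differ.
toy_records = [(0, 0, 2.0), (0, 1, 2.0), (1, 0, 0.0), (1, 1, 0.0)]
mu_hat, gca_f, gca_m = gca_blup(toy_records, 2, 2, var_f=1.0, var_m=1.0, var_e=2.0)
# Predicted genetic value of a hybrid between female 0 and male 0.
prediction = mu_hat + gca_f[0] + gca_m[0]
```

In this toy case the raw female deviations of ±1 are shrunk to ±0.5 because each female has two observations and the variance ratio is 2, which shows how the BLUP pulls poorly informed parents toward the mean.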

#### 2.2.2. FM and FMI Model Predictions

The hybrid genetic value of the fm hybrid was predicted using the BLUPs $\hat{F}_f + \hat{M}_m$ in the FM model and $\hat{F}_f + \hat{M}_m + \hat{I}_{fm}$ in the FMI model.

$$y_{fm} = \mu + F_f + M_m + \epsilon_{fm} \quad \text{(FM model)} \tag{2}$$

$$y_{fm} = \mu + F_f + M_m + I_{fm} + \epsilon_{fm} \quad \text{(FMI model)} \tag{3}$$

where $F_f$, $M_m$, and $I_{fm}$ are the random effects of the female $f$ and male $m$ lines and their interaction, respectively, and $\epsilon_{fm}$ denotes error. Let $\mathbf{F}$, $\mathbf{M}$, $\mathbf{I}$, and $\boldsymbol{\epsilon}$ denote the vectors of female, male, interaction, and residual effects, respectively. We assume $\mathbf{F} \sim N(0, \sigma^2_f \mathbf{K}_f)$, $\mathbf{M} \sim N(0, \sigma^2_m \mathbf{K}_m)$, $\mathbf{I} \sim N(0, \sigma^2_{fm} \mathbf{K}_{fm})$, and $\boldsymbol{\epsilon} \sim N(0, \sigma^2_{\epsilon} \mathbf{Id})$, where $\mathbf{K}_f$ is the kinship matrix for females; $\mathbf{K}_m$ is the kinship matrix for males; $\mathbf{K}_{fm}$ is the kinship matrix for the interaction between males and females; and $\sigma^2_f$, $\sigma^2_m$, $\sigma^2_{fm}$, and $\sigma^2_{\epsilon}$ are the female, male, female-by-male interaction, and residual variances, respectively.

$\mathbf{K}_f = \mathbf{X}_f \mathbf{X}'_f$, $\mathbf{K}_m = \mathbf{Z}_m \mathbf{Z}'_m$, and $\mathbf{K}_{fm} = \mathbf{W}_{fm} \mathbf{W}'_{fm}$, with $\mathbf{W}_{fm}$ as the Hadamard product between $\mathbf{X}_f$ and $\mathbf{Z}_m$, where $\mathbf{X}_f$ is the vector of $x^l_f$, the centered (0 or 1) allele transmitted by female $f$ at the $l$th marker locus, and $\mathbf{Z}_m$ is the vector of $z^l_m$, the centered (0 or 1) allele transmitted by male $m$ at the $l$th marker locus.
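The three kinship matrices can be sketched in a few lines. This is a hedged illustration of the definitions above rather than the authors' code: centering by the within-panel allele frequency and the absence of any scaling are assumptions, since the text does not specify them, and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def center_columns(alleles: List[List[int]]) -> Matrix:
    """Center each marker column by its mean within the panel (assumed centering)."""
    n = len(alleles)
    means = [sum(col) / n for col in zip(*alleles)]
    return [[a - mu for a, mu in zip(row, means)] for row in alleles]


def linear_kernel(x: Matrix) -> Matrix:
    """Return x x', the matrix of inner products between rows of x."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in x] for r1 in x]


def interaction_rows(xf: Matrix, zm: Matrix, crosses: List[Tuple[int, int]]) -> Matrix:
    """Element-wise (Hadamard) product of the female and male rows of each hybrid."""
    return [[a * b for a, b in zip(xf[f], zm[m])] for f, m in crosses]


# Toy data: two females and two males scored at two markers.
xf = center_columns([[0, 1], [1, 1]])
zm = center_columns([[1, 0], [0, 0]])
k_f = linear_kernel(xf)
k_m = linear_kernel(zm)
# Interaction kinship between the hybrids (female 0 x male 0) and (female 1 x male 1).
k_fm = linear_kernel(interaction_rows(xf, zm, [(0, 0), (1, 1)]))
```

Note that `k_f` and `k_m` are indexed by parents, whereas `k_fm` is indexed by hybrids, because each row of the interaction design corresponds to one cross.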

Note that the GCA model and the FM model differ only in the assumptions made on the variance-covariance of the random parental effects. The variance-covariance matrix of these parental effects is proportional to the identity matrix in the GCA model, whereas it is computed using markers in the FM model. Both models predict the parental GCA, and the hybrid prediction is the sum of the predicted parental GCAs.

We fitted two FM models: (i) in one model, the parental design matrices ($\mathbf{X}_f$ and $\mathbf{Z}_m$) were computed from all 468,194 genome SNPs; (ii) in the other model, these matrices included only a pre-selected set of SNPs in genes previously shown to be involved in the oil content metabolic network.

#### 2.2.3. Multi-Kernel Model Predictions

MultiBLUP was proposed by Speed and Balding (2014) for additive trait modeling. This model was further extended to more complex trait architectures, such as epistasis, and was included in the general and highly flexible framework of the multi-kernel model (Weissbrod et al., 2016). In the simplest linear additive form of Speed and Balding (2014), these models comprise several additive random effects, each with its own kinship (linear kernel) and variance. These models can easily be generalized to FM or FMI models by modeling several groups of parental random factors, each group having its own kinship and variance. In the FM model, for example, the hybrid genetic value is then predicted as the sum of the BLUP values for the female and male effects in the different groups.

We fitted two multi-kernel BLUP models using female and male SNP allelic effects. The first is the generalization of MultiBLUP to the FM model using two SNP groups, with the SNPs in or close to genes involved in the oil content metabolic network in one group, and all remaining SNPs in the other group.
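For concreteness, the two-group generalization of the FM model described above can be written as follows. The superscript $(k)$ indexing the SNP group (oil-related SNPs or all remaining SNPs) and the variance subscripts are notation introduced here for illustration and do not appear in the original text.

$$y_{fm} = \mu + \sum_{k \in \{\text{oil},\, \text{rest}\}} \left( F^{(k)}_f + M^{(k)}_m \right) + \epsilon_{fm}, \qquad \mathbf{F}^{(k)} \sim N\left(0, \sigma^2_{f,k} \mathbf{K}^{(k)}_f\right), \quad \mathbf{M}^{(k)} \sim N\left(0, \sigma^2_{m,k} \mathbf{K}^{(k)}_m\right)$$

Under this notation, the predicted hybrid genetic value is $\sum_{k} \left( \hat{F}^{(k)}_f + \hat{M}^{(k)}_m \right)$, with each kinship $\mathbf{K}^{(k)}$ computed only from the SNPs of group $k$.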

The other multi-kernel model adds to the female and male kinships two parental epistasis kinships computed using the Hadamard products $\mathbf{K}_f \ast \mathbf{K}_f$ and $\mathbf{K}_m \ast \mathbf{K}_m$ for the female × female and male × male epistasis kinships, respectively. This model is a generalization to the FM model of the additive × additive epistasis modeling proposed by Su et al. (2012). Jiang and Reif (2015) demonstrated that the epistasis modeling of Su et al. (2012) explicitly models all pairwise additive × additive SNP interactions, and it is similar to the model of Bouvet et al. (2016).
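The epistasis kinship is simply the element-wise square of the additive kinship. The helper below is a minimal sketch with a hypothetical name and a toy kinship matrix that is not taken from the study.

```python
from typing import List

Matrix = List[List[float]]


def hadamard(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise (Hadamard) product of two matrices of the same shape."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Toy additive kinship between two female lines (illustrative values only).
k_f = [[1.0, 0.5], [0.5, 1.0]]
# Female x female additive-by-additive epistasis kinship.
k_ff = hadamard(k_f, k_f)
```

Because off-diagonal kinships below 1 shrink when squared, the epistasis kernel treats moderately related lines as less similar than the additive kernel does.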

#### 2.2.4. Predictive Ability of Hybrid Performances

Predictive ability, or phenotypic accuracy of predictions, was based on Pearson's correlation between the observed phenotypes and their predicted values for hybrids that were not used to train the models, the so-called test individuals or out-of-population hybrids. This accuracy was computed as the mean over 100 test sets. We used two sampling schemes: a random draw of 10% of the hybrids, or a random draw of 10% of the parent lines for which all observed descendants were included in the test set. The latter sampling generates test sets comprising only T1 or T0 hybrids, consistent with Technow et al. (2014), i.e., out-of-population samples with parents never observed through hybrid progeny. As all parents in the SUNRISE incomplete factorial design had a nearly equal number of descendants, this sampling scheme generated approximately 10% of hybrids.
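The parent-based sampling scheme and the accuracy measure can be sketched as follows. The function names, the data layout (hybrids as pairs of parent names), and the rounding rule for the number of held-out parents are assumptions for illustration; the T2/T1/T0 labels follow the usual convention of counting how many parents of a test hybrid have progeny in the training set.

```python
import math
import random
from typing import List, Set, Tuple

Hybrid = Tuple[str, str]  # (female name, male name), assumed layout


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's correlation between observed and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_by_parents(hybrids: List[Hybrid], parents: Set[str],
                     fraction: float, rng: random.Random):
    """Hold out a fraction of the parents together with all their hybrids."""
    n_held = max(1, round(fraction * len(parents)))
    held = set(rng.sample(sorted(parents), n_held))
    test = [h for h in hybrids if h[0] in held or h[1] in held]
    train = [h for h in hybrids if h[0] not in held and h[1] not in held]
    return train, test, held


def hybrid_class(hybrid: Hybrid, train: List[Hybrid]) -> str:
    """T2, T1, or T0: number of parents with progeny in the training set."""
    females = {f for f, _ in train}
    males = {m for _, m in train}
    return "T%d" % ((hybrid[0] in females) + (hybrid[1] in males))


# Toy 3 x 3 complete factorial with one parent held out.
toy_hybrids = [(f, m) for f in ("F1", "F2", "F3") for m in ("M1", "M2", "M3")]
toy_parents = {"F1", "F2", "F3", "M1", "M2", "M3"}
train, test, held = split_by_parents(toy_hybrids, toy_parents, 0.2, random.Random(0))
classes = [hybrid_class(h, train) for h in test]
```

In practice the split would be repeated (100 times in the study), the model refitted on each training set, and `pearson` applied to the observed and predicted values of the test hybrids before averaging.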

#### 3. RESULTS

**Figure 1** presents the SUNRISE incomplete factorial design with females and males arranged according to a hierarchical clustering based on VanRaden's kinship matrices (VanRaden, 2008). As part of the sunflower elite collection studied by Cadic et al. (2013), male parents are restorers of the CMS PET1 cytoplasmic male sterility [R-lines], for which female parents are maintainers [B-lines]. Male parents seem slightly more structured and related than female parents, consistent with the findings of Cadic et al. (2013), who distinguished two main subgroups in the B-germplasm among the core collection. The factorial design was completely connected and almost balanced, as the parents were involved in a nearly equal number of crosses. Among the 492 hybrids generated, 486 were observed in the MET at least once for the oil content phenotype; thus, for this trait, parents were observed through a minimum of 12 and a maximum of 15 descendants (the numbers of observed hybrids per parent in the MET are detailed in Supplementary Material, Tables S1, S2).

The adjusted oil content phenotype of hybrids varied from 31.7 to 59.0% in the MET (see histograms in Supplementary Material, Figure S1). Adjusted hybrid phenotypes were positively and significantly correlated between environments (**Figure 2**), with a minimum correlation of 0.47, a maximum of 0.77, and an average of 0.64. Within-year correlations were slightly higher than between-year correlations, and the environments observed in 2014 (14EX04, 14RV01) were less correlated with the environments of the other two years (13EX01, 13EX03, 13EX04, and 13EX05 sown in 2013, and 15EX05, 15EX06, and 15EX07 sown in 2015).

Using the three principal prediction models (GCA, FM, and FMI with all SNPs), we compared the REML variance components and their proportions of variance (**Table 1**). The female and male proportions of variance were stable across the MET, despite visible differences in variance component values, particularly in environment 15EX07, located in Romania, which had a wider inter-row spacing (0.7 m) than the other environments (0.5 to 0.6 m). A decrease in the female proportion of variance and an increase in the male proportion of variance were observed in all environments when the correction based on the genomic relatedness of parents was applied in the FM and FMI models. A significant female × male interaction (z-ratio equal to 2.71) was observed in a single environment (13EX05), and when it was included in the model, the residual variance was halved. In all environments and for all models, the female proportion of genetic variance was greater than the male proportion, with a ratio of roughly 3/2 in favor of the female parent in the inheritance of hybrid genetic value for oil content.

The three models described above were compared for their ability to predict unobserved hybrid genetic values on the same test sets (**Table 2**). Two sampling processes were tested to estimate the reliability of GS either to complete the factorial design by predicting missing hybrids or to predict hybrids for which one or both parents were never observed through a descendant in the factorial design (the so-called T0 and T1 hybrids; Technow et al., 2014). The predictive ability of GS is high for oil content in the MET (0.783 on average for the FM model) when the goal is to predict missing hybrids. The three models were nearly equally accurate, with only a 0.2% increase between the GCA model (the worst) and the FM model (the best). The FMI model performed slightly better than the FM model in two environments (13EX05 and 14RV01), and these two environments had the greatest estimates of female × male interaction variance. The predictive ability is lower when the goal is to predict T0 or T1 hybrids, with an average of 0.635 for the best performer (the FM model). Once again, the GCA model was the least accurate (0.575 on average), showing a decrease of approximately 10% in predictive ability compared to the FM model. The ranking between the FM and FMI models was similar to that obtained with the previous sampling scheme.

Having observed that all methods are equally accurate in predicting the missing hybrids of the factorial design, we focused on the prediction of T0 and T1 hybrids. Moreover, we made predictions without considering the female × male interaction, as this interaction did not improve the accuracy and was CPU-time consuming. We attempted to improve the FM model by considering the genes involved in the oil metabolic pathway. Three hundred and seventy-two genes located throughout all chromosomes, with 3,746 non-redundant SNPs inside them or within 1,000 bp upstream or downstream, were considered (see details in Supplementary Material, Table S3). Our first attempt was to compute the female and male kinships involved in the FM model by considering only the 3,746 pre-selected SNPs. We named this model the FM\_oil model. Boxplots of GCA, FM, and FM\_oil prediction accuracies for 100 random test sets of T0 and T1 hybrids are presented in **Figure 3**. FM\_oil predictions were more accurate than GCA predictions, but the FM model remained the best: it had the highest mean accuracy and also showed less variability in test set accuracies (**Table 2**).

By limiting the computation of parent relatedness to pre-selected oil SNPs, the FM\_oil model is simplified and assumes that all important causal genes explaining oil content variability are already included in the considered metabolic pathway. To avoid this over-simplified assumption, we fitted a multi-kernel model with two kinships for each parental effect, generated using pre-selected oil SNPs for one group and all remaining SNPs for the other group. This model assumes a different variance for each group of SNPs and each parental effect, leading to a more flexible model. With an average predictive ability in the MET of 0.628, this multi-kernel model slightly improved on the average predictive ability of 0.612 for the FM\_oil model but did not reach that of 0.635 for the FM model (**Table 3**). The FM model assumes that no interaction occurs between SNPs, thereby neglecting epistasis. We therefore fitted a multi-kernel BLUP model considering both the female × female and the male × male parts of the epistasis, as a generalization of the additive × additive epistasis modeling proposed by Su et al. (2012). With an average predictive ability of 0.623, this model did not improve on the FM BLUPs (**Table 3**).

TABLE 1 | Number of observed hybrids (n.obs), mean oil content (in %), variance components and parts of variance (in %) [female, male, interaction female × male (inter.) and residual (resi.)] estimated using REML in GCA, FM, and FMI models, per environment (Env.).

TABLE 2 | Predictive ability of hybrid performances per environment (Env.) and average on the MET with GCA, FM, and FMI model BLUPs as the mean over the same 100 test sets (TS) using two sampling processes.

T1 and T0 hybrids are hybrids for which one or both parents have no observed descendant in the training set.

Having access to genetic value predictions for all hybrids in each environment of the MET with a high level of accuracy (0.783 on average for the FM BLUPs) facilitates selection of the best hybrids on average and affords an opportunity to examine their stability across environments. The distribution of hybrid mean predicted performance in the MET is shown in **Figure 4**. The least productive hybrid was predicted with a mean performance of 38.8%, and the most productive hybrid was predicted with a mean performance of 48.8% seed oil content. Approximately 10% of hybrids had a predicted mean performance greater than 47%. To examine the stability of hybrids across environments, we computed Wricke's ecovalence stability index (Wricke, 1962) using the hybrid predicted performances. This stability index measures how the hybrid predicted performances vary from one environment to another. **Figure 5** is a heat map of the mean predicted performance of hybrids, in which hybrids with a Wricke's ecovalence stability index (Wricke, 1962) of less than 5 are highlighted as blank squares. Hybrids predicted to produce a high oil content on average are generally not stable; only a single hybrid in the top right corner of the heat map is predicted as stable, with a predicted mean performance of 48.3% and a Wricke's ecovalence of 4.83.
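Wricke's ecovalence of a hybrid is the sum, over environments, of its squared genotype-by-environment interaction deviations. The sketch below assumes a complete table of predicted performances (every hybrid predicted in every environment); the function name and the toy values are illustrative and are not data from the study.

```python
from typing import List


def wricke_ecovalence(table: List[List[float]]) -> List[float]:
    """Ecovalence of each hybrid.

    table[i][j] is the predicted performance of hybrid i in environment j.
    W_i = sum_j (y_ij - mean_i - mean_j + grand_mean) ** 2
    """
    n_h, n_e = len(table), len(table[0])
    row_means = [sum(row) / n_e for row in table]
    col_means = [sum(table[i][j] for i in range(n_h)) / n_h for j in range(n_e)]
    grand = sum(row_means) / n_h
    return [
        sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
            for j in range(n_e))
        for i in range(n_h)
    ]


# Purely additive toy table: hybrids keep the same ranking and gaps everywhere.
stable = wricke_ecovalence([[40.0, 42.0], [44.0, 46.0]])
# Crossover interaction: the two hybrids swap ranks between environments.
unstable = wricke_ecovalence([[40.0, 44.0], [44.0, 40.0]])
```

A value of zero means that a hybrid follows the average environmental trend exactly, so smaller values indicate more stable hybrids, in line with the threshold of 5 used for the heat map.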

#### 4. DISCUSSION

As a starting point to evaluate the benefits of GS, in the present study, we compared the accuracy of hybrid performance predictions for seed oil content, a highly heritable breeding trait in sunflower. The simplest GCA-based model was compared with different genomic multi-kernel linear mixed models. We showed that the GCA-based model, ignoring parental pedigrees, is globally as accurate as more complex models to predict the oil content of unobserved sunflower hybrids in an incomplete factorial design where 36 maintainer lines (CMS form) were crossed with 36 restorer lines. This result reflects three main factors: (i) the accurate knowledge of the parental GCAs estimated in each environment from an average of at least 7 hybrid combinations, (ii) the strong additive effect of oil content in the MET, and (iii) the genetic distance between parents selected as unrelated to provide a good representation of the core collection studied in Cadic et al. (2013). However, there is an advantage to GS prediction (10% increase in accuracy) for hybrids of untested parents.

Hybrids from untested parents are more distant from the observed hybrids than random missing combinations in the incomplete factorial design. Indeed, Hayes et al. (2009) and Clark et al. (2012) indicated that it is more challenging to predict the values of unrelated genotypes and suggested that, in such situations, genomic predictions are more accurate than classical pedigree predictions.

GCA-based or GS predictions of missing hybrid performances are accurate in the MET (predictive ability of 0.78 on average), but much less accurate than those of Reif et al. (2013) (predictive ability of 0.97 by leave-one-out hybrid cross-validation). These two values are, in fact, not comparable, as Reif et al. (2013) predicted the hybrid mean performances across the MET, whereas we predicted within-environment hybrid performances. It is simpler to predict the mean performance than the within-environment performance, as the latter depends on the genotype-by-environment interaction and is therefore more variable and less heritable. The less heritable a trait, the lower the GS predictive ability. However, within-environment predictions are essential to assess hybrid stability. Heffner et al. (2009) highlighted that GS is an important tool to address the challenge of genotype-by-environment interaction. Moreover, the less heritable the trait, the greater the prediction improvement expected from GS.

TABLE 3 | Predictive ability of hybrid performances (mean over 100 test sets and its variance) per environment (Env.) and on average in the MET with GCA, FM, FM\_oil, mk\_oil (multi-kernel FM model with two groups of SNPs) and mk\_epi (multi-kernel model with female, male, female × female epistasis and male × male epistasis kernels) model BLUPs.

The model BLUPs were computed on the same 100 test sets. The test sets contained only T1 or T0 hybrids with parents never observed through their descendants.

FM and FMI models differ by an interaction term that models parental allelic interaction or dominance. These models generally showed similar levels of accuracy in predicting untested hybrids or hybrids between untested parents. When their accuracies differed in one environment, a significant variance of the parental allelic interaction was observed, suggesting that only factors with sufficient variability could increase the accuracy of models including dominance compared to additive models. Moreover, a systematic small decrease of the FMI model accuracy compared to the FM model was observed when the variance component of this interaction was estimated as zero. The benefit of including non-additive effects in an additive GS model is still subject to debate. Using simulations, Toro and Varona (2010) observed that inclusion of the dominance effect never decreased genetic gain in first-generation selection in animal breeding programs, whatever the ratio between the additive and non-additive parts. Similarly, in a pig population, although Heidaritabar et al. (2016) showed no impact of dominance modeling on GS model accuracy for traits with a small ratio between additive and dominance effects, these authors did not observe any drawback. In contrast to these studies, the results of the present study are consistent with those of Reif et al. (2013), who observed small decreases in accuracy when dominance effects were included, depending on the traits and on the intra- or inter-[B/R] group crosses. Assessing the significance of this decrease is important, but the lack of independence between the sampled test sets made it impossible to obtain a correct estimate of the variance of the mean accuracy, which is necessary to build a test of significance. Neither the division by the square root of the number of sampled test sets nor the bootstrapped variance is correct with dependent results. Both methods yield too small a variance of the mean accuracy and can thus indicate significance where there is none. Altogether, it might be assumed that the narrow-sense heritability of the trait plays an important role regarding the introduction of dominance effects in prediction models. As seed oil content is highly heritable (in both the narrow and the broad sense), it is difficult to draw a general conclusion. However, the high predictability of either GCA or FM models can explain why, without dense molecular scans and GS models, breeders rapidly succeeded in transforming sunflower into a highly valuable oil crop in the first half of the 20th century.
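The effect of dependence between test sets on the variance of the mean accuracy can be made explicit with a short derivation. It assumes, for illustration only, that the $n$ test-set accuracies $r_1, \dots, r_n$ share a common variance $\sigma^2$ and an average pairwise correlation $\bar{\rho}$; neither quantity is estimated in the study.

$$\operatorname{Var}(\bar{r}) = \frac{1}{n^2} \left[ \sum_{i=1}^{n} \operatorname{Var}(r_i) + \sum_{i \neq j} \operatorname{Cov}(r_i, r_j) \right] = \frac{\sigma^2}{n} \left[ 1 + (n - 1)\bar{\rho} \right]$$

With independent test sets ($\bar{\rho} = 0$), this reduces to the usual $\sigma^2 / n$. When test sets are drawn from the same hybrids, so that $\bar{\rho} > 0$ is plausible, the naive estimate $\sigma^2 / n$ understates the true variance by the factor $1 + (n - 1)\bar{\rho}$, which grows with the number of test sets.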

The use of biological information to enhance the accuracy of GS predictions was studied using simulations by Pérez-Enciso et al. (2015). These authors showed that imprecision in QTL locations and non-exhaustive knowledge of all causal QTLs result in a rapid decline of the nearly perfect accuracy obtained when all causal QTLs are perfectly known. However, even with imperfect knowledge of 50% of genes, including causal QTLs, these authors showed a better accuracy compared to GS predictions with all SNPs. This encouraging result shows the interest of including functional knowledge in GS models. We tested the incorporation of biological knowledge on the oil metabolic network, but we did not observe any improvement over the FM model predictions despite an improvement over the GCA model predictions. This finding is not surprising and is consistent with the results of Spindel et al. (2016), who did not observe any improvement in accuracy with the inclusion of historical GWAS results. Nevertheless, GS predictions using previously known genes involved in the oil content metabolic network were better than the GCA model predictions (7% increase on average for hybrids of untested parents), with far lower genotyping requirements than GBLUP predictions. Considering the phenotyping and genotyping efforts in breeding, this finding is an important practical result, showing that with SNPs in a limited number of genes involved in oil metabolism, we can accurately predict unknown hybrids without the need to either phenotype both parents or genotype them genome-wide. Accordingly, the prediction of traits of interest can be made accessible for large panels by focusing on genes implicated in the trait using functional genomics knowledge and bioinformatics pipelines.

#### 5. CONCLUSION

This study was conducted to compare the performance of the classical prediction of hybrids based on the general combining ability (GCA) of their parents with current genomic predictions using whole-genome sequencing. An incomplete factorial design of 36 maintainer lines (CMS form) crossed with 36 restorer lines, created during the course of the SUNRISE project, was used to estimate and compare the accuracies of several hybrid predictions of seed oil content.

We showed that in such a design, classical GCA and GS predictions of hybrid performance had equal accuracy, as the GCA of each parent is well estimated for oil content, a highly heritable and mostly additive trait. However, predictions of the performance of hybrids with at least one untested parent are more accurate using GS models, showing that GS can accelerate genetic gain by enabling better selection in hybrid panels derived from poorly known parent lines.

#### AUTHOR CONTRIBUTIONS

BM and NL designed the study. BM performed the data analyses. FB and PP participated in the data analyses. NB, MB, GM, and EB provided genetic resources and phenotypic data. SC, LL, SM, and NP provided genomic data. LC provided metabolic data. PV designed the hybrid factorial design. BM, FB, LC, FV, PV, and NL drafted the manuscript. All authors have read and approved the manuscript.



#### ACKNOWLEDGMENTS

This work was part of the SUNRISE project of the French National Research Agency (ANR-11-BTBR-0005, 2012-2019). We thank our partners: Biogemma, Caussades semences, Maisadour semences, RAGT 2n, Soltis, Syngenta, and Terres Inovia for providing experimental data. The authors thank Thierry André who suggested the comparison between GCA- and GS-based predictions.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017.01633/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Mangin, Bonnafous, Blanchet, Boniface, Bret-Mestries, Carrère, Cottret, Legrand, Marage, Pegot-Espagnet, Munos, Pouilly, Vear, Vincourt and Langlade. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Identification and Functional Annotation of Genes Differentially Expressed in the Reproductive Tissues of the Olive Tree (Olea europaea L.) through the Generation of Subtractive Libraries

#### Edited by:

Mariela Torres, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

#### Reviewed by:

Luciana Baldoni, Istituto di Bioscienze e Biorisorse (IBBR), Italy Rosario Muleo, Università degli Studi della Tuscia, Italy

\*Correspondence:

Juan D. Alche juandedios.alche@eez.csic.es

#### † Present Address:

José A. Traverso, Department of Cell Biology, University of Granada, Granada, Spain Simon J. Hiscock, Oxford Botanic Garden and Harcourt Arboretum, Department of Plant Sciences, University of Oxford, Oxford, United Kingdom

> ‡ These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Crop Science and Horticulture, a section of the journal Frontiers in Plant Science

Received: 28 April 2017 Accepted: 28 August 2017 Published: 13 September 2017

#### Citation:

Zafra A, Carmona R, Traverso JA, Hancock JT, Goldman MHS, Claros MG, Hiscock SJ and Alche JD (2017) Identification and Functional Annotation of Genes Differentially Expressed in the Reproductive Tissues of the Olive Tree (Olea europaea L.) through the Generation of Subtractive Libraries. Front. Plant Sci. 8:1576. doi: 10.3389/fpls.2017.01576

Adoración Zafra<sup>1‡</sup>, Rosario Carmona<sup>1‡</sup>, José A. Traverso<sup>1†</sup>, John T. Hancock<sup>2</sup>, Maria H. S. Goldman<sup>3</sup>, M. Gonzalo Claros<sup>4</sup>, Simon J. Hiscock<sup>5†</sup> and Juan D. Alche<sup>1</sup>\*

<sup>1</sup> Plant Reproductive Biology Laboratory, Department of Biochemistry, Cellular and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Granada, Spain, <sup>2</sup> Faculty of Health and Life Sciences, University of the West of England, Bristol, United Kingdom, <sup>3</sup> Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil, <sup>4</sup> Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Málaga, Spain, <sup>5</sup> School of Biological Sciences, University of Bristol, Bristol, United Kingdom

The olive tree is a crop of high socio-economic importance in the Mediterranean area. Sexual reproduction in this plant is an essential process that determines the yield. Successful fertilization is favored by, and sometimes requires, the presence of pollen grains from a different cultivar, as the olive possesses a self-incompatibility system, allegedly of the sporophytic type. The purpose of the present study was to identify key gene products involved in the function of olive pollen and pistil, in order to help elucidate the events and signaling processes that take place during the courtship, pollen grain germination, and fertilization in olive. The use of subtractive SSH libraries constructed using, on the one hand, one specific stage of pistil development with germinating pollen grains and, on the other hand, mature pollen grains may help to reveal the specific transcripts involved in these events. Such libraries have also been created by subtracting vegetative mRNAs (from leaves), in order to identify reproductive sequences only. A variety of transcripts have been identified in the mature pollen grains and in the pistil at the receptive stage. Among them, those related to defense, transport, and oxidative metabolism are highlighted, mainly in the pistil libraries, where transcripts related to stress and to the response to biotic and abiotic stimuli have a prominent position. Extensive lists containing information regarding the specific transcripts determined for each stage and tissue are provided, as well as functional classifications of these gene products. These lists were compared with two recent datasets obtained in olive through transcriptomic and genomic approaches. The sequences and the differential expression levels of the SSH transcripts identified here closely matched the transcriptomic information. Moreover, the unique presence of a representative number of these transcripts has been validated by means of qPCR approaches. The construction of SSH libraries using pistil and pollen, considering the high interaction between male-female counterparts, allowed the identification of transcripts with important roles in stigma physiology. The functions of many of the transcripts obtained are intimately related, and most of them are of pivotal importance in defense, pollen-stigma interaction and signaling.

Keywords: gynoecium, leaf, olive, pollen, self-incompatibility, SSH, transcripts

#### INTRODUCTION

The olive (Olea europaea L.) is an important crop in Mediterranean countries. The fruit is used for the production of olive oil. Olive oil yield, organoleptic properties, quality, fatty acid content and many other parameters are highly dependent on the procedures used for olive oil production, including which olive cultivars are used. Asexual propagation of this tree, achieved by different methods (Böhm, 2013), has been the usual practice since its domestication. This practice results in very high heteroplasmy, as assessed by the accumulation of mutations in a non-coding sequence of the mitochondrial genome when vegetative propagation is maintained for a long period of time (García-Díaz et al., 2003). However, olive production relies on successful sexual reproduction. This plant has been suggested to harbor a self-incompatibility (SI) system of the gametophytic type (Cuevas and Polito, 1997; Ateyyeh et al., 2000; Wu et al., 2002), as described for the Oleaceae family (Igic and Kohn, 2001). However, more recent and abundant literature on the issue demonstrates that self-incompatibility in olive is sporophytic (Kusaba et al., 2001; Allen et al., 2010a; Breton et al., 2014, 2016, 2017; Farinelli et al., 2015; Saumitou-Laprade et al., 2017). The fine mechanisms governing this system are currently being deciphered, and are likely to explain the divergence of incompatibility mechanisms that can occur among members of the same family, as seen in Arabidopsis (Kusaba et al., 2001). The SI mechanism described in the olive involves the preferential presence of pollen grains from a different cultivar for successful fertilization (allogamy). The main concern of growers is the yield, which is affected by the pollinisers–pollinator relationship (Breton and Bervillé, 2012). In the case of the olive, wind is the main factor affecting the yield, as pollen dispersal in olive is mainly anemophilous.

The use of high-throughput analytic methods based on next-generation sequencing (NGS) is rapidly reaching the study of the olive tree. A number of recent studies have described the generation of several olive transcriptomes (reviewed by Muleo et al., 2016), which have been generated from different organs and adaptive responses, sometimes discriminating their varietal origin, and built preferentially by pyrosequencing/Illumina sequencing and array technologies. Thus, transcriptomes have been generated to approach flower and fruit development (Alagna et al., 2009, 2016; Galla et al., 2009; Ozgenturk et al., 2010; Muñoz-Mérida et al., 2013; Carmona et al., 2015; Iraia et al., 2016), fruit abscission (Gil-Amado and Gomez-Jimenez, 2012; Parra et al., 2013), abiotic stress responses (Bazakos et al., 2015; Guerra et al., 2015; Leyva-Pérez et al., 2015), miRNA (Donaire et al., 2011; Yanik et al., 2013), plant architecture (González-Plaza et al., 2016), and even comparative transcriptomics (Sarah et al., 2017). The genome sequence of the olive tree, corresponding to 95–99% of the estimated genome length, was recently obtained and annotated by Cruz et al. (2016). Such annotation was assisted by RNA-seq from different tissues and stages, and represents an important resource for future research on the olive tree, as well as for breeding purposes.

The present study was based on the construction of several cDNA libraries that were subtracted using the SSH method. The aim was to study the reproductive biology of the olive and particularly to obtain clues regarding pollen and stigma physiology, including the presence of differentially expressed enzymes, allergens and other relevant gene products. For that purpose we used reproductive tissues (pollen and pistil) as well as vegetative tissues (leaf as the subtractive item). For each pair combination, the forward and reverse libraries were constructed.

#### MATERIALS AND METHODS

#### Plant Material

The different tissues were obtained from adult olive trees (Olea europaea, cv. Picual) growing at the Estación Experimental del Zaidín (Granada, Spain). Pistils were excised from the complete flower at developmental stage 4 (dehiscent anthers), as defined by Zafra et al. (2010). These pistils normally include a relatively high number of mature (dehiscent) pollen grains, hydrated pollen grains, and even germinating pollen grains and pollen tubes, either over the stigma surface or through the transmitting tissues of the style or the ovary. The mature pollen grains were collected during the anthesis period using large paper bags by vigorously shaking the inflorescences. Pollen was sequentially sieved through a mesh in order to separate the grains from the debris. Young leaves were also selected. In all three cases the tissues were quickly frozen in liquid nitrogen and stored at −80 °C. Samples from three consecutive years were used for the present analysis.

#### Construction of the Suppression Subtractive Hybridization (SSH) Libraries

Total RNA was isolated using the RNeasy Plant Mini Kit (Qiagen) from samples of the different years, and the contaminating genomic DNA was removed by DNase I (Qiagen) treatment followed by a clean-up with the RNeasy MinElute Cleanup kit (Qiagen). cDNA was then synthesized from pistil, leaf, and mature pollen total RNA using the SMART PCR cDNA Synthesis kit (Clontech). The subtracted libraries were constructed with the PCR-Select cDNA Subtraction Kit (Clontech). A total of six libraries were constructed according to the manufacturer's instructions: (1) pistil subtracted with pollen [P(Po)]; (2) pollen subtracted with pistil [Po(P)]; (3) pistil subtracted with leaf [P(L)]; (4) leaf subtracted with pistil [L(P)]; (5) pollen subtracted with leaf [Po(L)]; and (6) leaf subtracted with pollen [L(Po)]. Two rounds of PCR amplification were also performed according to the manufacturer's protocol in order to enrich differentially regulated genes, using the PCR Primer 1 and the Nested PCR primers 1 and 2R provided with the kit.
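The library nomenclature used throughout can be made explicit with a short sketch. It assumes, from the way the six libraries are listed above, that the tissue outside the parentheses is the one being enriched (the tester) and the tissue inside the parentheses is the one subtracted (the driver). The function name `ssh_library_names` is hypothetical and is used for illustration only.

```python
from itertools import permutations

# Abbreviations follow the paper: P = pistil, Po = pollen, L = leaf.
TISSUES = {"P": "pistil", "Po": "pollen", "L": "leaf"}


def ssh_library_names(tissues=TISSUES):
    """Return {label: (tester, driver)} for every ordered pair of tissues.

    Assumption: in a label such as "P(Po)", the first tissue is the tester
    and the tissue in parentheses is the driver that is subtracted from it.
    """
    return {
        f"{tester}({driver})": (tissues[tester], tissues[driver])
        for tester, driver in permutations(tissues, 2)
    }


libraries = ssh_library_names()
for label, (tester, driver) in libraries.items():
    print(f"{label}: {tester} subtracted with {driver}")
```

Three tissues give six ordered tester/driver pairs, which is why each pair of tissues appears once as a forward library and once as the corresponding reverse library.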

#### Cloning and Differential Screening

The secondary PCR products were cloned into the T/A cloning vector pGEM-T Easy (Promega) according to the manufacturer's instructions and transformed into DH5α E. coli cells. The colonies containing inserts were picked and used as template for PCR. The primers used in this case were SP6 and T7. Sanger sequencing of PCR products was carried out at the Estación Experimental del Zaidín DNA Sequencing Service (CSIC, Granada, Spain), the Laboratório de Biologia Molecular de Plantas (Universidade de São Paulo, Brazil), and other commercially available facilities. For the differential screenings, a number of membrane replicates were prepared, each one containing 1 µl of the PCR product per dot; these were spotted onto nylon membranes and fixed with a brief wash in 2x SSC followed by baking at 120 °C for 30 min. The membrane replicates were probed with the forward-subtracted probe, the reverse-subtracted probe, the unsubtracted tester probe, and the unsubtracted driver probe in each case. The labeled probes were generated from the secondary PCR products described in the SSH library construction section, which were purified using the MinElute PCR Purification Kit (Qiagen). DIG-DNA labeling, determination of labeling efficiency, hybridization, and immunological detection were carried out as described in the DIG High Prime DNA Labeling and Detection Starter Kit II (Roche) instruction manual. The membranes were developed with the CSPD ready-to-use chemiluminescent substrate (Roche) and exposed in a ChemiDoc XRS system (Bio-Rad). Images were captured with a supersensitive 12-bit CCD after 30 min of exposure (Supplementary Figure 1). All hybridizations and image captures were repeated twice.

#### Sequencing and Data Analysis

Transcripts were compared (using BLASTn) against non-redundant protein databases at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov; Altschul et al., 1990) (E-value 10<sup>−4</sup>) and also against the non-redundant unique transcripts of the Olea EST database (Alagna et al., 2009). The Blast2Go software (http://www.blast2go.com/b2ghome; Conesa et al., 2005) was used to carry out the statistical analysis of GO (Gene Ontology) terms. For the analysis of the contigs obtained from the singletons, the CodonCode Aligner software was used (http://www.codoncode.com/aligner/).

The Venn diagrams were constructed using the transcripts of the 6 SSH libraries analyzed. Three groups were considered, corresponding to pollen, pistil and leaf transcripts. The VENNY software (http://bioinfogp.cnb.csic.es/tools/venny/) was used for this purpose. In order to compare the results obtained against the NCBI and Olea EST databases, two separate diagrams were constructed.
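The three-group comparison behind such a diagram can be sketched with plain set operations. This is a minimal illustration and not the VENNY implementation: the function name `venn3_regions` and the annotation labels in the example are hypothetical, and the sketch assumes each tissue is represented by a set of annotation identifiers pooled from its two libraries.

```python
def venn3_regions(pollen, pistil, leaf):
    """Split three collections of annotation identifiers into the seven
    regions of a three-set Venn diagram."""
    pollen, pistil, leaf = set(pollen), set(pistil), set(leaf)
    return {
        "pollen_only": pollen - pistil - leaf,
        "pistil_only": pistil - pollen - leaf,
        "leaf_only": leaf - pollen - pistil,
        "pollen_pistil": (pollen & pistil) - leaf,
        "pollen_leaf": (pollen & leaf) - pistil,
        "pistil_leaf": (pistil & leaf) - pollen,
        "all_three": pollen & pistil & leaf,
    }


# Hypothetical annotation labels, for illustration only (not study data).
example = venn3_regions(
    pollen={"annot_A", "annot_B", "annot_C"},
    pistil={"annot_B", "annot_C", "annot_D"},
    leaf={"annot_C", "annot_E"},
)
for region, members in example.items():
    print(region, sorted(members))
```

Running the same function once on annotations from the NCBI search and once on annotations from the Olea EST search would give the two separate diagrams mentioned above.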

To retrieve the putative Arabidopsis homologs of the olive clones obtained, the sequences from the transcripts of two selected libraries [Po(P) and P(Po)] were compared against the Arabidopsis Information Resource (TAIR) webpage (http://www.arabidopsis.org/Blast/). A BLASTn against the TAIR10 Transcripts (−introns, +UTRs) (DNA) was carried out. The matrix weight was Blosum45, the nucleic mismatch −3, gapped alignments ON. The output results were used as input data in the plant biology resource from Genevestigator (https://www.genevestigator.com/gv/plant.jsp). The anatomy tool from this webpage was used to construct the heatmap representing the level of expression of the transcripts in olive corresponding to defense, oxidative metabolism and transport. These categories were chosen according to two criteria: a high number of transcripts represented, and their implication in the reproductive process.

#### qPCR Validation of Subtractive Transcripts

Total RNA from pollen, pistil and leaf from olive cv. Picual was extracted using the RNeasy Plant Mini Kit (Qiagen) according to the manufacturer's instructions, from samples obtained over three consecutive years as described above. Two µg of total RNA were reverse transcribed using the High-Capacity cDNA Reverse Transcription kit (Applied Biosystems, Thermo Scientific). Nine independent reverse transcriptase reactions were carried out. cDNA was stored at −20 °C until use for qPCR analyses. Primers were designed for transcripts putatively specific to each tissue as determined by the differential screening. The Primer BLAST software (NCBI) was used for primer design with modifications in the default settings [PCR product size: 70–150 bp; Primer Tm: 58–62 °C; Organism: green plants; Primer size: 18–23 bp; Primer GC content: 30–80%; Hyb Oligo Size: 18–30 bp; Hyb Oligo Tm: 68–72 °C; Hyb Oligo GC content: 30–80%]. Putative amplicons were blasted against the ReprOlive database to confirm their specificity. For details of the primer sequences and expected sizes of the amplicons, see Supplementary Table 1. LightCycler FastStart DNA Master SYBR Green (Roche) was used in a LightCycler 480 Instrument II (Roche Diagnostic, Mannheim, Germany) in a 20 µl reaction volume. Samples were run in duplicate for each experiment. Expression levels of target genes were normalized relative to the expression level of two housekeeping genes (Zinc-finger and katanin p60) and their relative expression levels were calculated with the ΔΔCt method (Taylor et al., 2016). Housekeeping genes were selected based on an automatic screening of the ReprOlive database and were previously tested in pollen, pistil and leaf tissues (Carmona et al., 2016, 2017). Housekeeping sequences and amplicon sizes are detailed in Supplementary Table 1.
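The normalization against two housekeeping genes can be illustrated with a generic 2<sup>−ΔΔCt</sup> calculation. This is a minimal sketch under stated assumptions, not the authors' exact pipeline: it assumes 100% amplification efficiency, averages the Ct values of the reference genes (equivalent to a geometric mean of their quantities), and expresses the result relative to a calibrator sample chosen by the user. The function name and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_refs, ct_target_cal, ct_refs_cal):
    """Relative expression by the 2^-ddCt calculation.

    ct_target / ct_refs: Ct of the target gene and of the reference genes
    in the sample of interest.
    ct_target_cal / ct_refs_cal: the same values in the calibrator sample.
    Assumption: amplification efficiency is 100% for every primer pair.
    """
    delta_ct_sample = ct_target - mean(ct_refs)
    delta_ct_calibrator = ct_target_cal - mean(ct_refs_cal)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only (not study data).
fold = relative_expression(
    ct_target=22.0, ct_refs=[20.0, 21.0],          # e.g. pollen
    ct_target_cal=25.0, ct_refs_cal=[20.0, 21.0],  # e.g. leaf as calibrator
)
print(fold)
```

With these illustrative values the target reaches threshold three cycles earlier in the sample than in the calibrator after normalization, which corresponds to an eight-fold higher relative expression.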

#### Comparative Transcriptomics

#### Pre-processing of Subtractive Sequences

All clones sequenced were pre-processed using SeqTrimNext (Falgueras et al., 2010) as described in Carmona et al. (2015) to remove linkers, adaptors, vector fragments and contaminated sequences among others, while keeping the longest informative part of the sequence. Sequences below 100 bp were discarded.

#### Annotation and Correspondence with Olive Transcriptomes

Useful sequences were annotated using Full-LengtherNext (Seoane et al., in preparation). Additionally, the correspondence of the subtractive sequences with the transcriptomes reported in the ReprOlive database (http://reprolive.eez.csic.es; Carmona et al., 2015), as well as with the transcripts deduced from the first olive tree genome draft (89,982 transcripts) (Cruz et al., 2016), was determined. This correspondence was estimated by comparing (using BLASTn, E-value 10<sup>−6</sup>) the subtractive sequences against the mentioned transcriptomes. With regard to the ReprOlive transcriptomes, sequences from pollen libraries [Po(L) and Po(P)] were compared against the pollen transcriptome (27,823 transcripts), sequences from pistil libraries [P(Po) and P(L)] were compared against the pistil transcriptome (60,400 transcripts), and sequences from leaf libraries [L(Po) and L(P)] were compared against the vegetative transcriptome (38,919 transcripts).

#### Presence/Absence of Subtractive Sequences in Different Stages/Tissues

In order to estimate the validity of the subtractive SSH libraries constructed, an in silico approximation was performed, taking advantage of the availability of the Roche/454 reads used in ReprOlive (Carmona et al., 2015), which belong to seven different tissues and/or developmental stages: leaf, mature pollen, germinated pollen at two different times after hydration (1 and 5 h) and pistil at developmental stages 2, 3, and 4 as defined by Zafra et al. (2010). Each one of the subtractive sequences was compared by BLASTn (E-value 10<sup>−6</sup>) against the reads of each of the seven tissues/stages separately. If at least one read matched the subtractive sequence (with an identity of 90% or greater and an alignment covering 85% or more of the subtractive sequence length), the subtractive sequence was considered to be present in that particular tissue/stage (and reported as "YES" in the corresponding output information).
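The presence/absence rule described above can be sketched as a simple filter over BLAST hits. The sketch assumes that the E-value cutoff has already been applied by BLAST, that each hit is reduced to a pair (percent identity, alignment length), and that tissues without a qualifying read are labeled "NO"; that label, the function names, and the example values are assumptions made for illustration.

```python
def is_present(hits, query_length, min_identity=90.0, min_coverage=0.85):
    """Return True if at least one hit satisfies both thresholds.

    hits: iterable of (percent_identity, alignment_length) pairs for one
    subtractive sequence against the reads of one tissue/stage.
    query_length: length of the subtractive sequence in bp.
    """
    return any(
        identity >= min_identity and aln_len >= min_coverage * query_length
        for identity, aln_len in hits
    )


def presence_table(hits_by_tissue, query_length):
    """Map each tissue/stage to "YES" or "NO" for one subtractive sequence."""
    return {
        tissue: "YES" if is_present(hits, query_length) else "NO"
        for tissue, hits in hits_by_tissue.items()
    }


# Hypothetical hits for a 400 bp subtractive sequence (not study data).
table = presence_table(
    {
        "leaf": [(95.0, 200)],                        # too short an alignment
        "mature_pollen": [(92.5, 380), (88.0, 400)],  # first hit qualifies
        "pistil_stage_4": [],                         # no hits at all
    },
    query_length=400,
)
print(table)
```

For a 400 bp sequence the coverage threshold corresponds to an alignment of at least 340 bp, so a short but highly identical hit does not count as presence.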

#### Availability of Data and Materials

The datasets supporting the conclusion of this article are available in the European Nucleotide Archive (http://www.ebi.ac.uk/ena/data/view/PRJEB13716) in the fastq.gz format with the title "SSH libraries of the olive tree (Olea europaea L.) reproductive tissues." Fastq identifiers in Supplementary Tables 2–7 allow location of every single sequence in its corresponding fastq file deposited at ENA.

#### RESULTS AND DISCUSSION

Six libraries were generated by using the combination of tester/driver tissues indicated in **Table 1**. Special attention was given to the P(Po), P(L), and Po(L) libraries. **Table 1** includes information regarding the number of clones identified and sequenced from each one of the six SSH libraries generated.

The P(Po) library provided information about those transcripts that are expressed during pollen germination and pollen tube growth in comparison with the mature pollen grains, within the context of the whole pistil, as at this stage the pistil is full of germinating pollen grains. This library also offered information on the pistil transcripts. The P(L) library reveals the presence of transcripts in a tissue which is distinct from the leaf and is, in addition, a reproductive tissue. Lastly, the Po(L) library shows the transcripts of a dormant reproductive tissue (pollen) from which transcripts from vegetative tissue have been subtracted.

A total of 1,344 clones were sequenced. From those, 790 resulted in ESTs and 171 in contigs. The mean length of the contigs compared to the ESTs showed an increase of 25% for Po(P), 27% for P(Po), 15% for P(L), 22% for L(P), 8% for Po(L), and 7% for L(Po). The redundancy levels were relatively low, ranging between 2.15 and 3.17%. BLAST analysis was carried out by using two alternative databases. The percentage of BLAST hits averaged 70% when the alignment was carried out with the NCBI database (http://www.ncbi.nlm.nih.gov; Altschul et al., 1990) (e-value e<sup>−4</sup>), and averaged 95% when the alignment was made against the OLEA EST database (Alagna et al., 2009) (e-value e<sup>−1</sup>).

In order to assess the subtractive efficiency of the libraries, PCR-amplified samples of the DNA inserts of each clone were transferred to membranes and subjected to multiple hybridizations with: (a) unsubtracted tester, (b) unsubtracted driver, (c) forward-subtracted, and (d) reverse-subtracted probes. An example of the procedure is displayed in Supplementary Figure 1.

The criteria followed to define the tissue-specificity of the clones were as follows: clones hybridizing exclusively with the forward-subtracted probe were considered to be differentially expressed. Clones that hybridize to the forward-subtracted probe and the unsubtracted tester probe also correspond to differentially expressed genes, with a probability of 95%. Clones that hybridize to all four probes correspond to non-differentially expressed clones. The results from the screening (Supplementary Tables 2–7) revealed that 33.3% of the clones were differentially expressed in the Po(P) library, 47.9% in the P(Po) library, 29.1% in the P(L) library, 41.6% in the L(P) library, 73% in the Po(L) library, and 68.2% in the L(Po) library.
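The screening criteria above amount to a small decision rule over the four hybridization signals. The following sketch encodes only the three cases stated in the text; any other signal pattern is returned as "unclassified", which is an assumption of this example, as are the function names and the category labels.

```python
def classify_clone(forward_sub, reverse_sub, unsub_tester, unsub_driver):
    """Classify one clone from its four hybridization signals (True/False)."""
    if forward_sub and reverse_sub and unsub_tester and unsub_driver:
        return "not differential"
    if forward_sub and not reverse_sub and not unsub_driver:
        # Forward-subtracted probe only, or forward plus unsubtracted tester.
        return "differential (probable)" if unsub_tester else "differential"
    # Patterns not covered by the stated criteria (assumption of this sketch).
    return "unclassified"


def percent_differential(patterns):
    """Percentage of clones falling in either differential category."""
    labels = [classify_clone(*pattern) for pattern in patterns]
    if not labels:
        return 0.0
    hits = sum(label.startswith("differential") for label in labels)
    return 100.0 * hits / len(labels)


# Hypothetical signal patterns, for illustration only (not study data).
# Order: forward-subtracted, reverse-subtracted, tester, driver.
patterns = [
    (True, False, False, False),
    (True, False, True, False),
    (True, True, True, True),
    (False, False, True, True),
]
print([classify_clone(*p) for p in patterns])
print(percent_differential(patterns))
```

Applying `percent_differential` to the signal patterns of one library would give a per-library percentage of the kind reported above, provided the ambiguous patterns are handled as the screening protocol prescribes.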

The use of subtractive libraries allowed us to obtain a relatively large number of tissue-specific transcripts. The number of such putative tissue-specific transcripts was highly dependent on the database searched by means of BLAST. Thus, the number of specific reproductive sequences (from pollen and pistil) was larger after using the OLEA EST db than after using NCBI db. The opposite tendency occurred with vegetative (leaf) transcripts (**Figure 1**).

In order to validate the results obtained after the construction and screening of the SSH libraries, a total of 12 genes were selected for further assessment of their expression profiles by means of qPCR, on the basis of representing a panel of different expression situations (**Table 2**). Four transcripts (Expansin 11 precursor, Pectinesterase inhibitor 21, Pectin methyl esterase 2.1 and Pathogenesis related protein-1) were selected according to their putatively unique or prevailing presence in pollen. The qPCR assays performed indicated that these transcripts were preferentially present in pollen (**Figure 2A**). The same approach, carried out with the remaining transcripts selected on the basis of their putatively unique or predominant expression in pistil and leaf (**Table 2**), yielded the qPCR results shown in **Figures 2B,C**, respectively.

TABLE 1 | SSH libraries constructed and descriptive parameters about the clones sequenced.

P, Pistil; Po, pollen; L, leaf.

Quantitative-PCR amplifications of all 12 transcripts showed the patterns of tissue and abundance distribution predicted on the basis of their presence in the generated libraries. These 12 gene products exemplify a broad panel of functions and molecular characteristics, including allergens, cell wall modification or loosening, secondary metabolism, hydrolase and catalytic activity, photosynthesis, oxidative stress, light reception and defense against pathogens. The importance of some of these functions in the reproductive process is discussed next.

The concentration of high- and low-abundance sequences was equalized in the different libraries, which allowed us to identify low abundance transcripts but with the drawback of missing details about their real abundance. However, analysis of the distribution of Gene Ontology terms provided a first approach to define the implication of these transcripts (**Figures 3**–**5**). The two pollen subtractive libraries [Po(L) and Po(P)] (**Figure 3**) included exclusive transcripts involved in biological processes related to the categories of pollination, responses to extracellular stimulus and post-embryonic development, and a high abundance of transcripts connected with cell differentiation, cell growth, and cellular component organization, in comparison to the rest of the libraries. The large presence of transcripts involved in cellular component organization may suggest that although the mature pollen grains have not yet started to germinate, many transcripts needed for pollen tube formation are already accumulated inside the pollen grains. In the same way, transcripts connected to transport activity also occupy a relevant role in both pollen libraries [Po(P) and Po(L)]. Therefore, the presence of such "preparative" transcripts in the mature pollen grain may be a determinant for the correct development of the pollen tube once the pollen grain arrives on the stigma and starts germinating through the style.

With regard to the distribution of transcripts among the different cellular components, the presence of transcripts apparently targeted to plastids is surprising, as olive pollen grains contain only poorly-differentiated plastids, probably lacking a highly structured biochemical machinery (Rodríguez-García and García, 1978; Rodríguez-García et al., 1995). In both the Po(P) and Po(L) libraries, transcripts likely assigned to intracellular membrane-bounded organelle represent approximately one-third of the total transcripts.

Regarding the pie charts showing molecular function, considerable differences between the two pollen subtractive libraries were observed. These differences mainly result from the massive presence of Pectin Methyl Esterases (PMEs) in pollen. PMEs are enzymes present in higher plants, fungi and bacteria. They catalyze the demethylesterification of homogalacturonan residues of pectin, releasing methanol as the reaction product. Such modification is responsible for changes in the pectin molecule, which can then be cross-linked by calcium, and this further results in changes in the mechanical properties of the plant cell wall, altering its plasticity. This particularly affects the ability for growth and guidance of pollen tubes (Castro et al., 2013). Pollen specific PMEs have been described in other species (Tian et al., 2006; Gómez et al., 2013), with key roles during pollen germination (Leroux et al., 2015), during pollen tube elongation along the transmitting tract and when the pollen tube reaches the embryo sac in the ovule (Gómez et al., 2013). The olive pollen PME is considered a highly prevalent allergen present in the olive pollen (Salamanca et al., 2010).

Approximately half of the transcripts in the Po(P) library were involved in hydrolase activity. On the other hand, the Po(L) library did not show a high abundance of these transcripts. The subtraction carried out to create the Po(P) library possibly removed most of the PMEs of the pollen, allowing the identification of a wide variety of PME isoforms, as well as one pollen-specific PME. Therefore, the information provided by the pie charts for molecular functions delivers particular evidence on the processes happening within the pistil at stage 4.

In the pistil subtractive libraries P(Po) and P(L) (**Figure 4**), the presence of an exclusive transcript related to symbiosis (encompassing mutualism through parasitism) was detected. Further to this, both the pistil and the leaf tissues contain wide pools of transcripts related to stress and defense. A highlight here is the presence of transcripts linked to responses to biotic stimulus in the P(L) library as it has been proposed that certain self-incompatibility processes may have evolved from pathogen-defense mechanisms (Hodgkin et al., 1988; de Nettancourt, 1997; Elleman and Dickinson, 1999; Hiscock and Allen, 2008). Within these transcripts, we found Pathogenesis related proteins-1, 5, and 10, Beta-glucosidases and PME Inhibitors.

On the other hand, the pistil harbored numerous PME inhibitor transcripts, but they did not appear in the P(Po) library as a consequence of the subtraction carried out. This is likely due to the homology between PME and the PME inhibitors (PMEI) in the N-terminal pro-region, present in the stigmatic exudate of the olive (Rejón et al., 2013). This similarity among many plant species was previously described by Jolie et al. (2010), suggesting an inhibitory role of the PME pro-peptide region. PMEIs may have originated from a rearrangement of the plant genome, which could have been the starting point of the PME inhibitors (Giovane et al., 2004). These data are consistent with the localization of the PME inhibitors described at the pollen tube apex (Röckel et al., 2008), which were detected in the pistil library, probably as a result of the presence of pollen tubes growing through the stigma-style.

In a similar way to both pollen libraries, there was a large presence of transcripts for proteins associated to intracellular membrane bounded organelles in the P(Po) and P(L) libraries. Among these, transcripts associated to plastids scored as an important proportion, and were more numerous in the P(Po) library than in the P(L) one.

The analysis of the molecular functions of the transcripts present in pistils revealed the noticeable presence of transcripts for proteins with electron carrier activity in both cases, which did not appear in the pollen libraries. The number of transcripts for proteins with DNA-binding activity was almost six-fold higher in the P(Po) library compared to the P(L) library. Even though transcripts within the nucleotide-binding category were abundant in both libraries, they were almost three times more abundant in the P(Po) library than in the P(L) one.

DNA-binding proteins are key players in the process of expression and regulation of new proteins as such interactions are considered to be central for many basic biological processes, including transcription regulation, DNA replication and DNA repair (Bonocora and Wade, 2015). The high levels of transcripts encoding DNA-binding proteins in the pistil could be indicative of the ability to have quick responses, where finely tuned regulation is needed. On the other hand, the presence of transcripts for nucleotide binding proteins could also be related to the changes happening outside the gynoecium cells, as the plant disease resistance genes have been described to frequently encode nucleotide binding proteins (Meyers et al., 1999).

FIGURE 2 | Validation of the generation of SSH libraries by qPCR of selected transcripts. Several pairs of primers were designed for pollen- (PoP library), pistil- (PPo library) and leaf-preferentially expressed (LPo library) transcripts (A–C, respectively). Exp, Expansin 11 precursor; PEI, Pectinesterase inhibitor 21; PME 2.1, Pectin methyl esterase 2.1; PRP-1, Pathogenesis Related Protein-1; PRP-5, Pathogenesis Related Protein-5; DRP-206, Disease Resistance Response Prot-206; EsterPIR7A, Esterase PIR7A; 14-3-3 prot, 14-3-3 protein; PSII, Photosystem II 10 KDa polypeptide; MAP, Mitogen-activated protein kinase 3; FructBP, Fructose-bisphosphate aldolase; Chloro, Chlorophyll a-b binding protein. mRNA expression levels (average ± SD) of each transcript analyzed are shown in the three different tissues after normalization with the average expression of Katanin and Zinc-finger as housekeeping genes, made relative to the most expressed transcript (y-value: 1).

Regarding the leaf subtractive libraries L(Po) and L(P) (**Figure 5**), most of the transcripts corresponded, not surprisingly, to proteins involved in photosynthetic metabolism, with the detection of a large proportion of transcripts for proteins located at plastids as well as a majority of transcripts implicated in biological processes such as carbohydrate metabolism and the generation of precursor metabolites and energy. This is also consistent with the large presence of transcripts for proteins with electron carrier function. As expected, these transcripts are also present in the pistil, although less abundantly, and absent in pollen.

To sum up and as discussed above, the putative origins, biological processes, cellular localizations and molecular functions of the transcripts identified in the different subtractive libraries analyzed are in good agreement with the predicted nature of such transcripts derived from the methods and tissues used for their construction, as displayed in **Table 3**. It is necessary to take into account that pistil tissues are expected to include dehiscent (mature) olive pollen, as well as hydrated pollen grains and even germinating pollen grains and pollen tubes, as described in the methods section.

Therefore, taking into account the origins of the described transcripts, the SSH libraries which best describe the pollen-pistil interactions in olive, as well as pollen hydration and pollen tube growth, are Po(P) and P(Po). For both of these SSH libraries, the corresponding GO terms have been identified and are represented together for comparison purposes (**Figure 6**).

Within the pool of transcripts present in the pistil from which pollen has been subtracted [P(Po)], those involved in regulation, response to stress/stimulus, and signaling/cell communication are more abundantly represented in terms of relative percentage. On the other hand, the library of pollen from which pistil has been subtracted [Po(P)] is mainly rich in transcripts involved in cellular organization, localization, developmental processes, pollination and growth. Detailed lists of the transcripts detected for each pollen SSH library [Po(P) and Po(L)], together with the relevant BLAST scores for each one, are provided in Supplementary Tables 2, 6. Amongst these pollen transcripts, LAT52 was found, which is known to play a role in pollen hydration and germination (Muschietti et al., 1998; Tang et al., 2002). The presence of SF21, with a putative function in pollen-pistil interaction (Allen et al., 2010a), confirms the importance of these transcripts in reproduction. However, no transcripts of SF21 were found in the pistil [see Supplementary Tables 3, 4, corresponding to P(Po) and P(L)], despite a putative function in pollen tube guidance (Allen et al., 2010a). The Soluble NSF Attachment Protein Receptor (SNARE) proteins within the mature pollen grains are also present in the spore, with a role in the pollen tube movements (Bushart and Roux, 2007).

In the pistil (Supplementary Tables 3, 4), the response to stress is mainly represented by the Pathogenesis related proteins (PRPs); the signaling processes occurring in the pistil are emphasized by the presence of auxin responsive factors. Regulation is carried out by the interaction with the pollen specific auxin induced/repressed proteins (present in the mature pollen; they were found in both pollen libraries). The output results from the pistil libraries showed a similarity to that seen with auxin-induced root cultures (Neuteboom et al., 1999), but with an unknown function in the mature pollen grains. The auxin responsive proteins present in both pistil libraries are important for pollen tube formation (Yang et al., 2013), which indicates the presence of growing pollen tubes. However, they are not present in the mature pollen grains (Supplementary Tables 2, 6). The 14-3-3 protein is also important in pollen germination as it is involved in the regulation of turgor pressure of the pollen tube (Pertl et al., 2010).

The use of BLAST analysis of the SSH-retrieved sequences against the OLEA ESTdb, specifically containing sequences from olive mesocarp only, provided in many cases further information through the annotation of our sequences. For example, the described pistil-specific thaumatin/PRP-5 (Kuboyama, 1998; Sassa et al., 2002) is identified in the OLEA ESTdb as "Thaumatin-like protein, Pathogenesis-related protein 5," whereas the NCBIdb identifies it with the general term "Thaumatin-like protein." We were able to discriminate between two different thaumatin-like proteins: Pathogenesis-related protein 5, which was only found in the pistil, and Pathogenesis-related protein 1, which was pollen specific. Moreover, three isolated transcripts from different thaumatins were found in the pistil, the first one identified as "STS14 protein/Pathogenesis-related protein 1C," and the others as "Osmotin-like protein" (OSML13 and OSM34, respectively). The STS14 protein is proposed to be involved in the protection of the outer tissues of the pistil from pathogen attack or in the guidance of the pollen tubes through the pistil. It is highly expressed in the stigma and stylar cortex around 120 h before anthesis and increases toward the end of flower development, with a maximum at anthesis (Van Eldik et al., 1996).

As an example, three groups of transcripts were selected (defense, oxidative metabolism and transport: **Figure 7**) with the aim of further analyzing and discussing the expression of highly abundant transcripts as well as their key putative roles in the pollen-pistil interaction, pollen tube germination and growth. The genes considered putative homologs to Arabidopsis from each group were analyzed through the anatomy tool of Genevestigator. The specificity of the transcripts and the biological implications of their differential expression are described and discussed below.

# Different Transcripts Putatively Involved in Defense Are Present in the Pistil

One of the stigma-specific transcripts detected in olive was that corresponding to the Pathogenesis-related protein 5 (PRP-5). Members of this protein group have been associated with resistance to fungal infection and to responses to biotic/abiotic stresses, disease resistance or hormonal responses by inducing transcripts such as DOR, MYB, AP2, and WRKY (El-kereamy et al., 2011). A pistil-specific thaumatin/PRP-5 has been described in Japanese pear (Sassa et al., 2002) and in tobacco (Kuboyama, 1998), where the maximum levels of the transcript were reached at anthesis. This gene product has been proposed to play a role in pollen recognition and pollen tube pathways. Among the stigma-specific PRP-5 sequences obtained in olive, we observed high homology with the SE39B stigma-specific thaumatin from tobacco (Kuboyama, 1998) and also with a specific thaumatin from the fruit of Olea europaea highly expressed in response to phytophagous larvae (Corrado et al., 2012). Another olive stigma-specific transcript of interest, involved in defense, and also considered an allergen, is homologous to Mal d 1 from apple (Vanek-krebitz et al., 1995). This belongs to the PRP-10 group. These gene products have been described throughout several developmental stages and plant tissues, with a dual role associated with defense functions and regulation/signaling (Zubini et al., 2009; Choi et al., 2012). Pectin Methyl Esterases (PMEs) and their inhibitors (PMEIs) have been considered to be involved in defense function in vegetative tissues (McMillan et al., 1993; Boudart et al., 1998; Lionetti et al., 2007, 2012; Wydra and Beri, 2007; Ann et al., 2008; Körner et al., 2009; Volpi et al., 2011, 2013). Thus, the PMEs have been reported to enhance RNAi action, acting in gene-regulatory mechanisms (Dorokhov et al., 2006), which include virus-induced gene silencing (VIGS) and the fight against other pathogens (Collmer and Keen, 1986). Interestingly, it has been suggested that PMEIs might be internalized by endocytosis at the flanks of the pollen tube tip, regulating pollen-tube wall stability by locally inhibiting pollen PME activity (Röckel et al., 2008). It has also been suggested that PMEIs are able to reduce the activity of cell wall PMEs, leading to a drop in pollen tube stability (Paynel et al., 2014). A pollen-specific PMEI was described in broccoli, triggering partial male sterility and decreased seed set by inhibition of pollen tube growth (Zhang et al., 2010).



The large presence of cysteine proteinase in the pistil may be attributed to defense mechanisms, similar to those already described by Grudkowska and Zagdańska (2004). Another defense mechanism which seems to work actively in the olive pistil is the "disease resistance response protein 206," which has been described to be induced in pea in response to infection by F. solani f. sp. phaseoli (Culley et al., 1995). Within the Late Embryogenesis Abundant proteins, the pistil possesses transcripts for the dehydrin Rab 18. The Late Embryogenesis Abundant-18 transcript decreases during the germination process in pea, though it is present again in the emerging hypocotyls. Therefore, this transcript might be related to the elongation process under optimal growing conditions (Colmenero-Flores et al., 1999). In the case of the olive pistil, the presence of these transcripts could be related to the elongation process occurring in the growing pollen tubes within the stigma/style. Other transcripts found in the pistil libraries, namely beta-glucosidases, late blight resistance proteins, WRKY genes, mitogen-activated kinase proteins, and MYB genes, are also involved in defense (Pandey and Somssich, 2009; Engelhardt et al., 2012). The MYB transcription factor itself has been described to be involved in pollen development (Niwa et al., 1993; Katiyar et al., 2012). A pistil-specific nodulin has also been described in the pistil of several species (Allen et al., 2010b), being involved in successful fertilization (Shi et al., 2012).

#### Different Transcripts Putatively Involved in Defense Are Present in the Pollen Grain

Defense genes highly expressed in the olive pollen also comprise PME again, the PME inhibitor U1, and a panel of eight pathogenesis-related proteins.

PRP-1 was detected exclusively in the olive pollen subtracted libraries Po(P) and Po(L). To date, PRP-1 has only been described to be involved in food allergy (Asensio et al., 2004), as the precise function of these proteins in the pollen itself is not known. The specific expression of the heat stress transcription factor (HsfA2) was also detected. HsfA2, together with chaperones, is an important protector of pollen maturation, viability and pollen tube germination from heat damage (Frank et al., 2009; Giorno et al., 2010; Zinn et al., 2010).

# Oxidative Metabolism in the Pistil

Closely related to defense mechanisms, oxidative metabolism plays a dual role, keeping the balance between defense and signaling. In the case of the pistil, these two functions are even more finely tuned as the signaling processes are very important for successful reproduction. Therefore, it is important to highlight the presence of transcripts corresponding to Glutathione S-transferases (GSTs), Ferredoxin-1, NAD(P)H-dependent oxidoreductase, Peroxidase 72 and Quinone oxidoreductases. Most of these transcripts have not been described as pistil-specific in Arabidopsis (**Figure 7**). Among these, stigma-specific peroxidases have been previously studied in several species (McInnis et al., 2005; Swanson et al., 2005; Beltramo et al., 2012), with implications for the pollen-pistil interaction, pollination process, and signaling. The glutathione S-transferase has been classified as an allergenic protein in animal species (Yu and Huang, 2000; Huang et al., 2006). Later it was identified in birch pollen (Deifl et al., 2014). However, when compared to other birch pollen allergens such as Bet v 1, the release kinetics of Glutathione S-transferase from pollen grains upon contact with water and different physiologic solutions was much slower. It was suggested that the amount of glutathione S-transferases released during this time period was too low to induce allergic sensitization (Deifl et al., 2014).

### Oxidative Metabolism in the Pollen Grain

The presence of transcripts from Tpr repeat-containing thioredoxin ttl1-like was observed. Such gene products have been described to accumulate in response to osmotic stress and abscisic acid (ABA), and may also be involved in pollen compatibility (Haffani et al., 2004). When searching for members of the oxidoreductase family of proteins, we found transcripts for galactose oxidase, glyoxal oxidase and a specific L-ascorbate oxidase homolog (Pollen-specific protein NTP303). To our knowledge, the presence of galactose oxidase has not been connected to any particular characteristic of the plant reproductive tissues. Interestingly, the enzyme glyoxal oxidase has been described to be involved in male sterility, together with other enzymes implicated in cell wall expansion (Chen et al., 2012; Suzuki et al., 2013). The presence of L-ascorbate oxidase transcripts has been described in in vitro germinating pollen grains (Weterings et al., 1992), although we failed to find these mRNAs in the olive pistil, which also contains in vivo growing pollen tubes. It is interesting to highlight the presence of the olive pollen allergenic protein Cu, Zn Superoxide Dismutase, which is involved in the protection against oxidative stress during pollen development. Its dual role, i.e., as an allergen and as part of the antioxidant/signaling metabolism, makes its study particularly interesting (Butteroni et al., 2005). Moreover, it has been described to be implicated in the development of the male reproductive tissues of the olive tree (Zafra et al., 2012).

#### Transcripts Connected with Transport of Molecules in the Pistil

Pollen-stigma interactions and the growth of the pollen tube throughout the pistil tissues encompass a large exchange of molecules among these tissues, either positively or negatively regulating and/or permitting such growth by providing energy, ions or structural molecules. Among the pistil-preferential transcripts detected in this work, several have been attributed functions facilitating the transport of such molecules. This is the case for the Ras-related transport protein, which facilitates protein movement through membranes, and the mitochondrial import inner membrane translocase subunit Tim13. Transcripts from a member of the solute carrier family 35 (B1) are also present in the olive gynoecium. Other transporters that have also been described in primary roots (with a growth process comparable to that of pollen tubes within the style of receptive flowers) are the specific lipid-transfer protein (LTP) AKCS9 (present in membranes) and aquaporins, both specifically present within the olive pistil transcripts and with different described roles in vegetative and reproductive tissues: lipid-transfer proteins were correlated with root hair deformation and pistil abortion (Krause et al., 1994; Shi et al., 2012), whereas specific aquaporins were found in the region adjacent to the root tip and have been demonstrated to be required for the self-incompatibility process displayed by members of the family Cruciferae (Ikeda et al., 1997; Sakurai et al., 2008).

FIGURE 7 | Level of expression across several reproductive/vegetative tissues of genes considered putative homologs to Arabidopsis using the anatomy tool of Genevestigator. Three categories were considered for analysis of the levels of expression: oxidative metabolism (upper part), defense (middle part), and transport (lower part). Only the transcripts of two selected libraries were considered: Po(P) and P(Po). SH, shoot; SI, silique; PD, pedicel; SE, sepal; PE, petal; PI, pistil; AZ, abscission zone; PO, pollen; AN, anther; ST, stamen; FL, flower; RA, raceme; INF, inflorescence. Identities of both the olive SSH transcripts and their corresponding Arabidopsis homologs are shown for reference purposes.

# Transcripts Connected with Transport of Molecules in the Pollen Grain

Transcripts for several transporters were found in the mature olive pollen grain. The sugar transport protein must represent a key transcript in pollen and pollen germination as it has been described in tobacco (Lemoine et al., 1999). Also, the polyol transporter present in the olive pollen could share similar functions to the polyol/monosaccharide transporter 2 expressed in mature pollen grains, growing pollen tubes, hydathodes, and young xylem cells (Klepek et al., 2010). Moreover, boron transporters expression reveals the regulatory role of boron in pollen germination and pollen tube growth (Qinli et al., 2003). Nitrate transporters also act as a nitrate sensor that triggers a specific signaling pathway stimulating lateral root growth (Guo et al., 2001), which may have a similar significance in pollen tube growth. The presence of the cation proton exchanger is critical for maintaining polarity, directing pollen growth toward the ovule, and to allow cell expansion and flower development (Bassil et al., 2011; Lu et al., 2011). The transcripts encoding ABC transporters, also found in the olive pollen, could be related to the transport of sporopollenin precursors for exine formation in developing pollen (Choi et al., 2011). Rho guanine nucleotide exchange factors are crucial in polar growth of pollen tubes (Zhang and McCormick, 2007). Finally, phosphate transporters have also been described as central for gametophyte development (Niewiadomski et al., 2005).

#### Other Transcripts

The present analysis also has reported some unexpected results. As an example, anther-specific proline-rich protein APG transcripts have been found in the pistil, when they have been considered to be confined to the anther during the period of microspore development, with a dramatic decline during pollen maturation (Roberts et al., 1993). This result could be explained by the implication of the proline-rich protein APG in the pollen tube during the germination process, through a process yet to be determined.

Even though our data still do not reveal substantial information regarding some key aspects of olive reproductive biology which are still open, such as the demonstration of the presence of a self-incompatibility system of the gametophytic type (largely suspected), many of the transcripts detected here (either tissue-specific or not) are of great interest for the further characterization of the species, and in some cases for important issues like olive pollen and stigma physiology, as discussed above. Current knowledge of olive pollen allergenicity can also be improved, as several of the identified transcripts correspond to potential allergenic molecules already described in other species, but not yet described as such in olive. This is the case, for example, with glutathione S-transferase, considered a minor allergen in birch pollen (Zwicker, 2013; Deifl et al., 2014). Gene products corresponding to transcripts detected in the resulting SSH libraries P(Po) and P(L) described here are also consistent with proteins characterized in the olive stigma exudate by means of proteomic approaches (Rejón et al., 2014), which may act as additional positive controls for the present methodology, because the presence of peptides originating from olive pollen among those detected was almost completely avoided.

#### TABLE 4 | Summary stats about the pre-processing of the libraries.


TABLE 5 | Percentages of "pre-cleaned" SSH transcripts mapping to the two recently developed olive transcriptome databases (ReprOlive: Carmona et al., 2015; Cruz et al., 2016).


#### COMPARATIVE TRANSCRIPTOMICS

Comparative analysis of the SSH-derived transcripts with transcriptomic information present in two recently developed databases (ReprOlive: Carmona et al., 2015; Cruz et al., 2016) was preceded by a bioinformatics cleaning by SeqTrimNext which resulted in 820 useful sequences retained from 1,344 clones sequenced with the distribution displayed in **Table 4**.

Further annotation of the cleaned sequences from pollen libraries [Po(P) and Po(L)] and screening for correspondence with the two transcriptomic databases yielded the information displayed in Supplementary Tables 8, 9. Similarly, Supplementary Tables 10–13 include the correspondence of pistil libraries [P(Po) and P(L)] and leaf libraries [L(Po) and L(P)] with both transcriptomic databases.

Overall, a high proportion (88.5–100%, with an average of 95.8%) of the SSH sequences identified here mapped to both transcriptome databases (**Table 5**). Tissue specificity of the SSH sequences described here, assessed by BLAST between the different ReprOlive datasets with Roche/454 reads (Additional files 8–13), also yielded a high proportion of "YES" matches at the appropriate tissue, thus validating in silico the usefulness of the SSH approach which was carried out experimentally here.

# CONCLUSIONS

The generation and analysis of different SSH subtractive libraries has provided a dataset of sequences, consisting of about a thousand entries, of great value for the understanding of the physiological processes taking place in olive pollen and pistil during their development and interaction. They are particularly important as many of these entries have been demonstrated to be exclusively or preferentially expressed in the reproductive tissues, and not in the leaf tissues, as this material was used to build the subtractive strategy.

The subtractive transcripts have been annotated according to their homology with regard to four main databases: a general plant database provided by the NCBI, and three olive-specific databases constructed from mesocarp, reproductive tissues and a final one derived from the recently published olive genome draft. Moreover, they have been extensively classified and their presence discussed with regard to their putative biological function, cellular localization, and the molecular functions they are expected to exert.

Such information will be used in the near future as the basis to examine further aspects of the olive reproductive biology through the specific analysis of the expression of these products. These aspects may include compatibility, cell-to-cell communication, pollen tube growth and guidance, and pollen allergenicity among others.

# AUTHOR CONTRIBUTIONS

AZ and JA designed the experiments and wrote the manuscript. AZ performed the experiments and analyzed the results. JT was particularly involved in the work with the databases and tools on the web servers. SH and JH hosted and supervised AZ during the stays in their respective laboratories. MHG participated in the sequencing and interpretation of results. RC and MGC performed database searches and other bioinformatics analyses and organized and deposited all sequences at ENA.

# FUNDING

This work was supported by ERDF co-funded projects BFU2011-22779, BFU2016-77243-P, RTC-2015-4181-2 and RTC-2016-4824-2 (MINECO), P2011-CVI-7487 (Junta de Andalucía) and 201540E065 (CSIC). AZ thanks ceiA3 grant for Ph.D. in enterprises.

# ACKNOWLEDGMENTS

We thank T. Batstone and A. Allen for their help with laboratory work at the University of Bristol, A. Chueca for preparing samples for sequencing and E. Lima for helping with qPCR.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017.01576/full#supplementary-material

#### REFERENCES


endocytosis, and reflects the distribution of esterified and de-esterified pectins. Plant J. 53, 133–143. doi: 10.1111/j.1365-313X.2007.03325.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Zafra, Carmona, Traverso, Hancock, Goldman, Claros, Hiscock and Alche. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Tracing of Jatropha curcas L. from Its Mesoamerican Origin to the World

Haiyan Li<sup>1</sup> , Suguru Tsuchimoto<sup>2</sup> , Kyuya Harada<sup>2</sup> , Masanori Yamasaki<sup>3</sup> , Hiroe Sakai<sup>2</sup> , Naoki Wada<sup>2</sup> , Atefeh Alipour<sup>1</sup> , Tomohiro Sasai<sup>1</sup> , Atsushi Tsunekawa<sup>4</sup> , Hisashi Tsujimoto<sup>4</sup> , Takayuki Ando<sup>5</sup> , Hisashi Tomemori<sup>4</sup> , Shusei Sato<sup>6</sup> , Hideki Hirakawa<sup>7</sup> , Victor P. Quintero<sup>8</sup> , Alfredo Zamarripa<sup>9</sup> , Primitivo Santos<sup>10</sup> , Adel Hegazy<sup>11</sup> , Abdalla M. Ali<sup>12</sup> and Kiichi Fukui<sup>13</sup> \*

#### Edited by:

Johann Vollmann, University of Natural Resources and Life Sciences, Vienna, Austria

#### Reviewed by:

Nobuko Ohmido, Kobe University, Japan Veronique Sylvie Decroocq, Institut National de la Recherche Agronomique (INRA), France

> \*Correspondence: Kiichi Fukui

kfukui@bio.eng.osaka-u.ac.jp

#### Specialty section:

This article was submitted to Crop Science and Horticulture, a section of the journal Frontiers in Plant Science

Received: 04 April 2017 Accepted: 22 August 2017 Published: 07 September 2017

#### Citation:

Li H, Tsuchimoto S, Harada K, Yamasaki M, Sakai H, Wada N, Alipour A, Sasai T, Tsunekawa A, Tsujimoto H, Ando T, Tomemori H, Sato S, Hirakawa H, Quintero VP, Zamarripa A, Santos P, Hegazy A, Ali AM and Fukui K (2017) Genetic Tracing of Jatropha curcas L. from Its Mesoamerican Origin to the World. Front. Plant Sci. 8:1539. doi: 10.3389/fpls.2017.01539

<sup>1</sup> Department of Biotechnology, Graduate School of Engineering, Osaka University, Osaka, Japan, <sup>2</sup> Plant Bioengineering for Bioenergy Laboratory, Graduate School of Engineering, Osaka University, Osaka, Japan, <sup>3</sup> Kobe Food Resources Education and Research Center, Graduate School of Agricultural Science, Kobe University, Hyogo, Japan, <sup>4</sup> Arid Land Research Center, Tottori University, Tottori, Japan, <sup>5</sup> The Center for International Affairs, Tottori University, Tottori, Japan, <sup>6</sup> Graduate School of Life Sciences, Tohoku University, Miyagi, Japan, <sup>7</sup> Kazusa DNA Research Institute, Chiba, Japan, <sup>8</sup> INIFAP-Campo Experimental Bajío, Guanajuato, Mexico, <sup>9</sup> INIFAP-Campo Experimental Rosario Izapa, Chiapas, Mexico, <sup>10</sup> College of Agriculture, University of the Philippines Los Banos, Laguna, Philippines, <sup>11</sup> Genetic Engineering and Biotechnology Research Institute, University of Sadat City, Sadat City, Egypt, <sup>12</sup> Faculty of Agriculture, Shambat, University of Khartoum, Khartoum, Sudan, <sup>13</sup> Graduate School of Pharmaceutical Science, Osaka University, Osaka, Japan

Jatropha curcas L. (Jatropha), a shrub species of the family Euphorbiaceae, has been recognized as a promising biofuel plant for reducing greenhouse gas emissions. However, recent attempts at commercial cultivation in Africa and Asia have failed because of low productivity. It is important to elucidate genetic diversity and relationships in worldwide Jatropha genetic resources for the breeding of better commercial cultivars. Here, genetic diversity was analyzed by using 246 accessions from Mesoamerica, Africa and Asia, based on 59 simple sequence repeat markers and eight retrotransposon-based insertion polymorphism markers. We found that central Chiapas of Mexico possesses the most diverse genetic resources, and the Chiapas Central Depression could be the center of origin. We identified three genetic groups in Mesoamerica, whose distribution revealed a distinct geographic cline. One of them consists mainly of accessions from central Chiapas. This suggests that it represents the original genetic group. We found two Veracruz accessions in another group, whose ancestors might have been shipped from the Port of Veracruz to the Old World, to be the source of all African and Asian Jatropha. Our results point to human selection as a cause of low productivity in Africa and Asia, and suggest breeding strategies to improve African and Asian Jatropha. Cultivars with improved productivity will help expand mass commercial cultivation of Jatropha in Africa and Asia to increase biofuel production, and will ultimately support the battle against climate change.

Keywords: Jatropha, biofuel plant, genetic diversity, center of origin, genetic resources

### INTRODUCTION

fpls-08-01539 September 6, 2017 Time: 16:39 # 2

Jatropha curcas L. (Jatropha) is a shrub that produces non-edible seed oil suitable for biodiesel fuel. The oil content ranges from 30 to 50% of its seed weight (Heller, 1996; Kumari et al., 2009). Jatropha has a potential to reduce the consumption of fossil fuel and carbon dioxide emissions (Bailis and Baka, 2010; Bahadur et al., 2013). Jatropha is drought tolerant, thereby its biofuel production could avoid competition with food crops (Fairless, 2007; Bahadur et al., 2013). Saving of carbon emissions depends on how biofuels are produced. Growing perennial plants on degraded and/or fallow fields for biofuel production has an advantage of sustained greenhouse gas reduction compared with converting forests or grasslands to produce annual crop-based biofuels (Fargione et al., 2008; Searchinger et al., 2008).

Large Jatropha plantations have been planned worldwide, particularly in the semiarid and arid areas of African and Asian countries, by using seeds derived from local plants (Fairless, 2007; Von Maltitz and Setzkorn, 2013). Many commercial plantations, however, have not achieved the yields that were anticipated because of limited productivity (Sanderson, 2009; Singh et al., 2014; Van Eijck et al., 2014; Von Maltitz et al., 2014). Breeding of high-yielding Jatropha varieties is still in its infancy (Divakara et al., 2010). Most previous studies showed that the genetic diversity of Jatropha is low in Africa, Asia and South America, but higher in Mesoamerica, including Mexico and Central America (Basha and Sujatha, 2007; Basha et al., 2009; Pamidimarri et al., 2010; Rosado et al., 2010; He et al., 2011; Montes et al., 2014; Montes Osorio et al., 2014; Siju et al., 2016). The low genetic diversity of African and Asian accessions has limited the potential for successful breeding by using local resources. The challenge now is to develop well-adapted, high-yielding varieties that are suitable for a wide range of climate conditions in African and Asian countries, since only wide plantation with high oil production can guarantee a good supply for biofuel. Characterization and preservation of a diverse collection of Jatropha accessions, including those from Mesoamerica, is the key step in developing them.

Mesoamerica, especially Mexico, has been shown to possess high genetic variation of Jatropha and was assumed to be its place of origin (Makkar and Becker, 2009; Dias et al., 2012). Mexico has not only toxic Jatropha varieties, but also non-toxic ones, suggesting that Mexico might also be the domestication center of Jatropha (Dias et al., 2012). Recent studies described that Chiapas, the southernmost state of Mexico bordering Guatemala, might be the center of Jatropha biodiversity (Ovando Medina et al., 2011; Pecina Quintero et al., 2011, 2014; Salvador Figueroa et al., 2015). However, a comprehensive study is required to obtain more evidence for this conclusion.

It has been widely accepted that Jatropha was brought by the Portuguese from Mesoamerica to the Cape Verde Islands, and then to Africa and Asia (Heller, 1996), while the transmission route in Mesoamerica is unknown and the genetic background of African and Asian Jatropha in the world population remains unclear. To improve the traits targeted by Jatropha breeding in Asia and Africa, it is of great significance not only to identify diverse genetic resources but also to unveil the ancestral genotype of African and Asian Jatropha when shipped from Mesoamerica.

In this study, 59 simple sequence repeat (SSR) markers and eight retrotransposon-based insertion polymorphism (RBIP) markers were applied to assess the genetic diversity of 246 Jatropha accessions from eight Mexican states (Chiapas, Guerrero, Michoacan, Morelos, Oaxaca, Tabasco, Veracruz, and Yucatan), Guatemala and countries of Africa and Asia. We also evaluated the genetic variability of accessions from each of nine regions in Chiapas, to identify the center of origin. We further traced the route from Chiapas to the Old World by identifying the progenies in Mesoamerica that share ancestors with present-day African and Asian accessions. Human selection causing a narrow genetic base in Africa and Asia is discussed. Finally, we propose strategies to improve African and Asian Jatropha, in order to increase phenotypic performance including productivity, which will help mitigate climate change through mass production of the biofuel.

### MATERIALS AND METHODS

#### Plant Materials

A worldwide collection of 246 Jatropha accessions in this study consisted of 207 accessions from Mesoamerica (198 from Mexico and nine from Guatemala), seven from Africa, and 32 from Asia (Supplementary Table 1; see **Figure 3** for the locations of Mexican states and regions in the state of Chiapas). The Mexican accessions and seven Guatemalan accessions were obtained from the collection of the Instituto Nacional de Investigaciones Forestales, Agropecuarias (INIFAP, Mexico). Seven Philippine accessions were obtained from the collection of UPLB originating in four provinces of Luzon and Mindanao islands. Twenty-one Vietnamese accessions were obtained from plants grown in Quang Tri Province. Sudanese and Egyptian accessions were obtained from the University of Khartoum and University of Sadat City, respectively. Other accessions were the same as those used in our previous study (Sato et al., 2011). Jatropha sampling and transferring to Japan were performed in compliance with the Nagoya agreement.

#### DNA Markers

Over 500 SSR markers were developed by designing primer pairs surrounding SSR sequences identified by the genomic database of Asian accessions based on our whole genome sequencing project (Sato et al., 2011; Hirakawa et al., 2012). Effective markers were selected from them based on clear polymorphisms and low number of null alleles. Eight RBIP markers employed in this study were developed from members of the copia-type families identified in the genomic database (Alipour et al., 2013). All of them were expected to have retrotransposed more recently than other members. Primers were designed for both flanking (FLK) sequences and long terminal repeats (LTRs). Retrotransposon insertion at each locus was shown by combining primers designed from the FLK sequences at both sides and primers designed from LTR sequences. Linkages between the markers were tested by TASSEL2.1 (Bradbury et al., 2007).

#### DNA Extraction

Genomic DNA was extracted from Jatropha leaves with a DNeasy Plant Mini Kit (Qiagen, United States) according to the manufacturer's instructions. The DNA samples were diluted to a final concentration of 0.35 ng/µl and stored at −20°C until use.

#### Genotyping and Scoring

Polymerase chain reaction (PCR) for SSR markers was performed in a 10 µl total volume solution containing 1.4 ng of DNA template, 0.16 µl of SSR primer mix (25 µM each), 1× PCR buffer, 0.03 µM MgCl2, 0.8 µl dNTPs (2.5 mM), 0.08 U Bio-taq (5 units/µL), and Milli Q water. Touch-down amplification was performed as follows: a 3 min hold at 94°C, followed by three cycles at 94°C for 30 s and 68°C for 30 s, with the annealing temperature reduced by 2°C every three cycles until 64°C. This was followed by three cycles at 94°C for 30 s, 62°C for 30 s, and 72°C for 30 s, with the annealing temperature reduced by 2°C every three cycles until 58°C. A further 30 cycles of amplification were performed at 94°C for 30 s, 55°C for 30 s, and 72°C for 30 s. The PCR products were stored at −20°C until separation by polyacrylamide gel electrophoresis (PAGE). Amplified bands were stained with ethidium bromide. For markers that showed ambiguous bands, experiments were repeated with increased annealing temperatures until clear bands were observed.
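The touch-down program described above can be written out as an explicit list of cycling blocks, which makes the stepwise annealing temperatures easier to check. The following Python sketch is only one reading of the text: it assumes that the first phase consists of two-step cycles (no separate 72°C extension), that each annealing temperature is held for three cycles, and that the function and variable names are purely illustrative rather than part of any thermocycler software.

```python
from typing import List, Tuple

Step = Tuple[str, int, int]      # (step name, temperature in °C, duration in s)
Block = Tuple[int, List[Step]]   # (number of cycles, steps within one cycle)


def touchdown_schedule() -> List[Block]:
    """Cycling blocks that follow the initial 3 min hold at 94 °C (assumed reading)."""
    blocks: List[Block] = []
    # Phase 1: two-step cycles, annealing lowered from 68 to 64 °C in 2 °C steps.
    for anneal in (68, 66, 64):
        blocks.append((3, [("denature", 94, 30), ("anneal", anneal, 30)]))
    # Phase 2: three-step cycles, annealing lowered from 62 to 58 °C in 2 °C steps.
    for anneal in (62, 60, 58):
        blocks.append(
            (3, [("denature", 94, 30), ("anneal", anneal, 30), ("extend", 72, 30)])
        )
    # Phase 3: 30 cycles at a fixed annealing temperature of 55 °C.
    blocks.append(
        (30, [("denature", 94, 30), ("anneal", 55, 30), ("extend", 72, 30)])
    )
    return blocks


def annealing_temperatures(blocks: List[Block]) -> List[int]:
    """Annealing temperature of each block, in program order."""
    return [temp for _, steps in blocks for name, temp, _ in steps if name == "anneal"]


def total_cycles(blocks: List[Block]) -> int:
    """Total number of amplification cycles across all blocks."""
    return sum(cycles for cycles, _ in blocks)


if __name__ == "__main__":
    schedule = touchdown_schedule()
    print(annealing_temperatures(schedule))
    print(total_cycles(schedule))
```

Under these assumptions the annealing temperature steps through 68, 66, 64, 62, 60 and 58°C before the final 30 cycles at 55°C, giving 48 cycles in total.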

For RBIP markers, PCR was performed in a 5 µl total volume solution containing 0.7 ng of DNA template, 0.08 µl of primer mix (50 µM each), 1× PCR buffer, 0.03 µM MgCl2, 0.4 µl dNTPs (2.5 mM), and 0.08 U Bio-taq (5 units/µL), made to the volume with Milli Q water. The PCR reaction was started with denaturation at 94°C for 2 min, followed by 35 cycles of 45 s at 94°C, 45 s at 55°C and 2 min at 72°C, and completed with a final elongation step at 72°C for 10 min. Amplified bands were stained with ethidium bromide. The PCR products were stored and analyzed like the SSR markers.

For all the SSR markers and RBIP markers, bands were individually scored (presence 1, absence 0). All data analyses were performed based on the genotype data matrix.

#### Genetic Heterozygosity

Expected heterozygosity (H<sub>E</sub>) and observed heterozygosity (H<sub>O</sub>) show the proportion of heterozygotes in populations. They are measures of the extent of genetic variation in a population. H<sub>E</sub> is the heterozygosity expected under Hardy–Weinberg equilibrium (HWE) with random mating of individuals (Nei, 1987). H<sub>O</sub> is the heterozygosity actually observed. Differences between H<sub>E</sub> and H<sub>O</sub> are caused by inbreeding, often as a result of selection or small population size. The inbreeding coefficient (F<sub>IS</sub>) estimates the level of inbreeding in a population. F<sub>IS</sub> was calculated from H<sub>E</sub> and H<sub>O</sub> by the equation F<sub>IS</sub> = 1 − H<sub>O</sub>/H<sub>E</sub>. A larger F<sub>IS</sub> indicates a greater extent of inbreeding in a population. H<sub>E</sub> and H<sub>O</sub> were calculated with GenALEx 6.5 (Peakall and Smouse, 2006). Differences in H<sub>E</sub> and H<sub>O</sub> among populations were tested for significance using the Kruskal–Wallis one-way analysis of variance. F<sub>IS</sub> and the P-values showing the significance of inbreeding were calculated with FSTAT version 2.9.3.2 (Goudet, 2001).
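To make the relationship between the three statistics concrete, the sketch below computes them for a single locus in a single population. It assumes the uncorrected gene-diversity estimator, expected heterozygosity = 1 − Σp², where p is the frequency of each allele; the software packages named above may apply sample-size corrections, so this is an illustration rather than a reproduction of their output. The function name and the genotype encoding are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Genotype = Tuple[str, str]  # the two alleles of one diploid individual


def heterozygosity(genotypes: List[Genotype]) -> Tuple[float, float, float]:
    """Return (observed, expected, inbreeding coefficient) for one locus."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n

    # Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared allele
    # frequencies (assumed uncorrected estimator).
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n
    h_exp = 1.0 - sum((c / total_alleles) ** 2 for c in counts.values())

    # Inbreeding coefficient; undefined when the locus is monomorphic.
    f_is = 1.0 - h_obs / h_exp if h_exp > 0 else float("nan")
    return h_obs, h_exp, f_is


if __name__ == "__main__":
    sample = [("A", "A"), ("A", "A"), ("B", "B"), ("B", "B")]
    print(heterozygosity(sample))
```

In the usage example every individual is homozygous although two alleles segregate at equal frequency, so the observed value is zero, the expected value is 0.5, and the inbreeding coefficient reaches its maximum of 1.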

#### Population Structure

Model-based clustering analysis was performed on the Mesoamerican accessions alone, or on all accessions including the African and Asian ones, with STRUCTURE 2.3.4 (Pritchard et al., 2000). The analysis used the Markov chain Monte Carlo method with 1 × 10<sup>5</sup> iterations after a burn-in period of 5 × 10<sup>4</sup> iterations. To determine the optimal number of populations (K), the ΔK method was applied. K was varied from 1 to 10; after 10 independent runs for each K, ΔK was calculated from the rate of change of the log likelihood between successive K values following Evanno et al. (2005). ΔK should show a clear peak at the true value of K.
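The ΔK criterion can be illustrated with a short calculation. The sketch below assumes a commonly used simplification of the statistic: the absolute second-order difference of the mean log likelihood across replicate runs, divided by the standard deviation of the log likelihood at that K. The input layout, the function name, and the numbers in the usage example are hypothetical and are not results from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(log_likelihoods: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    log_likelihoods maps each K to the log likelihoods of its replicate runs.
    Assumption: delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K).
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result: Dict[int, float] = {}
    for k in sorted(log_likelihoods):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            # stdev raises or divides by zero if runs are identical or fewer
            # than two; real analyses use enough replicates to avoid this.
            result[k] = second_diff / stdev(log_likelihoods[k])
    return result


if __name__ == "__main__":
    # Made-up log likelihoods for three replicate runs per K.
    runs = {
        1: [-5000.0, -5010.0, -4990.0],
        2: [-4500.0, -4510.0, -4490.0],
        3: [-4000.0, -4005.0, -3995.0],
        4: [-3990.0, -4010.0, -3970.0],
        5: [-3985.0, -4005.0, -3965.0],
    }
    values = delta_k(runs)
    print(values, "best K:", max(values, key=values.get))
```

Note that ΔK is undefined for the smallest and largest K tested, which is why the range of K values must extend beyond the expected optimum on both sides.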

Principal coordinates analysis (PCoA) of all individuals was performed with GenALEx 6.5. The final results were shown as a three-dimensional plot (X, Y, and Z represent the three principal coordinates) of all the accessions.

#### Phylogenetic Relationship

A neighbor-joining (NJ) tree of the African and Asian accessions and the non-admixed Mesoamerican accessions (those with more than 80% group membership in the Mesoamerican structure analysis) was constructed on the basis of genetic distances (Nei et al., 1983) with POPULATION 1.2.32 (Langella, 2002). The tree was drawn with Treeview 1.6.6 (Page, 1996) and Figtree 1.4.2 (Rambaut and Drummond, 2014). Genetic distances (Nei's D<sub>A</sub> distance) between groups were also calculated with POPULATION 1.2.32; admixed accessions with less than 80% group membership in the Mesoamerican structure analysis were excluded from this analysis.
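For readers unfamiliar with the distance measure, the sketch below computes Nei's D<sub>A</sub> between two populations from per-locus allele frequencies, using the standard definition D<sub>A</sub> = 1 − (1/r) Σ<sub>j</sub> Σ<sub>i</sub> √(x<sub>ij</sub> y<sub>ij</sub>), where r is the number of loci and x<sub>ij</sub>, y<sub>ij</sub> are the frequencies of allele i at locus j in the two populations. The data layout and function name are hypothetical; this is an illustration of the formula, not the implementation used by the software cited above.

```python
from math import sqrt
from typing import Dict, List

AlleleFreqs = Dict[str, float]  # allele name -> frequency at one locus


def nei_da(pop_x: List[AlleleFreqs], pop_y: List[AlleleFreqs]) -> float:
    """Nei's D_A distance between two populations.

    Each population is a list with one allele-frequency dictionary per locus,
    and both lists must describe the same loci in the same order.
    """
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("populations must share the same non-empty set of loci")

    total = 0.0
    for freqs_x, freqs_y in zip(pop_x, pop_y):
        # Only alleles present in both populations contribute to the sum.
        shared = set(freqs_x) & set(freqs_y)
        total += sum(sqrt(freqs_x[a] * freqs_y[a]) for a in shared)
    return 1.0 - total / len(pop_x)


if __name__ == "__main__":
    group_1 = [{"a": 0.5, "b": 0.5}, {"c": 1.0}]
    group_2 = [{"a": 0.5, "b": 0.5}, {"d": 1.0}]
    print(nei_da(group_1, group_2))
```

The distance is 0 for populations with identical allele frequencies and 1 for populations that share no alleles at any locus; in the usage example the two groups agree at one of two loci and share nothing at the other.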

#### Number of Retrotransposons

The presence of each of the eight retrotransposons was judged by the presence of the band amplified with an FLK primer and an LTR primer and, in the case of full-length members, also by the absence of the band amplified with the FLK primers on both sides of the retrotransposon. On the basis of these results, the number of retrotransposons present in each accession was counted.
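The scoring rule described above can be expressed as a small decision function. This is a minimal sketch of the rule as stated in the text, with hypothetical field names; it does not attempt to handle cases the text does not describe, such as a locus that yields both bands.

```python
from typing import Dict, NamedTuple


class LocusBands(NamedTuple):
    flk_ltr_band: bool  # band amplified with a flanking primer and an LTR primer
    flk_flk_band: bool  # band amplified with the flanking primers on both sides
    full_length: bool   # True for full-length members, False for solo LTRs


def insertion_present(locus: LocusBands) -> bool:
    """Apply the presence rule for one RBIP locus."""
    if not locus.flk_ltr_band:
        return False
    if locus.full_length:
        # Full-length members additionally require that the FLK-FLK band is absent.
        return not locus.flk_flk_band
    return True


def count_insertions(accession: Dict[str, LocusBands]) -> int:
    """Count the retrotransposons scored as present in one accession."""
    return sum(insertion_present(bands) for bands in accession.values())


if __name__ == "__main__":
    # Hypothetical locus names and band calls for a single accession.
    example = {
        "locus_1": LocusBands(flk_ltr_band=True, flk_flk_band=False, full_length=True),
        "locus_2": LocusBands(flk_ltr_band=False, flk_flk_band=True, full_length=True),
        "locus_3": LocusBands(flk_ltr_band=True, flk_flk_band=False, full_length=False),
    }
    print(count_insertions(example))
```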

#### Approximate Bayesian Computation (ABC) Analysis

To characterize the Mesoamerican ancestral genetic group of African and Asian accessions, we assumed that African and Asian Jatropha originated from different Mesoamerican genetic groups and then constructed scenarios. To decide which scenario is the best supported by data, scenarios were compared in DIYABC v2.1.0 (Cornuet et al., 2008, 2010).

#### Phenotypic Variation

Number of inflorescences per plant, number of female flowers per plant, ratio of female to male flowers and seed yield (g) per plant of 100 representative accessions from seven Mexican states [Chiapas (57), Guerrero (4), Michoacan (4), Morelos (3), Oaxaca (7), Veracruz (20), and Yucatan (5)] were measured individually at the Rosario Izapa Experimental Farm, INIFAP (Chiapas, Mexico). Chiapas accessions were further sub-classified into those from six regions [Centro (14), Frailesca (8), Fronteriza (3), Sierra (4), Selva (2), and Soconusco (26)]. To compare the variation among accessions from central Chiapas (Centro, Fronteriza, Frailesca, and Sierra), peripheral areas of Chiapas (Selva and Soconusco), and other Mexican states, the mean value and standard deviation (SD) of each area were calculated.

#### RESULTS

#### Genetic Diversity in Mesoamerica

To survey the worldwide genetic variation of Jatropha, 59 effective SSR markers that showed intraspecific polymorphism were selected from more than 500 Jatropha SSRs. The mean percentage of missing data for the SSR markers was 0.79%. We also identified six full-length members with two LTRs showing 100% sequence identity with each other, as well as two solo LTRs displaying 100% sequence identity with the consensus sequence, from Jatropha copia-type retrotransposon families (Alipour et al., 2013) to use as RBIP markers. All eight RBIP markers showed intraspecific polymorphism. The mean percentage of missing data for the RBIP markers was 1.37%. Primers for the SSR markers and RBIP markers used in our study are listed in Supplementary Tables 2, 3. There was little or no linkage among the markers (data not shown).

The SSR markers yielded 221 polymorphic bands in all accessions, ranging from two to seven alleles per marker with an average of 3.75. Among them, 161 alleles were specific to Mesoamerica, whereas no allele was specific to Africa or Asia (Supplementary Table 1). The state of Chiapas had 47 specific alleles, which was the highest number among the Mexican states and Guatemala. In Chiapas, 30 alleles were specific to the central areas (Centro, Altos, Fronteriza, Frailesca, and Sierra), while only one allele was specific to the peripheral areas (Norte, Selva, Soconusco, and Istmo-Costa) (Supplementary Table 1).

Remarkably, 39 Asian and African accessions were almost monomorphic and homozygous for both kinds of markers, except for five Vietnamese accessions, each of which was heterozygous at a single marker. Negligible expected heterozygosity (H<sub>E</sub>), that is, negligible genetic variation, was observed in African and Asian accessions (H<sub>E</sub> = 0.000 and H<sub>E</sub> = 0.002, respectively) (**Figure 1A** and Supplementary Table 1). In contrast, a significantly higher H<sub>E</sub> value was obtained for Mesoamerican accessions (**Figure 1A** and Supplementary Table 1).

In Mesoamerica, the highest genetic variation was found in Chiapas, with a clear decreasing cline of H<sub>E</sub> from Chiapas and its bordering states to non-bordering states (**Figure 1B**). These results showed that the highest genetic diversity of Jatropha exists in Chiapas.

To further narrow down the center of genetic variation, H<sub>E</sub> was examined among the regions within Chiapas. Five regions in central Chiapas had significantly higher H<sub>E</sub> values than four regions in the peripheral areas (P < 0.001, **Figure 1C**). Slight differences between expected and observed heterozygosity (H<sub>E</sub> versus H<sub>O</sub>) were observed among the accessions from central Chiapas, whereas large differences between these statistics were detected among the accessions from the peripheral areas of Chiapas, other Mexican states, and Guatemala (**Figures 1B,C**). This resulted in lower inbreeding coefficients (F<sub>IS</sub>) in central Chiapas than in most of the peripheral areas of Chiapas, other Mexican states, and Guatemala (Supplementary Table 1). This may be due to human selection and/or a decrease in effective population size in the areas other than central Chiapas. Interestingly, F<sub>IS</sub> was the lowest in Soconusco, the southernmost peripheral region of Chiapas, and the second lowest in Sierra, one of the central regions, which is located near Soconusco. It is likely that human selection had little effect in these regions, but further research would be required to confirm this.

#### Phenotypic Diversity in Mesoamerica

The phenotypic variation of four yield-related traits was evaluated for 100 Mexican accessions. The mean values and SDs of the accessions from central Chiapas were higher than those from the peripheral areas of Chiapas and other Mexican states (Supplementary Table 4). These results suggest that the Jatropha population in central Chiapas has the highest phenotypic variation in Mexico.

#### Genetic Groups in Mesoamerica

The genetic constitutions of all the Mesoamerican accessions were examined by structure analysis. The ΔK value with respect to K (number of groups in the population) showed a peak when K equaled 3 (Supplementary Figure 1A; see Population Structure). This indicates that the optimal number of groups for the Mesoamerican accessions is 3, and the accessions from each region were accordingly classified into three clear genetic groups, A, B, and C, by the model-based clustering analysis (**Figure 2**). The numbers and ratios of accessions assigned to each group in each Chiapas region, other Mexican states and Guatemala are presented in **Figure 3**. The distribution of the groups revealed a distinct geographic cline. Accessions from central Chiapas were mostly in Group A (orange). Accessions in Group B (purple) were mainly distributed in the peripheral areas of Chiapas and in neighboring states and countries, whereas accessions in Group C (blue) were mainly distributed in states distant from Chiapas (Guerrero, Michoacan, and Morelos). This geographical distribution of Jatropha genetic groups in Mesoamerica has not been reported before. The distribution of Group B is especially interesting, because it seems to surround the area of Group A.

The highest genetic variation and the lowest F<sub>IS</sub> among the three groups were observed in Group A (Supplementary Table 5), which was as expected from the results of the previous section. Mean values and SDs of phenotypic variations in Groups A, B, and C were also calculated and compared. Accessions from Group A had higher mean values and variations of the four traits than those from Groups B and C (Supplementary Table 4).

#### Mesoamerican Accessions Genetically Closest to African and Asian Jatropha

To estimate the ancestral genotypes of African and Asian Jatropha at their origin, we searched for Mesoamerican accessions that are genetically close to them. We performed population structure and phylogenetic analyses on Jatropha including African and Asian accessions. The ΔK value with respect to K showed its maximum when K equaled 3 (Supplementary Figure 1B), meaning that the optimal number of groups for all the accessions is 3, and the accessions were accordingly classified into three genetic populations, Groups I, II, and III, by the model-based clustering analysis (Supplementary Figure 2). Group II mainly corresponds to Mesoamerican Group A, whereas Group III is composed of most of the Mesoamerican accessions of Groups B and C, further indicating the distinct differences between Group A and other accessions in Mesoamerica. Group I comprises all the African and Asian accessions and eight Mesoamerican accessions: one from central Chiapas; two from Veracruz; and five from Guatemala. The genetic constitution of African and Asian accessions proved to be almost identical, indicating that they are the progenies of inbreeding through successive generations within limited ancestral genotypes. Interestingly, the eight Mesoamerican accessions in Group I were those classified into Group B in the previous section (**Figure 2**), and African and Asian accessions had the closest genetic and phylogenetic relationships with Group B (Supplementary Table 6). Similarly, accessions from the peripheral areas of Chiapas, Veracruz and Guatemala had the closest relationships with African and Asian accessions (Supplementary Table 7). Results of Approximate Bayesian Computation (ABC) analysis supported our conclusion that African and Asian accessions were derived from Group B accessions (Supplementary Figure 3).

After eliminating admixed accessions, which showed < 80% membership in any of the Mesoamerican Groups (see **Figure 2**), a phylogenetic tree of African and Asian accessions and non-admixed Mesoamerican accessions was constructed. The tree showed three clades and, interestingly, the Mesoamerican accessions in clades 1, 2, and 3 all belonged to Mesoamerican Groups A, B, and C (**Figure 2**), respectively. One (a Guatemalan accession) of the eight Mesoamerican accessions that are genetically close to the African and Asian groups (see above and Supplementary Figure 2) was not included in the tree because of its admixed membership. The remaining seven Mesoamerican accessions were in the same clade as all African and Asian accessions in the phylogenetic tree (**Figure 4**). PCoA also revealed a close relationship between these seven accessions and the African and Asian ones (Supplementary Figure 4). These seven accessions are genetically closest to African and Asian Jatropha, and most likely share common ancestors with them. Because insertions of copia-type retrotransposons in the genome are irreversible, their presence or absence is a reliable indicator for determining parental lineages. Among the seven accessions, one from the Centro region (central Chiapas) and two from Veracruz shared all eight retrotransposons identified in the African and Asian accessions (**Figures 2**, **4**, Supplementary Figure 2 and Table 8). On the other hand, the other four accessions from Guatemala lacked one of the eight retrotransposons. This observation suggests that the genotype of the three accessions [No. 127 (Soledad de Doblado, Veracruz), 210 (Entrada a Independencia, Veracruz), and 354 (Suchiapa, Centro, Chiapas)] (**Figure 5**) is the closest to the ancestral genotypes of African and Asian Jatropha.

# DISCUSSION

Aiming to identify the center of origin possessing the most diverse genetic resources of Jatropha that would contribute to breeding projects in Africa and Asia, we examined the genetic diversity of a worldwide collection of Jatropha accessions. We found that Mesoamerican populations showed significantly higher genetic diversity than African and Asian populations, and that the Chiapas population showed the highest. The low genetic variation of local accessions in Africa and Asia is likely to restrict their phenotypic performance, which would have limited breeding programs. These results are consistent with previous reports (Basha and Sujatha, 2007; Basha et al., 2009; Pamidimarri et al., 2010; Rosado et al., 2010; He et al., 2011; Montes et al., 2014; Montes Osorio et al., 2014; Ouattara et al., 2014; Yue et al., 2014; Guo et al., 2016; Montes and Melchinger, 2016; Siju et al., 2016). As an exception, Maghuly et al. (2015) found the most diverse population in Cape Verde by using 907 accessions from three continents. This was probably because they used only 19 Mesoamerican (Mexican) accessions from limited collection sites. Note that, in the same paper, they found that SNPs of some genes occurred mostly in Mexican accessions.

We further narrowed down the center of diversity to central Chiapas. The main part of central Chiapas consists of the Chiapas Central Depression (Pons et al., 2016), which is an ecoregion defined by its vegetative and geographic features. The Depression, which extends over five regions in central Chiapas, is composed of hot and semiarid lowlands dominated by deciduous shrubs. It is sandwiched between the northern highlands and central plateau and the southern Sierra Madre Mountains. The Grijalva river, running east–west, divides the Depression. The northern and southern parts of Chiapas have a humid tropical climate with plentiful rainfall. Average annual rainfall in some areas of these parts exceeds 3,000 mm, and there is large-scale tropical rainforest. Central Chiapas, on the other hand, has a less humid climate, with annual precipitation ranging from 750 to 1500 mm (Newton and Tejedor, 2011). The tropical rainforest in the peripheral areas of Chiapas may have prevented the natural diffusion of Jatropha from the central part, because the optimal rainfall for Jatropha is from 1000 to 1500 mm; too much rain and humidity promote fungal disease, and high rainfall might cause root damage (Ouwens et al., 2007). The isolated geographical and climatic situation of the Chiapas Central Depression may have favored the evolution of drought-adapted plant species. The overlap among the Jatropha centers of genetic and phenotypic variation, as well as the pronounced geographical and climatic conditions, suggests that the Chiapas Central Depression is most likely the center of origin of Jatropha.

The Chiapas Central Depression has the most diverse genomic resources, and thus unexplored useful natural variants or mutants can be expected there. It would be the most suitable region in which to search for Jatropha resources for breeding, which will contribute to enriching the pool of Jatropha for further genomic study. By using DNA markers derived from the genomic sequencing of an Asian accession (Sato et al., 2011), we could anchor the most diverse and various materials of Jatropha. Further genomic sequencing of Mexican accessions will lead to the identification of a larger number of effective genome-wide DNA markers, which will encourage genomic selection or genome-wide association studies (GWASs) of Jatropha from its center of origin by combining them with phenotypic data collected over years. Diverse genomic resources can also be used to study the interactions among genes and/or genomic regions. Our identification of the center of diversity not only pointed out the center of origin, but will also inspire further genomic study of Jatropha.

accessions that share the same ancestor of African and Asian Jatropha. The red line represents a current road between Veracruz and Chiapas.

We found a genetic grouping (Groups A, B, and C) that correlated with geographic location in Mesoamerica, which has not been reported previously. Our results suggest that Group A, which is distributed in central Chiapas, represents the original genetic group. Specific genotypes of Group A might have spread from the Chiapas Central Depression to bordering areas, and subsequently formed the distinct genotypes that comprise Group B and then C, according to their geographical distribution. We showed that African and Asian accessions were derived from Group B, rather than Group A or C. Further ABC analysis would be useful to make this conclusion more convincing.

In our study, we found that some Group B accessions from Veracruz and Guatemala were genetically close to African and Asian accessions (Supplementary Table 7). Because Veracruz and Guatemala are located in opposite directions (west and east, respectively) from the Chiapas Central Depression, the Jatropha in Group B from the peripheral areas of Chiapas, Veracruz and Guatemala might have experienced some genetic diffusion and exchange before evolving into the ancestral genotype of African and Asian Jatropha. This exchange would have involved human selection of specific Jatropha genotypes with advantageous phenotypes across these regions.

Identification of the two Veracruz accessions that are genetically close to African and Asian Jatropha showed that Veracruz was an important junction between Mesoamerica and the Old World for the transmission of Jatropha. Veracruz has been a major port facing the Gulf of Mexico since 1519, and there was no comparable port on the Caribbean coast of Guatemala. Moreover, the routes from Veracruz to other places in Mexico were gradually developed as the importance of the port increased. During the 16th and 17th centuries, Veracruz was one of the three major American ports for transatlantic trade. There was a fixed route between the Cape Verde Islands and Veracruz for slave transport from Africa (Thomas, 2006; Bryant, 2014). It is thus reasonable to speculate that Jatropha was taken to the Cape Verde Islands and then to Africa via the Port of Veracruz.

Interestingly, the two Veracruz accessions that carry all eight retrotransposons were collected at Soledad de Doblado and Entrada a Independencia in Veracruz, which are within 40 and 100 km of the Port of Veracruz, respectively. A Chiapas accession that is genetically close to the African and Asian group and also carries all the retrotransposons was collected in the Suchiapa region in the Chiapas Central Depression (**Figure 5**). It is also noteworthy that this Chiapas accession is in Group B, even though Group A represents the major genetic background there (**Figure 3**). The two Veracruz accessions also belong to Group B. The absence of genetically similar accessions between central Chiapas and the Port of Veracruz excludes a gradual natural expansion by seeds and favors the human-mediated spread of Jatropha from central Chiapas to the Port of Veracruz. These observations also suggest that selection took place when people brought Jatropha from central Chiapas. One important question is: who first selected the ancestral African and Asian Jatropha in central Chiapas? The Spanish conquistadors normally did not enter the Chiapas Central Depression, where no mineral resources were expected to occur (Ochiai, 1991). In addition, there exist various non-Spanish local names of Jatropha in Chiapas and Veracruz (unpublished results by Ando). Therefore, it is probable that the indigenous people of Mexico selected a limited genotype of Jatropha seeds or cuttings in central Chiapas and transported them to the area close to the Port of Veracruz, where they became the ancestor of the two Veracruz accessions. They would then have been transported by the Portuguese from the Port of Veracruz to Africa and Asia via the Cape Verde Islands (**Figure 5**) (Heller, 1996).

As a limited number of genotypes would have been brought to the Old World by Portuguese galleons, crosses could have occurred only among small populations in the Old World plantings, yielding very low genetic variation. This breeding bottleneck represents a typical example of the founder effect and is probably the major cause of the failure of Jatropha breeding in African and Asian countries. In addition, the accumulation of mildly deleterious genes exposed in the homozygous state by inbreeding might cause inbreeding depression. In fact, one of the Veracruz accessions (No. 127), which has been suggested to share a common ancestor with African and Asian accessions as discussed above, had low values for the four yield-related traits (number of inflorescences per plant: 3; number of female flowers per plant: 18; ratio of female to male flowers: 0.06 and seed yield per plant: 15.10 g) compared with those in each region and group in Mexico (Supplementary Table 4), suggesting that the low seed yield of African and Asian accessions is genetically determined. Moreover, Mesoamerican people seem not to have placed any value on the seed-yielding potential of Jatropha, as they selected and brought Jatropha seeds or cuttings from Chiapas to Veracruz for use as live fences and medicines (Duke, 1985; Prasad et al., 2012; Agbogidi et al., 2013), and these traditional applications are still widespread among Mesoamerican inhabitants today. Fences and medicines are the likely reasons why the Portuguese shipped the plant to Africa and Asia (Sabandar et al., 2013). Since then, seed productivity did not have a high priority in most parts of Africa and Asia until recently. After arrival in Africa and Asia, epigenetic mutations might have occurred over hundreds of years in adaptation to the local climate, causing minor phenotypic variations, as discussed in previous reports (Montes Osorio et al., 2014; Yue et al., 2014).

Based on our findings that the genetic basis of African and Asian accessions is Group B in Mesoamerica and that Group A shows the highest genetic diversity in Mesoamerica, we propose the following strategy for the genetic improvement of African and Asian Jatropha. First, make wide crosses between elite lines from each country in Africa and Asia, which should carry epigenetic adaptations to the local climatic conditions, and accessions of Group A from central Chiapas, which are genetically distant from African and Asian Jatropha and have the largest genetic diversity. Not only hybrid vigor, but also some extent of phenotypic variation derived from the high heterozygosity of Group A, would be expected in the F<sub>1</sub> progenies. Second, select elite F<sub>1</sub> or F<sub>2</sub> progenies by phenotypic performance. Finally, use vegetative propagation (Behera et al., 2014) of the selected progenies to preserve excellent lines in the heterozygous state. This strategy would be better than crossing among African and Asian Jatropha, or than simply transplanting the Mexican one, because of the higher genetic diversity, heterozygosity and climate adaptability. Production of Jatropha cultivars with high productivity in Africa and Asia will help in the mass commercial cultivation of Jatropha there. High production of Jatropha seeds can ensure a stable and sufficient supply of biofuel, which will help in the battle against climate change.

#### AUTHOR CONTRIBUTIONS

HL wrote the manuscript; HL, ST, KH, and KF discussed the results, organized and revised the manuscript; MY supported and conducted some statistical analyses; HS and NW commented on the manuscript and supported the experiment; HL, AA, and TS performed the genotyping experiment; AT, HTs, TA, and HTo suggested and helped with the data collection and analyses; SS and HH designed SSR markers and performed part of the marker analysis; VQ and AZ collected Mexican accessions and provided the phenotype data of Mexican accessions; PS collected and provided Philippine accessions; AH provided one Egyptian accession; AMA provided two Sudanese accessions. KF organized the project.

# ACKNOWLEDGMENTS

We would like to thank Dr. Satoshi Tabata for offering scientific assistance and Dr. Yoshiaki Kitaya for providing Jatropha samples from Vietnam. Plant Bioengineering for Bioenergy Laboratory was supported by Corporate Social Responsibility (CSR) Foundation of the Sumitomo Electric Industries (SEI), Ltd. We gratefully acknowledge Mr. Masayori Matsumoto, Mr. Masanori Yoshikai, Mr. Toshifumi Hosoya, Mr. Yasuhisa Yushio, and Mr. Naoki Ikeguchi of SEI, Ltd., for their support to KF. This study was conducted in part under the Joint Research Program of the Arid Land Research Center, Tottori University, Japan and INIFAP, Mexico.

#### REFERENCES


#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017.01539/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer NO declared a shared affiliation, with no collaboration, with one of the authors MY to the handling Editor.

Copyright © 2017 Li, Tsuchimoto, Harada, Yamasaki, Sakai, Wada, Alipour, Sasai, Tsunekawa, Tsujimoto, Ando, Tomemori, Sato, Hirakawa, Quintero, Zamarripa, Santos, Hegazy, Ali and Fukui. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Development of Highly Informative Genome-Wide Single Sequence Repeat Markers for Breeding Applications in Sesame and Construction of a Web Resource: SisatBase

Komivi Dossa<sup>1,2†</sup>, Jingyin Yu<sup>1†</sup>, Boshou Liao<sup>1</sup>, Ndiaga Cisse<sup>2</sup>\* and Xiurong Zhang<sup>1</sup>\*

<sup>1</sup> Key Laboratory of Biology and Genetic Improvement of Oil Crops, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Ministry of Agriculture, Wuhan, China, <sup>2</sup> Centre d'Etudes Régional pour l'Amélioration de l'Adaptation à la Sécheresse, Thiès, Senegal

#### Edited by:

Mariela Torres, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

#### Reviewed by:

Harsh Raman, NSW Department of Primary Industries, Australia Ioannis Ganopoulos, Institute of Plant Breeding and Genetic Resources-ELGO DEMETER, Greece

#### \*Correspondence:

Xiurong Zhang zhangxr@oilcrops.cn Ndiaga Cisse cissendiaga02@hotmail.com

†These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Crop Science and Horticulture, a section of the journal Frontiers in Plant Science

Received: 23 June 2017 Accepted: 08 August 2017 Published: 22 August 2017

#### Citation:

Dossa K, Yu J, Liao B, Cisse N and Zhang X (2017) Development of Highly Informative Genome-Wide Single Sequence Repeat Markers for Breeding Applications in Sesame and Construction of a Web Resource: SisatBase. Front. Plant Sci. 8:1470. doi: 10.3389/fpls.2017.01470

The sequencing of the full nuclear genome of sesame (Sesamum indicum L.) provides the platform for functional analyses of genome components and their application in breeding programs. Although the importance of microsatellite markers, or simple sequence repeats (SSRs), in crop genotyping, genetics, and breeding applications is well established, little information exists concerning SSRs at the whole-genome level in sesame. In addition, SSRs represent a suitable marker type for sesame molecular breeding in the developing countries where it is mainly grown. In this study, we identified 138,194 genome-wide SSRs, of which 76.5% were physically mapped onto the 13 pseudo-chromosomes. Among these SSRs, up to three primer pairs were supplied for 101,930 SSRs and used to amplify in silico the reference genome together with two newly sequenced sesame accessions. A total of 79,957 SSRs (78%) were polymorphic between the three genomes, suggesting their promising use in different genomics-assisted breeding applications. From these polymorphic SSRs, 23 were selected and validated to have high polymorphic potential in 48 sesame accessions from different growing areas of Africa. Furthermore, we have developed a user-friendly online database, SisatBase (http://www.sesame-bioinfo.org/SisatBase/), which provides free access to SSR data as well as an integrated platform for functional analyses. Altogether, the reference SSRs and SisatBase would serve as useful resources for genetic assessment, genomic studies, and breeding advancement in sesame, especially in developing countries.

Keywords: sesame, microsatellite, web resource, informative markers, molecular breeding

# INTRODUCTION

In recent years, developments in genetic studies and the decrease in genotyping costs have resulted in rapid growth in the use of molecular markers (Kantartzi, 2013). Different genetic marker systems have been developed, including restriction fragment length polymorphism (RFLP), randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), sequence-related amplified polymorphism (SRAP), Diversity Arrays Technology (DArT), restriction-site associated DNA sequencing (RADseq), single-nucleotide polymorphism (SNP), specific-locus amplified fragment sequencing (SLAFseq), and random selective amplification of microsatellite polymorphic loci (RSAMPL). However, simple sequence repeats (SSRs), also known as microsatellites, have become the molecular marker of choice because of their versatility, operational flexibility, and low cost. This has provided the foundation for their successful application in a wide range of fundamental and applied fields, such as genetic diversity, linkage/association mapping of genes/QTLs, marker-assisted selection (MAS), variety identification, and evolution analysis (Jiao et al., 2012; Zhang Q. et al., 2012; Li et al., 2014; Shi et al., 2014; Dossa et al., 2016c).

SSRs are relatively short tandem repeats (STRs) of DNA that are widely distributed throughout whole genomic sequences (Sharma, 2007). They are present in coding regions but are more abundant in non-coding regions (Hancock, 1995). They are characterized by a high co-dominant inheritance, reproducibility, and multi-allelic variation (Morgante and Olivieri, 1993; Kalia et al., 2011). In addition, SSRs have been demonstrated to have several important biological functions including the regulation of chromatin organization, DNA metabolic processes, gene activity, and RNA structure (Li et al., 2002, 2004).

Sesame (Sesamum indicum L.) is an emerging oil crop with one of the highest oil contents (up to 64%) and oil qualities among major oilseed crops (Dossa et al., 2017). It is mainly grown in developing countries and, as such, its improvement through modern molecular breeding techniques has lagged behind that of other oilseed crops. Up to now, different types of molecular markers have been developed and applied to sesame genotyping and breeding efforts, such as RAPD (Bhat et al., 1999; Ercan et al., 2004), inter-SSR (ISSR) (Kim et al., 2002), and AFLP (Laurentin and Karlovsky, 2006), but SSR has been the preferred marker (Zhang H. et al., 2012; Zhang Y. et al., 2012; Yepuri et al., 2013; Wei et al., 2014; Dossa et al., 2016c). Despite their importance in gene mapping and MAS, only a few SSR markers are available for sesame research, and the available ones fail to adequately represent the entire genome (Dossa, 2016). More importantly, there is no database for searching sesame SSR information at the whole genome level and performing functional analyses, as has been developed in other crops such as chickpea (CicArMiSatDB: Doddamani et al., 2014; CMsDB: Parida et al., 2015), Cucumis melo (CmMDb: Bhawna et al., 2015), tomato (TomSatDB: Iquebal et al., 2013), sugar beet (SBMDb: Iquebal et al., 2015), and brassicas (Shi et al., 2014).

The completion of the full nuclear genome sequence (Wang et al., 2014a) recently updated (Wang et al., 2016) and the newly sequenced landraces (Wei et al., 2015, 2016) provide a cardinal framework to identify highly informative SSRs at the whole genome level. In this study, we took advantage of these three genome sequence resources and provided not only a large amount of genome-wide informative SSR markers for large-scale genotyping and breeding research in sesame, but also a user-friendly online database for convenient search and functional analyses of SSRs.

# MATERIALS AND METHODS

#### Data Source

Three genome sequences of the cultivated sesame including the reference genome from the elite variety "Zhongzhi13" (Wang et al., 2014a, 2016) and the genome sequences of the landraces "Baizhima" and "Mishuozhima" (Wei et al., 2015, 2016) were downloaded from Sinbase<sup>1</sup> (Wang et al., 2014b) and SesameFG<sup>2</sup> (Wei et al., 2017), respectively. It is noteworthy that in this study, the latest version (v2) of the reference genome (Wang et al., 2016) with 13 pseudo-chromosomes (309 Mb) was employed for identifying microsatellites while the draft genome sizes of "Baizhima" and "Mishuozhima" are 267 and 254 Mb, respectively.

#### Microsatellite Mining and Primer Designing

Perl scripts from MISA (Thiel et al., 2003) were used to identify SSRs in the reference genome sequence. Perfect microsatellites, as well as compound microsatellites interrupted by a certain number of bases, were searched (Yu et al., 2016). The parameters were set to detect mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide (nt) motifs with a minimum of 10, 6, 5, 5, 5, and 5 repeats, respectively. Compound SSRs were defined as ≥2 repeats interrupted by ≤100 bp. Primer3 software (Untergasser et al., 2012) was employed to design up to three primer pairs for each identified SSR. We named all SSRs from SiSSM1 to SiSSMxx following their order on the pseudo-chromosomes and unanchored sequences. To identify the SSRs within genic regions, the general feature format (GFF) files of genes or transcripts were combined with the positions of the SSRs located on pseudo-chromosomes. The genes or transcripts linked to each SSR, along with their biological functions, were retrieved from "Sinbase." In addition, Circos (Krzywinski et al., 2009) was used to construct the diagram of SSR density and genomic features in sesame.
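The repeat thresholds given above for perfect SSRs can be illustrated with a short regular-expression scan. This is only a minimal sketch under stated assumptions, not the MISA implementation: the function names `is_primitive` and `find_perfect_ssrs` and the toy sequence in the usage note are hypothetical, compound SSRs and primer design are not handled, and overlapping hits from different motif lengths are simply reported side by side.

```python
import re

# Minimum repeat counts per motif length, as stated in the text:
# mono = 10, di = 6, tri = 5, tetra = 5, penta = 5, hexa = 5.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_perfect_ssrs(seq):
    """Return sorted (start, end, motif, repeats) tuples; 0-based start, end exclusive."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' x 6, which is already reported as 'A' x 12.
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)
```

For a toy sequence such as `"GGC" + "A" * 12 + "CGC" + "AT" * 7 + "GCC" + "AAG" * 5 + "TTC"`, the function reports one mono-, one di-, and one tri-nucleotide SSR. A production tool would additionally merge neighbouring hits into compound SSRs according to the ≤100 bp rule.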

#### Electronic Polymerase Chain Reaction

The primer pairs of 105,879 microsatellites located on the 13 pseudo-chromosomes were used to in silico amplify the genomic sequences of "Zhongzhi13," "Mishuozhima," and "Baizhima," employing the software GMATA (Wang and Wang, 2016). The primer nucleotide mismatch allowed was no more than one nucleotide and other parameters were set as default. The polymorphic primers were selected based on difference in number of repeat-units present in the three genomes.
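The screening logic described above (primer matching with at most one mismatch, then comparison of repeat-unit counts across genomes) can be sketched as follows. This is an illustrative toy under stated assumptions, not the GMATA algorithm: all function names are hypothetical, mismatches are counted as substitutions only, the first valid primer pair is taken as the amplicon, and the repeat count is the longest uninterrupted run of the motif.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(genome, primer, max_mismatch=1):
    """Start positions where the primer matches with at most max_mismatch substitutions."""
    n = len(primer)
    return [i for i in range(len(genome) - n + 1)
            if sum(a != b for a, b in zip(genome[i:i + n], primer)) <= max_mismatch]


def in_silico_amplicon(genome, fwd, rev, max_mismatch=1):
    """Return the first amplicon bounded by the two primers, or None if none is found."""
    rev_site = revcomp(rev)  # the reverse primer anneals to the opposite strand
    for start in find_sites(genome, fwd, max_mismatch):
        for r in find_sites(genome, rev_site, max_mismatch):
            if r >= start + len(fwd):
                return genome[start:r + len(rev_site)]
    return None


def repeat_units(amplicon, motif):
    """Number of units in the longest uninterrupted run of the motif."""
    runs = re.finditer("(?:%s)+" % re.escape(motif), amplicon)
    return max((len(m.group()) // len(motif) for m in runs), default=0)


def is_polymorphic(genomes, fwd, rev, motif):
    """True if the amplified genomes differ in their repeat-unit counts."""
    counts = set()
    for genome in genomes:
        amplicon = in_silico_amplicon(genome, fwd, rev, max_mismatch=1)
        if amplicon is not None:
            counts.add(repeat_units(amplicon, motif))
    return len(counts) > 1
```

The brute-force site search is quadratic and only suitable for toy sequences; genome-scale tools rely on indexed search instead.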

<sup>1</sup>www.ocri-genomics.org/Sinbase/index.html

<sup>2</sup>http://ncgr.ac.cn/SesameFG/

TABLE 1 | Characteristics of SSRs identified in the whole genome of sesame.


TABLE 2 | Chromosome wise distribution of SSR types in the sesame genome.


#### Plant Materials and DNA Extraction

A total of 48 accessions of the cultivated sesame (S. indicum L., 2n = 26), comprising landraces and modern cultivars grown in 12 countries of West, Central, and East Africa, were used in this study (Supplementary Table S1).

Leaves from a single 2-week-old seedling per accession were used for DNA isolation by the cetyltrimethylammonium bromide (CTAB) method described by Dossa et al. (2016c). DNA quality and quantity were assessed on a 1.5% agarose gel and by spectrophotometry (NanoDrop 2000, Thermo Scientific, Wilmington, DE, United States), respectively. DNA samples were stored at −20°C for further use.

#### Polymerase Chain Reaction, Electrophoresis, and Data Analysis

A subset of 23 SSR markers providing coverage across all 13 pseudo-chromosomes was selected from the polymorphic markers identified through electronic polymerase chain reaction (e-PCR), to validate their polymorphism potential among the 48 sesame accessions. PCR was conducted as described by Dossa et al. (2016c). Briefly, PCR was performed in a total volume of 15 µL containing 30 ng of DNA, 1 pmol of each primer, 0.2 U Taq DNA polymerase, and 2× reaction mix supplied with the dNTPs and MgCl<sub>2</sub>. The PCR program was 94°C (5 min), 35 cycles of 94°C (30 s), 55°C (30 s), and 72°C (30 s), followed by a final extension step of 5 min at 72°C. The PCR amplicon sizes were scored in base pairs (bp) based on migration relative to the internal size standard 400HD-ROX (Applied Biosystems, Foster City, CA, United States) on an ABI 3130xl Genetic Analyzer (Applied Biosystems). Additionally, the amplified products were electrophoretically separated on a 1.5% agarose gel in TAE buffer and stained with ethidium bromide.

The number of alleles (Na), major allele frequency (MAF), and polymorphic information content (PIC) were calculated with the software PowerMarker version 3.25 (Liu and Muse, 2005). Moreover, to identify the pair-wise genetic relationships between the 48 accessions, a neighbor-joining (NJ) tree based on Nei genetic distance (Nei, 1972) was drawn in MEGA version 7 (Kumar et al., 2016).
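The three per-locus statistics named above can be computed from allele frequencies as in the following sketch. It is not PowerMarker code: the function name `locus_stats` and the input layout (a flat list of allele calls, with `None` for missing data) are assumptions made for illustration. PIC is computed with the standard formula $\mathrm{PIC} = 1 - \sum_i p_i^2 - \sum_{i<j} 2 p_i^2 p_j^2$, where $p_i$ is the frequency of allele $i$.

```python
from collections import Counter
from itertools import combinations


def locus_stats(alleles):
    """Return (Na, major allele frequency, PIC) for one locus.

    alleles: allele calls (e.g. fragment sizes in bp) pooled over all
    gene copies of all accessions; None marks missing data.
    """
    calls = [a for a in alleles if a is not None]
    counts = Counter(calls)
    freqs = [c / len(calls) for c in counts.values()]
    na = len(freqs)                      # number of distinct alleles
    maf = max(freqs)                     # frequency of the most common allele
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    pic = 1 - homozygosity - correction
    return na, maf, pic
```

For example, a locus with three alleles at frequencies 0.5, 0.25, and 0.25 gives Na = 3, MAF = 0.5, and a PIC of about 0.55.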

#### Development of SisatBase

The process of SisatBase development can be divided into two steps: (i) integration and consolidation of microsatellites data and (ii) developing SisatBase and embedding useful tools.

The datasets were curated to create logical relationships among the different types of microsatellite data for their integration in SisatBase. Thereafter, SisatBase was developed using the LAMP (Linux + Apache + MySQL + Perl/PHP/Python) web application platform. HyperText Markup Language (HTML) and JavaScript were also used to develop a user-friendly web interface. To enrich the functions of SisatBase, Browse, Search, customized BLAST, and MISAweb tools were developed so that users can conveniently browse, search, and identify SSRs in the sesame genome (Altschul et al., 1997; Stein, 2013).

# RESULTS

#### Identification, Characteristics, and Genomic Distribution of SSRs in the Sesame Genome

A total of 138,194 non-redundant microsatellites were identified from 4,449 sequence scaffolds representing 94.3% of the assembled genome of sesame, with an average of 507 microsatellites per Mb (**Table 1**). Mono-nucleotide and di-nucleotide SSRs were the most represented repeat types (92.5% of the whole genome SSRs) with 79% as perfect SSR types, while the remaining were in compound forms. The most prevalent motif types were A/T, accounting for 91.85% of the total mono-nucleotide repeats. For di-nucleotide motifs, the dominant motif was "AT," accounting for 50.38% of the total di-nucleotide repeats. Overall, the dominant/major motifs (A, AT, AAG/AAT, AAAT, AAAAT, and AAAAAT) were all A/T rich, whereas the absent/scarce motifs were mostly C/G rich.

Of these microsatellites, 76.5% (105,880 SSRs) were successfully mapped onto the 13 pseudo-chromosomes ("chr") of the sesame genome (**Table 2** and **Figure 1**). Overall, SSRs are distributed throughout the "chr," with some regions exhibiting higher density than others. The chr3 displayed the highest number of SSRs (10.5% of all mapped SSRs), followed by chr6, chr8, and chr9, accounting for 9.78, 9.46, and 9.46% of all mapped SSRs, respectively. The chr11 harbored the lowest number of SSRs (5,686; 7.74%). Based on the physical location of each SSR and the GFF files of genes or transcripts, we found that 18.84% of the total mapped SSRs were located in genic regions. Next, we estimated the relationship between the "chr" length and the number of SSRs harbored on each "chr" and found a high correlation (r<sup>2</sup> = 0.94) (**Figure 2**).
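An r<sup>2</sup> value of this kind is commonly obtained as the squared Pearson correlation between two paired series, here pseudo-chromosome length and SSR count. The sketch below shows that calculation; the function name `r_squared` is hypothetical, and the source does not state which software or exact procedure produced the reported value.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)
```

In practice, `xs` would hold the 13 pseudo-chromosome lengths and `ys` the corresponding SSR counts from Table 2.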

#### Primer Designing and e-PCR Based Polymorphic Screening of the Developed SSRs among Three Sesame Genome Sequences

With the release of new genome sequences from two landraces ("Baizhima" and "Mishuozhima"), it is now possible to provide a set of polymorphic SSRs at the whole genome level. First, we successfully designed up to three primer pairs from flanking sequences of 104,617 SSRs (98.80% of all SSRs). Secondly, we extracted 101,930 SSRs with primers (97.4%) which were located on the 13 "chr." Thirdly, we in silico amplified the three genomes mentioned above with the 101,930 SSRs. A total of 92,210 SSRs (90.5%) were conserved between the three genomes, including 79.1% of the total genic SSR markers. Of these SSRs, 86,414 (93.7%) were polymorphic between "Zhongzhi13" and "Baizhima," 85,753 (93%) showed polymorphism between "Zhongzhi13" and "Mishuozhima," and finally 79,957 (86.7%) SSRs were extracted as informative markers since they were polymorphic between the three genotypes (**Figure 3**). It is worth mentioning that the number of SSRs exhibiting polymorphism decreased with the increase of SSR repeat-length variation.
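The percentages in this paragraph appear to rest on two different denominators, which is easy to overlook. The following check uses only the counts quoted in the text; the dictionary keys are our own labels, and the choice of denominator for each percentage is inferred from the arithmetic rather than stated by the authors.

```python
# SSR counts quoted in the text (keys are illustrative labels).
counts = {
    "with_primers_on_chr": 101_930,
    "conserved": 92_210,
    "poly_zhongzhi13_vs_baizhima": 86_414,
    "poly_zhongzhi13_vs_mishuozhima": 85_753,
    "informative": 79_957,
}


def pct(part, whole, digits=1):
    """Percentage of part in whole, rounded to the given number of digits."""
    return round(100 * part / whole, digits)


# Conserved SSRs relative to all SSRs used for e-PCR.
conserved_pct = pct(counts["conserved"], counts["with_primers_on_chr"])
# Polymorphic SSRs relative to the conserved set.
baizhima_pct = pct(counts["poly_zhongzhi13_vs_baizhima"], counts["conserved"])
mishuozhima_pct = pct(counts["poly_zhongzhi13_vs_mishuozhima"], counts["conserved"])
informative_pct = pct(counts["informative"], counts["conserved"])
# The same informative set relative to all SSRs used for e-PCR.
informative_of_all_pct = pct(counts["informative"], counts["with_primers_on_chr"], 0)
```

Under this reading, the 86.7% reported here and the 78% given in the abstract describe the same 79,957 markers, expressed relative to the conserved set and to all 101,930 tested SSRs, respectively.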

#### Amplification and Polymorphic Potential of Selected SSRs among 48 Sesame Accessions

From the 79,957 informative markers, we selected 23 SSRs covering all 13 "chr" with the aim of confirming their allelic variation among 48 sesame genotypes. Interestingly, only two markers failed to amplify in three accessions, probably owing to DNA quality issues. More importantly, all markers (100%) were polymorphic among the 48 sesame accessions. In total, 123 distinct alleles were obtained, ranging from three (SiSSM105280, SiSSM11029, SiSSM35870, SiSSM61314, SiSSM59616, SiSSM78138, SiSSM91614, and SiSSM104985) to nine alleles (SiSSM46381), with an average allele number of 4.24 per locus. The mean MAF and PIC were estimated at 0.51 and 0.60, respectively (**Figure 4A** and **Table 3**). Based on Nei's genetic distance between the 48 accessions, we constructed an NJ tree which divided the germplasm into three main groups (**Figure 4B**). Some geographical clustering patterns could be observed: the first group, named "East Africa," gathered the two accessions from Ethiopia. The second cluster, called "West Africa," was composed only of West African accessions from Senegal, Niger, Togo, Burkina Faso, Guinea, Benin, and Mali. The last group, named "West and Central Africa," clustered together the accessions from Nigeria, Cameroon, Senegal, Ghana, Benin, Ivory Coast, Niger, and Togo (**Figure 4C**).
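The Nei (1972) standard genetic distance underlying such a tree is $D = -\ln I$, with $I = J_{xy} / \sqrt{J_x J_y}$, where $J_x$, $J_y$, and $J_{xy}$ are sums over loci of $\sum_i x_i^2$, $\sum_i y_i^2$, and $\sum_i x_i y_i$ for allele frequencies $x_i$ and $y_i$ in the two units compared. The sketch below is an assumption-laden illustration, not PowerMarker or MEGA code: the function name and the dict-per-locus input layout are hypothetical, and how the software treats individual accessions and missing data is not described here.

```python
import math


def nei_1972_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance between two units.

    freqs_x, freqs_y: lists with one dict per locus (same locus order),
    each mapping allele -> frequency within that unit.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    # Per-locus averaging cancels in the ratio because both units share the same loci.
    identity = jxy / math.sqrt(jx * jy)
    # Undefined (math domain error) when the two units share no alleles at any locus.
    return -math.log(identity)
```

A matrix of such pairwise distances is the input that a neighbor-joining routine then turns into a tree.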

#### SisatBase: An Online Database for SSR Functional Analysis in Sesame

To facilitate the exploitation of SSRs at the whole genome level in sesame, we developed an online database with an easy-to-use interface<sup>3</sup> (**Figure 5A**). SisatBase supplies basic information for SSRs, including location on chromosomes, SSR type, SSR size, and up to three primer pairs for each SSR entry, as well as the functional genes associated with the SSRs. In addition, SisatBase provides the polymorphic SSRs among different sesame genotypes. SisatBase also supplies useful search tools, including keyword, SSR type, and SSR location searches, which help users to obtain the SSR information of interest (**Figures 5B,C**). Customized BLAST and MISAweb are also embedded in SisatBase to help users conveniently retrieve or identify SSRs with primers in genomic regions or genomic sequences of interest (**Figure 5D**).

# DISCUSSION

While the integration of molecular marker technologies has significantly improved the speed and precision of modern plant breeding, molecular research in sesame has lagged behind that of model crops, mainly because sesame is a minor crop often grown by smallholders in developing countries. Hence, highly informative molecular marker systems with the advantage of easy and low-cost detection are essential for sesame breeding research. Microsatellite markers are arguably the best candidate, and in this study we identified 138,194 SSRs at the whole genome level, along with their primer pairs and genome locations.

The number of SSRs identified and the SSR density were higher than in previous reports in sesame, mainly because the genomic sequences examined in this study are more extensive (Wei et al., 2014; Uncu et al., 2015; Dossa, 2016). Furthermore, by exploiting the latest version of the reference genome, we are able to provide more accurate positions of SSRs in the sesame genome than previous reports. This should be helpful for gene fine-mapping and association analysis in sesame. Mono-nucleotide and di-nucleotide repeats accounted for 92.5% of the whole genome SSRs in sesame. Our results are in agreement with the conclusions of Cardle et al. (2000) and Sonah et al. (2011), who identified mono-nucleotide and di-nucleotide repeats as the predominant repeat types in several plant genomes including Arabidopsis thaliana, Brachypodium distachyon, Sorghum bicolor, Oryza sativa, Medicago truncatula, and Populus trichocarpa. As in previous reports by Wei et al. (2014), Uncu et al. (2015), and Dossa (2016), the predominance of A/T rich motifs is in accordance with the AT (0.68) vs. GC (0.32) content of the sesame genome (Wang et al., 2014a). The same findings were also observed in Brassica rapa (Xu et al., 2010; Shi et al., 2014), Brassica napus (Cheng et al., 2009), Brassica oleracea (Li et al., 2011), and cucumber (Cavagnaro et al., 2010). The high correlation between SSR number and pseudo-chromosome length suggests that this type of DNA contributes considerably to the length of the sesame pseudo-chromosomes.

In sesame, SSRs were more concentrated in intergenic regions than in genic regions, which is consistent with findings in Oryza sativa japonica (Zhang et al., 2007), maize (Xu et al., 2013), and other crops (Hancock, 1995). The landraces "Baizhima" and "Mishuozhima" exhibited similar polymorphic rates with the genome of "Zhongzhi13." This suggested that the two landraces are much closer to each other than to the elite variety "Zhongzhi13." Our findings are in agreement with the conclusions of Wei et al. (2016), who found that the two landraces clustered together and were more closely related in the phylogenetic tree compared to "Zhongzhi13." We further discovered that the majority of genic SSRs in the sesame genome were found among the markers conserved between the three genotypes. This result is understandable given that SSRs within genic regions are associated with genes, which constitute the more conserved component of the genome within species (Xiao et al., 2016). On the other hand, this implies that the conserved set of SSRs might be related to important genes which were retained during improvement from landraces to elite cultivar, as demonstrated in soybean (Zhou et al., 2015). Therefore, we infer that these genic informative microsatellites may be linked to some important biological functions and could be potential tools for sesame breeding (Lata et al., 2014; Dossa et al., 2016a,b).

<sup>3</sup>http://www.sesame-bioinfo.org/SisatBase/

TABLE 3 | Polymorphism information of the 23 selected SSR markers.

Min 3 0.23 0.20


To our knowledge, no specific molecular markers have been developed for other related species in the Sesamum genus. It has been demonstrated that SSR markers have good transferability between species of the same genus or even within the same taxa (Fan et al., 2013; Buso et al., 2016; Huang et al., 2016; Thakur et al., 2017). In sesame, Uncu et al. (2015) uncovered a high rate of SSR marker transferability between the cultivated species S. indicum and the proposed wild ancestor species S. malabaricum. In addition, different sets of SSR markers developed in the cultivated sesame also yielded good amplicons in wild relatives including Sesamum radiatum, S. angustifolium, S. latifolium, and S. angolense (Zhang H. et al., 2012; Nyongesa et al., 2013; Wu et al., 2014). Based on these reports, we speculate that our informative SSR markers might be relevant for other wild relatives in the Sesamum genus. This would be significant for the genetic improvement of the cultivated form by exploiting the potential of the wild relatives (Dossa et al., 2017). Such SSR markers transferable between Sesamum species could be used for macro-synteny studies, genetic mapping, and molecular breeding. Therefore, in future studies, we will employ several wild relatives of the Sesamum genus as well as a diverse panel of the cultivated sesame to evaluate the cross-species transferability of our SSR markers and initiate genetic research in the wild relatives.

Although some SSR sets have been previously identified in the sesame genome, transcriptome, etc. (Spandana et al., 2012; Zhang H. et al., 2012; Wei et al., 2014; Uncu et al., 2015; Dossa, 2016), information regarding their amplification efficiency and polymorphic potential is limited. In the present study, we took advantage of the three available sequenced genomes to screen for amplification efficiency and polymorphism potential of our SSR markers. This led to the identification of 79,957 informative SSR markers, of which 23 selected SSRs successfully discriminated 48 genotypes from Africa based on their geographical origins. This result suggests that e-PCR is a useful strategy for rapid screening and effective identification of informative markers (Wang and Wang, 2016; Xiao et al., 2016). In the work of Dossa et al. (2016c), 33 polymorphic SSRs were employed to assess the genetic diversity of 96 sesame accessions from Africa and Asia, which revealed high genetic diversity within the African germplasm. The 23 selected SSRs used in the present study to scan the diversity of 48 African accessions were all polymorphic and yielded a comparable number of alleles (123 vs. 137), although fewer genotypes were examined here. Similarly, high genetic diversity was observed in the studied germplasm, indicating that the 79,957 informative SSR markers could be effectively considered as the reference SSRs for large-scale genotyping and molecular breeding research in sesame (Billot et al., 2012).

All SSR data were integrated into SisatBase which also supplied useful and user-friendly tools to assist users to extract more information related to SSR markers in the sesame genome. The database will be continuously updated with new versions of the sesame genome. Moreover, with the aim of extending the utility of SisatBase over other species of the Sesamum genus, new information about the cross-species transferable SSR markers as well as novel and specific SSRs for each species will be supplied in the future.

# CONCLUSION

In conclusion, based on the latest version of the sesame reference genome and the two newly released genome sequences, we identified 138,194 SSRs, of which 79,957 are proposed as the reference SSRs for future genetics/genomics and breeding studies in sesame. All microsatellite data reported in this study are integrated into a user-friendly online database (SisatBase) for convenient exploitation and further functional analyses. These tools should help to speed up sesame molecular breeding, especially in developing countries.

# AUTHOR CONTRIBUTIONS

KD and JY produced the sesame SSR data, developed the online database, and drafted the manuscript. KD performed the experiments. BL, NC, and XZ designed the project, supervised the work, and revised the draft manuscript. All authors have read and approved the final manuscript.

# FUNDING

This study was financially supported by The China Agriculture Research System (CARS-15) and The Agricultural Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2013-OCRI). The first author is grateful to the fellowship offered by the Chinese Scholarship Council (2015GXY934).

# ACKNOWLEDGMENTS

We acknowledge Mr. Li Donghua for helping in some figure configuration and Mr. Thomas Roberts for the language editing assistance.

# REFERENCES


# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017.01470/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Dossa, Yu, Liao, Cisse and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The First Molecular Identification of an Olive Collection Applying Standard Simple Sequence Repeats and Novel Expressed Sequence Tag Markers

Soraya Mousavi<sup>1</sup> , Roberto Mariotti<sup>2</sup> , Luca Regni<sup>3</sup> , Luigi Nasini<sup>3</sup> , Marina Bufacchi<sup>1</sup> , Saverio Pandolfi<sup>2</sup> , Luciana Baldoni<sup>2</sup> and Primo Proietti<sup>3</sup> \*

<sup>1</sup> Consiglio Nazionale delle Ricerche – Institute for Agricultural and Forest Systems in the Mediterranean, Perugia, Italy, <sup>2</sup> Consiglio Nazionale delle Ricerche – Institute of Biosciences and Bioresources, Perugia, Italy, <sup>3</sup> Department of Agricultural, Food and Environmental Sciences, Università degli Studi di Perugia, Perugia, Italy

#### Edited by:

Mariela Torres, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

#### Reviewed by:

Rosario Muleo, Università degli Studi della Tuscia, Italy Raul De La Rosa, Andalusian Institute of Agricultural and Fisheries Research and Training, Spain

> \*Correspondence: Primo Proietti primo.proietti@unipg.it

#### Specialty section:

This article was submitted to Crop Science and Horticulture, a section of the journal Frontiers in Plant Science

> Received: 21 March 2017 Accepted: 07 July 2017 Published: 19 July 2017

#### Citation:

Mousavi S, Mariotti R, Regni L, Nasini L, Bufacchi M, Pandolfi S, Baldoni L and Proietti P (2017) The First Molecular Identification of an Olive Collection Applying Standard Simple Sequence Repeats and Novel Expressed Sequence Tag Markers. Front. Plant Sci. 8:1283. doi: 10.3389/fpls.2017.01283

Germplasm collections of tree crop species represent fundamental tools for the conservation of diversity and key steps for its characterization and evaluation. For the olive tree, several collections have been created all over the world, but only a few of them have been fully characterized and molecularly identified. The olive collection of Perugia University (UNIPG), established in the 1960s, represents one of the first attempts to gather and safeguard olive diversity, keeping together cultivars from different countries. In the present study, a set of 370 previously uncharacterized olive trees was screened with 10 standard simple sequence repeats (SSRs) and nine new EST-SSR markers, to correctly and thoroughly identify all genotypes, verify their representativeness of the entire cultivated olive variation, and validate the effectiveness of the new markers in comparison to standard genotyping tools. The SSR analysis revealed the presence of 59 genotypes, corresponding to 72 well-known cultivars, 13 of which were present exclusively in this collection. The new EST-SSRs showed values of diversity parameters quite similar to those of the best standard SSRs. When compared to hundreds of Mediterranean cultivars, the UNIPG olive accessions were split among the three main populations (East, Center, and West Mediterranean), confirming that the collection is well representative of the entire olive variability. Furthermore, Bayesian analysis, performed on the 59 genotypes of the collection with both sets of markers, demonstrated their splitting into four clusters, with a well-balanced membership obtained by EST-SSRs with respect to standard SSRs. The new OLEST (Olea expressed sequence tags) SSR markers proved as effective as the best standard markers. The information obtained from this study represents a highly valuable tool for ex situ conservation and management of olive genetic resources, useful for building a common database from worldwide olive cultivar collections, also based on recently developed markers.

Keywords: genetic variability, ex situ conservation, germplasm management, genotyping, EST-SSR, Olea europaea L.

# INTRODUCTION

The cultivated olive (Olea europaea, subsp. europaea, var. europaea, Green, 2002) is one of the most important oil crops in the world, and 95% of total olive oil production derives from the Mediterranean basin (Marra et al., 2013; Trujillo et al., 2014). The olive crop has a very rich varietal heritage, represented by more than 1,200 named cultivars, over 3,000 minor cultivars, and an uncertain number of genotypes including pollinators, local ecotypes, and centennial trees (El Bakkali et al., 2013; Hosseini-Mazinani et al., 2014; Mazzitelli et al., 2015; Laroussi-Mezghani et al., 2016; Mousavi et al., 2017). Since the time of ancient Greece, olive cultivars have been vegetatively propagated, either by cutting or grafting, allowing the accurate reproduction of the best-performing genotypes and leading to the present varietal assortment (Breton et al., 2009; Kaniewski et al., 2012). Thus, most cultivars represent ancient pre-bred genotypes, and the limited and sporadic genetic improvement initiatives, with classical or biotechnological approaches, forced the retention of numerous traditional cultivars despite their agronomical limitations. Among these, only a few have a large area of cultivation and a clear impact on the production of oil and table olives (Ilarioni and Proietti, 2014). However, the availability of a large set of well-characterized and highly diverse cultivars is critical to increase the ability to face new agronomical challenges (De Gennaro et al., 2012; Larbi et al., 2015) and future climatic constraints (Moriondo et al., 2013; Proietti et al., 2014; Tanasijevic et al., 2014), by diversifying the gene pools, preserving unique genetic traits currently available (Bracci et al., 2011; Corrado et al., 2011; Potts et al., 2012; Klepo et al., 2013), and offering different sensory profiles of extra-virgin olive oils.

Several factors make it difficult to ensure the identification of cultivars, such as the joint cultivation of native and foreign cultivars, ambiguous plant naming, the presence of seedlings or wild plants, and the interchange of plant material over the centuries (Marra et al., 2013; Lazović et al., 2016). Furthermore, the large number of cultivars, the high degree of kinship among many of them, mainly in cases of geographic proximity, and the possible appearance of clonal variation have raised additional identification problems (Belaj et al., 2007; Caruso et al., 2014; Ipek et al., 2015).

Olive collections represent the main tool to preserve and certify germplasm resources (Belaj et al., 2012; Caruso et al., 2014), especially as recent trends toward establishing modern orchards exclusively based on a few highly productive and low-vigor cultivars may lead to the erosion of this germplasm. More than 100 collections of olive genetic resources have been established at international, national, and regional levels for conservation and evaluation purposes (Trujillo et al., 2014). A first World Olive Germplasm Bank (WOGB) was established in the 1970s at IFAPA (Cordoba, Spain), with about 500 accessions from 21 countries (Belaj et al., 2011; Trujillo et al., 2014). In 2003, a second WOGB was created at INRA (Marrakech, Morocco), including 560 accessions originating from 14 Mediterranean countries (Haouane et al., 2011). An international olive collection built by CNR (ISAFOM) and planted in Zagaria (Enna, Italy) includes about 400 cultivars collected worldwide (Las Casas et al., 2014). A national collection has been built by CREA-OLI (Cosenza, Italy), consisting of approximately 500 cultivars from Italy, corresponding to 85% of the total Italian olive germplasm (Muzzalupo et al., 2014). In Turkey, a national olive germplasm collection in Izmir contains 96 genotypes (Kaya et al., 2013), whereas the Greek National Olive Germplasm Collection holds 47 olive cultivars (Xanthopoulou et al., 2014). New olive growing countries have also organized important olive collections, such as the United States of America (NCGR-Davis, CA, United States) (Zelasco et al., 2012), as well as Argentina, Chile, Uruguay, Australia, China, and South Africa (Trujillo et al., 2014).
In addition to these important gene banks, many other minor collections have been set up over time to preserve dedicated pools of genotypes, such as cultivars with specific characteristics, wild plants, segregating progenies, or core collections (Belaj et al., 2011, 2012; Díez et al., 2011; Marchese et al., 2016). Among these, the UNIPG (Perugia University, Italy) collection, established 50 years ago, represents one of the first attempts to collect and conserve ex situ a large number of olive cultivars. It contains genotypes of different geographical origin (although with a prevalence of Italian cultivars), and holds great potential for the complete and exhaustive agronomic evaluation of cultivars, as reported by numerous previous works on agronomical, morphological, or biological varietal performance (Breton et al., 2014; Portarena et al., 2015).

Simple sequence repeats have been the main molecular markers used to characterize olive germplasm collections (Haouane et al., 2011; Muzzalupo et al., 2014; Trujillo et al., 2014). In fact, SSRs represent the most popular markers for olive genotyping, due to their high polymorphism, extraordinary abundance, and fast transferability (Sarri et al., 2006; Baldoni et al., 2009; Díez et al., 2011; Belaj et al., 2012; Hosseini-Mazinani et al., 2014; Mousavi et al., 2014, 2017). However, all SSR loci published so far, characterized by dinucleotide repeat motifs, have shown several drawbacks due to the difficult discrimination among alleles (Baldoni et al., 2009). In contrast, EST-SSRs derive from expressed regions of the genome, have greater transferability among species and, since they are located within genes, their variation could correlate with the phenotype (Duran et al., 2009). However, EST-SSRs may reveal less variation and lower polymorphic information than standard SSRs, even though this is sufficient for population genetic analysis and for genotyping purposes (Yang et al., 2013). For this reason, the new trinucleotide EST-SSR loci recently identified (Mariotti et al., 2016) should now be widely applied for a clearer varietal characterization.

In this work, we provide the first molecular identification of the accessions present in the UNIPG olive varietal collection. The identification of all olive trees was performed by standard SSRs and, for the first time in olive collections, by EST-SSRs. We pursued three goals: (1) the identification of all accessions, including those closely related or morphologically similar, (2) the comparison of the discrimination power of EST-SSRs and standard dinucleotide SSRs, and (3) the assessment of the level and breadth of genetic variability within a germplasm collection, in order to make this important source of well-defined genotypes available to all interested stakeholders and researchers.

# MATERIALS AND METHODS

#### Sample Collection and Archival Records

The Olive Varietal Collection of the University of Perugia – Department of Agricultural, Food and Environmental Sciences (UNIPG) – is located in Prepo, Perugia (43° 04′ 53.94″ N, 12° 22′ 53.25″ E, altitude about 400 m asl), on a clay soil with a medium content of organic matter, phosphorus and potassium. The climate is temperate-Mediterranean, with an average annual temperature of 12.8 °C and annual rainfall of about 900 mm. Planting distance is 5 × 5 m and trees are trained to a polyconic vase shape. Regular agricultural practices are applied to the olive plants, without irrigation. The collection, established in 1965, was duplicated in 1984 and enlarged by adding further local, national and international cultivars. Based on the UNIPG archive, the collection consisted of 370 olive plants, where each genotype was represented by at least three replications, randomly distributed in a single block. Some cultivars (Carolea, Maurino, Moraiolo, Leccino, Frantoio, San Felice, Nostrale di Rigali and Manzanilla de Sevilla) were represented by at least 20 trees per cultivar, distributed in four randomized blocks, allowing for their agronomical and morpho-bio-phenological evaluation. No information was available on the original source of plant material.

#### DNA Extraction and Molecular Analysis

Leaf samples were collected from each plant, for a total of 370 accessions, and the position of each tree was recorded. For each accession, total DNA was extracted from fresh leaves following the manufacturer's standard instructions for the GenElute Plant Genomic DNA Miniprep Kit (Sigma–Aldrich).

All samples were analyzed using the nine best-ranked EST-SSR markers (OLEST1-7-9-12-14-16-20-22-23) recently developed by Mariotti et al. (2016). Double-step polymerase chain reactions (PCR) were performed in a volume of 25 µl containing 25 ng of DNA, 10× PCR buffer, 200 µM of each dNTP, 10 pmol each of forward primer (with an 18 bp tail at the 5′ end) and reverse primer, and 2 U of DNA polymerase (Q5 High-Fidelity DNA Polymerase, New England Biolabs). The first step consisted of an initial denaturation at 95 °C for 5 min, followed by 35 cycles of 95 °C for 30 s, 60 °C for 30 s and 72 °C for 25 s. In the second step, the fluorescent tail (10 pmol) was annealed to the forward primer over 20 cycles under the same conditions as the first step, except for the annealing temperature (Tm = 52 °C); a final elongation at 72 °C for 40 min closed the second-step PCR.

In order to verify the identity of cultivars present in the collection, all samples were genotyped using standard dinucleotide SSR markers, widely applied for cultivar characterization in most olive germplasm collections (Haouane et al., 2011; Muzzalupo et al., 2014; Trujillo et al., 2014). Ten highly polymorphic markers were applied, including DCA3-5-9-16-18, EMO90, GAPU71B-101-103A and UDO-043 (Sefc et al., 2000; Carriero et al., 2002; Cipriani et al., 2002), previously selected as the best performing loci (Baldoni et al., 2009) and common to the other genotyping works. Forward primers carried VIC, FAM, PET, or NED labels at their 5′ end. Standard PCR amplifications were performed in a reaction volume of 25 µl containing 25 ng of DNA, 10× PCR buffer, 200 µM of each dNTP, 10 pmol of each forward and reverse primer, and 2 U of Q5 High-Fidelity DNA Polymerase (New England Biolabs), with an initial denaturation at 95 °C for 5 min, followed by 40 cycles of 95 °C for 30 s, annealing at the temperature suggested by the authors (50–60 °C) for 30 s and 72 °C for 25 s, and a final elongation at 72 °C for 40 min.

PCR products were loaded on an ABI 3130 Genetic Analyzer (Applied Biosystems-Hitachi) using the internal GeneScan 500 LIZ Size Standard (Thermo Fisher Scientific). Output data were analyzed with GeneMapper 3.7 (Applied Biosystems).

In order to verify the match of the 370 olive samples with previously characterized cultivars, the data obtained for the 10 standard SSR markers were compared to those available in the database of olive SSR profiles established at CNR-IBBR of Perugia (Italy), which includes more than 1,000 olive cultivars from around the world, and to other available datasets (Baldoni et al., 2009; Trujillo et al., 2014). This comparison made it possible to establish cultivar identity and to determine all cases of identical profiles, presumably corresponding to clonal replicates, i.e., clonal genotypes with an undetermined presence of mutations (Baldoni et al., 2009, 2011; Bartolini, 2009; Mousavi et al., 2017).

#### Allele Frequency and Diversity Analysis

Number of alleles per locus (Na), number of effective alleles (Ne), Shannon's information index (I), observed (Ho) and expected heterozygosity (He), and fixation index (F) were calculated at each locus for novel and standard SSRs using GenAlEx 6.501 software (Peakall and Smouse, 2012). Pairwise relatedness was calculated on standard and OLEST SSR markers to estimate the allelic similarity for codominant data, using GenAlEx 6.501 with the Lynch and Ritland (1999) mean estimator (LRM), multiplied by 2 to give a maximum of 1.00. The software FreeNA (Chapuis and Estoup, 2007) was applied to detect the presence of possible null alleles (Fnull), to determine the genetic uniqueness of each accession and to quantify redundancy. Polymorphic information content (PIC) was calculated for each microsatellite locus using CERVUS v.3.0 software (Marshall et al., 1998). We calculated the probabilities of identity for unrelated individuals [P(ID)] at each locus and across loci, as described by Waits et al. (2001), using GenAlEx for both OLEST and standard SSR markers. Cumulative P(ID) was calculated after ranking the loci by PIC value from high to low. We used the criterion of P(ID) lower than 0.001 to estimate the minimum number of loci required for individual identification in the study species (Waits et al., 2001).
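The per-locus indices listed above can be illustrated with a short sketch. It uses the textbook formulas (He = 1 − Σp², Ne = 1/Σp², I = −Σp ln p, F = 1 − Ho/He, the Botstein PIC, and the unrelated-individuals P(ID) of Waits et al., 2001) and is not the GenAlEx or CERVUS implementation; those programs may apply sample-size corrections that are omitted here. The function name `locus_indices` and the allele sizes in the example are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def locus_indices(genotypes):
    """Diversity indices for one codominant locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid
    individual, with no missing data (an assumption of this sketch).
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]  # allele frequencies

    sum_p2 = sum(p ** 2 for p in freqs)
    he = 1.0 - sum_p2                                   # expected heterozygosity
    ho = sum(1 for a, b in genotypes if a != b) / n     # observed heterozygosity
    pairs = list(combinations(freqs, 2))

    return {
        "Na": len(freqs),
        "Ne": 1.0 / sum_p2,
        "I": -sum(p * math.log(p) for p in freqs),
        "Ho": ho,
        "He": he,
        "F": 1.0 - ho / he if he > 0 else float("nan"),
        # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2
        "PIC": he - sum(2 * p ** 2 * q ** 2 for p, q in pairs),
        # P(ID) = sum(p_i^4) + sum_{i<j} (2 p_i p_j)^2
        "PID": sum(p ** 4 for p in freqs) + sum((2 * p * q) ** 2 for p, q in pairs),
    }


# Invented allele sizes (bp) for four individuals at one locus.
example = [(100, 103), (100, 100), (103, 106), (100, 106)]
stats = locus_indices(example)
for name, value in stats.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Multiplying the per-locus `PID` values across loci gives the cumulative P(ID), under the assumption that loci are independent.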

A model-based Bayesian clustering method was applied to infer the genetic structure of the 59 cultivars and to define the number of clusters (gene pools) in the dataset using the software STRUCTURE v.2.3 (Pritchard et al., 2009), on the same sample set separately for OLEST and standard SSRs. Tests were based on an admixture model with independent allele frequencies. No prior information was used to define clusters. Independent runs were done by setting the number of clusters (K) from 1 to 10. Each run comprised a burn-in length of 100,000 followed by 100,000 MCMC (Markov chain Monte Carlo) replicates. An ad hoc statistic ΔK, based on the rate of change in the log probability of data between successive K values, as described by Evanno et al. (2005), was calculated through the Structure Harvester v.0.9.93 website (Earl, 2012) and used to estimate the most likely number of clusters (K). In order to verify how the cultivars present in the Perugia collection are distributed among the Mediterranean groups previously observed (Sarri et al., 2006), their profiles for the ten standard SSRs were analyzed together with those of 281 of the most widely cultivated Mediterranean cultivars from the CNR-IBBR database, using the same Structure parameters. Data for the 281 cultivars were already published (Baldoni et al., 2009, 2011; Mousavi et al., 2017).
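As a rough illustration of the ΔK statistic, the sketch below computes ΔK(K) = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the mean log probability of the data over replicate runs at a given K. This mean-based form is an assumption about the calculation details, the number of replicate runs per K is not stated in the text, and the log-probability values are invented.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: deltaK} from {K: [LnP(D) of each replicate run]}.

    DeltaK is only defined for K values that have both neighbours
    (K - 1 and K + 1) and a non-zero standard deviation across runs.
    """
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # first and last K have no second-order difference
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when all runs agree exactly
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta_k[k] = second_diff / sd
    return delta_k


# Invented LnP(D) values, three replicate runs per K.
runs = {
    1: [-5000, -5002, -4998],
    2: [-4800, -4802, -4798],
    3: [-4500, -4504, -4496],
    4: [-4480, -4483, -4477],
    5: [-4470, -4476, -4464],
}
dk = evanno_delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, "most likely K:", best_k)
```

Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested, which is one reason to run K over a range wider than the expected number of clusters.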

# RESULTS

#### Polymorphisms Detected at EST and Standard SSR Loci

The nine OLEST markers analyzed were easily scored, showed low stuttering and clear differentiation among alleles (**Table 1**, Supplementary Table S1 and Figure S1). Mean Na amounted to 7.9, ranging between 5 (OLEST9) and 15 (OLEST16). Ne was 4.466 on average, while the mean I value was 1.636. He (0.760) was in general higher than Ho (0.718), except for OLEST22 and 23, where Ho was significantly higher than He. F values were positive on average, excluding OLEST22 and 23, and a negligible or moderate amount of null alleles was observed, with no effect on their discrimination power. PIC values were higher than 0.5 at all OLEST loci, with an average value of 0.726 and the maximum discrimination power for OLEST16 (0.848) and OLEST1 (0.804).

The number of alleles for standard SSRs (**Table 1**) was considerably higher than for OLESTs, with 12.6 alleles per locus on average. Mean Ho was similar to He (0.808 and 0.802, respectively), and three out of 10 loci (DCA18, GAPU71B and GAPU101) had Ho higher than 0.9. F and Fnull were slightly negative, at −0.012 and −0.007, respectively, whereas the mean value of PIC was 0.781. Cumulative probability of identity values (**Figure 1**) showed that a minimum of three loci was required for OLEST markers and only two for standard SSRs to reach P(ID) < 0.001. In practice, four and three loci were needed to distinguish all genotypes for OLEST and standard SSR markers, respectively. Nine OLEST [cumulative P(ID) = 2.5 × 10<sup>−10</sup>] or 10 standard SSRs [cumulative P(ID) = 7.3 × 10<sup>−14</sup>] allow for the unequivocal individual identification for this sample set with a high statistical confidence.
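The threshold rule described above (rank loci by PIC, multiply per-locus P(ID) values, stop once the product drops below 0.001) can be sketched as follows. The locus names and the PIC and P(ID) values are hypothetical placeholders, not values from Table 1, and the product assumes loci are independent.

```python
def loci_needed(loci, threshold=0.001):
    """Rank loci by PIC (high to low) and accumulate P(ID) until it
    falls below the threshold.

    loci: list of (name, pic, pid) tuples.
    Returns (list of loci used, cumulative P(ID)); the list is None if
    the threshold is never reached with the loci supplied.
    """
    ranked = sorted(loci, key=lambda item: item[1], reverse=True)
    cumulative = 1.0
    used = []
    for name, _pic, pid in ranked:
        cumulative *= pid  # independence of loci is assumed
        used.append(name)
        if cumulative < threshold:
            return used, cumulative
    return None, cumulative


# Hypothetical loci: (name, PIC, per-locus P(ID)).
panel = [
    ("locB", 0.70, 0.08),
    ("locA", 0.80, 0.05),
    ("locD", 0.55, 0.15),
    ("locC", 0.65, 0.10),
]
used, cum_pid = loci_needed(panel)
print(used, cum_pid)
```

A cumulative P(ID) below the threshold is a probabilistic statement about unrelated individuals; as the text notes, the number of loci actually needed to separate every genotype in a given collection can be higher.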

#### Genetic Identity and Differentiation

The comparison of standard SSR profiles with the CNR-IBBR dataset and previously published data allowed the identification of the samples in the UNIPG collection. Fifty-nine distinct genotypes were identified, corresponding to 72 olive cultivars reported in the UNIPG archive: some samples listed in the archive under different names showed identical genetic profiles in our work (Supplementary Table S1 and **Table 2**). Among genotypes with identical profiles, the first group included the Portuguese cultivars Azeteira and Negrinha; in the second group, eight cultivars shared the Frantoio profile (Frantoio itself plus Frantoio Corsini, Razzola, Casaliva, Razza, Taggiasca, Raja Sabina, and Ogliarola di Bitonto); Ogliarola Salentina, Mignola and Cima di Mola formed the third group; Moraiolo, Moraiolo Corsini and Corniolo the fourth; and Dritta di Moscufo and San Felice were the last case of identity. The OLEST markers showed exactly the same results, confirming all cases of identical profiles (Supplementary Table S1). Eight different countries are represented in the collection, including Italy with 37 cultivars, Spain with nine, Greece with four, Portugal and France with three each, and Morocco, Syria, and Tunisia with one each. Thirteen out of the 59 olive genotypes (Dolce d'Andria, Dritta di Loreto, Laurina, Morellona di Grecia, Negrera, Nostrale di Rigali, Olivago, Olivone, Orbetana, Pasola di Andria, Pocciolo, Santagatese, Tendellone) were exclusive to this collection and absent from the main World Olive Germplasm Banks (WOGBs). Pairwise allelic relatedness calculated with GenAlEx showed 100% similarity between the synonymous cultivars (LRM = 1.00) for both sets of markers. For non-synonymous cultivars, the highest allelic similarity values were 0.67 for OLEST and 0.57 for standard SSRs, while the minimum LRM values were −0.43 for OLEST and −0.31 for standard SSR markers.

TABLE 1 | Indices of genetic diversity at 72 cultivars for each SSR locus: number of alleles (Na), number of effective alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), fixation index (F) and presence of null alleles (Fnull), Polymorphism Information Content (PIC).
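Detecting groups of synonymous cultivars amounts to grouping accessions whose multilocus profiles are identical. A minimal sketch is given below, assuming exact matching of allele calls with no binning tolerance and no missing data; the cultivar labels, locus names and allele sizes are placeholders, not data from the collection.

```python
from collections import defaultdict


def group_identical_profiles(profiles):
    """Group accessions that share the same genotype at every locus.

    profiles: {accession: {locus: (allele_a, allele_b)}}
    Alleles within a locus are sorted so that (150, 154) and (154, 150)
    are treated as the same genotype.
    Returns a list of groups (lists of names), largest group first.
    """
    groups = defaultdict(list)
    for name, loci in profiles.items():
        key = tuple(
            (locus, tuple(sorted(alleles)))
            for locus, alleles in sorted(loci.items())
        )
        groups[key].append(name)
    return sorted(
        (sorted(members) for members in groups.values()),
        key=lambda members: (-len(members), members),
    )


# Placeholder accessions with invented allele sizes (bp).
profiles = {
    "cv_A": {"L1": (150, 154), "L2": (200, 200)},
    "cv_B": {"L1": (154, 150), "L2": (200, 200)},
    "cv_C": {"L1": (150, 156), "L2": (200, 204)},
}
groups = group_identical_profiles(profiles)
print(len(groups), "distinct genotypes:", groups)
```

Applied to the archive names, the number of groups returned would correspond to the number of distinct genotypes, and any group with more than one member to a case of synonymy.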

#### Population Genetic Structure

In the Structure analysis of the data from 10 standard SSR loci on the 59 UNIPG cultivars, run together with 281 representative Mediterranean cultivars (Supplementary Figure S2), the stabilization of the log-likelihood, in terms of ΔK values, was observed at K = 3. Assigning individuals to a population for membership values above 70%, 16 cultivars clustered into the Western Mediterranean group, 35 into the Central one and 12 into the Eastern population; only nine genotypes showed high levels of admixture among two or three groups.
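The 70% assignment rule can be written as a small helper. The sketch assumes each individual comes with a vector of membership coefficients (Q values) summing to 1, and that the columns correspond to the Western, Central and Eastern groups in that order; both the column order and the example values are assumptions made for illustration.

```python
GROUPS = ("Western", "Central", "Eastern")  # hypothetical column order


def assign_population(q_values, threshold=0.70, labels=GROUPS):
    """Assign an individual to the cluster with the highest membership
    coefficient if it exceeds the threshold, otherwise call it admixed."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    if q_values[best] > threshold:
        return labels[best]
    return "admixed"


# Invented membership coefficients for three individuals.
examples = {
    "ind_1": (0.85, 0.10, 0.05),
    "ind_2": (0.05, 0.92, 0.03),
    "ind_3": (0.40, 0.35, 0.25),
}
assignments = {name: assign_population(q) for name, q in examples.items()}
print(assignments)
```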

The Structure analysis within the cultivars of the collection, performed on OLEST and standard SSRs, showed the most probable grouping at K = 4 (**Figures 2A,B**). Most of the 59 cultivars were assigned to two of the four groups for standard SSRs, while for OLEST the four Structure populations were well balanced. The proportion of membership for OLEST markers ranged from 0.158 (Pop2) to 0.406 (Pop1), while for standard SSRs the lowest value was 0.054 (Pop2) and the membership values for Pop1 and Pop4 were 0.423 and 0.414, respectively. Only 20 cultivars were assigned to the same population by both sets of markers (**Figures 2A,B**). The expected heterozygosity estimated by the Bayesian analysis within the same population was on average higher for standard than for OLEST markers (0.84 and 0.76, respectively). Furthermore, the level of population assignment for OLEST markers was lower than for standard SSRs (0.75 and 0.88, respectively).

TABLE 2 | Collection code, name of cultivars and country of origin.

G1, G2, G3, G4, and G5: each group number corresponds to cultivars showing an identical genotype. The underlined cultivars are those exclusively present in the Perugia collection and not in the World Olive Germplasm Banks.

# DISCUSSION

The application of highly effective and discriminant markers may allow the correct identification of all accessions, establishing how well they represent the variability of the species and justifying their conservation in ex situ collections. This step is crucial to avoid redundancy in germplasm repositories, reduce management costs, distribute true-to-type genotypes for propagation, and certify reliable genetic sources for breeding programs. The management of germplasm collections requires attention, because mistakes may be introduced at many stages, from the origin of plant material (which may derive from other collections, private orchards or unreliable sources) to propagation and field planting, and each accession needs correct identification and passport data (Kato et al., 2012; Potts et al., 2012; Trujillo et al., 2014). Thorough and accurate genotype profiling is a crucial prerequisite to assist breeding programs, perform comparative studies and support innovative research.

The collection of olive cultivars established at the University of Perugia represents one of the first efforts to bring together in a single set highly diverse genotypes, deriving from areas with very different climatic and growing conditions, in order to preserve the variation of cultivated olives and evaluate their characteristics. The genetic identity of the genotypes in the UNIPG olive collection had never been ascertained before, and we set out to achieve a complete genotyping of all accessions.

Simple sequence repeat markers have become the preferred tool for the identification of olive cultivars, due to their high discrimination power and straightforward data reading (Haouane et al., 2011; Trujillo et al., 2014). However, the widely used dinucleotide SSRs have shown problems related to the difficult discrimination between neighboring alleles and the low comparability of data among different labs, severely reducing their applicability for large-scale screening (Baldoni et al., 2009) and for comparing the molecular profiles of accessions distributed in different collections (Diez et al., 2015; Torkzaban et al., 2015). For this reason, we decided to apply both the best-ranked dinucleotide SSRs and the recently developed trinucleotide EST-SSRs (OLEST) (Mariotti et al., 2016), in order to also evaluate their reliability in genotyping germplasm repositories.

To establish cultivar identity and determine all clonal replicates, 10 standard dinucleotide SSR markers were first applied and allele profiles were compared with previously published data (Baldoni et al., 2009, 2011; Hosseini-Mazinani et al., 2014; Trujillo et al., 2014; Mousavi et al., 2017) or with those included in the CNR-IBBR database. These analyses highlighted the presence of 59 distinct genotypes, including five groups of cultivars sharing identical SSR profiles (Bartolini, 2009; Trujillo et al., 2014) but coming from different areas of cultivation and carrying different names.

The same results were obtained when the analysis was independently performed with the new OLEST SSRs: 59 genotypes were distinguished and identified, and the same groups with identical profiles were displayed. The values of the diversity parameters were also quite similar to those of the best-ranked dinucleotide SSRs, particularly for the discrimination power and observed heterozygosity values, with a negligible presence of null alleles. The pairwise relatedness analysis demonstrated the same single-profile groups and highlighted that OLEST markers were more efficient in discriminating among the most polymorphic genotypes, showing the minimum values of allelic similarity.

The occurrence of identical genotypes under different cultivar names represents a primary source of problems for identification and a major challenge to the management of germplasm collections (Belaj et al., 2007; Abdessemed et al., 2015). In olive (Bradai et al., 2016), as in many other long-living trees (Vezzulli et al., 2012; Urrestarazu et al., 2012; Fresnedo-Ramírez et al., 2013; Jiao et al., 2013; Frank and Chitwood, 2016), it cannot be excluded in theory that plant genotypes clonally propagated and living for thousands of years may accumulate somatic mutations, over time or as a result of environmental shocks. However, these mutations could not be easily revealed by the use of a restricted set of SSR markers. For this reason, we decided to retain the original names of cultivars, even if they showed the same SSR profile, making them available for future in-depth genomic analyses that could highlight polymorphisms that are otherwise undetectable (Wu et al., 2014).

By using only three OLEST markers it was possible to discriminate 96.6% of all genotypes. Moreover, OLEST SSRs were more easily scorable than dinucleotide SSRs and did not show stuttering problems, owing to the greater distance between similar alleles and lower slippage during replication. Using the three OLEST markers with the highest PIC values (OLEST1, OLEST14 and OLEST16), 57 out of 59 genotypes were discriminated, whereas applying the three most discriminant standard SSRs (DCA09, DCA16 and GAPU103A), all 59 genotypes were completely recognized. The probability of identity estimator [P(ID)] gives the chance that two different accessions in a population share the same genotype at a specific locus by chance rather than through inheritance; based on this estimator, we found that both sets of markers were able to clearly distinguish all 59 olive genotypes in the Perugia olive collection.

The Bayesian structure analysis of the genotypes present in the Perugia collection, together with the wide set of other important cultivars of the Mediterranean basin, has shown that the collection represents well the groups into which cultivated Mediterranean olives were previously split (Haouane et al., 2011; Diez et al., 2015), with a higher membership in the Central Mediterranean group, likely due to the prevalence of Italian cultivars. Furthermore, this repository holds 13 cultivars not present in the main international olive germplasm banks (Haouane et al., 2011; Trujillo et al., 2014), strengthening its role in the conservation, evaluation and protection of specific, potentially endangered genotypes.

When the same analysis was performed exclusively on the UNIPG genotypes, 34% of cultivars were assigned to the same population by both sets of markers. The Bayesian results clearly highlighted the differences between OLEST and standard SSRs in the assignment of cultivars to the Structure populations. This dissimilarity was evidenced by the values of expected heterozygosity, the overall proportion of membership and the admixture level. Therefore, the results of the present study suggest that, in phylogenetic studies, using different sets of markers may yield unbalanced assignments. The different ability of the two kinds of markers to group cultivars into clusters could be explained by the nature of OLEST markers, whose mutations reside in the sequence of transcribed genes: their alleles could display a higher frequency at regional level, where cultivars were selected based on common characteristics (Biton et al., 2015; Mariotti et al., 2016). Considering that the olive domestication process has implied a selection of cultivars for certain agronomic characters, resulting in a loss of genetic variation due to genetic bottlenecks and, in some cases, episodes of founder effect (Cao et al., 2014; Hosseini-Mazinani et al., 2014; Mousavi et al., 2017), EST-SSRs could be related to agronomical traits more than neutral standard SSRs. The very long history of olive growing, with several trading events, the introduction of alien cultural practices and changes in dietary habits, may have blurred the fingerprints of independent domestication events and led to complex relationships among cultivars (Sarri et al., 2006; Soleri et al., 2010; Díez et al., 2011; Koehmstedt et al., 2011).

The Perugia collection represents the first case study of a real olive germplasm repository validated by standard SSRs and characterized by EST-SSRs. This work has confirmed the OLEST markers as effective genotyping tools, as good as the best standard markers for cultivar identification, making it possible to avoid the application of other unreliable dinucleotide SSRs. The use of the OLEST markers on a wide set of olive cultivars will help establish a common fingerprint database without miscalling and binning, exploitable for several molecular investigations and representing a valuable resource for comparative genomics, evolutionary analyses and population studies.

#### REFERENCES

#### AUTHOR CONTRIBUTIONS

SM, LB, LR, PP, MB, RM, LN, and SP contributed substantially to the conception and design of the work; SM, LR, LN, RM, and SP contributed to plant material collection; SM, LB, and RM performed all molecular work and genotype scoring; SM, LB, LR, RM, LN, and SP contributed to the interpretation of data; SM, LB, LR, PP, MB, RM, LN, and SP drafted the text; SM, LB, LR, PP, MB, RM, LN, and SP approved the version to be published; SM, LB, LR, PP, MB, RM, LN, and SP agreed to be accountable for all aspects of the work.

## ACKNOWLEDGMENTS

The study has been partially performed within the Project "BeFOre – Bioresources for Oliviculture," 2015–2019, H2020-MSCA-RISE, Marie Skłodowska-Curie Research and Innovation Staff Exchange, Grant Agreement N. 645595.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017.01283/full#supplementary-material


persica, and human influences on perennial fruit crops. Genome Biol. 15, 415. doi: 10.1186/s13059-014-0415-1



(Spain) using SSR and morphological markers. Tree Genet. Genomes 10, 141– 155. doi: 10.1007/s11295-013-0671-3


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Mousavi, Mariotti, Regni, Nasini, Bufacchi, Pandolfi, Baldoni and Proietti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Emerging Oilseed Crop Sesamum indicum Enters the "Omics" Era

Komivi Dossa<sup>1,2,3</sup>\*, Diaga Diouf<sup>2</sup>\*, Linhai Wang<sup>3</sup>, Xin Wei<sup>3</sup>, Yanxin Zhang<sup>3</sup>, Mareme Niang<sup>1</sup>, Daniel Fonceka<sup>1,4</sup>, Jingyin Yu<sup>3</sup>, Marie A. Mmadi<sup>1,2,3</sup>, Louis W. Yehouessi<sup>1</sup>, Boshou Liao<sup>3</sup>, Xiurong Zhang<sup>3</sup>\* and Ndiaga Cisse<sup>1</sup>\*

<sup>1</sup> Centre d'Etudes Régional Pour l'Amélioration de l'Adaptation à la Sécheresse, Thiès, Sénégal, <sup>2</sup> Laboratoire Campus de Biotechnologies Végétales, Département de Biologie Végétale, Faculté des Sciences et Techniques, Université Cheikh Anta Diop, Dakar, Sénégal, <sup>3</sup> Key Laboratory of Biology and Genetic Improvement of Oil Crops, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Ministry of Agriculture, Wuhan, China, <sup>4</sup> Centre de Coopération Internationale en Recherche Agronomique Pour le Développement, UMR AGAP, Montpellier, France

#### Edited by:

Mariela Torres, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina

#### Reviewed by:

Sarita K. Pandey, International Crops Research Institute for the Semi-Arid Tropics, India; Ioannis Ganopoulos, Institute of Plant Breeding and Genetic Resources-ELGO DEMETER, Greece

#### \*Correspondence:

Diaga Diouf, diaga.diouf@ucad.edu.sn; Xiurong Zhang, zhangxr@oilcrops.cn; Ndiaga Cisse, ncisse@refer.sn; Komivi Dossa, dossakomivi@gmail.com

#### Specialty section:

This article was submitted to Crop Science and Horticulture, a section of the journal Frontiers in Plant Science

> Received: 19 April 2017 Accepted: 15 June 2017 Published: 30 June 2017

#### Citation:

Dossa K, Diouf D, Wang L, Wei X, Zhang Y, Niang M, Fonceka D, Yu J, Mmadi MA, Yehouessi LW, Liao B, Zhang X and Cisse N (2017) The Emerging Oilseed Crop Sesamum indicum Enters the "Omics" Era. Front. Plant Sci. 8:1154. doi: 10.3389/fpls.2017.01154

Sesame (Sesamum indicum L.) is one of the oldest oilseed crops, widely grown in Africa and Asia for its high-quality nutritional seeds. It is well adapted to harsh environments and constitutes an alternative cash crop for smallholders in developing countries. Despite its economic and nutritional importance, sesame is considered an orphan crop because it has received very little attention from science. As a consequence, it lags behind the other major oil crops as far as genetic improvement is concerned. In recent years, the scenario has changed considerably with the decoding of the sesame nuclear genome, leading to the development of various genomic resources including molecular markers, comprehensive genetic maps, high-quality transcriptome assemblies, web-based functional databases and diverse draft genome sequences. The availability of these tools, in association with the discovery of candidate genes and quantitative trait loci for key agronomic traits including high oil content and quality, waterlogging and drought tolerance, disease resistance, cytoplasmic male sterility and high yield, paves the way for the development of new strategies for sesame genetic improvement. As a result, sesame has graduated from an "orphan crop" to a "genomic resource-rich crop." With the limited number of research teams working on sesame worldwide, more synergic efforts are needed to integrate these resources into sesame breeding to boost productivity, ensuring food security and improved livelihoods in developing countries. This review retraces the evolution of sesame research by highlighting the recent advances in the "Omics" area and also critically discusses the future prospects for further genetic improvement and a wider expansion of this crop.

Keywords: Sesamum indicum, Omic resources, molecular breeding, large-scale re-sequencing, improvement

# INTRODUCTION

Since the beginning of agriculture, humans have been selecting and cultivating crops that would serve their taste, energy, and health requirements (Nagaraj, 2009). Oilseeds are crops in which energy is stored mainly in the form of oil and are a very important component of semi-tropical and tropical agriculture, providing easily available and highly nutritious human and animal food (Weiss, 2000). Among the important oilseed crops widely grown in the world, such as rapeseed, peanut, soybean and sunflower, sesame (Sesamum indicum L.) has one of the highest oil contents and provides one of the richest edible oils (Pathak et al., 2014). Sesame is a diploid species (2n = 2x = 26), an annual plant principally grown for its seeds. The seed contains 50–60% oil, which has excellent stability due to the presence of natural antioxidants such as sesamolin, sesamin, and sesamol (Anilakumar et al., 2010). The chemical composition of sesame oil, characterized by a low level of saturated fatty acids (SFAs) (less than 15%) and the presence of antioxidants, has been reported to have health-promoting effects such as lowering cholesterol levels and hypertension in humans (Noguchi et al., 2001; Sankar et al., 2005), neuroprotective effects against hypoxia or brain damage (Cheng et al., 2006) and reducing the incidence of certain cancers (Hibasami et al., 2000; Miyahara et al., 2001). With increasing knowledge of the dietary and health benefits of sesame, the market demand for its seed and oil has shown a continuous, steep increase. Likewise, by virtue of its low irrigation requirement, adaptation to different soil types and weather conditions, low labor demand and high remunerative value, sesame is ideally suited to replace low-yield crops, especially in the current scenario of global warming affecting crop productivity in more and more traditional agricultural areas. As a result, sesame production has been increasing rapidly over the years and is becoming an important alternative cash crop for smallholders, thus helping to alleviate rural poverty (**Figure 1**). In 2014, more than 6 million tons of sesame seeds were produced on nearly 11 million ha, ranking sesame ninth among the major oil crops (Food and Agriculture Organization Statistical Databases [FAOSTAT], 2015).

Despite its importance, sesame is considered an orphan crop because it has received very little support from science, industry and policy makers. As a consequence, it lags behind the other major oilseed crops in terms of genetic improvement (Dossa, 2016). Cultivated sesame still has some wild characters, including seed shattering, indeterminate growth habit and asynchronous capsule ripening, leading to a very low seed yield (300–400 kg/ha) (Islam et al., 2016). Furthermore, sesame is often grown in harsh environments and exposed to various biotic and abiotic stresses that heavily impair its productivity (Witcombe et al., 2007). Hence, it has become crucial to enhance sesame germplasm for higher productivity and seed quality to cope efficiently with the growing demand for its oil.

Limited progress has been made in these directions through conventional breeding methods, due to a lack of genomic tools and resources for deep insights into the molecular background of the important agronomic traits. In addition, few scientific groups are engaged in sesame research worldwide, resulting in a slow pace of sesame improvement. However, in recent years, significant breakthroughs in the "Omics" area have taken sesame research to a higher stage. This has prompted us to review the notable achievements made so far in this field as well as the future perspectives for speeding up sesame improvement.

# EVOLUTIONARY HISTORY OF SESAME RESEARCH

Sesame is a very ancient crop thought to be one of the oldest oil crops known by humankind (Bedigian and Harlan, 1986; Ashri, 1998). Its research history followed three major periods viz. the "germplasm collection and genebank constitution" era, the "classical breeding and genetics" era, and currently the "Omics" era (**Figure 2**). During the first era (before year 2000), genetic materials of cultivated sesame as well as wild related species were collected from many growing areas, morphologically characterized and different seedbanks have been set up in several countries (Hiltebrandt, 1932; Kinman and Martin, 1954; Bedigian and Harlan, 1986; Bisht et al., 1998). Meanwhile during that period, questions related to the origin and domestication process of the cultivated sesame were the source of long debate and investigations (Hiltebrandt, 1932; Nayar and Mehra, 1970).

The second era in sesame research (2000–2013) was characterized first, by the employment of classical breeding methods including induced mutation and screening of genotype for desirable characters (Wongyai et al., 2001; Uzun et al., 2003; Boureima et al., 2012). Afterward, sesame research has witnessed a rapid development of genetic tools particularly molecular markers and their application in genetic diversity studies and marker assisted breeding (Dixit et al., 2005; Wang et al., 2012b; Wei X. et al., 2014). In addition, during that period of time, several studies have been undertaken on sesame oil properties that shed more light on the nutritional, pharmaceutical and engineering applications of this untapped crop (Miyahara et al., 2001; Cheng et al., 2006; Saydut et al., 2008; Anilakumar et al., 2010).

Finally, since 2013, sesame research has entered the "Omics" era. With the completion of nuclear and chloroplast genome sequencing and the release of various transcriptomic data, enormous genomic resources have been generated and are being applied to sesame improvement.

# GENETIC RESOURCES

The Sesamum genus belongs to the Eudicotyledon clade, Lamiales order, Pedaliaceae family (Mabberley, 1997), and S. indicum is the best-known and most widely grown species within this genus (Ashri, 1998). Kobayashi et al. (1990) proposed 36 species belonging to this genus, including 22 species found exclusively in the African continent, five in Asia, seven commonly found in Africa and Asia, and one species each in Brazil and a Greek island. Later on, based on the works of Bedigian, the list of Sesamum species was revised to 23 species (IPGRI and NBPGR, 2004) (**Table 1**). Besides S. indicum, the species S. radiatum is also cultivated in some African countries as a leafy vegetable.

Because most wild species of the Sesamum genus exist only in Africa, sesame was long thought to have originated on this continent (Hiltebrandt, 1932). However, according to evidence from the studies of Bedigian (2003, 2004), the crop is now assumed to have been domesticated from its wild relative S. malabaricum, native to south Asia, and to have spread west to Mesopotamia before 2000 B.C. (Fuller, 2003). Sesame harbors huge diversity, probably because of adaptation to the various environments where its presence has been recorded, coupled with long-term natural and artificial selection (Bedigian and Harlan, 1986; Wei et al., 2015). In total, five major centers of diversity have been proposed for sesame: India, China, Central Asia, the Middle East and Ethiopia (Zeven and Zhukovsky, 1975). Thanks to the efforts of the scientific community in sesame germplasm collection, characterization and conservation, a large body of genetic material of cultivated sesame and wild related species is currently preserved in several genebanks around the world, mainly in Asia (Zhang Y. et al., 2012) (**Table 2**). The principal sesame genebanks, held in India (NBPGR National Gene Bank), South Korea (National Agrobiodiversity Center, Rural Development Administration; Park et al., 2015), China (Oil Crops Research Institute, Chinese Academy of Agricultural Sciences; Wei et al., 2015) and the United States (USDA, ARS, PGRU), have preserved about 25,000 genetic materials (**Table 2**). Moreover, several small-scale genebanks exist in African countries including Nigeria, Ethiopia and Sudan. Since these genebanks harbor a large quantity of genetic resources, it is important to establish core collections (CC), a favored approach for the efficient exploration and utilization of novel variation in genetic resources (Hodgkin et al., 1995). In this vein, research on sesame CC establishment has resulted in 362 accessions for Indian germplasm (Bisht et al., 1998), 453 accessions for Chinese germplasm (Zhang et al., 2000) and 278 accessions for Korean germplasm (Park et al., 2015). These are the reservoirs of genetic resources for present and future sesame improvement programs.
Unfortunately, utilization of these rich genetic resources for sesame improvement is very limited, and most of the diversity existing in the germplasm remains unexplored (Dossa et al., 2016a). Furthermore, it has become apparent that sesame genetic resources from Asia have been well characterized and preserved, in contrast to African germplasm, which also harbors valuable diversity (Dossa et al., 2016a). Therefore, further efforts are needed to gather locally available sesame accessions and wild related species from Africa and to constitute an extensive genebank for their efficient conservation and exploitation.

#### "OMICS" RESOURCES

The genetic and molecular biology study of sesame began very late, with only one genetic map published and no report on quantitative trait loci (QTL) mapping before 2013. However, over the last few years, significant progress has been made in the development of large-scale genomic resources, including informative molecular markers, ultra-dense genetic maps, transcriptome assemblies and multi-omics online platforms. In addition, the release of the draft genome of sesame (Wang et al., 2014a) triggered functional analyses of candidate genes related to key agronomic traits. Thanks to these efforts, sesame now holds important genomic resources and platforms for its improvement, some of which are not yet available for important oilseed crops such as groundnut. Like pigeonpea (Pazhamala et al., 2015), chickpea and millets (Varshney et al., 2009, 2010), sesame has graduated from an "orphan crop" to a "resource-rich crop".

TABLE 1 | Revised list of Sesamum species and their chromosome numbers (2n).

Source: IPGRI and NBPGR, 2004.

TABLE 2 | List of worldwide major genebanks available for sesame species.

#### Molecular Markers

Molecular marker technologies have significantly accelerated modern plant breeding by enhancing genetic gain and shortening breeding cycles in many crop species. Different types of molecular marker systems have been developed and applied to sesame genotyping and breeding efforts. The first class of molecular markers, including Random Amplified Polymorphic DNA (RAPD; Bhat et al., 1999) and Amplified Fragment Length Polymorphism (AFLP; Laurentin and Karlovsky, 2006), was designed and employed mainly for genetic diversity studies. The second class of markers consisted essentially of Simple Sequence Repeat (SSR) types such as Inter-Simple Sequence Repeats (ISSR; Kim et al., 2002), Expressed Sequence Tags-SSR (EST-SSR; Wei et al., 2008, 2011; Badri et al., 2014; Sehr et al., 2016), cDNA-SSR (Spandana et al., 2012; Wang et al., 2012b; Zhang H. et al., 2012; Surapaneni et al., 2014; Wu et al., 2014b), Genome sequence-SSR (gSSR; Dixit et al., 2005; Wei X. et al., 2014; Uncu et al., 2015; Dossa, 2016; Yu et al., 2016) and Chloroplast SSR (cpSSR; Sehr et al., 2016). Compiling all developed SSR marker resources, more than 7,000 validated and 100,000 non-validated SSR markers are available in total for sesame research. Interestingly, a new study is underway to set up an online database gathering all SSR information and providing an integrated platform for functional analyses in sesame. Many of these markers have been used for genetic and association mapping, molecular breeding and genetic diversity studies in sesame (Wei et al., 2013; Li et al., 2014; Liu et al., 2015; Uncu et al., 2015; Dossa et al., 2016a). Finally, in recent years, with the advent of next generation sequencing (NGS) technology, the third class of molecular markers, single nucleotide polymorphisms (SNPs), came into existence. SNPs are more useful as genetic markers than many conventional markers because they are the most abundant and stable form of genetic variation in most genomes.
Therefore, the available high-throughput methods for SNP discovery and genotyping have been employed in sesame research, including Restriction site-Associated DNA sequencing (RAD-seq; Wu et al., 2014a; Wang et al., 2016a), Specific-Locus Amplified Fragment Sequencing (SLAF-seq; Zhang et al., 2013), RNA-Seq (Wei L. et al., 2014), Whole-Genome Sequencing (WGS; Wang et al., 2014b; Wei et al., 2015; Zhang et al., 2016) and Genotyping by Sequencing (GBS; Uncu et al., 2016). Another important marker system, referred to as insertion/deletions (Indels), has also been reported in sesame (Wei L. et al., 2014; Wu et al., 2014a).
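The SSR markers discussed above are, at the sequence level, short motifs repeated in tandem, so candidate loci can be located by scanning sequence data for such repeats. The following is a minimal sketch in Python of that idea, not the procedure used in any of the cited studies. The function name `find_ssrs` and the minimum repeat thresholds are assumptions chosen for illustration, and only perfect repeats with motifs of 2 to 6 bases are considered.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq.

    min_repeats maps motif length to the minimum number of tandem copies.
    The default thresholds are illustrative assumptions, not a published standard.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in min_repeats.items():
        # One motif of motif_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves built from a shorter repeat unit
            # (for example "ATAT" or "AAA"), so each locus is reported once.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"  # made-up sequence
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

Primer design, polymorphism screening and validation, which distinguish the validated from the non-validated markers mentioned above, are separate steps that this sketch does not attempt.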

As a whole, molecular marker technologies in sesame are witnessing considerable progress and it is obvious that sesame is no longer lagging far behind major crops in this field.

#### Genome Sequence Resources

A high-quality reference genome sequence provides access to the relatively complete gene catalog for a species, the regulatory elements that control gene function, and a framework for understanding genomic variation. As such, it is a prerequisite resource for fully understanding the role of genes in development, driving genomic-based approaches to systems biology, and efficiently exploiting the natural and induced genetic diversity of an organism (Feuillet et al., 2011). Researchers from the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, BGI and other institutes have successfully sequenced the nuclear genome of sesame, generating 54.5 Gb of high-quality data from the elite cultivar "Zhongzhi No.13" using the Illumina HiSeq 2000 platform. This has been the major breakthrough in sesame research in decades (Wang et al., 2014a). The high-quality draft genome, 274 Mb in size and encompassing 27,148 genes distributed on 16 Linkage Groups (LG), has become the reference genome for biological studies in sesame<sup>1</sup>. This genome, with a contig N50 of 52.2 kb and a scaffold N50 of 2.1 Mb, has recently been upgraded to 13 pseudochromosomes covering 94.3% of the estimated genome size and 97.2% of the predicted gene models in sesame (Wang et al., 2016a). In parallel, another genome sequencing project was initiated under the auspices of the Sesame Genome Working Group (SGWG). By 2013, this group had assembled 293.7 Mb of the estimated 354 Mb sesame genome from the variety "Yuzhi11" and predicted the function of 23,713 genes<sup>2</sup>. More recently, two new genome sequences from sesame landraces ("Baizhima" and "Mishuozhima") have also been released, increasing the genome sequence resources available for this crop (Wei et al., 2016).
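Assembly statistics such as the contig and scaffold N50 values quoted above summarize how contiguous an assembly is: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal Python sketch of this standard definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not data from the sesame assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum past half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20]  # made-up lengths in kb
    print(n50(toy_contigs))
    # Fraction of the estimated genome captured by the SGWG assembly,
    # using the figures quoted in the text (293.7 Mb out of 354 Mb).
    print(round(293.7 / 354, 2))
```

A higher N50 indicates longer contiguous stretches, but it says nothing about correctness, which is why anchoring scaffolds to linkage groups or pseudochromosomes is reported separately.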

In addition, work carried out by a team from the National Bureau of Plant Genetic Resources in India resulted in the genome sequencing of the Indian variety "Swetha"<sup>3</sup>. Of note, nearly 1000 sesame accessions and mapping populations have been re-sequenced, providing a tremendous amount of genome-wide information (Zhang et al., 2013a, 2016; Wang et al., 2014b, 2016a; Uncu et al., 2015, 2016; Wei et al., 2015). Nowadays, gene family studies, gene fine-mapping, gene cloning and molecular breeding, genome wide association studies (GWAS), and genome variation and evolution studies are feasible (Wei et al., 2015, 2016; Dossa et al., 2016b,d; Yu et al., 2017). Novel breeding approaches such as genomic selection (GS) could be implemented in sesame to accelerate crop improvement.

Beyond the nuclear genome sequences of sesame, the chloroplast genome has also been sequenced, first in a black-seeded cultivar "Ansanggae" (Yi and Kim, 2012) and subsequently in a white-seeded cultivar "Yuzhi11" (Zhang et al., 2013b). These studies indicated that Sesamum (Pedaliaceae family) is a sister genus to the Olea and Jasminum (Oleaceae family) clade and represents the core lineage of the Lamiales families.

<sup>1</sup>https://www.ncbi.nlm.nih.gov/biosample/SAMN02981519

<sup>2</sup>https://www.ncbi.nlm.nih.gov/biosample/SAMN04574066

<sup>3</sup>https://www.ncbi.nlm.nih.gov/biosample/SAMN02357081

#### Transcriptome Assembly

Transcriptome or EST sequencing is the first step in accessing the gene content of a species and has emerged as an efficient way to generate functional genomic data for non-model organisms. As with genome sequence resources, sesame holds several transcriptome datasets generated from various organs of the plant (**Figure 3**). Transcriptome profiling began with the work of Suh et al. (2003), who obtained 3,328 ESTs from a cDNA library of 5–25 days old immature sesame seeds. This study shed light on the metabolic pathways involved in the biosynthesis of lignans, including sesamin and sesamolin, in sesame. Wei et al. (2011) sequenced five tissues using, for the first time, high-throughput Illumina paired-end sequencing technology. Likewise, the works of Wei et al. (2012), Zhang H. et al. (2012) and Wang et al. (2014a) yielded various transcriptome resources related to sesame growth and developmental stages using sequencing technologies including Illumina HiSeq 2000 and GAII. These studies increased our understanding of the genomic background underpinning sesame growth and development.

On the other hand, given that sesame productivity is seriously hampered by different biotic and abiotic stresses, studies have been designed to unravel the molecular basis of stress tolerance and to find potential genes for imparting stress tolerance to sesame genotypes. The transcriptional response of sesame root tissues to waterlogging stress was investigated by Wang et al. (2012c). More recently, a time-course transcriptome profiling on the Illumina 2500 platform in two sesame genotypes displaying contrasting tolerance levels provided substantial gene expression data on sesame responses to waterlogging stress (Wang et al., 2016b). Another important abiotic stress impairing sesame productivity is drought, which is still poorly characterized at the molecular level (Dossa et al., 2016b). To address this, Dossa et al. (2017) used the Illumina HiSeq 4000 sequencing platform to explore gene expression changes in two sesame genotypes (tolerant and sensitive) under progressive drought and after re-watering, in order to find potential genes associated with drought tolerance. Currently, efforts are underway to decipher sesame molecular responses to salt stress, another important abiotic stress.

Concerning biotic stress, very little research has been undertaken at the molecular level, and more investigations are urgently needed for the development of disease-resistant cultivars. An RNA-seq study has been performed on resistant and susceptible sesame cultivars inoculated with Fusarium oxysporum f. sp. sesami to clarify the molecular mechanism of sesame resistance to Fusarium wilt, which is one of the main sesame diseases worldwide and results in yield losses of 15–30% (Li et al., 2012; Wei et al., 2016).

Finally, the transcriptome profiling approach has also been deployed to explore other important traits. For example, Liu et al. (2016) compared two near-isogenic lines [W1098A with dominant genic male sterility (DGMS) characteristic and its fertile counterpart, W1098B] to identify differentially expressed genes related to male sterility.

To sum up, a large volume of transcriptome data, based on state-of-the-art sequencing technologies and covering various tissue samples and experimental conditions, is now available in sesame and has increased our knowledge of sesame biology. These resources could assist in upgrading the current reference genome in the near future. To this end, a functional transcriptome database should be built to facilitate the exploitation of these extensive resources.

#### Development of Genetic Maps and Breeding Populations

A genetic linkage map is a prerequisite for better understanding the inheritance of traits at the genome-wide level (Verma et al., 2015). It helps to identify molecular markers associated with relevant traits that can be used in breeding programs. The first genetic map in sesame was constructed in 2009 using a combination of 220 EST-SSR, AFLP and Random Selective Amplification of Microsatellite Polymorphic Loci (RSAMPL) markers (Wei et al., 2009). It was built from an F<sub>2</sub> population COI1134 × RXBS and encompassed 30 LGs covering a genetic length of 936.72 cM with an average marker interval of 4.93 cM (**Table 3**). At that time, the number of informative molecular markers was limited; hence, the development of linkage maps with good resolution was challenging. Subsequently, this map was improved to obtain 14 LGs spanning a genome distance of 1,216 cM. In total, 653 markers were successfully mapped, with a marker density of 1.86 cM per marker interval. Thanks to the good resolution reached in this linkage map, QTL identification for seed coat color was conducted for the first time in sesame (Zhang et al., 2013c). During the same period, another linkage map was released based on Specific-Locus Amplified Fragment sequencing (SLAF-seq) technology (Zhang et al., 2013). This map, based on an F<sub>2</sub> population Shandong Jiaxiang Sesame × Zhongzhi No.13, comprised 1,233 SLAF markers distributed on 15 LGs and was 1,474.87 cM in length, with an average marker spacing of 1.20 cM. The density of this map was higher than that of the previous ones, and it was the first linkage map to integrate SNP markers in sesame. Surprisingly, these three published linkage maps lacked common markers. Furthermore, an important limitation is that they were constructed from temporary (F<sub>2</sub>) populations, which make repeated phenotyping unfeasible and hence are not ideal for mapping quantitative traits.
In this regard, another high-throughput sequencing technology, RAD-seq, was adopted by Wu et al. (2014a) to place nearly 1,230 markers on a high-density linkage map and to identify several QTLs linked to grain yield-related traits, based on a recombinant inbred line (RIL) population from Zhongzhi14 × Miaoqianzhima.
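The marker densities quoted for these maps follow from simple arithmetic on map length and marker number, and the convention matters when maps are compared. As a minimal sketch (the function name `mean_interval` is hypothetical), the Python snippet below divides map length either by the number of markers or by the number of marker intervals, the latter being markers minus linkage groups. Which convention each original study used is not stated in the text; the comments only note which calculation reproduces the reported figure.

```python
def mean_interval(map_length_cm, n_markers, n_linkage_groups=0):
    """Average spacing in cM between mapped markers.

    With n_linkage_groups=0 the map length is divided by the marker count.
    Passing the number of linkage groups divides by the number of intervals
    instead, since each group of k markers has k - 1 intervals.
    """
    return map_length_cm / (n_markers - n_linkage_groups)


if __name__ == "__main__":
    # First map: 936.72 cM, 220 markers, 30 LGs.
    # The interval-based calculation matches the reported 4.93 cM.
    print(round(mean_interval(936.72, 220, 30), 2))
    # Improved map: 1,216 cM, 653 markers.
    # The marker-based calculation matches the reported 1.86 cM.
    print(round(mean_interval(1216, 653), 2))
    # SLAF-seq map: 1,474.87 cM, 1,233 markers, reported as 1.20 cM.
    print(round(mean_interval(1474.87, 1233), 2))
```

Because the two conventions diverge most when there are many linkage groups and few markers, densities from the early maps are best compared after recomputing them in the same way.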

Zhang et al. (2014) also developed a RIL population from Zhongzhi No.13 × Yiyangbai to analyze QTLs related to waterlogging tolerance. Furthermore, linkage analysis has been applied in an inter-specific population Ezhi1 (S. indicum L., 2n = 26) × Yezhi2 (S. mulayanum Nair, 2n = 26) and in a RIL population from 95 ms-5A × 95 ms-5B to study genic male sterility (GMS) traits (Zhao et al., 2013; Liu et al., 2015).

Although the resolution of sesame genetic maps increased steeply over the years, none of the published maps had as many LGs as the number of chromosomes known in sesame. Only recently did the significant works of Wang et al. (2016a), Zhang et al. (2016), and Mei et al. (2017) reach this milestone. The first map was based on 430 RILs from Zhongzhi No.13 × ZZM2748 (semidwarf) and included 1,522 bins anchored on 13 LGs spanning a genome length of 1090.99 cM, with a mean distance of 0.72 cM between adjacent bins. This bin map was used to identify several QTLs for sesame plant height and seed coat color (Wang et al., 2016a). The second map was constructed from an F<sub>2</sub> population from Yuzhi11 (indeterminate growth) × Yuzhi DS899 (determinate growth). This SNP map comprised 3,041 bins including 30,193 SNPs in 13 LGs, with an average marker density of 0.10 cM. At present, this is the densest linkage map in sesame, and it was efficiently applied for map-based gene identification (Zhang et al., 2016). The third map was built on 150 BC<sub>1</sub> individuals from Yuzhi4 × Bengal Small-seed and encompassed 9,378 SLAF markers anchored onto 13 LGs spanning a total genetic distance of 1,974.23 cM, with an average genetic distance of 0.22 cM.

Notably, a recent linkage map constructed from a RIL population Acc. No. 95–223 × Acc. No. 92–3091 yielded 13 LGs encompassing 914 cM with 432 markers, including 420 SNPs and 12 SSRs (Uncu et al., 2016). In addition, a haplotype map (HapMap) has been constructed from 705 worldwide accessions, providing information on 5,407,981 SNPs with an average LD decay rate of 88 kb across the whole sesame genome. This genotyped worldwide panel has been selected as a "training population", which constitutes a tremendous resource for GWAS, Genomic Selection (GS) and evolution studies in sesame (Wei et al., 2015).
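LD decay figures such as the 88 kb quoted above are derived from pairwise linkage disequilibrium between SNPs, examined as a function of their physical distance. As an illustration of the underlying quantity, the following minimal Python sketch computes the standard r² statistic for two biallelic loci. It assumes phased haplotypes coded as 0/1; the function name `ld_r2` and the toy haplotypes are hypothetical and do not represent the pipeline of Wei et al. (2015).

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between alleles at two biallelic loci.

    hap_a and hap_b are equal-length sequences of 0/1 alleles, one entry
    per phased haplotype.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # alleles always co-occur
    print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # alleles combine at random
```

An LD decay distance is then typically read off a curve of average r² against inter-SNP distance, at a threshold that each study chooses; that threshold is not given in the text above.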

Clearly, from the first low-resolution genetic map constructed in 2009 to the recent ultra-dense, high-resolution ones, genetic research in sesame has improved impressively and has reached a new dimension. Nonetheless, in order to use these map resources to their maximum advantage, it would be ideal to construct a consensus map that provides a framework of unprecedented marker density and genome coverage for fine QTL analysis and association mapping, thus facilitating the application of molecular breeding strategies in diverse sesame germplasm. In this regard, effective collaboration between the different sesame working teams will help achieve this goal.

#### Online Functional Database Resources

Owing to the multitude of genomic sequence resources being released for sesame research, several integrative online databases have been created to gather sesame information and provide user-friendly platforms on which researchers can easily study the molecular function of genome components and carry out comparative genomics and breeding applications (**Table 4**). The first web-based functional platform on sesame was set up by the Sesame Genome Project, but it did not supply detailed sesame genomic or genetic data at the time. Hence, after the release of the reference genome, a versatile online database referred to as Sinbase<sup>4</sup> was developed to provide accessible information related to sesame genomics and genetics (Wang et al., 2014c). This database includes genomic component annotations to allow users to study sesame more thoroughly; genetic linkage groups for gene cloning; QTL for genetic linkage analyses; and colinear blocks and orthologous genes for comparative and evolutionary analyses in sesame and other related species. Additionally, extensive phenotyping data are supplied describing variation in the sesame plant (growth habits, branching styles, number of flowers and capsules per axil, number of carpels in a capsule, flower colors, capsule length etc.). Overall, 10 comprehensive online databases are currently available to study sesame biology, including five platforms devoted solely to sesame, covering aspects related to genome functional components, gene expression, SSR, SNP, Indels, transposons, QTL and functional genes, gene families, comparative genomics, genetic maps, haplotype map, phenotypes, etc. (Zhang et al., 2013a; Wang et al., 2014c; Ke et al., 2015; Yu et al., 2015, 2016; Wei et al., 2016, 2017).

#### Dossa et al. Advances in Sesame "Omics"

#### TABLE 3 | Genetic mapping and association mapping studies in sesame.

TABLE 4 | List of available online databases for functional genomics in sesame.

<sup>∗</sup>These databases involve several species, including sesame.

Frontiers in Plant Science | www.frontiersin.org

<sup>4</sup>http://ocri-genomics.org/Sinbase

#### APPLICATION OF NEWLY DEVELOPED "OMICS" TOOLS IN SESAME RESEARCH

With mounting development of versatile "Omics" tools in sesame, their application and deployment for the crop improvement strategies have also yielded conspicuous results. Nowadays, these tools enable high efficiency and resolution in genetic diversity study, gene-trait association analysis using bi-parental or natural diverse populations, gene family study, RNA-seq based candidate gene identification. Various traits of the plant have been tagged by researchers including oil content and quality traits, yield components, tolerance to drought and waterlogging stresses, disease resistance, good plant architecture etc. A glimpse of the diverse applications of "Omics" tools in sesame research is presented in **Figure 4**.

#### Germplasm Characterization

The availability of molecular markers has paved the way for several genetic diversity studies in sesame. Knowledge of the genetic diversity and population structure of germplasm collections is an important foundation for crop improvement and a key component of effective conservation and breeding strategies (Thomson et al., 2007). Pathak et al. (2014) have previously reviewed in detail the numerous genetic diversity studies conducted on sesame germplasm. It emerges that sesame harbors valuable diversity, and incongruence between geographical proximity and genetic distance has been reported (Bhat et al., 1999; Laurentin and Karlovsky, 2006; Abdellatef et al., 2008; Pham et al., 2009; Zhang Y. et al., 2012; Alemu et al., 2013; Abate et al., 2015). However, few informative and polymorphic markers were used in these studies, so they might not have distinguished the samples accurately. Since the genome sequence of sesame became available, thousands of highly informative markers covering the entire genome, including gSSRs, SNPs and Indels, have been developed and applied in genetic diversity studies (Wang et al., 2014b; Wei X. et al., 2014; Wu et al., 2014b; Wei et al., 2015; Uncu et al., 2015; Dossa et al., 2016a). These studies, in contrast, have revealed certain patterns of association between genetic similarity and geographical proximity in sesame. Even though the exchange of sesame materials between diverse locations, especially through seed trading, increases gene flow, in most cases some locally unique gene pools still exist. For example, Dossa et al. (2016a) investigated for the first time previously uncharacterized sesame accessions from West Africa and found that they were unique and distinct from all the rest. These observations suggest that newly developed molecular markers will bring more precision and efficiency to both genetic studies and breeding programs.
Furthermore, a combination of molecular and morphological characterization has been employed to assess the diversity in sesame germplasm. A total of 137 Turkish accessions have been characterized using both morphological and molecular data, which led to the selection of a core collection useful for sesame preservation and breeding (Frary et al., 2014). Ugandan sesame landraces were also investigated, and the results showed incongruence between morphological data and molecular data (Sehr et al., 2016). Similarly, Pandey et al. (2015) analyzed a worldwide germplasm dominated by Indian accessions. They detected high genetic diversity within the germplasm, but the correlation between phenotypic and molecular marker information was insignificant, which highlights the importance of combining genetic and phenotypic diversity to describe accurately the extent of the variation present in sesame germplasm.
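The comparisons discussed above, between genetic distance and geography or between molecular and morphological data, are in practice comparisons of two distance matrices computed on the same accessions. One common way to test such an association is a Mantel-type permutation test. The minimal Python sketch below illustrates the idea; it is an assumption made for illustration, not necessarily the procedure used in the cited studies, and the function names and toy matrix are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle of a square matrix, optionally reordering rows and columns."""
    n = len(matrix)
    order = list(range(n)) if order is None else order
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (correlation, one-sided permutation p-value) for two distance matrices."""
    rng = random.Random(seed)
    a = upper_triangle(dist_a)
    observed = pearson(a, upper_triangle(dist_b))
    n = len(dist_b)
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel the accessions in the second matrix
        if pearson(a, upper_triangle(dist_b, order)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


if __name__ == "__main__":
    # Made-up distances among four accessions, used for both matrices.
    toy = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
    r, p = mantel(toy, toy, permutations=99)
    print(round(r, 3), p)
```

Permuting accession labels, rather than individual matrix cells, preserves the dependence among distances that share an accession, which is the reason a permutation test is preferred here over an ordinary correlation test.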

Concerning wild related species, very few genetic studies have been conducted for their characterization. Uncu et al. (2015) uncovered a high rate of marker transferability between S. indicum and S. malabaricum, supporting the designation of the two taxa as cultivated and wild forms of the same species. In earlier work, Nyongesa et al. (2013) found high genetic diversity within wild sesame species. The clustering pattern of the wild and cultivated forms indicated that no cross-pollination occurred between them during domestication. In addition, it has been shown that the genetic diversity of sesame was eroded by selection after domestication (Wu et al., 2014b; Wei et al., 2015; Pathak et al., 2015; Mondal et al., 2016). Therefore, future sesame cultivation would benefit from the incorporation of alleles from sesame's wild relatives. Wild species of sesame possess genes for resistance to major biotic and abiotic stresses and for adaptability to different environments (Joshi, 1961). Unfortunately, in contrast to several crops such as groundnut, cotton, sunflower, rice, maize, wheat, tomato and soybean, which are profiting from their wild related species for the improvement of cultivars (Zamir, 2001), the introgression of useful genes from wild species into cultivars via conventional breeding has so far not been successful in sesame, mainly owing to post-fertilization barriers (Tiwari et al., 2011).

# Functional Genomics Research for Key Agronomic Traits

#### Oil and Quality Traits

Sesame is primarily grown for its oil-bearing seed. Besides their high oil content, sesame seeds contain almost 18% protein, and among the fatty acids, oleic acid (39.6%) and linoleic acid (46%) are the two main components, in an ideal ratio of almost 1:1 (Anilakumar et al., 2010). Numerous studies therefore attempted early on to decipher the genetic basis of oil yield and quality, which are key agronomic traits in sesame breeding. Even so, until 2013, the molecular mechanisms of the high oil content and quality in sesame seeds were still unclear (Jin et al., 2001; Chun et al., 2003; Suh et al., 2003; Ke et al., 2011). An association mapping of oil content, protein content, oleic acid concentration, and linoleic acid concentration based on multi-environment trials was conducted using 79 SSR, SRAP, and AFLP markers in 216 Chinese sesame accessions (Wei et al., 2013). Only one associated marker (M15E10-3) was identified for oil content in two environments, suggesting inadequate molecular markers and/or germplasm resources. In this regard, Li et al. (2014) analyzed 369 sesame accessions with larger phenotypic variation under 5 environments using 112 informative SSRs. A total of 19 and 24 SSRs were detected for oil content and protein content, respectively. Of these, 19 markers were shared by both traits, suggesting that oil and protein contents are controlled mostly by similar, major genes. By combining genome information (Zhang et al., 2013a) with the association mapping results, 36 candidate genes related to the lipid pathway, including a fatty acid elongation gene and a gene encoding Stearoyl-ACP Desaturase, were identified. Later, Chen et al. (2014) investigated the sequence divergence in the coding region of the Fatty acid desaturase (FAD) gene between wild and cultivated sesame.
They found nucleotide polymorphisms located in the enzyme active site between the wild and cultivated forms, which may contribute to the higher fatty acid composition in cultivated sesame. Specific primers linked to these functional SNPs would be of great importance in molecular breeding toward high fatty acid content in sesame varieties.

The release of the reference genome sequence has provided an unparalleled opportunity to further explore the molecular basis of high oil content and quality traits in sesame (Wang et al., 2014a). The sesame genome was found to harbor a low copy number of lipid-related genes (708) compared to species such as soybean (1,298). This finding was unexpected, since there is an obvious difference in oil content between sesame (∼55%) and soybean (∼20%). More interestingly, by combining comparative genomic and transcriptomic analyses, the authors discovered that some lipid gene families beneficial for high oil accumulation, especially the lipid transfer protein type 1 (LTP1) genes, have been expanded and retained during domestication, while lipid degradation-related families were found to be reduced in sesame compared to soybean; this may underlie the high oil content of sesame. Additionally, important genes in the triacylglycerol biosynthesis pathway were found to be strongly implicated in oil accumulation during the early stages of sesame seed development. Finally, two potential key genes, SiDIR (SIN\_1015471) and SiPSS (SIN\_1025734), were detected for sesamin production in sesame (Wang et al., 2014a).

Genome-wide association studies (GWAS) take full advantage of ancient recombination events to identify the genetic loci underlying traits at a relatively high resolution (Huang and Han, 2014). In a comprehensive GWAS for oil and quality traits in 705 sesame accessions under 4 environments, 13 significant associations were uncovered for oilseed compounds including oil, protein, sesamin, sesamolin, saturated fatty acid (SFA), unsaturated fatty acid (USFA), and the ratio SFA/USFA. Several causative genes were uncovered for oil content [SIN\_1003248, SIN\_1013005, SIN\_1019167, SIN\_1009923, SiPPO (SIN\_1016759) and SiNST1 (SIN\_1005755)], for fatty acid composition [SiKASI (SIN\_1001803), SiKASII (SIN\_1024652), SiACNA (SIN\_1005440), SiDGAT2 (SIN\_1019256), SiFATA (SIN\_1024296), SiFATB (SIN\_1022133), SiSAD (SIN\_1008977), SiFAD2 (SIN\_1009785)], for sesamin and sesamolin content [SiNST1 (SIN\_1005755)], and for protein content [SiPPO (SIN\_1016759)].

Further studies by Zhang et al. (2013) and Wang et al. (2016a) resulted in 4 QTLs (QTL1-1, QTL11-1, QTL11-2 and QTL13-1) and 9 QTLs (qSCa-8.2, qSCb-4.1, qSCb-8.1, qSCb-11.1, qSCl-4.1, qSCl-8.1, qSCl-11.1, qSCa-4.1 and qSCa-8.1) detected for seed coat color, respectively. Additionally, the gene SiPPO (SIN\_1016759) has recently been detected through fine mapping as the candidate gene controlling seed coat color in sesame (Wei et al., 2016). Seed coat color is an important agronomic trait in sesame, as it has been shown that white sesame seeds typically have higher oil, sesamin or sesamolin content (Wang et al., 2012a), whereas black sesame seeds usually have higher ash and carbohydrate content and lower protein, oil, and moisture ratios (Kanu, 2011). Although these QTLs harbor dozens of genes, further screening will help to pinpoint the candidate genes.

Taken together, these results on oil and quality traits give researchers substantial genomic information for breeding and releasing cultivars of higher nutritional value to meet the various demands of oil markets.

#### Waterlogging and Drought Tolerance

Sesame is highly susceptible to waterlogging stress. The crop experiences a reduction in growth and yield after 2–3 days of waterlogging, which frequently occurs when it is grown on poorly drained soils (Ucan et al., 2007). Wang et al. (2012c) found 13,307 differentially expressed genes (DEGs) in sesame under waterlogging stress. In a more comprehensive study, a total of 1,379 genes were identified as the core gene set functioning in response to waterlogging. Notably, the authors reported 66 genes that may be candidates for improving sesame tolerance to waterlogging (Wang et al., 2016b). Meanwhile, 6 QTLs (qEZ09ZCL13, qWH09CHL15, qEZ10ZCL07, qWH10ZCL09, qEZ10CHL07, and qWH10CHL09) linked to waterlogging traits were identified, and an SSR marker (ZM428) closely linked to qWH10CHL09 was further reported as an effective marker for marker-assisted selection (MAS) toward waterlogging tolerance (Zhang et al., 2014). Currently, studies are being implemented to unveil genomic variants associated with waterlogging tolerance in sesame.

Concerning drought stress in sesame, few molecular studies have been conducted so far. Using a comparative genomic approach, Dossa et al. (2016b) identified in the whole sesame genome a set of 75 candidate genes for drought tolerance enriched in transcription factors (TFs). They subsequently dissected two important TF families (AP2/ERF and HSF) and proposed some candidate TFs for drought tolerance improvement in sesame (Dossa et al., 2016c,d). Evolutionary analyses of these families showed that sesame has retained most of its drought-related genes, similar to what was found for its oil-related genes. This may explain the relatively high drought tolerance observed in this crop. A recent RNA-seq analysis demonstrated that 722 genes act as the core gene set involved in drought responses, and 61 candidate genes conferring higher drought tolerance were discovered (Dossa et al., 2017). In another very recent report, an osmotin-like gene (SindOLP) was shown to enhance tolerance to drought, salinity, oxidative stresses, and the charcoal rot pathogen in transgenic sesame (Chowdhury et al., 2017). Finally, a study is underway to decipher SNP variants significantly linked to various drought tolerance traits through an inclusive GWAS.

#### Productivity Enhancement

Although sesame was domesticated long ago, the yield in most growing areas is still very low, which hampers its adoption and expansion in the world. Grain yield of sesame per plant is considered to be composed of three components, i.e., the number of capsules per plant, the number of grains per capsule, and the grain weight. Some other factors, including plant height, length of capsules, number of capsules per axil, and axis height of the first capsule, were found to be strongly associated with grain yield of sesame (Biabani and Pakniyat, 2008). Concerning plant height, some QTLs have been reported, including Qph-6 and Qph-12 (Wu et al., 2014a); 41 QTLs were further identified, and the major QTL qPH-3.3 was predicted to be responsible for the semi-dwarf sesame plant phenotype (Wang et al., 2016a). However, it contains 102 candidate genes and thus needs further investigation to pinpoint the causative gene. A semi-dwarf gene [SiGA20ox1 (SIN\_1002659)] has recently been detected through a fine-mapping strategy (Wei et al., 2016). Moreover, two important candidate genes for plant height, SiDFL1 (SIN\_1014512) and SiILR1 (SIN\_1018135), were found by Wei et al. (2015). These findings will assist in efforts to create very high-yielding varieties suited to mechanized cultivation.

Quantitative trait loci were also identified for capsule-related traits, including capsule number per plant (Qcn-11), first capsule height (Qfch-4, Qfch-11, and Qfch-12), capsule axis length (Qcal-5 and Qcal-9), and capsule length (Qcl-3, Qcl-4, Qcl-7, Qcl-8, and Qcl-12) (Wu et al., 2014a). Similarly, the gene SiACS (SIN\_1006338), which determines the number of capsules per axil, was discovered and may be an important asset for yield improvement in sesame (Wei et al., 2015). Since the number of capsules per axil is related to the number of flowers per axil, Mei et al. (2017) successfully mapped a gene, SiFA (mono-flower vs. triple-flower), onto LG11, flanked by the markers Marker58311 and Marker36337.

With regard to grain yield, only a few QTLs are currently available, including Qgn-1, Qgn-6, and Qgn-12 for grain number per capsule and Qtgw-11 for thousand grain weight.

Flowering time is also an important trait for the adaptation of crops to different agro-climatic conditions and significantly affects yield. Two candidate genes at flowering-time loci, SiDOG1 (SIN\_1022538) and SiIAA14 (SIN\_1021838), have been discovered (Wei et al., 2015). Sesame's indeterminate growth habit is one of the reasons for its low yielding capacity compared to other oilseed crops (Yol and Uzun, 2012). Recently, the gene SiDt (DS899s00170.023) was detected as a target gene conferring the determinate trait in sesame cultivars (Zhang et al., 2016). Branching habit, an important trait in sesame because it plays a cardinal role in grain yield, cultivation practices, and mechanized harvest, has also been investigated. A gene, SiBH, controlling the branching habit (uniculm vs. branched type) was mapped onto LG5, flanked by the markers Marker129539 and Marker31462. This QTL region will be useful for developing unbranched sesame varieties fit for mechanized harvest (Mei et al., 2017).

In another realm, the AFLP markers P01MC08, P06MG04 and P12EA14 were found to be linked to the recessive genic male sterility (GMS) gene SiMs1 (Zhao et al., 2013), and 13 SSRs including SBM298 and GB50 were associated with the dominant GMS gene Ms (Liu et al., 2015). These markers will be valuable for marker-aided breeding of GMS hybrids and for harnessing heterosis, one of the promising approaches for yield improvement in sesame (Murty, 1975).

### CURRENT HOT-TOPICS IN SESAME RESEARCH AND FUTURE DIRECTIONS

Recent years have witnessed a continuously increasing number of functional genes discovered for key agronomic traits in sesame thanks to the availability of versatile "Omics" tools. The next logical step in sesame research is therefore the functional validation of these gene resources through genetic engineering approaches. Genetic transformation would be an ideal opportunity to quickly transfer the functional genes into elite sesame cultivars. Indeed, several successful attempts at sesame genetic transformation through Agrobacterium have led to transformation efficiencies of up to 42.66% (Yadav et al., 2010; Al-Shafeay et al., 2011; Chowdhury et al., 2014). However, improvements of the sesame genetic transformation protocol to reach higher efficiency are imperative. Nonetheless, studies are in the offing to transfer candidate genes for oil quality traits as well as abiotic stress tolerance into elite cultivars and to unveil their molecular mechanisms. In this vein, the first functional analysis in transgenic sesame was reported very recently (Chowdhury et al., 2017). This report presages a bright future for the genetic engineering era of sesame.

In addition, the available "Omics" tools spur us to address scientific questions that remain unexplored or weakly investigated in sesame and that could considerably aid efforts toward its enhancement. These issues constitute the present hot topics in sesame research and involve various domains of the crop:


future researches. Also, the available genomic tools need to be more effectively applied for other no less important agronomic traits that could lead to the enhancement of sesame productivity especially the resistance to biotic and abiotic stresses.


# CONCLUSION

Sesame has become an emerging crop in the world, and its entrance into the "Omics" era has raised it to the "genomic resource-rich crop" level. Invaluable efforts during recent years have produced several genetic/genomic tools and resources that provide an impetus to research and nurture sesame production for the benefit of smallholder farmers in developing countries. The major traits in sesame, including oil and quality, yield-related traits, and abiotic stress resistance, have been thoroughly explored, and our understanding of the molecular basis underlying these traits has deepened considerably. More importantly, several functional genes, QTLs, and molecular markers linked to these traits are now available and could be employed in sesame breeding programs. The current focus of sesame research is the exploitation of this available genomic information for effective sesame improvement through molecular- or genomics-assisted breeding. However, the road to raising sesame to one of the major oilseed crops in the world is still long and will need more synergetic efforts, more applications of MAS, and interdisciplinary research. Other related research fields for the valorization and expansion of sesame should also follow the same trend as the molecular field. Finally, and perhaps one of the most important recommendations, is to enhance partnerships between national and international sesame teams, so that major issues of sesame production can be addressed through international projects and effective breeding strategies can be implemented.

#### AUTHOR CONTRIBUTIONS

KD, BL, NC, XZ, and DD conceived and designed the paper. KD, LW, XW, YZ, MN, DF, JY, MM, LY, and DD collected and analyzed the literature. KD, LY, MM, and DD drafted the paper. DF, JY, LW, and DD prepared the figures. DF, YZ, LW, XW, MN, DD, XZ, and NC revised the manuscript. All authors have read and approved the final version of the manuscript.

#### ACKNOWLEDGMENTS

We apologize to colleagues whose works and original articles are not cited in this review owing to space limitations. We thank Mr. Sobowale Soremi for his assistance in language editing of the manuscript.

#### REFERENCES


component, sesamin in human stomach cancer KATO III cells. Oncol. Rep. 7, 1213–1216.


Indian sesame germplasm. Plant Genet. Resour. 14, 81–90. doi: 10.1017/S1479262115000106



end sequencing and development of EST SSR markers. BMC Genomics 12:451. doi: 10.1186/1471-2164-12-451



molecular markers and extraction of a mini-core collection. BMC Genet. 13:102. doi: 10.1186/1471-2156-13-102

Zhao, Y., Yang, M., Wu, K., Liu, H., Wu, J., and Liu, K. (2013). Characterization and genetic mapping of a novel recessive genic male sterile gene in sesame (Sesamum indicum L.). Mol. Breed. 32, 901–908. doi: 10.1007/s11032-013-9919-8

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Dossa, Diouf, Wang, Wei, Zhang, Niang, Fonceka, Yu, Mmadi, Yehouessi, Liao, Zhang and Cisse. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Recent Genetic Gains in Nitrogen Use Efficiency in Oilseed Rape

Andreas Stahl <sup>1</sup> \*, Mara Pfeifer <sup>1</sup> , Matthias Frisch<sup>2</sup> , Benjamin Wittkop<sup>1</sup> and Rod J. Snowdon<sup>1</sup>

*<sup>1</sup> Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Giessen, Germany, <sup>2</sup> Department of Biometry and Population Genetics, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Giessen, Germany*

Nitrogen is essential for plant growth, and N fertilization allows farmers to obtain high yields and produce sufficient agricultural commodities. On the other hand, nitrogen losses potentially cause adverse effects to ecosystems and to human health. Increasing nitrogen use efficiency (NUE) is vital to resolve the conflict between productivity, which is needed to meet the demand of a growing world population, and the protection of the environment. To this end, genetic improvement is considered to be of paramount importance for ecofriendly crop production. Winter oilseed rape (*Brassica napus* L.) is the second most important oilseed crop in the world and is cultivated in many regions across the temperate zones. To our knowledge, this study reports the most comprehensive field-based data generated to date for an empirical evaluation of genetic improvement in winter oilseed rape varieties under two divergent nitrogen fertilization levels (NFLs). A collection of 30 elite varieties registered between 1989 and 2014, including hybrids and open-pollinated varieties, was tested in a 2-year experiment in 10 environments across Germany for changes in seed yield and seed quality traits. Furthermore, NUE was calculated. We observed a highly significant genetics-driven increase in seed yield *per*-*se* and, thus, increased NUE at both NFLs. On average, seed yield from modern open-pollinated varieties and modern hybrids was higher than from old open-pollinated varieties and old hybrids. The annual yield progress across all tested varieties was ∼35 kg ha−<sup>1</sup> year−<sup>1</sup> at low nitrogen and 45 kg ha−<sup>1</sup> year−<sup>1</sup> under high nitrogen fertilization. Furthermore, in modern varieties an increased oil concentration and decreased protein concentration were observed. Despite the significant effects of nitrogen fertilization, a surprisingly low average seed yield gap of 180 kg ha−<sup>1</sup> was noted between high and low nitrogen fertilization. Due to contrary effects of N fertilization on seed yield *per*-*se* and seed oil concentration, an oil yield of 2.04 t ha−<sup>1</sup> was measured at both N levels. Collectively, the data reveal that genetic improvement through modern breeding techniques in conjunction with reduced N fertilizer inputs has a tremendous potential to increase NUE of oilseed rape.

Keywords: yield, breeding progress, Brassica napus, hybrid varieties, nitrogen, fertilization, oil, sustainable intensification

#### Edited by:

*Dragana Miladinović, Institute of Field and Vegetable Crops, Serbia*

#### Reviewed by:

*Liezhao Liu, Southwest University, China Shengwu Hu, Northwest A&F University, China*

\*Correspondence: *Andreas Stahl andreas.stahl@agrar.uni-giessen.de*

#### Specialty section:

*This article was submitted to Crop Science and Horticulture, a section of the journal Frontiers in Plant Science*

> Received: *07 April 2017* Accepted: *22 May 2017* Published: *07 June 2017*

#### Citation:

*Stahl A, Pfeifer M, Frisch M, Wittkop B and Snowdon RJ (2017) Recent Genetic Gains in Nitrogen Use Efficiency in Oilseed Rape. Front. Plant Sci. 8:963. doi: 10.3389/fpls.2017.00963*

**Abbreviations:** GHG, Greenhouse gas; HN, High nitrogen; LN, Low nitrogen; N, Nitrogen; Nmin, Soil mineral nitrogen; NFL, Nitrogen fertilization level; NUE, Nitrogen use efficiency (as Seed yield over N supplied); OilConc, Seed oil concentration (at 91% dry matter content); OilY, Oil yield; OP, Open pollinated; ProteinConc, Seed protein concentration (at 91% dry matter content); ProteinConc DFF, ProteinConc in defatted fraction (at 91% dry matter content); ProteinY, Protein yield; SumProOilConc, Sum of protein and oil concentration (at 91% dry matter content); SumProOilY, Sum of protein and oil yield; SDH, Semi dwarf hybrid; SY, Seed yield (at 91% dry matter content).

# INTRODUCTION

An increase in global crop demand of up to 110% is expected by 2050 compared to 2005, requiring a tremendous increase in production (Tilman et al., 2011). Competitive crop production crucially depends on the adequate application of nitrogen (N). It is well known that the application of N fertilizer was a substantial driver of yield increases in the last century. At the same time, it has been found that 50–70% of the applied N is not recovered in the harvested plant organs (seeds) and can cause severe damage to the surrounding ecosystems (Sylvester-Bradley and Kindred, 2009; Liu et al., 2010; Galloway et al., 2013). Estimates have revealed that the current global conversion of atmospheric N<sub>2</sub> into reactive N and its application on fields have already transgressed the boundaries of sustainable development (Rockström et al., 2009; Steffen et al., 2015). Moreover, the lost N is not only an ecological problem but also an economic loss for farmers (Rothstein, 2007). Therefore, in the decades ahead, agricultural crop production faces the unprecedented challenge of enhancing crop yields to match increasing demand while simultaneously reducing the environmental damage caused by unused nitrogen. To resolve this conflict, a dramatic increase in nitrogen use efficiency (NUE) is indispensable. Besides more precise fertilizer application (Henke et al., 2008; Müller et al., 2012), exploiting genetic potential through breeding and cultivation of the most suitable and efficient varieties is considered to play an important role in the sustainable intensification of agriculture (Hirel et al., 2007; Kant et al., 2011; Hawkesford, 2014).

Oilseed rape (Brassica napus L.) is the principal European oil crop and the second most important oilseed crop in the world after soybean, with a wide distribution across countries in the temperate climatic zone (Fischer et al., 2014). It is mainly cultivated for its high-quality vegetable oil, which is used for human nutrition (Kumar et al., 2016) and as a renewable source for fuels and technical oils. Not least, the residual meal is gaining wide use as a protein feed for animals and has recently also been discussed for human nutrition (Fleddermann et al., 2013). Despite several positive agronomic effects, including its integral role as a break crop in cereal crop rotations, its ability to improve soil fertility, and its strong ability to capture nutrients during vegetative growth stages, oilseed rape cultivation is often associated with a relatively high N balance surplus (Sieling and Kage, 2006; Weiser et al., 2017). Consequently, a more competitive and eco-friendly production of oil and protein would benefit from increased NUE.

In the face of this challenge, the question arises whether or not breeding is driving NUE improvement. A critical assessment of the genetic progress for SY under divergent N inputs is of great value in order to analyze the extent of the breeding effect on NUE improvement in the past and to determine in what way a course correction in breeding programs is required to reach future sustainability goals. Breeding is a long-term procedure, and environmental conditions as well as management practices have also changed over time. Therefore, isolating the genetic contribution by directly comparing the performance of varieties from different periods of registration requires multilocation field trials in which the varieties are cultivated simultaneously under very similar N fertilization management. Owing to the laborious trial setup, experimental data from such comparisons are seldom presented.

In the case of oilseed rape, Kessel et al. (2012) evaluated 36 genotypes in the vegetation period 1997–1998. This study included 3 hybrids, 6 resynthesized lines, 8 old cultivars, and 19 modern cultivars. It was found that modern cultivars and hybrids outperform older varieties and resynthesized lines. However, the varieties designated as modern in that study no longer have any relevance for present farming practice. Moreover, since hybrid varieties have gained enormous importance in the last two decades, today comprising the vast majority of available varieties, this type of variety must be considered for yield and NUE monitoring with an up-to-date European benchmark. In this regard, Rathke and Diepenbrock (2006) investigated the energy balance of winter oilseed rape depending on the N fertilizer inputs and suggested that SY levels should be assessed through further studies to document progress in plant breeding, including the advent of hybrid varieties. Although further field studies on NUE in oilseed rape have since been conducted, most focused on genetic mapping approaches, either in experimental populations (Bouchet et al., 2014; Nyikako et al., 2014; Miersch et al., 2016) or in association studies (Bouchet et al., 2016), and do not address breeding progress.

The present study aims to (i) assess the SY change through breeding during the last decades, (ii) examine the response to nitrogen of older and modern varieties in multilocation field trials, (iii) investigate the change and interrelationship of seed quality traits affected by environment, N fertilization, and breeding activities, and (iv) provide evidence-based suggestions for further orientations in oilseed rape breeding and cultivation.

# MATERIALS AND METHODS

#### Plant Material

Diverse varieties (n = 30) from different breeding companies, registered between 1989 and 2014, were investigated (**Table 1**). All were varieties adapted to northern European growing conditions that had passed the official variety trials of the official variety agency. In order to keep the number of genotypes at a manageable size, major varieties from the different periods of registration were selected, as proposed in a similar study on wheat by Lopes et al. (2012). According to the year of registration, varieties were grouped as older and modern varieties.

#### Field Experiments

Since environmental factors heavily affect the phenotype, 10 experiments were conducted in total at six different locations in Germany during two consecutive growing seasons (**Table 2**). Environmental specifications are given in Supplementary Tables 1, 2. Experiments were conducted in a split-plot design for N treatments. Within each nitrogen fertilization level (NFL, as main plot), the genotypes were arranged in three replicates with eight sub-blocks each, according to an alpha lattice designed with the R package agricolae (De Mendiburu and Simon, 2015). The sowing rate was adjusted for the previously tested germination rate to target a final plant density of 50 plants per square meter. At all locations, a plot-in-plot system was used, and only the middle part of each plot was harvested in order to avoid edge effects from the neighboring plots. Organic fertilizers were applied neither during the experiment nor during cultivation of the preceding crops. Moreover, no intercrop was cultivated before the experiment. In the first year of the experiment, 120 kg N ha−<sup>1</sup> was applied in both the low N (LN) and the high N (HN) treatment at the beginning of the spring vegetation. Another 100 kg N ha−<sup>1</sup> was applied only in HN during bolting. In the second year, 65 kg N ha−<sup>1</sup> was applied in LN and 120 kg N ha−<sup>1</sup> in HN at the first application. The second application during bolting comprised 55 kg N ha−<sup>1</sup> in LN and 100 kg N ha−<sup>1</sup> in HN. In both years, the first application of fertilizer was reduced by the amount of soil mineral N (Nmin) present in the soil at the end of winter (Supplementary Table 1). The intention of the high N application was to simulate intensive crop production conditions as they are common in Northern Europe. The low N treatment was intended to investigate the response of the genotypes to reduced fertilizer application. All other nutrients were applied at the same level across both NFLs at the particular locations. Furthermore, weeds, pests (insects), and diseases (fungi) were fully controlled, and measures for straw stiffness were applied, at each site at a locally appropriate level. Thus, only N was varied and all other factors were kept at an intensive level in both NFLs.

TABLE 1 | Groups of investigated varieties according to year of registration and type of variety.

*SDH, Semi dwarf hybrid.*

#### Data Collection

Seed yield was determined by threshing from the standing mature canopy in the second half of July or the first days of August (**Table 2**). The water content of seeds was determined immediately after seed harvest, and SY was corrected to a standard water content of 9%. From each individual plot, an aliquot of at least 100 g was used to determine seed oil concentration (OilConc) and seed protein concentration (ProteinConc) in duplicate on the same machine (Unity SpectraStar 2500, Brookfield, USA) via near-infrared reflectance spectroscopy (Tkachuk, 1981; Reinhardt, 1992; Tillmann and Paul, 1998; Tillmann et al., 2000). Subsequently, OilConc and ProteinConc were also corrected to 9% water content and multiplied by SY in order to determine seed oil yield (OilY) and seed protein yield (ProteinY). Since one old hybrid variety showed a strongly reduced germination rate at all locations in the second experimental year, it was excluded from further data analysis.
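The correction to a standard water content and the derivation of oil or protein yield can be illustrated with a minimal Python sketch. It assumes the usual dry-matter rescaling (the measured yield is converted to dry matter and then expressed at 9% water); the text does not state the exact formula used, and the function names and example numbers are hypothetical.

```python
def correct_to_standard_moisture(fresh_yield_t_ha, measured_water_pct,
                                 standard_water_pct=9.0):
    """Express a yield measured at `measured_water_pct` water content
    at the standard water content (assumed dry-matter rescaling)."""
    if not 0.0 <= measured_water_pct < 100.0:
        raise ValueError("water content must be in [0, 100)")
    # Dry matter actually harvested per hectare.
    dry_matter = fresh_yield_t_ha * (100.0 - measured_water_pct) / 100.0
    # Same dry matter expressed at the standard water content.
    return dry_matter * 100.0 / (100.0 - standard_water_pct)


def component_yield(seed_yield_t_ha, concentration_pct):
    """Oil or protein yield as seed yield times concentration,
    both expressed at the same water content."""
    return seed_yield_t_ha * concentration_pct / 100.0


# Hypothetical plot: 4.8 t/ha harvested at 7.5% water, 45% oil at 9% water.
sy = correct_to_standard_moisture(4.8, 7.5)
oil_y = component_yield(sy, 45.0)
print(round(sy, 3), round(oil_y, 3))
```

Seed harvested drier than 9% is scaled up slightly and wetter seed is scaled down, so that plots harvested at different moisture levels become comparable.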

In environments ASE15, ASE16, NIE16, and MOS16, lodging was evaluated before harvest on each individual plot on a scale from 1 (no lodging at all) to 9 (plot completely lodged).

#### Data Analysis

Adjusted means for each variety at each NFL across all locations were estimated using a linear model with varieties, NFL, and their interaction as fixed factors; and interactions of year, location, NFL, replicate, and block considered as random factors (Equation 1). P-values for significance of fixed effects were derived from an analysis of variance (ANOVA) using the models described in Equation (1).

$$\begin{aligned} P_{ijklmn} &= \mu + \mathbf{g}_i + \mathbf{n}_j + \mathbf{gn}_{ij} + B_{jklmn} + R_{jkmn} + W_{jkm} \\ &+ E_{km} + e_{ijklmn} \end{aligned} \tag{1}$$

with P<sub>ijklmn</sub> as the observed phenotype of the ith variety at the jth nitrogen fertilization level, in the kth year, at the mth location, in the nth replicate, and in the lth block. µ is the general mean of the experiment, **g**<sub>i</sub> is the fixed effect of the ith variety, **n**<sub>j</sub> is the fixed effect of the jth NFL, and **gn**<sub>ij</sub> is the fixed effect of the variety by NFL interaction. B<sub>jklmn</sub> is the random effect of the lth block within the nth replicate, within the jth main plot, at the mth location, and in the kth year. R<sub>jkmn</sub> is the random effect of the nth replicate, within the jth main plot, at the mth location, and in the kth year. W<sub>jkm</sub> is the random effect of the jth main plot, at the mth location, and in the kth year. E<sub>km</sub> is the random effect of the environment at the mth location in the kth year. e<sub>ijklmn</sub> is the error term. Fixed effects are written in bold lowercase letters.

To estimate the adjusted means of each variety in individual years (Equation 2) and at individual locations (Equation 3), the linear model was modified and only the interactions of the location, main plot, replicate, and block were considered to be random factors. Fixed effects are written in bold lowercase letters.

TABLE 2 | Overview of single experiments and site-specific conditions.


$$\begin{aligned} P_{ijlmn} &= \mu + \mathbf{g}_i + \mathbf{n}_j + \mathbf{gn}_{ij} + B_{jlmn} + R_{jmn} + W_{jm} \\ &+ E_m + e_{ijlmn} \end{aligned} \tag{2}$$

$$P_{ijln} = \mu + \mathbf{g}_i + \mathbf{n}_j + \mathbf{gn}_{ij} + B_{jln} + R_{jn} + W_j + e_{ijln} \tag{3}$$

The model shown in Equation (4) was used to estimate the variance components for the estimation of broad sense heritability. In contrast to Equation (1), G<sub>i</sub> is the random effect of the ith variety, **n**<sub>j</sub> is the fixed effect of the jth NFL, and GN<sub>ij</sub> is the random effect of the variety by NFL interaction. GE<sub>ikm</sub> is the random effect of the G × E interaction. The fixed effect is written in bold. Equation (5) was used to estimate the broad sense heritability.

$$\begin{aligned} P_{ijklmn} &= \mu + G_i + \mathbf{n}_j + GN_{ij} + GE_{ikm} + B_{jklmn} + R_{jkmn} \\ &+ W_{jkm} + E_{km} + e_{ijklmn} \end{aligned} \tag{4}$$

$$h^2 = \frac{\sigma_{G}^2}{\sigma_{G}^2 + \frac{\sigma_{G \times N}^2}{j} + \frac{\sigma_{G \times E}^2}{p} + \frac{\sigma_{e}^2}{j \times p \times l \times n}} \tag{5}$$

with j the number of NFL, p the number of the investigated environments, l the number of blocks, and n the number of replicates.
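As an illustration of how Equation (5) combines the variance components, the following Python sketch evaluates the ratio for a set of made-up variance estimates. The function name and all numeric values are hypothetical; only the design constants (two NFLs, 10 environments, eight blocks, three replicates) are taken from the trial description, and in practice the variance components would come from the fitted mixed model of Equation (4).

```python
def broad_sense_heritability(var_g, var_gn, var_ge, var_e,
                             n_nfl, n_env, n_blocks, n_reps):
    """Broad sense heritability on an entry-mean basis.

    var_g  : genotypic variance
    var_gn : genotype x NFL interaction variance
    var_ge : genotype x environment interaction variance
    var_e  : residual (error) variance
    """
    denominator = (var_g
                   + var_gn / n_nfl
                   + var_ge / n_env
                   + var_e / (n_nfl * n_env * n_blocks * n_reps))
    return var_g / denominator


# Hypothetical variance components, for illustration only.
h2 = broad_sense_heritability(var_g=0.09, var_gn=0.01, var_ge=0.04,
                              var_e=0.48, n_nfl=2, n_env=10,
                              n_blocks=8, n_reps=3)
print(round(h2, 2))
```

Because each interaction and error variance is divided by the number of levels it is averaged over, a trial series with many environments and replicates yields a high heritability of variety means even when single-plot noise is considerable.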

All analyses were conducted with the statistical software R (R Core Team, 2013) using the packages lmerTest (Kuznetsova et al., 2016), lsmeans (Lenth, 2016), and lme4 (Bates et al., 2015).

Adjusted data across all environments were used to determine the NUE, which is expressed as SY over N fertilizer application. In addition, the amount of N fertilization required to produce 1 ton of rapeseed oil was determined. For this purpose, 1 ton was divided by the OilY of the individual variety and multiplied by the NFL. Production losses and inefficiency in post-harvest processing were neglected in this calculation.
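The two ratios described above can be sketched in a few lines of Python. The function names are hypothetical, the seed yield of 4,500 kg ha−1 is a made-up example, and the N totals of 120 and 220 kg N ha−1 are the nominal sums of the two applications described in the field experiments, ignoring the site-specific Nmin reduction; the oil yield of 2.04 t ha−1 is the value reported in the abstract for both N levels.

```python
def nitrogen_use_efficiency(seed_yield_kg_ha, n_applied_kg_ha):
    """NUE as seed yield per unit of fertilizer N (kg seed per kg N)."""
    return seed_yield_kg_ha / n_applied_kg_ha


def n_per_ton_oil(oil_yield_t_ha, n_applied_kg_ha):
    """Fertilizer N (kg) needed to produce 1 t of oil:
    1 t divided by the oil yield, multiplied by the N level."""
    return (1.0 / oil_yield_t_ha) * n_applied_kg_ha


NOMINAL_N = {"LN": 120.0, "HN": 220.0}  # kg N/ha, nominal totals (assumption)

for level, n_rate in NOMINAL_N.items():
    nue = nitrogen_use_efficiency(4500.0, n_rate)  # hypothetical seed yield
    n_oil = n_per_ton_oil(2.04, n_rate)            # oil yield from the abstract
    print(level, round(nue, 1), round(n_oil, 1))
```

With an identical oil yield at both levels, the N requirement per ton of oil scales directly with the fertilization level, which is why the low N treatment appears far more efficient in this measure.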

Pairwise Pearson correlation coefficients (r) were estimated between individual traits and year of registration, as well as between the trait value observed at HN and LN.
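For readers unfamiliar with the statistic, a minimal pure-Python sketch of the Pearson correlation coefficient is given below. It is not the authors' implementation (the analyses were run in R); the function name and the example data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: year of registration vs. seed yield (t/ha).
years = [1990, 1998, 2005, 2012]
yields = [4.0, 4.3, 4.4, 4.9]
print(round(pearson_r(years, yields), 2))
```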

The package ggplot2 was used to design diagrams (Wickham, 2009).

# RESULTS

#### Seed Yield

Across all 10 environments, SY varied between 3.79 and 5.00 t ha−<sup>1</sup> for LN and between 3.91 and 5.13 t ha−<sup>1</sup> for HN (**Table 3**). At some locations the variation was even larger (Supplementary Table 4). The average SY in the experimental year 2014–2015 was about 220 kg ha−<sup>1</sup> and 480 kg ha−<sup>1</sup> higher in LN and HN, respectively, compared to the experimental year 2015–2016. While the genetic effect on yield was highly significant in each of the environments except RHH15 (p = 0.0181), NFL was only significant in some environments (Supplementary Table 4). Over all tested environments, the difference in N fertilization resulted in a yield difference of 180 kg ha−<sup>1</sup>, which was only significant at the 10% error level according to ANOVA. The G × N interaction was significant only in several individual environments (MOS and ASE in both years, Supplementary Table 4). Generally, the correlation between the environments at HN was high, ranging between r = 0.17 and r = 0.81, and was significant in most cases (Supplementary Figure 1B). At LN, the environment RHH15 did not show a significant correlation with most other environments, while all the other environments were positively and significantly correlated with each other. The highest correlation between environments was r = 0.77 (Supplementary Figure 1A).

Across the entire set of tested varieties, the correlation between SY and the year of registration was r = 0.81 for LN and r = 0.88 for HN, reflecting strong breeding progress over the period of observation (**Figures 1A,B**, **2**). This is further underlined by a very high value for broad sense heritability in the investigated variety set (h <sup>2</sup> = 0.92, **Table 4**). The annual yield progress was ∼35 kg ha−<sup>1</sup> year−<sup>1</sup> at LN and 45 kg ha−<sup>1</sup> year−<sup>1</sup> at HN.
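An annual yield progress of this kind is commonly read off as the slope of a linear regression of SY on the year of registration. The sketch below shows that calculation under this assumption; whether the authors used exactly this estimator is not stated here, and the three data points are invented for illustration.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical varieties: year of registration and seed yield in t/ha.
registration_year = [1990, 2000, 2010]
seed_yield_t_ha = [4.0, 4.4, 4.8]

# Convert the slope from t/ha per year to kg/ha per year.
progress_kg_ha_year = ols_slope(registration_year, seed_yield_t_ha) * 1000.0
print(f"Annual yield progress: {progress_kg_ha_year:.0f} kg/ha per year")
```

With these invented values the slope is 40 kg ha−1 year−1, which lies between the two rates reported above but carries no evidential weight.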

If the investigated set is split into groups of hybrid and OP varieties and into older and modern varieties, it becomes obvious that, according to the respective arithmetic means, modern hybrid varieties outperform old hybrid varieties and modern OP varieties outperform old OP varieties in all 10 environments and at both NFL. There is just one exception at HN in ROS16, where the arithmetic mean of older OP varieties is marginally higher than that of modern OP varieties. As indicated in **Figure 1**, at HN, Pearson coefficient of correlation between SY and year of

#### TABLE 3 | Descriptive statistics for investigated traits


*Adjusted values of the highest (Max) and lowest (Min) performing varieties are shown along with the arithmetic mean of the evaluated variety set. CoV, Coefficient of variation.*

registration was r = 0.89 for new OP varieties, r = 0.87 for old OP varieties, and r = 0.94 for old hybrids (all significant on the 5% error level). In contrast, the correlations at LN of r = 0.76 for new OP varieties, r = 0.67 for old OP varieties, and r = 0.55 for old hybrids are not significant and lower, indicating a weaker association than at HN. Within the group of modern hybrids, no

Least significant difference of 0.185 (HN) and 0.184 (LN) was estimated on the 5% error level. Gray shaded area indicates the confidence interval.



*NFL, Nitrogen fertilization level. Level of significance is indicated by . for p* < *0.1,* \**p* < *0.01,* \*\**p* < *0.005 and* \*\*\**p* < *0.001.*

significant correlation between SY and year of registration could be determined either at HN or at LN.

Within the group of old hybrids, it was noticeable that two varieties showed a higher yield of 0.09 t ha−<sup>1</sup> and 0.14 t ha−<sup>1</sup> under LN than under HN (**Figure 3**). However, it has to be mentioned that this is only a relative performance. On an absolute level, these varieties are only average or second lowest yielding, respectively. Interestingly, the lower-yielding variety appears to have an extremely high, above-average lodging score (Supplementary Figure 2). Calculations of NUE, expressed as SY per unit of applied nitrogen fertilizer, reveal (i) a much higher NUE at LN than at HN and (ii) a gradient of NUE from modern hybrids over modern OP varieties and old hybrids to old OP varieties (**Figure 4**).

#### Oil

The ANOVA indicated a significant effect of NFL on OilConc (**Table 4**). In all environments, OilConc was, as expected, higher in the LN than in the HN treatment. On average, it was 45.64 and 43.96% at LN and HN, respectively, across the entire study, and was almost constant between the investigated years (**Table 3**). OilConc was found to be highly heritable (h <sup>2</sup> = 0.97). A highly significant variety effect was observed in all environments (Supplementary Table 4). With the exception of RHH15 at LN, all individual environments were highly and significantly correlated with each other, with correlations ranging between r = 0.64 and r = 0.93 (Supplementary Figure 3). Furthermore, in RHH15 and BOV16, significant G × N interactions were detected. The correlation between the year of registration and OilConc was only r = 0.28 at LN and r = 0.30 at HN and not significant. **Figure 5** indicates that modern varieties, especially the OP varieties, clearly outperform the older groups at both NFL (but not at LN in RHH15).

Furthermore, the results show that LN fertilization was sufficient to produce the same amount of oil as that produced at HN. Since OilY is the product of SY and OilConc, and both traits showed a contrary response to N fertilization, OilY is the same at both NFL (2.04 t ha−<sup>1</sup> at LN and HN). The highest OilY difference, 210 kg ha−<sup>1</sup>, was observed at ROS15 (Supplementary Table 3). At both NFL, the results show that modern varieties require less N fertilization to achieve the same OilY than older varieties (Supplementary Figure 4). As the correlations in **Figure 2** illustrate, OilY is determined much more by SY (LN r = 0.95; HN r = 0.96) than by oil concentration (LN r = 0.56; HN r = 0.58). Moreover, correlations of OilY between environments at LN and HN are in most cases above r = 0.5 and significant, except for RHH15 at LN (Supplementary Figures 5A,B).
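The compensation between SY and OilConc can be checked with a rough back-of-the-envelope calculation. The sketch below inverts OilY = SY × OilConc using the study-wide means quoted above (OilY 2.04 t ha−1; OilConc 45.64% at LN and 43.96% at HN). It is only an approximation under stated assumptions: the mean of a product is not exactly the product of the means, and the reported values are rounded. The function and variable names are hypothetical.

```python
# Study-wide means quoted in the text.
OIL_YIELD_T_HA = 2.04                      # OilY, identical at LN and HN
OIL_CONC = {"LN": 0.4564, "HN": 0.4396}    # OilConc as a fraction of seed mass


def implied_seed_yield(oil_yield_t_ha: float, oil_conc_fraction: float) -> float:
    """Invert OilY = SY * OilConc to obtain the SY implied by the means."""
    return oil_yield_t_ha / oil_conc_fraction


implied_sy = {
    nfl: implied_seed_yield(OIL_YIELD_T_HA, conc) for nfl, conc in OIL_CONC.items()
}
# Seed yield gap (HN minus LN) implied by equal oil yield, in kg/ha.
implied_gap_kg_ha = (implied_sy["HN"] - implied_sy["LN"]) * 1000.0

for nfl, sy in implied_sy.items():
    print(f"{nfl}: implied SY = {sy:.2f} t/ha")
print(f"Implied SY gap: {implied_gap_kg_ha:.0f} kg/ha")
```

The implied gap is of the same order as the 180 kg ha−1 SY difference between NFL reported in the Seed Yield section, which is what one would expect if the lower oil concentration at HN offsets the small seed yield advantage.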

#### Protein

ProteinConc was significantly affected by variety and NFL across the entire study and in each of the 2 years (**Table 4**). At LN, ProteinConc was 15.43%, i.e., 90% of the level achieved at HN (17.22%). Also, in each single experiment, variety and NFL significantly affected ProteinConc, and G × N interactions were observed only in RHH15 and BOV16 (as for OilConc). In summary, Pearson coefficients of correlation between SY and ProteinY of r = 0.82 for LN and r = 0.88 for HN revealed that the latter was overwhelmingly determined by SY. No significant relationship was found between ProteinConc and ProteinY (**Figure 2**), indicating that SY was the relevant parameter for N extraction and removal from the field. For experimental year 2015–2016, ProteinConc at LN was, at 15.73%, higher than in the first year's trials (14.97%). ProteinY at LN was 0.69 t ha−<sup>1</sup>, just 86% of the ProteinY at HN (0.80 t ha−<sup>1</sup>). ProteinConc and ProteinY were the traits with the strongest alteration due to divergent NFL, stronger than oil-related traits (**Table 3**). As indicated in **Figure 2**, across the entire set at both NFL, a negative relationship between the year of registration and ProteinConc (LN r = −0.56; HN r = −0.57) was observed. In addition, ProteinConc had an exceptionally strong negative relationship with OilConc at LN (r = −0.79) and HN (r = −0.74). Except for RHH15 at LN, protein concentration showed a medium to high correlation between environments (Supplementary Figure 6). In contrast, for ProteinY, low correlations between some environments were also observed (Supplementary Figure 7).

If the diversity set is separated into the four variety groups, it becomes obvious that the oldest varieties, in most cases the old OP varieties, have the highest protein concentration. On the contrary, modern varieties, predominantly modern OP varieties, have the lowest protein concentration (**Figure 6**). Old OP varieties (LN r = −0.61; HN r = −0.47) and modern OP varieties (LN r = −0.50; HN r = +0.64) have weaker correlation

FIGURE 2 | Inter-trait phenotypic correlations at low (A) and high (B) nitrogen fertilization. Colors indicate the strength of correlations. Only correlations significant at a confidence level of 95% are depicted.

between OilConc and ProteinConc, while old hybrids (LN r = −0.81; HN r = −0.88) and modern hybrids (LN r = −0.81; HN r = −0.76) have the strongest negative correlation.

# DISCUSSION

# Measurement of Breeding Progress in a Complex Interacting Growth System

NUE is the final outcome of a complex cropping system with Genotype × Environment × Management (G × E × M) interactions (Dresbøll and Thorup-Kristensen, 2014; Thorup-Kristensen and Kirkegaard, 2016). While environmental conditions such as rainfall, temperature, radiation, and others cannot be influenced, the management of crop rotation and decisions on soil tillage, as well as the dosage and timing of sowing, fertilization, and responsive use of plant protection agents, can be modified by farmers. In addition, since the effect of the preceding crop on the following crop has to be taken into account, the realized NUE has to be measured at the cropping system level as the relevant benchmark (Dresbøll and Thorup-Kristensen, 2014). Within this system, the use of the most appropriate genotypes is an important question. In this regard, the aim of our study was to understand the relevance of the genotype of oilseed rape and how breeding influences the progress of its NUE. Precise extraction of the genetic contribution to changes in NUE is extremely challenging due to a lack of comparable environmental conditions. This is not only because SY is determined over a lengthy maturation period, but also because numerous environmental factors and management decisions, including N fertilizer applications, influence various components of NUE throughout the growing season. Therefore,

experimental year 2015–2016 below. Windows include locations, namely Asendorf (ASE), Bovenau (BOV), Moosburg (MOS), Nienstädt (NIE), Rauischholzhausen (RHH), and Rosenthal (ROS). Within each window, nitrogen fertilization level is indicated as 1 for low nitrogen fertilization and 2 for high nitrogen fertilization.

investigation of breeding progress requires testing of varieties from different periods of registration under exactly the same conditions; however, conditions always differ between individual experiments or replicates. In this study, the experiments were conducted after wheat or barley, and crop management at all locations was based on a commonly used, relatively intensive Northern European oilseed rape production system.

The results show that the locations chosen for this study were subject to contrasting environmental influences. For example, in BOV16, the yields were extremely low, at a historically low level, due to a combination of pest and disease pressure (C. Algermissen, personal communication). In contrast, MOS15 and MOS16 were characterized by favorable growth conditions, resulting in significantly above-average yield levels.

Despite the divergent environmental influences and complex interactions of NUE, due to its highly quantitative inheritance, the high heritability determined in our study (h <sup>2</sup> = 0.92 for SY and h <sup>2</sup> = 0.94 for OilY) indicates that the enhancement of NUE can be achieved through genetic improvement. Thus, over a long period of almost 25 years, breeding was revealed to be a successful strategy, even when gains from year to year are rather small. This finding is in line with modeling studies by Dresbøll and Thorup-Kristensen (2014).

Several studies have described a genotype by nitrogen (G × N) interaction for several major crops (Foulkes et al., 2009; Gaju et al., 2014). In this study, the overall picture (**Figure 3**) suggests that there is a very strong correlation between HN and LN. Also, at most locations the correlation between HN and LN is higher than the correlations between locations. This suggests that soil, weather and climate conditions often had a stronger effect than the variety-specific reaction to NFL. For example, the best performing variety at HN in MOS16 shows only average performance in ASE16 and vice versa (Supplementary Figure 1). This finding is in line with previous studies on oilseed rape in France, where most quantitative trait loci were constitutive under LN and HN fertilization for yield parameters but not between environments (Bouchet et al., 2014, 2016).

#### Yield Increase Drives NUE

In this work, a collection of winter type oilseed rape varieties with major market importance in Germany and neighboring Northern-European countries was investigated. In contrast to a previous study on NUE in oilseed rape (Stahl et al., 2016), where genetic diversity for NUE was investigated in a broad collection of winter type oilseed rape associations, this study was conducted only on elite varieties that were registered between 1989 and 2014. Thus, the results are not biased by tipping points in the breeding history of oilseed rape, such as the introduction of zero erucic acid and low glucosinolate content varieties (Downey et al., 1969). In contrast to an earlier study (Kessel et al., 2012) investigating the breeding progress before 1999, the present study focuses on a very recent time window. The fact that varieties such as Express and Lirajet, labeled as modern varieties in Kessel et al. (2012), are by far the oldest varieties in our study is indicative of the sliding window in breeding history. Finally, since hybrid varieties have become important and now dominate oilseed rape production in Northern Europe, a reassessment of breeding progress was overdue. This issue is especially relevant due to several indications for a heterosis effect on yield, which is often particularly pronounced under limited N conditions (Gehringer et al., 2007; Koeslin-Findeklee et al., 2014; Wang et al., 2016).

The ultimate yardstick for mitigating N losses and creating the most N-sustainable oilseed rape production is the amount of nitrogen added to the cultivation system that is required to harvest one unit of seed or oil (and protein). The same relationship is expressed by its reciprocal, which is the widely used definition of NUE as the SY over supplied N (Moll et al., 1982; Good et al., 2004). The results presented above provide evidence for a tremendously successful increase in SY and OilY, and are broadly consistent with many previously described data for oilseed rape (Kessel et al., 2012; Koeslin-Findeklee et al., 2014), wheat (Austin et al., 1980; Fischer and Edmeades, 2010; Cormier et al., 2013; Laidig et al., 2017a), maize (Tollenaar, 1989), rye (Laidig et al., 2017b), rice (Zhu et al., 2016), and triticale (Losert et al., 2017). Therefore, the occasionally raised hypothesis that cultivation of older varieties could be beneficial for better nutrient use, since they might have comparative advantages in adaptation to low-input systems, can unambiguously be rejected.

Since modern high-yielding varieties achieve much higher yields (SY, OilY, and ProteinY), while N inputs and management were the same as those for the old low-yielding varieties, it can be concluded that N losses are significantly lower in modern varieties. Consequently, the amount of N fertilizer required to produce 1 ton of oilseed rape has dramatically declined within the last 25 years (Supplementary Figure 4). Therefore, our results support the statement of Burney et al. (2010) that increasing the yield is a highly effective instrument to reduce the negative effects on the environment. This is not only because of a higher yield to fertilizer ratio but also due to the smaller cultivation area that is required to produce the same amount of commodities. Thus, increasing yields reduces indirect land-use change due to oilseed rape production (Don et al., 2012).

In our study, a strong increase in SY and NUE was observed for both HN and LN, with a high correlation between both the treatments. Bouchet et al. (2014) also found only small G × N interactions when NFL differs by 80–90 kg N ha−<sup>1</sup> . Nevertheless, the stronger correlation between the year of registration and SY indicates that the progress was slightly more pronounced at HN. Similarly, the majority of studies (e.g., Brancourt-Hulmel et al., 2005; Brisson et al., 2010) pointed out that progress in SY was higher at HN than at LN. Over the last few decades, farmers became used to rather high amounts of N fertilization, while breeders tried to treat their selection environments in a manner similar to common farming practice in order to select the genotypes for the target environment. This suggests that the selection for HN conditions was probably a direct selection, while selection for LN was rather indirect, and thus not quite so effective as selection under HN conditions. The results of our study should encourage breeders to select their varieties directly at reduced NFL, in order to speed up the breeding progress for low N inputs.

The breeding progress described here is still an ongoing process. Although **Figure 1** shows no slope or even a negative slope for modern hybrids, unpublished data from varieties released to the market after 2014 indicate that breeding for high-yielding and more efficient varieties has not come to an end.

# Seed Quality in Light of Efficient Nitrogen Use

The analysis of breeding success depends not only on yield quantities but also on changes in the specific seed quality composition. The strong positive correlation between the year of registration and OilY can be explained by the fact that oil was an economically relevant component, with farmers in Germany being paid a premium for a high-oil crop; hence, breeders have selected genotypes high in oil (Abbadi and Leckband, 2011). OilY increased over-proportionally along with a simultaneous increase in SY. Although the new OP varieties investigated in this study are, on average, not higher yielding than new hybrids, we observed that they outperformed all other varieties in terms of OilConc under both HN and LN (**Figure 5**).

On the contrary, the improvement of OilConc correlated with a decrease in ProteinConc (**Figure 2**). The negative correlation of −0.79 for LN and −0.74 for HN is consistent with studies reported earlier (Bouchet et al., 2014, 2016; Nyikako et al., 2014). Since oil and protein synthesis are supposed to rely on the same carbon sources, and protein synthesis is preferentially enhanced with increasing N availability, OilConc declines with increased N fertilization (Rathke et al., 2006; Zhao et al., 2006). However, ANOVA (**Table 4**) illustrates that the effect of NFL is more pronounced on ProteinConc than on OilConc.

From the perspective of resource efficiency, a high ProteinY is desired, since it determines the proportion of N that is captured in the harvested plant organs (as the sink), removed from the field, and thus protected from losses. Although a strong reduction of ProteinConc is evident in groups of modern varieties, the proportion of N removed from the field relative to the proportion invested has improved significantly, because this reduction is overcompensated by an SY-driven enhancement of ProteinY. This finding is in agreement with previously published results from Koeslin-Findeklee et al. (2014).

While OilY and ProteinY are positively correlated with each other (**Figure 2**), since both are predominantly explained by the common factor SY, the improvement of ProteinY through ProteinConc is hampered by the trade-off. Nevertheless, although the strong negative correlation between OilConc and ProteinConc makes it impossible to increase the ratio of OilY to N fertilization through enhanced ProteinConc, this does not necessarily mean that breeders are unable to increase the sink capacity through selection for quality traits. Selection for higher ProteinConcDFF can be a promising strategy to increase the amount of N stored and harvested in seeds without neglecting the achievements in high OilConc (Potter et al., 2016). In this case, the trait is only weakly negatively correlated with OilConc (r = −0.39 for ProteinConcDFF vs. r = −0.79 for ProteinConc at LN, **Figure 2**). However, this strategy would require that the protein content in the meal is included as a co-product in ecological footprint calculations and, furthermore, receives the necessary economic attention to justify breeders' attempts to select genotypes superior in this trait.

## Scope for Reduced Fertilizer Inputs without Drastic Yield Penalties

Choosing NFL that are suitable for phenotyping responses to N is a non-trivial question for farming practice, for selection decisions by breeders, and for experimental setup in crop research. Han et al. (2015) noted in their review that phenotypic data collected under severe N stress are not comparable to those collected under mild N stress. In some studies (Kessel et al., 2012; Miersch et al., 2016), N stress was maximized by a zero N treatment in order to observe the genotypes' response to provoked severe stress. On the other hand, zero N is not a realistic scenario for future oilseed rape production. Even if one considers a reduction in the maximum allowance of N application under upcoming stricter environmental policies, a certain stock application of N is always inevitable to achieve crop yields that allow profitable crop cultivation. Furthermore, simply because soil exploitation has to be avoided, N fertilization will always be essential to replace the removed N in sustainable farming practice. For this reason, the application of 120 kg N ha−<sup>1</sup> instead of zero N is much closer to reality. The low NFL represents a below-average application rate that is clearly below the maximum NFL that will legally be applicable according to the future fertilizer ordinance. The high NFL of 220 kg ha−<sup>1</sup> was based on today's common farming practice (Rathke et al., 2006).

The average SY difference between both NFL of 180 kg ha−<sup>1</sup> found in our study was surprisingly low. Even the highest SY difference of 690 kg ha−<sup>1</sup> is comparably small considering the delta of 100 kg N fertilizer between both the treatments. Thus, in our study, the additional fertilizer application did not result in higher yields but rather contributed to a higher N balance surplus. For the first experimental year, one might speculate that this observation can be explained by the equal fertilization dosage at the first application date in conjunction with a potentially low availability of N after the second application due to limited rainfall in central Germany. Although the interaction between water limitation and nitrogen uptake (Albert et al., 2012; Sadras and Lawson, 2013) is a reasonable explanation, this logic does not hold true for MOS15, where high rainfall was observed during spring, nor for the second experimental year, where fertilizer application was reduced at both application dates. Therefore, we have to conclude that, in all environments of this study, an N dosage of much less than 220 kg N ha−<sup>1</sup> is sufficient to produce yield levels that are usually achieved in agricultural farming practice. Since there is a linear relationship between N fertilizer inputs and energy inputs, the reduction of NFL not only has an advantageous effect on the mitigation of greenhouse gas emissions but also substantially improves the energy balance of oilseed rape production.
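To make the size of this effect concrete, the yield differences can be expressed per kilogram of additional N. The sketch below uses only figures quoted in the text (120 and 220 kg N ha−1, and SY differences of 180 and 690 kg ha−1); the function name and the framing as a "marginal response" are illustrative assumptions rather than terms used by the authors.

```python
N_LOW_KG_HA = 120.0    # low nitrogen fertilization level (LN)
N_HIGH_KG_HA = 220.0   # high nitrogen fertilization level (HN)


def marginal_response(delta_yield_kg_ha: float, delta_n_kg_ha: float) -> float:
    """Additional seed yield (kg) obtained per additional kg of N applied."""
    return delta_yield_kg_ha / delta_n_kg_ha


delta_n = N_HIGH_KG_HA - N_LOW_KG_HA

# Average and largest SY difference between HN and LN, as reported in the text.
mean_response = marginal_response(180.0, delta_n)
max_response = marginal_response(690.0, delta_n)

print(f"Extra N applied: {delta_n:.0f} kg/ha")
print(f"Mean response:   {mean_response:.1f} kg seed per kg N")
print(f"Max response:    {max_response:.1f} kg seed per kg N")
```

On average, each additional kilogram of N beyond the LN level returned less than 2 kg of seed, which illustrates why the extra fertilizer mainly raised the N balance surplus.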

Even if our results cannot be generalized and might not be applicable to all future growth scenarios, our finding that the yield difference between both NFL was very low in 10 independent experiments conducted across Germany in 2 years suggests a remarkable potential for reduction in N fertilization without dramatic yield penalties. However, this depends on the weather conditions, which are not known in advance. The fact that farmers have to make their fertilizing decisions in the absence of knowledge about further growth conditions makes it difficult to precisely adjust NFL to the real demand (Henke et al., 2007). For further field-based research experiments with contrasting NFL, we suggest lowering the N fertilization in both treatments and agree with Miersch et al. (2016), who suggested using a delta of at least 100 kg N ha−<sup>1</sup> between NFL.

# CONCLUSION

The study was designed to evaluate the effect of plant breeding on NUE in the past decades and the adjustments in selection required to address future sustainability goals. The experimental data provide strong evidence that direct selection for SY and seed oil concentration leads not only to an enormous (oil) yield gain in highly fertilized environments, but also to a selection of genotypes with superior performance in low N-input cultivation systems. Thus, genetic improvement increases NUE in oilseed rape and reduces the reliance on fertilizer inputs, as already suggested for other crops (Hawkesford, 2014). From this perspective, we concur with Burney et al. (2010) that yield improvement should play a predominant role in strategies toward GHG emission mitigation and enhancement of sustainability of crop production.

The surprisingly low yield gap between high and low nitrogen fertilization provides promising hints toward further N fertilizer-saving potential. However, transfer of these results into knowledge-based farming practices remains challenging. For a more precise and directed selection of even more efficient varieties, a better understanding of G × E × M interactions and of the physiological determinants of NUE is an essential task for future research.

# AUTHOR CONTRIBUTIONS

RS, BW, and AS conceived the research; AS and MP performed the experiments, AS and MF performed data analysis; AS and RS wrote the manuscript.

# FUNDING

The work was funded by Federal Ministry for Food and Agriculture grant 22020013 (Federal Agency for Renewable Resources). Additional support was provided by the German Society for the Promotion of Plant Innovation (GFPi, Bonn).

# ACKNOWLEDGMENTS

The authors thank Guido Koglin, Wolfgang Sauermann, Christoph Algermissen (Landwirtschaftskammer Schleswig-Holstein, Bovenau, Germany), Dorothee Varelmann, Jutta Ahlemeyer (Deutsche Saatveredelung AG, Asendorf, Germany), Rüdiger Beißner, Vadym Avramenko, Simone Sendke, Pia Roppel (Monsanto Deutschland GmbH, Nienstädt, Germany), Stefan Abel, Maximilian Leps (Limagrain, Rosenthal/Peine, Germany), Lothar Behle-Schalk, Karl Heinz Balzer, Mechthild Schwarte (Justus Liebig-University Giessen, Rauischholzhausen, Germany), Stephan Priglmeier and Franz-Xaver Zellner (Saaten-Union, Moosburg, Germany) for excellent technical conduct of field experiments, and Petra Degen, Sabine Frei and Stjepan Vukasovic (Justus Liebig-University Giessen, Giessen, Germany) for valuable help with seed quality analysis. Furthermore, we would like to thank Dieter Stelling (Deutsche Saatveredelung AG, Lippstadt, Germany), Martin Frauen (NPZ, Hohenlieth, Germany) and Amine Abbadi (NPZ Innovation, Hohenlieth, Germany) for valuable comments on the study.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017.00963/full#supplementary-material

#### REFERENCES

Monograph No. 158. Australian Centre for International Agricultural Research, Canberra, ACT.


hybrids is not related to delayed nitrogen starvation-induced leaf senescence. Plant Soil 384, 347–362. doi: 10.1007/s11104-014-2212-8


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Stahl, Pfeifer, Frisch, Wittkop and Snowdon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Use of Blue-Green Fluorescence and Thermal Imaging in the Early Detection of Sunflower Infection by the Root Parasitic Weed *Orobanche cumana* Wallr.

Carmen M. Ortiz-Bustos <sup>1</sup> , María L. Pérez-Bueno<sup>2</sup> , Matilde Barón<sup>2</sup> and Leire Molinero-Ruiz <sup>1</sup> \*

<sup>1</sup> Department of Crop Protection, Institute for Sustainable Agriculture, CSIC, Cordoba, Spain, <sup>2</sup> Estación Experimental del Zaidín, CSIC, Granada, Spain

#### *Edited by:*

Juan Moral, Kearney Agricultural Center (UC-Davis), United States

#### *Reviewed by:*

Khawar Jabran, Düzce University, Turkey Grama Nanjappa Dhanapal, University of Agricultural Sciences, Bangalore, India

> *\*Correspondence:* Leire Molinero-Ruiz leire.molinero@csic.es

#### *Specialty section:*

This article was submitted to Crop Science and Horticulture, a section of the journal Frontiers in Plant Science

> *Received:* 06 March 2017 *Accepted:* 04 May 2017 *Published:* 18 May 2017

#### *Citation:*

Ortiz-Bustos CM, Pérez-Bueno ML, Barón M and Molinero-Ruiz L (2017) Use of Blue-Green Fluorescence and Thermal Imaging in the Early Detection of Sunflower Infection by the Root Parasitic Weed Orobanche cumana Wallr. Front. Plant Sci. 8:833. doi: 10.3389/fpls.2017.00833

Although the impact of Orobanche cumana Wallr. on sunflower (Helianthus annuus L.) becomes evident with emergence of broomrape shoots aboveground, infection occurs early after sowing, the host physiology being altered during underground parasite stages. Genetic resistance is the most effective control method and one of the main goals of sunflower breeding programmes. Blue-green fluorescence (BGF) and thermal imaging allow non-destructive monitoring of plant diseases, since they are sensitive to physiological disorders in plants. We analyzed the BGF emission by leaves of healthy sunflower plantlets, and we implemented BGF and thermal imaging in the detection of the infection by O. cumana during underground parasite development. Increases in BGF emission were observed in leaf pairs of healthy sunflowers during their development. Lower BGF was consistently detected in parasitized plants throughout leaf expansion and low pigment concentration was detected at final time, supporting the interpretation of a decrease in secondary metabolites upon infection. Parasite-induced stomatal closure and transpiration reduction were suggested by warmer leaves of inoculated sunflowers throughout the experiment. BGF imaging and thermography could be implemented for fast screening of sunflower breeding material. Both techniques are valuable approaches to assess the processes by which O. cumana alters physiology (secondary metabolism and photosynthesis) of sunflower.

Keywords: broomrape, carotenoids, early diagnosis, *Helianthus annuus* L., multicolor fluorescence, thermal detection

### INTRODUCTION

Sunflower (Helianthus annuus L.) oil is a major commodity in world trade, mainly in Europe, where 60% of the total world production is obtained every year (FAOSTAT, 2016). The main biotic constraint for sunflower oil production in all the countries where sunflowers are grown, with the only exception of the Americas, is broomrape, caused by the achlorophyllous parasitic plant Orobanche cumana Wallr. Currently, O. cumana is present in 15–20% of the area cropped to sunflower in the world, causing close to 100% yield losses under high infestations (Fernández-Martínez et al., 2015). After parasite seed germination in the soil early after sowing, penetration of its intrusive cells into the host root tissues triggers their division, leading to the formation of a subterranean shoot that grows outside the root of sunflower. Then, parasite shoots emerge from the soil and form a flowering spike that rapidly produces a large amount of tiny seeds. Orobanche cumana parasitizes sunflower all through the crop growing season, and flowering of host and parasite coincides in time (Molinero-Ruiz et al., 2015).

Vascular connections established between broomrapes (Orobanche spp. and Phelipanche spp.) and their hosts force a complete disruption of the vascular system of the latter during the initial weeks of parasitism (Westwood, 2013). As a result, strong sink strength diverts water and nutrients necessary for the growth of the host (Stewart and Press, 1990). From the emergence of O. cumana shoots aboveground onwards, a reduced growth of sunflower is evident (Alcántara et al., 2006), and the decrease in yield is due to absent or smaller capitula, a low number of seeds, small seeds, and even plant death (Molinero-Ruiz et al., 2015). Thus, most of the metabolic imbalance and water flow impairment might be produced irreversibly in sunflower during underground parasite stages. Since it is physiologically closely linked to sunflower, O. cumana is extremely difficult to control in its parasitic stage. Some herbicide treatments may be effective, but depend on the development of herbicide resistance in the crop (Molinero-Ruiz et al., 2015). With regard to the free-standing stage of the parasite, thousands of tiny seeds are produced each season, their longevity contributing to the build-up of populations in soils. Since soil fumigation and solarisation are not economically feasible, the use of inducers of suicidal broomrape germination requires an appropriate application technology, and agronomic practices and biological control agents have limited efficacy, the effective control of O. cumana relies on genetic resistance (Molinero-Ruiz et al., 2015; Fernández-Aparicio et al., 2016).

Imaging techniques, especially blue-green fluorescence and thermal imaging among others, are powerful tools for use in plant stress detection (Lichtenthaler and Miehé, 1997; Chaerle and Van Der Straeten, 2000), including determination of plant diseases (Nilsson, 1995). They are based on the assessment of optical properties of plants within different regions of the electromagnetic spectrum and are able to utilize information beyond the visible range (Chaerle and Van Der Straeten, 2000). Under UV-excitation, plants can emit a wide fluorescence spectrum ranging from about 400 to 800 nm. This spectrum is the sum of two distinct types of fluorescence: blue-green fluorescence (BGF) characterized by a peak around 440 nm (F440) and a shoulder near 520 nm (F520), and fluorescence in the red and far-red regions with two characteristic peaks at about 680 nm (F680) and 740 nm (F740), respectively. The intensity of the BGF after UV light excitation is constant on a short time scale (minutes) and it has been proven to be very sensitive to single stress factors in plants (Buschmann and Lichtenthaler, 1998; Buschmann et al., 2000). The BGF from intact leaves is emitted by cinnamic acids, mainly ferulic acid (Morales et al., 1996), covalently bound to the cell walls of the epidermis, and by other phenolic compounds (Buschmann et al., 2000). In addition, BGF can emanate also from other secondary metabolites as reviewed by Cerovic et al. (1999). Moreover, quantitative variation of plant secondary metabolites associated with BGF emission can be the consequence, among others, of leaf aging (Morales et al., 2005), water stress (Kautz et al., 2014) or pathogen infection (Chaerle et al., 2007; Granum et al., 2015; Pérez-Bueno et al., 2015). Additionally, F680 and F740 are emitted by chlorophyll a (Chl a) and they are dependent on the pigment content and other factors such as optics of the leaf (Buschmann and Lichtenthaler, 1998; Gitelson et al., 1998).

On the other hand, radiation emitted by plants in the thermal infrared range from 8 to 12–14 µm can be detected by thermographic and infrared cameras. Infrared thermography is an indicator of transpiration and stomatal conductance (Eaton and Belden, 1929; Jackson et al., 1977; Nilsson, 1995). It has been correlated, among others, with plant water status (Zarco-Tejada et al., 2012; Raza et al., 2014; Grant et al., 2016; Mangus et al., 2016) and canopy microclimate (Lenthe et al., 2007; Leuzinger and Korner, 2007), as well as with early infections by airborne (Lindenthal et al., 2005; Baranowski et al., 2015; Pérez-Bueno et al., 2015) and soilborne (Wang et al., 2012; Calderón et al., 2013; Granum et al., 2015) plant pathogens.
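As a brief worked illustration of why the 8 to 14 µm window suits leaf thermography, Wien's displacement law gives the wavelength of peak emission of a black body. The leaf temperature of 298 K (25°C) used here is an assumed, illustrative value, and treating the leaf as a near-black body is a simplification:

$$\lambda_{\max} = \frac{b}{T} \approx \frac{2898\ \mu\text{m}\cdot\text{K}}{298\ \text{K}} \approx 9.7\ \mu\text{m}$$

Under these assumptions, the emission peak of a leaf at ambient temperature falls inside the spectral range covered by thermal cameras.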

Both fluorescence and thermal imaging of whole leaves and whole plants have become essential techniques for non-destructive monitoring of plant diseases from far (remote sensing) and from near distance, since sensors are sensitive to physiological disorders in plants associated with pathogen attack and with disease resulting from that attack (Buschmann et al., 2000; Chaerle and Van Der Straeten, 2000; Mahlein, 2016). Techniques for near distance monitoring of plant diseases can have a direct application in plant phenotyping in breeding programmes such as those of sunflower. In this regard, fluorescence imaging in the red and far-red region has recently been applied to the early diagnosis of the infection of sunflower by O. cumana (Ortiz-Bustos et al., 2016a).

The objectives of this study were to: (a) analyse the distribution of BGF emitted by sunflower leaves during early growth stages of healthy sunflower plants in order to identify the adequate areas and times for BGF imaging, (b) evaluate the use of BGF imaging as an indicator of the infection of sunflower by O. cumana during initial underground development of the parasite and compare the BGF information with the pigments concentration in leaves, and (c) analyse the effect of underground infection by O. cumana in early stages of sunflower growth using thermal imaging of the leaves.

# MATERIALS AND METHODS

#### Growth of Healthy Sunflower

Four sunflower seeds of the inbred line NR5 were surface sterilized by immersion in 20% household bleach (50 g of active chlorine per liter) for 5 min, then thoroughly rinsed in deionized water and incubated at 25°C in the dark at saturation humidity until radicles were 2–5 mm long. Thereafter, individual sunflower seedlings were transplanted into pots with 250 g of a soil mixture SSP (sand:silt:peat moss 2:1:1, by volume). Pots were kept in a glasshouse at 12–22°C without additional lighting (15/9 h light/dark regime) for 5 weeks. Plants were watered as needed and, from 2 weeks of age until the end of the experiment 3 weeks later, they were fertilized once a week with 15 ml/pot of a nutrient solution with N:P:K (7:5:6).

#### Inoculation with *O. cumana*, Growth Conditions and Disease Development

Sunflower plants were inoculated with population LPA13 of O. cumana (race F) (García-Carneros et al., 2014) following the methodology by Molinero-Ruiz et al. (2014). Eight sunflower seedlings (replications) of the inbred line NR5, which is the differential for race F (Molinero-Ruiz et al., 2015), were individually transferred into pots with 250 g of SSP uniformly infested with 10 mg of parasite seeds. Eight non-inoculated plants were established as controls. Plants were grown in the glasshouse under the same conditions described above for 5 weeks.

Sunflower plants were watched for development of wilting symptoms caused by the infection by O. cumana, but no visible differences were detected in inoculated plants as compared to the controls. At the end of the experiment, each plant was removed from the soil and its root system was washed and air dried at room temperature. Then, roots were weighed and the number and weight of O. cumana tubercles attached to the roots were subsequently recorded using a stereoscope and a precision balance, respectively.

### Blue-Green Fluorescence Imaging

Blue-green fluorescence images were acquired sequentially with an Open FluorCam FC 800-O using UV (355 nm) excitation light. Images for F440 and F520 as well as images corresponding to the F440/F520 ratio were analyzed with Fluorcam7 software (Photon Systems Instruments, Brno, Czech Republic) according to Pérez-Bueno et al. (2015). Nine images were captured during 18 s in order to obtain each averaged fluorescence image. Image size was 640 × 480 pixels with a resolution of 96 pixels per inch. Measurements were taken on attached and unshaded leaves and always at the same time of day. Two replicates (one from each blade) were considered in each leaf pair (LP). Numerical data from a representative area in each replicate were analyzed.
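The image-averaging and ratio steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the Fluorcam7 implementation: the function names, the rectangular region of interest, and the handling of zero-intensity pixels are all assumptions made for the example.

```python
from statistics import mean


def average_frames(frames):
    """Pixel-wise mean of equally sized 2-D frames (each a list of rows)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def ratio_image(numerator, denominator, eps=1e-9):
    """Pixel-wise ratio (e.g. F440/F520).

    Assumption: pixels whose denominator is <= eps are set to 0.0
    instead of raising a division error.
    """
    return [[(n / d if d > eps else 0.0) for n, d in zip(n_row, d_row)]
            for n_row, d_row in zip(numerator, denominator)]


def roi_mean(image, top, left, height, width):
    """Mean value inside a rectangular region of interest (hypothetical ROI shape)."""
    return mean(image[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
```

In use, the nine frames of each band would be passed to `average_frames`, the two averaged bands to `ratio_image`, and a representative leaf area to `roi_mean` to obtain one numerical value per replicate.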

The time course of BGF in healthy sunflower plants was determined by capturing images of each of the first four LPs every 3–4 days, from 2 to 5 weeks of age. For each measurement date, LPs were independently analyzed, and between 6 and 8 replicates were considered for each of them.

Concerning the effect of O. cumana on the BGF emission of leaves of infected sunflower, measurements of F440 and F520, as well as those of F440/F520, F440/F680, and F440/F740, were acquired in all leaves of control and inoculated plants twice a week, from the time the leaves were 1 cm long until their complete expansion. Measurements were made during the V1–V4 vegetative stages of the plants (Schneiter and Miller, 1981) and, at each point in time, leaves of the same developmental stage were compared.

# Spectrophotometrical Determination of Pigments Concentration in Leaves

Total chlorophyll [Chl (a+b)] and carotenoids (xanthophylls and carotenes) [Car (x+c)] contents of parasitized sunflower were determined by spectrophotometry, and compared to those of the controls. Extractions were carried out on the second, third and fourth LP of the plants 5 weeks after inoculation (wai). From each LP blade, a 1 cm disk was punched, weighed and placed in vials containing liquid nitrogen. Disks were ground and photosynthetic pigments were extracted by adding 4 ml of 80% acetone (v/v). Absorbance at 470, 647, and 663 nm was measured from the resulting extracts with a Shimadzu UV-1800 spectrophotometer (Shimadzu Corporation, Tokyo, Japan). Concentrations of Chl a, Chl b and Car (x+c) were calculated according to Lichtenthaler and Buschmann (2001) and expressed as µg/g fresh leaf weight:

$$\begin{aligned} \text{Chl}_{a}\ (\mu g/ml) &= 12.25\, A_{663.2} - 2.79\, A_{646.8} \\ \text{Chl}_{b}\ (\mu g/ml) &= 21.50\, A_{646.8} - 5.10\, A_{663.2} \\ \text{Car}_{(x+c)}\ (\mu g/ml) &= \left(1000\, A_{470} - 1.82\, \text{Chl}_{a} - 85.02\, \text{Chl}_{b}\right) / 198 \end{aligned}$$

The Chl (a+b)/Car (x+c) ratio was calculated according to Konanz et al. (2014).
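The pigment calculations above can be sketched in Python as follows. The coefficients are those of the equations in this section; the function names are hypothetical, and the conversion from µg/ml to µg/g fresh weight (concentration × extract volume / disk fresh weight) is an assumed, standard step that the text does not spell out.

```python
def pigments_ug_per_ml(a470, a647, a663):
    """Chl a, Chl b and Car (x+c) in µg/ml of 80% acetone extract.

    a647 and a663 stand for the absorbances read near 646.8 and 663.2 nm.
    """
    chl_a = 12.25 * a663 - 2.79 * a647
    chl_b = 21.50 * a647 - 5.10 * a663
    car = (1000 * a470 - 1.82 * chl_a - 85.02 * chl_b) / 198
    return chl_a, chl_b, car


def per_gram_fresh_weight(conc_ug_per_ml, extract_ml, fresh_weight_g):
    """Convert an extract concentration to µg per g fresh leaf weight (assumed conversion)."""
    return conc_ug_per_ml * extract_ml / fresh_weight_g


def chl_car_ratio(chl_a, chl_b, car):
    """Chl (a+b)/Car (x+c) ratio."""
    return (chl_a + chl_b) / car


# Illustrative absorbance readings only, not data from this study.
chl_a, chl_b, car = pigments_ug_per_ml(a470=0.5, a647=0.2, a663=0.6)
```

With a 4 ml extract, as used here, `per_gram_fresh_weight(chl_a, 4.0, disk_weight_g)` would give the Chl a content of one leaf disk, where `disk_weight_g` is the measured fresh weight of that disk.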

#### Thermal Imaging

Infrared images of leaves of inoculated and control sunflowers were taken using a FLIR A305sc camera (FLIR Systems, Wilsonville, Oregon, USA) that operates in the 7.5–13.5 µm wavelength range and has a thermal sensitivity <0.05°C at +30°C and accuracy of ±2°C. The thermal camera was vertically positioned at approximately 0.3 m from the canopy and produced images of 320 × 240 pixels, with a viewing field of 45°. Digital video data were analyzed by the Research & Development software by FLIR.

Images were taken twice a week, from 2 until 5 wai. At each time point, all fully developed leaves in control and in inoculated plants were measured. Measurements were performed on attached and unshaded leaves at the same time of day. Two hours before each measurement time, inoculated and control plants were moved from the glasshouse to a growth chamber in order to avoid glasshouse thermal fluctuations.

Digital color (RGB) images (2,048 × 1,536 pixels) were obtained simultaneously with the thermal images. One representative area in the midsection of each leaf was selected for the analysis after comparison of thermal and RGB images.

# Data Processing and Statistics

Data of the spatial distribution and time span of BGF on healthy sunflower were analyzed as means and their corresponding standard errors. When O. cumana was inoculated to sunflower, BGF emission, pigments concentration, leaf temperature and broomrape incidence [BI, transformed according to sqrt (BI + 0.5)] were statistically analyzed. The experiment was set up as a completely randomized design and was conducted twice. Similarity between repetitions was tested by analysis of variance (ANOVA) with repetitions as blocks. Since no significant differences were found, the data were pooled. When, after ANOVA, differences of any of the considered variables between inoculated and control plants were significant, mean values were compared by Fisher's protected Least Significant Difference tests (P = 0.05). STATISTIX 8.0 software (Analytical software, Tallahassee, FL, USA) was used for all the analyses.
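The transformation and the analysis of variance described above can be illustrated with a short Python sketch. It is not the STATISTIX procedure used in this study: the function names are hypothetical, only the F statistic of a one-way ANOVA for a completely randomized design is computed, and the P value and the Fisher's protected LSD comparison (which need the F and t distributions from a statistics package) are left out.

```python
from math import sqrt
from statistics import mean


def sqrt_transform(incidence):
    """Apply the sqrt(BI + 0.5) transformation to broomrape incidence values."""
    return [sqrt(x + 0.5) for x in incidence]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a completely randomized design.

    Assumes at least two groups and non-zero within-group variation.
    """
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical leaf temperatures (°C), for illustration only.
control = [24.1, 24.3, 24.0, 24.2]
inoculated = [24.8, 24.9, 24.6, 25.0]
f_stat, df_b, df_w = one_way_anova_f([control, inoculated])
```

The resulting F value would then be compared with the critical value for `df_b` and `df_w` degrees of freedom at P = 0.05 before any comparison of means.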

#### RESULTS

#### Blue-Green Fluorescence Emission in Healthy Sunflower Plants

A clear increase in the intensity of F440 and F440/F520 was observed in the first LP during its expansion and development. A similar trend was also observed in the case of the second LP in both parameters from the third week onwards. Values of F440 and F440/F520 of the two upper LPs evolved similarly to those of the lower LPs over time, although smaller increases were observed in the third and fourth LP throughout the measurement period. By contrast, and with the exception of a slight increase in the first LP, the F520 signal of the leaves remained fairly constant throughout the experiment, and even a decrease was detected in the second LP in the last week (**Figure 1**).

### Effect of *O. cumana* on BGF Emission of Sunflower

The effect of the infection of sunflower by O. cumana on the BGF emission of leaves was examined by comparison with that of leaves from the control plants at five time points during the 3-week period, and is presented in **Figure 2**.

A significantly lower fluorescence at 440 nm was consistently detected in parasitized plants throughout the experiment. Initially, leaves showed lower values of F440 than those at the end of the experiment, in both inoculated and control sunflowers (**Figure 2A**). Similarly, the F520 values in young leaves were significantly decreased by the parasite attack. As with F440, leaves of inoculated plants had lower F520, although F520 of older leaves did not significantly differ from that of the controls (**Figure 2B**).

Concerning the F440/F520 ratio, significant decreases were obtained in leaves up to 14 days old of infected plants as compared to leaves from healthy plants. A progressive increase in the F440/F520 ratio was evident in leaves of inoculated and control sunflowers throughout the experiment, although no significant differences were found between treatments in the last two measurements (**Figure 2C**).

On the other hand, the effect of the inoculation by O. cumana was also observed in the F440/F680 emission, this fluorescence variable being significantly lower in 8- to 14-day-old leaves of parasitized sunflowers. Higher F440/F680 ratios were obtained in older leaves of inoculated plants, although differences with those in leaves of the control plants were not significant (**Figure 2D**). Similar results were obtained in the case of the F440/F740 ratio, with young leaves of inoculated plants presenting significantly lower values than those of the control plants. When leaves were older than 14 days, the F440/F740 ratio was higher in the inoculated sunflowers, but no significant differences were found (data not shown).

# Carotenoids and Total Chlorophyll Contents in Sunflower Leaves upon Infection by *O. cumana*

The effect of the inoculation by O. cumana on chlorophyll and carotenoids contents at the end of the experiment is presented in **Table 1**. As previously observed by our research group (Ortiz-Bustos et al., 2016a), significant differences of chlorophyll content occurred in sunflower plants upon inoculation with O. cumana. Conversely, on the second and third LPs, no significant differences in either the carotenoids content or the chlorophyll/carotenoids ratio were detected between inoculated and control plants. Significantly lower values of both variables were obtained in the fourth LP of inoculated plants (203.20 µg/g and 5.07 for total carotenoids and for the chlorophyll/carotenoids ratio, respectively) compared with those in the same LP of healthy plants (274.30 µg/g and 5.80 for total carotenoids and for the chlorophyll/carotenoids ratio, respectively) (**Table 1**).

TABLE 1 | Measurements of pigments concentration (chlorophyll content, carotenoids content and chlorophyll/carotenoids ratio) for the second, third, and fourth leaf pair of sunflower plants inoculated with *Orobanche cumana* (I) and control plants (C) at 5 weeks after inoculation.

<sup>a</sup>Chlorophyll and carotenoids (xanthophylls and carotenes) content is expressed as content of chlorophyll a and b [Chl (a+b)], and [Car (x+c)] respectively, and chlorophyll/carotenoids ratio is expressed as Chl (a+b)/Car (x+c).

<sup>b</sup>Mean ± standard error (SE), n = 7–8.

<sup>c</sup>Level of significance of differences in the variables between control (C) and inoculated (I) plants obtained after analyses of variance according to a completely randomized statistical design.

#### Effect of *O. cumana* on the Temperature of Sunflower Leaf

**Figure 3** shows the progress over time of the average leaf temperature of the inoculated and control plants.

Leaves of inoculated sunflowers were warmer than those of the controls at all the measurement dates. Significant temperature increases were observed at 2.5 wai and persisted until the end of the experiment (5 wai), with the exception of measurements at 3 and 4 wai. Differences in leaf temperature ranged from 0.4 to 0.8°C, at 5 and 2.5 wai respectively (**Figure 3**).

Leaf temperature showed an irregular time course irrespective of whether the plants were inoculated or not, although a similar trend was observed in both situations. Thus, leaf temperature remained fairly constant during the first two weeks, until 4 wai, when a clear increase was observed in the inoculated and also in the control leaves of sunflower (0.8 and 0.9°C respectively).

#### Assessment of the Infection of *O. cumana* in Sunflower

Parasitized sunflowers reached the same growth stage as the controls. No symptoms (i.e., wilting) due to O. cumana were observed in aboveground parts of the plants throughout the experiment. Nonetheless, the infection was confirmed by tubercle formation in roots of inoculated sunflowers at the end of the experiment.

Although no significant differences in root weight were observed between inoculated and control plants, up to 0.29 g and 17 tubercles per plant were recorded in inoculated sunflowers. Tubercles were absent in the controls (**Table 2**).

# DISCUSSION

Our results showed that the BGF emission of individual LPs of healthy sunflower increases over time. Besides, the BGF emission depends on leaf development: the same pattern of the blue-green signal emitted by a LP (i.e., the first LP) until its complete expansion was presented, 1 week later, by the following LP (i.e., the second LP). An increase of BGF of leaves throughout their elongation was observed in wheat by Meyer et al. (2003), who proposed BGF as a signature of leaf aging. Indeed, aging of leaf implies transition from source to sink tissue by diversion of carbon from primary to secondary metabolism. On the other hand, different BGF emissions in artichoke leaves with age were also observed by Morales et al. (2005), although old leaves emitted lower BGF than young ones. Interestingly, although in most plant species ferulic acid is the major blue-green fluorophore (Morales et al., 1996), in the case of sunflower there is no ferulic acid bound to cell walls. Small amounts of caffeic acid and trichome secretions (e.g., sesquiterpene lactones and coumarins) have been described as blue-green fluorophores in sunflower (Spring and Schilling, 1989; Olson and Roseland, 1991; Morales et al., 1996; Tourvieille de Labrouhe et al., 1997; Lichtenthaler and Schweiger, 1998; Rowe et al., 2012). These compounds could be responsible for BGF emission in our experiments. In fact, the coumarins content in sunflowers grown under optimal non-limiting conditions tends to be very low, thus their presence can be considered as a good marker of a stress event (Prats-Perez et al., 2000).

Apart from monitoring plant physiological status in response to growth, BGF has been successfully used to detect nutrient deficiencies (McMurtrey et al., 1996; Cadet and Samson, 2011), water and temperature stress (Lang et al., 1996; Hura et al., 2007), pathogen attack (Granum et al., 2015) and a simultaneous combination of stress events (Bürling et al., 2011). Preliminary studies by our group proved the potential of BGF to discriminate O. cumana infection in sunflower plants at early stages (3 wai) (Pérez-Bueno et al., 2014). The present findings allow us to distinguish infected from non-infected sunflowers 1 week earlier by means of decreases in F440 and F520 emissions, as well as by lower values of the F440/F520, F440/F680, and F440/F740 ratios. Spectral features of BGF and their intensity ratios have been investigated not only for their diagnostic value, but also for understanding the physiological changes that take place during stress development (Buschmann and Lichtenthaler, 1998). Our BGF imaging results suggest that O. cumana alters the secondary metabolism of sunflower, e.g., accumulation of caffeic acid and coumarins. Furthermore, the spectrophotometric pigment quantification provided evidence of decreased contents of carotenoids in leaves of infected plants as compared to those of the controls in agreement with the results by Shen et al. (2013) on Mikania micrantha infected by Cuscuta campestris. Many secondary metabolites are not only major contributors to specific odors, tastes, and colors of plants, but they also play a key role in defense against herbivores and pathogens (Berger et al., 2007; Wink, 2010; Ouzounis et al., 2015). Coumarins are excreted by roots of sunflowers with resistance to the parasite (Serghini et al., 2001), as well as other toxic (Zélicourt et al., 2007) and phenolic (Echevarría-Zomeño et al., 2006) compounds, all of them having a defensive role against O. cumana.

<sup>a</sup>Mean ± standard error, n = 8.

<sup>b</sup>Level of significance of differences of root fresh weight between control (C) and inoculated (I) plants obtained after analysis of variance according to a completely randomized statistical design.

Since the F440/F520 ratio is affected by chlorophyll content (Stober and Lichtenthaler, 1992; Morales et al., 1994), decreases upon infection by O. cumana suggest that a lower content of chlorophyll is present in leaves. Recent findings by our research group also suggested that, when sunflowers are infected by O. cumana, the chlorophyll content in young leaves is decreased (Ortiz-Bustos et al., 2016a). Reductions in chlorophyll content are also induced by O. foetida attack on chickpea (Cicer arietinum L.) (Nefzi et al., 2016) and by C. australis infection on M. micrantha (Le et al., 2015). Our results also support the value of the fluorescence ratios F440/F680 and F440/F740 as very early stress indicators (Lichtenthaler and Miehé, 1997; Buschmann and Lichtenthaler, 1998; Buschmann et al., 2000). Low values of these ratios during early stages of the infection of sunflower may not only be due to decreased F440 and F520 emission in leaves, but also to higher F680 and F740 signals (Ortiz-Bustos et al., 2016a). Beyond that, our findings show that BGF and its intensity ratios with chlorophyll bands could be used, in addition to directly detecting O. cumana in sunflower, as an indirect approach to the alteration of plant photosynthesis and to the impairment of the secondary metabolism as a result of parasite infection.

The applicability of thermal infrared imaging in determining plant temperature as an early response to biotic or abiotic stresses is widely documented and a topic of intense research activity (Nilsson, 1995; Chaerle and Van Der Straeten, 2001; Raza et al., 2014; Baranowski et al., 2015; Grant et al., 2016; Mahlein, 2016; Mangus et al., 2016). In this work we observed increases in leaf temperature of parasite-infected sunflower from 2 to 5 wai. Recently, increases in canopy temperature allowed the detection of late wilt disease caused by the soilborne fungus Harpophora maydis up to 17 days earlier than symptoms development in maize (Ortiz-Bustos et al., 2016b). Also, significant increases in crown temperature have allowed the differentiation of olive trees infected by Verticillium dahliae in the field (Calderón et al., 2013). Leaf thermal increases are closely related to reduced transpiration rates due to either activation of stomatal closure or inhibition of stomatal opening. Previous studies revealed that Orobanche ramosa and C. campestris parasitism reduced stomatal conductance and transpiration rate and, consequently, slowed down host photosynthesis and host growth (Mauromicale et al., 2008; Chen et al., 2011). The effects of parasite-induced stomatal closure and transpiration reduction on the decreased development of sunflower (Alcántara et al., 2006) should be investigated in the future. To the best of our knowledge, this work constitutes the first approach to the diagnosis of parasite infection in crops by means of thermal imaging and could be further implemented in field conditions.

Although the damage done by broomrape species to crops is directly attributed to parasitic sink activity (Barker et al., 1996; Manschadi et al., 1996; Hibberd et al., 1998; Draie et al., 2011; Péron et al., 2012), our results indicate that, in the O. cumana-sunflower interaction, the damage might extend beyond assimilate diversion. Many parasitic angiosperms display a pathogenic nature promoting disease in the crop mainly through negative effects on the photosynthesis, physiology and hormonal balance of the host (Stewart and Press, 1990; Watling and Press, 2001; Mauromicale et al., 2008). The present work provides valuable clues toward the understanding of the processes by which O. cumana seems not only to cause changes in sunflower secondary metabolism but also to alter its photosynthetic capacity and unbalance its carbohydrate metabolism. Nevertheless, additional research will be required to clarify how both physiological processes are affected.

#### CONCLUSION

The outstanding significance of our BGF imaging and thermography results, from a diagnosis point of view, is that the establishment of a soilborne pathogen (O. cumana) in a below-ground organ (root) of the plant can be detected prior to the development of visual symptoms in far distant and above-ground organs (leaves). Diagnostic fluorescence and thermal signals are related to host physiology alterations upon infection, and continuous measurements for long periods of time are possible. Therefore, these techniques enable not only the detection of stress onset by O. cumana, but also the monitoring of its development in sunflower over time, providing an additional tool for basic research about holoparasite-host plant interactions. Finally, and as a useful outcome of this work, a fast phenotyping of sunflower lines could be achieved by means of the implementation of BGF imaging and thermography in breeding programmes for resistance to O. cumana.

#### REFERENCES

#### AUTHOR CONTRIBUTIONS

LM, MP, and MB conceived and designed the experiments. CO, LM, and MP conducted experiments. CO and LM analyzed data and interpreted the results. CO mounted images. LM and MB contributed materials, equipment and analysis tools. CO and LM wrote the manuscript and all the authors reviewed it and approved the final version.

#### ACKNOWLEDGMENTS

Financial support for this research was provided by "Consejería de Economía, Innovación y Ciencia" of the Andalusian Government (P12-AGR1281 to LM and P12-AGR370 to MB) and the European Social Fund, and RECUPERA 2020 (grant number 20134R060 to MB) and FEDER Funds. We acknowledge support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).


a holoparasitic plant depends on nitrogen supply. PLoS ONE 8:e75555. doi: 10.1371/journal.pone.0075555


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Ortiz-Bustos, Pérez-Bueno, Barón and Molinero-Ruiz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genome-Wide Identification of the MIKC-Type MADS-Box Gene Family in *Gossypium hirsutum* L. Unravels Their Roles in Flowering

Zhongying Ren<sup>1,2†</sup>, Daoqian Yu<sup>1,2†</sup>, Zhaoen Yang<sup>1,2</sup>, Changfeng Li<sup>2,3</sup>, Ghulam Qanmber<sup>2</sup>, Yi Li<sup>2</sup>, Jie Li<sup>2</sup>, Zhao Liu<sup>2</sup>, Lili Lu<sup>2</sup>, Lingling Wang<sup>2</sup>, Hua Zhang<sup>1</sup>, Quanjia Chen<sup>1</sup>, Fuguang Li<sup>1,2</sup>\* and Zuoren Yang<sup>1,2</sup>\*

<sup>1</sup> Xinjiang Research Base, State Key Laboratory of Cotton Biology, Xinjiang Agriculture University, Urumqi, China, <sup>2</sup> Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China, <sup>3</sup> Cotton Research Institute, Anhui Academy of Agricultural Sciences, Hefei, China

#### *Edited by:*

Leire Molinero-Ruiz, Instituto de Agricultura Sostenible (CSIC), Spain

#### *Reviewed by:*

Antonio José Monforte, Instituto de Biología Molecular y Celular de Plantas (CSIC), Spain Ankica Kondic-Spika, Institute of Field and Vegetable Crops, Serbia

#### *\*Correspondence:*

Zuoren Yang, yangzuoren4012@163.com; Fuguang Li, aylifug@163.com

† These authors have contributed equally to this work.

#### *Specialty section:*

This article was submitted to Crop Science and Horticulture, a section of the journal Frontiers in Plant Science

*Received:* 25 December 2016 *Accepted:* 06 March 2017 *Published:* 22 March 2017

#### *Citation:*

Ren Z, Yu D, Yang Z, Li C, Qanmber G, Li Y, Li J, Liu Z, Lu L, Wang L, Zhang H, Chen Q, Li F and Yang Z (2017) Genome-Wide Identification of the MIKC-Type MADS-Box Gene Family in Gossypium hirsutum L. Unravels Their Roles in Flowering. Front. Plant Sci. 8:384. doi: 10.3389/fpls.2017.00384

Cotton is one of the major world oil crops. Cottonseed oil meets the increasing demand for fried food, ruminant feed, and renewable bio-fuels. MADS intervening keratin-like and C-terminal (MIKC)-type MADS-box genes encode transcription factors that have crucial roles in various plant developmental processes. Nevertheless, this gene family has not been characterized, nor its functions investigated, in cotton. Here, we performed a comprehensive analysis of MIKC-type MADS genes in the tetraploid Gossypium hirsutum L., which is the most widely cultivated cotton species. In total, 110 GhMIKC genes were identified and phylogenetically classified into 13 subfamilies. The Flowering locus C (FLC) subfamily was absent in the Gossypium hirsutum L. genome but is found in Arabidopsis and Vitis vinifera L. Among the genes, 108 were distributed across the 13 chromosomes of the A genome and 12 chromosomes of the D genome, while two were located in scaffolds. GhMIKCs within subfamilies displayed similar exon/intron characteristics and conserved motif compositions. According to RNA-sequencing, most MIKC genes exhibited high flowering-associated expression profiles. A quantitative real-time PCR analysis revealed that some crucial MIKC genes determined the identities of the five flower organs. Furthermore, the overexpression of GhAGL17.9 in Arabidopsis caused an early flowering phenotype. Meanwhile, the expression levels of the flowering-related genes CONSTANS (CO), LEAFY (LFY) and SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1 (SOC1) were significantly increased in these lines. These results provide useful information for future studies of GhMIKCs' regulation of cotton flowering.

Keywords: *Gossypium hirsutum* L., *GhMIKCs,* phylogeny, structure, expression patterns, flower

# INTRODUCTION

Transcription factors play an indispensable role in growth and development, and MADS transcription factor family members have been detected in the genomes of plants, animals, and fungi (Becker et al., 2000; Becker and Theissen, 2003; Messenguy and Dubois, 2003). In monophyletic evolution, they are divided into two classes: type I and type II (Alvarez-Buylla et al., 2000). The type I MADS-box genes are serum response factor-like genes in animals and fungi, while they are M-type genes in plants. They are characterized by a highly conserved MADS domain of 58–60 amino acids, located in the N-terminal region of the proteins, which is involved in DNA binding and dimerization. Functional investigations have been mainly restricted to Arabidopsis (Parenicová et al., 2003). The type II family plays a significant role in regulating flowering during plant development (Mondragon-Palomino, 2013). Type II genes are closely related to the myocyte enhancer factor-2-like genes of animals and yeast. However, MADS intervening keratin-like and C-terminal (MIKC)-type MADS-box genes are found only in plants.

The MIKC-type plant genes contain three domains in addition to the MADS (M) domain: the Intervening (I), Keratin-like (K), and C-terminal (C) domains (Theißen et al., 1996; Kaufmann et al., 2005). The I domain, which is less conserved, contributes to the formation of DNA-binding dimers (Riechmann et al., 1996). The K domain, which consists of ∼70 amino acids, is mainly responsible for dimerization by a coiled-coil structure (Ma et al., 1991; Fan et al., 1997). The C domain exhibits transactivation activity and mediates protein–protein interactions (Kramer and Irish, 1999; Honma and Goto, 2001). Based on structure divergence at the I domain, the MIKC-type genes are classified into two subgroups, MIKC<sup>C</sup> and MIKC<sup>\*</sup>. Earlier investigations found 39 and 37 MIKC<sup>C</sup> genes in Arabidopsis and O. sativa, respectively (Parenicová et al., 2003; Arora et al., 2007). The MIKC<sup>C</sup> type plays a crucial role in flowering time, floral organ identity determination and fruit ripening in plant growth and development (Theissen, 2001; Becker and Theissen, 2003; Theissen and Melzer, 2007; Li et al., 2016).

The genetic model of floral organ identity is derived from the analysis of homeotic floral mutants. The ABC model was named after three classes of genes (A, B, and C) (Coen and Meyerowitz, 1991), and has since developed into the more precise ABCDE model. MIKC<sup>C</sup> family genes act in combination to determine the identities of the floral organs: sepals (A + E), petals (A + B + E), stamens (B + C + E), carpels (C + E), and ovules (D + E) (Bowman et al., 1991; Coen and Meyerowitz, 1991; Ma and Depamphilis, 2000; Zahn et al., 2006; Silva et al., 2015). In Arabidopsis, the functional genes are divided into five classes: Class A: APETALA1 (AP1); Class B: PISTILLATA (PI) and AP3; Class C: AGAMOUS (AG) (Acri-Nunes-Miranda and Mondragón-Palomino, 2014); Class D: SEEDSTICK/AGAMOUS-LIKE11 (STK/AGL11); and Class E: SEPALLATA (SEP1, SEP2, SEP3, and SEP4) (Ferrándiz et al., 2000; Pinyopich et al., 2003). Other MIKC<sup>C</sup> genes that regulate flowering time and flower initiation have been identified as follows: SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 (SOC1) (Lee et al., 2000; Hepworth et al., 2002); FLOWERING LOCUS C (FLC) (Michaels and Amasino, 1999; Searle et al., 2006; Reeves et al., 2007); AGAMOUS-LIKE 24 (AGL24) (Michaels et al., 2003; Liu et al., 2008); and SHORT VEGETATIVE PHASE (SVP) (Hartmann et al., 2000; Michaels et al., 2003; Lee et al., 2007). Others are involved in fruit ripening, such as SHATTERPROOF 1–2 and FUL (Ferrándiz et al., 2000; Liljegren et al., 2000); in seed pigmentation and endothelium development, such as TRANSPARENT TESTA16 (Nesi et al., 2002); and in root development, such as AGL12 and AGL17 (Rounsley et al., 1995; Tapia-López et al., 2008). Studies of the evolutionary history of MIKC genes have explored the mechanisms behind their functional diversification in plant growth and development.
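The combinatorial logic of the ABCDE model described above can be written down as a small lookup table. The following is only an illustrative sketch: the function and variable names are hypothetical, and the mapping encodes nothing beyond the class combinations and Arabidopsis genes listed in the text.

```python
# Organ identity as a function of the active gene classes (ABCDE model).
# Names are hypothetical; the combinations are those listed in the text.
ABCDE_MODEL = {
    frozenset("AE"): "sepal",
    frozenset("ABE"): "petal",
    frozenset("BCE"): "stamen",
    frozenset("CE"): "carpel",
    frozenset("DE"): "ovule",
}

# Arabidopsis genes assigned to each class in the text.
CLASS_GENES = {
    "A": ["AP1"],
    "B": ["PI", "AP3"],
    "C": ["AG"],
    "D": ["STK/AGL11"],
    "E": ["SEP1", "SEP2", "SEP3", "SEP4"],
}


def organ_identity(active_classes):
    """Return the organ specified by the active classes, or None if the
    combination is not one of the five listed in the model."""
    return ABCDE_MODEL.get(frozenset(active_classes))
```

For instance, `organ_identity("BCE")` looks up the stamen combination, while a combination that the model does not list returns `None`.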

Cotton is not only the most important source of natural fiber for the textile industry (Pang et al., 2010), but also a major contributor to the world oilseed economy. Cottonseed oil has long been considered a good vegetable oil (Michaelk et al., 2010; Sawan, 2014; Zhang et al., 2014). In addition, as an alternative and sustainable oil source, cottonseed oil has been developed into biodiesel and used as a substitute for petroleum (Carlsson, 2009; Alhassan et al., 2014). Cotton is one of the top five oil crops in the world (Wang et al., 2016), and cottonseed oil accounts for about 21% of cottonseed production (Malik and Ahsan, 2016; Wang et al., 2016; Yang and Zheng, 2016). Cotton seed develops from the ovule, which is an important part of the floral organs. MIKC genes in G. hirsutum are highly significant in developmental processes; in particular, a number of them may be involved in the development of flower morphology (Honma and Goto, 2001; Messenguy and Dubois, 2003). For example, overexpression of GhMADS3, a homolog of Arabidopsis AG and a putative C function gene, promotes sepal-to-carpel and petal-to-stamen transformations in transgenic tobacco (Guo et al., 2007). GhMADS13, a close homolog of Arabidopsis AGL6, significantly promotes flower buds in cotton when overexpressed (Wu et al., 2009), and GhMADS14 expression increases gradually during the early stages of fiber elongation (Zhou et al., 2014). In a previous study, 53 members of the G. hirsutum MIKC<sup>C</sup> gene family were identified based on the G. raimondii genome (Jiang et al., 2014). However, owing to the lack of G. hirsutum genome sequences, a comprehensive analysis of MIKC-type MADS genes in G. hirsutum has not yet been reported.

Recently, the G. hirsutum genome was sequenced. To systematically analyze the MIKC-type MADS family genes in G. hirsutum, 92 MIKC<sup>C</sup> and 18 MIKC<sup>∗</sup> members of the MIKC family were identified from the whole G. hirsutum genome. Phylogeny, structures, locations, and expression patterns were comprehensively analyzed. AGL17 is the largest subgroup, and the involvement of a GhAGL17 subfamily gene in regulating flowering was confirmed by ectopic expression in Arabidopsis. Our findings provide a foundation for the genetic improvement of cotton flowering.

# MATERIALS AND METHODS

#### Identification of MIKC Genes in *Gossypium hirsutum* L.

To identify members of the MIKC gene family in G. hirsutum, Arabidopsis MIKC sequences were obtained from the TAIR database (http://www.arabidopsis.org) and used as queries in BLASTP searches against the G. hirsutum genome database (https://www.cottongen.org/species/Gossypium\_hirsutum/nbi-AD1\_genome\_v1.1) (Zhang et al., 2015). The MIKC protein domains were analyzed using the Hidden Markov Model (HMM) profiles from the Pfam database (http://pfam.xfam.org/). The SRF-TF and K-box domains were confirmed by their Pfam accessions (PF00319 and PF01486, respectively). All of the candidate proteins were manually checked using the above-described methods to remove redundant sequences. Accession numbers of the MIKC proteins are listed in Supplementary S1.
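The domain-confirmation step can be illustrated with a short filter that retains only candidates carrying both Pfam domains. This is a minimal sketch rather than the pipeline used in the study: the input format (pairs of protein ID and Pfam accession) and the function name are assumptions made for illustration.

```python
# Pfam accessions named in the text: SRF-TF (MADS) and K-box.
REQUIRED_DOMAINS = {"PF00319", "PF01486"}


def filter_mikc_candidates(domain_hits):
    """Keep proteins whose Pfam hits include every required domain.

    domain_hits: iterable of (protein_id, pfam_accession) pairs.
    This input format is an assumption, not the authors' file format.
    """
    hits_by_protein = {}
    for protein_id, accession in domain_hits:
        # Accessions may carry a version suffix (e.g. "PF00319.18"); drop it.
        base_accession = accession.split(".")[0]
        hits_by_protein.setdefault(protein_id, set()).add(base_accession)
    return sorted(
        protein_id
        for protein_id, accessions in hits_by_protein.items()
        if REQUIRED_DOMAINS <= accessions
    )
```

A protein with only a MADS hit or only a K-box hit is dropped, which mirrors the requirement that MIKC-type proteins contain both domains.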

#### Phylogenetic Tree Construction

To construct a MIKC-protein phylogenetic tree using MEGA 6.06, MIKC proteins from four plant species, Arabidopsis, O. sativa, V. vinifera, and G. hirsutum, were employed. The neighbor-joining method with amino acid p-distance was applied to construct the tree (Tamura et al., 2011), and the reliability was obtained by bootstrapping with 1,000 replicates.
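The amino acid p-distance used for the neighbor-joining tree is simply the proportion of aligned sites at which two sequences differ. The sketch below shows that calculation only; it is not MEGA's implementation, the function names are hypothetical, and the treatment of gaps (pairwise deletion) is an assumption because the text does not state which option was used.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues between two aligned sequences.

    Sites with a gap in either sequence are skipped (pairwise deletion),
    which is an assumption; the source does not state the gap treatment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differing = 0
    for residue_a, residue_b in zip(seq_a, seq_b):
        if residue_a == gap or residue_b == gap:
            continue
        compared += 1
        if residue_a != residue_b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of {name: aligned_sequence}."""
    names = sorted(aligned)
    return {
        (first, second): p_distance(aligned[first], aligned[second])
        for index, first in enumerate(names)
        for second in names[index + 1:]
    }
```

A matrix of this kind is the input to neighbor joining; bootstrapping repeats the calculation on alignment columns resampled with replacement.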

#### Exon/Intron Structure, Motif and Chromosomal Location Analyses

The exon/intron structures of MIKC genes were retrieved by the alignment of predicted coding sequences with corresponding genomic sequences using the gene structure display server (GSDS) program (http://gsds.cbi.pku.edu.cn/).

The online program MEME (http://meme-suite.org/) was employed to determine the conserved motifs in GhMIKCs with the following optimum parameters: a motif width of 8–200 amino acids and a maximum of 13 motifs. The identified motifs were annotated using the program InterProScan (Quevillon et al., 2005).

The chromosomal distributions of MIKC genes were obtained based on genome annotation data. The MapInspect software was applied to draw images of their physical locations in G. hirsutum.

#### Gene Expression Analysis

The expression of MIKC family genes was measured using RNA sequencing. The raw RNA-sequencing data for seven different tissues of G. hirsutum TM-1 (root, stem, leaf, flower, ovule, seed, and fiber) were downloaded from the NCBI Gene Expression repository under the accession number PRJNA248163 (Table S4) (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA248163/). The data were normalized to calculate the expression levels. Hierarchical clustering was performed using Genesis 1.7.7 (Sturn et al., 2002).
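The text does not specify how the expression values were normalized before clustering. One common choice, shown here purely as an assumed illustration, is a log2 transform followed by a per-gene z-score across tissues, so that genes with different absolute abundance can be compared by pattern. All function names are hypothetical.

```python
import math
import statistics

# The seven tissues named in the text.
TISSUES = ["root", "stem", "leaf", "flower", "ovule", "seed", "fiber"]


def log2_zscore(values):
    """Log2-transform (with a pseudocount of 1) and z-score one gene's values.

    This normalization is an assumption; the source only states that the
    data were normalized.
    """
    logged = [math.log2(value + 1.0) for value in values]
    mean = statistics.fmean(logged)
    spread = statistics.pstdev(logged)
    if spread == 0:
        # A gene with identical values in every tissue has no pattern.
        return [0.0] * len(logged)
    return [(value - mean) / spread for value in logged]


def preferred_tissue(values):
    """Tissue with the highest normalized expression for one gene."""
    scores = log2_zscore(values)
    return TISSUES[scores.index(max(scores))]
```

Rows normalized in this way can then be passed to any hierarchical clustering tool to group genes that share a tissue preference.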

#### RNA Isolation and the qRT-PCR Analysis

Gossypium hirsutum L. (cv CCRI24) was cultivated in the field in Zhengzhou, China. Five floral tissues (sepal, petal, stamen, carpel, and ovule) were sampled at the full bloom stage. Arabidopsis (Columbia-0) was used as the wild type; the leaves of wild-type and transgenic lines grown for 25 days were harvested. All samples were frozen immediately in liquid nitrogen and kept at −80 °C for total RNA extraction. Total RNA was extracted from each sample using the TRIzol reagent (TIANGEN, Beijing, China) and treated with RNase-free DNase I. Gel electrophoresis and a Nanodrop2000 nucleic acid analyzer were employed to assess the quality of the RNA. The first cDNA strand was synthesized from 1 µg of total RNA using the Transcriptor First Strand cDNA Synthesis Kit version DRR047A (TaKaRa, Dalian, China). The cDNA was diluted five-fold for subsequent experiments.

The gene-specific primers used for qRT-PCR are listed in Supplementary Tables S2 and S3. The G. hirsutum His3 gene and the Arabidopsis Actin2 gene were used as internal controls, respectively. The qRT-PCR was performed using SYBR Green (Roche) on a LightCycler480 system (Roche). Each reaction was conducted in a 96-well plate with a volume of 20 µl. The PCR cycling parameters were as follows: 95 °C for 5 min, 40 cycles of 95 °C for 10 s, 60 °C for 10 s, and 72 °C for 10 s, followed by an increase from 60 to 95 °C. The relative expression levels were analyzed using the LightCycler® 480 gene scanning software. Three biological replicates were measured, and each biological replicate was run three times.
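The quantification formula applied by the instrument software is not stated in the text. A widely used way to express relative expression against an internal control is the 2<sup>−ΔΔCt</sup> calculation, sketched below as an assumption rather than as the authors' procedure. It presumes roughly 100% amplification efficiency for both primer pairs, and the function name is hypothetical.

```python
from statistics import fmean


def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change by the 2^-ddCt calculation (assumed, not from the source).

    Each argument is a list of technical-replicate Ct values:
    target and reference gene in the sample of interest, then the same
    two genes in the calibrator sample (e.g. wild type).
    """
    delta_sample = fmean(ct_target) - fmean(ct_reference)
    delta_calibrator = fmean(ct_target_calibrator) - fmean(ct_reference_calibrator)
    return 2.0 ** -(delta_sample - delta_calibrator)
```

With three biological replicates, the function would be called once per replicate and the resulting fold changes summarized as mean ± SD.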

#### Isolation of *GhAGL17.9* and Transformation of Arabidopsis

We amplified GhAGL17.9 using cDNA templates from a mix of CCRI24 root, stem, leaf, and flower tissues. The amplified product was cloned into the vector pCambia2301 (CAMBIA) containing the CAULIFLOWER MOSAIC VIRUS (CaMV) 35S constitutive promoter, and the constructed vector was then introduced into Agrobacterium tumefaciens GV3101 (Clough and Bent, 1998). The floral dip method was used for Agrobacterium-mediated transformation of Arabidopsis. Positive transgenic lines were selected on MS medium containing kanamycin. To grow the transgenic lines, seedlings were sown in plastic pots filled with a nutrient soil and vermiculite mix. They were then grown in a culture room at 22 °C under a 16-h light/8-h dark cycle for 1 month.

# RESULTS

#### Identification of MIKC Genes in *Gossypium hirsutum* L.

HMMER- and BLASTP-based searches were used to identify MIKC proteins based on the highly conserved MADS and K-box domains. To identify the maximum number of MIKC genes in G. hirsutum, HMMs of the SRF-TF and K-box domains (PF00319 and PF01486, respectively) were extracted from the Pfam database and used as queries against protein sequences from the G. hirsutum genome (https://www.cottongen.org/species/Gossypium\_hirsutum/nbi-AD1\_genome\_v1.1). A total of 145 putative MIKC proteins were identified. To verify the results, we conducted a multiple sequence alignment and removed 35 redundant sequences. Finally, 110 MIKC protein sequences were identified by confirming their conserved domains using the Pfam web server (Figure S3). Among these sequences, 92 MIKC<sup>C</sup> genes and 18 MIKC<sup>∗</sup> genes were identified. Thus, 84% of the MIKC genes in G. hirsutum were MIKC<sup>C</sup> (**Figure 1A**). The identified MIKC genes are listed with their corresponding locus tags (**Table 1**). We named the MIKC<sup>C</sup> genes on the basis of their assignment to the 13 previously classified Arabidopsis, O. sativa, and P. tremula subfamilies (Parenicová et al., 2003; Leseberg et al., 2006; Arora et al., 2007). Subgroup AGL17 contained the largest share (13%) of GhMIKC<sup>C</sup> genes, whereas subgroups TM8 and AGL12 contained the smallest (2% each) (**Figure 1B**). The lengths of the encoded GhMIKC proteins were relatively conserved: MIKC<sup>C</sup> proteins ranged from 200 to 300 amino acids in most cases, whereas MIKC<sup>∗</sup> proteins generally possessed more than 300 amino acids. The chromosomal locations of 108 GhMIKCs were distributed across the A and D genomes, while GhAGL17.12 and GhAP3.10 were located on scaffolds.

#### TABLE 1 | *MIKC* genes identified in *Gossypium hirsutum* L.

#### Phylogenetic Analysis of the MIKC Gene Family

To examine the phylogenetic relationships among G. hirsutum MIKC proteins and to categorize them within the established subfamilies from other plants, we performed a multiple sequence alignment and constructed a neighbor-joining tree of 110 full-length MIKC proteins from G. hirsutum, 44 MIKC proteins from V. vinifera, 46 MIKC proteins from Arabidopsis, and 41 MIKC proteins from O. sativa (Table S1). The MIKC<sup>C</sup> proteins were divided into 13 subfamilies (SVP, BS, AGL17, AGL15, AP3-PI, AGL12, SOC1, AG/SHP/STK, AP1/FUL, AGL6, SEP, TM8, and FLC; **Figure 1C**). The AGL17 subgroup was the largest, and the FLC subgroup was absent from the G. hirsutum genome. Additionally, no TM8 family members were found in Arabidopsis. TM8 constituted the smallest clade, having only four members, including two GhMIKCs, GhTM8.1 and GhTM8.2. The MIKC<sup>∗</sup> proteins were divided into two subfamilies.

#### Gene Structure and Protein Motif Analysis

A phylogenetic analysis revealed that our tree corresponded to those reported recently in V. vinifera and C. sativus (Díaz-Riquelme et al., 2009; Hu and Liu, 2012). The structures of the MIKC genes also helped to determine phylogenetic relationships (**Figure 2**). Most members of the same subfamily had significant sequence identities and similar exon-intron structures, indicating close evolutionary relationships. The most important differences were in the exon-intron lengths (**Figure 2B**). In general, most members of the SEP, AGL6, and AP1 gene families contained eight exons (except GhAGL6.1, GhAGL6.4, GhAP1.2, GhAP1.6, and GhAP1.8). The SVP (other than SVP1) and AGL12 subgroups had seven exons, whereas GhAGL15.1 and GhAGL15.4 of the AGL15 subgroup had four exons, which was consistent with GhSVP1 of the SVP subgroup. The AGL17 genes were relatively longer than the genes of other subgroups. Additionally, GhBS4 had 11 introns and its first exon was notably shorter, while in GhSOC1.5, the second of seven introns was longer than the others. The MIKC<sup>∗</sup> genes had much shorter gene lengths and more introns than the MIKC<sup>C</sup> genes. GhMIKC<sup>∗</sup>12 had the fourth longest intron, which distinguished it from other members of the MIKC<sup>∗</sup> family.

We used MEME to analyze MIKC proteins, and 13 conserved motifs were identified (**Figure 3B**). Most of the closely related MIKC proteins in the same subfamily had similar motif distributions (**Figure 3A**). The most striking divergence among the subgroups was in the composition of the C-terminal domains. Motif 1 contained the MADS domain in all of the MIKC families, except GhSEP8. The highly conserved sequence logos are shown in Figure S2. The differences between the I regions and K-box domains of the MIKC<sup>C</sup> and MIKC<sup>∗</sup> proteins were distinct (**Figure 3B**). The K-box domain contained three motifs, 2, 4, and 9, in GhMIKC<sup>C</sup>. However, motifs 4, 5, 8, and 9 were present in the GhMIKC<sup>∗</sup> K-box domain, depending on the lengths. The I region in the MIKC<sup>C</sup> subfamily contained motifs 3 and 6, while members of the MIKC<sup>∗</sup> subfamily contained motifs 6 and 11, which resulted in a longer I region.

#### Chromosomal Distribution of *GhMIKC* Genes

MIKC genes were physically located on 25 of the 26 G. hirsutum chromosomes: all 13 A chromosomes and 12 of the 13 D chromosomes (**Figure 4**). Among the 110 MIKC genes, two genes, GhAP3.10 and GhAGL17.12, could not be assigned to the G. hirsutum chromosomes, but were located on unmapped scaffolds (7,246 and 4,768, respectively). The greatest number of genes was located on Dt-chr12 (eight genes), followed by Dt-chr2, At-chr12, At-chr13, Dt-chr11, and Dt-chr13 (seven genes on each). In contrast, two genes were located on each of chromosomes At-chr1, At-chr8, At-chr10, Dt-chr9, and Dt-chr10. Only one gene was mapped on each of At-chr9 and Dt-chr8, and no genes were located on Dt-chr1.
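The per-chromosome counts given above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the text, together with the fact that G. hirsutum has 26 chromosomes (13 At and 13 Dt); the variable and function names are hypothetical, and counts for chromosomes not named in the text are not inferred.

```python
# Gene counts per chromosome exactly as stated in the text.
REPORTED_COUNTS = {
    "Dt-chr12": 8,
    "Dt-chr2": 7, "At-chr12": 7, "At-chr13": 7, "Dt-chr11": 7, "Dt-chr13": 7,
    "At-chr1": 2, "At-chr8": 2, "At-chr10": 2, "Dt-chr9": 2, "Dt-chr10": 2,
    "At-chr9": 1, "Dt-chr8": 1,
    "Dt-chr1": 0,
}

TOTAL_GENES = 110        # MIKC genes identified
ON_SCAFFOLDS = 2         # GhAP3.10 and GhAGL17.12
TOTAL_CHROMOSOMES = 26   # 13 At + 13 Dt


def summarize_distribution():
    """Return the mapped total and what remains for chromosomes not listed."""
    mapped = TOTAL_GENES - ON_SCAFFOLDS
    explicitly_reported = sum(REPORTED_COUNTS.values())
    return {
        "mapped_genes": mapped,
        "explicitly_reported": explicitly_reported,
        "genes_on_unlisted_chromosomes": mapped - explicitly_reported,
        "unlisted_chromosomes": TOTAL_CHROMOSOMES - len(REPORTED_COUNTS),
        "chromosomes_with_genes": TOTAL_CHROMOSOMES - 1,  # all except Dt-chr1
    }
```

The summary shows how many of the 108 mapped genes fall on chromosomes whose counts are not given individually in the text.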

#### Expression Pattern Analyses of MIKC Genes

To explore the expression patterns of the MIKC family genes in G. hirsutum-specific developmental processes, the 110 genes' expression profiles were detected in seven different tissues (root, stem, leaf, flower, ovule, seed, and fiber) by transcriptome sequencing (**Figure 5**). A heat map showed that different genes shared similar expression patterns within subfamilies. For example, the SEP, AG, AP1, AP3/PI, TM8, and AGL6 subgroups were preferentially expressed in flowers. Similarly, the SVP subfamily was expressed especially in flowers. Additionally, some SEP members (GhSEP1, GhSEP3, GhSEP4, GhSEP7, and GhSEP8), and the AG and BS subgroups, were highly expressed in reproductive organs (ovules and fibers). Simultaneously, SEP, AP1 and five of the AGL6 genes (GhAGL6.1, GhAGL6.2, GhAGL6.5, GhAGL6.6, and GhAGL6.7) were also detected in roots. Interestingly, the SOC family displayed diverse expression profiles. GhSOC1.3 and GhSOC1.7 had high expression levels in roots. In addition, GhSOC1.8 was mainly expressed in flowers, while GhSOC1.1, GhSOC1.4 and GhSOC1.9 were highly expressed in leaves. GhSOC1.6 and GhSOC1.10 were exclusively and highly expressed in stems. The GhAGL15 (GhAGL15.2, GhAGL15.4, and GhAGL15.5) and GhAGL17 (GhAGL17.7, GhAGL17.9, and GhAGL17.11) subfamilies were relatively highly expressed in roots, and GhAGL17.2, GhAGL17.8, and GhAGL17.12 were expressed in flowers. Four of the GhMIKC<sup>∗</sup> genes (GhMIKC<sup>∗</sup>6, GhMIKC<sup>∗</sup>7, GhMIKC<sup>∗</sup>17, and GhMIKC<sup>∗</sup>18) had high expression levels in flowers, and GhMIKC<sup>∗</sup>7 and GhMIKC<sup>∗</sup>17 were also highly expressed in seeds.

FIGURE 2 | (A). Phylogenetic relationships and (B). Gene structure of MIKC genes in Gossypium hirsutum L. The neighbor-joining tree was constructed with MEGA v6.06. The 13 subfamilies are marked with different colored lines. Exons and introns are represented by green and black lines.

ABCDE model genes regulate the formation of five floral organs in Arabidopsis (Sánchez-Fernández et al., 2001; Dietrich et al., 2009; Kuromori, 2010). To validate the participation of MIKC genes in regulating flowering, we selected 16 orthologs of ABCDE model genes and tested their expression in five floral organs (sepal, petal, stamen, carpel, and ovule) by qRT-PCR in G. hirsutum (**Figure 6**). GhAP1.4 and GhAP1.11 (A class) showed high expression levels in sepals, petals, and carpels. In contrast, GhAP1.8 was preferentially expressed in sepals. GhAP3.5, GhAP3.6, and GhAP3.8 of the AP3 subfamily, belonging to the B class, were expressed in petals and stamens. GhAG4 of the C class displayed the highest expression level in stamens. GhAG7 and GhAG8 of the D class had higher expression levels in carpels and ovules. GhSEP1, GhSEP4, and GhSEP6 (E class) were expressed in four different floral organs. GhBS2 and GhBS3 (B sister class) were mainly expressed in carpels and ovules. SOC1 accelerates flowering and is thus involved in the promotion of floral organ formation. Accordingly, high expression levels of GhSOC1.2 and GhSOC1.8 were detected in sepals, stamens, and carpels. These results were consistent with the ABCDE model.

#### Overexpression of the *GhAGL17.9* Gene in Arabidopsis

AGL17 is the largest subgroup (**Figure 1B**). To further investigate the role of the GhAGL17 subfamily in plant growth and development, we transformed GhAGL17.9, driven by the CAULIFLOWER MOSAIC VIRUS (CaMV) 35S promoter, into Arabidopsis (Columbia-0). We identified 12 T<sub>3</sub> generation transgenic lines that showed an early flowering phenotype. qRT-PCR results confirmed that GhAGL17.9 was overexpressed in transgenic lines L1 and L3 (**Figure 7**). Meanwhile, the numbers of rosette leaves were significantly decreased compared with WT (**Table 2**). To explore the molecular mechanisms that affect flowering time in the transgenic lines, qRT-PCR was used to detect the expression of flowering-related genes. LFY is a floral integrator that promotes flowering, and AGL17 can positively regulate the expression of LFY (Han et al., 2008). CO is a photoperiod pathway regulator, and AGL17 acts downstream of CO (Han et al., 2008). As shown in **Figure 7**, the expression levels of LFY in lines 35S-L1 and 35S-L3 were three times higher than in the wild type. CO expression was not significantly increased. SOC1 is a flowering promoter that integrates different signals of the flowering pathways (Lee and Lee, 2010; Ding et al., 2013). The up-regulation of SOC1 activates downstream targets, including LFY, and promotes flowering in Arabidopsis (Schönrock et al., 2006; Lee et al., 2008). Approximately four-fold increases in SOC1 expression levels were observed in the two transgenic lines.

# DISCUSSION

Cotton, as an oil crop, plays an important role in agriculture and industry all around the world (Houhoula et al., 2003; Waheed et al., 2010; Mujeli et al., 2016). Floral organ development affects the yield and quality of cotton seed. The MIKC family members are plant-specific transcription factors containing MADS and K-box domains, and play crucial roles in plant seed development and floral identity (Nesi et al., 2002; De Folter et al., 2006; Mondragon-Palomino and Theissen, 2011). MIKC homologs have been analyzed in many plants, including Arabidopsis, O. sativa, P. tremula, Z. mays, S. bicolor, B. rapa, and R. sativus (Parenicová et al., 2003; Leseberg et al., 2006; Arora et al., 2007; Zhao et al., 2010; Duan et al., 2015; Li et al., 2016). However, the characterization and functional analysis of the MIKC family have not been performed in G. hirsutum, an allotetraploid species. In this study, we performed a comprehensive analysis of GhMIKCs, which included investigating chromosomal locations, phylogenetic relationships, gene structures, conserved motifs, and expression profiles in different tissues.

#### Overall Summary of the MIKC Family in *Gossypium hirsutum* L.

In total, 110 MIKC genes were identified based on G. hirsutum genome sequences. Based on phylogenetic relationships with Arabidopsis and O. sativa orthologs (Figure S1), the G. hirsutum type II MADS family (MIKC<sup>C</sup>) was divided into 13 subfamilies (**Figure 1C**). Interestingly, an FLC subfamily was not identified in the G. hirsutum genome. Similar results were found in the O. sativa, C. sativus, Z. mays, and S. bicolor genomes (Arora et al., 2007; Zhao et al., 2010; Hu and Liu, 2012). The FLC genes are involved in controlling flowering time through the vernalization and autonomous pathways (Helliwell et al., 2006, 2011; Greb et al., 2007). Vernalization is not required for flowering in O. sativa, C. sativus, Z. mays, and S. bicolor (Arora et al., 2007; Zhao et al., 2010; Hu and Liu, 2012). Thus, vernalization might not be essential for cotton flowering either. In addition, we found that in most subgroups, the numbers of proteins in G. hirsutum were not double those in the diploids Arabidopsis and O. sativa. This implied that gene duplication could give rise to the amplification of MIKC subfamily genes in a variety of forms (Flagel et al., 2008; Hargreaves et al., 2014). As previously reported, multiple duplications and diversifications in the different clades of different species cause different evolutionary constraints (Lynch and Conery, 2000; Flagel and Wendel, 2009; Airoldi and Davies, 2012).

Chromosomal assignments indicated that the gene locations were equally divided among four pairs of chromosomes (At-chr6 and Dt-chr6, At-chr7 and Dt-chr7, At-chr10 and Dt-chr10, and At-chr13 and Dt-chr13) in the A and D genomes (**Figure 4**). However, five D-genome chromosomes (Dt-chr2, Dt-chr4, Dt-chr9, Dt-chr11, and Dt-chr12) contained more genes than the corresponding A-genome chromosomes (At-chr2, At-chr4, At-chr9, At-chr11, and At-chr12). Additionally, large numbers of MIKC genes were located on the last three chromosomes (chr11, chr12, and chr13) of both genomes. This could indicate that the current distribution was derived from differential rates of genomic evolution and inter-genomic transfer of hereditary information (Paterson et al., 2000; Wendel and Cronn, 2003).

FIGURE 6 | Expression profiles of 16 *Gossypium hirsutum* L. MIKC genes in five different tissues (sepal, petal, stamen, carpel, and ovule) as determined by qRT-PCR. The relative expression levels are shown against the reference gene His3. Error bars represent the standard deviations of three independent experiments.

FIGURE 7 | (A). Morphology of wild type (WT) and transgenic seedlings after 22 days of growth. Bar = 2 cm. (B). A qRT-PCR analysis of GhAGL17.9 overexpression in WT and transgenic Arabidopsis. Significant differences compared with WT (t-test): \*\*, P < 0.01. (C). Expression levels of SOC1, CO, and LFY as determined by qRT-PCR in WT and GhAGL17.9-overexpression plants. Actin2 was used as the internal control. Error bars represent the standard deviations of three independent experiments. Significant differences compared with WT (t-test): \*, P < 0.05; \*\*, P < 0.01.

#### TABLE 2 | Flowering time and rosette leaf number in WT and p35S::*GhAGL17.9* plants.


\*Represents a significant difference from wild type (t-test, p < 0.05);

\*\*Represents a significant difference from wild type (t-test, p < 0.01);

Data are presented as the mean ± SD;

Plants were grown under long-day conditions (16 h of light/8 h of dark).

#### Expression Profiles of MIKC Genes in *Gossypium hirsutum* L.

Global expression pattern analyses in seven different tissues showed that almost all members of the AP1, AP3, AG, SEP, and BS subfamilies were expressed during flower development (**Figure 5**). Floral organ identities and the flower meristem are regulated by five classes of functional genes (A-B-C-D-E) during flower development, from sepals to ovules (Díaz-Riquelme et al., 2009; Na et al., 2014). A qRT-PCR analysis showed the expression patterns of the orthologous genes of the ABCDE model in flower organogenesis (**Figure 6**), which were consistent with previous findings in Arabidopsis (Ó'Maoiléidigh et al., 2014; Xie et al., 2015). Further, the AP1 subgroup of A class genes was not only expressed in sepals and petals, but was also expressed in carpels. Before and after pollination, AP1-like genes may aid carpel development in Orchidaceae, which triggers ovary development (Mondragon-Palomino and Theissen, 2011; Acri-Nunes-Miranda and Mondragón-Palomino, 2014). Thus, AP1 subgroup genes may have similar expression patterns in Orchidaceae and allotetraploid cotton. A few GhMIKC<sup>∗</sup> genes were highly expressed in flowers and seeds, which was in accordance with previous results in Arabidopsis (Verelst et al., 2007) and O. sativa (Liu et al., 2013). These results indicated that the expression profiles of MIKC<sup>∗</sup> genes reflect functional redundancy and conservation during G. hirsutum evolution.

#### Role of the *GhAGL17* Gene in Flowering

In Arabidopsis, AGL17 acts as a novel flowering promoter that is involved in the photoperiod pathway. Under long-day conditions, the overexpression of AtAGL17 causes early flowering (Han et al., 2008). Because AGL17 is the largest subgroup of the GhMIKC<sup>C</sup> family, one of its members, GhAGL17.9, was overexpressed in Arabidopsis to explore its biological functions. The transgenic lines displayed earlier flowering than the wild type (**Figure 7**). The expression levels of the related positive marker genes, especially LFY and SOC1, which are involved in regulating the flowering process, were higher in p35S::GhAGL17.9 lines than in the wild type. LFY overexpression can cause precocious plant development and accelerate flowering (Nilsson et al., 1998; Dornelas and Amaral, 2004). AGL17 targets LFY to promote flowering (Han et al., 2008). SOC1 encodes a MIKC protein, a floral pathway integrator, which is regulated by a variety of flower signaling pathways (Lee et al., 2000; Wang et al., 2009; Ding et al., 2013). However, the relationship between AGL17 and SOC1 in flowering is not clear and remains to be explored functionally in the future.

# CONCLUSIONS

In this study, 110 MIKC genes were identified for the first time in the G. hirsutum genome. The family was divided into 13 subgroups based on a phylogenetic tree, exon/intron structures, and the distributions of conserved motifs. Chromosomal locations of MIKC gene family members were also determined. Finally, the expression patterns of GhMIKCs were explored using transcriptome sequencing and qRT-PCR, which revealed the expression levels at different developmental stages. Most MIKC<sup>C</sup> genes were highly expressed in the floral organs, which was consistent with the ABCDE model. The overexpression of GhAGL17.9 in Arabidopsis resulted in early flowering through the upregulated expression of SOC1 and LFY, which suggested that GhMIKCs play vital roles in cotton flowering. Our work provides functional insights into the roles of GhMIKC genes in cotton flowering.

# AUTHOR CONTRIBUTIONS

ZuY and FL conceived and designed the experiments. ZR and DY performed the experiments. ZhY conducted the phylogeny analysis. CL and LL prepared the materials. HZ and QC analyzed the data. ZR and ZuY wrote the paper. GQ, YL, JL, ZL, and LW helped to revise the paper. All authors read and approved the final manuscript.

# ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (No. 31501345), Zhengzhou Science and Technology Program (153PXXCY180) and Young Elite Scientist Sponsorship Program by CAST. We thank Peng Huo (Zhengzhou Research Center, Institute of Cotton Research of CAAS, Zhengzhou) for technical assistance.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2017.00384/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Ren, Yu, Yang, Li, Qanmber, Li, Li, Liu, Lu, Wang, Zhang, Chen, Li and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Modeling Allometric Relationships in Leaves of Young Rapeseed (Brassica napus L.) Grown at Different Temperature Treatments

Tian Tian<sup>1</sup>, Lingtong Wu<sup>1</sup>, Michael Henke<sup>2</sup>, Basharat Ali<sup>1,3</sup>, Weijun Zhou<sup>1</sup>\* and Gerhard Buck-Sorlin<sup>4</sup>\*

<sup>1</sup> Institute of Crop Science and Zhejiang Key Laboratory of Crop Germplasm, Zhejiang University, Hangzhou, China, <sup>2</sup> Department of Ecoinformatics, Biometrics and Forest Growth, Georg-August University of Göttingen, Göttingen, Germany, <sup>3</sup> Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany, <sup>4</sup> IRHS, INRA, AGROCAMPUS OUEST, University of Angers, Beaucouzé, France

#### Edited by:

Dragana Miladinović, Institute of Field and Vegetable Crops, Serbia

#### Reviewed by:

Najeeb Ullah, University of Sydney, Australia Ivana Maksimovic, University of Novi Sad, Serbia

#### \*Correspondence:

Weijun Zhou wjzhou@zju.edu.cn Gerhard Buck-Sorlin gerhard.buck-sorlin@agrocampusouest.fr

#### Specialty section:

This article was submitted to Crop Science and Horticulture, a section of the journal Frontiers in Plant Science

Received: 10 December 2016 Accepted: 20 February 2017 Published: 21 March 2017

#### Citation:

Tian T, Wu L, Henke M, Ali B, Zhou W and Buck-Sorlin G (2017) Modeling Allometric Relationships in Leaves of Young Rapeseed (Brassica napus L.) Grown at Different Temperature Treatments. Front. Plant Sci. 8:313. doi: 10.3389/fpls.2017.00313

Functional–structural plant modeling (FSPM) is a fast and dynamic method to predict plant growth under varying environmental conditions. Temperature is a primary factor affecting the rate of plant development. In the present study, we used three different temperature treatments (10/14 °C, 18/22 °C, and 26/30 °C) to test the effect of temperature on growth and development of rapeseed (Brassica napus L.) seedlings. Plants were sampled at regular intervals (every 3 days) to obtain growth data during the length of the experiment (1 month in total). Total leaf dry mass, leaf area, leaf mass per area (LMA), width-length ratio, and the ratio of petiole length to leaf blade length (PBR), were determined and statistically analyzed, and contributed to a morphometric database. LMA under high temperature was significantly smaller than LMA under medium and low temperature, while leaves at high temperature were significantly broader. An FSPM of rapeseed seedlings featuring a growth function used for leaf extension and biomass accumulation was implemented by combining measurement with literature data. The model delivered new insights into growth and development dynamics of winter oilseed rape seedlings. The present version of the model mainly focuses on the growth of plant leaves. However, future extensions of the model could be used in practice to better predict plant growth in spring and potential cold damage of the crop.

Keywords: allometry, functional–structural plant model (FSPM), growth function, source–sink relations, temperature, winter oilseed rape, GroIMP

#### INTRODUCTION

Winter oilseed rape (Brassica napus L.) is an important oilseed and fodder crop worldwide (Zhang and Wu, 2007), not only because of its high oil yield (between 40 and 50% of seed dry mass), but also because the oil has a high nutritional value, owing to the presence of various aliphatic acids and vitamins (Yang and Tu, 2003). This makes rapeseed oil an important and healthy ingredient in human nutrition, with a long history as an edible oil in Asia and Europe. In addition, if the erucic acid content is higher than 50%, the seeds can be used as a raw ingredient for the production of lubricating oil and suspending agents (Yang and Tu, 2003). At latitudes of 30° in Eastern China, rapeseed is mostly sown in September or October. After undergoing vernalization throughout winter, the crop blooms in April and is harvested in May of the following year.

A number of climatic and nutritional factors are known to influence both rates and daily integrals of photosynthesis and respiration. Temperature is known to be of utmost importance for photosynthesis and respiration rate (Marcelis, 1996). High temperature leads to a decrease in carboxylation rate as it decreases the specificity of Rubisco for CO<sub>2</sub> (Lambers et al., 2008), whereas low temperature affects sucrose metabolism, which leads to an accumulation of phosphorylated intermediates and ultimately reduces inorganic phosphates. As a result, ATP and photosynthetic accumulation are inhibited (Chawade, 2011). Besides those effects on photosynthesis, temperature induces some modifications in the morphology of rapeseed. The stems become thinner and the leaves turn thicker under high temperature (Qaderi et al., 2006, 2010). These effects on rapeseed growth (such as a decrease in total leaf area) will lead to differences in photosynthesis. Fresh and dry masses have been considered as being positively correlated with temperature during the seedling stage (Rapacz, 1998). Temperature is thus an important factor for photosynthesis rate and consequently for the growth of rapeseed plants. To avoid a potentially considerable reduction in production, it is therefore important to predict accurately the phenological stage and the topological and geometrical structure of the plant during this period, so as to better assess recovery and survival rates of the crop, which are pivotal for ultimate yield.

To accelerate breeding and optimize the production of rapeseed, the use of crop models as a tool in rapeseed research has been proposed by a number of researchers (Diepenbrock, 2000; Fourcaud et al., 2008; Xu et al., 2011; Robertson and Lilley, 2016). Models for oilseed rape production have been diversified in objectives and methodology. In some classic crop models, the plant canopy was divided into different layers, where the main processes of biomass production were computed for each layer separately and then integrated for the whole crop (Tang et al., 2007, 2009; da Luz et al., 2012). Some models considered the 3D structure and selected functions of the whole plant (Groer et al., 2007). Other models mainly focused on the accurate description of the rates of photosynthesis, respiration, and biomass production at the leaf and whole-plant scale, with local climate parameters as input (Paul and Driscoll, 1997; Müller and Diepenbrock, 2006; Deligios et al., 2013).

In the present study, we investigated the growth and development of young winter oilseed rape plants under different temperature treatments, with special consideration of the morphology of leaves of different ranks. On the basis of the data established in this study, a functional–structural plant model (FSPM) for young rapeseed plants was devised, representing leaf extension kinetics; data about leaf photosynthesis, respiration, biomass production and allocation at the organ and whole-plant scale were obtained from the literature. FSPM is a modeling paradigm that allows not only an instructive visualization of growth and development, but also the integration of heterogeneous datasets in a common scheme. The model we created therefore already helped us to better interpret the correlated datasets of our experiments, beyond the conclusions drawn from the statistical analyses. However, in this model, we were only able to explore leaf development of young plants. In a further extension of the model, we will eventually use it as a visual decision-support tool for rapeseed breeding and production.

The present model is based on a previous rapeseed model by Groer et al. (2007) and the general sink-source-based FSPM-prototype (FSPM-P; prototype model) by Henke et al. (2016). The FSPM-P is a general model based on the 3D structure of the plant including the physiological characteristics (photosynthesis, maintenance respiration, growth respiration, and plant organ biomass accumulation). As the FSPM-P is not designed for a specific crop, it can be adapted in principle to any crop plant with C3 photosynthesis. Its main advantage is that it is easy to parameterize, use, and extend (Henke et al., 2016). In the present study, we discuss how this model could be used to predict the kinetics of storage of carbon reserves (starch) during autumn and winter that might be the key to successful reestablishment of the crop in spring.

#### MATERIALS AND METHODS

#### Setup of the Temperature Treatment

Two experiments were set up, differing in planting date. Both experiments focused on young rapeseed plants. Seeds of the leading commercial cultivar of winter oilseed rape (Brassica napus L. cv. ZS 758) were germinated in the dark on moist filter paper at 25°C for 2 days. The seedlings were then transferred to plastic pots (height 20 cm, diameter 15 cm) filled with a fertile soil mixture [available N 200 mg kg<sup>−1</sup>, P<sub>2</sub>O<sub>5</sub> 350 mg kg<sup>−1</sup>, KCl 500 mg kg<sup>−1</sup>, Ca(H<sub>2</sub>PO<sub>4</sub>)<sub>2</sub>·H<sub>2</sub>O 3000 mg kg<sup>−1</sup>, K<sub>2</sub>SO<sub>4</sub> 200 mg kg<sup>−1</sup>, (NH<sub>4</sub>)<sub>2</sub>SO<sub>4</sub> 500 mg kg<sup>−1</sup>], and cultivated in the greenhouse at a temperature of 10–15°C for 15 days until they had one visible leaf. The potted seedlings were then transferred to a climate chamber (PGX450D, Saifu Instruments, Ningbo, China) and cultivated at three different temperature regimes, combined with a 14-h photoperiod (36 plants per temperature treatment). These temperature regimes correspond to the typical temperatures prevalent from September to November at latitudes of 30° in Eastern China<sup>1</sup>. The temperature treatments were: 10/14°C (low temperature, "L"), 18/22°C (medium temperature, "M"), and 26/30°C (high temperature, "H"; night/day temperatures, respectively). Relative humidity was maintained at around 80%. Incident photosynthetically active radiation (PAR) in the temperature chambers was measured above the plant canopy using a Li6400R portable photosynthesis system (LI-COR, Lincoln, NE, USA). It was on average 150 µmol m<sup>−2</sup> s<sup>−1</sup>, coming from 39 wall-mounted fluorescent light tubes (fluorescent phosphor, type SP41), with a total output of 330 µmol m<sup>−2</sup> s<sup>−1</sup>.

<sup>1</sup>http://www.tutiempo.net/en/Climate/Hangzhou/584570.htm

#### Plant Measurements and Analyses

Three plants were chosen randomly every 3 days after putting them into the climate chamber. These plants were photographed, and then sampled destructively for determination of dry weights of each organ. Leaf length, leaf width, petiole length, and leaf area were measured from photographs using ImageJ® 1.41o software (Abramoff et al., 2004). Leaf dry weight was measured after 5 days storage of samples in a drying cabinet at 80°C. Destructive sampling commenced after the plants were put into the temperature chamber and continued for a month. Total leaf number per plant, as well as leaf blade length, leaf blade width, and petiole length were determined for each phytomer (in the sense of Room et al., 1994).
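The measured quantities listed above are the inputs for the derived ratios analyzed in this study (width-length ratio, petiole-blade ratio, and leaf mass per area). The following is a minimal sketch of that bookkeeping, not the authors' processing script; the class name, field names, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LeafSample:
    """One leaf as measured from a photograph plus its dry weight (hypothetical layout)."""
    blade_length_cm: float
    blade_width_cm: float
    petiole_length_cm: float
    area_cm2: float
    dry_mass_g: float

    @property
    def wlr(self) -> float:
        # Width-length ratio of the blade (dimensionless).
        return self.blade_width_cm / self.blade_length_cm

    @property
    def pbr(self) -> float:
        # Ratio of petiole length to leaf blade length (dimensionless).
        return self.petiole_length_cm / self.blade_length_cm

    @property
    def lma_kg_m2(self) -> float:
        # Leaf mass per area, converted from g and cm^2 to kg m^-2.
        return (self.dry_mass_g / 1000.0) / (self.area_cm2 / 10000.0)


# Illustrative values only, not measured data.
sample = LeafSample(10.0, 8.0, 9.0, 50.0, 0.08)
print(sample.wlr, sample.pbr, sample.lma_kg_m2)
```

Keeping the unit conversion inside the property makes it explicit that LMA is expressed in kg m<sup>−2</sup>, the unit used for the averages reported in the Discussion.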

Statistical analyses of total leaf area and dry mass were carried out using the glm procedure of SAS 9.1.3 (SAS® Institute Inc., Cary, NC, USA). Means of total leaf area and dry mass were compared using the LSD test at P ≤ 0.01 (**Tables 5**, **6**).

#### Leaf Kinetics

Leaf kinetics (i.e., extension of the blade in length over time) was determined from the same samples as in Section "Plant Measurements and Analyses" by measuring the lengths of leaf blades every 3 days during their development. The kinetics of a rapeseed leaf blade can be conveniently described using a sigmoid logistic curve, which describes the current length of a blade as a function of plant age (expressed as accumulated temperature). **Figure 1** describes the leaf length development at different temperature treatments, as a function of accumulated temperature, for the first five leaf ranks. To fit the data, a logistic equation was used with three parameters:

$$y_{r,T} = \frac{y_{m_{r,T}}}{1 + \left[\frac{ts}{ts_{0_{r,T}}}\right]^{b_{r,T}}} \tag{1}$$

where y<sub>m<sub>r,T</sub></sub> [cm] is the maximum leaf length at rank r at temperature treatment T; ts<sub>0<sub>r,T</sub></sub> [°Cd] is the accumulated temperature at which extension rate is maximal; and b<sub>r,T</sub> describes the slope of the curve, for a given rank r and temperature T (**Table 1**).
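As a minimal sketch of how Equation 1 can be evaluated, the function below returns blade length for a given accumulated temperature. The parameter values are hypothetical and are not taken from **Table 1**. The sketch also assumes that b is negative, which this form of the equation requires for the curve to rise toward the maximum length; at ts equal to ts<sub>0</sub> the function returns exactly half of the maximum length.

```python
def leaf_length(ts, y_max, ts0, b):
    """Equation 1: blade length [cm] at accumulated temperature ts [degree-days].

    Assumes b < 0 so that length increases with ts (an assumption of this sketch).
    """
    if ts <= 0:
        # Limit of the curve for b < 0; also avoids dividing by zero.
        return 0.0
    return y_max / (1.0 + (ts / ts0) ** b)


# Hypothetical parameters for illustration only (not the fitted values).
Y_MAX, TS0, B = 12.0, 150.0, -4.0

for ts in (0, 50, 150, 300):
    print(ts, round(leaf_length(ts, Y_MAX, TS0, B), 3))
```

In a model run, one such parameter triple would be looked up per leaf rank and temperature treatment.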

#### FSPM of Rapeseed Development

The dynamic FSPM of young rapeseed was developed by modification and reparameterization of the FSPM-P (Henke et al., 2016), a fully functional prototype model written in the language XL (Kniemeyer, 2008) and running on the Growth Grammar-related Interactive Modeling Platform (GroIMP) (Kniemeyer et al., 2007). Additionally, elements were used from a previous model of rapeseed development by Groer et al. (2007). Essentially, the present model comprised modules for the main biophysical and physiological processes observable in crop plants: light interception, photosynthesis, assimilate partitioning, growth, and respiration.

Light interception was modeled using the twilight light model provided within the GroIMP platform. This model is based on a Monte Carlo raytracer and allows the computation of the spatial and temporal light distribution for suitable objects (such as leaves) in a simulated scene [for details see Buck-Sorlin et al. (2011)]. The real climate chamber contained 39 wall-mounted light tubes (fluorescent phosphor, type SP41). As modeling each tube individually would have been too expensive in computational terms, the setup was reconstructed in GroIMP using area lights associated with parallelogram objects representing the two lateral walls and the back wall, with a total output of 330 µmol m<sup>−2</sup> s<sup>−1</sup>. Two virtual light sensors were placed inside the chamber, one above each of the two shelves. Simulations reproduced the measured photosynthetic photon flux density (PPFD) of 150 µmol m<sup>−2</sup> s<sup>−1</sup>. Virtual lamps, sensors, and the construction elements were specified as geometric objects with optical properties, as described in Buck-Sorlin et al. (2009).

To model leaf photosynthetic rate, the model LEAFC3 was used (Nikolov et al., 1995). LEAFC3 is a model of the short-term steady-state fluxes of CO<sub>2</sub>, water vapor, and heat from leaves of C3 plant species, explicitly coupling all major processes involved in photosynthesis (biochemistry of the assimilation process, stomatal conductance, and leaf energy balance). The model was parameterized using rapeseed-specific parameters derived from the literature (Liu et al., 2003; Ma et al., 2009).

#### Dry Matter Dynamics at Plant Level

To model the temporal dynamics of the integral of dry matter accumulation as a function of accumulated temperature at the plant level, a modified logistic function was used (Li and Ma, 2012) as proposed by Wang (1986). Furthermore, in order to improve the comparison of the dynamics at different temperatures, the input data were normalized, yielding the following equation:

$$y' = \frac{y_m}{1 + e^{a\,ts'^2 + b\,ts' + c}} \tag{2}$$

where y′ is the normalized dry mass (y/y<sub>max</sub>), ts′ is the normalized accumulated temperature (ts/ts<sub>max</sub>), and y<sub>m</sub>, a, b, and c are shape parameters (**Table 2**). Inspection of the curves of the normalized data showed that for some leaf ranks and temperature treatments a double logistic function with a transition point at an accumulated temperature ts<sub>tr</sub> would be more suitable:

$$y' = \begin{cases} \dfrac{y_{m1}}{1 + e^{a_1 ts'^2 + b_1 ts' + c_1}}, & ts' < ts'_{tr} \\[2ex] \dfrac{y_{mh}}{1 + e^{a_h ts'^2 + b_h ts' + c_h}}, & ts' \ge ts'_{tr} \end{cases} \tag{3}$$

where a<sub>1</sub>, b<sub>1</sub>, c<sub>1</sub>, y<sub>m1</sub>, a<sub>h</sub>, b<sub>h</sub>, c<sub>h</sub>, and y<sub>mh</sub> are the shape parameters (**Table 3**) for the two logistic curves below and above the threshold accumulated temperature ts′<sub>tr</sub>, respectively.

In this way, the growth function for each leaf rank under different temperature treatments was obtained. With the exception of the third leaf at the high temperature treatment, which required the double growth function (Equation 3) the expansion kinetics of all leaves could be expressed using Equation 2.
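The two growth functions (Equations 2 and 3) can be sketched as follows. This is an illustrative reading of the equations, not the model code; the parameter tuples and the transition point are hypothetical and are not the fitted values of **Tables 2** and **3**.

```python
import math


def logistic_quadratic(ts_n, y_m, a, b, c):
    """Equation 2: normalized dry mass at normalized accumulated temperature ts_n."""
    return y_m / (1.0 + math.exp(a * ts_n ** 2 + b * ts_n + c))


def double_logistic(ts_n, ts_tr, low, high):
    """Equation 3: use the 'low' parameter set below ts_tr, the 'high' set from ts_tr on.

    low and high are (y_m, a, b, c) tuples.
    """
    params = low if ts_n < ts_tr else high
    return logistic_quadratic(ts_n, *params)


# Hypothetical parameter sets for illustration only.
LOW = (0.4, 0.0, -20.0, 4.0)
HIGH = (1.0, 0.0, -12.0, 7.0)
TS_TR = 0.5

for ts_n in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(ts_n, round(double_logistic(ts_n, TS_TR, LOW, HIGH), 4))
```

Because the input is normalized to the range 0 to 1, the exponent stays small and no overflow guard is needed in this sketch. Nothing in Equation 3 forces the two branches to meet at the transition point, so continuity there depends on the fitted parameters.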

As the plants were grown under conditions of potential production (ample water and nutrients, free of weeds, pests and competition), it was assumed that the measured growth corresponded to the potential growth at the given temperature.

#### Leaf Area Dynamics at the Whole-Plant Level

The dynamics of leaf area at the level of the plant as a function of the accumulated temperature was modeled using a logistic curve, with an exponential, linear and maturation phase of growth, an equation with four parameters (i.e., twelve parameters in total, as the data were fitted for each temperature treatment):

$$LA_t = LA_{0,t} + \frac{LA_{\max,t}}{1 + (ts_t - TS_{m,t})^{-b}} \tag{4}$$

where LA<sub>0,t</sub> is the initial leaf area; LA<sub>max,t</sub> is the maximum leaf area at a given temperature treatment; ts<sub>t</sub> is the accumulated temperature at a given temperature treatment; TS<sub>m,t</sub> is the accumulated temperature at which extension rate is maximum at a given temperature treatment; and b describes the slope of the curve (**Table 4**).

TABLE 1 | Parameters of the logistic curve for three temperature treatments and different leaf ranks in Brassica napus seedlings.

H, M, and L: high, medium, and low temperature treatments.

TABLE 2 | Parameters of the two growth functions fitted to the normalized time series data of total dry matter in B. napus seedlings grown at three temperature treatments.

H, M, and L: high, medium, and low temperature treatments.

TABLE 3 | Parameters of the growth function fitted to the normalized time series data of dry matter in B. napus with different leaf ranks at three temperature treatments.

H, M, and L: high, medium, and low temperature treatments.

#### Framework of the Rapeseed Model

Some basic parameters used in the rapeseed model are listed in **Tables 1**–**4**. Leaf appearance rates were derived from the data (**Figure 1**); they describe the kinetics of leaf appearance, expressed as the number of visible leaves as a function of time (days after transfer into the climate chambers). The daily accumulated temperature increment, as derived from the slope, minus the base temperature for growth of rapeseed (5°C) (Tang et al., 2007, 2009), was 23°Cd d<sup>−1</sup>, 15°Cd d<sup>−1</sup>, and 7°Cd d<sup>−1</sup> at temperature regimes of 26/30°C, 18/22°C, and 10/14°C, respectively. Some derived morphological parameters were calculated, such as the width/length ratio (WLR), the petiole-blade ratio (PBR) and leaf mass per area (LMA) (**Figures 2–4**). Integration of these elements into the FSPM-P yielded a first model version in which the rough shape of the leaves could be represented (**Figures 5**, **6**). Finally, total dry mass, total leaf area of the whole plant, and leaf mass per area (LMA) of each leaf from our data were used to parameterize the growth curves, which in turn were used to simulate leaf extension (**Figures 7–9**) and dry matter accumulation kinetics (**Figure 10**).
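The daily accumulated temperature increments quoted above can be approximated from the chamber settings alone. The sketch below assumes a photoperiod-weighted daily mean temperature (14 h at the day temperature, 10 h at the night temperature) minus the 5°C base temperature; this weighting is an assumption of the sketch, whereas the study derived the increments from the data. Under that assumption the results are about 23.3, 15.3, and 7.3°Cd d<sup>−1</sup>, which round to the reported 23, 15, and 7.

```python
BASE_TEMPERATURE = 5.0  # degrees C, base temperature for growth given in the text
PHOTOPERIOD_H = 14.0    # hours of light per day, from Materials and Methods

# (night, day) temperatures in degrees C for the three treatments.
REGIMES = {"H": (26.0, 30.0), "M": (18.0, 22.0), "L": (10.0, 14.0)}


def daily_thermal_time(night, day, photoperiod_h=PHOTOPERIOD_H, base=BASE_TEMPERATURE):
    """Degree-days accumulated per day, using a photoperiod-weighted mean (assumption)."""
    mean = (day * photoperiod_h + night * (24.0 - photoperiod_h)) / 24.0
    # No thermal time accumulates below the base temperature.
    return max(0.0, mean - base)


for name, (night, day) in REGIMES.items():
    print(name, round(daily_thermal_time(night, day), 2))
```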

#### RESULTS

The fitted model parameters for the first five leaf blade ranks and temperature treatments are given in **Table 1**. The first two true leaves (i.e., excluding the cotyledons) of oilseed rape were usually small and preformed in the seed. As the first leaf was already present or fully extended at the start of the experiment, the timing (ts<sub>0</sub>) and slope (b) parameters of the logistic equation could not be fitted well. A true sigmoid curve could only be fitted for leaf ranks of three and above (**Figure 1**). Furthermore, at low temperature leaves were smaller and grew more slowly, and the leaf appearance rate was also slower, leading to fewer observed leaves within the observation period (**Figure 1**).

#### Morphological Parameters

The relationship between width and length of the leaves at different temperatures was expressed as WLR. The WLR decreased with leaf rank. Young leaves had a tendency to be narrower than older ones (**Figure 2**). There was a clear relationship between PBR and leaf rank (**Figure 3**), changing in a linear fashion with rank. More specifically, PBR increased with increasing rank at the H treatment, whereas it decreased at the other two treatments.
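The linear relationship between PBR and rank can be written out with the regression coefficients reported in the Figure 3 caption. The sketch below simply evaluates those regressions and, as an illustrative use, converts a blade length into a petiole length; the function names and the example blade length are hypothetical.

```python
# (slope, intercept) of PBR = slope * rank + intercept, from the Figure 3 caption.
PBR_COEFFS = {
    "H": (0.056, 0.9591),
    "M": (-0.0636, 1.0704),
    "L": (-0.1609, 1.1353),
}


def pbr(treatment, rank):
    """Petiole length to blade length ratio for a leaf rank under a treatment."""
    slope, intercept = PBR_COEFFS[treatment]
    return slope * rank + intercept


def petiole_length(treatment, rank, blade_length_cm):
    """Petiole length [cm] implied by the regression for a given blade length."""
    return pbr(treatment, rank) * blade_length_cm


for treatment in ("H", "M", "L"):
    print(treatment, [round(pbr(treatment, r), 3) for r in range(1, 5)])
```

The regressions describe only the ranks that were observed, so values for higher ranks, especially in the L treatment, are extrapolations.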

#### Total Dry Mass and Total Leaf Area of the Whole Plant

At the first sowing date, the differences in leaf area and leaf dry weight between the H and M treatment were not significant (P = 0.01; **Table 5**). Only the low temperature treatment was significantly different from the other two, by producing leaves with smaller area and lower dry mass (P = 0.01). However, for the second sowing date, all three treatments significantly differed from each other with respect to both parameters (P = 0.01; **Table 5**).

TABLE 4 | Parameters of the leaf area function in B. napus seedlings under three temperature treatments.


H, M, and L: high, medium, and low temperature treatments.

With respect to the duration of treatment (**Table 6**), it was observed that at the earlier sowing date a significant difference in leaf area occurred between treatment duration of 6 and 9 days, and 12 and 15 days, respectively. A similar pattern was observed for leaf dry weight, but additionally, dry weights differed already between 3 and 6 days duration (P = 0.01).

#### Leaf Mass Per Area (LMA) of Individual Leaves at Different Treatments

Generally, with the exception of the L treatment, LMA significantly increased with rank (**Figure 4**). The regression between leaf rank r and LMA was, for high temperature: LMA = 0.0133 e<sup>0.0585r</sup> (R<sup>2</sup> = 0.9585, n = 102), and for medium temperature: LMA = 0.0143 e<sup>0.0607r</sup> (R<sup>2</sup> = 0.8195, n = 108). This relationship was not significant for the low temperature treatment.
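A short sketch of how these exponential regressions might be used to convert a blade area into a blade dry mass is given below. It assumes that LMA is expressed in kg m<sup>−2</sup>, the unit used for the treatment averages in the Discussion; the function names are hypothetical. No regression is available for the L treatment, so the sketch raises an error rather than inventing one.

```python
import math

# (coefficient, exponent rate) of LMA = coefficient * exp(rate * rank), from the text.
LMA_COEFFS = {"H": (0.0133, 0.0585), "M": (0.0143, 0.0607)}


def lma(treatment, rank):
    """Leaf mass per area (assumed kg m^-2) for a leaf rank under treatment H or M."""
    if treatment not in LMA_COEFFS:
        # The relationship was not significant at low temperature.
        raise ValueError(f"no LMA regression reported for treatment {treatment!r}")
    coefficient, rate = LMA_COEFFS[treatment]
    return coefficient * math.exp(rate * rank)


def blade_dry_mass_kg(treatment, rank, area_m2):
    """Blade dry mass implied by the regression for a given blade area."""
    return lma(treatment, rank) * area_m2


for rank in range(1, 6):
    print(rank, round(lma("H", rank), 5), round(lma("M", rank), 5))
```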

#### Simulation Output

Part of the observations and data given above were used to set up and parameterize the functional–structural model in order to integrate the different results into a spatially explicit form, for the different temperature treatments. Simulations of some developmental stages of young rapeseed plants are shown in **Figures 5**, **6**. As the measured phyllochron was an input to the model, the accordance between measured and simulated leaf appearance rate naturally was very good (**Figure 7**). Simulated expansion dynamics of leaf blades of different ranks and at different temperatures are shown in **Figure 8**. Simulated blade expansion subsided more rapidly and abruptly than in reality (cf. **Figure 1**). The total simulated leaf area at different temperatures, based on the leaf area dynamics function (Equation 4) is shown in **Figure 9**. At the medium temperature treatment simulated leaf area increased faster than under low or high temperature treatment, respectively. Curves were similar among different temperature treatments. However, growth at the medium temperature treatment seemed to be faster than at the high temperature treatment.

FIGURE 3 | Differences in petiole length – leaf blade length ratio (PBR) of leaves at different ranks under three temperature treatments. H, M, and L represent the plants under high, medium, and low temperature treatment, respectively. The ratio of petiole length and leaf blade length (PBR) differed between temperature treatments for different ranks r. For the high temperature it was: PBR = 0.056r + 0.9591 (R<sup>2</sup> = 0.6925, n = 108); for the medium temperature: PBR = −0.0636r + 1.0704 (R<sup>2</sup> = 0.979, n = 111); for the low temperature: PBR = −0.1609r + 1.1353 (R<sup>2</sup> = 0.9985, n = 63).

FIGURE 4 | Leaf mass per area (LMA) at different leaf ranks. Each point represents the mean of all individual leaves with the same rank. Error bars are standard deviation. Plants under high (A), medium (B), and low (C) temperature treatment. The regression between leaf rank (r) and LMA was, for high temperature: LMA = 0.0133 e<sup>0.0585r</sup> (R<sup>2</sup> = 0.9585, n = 102); for medium temperature: LMA = 0.0143 e<sup>0.0607r</sup> (R<sup>2</sup> = 0.8195, n = 108). As the relationship at low temperature was not significant, no regression equation is provided.

#### DISCUSSION

Temperature is a major factor for plant growth rate and daily photosynthesis (Rapacz et al., 2001, 2003; Qaderi et al., 2006, 2010, 2012; Kromdijk and Long, 2016; Zhang et al., 2016). Furthermore, it also influences oil quality (Baux et al., 2008, 2013). In the present study, the effect of temperature on leaf expansion was investigated in oilseed rape. Width-length ratio (WLR) at different temperatures exhibited significant differences only at high temperature (WLR = 0.849). In the M and L treatments PBR of the upper leaves was small, which means that the petiole was shorter relative to the blade. Overall, the decrease of PBR with rank will result in a conical leaf arrangement (with upper leaf blades being less distant from the main stem), which could be a plant-level strategy to increase light penetration to the lower leaves. More experimental work will be required to determine the exact reasons for the observed patterns. The PBR is one factor, besides WLR, determining leaf shape. Its change along the main stem can be interpreted in terms of developmental progress from the vegetative to the reproductive stage, with lower-ranked leaves being, in principle, more "vegetative". The increase in total leaf area and dry mass was lowest under the low temperature treatment. In a study similar to the present one, leaf area decreased while leaf blade thickness increased when rapeseed plants were exposed to cold temperatures (Stefanowska et al., 1999). Furthermore, there were no significant differences in leaf area between medium and high temperature at the beginning of the juvenile stage. However, when treatment duration was long enough (>6 or >10 days, **Table 6**), leaf area and dry mass at medium temperature increased faster than at high temperature. Thus, it can be concluded tentatively that an accumulated temperature increment of 75°Cd is necessary as a phyllochron for the formation of a leaf on the main stem.
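To make the tentative phyllochron of 75°Cd more concrete, it can be converted into days per leaf using the daily accumulated temperature increments reported in the Framework section (23, 15, and 7°Cd d<sup>−1</sup> for H, M, and L). The sketch below is plain arithmetic on those reported numbers; it ignores leaves already present at the start of the treatment and assumes a constant appearance rate, both simplifications of this example.

```python
PHYLLOCHRON = 75.0  # degree-days per main-stem leaf (tentative value from the text)

# Daily accumulated temperature increments [degree-days per day] reported in the paper.
DAILY_INCREMENT = {"H": 23.0, "M": 15.0, "L": 7.0}


def days_per_leaf(treatment):
    """Days needed to accumulate one phyllochron under a treatment."""
    return PHYLLOCHRON / DAILY_INCREMENT[treatment]


def leaves_formed(treatment, days):
    """Whole leaves initiated after a number of days (leaves present at day 0 excluded)."""
    return int((days * DAILY_INCREMENT[treatment]) // PHYLLOCHRON)


for treatment in ("H", "M", "L"):
    print(treatment, round(days_per_leaf(treatment), 1), leaves_formed(treatment, 30))
```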

Leaf mass per area (LMA) was correlated with environmental and topological factors, such as temperature and leaf rank. LMA was positively correlated with temperature; similar findings have been reported in other studies (Gausman et al., 1971; Qaderi et al., 2006; Jullien et al., 2009). In the present study, the average LMA values under the H, M, and L treatments were 0.01328 kg m<sup>−2</sup>, 0.01766 kg m<sup>−2</sup>, and 0.01639 kg m<sup>−2</sup>, respectively. There was a significant difference between the H treatment and the other two treatments. However, M and L did not differ significantly from each other. A more specific experimental study would be required to better understand this phenomenon. LMA of upper leaves was higher than that of lower leaves, confirming the findings of Gausman et al. (1971). This effect was not significant for the three or four adjacent ranks in each treatment (**Figure 4**). The absence of a significant difference at the L treatment could be explained by the fact that only four leaves per main stem were formed in total.

Plant architecture is of major agronomic importance for the adaptability of a plant for cultivation, and to predict how the plant grows under different environments (Reinhardt and Kuhlemeier, 2002). The impact of plant architecture and morphology on the light climate within the plant canopy and the resulting amount of light locally incident, absorbed by a given assimilating surface, and finally used for photosynthesis plays a major role in plant development as it determines local assimilate availability and thus source strength (Evers et al., 2006). To investigate this complex context and the underlying relationships, we implemented an FSPM as an extension of the FSPM-P model devised by Henke et al. (2016). This model, apart from having a generic description of the sink behavior of young leaves and internodes, also contains provisions to simulate generically photosynthesis on a per-leaf basis, using local light and temperature as an input. Employing a prototype as a departure point permitted us to obtain a usable model relatively rapidly, as the only major steps to be done were parameterization and, eventually, the writing of simple extensions of rules to accommodate crop-specific processes that were not covered by the general model. The FSPM-P is written in the language XL, which is included in the GroIMP platform (Kniemeyer et al., 2007; Kniemeyer, 2008), with specialized tools for plant modeling. GroIMP provides an adequate radiation model that can compute the amount of light absorbed by each plant organ (more generally, objects in a simulated scene; Hemmerling et al., 2008). For more advanced studies of the impact of light quality (e.g., red to far-red ratio) on growth processes, spectral light can be simulated (Henke et al., 2016). The new light model makes use of parallel processing on a graphical processing unit of the computer, thereby accelerating computation enormously (van Antwerpen et al., 2011).

(days after treatment, dat) to the chamber: 2, 13, and 20 dat (top to bottom).

In recent years, models of the main agricultural crops (de Carvalho Lopes and Steidle Neto, 2011) have been developed and their usefulness and importance have been increasingly recognized by the agronomical community (Vos et al., 2010). The most striking advantage of a modeling approach is the time that can be saved when employing a model to predict yield or quality features of a new crop, compared with traditional methods of experimentation and breeding in the field. At the same time, the vivid and realistic visualization of plant development in the case of a 3D model renders teaching much easier and more attractive (Fourcaud et al., 2008). Apart from that, virtual plants, or FSPMs, usually integrate a lot of information about the morphology of the crop in question as well as about the nature and spatiotemporal dynamics of physical and biological processes (Buck-Sorlin, 2013). Thus, models that are capable of integrating the dynamics of architectural development with the rate of photosynthesis and respiration are an excellent means to estimate the production of future biomass of a plant (Godin and Sinoquet, 2005).

TABLE 5 | Total leaf area and dry weight of the whole plant in B. napus seedlings under three temperature treatments.

H, M, and L: high, medium, and low temperature treatments.

<sup>∗</sup> Within each temperature treatment, means followed by the same letter are not significantly different according to the LSD test at P ≤ 0.01. 'a' is the highest and 'c' is the lowest.

In the field, a rapeseed plant stops growth below 5°C, whereas photosynthesis usually continues well below this threshold temperature for growth (Tang et al., 2007, 2009). This leads to products of photosynthesis accumulating in the plant (as starch), where the extent of accumulation will be a function of photosynthetically active leaf area as well as of the accumulated temperature favorable for photosynthesis (below the threshold for growth yet above the threshold for photosynthesis and above that leading to frost damage). Consequently, regrowth of the plant in spring, as temperatures rise again to favorable values for growth, will be the more rapid the more reserves the plant has been able to accumulate during winter. In a future version of the model, we will link the rate of regrowth of rapeseed in spring with the amount of stored starch during the winter (Martre et al., 2011). In doing so, an existing photosynthesis module [based on the LEAFC3 model by Nikolov et al. (1995)] will be reparameterized from the literature (Leng et al., 2002; Liu et al., 2003; Ma et al., 2009) in order to obtain more accurate values for parameters influencing the rates of photosynthesis and respiration of the young rapeseed. It is possible to compute the total dry mass from an estimate of the amount of carbon fixed by photosynthesis and the amount of carbon lost through maintenance and growth respiration.
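The carbon balance mentioned in the last sentence of the paragraph on regrowth and reserves can be illustrated with a very simple daily bookkeeping scheme. This is a generic sketch and not the FSPM-P implementation: the maintenance coefficient and the conversion efficiency are hypothetical placeholder values, and only the CH<sub>2</sub>O to CO<sub>2</sub> molar mass ratio (30/44) is a fixed constant.

```python
CH2O_PER_CO2 = 30.0 / 44.0  # g of carbohydrate per g of CO2 fixed (molar mass ratio)


def daily_dry_mass_gain(gross_co2_g, dry_mass_g, maint_coeff=0.015, conversion_eff=0.7):
    """Dry mass gained in one day [g].

    gross_co2_g    : CO2 fixed by photosynthesis during the day [g]
    dry_mass_g     : current plant dry mass [g]
    maint_coeff    : maintenance respiration per unit dry mass and day (hypothetical)
    conversion_eff : fraction of remaining assimilates that becomes structural
                     dry mass; the rest is lost as growth respiration (hypothetical)
    """
    assimilates = gross_co2_g * CH2O_PER_CO2
    maintenance = maint_coeff * dry_mass_g
    # Simplification: a negative balance is clamped to zero instead of reducing mass.
    available = max(0.0, assimilates - maintenance)
    return conversion_eff * available


# Illustrative run with a constant, made-up daily CO2 uptake.
mass = 0.5
for day in range(1, 6):
    mass += daily_dry_mass_gain(gross_co2_g=0.2, dry_mass_g=mass)
    print(day, round(mass, 4))
```

In a full model the daily CO<sub>2</sub> uptake would come from the per-leaf photosynthesis module rather than from a constant.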

The primary aim of the present study was to build a model of the young stages of rapeseed in the leading commercial cultivar (Brassica napus L. cv. ZS 758). In order to build a robust and representative model in the future, we will need to consider an experimental setup consisting of more plants and several contrasting rapeseed cultivars. In addition, as a further step, the current model will be extended to also represent further development from regrowth in spring to maturity (early summer). If successfully parameterized and calibrated, such a model could be used to predict harvestable yield as a function of architectural development at the seedling and juvenile stages. This could in consequence allow the use of such a model as a tool to optimize yield by proposing leaf architecture ideotypes optimized for a given temperature regime in autumn and winter or, conversely, optimal climate dynamics for a given cultivar.

#### CONCLUSION

The present study yielded new knowledge about the influence of temperature on leaf expansion in the early development of oilseed rape by combination of observed and literature-derived data. This combination is feasible and useful, as a complete measured dataset is almost never available. Furthermore, embedding of the datasets into an FSPM delivered interesting insights into the interactions of organogenetic and morphogenetic processes with temperature and into source–sink interactions in particular.

TABLE 6 | Total leaf area and dry weight of the whole plant in B. napus seedlings at different temperature treatment durations.


<sup>∗</sup> Within each row, means followed by the same letters are not significantly different according to the LSD test at P ≤ 0.01. 'a' is the highest and 'd' is the lowest.

#### AUTHOR CONTRIBUTIONS


GB-S and WZ conceived the study. TT, LW, MH, and BA conducted experiments and analyzed data. MH and GB-S adapted the model. TT, MH, GB-S, and WZ wrote and revised the manuscript. All authors read and approved the manuscript.

#### FUNDING

This work was supported by the National High Technology Research and Development Program of China (2013AA103007), Jiangsu Collaborative Innovation Center for Modern Crop Production, the Science and Technology Department of Zhejiang Province (2016C02050-8), and the National Natural Science Foundation of China (31650110476, 31570434).

#### ACKNOWLEDGMENT

We thank Qiaojing Tao and Faisal Islam from the Institute of Crop Science of Zhejiang University for their help during the experiment.

#### REFERENCES

5th International Workshop on Functional-Structural Plant Models, FSPM '07, Napier.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Tian, Wu, Henke, Ali, Zhou and Buck-Sorlin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.