EDITED BY : Petr Smýkal, Eric Von Wettberg and Thomas M. Davis PUBLISHED IN : Frontiers in Plant Science

### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-143-5 DOI 10.3389/978-2-88966-143-5

### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: frontiersin.org/about/contact

# WILD PLANTS AS SOURCE OF NEW CROPS

Topic Editors:

Petr Smýkal, Palacký University, Olomouc, Czechia Eric Von Wettberg, University of Vermont, United States Thomas M. Davis, University of New Hampshire, United States

Citation: Smýkal, P., Von Wettberg, E., Davis, T. M., eds. (2020). Wild Plants as Source of New Crops. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-143-5

# Table of Contents


László Ivanizs, István Monostori, András Farkas, Mária Megyeri, Péter Mikó, Edina Türkösi, Eszter Gaál, Andrea Lenykó-Thegze, Kitti Szőke-Pázsi, Éva Szakács, Éva Darkó, Tibor Kiss, Andrzej Kilian and István Molnár

*Domesticating* Vigna Stipulacea*: A Potential Legume Crop With Broad Resistance to Biotic Stresses*

Yu Takahashi, Hiroaki Sakai, Yuki Yoshitsu, Chiaki Muto, Toyoaki Anai, Muthaiyan Pandiyan, Natesan Senthil, Norihiko Tomooka and Ken Naito

*Diversity of Naturalized Hairy Vetch (*Vicia villosa *Roth) Populations in Central Argentina as a Source of Potential Adaptive Traits for Breeding*

Juan P. Renzi, Guillermo R. Chantre, Petr Smýkal, Alejandro D. Presotto, Luciano Zubiaga, Antonio F. Garayalde and Miguel A. Cantamutto

*Pod Dehiscence in Hairy Vetch (*Vicia villosa *Roth)*

Lisa Kissing Kucek, Heathcliffe Riday, Bryce P. Rufener, Allen N. Burke, Sarah Seehaver Eagen, Nancy Ehlke, Sarah Krogman, Steven B. Mirsky, Chris Reberg-Horton, Matthew R. Ryan, Sandra Wayman and Nick P.


Jared Crain, Prabin Bajgain, James Anderson, Xiaofei Zhang, Lee DeHaan and Jesse Poland

*Exploration of the Yield Potential of Mesoamerican Wild Common Beans From Contrasting Eco-Geographic Regions by Nested Recombinant Inbred Populations*

Jorge Carlos Berny Mier y Teran, Enéas R. Konzen, Antonia Palkovic, Siu M. Tsai and Paul Gepts

*An Updated Checklist of the Sicilian Native Edible Plants: Preserving the Traditional Ecological Knowledge of Century-Old Agro-Pastoral Landscapes*

Salvatore Pasta, Alfonso La Rosa, Giuseppe Garfì, Corrado Marcenò, Alessandro Silvestre Gristina, Francesco Carimi and Riccardo Guarino

*Lost and Found:* Coffea stenophylla *and* C. affinis, *the Forgotten Coffee Crop Species of West Africa*

Aaron P. Davis, Roberta Gargiulo, Michael F. Fay, Daniel Sarmu and Jeremy Haggar

*A Chromosome-Scale Assembly of the Garden Orach (*Atriplex hortensis *L.) Genome Using Oxford Nanopore Sequencing*

Spencer P. Hunt, David E. Jarvis, Dallas J. Larsen, Sergei L. Mosyakin, Bozena A. Kolano, Eric W. Jackson, Sara L. Martin, Eric N. Jellen and Peter J. Maughan

*Seed Structural Variability and Germination Capacity in* Passiflora edulis *Sims f.* edulis

Nohra Rodríguez Castillo, Luz Marina Melgarejo and Matthew Wohlgemuth Blair


Vy Nguyen, Samuel Riley, Stuart Nagel, Ian Fisk and Iain R. Searle

*Southern Species From the Biodiversity Hotspot of Central Chile: A Source of Color, Aroma, and Metabolites for Global Agriculture and Food Industry in a Scenario of Climate Change*

Luis Letelier, Carlos Gaete-Eastman, Patricio Peñailillo, María A. Moya-León and Raúl Herrera


Kelly J. Vining, Kim E. Hummer, Nahla V. Bassil, B. Markus Lange, Colin K. Khoury and Dan Carver

# Editorial: Wild Plants as Source of New Crops

Eric von Wettberg<sup>1,2</sup>, Thomas M. Davis<sup>2</sup> and Petr Smýkal<sup>3</sup>\*

<sup>1</sup> Plant and Soil Science and Gund Institute for the Environment, University of Vermont, Burlington, VT, United States,

<sup>2</sup> Department of Agriculture, Nutrition, and Food Systems, University of New Hampshire, Durham, NH, United States,

<sup>3</sup> Department of Botany, Faculty of Science, Palacký University, Olomouc, Czechia

Keywords: crop wild relatives, domestication, genetic diversity, neo-domestication, breeding, ethnobotany

Editorial on the Research Topic

Wild Plants as Source of New Crops

The history of agriculture can be viewed as a series of key events, such as the Neolithic Revolution, post-domestication expansion of agriculture to new regions, secondary domestications of new crops, movement over the Silk Road, the Columbian Exchange, the Industrial Revolution, the Green Revolution and even the more recent, ongoing genomic revolutions. Each of these has brought benefits, but each has also come at a cost, including to agricultural biodiversity.

It is estimated that there are between 300,000 and 500,000 species of higher plants on Earth, of which approximately 369,000 have been identified or described (Willis, 2017). Many species are still unknown to science, while perhaps a third are at risk of extinction (Pimm and Joppa, 2015). The number of plant species used for food by pre-agricultural human societies is estimated to be around 7,000, but only a small fraction of the diversity of the plant kingdom has been domesticated. Our present knowledge of domesticated plants largely reflects our experience of a relatively small number of living domesticates adapted to recent, Holocene environments. The process of crop domestication was based on selection driven by human cultivation practices and agricultural environments. Approximately 2,500 species have undergone some degree of domestication, and 250 species are considered to be fully domesticated, in the sense that their full lifecycle became dependent on human cultivation (Meyer et al., 2012; Gaut et al., 2018; Smýkal et al., 2018). Humanity relies on a small collection of crop plants, such as corn, rice, wheat, soybean, and potato, which constitute the majority of our dietary intake. Altogether, some 10 to 50 plant species provide about 95% of the world's caloric intake. This concentration on a few species for most food is a key element of the vulnerability of the world food supply to the impact of climate change and the outbreak of major new plant diseases.
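To put the counts quoted above in proportion, the following minimal Python sketch expresses each as a share of the approximately 369,000 described species. The variable names and the choice of described species as the denominator are illustrative assumptions, not part of the cited estimates.

```python
# Species counts quoted in the text; names are illustrative only.
described_species = 369_000    # higher plants identified or described
used_for_food = 7_000          # used for food by pre-agricultural societies
partly_domesticated = 2_500    # some degree of domestication
fully_domesticated = 250       # lifecycle dependent on human cultivation


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: described species are used as the common denominator.
shares = {
    "used for food": share(used_for_food, described_species),
    "partly domesticated": share(partly_domesticated, described_species),
    "fully domesticated": share(fully_domesticated, described_species),
}

for label, pct in shares.items():
    print(f"{label}: {pct:.2f}% of described species")
```

On these figures, roughly 1.9% of described species have been used for food, about 0.7% have undergone some domestication, and fewer than 0.1% are fully domesticated.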

Crop wild relatives (CWRs) remain the largest reservoir of genetic diversity for crop improvement and have been utilized for major-gene disease and pest resistance, and abiotic stress tolerance (Vavilov et al., 1992; Hajjar and Hodgkin, 2007; Warschefsky et al., 2014; Castañeda-Álvarez et al., 2016; Smýkal et al., 2018; Coyne et al., 2020). However, there remains a large set of plant species from various plant families and genera which have favorable traits but have not been domesticated so far. As we have been gaining knowledge on the genomic and biological background of domestication processes, we can apply more effective selection to domesticate more wild species. Since many wild taxa are locally adapted to particular habitats and contain significant genetic diversity, this might create novel crops and help us to achieve more environmentally sustainable agriculture as we face climate change. Not all candidates for neo-domestication are CWR, although many and perhaps most will be, because the form/function of the related crop species provides a useful template to guide the neo-domestication of the CWR. On the other hand, all of the wild sources of useful genes for introgression into crop species are necessarily

### Edited and reviewed by:

Inaki Hormaza, Institute of Subtropical and Mediterranean Horticulture La Mayora, Spain

> \*Correspondence: Petr Smýkal petr.smykal@upol.cz

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 04 August 2020 Accepted: 28 August 2020 Published: 11 September 2020

### Citation:

von Wettberg E, Davis TM and Smýkal P (2020) Editorial: Wild Plants as Source of New Crops. Front. Plant Sci. 11:591554. doi: 10.3389/fpls.2020.591554


CWR as a prerequisite for their ability to cross with a crop relative. So, a given CWR can be a valuable source for gene introgression into its crop relative, and can at the same time be a candidate for neo-domestication.

Given our reliance on fewer and fewer crop species, as well as the need to more sustainably intensify agricultural production and to develop more nutritious crops, utilizing a greater range of plant diversity is critically important. While conventional breeding by hybridization and progeny selection, augmented by marker-assisted selection, is likely to be the most practicable approach for the time being, recent developments in genome editing technologies, genomic selection, and inoculation of plants with beneficial microbes may help accelerate this process in the future by converting the wild forms of key domestication genes to their respective domesticated forms, as reviewed by Van Tassel et al.

Several tools exist for those wishing to neo-domesticate a wild species or to make an existing crop more tractable to genetic improvement. The most obvious of the newly emerging tools are gene editing approaches. Although not technically trivial, gene editing offers a neo-domesticator the opportunity to simply engineer the domestication traits of interest, such as indehiscent propagules, based on knowledge from other taxa (e.g., Ogutcen et al., 2018). However, other tools of modern genomics may be useful to those wishing to neo-domesticate a crop. These include genome sequencing and resequencing to characterize molecular variation, genome-wide association studies (GWAS) to find associations between genes and traits of interest, and genomic selection to improve multi-genic traits in a breeding program. We also review several core concepts in neo-domestication. Our special issue focuses nearly equally on neo-domestication of wild species and harnessing an understanding of domestication traits to improve existing crop species.

This Research Topic, Wild Plants as Source of New Crops, covers the following topics:

1. De Novo Domestication of Plant Species

One of our examples of de novo domestication comes from Takahashi et al., who examine neo-domestication in Vigna stipulacea. The genus Vigna, which includes nine independent domestications, is particularly ripe for domestication. Many of the wild species, such as V. stipulacea, have useful adaptation traits such as drought tolerance or salinity tolerance, making them desirable targets for neo-domestication. Renzi et al. have looked at Vicia villosa, a stress-tolerant and rapidly growing forage. Kissing Kucek et al. addressed the pod-shattering trait that makes V. villosa weedy and difficult to control in field settings. Selecting for non-dehiscence would help to domesticate this useful species. Common vetch has also been utilized as a drought-tolerant forage (Nguyen et al.). Another take on neo-domestication is to look for traits that may preadapt certain lineages to domestication. Bronnvik and von Wettberg look at bird dispersal in legumes as a possible trait preadapting some legumes toward indehiscent pods.

2. Use of Crop Wild Relatives to Broaden Genetic Diversity of Crops

Crop wild relatives are an important source of variability for existing crops and can be used more effectively when the genetic basis of domestication traits is understood. Henry reviews the diversity of wild rice relatives in Australia, while Wang et al. use GWAS analysis of weedy rice in relation to seed nutritional quality. Berny Mier y Teran et al. have examined wild relatives of common bean, Phaseolus vulgaris, with introgression populations. Ivanizs et al. examine the potential of Aegilops biuncialis for wheat improvement. Sharma et al. examine the tertiary genepool of pigeonpea, Cajanus cajan. An interesting rediscovery of two coffee species is reported by Davis et al.

3. Use of Perennial Species

Crain et al. have used genomic selection for improvement of intermediate wheatgrass. Herron et al. examine early growth traits across the genus Phaseolus, with an emphasis on wild perennial taxa, with the aim of identifying perennial species that might be suitable targets for neo-domestication. An intriguing illustration of domestication parallels is provided by analysis of annual and perennial sunflowers (Asselin et al.). There are many domesticated tree species. They differ from typical seed propagated annual or perennial species due to their long lifespan and combination of sexual and clonal propagation. Differences in seed anatomy and germination capacity of wild and cultivated passion fruit have been studied by Castillo et al.

4. Novel Domestication Syndromes

Zhao et al. have examined the domestication of Zizania latifolia, a grass with a novel domestication as a vegetable, where a symbiosis with an endosymbiotic fungus is essential to the domestication syndrome.

5. Medicinal and Horticultural Plants, Their Domestication, and Diversity

Pasta et al. have examined the flora of Sicily for edible plants, in an effort to preserve ecological knowledge and potential medicinal value. Letelier et al. make a similar examination of the flora of Chile. A genomic approach to chromosome scale assembly of garden orach (Atriplex hortensis) is presented by Hunt et al. The use of CWR germplasm for cultivar improvement of the widely used herb mint (Mentha) is shown by Vining et al.

# AUTHOR CONTRIBUTIONS

TD, EW, and PS have equally contributed to writing and final editing. All authors contributed to the article and approved the submitted version.

# FUNDING

PS is supported by the Grant Agency of the Czech Republic and Palacký University Grant Agency [IGA-2020\_003]. EW is supported by the USDA Hatch program through the Vermont State Agricultural Experimental Station and is supported to work on diversity of mungbeans by Russian Scientific Fund Project No. 18-46-08001 on the basis of a unique scientific installation "Collection of plant genetic resources VIR". TD is supported in part by the USDA National Institute of Food and Agriculture Hatch Project NH00678 (accession number 1019990).

# REFERENCES


Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 von Wettberg, Davis and Smýkal. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Bird Dispersal as a Pre-Adaptation for Domestication in Legumes: Insights for Neo-Domestication

*Hester Brǿnnvik<sup>1†</sup> and Eric J. von Wettberg<sup>1,2</sup>\*<sup>†</sup>*

*<sup>1</sup> Department of Plant and Soil Science, University of Vermont, Burlington, VT, United States, <sup>2</sup> Mathematical Biology and Bioinformatics Laboratory, Peter the Great St. Petersburg Polytechnic University, Saint Petersburg, Russia*

Keywords: neodomestication, bird dispersal, legumes (fabaceae), preadaptation, natural history, domestication syndrome, seed dispersal

### *Edited by:*

*Roberto Papa, Marche Polytechnic University, Italy*

### *Reviewed by:*

*Giovanna Attene, University of Sassari, Italy Michael Benjamin Kantar, University of Hawaii, United States*

*\*Correspondence:*

*Eric J. von Wettberg ebishopv@uvm.edu*

*†ORCID*: *orcid.org/0000-0002-2724-0317*

### *Specialty section:*

*This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science*

*Received: 17 July 2019 Accepted: 18 September 2019 Published: 15 October 2019*

### *Citation:*

*Brǿnnvik H and von Wettberg EJ (2019) Bird Dispersal as a Pre-Adaptation for Domestication in Legumes: Insights for Neo-Domestication. Front. Plant Sci. 10:1293. doi: 10.3389/fpls.2019.01293*

The first chapter of Darwin's *On the Origin of Species* famously used examples of human selection of domesticated plants and animals to lay the groundwork for the Theory of Evolution (Darwin, 1859). Since at least the works of Darwin (1868), domestication of plants and animals has been used as a major example of strong selection radically altering the morphology, architecture, and behavior of organisms on which our contemporary society relies for food, fiber, and fuel. Consequently, it is not surprising that crop domestication remains a vibrant area of research. Despite this ongoing interest in the field of domestication, we greatly lack ecological and natural history studies of crop wild relatives in their wild settings.

One instance of this lack of natural history knowledge concerns the dispersal biology of crop wild relatives as a potential pre-adaptation for domestication. The domestication of annual crops such as cereals and grain legumes is thought to require two key traits that are part of the domestication syndrome (Hammer, 1984): seeds that lack dormancy and fruit structures that are indehiscent (Purugganan and Fuller, 2009; Ogutcen et al., 2018; Smýkal et al., 2018). The domestication literature has long assumed that both of these traits are disfavored in natural populations, and has then debated whether the genetic changes underlying the domestication shift arose from standing variation at low frequency in natural populations (mildly deleterious alleles) or from new mutations (e.g., Morrell et al., 2012; Olsen and Wendel, 2013; Gaut et al., 2018; Hufford et al., 2019; Lye and Purugganan, 2019). However, the assumption that these traits would be disfavored in natural populations is just that, an assumption, and is not based on careful observation of actual dispersal or germination biology. We think it is possible that these traits may, in some taxa, have evolved long ago as part of their dispersal biology, and that this possibility has not been sufficiently studied. Birds and other animal dispersers may more effectively disperse whole fruit than individual seeds, favoring indehiscence in legumes, but we do not know this because of a critical lack of natural history knowledge. If so, this is a pre-adaptation for domestication which may have predisposed some groups toward domestication and can be used in selected taxa for neo-domestication. Here, we document the limited natural history knowledge of bird dispersal in legumes that are consumed by humans and show a significant gap in our natural history knowledge of crop wild relatives.

Animal dispersal syndromes in plants are well described, for example in the landmark review by Howe and Smallwood (1982). Birds can be particularly effective at dispersing seeds over great distances and to suitable habitats. Bird dispersal can be critical in allowing offspring to escape the higher disease pressure found close to their parents. Many bird dispersal behaviors also allow directed dispersal to particular habitats where seedling establishment is more likely to be successful, whether by burying seeds through caching behavior or dropping consumed seeds with fertilizing feces. However, birds are not the only effective animal dispersers of seeds. For



TABLE 1 | Examples of bird dispersal in legumes.

herbaceous plants or grasses that occur in habitats where bird dispersal may not be effective, large mammalian herbivory may be a more common form of long-distance dispersal (Janzen, 1984). Ants are also often effective dispersers, particularly in temperate environments, although they can also be significant seed predators (e.g., Hulme, 1998).

To see whether animal dispersal vectors for a suite of crops are well documented, we performed a literature search using the terms "bird" and "dispersal" and a list of cultivated legumes from Smýkal et al. (2014) and our own knowledge, to find published or gray-literature reports of bird dispersal of legumes in genera known to have cultivated species. In our literature search, we were only able to uncover four instances of descriptions of bird dispersal in genera with cultivated legumes (**Table 1**). This is a rather small number of reports, particularly given both the very large size of the *Fabaceae* and the high number of cultivated species in the family. This search approach almost certainly misses reports in languages other than English but is sufficient to make our primary point very clearly, which is that there is an absence of important natural history work characterizing crop wild relatives.
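The exact query strings and databases used are not reported. As a minimal sketch of how such a search could be assembled, the Python snippet below combines the two stated terms with each genus name. The genus list is a small hypothetical subset drawn from genera mentioned elsewhere in this collection, and the quoted `AND` syntax is an assumption that would need adapting to the search engine used.

```python
# Search terms named in the text.
terms = ("bird", "dispersal")

# Hypothetical subset of genera with cultivated species; the full list
# from Smýkal et al. (2014) is not reproduced here.
genera = ["Phaseolus", "Vigna", "Vicia", "Cajanus"]


def build_query(genus: str, search_terms: tuple) -> str:
    """Join a quoted genus name and quoted search terms with AND."""
    parts = [f'"{genus}"'] + [f'"{term}"' for term in search_terms]
    return " AND ".join(parts)


# One query per genus, in the order listed above.
queries = [build_query(genus, terms) for genus in genera]

for query in queries:
    print(query)
```

Generating one query per genus keeps the hits attributable to a single genus, which makes it straightforward to tally reports per genus afterwards.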

These examples do show some important observations. In *Phaseolus*, bird dispersal is likely widespread, and likely a key component of the very large distribution of wild *Phaseolus vulgaris* (and potentially *Phaseolus lunatus*, lima bean) from Mexico to the central Andes of South America (Ariani et al., 2017). This broad distribution likely contributed to the two independent domestications of common bean (and lima bean). Although the possibility of bird dispersal has been remarked upon (Gepts, pers. comm.), there are too few natural history observations of wild *Phaseolus* to determine how widespread bird dispersal is, whether different guilds of birds are responsible for it, or whether *Phaseolus* species vary in their propensity for bird dispersal.

Understanding whether birds are a disperser is a starting point for understanding dispersal syndromes in the wild. As a preadaptation for cultivation and domestication, the pod would need to be indehiscent. This is a trait that can be tested in a common garden, although dehiscence can be modulated by the environment, such that the humidity of the environment may affect dehiscence (e.g., Lush et al., 1980; Ogutcen et al., 2018). Consequently, dehiscence in a humid environment may not indicate dehiscence in an arid environment.

Most taxa, particularly those in parts of Africa, South, Southeast, Southwest, and East Asia, and South America, where natural history observations published in English may be particularly scarce, are likely simply data deficient. Given that the majority of legumes were domesticated in Vavilovian centers of origin in these regions, there is almost certainly a great lack of natural history data.

There is a great need for more ecological study and natural history observation of crop wild relatives. Crop wild relatives are the most significant reservoir of adaptive variation for providing disease resistance, abiotic stress tolerance, and other important traits to cultivated species. As crop wild relatives receive almost no conservation protection in natural populations and are badly underrepresented in most genebanks (Maxted and Kell, 2009; Warschefsky et al., 2014), natural history study of these species remains critical to their long-term conservation.

### AUTHOR CONTRIBUTIONS

Both authors contributed to the idea development, literature search, and writing of this mini-review.

# FUNDING

The literature review portion of this work was supported by Russian Scientific Fund Project No. 18-46-08001 on the basis of a unique scientific installation "Collection of plant genetic resources VIR", by a cooperative agreement from the United States Agency for International Development under the Feed the Future Program AID-OAA-A-14-00008 to D. R. Cook and Co-PI EW, by a grant from the US National Science Foundation Plant Genome Program under Award IOS-1339346 to D. R. Cook and EW, and by US NIFA grant # 2018-67013-27619 to R. Varma Penmetsa and EW. EW is further supported by the USDA Hatch program through the Vermont State Agricultural Experimental Station.

### ACKNOWLEDGMENTS

The authors thank Paul Gepts and Roberto Papa for helpful conversation.

# REFERENCES


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Brǿnnvik and von Wettberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Australian Wild Rice Populations: A Key Resource for Global Food Security

### *Robert J. Henry\**

*Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD, Australia*

Rice is one of the most important food crops contributing to the diet of large numbers of people especially in Asia. Rice (*Oryza sativa*) was domesticated in Asia many thousands of years ago and more recently independently in Africa. Wild rice populations are found around the tropical world. The extensive production of rice in many areas has displaced the wild populations that were the basis of the original domestications by humans. Recent research, reviewed here, has identified wild rice species in northern Australia that have been isolated from the impact of domestication in Asia. Wild rice populations contain novel alleles that are a source of desirable traits such as erect habit, disease resistance, large grain size, and unique starch properties. These populations include the most divergent genotypes within the primary gene pool of rice and more distant wild relatives. Genome sequencing also suggests the presence of populations that are close relatives of domesticated rice. Hybrid populations that demonstrate mechanisms of ongoing evolution of wild *Oryza* have been identified in the wild. These populations provide options for both new domestications and utilization of novel alleles to improve or adapt domesticated rice using conventional or preferably new breeding technologies. Climate change and growing food demands associated with population and economic growth are major challenges for agriculture including rice production. The availability of diverse genetic resources to support crop adaptation and new crop domestication is critical to continued production, and increased efforts to support *in situ* and *ex situ* conservation of wild *Oryza* and related species are warranted.

### Keywords: wild rice, *Oryza*, domestication, genetic diversity, Australia

### INTRODUCTION

Rice is a key food crop with an ongoing need for genetic improvement to satisfy food security. The *Oryza* genus is distributed around the tropical world. Domesticated rice has been cultivated in many of the areas that would have been native habitats for wild *Oryza* species. Australia is a region that escaped the impact of rice domestication until very recently, resulting in the persistence of many extensive populations of wild *Oryza* (Henry et al., 2010) across a very large area of northern Australia.

Advances in DNA sequencing technology have made it possible to analyze the genomes of large numbers of individuals. This has had significant impact on human genetics but also on rice biology revealing much about the diversity of the populations, the relationships between

### *Edited by:*

*Eric Von Wettberg, University of Vermont, United States*

### *Reviewed by:*

*Hongwei Cai, China Agricultural University (CAU), China Fengxia Liu, China Agricultural University (CAU), China*

> *\*Correspondence: Robert J. Henry robert.henry@uq.edu.au*

### *Specialty section:*

*This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science*

*Received: 07 August 2019 Accepted: 01 October 2019 Published: 22 October 2019*

### *Citation:*

*Henry RJ (2019) Australian Wild Rice Populations: A Key Resource for Global Food Security. Front. Plant Sci. 10:1354. doi: 10.3389/fpls.2019.01354*


different populations, and the evolutionary events leading up to domestication by humans (Wambugu et al., 2018).

### *Oryza* Genus

The *Oryza* genus (Mondal and Henry, 2018) is pantropical in distribution with two domesticated and 24 wild species. Rice was domesticated in Asia as *O. sativa* and in Africa as *Oryza glaberrima*. The AA genome group (*O. sativa* complex) includes the two domesticated species and six wild relatives. The genus has 11 genome types, with the AA, CC, and EE genome types found in Australia. Other members of the *Oryzeae* tribe are also found in Australia (**Table 1**). These taxa are found in northern and eastern Australia (**Figure 1**).

### AA Genome Populations in Australia

The close relatives of domesticated rice, the AA genome species, represent the primary gene pool of rice. Those found in northern Australia have been shown to be highly diverse, including at least two distinct taxa, and represent an as yet unexplored wider genetic resource for global rice production. Populations have been identified with morphological and genetic characteristics suggesting a hybrid origin and some degree of ongoing gene flow between these taxa that could support continued rapid evolution of the wild *Oryza* in Australia. These Australian *Oryza* could be critical in the adaptation of rice to rapid climate change and changing consumer demands and preferences. Recent evidence also suggests that, despite rice being domesticated in Asia (Huang et al., 2012), the Australian populations have contributed genes to domesticated rice (Fujino et al., 2019). The AA genome species in Australia have generally been classified as *Oryza meridionalis* or *Oryza rufipogon*.

### Oryza meridionalis

*Oryza meridionalis* (**Figure 2**) was first described relatively recently, in 1981, being most readily distinguished from *O. rufipogon* in the field by having smaller anthers, longer awns, and closed panicles. This species had been considered to be an annual, but more recent research has identified widespread populations of perennials in northern Queensland (Sotowa et al., 2013). The distinctness of the annual and perennial types has not been established, but both types are evidenced by whether or not plants from different populations survive in the glasshouse after flowering (Sotowa et al., 2013).

### Oryza rufipogon

*O. rufipogon* (**Figure 2**) was described by Griff (1851). Earlier taxonomy had included these plants in the wider group known as *Oryza perennis*. The Australian populations show significant molecular differences from Asian *O. rufipogon* and share some chloroplast sequence homology with *O. meridionalis* (Waters et al., 2012; Brozynska et al., 2014), suggesting that they might best be considered as distinct taxa. The morphological differences between Asian and Australian *O. rufipogon* are not clear, because few data are available from common garden experiments that would allow direct analysis of quantitative traits in the same environment. *O. rufipogon* in Australia can be most easily distinguished from *O. meridionalis* in the field by its open panicles, shorter awns, and larger anthers. *O. rufipogon* and *O. meridionalis* may be confused in some Australian herbarium records, as many collections were made before the publication of *O. meridionalis*. This is further complicated by the potential for rare hybrid populations, which show mixtures of the distinguishing traits described above. An analysis of the current distribution of these taxa and their hybrids would require sampling over a very wide area in Queensland, Northern Territory, and Western Australia.

### Molecular Analysis of Australian AA Genome Species

### Chloroplast Genomes

The sequence of the chloroplast genome has been widely used to determine evolutionary relationships in plants (Wambugu et al., 2015). The advantages of this approach include the highly conserved nature of the chloroplast making direct comparisons of the same sequence in plants less difficult and providing a significant amount of DNA sequence to compare. Comparison of the chloroplast sequences of wild rices that are closely related to domesticated rice has defined events leading up to rice domestication.

Analysis of the chloroplasts of Australian AA genome species (Nock et al., 2011) indicated that their chloroplast genomes are distinct from those of other populations in Asia (Waters et al., 2012; Brozynska et al., 2014). These species have been used to develop methods for the assembly of an accurate complete chloroplast genome sequence from short read sequence data (Nock et al., 2011). These methods have used both mapping to a close reference and *de novo* assembly, and have rationalized the differences obtained by these two approaches (Guyeux et al., 2019).
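The idea of rationalizing a reference-mapped consensus against a *de novo* assembly can be illustrated with a minimal sketch. This is not the published method: it assumes the two sequences have already been aligned to the same length, and the function name and toy sequences are hypothetical. It simply lists the positions at which the two assemblies disagree, which are the sites that would then need to be checked against the read data.

```python
def find_discrepancies(mapped_consensus, de_novo_contig):
    """Return (position, mapped_base, de_novo_base) for each mismatch.

    Assumes both sequences are already aligned and of equal length;
    positions are 1-based. Hypothetical helper for illustration only.
    """
    if len(mapped_consensus) != len(de_novo_contig):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(mapped_consensus, de_novo_contig), start=1)
        if a != b
    ]


# Toy sequences (invented), differing at a single site.
mapped = "ATGGCGTACGTTAGC"
denovo = "ATGGCGTTCGTTAGC"
discrepancies = find_discrepancies(mapped, denovo)
print(discrepancies)
```

In practice each reported site would be resolved by inspecting the supporting reads rather than by preferring either assembly outright.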

Analysis of Australian AA genome populations confirmed the presence of two distinct chloroplast genomes that form an Australian clade that is an outgroup relative to all Asian and African populations (Moner et al., 2018). A survey of the whole chloroplast genomes of 58 genotypes of wild rice showed that the distinct Australian clade extended north to the Philippines (Moner et al., 2018). The study also confirmed the close relationships between the chloroplast sequences of *Oryza glumipatula* in South America and *Oryza longistaminata* in South Africa. This suggests inter-continental exchange of genetic material relatively recently. This exchange can be viewed as part of ongoing movement of wild rice around the tropics with birds as likely vectors. The presence of a wild rice in the Philippines with an Australian chloroplast type may be further evidence of long distance dispersal events.

Modern domesticated rices can be divided into several types including *indica*, *japonica*, and *aus* types that may have resulted from separate domestication events (Civan et al., 2015). The *indica* and *japonica* types correspond to distinct chloroplast sequences in wild populations, indicating capture of these in separate domestication events. The *aus* type was found to include individuals with either of these wild chloroplast types (Moner et al., 2018), indicating a more complex genetic origin (Choi et al., 2017).

### Nuclear Genomes

Around 3 million years ago, a divergence in the wild rice populations that are progenitors of domesticated rice, and still inter-fertile with it, resulted in a lineage that became *O. meridionalis* (Moner and Henry, 2018), a species found in Australia, separating from those leading to *O. rufipogon*, the wild rice found in Asia. Analysis of the nuclear genome shows that the Australian populations that have been considered to be *O. rufipogon* diverged from those in Asia around 1.7 million years ago (Brozynska et al., 2017). The available evidence suggests that these *O. rufipogon*-like Australian populations should also be considered a separate species. The divergence of the Asian and African wild rices in the primary gene pool of rice was shown to be more recent.

The genome of the *O. rufipogon*-like Australian wild rice showed some evidence for possible introgression of chromosomal regions from wild *indica* types (Brozynska et al., 2017).

Populations with a mixture of the traits of different taxa have been identified in the wild in Australia (Moner et al., 2018). These populations have been confirmed as hybrids by examination of the chloroplast and nuclear genomes. Plant populations were identified with characteristics intermediate between those of the two AA genome taxa. The panicle varied from opened to closed and the anther length varied. The nuclear genome was divergent and, despite morphology similar to *O. meridionalis*, the chloroplast was that of the *O. rufipogon* type. Field collections of 29 populations (Moner et al., 2018) from northern Queensland have been made from the northern to the southern extremes of the current natural distribution. These collections are the subject of ongoing analysis by DNA sequencing.

Comparison with the genome of domesticated rice suggests that the Australian wild species have significant diversity, even in chromosomal regions of very low diversity ("genome deserts") in the domesticated rice gene pool (Krishnan et al., 2014), making them important sources of novel diversity for rice improvement. These regions of low diversity suggest that selective sweeps may have occurred in wild populations, possibly due to pests or diseases, and that these areas of the genome may have been further depleted of variation by bottlenecks during domestication.

### Other Related Species

Other more distant relatives of rice are also found in the Australian flora. They have generally received less research attention than the Australian members of the AA genome clade. These more divergent species include the *Oryza* species *Oryza officinalis* and *Oryza australiensis*, other species from within the tribe *Oryzeae* such as *Potamophila* (Abedinia et al., 1998) and *Leersia*, and other species from the subfamily *Ehrhartoideae* such as *Microlaena stipoides* (Shapter et al., 2013). These more divergent species also have genes that may be useful in rice improvement, although transferring them is likely to be more difficult. However, despite being a more distant relative of rice, *O. australiensis* has already been used successfully as a genetic resource in rice improvement. Techniques such as gene editing may allow more rapid deployment of useful novel alleles from all of these wild relatives in the domesticated gene pool.

### Oryza officinalis

*O. officinalis* (CC genome) has been reported only in the extreme north of Australia on the mainland and on Moa Island in the Torres Strait. Collection of this species has not been reported recently, and the status of the Australian populations is uncertain. The author was unable to locate populations on Moa Island during a visit in 2016. Populations reported in Australia may not be permanently established and may be the result of occasional migration from Asia, so the current presence of this species in Australia requires confirmation.

### Oryza australiensis

*O. australiensis* (EE genome; **Figure 2**) is widespread across northern Australia. This species is usually perennial and grows in seasonally dry environments, as it can survive the dry season as a rhizome. It is often found around permanent water holes or lakes that support perennial AA genome *Oryza*, or around temporary bodies of water in depressions that support annual AA genome *Oryza* in the wet season, and it also extends further from the surface water onto waterlogged soils beyond these water bodies. The plants can be very tall, reaching 3 m in height, but can also be observed as very small plants in environments that limit growth, such as populations growing on subsoil in roadside drains (personal observations).

### Potamophila parviflora

River grass (*Potamophila parviflora*; **Figure 2**) is found in just a few rivers (Manning, Hastings, Clarence, and Richmond Rivers) on the central east coast of Australia in northern New South Wales (Abedinia et al., 1998). It appears to grow only in the shallow running water of these rivers. This monotypic genus is probably more closely related to *Zizania* of North America than to *Oryza*. Cold adaptation and the presence of separate male and female flowers are potentially useful traits. This species grows just south of the border between Queensland and New South Wales but has not been reported in Queensland despite some searching by the author and others.

### Leersia hexandra

*Leersia hexandra* is a perennial found in northern and eastern Australia from the Northern Territory to north-eastern New South Wales. The use of species of *Leersia* as food has not been documented. Plants have recently been collected in urban areas close to Brisbane. The exotic *Leersia oryzoides* is also found in Australia.

### Microlaena stipoides

Weeping grass (*M. stipoides*) is found throughout temperate Australia demonstrating adaptation to cooler climates. Attempts have been made to domesticate this species as a cool climate dryland rice crop. Mutagenesis (Shapter et al., 2013; Malory et al., 2011) has allowed recovery of non-shattering types and this together with selection for large grain has generated genotypes with potential for grain production. This widespread species is often found in shaded environments and has been used as a lawn grass.

### Useful Traits for Rice Improvement in Wild Australian Populations

The capture of genes from *O. meridionalis* has been successful because, although the F1 hybrids with *O. sativa* have pollen sterility, they retain female fertility allowing backcrossing of desirable traits into domesticated rice (Yoshimura et al., 2010). Potentially useful traits include grain yield, grain size, grain color, starch properties, cooking qualities, eating qualities, and disease resistances. The genes responsible for these traits have not been characterized to date, but the availability of genome sequence data and more intensive collections of the populations of these taxa will facilitate the identification of useful genes and alleles.

### Yield

The yield of Australian wild rices has not been evaluated in cultivation or in comparative trials for any of these species, because their strong tendency to shatter makes them difficult to cultivate. The abundant grain observed on plants in the wild suggests that good yields might be achieved if this problem were overcome. The large grain size of some Australian wild rices may be a trait that could contribute to yield.

### Grain Quality

The grains of Australian wild rice vary from medium to long grain and include some with large grain size (Tikapunya et al., 2017). Transfer of this trait to domesticated rice is in progress. The grains of the A genome species are long or medium, while *O. australiensis* grains are short. The wild rice grains are all slender when compared to domesticated rice. The impact of nutrition on grain filling and grain size has not been examined for these species, and improved grain size may be possible with optimal plant nutrition.

Australian wild rice has a color that would be described as red in rice color language. The Australian populations have colors that appear to be distinct from that of Asian red rices that have been available for comparisons. This suggests that a key market option for these species is as colored rice that has not been milled.

The starch properties of rice reflect the eating and nutritional qualities of the grain (Wang et al., 2015). The Australian species have unique starch properties (Kasem et al., 2012; Tikapunya et al., 2017). All taxa (including AA genome species and *O. australiensis*) had high amylose contents and gelatinization temperatures. Pasting properties varied. Analysis of starch structure suggested the presence of shorter chain amylose in *O. australiensis* (Tikapunya et al., 2017). Much remains to be understood about the novel starch properties of these rice species and the extent of variation in wild populations. The genetic basis of the starch properties has also not been determined.

Sensory evaluation of Australian wild rice has indicated acceptable eating qualities (Tikapunya et al., 2018). Australian wild rices have been compared with Asian domesticated rices (including long grain, medium grain, basmati, red basmati, and red rice) and with Canadian wild rice (*Zizania palustris*). Specific descriptors were developed for sensory evaluation of these rices. The Australian wild rice was similar to the red rice and red basmati in having a mild aroma and flavor, but without the lingering aftertaste. The wild rice had a firmer texture and required a longer cooking time. The overall cooking profiles and the sensory and physical attributes suggest that Australian wild rice has potential for commercialization in the colored rice market and may be a useful genetic resource for rice breeding.

### Disease Resistance

Production of domesticated Asian rice in northern Australia has been limited by a high incidence of disease relative to rice production further south in Australia. The southern production areas have been well outside the natural range of rice and rice pathogens. Rice and associated rice pathogens (Khemmuk et al., 2016) are native to northern Australia, and a wide range of pathogen species are present. Much of the diversity of these organisms remains to be explored. Native Australian rices have resistance to the local strains of rice pathogens and should provide a useful source of resistance to local diseases for use in breeding rice varieties for this region. Asian domesticated rice varieties have not been developed to have resistance to the native rice pathogens of northern Australia. The wild populations are key resources for the successful establishment of rice production on the large areas of land that are suitable for production in northern Australia. Crops of domesticated rice have been devastated by disease in northern Australia while surrounded by wild populations that appear resistant.

### Human Impact on Populations

The wild populations are extensive in the extreme north but are threatened by developments including the improvements of roads and the resulting more rapid spread of invasive weeds into new areas, associated with increased traffic. Aquatic weeds are displacing wild populations at some locations including protected areas such as national parks. Greater awareness of the presence of these populations and their potential to contribute to global food security is necessary to ensure adequate measures are taken to protect the wild populations.

The landscapes in which wild rice is found in Australia have been inhabited by humans for a very long time. The way in which rice was used as food is not well documented. It is possible that human use has affected the genetics of wild rice populations in Australia. The large grain size of some Australian populations (Sotowa et al., 2013) might reflect some human selection. Shattering is a key trait associated with domestication of grains. The wild species are all extremely prone to shattering, suggesting little progress toward a non-shattering domesticated type. However, recent evidence shows that domesticated populations can rapidly revert to shattering types under natural selection (Zeng et al., 2018).

### Potential for New Domestications

In addition to their use in rice improvement, the Australian *Oryza* could be targeted for direct domestication as new rice types. The key trait to be overcome in domestication is probably shattering, with wild populations all demonstrating high shattering characteristics. The option of domestication needs to be considered along with the potential to source many genes for useful traits in these populations for transfer into domesticated rice. The traits that are most desirable are disease resistance, grain size, grain color, and grain quality (starch properties). The strategy of new domestication may be especially useful in developing rice varieties adapted to new climates with advancing or rapid climate change, especially with the availability of extensive genomics resources for rice (Abberton et al., 2016).

### CONCLUSIONS

Wild rice populations in Australia include species at varying distances from domesticated rice. These provide a rich resource for use in rice improvement. Both novel alleles and novel traits might be sourced from the Australian wild genotypes. Increased efforts to conserve these genetic resources would be justified. The current very limited *ex situ* collections in seed banks need to be supplemented by much more extensive collections. The *in situ* conservation of these species would be aided by greater awareness of the populations and their importance and may involve efforts to control weeds invading the aquatic habitats of the wild *Oryza*. Revision of the taxonomy of the A genome species is suggested by the available molecular evidence. Clarification of the taxonomy may aid conservation efforts by ensuring efforts to conserve rarer populations, including suspected hybrid populations. The growing genomic resources for the *Oryza* genus (Stein et al., 2018) would be extended usefully by more efforts on the relatively poorly characterized Australian members of this group especially as they include the more divergent members of the A genome clade.

### AUTHOR CONTRIBUTIONS

This manuscript is the work of the author.

### FUNDING

The Australian Research Council provided support for much of the research reviewed here.


**Conflict of Interest:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Henry. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Inferring the Origin of Cultivated Zizania latifolia, an Aquatic Vegetable of a Plant-Fungus Complex in the Yangtze River Basin

*Yao Zhao1,2, Zhiping Song2, Lan Zhong3, Qin Li1,2, Jiakuan Chen2 and Jun Rong1\**

1 Jiangxi Province Key Laboratory of Watershed Ecosystem Change and Biodiversity, Center for Watershed Ecology, Institute of Life Science and School of Life Sciences, Nanchang University, Nanchang, China, 2 The Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, Institute of Biodiversity Science, Fudan University, Shanghai, China, 3 Institute of Vegetable, Wuhan Academy of Agriculture Science and Technology, Wuhan, China

### Edited by:

Eric Von Wettberg, University of Vermont, United States

### Reviewed by:

Michael Benjamin Kantar, University of Hawaii, United States; Zhe Cai, Institute of Botany, China

> \*Correspondence: Jun Rong rong\_jun@hotmail.com

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 24 June 2019 Accepted: 10 October 2019 Published: 08 November 2019

### Citation:

Zhao Y, Song Z, Zhong L, Li Q, Chen J and Rong J (2019) Inferring the Origin of Cultivated Zizania latifolia, an Aquatic Vegetable of a Plant-Fungus Complex in the Yangtze River Basin. Front. Plant Sci. 10:1406. doi: 10.3389/fpls.2019.01406

Crop domestication is one of the essential topics in evolutionary biology. Cultivated Zizania latifolia, domesticated as the special form of a plant-fungus (the host Zizania latifolia and the endophyte Ustilago esculenta) complex, is a popular aquatic vegetable endemic to East Asia. The rapid domestication of cultivated Z. latifolia can be traced in the historical literature but still needs more evidence. This study focused on deciphering the genetic relationship between wild and cultivated Z. latifolia, as well as the corresponding parasitic U. esculenta. Twelve microsatellite markers were used to study the genetic variation of 32 wild populations and 135 landraces of Z. latifolia. Model simulations based on approximate Bayesian computation (ABC) were then performed to hierarchically infer the population history. We also analyzed the ITS sequences of the smut fungus U. esculenta to reveal its genetic structure. Our results indicated a significant genetic divergence between cultivated Z. latifolia and its wild ancestors. The wild Z. latifolia populations showed significant hierarchical genetic subdivisions, which may be attributed to the joint effect of isolation by distance and hydrological unconnectedness between watersheds. Cultivated Z. latifolia was supposedly domesticated once in the lower reaches of the Yangtze River. The genetic structure of U. esculenta also indicated a single domestication event, and the genetic variation in this fungus might be associated with the diversification of cultivars. These findings provide molecular evidence, in accordance with the historical literature, that the domestication of cultivated Z. latifolia involved adaptive evolution driven by artificial selection in both the plant and the fungus.

Keywords: domestication, Zizania latifolia, Ustilago esculenta, genetic structure, approximate Bayesian computation

# INTRODUCTION

Cultivated plants are fundamental materials for ensuring the survival and development of human beings (Hajjar and Hodgkin, 2007; Ronald, 2014; FAO, 2019). Through domestication, humans are able to change the morphological and physiological performances of wild plants. With ongoing selective pressure, wild plants would finally adapt to human-derived cultivation environments and provide products that humans demand (Zeder et al., 2006; Gross and Olsen, 2010; Giovannoni, 2018). Plant domestication is regarded as a complex plant-animal co-evolution process, and it is an ideal model to study evolution (Purugganan and Fuller, 2009; Tufto, 2017; Whitehead et al., 2017; Zeder, 2017). As a typical scenario of recent rapid evolution, domestication has demonstrated how a species diverges from its ancestor through artificial selection (Zeder et al., 2006; Purugganan and Fuller, 2009; Diez et al., 2015; Wang et al., 2018).

For a cultivated plant, it is of great importance to decipher the puzzle of its domestication to understand the genetic basis for further improvements. Regardless of the domesticated species we study, there are several questions that need to be answered: what the ancestor is, when and where the domestication occurred, and what happened during the wild-to-cultivated transition (Doebley et al., 2006; Zeder, 2015). Benefitting from cumulative archaeological evidence and the advances of molecular biology, these questions appear to be well resolved for some major crops (Doebley et al., 2006). For example, archaeological and genetic evidence indicated that maize (*Zea mays*) had been domesticated once from its ancestor species teosinte (*Zea mexicana*) in Mexico (reviewed in van Heerwaarden et al., 2011), and common bean (*Phaseolus vulgaris*) had more than two independent domestication events (reviewed in Bitocchi et al., 2012). However, the origins of some well-studied crops may still be under debate owing to the uncertainty and complexity of historical human activities, for example Asian cultivated rice (*Oryza sativa*) (Ge and Sang, 2011; Huang et al., 2012; Choi and Purugganan, 2018) and a few grain crops generally thought to have originated in the Fertile Crescent (*Hordeum vulgare*, *Triticum aestivum*, and *Avena sativa*) (Fuller et al., 2012; Heun et al., 2012). Moreover, the answers to the domestication puzzles for a given species could also be blurred by post-domestication migrations or genetic improvements (e.g., cross-regional introduction, diversification of varieties, and hybridization with wild progenitors) (Burger et al., 2008; Diez et al., 2015). Therefore, each cultivated plant may have been domesticated in its own way, which should be dissected independently.

The genus *Zizania* belongs to the rice tribe (Poaceae, Oryzeae), and it is an aquatic genus with a discontinuous distribution between eastern Asia and North America (Wen, 1999; Wu et al., 2006). Of the four species in the genus, two are field crops, grown either for their seeds harvested as food (the annual *Zizania palustris* native to North America) or for their enlarged young shoots used as a vegetable (the perennial *Zizania latifolia* native to Asia) (Oelke, 1993; Xu et al., 2010). Because the young shoots of *Z. latifolia* become swollen, soft, and edible after being infected by the obligate smut fungus *Ustilago esculenta* and have high nutritional and economic value, it has been domesticated as a vegetable called Jiaobai, which is now widely cultivated in the Yangtze River Basin (Guo et al., 2007; Ke, 2008). The cultivated *Z. latifolia* is highly domesticated. The phenotypic differences between wild and cultivated *Z. latifolia* are distinct: the domesticated plant usually has erect plant architecture, broader leaves, and shortened internodes, and it can generate much larger swollen shoots containing few black spots (Guo et al., 2007). Once *Z. latifolia* is infected by *U. esculenta*, it loses the ability to undergo sexual reproduction *via* seeds because the fungus hinders the development of the flower primordium. Thus, the breeding of new accessions can rely only on natural somatic mutants in tillers or rhizomes. In each generation, individuals with good-quality products (swollen shoots) should be selected for clonal propagation to ensure the stability of agronomic traits in the offspring (Guo et al., 2007). The plants that generate flowers or those with too many black spots in the swollen shoots are removed during harvest. Although these precautions are taken, unfavorable mutants always occur in the fields, which may be attributed to the high variability of the smut fungus (Guo et al., 2007).
To date, there are two main ecotypes including more than 100 local varieties with distinct phenotypic differences. One ecotype is a single-season plant that can be harvested only once each year in the fall (from August to November). This ecotype is a strictly short-day plant that can produce swollen shoots only when the days become short in fall. The other is a double-season crop that is planted in the spring and can then be harvested twice, in the fall and in the following summer; it is insensitive to day length and endemic to the Yangtze River Basin. The single-season ecotype is taller and has fewer underground rhizomes, lower yield, and a wider distribution relative to the double-season ecotype.

Historical records can help to trace the domestication process of *Z. latifolia* in China (Ke, 2008). The oldest record showed that grains of *Z. latifolia* had been harvested to offer as tribute to the nobles in the Zhou Dynasty (from 771 to 221 BC). Until the Tang Dynasty (618–907 AD), *Z. latifolia* was famous for the tender taste of its grains. During the period between the Zhou and Tang Dynasties, the swollen shoots were rarely reported to be used for food. Only in the lower reaches of the Yangtze River were they occasionally eaten by local residents. After the Tang Dynasty, people no longer harvested the grains of *Z. latifolia*. Instead, *Z. latifolia* was domesticated as an aquatic vegetable (Guo et al., 2007; Chen et al., 2012; Chen et al., 2017). Although the utilization history of *Z. latifolia* has provided important clues, several gaps still remain. In one scenario, *Z. latifolia* underwent pre-domestication during its long use as a food, while *U. esculenta* acted as a plant disease during grain production; the plant-fungus complex was then domesticated. In another scenario, this vegetable was directly domesticated once or twice after the Tang Dynasty and then generated two main ecotypes with many cultivated varieties. However, these scenarios are just assumptions, and they still lack solid molecular evidence.

Cultivated *Z. latifolia* is the one and only crop that was domesticated in the form of a plant-fungus complex. Undoubtedly, its domestication was associated with the responses of the plant and the fungus to artificial selection. Thus, the cultivated *Z. latifolia* could provide not only a unique case of domestication of a plant-fungus complex but also an ideal model to investigate changes in the genetic interactions of two closely linked species under artificial selective pressures. The plant and the fungus have been independently studied. Wild populations of *Z. latifolia* are widely distributed across eastern China. This plant is of importance as a genetic resource for breeding and forage (Chen et al., 2006). Many studies have been conducted on *Z. latifolia*, including investigation of its phylogeny (Ge et al., 2002; Guo and Ge, 2005; Xu et al., 2010), ecology (Yang et al., 1999), population structure (Xu et al., 2008; Chen et al., 2012), utilization as a tertiary gene pool of rice (Liu et al., 1999), nutritional value (Zhai et al., 2001), cultivar classification, and breeding (Guo et al., 2007). The phylogeny and cytology of *U. esculenta* have been investigated as well (Stoll et al., 2005; Zhang et al., 2012). The genetic interactions between plant and fungus were recently revealed by whole-genome sequencing and comparative transcriptome analyses (Guo et al., 2015; Wang et al., 2017; Ye et al., 2017; Zhang et al., 2017). However, the origin of cultivated *Z. latifolia* has seldom been genetically addressed and remains unknown.

This study focused on dissecting the genetic structure of *Z. latifolia* and the parasitic *U. esculenta*. Twelve microsatellite markers were used to investigate the genetic variation in *Z. latifolia* populations, and model simulations with approximate Bayesian computation (ABC) were then performed to infer the domestication history. The pattern of genetic variation in *U. esculenta* was identified based on ITS sequence analyses. By combining the genetic structures of host and endophyte, the domestication scenario of cultivated *Z. latifolia* could be unveiled.

# MATERIALS AND METHODS

# Plant Collection

Thirty-two natural populations of *Z. latifolia* were collected across the species distribution range in China (**Figure 1**, **Table 1**). In each population, the individuals were randomly sampled with an interval of 10 m between plants to reduce the chance of collecting samples from the same genotype. Fresh young leaves were collected from each plant and placed into plastic bags containing silica gel for dehydration. A total of 824 samples of wild *Z. latifolia* were finally obtained. The leaf samples of 135 landraces of cultivated *Z. latifolia* (detailed information on the landraces is provided in **Supplementary Table 2**) were provided by the National Aquatic Vegetable Germplasm Nursery of Chinese Academy of Agriculture Science, Wuhan.

# Infected Plant Collection and Fungi Isolation

Among the 32 wild populations, we found five individuals infected by the smut fungus *U. esculenta*. A total of 37 strains of *U. esculenta* were isolated from 5 wild accessions and 32 cultivars following the method described by Zhang et al. (2012). These strains were grown on potato dextrose agar (PDA) slant at 26°C and stored in a 10°C incubator.

# DNA Extraction and PCR Assays

Total genomic DNA was extracted following the protocol described by Song et al. (2006). Twelve microsatellites were selected from 100 Asian cultivated rice microsatellites (www.gramene.org) and 16 *Z. latifolia* specific microsatellites (Quan et al., 2009) (**Table S1**). The PCR products were labeled with fluorescent dyes using 5′-tagged forward primers (FAM, JOE, and ROX) and GS350 as an internal size standard labeled with TAMRA. The PCR products were sequenced on an ABI 3730 (Applied Biosystems) automated sequencer. The resulting chromatograms were visualized and analyzed using GENEMAPPER v4.0 software (Applied Biosystems).

N, sample size; Ar, allelic richness; Ae, number of effective alleles; Ho, observed heterozygosity; He, expected heterozygosity; F, fixation index; Ne, the effective population size; Ni, the number of individuals that are genetically identical to cultivars; Q, the introgression coefficient; the significant bottleneck effect is labeled by Y (TPM model) and S (Mode shift). The F values that significantly deviated from 0 are shown in bold font.

The ITS region of the smut fungus was amplified using the primers ITS1 and ITS4. PCR products were purified and sequenced on an ABI 3130XL capillary sequencer (Applied Biosystems). All ITS sequences were deposited in GenBank, with accession numbers MK811211-MK811247.

### Genetic Data Analyses

The frequencies of putative null alleles were estimated using the software FREENA (Chapuis and Estoup, 2007). Null alleles were detected in 34 of 384 population-locus combinations, and the presence of null alleles was not associated with particular populations or loci. We did not find a significant bias in the estimate of *F*st based on the data of all loci compared to the estimate based on genotype data corrected for null alleles (*t*-test, *P* = 0.419). Population genetic variation was summarized by expected heterozygosity (*H*e), observed heterozygosity (*H*o) and Wright's fixation index *F* using GENALEX 6.5 (Peakall and Smouse, 2012). We also calculated the mean effective allele number (*A*e) and allelic richness (*A*r). Hardy–Weinberg equilibrium departures were tested using exact tests implemented in GENEPOP v4.0 for each population–locus combination. A global test across loci for departures from Hardy–Weinberg equilibrium was conducted using Fisher's method. Significant deviations were further evaluated using the sequential Bonferroni test.
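The diversity statistics named above can be illustrated with a minimal Python sketch. It assumes the standard definitions for a single locus in a single population (*H*e = 1 − Σp², *A*e = 1/Σp², *F* = 1 − *H*o/*H*e) without small-sample correction; the function name `locus_stats` and the genotypes are hypothetical and are not data from this study.

```python
from collections import Counter

def locus_stats(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples for one locus in one population."""
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies estimated from the 2n gene copies.
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)
    he = 1.0 - homozygosity          # expected heterozygosity (uncorrected)
    ae = 1.0 / homozygosity          # effective number of alleles
    f = 1.0 - ho / he if he > 0 else float("nan")  # Wright's fixation index
    return {"Ho": ho, "He": he, "Ae": ae, "F": f}

# Hypothetical genotypes (allele sizes in bp), for illustration only.
example = [(150, 150), (150, 154), (154, 154), (150, 150)]
stats = locus_stats(example)
print(stats)
```

A positive *F* in such a calculation reflects fewer heterozygotes than expected under Hardy–Weinberg proportions, and a negative *F* reflects an excess.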

BOTTLENECK (Piry et al., 1999) was used to screen each population for bottleneck signatures over the last two to four *N*e generations or in a recent period. A significant heterozygote excess or deficit due to a bottleneck was assessed by a two-tailed Wilcoxon signed rank test with 1,000 iterations under the assumption of a stepwise mutation model. At the same time, the distribution of allelic frequencies was drawn to identify whether there had been a mode shift away from an L-shaped distribution, indicating recent population bottlenecks.

MIGRATE (Beerli, 2008) was used to estimate the effective population size *N*e. MIGRATE uses coalescent theory to jointly estimate the mutation-scaled population size *θ* (4*N*e*μ*) over a long period of time (~4*N*e generations). Three runs were conducted using MIGRATE. First, two shorter runs (10 short chains of 10,000 sampled, 500 recorded and three final chains of 100,000 sampled, 5,000 recorded) were performed and used to verify that the MCMC estimated the parameters correctly. Then, a final run (10 short chains of 10,000 sampled, 500 recorded and three final chains of 500,000 sampled and 25,000 recorded) was performed, and *θ* values from this final run are reported. The initial run used an estimate of *F*st as a starting parameter to calculate *θ*. Each subsequent run used the ML estimates from the previous run as new starting parameters.
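As a small worked example, the relation *θ* = 4*N*e*μ* can be inverted to obtain *N*e once a mutation rate is assumed. The sketch below is illustrative only: the function name `effective_size` and the *θ* values are hypothetical, and the mutation rate of 10<sup>−4</sup> is the value quoted with the *N*e estimates in the Results.

```python
def effective_size(theta, mu):
    """Invert theta = 4 * Ne * mu for a diploid nuclear locus."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return theta / (4.0 * mu)

# Hypothetical theta values; mu = 1e-4 per locus per generation is an assumption.
for theta in (0.04, 0.125):
    print(f"theta = {theta}: Ne is about {effective_size(theta, 1e-4):.1f}")
```

Because *N*e scales inversely with *μ*, any uncertainty in the assumed mutation rate carries over proportionally to the estimated population size.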

Principal coordinate analysis (PCoA), as implemented in GENALEX 6.5, was conducted to investigate the genetic divergence between cultivated and wild *Z. latifolia* plants. The genetic relationship between the cultivated and wild plants was also estimated using STRUCTURE 2.3.4 (Pritchard et al., 2000) with *K* = 2 and following the method described by Harter et al. (2004). If the proportion of an individual's genome with ancestry (the introgression coefficient *Q*) from the cultivated group was >50%, we defined that finding as a sign of cultivar ferality.

The degree of population divergence was quantified using *F*st with GENEPOP 4.0. A PCoA was performed with the wild populations only to investigate the pattern of genetic divergence within this species. The genetic structure was further explored using the Bayesian clustering algorithm implemented in STRUCTURE 2.3.4. The program was given no prior information on ancestral populations and run 10 times for each value of *K* ancestral populations, under the admixture model with correlated allele frequencies, using 200,000 Markov chain Monte Carlo iterations and a burn-in of 100,000 iterations. We inferred *K* using the *ad hoc* statistic Δ*K* (Evanno et al., 2005). The resulting matrices of estimated cluster membership coefficients were permuted with CLUMPP (Jakobsson and Rosenberg, 2007). The final matrix for each *K* value was visualized with DISTRUCT (Rosenberg, 2004).
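The Δ*K* statistic can be sketched as follows, assuming the commonly used form Δ*K* = |L(*K*+1) − 2L(*K*) + L(*K*−1)| / sd[L(*K*)], where L(*K*) is the mean log-likelihood over replicate runs. The function name `delta_k` and the log-likelihood values are hypothetical and do not come from the STRUCTURE runs of this study.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """lnp maps K to a list of ln P(D|K) values from replicate runs."""
    result = {}
    for k in sorted(lnp):
        # Delta K is only defined where both neighbouring K values were run.
        if k - 1 in lnp and k + 1 in lnp:
            second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
            sd = stdev(lnp[k])
            result[k] = second_diff / sd if sd > 0 else float("inf")
    return result

# Hypothetical replicate log-likelihoods for K = 1..4.
runs = {
    1: [-5200.0, -5202.0, -5198.0],
    2: [-4700.0, -4701.0, -4699.0],
    3: [-4650.0, -4660.0, -4640.0],
    4: [-4630.0, -4650.0, -4610.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Note that Δ*K* cannot be evaluated for the smallest and largest *K* tested, so it can never select *K* = 1.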

The genetic distance (GD) matrix for all pairs of individuals and populations of wild *Z. latifolia* was estimated by using MSA 3.0 (Dieringer and Schlotterer, 2003). Unrooted neighbor-joining trees were constructed using the PHYLIP package 3.6 with 1,000 bootstraps. The relatedness between individuals and populations was visualized with the software Dendroscope 3.

Isolation by distance (IBD) among the wild populations was tested using the Mantel test implemented in GENALEX 6.5. The significance of IBD values was assessed using 9,999 permutations.
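The logic of a permutation-based Mantel test can be sketched in plain Python. This is a generic one-sided version under stated assumptions (Pearson correlation over the upper triangle, joint row and column permutation of one matrix), not the GENALEX implementation; the function names and the distance matrices are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel(gen, geo, permutations=9999, seed=1):
    """One-sided Mantel test between two symmetric distance matrices."""
    n = len(gen)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo_vec = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Relabel populations in the genetic matrix, keeping its structure intact.
        rng.shuffle(order)
        r_perm = pearson([gen[order[i]][order[j]] for i, j in pairs], geo_vec)
        if r_perm >= r_obs:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical positions (km) of five populations along a transect.
pos = [0, 120, 300, 650, 1400]
geo = [[abs(a - b) for b in pos] for a in pos]
gen = [[0.001 * d for d in row] for row in geo]  # genetic distance proportional to geography
r_obs, p_value = mantel(gen, geo)
print(r_obs, p_value)
```

Permuting whole rows and columns, rather than individual matrix cells, preserves the dependence among pairwise distances that share a population.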

### Inference of the Domestication History of Cultivated *Z. latifolia*

We tested possible models of the demographic history of wild and cultivated accessions to infer the domestication history of cultivated *Z. latifolia* using Approximate Bayesian Computation (ABC) (Beaumont et al., 2002), which allows model choice and parameter estimation without the need to calculate the likelihood function of the models. Data sets are generated under the considered models and are reduced to a set of summary statistics. Simulated data sets with the summary statistics closest to those of the observed data are used to determine the posterior distribution of the parameters and the relative posterior probability of each model. DIYABC v2.0 (Cornuet et al., 2014) was used to compare different models and infer historical parameters. According to the geographical distribution and the Bayesian clustering approach implemented in STRUCTURE, the *Z. latifolia* populations could be assigned to two genetic groups, and the southern group showed watershed-related genetic subdivisions (see the Results section for more details). To reduce the complexity of scenario settings, hierarchical methods were applied following Lombaert et al. (2011). Competing scenarios were first evaluated between the two groups to identify the one with the highest posterior probability, and the comparison was then repeated among the subgroups within the supported group.

We then used an ABC approach to infer demographical parameters for the best scenario. The prior parameters and designs of the scenarios are given in **Table 2** and **Figures 3** and **4**. To construct the reference table, we calculated 10,000,000 simulated data sets for each competing scenario in DIYABC. The posterior probabilities of the competing scenarios were estimated by polychotomous logistic regression based on the 1% of data sets of the simulated reference table, according to the method described in Cornuet et al. (2014).
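The general ABC workflow described in this section (simulate under each scenario, reduce to summary statistics, retain the closest 1% of simulations) can be sketched with a toy rejection sampler. Everything here is an assumption made for illustration: the simulator is a simple drift formula rather than a coalescent model, the priors are invented, and scenario probabilities are estimated as simple proportions among accepted simulations instead of the polychotomous logistic regression used by DIYABC.

```python
import random

def simulate_summary(scenario, rng):
    """Toy simulator (hypothetical): heterozygosity remaining after t generations of drift."""
    t = rng.uniform(500, 3000) if scenario == "A" else rng.uniform(3000, 20000)
    n_e = rng.uniform(500, 5000)
    he = 0.45 * (1 - 1 / (2 * n_e)) ** t
    return he, t

def abc_rejection(observed, scenarios, n_sim=20000, keep=0.01, seed=7):
    rng = random.Random(seed)
    table = []  # reference table: (distance to observed, scenario, parameter t)
    for _ in range(n_sim):
        scenario = rng.choice(scenarios)
        he, t = simulate_summary(scenario, rng)
        table.append((abs(he - observed), scenario, t))
    table.sort(key=lambda row: row[0])
    accepted = table[: max(1, int(keep * n_sim))]
    # Direct estimate: share of each scenario among the closest simulations.
    posterior = {
        s: sum(1 for _, m, _ in accepted if m == s) / len(accepted) for s in scenarios
    }
    return posterior, accepted

posterior, accepted = abc_rejection(observed=0.242, scenarios=["A", "B"])
print(posterior)
```

The accepted parameter values (here `t`) approximate the posterior distribution of the parameter under the retained scenarios; a real analysis would use several summary statistics and a regression adjustment.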


# The Phylogenetic Analysis of U. esculenta ITS Sequences

The ITS sequences were aligned with CLUSTAL W (Thompson et al., 1994) and adjusted manually. The number of polymorphic sites (*S*), haplotype diversity (*H*d), and genetic variation measured by average pairwise differences per base pair between sequences (*θ*) (Nei and Li, 1979) and Watterson's estimates (*θ*w) from *S* (Watterson, 1975), were calculated using DNASP version 4.10 (Rozas et al., 2003). Maximum parsimony (MP) searches were performed with 1,000 random taxon addition replicates followed by tree bisection-reconnection branch swapping in PAUP\* 4.0b10 (Swofford, 2002). Gaps were treated as missing data. Parsimony bootstrap (PB) for the clades was examined with 1,000 bootstrap replicates using the same options as above.
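The two diversity estimators can be written out in a few lines, assuming their textbook definitions: Watterson's estimate per site is *S* / a<sub>n</sub> / L with a<sub>n</sub> = Σ 1/i for i = 1 to n − 1, and nucleotide diversity is the mean number of pairwise differences per site. The function names and toy sequences are hypothetical; the inputs *S* = 27, n = 37 and L = 712 bp are the figures reported in the Results, and a dedicated program such as DNASP may treat alignment gaps differently.

```python
def watterson_theta(segregating_sites, n_sequences, length):
    """Watterson's estimator per site: S / a_n / L."""
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / a_n / length

def nucleotide_diversity(seqs):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    n, length = len(seqs), len(seqs[0])
    diffs = 0
    for i in range(n):
        for j in range(i + 1, n):
            diffs += sum(1 for a, b in zip(seqs[i], seqs[j]) if a != b)
    n_pairs = n * (n - 1) / 2
    return diffs / n_pairs / length

theta_w = watterson_theta(27, 37, 712)
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]  # hypothetical toy alignment
pi = nucleotide_diversity(toy)
print(round(theta_w, 4), round(pi, 4))
```

Under these assumptions the Watterson estimate comes out near 0.009, which rounds to the value of 0.01 given in the Results.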

# RESULTS

# Genetic Variations in Z. latifolia

We analyzed plant genotypes for 824 wild and 135 cultivated accessions. Based on the 12 SSR loci, the 959 *Z. latifolia* accessions yielded a total of 138 alleles with 11.46 alleles per locus on average. The mean effective number of alleles (*A*e), allelic richness (*A*r), observed and expected heterozygosity (*H*o and *H*e), and Wright's fixation index (*F*) are listed in **Table 1**. *H*e ranged from 0.167 (JXDA) to 0.588 (JSBY) among wild populations. The genetic diversity of cultivated accessions (*H*e = 0.242) was much lower than that of wild accessions (*H*e = 0.432). Deviations from Hardy-Weinberg equilibrium were detected in 90 of 384 comparisons after applying the sequential Bonferroni test, which might be due to inbreeding. Wright's fixation indices varied greatly among populations (**Table 1**). Several populations (HLJHEB, HLJMDJ, JLDH, HBBD, SHYP, AHTL, JXDA, HuNCS, FJFD, GDYJ, GXNN, and YNJH) displayed signatures of heterozygote excess or deficit.

The PCoA indicated a distinct genetic divergence between wild and cultivated accessions. However, several putative wild accessions were identified as cultivars. This identification was further supported by the result of the STRUCTURE analysis (*K* = 2, **Figures 2A**, **B**). The Δ*K* analysis indicated that the wild populations of *Z. latifolia* showed finer genetic structure with increasing *K* values, as shown in **Figures 2C**, **D**. The HuNCS and FJFD populations were entirely composed of plants with cultivar-like genotypes. In addition, another 12 accessions from 3 wild populations (SHCM, ZJHZ, and YNJH) were identified as cultivars (**Table 1**). The neighbor-joining tree showed that the cultivars were genetically close to the populations in the lower reaches of the Yangtze River (**Figure S1**).

# Genetic Structure

The effective population size *N*e estimated by MIGRATE ranged from 100 to 312 (*μ* = 10<sup>−4</sup>, **Table 1**). Thirteen of 32 wild populations showed evidence of historical or recent genetic bottlenecks, suggesting that the effects of genetic drift on these populations were prominent. The cultivars also showed a strong genetic bottleneck due to domestication (**Tables 1** and **2**).

After excluding the individuals with cultivar-like genotypes (SHCM, ZJHZ, HuNCS, YNJH, and FJFD), *F*-statistics revealed that the wild populations of *Z. latifolia* were genetically divergent (*F*st = 0.320, Table S1). The Mantel test suggested a weak but significant signal of isolation by distance among *Z. latifolia* populations (**Figure S2**, *r* = 0.21, *P* < 0.05). The outcomes of a Bayesian assignment with STRUCTURE indicated that *K* = 2 best describes the genetic structure of wild populations (**Figure 3A**). One group included the northern populations in the north and Northeast Plain of China, and the remaining southern populations formed another group mainly located in the Yangtze River Basin. However, the PCoA showed no distinct genetic divergence among wild populations, and the first three principal coordinates described 28.6% of the total variance. In addition, the Bayesian assignment of the southern populations revealed watershed-related genetic subdivisions in the Yangtze River Basin: the Hongze-Tai Lake region (Cluster I), the Poyang Lake region (Cluster II) and the Dongting Lake region (Cluster III) (**Figure 4A**). Two southern populations located in the Pearl River Basin (GDYJ and GXNN) showed a close ancestry with the populations in the Poyang Lake region cluster (**Figure 4A**).

In particular, we tested the genetic divergence between the two ecotypes of cultivars. Using the software STRUCTURE, we found no significant genetic divergence with *K* = 2. The PCoA, however, revealed genetic differences between the ecotypes (**Figure S3**). The microsatellite marker ZM24 could be an effective molecular marker to distinguish the two ecotypes of cultivars.

# Modeling the Demography of Z. latifolia Domestication

Following our hierarchical ABC strategy, we first compared the two predesigned scenarios in Analysis 1. Then the posterior parameters of the best-supported scenario were estimated, and model checking was performed by comparing the posterior data set with the observed data (**Table 2** and **Figure S4A**). Scenario A had a higher posterior probability (0.700) (**Figure 3B**), and it indicated that the cultivars diverged from the group of southern populations. In Analysis 2, 9 competing scenarios were further investigated (**Table 2** and **Figure S4B**). Scenario 7 showed the highest posterior probability (0.295), which indicated that the Hongze-Tai Lake region cluster first diverged from the other clusters in the southern group 1,930 generations (t3) ago. The Poyang Lake region cluster and Dongting Lake cluster then diverged 1,710 generations (t2) ago. The cultivars were domesticated once in the Hongze-Tai Lake region 1,400 generations (t1) ago (**Figure 4B** and **Table 2**).

# Sequence Variation and Phylogenetic Analysis

A total of 37 strains of *U. esculenta*, including 5 wild and 32 cultivated strains, were sequenced. The total length of the aligned sequences was 712 bp. Twenty-seven polymorphic sites were observed. Thirty-one haplotypes were found in the wild and cultivated accessions, and the haplotype diversity (*H*d) was 0.991. The nucleotide diversity (*θ*) was 0.01, and *θ*w was 0.01.

An ITS sequence of *U. alcornii* was downloaded from GenBank and used as the outgroup taxon. Another ITS sequence of *U. esculenta* (GenBank accession number: JX219372.1) was used as the control. The MP tree is shown in **Figure 5**. The structure of the MP tree showed significant divergence between the wild fungi and those from cultivars, indicating a single domestication event.

# DISCUSSION

The domestication of cultivated *Z. latifolia* is a typical example that demonstrates how a plant-fungus complex rapidly evolved under artificial selection. Our results showed that there was prominent genetic divergence between wild populations and cultivated accessions after long-term cultivation. A single domestication event occurred in the lower reaches of the Yangtze River during the Tang Dynasty (approximately 1,400 years ago), after which this vegetable was constantly cultivated and bred. The genetic divergence of the plant may be responsible for the differences between the ecotypes of cultivars, and the genetic variations in *U. esculenta* are likely associated with the diversification of cultivated varieties.

### The Possible Domestication Process of Z. latifolia

There are distinct phenotypic differences between wild and cultivated *Z. latifolia*. Compared with its wild ancestor, the cultivated plant usually has an erect plant architecture, broader leaves, and shortened internodes (Guo et al., 2007). The cultivated plants were reported to be vigorous in photosynthetic efficiency (Yan et al., 2013). These morphological and physiological changes together constituted the Domestication Syndrome of cultivated *Z. latifolia*, which can serve as a reference for distinguishing wild and cultivated plants.

We found significant genetic divergence between wild and cultivated *Z. latifolia* (**Figures 2A**, **B**). The extent of genetic differentiation between wild and cultivated plants (*F*st = 0.288) was even greater than the differentiation among wild populations (*F*st = 0.175). This phenomenon is usually observed in highly domesticated species such as wheat, rice or maize (Heun et al., 1997; Harter et al., 2004; Zeder et al., 2006; Zhao et al., 2013). Unlike the annual grain crops, the breeding of *Z. latifolia* relies on asexual reproduction. By utilizing natural mutants that occur in tillers or rhizomes, cultivated *Z. latifolia* was able to generate more than 100 varieties. Theoretically, when compared with annual crops bred *via* sexual reproduction, the evolutionary rate of cultivated *Z. latifolia* is relatively low. The low evolutionary rate and short domestication time (<2,000 years) should result in low genetic divergence between wild and cultivated plants, which is contrary to our finding. The unexpectedly high genetic divergence between wild and cultivated plants could be attributed to the breeding system of cultivated *Z. latifolia*. Because cultivated *Z. latifolia* would occasionally degenerate to the wild phenotype, farmers had to screen out the elite plants as genets every year. Although the clonal reproduction of cultivated *Z. latifolia* greatly restricted new mutants stemming from sexual propagation and blocked gene flow from wild accessions to cultivars, the strong annual artificial selection would nonetheless accumulate mutants, inducing significant genetic divergence between wild and cultivated plants.

Previous studies had inferred that cultivated *Z. latifolia* may have been domesticated in the Yangtze River Basin, but they did not provide any convincing evidence (Xu et al., 2008; Xu et al., 2015). Using microsatellite markers and model inference, our study located the origin site of cultivated *Z. latifolia* in the areas around Hongze Lake and Tai Lake (**Figure 4**). According to the ABC modeling, cultivated *Z. latifolia* diverged from ancestral populations 1,400 generations ago. Given 1 year per generation, the domestication event occurred 1,400 years ago during the Tang Dynasty. Furthermore, the genetic relationship of the fungi hinted that the single-season ecotype might have been domesticated first, and then the double-season ecotype was bred from the single-season ecotype. These results were in accordance with the historical records (Ke, 2008).

Based on current molecular data, we could not be sure whether *Z. latifolia* underwent historical pre-domestication as a food crop. There is no historical record or remaining landrace that can support the existence of pre-domestication. Nor did we find any plants in the wild with domestication traits similar to those of cereal crops. However, the vegetable cultivars of *Z. latifolia* could escape to natural habitats. We found typical escaped individuals in five provinces that were the main production areas of cultivated *Z. latifolia* (**Table 1**). Cultivated plants that cannot survive on their own in natural habitats often have maladaptive domestication traits (Campbell and Snow, 2009). Cultivated *Z. latifolia* has lost the ability to undergo sexual reproduction, which would be unfavorable for persistence in the wild. Instead, feral *Z. latifolia* relies on asexual growth to reproduce in natural habitats. In fact, it often occupies the shores of ponds or streams near fields by clonal propagation *via* underground rhizomes and detached shoots, just as we had noticed in FJFD and HuNCS. Moreover, without pollen flow, it seems unlikely that wild populations would be introgressed by feral cultivars.

### Genetic Diversity and Population Structure of Wild Z. latifolia

Xu et al. (2015) characterized the genetic diversity of the four species in the genus *Zizania* using 3 cross-specific amplified microsatellite markers. They found that *Z. latifolia* showed a relatively low genetic diversity relative to its annual relative *Z. palustris* in North America (*H*e = 0.374 vs *H*e = 0.630). However, they might have underestimated the genetic diversity of both species due to insufficient genetic markers. In contrast, another study investigated the genetic variation of *Z. latifolia* restricted to the middle reaches of the Yangtze River by using 10 microsatellites and found a high level of genetic diversity (*H*e = 0.532) (Chen et al., 2012). In our study, a relatively high genetic diversity (*H*e = 0.497) was detected.

Most of the fixation indices (*F*) for the high-latitude populations were significantly greater than 0 (**Table 1**), indicating a heterozygote deficit that could be attributed to inbreeding or outbreeding among ramets. However, several populations at low latitudes were composed of a few dominant heterozygous genotypes, which resulted in negative *F* values. This phenomenon might be attributed to local adaptation or a rapid population expansion from a few heterozygous ancestors *via* clonal growth. For example, 10 years ago, a few ancestors floating down from upstream colonized and formed the population JXDA (Zhao et al., field surveys). Previous phylogeographic analysis suggested that the genus *Zizania* originated in North America and then dispersed into eastern Asia *via* the Bering land bridge during the Tertiary (Xu et al., 2010). Hewitt (2004) proposed that range-wide patterns of population genetic diversity are usually shaped by past climate-driven range dynamics. In support of this view, the genealogical pattern based on the nuclear Adh1a gene of *Z. latifolia* indicated that high-latitude populations harbored more abundant haplotypes (Xu et al., 2008). However, in our study, the populations in high latitudes (the northern group, *H*e = 0.447) did not show a higher genetic diversity than those in low latitudes (the southern group, *H*e = 0.444).

Our results and previous studies all indicated strong genetic differences among populations (**Figure 2**). The Mantel test further revealed the effect of isolation by distance (IBD, **Figure S2**). IBD is expected to be the primary factor leading to a high level of genetic divergence, due to limited gene flow and the effects of genetic drift (Slatkin, 1987). This process was supported by our study on wild *Z. latifolia* populations. The aquatic habitat of *Z. latifolia* is discrete and patchy. These isolated wetlands restrict migration between populations, and as a consequence, aquatic plants usually exhibit a high level of genetic divergence among populations (Barrett et al., 1993). The dispersal of seeds or detached shoots by water flow is unpredictable, and these are unlikely to disperse *via* water currents between spatially highly isolated populations. In wind-pollinated species, the pollen can be carried for long distances, but most effective pollination occurs at a local scale (Willson, 1983). Nevertheless, the isolation between wetlands might be counteracted by seasonal flooding, resulting in a temporal connection within a water system. The genetic structure of *Z. latifolia* provided evidence that supported the effects of hydrological connectivity in shaping the genetic structure of aquatic plant species. Isolation by geographical barriers and genetic similarity along rivers was found in regional populations of *Z. latifolia* (Chen et al., 2012; Chen et al., 2017). The effects of hydrological unconnectedness on the genetic relationships of populations within a water system were also reported in another emergent macrophyte, *Oryza rufipogon* (Wang et al., 2008).

The genetic assignments and morphological characteristics of two populations (GDYJ and GXNN) at the leading edge seemed to be rather special (**Table 1** and **Figure 2D**). On the one hand, these populations are cultivar-like in phenotype, but they showed no sign of *U. esculenta* infection; on the other hand, they are genetically different from cultivated *Z. latifolia* and have heterozygous genotypes. Their small population size and low diversity might be attributed to the founder effect. Bayesian assignment showed that they could be grouped with the Poyang Lake region cluster, implying a migration across the geographical barrier (Nanling Mountains).

## The Association of Plant and Fungus Under Artificial Selection

As an obligate parasitic fungus, *U. esculenta* is harmful to its host under natural conditions. The fungus not only inhibits the emergence of inflorescence but also requires a large amount of carbohydrates to form the swollen shoot attractive to herbivorous insects. *U. esculenta* infection is not very common in wild populations, as we have only collected 5 samples across 32 range-wide populations. In our common garden trials, the fitness of infected wild plants is significantly lower than that of healthy ones (Zhao et al., data not shown), which implies a weak competitive ability of the infected plants.

The Red Queen hypothesis proposed that the parasite and host should keep on adapting and evolving to gain reproductive advantages (Van Valen, 1973). The antagonistic co-evolution between species would finally be terminated either by the extinction of one species or the emergence of symbiosis. The domestication of infected *Z. latifolia* seemed to accelerate the process of achieving symbiosis. On the one hand, persistent human selection delayed the development of the fungus and extended the mycelial stage (Zhang et al., 2017). On the other hand, humans gradually screened out plants with genotypes adapted to cultivated conditions (Yan et al., 2013). Moreover, through human effort, the distribution range of this plant-fungus complex was largely expanded.

We detected minor but significant genetic differentiation between the hosts of the two ecotypes of cultivated *Z. latifolia* (**Figure S3**), which was consistent with the pattern revealed by the sequences of Adh1 (Xu et al., 2008). However, the ITS sequences of *U. esculenta* strains from cultivated *Z. latifolia* showed unexpectedly high levels of diversity compared with a previous study (You et al., 2011). There might be inconsistent evolutionary rates in the parasite and host. The evolution of the host determines the responses to photoperiod and temperature, which resulted in the differences between the two ecotypes in cultivated *Z. latifolia* (Guo et al., 2015), and the faster evolutionary rate of the parasite may lead to the diversification of varieties (**Figure 5**, also see Ye et al., 2017).


Currently, it is still difficult to dissect the relative contributions of the host and parasite in shaping the Domestication Syndrome. We could only infer the changes in morphological traits during the domestication process based on the phylogeny of cultivated *U. esculenta* (**Figure 5**). First, the single-season ecotype was domesticated from a wild plant-fungus complex. This ecotype retained a few primary traits, such as light sensitivity, long internodes, and small swollen shoots. Then, artificial selection on one of the single-season varieties led to the emergence of the double-season ecotype, which was insensitive to light and could be harvested twice. The double-season ecotype was more advanced than the single-season ecotype, but it could only be cultivated in a restricted region. There is an urgent need to improve the fitness of the double-season ecotype.

# DATA AVAILABILITY STATEMENT

All datasets for this study are included in the article/**Supplementary Material**.

# AUTHOR CONTRIBUTIONS

YZ, ZS, JC and JR conceived the idea and designed the research project. YZ, QL and LZ collected the data. YZ performed the data analysis. YZ drafted the initial manuscript with contribution from JR and ZS. All the authors contributed critically to the discussion and edited the manuscript before submission.

# FUNDING

This study was supported by the National Natural Science Foundation of China (31600293) and the China Postdoctoral Science Foundation Grant (2015M571483).

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.01406/full#supplementary-material


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Zhao, Song, Zhong, Li, Chen and Rong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Unlocking the Genetic Diversity and Population Structure of a Wild Gene Source of Wheat, Aegilops biuncialis Vis., and Its Relationship With the Heading Time

*László Ivanizs1†, István Monostori1†, András Farkas1, Mária Megyeri1, Péter Mikó1, Edina Türkösi1, Eszter Gaál1, Andrea Lenykó-Thegze1, Kitti Szőke-Pázsi1, Éva Szakács1, Éva Darkó1, Tibor Kiss1, Andrzej Kilian2 and István Molnár1,3\**

### Edited by:

Petr Smýkal, Palacký University, Czechia

### Reviewed by:

Shigeo Takumi, Kobe University, Japan Ilaria Marcotuli, University of Bari Aldo Moro, Italy

### \*Correspondence:

István Molnár molnar.istvan@agrar.mta.hu

†These authors share first authorship

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 23 August 2019 Accepted: 01 November 2019 Published: 22 November 2019

### Citation:

Ivanizs L, Monostori I, Farkas A, Megyeri M, Mikó P, Türkösi E, Gaál E, Lenykó-Thegze A, Szőke-Pázsi K, Szakács É, Darkó É, Kiss T, Kilian A and Molnár I (2019) Unlocking the Genetic Diversity and Population Structure of a Wild Gene Source of Wheat, Aegilops biuncialis Vis., and Its Relationship With the Heading Time. Front. Plant Sci. 10:1531. doi: 10.3389/fpls.2019.01531

1 Agricultural Institute, Centre for Agricultural Research, Martonvásár, Hungary, 2 University of Canberra, Diversity Array Technologies, Canberra, ACT, Australia, 3 Institute of Experimental Botany, Center of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czechia

Understanding the genetic diversity of Aegilops biuncialis, a valuable source of agronomically useful genes, may significantly facilitate the introgression breeding of wheat. The genetic diversity and population structure of 86 Ae. biuncialis genotypes were investigated using 32,700 DArT markers with the simultaneous application of three statistical methods: neighbor-joining clustering, Principal Coordinate Analysis, and the Bayesian approach to classification. The collection of Ae. biuncialis accessions was divided into five groups that correlated well with their eco-geographic habitat: A (North Africa), B (mainly from Balkans), C (Kosovo and Near East), D (Turkey, Crimea, and Peloponnese), and E (Azerbaijan and the Levant region). The diversity between the Ae. biuncialis accessions for a phenological trait (heading time), which is of decisive importance in the adaptation of plants to different eco-geographical environments, was studied over 3 years. A comparison of the intraspecific variation in the heading time trait by means of analysis of variance and principal component analysis revealed four phenotypic categories showing association with the genetic structure and geographic distribution, except for minor differences. The detailed exploration of genetic and phenologic divergence provides an insight into the adaptation capacity of Ae. biuncialis, identifying promising genotypes that could be utilized for wheat improvement.

Keywords: Aegilops biuncialis, genetic diversity, DArTseq markers, population structure, hierarchical clustering, heading time

# INTRODUCTION

The genome of wild relatives of common wheat (*Triticum aestivum* L.) can be considered as a potential reservoir of gene variants for wheat improvement (Schneider et al., 2008; Zhang et al., 2015; Kishii, 2019). Interspecific hybridization is a promising approach to enlarge the genetic diversity of cultivated bread wheat by the chromosome-mediated transfer of the wild alleles present in related species (Jauhar, 1993; Jauhar and Chibbar, 1999; Kishii, 2019). Goatgrasses (*Aegilops*), which comprise 11 diploid, 10 tetraploid, and 2 hexaploid species, are the closest relatives of *Triticum* (van Slageren, 1994). Seven different genomes (D, S, U, C, N, M, and T) were identified in the diploid species, indicating the extreme genetic diversity of the genus. Accessions of several *Aegilops* species are highly resistant to important cereal diseases (Friebe et al., 1996; Marais et al., 2006; Bansal et al., 2017; Olivera et al., 2018), while others show good tolerance of abiotic stresses such as salt, drought, frost, and heat stress (Rekika et al., 1997; Zaharieva et al., 2001a; Zaharieva et al., 2001b; Molnár et al., 2004; Colmer et al., 2006; Dulai et al., 2014). Some alleles associated with these agronomic traits have already been introgressed from *Aegilops* into the wheat gene pool by the development of wheat-*Aegilops* hybrids and addition or translocation lines (Schneider et al., 2008; Marais et al., 2009; Liu et al., 2011a; Liu et al., 2011b; Olson et al., 2013). However, the genetic potential of *Aegilops* is still largely underutilized. In the case of biotic stresses, the 41 resistance genes that have so far been integrated into the wheat genome originated from only 30 accessions from 12 *Aegilops* species, most of them belonging to the primary gene pool of hexaploid wheat (Zhang et al., 2015). There are numerous *Aegilops* accessions in gene banks in various parts of the world (Monneveux et al., 2000) that have not yet been utilized for wheat improvement, so their introduction into breeding programs would be desirable.

The annual allotetraploid *Aegilops biuncialis* Vis. (2n = 4x = 28; UbUbMbMb) is largely autogamous and belongs to the tertiary gene pool of bread wheat (Friebe et al., 1996). *Ae. biuncialis* is native to Mediterranean and Western Asiatic regions and populations can be found in the Aegean, Turkey, Bulgaria, Cyprus, in the western part of the Fertile Crescent, in Cis- and Transcaucasia, and in the southern parts of Russia and Ukraine (van Slageren, 1994; Kilian et al., 2011). The annual rainfall in these habitats ranges from 225–1250 mm and some of them are characterized by a dry summer season with high temperature and high irradiance (van Slageren, 1994). The wide eco-geographical distribution suggests the presence of great diversity in the phenological traits of *Ae. biuncialis*. The heading time, as one of the main phenological factors, is crucial for the ecological adaptation of plants to local conditions. Early flowering, as an avoidance mechanism, may have a major role in the adaptation of plants to a Mediterranean climate by allowing them to escape drought (Araus et al., 2004). The adaptation strategy of wild emmer wheat populations to natural habitats that are characterized by frequent high temperatures in spring includes early flowering (Peleg et al., 2005), while adaptation to habitats with high altitudes involves late heading. Nevertheless, no data have been published on the intraspecific variation of heading time in accessions representing the broad ecological adaptability of *Ae. biuncialis*, which exhibits great genetic diversity.

The genetic diversity of wild species that are gene sources for cultivated wheat is a critical component of research in evolution, population genetics, conservation and breeding (Tahernezhad et al., 2010; Wang et al., 2013; Arora et al., 2017; Edet et al., 2018; Etminan et al., 2019; Singh et al., 2019). Little has been reported about the genetic diversity of *Ae. biuncialis*. Studies based on amplified fragment length polymorphism (AFLP), sequence-specific amplified polymorphism, random amplified polymorphic DNA, and inter-simple sequence repeat molecular markers were used to reveal the genetic variability within the genome of *Ae. biuncialis* accessions (Okuno et al., 1998; Monte et al., 2001; Nagy et al., 2006; Thomas and Bebeli, 2010), but Nagy et al. (2006) concluded that the marker data of only 10 accessions cannot demonstrate the existing genetic variability in *Ae. biuncialis*, so it is essential to investigate a larger diverse collection of genotypes with high marker density for the comprehensive study of genetic diversity. The 5–10 accessions so far assessed using a larger number of loci all originated from narrow geographical regions (Greece, the Iberian Peninsula or Transcaucasia), so do not represent the wide distribution of the species and do not allow an objective picture to be obtained of the genetic diversity existing within *Ae. biuncialis* (Okuno et al., 1998; Monte et al., 2001; Thomas and Bebeli, 2010). Moreover, different parts of the genome may have undergone different evolutionary changes (Kellogg et al., 1996). In the Triticeae, structural rearrangements frequently occur in the pericentric region of the chromosomes after polyploidization (Qi et al., 2006), as also revealed in *Ae. biuncialis* accessions (Molnár et al., 2011). 
Additionally, the frequency of recombination and the rate of interstitial deletions and insertions of gene loci are much higher in the distal third of the chromosomes than in the proximal two-thirds (Dvorak and Akhunov, 2005). As a consequence, only part of the genetic polymorphisms present in the species can be detected if the genome is not sufficiently covered by molecular markers.

Diversity Array Technology (DArT) was originally developed as a hybridization-based microarray platform to detect polymorphism at the recognition sites of methylation-sensitive restriction enzymes (Jaccoud et al., 2001; Wenzl et al., 2004). More recently, a combination of this genome complexity reduction approach of DArT technology with next-generation sequencing technologies resulted in a sequence-independent, low-cost, high-throughput genotyping by sequencing platform allowing the simultaneous detection of several thousands of polymorphic loci spread over the genome (Wenzl et al., 2004; Tinker et al., 2009; Sansaloni et al., 2011; Kilian et al., 2012). The advanced DArTseq technology has been efficiently utilized for genotyping, genetic diversity analysis, genome-wide association studies and linkage mapping in cultivated and wild relatives of wheat such as Triticale (Badea et al., 2011), rye (Bolibok-Brągoszewska et al., 2014; Gawroński et al., 2016), barley (Comadran et al., 2011), Tibetan wild barley (Cai et al., 2013), durum wheat (Baloch et al., 2017), *Aegilops tauschii* (Kumar et al., 2015), *Triticum monococcum* (Jing et al., 2009), and hexaploid wheat cultivars (Nielsen et al., 2014; Monostori et al., 2017).

In the present work, genetic diversity and its association with the phenotypic variation in heading time were studied in a collection of 86 *Ae. biuncialis* genotypes to obtain a better understanding of the background of its genotypic and phenotypic diversification. To achieve this goal, the genetic relationships between the accessions were analyzed using three statistical methods after the DArTseq genotyping of the plants. Furthermore, the phenological variation pattern was compared with the genetic structure and eco-geographical distribution to obtain information on possible correlations between them. The comprehensive study of how changes in phenological traits relate with genetic diversity will facilitate the utilization of the enhanced adaptability of *Ae. biuncialis*, especially in the light of climate change.

# MATERIALS AND METHODS

### Plant Material

Eighty-six wild *Ae. biuncialis* Vis. (2n = 4x = 28, UbUbMbMb) accessions, collected from 64 sites in 16 countries from Libya to Azerbaijan, were genotyped together with the Mv9kr1 wheat accession on a DArTseq® platform (**Table 1** and **Supplementary Table S1**). This population, originating from a wide range of ecological habitats, is representative of the geographical distribution of the species. The *Aegilops* genotypes were provided by the following germplasm collections: Genebank of the Agricultural Institute, Agrártudományi Kutatóközpont (ATK) (Martonvásár, Hungary), Wheat Genetics Resource Center (Kansas State University, USA), United States Department of Agriculture Agricultural Research Service (Beltsville, MD, USA), and Institute of Plant Genetics and Crop Plant Research (Gatersleben, Germany). The *Ae. biuncialis* accessions were maintained and grown in the Department of Plant Genetic Resources of ATK Mezőgazdasági Intézet. To avoid hybridization and intercrossing between the genotypes, the isolated accessions were multiplied by self-fertilization.

### Field Experiments

The collection of 86 *Ae. biuncialis* accessions was grown under field conditions in three consecutive seasons (2015–16, 2016–17, and 2017–18). The seeds were planted in fall in fields belonging to the Agricultural Institute of ATK (Breeders nursery, Martonvásár, Hungary, geographic coordinates: 47°19'39"N, 18°47'01"E). The soil type at each location was chernozem. Each genotype was grown in a 6 m² plot with 6 × 3 m rows, 50 seeds per row, and a row distance of 0.15 m, as previously described by Mikó et al. (2014). The heading time of each plot was recorded and defined as the period elapsing between January 1 and the day when 50% of the inflorescences reached the DEV59 developmental stage (Tottman, 1987).

The geographic origin of the Ae. biuncialis genotypes can be found in the Plant Inventory Books: USDA ARS, European Wheat Database and Genesys Accession browser. Climatic conditions at the site of origin of the Ae. biuncialis accessions, expressed according to the Köppen-Geiger classification system.
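The heading-time definition can be illustrated with a minimal sketch. It assumes that January 1 counts as day 0, which the text does not specify; the function name and the example dates are hypothetical and are not taken from the field records.

```python
from datetime import date


def heading_time_days(heading_date: date) -> int:
    """Days elapsed between January 1 of the same year and the heading date.

    Assumption: January 1 itself is day 0. Add 1 if counting should start at 1.
    """
    return (heading_date - date(heading_date.year, 1, 1)).days


# Hypothetical dates: the same calendar day in a leap and a non-leap year
print(heading_time_days(date(2016, 5, 10)))  # 2016 is a leap year
print(heading_time_days(date(2017, 5, 10)))
```

Because the count starts on January 1, the same calendar date gives a value one day larger in a leap year such as 2016, which is worth keeping in mind when comparing seasons.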

Additionally, to monitor the stay-green ability of the plants, the leaf chlorophyll content of each *Ae. biuncialis* genotype was determined twice (on May 12 and 31, 2018) during the grain-filling period using a Soil Plant Analysis Development (SPAD) 502 chlorophyll meter (Minolta Camera Co., Ltd, Tokyo, Japan). At least 10 leaves were measured from randomly selected plants for each genotype on each occasion. For each leaf, the average of three SPAD readings around the midpoints of the flag leaf was taken.

### DNA Extraction and Genotyping

Genomic DNA was extracted from young leaves of 86 *Ae. biuncialis* genotypes together with hexaploid wheat (*Triticum aestivum* L.) genotype Mv9kr1 using Quick Gene-Mini80 (FujiFilm, Japan) with a QuickGene DNA tissue kit (FujiFilm, Japan) according to the manufacturer's instructions (Cseh et al., 2011).

The DNA of the *Ae. biuncialis* accessions and the Mv9kr1 wheat genotype was sent to the commercial high-throughput genotyping service of Diversity Arrays Technologies Pty. Ltd., Australia (http://www.diversityarrays.com). The "wheat DArTseq™ 1.0" genotyping by sequencing service was optimized for wheat. This method used the combination of complexity reduction methods developed initially for array-based DArT and multiplex sequencing on an Illumina HiSeq2500 instrument (Sansaloni et al., 2011; Kilian et al., 2012). Briefly, DArT markers are DNA fragments obtained by a genome complexity reduction method that comprises the digestion of genomic DNA by *Pst*I/*Taq*I endonucleases, ligation of the genomic fragments to a *Pst*I adapter, amplification using a primer complementary to the adapter sequence, and transformation into *Escherichia coli*. DNA fragments obtained by this genome complexity reduction method were sequenced by Illumina HiSeq2500.

The markers developed by the above mentioned "wheat DArTseq™ 1.0" genotyping by sequencing platform have been included in **Supplementary Data S1**. The obtained silicoDArT marker set was filtered on the basis of individual marker-related statistics. After removing markers with inappropriate quality control parameters, including call rate <90%, reproducibility <95% and minor allele frequency < 2%, the 0/1 binary matrix of the remaining 32,700 markers was used in the subsequent analysis.
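The three quality-control thresholds can be sketched as a simple filter. This is an illustrative sketch only, not the pipeline used by the genotyping service: the data layout (a dictionary of 0/1 calls with `None` for missing data), the function names, and the toy markers are assumptions. For dominant presence/absence markers, "minor allele frequency" is approximated here as the frequency of the rarer score.

```python
from typing import Dict, List, Optional

Calls = List[Optional[int]]  # 1 = present, 0 = absent, None = missing


def call_rate(calls: Calls) -> float:
    """Fraction of samples with a non-missing score."""
    return sum(c is not None for c in calls) / len(calls)


def minor_score_frequency(calls: Calls) -> float:
    """Frequency of the rarer score (0 or 1) among non-missing calls."""
    scored = [c for c in calls if c is not None]
    if not scored:
        return 0.0
    p = sum(scored) / len(scored)
    return min(p, 1.0 - p)


def filter_markers(markers: Dict[str, Calls],
                   reproducibility: Dict[str, float],
                   min_call_rate: float = 0.90,
                   min_repro: float = 0.95,
                   min_maf: float = 0.02) -> Dict[str, Calls]:
    """Keep markers that pass all three thresholds stated in the text."""
    kept = {}
    for name, calls in markers.items():
        if call_rate(calls) < min_call_rate:
            continue
        if reproducibility[name] < min_repro:
            continue
        if minor_score_frequency(calls) < min_maf:
            continue
        kept[name] = calls
    return kept


# Hypothetical toy data: four markers scored on ten samples
markers = {
    "m1": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],        # passes all filters
    "m2": [1, None, None, 1, 0, 1, 0, 1, 1, 0],  # call rate 0.8
    "m3": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],        # monomorphic
    "m4": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],        # low reproducibility
}
repro = {"m1": 0.99, "m2": 0.99, "m3": 1.00, "m4": 0.90}
print(list(filter_markers(markers, repro)))
```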

### Analysis of Population Structure and Phylogenetic Relationship

In order to assess the population structure of the *Ae. biuncialis* accessions, three different statistical methods were adopted and compared. First, a clustering approach based on the Bayesian model was applied to estimate the real number of subpopulations (K) using the admixture model of STRUCTURE software 2.3.4 with correlated allele frequencies (Pritchard et al., 2000). Three independent runs were performed for each hypothetical number of subpopulations (K) from one to eight applying a burn-in period of 100,000 iterations followed by 100,000 Markov Chain Monte Carlo iterations to obtain a precise parameter estimate. The most probable number of subpopulations was determined by means of the ΔK method using STRUCTURE HARVESTER software (Evanno et al., 2005). Each genotype was assigned to one subpopulation based on its membership probability. Principal coordinate analysis was then carried out using PAST software version 3.12 to visualize the genetic stratification within the *Ae. biuncialis* collection based on genetic correlations among individuals (Hammer et al., 2001). Thirdly, the phylogenetic relationship between the 86 accessions was estimated using PAST software with the neighbor-joining method based on marker data. The genetic distance matrix based on Jaccard's similarity coefficient was applied to construct a phylogenetic tree. The neighbor-joining dendrogram was generated using bootstrap analysis with 1,000 replicates. The wheat genotype Mv9kr1 was used as outgroup species in order to define the root position of the phylogenetic tree.
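The ΔK criterion can be sketched in a few lines. This is a simplified illustration, not the STRUCTURE HARVESTER code: it assumes ΔK is computed as the absolute second-order difference of the mean log-likelihood, ln P(D), divided by the standard deviation of ln P(D) across the replicate runs at that K. The log-likelihood values below are invented purely for demonstration.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K for every K that has both neighbours K-1 and K+1.

    lnp maps each K to the ln P(D) values of its replicate runs.
    """
    means = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 not in means or k + 1 not in means:
            continue  # the second difference is undefined at the ends
        sd = stdev(lnp[k])
        if sd == 0:
            continue  # Delta K is undefined when replicate runs are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical ln P(D) values, three runs per K (not data from this study)
runs = {
    1: [-1000.0, -1001.0, -999.0],
    2: [-900.0, -902.0, -898.0],
    3: [-700.0, -701.0, -699.0],
    4: [-690.0, -692.0, -688.0],
    5: [-685.0, -686.0, -684.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Note that ΔK cannot be evaluated for the smallest and largest K tested, so a scan from K = 1 to K = 8 yields ΔK values only for K = 2 to K = 7.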

### Statistical Analysis

The phenotypic values of the *Ae. biuncialis* accessions were grouped based on the subpopulations obtained in the STRUCTURE cluster analysis. The variations in heading time among the subpopulations were analyzed for 3 years using descriptive statistics: mean, standard deviation, boxplot, and histogram (Excel 2016, Microsoft Company).

One-way analysis of variance (ANOVA) (SPSS 16.0, IBM) was conducted to compare the heading time of the *Ae. biuncialis* subpopulations in each year at the P < 0.05 significance level. In addition, the significant differences in the heading time of each subpopulation were examined among the 3 years by means of one-way ANOVA (SPSS 16.0). Bivariate correlation of the heading times of the *Ae. biuncialis* accessions among the 3 years was analyzed pair-wise by means of Pearson coefficient (SPSS 16.0). Linear regression analysis was carried out for each subpopulation among the years.
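For readers unfamiliar with the test, the F statistic behind a one-way ANOVA can be sketched as follows. This is a plain-Python illustration and not the SPSS procedure used in the study; the group labels reuse the subpopulation names, but the heading-time values are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical heading times (days from January 1) for three subpopulations
heading = {
    "A": [120, 122, 124],
    "D": [130, 132, 134],
    "B": [140, 142, 144],
}
f_stat, df_b, df_w = one_way_anova_f(list(heading.values()))
print(f_stat, df_b, df_w)
```

The resulting F value would then be compared with the F distribution for the given degrees of freedom to obtain the P value, a step that statistical packages perform automatically.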

The principal component analysis of the Statistica 6 software was used to study associations between the heading times of the *Ae. biuncialis* accessions by visualizing the grouping pattern of phenotypic variations.

The stay-green trait (delayed foliar senescence) was determined by calculating differences between the SPAD values measured on the two dates. Pearson and rank correlation (Spearman, Kendall tau) coefficients were applied to analyze the dependence and relationship between the heading time and the differences in SPAD values at the P < 0.01 significance level (SPSS 16.0).
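The stay-green calculation can be sketched as below. The direction of the difference (first date minus second date) is an assumption, since the text only states that differences between the two dates were calculated; the function names and all numbers are hypothetical, and only two leaves per genotype are shown for brevity, whereas at least 10 were measured in the study.

```python
from math import sqrt
from statistics import mean
from typing import List


def genotype_spad(leaves: List[List[float]]) -> float:
    """Mean over leaves of the per-leaf mean of three SPAD readings."""
    return mean(mean(readings) for readings in leaves)


def spad_decline(first_date: List[List[float]],
                 second_date: List[List[float]]) -> float:
    """SPAD on the first date minus SPAD on the second date (assumed sign)."""
    return genotype_spad(first_date) - genotype_spad(second_date)


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical readings for one genotype on the two measuring dates
may_12 = [[50.0, 52.0, 54.0], [48.0, 50.0, 52.0]]
may_31 = [[40.0, 42.0, 44.0], [38.0, 40.0, 42.0]]
print(spad_decline(may_12, may_31))
```

A small decline would be read as a stay-green phenotype; correlating the declines with heading times across genotypes, for example with `pearson_r`, mirrors the analysis described above.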

# RESULTS

### Population Stratification

A total of 47,777 polymorphic dominant silicoDArT markers were generated from 86 *Ae. biuncialis* accessions originating from different eco-geographical habitats (**Supplementary Data S1**). The silicoDArT marker set was filtered for various quality parameters (call rate, reproducibility, minor allele frequency) and reduced to 32,700 markers. Based on the marker data, the genetic diversity in the *Ae. biuncialis* collection was analyzed using the Bayesian clustering approach performed with STRUCTURE software. After the STRUCTURE analysis had been run for K = 1 to K = 8, the most likely number of subpopulations was estimated with the STRUCTURE Harvester software following the ΔK method. The maximum ΔK value occurred at K = 5 (**Figure 1** and **Supplementary Figure S1**). Each genotype was assigned to one of the subpopulations based on a membership probability coefficient > 0.51.

The *Ae. biuncialis* accessions were grouped into five clusters A, B, C, D, and E with 6, 14, 3, 31, and 29 accessions, respectively. Three accessions had mixed allelic patterns that could not be assigned to any of the subpopulations with a probability of more than 50% (mixed group). The accessions clustered in accordance with their origin: the accessions in cluster A originated mainly from North Africa; cluster B represented genotypes from the Central Balkans, North Greece, and West Turkey; the three genotypes in cluster C originated from different geographic regions (Balkans and Near East); cluster D contained accessions from Asia Minor, the Crimean Peninsula, and Southern Greece (Peloponnese); while cluster E comprised genotypes from the Levant region (Jordan, Syria, and South Turkey) and Azerbaijan (**Figures 2** and **3**).
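The assignment rule can be sketched as follows. It is a minimal illustration that assumes a cut-off of 0.5 on the largest membership coefficient, with accessions below the cut-off placed in the mixed group; the function name and the membership vectors are hypothetical, apart from the 0.97 coefficient reported for MvGB642 in cluster C.

```python
from typing import Dict


def assign_subpopulation(q: Dict[str, float], threshold: float = 0.5) -> str:
    """Return the cluster with the largest membership coefficient,
    or 'mixed' when no coefficient exceeds the threshold."""
    best = max(q, key=q.get)
    return best if q[best] > threshold else "mixed"


# Hypothetical membership vectors; only the 0.97 for cluster C is from the text
accession_q = {
    "MvGB642": {"A": 0.01, "B": 0.01, "C": 0.97, "D": 0.00, "E": 0.01},
    "example_admixed": {"A": 0.05, "B": 0.40, "C": 0.05, "D": 0.45, "E": 0.05},
}
for name, q in accession_q.items():
    print(name, assign_subpopulation(q))
```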

Based on the membership probability of genotypes, subpopulations A, B, and C could be clearly separated, whereas D and E were less distinct due to their similar share in the genome of several accessions (**Figure 2**). Subpopulation A was differentiated and sub-divided according to the membership probability of each cluster: three of the six accessions (AE78689, AE84388, and AE84484) segregated into a homogeneous subgroup, but the other three formed a heterogeneous subgroup. Subpopulation A appeared in different proportions in the genome of the two subgroups. Subpopulation D contributed to the gene pool of some accessions in subpopulation B whereas the B subpopulation was also detectable in several genotypes in subpopulation D. These accessions within the two subpopulations occupy a common eco-geographic area in Greece and West Turkey (Aegean Sea region), indicating admixture events between subpopulations B and D. In subpopulation C, other subpopulations also significantly contributed to the genome of two accessions (TA2782 and MvGB377). Subpopulation B was revealed in genotype TA2782 from Kosovo, whereas subpopulation E was detected in the genome of MvGB377, originating from Jordan. The third, Syrian genotype (MvGB642) had a 97% membership probability in cluster C.

### Analysis of Genetic Diversity

To refine the genetic relationship between the *Ae. biuncialis* accessions, the genetic similarity within the population was calculated based on Jaccard's coefficient. The average genetic similarity between all 86 genotypes was estimated to be 0.45 and ranged from 0.28 to 0.96, indicating that the *Ae. biuncialis* population had large genetic variability (data of the genetic similarity matrix not shown). The phylogenetic dendrogram split the *Ae. biuncialis* collection into five separate branches, which had a good fit with the subpopulations (A-E) revealed by STRUCTURE analysis (**Figure 2**). The pair-wise genetic similarity of the accessions was studied within each subpopulation, of which cluster D had the lowest average value (0.51) and the widest range (0.41–0.96), indicating its broad genetic diversity. The *Ae. biuncialis* accessions were genotyped together with the wheat accession Mv9kr1, which was used as an outgroup accession for constructing the phylogenetic dendrogram for the *Ae. biuncialis* collection.
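Jaccard's coefficient for two accessions scored with presence/absence markers can be sketched as follows. The marker profiles are hypothetical, missing calls are not handled, and returning 0.0 when neither accession carries any marker is a convention chosen for this sketch rather than something stated in the text.

```python
from typing import List


def jaccard_similarity(a: List[int], b: List[int]) -> float:
    """Shared presences divided by markers present in at least one accession."""
    if len(a) != len(b):
        raise ValueError("marker profiles must have the same length")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / either if either else 0.0


# Hypothetical 0/1 profiles of two accessions over five markers
acc_1 = [1, 1, 0, 0, 1]
acc_2 = [1, 0, 0, 1, 1]
similarity = jaccard_similarity(acc_1, acc_2)
distance = 1.0 - similarity  # corresponding Jaccard distance
print(similarity, distance)
```

Joint absences (0 in both accessions) do not count toward the similarity, which suits dominant markers where a shared absence is weak evidence of common ancestry.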

Based on the STRUCTURE and neighbor-joining analyses results, the subpopulations could be further divided into subgroups (or intra-cluster lineage), where genotypes from common areas share similar genetic ancestry and show high genetic similarity (**Figure 2**). This intra-cluster differentiation was in agreement with the provenance of the genotypes within the larger geographic regions. Cluster B could be separated into subgroups originating mainly from Bosnia and Herzegovina, the central area of the Balkans and coast of the Aegean Sea (North Greece and Northwest Turkey). Subpopulation D was divided into clades springing from Asia Minor, Crimea and region of the Aegean Sea (South Greece and West Turkey). Based on their genetic similarity, three accessions from Cyprus and Syria (MvGB379, PI483007, and PI483013) could be discriminated clearly in subpopulation D as showing a close genetic relationship with subpopulation E. The latter could be partitioned into distinct subgroups representing their geographic origin, such as Jordan, Azerbaijan, Syria and the southern part of Turkey.

Principal coordinate analysis was used as an alternative way of analyzing and visualizing the population structure. The first three principal coordinates explained 38.428% of the genotypic variance (PCo1: 21.46%, PCo2: 10.81%, PCo3: 6.15%), and also discriminated the accessions of clusters A, B, D and E, confirming the STRUCTURE result (**Figure 4**, **Supplementary Figures S2** and **S3**).

### Phenotypic Evaluation of Heading Time

The heading time, as a phenological trait, was grouped according to the subpopulations obtained by cluster analysis (**Supplementary Table S2**). One-way ANOVA was carried out in order to study the differences between the heading time of the five subpopulations within each year, and to analyse the differences in each subpopulation among the 3 years. Although subpopulations A and C had much smaller sample size than the others, they were not excluded from the statistical analysis.

The heading time within each subpopulation did not differ significantly between 2016 and 2017, while the heading time values in the 2018 season were considerably shorter than in the other 2 years (**Figure 5** and **Supplementary Figure S4**), which was confirmed by the correlation analysis (**Figure 6**). The difference could be explained by the very different precipitation and temperature conditions during the spring months that are critical for the flowering period (**Supplementary Figure S5**). One-way ANOVA of the heading time within 1 year pointed out significant differences between four subpopulations (A, B, D, and E) in the *Ae. biuncialis* collection (**Figure 5** and **Supplementary Figure S4**). Subpopulation C could not be separated from subpopulations D and E, and in 2018 it did not differ from subpopulations A, D, and E.

To visualize the correlation between genetic diversity and phenotypic variation, principal component analysis was performed for the heading time of the *Ae. biuncialis* accessions (**Figure 7**, **Supplementary Figures S6** and **S7**). The first two components explained 97.29% of the phenotypic variance and grouped the majority of genotypes into four phenotypic categories, which corresponded with the genetic structure and geographic distribution of the population even though the categories were not clearly differentiated (**Figure 7**). Since the third principal component represented only 2.72% of the variation, it was not taken into account in the phenotypic classification. Subpopulations A and E corresponded to group I and group II, respectively, exhibiting the earlier heading phenotypes, whereas subpopulation D largely coincided with the intermediate group III. Subpopulation B corresponded to group IV, which had the latest heading date (**Figure 7**). The heading time of subpopulation C could not be classified in a separate phenotypic group, which was in accordance with the results of one-way ANOVA.

FIGURE 5 | Boxplot chart of the heading times of the Ae. biuncialis subpopulations (A–E) showing the mean, median and range of the phenotypic data in 3 years. The heading time trait was expressed in number of days elapsed from January 1 to the DEV59 developmental stage. Different letters indicate significant differences between the subpopulations within the same year at P < 0.05, using one-way analysis of variance. \*Indicates the heading time in 2018 differs significantly from the others within relevant subpopulation at the P < 0.05 level.

FIGURE 6 | Correlation of the heading times of Ae. biuncialis accessions among three different seasons. Heading time was defined as the number of days required to reach the DEV59 developmental stage from January 1. Color codes indicate different subpopulations (A–E) identified from STRUCTURE analysis. Number of accessions (n), clustered into one subpopulation, are represented in the figure. The R2 values of pair-wise correlation for each subpopulation among different seasons are included in the figure.

### Analysis of Delayed Foliar Senescence

The relationship between delayed foliar senescence (stay-green trait of plants) and the heading time was also investigated. At the first measuring date (May 12), the average SPAD values of the genotypes ranged between 42 and 61, indicating that these plants were still green (**Supplementary Table S3**, **Supplementary Figures S8** and **S9**). In the second phase (May 31) SPAD values could not be recorded for seven *Ae. biuncialis* accessions, because the green color intensity of their leaves was below the detection level due to their advanced maturity. For the other genotypes, a significant correlation was revealed between the heading time and the difference in SPAD values based on the Pearson and rank coefficients. This revealed eight genotypes which had early heading times, but exhibited the small differences in SPAD values that are characteristic of the stay-green phenotype (**Supplementary Table S3** and **Supplementary Figure S10**).

# DISCUSSION

### Genetic Structure of the Aegilops biuncialis Collection

The present study is the first report on the use of the DArTseq platform to estimate the intraspecific genetic diversity of *Ae. biuncialis* on accessions representing the natural distribution area of the species. Three statistical methods gave consistent results for the genetic diversity and population structure: The *Ae. biuncialis* accessions clustered into five subpopulations in accordance with their place of origin. Three genotypes (MvGB380, TA2784, and TA2659) could not be classed into any of the distinct subpopulations based on their membership probability and these showed the lowest genetic similarity to the other accessions, indicating that this mixed group was the most distant genetically from the other genotypes in the population.

Based on random amplified polymorphic DNA markers, *Ae. biuncialis* accessions from Transcaucasia were classed by Okuno et al. (1998) in two clusters, which corresponded with groups from the eastern shores of the Black Sea and the western shores of the Caspian Sea, while *Ae. biuncialis* subpopulations from the central and southern parts of the Iberian Peninsula were clearly discriminated by AFLP markers that correlated with their geographical distribution (Monte et al., 2001). Based on simple sequence repeat, AFLP, and single nucleotide polymorphism markers, *Ae. tauschii* genotypes were grouped in two lineages that showed clear differentiation (Pestsova et al., 2000; Mizuno et al., 2010; Jones et al., 2013; Wang et al., 2013; Arora et al., 2017; Singh et al., 2019). A large range of genetic similarity (0.61 to 0.99) was reported for *Ae. tauschii* by Sohail et al. (2012), who used DArT markers to study the genetic relationship between 81 genotypes representing the eco-geographic distribution of the species. In the present study the *Ae. biuncialis* collection had a relatively wide genetic variability (0.28 to 0.96). When comparing the genetic diversity of various *Aegilops* species, Kilian et al. (2007) found lower genetic similarity in *Aegilops speltoides* (0.656) and *Ae. tauschii* (0.75) than in *Aegilops sharonensis* (0.84), *Aegilops searsii* (0.825), and *Aegilops longissima* (0.812). Species in the *Sitopsis* section, except for *Ae. speltoides*, have limited distribution in the Levantine, representing a smaller geographical area in the eastern Mediterranean Basin (van Slageren, 1994; Mendlinger and Zohary, 1995; Millet, 2007; Kilian et al., 2011). The narrow geographic range of *Ae. sharonensis*, restricted to the coastal areas of Levantine, could explain the high average genetic similarity (0.82) estimated between the genotypes (Olivera and Steffenson, 2009; Olivera et al., 2010). 
AFLP marker analysis on a set of *Aegilops geniculata* genotypes revealed that accessions originating from the main regions of the Mediterranean Basin could be classed into seven groups (Arrigo et al., 2010). Six of the seven groups had a strong bio-geographical structure, derived from the northern parts of the Mediterranean, whereas the remaining one exhibited less genetic structuring, representing the southern Mediterranean areas, including North Africa, Levantine, and the Bosphorus Strait. In contrast, the *Ae. biuncialis* population in the southern and eastern Mediterranean regions could be divided into distinct subpopulations, where subpopulation A occupied North Africa, subpopulations E and C were characteristic of Levantine and subpopulation D was located mainly in Asia Minor. The *Ae. biuncialis* collection, fragmented in the eastern part of the Mediterranean Basin, showed a distinct phylogeographical pattern, indicating the extremely large genetic diversity of the species.

The subpopulations of the *Ae. biuncialis* collection could be further divided into subgroups, resulting in intra-cluster differentiation, which may explain the admixture events between the genotypes (**Figure 2**). Three of the six accessions in subpopulation A had a close genetic relationship, suggesting that the higher rate of gene flow between them may have resulted in a homogeneous subgroup. They differ phenotypically from the rest of the subpopulation as they have an earlier heading date. As this homogeneous subgroup is derived from the northwest part of Libya, they were presumably isolated from the other members of the subpopulation while adapting to the arid climate. The natural distribution areas of certain accessions, representing an admixture of subpopulations B and D, overlap in the region of the Aegean Sea, suggesting that the genome shared by the two subpopulations was formed due to the intercrossing of parental genotypes derived from the different subpopulations. Three accessions (MvGB379, PI483007, and PI483013) were grouped in subpopulation D based on their membership probability, whereas they also showed a closer phylogenetic relationship with some Syrian genotypes from subpopulation E. As both subpopulations have made similar contributions to the genome of these accessions, their genetically similar germplasm and recent common ancestor suggest that the two clusters have not yet fully separated phylogenetically or that intraspecific hybridization has occurred between them. Subpopulations B and C contributed in similar proportions to the genome of the accession TA2782 originating from Kosovo, postulating an earlier admixture event between ancestral genotypes belonging to different subpopulations. Although three genotypes are not enough to represent the native distributional range of subpopulation C, the TA2782 accession seems to have developed outside this area and may have reached Kosovo as a result of migration. 
Genotypes AE75182 and TA10058 have close genetic similarity (0.95) in spite of being from distant regions. TA10058 originated from Azerbaijan, but was found well away from the natural distribution area of its subpopulation, so it might be an introduced accession. Herbivores, as vectors, can support the long-distance dispersal of plants (Janzen, 1984; Vellend et al., 2003; Römermann et al., 2005), which is enhanced by human activities, such as pastoralism (Blondel, 2006). On the other hand, human-caused climate change may have an impact on the expansion of numerous species (Parmesan and Yohe, 2003; Thomas et al., 2004; Hickling et al., 2006; Jarvis et al., 2008; Devictor et al., 2012). The occurrence of two genotypes (PI614609 and PI614611) in the Crimean Peninsula corresponds to the prognosis that predicts a probable shift in the natural range of *Ae. biuncialis* to the Azov Sea in response to global warming (Ostrowski et al., 2016).

The large genetic diversity of *Ae. biuncialis*, enabling its adaptation to various climatic conditions (Zaharieva et al., 2004; Zaharieva and Monneveux, 2006), may serve as a rich source of potential genes to improve the adaptive capacity of cultivated wheat. It has been reported that some *Ae. biuncialis* accessions have good tolerance of heat and drought stress (Molnár et al., 2004; Dulai et al., 2014) and others are promising for resistance to rust diseases (Molnár-Láng et al., 2014; Olivera et al., 2018).

### Phenotypic Diversity Pattern

Based on one-way ANOVA and correlation analysis (**Figures 5** and **6**), the heading times differed among subpopulations in a similar way each year, even though the heading times of all subpopulations were shorter in 2018 than in 2016 and 2017. The extreme weather conditions observed in the 2018 season can explain the low level of correlation and the lower phenotypic values. Significant differences in heading date were found between four subpopulations (A, B, D, and E), corresponding with genetic relatedness and geographic origin, with the exception of subpopulation C, which could not be separated from subpopulations D and E (**Figures 5** and **7**). In subpopulation C, the variation pattern of the heading date suggested that the admixed genomes of these accessions represented a diverse gene pool and had different geographic origins. However, subpopulation C contained relatively few accessions, so no additional conclusions can be drawn on its phenotypic diversity.

Subpopulations with different phenotypes originate from eco-geographic regions that are characterized by different agro-climatic conditions. Subpopulation A from North Africa had an early phenotype, which is prevalent in the semi-arid area, whereas subpopulations B and D from the Balkans and Asia Minor exhibited later heading times, which are predominant in the Continental and Mediterranean regions, respectively. The phenological trait patterns of the subpopulations correlated largely with the climatic conditions found in the relevant geographical areas, suggesting that the heading time diversification might be an adaptive change for *Ae. biuncialis*. The broad genetic diversity and its association with adaptive changes in heading date allowed the species to spread widely. Nevertheless, genealogically-based chloroplast DNA analysis could lead to a more comprehensive identification of the ancestral and derived sublineages and a better understanding of the course of the geographic spread of *Ae. biuncialis*.

The *Aegilops triuncialis* accessions in California, which evolved due to the independent introduction of a few Eurasian genotypes (Peters et al., 1996), were grouped using microsatellite markers into three lineages, each of which had different eco-geographical distribution and showed significant differences in flowering time (Meimberg et al., 2006; Meimberg et al., 2010). However, the phenotypic diversity pattern of the invasive lineages cannot express the intraspecific variation for flowering time because the introduced accessions represent only a part of the natural distribution area of *Ae. triuncialis*.

The correlation between the population structure and flowering time diversification in a collection of 200 *Ae. tauschii* accessions was studied by Matsuoka et al. (2008) using chloroplast DNA genealogical analysis. Two of the four major haplogroups (HG7 and HG16) were phenotypically heterogeneous, representing a large proportion (83.5%) of the population. When comparing the DArT marker-based genetic divergence and phenological variation pattern of *Ae. tauschii*, Matsuoka et al. (2015) found that the intraspecific lineage structure was associated with changes in the flowering trait. This finding is supported by the present study, as the *Ae. biuncialis* subpopulations A, B, D, and E showed significant differences in heading time. The broad intraspecific diversity of the heading phenotypes could be introduced into bread wheat by means of chromosome-mediated gene transfer in order to diversify the maturation time and prolong the harvest period.

Besides the heading time, the stay-green trait, which is associated with foliar senescence characters, was also determined in the *Ae. biuncialis* accessions (Bachmann et al., 1994; Thomas et al., 1996). Delayed senescence allows the leaves to maintain photosynthetic activity during the active grain-filling period, thus ensuring the incorporation of assimilates to the grain (Thomas and Howarth, 2000; Spano et al., 2003; Kobata et al., 2015). The stay-green phenotype is associated with the ability of plants to maintain grain development longer, resulting in better yields, especially under water-limited conditions (Borrell et al., 2000; Christopher et al., 2008). In the present experiments, seven of the 86 *Ae. biuncialis* genotypes had SPAD values below the detection limit, due to premature leaf senescence. Eight accessions that exhibited the stay-green phenotype with early flowering were identified as potential gene sources for wheat breeding programs aimed to improve drought tolerance. Terminal drought during the grain-filling period can be avoided through accelerated reproduction, while the crop yield may be increased due to the preservation of the green leaf area.

# CONCLUSIONS

This paper confirmed that the DArTseq genotyping approach can be used efficiently to investigate the genetic diversity and population structure of *Ae. biuncialis*. The population structure defined using a combination of three clustering methods was in agreement with the geographic origin of the accessions. The heading trait diversity was also found to correlate largely with the genetic structure and geographic distribution. The wide-ranging genetic diversity of *Ae. biuncialis* may make it a potential genetic source for wheat improvement. The large intraspecific variation for heading time offers opportunities for the maximization of grain yield by shortening the plant life cycle, resulting in the termination of the grain-filling period before the drought season. The comparative analysis of genetic and phenological diversity patterns enables the selection of *Ae. biuncialis* genotypes adapted to a wide variety of ecological habitats, which could be used to breed new wheat cultivars able to cope with extreme climatic changes.

# DATA AVAILABILITY STATEMENT

The SilicoDArT marker data of the *Ae. biuncialis* collection can be found in EBI https://www.ebi.ac.uk/biostudies/studies/S-BSST277.

# AUTHOR CONTRIBUTIONS

LI participated in the analysis of the phenotypic data and wrote the manuscript. IMon performed the cluster analysis and made valuable comments on the manuscript. MM and PM maintained the germplasm and phenotyped it for the heading time. AF, ET, EG, AL-T, KS-P, and ÉS contributed to the maintenance of the plant material and the field experiments. ÉD carried out the phenotypic analysis of SPAD. TK evaluated the heading time of the genotypes. AK helped in GBS data analysis. IMol planned and managed the whole study, and supervised the manuscript preparation.

# FUNDING

This paper was financed by a Marie Curie Fellowship Grant ('AEGILWHEAT'-H2020-MSCA-IF-2016-746253) under the H2020 framework program of the European Union and by the Hungarian National Research, Development and Innovation Office—NKFIH 116277, 112226, and 119387 and by the ERDF project "Plants as a tool for sustainable global development" (no. CZ.02.1.01/0.0/0.0/16\_019/0000827).

# ACKNOWLEDGMENTS

Special thanks to Tamás Árendás for providing the weather data measured between 2015 and 2018. The technical assistance of Fanni Tóth, Ildikó Könyves-Lakner and Ágnes Bencze is gratefully acknowledged. Thanks are due to Barbara Hooper for revising the manuscript linguistically.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.01531/full#supplementary-material


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Ivanizs, Monostori, Farkas, Megyeri, Mikó, Türkösi, Gaál, Lenykó-Thegze, Szőke-Pázsi, Szakács, Darkó, Kiss, Kilian and Molnár. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Domesticating Vigna Stipulacea: A Potential Legume Crop With Broad Resistance to Biotic Stresses

*Yu Takahashi1, Hiroaki Sakai2, Yuki Yoshitsu3, Chiaki Muto1, Toyoaki Anai4, Muthaiyan Pandiyan5, Natesan Senthil6, Norihiko Tomooka1 and Ken Naito1\**

1 Genetic Resources Center, NARO, Tsukuba, Japan, 2 Advanced Analysis Center, NARO, Tsukuba, Japan, 3 Kenpoku Agricultural Institute, Iwate Agricultural Research Center, Iwate, Japan, 4 Department of Agriculture, Saga University, Saga, Japan, 5 Agricultural College and Research Institute, Tamil Nadu Agricultural University, Thanjavur, India, 6 Agricultural College and Research Institute, Tamil Nadu Agricultural University, Madurai, India

Though crossing wild relatives with modern cultivars is the usual means of introducing stress-tolerance alleles, an alternative is the de novo domestication of wild species that are already tolerant to various kinds of stress. As a test case, we chose Vigna stipulacea Kuntze, which has fast growth, a short vegetative stage, and broad resistance to pests and diseases. We developed an ethyl methanesulfonate–mutagenized population and obtained three mutants with reduced seed dormancy and one with reduced pod shattering. We crossed one of the reduced seed dormancy mutants to the wild type and confirmed that the phenotype was inherited in a Mendelian manner. De novo assembly of the V. stipulacea genome and subsequent resequencing of the F2 progeny successfully identified a single nucleotide polymorphism (SNP) associated with seed dormancy. By crossing and pyramiding the mutant phenotypes, we will be able to turn V. stipulacea into a crop that, although still primitive, can be cultivated without pesticides.

Keywords: plant domestication, wild species, legume, Vigna, mutant screening, seed dormancy, pod shattering, bulked segregant analysis

# INTRODUCTION

To feed the growing world population, we have to produce more food with less input. This is a challenging issue because of global climate change, limited water resources, and the resistance that pests and diseases have acquired against chemicals.

To achieve this, many scientists are now focusing on harnessing the genetic diversity of genebank accessions, including wild crop relatives and neglected crops (McCouch et al., 2013). One of the limitations in addressing this issue is the genetic vulnerability of modern cultivars. They have gone through a strong genetic bottleneck and have often lost resilience to biotic and abiotic stresses (Pingali, 2012). On the other hand, many wild species are well adapted to ecological niches that are often too harsh for domesticated species (Zhang et al., 2018). Thus, utilizing the adaptability of such wild or semi-wild species will be a key to sustainable agriculture.

To fully exploit the genetic diversity of wild species, "*de novo* domestication" or "redomestication" has now been proposed (Fernie and Yan, 2019). Until recently, the main idea of using wild genetic resources was to cross them with cultivars to introduce resistance alleles. However, cross compatibility is often limited even between a cultivar and its close relatives. In addition, adaptation to a certain environment is often a complex trait with multiple genes involved. On the other hand, domestication-related traits often arose with loss-of-function mutations in single loci (Doebley et al., 2006). Thus, it might be easier to introduce domestication-related mutations into wild species than to introduce adaptation-related alleles into crops.

### Edited by:

Eric Von Wettberg, University of Vermont, United States

### Reviewed by:

R. Varma Penmetsa, University of California, Davis, United States Robert Henry, University of Queensland, Australia

> \*Correspondence: Ken Naito knaito@affrc.go.jp

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 30 August 2019 Accepted: 15 November 2019 Published: 06 December 2019

### Citation:

Takahashi Y, Sakai H, Yoshitsu Y, Muto C, Anai T, Pandiyan M, Senthil N, Tomooka N and Naito K (2019) Domesticating Vigna Stipulacea: A Potential Legume Crop With Broad Resistance to Biotic Stresses. Front. Plant Sci. 10:1607. doi: 10.3389/fpls.2019.01607

Although genome sequencing and editing technologies are expected to facilitate *de novo* domestication (Shapter et al., 2013; Li et al., 2018), we still believe that simple mutagenesis followed by forward phenotype screening could be the easiest way to do it. To apply the CRISPR/Cas9 system, one has to sequence the whole genome and develop a transformation system for the plant to be domesticated. In addition, genes involved in domestication-related traits are not well catalogued except in Solanaceae, Brassicaceae, and Poaceae (Østerberg et al., 2017). As such, mutagenesis followed by forward screening is currently the only practical approach to *de novo* domesticate most of the potentially useful wild plants, such as wild legumes.

Thus, in this study, we tried to domesticate *Vigna stipulacea* Kuntze by ethyl methanesulfonate (EMS) mutagenesis followed by phenotype screening. *V. stipulacea* is distributed mainly in South Asia and has fast growth, a short vegetative stage, and resistance to pests and diseases (Tomooka et al., 2014). The seeds are edible, and some local people cultivate it, mainly as pasture but sometimes as food. However, fewer and fewer farmers use it because of the heavy labor required by its strong pod shattering (Tomooka et al., 2011). In addition, it retains seed dormancy, which needs to be reduced for uniform germination. To improve these traits, we screened for and obtained mutants with reduced pod shattering and with reduced seed dormancy. We also identified a SNP associated with seed dormancy in one of the obtained mutants by whole genome analyses.

# MATERIALS AND METHODS

### Plant Materials and Growth Condition

**Table 1** summarizes the materials tested in this study. Besides *V. stipulacea*, we used two or three accessions of three domesticated species, soybean [*Glycine max* (L.) Merr.], common bean (*Phaseolus vulgaris* L.), and cowpea [*Vigna unguiculata* (L.) Walp.], to evaluate pod shattering and seed dormancy. All the materials were provided by the NARO gene bank (https://www.gene.affrc.go.jp/index\_en.php). We cultivated *V. stipulacea* in a field or in a bucket with gardening soil in a greenhouse of our institute, Tsukuba, Japan (36.030577, 140.098021). We cultivated three plants of the domesticated species in a bucket with gardening soil in a greenhouse of our institute, Tsukuba, Japan.

### TABLE 1 | Plant materials.

### Development of a Mutant Population

The process of mutant screening is shown in **Figure 1**. To increase mutation density, we adopted two cycles of successive chemical mutagenesis as described by Tsuda et al. (2015). We started with 12,000 seeds of *V. stipulacea* (JP245503). Because the seed coat is hard and waterproof, we first scratched it with a knife, then treated the scratched seeds with 0.35% EMS solution for 8 h and thoroughly washed them with distilled water (M1 seeds). The M1 seeds were incubated in a wet condition for 3 days, and those that germinated were sown. The M1 plants were cultivated in our greenhouse, where the temperature was kept above 20°C. Three pods per plant were harvested from 3,000 fertile M1 plants (M2 seeds). Of the harvested seeds, four per line (12,000 in total) were scratched and treated with 0.35% EMS solution as described above (M2M1 seeds). Germinating M2M1 seeds were sown and cultivated in a field located in Tsukuba, Japan. Three thousand fertile plants were selected and three pods per plant were harvested (M2M2 seeds). Six seeds per M2M2 line were sown in a 1 L plastic pot filled with gardening soil and cultivated in the greenhouse until three pods per line were harvested (M2M3 seeds). Of the M2M3 lines, four lines with mutant phenotypes were selected, and seven seeds per line were sown in a 7 L bucket filled with the gardening soil and cultivated in the greenhouse.

### Evaluation of Seed Dormancy and Pod Shattering

To evaluate pod shattering, we harvested 20 pods of each domesticated accession and three pods per M2M2 plant and counted the number of shattered pods. Before evaluation, harvested pods were left at room temperature for a month and then completely dried at 40°C for 24 h in an incubator.

To evaluate seed dormancy, 20 seeds were soaked in distilled water and incubated in a dark incubator at 25°C, and imbibed seeds were counted daily until the 3rd day, twice a week until the 14th day, and weekly until the 28th day. We replicated the measurements seven times and performed Dunnett's test to compare the means at each time point between the wild type and each mutant line.
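The imbibition rates reported for each time point can be summarized as a mean and standard deviation over the replicates. The following is a minimal sketch of that bookkeeping only; the function names and the seed counts are hypothetical and are not taken from the study, and the comparison against the wild type (Dunnett's test) is not implemented here.

```python
from statistics import mean, stdev


def imbibition_rate(imbibed, total=20):
    """Percentage of seeds that have imbibed out of `total` seeds tested."""
    return 100.0 * imbibed / total


def summarize(replicate_counts, total=20):
    """Mean and sample standard deviation of the imbibition rate (%)
    across replicates at a single time point."""
    rates = [imbibition_rate(count, total) for count in replicate_counts]
    return mean(rates), stdev(rates)


# Hypothetical counts of imbibed seeds (out of 20) in seven replicates
example_counts = [12, 14, 13, 15, 11, 13, 13]
avg, sd = summarize(example_counts)
print(f"{avg:.1f}% +/- {sd:.1f}%")
```

Running the same summary at every scheduled observation day would give the kind of time course with error bars that is typical for a seed imbibition assay.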

### Morphological Observation

We observed the imbibing seeds to detect the site where imbibition started. We also observed pod sclerenchyma (tissue consisting of dead cells with thickened secondary walls) with a stereoscopic microscope (ECLIPSE Ci-L, Nikon, Tokyo, Japan). Three pods were collected from each individual, sliced with a microtome (MTH-1, Nippon Medical & Chemical Instruments Co., Ltd., Osaka, Japan), and stained with phloroglucinol-HCl solution (1 g phloroglucin in 50 mL ethanol + 25 mL concentrated hydrochloric acid).

### Whole Genome Sequencing, Assembly, and Annotation

We sequenced the whole genome of *V. stipulacea* (JP245503) with an RSII sequencer (Pacific Biosciences, Menlo Park, CA), as we have done previously for azuki bean (Sakai et al., 2015). DNA was isolated from 1 g of unexpanded leaves with the CTAB method and purified with Genomic Tip 20/G (Qiagen K. K. Tokyo). The extracted DNA was sheared into 20 kb fragments using g-TUBE (Covaris, MA, USA) and converted into 20 kb SMRTbell template libraries. The library was size-selected for a lower cutoff of 10 kb with BluePippin (Sage Science, MA, USA). Sequencing was performed on the PacBio RS II using P5 polymerase binding and C3 sequencing kits with 360 min acquisition. In total, 52 SMRT cells were used to obtain ~19.6 Gb of subreads.

FIGURE 1 | Schematic of domesticating V. stipulacea. After two rounds of EMS treatment, 3,000 M2M1 plants were selected to collect M2M2 seeds (six per line). The 18,000 M2M2 plants were grown in a field, where pod shattering mutants were screened for. Three pods (~20 seeds) per plant were harvested, which were tested for seed-imbibition screening.

In total, ~3.3 million PacBio reads were used for *de novo* assembly with Celera Assembler 8.3rc1 (asmOvlErrorRate = 0.1, asmUtgErrorRate = 0.06, asmCgwErrorRate = 0.1, asmCnsErrorRate = 0.1, asmObtErrorRate = 0.08, utgGraphErrorRate = 0.05, utgMergeErrorRate = 0.05) (Berlin et al., 2015). About 25.2× of the longest error-corrected and trimmed reads were assembled into contigs. Redundant contigs were discarded by conducting all-to-all BLASTN searches. The non-redundant contigs were polished with PacBio subreads using Quiver in SMRT Analysis v2.2.0 (Pacific Biosciences of California, Inc.) and then further polished with Illumina short reads using BWA 0.7.9a (Li and Durbin, 2009), Samtools 0.1.19 (Li et al., 2009), Picard 1.94 (http://picard.sourceforge.net/), and GATK 3.3 (McKenna et al., 2010). The polished contigs were scaffolded with the Reference-Assisted Chromosome Assembly (RACA) program v.0.9.1.1 (Kim et al., 2013b) using the genome sequences of *Vigna angularis* and *P. vulgaris* as the reference and outgroup species, respectively.

Repetitive sequences in the genome assembly were predicted using Censor (Kohany et al., 2006) with a composite library consisting of *de novo* created library constructed by RepeatModeler 1.0.8 (http://www.repeatmasker.org) and the MIPS Repeat Element Database ver. 9.3 (Nussbaumer et al., 2013).

*Ab initio* gene prediction was done by BRAKER version 1.6 (Hoff et al., 2016) with RNA-Seq data. Besides, gene structures were predicted by genome-guided and *de novo* RNA-Seq data assembly approaches using TopHat 2.1.0 (Kim et al., 2013a), Cufflinks 2.2.1 (Trapnell et al., 2010), Trinity 2.1.1 (Grabherr et al., 2011), and PASA pipeline 2.0.2 (Haas et al., 2003; Rhind et al., 2011). Open reading frames were predicted by Transdecoder 2.0.1 and Trinotate 2.0.2 (Bryant et al., 2017). Protein sequences of the *G. max* (Wm82.a2.v1), *P. vulgaris* (v.1.0), *Medicago truncatula* Gaertn. (Mt4.0v1), and *V. angularis* (Willd.) Ohwi & H.Ohashi (VANGULARIS\_V1.A1) were downloaded from Phytozome (*G. max*, *P. vulgaris*, and *M. truncatula*) (Goodstein et al., 2012) and *Vig*GS (*V. angularis*) (Sakai et al., 2016) and mapped to the genome assembly by Exonerate 2.2.0 (Slater and Birney, 2005). The *ab initio* gene models and transcript and protein alignments were then combined by EvidenceModeler 1.1.1 (Haas et al., 2008) and the predicted gene models were updated by PASA. Gene models with extremely long introns and those merged by PASA were manually curated. BUSCO v3 (Waterhouse et al., 2017) was used to evaluate protein sequences of annotated genes.

### Bulked Segregant Analysis

We performed MutMap (Abe et al., 2012) to isolate the candidate gene for the mutant phenotype. We crossed one of the mutants with increased seed imbibition (*isi1*) to the wild type (JP245503) and obtained 280 F2 seeds. We cultivated all of them, extracted DNA from an unexpanded leaf of each F2 plant by the CTAB method, and obtained F3 seeds. To evaluate seed dormancy, 20 seeds of each F3 line were soaked in distilled water for 24 h to test imbibition. DNA of 59 F2 plants with the mutant phenotype (100% seed imbibition) and DNA of 221 F2 plants with the wild type phenotype (0% imbibition) were pooled to make the MT pool and the WT pool, respectively, and were sequenced with Illumina HiSeq X (Illumina Co. Ltd, San Diego, USA).

The obtained sequences were mapped to the draft genome sequence with bwa-0.7.17 (Li and Durbin, 2009) and formatted with samtools-1.9 (Li et al., 2009) as described in the distributed manuals. SNPs were called with bcftools-1.9, and the DP4 values were used to calculate the SNP-index. If a SNP site is responsible for the mutant phenotype, the locus must be fixed for the alternative allele [a] in the MT pool (i.e., [aa] accounts for 100%). In the WT pool, on the other hand, the expected ratio of the reference allele [A] to the alternative allele [a] is 2:1, because the genotype ratio [AA]:[Aa]:[aa] should be 1:2:0. Thus, at the responsible locus, the expected SNP-index is 1.0 in the MT pool and 0.33 in the WT pool. After calculating the SNP-index, we screened all the SNP sites for those that met this criterion.
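The SNP-index criterion can be illustrated with a short sketch. It assumes the usual meaning of the bcftools DP4 field (counts of high-quality reference-forward, reference-reverse, alternative-forward, and alternative-reverse reads) and takes the SNP-index to be the fraction of reads carrying the alternative allele. The function names, the tolerance around the WT-pool expectation, and the read counts in the example are illustrative assumptions, not the authors' script.

```python
def snp_index(dp4):
    """Fraction of reads supporting the alternative allele.

    dp4: (ref_forward, ref_reverse, alt_forward, alt_reverse) read counts,
    following the layout of the DP4 field written by bcftools.
    """
    ref_fwd, ref_rev, alt_fwd, alt_rev = dp4
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    if depth == 0:
        raise ValueError("no reads cover this site")
    return (alt_fwd + alt_rev) / depth


def expected_wt_index(n_AA=1, n_Aa=2):
    """Expected alt-allele frequency in a pool of AA and Aa plants.

    Each Aa plant carries one alternative allele out of two; AA plants
    carry none. With AA:Aa = 1:2 this gives 2 / 6 = 1/3.
    """
    return n_Aa / (2 * (n_AA + n_Aa))


def is_candidate(mt_dp4, wt_dp4, wt_target=1 / 3, tol=0.1):
    """True if the site is fixed for alt in the MT pool and the WT-pool
    index lies within `tol` of the expected value (tolerance assumed)."""
    return snp_index(mt_dp4) == 1.0 and abs(snp_index(wt_dp4) - wt_target) <= tol


# Hypothetical read counts for one site in the two pools
print(is_candidate((0, 0, 10, 12), (20, 20, 10, 10)))
```

A site with even a single reference read in the MT pool fails the first condition in this sketch; a real pipeline might relax that to allow for sequencing errors.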

### Genotyping by Amplicon Sequencing

We genotyped 280 F2 plants by Sanger sequencing for two SNPs whose SNP-index was 1.0 in the MutMap analysis. Two primer pairs were designed to amplify a 232 bp genomic region including 6,804,429 nt (5′-gagggaatacgaagagtttaaggtt-3′ and 5′-ttgaaaaccaggtcttttctctcta-3′) and to amplify a 243 bp genomic region including 7,009,873 nt (5′-acagagcaaaagattaaacgagaga-3′ and 5′-aaagccgcttcctagtccttac-3′) on scf0015. For Sanger sequencing, we amplified the template DNA with AmpliTaq Gold 360 Master Mix (Thermo Fisher Scientific K. K., Tokyo), performed sequencing reaction with BigDye Terminator v3.1 (Thermo Fisher Scientific K. K., Tokyo), and sequenced with ABI Genetic Analyzer 3130xl (Thermo Fisher Scientific K. K., Tokyo), according to the provider's protocol.

# RESULTS

### Phenotypes of Seed Dormancy and Pod Shattering in Domesticated Legumes

Though many studies have investigated seed dormancy in legume crops, there are still some arguments on the sites of water entry (Smýkal et al., 2014). Thus, we evaluated seed imbibition and pod shattering in soybean, common bean, and cowpea.

As a result, domesticated species showed variation in both imbibition start sites and time to imbibe (**Figure 2**). Among the domesticated accessions tested, soybean imbibed within a few minutes and was the quickest. In soybean, the whole testa seemed permeable. The second quickest was the cowpea "JP244182," where the testa was partially permeable and imbibition was observed in an hour. The common bean "JP41232" showed imbibition in 2 to 3 h, which started at the site of micropyle. It took 4 to 5 h for the common bean "JP41234" and the cowpea "JP239215," where imbibition started at the site of lens. All the seeds were fully imbibed within 48 h.

We also observed abscission layers between the valves of seed pods and the sclerenchyma on the endocarp, because pod shattering is dependent on these tissues (Murgia et al., 2017). In soybean, as described by Dong et al. (2014), the abscission layer was not completely formed at the fiber cap cells (**Figure 3**). The abscission layer was even less completely formed in the common bean accession JP41963, whereas it was fully formed in all other accessions (**Figure 3**). Pod sclerenchyma was thicker in the soybean accessions and the cowpea accession JP244182, whereas it was thinner in common bean accessions (**Figure 3**).

### Mutant Screening

To obtain mutants with reduced seed dormancy and reduced pod shattering, we treated 12,000 seeds of *V. stipulacea* with EMS and developed 3,000 M2M2 plants. Of the 3,000, we found one line which exhibited reduced pod shattering and designated it as *reduced pod shattering1* (*rps1*) (**Figure 4**). We also screened M2M3 seeds for seed imbibition and obtained three lines which exhibited increased seed imbibition and designated them as *increased seed imbibition1* (*isi1*), *increased seed imbibition2* (*isi2*), and *increased seed imbibition3* (*isi3*), respectively (**Figure 5**). Of them, the *isi1* seeds had cracks in the hilum, and the *isi3* seeds showed reduced pigmentation in the seed coat (**Figure 5**).

The mode of imbibition also varied across the mutants (**Figure 5**). In *isi1*, water entry was not necessarily through the cracks in the hilum but through the whole seed coat. Even when we sealed the hilum with glue, imbibition was quickly initiated [see the "*isi1* (sealed hilum)" in Figure 5]. In *isi2*, imbibition was initiated at the lens. In *isi3*, imbibition started near the hilum, but not necessarily through the micropyle or the lens.

### Mutant Phenotypes Regarding Seed Dormancy

To elucidate the extent of seed dormancy in the mutant lines, we put the seeds on wet filter paper and evaluated the rate of imbibed seeds (**Figure 6**). We tested both 1-month-old and 6-month-old seeds to elucidate the effect of storage time after harvest. In the wild type, the imbibition rate was zero for at least 4 weeks in 1-month-old seeds and was 11% in 6-month-old seeds (**Figure 6**).

When we tested 1-month-old mutant seeds, imbibition was quick in *isi1* and *isi2*, whereas it was slow in *isi3*. In *isi1*, all the seeds fully imbibed within 1 h (**Figure 6**). In *isi2*, 85% of the seeds imbibed within a week, and almost 100% did so within 4 weeks. In contrast, the imbibition rate of the *isi3* seeds was only 16% in 4 weeks, which was not significantly different from the wild type. The imbibition rate of the reduced shattering mutant *rps1* was also higher than, but not significantly different from, that of the wild type (**Figure 6**).

When we tested 6-month-old mutant seeds, the overall imbibition rate was increased compared to the experiment with 1-month-old seeds (**Figure 6**). In *isi1*, the seeds again fully imbibed within 1 h. In *isi2*, the imbibition rate was already significantly different from the wild type after 1 day, more than 90% of the seeds imbibed within a week, and the rate reached a plateau (~95%) in two weeks. In *isi3*, 20% of the seeds imbibed in 3 days, 36% in a week, and 71% in 4 weeks. However, this line exhibited a large standard deviation between the replicates and was not significantly different from the wild type before 2 weeks (**Figure 6**). In *rps1*, ~20% of the seeds imbibed in 4 weeks, which was about twice as high as in the wild type, but the difference was not significant.


pod sclerenchyma with bright red while it stains abscission zone with dark red. Soybean accessions have brightly-stained sites at the tip of fiber cap cells (arrows), which indicates abscission layer is not completely formed.

### Mutant Phenotypes Regarding Pod Shattering

To evaluate pod shattering in the mutant lines, we calculated the rate of shattering of the harvested pods, which were completely dried in the incubator. Whereas the shattering rate was 100% in the wild type, it was 0% in the *rps1* mutant (**Table 2**). The *rps1* mutant also showed reduced twisting of the seed pod. The number of twists/cm in the pods was 0.371 ± 0.018 in the *rps1* mutant, which was less than half of that in the wild type (0.866 ± 0.022) (**Table 2**).

Interestingly, *isi1*, one of the seed imbibition mutants, also exhibited a slightly reduced shattering rate (73.99 ± 18.53%) and number of twists/cm (0.579 ± 0.093) (**Table 2**). The other mutants also showed a slightly reduced number of twists/cm, but their shattering rate was 100% (**Table 2**).

We also observed cross-sections of seed pods and found that the *rps1* mutant did not form an abscission layer between the valves at all (**Figure 4**).

### Whole Genome Sequence and Annotation of V. stipulacea

To build a reference sequence of *V. stipulacea*, we sequenced the genomic DNA with a PacBio sequencer and then assembled and annotated the genome. We obtained 19.6 Gbp of subreads using 52 SMRT cells. The genome size of *V. stipulacea* was estimated to be ~445.1 Mbp based on the k-mer frequency distribution obtained from 10.3 Gbp of Illumina short reads. The assembled contigs showed an N50 length of ~1.9 Mbp and a mean length of ~169.8 kbp, and covered ~387.7 Mbp (87.9%) of the estimated genome size (**Table 3**). Out of the 2,102 scaffolds, 52 scaffolds comprised 233 contigs scaffolded by RACA.

FIGURE 5 | Increased seed imbibition mutants of V. stipulacea. Time after watering is indicated at the bottom-left of each photo. Arrows indicate where water entry was initiated.

FIGURE 6 | Rate of imbibed seeds in the mutant plants over time. Twenty seeds of 1 and 6 months old were soaked in distilled water, and the number of imbibed seeds was manually counted twice a week for 4 weeks. The error bars indicate the standard deviation of replicated evaluations (n = 7). Asterisks indicate that the mean values are significantly different from the wild type (\*\* for p < 0.01 and \* for p < 0.05).

TABLE 2 | Phenotypic data of mutant lines.

By combining the *ab initio* gene models with transcript and protein alignments, 26,038 protein-coding genes were predicted on the assembled genome (**Table 3**). We assessed the completeness of the gene set with BUSCO and found that 95.8% of the BUSCO gene models were completely detected (**Table 3**). The BUSCO metrics were almost the same as those assessed for the recently published high-quality genome sequence of cowpea (Lonardi et al., 2019), suggesting that our genome assembly covered the nearly complete gene space of the *V. stipulacea* genome.

TABLE 3 | Stats of the assembled genome sequence of V. stipulacea.

### Mapping a Locus for isi1 Phenotype

To identify the genes responsible for the mutant phenotype, we performed MutMap analysis. Of the 280 F2 plants derived from *isi1* × WT, 221 showed the wild-type phenotype and 59 showed the mutant phenotype, which was close to the expected 3:1 ratio (*p* = 0.13). We sequenced the genomic DNA of the wild-type plants (WT pool) and the mutants (MT pool) and identified 33,936 SNPs across the whole genome. We then calculated the SNP-index of each SNP and screened for those with an SNP-index of 1.0 in the MT pool and 0.3 ± 0.1 in the WT pool.
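The segregation test above can be reproduced with a short calculation. The following is a minimal sketch, assuming that the reported *p* value comes from a chi-squared goodness-of-fit test with one degree of freedom (the text does not name the test); the function name is illustrative only.

```python
import math


def chi_square_3_to_1(n_wild_type, n_mutant):
    """Goodness-of-fit test of F2 counts against a 3:1 ratio (1 d.f.)."""
    total = n_wild_type + n_mutant
    expected_wt = 0.75 * total
    expected_mt = 0.25 * total
    chi2 = ((n_wild_type - expected_wt) ** 2 / expected_wt
            + (n_mutant - expected_mt) ** 2 / expected_mt)
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-squared distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# F2 counts reported in the text: 221 wild type, 59 mutant.
chi2, p_value = chi_square_3_to_1(221, 59)
print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

Under this assumption, the counts of 221 and 59 give a *p* value that rounds to the reported 0.13.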

As a result, we found only two such SNPs on scf0015, a C to T substitution at 6,804,429 nt and a G to A substitution at 7,009,873 nt. The C to T SNP was present in the 8th exon of Vigst.0015s042600.01, which encoded the CELLULOSE SYNTHASE A 7 (CesA7) protein, and turned the Glu codon into a STOP codon. The G to A SNP was present in Vigst.0015s044500.01, which encoded a nucleoporin-like protein, but was synonymous.
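The screening criterion can be sketched in a few lines. This is an illustration only, assuming the usual MutMap definition of the SNP-index (reads carrying the mutant allele divided by total reads at the site); the read counts and SNP labels below are hypothetical and are not data from this study.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at a site that carry the mutant (alternative) allele."""
    return alt_reads / total_reads


def is_candidate(mt_index, wt_index, wt_center=0.3, wt_tolerance=0.1):
    """Apply the filter from the text: 1.0 in the MT pool, 0.3 +/- 0.1 in the WT pool."""
    # A tiny epsilon guards against floating-point noise at the boundaries.
    return mt_index == 1.0 and abs(wt_index - wt_center) <= wt_tolerance + 1e-9


# Hypothetical read counts: (label, MT alt, MT total, WT alt, WT total).
snps = [
    ("snp_a", 40, 40, 12, 40),  # MT 1.00, WT 0.30 -> passes
    ("snp_b", 30, 40, 12, 40),  # MT 0.75          -> fails
    ("snp_c", 40, 40, 24, 40),  # MT 1.00, WT 0.60 -> fails
]

candidates = [
    label
    for label, mt_alt, mt_total, wt_alt, wt_total in snps
    if is_candidate(snp_index(mt_alt, mt_total), snp_index(wt_alt, wt_total))
]
print(candidates)
```

The WT-pool window is centered near one third because wild-type F2 plants are expected to be a 1:2 mix of homozygous wild-type and heterozygous individuals at a causal recessive locus.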

For further mapping, we looked for recombinants between the two candidate SNPs by direct sequencing of the SNP-containing regions in all the F2 plants. As a result, one F2 plant, which showed the wild-type phenotype, was heterozygous at Vigst.0015s042600.01 but was fixed for the mutant A allele at Vigst.0015s044500.01 (see #130 in **Supplementary Table 1**). Another F2 plant, which showed the mutant phenotype, was fixed for the alternative T allele at Vigst.0015s042600.01 but was heterozygous at Vigst.0015s044500.01 (see #251 in **Supplementary Table 1**). The result was further confirmed by genotyping and phenotyping F3 plants derived from the F2 plants with recombination between the two candidate SNPs (#130, #251, and #275 in **Supplementary Table 1**). As shown in **Supplementary Table 2**, the phenotype was completely linked with the SNP at scf0015\_6804429 in the CesA7 gene but not with the other.

### DISCUSSION

In this study, we demonstrated a practical approach to the *de novo* domestication of a wild legume species, *V. stipulacea*, by chemically induced mutations. We successfully obtained three independent mutations in seed dormancy and one in pod shattering. In addition, we identified a SNP in the candidate gene, which encoded CELLULOSE SYNTHASE A 7, in one of the three mutants with reduced seed dormancy.

We demonstrated that forward screening following mutagenesis was practical enough to screen for mutants in domestication-related traits. This could be easily predicted from the fact that legume crops have reduced seed dormancy and pod shattering in various manners (**Figures 2** and **3**). This indicates that many genetic loci are involved in these traits, and a loss-of-function mutation in any of them could cause such phenotypes. This is why we identified three mutants for seed dormancy and one for pod shattering from only 3,000 M2 lines (18,000 plants) (**Figure 1**). We also note that we obtained the mutants within 3 years of initiating the first mutagenesis.

Of the three mutants for seed dormancy, *isi1* showed the most severe phenotype, and imbibition was always completed within hours (**Figures 5** and **6**). This was because of water permeability in the seed coat, which was probably caused by the loss-of-function mutation in CesA7. CesA7 is involved in cellulose synthesis, and its malfunction might have disrupted development of the hilum and seed coat. The cracked hilum phenotype has not been observed in other legume crops and makes the seed almost completely nondormant. However, the seeds might not be suitable for long-term storage, because the cotyledons are exposed to the air and could be easily oxidized. In addition, *isi1* showed a slightly lower rate of pod shattering. This could be a pleiotropic effect of the mutation, because cellulose fibers are also important for pod shattering (Suanum et al., 2016). However, we cannot exclude the possibility that *isi1* contains other mutations involved in pod shattering.

Compared to *isi1*, *isi2* and *isi3* showed milder phenotypes (**Figure 6**). Because *isi2* seeds imbibe through the lens, the mutation probably disturbed the same pathway as in the common bean "JP41234" or the cowpea "JP239215" (**Figure 2**). However, it took much longer for *isi2* to complete imbibition than for those cultivars. This might be because the domesticated accessions have accumulated multiple mutations involved in seed coat permeability (Jang et al., 2015; Sun et al., 2015). As for *isi3*, the seed coat color was retarded, but imbibition did not always initiate in the seed coat. It sometimes initiated through the lens and sometimes through the micropyle. Given that the water entry sites are basically stable in the cultivated accessions, the *isi3* mutant might represent a novel phenotype of seed dormancy. Alternatively, *isi3* may have multiple mutations involved in seed imbibition, given that we are currently aware of only the seed color in this mutant. If so, such mutant alleles (except the one involved in seed pigmentation) could be segregating and lead to variation in water entry sites and imbibition rate within the mutant line (**Figure 6**). Though *isi2* and *isi3* have only a partial effect on seed imbibition, a double mutant of both might be able to complete imbibition within a week with a smaller effect on long-term storability.

The *rps1* mutant almost completely lost the pod shattering behavior because of suppressed formation of the abscission layer between the valves (**Figure 4**). This phenotype was similar to the effect of the domestication-type allele of SHAT1-5 in soybean (**Figure 2**) (Dong et al., 2014). Thus, the mutation in *rps1* might be in a gene involved in the SHAT1-5 pathway. The gene responsible for the *rps1* phenotype might be useful for mitigating the shattering problem in other legumes because the *rps1* phenotype was more severe than that of soybean SHAT1-5. On the other hand, severe disruption in development of the abscission layer could increase the labor needed for threshing. In addition, though the difference was not significant, we repeatedly observed that the *rps1* mutant exhibited slightly increased seed imbibition compared to the wild type (**Figure 6**). Such pleiotropy, unless the mutant has other mutations involved in seed dormancy, might arise because secondary wall thickening plays important roles in shattering behavior in the seed pod (Suanum et al., 2016; Parker et al., 2019; Rau et al., 2019; Takahashi et al., 2019) and in water permeability in the seed coat (Smýkal et al., 2014; Hradilová et al., 2019).

Last but not least, the technology of long-read sequencing (Eid et al., 2009) enabled us to obtain a high-quality genome assembly of *V. stipulacea*. Although we did not use further scaffolding techniques such as optical mapping (Lam et al., 2012) or Hi-C (Burton et al., 2013), it achieved an N50 of nearly 2 Mbp and a BUSCO score of 96.8 (**Table 3**). The high-quality reference genome is likely the reason that we successfully identified the candidate SNP by a single resequencing analysis.

As described above, here we report that we have achieved the first step of domesticating the wild legume *V. stipulacea*. As reported in Tomooka et al. (2011), this legume showed resistance to a broad range of pests and diseases in Tamil Nadu, India, and does not need to be sprayed with pesticides. The broad pest and disease resistance of *V. stipulacea* was confirmed during the mutant screening procedure conducted in our experimental field in Tsukuba, Japan. Thus, if we cross and pyramid the mutant phenotypes obtained in this study, we will be able to make *V. stipulacea* easier to cultivate and easier to harvest, and we believe it will be a step forward to accomplish low input agriculture. By screening for more useful traits from more mutagenized populations, we hope this kind of "domesticated wild plants" will become popular and prevalent.

# DATA AVAILABILITY STATEMENT

The raw and assembled sequence data (DRA009127) are all available from DNA Data Bank of Japan (https://www.ddbj.nig.ac.jp/dra/index.html) or *Vig*GS (https://viggs.dna.affrc.go.jp/).

# AUTHOR CONTRIBUTIONS

NT and KN provided the idea of the study. YT, MP, NS, NT and KN planned the study. YT, CM, TA, NT and KN cultivated plants and collected data. YT, YY, HS and KN analyzed data. YT, HS, MP, NS, NT and KN wrote the paper.

# FUNDING

This study was supported by JSPS KAKENHI Grant Number 13J09808, 19KT0016, and 26850006. It was also partially supported by Research Supporting Program of the Advanced Analysis Center, National Agriculture and Food Research Organization (NARO) and the Genebank Project, NARO.

# ACKNOWLEDGMENTS

We are grateful to Ms. Motoyoshi, Ms. Asano and Ms. Yamamoto for taking good care of plant materials. We also appreciate the support provided by Research Supporting Program of the Advanced Analysis Center, National Agriculture and Food Research Organization (NARO) and the Genebank Project, NARO.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.01607/full#supplementary-material


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Takahashi, Sakai, Yoshitsu, Muto, Anai, Pandiyan, Senthil, Tomooka and Naito. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Diversity of Naturalized Hairy Vetch (Vicia villosa Roth) Populations in Central Argentina as a Source of Potential Adaptive Traits for Breeding

Juan P. Renzi<sup>1,2</sup>\*, Guillermo R. Chantre<sup>2,3</sup>, Petr Smýkal<sup>4</sup>, Alejandro D. Presotto<sup>2,3</sup>, Luciano Zubiaga<sup>1</sup>, Antonio F. Garayalde<sup>5</sup> and Miguel A. Cantamutto<sup>1,2,3</sup>

<sup>1</sup> EEA H. Ascasubi Instituto Nacional de Tecnología Agropecuaria, Buenos Aires, Argentina, <sup>2</sup> Departamento de Agronomía, Universidad Nacional del Sur, Bahía Blanca, Argentina, <sup>3</sup> Centro de Recursos Naturales Renovables de la Zona Semiárida (CERZOS), CONICET, Bahía Blanca, Argentina, <sup>4</sup> Department of Botany, Palacký University in Olomouc, Olomouc, Czechia, <sup>5</sup> Departamento de Matemática, Universidad Nacional del Sur, Bahía Blanca, Argentina

### Edited by:

Laurent Gentzbittel, National Polytechnic Institute of Toulouse, France

### Reviewed by:

Hamid Khazaei, University of Saskatchewan, Canada Fred Stoddard, University of Helsinki, Finland

> \*Correspondence: Juan P. Renzi renzipugni.juan@inta.gob.ar

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 07 October 2019 Accepted: 07 February 2020 Published: 28 February 2020

### Citation:

Renzi JP, Chantre GR, Smýkal P, Presotto AD, Zubiaga L, Garayalde AF and Cantamutto MA (2020) Diversity of Naturalized Hairy Vetch (Vicia villosa Roth) Populations in Central Argentina as a Source of Potential Adaptive Traits for Breeding. Front. Plant Sci. 11:189. doi: 10.3389/fpls.2020.00189

Hairy vetch (Vicia villosa ssp. villosa Roth) is native to Europe and Western Asia and is the second most cultivated vetch worldwide. Hairy vetch is used as a forage species in semiarid environments and as a legume cover crop in sub-humid and humid regions. Being an incompletely domesticated species, hairy vetch can form spontaneous populations in a new environment. These populations might contain novel and adaptive traits valuable for breeding. Niche occupancy based on geographic occurrence and environmental data of naturalized populations in central Argentina showed that these populations were distributed mainly on disturbed areas with coarse soil texture and alkaline-type soils. Low rainfall and warm temperatures during pre- and post-seed dispersal explained the potential distribution under sub-humid and semiarid conditions from the Pampa and Espinal ecoregions. Conversely, local adaptation along environmental gradients did not drive the divergence among recently established Argentinian (AR) populations. The highest genetic diversity revealed by microsatellite analysis was observed within accessions (72%), while no clear separation was detected between AR and European (EU) genotypes, although naturalized AR populations showed strong differentiation from the wild EU accessions. Common garden experiments were conducted in 2014–16 in order to evaluate populations' germination, flowering, and biomass traits. European cultivars were characterized by low physical seed dormancy (PY), while naturalized AR accessions showed higher winter biomass production. Detected variation in the quantitative assessment of populations could be useful for selection in breeding for traits that convey favorable functions within specific contexts.

Keywords: Vicia villosa genotypes, naturalized population, niche-modeling, genetic resource, phenotypic characterization, microsatellites

# INTRODUCTION

The Vicia genus, of the Fabaceae family, includes several winter annual legumes, generically grouped as "vetches." Within this complex, Vicia villosa ssp. villosa Roth, commonly known as hairy vetch (HV), is a relevant member. It is native to Europe and West Asia and has been introduced as a crop or weed to temperate climate regions worldwide. Hairy vetch is considered a cosmopolitan species due to its high capacity to naturalize under different conditions. It is present in the flora of both South and North America, including Argentina (Van de Wouw et al., 2001; Bryant and Hughes, 2011).

Hairy vetch is the second most important vetch in agricultural systems worldwide (Francis et al., 1999). Generally, it is grown for forage, consumed under direct or indirect grazing, or for green manure. In conservation agriculture, the use of HV as a cover crop is increasing. Hairy vetch displays high tolerance to biotic and abiotic stresses (Francis et al., 1999). It is one of the recommended cover crops in organic or conservation farming, mainly because it enhances soil nitrogen content by biological fixation (Vanzolini, 2011). Due to its valuable traits, HV could help to improve soil structure, reduce soil erosion, and enhance weed suppression (Clark, 2007; Wayman et al., 2016; Frasier et al., 2017). Typically, HV produces between 2.6 and 6.2 t ha<sup>−1</sup> of above-ground dry biomass (Lawson et al., 2015; Mirsky et al., 2017; Ackroyd et al., 2019).

Hairy vetch shows the capacity to form spontaneous populations in ruderal habitats of cultivated areas (Aarssen et al., 1986; Renzi and Cantamutto, 2013). Under natural conditions, the ability to regenerate populations from the soil seed bank is associated with primary combinational seed dormancy (i.e., physical plus physiological dormancy, PY+PD) (Renzi et al., 2014). These naturalized populations can be useful as a genetic resource for breeding. However, despite the high potential agronomic value, HV is an incompletely domesticated species and only a few improved varieties exist. Likewise, HV cover crops are often unreliable in terms of establishment, performance and biomass production (Wilke and Snapp, 2008; Aapresid, 2018). Altogether these drawbacks frequently limit HV adoption by farmers (Maul et al., 2011). Studies concerning geographic distribution and climatic requirements of this species are scarce (Aarssen et al., 1986). The study of the ecological niches of natural HV populations would improve our understanding of its potential adaptation to different environmental factors.

The most important breeding goals for HV include high early vigor, high winter and spring biomass production, and a low level of seed dormancy (mainly due to the physical component of primary dormancy). Rapid growth under low temperatures is especially important when HV is used to produce biomass at the end of winter (i.e., as a cover crop). The time window for HV growth control (by desiccation or mechanical methods) during early spring depends on the planting date of the subsequent summer crop. Because the growth rate of HV accelerates as spring advances, the timing of the control intervention is critical in determining cover crop performance. Moreover, HV spring biomass production largely determines the amount of nitrogen supplied to subsequent cash crops (Vanzolini, 2011). An additional challenge of HV is seed dormancy control, as dormancy can lead to incomplete emergence after seeding (Jacobsen et al., 2010; Maul et al., 2011). In contrast, in agroecosystems of semiarid regions, the natural reseeding capacity of HV is a desirable trait that reduces establishment costs (Renzi et al., 2017; Renzi et al., 2019), especially when it is used as a forage crop by livestock farmers.

HV was introduced in Argentina more than a century ago (Manganaro, 1919). Since then, several naturalized populations have been established in ruderal habitats surrounding agricultural areas. These populations are considered an unexplored genetic resource for breeding. However, to confirm their potential value as a genetic resource, it is imperative to collect and characterize such material by comparison to currently registered cultivated accessions.

The objectives of this study are: i) to describe the natural habitats of naturalized HV populations from Argentina, ii) to assess the phenotypic variability of naturalized populations compared with a set of 41 introduced accessions of HV (including wild and cultivars), and iii) to study the genetic structure using simple sequence repeat (SSR) markers.

### MATERIALS AND METHODS

### Ecological Characterization of Naturalized Populations

The study area comprised nine provinces: Buenos Aires, La Pampa, Río Negro, Neuquén, Mendoza, Córdoba, San Luis, Santa Fe, and Entre Ríos, belonging to three eco-regions: Pampa, Espinal, and Shrubs of Plateau and Plains (Figure 1A). Three exploration trips were carried out during December 2013–2015, covering a total of 21,400 km. The survey of HV populations was based on specialized systematic bibliography (Burkart, 1952) and voucher specimens deposited at Instituto de Botánica Darwinion (http://www2.darwin.edu.ar) and Museo de La Plata (http://www.museo.fcnym.unlp.edu.ar). To be considered, an HV population had to be observed in at least two different years at the same locality and had to contain more than 50 individuals. Recorded information for each collection site consisted of i) ecological region, ii) latitude, longitude and altitude, iii) environment (soil and climate) and plant community (dominance of co-occurring plant species) characterized by family, iv) life cycle and origin (Marzocca, 1994). Global positioning system (GPS) coordinates of 63 naturalized populations were recorded (Table S1).

### Environmental Variables

The WorldClim (http://worldclim.org) version 2.0 database was used to extract information about the climate (period 1970–2000). Data were extracted using DIVA-GIS software from ESRI grids with a spatial resolution of 30 arc-seconds (~1 km) in the WGS-84 coordinate system (EPSG: 4326). Bioclimatic variables (BIO1–BIO19) were derived from monthly temperature and rainfall values (Fick and Hijmans, 2017). To avoid over-parameterization, 10 bioclimatic variables were selected to represent annual trends and extreme conditions of temperature and precipitation: annual mean temperature (BIO1), maximum temperature of the warmest month (BIO5), minimum temperature of the coldest month (BIO6), mean temperature of warmest quarter (BIO10), mean temperature of coldest quarter (BIO11), annual precipitation (BIO12), precipitation of the wettest month (BIO13), precipitation of the driest month (BIO14), precipitation of wettest quarter (BIO16), and precipitation of driest quarter (BIO17) (Hijmans et al., 2005). In addition to the bioclimatic variables, edaphic variables related to soil texture, pH, and bulk density were obtained from soil databases (FAO/IIASA/ISSCAS/JRC, 2012) using WGS84 and a spatial resolution of 30 arc-seconds. For details, including basic descriptive statistics of each environmental variable, see Table S1.

Soil samples (at a depth of 0–15 cm) were collected at each site to verify the data obtained from soil databases. Soil samples were air-dried and sieved to < 2 mm. pH was measured using a glass electrode pH-meter (soil: water, 1:2.5). Texture analysis (% clay, silt, sand) on HCl and H<sub>2</sub>O<sub>2</sub> treated and chemically [0.05 M (NaPO<sub>3</sub>)<sub>6</sub> and 0.15 M Na<sub>2</sub>CO<sub>3</sub>] dispersed samples was carried out by a combination of sieving and pipette methods (Cantamutto et al., 2008; Santos et al., 2017). Pearson's correlation coefficient between soil database values and sample measurements was calculated using the InfoStat software (Di Rienzo et al., 2013).

### Niche Analysis

Ecological niche models were constructed using the geographic locations of HV naturalized populations. Maxent (version 3.4.1; Phillips et al., 2018), a maximum-entropy-based machine learning method, was used at default settings for modeling purposes. Maxent shows better performance than other methods when sample sizes are small (Hernandez et al., 2006), and it estimates the potential niche instead of the realized distribution of the modeled entity (Phillips et al., 2006; Hradilová et al., 2019). As environmental predictors, bioclimatic and soil variables at a resolution of 2.5 arc minutes were used. Logistic output with suitability values ranging from 0 (unsuitable habitat) to 1 (optimal habitat) was used. Occurrence points (75%) were used to calibrate the model. The remaining (25%) occurrence points were used for model evaluation. Model strength was quantified using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve generated within Maxent (Zhu et al., 2017; Phillips et al., 2018).
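The evaluation logic described above (a 75/25 split of occurrence points followed by an AUC) can be illustrated independently of Maxent. The sketch below is not Maxent's implementation: it assumes that AUC is read as the probability that a randomly chosen test presence point scores higher than a randomly chosen background point, and all suitability scores, point identifiers, and function names are hypothetical.

```python
import random


def split_occurrences(points, calibration_fraction=0.75, seed=0):
    """Randomly split occurrence points into calibration and evaluation sets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    cut = int(calibration_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(presence_scores, background_scores):
    """Rank-based AUC: share of presence/background pairs ranked correctly.

    Ties between a presence and a background score count as half.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Placeholder identifiers for 63 occurrence records.
calibration, evaluation = split_occurrences(range(63))
print(len(calibration), len(evaluation))

# Hypothetical logistic suitability scores (0 = unsuitable, 1 = optimal).
test_presence = [0.9, 0.7, 0.6]
background = [0.8, 0.4, 0.3, 0.2]
print(round(auc(test_presence, background), 3))
```

An AUC of 0.5 corresponds to a model that ranks presence points no better than chance, whereas values approaching 1 indicate strong discrimination.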

# Phenotypic Characterization

### Plant Material

In order to determine the biodiversity of studied naturalized populations (Figure 1A, Table S1), we compared accessions originating from Argentina (AR) and Europe (EU). Twenty-nine naturalized populations were evaluated in a common garden (2014 and/or 2016) experiment (Table 1). They were selected based on the amount of available collected seed stock (> 30 g) and wide distribution (Figure 1A). Cultivated germplasm from AR consisted of 10 varieties (landraces) maintained by farmers (Table 1) and two registered cultivars (Tolse F.C.A and Ascasubi INTA) (www.inase.gov.ar).

Wild (n = 5) and cultivated germplasm (n = 24) of HV from EU was represented by 29 accessions (Table 1). Origin and accession name of each cultivar were provided by the Research Institute of Crop Production (CRI) of the Czech Republic (Table 1; for more information see https://grinczech.vurv.cz/gringlobal/search.aspx) (Renzi et al., 2016).

### Common Garden

Plants (Table 1) were cultivated at the Experimental Agricultural Station (EEA) of Hilario Ascasubi (Buenos Aires, Argentina; 62°37′W, 39°23′S) during the 2014 and 2016 growing seasons. The predominant climate in this location is semiarid-temperate with 489 mm mean annual precipitation and 14.8°C mean annual temperature (EEA H. Ascasubi, 1966–2018). The soil was an entic haplustoll, sandy loam, slightly alkaline (pH ≈ 7.5), high in phosphorus (P) content (≈ 22 ppm P Bray & Kurtz) and low in organic matter content (≈ 1.6%) at 20 cm (Renzi et al., 2016). Weather data from each year were registered at the nearby meteorological station (less than 500 m away) (http://inta.gob.ar/documentos/informes-meteorologicos).

TABLE 1 | Country, improvement status (cultivar, wild, and naturalized), and name of hairy vetch accessions included in the phenotypic and genotypic studies.

\*Includes VNS, variety not stated.

\*\*Phenotypic common garden evaluation in 2014 (a) and 2016 (b), and genotypic analysis with SSR (c).

† Corresponds to the name of a locality.

The accessions were arranged in row plots in a randomized complete block design, with three replications. Each experimental unit consisted of a row of 2.50 meters sown with 30 seeds on 10th April 2014 and 27th April 2016. Original collected seeds were used in 2014 and 2016 experiments. Seeds were inoculated with commercial inoculum (Rhizobium leguminosarum bv viciae) immediately before sowing (Deaker et al., 2004).

To determine the number of days from sowing to 50% flowering, the growth stage was recorded twice a week. After 50% flowering, leaf length (mm), leaflets per leaf, and the number of flowers per raceme were measured on 10 randomly selected individual stems in each plot. For foliar observations, the leaf of the third upper node of the stem and the basal leaflet of the leaf were chosen. Above-ground total dry matter was measured at the end of winter (mid-September) and in late spring (mid-December) by cutting plant shoots at ground level in a 0.50 m row in each plot. The biomass of each accession, in each year, was expressed in relation to the average biomass per year. Maturity was defined by the presence of 75% ripe pods, approximately at the beginning of summer. Seeds from mature pods were immediately harvested and threshed by hand, on 21st December 2014 and 26th December 2016, for physical dormancy (PY) testing. Moisture content at harvest was ≤ 14% (Renzi and Cantamutto, 2013).

### Statistical Analysis

Analyses of variance (ANOVA) considering a randomized complete block, between improvement status (EU Cultivar, Wild, AR Cultivar, Naturalized; Table 1) and between accessions for each improvement status, were performed using InfoStat software. Accessions and improvement status means were compared by Fisher's least significant difference test. Correlations between quantitative traits were calculated using Pearson's correlation coefficient. Canonical variate analysis (CVA) was performed with all phenotypic traits based on Euclidean distance through InfoStat software.

### Physical Dormancy Testing

After the harvest from the common garden experiment, seeds were cleaned and seed weight was estimated in a sample of 100 seeds in 2014 (n = 1) or 50 seeds in 2016 (n = 3). PY seeds (i.e., "hard" or impermeable) were determined by an imbibition test performed at 20 ± 2°C for 3 days (Baskin and Baskin, 2014). Intact non-germinated seeds of each replication were placed on moist filter paper in Petri dishes and watered daily with tap water. Imbibed seeds showed a visible change in their size/volume ratio and were easily distinguished from non-imbibed ones (Renzi et al., 2014). Seed viability of non-germinated seeds was assessed by slicing them longitudinally with a razor and immersing them in a 0.5% (wt/vol) tetrazolium chloride (2,3,5-triphenyltetrazolium chloride) (Sigma-Aldrich) solution for 24 h at 30°C in the dark (ISTA, 2019). Seeds with pink or red-stained embryos were considered viable. The total number of viable seeds consisted of germinated plus stained ones (Renzi et al., 2016).

For all accessions, PY break dynamics as a function of storage time (38 days) and temperature (20°C) under wet conditions were analyzed using the area under the curve (AUC) calculated by GraphPad Prism Software (GraphPad, San Diego, California, USA). Here, AUC = 1 indicates seed without PY (initial non-PY seeds = 100%) and AUC = 0 indicates PY seed (final non-PY seeds = 0%) (Renzi et al., 2016). Accessions were grouped by improvement status and further compared by Fisher's least significant difference test using InfoStat software.
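The AUC scale described above can be illustrated with a short calculation. The following is a minimal sketch, assuming a trapezoidal area under the curve of the non-PY seed fraction over time, divided by the duration of the test so that the result lies between 0 and 1; the normalization step, the sampling days, and the seed fractions are assumptions for illustration, not values from this study.

```python
def normalized_auc(days, fraction_non_py):
    """Trapezoidal area under the non-PY fraction curve, scaled to [0, 1]."""
    area = 0.0
    for k in range(1, len(days)):
        width = days[k] - days[k - 1]
        mean_height = (fraction_non_py[k] + fraction_non_py[k - 1]) / 2.0
        area += width * mean_height
    # Dividing by the total duration gives 1 for seed lots that are fully
    # permeable from the start and 0 for lots that never lose PY.
    return area / (days[-1] - days[0])


# Hypothetical sampling days over the 38-day period.
days = [0, 10, 20, 38]

print(normalized_auc(days, [1.0, 1.0, 1.0, 1.0]))  # no PY at any time
print(normalized_auc(days, [0.0, 0.0, 0.0, 0.0]))  # PY never released
print(round(normalized_auc(days, [0.0, 0.5, 0.5, 1.0]), 3))  # gradual release
```

Summarizing the whole time course in one number makes it possible to compare accessions that reach similar final values along different trajectories.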

### Relationships Among Phenotypic Traits, Geography, and Environment

To assess relationships between phenotypic traits in 16 and 20 naturalized populations of Argentina (2014 and 2016) and both geographic and environmental distances, six matrices were prepared and examined using the Mantel test (Smouse et al., 1986). The physical distance between naturalized populations was estimated using the geographic distance (GGD) for latitude (x)/longitude (y) values: $\mathrm{GGD} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$ (Peakall and Smouse, 2006; Garayalde et al., 2011). The geographic matrix contained pairwise geographical distances, while phenotypic distance was calculated as Euclidean distances between populations. All environmental variables were standardized, and environmental distances were calculated using soil and climatic variables ("environmental"). The significance of the normalized Mantel coefficient was calculated using a two-tailed Monte Carlo permutation test with 1,000 permutations using InfoStat software.
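The Mantel procedure described above can be sketched as follows. This is an illustrative re-implementation and not the InfoStat routine: it assumes that the normalized Mantel coefficient is the Pearson correlation between the upper triangles of two distance matrices and that significance is assessed by permuting the population order of one matrix. The coordinates and trait values are hypothetical.

```python
import math
import random


def euclidean_matrix(rows):
    """Pairwise Euclidean distances between rows of coordinates or traits."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after reordering populations."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(matrix_a, matrix_b, permutations=1000, seed=0):
    """Two-tailed Mantel test with Monte Carlo permutations of matrix_b."""
    rng = random.Random(seed)
    flat_a = upper_triangle(matrix_a)
    observed = pearson(flat_a, upper_triangle(matrix_b))
    order = list(range(len(matrix_a)))
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(order)
        r = pearson(flat_a, upper_triangle(matrix_b, order))
        if abs(r) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed arrangement among the permutations.
    return observed, (extreme + 1) / (permutations + 1)


# Hypothetical (x, y) positions and a trait set that scales with position.
coordinates = [(0, 0), (1, 0), (0, 2), (3, 1), (2, 3)]
traits = [(2 * x, 2 * y) for x, y in coordinates]

r, p = mantel(euclidean_matrix(coordinates), euclidean_matrix(traits))
print(round(r, 3), p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, which is why an ordinary significance test for a correlation coefficient is not appropriate here.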

# Genotypic Characterization

### Plant Sampling and DNA Isolation

Three representative naturalized populations (n = 3) were contrasted with AR cultivars (n = 2), EU cultivars (n = 4), and a wild accession (n = 1) (Table 1). Seeds were sown in a common garden at the Experimental Agricultural Station (EEA) of Hilario Ascasubi. Leaf material from 10 randomly selected plants from each accession was collected at the vegetative stage (August 2016). Genomic DNA was extracted using a modified cetrimonium bromide (CTAB) method (Hoisington et al., 1994) from leaf tissue dried on silica gel.

### Microsatellite Markers

The five most polymorphic SSR markers were chosen (Table S2) from a set of 36 simple sequence repeat (SSR) markers developed by Raveendar et al. (2015) for common vetch (Vicia sativa subsp. sativa) (Chung et al., 2013) that are applicable to HV genotyping analysis. Amplification reactions were performed in 17 μl volumes containing: 0.25 U of Taq DNA polymerase (Invitrogen), 1 mM MgCl<sub>2</sub>, 1.1 pmol of primers, 1 mM of each deoxynucleoside triphosphate (dNTP), and 30 ng of genomic DNA template. The optimum annealing temperature was determined for each primer set: KF008505 (55°C), KF008507 (59°C), KF008512 (59°C), KF008526 (59°C), and KF008536 (60°C). Amplifications were initially checked on 1.5% agarose gels. PCR products were analyzed on a 6% denaturing polyacrylamide gel in 1×TBE electrophoresis buffer at 60 W for 75 min, and the bands were visualized by silver staining and scanned (modified from Tang et al., 2003 and Garayalde et al., 2011). The size of each SSR allele was estimated using a 100 bp molecular weight marker. Each DNA fragment was considered as an allele of a single co-dominant locus.

### Genetic Data Scoring and Genetic Variability

The amplified SSR loci were scored for 10 accessions. Homozygous and heterozygous genotypes were inferred from the band patterns, and allele frequencies ($p_i$) were calculated accordingly. The absence of a band (null allele) was scored as missing data. Mean expected heterozygosity values ($H_E$) and the percentage of polymorphic loci (P%) were calculated as $H_E = 1 - \sum p_i^2$ and $P\% = \frac{L_p}{L_T} \times 100$, where $p_i$ is the frequency of the ith allele, $L_p$ is the number of polymorphic loci, and $L_T$ is the total number of loci. Hardy–Weinberg equilibrium was tested using the chi-squared test $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed number of individuals of the ith genotype, $E_i$ is the expected number under the equilibrium hypothesis, and $k$ is the total number of genotypes. Degrees of freedom for the chi-squared test were calculated as d.f. = $[N_a(N_a - 1)]/2$, where $N_a$ is the number of alleles at the locus (following Garayalde et al., 2011).
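The statistics defined above can be computed directly from scored genotypes. The sketch below follows the formulas as written in the text; the function names and the example genotypes (allele sizes in bp for 10 plants) are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter
from itertools import combinations_with_replacement


def allele_frequencies(genotypes):
    """Allele frequencies p_i from diploid genotypes given as allele pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def expected_heterozygosity(freqs):
    """H_E = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def percent_polymorphic(loci):
    """P% = (polymorphic loci / total loci) * 100."""
    polymorphic = sum(1 for g in loci if len(allele_frequencies(g)) > 1)
    return 100.0 * polymorphic / len(loci)


def hwe_chi_square(genotypes):
    """Chi-squared statistic and d.f. against Hardy-Weinberg expectations."""
    n = len(genotypes)
    freqs = allele_frequencies(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected counts: n * p^2 for homozygotes, n * 2pq for heterozygotes.
        proportion = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        expected = n * proportion
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected
    n_alleles = len(freqs)
    return chi2, n_alleles * (n_alleles - 1) // 2


# Hypothetical locus with two alleles (100 and 110 bp) in 10 plants.
locus_1 = [(100, 100)] * 3 + [(100, 110)] * 4 + [(110, 110)] * 3
# Hypothetical monomorphic locus.
locus_2 = [(200, 200)] * 10

he = expected_heterozygosity(allele_frequencies(locus_1))
chi2, df = hwe_chi_square(locus_1)
print(he, round(chi2, 3), df, percent_polymorphic([locus_1, locus_2]))
```

With only 10 plants per accession, expected genotype counts can be small, so the resulting chi-squared values should be interpreted with caution.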

The calculation of genetic distances (GD) followed the method of Peakall et al. (1995) and Smouse and Peakall (1999). For a single SSR locus, the first step is to calculate a vector for each individual using the additive genotype scoring convention. Subsequently, the squared distance ($d^2$) between any two genotypes is one-half the squared Euclidean distance between their respective vectors: $d_{ij}^2 = \frac{1}{2} \sum_{k=1}^{K} (y_{ik} - y_{jk})^2$, where i and j are the genotypes and k is the scoring character. Squared distances range from 0, when individuals share the same alleles, to 4, when individuals are homozygous for different alleles. Genetic distance matrices for each locus were summed across loci under the assumption of independence. At the population level, Φ<sub>PT</sub> (an analogue of F<sub>ST</sub>) obtained from analysis of molecular variance (AMOVA) was used as an estimate of population genetic differentiation with SSR markers. Principal coordinate analyses (PCO) were performed on GD matrices. The correlation between the genetic distance and phenotypic distance matrices was analyzed by the Mantel test (see Garayalde et al., 2011).
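The additive scoring convention and the resulting squared distance can be sketched as follows. This is a minimal illustration with hypothetical function names and allele labels, assuming diploid genotypes with no missing data; it is not the GenAlEx code used for the analysis.

```python
def additive_vector(genotype, alleles):
    """Additive scoring: number of copies (0, 1, or 2) of each allele carried by a genotype."""
    return [genotype.count(allele) for allele in alleles]


def squared_distance(genotype_i, genotype_j):
    """Single-locus d^2 = 1/2 * sum_k (y_ik - y_jk)^2 between two diploid genotypes."""
    alleles = sorted(set(genotype_i) | set(genotype_j))
    y_i = additive_vector(genotype_i, alleles)
    y_j = additive_vector(genotype_j, alleles)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y_i, y_j))


def multilocus_distance(individual_i, individual_j):
    """Sum of single-locus distances across loci, assuming independent loci."""
    return sum(squared_distance(g_i, g_j) for g_i, g_j in zip(individual_i, individual_j))


# Hypothetical allele labels (fragment sizes in bp).
same = squared_distance((150, 150), (150, 150))          # identical genotypes
opposite = squared_distance((150, 150), (160, 160))      # homozygous for different alleles
shared_one = squared_distance((150, 160), (150, 170))    # heterozygotes sharing one allele
two_loci = multilocus_distance([(150, 150), (200, 210)], [(160, 160), (200, 210)])
```

The two extreme cases reproduce the range stated in the text: 0 for identical genotypes and 4 for individuals homozygous for different alleles.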

### Analysis of Molecular Variance

The individual pairwise GD matrices were subjected to AMOVA. Total genetic variation was partitioned into three levels: within accessions, between accessions, and between origins (AR and EU). Variation was summarized both as the proportion of the total variance and as Φ-statistics (Excoffier et al., 1992). Genetic variability measures, distance metrics, PCO analysis, correlation analysis, and AMOVA were calculated using GenAlEx 6 (Peakall and Smouse, 2006).

### RESULTS

### Ecological Niche of Naturalized Populations

Naturalized populations of hairy vetch were found in the three monitored regions (Figure 1A), corresponding to Pampa, Espinal, and Shrubs of Plateau and Plains. In the central temperate region of Argentina, annual rainfall ranges from only 200 mm under arid and semiarid conditions to approximately 1,000 mm in sub-humid and humid environments. The vegetation changes across the three regions, from arid steppes in the west (Shrubs of Plateau and Plains) to grass steppes without woody species in the east (Pampa). The Espinal is an intermediate savannah, with grasses and scarce xeric trees, mainly of the Prosopis genus.

The proposed niche model explained most of the variation in HV geographical distribution. The training and test values of the area under the receiver operating characteristic curve (AUC) of the MaxEnt models were 0.957 and 0.956, respectively, indicating that the climatically suitable areas predicted by MaxEnt corresponded closely with the occurrence of natural HV populations. The distribution was significantly affected by precipitation of the driest quarter (BIO17), maximum temperature of the warmest month (BIO5), annual mean temperature (BIO1), and clay content in the soil surface (t\_clay, Table 2). The main suitable habitats for HV are distributed in the southeast of Espinal and southwest of the Pampa region, characterized by sub-humid and semiarid temperate climates, with warm-dry summers and cold-wet winters (Figure 1B).
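To make the AUC values easier to interpret, the sketch below computes AUC as the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen background site, with ties counted as one half. The function name and the suitability scores are hypothetical and are not taken from the MaxEnt output of this study.

```python
def auc(presence_scores, background_scores):
    """AUC as the fraction of presence/background pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores in [0, 1].
perfect = auc([0.9, 0.8], [0.1, 0.2])   # every presence site outranks every background site
partial = auc([0.9, 0.4], [0.5, 0.1])   # one of the four pairs is ranked incorrectly
```

A value of 0.5 corresponds to a model that ranks sites no better than chance, whereas values near 1, such as those reported above, indicate that presence sites are almost always scored above background sites.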

Plant communities associated with HV comprised 63 species (Table S1). The most frequent species were cosmopolitan weeds, including Avena fatua, Cynodon dactylon, Sorghum halepense (Poaceae), Carduus sp., and Centaurea solstitialis (Asteraceae), and natural communities of the perennial pasture Festuca arundinacea (Poaceae). Table 3 presents the life cycles and origins of the 20 species frequently associated with naturalized HV populations in the explored region, considered the dominant community species. Exotic species represent 85% of the co-occurring vegetation.

### Plant Growth and Phenotypic Variability

Registered rainfall in EEA Hilario Ascasubi during the HV growing season (from April to December) was 50% higher in 2014 (498 mm) and 21% lower in 2016 (264 mm) compared to the historical long-term mean (331 mm). Mean daily air temperature was slightly higher in the 2014 (13.5°C) than in the 2016 (12.8°C) growing season. All AR accessions performed well, except for Tolse F.C.A in 2014. However, nine out of the twenty-four EU tested cultivars (nr. 1, 7, 9, 10, 12, 13, 15, 16 and 19; Table 1) and two out of the five wild populations (1 and 2) did not produce pods during 2014.

TABLE 2 | Contribution (%) of the bioclimatic and soil variables in the MaxEnt models, and suitable habitats of naturalized hairy vetch populations from Argentina.

Life cycle: A, annual; P, perennial; B, biannual. Origin: Na, native; E, exotic.

Using canonical discriminant analysis with the phenotypic traits, the accessions were grouped in three clusters (Figure 2). These corresponded to improvement status and origin. Cluster 1 consisted predominantly of wild populations, cluster 2 consisted of accessions of AR origin, and cluster 3 consisted of EU cultivars.

AR accessions showed higher winter biomass production compared to EU cultivars and wild genotypes (Table 4).

TABLE 4 | Phenotypic variability of hairy vetch (HV) for each improvement status and origin (means and range) grown in a common garden during 2014 and 2016 in Experimental Agricultural Station (EEA) Hilario Ascasubi.


For mean values, different letters indicate significant differences among status (Fisher's LSD test, α = 0.05).

'\*\*' indicates significance at p < 0.01 and '\*' at p < 0.05.

†Relative to mean biomass per year (winter 222 and 168 g m<sup>−1</sup> and spring 1,174 and 887 g m<sup>−1</sup> in 2014 and 2016, respectively).

NS, not significant.


Biomass in spring was less variable among improvement statuses. Over the total data set, HV winter biomass was negatively correlated with the number of leaflets per leaf (r = -0.20; P < 0.01), and spring biomass was positively correlated with leaf length (r = 0.74; P < 0.001). There was a significant positive correlation between winter and spring biomass (r = 0.40; P < 0.001). Variations in the days to flowering among improvement statuses were narrow (≤ 7 days) but statistically significant within status (Table 4).

Seed viability was over 80% in all cases. The area under the curve (AUC) showed the following dormancy gradient: EU cultivars < AR cultivars < naturalized < wild genotype. No significant improvement status × year interaction was found for the AUC (Table 4). Wild genotypes had a smaller seed weight (Table 4).

Figure 3 shows the relationship between the main evaluated traits (winter biomass and seed dormancy), in order to identify the most suitable genotypes for breeding programs. Accessions number 9, 12, 19, 21, and 26 of naturalized populations and 2, 5, 10, and 12 of AR cultivars showed better potential for winter biomass with high PY. In contrast, genotypes number 3, 6, and 8 of AR cultivars, 2 of naturalized populations, and 6 of EU cultivars showed better potential for both winter biomass and low PY.

### Genotypic Characterization

The five selected polymorphic SSR loci produced altogether 24 alleles in 100 tested individuals. The mean number of alleles per locus (N) was 4.8 ± 0.9 ranging from 3 to 8 among the five loci. The mean expected heterozygosity (He) was 0.667 ± 0.03.

Differences in variability were observed among improvement statuses. Variability was higher in EU cultivars (He = 0.64 ± 0.03; N = 4.4 ± 1.0) and naturalized populations (He = 0.63 ± 0.05; N = 4.6 ± 0.9) than in AR cultivars (He = 0.57 ± 0.06; N = 3.8 ± 0.9) and wild genotypes (He = 0.30 ± 0.09; N = 2.2 ± 0.4). The lower variability in wild genotypes might be due to the smaller number of analyzed individuals (n = 10). Two private alleles were found only in naturalized populations. Equilibrium tests were significant in 68% of cases, indicating non-random mating within populations (Table S3). AMOVA showed a significant differentiation among improvement statuses, which explained around 19.1% of the variance. Naturalized populations differed from the wild accession (a genetic differentiation of 46.7%) and also from EU (8.3%) and AR (14.3%) cultivars. Lower differentiation was found between AR and EU cultivars (4.3%) (Figure 4).

AMOVA for the total marker data set is shown in Table 5. Genetic diversity was high within accessions. Between-accession estimated variance was significant and around 25% of the total variation. A small but significant portion of variance (3%) was found attributable to differences between AR and EU genotypes.

### Relationships Among Distance Measures

Environmental distance matrices were significantly correlated with geographic distance matrices (Mantel test; environment: r2014 = 0.61, P < 0.01; r2016 = 0.78, P < 0.01), suggesting that environmental (climatic + soil) conditions diverge with increasing geographic distance. However, neither geographic (r2014 = 0.19, P = 0.16; r2016 = 0.10, P = 0.20) nor environmental (r2014 =

TABLE 5 | Analysis of molecular variance (AMOVA) and sources of variation for hairy vetch accessions.


\*SS, sum of squares.

\*\*MS, mean sum of squares.


Genetic and phenotypic distance matrices of naturalized populations showed a positive but non-significant statistical correlation (Mantel test; r = 0.19, P = 0.21).
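The Mantel tests reported in this section can be illustrated with a permutation sketch: the Pearson correlation between the upper triangles of two distance matrices is compared with the correlations obtained after jointly permuting the rows and columns of one matrix. The function names, the number of permutations, the one-tailed test, and the toy matrices are assumptions made for illustration; the study itself used GenAlEx.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square distance matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(matrix_a, matrix_b, permutations=999, seed=0):
    """One-tailed Mantel test: returns (r, P) with P estimated by permutation."""
    rng = random.Random(seed)
    n = len(matrix_a)
    x = upper_triangle(matrix_a)
    r_obs = pearson(x, upper_triangle(matrix_b))
    count = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of matrix_b together to keep it a valid distance matrix.
        permuted = [[matrix_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            count += 1
    return r_obs, count / (permutations + 1)


# Toy example: five hypothetical sites on a line; the second matrix is a rescaled copy.
positions = [0, 1, 3, 6, 10]
geographic = [[abs(a - b) for b in positions] for a in positions]
environmental = [[2 * d for d in row] for row in geographic]
r, p_value = mantel(geographic, environmental)
```

Because the second toy matrix is proportional to the first, r equals 1 and only a small fraction of permutations match it, which gives a small P value. For unrelated matrices r would be close to zero and P large, as with the non-significant correlations reported above.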

### DISCUSSION

### Ecological Niche

Argentinian hairy vetch populations occur in a transitional zone between two defined ecological regions, the Pampas with a sub-humid climate and the Espinal with semiarid conditions (Figure 1). HV was used as a forage species in Buenos Aires province before 1900 (Manganaro, 1919). Thereafter, it probably escaped from cultivation (Aarssen et al., 1986) and became naturalized. As the natural seed dispersal potential of HV is very limited (Jannink et al., 1997), seed spillage during handling and transportation was probably the most important means of spread. Human-mediated dispersal is the most likely explanation for the establishment and subsequent naturalization of HV in further suitable habitats (Horvitz et al., 2017; Pascher et al., 2017).

After its introduction in Argentina, HV spread over the areas that met the appropriate conditions, following a patchy distribution. HV showed adaptation to a broad geographic (33–41°S latitude, 60–66°W longitude) and climatic range (400–800 mm rainfall; 11–13.6°C annual mean temperature; Table 2). Duke (1981) mentioned that HV is well adapted to a wider range of annual mean temperatures (4.3–21°C). In this study, low rainfall and warm temperatures during summer months explained the potential distribution of natural HV populations (Table 2). HV was generally associated with neutral-alkaline (pH range 6.1–9.7), sandy or sandy loam soils. However, it could occur on most soil types with sufficient drainage capacity (Duke, 1981). Clark (2007) stated that HV preferred neutral (pH 6.0–7.0) soils with tolerance to alkalinity (Duke, 1981). Conversely, low pH (< 6.2) can decrease the growth rate, nodulation, and nitrogen fixation (Aarssen et al., 1986).

### Fitness for the Ecological Niche

The ability of HV to produce PY+PD dormant seeds and their subsequent germination and emergence are important factors that influence natural population dynamics and persistence (Kimball et al., 2010). During the period of seed formation, dry and warm conditions could shorten the species life cycle due to rapid thermal-time accumulation (Petraityte et al., 2007) as well as favor a decrease in seed moisture content. In HV, the acquisition of PY is initiated only when the moisture content of the seeds is ≤14% (Hyde, 1954). Furthermore, HV is a cross-pollinated species in which bees play an important role (Zhang and Mosjidis, 1995; Renzi et al., 2017); thus, dry and warm weather is favorable for the activity of pollinating insects (Petraityte et al., 2007; Al-Ghzawi et al., 2009).

In the humid central region of Argentina, the spread of naturalized populations of HV would be limited by a negative combination of two main factors. First, the abundant rainfall stimulates the virulence of foliar fungal diseases (e.g., Ramularia sphaeroidea Sacc. and Ascochyta viciae Lib.), which reduce photosynthetic leaf area thus limiting seed formation (Petraityte et al., 2007; Renzi and Cantamutto, 2013). In addition, high humidity conditions enhance HV indeterminate growth, nonuniform maturity and extended growing season favoring a biennial behavior (Duke, 1981). These consequently limit the seed formation, drying, and acquisition of PY, and can cause a sharp reduction of seed bank persistence and consequent natural regeneration, similar to observations of Toser and Ooi (2014) in Acacia saligna.

After seed dispersal, warm summer temperatures are required for seed dormancy release (Renzi et al., 2014; Renzi et al., 2016). Seed dormancy acquisition, release, and seedling emergence requirements are important fitness traits determining the ecological niches of HV (Figure 1B). These adaptive traits seem to have evolved in Mediterranean-like environments, where hot and dry summer conditions regulate seed dormancy alleviation while cool and wet winters enhance vegetative growth, providing safe sites for seedling recruitment (Van Assche and Vandelook, 2010; Picciau et al., 2019).

### Phenotypic Variability

Two cultivars of woolly-pod vetch (Capello and Tolse F.C.A) were found among AR and EU cultivars. This subspecies (V. v. ssp varia) is characterized by shorter leaves (29.8 ± 6 vs. 47.8 ± 9.3 mm in HV, P < 0.01), fewer leaflets (12.3 ± 1.9 vs. 15.2 ± 1.1, P < 0.01) and flowers (20.1 ± 7.7 vs. 26.9 ± 3.4, P < 0.01), earlier flowering and maturity (more than 2 weeks earlier, P < 0.01), and higher winter (Figure 3) but lower spring biomass relative to HV (≈ 60%, P < 0.01). On the other hand, Jannink et al. (1997) found that accessions of HV were more winter-hardy than woolly-pod vetch, and late flowering may be positively genetically correlated with winter hardiness (Maul et al., 2011). The susceptibility of HV to low temperature increases with more advanced phenological stages (Brandsaeter et al., 2002), while slow growth during winter can be an adaptation to avoid frost damage (Loi et al., 1993). Prompt maturity is a desirable trait for earlier biomass production as well as N accumulation in regions with a shorter growing season. Also, flowering timing can greatly influence the capacity for weed suppression as a cover crop (Mischler et al., 2010; Maul et al., 2011). Therefore, woolly-pod vetch could be a source of desirable genes for breeding programs seeking early flowering cultivars for cover crop usage in mild winter zones. Notably, most of the genotypes characterized as early flowering by Maul et al. (2011) corresponded to woolly-pod vetch (https://npgsweb.ars-grin.gov/gringlobal/search.aspx).

Quantitative traits evaluated among accessions resembled a continuous probability distribution, although the differences among improvement statuses were statistically significant (Table 4). AR accessions showed higher winter biomass accumulation than EU accessions, probably due to a greater adaptation to Argentinian ecological conditions.

Seed dormancy is largely genetically determined but also depends on the environmental conditions experienced by the mother plant (maternal effect) and the subsequent degree of seed dehydration (Hudson et al., 2015; Finch-Savage and Footitt, 2017). HV seeds were collected from mature pods with seed moisture content below 14% (determined as a critical value for PY acquisition; Hyde, 1954), and no environmental effect between years was detected on the AUC of PY (Table 4). PY was variable among genotypes and could act mainly as an adaptive trait (Hudson et al., 2015; Long et al., 2015). EU cultivars had lower PY values than AR cultivars and naturalized accessions. PY is a highly heritable trait (Hudson et al., 2015); thus, this germplasm could be useful for both breeding and artificial (or natural) selection for higher or lower levels of dormancy (Lacerda et al., 2007). It is probable that the observed differences between accessions could be explained by genetic adaptations of HV to the local environment (Baskin and Baskin, 2014), as shown in pea (Hradilová et al., 2019), or by selection for improved genotypes (Fuller and Allaby, 2009; Kluyver et al., 2013; Renzi et al., 2016).

The large variability observed within the available germplasm could be used by breeders to select parental accessions for an HV breeding program that maximizes winter-spring biomass with low PY for cover crops (Wilke and Snapp, 2008; Wayman et al., 2016), or with high PY for 'ley farming' systems (Loi et al., 2005; Renzi et al., 2017; Renzi et al., 2018). No significant correspondence was found between the geographic distance matrix and the phenotypic distance matrix (P > 0.15) among the naturalized AR populations. These results differ from those for Medicago polymorpha L., in which a correspondence between collection site and phenotypic traits was observed (Loi et al., 1993; Helliwell et al., 2018).

Among the measured traits there was a significant correlation between winter and spring biomass. The latter was also highly correlated with the length of the leaf as described in Vicia sativa (De La Rosa et al., 2002). Leaf size is related to the photosynthetic rates which in turn affect growth and could be potentially maximized with water availability (Carlson et al., 2016). Thus, leaf size would be an indirect selection trait for biomass that improves breeding program efficiency.

### Genotypic Variability

Information on the genetic diversity of HV is scarce, and therefore knowledge of its genetic variability is useful for other studies. Our data showed low genetic differentiation between AR and EU cultivars, but strong differentiation between wild EU and naturalized AR populations. Variation between accessions was small in comparison with the variance found within populations (Table 5), which is expected for an obligate cross-pollinated species (Hamrick and Godt, 1989). Maul et al. (2011) reported similar results, with 93% of genetic diversity within populations.

Populations deviated from Hardy-Weinberg equilibrium (Table S3), which can be attributed to mutations, natural selection, non-random mating, genetic drift, and gene flow. As mentioned above, the lack of a geographical signature in the pattern of population variation, as occurs in other allogamous species (Garayalde et al., 2011), can also be explained by the effect of human activities on seed dispersal and by genetic drift (Knapp and Rice, 1998). Cultivated populations are open-pollinated and highly heterogeneous and would be subject to natural selection and genetic drift throughout the cycles (Wiering et al., 2018). These results are consistent with those observed for the phenotypic traits.

### CONCLUSIONS

This study increases the understanding of the value of naturalized hairy vetch populations in agroecosystems of Argentina. Naturalized populations showed good soil adaptation in disturbed areas and neutral response to alkaline soil niches from central Argentina. Low rainfall and warm temperatures during pre- and post-dispersal seem to explain and regulate the potential distribution of HV populations. Within this ecological context, dry and warm climate may be considered as favorable environmental conditions to increase seed dormancy and timing of germination-triggering. Considering HV genetic variability and agro-ecological adaptation, naturalized populations could be considered as a source of potential adaptive traits for breeding. The AR germplasm constitutes an important reservoir of genes for high winter and spring biomass production. On the other hand, high levels of innate seed dormancy of HV accessions from Argentina reduce its possible use as a cover crop. In this sense, dedicated crosses with more domesticated EU cultivars will serve to reduce the seed dormancy.

### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/Supplementary Material.

### AUTHOR CONTRIBUTIONS

JR and MC conceived the topic. JR and LZ performed the experiments. JR and AG analyzed all statistical data. JR, GC, and AP wrote the manuscript. PS and MC revised the manuscript.

### FUNDING

This work was supported by the Instituto Nacional de Tecnología Agropecuaria (PE-E6-I146), Agencia Nacional de Promoción Científica y Tecnológica, MINCyT (PICT-2017-0473 and PICT-2016-1575) and Universidad Nacional del Sur (PGI 24/A223 and PGI 24/A225).

### ACKNOWLEDGMENTS

The authors would like to acknowledge V. Holubec of the Department of Gene Bank of the Crop Research Institute (CRI), for providing the cultivars of hairy vetch used in this study, and T. Vymyslický of the Research Institute for Fodder Crops, for performing taxonomic identification of the Serbian accessions. We would like to thank the personnel at CRF-INIA, in particular to M.T. Marcos Prado for help in the niche analysis. Appreciation is extended to O. Reinoso, J. Castillo and M. Bruna, agricultural specialists at the EEA Hilario Ascasubi, for assistance in establishing the field experiments.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00189/full#supplementary-material




Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Renzi, Chantre, Smýkal, Presotto, Zubiaga, Garayalde and Cantamutto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Pod Dehiscence in Hairy Vetch (Vicia villosa Roth)

Lisa Kissing Kucek<sup>1</sup>, Heathcliffe Riday<sup>1</sup>\*, Bryce P. Rufener<sup>1</sup>, Allen N. Burke<sup>2</sup>, Sarah Seehaver Eagen<sup>3</sup>, Nancy Ehlke<sup>4</sup>, Sarah Krogman<sup>5</sup>, Steven B. Mirsky<sup>2</sup>, Chris Reberg-Horton<sup>3</sup>, Matthew R. Ryan<sup>6</sup>, Sandra Wayman<sup>6</sup> and Nick P. Wiering<sup>4</sup>

<sup>1</sup> Dairy Forage Research Center, USDA-ARS, Madison, WI, United States, <sup>2</sup> Sustainable Agricultural Systems Laboratory, Beltsville Agricultural Research Center, USDA-ARS, Beltsville, MD, United States, <sup>3</sup> Crop and Soil Science, North Carolina State University, Raleigh, NC, United States, <sup>4</sup> Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, United States, <sup>5</sup> Noble Research Institute, Ardmore, OK, United States, <sup>6</sup> School of Integrated Plant Science, Cornell University, Ithaca, NY, United States

### Edited by:

Petr Smýkal, Palacký University, Olomouc, Czechia

### Reviewed by:

Tomas Vymyslicky, Agricultural Research, Ltd., Czechia Paul Gepts, University of California, Davis, United States

\*Correspondence: Heathcliffe Riday Heathcliffe.Riday@usda.gov

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 30 September 2019 Accepted: 21 January 2020 Published: 03 March 2020

### Citation:

Kissing Kucek L, Riday H, Rufener BP, Burke AN, Eagen SS, Ehlke N, Krogman S, Mirsky SB, Reberg-Horton C, Ryan MR, Wayman S and Wiering NP (2020) Pod Dehiscence in Hairy Vetch (Vicia villosa Roth). Front. Plant Sci. 11:82. doi: 10.3389/fpls.2020.00082

Hairy vetch, Vicia villosa (Roth), is a cover crop that does not exhibit a typical domestication syndrome. Pod dehiscence reduces seed yield and creates weed problems for subsequent crops. Breeding efforts aim to reduce pod dehiscence in hairy vetch. To characterize pod dehiscence in the species, we quantified visual dehiscence and force required to cause dehiscence among 606 genotypes grown among seven environments of the United States. To identify potential secondary selection traits, we correlated pod dehiscence with various morphological pod characteristics and field measurements. Genotypes of hairy vetch exhibited wide variation in pod dehiscence, from completely indehiscent to completely dehiscent ratings. Mean force to dehiscence also varied widely, from 0.279 to 8.97 N among genotypes. No morphological traits were consistently correlated with pod dehiscence among environments where plants were grown. Results indicated that visual ratings of dehiscence would efficiently screen against genotypes with high pod dehiscence early in the breeding process. Force to dehiscence may be necessary to identify the indehiscent genotypes during advanced stages of selection.

Keywords: pod dehiscence, germplasm characterization, vetch, domestication syndrome, phenotyping, legumes (Fabaceae)

### INTRODUCTION

Hairy vetch, Vicia villosa (Roth), is an outcrossing diploid legume (2n = 14; Chooi, 1971; Yeater et al., 2004). Commonly used as a green manure cover crop (SARE et al., 2015; CTIC et al., 2016; USDA-NASS, 2019), the species excels in winter hardiness (Brandsæter et al., 2002) and nitrogen supply to subsequent cash crops (Parr et al., 2000). With prevalent pod dehiscence (PD) and seed dormancy, hairy vetch does not exhibit a typical domestication syndrome (Meyer and Purugganan, 2013; Abbo et al., 2014). PD raises seed costs and causes weediness in fields, reducing utilization of this cover crop. Breeding efforts are underway to reduce or eliminate PD in hairy vetch. The goal of such efforts is to increase cover crop use and improve soil and water conservation.

Few published studies have evaluated PD in the genus Vicia. Renzi et al. (2017) documented 15% to 46% PD in one Argentinian landrace evaluated at one location over two years, but no studies have evaluated PD of hairy vetch among diverse germplasm or growing conditions. In common vetch (Vicia sativa L.), PD varied widely (3% to 96%) among diverse lines (Abd El-Moneim, 1993; Dong et al., 2017). Common vetch lines differing in PD exhibited 22 differentially expressed unigenes (Dong et al., 2017).

In other members of the Fabeae tribe, domestication has successfully eliminated PD (e.g. Pisum sp.) or reduced PD to very low levels relative to wild types (e.g. Lens sp.; Abbo et al., 2014; see review by Ogutcen et al., 2018). PD was controlled by one to three dominant loci in lentil (Lens sp.; Ladizinsky, 1979; Ladizinsky, 1985; Fratini et al., 2007) and one to two dominant loci in pea (Pisum sp.; Blixt, 1972; Weeden et al., 2002; Weeden, 2007). PD has been more extensively studied in the Phaseoleae tribe of Fabaceae. In soybean (Glycine max L. Merr.), the transcription factor SHAT1-5 and the gene Pod dehiscence 1 (Pdh1) mediate and control PD. In common bean (Phaseolus vulgaris L.), various QTL have been identified among bean races (Parker et al., 2019), the most documented being the Stringless (St) gene in snap beans (Koinange et al., 1996).

PD is influenced by environmental conditions, length of pod drying, and handling methods postharvest (see reviews by Ogutcen et al., 2018 and Zhang et al., 2018). With varying maturity timings, diverse genotypes can be exposed to differing weather conditions during pod development. Consequently, genotype by environment interactions (Tukamuhabwa et al., 2002; Helms, 2011) can cloud genetic effects. More controlled measurements of PD, such as oven drying of pods to standardize moisture and/or applying force to a pod to induce dehiscence were more associated with genetic effects than measuring PD under field conditions (Weeden, 2007; Dong et al., 2016; Murgia et al., 2017; Parker et al., 2019). Such methods, particularly measuring the force needed to induce dehiscence, demand substantial phenotyping time and specialty equipment. Identification of traits that are easier to measure and highly correlated with PD could improve breeding efficiency (Falconer, 1981). Such secondary selection traits could accelerate improvement of hairy vetch.

The objectives of this research were to characterize PD among diverse genotypes of hairy vetch, and to identify the most efficient phenotyping methods for PD. First, we assessed PD among 606 genotypes of hairy vetch in seven environments of the United States. Second, we correlated potential secondary selection traits with measures of PD including visual PD, force to induce dehiscence, and pod spiraling.

### MATERIALS AND METHODS

### Data Collection

The germplasm for our genetic analysis of PD originated from an existing hairy vetch breeding program. For further details on source germplasm, nursery design, field data collection, and management see Kissing Kucek et al. (2019). In the late summer and early fall of 2017, seven nursery environments (Clayton, NC; Goldsboro, NC; Beltsville, MD; Ithaca, NY; Varna, NY; Prairie Du Sac, WI; and St. Paul, MN) planted half-sib seed from 92 maternal plants selected in the summer of 2017, in addition to 12 check lines (Table S1). Fifteen lines were replicated at all environments, including three maternal lines from the 2017 nurseries, four commercially available cultivars, one breeding line, and seven PI accessions. The remaining 77 lines at each site were half-sibling progeny of maternal plants selected from the breeding program in 2017. The seven environments planted a total of 14,304 genotypes of hairy vetch.

Individual plants were screened for visual vigor on a scale from zero to nine in the fall and spring (see Kissing Kucek et al., 2019 for details). When 50% of lines had begun to flower in each nursery, collaborators recorded the maturity of each individual plant in the field using the Kalu and Fick (1981) scale. In the spring of 2018, each site selected the top 53 to 122 hairy vetch plants based on winter survival, flowering time, and fall and spring vigor. The selected individuals cross-pollinated at each site. Sites recorded the dry weight of selected plants at pod maturity, except Clayton, NC and Prairie du Sac, WI. All sites measured the seed yield of individual plants.

Collaborators harvested a subset of pods from a total of 606 selected plants for PD evaluation. These 606 plants are subsequently referred to as "genotypes." Collaborators at each site were instructed to collect a subset of around 50 mature pods from the selected maternal plant. If pods appeared to contain few seeds per pod, more pods were collected per plant, to obtain enough seed for a separate analysis on germination. Figure S1 shows the distribution of pods collected per maternal line per site.

Pods were stored in paper bags under low humidity (< 35% R.H.) conditions at 20°C for five months. A calibration experiment determined that drying pods for at least 24 h at 30°C was necessary to stabilize the PD trait and reach critical pod moisture in Vicia villosa (see supplementary material and Figure S2). Consequently, pods were dried above 30°C for three days to stabilize moisture and PD prior to analysis. Maternal lines were evaluated for metrics of PD and morphology (Table 1) in a completely randomized order.

For each pod per maternal line, visual dehiscence was scored on a 0–3 scale, with zero indicating a fully intact pod (no openings along sutures), one indicating one suture was partially opened (one side of pod), two indicating two sutures were partially opened (both sides of pod), and three indicating that the pod had fully opened.

Pods with a zero rating (unopened) were further tested with an MTS Insight ® 1kN (MTS Systems Corporation) machine for measuring cracking force on unopened pods. The pod was placed horizontally on a bottom compression platen, similar to the methods of Dong et al. (2016), and crushed with a top platen until the machine detected peak force required for suture rupture. The evaluation used a 100 N load cell and TestWorks ® [v.4.11B] (MTS Systems Corporation) set to 95% break sensitivity, 0.1 N break threshold, and 25 Hz data acquisition rate. Maternal lines were only evaluated if they had at least three unopened pods available for testing.


TABLE 1 | Metrics of pod dehiscence evaluation, including photos of pods exemplifying the maximum and minimum of rating scales.

After pods were hand-threshed between two ribbed rubber surfaces, the degree of spiraling was rated by the tightness of the curls formed in the carpels. A zero was given to pods with no curling of the carpels, one for intermediate curling, and two for very tight curling of the carpels.

Pods from each maternal line were further rated for morphological features. Traits included corrugation, fracture of the wall, flexibility, and pith tissue (Table 1). Prior to threshing, corrugation was visually rated by the compression of the pod wall on the seeds inside the pod. Pods were rated zero if no signs of indents formed by the seeds in the pod wall, one if some indenting was present, and two if the pod wall had significant indentation formed by the seeds. After threshing, pod walls were rated for fracture structure by hand bending the carpel until a break formed. The break would be categorized as zero for pods that formed a jagged fracture, and one if the break followed a straight path. Flexibility was tested in a similar method to fracture. Pods received a zero if they displayed resistance when bending the carpel, and one if the carpel bent with little resistance. After threshing, pods were observed for the presence or absence of a spongy white pith tissue inside of the pod. Pods were given values of zero if there was little to no pith tissue, and one if pith tissue was prevalent.

To verify that differences in pod moisture did not impact PD, a subset of lines diverging for visual PD and force to PD were analyzed for percent moisture. Three lines each from high and low PD categories from each site were included, for a total of 42 lines based on visual dehiscence scores and 40 lines based on force to PD. Some lines overlapped, leaving a total of 76 lines analyzed. Threshed pods, without seeds, were weighed, then dried at 105°C for 24 h and reweighed.
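The gravimetric calculation implied by this weigh, dry, and reweigh procedure can be sketched as follows. The text does not state whether moisture was expressed on a wet (fresh-weight) or dry-weight basis, so the wet basis used here is an assumption, and the function name and example weights are hypothetical.

```python
def percent_moisture_wet_basis(weight_before_g: float, weight_after_g: float) -> float:
    """Percent moisture of a threshed pod sample on a wet (fresh-weight) basis.

    Assumption: the source does not say which basis was used; wet basis is
    chosen here purely for illustration.
    """
    if weight_before_g <= 0:
        raise ValueError("weight before drying must be positive")
    if weight_after_g < 0 or weight_after_g > weight_before_g:
        raise ValueError("weight after drying must lie between 0 and the initial weight")
    return 100.0 * (weight_before_g - weight_after_g) / weight_before_g


# Hypothetical sample: 2.00 g before and 1.84 g after 24 h at 105°C.
example = percent_moisture_wet_basis(2.00, 1.84)
print(f"{example:.1f}% moisture")
```

On a dry-weight basis the denominator would instead be the weight after drying, which gives slightly higher percentages for the same sample.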

### Statistical Analysis

Analyses investigated the relationships between all metrics of PD (visual dehiscence, force to dehiscence, and pod spiraling), pod morphology (corrugation, fracture, flexibility, pith tissue), field measurements (fall vigor, spring vigor, maturity timing, plant weight, and seed yield), and pod moisture. Mean values per line were used for visual dehiscence and force to dehiscence. Lines with fewer than four pods observed were removed from analyses. One extreme outlier for force to dehiscence (see Figures 1B and 2B) was also removed prior to analysis.
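The aggregation step described above (per-line means, with sparsely sampled lines dropped) could look like the minimal sketch below. The data layout, function name, and example scores are hypothetical; only the four-pod minimum comes from the text.

```python
from collections import defaultdict
from statistics import mean

MIN_PODS_PER_LINE = 4  # from the text: lines with fewer than four pods were removed


def line_means(pod_scores):
    """Return {line_id: mean score} for lines with enough pods.

    pod_scores: iterable of (line_id, score) pairs, one pair per pod.
    """
    by_line = defaultdict(list)
    for line_id, score in pod_scores:
        by_line[line_id].append(score)
    return {
        line_id: mean(scores)
        for line_id, scores in by_line.items()
        if len(scores) >= MIN_PODS_PER_LINE
    }


# Hypothetical visual dehiscence scores (0-3 scale) for two lines.
pods = [("A", 0), ("A", 1), ("A", 3), ("A", 2), ("B", 3), ("B", 3)]
print(line_means(pods))
```

Line "B" has only two pods in this toy input, so it is excluded from the result.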

an extreme outlier for force to dehiscence. When excluding the outlier, there was a moderate linear relationship (r = −0.33) between visual dehiscence and force to dehiscence. The linear relationship broke down at low levels of visual dehiscence and high force required to cause pod dehiscence. Both traits may be necessary to identify lines most resistant to dehiscence. Visual dehiscence was scored on a 0–3 scale, with "0" indicating a fully intact pod (no openings along sutures), "1" indicating one suture was partially opened (one side of pod), "2" indicating two sutures were partially opened (both sides of pod), and "3" indicating that the pod had opened fully or partially.

Line mean visual dehiscence, force to dehiscence, maturity timing, and pod moisture were continuous variables. Corrugation and pod spiraling were ordinal variables artificially created to represent underlying continuous normal distributions, while fracture, flexibility, and pith tissue were dichotomous variables created to represent underlying continuous normal distributions. Maximum likelihood estimates and standard errors of correlations were computed using Pearson's product-moment correlation coefficient (r) for continuous vs. continuous metrics, polyserial maximum likelihood estimates (r) for continuous vs. dichotomous or ordinal metrics, and polychoric maximum likelihood estimates (r) for dichotomous or ordinal vs. dichotomous or ordinal metrics using the 'polycor' package (Fox, 2016) in R [version 3.6.0] (Olsson, 1979; Olsson and Drasgow, 1982; R Core Team, 2019). To visualize potential nonlinear relationships between visual dehiscence and force to PD, smoothing splines were generated using the 'loess' function in R [version 3.6.0] (R Core Team, 2019) and plotted using 'ggplot2' (version 3.0.0) (Wickham, 2016).
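For the continuous vs. continuous case, the Pearson coefficient can be illustrated with the short sketch below. It is written in Python rather than the R used in the study, the example values are hypothetical, and the polyserial and polychoric estimators (which require maximum likelihood fitting, as in 'polycor') are deliberately not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical line means: visual dehiscence (0-3) and force to dehiscence (N).
visual = [0.2, 0.8, 1.5, 2.1, 2.9]
force = [4.1, 3.0, 2.6, 1.9, 1.2]
print(round(pearson_r(visual, force), 2))
```

With these made-up values the coefficient is negative, mirroring the direction of the relationship reported between visual dehiscence and force to dehiscence.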

Kissing Kucek et al. Pod Dehiscence in Hairy Vetch

Relationships were tested for difference from zero using Pearson's product-moment correlation for continuous vs. continuous metrics, Kruskal-Wallis one-way analysis of variance for continuous vs. dichotomous or ordinal metrics with nonnormal distribution, one-way analysis of variance (ANOVA) for continuous vs. dichotomous or ordinal metrics with normal distribution, and chi-squared test of independence for dichotomous or ordinal vs. dichotomous or ordinal metrics.
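The decision rule in the paragraph above can be summarized as a small lookup. This is only a sketch of the stated logic: the type labels and function name are hypothetical, and how normality was assessed is not described in the text, so it is passed in as a flag.

```python
CONTINUOUS = "continuous"
ORDINAL = "ordinal"
DICHOTOMOUS = "dichotomous"
CATEGORICAL = {ORDINAL, DICHOTOMOUS}


def choose_test(type_a: str, type_b: str, continuous_is_normal: bool = True) -> str:
    """Name the significance test for a pair of metric types, per the text."""
    types = {type_a, type_b}
    if not types <= {CONTINUOUS} | CATEGORICAL:
        raise ValueError(f"unknown metric type in {types}")
    if type_a == CONTINUOUS and type_b == CONTINUOUS:
        return "Pearson product-moment correlation"
    if type_a in CATEGORICAL and type_b in CATEGORICAL:
        return "chi-squared test of independence"
    # Exactly one metric is continuous; the choice depends on its distribution.
    return "one-way ANOVA" if continuous_is_normal else "Kruskal-Wallis"


print(choose_test(CONTINUOUS, ORDINAL, continuous_is_normal=False))
```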

For each combination of metrics, we tested whether correlations varied among environments. Using the 'psych' package [v.1.8.12] (Revelle, 2019) in R [version 3.6.0] (R Core Team, 2019), independent correlations among environments for each trait combination were z-transformed. The difference between the z-transformed correlations was divided by the standard error of the difference of the z scores. All analyses were conducted in R [version 3.6.0] (R Core Team, 2019), using a significance threshold of P < 0.05.
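The comparison described here corresponds to the standard Fisher z test for two independent correlations, sketched below in Python. The standard error formula, sqrt(1/(n1 − 3) + 1/(n2 − 3)), is the textbook form and is assumed to match what the 'psych' package computes; the function name and sample sizes are hypothetical.

```python
import math


def compare_independent_correlations(r1: float, n1: int, r2: float, n2: int):
    """Fisher z test for the difference between two independent correlations.

    Returns (z statistic, two-sided p-value under a standard normal).
    """
    if min(n1, n2) <= 3:
        raise ValueError("each sample needs more than three observations")
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transform
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se_diff
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical: the same trait pair measured at two environments.
z, p = compare_independent_correlations(-0.14, 80, -0.43, 90)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value below the 0.05 threshold would indicate that the correlation for that trait pair differs between the two environments.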

To further understand the contribution of pod morphology and field maturity to PD, visual dehiscence and force to dehiscence were regressed onto pod morphology and maturity timing using the model below. All effects were treated as random to determine the variance contribution of each effect. Variances were estimated using the 'lme4' package [version 1.1-10] (Bates et al., 2015) in R [version 3.6.0] (R Core Team, 2019).

$$Y_{ijklmn} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l + \zeta_m + \eta_n + \epsilon_{ijklmn}$$

$Y_{ijklmn}$: visual or force to dehiscence of a pod from environment i, corrugation j, fracture k, flexibility l, pith tissue m, and flowering maturity n

$\mu$: grand mean of visual dehiscence or force to dehiscence

$\alpha_i$: random effect of environment i

$\beta_j$: random effect of corrugation j

$\gamma_k$: random covariate of fracture k

$\delta_l$: random covariate of flexibility l

$\zeta_m$: random covariate of pith tissue m

$\eta_n$: random covariate of flowering maturity n

$\epsilon_{ijklmn}$: error term
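Because every effect in the model is random, each effect's contribution can be expressed as a share of the total estimated variance. The source does not spell out this calculation, so the expression below is an assumed but common convention, and the symbol $\hat{\sigma}^2_x$ (the estimated variance component for effect x) is notation introduced only for this illustration:

$$\text{Percent variance}_x = 100 \times \frac{\hat{\sigma}^2_x}{\hat{\sigma}^2_\alpha + \hat{\sigma}^2_\beta + \hat{\sigma}^2_\gamma + \hat{\sigma}^2_\delta + \hat{\sigma}^2_\zeta + \hat{\sigma}^2_\eta + \hat{\sigma}^2_\epsilon}, \quad x \in \{\alpha, \beta, \gamma, \delta, \zeta, \eta, \epsilon\}$$

Under this convention the percentages for all effects, including the residual, sum to 100.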

# RESULTS

The 606 genotypes varied from completely indehiscent to completely dehiscent (Figure 1A). Twenty genotypes, sourced from five environments, produced pods that all opened (mean visual score = 3). In two genotypes, sourced at Prairie du Sac, WI and Varna, NY, all pods were fully closed (mean visual score = 0).

A total of 458 genotypes had at least three unopened pods available for force testing. Mean force to dehiscence varied from 0.279 to 8.97 N among maternal lines (Figure 1B). One outlier genotype was evident from the Clayton, NC site. Force to dehiscence for this line was four times the grand mean. Nine of ten pods evaluated for this line were in the 99th percentile of force to dehiscence measured in the study, requiring between 6.2 and 12 N to break the sutures. This line also exhibited the lowest visual dehiscence from its growing location, indicating its promise as a genetic oddity for PD.

Pod moisture varied from 6.7% to 9.3% in a subset of 76 lines divergent for PD. Pod moisture was not significantly related to visual dehiscence, force to dehiscence, flowering maturity, or pod morphology traits (Table S2). Pod moisture showed a weak inverse correlation with spiraling (r = -0.25, p = 0.02048). Because moisture had little influence in the extreme phenotype subset, pod moisture was not included in further analyses or evaluated on the remaining 535 lines.

Corrugation of the pods and presence of pith tissue on the interior of the pod were only common at the three southern nurseries, and rare in the northern nurseries. Northern nurseries were subsequently removed from corrugation correlation calculations. Pod walls at one site (Ithaca, NY) nearly all fractured linearly, and therefore, pods from Ithaca, NY were removed from correlation calculations for fracture. Flowering maturity did not strongly differ at Beltsville, MD. Consequently, correlations with flowering maturity did not include Beltsville, MD.

All metrics of PD were significantly related to one another (Table 2). Across environments, spiraling was highly correlated with visual dehiscence (r = 0.64 to 0.87) and moderately correlated to force to dehiscence (r = −0.17 to −0.42). Correlations between visual and force dehiscence were moderate (r = −0.14 to −0.43). Visual dehiscence exhibited a linear relationship with force to dehiscence, except for pods resisting dehiscence at high levels of force (> 4 N; Figure 2).

PD was influenced by the environment where the pods developed. For the response of force to dehiscence, environment explained more variance (13.27%) than any pod morphology metrics (0% to 8.51%) or field maturity traits (0%, Table 3). Visual dehiscence was also influenced by environment (7.63% of variance), but to a lesser degree than force to dehiscence.

Corrugation, pith tissue, fracturing structure, and flexibility of the pod tended to be related to metrics of PD. However, low frequency of corrugation and pith tissue at the northern environments limited the inference of these morphological correlations to only three southern environments. Pod corrugation had the highest correlation with PD among the studied morphology metrics. Higher levels of corrugation were associated with lower visual dehiscence (r = −0.33 to −0.54), and spiraling (r = −0.36 to −0.76) at all environments. Corrugation also accounted for 46% of variance in the random effects model with the response of visual dehiscence (Table 3), and 3.9% of variance with the response of force to dehiscence.

The presence of pith tissue in the pod was associated with more force required to split the pod (r = 0.12 to 0.48) and reduced spiraling (r = −0.16 to −0.27) among environments. In the random effects model with the response of force to dehiscence, pith tissue accounted for the most variance among pod morphology metrics. However, the variance explained by pith tissue was small (8.5%, Table 3).

The fracturing structure of the pod wall was related to spiraling at all environments (r = 0.23 to 0.52), and visual dehiscence at five of six environments (r = 0.23 to 0.46). Pod flexibility was moderately related to spiraling (r = −0.14 to −0.58), and to visual dehiscence at five of six environments (r = −0.27 to −0.63). In the random effects model with the response of visual dehiscence, fracturing structure and flexibility accounted for a small portion of variance, at 5.5% and 2.2%, respectively (Table 3).

TABLE 2 | Strength, significance, and consistency of association between pod dehiscence, morphology, and field data.

\*, \*\*, and \*\*\* indicate significance of correlations at the 0.05, 0.01, and 0.001 probability level, respectively.

NS indicates that correlations are not significant at the α < 0.05 threshold.

† indicates only one observed paired environment comparison.

¥ indicates only three observed paired environment comparisons.

Numbers show Pearson's product-moment correlation coefficient for continuous vs. continuous metrics, polyserial maximum likelihood estimates for continuous vs. dichotomous or ordinal metrics, and polychoric maximum likelihood estimates for dichotomous or ordinal vs. dichotomous or ordinal metrics across environments. Significance values were calculated through Pearson's product-moment correlation for continuous vs. continuous metrics, chi-squared test of independence for dichotomous or ordinal vs. dichotomous or ordinal metrics, and Kruskal-Wallis one-way analysis of variance for continuous vs. dichotomous or ordinal metrics. Shading indicates consistency of the correlation among environments. Dark grey indicates significant differences in correlations among environments, with change in direction; light grey indicates significant differences in correlations among environments, but no change in direction; and white indicates no significant differences among environments.

March 2020 | Volume 11 | Article 82

TABLE 3 | Variance contribution of environment, pod morphology, and flowering maturity on metrics of dehiscence.

Visual dehiscence was most explained by corrugation, while force to dehiscence was explained by pith tissue, followed by corrugation.

Some measures of field performance were associated with PD. Spring vigor showed a weak correlation with visual dehiscence (r = −0.01 to −0.28), spiraling (r = −0.04 to −0.27), and force to dehiscence (r = −0.08 to 0.23), but the relationship varied by environment. Seed yield was correlated with spiraling at most sites (r = 0.08 to 0.56), force to dehiscence (r = 0.01 to −0.24) at some sites, and visual dehiscence at the southern sites (r = 0.13 to 0.47).

# DISCUSSION

Our results indicated the potential to select for indehiscence in hairy vetch. Wide variation in visual and force to dehiscence existed among diverse genotypes. More importantly for selection, multiple lines exhibited indehiscence or very low levels of dehiscence.

This dataset also demonstrated environmental influence on PD, which is well documented in other species (see reviews by Ogutcen et al., 2018 and Zhang et al., 2018). Growing environment contributed substantial amounts of variance for visual dehiscence and force to dehiscence (Table 3). Moreover, correlations between metrics of PD, pod morphology, and flowering maturity significantly differed among environments (Table 2). To separate genetic effects from environmental influences and interactions, PD studies should utilize multiple environments. Secondary selection traits to speed phenotyping would need to consistently correlate among diverse environments within a breeding program region of interest.

Spiraling was highly correlated with PD and was a high-throughput measurement, requiring only 15 seconds per sample. Visual dehiscence provided higher resolution in PD than spiraling and was moderately time intensive, requiring 5 min to rate per line, at 50 pods evaluated per line. Force to dehiscence was the most involved measurement, requiring specialty equipment, a trained operator, and 18.5 min of evaluation time per line, with five pods evaluated per line. Although visual dehiscence and spiraling may be adequate to identify strongly dehiscent lines, force to dehiscence may be useful for identifying extreme lines most resistant to dehiscence. For initial screenings of dehiscence, spiraling could identify the genotypes most susceptible to PD at low cost. Visual dehiscence would be useful in early and middle stages of selection to eliminate moderately dehiscent lines. Once mean visual dehiscence levels become low (< 1) in a breeding population, force to dehiscence measurements would likely be necessary to further advance gains in selection (Figure 2).
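To put these per-line times in perspective, the sketch below scales them to a whole nursery. The nursery size of 600 lines is a hypothetical figure, spiraling's 15 seconds per sample is treated as a per-line time for simplicity, and the calculation ignores setup, sample handling, and drying.

```python
# Per-line evaluation times quoted in the text, in minutes.
MINUTES_PER_LINE = {
    "spiraling": 15 / 60,        # 15 seconds
    "visual dehiscence": 5.0,
    "force to dehiscence": 18.5,
}


def total_hours(method: str, n_lines: int) -> float:
    """Total rating time in hours for n_lines, excluding setup and handling."""
    return MINUTES_PER_LINE[method] * n_lines / 60.0


# Hypothetical breeding nursery of 600 lines.
for method in MINUTES_PER_LINE:
    print(f"{method}: {total_hours(method, 600):.1f} h")
```

Under these assumptions, force testing every line costs roughly 74 times as much rater time as spiraling, which is consistent with reserving it for populations already advanced by the cheaper screens.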

Although pod morphology metrics were fast to measure (15 seconds per line), none were strongly related to PD among environments. Pod corrugation was moderately correlated with all measures of PD, and explained a large portion of variance for visual dehiscence. Pith tissue was moderately correlated with force to dehiscence and pod spiraling. However, pod corrugation and pith tissue did not commonly appear at the three environments in the northern United States. Consequently, pod corrugation and pith tissue would not be useful PD secondary selection traits for breeding programs including cold temperate climates.

Further study is needed to understand the physiology of pith tissue in hairy vetch pods. The pith tissue created a foam-like structure that seemed to inhibit compression force from breaking a pod, hence the trait's contribution to force to dehiscence (Table 3). However, the pith tissue may not be genetic resistance to PD, but rather a plant response to an environmental threat (e.g. a pathogen). To separate out environmental effects from true PD, the trait of pith tissue could serve as a covariate when analyzing force to dehiscence.

The fracture structure of the pod wall was moderately related to spiraling, and with visual dehiscence at some environments. The linear fracture morphology described in our paper likely relates to the alignment of pod wall fibers at an angle to pod sutures, which can cause spiraling of the carpel (Funatsuki et al., 2014). As the evaluation of spiraling required equal time to measure as fracture, and spiraling was more correlated to other metrics of PD, we see little utility for a rating of fracture.

Some traits showed inconsistent correlation with PD metrics among environments, such as pod flexibility. Such traits would not be reliable secondary selection traits for PD.

Pod moisture was not correlated with visual dehiscence or force to dehiscence. Consequently, pods in our study had likely reached the critical pod moisture required for PD. The weak correlation between spiraling and pod moisture could indicate that some samples were above the critical pod moisture threshold for PD. Moisture contents in our evaluation (6.7% to 9.3%) were below the critical pod moisture (10.1% to 10.4%) associated with PD in soybean (Zhang et al., 2018). However, these moisture contents were above the stable moisture found in common vetch (5%) (Dong et al., 2016). Supplementary Figure 2 shows the results of PD after various pod drying times, heat conditions, and pod moistures to identify critical pod moisture in hairy vetch.

Flowering timing of lines was not strongly correlated with any measures of PD. In other species, genotypes with earlier flowering timing have exhibited more PD, as they were exposed for more time to heat and drying forces that can cause rupture of the dehiscence zone (Zhang et al., 2018). In our dataset, the stabilization of pod moisture via drying may have reduced the influence of maturity timing on PD.

Selecting for pod indehiscence may conflict with other field traits of interest in Vicia villosa. Lines with high spring vigor, a trait desired by growers (Wayman et al., 2016), also tended to have low PD, indicating the potential to select for both desired traits. However, there was a tradeoff between PD and seed yield in some environments. Selection for PD should closely monitor seed yield, to ensure lines developed for low PD also produce adequate yield for seed growers.

# AUTHOR'S NOTE

Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.

### DATA AVAILABILITY STATEMENT

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

### REFERENCES


### AUTHOR CONTRIBUTIONS

LK and HR designed and analysed the experiment. LK wrote results. BR, AB, NE, SK, SM, CR-H, MR, SE, SW, and NW acquired and assisted with interpretation of the data and critical revision for intellectual content. All authors provided approval of publication and agree to be accountable for all aspects of the work.

# FUNDING

Funding provided by National Institute for Food and Agriculture, Organic Research Extension Initiative Grant number 2015-51300-24192 and National Institute for Food and Agriculture Grant number 2018-67013-27570.

### ACKNOWLEDGMENTS

The authors thank Rebecca Heidelberger for technical assistance in the field.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00082/full#supplementary-material


Meyer, R. S., and Purugganan, M. D. (2013). Evolution of crop species: genetics of domestication and diversification. Nat. Rev. Genet. 14, 840–852. doi: 10.1038/nrg3605


cover crop use and breeding. Renew. Agric. Food Syst., 32, 376–386. Available at http://www.journals.cambridge.org/abstract\_S1742170516000338.


Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Kissing Kucek, Riday, Rufener, Burke, Eagen, Ehlke, Krogman, Mirsky, Reberg-Horton, Ryan, Wayman and Wiering. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Comparative Analysis of Early Life Stage Traits in Annual and Perennial Phaseolus Crops and Their Wild Relatives

Sterling A. Herron<sup>1,2</sup>\*, Matthew J. Rubin<sup>2</sup>, Claudia Ciotir<sup>1†</sup>, Timothy E. Crews<sup>3</sup>, David L. Van Tassel<sup>3</sup> and Allison J. Miller<sup>1,2</sup>\*

<sup>1</sup> Department of Biology, Saint Louis University, St. Louis, MO, United States, <sup>2</sup> Donald Danforth Plant Science Center, St. Louis, MO, United States, <sup>3</sup> The Land Institute, Salina, KS, United States

### Edited by:

Petr Smýkal, Palacký University, Czechia

### Reviewed by:

Steven B. Cannon, United States Department of Agriculture, United States Péter Török, University of Debrecen, Hungary

### \*Correspondence:

Sterling A. Herron sterling.herron@slu.edu Allison J. Miller allison.j.miller@slu.edu

### †Present address:

Claudia Ciotir, Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Haifa, Israel

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 25 October 2019 Accepted: 13 January 2020 Published: 10 March 2020

### Citation:

Herron SA, Rubin MJ, Ciotir C, Crews TE, Van Tassel DL and Miller AJ (2020) Comparative Analysis of Early Life Stage Traits in Annual and Perennial Phaseolus Crops and Their Wild Relatives. Front. Plant Sci. 11:34. doi: 10.3389/fpls.2020.00034

Herbaceous perennial species are receiving increased attention for their potential to provide both edible products and ecosystem services in agricultural systems. Many legumes (Fabaceae Lindl.) are of special interest due to nitrogen fixation carried out by bacteria in their roots and their production of protein-rich, edible seeds. However, herbaceous perennial legumes have yet to enter widespread use as pulse crops, and the response of wild, herbaceous perennial species to artificial selection for increased seed yield remains under investigation. Here we compare cultivated and wild accessions of congeneric annual and herbaceous perennial legume species to investigate associations of lifespan and cultivation with early life stage traits including seed size, germination, and first year vegetative growth patterns, and to assess variation and covariation in these traits. We use "cultivated" to describe accessions with a history of human planting and use, which encompasses a continuum of domestication. Analyses focused on three annual and four perennial species of the economically important genus Phaseolus. We found a significant association of both lifespan and cultivation status with seed size (weight, two-dimensional lateral area, length), node number, and most biomass traits (with cultivation alone showing additional significant associations). Wild annual and perennial accessions primarily showed only slight differences in trait values. Relative to wild forms, both cultivated annual and cultivated perennial accessions exhibited greater seed size and larger overall vegetative size, with cultivated perennials showing greater mean trait differences relative to wild accessions than cultivated annuals.
Germination proportion was significantly lower in cultivated relative to wild annual accessions, while no significant difference was observed between cultivated and wild perennial germination. Regardless of lifespan and cultivation status, seed size traits were positively correlated with most vegetative traits, and all biomass traits examined here were positively correlated. This study highlights some fundamental similarities and differences between annual and herbaceous perennial legumes and provides insights into how perennial legumes might respond to artificial selection compared to annual species.

Keywords: perennial grain, Phaseolus, Fabaceae, legume, pulse, crop wild relative, domestication

# INTRODUCTION

Life history theory traditionally categorizes plants as annuals, typified by fast growth and high reproductive effort, and perennials, typified by long-term survival and delayed reproduction (Cole, 1954; Harper, 1977). However, exceptions to this general trend occur; patterns of plant resource allocation exist along a gradient, with some perennial species showing fast growth and high reproductive output (e.g., Verboom et al., 2004; González-Paleo and Ravetta, 2015). Exploration of this life history diversity is important in plant breeding, as wild, herbaceous perennial species are increasingly considered as sources of novel genetic variation through crosses and as entirely new domesticates targeted for seed products. Many major crops have perennial wild relatives that remain uncharacterized despite their potential.

Nearly all grain and legume crops grown for human consumption are annual plants that complete their life cycle in a single year, or are perennial species cultivated as annuals (Van Tassel et al., 2010). Although thousands of herbaceous perennial species exist within major crop families (e.g., Ciotir et al., 2016; Ciotir et al., 2019), annual plant species were likely selected during the early stages of domestication due to pre-existing agriculturally favorable traits, e.g., high reproductive yield in a single season and accelerated germination and flowering (Van Tassel et al., 2010). Over time, artificial selection has led to exceptional gains in reproductive output in annual crops, particularly in the last century (Mann, 1997); however, cultivation intensity and other agronomic practices have resulted in widespread soil loss (FAO, 2019). Current research focuses in part on the development of crops to support the ecological intensification of agriculture, which aims to achieve both high yields and ecosystem services, such as soil and water retention (Ryan et al., 2018).

The use of herbaceous perennial species as seed crops has been proposed as one method of achieving ecological intensification (Jordan and Warner, 2010; Ryan et al., 2018). Longer-lived species have deep, persistent root systems that mitigate erosion and enhance nutrient uptake, they produce perennating shoots which reduce yearly planting costs, and they have a longer photosynthetically active growth period, allowing for high biomass production each year (Cox et al., 2006; Crews et al., 2018). However, only a few perennial seed crops (principally cereals, oilseeds, and pulses) have entered the domestication process, and we know relatively little about how artificial selection for increased seed production will impact the rest of the perennial plant.

Artificial selection is the human-mediated evolutionary process that leads to changes in plant traits over the course of generations. This process happens as a result of selective cultivation, the act of preferentially planting individuals with desired features. Over time cultivated populations evolve in response to artificial selection, leading to domestication: the evolution of morphological and genetic changes in cultivated populations relative to their wild progenitors (Harlan, 1995; Meyer et al., 2012). Because domestication is an ongoing, evolutionary process, a continuum of cultivated populations exists in species undergoing domestication, ranging from cultivated populations which display little or no differences relative to their wild (uncultivated) progenitors, to highly modified elite breeding lines, such as contemporary maize, which differs dramatically from its closest wild relatives (Harris, 1989; Bharucha and Pretty, 2010; Breseghello and Coelho, 2013). To determine the extent to which domestication has occurred, the precise identity of the wild ancestor of a crop and the exact cultivated populations derived from it are required. Because these data are not available for all species examined here, we use the term "cultivated" to refer to any accession that has a history of cultivation, acknowledging that this encompasses a broad spectrum of phenotypic and genetic change under artificial selection.

The "domestication syndrome" describes a common suite of trait changes seen across different species in response to artificial selection for seed and/or fruit production (Hammer, 1984; Harlan, 1992). In annual species, the domestication syndrome typically includes loss of seed dormancy, higher germination, loss of shattering, greater seed size or number, and erect, determinate growth, among many others (Harlan et al., 1973; Olsen and Wendel, 2013; Abbo et al., 2014). In contrast to annuals, the domestication syndrome of woody perennials, such as fruit and nut trees, is characterized by an extended juvenile phase, outcrossing mating system, and often clonal propagation; consequently, woody perennials have typically undergone fewer cycles of sexual selection under domestication and retain a greater proportion of wild genetic diversity relative to annual domesticates (Zohary and Spiegel-Roy, 1975; Miller and Gross, 2011; Gaut et al., 2015). Herbaceous perennial plants cultivated for both edible seeds and sustained perennation were only recently targeted for selection (Suneson et al., 1963; DeHaan et al., 2016; Kane et al., 2016; Kantar et al., 2016; Crews and Cattani, 2018), and it is unclear if they will follow an evolutionary trajectory similar to domesticated annual and woody perennial species, or if they will show a unique domestication syndrome.

The agricultural context provides an entirely new adaptive landscape that may allow novel combinations of traits in herbaceous perennial species (e.g., high reproductive output and longevity), combinations that are often unfavorable in natural environments (Crews and DeHaan, 2015; Cox et al., 2018). Ongoing work seeks to understand how artificial selection for increased seed yield in herbaceous perennial species might impact vegetative traits and the capacity for perennation more generally. One hypothesis is that perennial seed crops are constrained by a vegetative-reproductive trade-off, where high reproductive allocation and sufficient storage allocation for perennation cannot coexist (Van Tassel et al., 2010). In other words, it may be possible for artificial selection to drive increases in seed yield in wild, herbaceous perennial species, but those increases may cause losses in allocation to vegetative and perennating structures, resulting in a shift from perenniality to annuality (Denison, 2012; Smaje, 2015). Some studies have supported such a trade-off (e.g., González-Paleo et al., 2016; Vico et al., 2016; Cattani, 2017; Pastor-Pastor et al., 2018). An alternative hypothesis is that reproductive yield and vegetative biomass may be selected for in concert, leading to sustained perennation. Concomitant perennation and high seed yield have been observed in some perennial cereals (Sacks et al., 2007; Jaikumar et al., 2012; Culman et al., 2013; Huang et al., 2018).

Herbaceous perennial seed crop studies can benefit from including comparisons with closely related annual domesticates, to clarify if domestication responses are dependent upon lifespan. Vico et al. (2016), in a meta-analysis of 67 annual-perennial pair studies across nine plant families, found greater reproductive allocation in annual crops and greater root allocation in perennial crops, which was the same pattern found in unselected wild groups. Since a majority of the studies in this meta-analysis were from the grass family, such annual-perennial meta-analyses may be augmented by empirical research on specific lineages of plants, to determine if phylogenetically focused trends are similar to the broader patterns observed, as well as to allow a more precise biological interpretation.

Here we focus on the legume family (Fabaceae), which includes 19,500+ species of which more than 30% are predominantly herbaceous perennials (Ciotir et al., 2019). Fabaceae is the second most economically important plant family after the grasses (Poaceae), with 41 domesticated species dating back to the first agricultural systems, and 1,000+ species cultivated for various purposes across the world (Harlan, 1992; Lewis et al., 2005; Hammer and Khoshbakht, 2015). To date, few herbaceous perennial legume species have been thoroughly assessed for agriculturally relevant traits such as seed size, germination rate, time to maturity, root/shoot allocation, and reproductive yield. Characterizing these and similar traits in herbaceous perennial crops and their wild relatives will be critical to assess what genetic variation is available for crop improvement through introgression and de novo domestication (Schlautman et al., 2018; Smýkal et al., 2018).

While there are known consistent differences in growth and resource allocation in some groups of annual and perennial species, gaps remain in our knowledge about how herbaceous perennials respond to artificial selection relative to their annual congeners in many plant lineages. Here we explore life history differences in members of the common bean genus Phaseolus by addressing the following questions: 1) How do wild annual and perennial Phaseolus species from multiple geographic origins allocate resources to seed production and vegetative growth? 2) What is the phenotypic signature of artificial selection in cultivated annual and perennial Phaseolus species? 3) What covariation exists between seed and adult vegetative growth traits, and is it consistent across lifespans and between cultivated and wild forms in Phaseolus? We address these questions by examining seed size, germination, and vegetative growth allocation among three annual and four perennial Phaseolus species (Figure 1). Through this work, we hope to contribute to ongoing efforts characterizing life history differences in closely related annuals and perennials within Fabaceae and shed light on how these lifespan groups may change with artificial selection.


FIGURE 1 | Schematic of the four developmental stages analyzed in this study (seed size, germination, early vegetative growth, and biomass harvest) and examples of phenotypic diversity across the Phaseolus species studied: seeds of (A) Phaseolus acutifolius, (B) P. angustissimus, and (C) P. coccineus; (D) P. coccineus seeds prepared for analysis in ImageJ; (E) scarification by nicking the seed coat; (F) P. vulgaris germinants; (G) epigeal germination of P. filiformis; (H) hypogeal germination of P. coccineus; (I, J) Phaseolus shoot apex, to which stem height was measured; (K) Phaseolus first node (unifoliate), below which stem diameter was measured; (L) fully grown P. filiformis shoot with developed nodes (unfolded leaves); (M, N) P. filiformis flower and ripe fruit; (O) P. acutifolius shoot biomass; (P) P. coccineus root biomass. Photo credit: SH.

### Herron et al. Phaseolus Lifespan and Cultivation Effects

### METHODS

### Plant Material

Three annual Phaseolus species (P. acutifolius, P. filiformis, P. vulgaris; 58 accessions) and four perennial Phaseolus species (P. angustissimus, P. coccineus, P. dumosus, P. maculatus; 66 accessions) were included in this study (Tables 1 and 2). Species were chosen based on phylogenetic proximity and similar habitat types. The perennials P. coccineus and P. dumosus and annuals P. acutifolius and P. vulgaris are in the Vulgaris clade, and the perennial P. angustissimus and annual P. filiformis are in the Filiformis clade. Perennial P. maculatus is the only species in this study from the Polystachios clade (Delgado-Salinas et al., 2006). Geographically, our sampling of Phaseolus includes arid-adapted species of the Sonoran Desert (P. acutifolius, P. angustissimus, P. filiformis, and P. maculatus) and more tropically distributed species (primarily Mesoamerica: P. coccineus, P. dumosus, and P. vulgaris, with some South American accessions of the latter; Supplementary Table S4).

In total, we obtained seeds of 124 accessions from the United States Department of Agriculture's National Plant Germplasm System (Western Regional PI Station, Pullman, WA, where seeds are stored at -18°C) in spring 2016; we then stored them in a desiccator at 4°C and 33-50% relative humidity. All seeds were derived from plants regrown from the original collection material at the germplasm facility, but there were nevertheless possible genotype by environment and maternal effects that cannot be resolved here. A total of 2,759 seeds were germinated, and a subset of these was grown from July to September 2016. The seed age for each accession, i.e., the length of time the seeds were in frozen storage, ranged from one year to greater than 46 years (coded as 46).

### Lifespan and Cultivation Status Assignment

We classified species in terms of the predominant lifespan observed in wild populations from their native range. In some cases, this differed from lifespan assignment in the USDA accession description, in which case it was confirmed by extensive literature review; this occurred for P. coccineus, P. dumosus, and P. filiformis. Wild P. coccineus is a vigorous, perennial, indeterminate vine with an extensive root system; perenniality is also maintained in many cultivated forms (Delgado-Salinas, 1988; Smartt, 1988; Debouck, 1992; Freytag and Debouck, 2002). P. dumosus, a hybrid of P. coccineus and P. vulgaris, is also perennial, although it is less frost tolerant than P. coccineus (Smartt, 1988; Schmit and Debouck, 1991; Debouck, 1992; Freytag and Debouck, 2002; Mina-Vargas et al., 2016). Lastly, P. filiformis is an ephemeral, annual vine primarily found in the Sonoran Desert, which can survive up to seven months in favorable conditions (Buhrow, 1983; Nabhan and Felger, 1985; Freytag and Debouck, 2002). In addition, one accession of P. maculatus (PI 494138) was labeled as annual, although the species in general and other accessions of this species are classified as perennial (Freytag and Debouck, 2002).

TABLE 1 | Summary of sampling for seed size and germination traits for Phaseolus, by species and cultivation status.

† Some species included subspecies designations: P. acutifolius (wild: 3 accessions of var. tenuifolius; cultivated: 2 accessions of var. acutifolius); P. coccineus (wild: 2 accessions of var. coccineus); P. vulgaris (wild: 8 accessions of var. aborigineus); P. maculatus (all accessions of ssp. ritensis). P. vulgaris includes wild accessions from both the Mesoamerican and South American ranges.

'Accessions' refers to the total number of accessions used for each species/cultivation status combination. 'Seeds' refers to the average seed number among accessions studied for seed size and germination (range of seed number in parentheses). Seed size collectively includes seeds at least measured for length and area, as well as seed weight in all but a few cases.

TABLE 2 | Summary of sampling for early vegetative growth and biomass traits for Phaseolus, by species and cultivation status.

"Acc." refers to the total number of accessions used for each species/cultivation status combination. "Plants" refers to the average plant number among accessions studied for that trait (range of plant number in parentheses).

Cultivation status was taken directly from the USDA's description, with the descriptors "cultivated," "cultivar," and "landrace" all categorized as "cultivated" for the purpose of this data set. We use the umbrella term "cultivated" rather than "domesticated," since we do not have the data to determine the extent to which phenotypic and genetic change has occurred from the original wild population selected upon (see Introduction). All Phaseolus species are native to the Americas and were originally cultivated in either Mesoamerica or South America with some later selection occurring in Eurasia (Bitocchi et al., 2017). Cultivated Phaseolus species included here were first domesticated at least 1000 years before present, with most being domesticated much earlier (Kaplan and Lynch, 1999). Species-specific details on geographic origin and domestication are available in Supplementary Table S4.

### Traits Measured

### Seed Size

A total of 51 annual accessions (1,227 seeds) and 66 perennial accessions (1,436 seeds) were analyzed for size traits (Figure 1; Table 1). Seeds from each accession were weighed in bulk to the nearest mg and mean single seed weight was estimated by dividing the bulk weight by the total number of seeds for that accession. We imaged all accessions on a light table with a fixed camera at a resolution of 640 × 360 or 1349 × 748 pixels (differences accounted for in linear models), with all seeds oriented on their lateral side. From these two-dimensional images, we used ImageJ (Rasband, 1997-2016) to measure mean single seed length and area.
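The bulk-weight estimate described above amounts to a single division per accession. The following is a minimal sketch of that arithmetic; the function name and the example numbers are illustrative assumptions and are not taken from the study's data or scripts.

```python
def mean_single_seed_weight_mg(bulk_weight_mg: float, seed_count: int) -> float:
    """Estimate mean single seed weight (mg) from one bulk weighing of an accession."""
    if seed_count <= 0:
        raise ValueError("seed_count must be a positive integer")
    return bulk_weight_mg / seed_count


# Hypothetical accession: 20 seeds weighing 1,180 mg in bulk.
example_weight = mean_single_seed_weight_mg(1180, 20)
print(example_weight)
```

Because the estimate is a per-accession mean, it carries no information about seed-to-seed variation within an accession; the image-based length and area measurements are the per-seed traits.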

### Germination

58 annual accessions (1,390 seeds) and 64 perennial accessions (1,369 seeds) were monitored for germination (sampling differences from seed size are due to a few germinated accessions not being analyzed for seed size and vice versa, and some seed loss in the germination procedure). Number of germinated seeds was monitored for each accession and used to calculate germination proportion (Figure 1; Table 1). Seeds were germinated on RO-water dampened germination paper or with a dampened cotton ball in petri dishes, following 1) sterilization by soaking in 1% bleach for 2 minutes and rinsing, 2) scarification by nicking the seed coat with a scalpel (to break physical dormancy, i.e., a water-impermeable seed coat), and 3) soaking for an average of 23 hours (range: 13-32 hours) by submersion in RO water. The soaking start date ranged from June 26 to July 8, 2016 (one accession soaked on June 19); this was considered time point 0 (i.e., the sowing date) for germination counts, since all necessary resources were available for the seeds to germinate. The germination apparatus was placed onto a 24°C heat mat in 24-hour dark conditions (except for germinant counts and planting; López Herrera et al., 2001). Germination was defined as an extension of the radicle past the seed coat. In rare cases where the seed coat was lost or very hard, it was defined as a distinct vigorous movement of the radicle away from the seed or a distinct pushing of the seed coat away from the seed, respectively. Petri dishes were treated with Banrot 40WP fungicide solution (prepared according to the product label). Fungus-infected, potentially salvageable seeds were soaked in 1% to 2% bleach and rinsed with RO water. 
Germinated seeds were also scored for seed quality following any potential damage from pre-germination treatments (0-2, with 0 being no damage and 2 being the highest damage), and seeds which were compromised due to procedural damage or had prematurely germinated in storage were removed from the analysis. Germination was monitored prior to planting; accessions with remaining seeds were checked once more 7–10 days after planting for any new germinants. Any variation in germination protocol was noted and addressed in statistical models, as well as the covariates seed quality and seed age (years of frozen storage at the germplasm center since the last seed increase). Subsets of the germinated individuals were chosen for further growth measurement based on three criteria: the presence of ≥10 vigorous individuals in the accession, whether the accession was from the native range of the species (preferred), and whether the accession duplicated a geographic location already sampled (duplicates were removed).
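As a rough illustration of how a germination proportion could be computed for one accession, the sketch below divides the number of germinants by the number of seeds retained for analysis. The function name, the handling of excluded seeds in the denominator, and the example counts are assumptions made for illustration, not the authors' documented procedure.

```python
def germination_proportion(germinated: int, sown: int, excluded: int = 0) -> float:
    """Proportion of retained seeds that germinated.

    `excluded` counts seeds dropped from the analysis (for example, seeds
    damaged during scarification); treating them as removed from the
    denominator is an assumption of this sketch.
    """
    retained = sown - excluded
    if retained <= 0:
        raise ValueError("no seeds retained for analysis")
    if not 0 <= germinated <= retained:
        raise ValueError("germinated must lie between 0 and the retained seed count")
    return germinated / retained


# Hypothetical accession: 25 seeds sown, none excluded, 16 germinated.
example_proportion = germination_proportion(16, 25)
print(example_proportion)
```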

### Vegetative Growth Measurements

495 annual individuals (40 accessions) and 224 perennial individuals (29 accessions) of Phaseolus were transplanted to a greenhouse and measured for at least one vegetative trait (details below; Figure 1; Table 2). Seedlings were initially planted on July 13–14, 2016 in a mixture of unsterilized local riverine soil (Smoky Hill River, Salina, KS: 38.765948° N, 97.574213° W), sand, and potting soil (PRO-MIX) in small trays until they could be planted in 8" tall x 4" wide bag pots after 2-3 weeks. Initial planting date in small trays was used as the baseline for future measurements (days after planting, DAP), since, despite different sowing dates, seedlings were developmentally similar upon planting. Bag pots were filled with a mixture of the same riverine soil and coarse sand, to mimic field soil while also maximizing drainage. All Phaseolus were twining and were trained up four-foot bamboo poles. Plants did not receive any rhizobial inoculant treatment. Plants were initially bottom-watered twice daily (10AM, 6PM) for 20 minutes, and a shade cloth was installed in the greenhouse. After measurement of early vegetative growth (before biomass), some modifications were made. On August 18, 2016, watering was changed to once for 10 minutes every two days, and the shade cloth was removed on August 23, 2016. Ninety of the most vigorous Phaseolus plants were moved from the greenhouse to the outdoors on August 25-26, 2016 to expose them to a more light-intense, natural environment. At this time, individual plants in both the greenhouse and outdoors were randomized to reduce spatial bias. All growth analyses were conducted at the research facilities of The Land Institute (Salina, KS).

At 19-23 DAP (25-40 days after sowing), plants were measured for early vegetative growth traits, which included stem diameter below the first node, total developed node number counted from the unifoliate node to the last node with an unfolded leaf, and stem height from ground to shoot apex on the tallest main stem, with twining stems being uncoiled from their poles and straightened as far as possible without damaging the plant. Plants were checked for reproductive status before biomass harvest. At 68-75 DAP (74-93 days after sowing), a random subset of plants was harvested for shoot and root (washed) biomass (Figure 1; Table 2), which was dried at a minimum of 37°C for at least 24 hours. Biomass was weighed on a precision or analytical scale depending on plant size. Root mass fraction was calculated on an individual plant basis from biomass measurements (root dry mass/total dry mass). Variation in growth conditions (greenhouse or outdoors), plant health (ordinal rating: 0,1,2; unhealthy, moderate health, healthy), and reproductive status (ordinal rating: 0,1,2,3; no reproduction, budding, flowering, fruiting) was recorded for individual plants and accounted for in statistical analyses (see below).
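Root mass fraction, as defined above, is the root dry mass divided by total (shoot plus root) dry mass for a single plant. The sketch below shows that calculation; the function name and example masses are hypothetical and chosen only to illustrate the arithmetic.

```python
def root_mass_fraction(shoot_dry_mass_g: float, root_dry_mass_g: float) -> float:
    """Root dry mass as a fraction of total (shoot + root) dry mass for one plant."""
    if shoot_dry_mass_g < 0 or root_dry_mass_g < 0:
        raise ValueError("dry masses must be non-negative")
    total = shoot_dry_mass_g + root_dry_mass_g
    if total == 0:
        raise ValueError("total dry mass is zero")
    return root_dry_mass_g / total


# Hypothetical plant: 1.16 g shoot and 0.32 g root dry mass.
example_rmf = root_mass_fraction(1.16, 0.32)
print(round(example_rmf, 3))
```

Because the fraction is computed per plant and then averaged, an accession's mean root mass fraction generally differs from the ratio of its mean root dry mass to its mean total dry mass.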

### Statistical Analyses

In order to assess associations of lifespan and cultivation status with trait variation, we used linear models and post hoc comparative analyses on a set of mean values for each trait from each accession (Table 3). Associations of lifespan, cultivation (nested within lifespan), and species (nested within lifespan and cultivation), in addition to any relevant covariates for the focal trait, were tested using linear models. All potentially confounding factors were included in the original model and were then dropped sequentially if found to be nonsignificant. The base model for all main analyses was: trait = lifespan + lifespan/cultivation + lifespan/cultivation/species. Analyses were calculated for accession-level means for all traits and covariates. Due to concerns about lifespan lability of P. dumosus, linear models for all traits were checked with this species included and then with the species removed from the dataset. Accessions with any uncertainty associated with their lifespan or cultivation status were also dropped from the model and checked in the same manner; these included PI 494138 (P. maculatus; called annual while usually perennial; see Lifespan and Cultivation Status Assignment) and PI 390770 (P. vulgaris, noted as "wild or naturalized" in the USDA description). Pairwise comparisons of lifespan and cultivation effects for each trait were evaluated with post hoc Tukey HSD tests (Supplementary Table S1).
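To make the nested structure of the base model explicit, the sketch below expands the right-hand side under the R formula convention that `a/b` stands for `a + a:b` (b nested within a). That the authors' scripts used exactly this formula syntax is an assumption, and the helper names are hypothetical.

```python
from typing import List


def expand_nested(term: str) -> List[str]:
    """Expand 'a/b/c' into ['a', 'a:b', 'a:b:c'] (R-style nesting)."""
    factors = [factor.strip() for factor in term.split("/")]
    return [":".join(factors[: i + 1]) for i in range(len(factors))]


def expand_formula(rhs: str) -> List[str]:
    """Expand every '+'-separated term and drop duplicates, keeping first-seen order."""
    terms: List[str] = []
    for term in rhs.split("+"):
        for expanded in expand_nested(term):
            if expanded not in terms:
                terms.append(expanded)
    return terms


BASE_MODEL = "lifespan + lifespan/cultivation + lifespan/cultivation/species"
print(expand_formula(BASE_MODEL))
```

Under this reading, the model contains three distinct terms: lifespan, cultivation within lifespan, and species within each lifespan by cultivation combination.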

We ran separate linear models and Tukey HSD tests for Phaseolus wild accessions to detect any phenotypic signatures of geographic origin, divided broadly into desert-adapted and tropical-adapted species (Supplementary Tables S2 and S3). The base model for the geographic analyses was: trait = geography + lifespan + geography×lifespan + geography/lifespan/species, including any relevant covariates. Cultivated accessions were not included in this model due to potentially confounding effects of artificial selection. See Supplementary Table S4 for the assignment of geographic origin categories to species.

\*P < 0.05; \*\*P < 0.01; \*\*\*P < 0.001.

†Differences in image resolution (640 × 360 or 1349 × 748 pixels) were accounted for in this model: F values were nonsignificant (0.0566 for seed length and 0.8301 for seed area). Letters denote separate models with different covariates, while the main effects are the same for all traits: (a) seed size traits, (b) germination proportion, (c) early vegetative growth traits, and (d) biomass traits. See the Methods for explanations of each covariate.

To assess trait covariation within the whole dataset, Pearson product-moment correlations were performed on all pairwise combinations of the 11 measured traits using mean accession-level data, to determine the magnitude and direction of their relationships. Pearson correlations were also run on the following subsets of the data to qualitatively assess any trait covariation differences: annual, perennial, cultivated, and wild (Supplementary Figure S1). Each subset of the data was only restrictive in regard to one criterion, e.g., the annual subset contained data for both cultivated and wild annual accessions. Statistical analyses and figure generation were performed in R v. 3.6.1 (R Core Team, 2019).
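For readers unfamiliar with the statistic, the Pearson product-moment correlation between two traits can be sketched in a few lines. This is a generic textbook implementation in plain Python, not the R code used for the analyses; the function name and the example accession means are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical accession means: seed weight (mg) and stem diameter (mm).
seed_weight = [50.0, 100.0, 150.0, 200.0]
stem_diameter = [1.0, 1.5, 2.0, 2.5]
print(pearson_r(seed_weight, stem_diameter))
```

The coefficient ranges from -1 to 1; its square gives the proportion of variance in one trait that is linearly associated with the other.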

All accession-level and individual plant data can be found in Supplementary Table S5. All individual seed size data can be found in Supplementary Table S6.

### RESULTS

We investigated annual and perennial Phaseolus species for potential differences and covariation in seed size, germination, and vegetative growth. We found that wild accessions of annual species showed nonsignificantly greater germination and vegetative trait values than wild perennial accessions, but did not show greater seed size traits (Figure 2). Cultivated accessions of both annual and perennial species had greater trait values compared to wild accessions for almost all traits measured, with greater mean increases observed in perennial species (Figure 2). Lastly, seed and vegetative traits were significantly positively correlated, with some variation in this trend within subsets of the data (Figure 3; Supplementary Figure S1).

### Trait Differences in Wild Annual vs. Perennial Phaseolus Accessions

Although nonsignificant, wild perennial Phaseolus mean seed weight was nearly twice that of wild annuals (59 mg annual vs. 104 mg perennial; Figure 2A; see Supplementary Table S1 for all mean values, standard deviation, and Tukey test significance). Wild annual germination proportion was nonsignificantly higher than that of wild perennials (0.91 annual vs. 0.64 perennial; Figure 2D; Supplementary Table S1). Wild annual Phaseolus had similar to nonsignificantly larger vegetative trait values compared to wild perennials, with the largest relative differences seen in root dry mass (0.32 g annual vs. 0.19 g perennial) and total dry mass (1.48 g annual vs. 0.85 g perennial; Figures 2I, J; Supplementary Table S1). Mean root mass fraction was nearly equivalent for wild annuals (0.22) and wild perennials (0.21; Figure 2K; Supplementary Table S1).

In our main linear models, lifespan explained a significant amount of the variation seen in all seed size traits, node number, and most biomass traits (except root mass fraction; Table 3). Seed age and soak time were not significant predictors of germination; seed quality was significant at P < 0.001 (Table 3). Plant health was significant (at least at P < 0.01) in the linear models for all vegetative growth and biomass traits except stem diameter and shoot dry mass (Table 3). Reproductive status was not significant for any biomass trait; outdoor proportion was significant at P < 0.05 for shoot dry mass and root mass fraction (Table 3). The removal of P. dumosus and the accessions PI 494138 (P. maculatus) and PI 390770 (P. vulgaris; see Methods) changed some linear model results; here we note traits for which any variable's significance was lost or gained in the main model, which occurred for some vegetative traits. The exclusion of P. dumosus resulted in lifespan becoming significant for stem diameter (P < 0.05), in species becoming nonsignificant for shoot dry mass, and in outdoor proportion becoming nonsignificant for root mass fraction. Similarly, the exclusion of PI 494138 resulted in lifespan becoming significant for stem diameter (P < 0.05) but also for root mass fraction (P < 0.05). The exclusion of PI 390770 did not change significance in any of our models.

While our data do not allow precise interpretation of geographic effects, linear models including geographic origin as a main effect along with lifespan found that geographic origin explained a significant portion of the variation seen in all seed size traits and most vegetative traits for wild Phaseolus accessions (Supplementary Table S2). In general, tropically distributed Phaseolus accessions had larger seed and vegetative growth traits compared to desert species (Supplementary Table S3). Mean germination proportion was relatively high (> 0.85) for all Phaseolus groups except tropical perennials (0.48, although tropical perennials also had the greatest standard deviation; Supplementary Table S3).

FIGURE 3 | Correlation diagram of all traits for all accessions in the Phaseolus dataset. Numbers in boxes represent the Pearson correlation coefficient. Blue and red colors indicate significant positive and negative correlations (at P < 0.05), respectively; absence of color indicates lack of significance.

### Trait Differences in Cultivated vs. Wild Phaseolus Accessions

Cultivated annual and perennial Phaseolus accessions showed generally larger seed and vegetative size characteristics compared to wild relatives (Figure 2). Cultivation differences in seed size were only significant for seed length in annual Phaseolus, although cultivated annual seed weight (153 mg) was nearly three times that of wild annuals (59 mg) (Figure 2A; Supplementary Table S1). Cultivated perennial Phaseolus had significantly greater seed size in all traits, with seed weight over six times larger in cultivated perennials (646 mg) than wild perennials (104 mg) (Figure 2A; Supplementary Table S1). Germination proportion was significantly lower in cultivated annual Phaseolus (0.54) relative to wild annuals (0.91), while cultivated perennial germination proportion was only slightly lower (0.56) than wild perennials (0.64) (Figure 2D; Supplementary Table S1).

Cultivated perennial Phaseolus accessions tended to have significantly larger vegetative features compared to their wild relatives, whereas cultivated annual Phaseolus accessions usually displayed nonsignificantly larger vegetative features compared to their wild relatives (Figures 2E–K; Supplementary Table S1). Stem diameter was however significantly greater in both cultivated annual and perennial Phaseolus (Figure 2E; Supplementary Table S1). Cultivated annual Phaseolus showed nonsignificantly lower values in node number and stem height compared to wild annuals, while cultivated perennial Phaseolus showed nonsignificantly larger values in both traits compared to wild perennials (Figures 2F, G; Supplementary Table S1). Both cultivated annual and perennial Phaseolus had greater dry biomass trait values than their wild relatives, but for cultivated annual Phaseolus, the only significantly larger biomass value was shoot dry mass (1.06 g cultivated vs. 0.73 g wild; Figure 2H; Supplementary Table S1). In contrast, all biomass trait values were significantly greater in cultivated relative to wild perennial Phaseolus (except root mass fraction), with a greater than six-fold larger root dry mass value (1.60 g cultivated vs. 0.19 g wild; Figure 2I; Supplementary Table S1). In our linear models, cultivation explained a significant amount of variation for all traits except stem height (Table 3).
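The "times larger" comparisons reported in this section are simple ratios of cultivated to wild group means. As a quick check of that arithmetic, the sketch below recomputes them from the means quoted in the text; the function name is a hypothetical helper, and the ratios describe group means only, not individual accessions.

```python
def fold_change(cultivated_mean: float, wild_mean: float) -> float:
    """Ratio of the cultivated group mean to the wild group mean."""
    if wild_mean <= 0:
        raise ValueError("wild_mean must be positive")
    return cultivated_mean / wild_mean


# Group means quoted in the Results (cultivated, wild).
reported_means = {
    "annual seed weight (mg)": (153, 59),
    "perennial seed weight (mg)": (646, 104),
    "perennial root dry mass (g)": (1.60, 0.19),
}

for trait, (cultivated, wild) in reported_means.items():
    print(f"{trait}: {fold_change(cultivated, wild):.2f}-fold")
```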

### Phenotypic Covariation Across All Traits

Trait covariation was predominantly positive within the total dataset (Figure 3) and each subset of the data (Supplementary Figures S1A-D). Considering the entire dataset, all seed dimensions (weight, length, area) were nearly perfectly correlated (R<sup>2</sup> = 0.95-0.98; Figure 3). Seed traits were also significantly positively correlated with all vegetative growth traits, having the highest correlation with stem diameter and most biomass traits (R<sup>2</sup> = 0.65-0.76; Figure 3). Stem diameter had positive albeit non-significant correlations with node number and stem height, and significant positive correlations with all biomass traits. Node number and stem height had nonsignificant positive correlations with most biomass traits (Figure 3). Biomass traits including shoot dry mass, root dry mass, total dry mass, and root mass fraction were significantly positively correlated with one another, and there was a tight relationship between shoot and root dry mass (R<sup>2</sup> = 0.84; Figure 3). Germination proportion was the only trait in the total dataset to have no significant correlations and to have more than two negative correlations with other traits (Figure 3).

Subsets of our data also displayed similarly positive trait correlations, with some exceptions. Due to sample size, each subset was restricted in regard to only one criterion (lifespan or cultivation status), e.g., the perennial subset contained both cultivated and wild perennial accessions. Notably, the annual subset showed negative correlations between node number and most traits (significant for seed size traits, stem diameter, and shoot dry mass), as well as a significant negative correlation of germination proportion with seed weight and stem diameter (Supplementary Figure S1A). In contrast, the perennial, cultivated, and wild subsets tended to show positive correlations for the same trait pairings which were negatively correlated in annuals, although germination proportion and node number showed at least one negative correlation in all subsets (Supplementary Figures S1B-D). Significant positive seed size to vegetative trait correlations tended to be stronger and more common in the perennial and cultivated subsets than the annual and wild subsets (Supplementary Figure S1).

### DISCUSSION

This study examined lifespan and cultivation effects among annual and herbaceous perennial legume species in the economically important genus Phaseolus, with the aim of identifying common trends and potential covariation among traits. Consistent trends included greater seed and plant size in cultivated relative to wild accessions for both annual and perennial species, and positive correlations among most seed and vegetative traits for annual and perennial species in the wild and under cultivation.

### Wild Annual and Perennial Phaseolus Accessions Show Only Slight Differences in Trait Values

Wild perennial Phaseolus accessions in this study exhibited somewhat greater seed size trait values relative to wild annual accessions, which is consistent with the general life history expectation that later successional species produce larger seeds that are better able to compete for resources (Thompson and Hodkinson, 1998). This trend was consistent for Phaseolus species native to the desert and the tropics (Supplementary Table S3). Tropical Phaseolus species had higher mean seed size than desert Phaseolus, which is consistent with the broad pattern of increasing seed size at lower latitudes (Moles et al., 2007; Supplementary Table S3). The large range of phenotypic variation observed in annual and perennial Phaseolus (Figures 2A–K) is also consistent with the large distributions of several species, such as P. vulgaris and P. coccineus, which inhabit environments ranging from very arid to very humid (Dohle et al., 2019).

While few studies have compared germination in wild Phaseolus species, Bayuelo-Jiménez et al. (2002) reported a similarly high germination proportion (0.85+) for scarified wild annual and perennial Phaseolus accessions, including the desert species P. angustissimus and P. filiformis. Although wild annuals showed higher mean germination than wild perennials considering the whole dataset, desert perennial accessions all reached 100% germination, suggesting that the broader trend is more applicable to the tropical species studied here (Supplementary Table S3). Our lowest wild mean germination was observed in the tropical perennials (P. coccineus and P. dumosus; Supplementary Table S3), which may reflect the trend that seed dormancy is less common in tropical herbaceous species in regions of higher rainfall (Baskin and Baskin, 2014).

While germination showed consistent differences between wild annuals and perennials, we cannot make ecologically robust conclusions here, since we effectively removed the legumes' primary form of physical dormancy through scarification, and each accession was stored frozen for different lengths of time at USDA facilities, which can have diverse effects on species' germination biology (Walters et al., 2005). Our results may be more reflective of the viability of seeds after long-term storage. In this light, our findings are consistent with life history predictions that annual species will maintain a complex, long-lived seed bank in which dormancy breaking and germination are staggered over time to ensure offspring survival in variable environments (Venable and Lawlor, 1980; Gremer et al., 2016). Perennials may be less selectively constrained by this pressure due to the parent plant's persistent survival in more stable environments (Cohen, 1966; Thompson et al., 1998).

The lack of significant differences in vegetative traits between wild annual and perennial Phaseolus suggests that the traits studied here do not diverge at this growth stage among different Phaseolus life history strategies in nature. Wild perennials' slightly lower mean vegetative growth could be due to a slower growth rate and the early stage of growth at which the traits were measured (three to eleven weeks after planting), since some perennials have been shown to be able to achieve a higher total biomass than related annuals when the entire growing season is considered (Dohleman and Long, 2009). Previous studies have also found that vegetative growth, shoot biomass, and root biomass are similar between closely related annuals and perennials, up to 40 days of growth, before their resource allocation patterns diverge (De Souza and Vieira Da Silva, 1987; Garnier, 1992). The significance of plant health in vegetative linear models for stem height and node number may reflect a greater sensitivity of response in stem length traits to environmental stressors compared to stem diameter; the health effect on root dry mass and derived traits (total dry mass and root mass fraction) may be due to a lower sample size for these traits compared to shoot dry mass alone (Tables 2 and 3).

### Cultivated Phaseolus Accessions Show Greater Seed and Plant Size Relative to Wild Accessions

Phaseolus species showed larger seed and vegetative trait values in cultivated relative to wild accessions, consistent with domestication syndrome expectations; however, there were some notable differences in this trend between the annual and perennial groups. Consistent with previous studies in Phaseolus (Smartt, 1988; Koinange et al., 1996; Aragao et al., 2011), all cultivated annuals and perennials exhibited greater seed size relative to wild accessions. Perennial Phaseolus showed a particularly large difference in cultivated vs. wild seed size, as well as substantial variation in this trait, suggesting a large amount of genetic diversity. Much of the perennial seed size variation stemmed from P. coccineus, which is primarily outcrossing and more genetically diverse than other Phaseolus crops (Bitocchi et al., 2017; Guerra-García et al., 2017). Our seed size observations in USDA accessions are consistent with an analysis of wild and domesticated Phaseolus accessions from the International Center for Tropical Agriculture (CIAT), with the exception that they observed larger ranges of seed weight in cultivated annual species (P. acutifolius, P. vulgaris; Chacón-Sánchez, 2018). They similarly found that the perennial P. coccineus had the largest range in seed size in both cultivated and wild accessions, that P. coccineus showed the largest mean seed weight increase with cultivation, and that the perennial P. dumosus also showed some of the largest cultivated seed weights (Chacón-Sánchez, 2018).

Cultivation effects on Phaseolus germination may reflect different annual-perennial seed dormancy strategies. Domestication in annual common bean (P. vulgaris) and other seed crops has been known to reduce seed coat thickness and seed dormancy (Koinange et al., 1996; Fuller and Allaby, 2009), which could expose cultivated accessions to premature water imbibition and seed mortality in storage (López Herrera et al., 2001). Comparably low germination proportion in cultivated and wild perennial Phaseolus could be due to less selective pressure for seed dormancy (and therefore lower seed longevity) in wild perennials relative to wild annuals, resulting in less change in germination proportion with domestication. Since our accessions were of various ages and geographic origin, more precise studies on fresh seedstock are necessary to confirm germination differences with cultivation for both annual and perennial species.

Vegetative size was generally larger in cultivated relative to wild annuals and perennials, although there were some notable exceptions. Mean node number and stem height were lower in cultivated annual Phaseolus and higher in cultivated perennial Phaseolus relative to wild accessions. Similarly, node number and height have been found to be lower in cultivated relative to wild annual P. vulgaris (Koinange et al., 1996; Berny Mier y Teran et al., 2018). This suggests that on average cultivated annual Phaseolus exhibit lower degrees of stem growth, which may be the result of direct selection for determinate growth or an indirect consequence of increased harvest index, i.e., biomass allocated to reproductive structures at the expense of vegetative growth. This is in spite of all of the cultivated accessions in this study retaining their twining habit. In contrast, cultivated perennial Phaseolus exhibited consistently higher values for vegetative growth relative to wild perennials, possibly indicating a persistence of indeterminate growth in cultivation (Smartt, 1988), allowing for simultaneous selection on increased vegetative growth and reproduction (but see Delgado-Salinas, 1988). Both cultivated annual and cultivated perennial accessions displayed higher shoot and root dry mass relative to wild accessions, in agreement with the findings of Berny Mier y Teran et al. (2018) in P. vulgaris; this coincided with higher values in stem diameter for cultivated accessions of both lifespans, in agreement with Smartt (1988). Under cultivation, higher shoot and root biomass relative to wild accessions have also been observed in 30 diverse annual crop species, attributed to greater seed size and leaf area allowing enhanced downstream effects on growth (Milla et al., 2014; Milla and Matesanz, 2017). Our study suggests that some cultivated perennial species in Phaseolus exhibit similar size increases.

Root mass fraction, a resource-conservative trait, exhibited higher values in cultivated relative to wild accessions for both lifespans; this contradicts the expectation that plants become more resource-acquisitive with domestication (McKey et al., 2012; Vilela and González-Paleo, 2015; Pastor-Pastor et al., 2018). Higher root mass fraction was also observed in the annuals Phaseolus vulgaris (Berny Mier y Teran et al., 2018) and Pisum sativum (Weeden, 2007) relative to wild forms. This difference could be a byproduct of a more general higher allocation to vegetative growth, allowing greater amounts of photosynthate to be allocated to roots (Weeden, 2007). In summary, while many vegetative traits exhibit higher values in cultivated relative to wild accessions for both annual and perennial Phaseolus species, some traits may exhibit divergent lifespan patterns. It remains to be determined if this is due to lifespan per se or each species' unique history of artificial selection.

### Trait Covariation Is Significantly Positive in Most Cases

Positive correlations among traits observed in our dataset suggest that some suites of traits are synergistic, including seed size, some early growth traits, and biomass allocation (Figure 3). Positive seed-vegetative size correlations have also been found in other herbaceous plant systems (Geber, 1990; Kleyer et al., 2019), as well as across vascular plants more generally (Díaz et al., 2015). The nearly perfect correlation among seed size parameters (weight, area, and length) suggests that both two-dimensional seed features and mass are jointly selected upon and can be used interchangeably for these species. Significant positive correlations between seed size and vegetative biomass are consistent with the fundamental constraint of plant size on seed size, i.e., small plants cannot produce very large seeds (Venable and Rees, 2009). Furthermore, positive correlations between stem diameter and seed size support the hypothesis that seed size is biomechanically limited by the size of the subtending branch (Aarssen, 2005; Venable and Rees, 2009). Seed size alone may be less reflective of whole-plant reproductive allocation (Vico et al., 2016) and more so of a tolerance-fecundity tradeoff, where larger seeds are produced in fewer numbers but have greater competitive ability in stressful conditions (Muller-Landau, 2010). Nevertheless, the perennial grain crop Thinopyrum intermedium showed a positive relationship of total biomass and single seed mass with total reproductive yield across three years of growth, suggesting that these traits are positively correlated in some perennial species (Cattani and Asselin, 2018; also see Kleyer et al., 2019). Lastly, significant positive associations between all dry biomass traits do not support aboveground-belowground tradeoffs; rather, the data suggest that larger plants require greater resource acquisition with proportionally larger roots. 
This is consistent with a previous study reporting positive correlations among shoot dry mass, root dry mass, and root mass fraction in Phaseolus vulgaris (Berny Mier y Teran et al., 2018).

There were a few notable exceptions to the general trend of positive correlations. There were unexpectedly no significant correlations between seed size and germination proportion considering the entire dataset, with only slight negative correlations detected. There is a theoretical expectation that germination dormancy and seed bank persistence are negatively related to seed size, since dormancy and seed size entail different strategies of surviving in different environmental conditions, i.e., small, dormant seeds dominate seasonal environments and large, non-dormant seeds are typical of aseasonal environments (Baskin and Baskin, 2014; Rubio de Casas et al., 2017). The annual and wild subsets showed greater negative seed size-germination correlations (significant for germination vs. seed weight for annuals), contrasting with positive correlations in the perennial and cultivated subsets, suggesting some association of lifespan and cultivation with this trait relationship (Supplementary Figure S1). The removal of the physical dormancy barrier through scarification must also be considered, although we may still expect differences in seed longevity in storage proportional to seed size.

There was also a general lack of significant covariation between early vegetative stem growth traits (stem height and node number) and most biomass traits (Figure 3). This is in contrast to a previous study reporting that stem height is a highly interconnected hub trait in herbaceous perennial species (Kleyer et al., 2019), although we find many of the same positive associations between vegetative and seed size traits. The significant negative correlations detected in annual species between seed size traits and node number may reflect tradeoffs between seed size and stem length allocation due to the biomechanical restraints of bearing larger seeds on a longer stem (Kleyer et al., 2019) or may be the result of reduced stem growth during artificial selection (see above discussion), although this correlation was positive in all other data subsets (Supplementary Figure S1).

Overall, positive correlations among traits measured here suggest that future breeding efforts targeting greater seed size in perennials may be possible without concomitant reductions in vegetative allocation. Consistently high correlations may also allow for measurement of simpler traits (e.g., stem diameter, shoot dry mass) as proxies for traits that are more difficult to assess (e.g., root dry mass, total yield).

### CONCLUSION

For this set of Phaseolus species, we found that cultivation is associated with an increase in seed size and overall vegetative size in both annuals and perennials, and that seed size and most vegetative traits were positively correlated in both cultivated and wild annuals and perennials. Traits are shaped and predicted by more specific factors than lifespan alone, including habitat differences, evolutionary history, and which specific organs were targeted during artificial selection. Resolving these factors will require a focused sample of close sister species from the same native range, or intraspecific ecotypes of differing lifespan. Also, all perennials in our study were analyzed in their first year of growth, while their overall life history strategy is contingent upon their survival across multiple years. Thus, this study may also be augmented by multiyear assessments of reproductive and vegetative trait variation in perennial individuals. Such studies will advance our basic knowledge of life history evolution and inform plant breeding as we determine viable methods of ecological intensification. In conclusion, we highlight here common features in cultivated annual and perennial species compared to their wild relatives, and we observed few tradeoffs among seed and vegetative traits in the first year of growth. This offers insight into how perennial legume crops may respond to artificial selection relative to their annual relatives, and it suggests that many traits of interest may be selected for in concert.

### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/Supplementary Material.

### AUTHOR CONTRIBUTIONS

AM and SH designed the project. SH implemented the project and wrote the manuscript, with significant input from AM and MR. MR assisted in statistical analysis and figure generation. CC, TC, and DT provided extensive technical assistance, helpful feedback on the basis and design of the study, and revisions to the manuscript. All authors read and approved the final version of the manuscript.

### FUNDING

This research was funded by the Perennial Agriculture Project (Malone Family Land Preservation Foundation and The Land Institute). SH is supported by a graduate assistantship from Saint Louis University. MR is supported by the Donald Danforth Plant Science Center and the Perennial Agriculture Project.

### ACKNOWLEDGMENTS

The Land Institute provided research tools and space for this project. Seeds were provided by the United States Department of Agriculture Western Regional PI Station (Pullman, WA). We are especially grateful to Daniel Debouck and Colin Khoury for their extensive feedback on Phaseolus species' biology and lifespan assignment. We thank the following Land Institute researchers and associates for valuable input in experimental design and intellectual contributions: Lee R. DeHaan, Matthew Newell, Damian A. Ravetta, Alejandra Vilela, and Shuwen Wang. We extend a special thanks to The Land Institute interns, technicians, and associates who assisted in experimental set-up, tending plants, and measuring traits: Mindelena Adams, Sheila Cox, Eliot Cusick, Tiffany Durr, Nick Feijen, Katherine Fortin, Maya Kathrineberg, Laura Kemp, Ron Kinkelar, Jordan Lowry, Maged Nosshi, Diego Sanchez, Codie Van de Meter, and Eline Van de Ven. Brandon Schlautman and REU student Dahlia Martinez also assisted in phenotyping and plant caretaking. We are grateful to the Miller Lab Group for valuable comments on previous versions of this manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00034/full#supplementary-material

### REFERENCES

Baskin, C. C., and Baskin, J. M. (2014). Seeds: Ecology, Biogeography, and Evolution of Dormancy and Germination. (San Diego, CA: Academic Press), 1600.


Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Herron, Rubin, Ciotir, Crews, Van Tassel and Miller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Enhancing Crop Domestication Through Genomic Selection, a Case Study of Intermediate Wheatgrass

Jared Crain<sup>1</sup>, Prabin Bajgain<sup>2</sup>, James Anderson<sup>2</sup>, Xiaofei Zhang<sup>3</sup>, Lee DeHaan<sup>4</sup>\* and Jesse Poland<sup>1</sup>\*

<sup>1</sup> Department of Plant Pathology, Kansas State University, Manhattan, KS, United States, <sup>2</sup> Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, United States, <sup>3</sup> The Alliance of Bioversity International and International Center for Tropical Agriculture, Cali, Colombia, <sup>4</sup> The Land Institute, Salina, KS, United States

Perennial grains could simultaneously provide food for humans and a host of ecosystem services, including reduced erosion, minimized nitrate leaching, and increased carbon capture. Yet most of the world's food and feed is supplied by annual grains. Efforts to domesticate intermediate wheatgrass (Thinopyrum intermedium, IWG) as a perennial grain crop have been ongoing since the 1980s. Currently, there are several breeding programs within North America and Europe working toward developing IWG into a viable crop. As new breeding efforts are established to provide a widely adapted crop, questions of how genomic and phenotypic data can be used among sites and breeding programs have emerged. Utilizing five cycles of breeding data that span 8 years and two breeding programs, University of Minnesota, St. Paul, MN, and The Land Institute, Salina, KS, we developed genomic selection (GS) models to predict IWG traits. Seven traits were evaluated, with free-threshing seed, seed mass, and non-shattering being considered domestication traits, while agronomic traits included spike yield, spikelets per inflorescence, plant height, and spike length. We used 6,199 genets – unique, heterozygous, individual plants – that had been profiled with genotyping-by-sequencing, resulting in 23,495 SNP markers to develop GS models. Within cycles, the predictive ability of GS was high, ranging from 0.11 to 0.97. Across-cycle predictions were generally much lower, ranging from −0.22 to 0.76. The prediction ability for domestication traits was higher than for agronomic traits, with non-shattering and free-threshing prediction abilities ranging from 0.27 to 0.75, whereas spike yield had prediction abilities ranging from −0.22 to 0.26. These results suggest that progress to reduce shattering and increase the percent free-threshing grain can be made irrespective of the location and breeding program.
While site-specific programs may be required for agronomic traits, synergies can be achieved in rapidly improving key domestication traits for IWG. As other species are targeted for domestication, these results will aid in rapidly domesticating new crops.

### Keywords: intermediate wheatgrass, genomic selection, multi-environment, domestication, perennial crops, shared data resources

### Edited by:

Eric Von Wettberg, University of Vermont, United States

### Reviewed by:

Ken Naito, National Agriculture and Food Research Organization (NARO), Japan Steven B. Cannon, Agricultural Research Service, United States Department of Agriculture, United States

\*Correspondence:

Lee DeHaan dehaan@landinstitute.org Jesse Poland jpoland@ksu.edu

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 06 January 2020 Accepted: 04 March 2020 Published: 24 March 2020

### Citation:

Crain J, Bajgain P, Anderson J, Zhang X, DeHaan L and Poland J (2020) Enhancing Crop Domestication Through Genomic Selection, a Case Study of Intermediate Wheatgrass. Front. Plant Sci. 11:319. doi: 10.3389/fpls.2020.00319

**Abbreviations:** BLUPs, best linear unbiased predictors; GBS, genotyping-by-sequencing; GS, genomic selection; IWG, intermediate wheatgrass; PCA, principal component analysis; QTL, quantitative trait loci; SNP, single nucleotide polymorphism; TLI, The Land Institute.

# INTRODUCTION


Currently, 80% of the world's calories are provided by annual crops (Pimentel et al., 2012), with only three crops, maize (Zea mays), wheat (Triticum aestivum), and rice (Oryza sativa), providing nearly 60% of human calorie consumption (fao.org). Additionally, 70% of arable land is planted to annual crops (Cox et al., 2010; Pimentel et al., 2012) that are resource-intensive and can result in environmental degradation (Power, 2010; Crews et al., 2018). Perennial grain crops could provide abundant ecosystem services while simultaneously providing food, feed, and fuel for the global population. To date, there are no widely planted perennial grain crops, but recent research has resulted in large-scale evaluations of perennial rice (Huang et al., 2018), and perennial versions of several other crops, including wheat, sorghum (Sorghum bicolor), sunflower (Helianthus), and pulses, are in development (Batello et al., 2013).

One species showing promise for domestication and wide-scale production is intermediate wheatgrass (Thinopyrum intermedium, IWG). Intermediate wheatgrass is native to Eastern Europe and the Mediterranean region (Tsvelev, 1983) and was introduced into the United States for erosion control and forage purposes in 1932 (Vogel and Jensen, 2001). This species was selected for domestication as a grain crop in the 1980s from an evaluation of nearly 100 perennial grasses based on its seed size, vigorous growth habit, and potential for mechanical harvest, among other desirable characteristics, at the Rodale Institute, Kutztown, PA (Wagoner, 1990). In the early 2000s, after two cycles of selection at the USDA's Big Flats Plant Materials Center, Corning, NY, breeding efforts shifted to The Land Institute (TLI), Salina, KS (Zhang et al., 2016). Since 2003, nine cycles of selection have been completed at TLI. Interest in IWG has led to the development of several other breeding programs, including the University of Minnesota (UMN) and University of Manitoba in 2011 using material from the third cycle of selection from TLI (TLI-C3) (Zhang et al., 2016). Products made from IWG grain are being sold under the trade name Kernza in limited markets (DeHaan and Ismail, 2017).

Along with intensive breeding effort, IWG has also been evaluated for a host of ecosystem services. Research has shown that IWG can reduce soil nitrate leaching by 86% or more compared to annual wheat crop systems (Culman et al., 2013). Jungers et al. (2019) found that nitrate leaching under perennial grasses, including IWG, was one to two orders of magnitude less than under annual maize. IWG has also been reported to have 15 times more root growth and nearly two times the above-ground biomass of annual wheat (Sprunger et al., 2018), which should translate into greater below-ground carbon storage rates. Research also indicates that perennial landscapes have significantly increased and diverse microbial communities, allowing for greater food web complexity and increased nutrient cycling capacity (Culman et al., 2010; Pugliese et al., 2019). While IWG has the potential to provide both food and ecosystem services, factors such as grain yield and ability to mechanically harvest must be improved to an economically viable level for farmers to adopt this new crop.

Breeding new crops from wild species requires domestication, which often utilizes rare allelic mutations to facilitate the development of crops. One common domestication trait has been the prevention of shattering, which enables mechanical harvest. Numerous domestication events have been recorded in barley, rice, and sorghum (Østerberg et al., 2017), with reduction of shattering a hallmark of domestication as the plant becomes more dependent on humans for seed dispersal (Purugganan and Fuller, 2009). Other key traits that have evolved through domestication include larger seed size, free threshing seeds, and an increase in percent seed set (Harlan et al., 1973). Within IWG breeding, key domestication traits being targeted are greater percent of free threshing seeds, reduction in seed shattering, and increased seed mass.

Early work in domestication architecture through quantitative trait loci (QTL) often suggested single or a few genes with large effects (Koinange et al., 1996; Olsen and Wendel, 2013), which would allow for more efficient selection than selection on numerous loci with small effects (Falconer and Mackay, 1996). As molecular tools and studies have improved, there has been increasing evidence that many domestication traits are controlled by numerous loci with small effects. While the exact number of domestication genes is unknown (Meyer and Purugganan, 2013), in maize one study has identified nearly 500 genomic regions that had been under selection for domestication features (Hufford et al., 2012).

While original IWG breeding work utilized recurrent phenotypic selection, modern genetic tools have provided breeders with new options. One of the most promising genetic tools for breeding is genomic selection (GS). Proposed by Meuwissen et al. (2001), GS functions by having dense marker coverage of the entire genome so that each QTL is in linkage disequilibrium with a marker (Goddard and Hayes, 2007). Using a population that has been both phenotyped and genotyped, a model can be developed to predict the phenotype of individuals that have only been genotyped. GS has been shown to increase the rate of genetic gain in animal and plant breeding (Bernardo and Yu, 2007; García-Ruiz et al., 2016; Crossa et al., 2017). Given the predicted benefits of GS, Zhang et al. (2016) evaluated the potential of GS in IWG for the UMN breeding program, and currently, TLI is primarily using GS within their IWG breeding program.
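The train-then-predict workflow described above can be illustrated with a minimal sketch. It uses ridge regression on marker codes as a simple stand-in for a genomic prediction model and is not the model fitted in this study; the simulated data, the function names, and the penalty `lam` are all assumptions made for illustration. Predictive ability is taken here as the correlation between predicted and observed phenotypes in genets held out of training.

```python
import numpy as np

def ridge_marker_effects(M, y, lam):
    """Estimate marker effects b by solving (M'M + lam*I) b = M'(y - mean(y))."""
    mu = y.mean()
    p = M.shape[1]
    b = np.linalg.solve(M.T @ M + lam * np.eye(p), M.T @ (y - mu))
    return mu, b

def predictive_ability(M_train, y_train, M_test, y_test, lam=1.0):
    """Correlation between predicted and observed phenotypes in the test set."""
    mu, b = ridge_marker_effects(M_train, y_train, lam)
    y_hat = mu + M_test @ b  # prediction uses genotypes only
    return np.corrcoef(y_hat, y_test)[0, 1]

# Simulated example (hypothetical data, not from the study).
rng = np.random.default_rng(0)
n, p = 300, 200
M = rng.integers(0, 3, size=(n, p)).astype(float) - 1.0  # markers coded -1, 0, 1
true_effects = rng.normal(0.0, 0.3, size=p)              # many small-effect loci
y = M @ true_effects + rng.normal(0.0, 1.0, size=n)      # phenotype = genetics + noise

# Train on the first 200 genets, predict the remaining 100.
r = predictive_ability(M[:200], y[:200], M[200:], y[200:], lam=10.0)
print(f"predictive ability: {r:.2f}")
```

In practice the penalty would be tuned or derived from variance components rather than fixed by hand as it is here.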

While multiple locations are breeding IWG, there has been limited integration of information between breeding programs. The opportunity to utilize molecular tools like GS across wide environments could open new potentials for faster genetic improvement, specifically by increasing the training population size (VanRaden et al., 2009), integrating more genotypes (Knapp and Bridges, 1990), and taking advantage of correlated environments (Spindel and McCouch, 2016). Genomic selection has been used to improve a variety of polygenic agronomic traits including yield, quality and disease resistance (Rutkoski et al., 2014; Battenfield et al., 2016; Guzman et al., 2016). For crop wild relatives undergoing domestication, there has been less work on the ability of GS to improve key domestication traits such as shattering and free threshing. Work by Zhang et al. (2016) suggested that GS could be used to improve free threshing in IWG.

Applying GS to IWG across multiple environments could be a very cost-effective and efficient method to increase IWG breeding gains, but even within annual crops there is limited information about multi-site or multi-environment GS compared to single-site GS studies. Lopez-Cruz et al. (2015) found that using marker-by-environment interactions resulted in a greater prediction accuracy than using within-environment models. A reaction norm model was used to generate prediction accuracies up to 0.4 in wheat in different environments throughout Kansas (Jarquín et al., 2017). In barley (Hordeum vulgare), a multi-environment GS model was shown to increase prediction accuracy 11% over single-environment analysis (Oakey et al., 2016). Resende et al. (2012) found that prediction accuracies in loblolly pine (Pinus taeda) were relatively consistent across environments as long as the environments were within the same breeding zone. However, in both of these examples many of the lines had true replication, whereas the IWG programs usually have single genotypes due to the challenges of cloning large numbers of individuals. As IWG breeding expands, the ability to combine data across multiple locations and breeding programs with differing, unreplicated germplasm could be beneficial to increasing the rate of genetic gain.

Given the need for new crops and the challenges associated with developing perennial crops, this study focused on (1) How data from diverse sites and breeding programs could be combined to improve prediction abilities of models for enhanced selection decisions, (2) The ability of GS to accurately predict traits across a range of environments and traits, with emphasis on differences between domestication and agronomic traits, and (3) How insights gained from IWG breeding could be applied to other potential new crops undergoing domestication.

### MATERIALS AND METHODS

### Plant Material and Field Establishment

Using terminology consistent with Zhang et al. (2016), we refer to a genet as a unique individual plant with its own genetic makeup. The genets used for this study came from Cycles 6, 7, and 8 of the TLI breeding program (TLI-C6, TLI-C7, TLI-C8) and Cycles 1 and 2 of the UMN breeding program (UMN-C1, UMN-C2). The IWG TLI-C6 consisted of 3,658 genets from 674 full-sib families grown at one site in Salina, KS (38.7684° N, 97.5664° W) between 2015 and 2017. Genets were established in the fall of 2015 with 91 cm between rows and 61 cm between columns, and phenotypic evaluations were conducted in 2016 and 2017. DeHaan et al. (2018) provide additional details about the TLI-C6 population. TLI-C7 was formed from random intermating between selected TLI-C6 genets. Genomic selection was used in the TLI-C7 generation, and a training population consisting of 1,179 genets from approximately 4,000 genotyped genets was planted in the fall of 2017. TLI-C8 genets were progeny from selected TLI-C7 individuals and consisted of 988 selected training population genets from approximately 3,500 genotyped genets, with field planting occurring in the fall of 2018. Both TLI-C7 and TLI-C8 were divided into two groups, with approximately half of each cycle being planted in an irrigated field and the other half in a non-irrigated field, providing two contrasting environments for evaluation.

UMN-C1 consisted of 2,560 genets from 66 half-sib families from TLI-C3 material. Genets were established in the field at St. Paul, MN (44.9906° N, 93.1799° W) in the fall of 2011 with field observations in 2012 and 2013. Additional information about the UMN-C1 population can be found in Zhang et al. (2016). The UMN-C2 training population consisted of 372 genets that were established in the fall of 2014 with observations in 2015 and 2016. UMN-C2 was obtained from open-pollination of 48 genets selected from the UMN-C1 population with the best agronomic performance. UMN-C2 consisted of 1,656 genets, but phenotypic observations were only recorded for 372 genets, the training population for GS within the UMN breeding program. In both cycles, genets were planted in a single replication at a spacing of 1 m between rows and columns, and 67 kg ha<sup>−1</sup> of N was applied in April of each year. Weed control in the plant nurseries was primarily done manually with a one-time application of herbicide Dual II Magnum (S-Metolachlor 82.4%, Syngenta) in April at a rate of 1.2 L ha<sup>−1</sup>. Experimental genets were surrounded on all sides with IWG plants. While each program is selecting genets for its respective growing region, all original UMN material, i.e., UMN-C1, came from TLI-C3, providing a common genetic link between the programs. All genets were evaluated as single plants with no replication.

## Field Evaluations

Field evaluations were completed for several key domestication and agronomic traits including: plant height, spikelets per inflorescence, spike length, spike yield, shattering, seed mass, and free-threshing. Plant height was measured after plants reached physiological maturity and was measured from the ground to the tip of the tallest spike. Shattering was measured on a five-point scale, with 0 representing no shattering and 4 representing over 50% shattering by visual observation (DeHaan et al., 2018). Spike length was measured from the peduncle to the tip of the spike, and spikelets per inflorescence represented the average number of spikelets per head. Not all traits were measured for each year and genet, resulting in an unbalanced data set. Of key domestication traits, shattering was the only trait not observed in UMN-C1 and seed mass was not available for UMN-C2; all other traits were recorded in all cycles. In addition, minor differences in data collection between programs were noted. For UMN-C1 free threshing was measured on a four-point categorical scale, while for other years free threshing was estimated on a 0–100 percentage scale. The four-point scale was translated to match the percentage scale. For TLI cycles, spike yield was the mass of clean seed from one head, whereas in UMN cycles spike yield was estimated by weighing the entire seed head. Trait data were measured for 2 years with each year being considered a separate trait, with the exception of TLI-C8 with first year phenotypic data being recorded in 2019.

### Genotyping and Bioinformatic Methods

All genets were profiled using genotyping-by-sequencing following protocols of Poland et al. (2012) using a two-enzyme restriction digest with PstI and MspI. Libraries were prepared by multiplexing 192 samples per GBS library, and all GBS libraries were sequenced on Illumina HiSeq 2500. Single nucleotide polymorphisms (SNPs) were called using the GBS pipeline in Trait Analysis by aSSociation, Evolution, and Linkage (TASSEL) software version 5.2 (Glaubitz et al., 2014) in association with the IWG reference genome (access provided by the Thinopyrum intermedium Genome Sequencing Consortium<sup>1</sup>).

Initial SNP discovery identified 126,138 SNPs. To obtain a final data set, SNPs were filtered using the following criteria: (1) minor allele frequency greater than 0.01; (2) each SNP was called in 30% or more of the individuals; (3) GBS tags aligned uniquely (one location) to the reference genome to prevent aligning to orthologous sequences; (4) only biallelic SNPs were retained; and (5) a minimum read depth of four tags per individual was required to call a homozygote. Using a custom Perl script, homozygotes that had fewer than four reads per site were set to missing. Heterozygotes were called with a minimum of two contrasting tags. Additionally, any genet that had more than 95% missing SNP calls was discarded from the analysis, resulting in a final data set of 23,495 SNP loci and 6,199 genets. Any missing genotype calls in the final data set were imputed using Beagle version 4.1 with the default settings (Browning and Browning, 2016).
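Two of the filters above (minor allele frequency and call rate) and the genet-level missingness filter can be illustrated with a minimal Python sketch. This is not the TASSEL pipeline or the custom Perl script used in the study; the function names, the 0/1/2 diploid-style allele-count coding, and the use of `NaN` for missing calls are assumptions made for illustration only.

```python
import numpy as np

def filter_snps(geno, min_maf=0.01, min_call_rate=0.30):
    """Boolean mask of SNP columns passing the MAF and call-rate filters.

    geno: 2-D float array, rows = genets, columns = SNPs, coded as the
    alternate-allele count (0, 1, 2) with np.nan for a missing call
    (assumed coding, not taken from the article).
    """
    called = ~np.isnan(geno)
    call_rate = called.mean(axis=0)          # fraction of genets called per SNP
    n_called = called.sum(axis=0)
    alt_sum = np.nansum(geno, axis=0)
    # alternate-allele frequency among called genets (two alleles per call)
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = np.where(n_called > 0, alt_sum / (2.0 * n_called), np.nan)
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return (call_rate >= min_call_rate) & (maf > min_maf)

def keep_genets(geno, max_missing=0.95):
    """Boolean mask of genets with at most max_missing missing SNP calls."""
    return np.isnan(geno).mean(axis=1) <= max_missing
```

Applying `geno[keep_genets(geno)][:, filter_snps(geno)]` would return the retained genets and SNPs; read-depth and unique-alignment filters would have to be applied upstream, where tag counts and alignments are available.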

The STRUCTURE program (Pritchard et al., 2000) was used to evaluate population structure among the 6,199 genets. A subset of 8,011 markers that had minor allele frequency greater than 0.05 and were present in more than 50% of the individuals was used for this analysis. A total of 10 subgroups (K = 1–10) were evaluated using the admixture model with 100,000 reps and the first 25,000 as burn-in. Ten replicates of each value of K were assessed, with Structure Harvester (Earl and vonHoldt, 2012) used to determine the optimal value of K. CLUMPP (version 1.1.2) (Jakobsson and Rosenberg, 2007) was used to evaluate K = 1 and K = 2 by graphically assigning individuals to a cluster. In addition to STRUCTURE, principal component analysis (PCA) was performed on the imputed marker matrix in R (R Core Team, 2017). The PCA results were used to subset genets into two similarity groups based on breeding programs.
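The PCA step can be sketched as follows. The study ran PCA in R and does not state which function or scaling was used, so the column-centering without scaling, the function names, and the zero threshold used to split genets on one component are all assumptions for illustration.

```python
import numpy as np

def marker_pca(markers, n_components=10):
    """PCA of an imputed marker matrix (rows = genets, columns = SNPs).

    Returns the component scores and the proportion of total variance
    explained by each of the first n_components components.
    """
    centered = markers - markers.mean(axis=0)      # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

def split_on_component(scores, component=1, threshold=0.0):
    """Boolean group label from one component's scores.

    component=1 is the 2nd principal component (zero-based index);
    threshold is a hypothetical cut point chosen by inspecting the plot.
    """
    return scores[:, component] > threshold
```

The `explained` vector corresponds to the percentages reported on the axes of a PCA scatterplot, and the boolean label gives one simple way to form two similarity groups from a single component.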

### Statistical Analysis

A mixed linear model using ASREML version 4.1 (Gilmour et al., 2015) was fit to the data to develop best linear unbiased predictors (BLUPs) for each genet in each cycle. The analysis used a two-step approach, where each cycle was analyzed separately (Piepho et al., 2012), and BLUPs were then combined for GS. The model accounted for the genetic relationships between genets using the realized additive genomic relationship matrix and for spatial location by fitting a separate row and column autoregressive order 1 (AR1 × AR1) residual structure for each site. The general form of the mixed model is (Isik et al., 2017):

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e} \tag{1}$$

where **y** is a vector of observed phenotypes, **X** and **Z** are design matrices for fixed and random effects, respectively, **b** and **u** are vectors of coefficients for fixed and random effects, and **e** is a vector of random residuals. The vector **y** is assumed to be distributed normally with mean **Xb** and variance **V**, $\mathbf{y} \sim N(\mathbf{Xb}, \mathbf{V})$. The total variance, **V**, is determined by the variances of the random terms, $\operatorname{Var}\begin{bmatrix}\mathbf{u}\\ \mathbf{e}\end{bmatrix} = \begin{bmatrix}\mathbf{G} & \mathbf{0}\\ \mathbf{0} & \mathbf{R}\end{bmatrix}$. The **G** structure accounts for the variation between genets using the realized additive genomic relationship matrix and is defined as $\mathbf{G} = \sigma^2_A\mathbf{K}$, where $\sigma^2_A$ is the additive genetic variance and **K** is the realized additive genomic relationship matrix. **K** is computed as $\theta\mathbf{M}\mathbf{M}'$, where **M** is a matrix with n rows of individuals and m columns of markers and θ is a proportionality constant (Endelman and Jannink, 2012). The genomic relationship matrix was computed using the function A.mat in the rrBLUP R package (Endelman, 2011) using the methods of Endelman and Jannink (2012). The **R** structure accounts for residual variation using the row-column design for each cycle. The **R** for each site was defined as $\mathbf{R} = \sigma^2_e\boldsymbol{\Sigma}_c(\rho_c) \otimes \sigma^2_e\boldsymbol{\Sigma}_r(\rho_r)$, fitting an AR1 row and AR1 column effect with an independent error variance for each site. Here $\boldsymbol{\Sigma}$ is a correlation matrix with dimensions equal to the number of rows or columns ($\boldsymbol{\Sigma}_r$, $\boldsymbol{\Sigma}_c$, respectively) and ρ is the autocorrelation parameter for rows or columns, respectively. A total of seven sites were fit, as TLI-C7 and TLI-C8 each had two separate locations, whereas all other cycles were grown in one location. A minimum of 350 observations were recorded from each cycle for use in GS models after adjusting phenotypic data for genetic relationship and spatial location in the field.
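The two covariance structures named above can be made concrete with a short Python sketch. It is not the rrBLUP or ASReml code used in the study. It assumes markers coded −1, 0, 1 with no missing values, takes θ in the commonly used form 1 / (2 Σ p(1 − p)) applied to column-centered markers, and builds only the AR1 correlation matrices and their Kronecker product, leaving out any variance scaling. All function names are hypothetical.

```python
import numpy as np

def genomic_relationship(markers):
    """Realized additive relationship matrix K = theta * W W'.

    markers: (n genets x m SNPs) array coded -1, 0, 1 (assumed coding).
    W is the marker matrix with each column centered on its mean.
    """
    freq = (markers.mean(axis=0) + 1.0) / 2.0      # allele frequency per SNP
    centered = markers - (2.0 * freq - 1.0)        # subtract the column means
    theta = 1.0 / (2.0 * np.sum(freq * (1.0 - freq)))
    return theta * centered @ centered.T

def ar1_correlation(size, rho):
    """AR1 correlation matrix with entries rho ** |i - j|."""
    idx = np.arange(size)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def ar1_by_ar1(n_cols, n_rows, rho_c, rho_r):
    """Kronecker product of the column and row AR1 correlation matrices.

    The result assumes plots are ordered with rows nested within columns.
    """
    return np.kron(ar1_correlation(n_cols, rho_c),
                   ar1_correlation(n_rows, rho_r))
```

In the Kronecker product, the correlation between two plots is the product of their column correlation and their row correlation, so plots that are close in both directions are the most strongly correlated.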

### Genomic Selection

Using the five cycles of data, GS models using the genomic best linear unbiased predictor (GBLUP) were developed to assess prediction ability. Within each cycle, a fivefold cross-validation method was repeated 100 times. For each iteration of the cross-validation, we randomly sampled all of the genets that were in a given cycle, splitting the genets into a training population (80% of genets) and a prediction population (20% of genets). The GS model was fit with the training population using the kin.blup function in rrBLUP (Endelman, 2011), with predictions then being made on the prediction population. The GS model has the form (Endelman, 2011):

$$\mathbf{y} = \mathbf{W}\mathbf{g} + \mathbf{e} \tag{2}$$

where **y** is a vector of observations (phenotypic BLUPs, section Statistical Analysis), **W** is a design matrix relating genets to observations, **g** is a vector of genotypic values, and **e** is a vector of random residuals. The vector of genotypic values, **g**, is distributed as $\mathbf{g} \sim N(\mathbf{0}, \mathbf{K}\sigma^2_g)$, where **K** is the realized additive relationship matrix and $\sigma^2_g$ is the additive genotypic variance.
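The cross-validation scheme described above can be sketched in Python as a simplified stand-in for `kin.blup`. Two simplifications are assumptions of this sketch and not part of the study: the variance ratio `lam` (residual variance divided by genetic variance) is fixed by the caller instead of being estimated by REML, and the overall mean is estimated by the training-set average. A single fivefold pass is shown; repeating it with different seeds would mimic the repeated random sampling.

```python
import numpy as np

def gblup_predict(K, y, train, test, lam=1.0):
    """Predict genotypic values of test genets from training genets.

    K: relationship matrix for all genets; y: phenotypic BLUPs;
    train, test: integer index sequences; lam: assumed fixed ratio of
    residual to genetic variance (not estimated here).
    """
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]                 # training x training block
    K_pt = K[np.ix_(test, train)]                  # prediction x training block
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K_pt @ alpha

def fivefold_ability(K, y, lam=1.0, n_folds=5, seed=0):
    """Mean Pearson correlation between predictions and y over the folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    abilities = []
    for k in range(n_folds):
        test = folds[k]                            # roughly 20% of genets
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = gblup_predict(K, y, train, test, lam)
        abilities.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(abilities))
```

Because genets are assigned to folds at random, relatives can fall on both sides of a split, which is the source of the possible upward bias in within-cycle prediction ability.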

For each iteration, random sampling without replacement was used to divide the training and prediction populations. The random sampling did not prevent full or half-siblings from being in both the training and prediction populations, potentially biasing predictions upward. Predictive ability was assessed using the Pearson correlation between the predicted value (genomic BLUP, GBLUP) and the BLUP for the respective phenotype. From the GS model, variance components were extracted to calculate genomic heritability from the genetic variance and residual error variance using the formula (Endelman and Jannink, 2012):

$$h^2 = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2} \tag{3}$$

where $h^2$ is narrow-sense heritability, $\sigma^2_a$ is genetic variance, and $\sigma^2_a + \sigma^2_e$ is the sum of genetic and residual variance, representing total phenotypic variance.

<sup>1</sup>https://phytozome-next.jgi.doe.gov/info/Tintermedium\_v2\_1

To evaluate multi-environment predictions, each cycle in turn was used as the training population, and the genets in all other cycles were predicted. Using BLUPs for observed traits, accuracy was taken as the correlation between the phenotypes and the GBLUPs, with the 95% confidence intervals for the correlation computed using the psychometric R package (Fletcher, 2010). In addition to predicting every other cycle from each single cycle, a model was evaluated with a leave-one-out strategy, where the training population consisted of four cycles and the remaining cycle was predicted from the combined training population.
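A confidence interval for a correlation is usually obtained with the Fisher r-to-z transform. The Python sketch below assumes that this standard construction is what was applied; it is not a port of the psychometric package, and the function name and default critical value are illustrative.

```python
import math

def correlation_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher r-to-z transform with standard error 1 / sqrt(n - 3),
    where n is the number of paired observations.
    """
    if n <= 3:
        raise ValueError("need more than three observations")
    z = math.atanh(r)                      # transform r to the z scale
    half_width = z_crit / math.sqrt(n - 3)
    # back-transform the interval endpoints to the correlation scale
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

For example, a correlation of 0.5 estimated from 103 genets gives an interval of roughly 0.34 to 0.63; the interval narrows as the number of genets in the predicted cycle grows.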

Two other models were developed with the goal of identifying the best ways to use the data sets to increase genetic gain. A subset of data was made using the results of the PCA to create two similarity groups, UMN-PCA and TLI-PCA. These models used the 2nd principal component to divide UMN and TLI material (**Figure 1**), with training data consisting only of genets within a respective group. In addition to developing training sets by genetic similarity, each individual breeding program was used as a training set to predict each cycle. The multi-environment models, where one cycle was predicted from all others, were run again using these two data subsets to evaluate the effect of using more closely related training data sets in the prediction model. A minimum of 100 genets was required in the training set to make predictions for each model.

### RESULTS

### Phenotypic Evaluations

We analyzed 8 years of breeding program field trials representing two independent breeding programs and five cycles of selection. Across all sites, several traits were measured, including the key domestication traits of free threshing and shattering and agronomic traits like spike yield and seed mass (**Table 1**). For all of these traits, a large range of values was observed in all cycles. For example, individuals in most cycles ranged from no shattering to maximum shattering. For agronomic traits, a two- or threefold range was present for spike length and spikelets per inflorescence (**Table 1**).

### Population Structure

We implemented a Bayesian cluster method to estimate population structure. While all genets were derived from TLI breeding material, this study evaluated five cycles of selection at different locations, times, and generations from the base population, allowing for potential population structure. Results from this analysis suggested that there was no population grouping of genets. Further analysis using PCA confirmed minimal population structure as the first principal component contained 3% of the variation and the first 10 components only accounted for 13% of the total variation. There was minor clustering among cycles (**Figure 1**), with the second principal component partially separating the UMN material and later TLI-C6 and C7 breeding programs.

### Genomic Selection Models

### Within-Cycle Predictive Ability

To evaluate the potential of GS to increase the rate of genetic gain in IWG breeding, we fit several GS models to the phenotypic BLUPs. To determine the predictive ability of GS, we fit a random fivefold cross-validation model to each cycle and trait individually. Using 100 iterations, within-cycle prediction ability (the correlation between the predicted value and the phenotypic BLUP) ranged from 0.11 to 0.97 (**Table 2**). Within cycles, prediction abilities were generally high, with a trend that free threshing percent, seed mass, and shattering had higher average within-site prediction ability than agronomic traits like spike yield, plant height, and spikelets per inflorescence.

### Across-Cycle Predictive Ability

After confirming that GS could accurately predict traits within cycles, we fit GS models to predict across cycles. For each trait, each cycle was used individually as the training population, and all other cycles were then predicted from that training population. This resulted in predicting each cycle from four different cycles. Across all traits, prediction ability ranged from −0.22 to 0.76, but there were striking differences between traits. For key domestication traits, predictive ability was relatively high, with seed shattering ranging from 0.50 to 0.74 and free threshing ranging from 0.27 to 0.75. In comparison, the agronomic trait of spike yield had a much lower range, from −0.22 to 0.26 (**Figure 2**). These traits represent a general trend that was seen among all traits and years, allowing further discussion to be confined to domestication and agronomic traits. All other traits are provided in **Supplementary Figure S1**. Additionally, scatter plots of predicted versus observed values for one trait with high and one with low predictive ability are provided in **Supplementary Figure S2**.

To further investigate the validity of the across-environment GS results, we developed GS models that used all cycle data except for the prediction set. This resulted in a larger training population, which could increase GS accuracy. Prediction accuracy based on all other sites ranged from 0.35 to 0.77 for domestication traits (**Figure 3**). Agronomic traits such as spike yield ranged from −0.10 to 0.37 (**Table 3**). The predictions from this leave-one-out strategy were paired with the genomic heritability that was calculated from the GS models. Plotting these two values showed a significant relationship between these variables (p < 0.001, **Figure 4**). Key domestication traits of shattering, free threshing, and seed mass showed high heritability and high across-cycle prediction abilities. In comparison, spike yield, spikelets per inflorescence, and plant height had lower heritability estimates and prediction accuracies.

FIGURE 1 | Scatterplot of the first two principal component axes for intermediate wheatgrass genets, made from principal component analysis on the marker matrix, n = 6,199 genets, markers = 23,495. Each point is an individual genet that is color coded by cycle, with the 2nd principal component providing separation between the UMN and TLI breeding programs at the dashed line. Total variance explained by each principal component is listed on the axes.

TABLE 1 | Range of phenotypic observations collected for five cycles of intermediate wheatgrass breeding trials from the University of Minnesota and The Land Institute.

Range of phenotypic observations and the number of individuals (n) for each phenotype are displayed. †Trait was measured on a five-point categorical scale and converted to percentages. ‡Spike yield for UMN sites measured as entire inflorescence, not just clean seed.

TLI-C8 2019 872 4–100 961 40–140 867 4.8–16.3 873 0–4 873 11–31 870 0.0–1.1

TABLE 2 | Within-site fivefold cross-validation genomic selection predictions for intermediate wheatgrass traits.

Prediction abilities are reported as correlation (r) between predicted value and phenotypic best linear unbiased predictor (BLUP), along with the standard deviation (sd) of 100 random iterations.

FIGURE 2 | Performance of genomic selection (GS) across five cycles. Each panel represents one trait, shattering (A), free threshing (B), spike yield (C), and spikelets per inflorescence (D). The x-axis is the cycle that was used as the prediction population. Colored bars represent the prediction ability for each of the four other cycles, where each cycle forms the training population. For comparison, the fivefold cross-validation within cycle is represented for each training and prediction cycle, which usually provides the highest predictive ability. The y-axis is the prediction ability which is the correlation between the GS predicted value and the phenotypic best linear unbiased predictor (BLUP). Error bars represent the 95% confidence interval for the correlation value.

### Optimizing GS Prediction and Training Set

Finally, in an effort to determine ideal GS training populations and enhance GS results, we used two different sub-setting methods. The first subset utilized results from the PCA decomposition of the genomic marker matrix to develop two subpopulations based on relatedness (**Figure 1**). The second sub-setting method used each individual breeding program as a unique training population. Using these data sets, we evaluated the same across-environment GS models, with the GS training population being more closely related to the prediction population. The GS model using all cycles in a leave-one-cycle-out method, with all other cycles in the training population (**Figure 3**), was used as the reference. A model was declared better than the reference if the 95% confidence intervals were non-overlapping. We tested five different training populations for each of 55 cycle/trait combinations. The top performing model for each combination is listed in **Table 4**. Overall, the best performing model varied considerably among cycle/trait combinations (**Figure 5** and **Supplementary Figure S3**). However, the leave-one-out reference was the best performing model 62% of the time (34 of 55 combinations).
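The decision rule described above can be written as a small Python sketch. The function names, the dictionary layout, and the tie-break among several models that beat the reference (highest lower bound) are assumptions introduced here; the interval values in the usage note are hypothetical and are not results from this study.

```python
def beats_reference(candidate_ci, reference_ci):
    """True when the candidate interval lies entirely above the reference."""
    return candidate_ci[0] > reference_ci[1]

def best_model(intervals, reference="LOO"):
    """Name of the preferred model given {model name: (lower, upper)}.

    The reference model is kept unless another model's 95% confidence
    interval does not overlap it and lies above it.
    """
    ref = intervals[reference]
    winners = {name: ci for name, ci in intervals.items()
               if name != reference and beats_reference(ci, ref)}
    if not winners:
        return reference
    # assumed tie-break: among models beating the reference,
    # take the one with the highest lower bound
    return max(winners, key=lambda name: winners[name][0])
```

With hypothetical intervals `{"LOO": (0.40, 0.60), "MN": (0.50, 0.70)}` the reference is retained because the intervals overlap, whereas adding `"KS": (0.65, 0.80)` would select `"KS"`.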

### DISCUSSION

### Combining Data Resources

The affordability of next-generation sequencing provides many opportunities for breeding that were previously unavailable. Particularly for programs that are implementing GS, there is an opportunity to leverage data across breeding programs and identify synergistic opportunities. This is particularly the case for minor and emerging crops. We were able to combine five cycles representing nearly a decade of breeding progress for IWG in the Central USA. Across the two programs, many key traits were measured each year, but there were often minor differences in trait measurement, specifically scoring of free-threshing and total spike yield between the TLI and UMN programs. While our results did not show any marked difference in these traits, i.e., consistent free-threshing prediction and inconsistent spike yield across other cycles, it is unknown if more consistent data collection would result in higher predictive ability within this data set. As other breeding programs are established, trait standardization using crop ontology (Shrestha et al., 2010) could greatly increase the interoperability of experimental data.

FIGURE 3 | Genomic selection (GS) performance for shattering, free threshing, spike yield, and spikelets per inflorescence, (A–D), respectively. Within each panel the x-axis is grouped by cycle name. Predictions were made by leaving out the named cycle and predicting that cycle from all other data. The prediction ability is the correlation between the predicted GS value and the phenotypic best linear unbiased predictor (BLUP), with error bars representing the 95% confidence interval.

TABLE 3 | Genomic selection prediction abilities of intermediate wheatgrass traits across sites.

Prediction population was one cycle, with the training population comprising all other cycles. Predictive ability is reported as correlation between predicted value and phenotypic best linear unbiased predictor (BLUP) with ± range for the 95% confidence interval for correlation.

year of observation where SPKYLD is spike yield; PTHT, plant height; SPKLNG, spike length; SDMG, seed mass; SPKHD, spikelets per inflorescence; SHAT, shattering; FTH, free threshing.

TABLE 4 | Highest performing genomic selection (GS) model for each trait/cycle combination across five breeding cycles representing two different breeding programs.

Predictive ability was assessed as the correlation between the GS predicted value and the phenotypic best linear unbiased predictor (BLUP). Models differed with respect to the training population used to develop the model. The leave-one-out (LOO) model was used as the reference model, and a model was considered superior only if it exceeded the 95% confidence interval of the LOO model. †Models are: LOO, leave-one-out, where the prediction cycle is left out of the training set and all other cycles are used to train the model. MN and KS are breeding-program specific, where only genets from Minnesota (or Kansas) are used to predict each cycle. For TLI-C6 2016 plant height, the KS training population would consist of TLI-C7 and TLI-C8, with TLI-C6 as the prediction population. UMN-PCA and TLI-PCA are where the training population is made from PCA of the marker matrix, with UMN-PCA encompassing most UMN lines and some TLI lines that were more similar to UMN material than to the TLI subset.

### Genomic Selection Accuracy and Analysis

### Within-Cycle Predictive Ability

Using data generated from the field trials and next-generation sequencing, we evaluated the potential of GS to predict trait values across geographically distant IWG breeding programs. First, within-cycle predictions were generated to verify that GS could appropriately predict trait values (**Table 2**). These cross-validation predictions were the highest GS predictive abilities achieved because the training sets were highly related and the training and test sets were grown in the same environment, minimizing any genotype by environment interactions (Desta and Ortiz, 2014; Zhang et al., 2016). These prediction abilities provide a potential maximum value that could be achieved utilizing the current markers and phenotypes within the study. Additionally, these predictions show that within breeding programs, GS could be an effective way to enhance genetic gain in IWG.

FIGURE 5 | Performance of genomic selection (GS) across five cycles with different training populations. Each panel represents one trait, shattering (A), free threshing (B), spike yield (C), and spikelets per inflorescence (D). Within each panel the x-axis is grouped by cycle name. Training populations were: LOO, leave-one-out, where all data other than the predicted cycle were used in the training population; MN or TLI, where only data from each separate breeding program, Minnesota or Kansas respectively, were used as the training population; and MN-PCA or TLI-PCA, where principal component analysis (PCA) was used to cluster genets within breeding programs, MN or TLI, and form the training populations. The prediction ability is the correlation between the predicted GS value and the phenotypic best linear unbiased predictor (BLUP), with error bars representing the 95% confidence interval.

### Across-Cycle Predictive Ability

After evaluating within-site GS prediction, across-site predictions were generated for all cycles. As the relatedness and environments changed, a decrease in GS predictive ability was observed. Within these evaluations, two general trends emerged. For key domestication traits such as shattering and free-threshing, GS predictions were relatively high and constant across environments (**Figure 2**). For agronomic and yield-related traits, the results were inconsistent, with some sites even producing negative prediction abilities. This suggests that certain traits may be more amenable to multi-environment GS than other traits.

To further investigate this trend, we examined the genomic heritability from the GS models. Plotting genomic heritability and predictive ability (**Figure 4**) suggests that domestication traits may exhibit lower genotype-by-environment interaction than agronomic traits. Additionally, the resulting prediction abilities of various traits were reaching the level of the trait heritability.

Domestication traits were highly predictive across environments, possibly indicating that these traits are not as influenced by environment as other traits. Within wheat, there are several well-known genes that control these traits. Free threshing in wheat is determined by a recessive mutation in the Tg (tenacious glume) locus and the dominant mutation of the Q gene. The Tg locus has been reported to explain up to 44% of the variation in threshability, and at least five other quantitative trait loci (QTLs) for threshability have been observed in wheat (Jantasuriyarat et al., 2004). Within IWG, recent research by Larson et al. (2019) found that QTL markers explained up to 46% of variation for free threshing across two locations. The Br (brittle rachis) locus controls shattering in wheat with two dominant genes and is homoeologous to the Btr loci in barley (Nalam et al., 2006). Traits such as free threshing and shattering that may have larger-effect QTL could be both increasing GS predictions and maintaining predictive ability across environments.

For agronomic traits, many more QTL of much smaller size have been reported. Bajgain et al. (2019) identified over 154 QTL for seven agronomic traits in IWG with the largest QTL effect sizes explaining only 4% of the phenotypic variation. Larson et al. (2019) found 12 QTL that explained up to 27% of the variation of spike yield in a biparental population grown in five environments. As the number of QTL increases and their size decreases, adequately accounting for their effects across environments may be more challenging. Simulation studies have shown that as heritability decreases GS accuracies are lowered (Iwata and Jannink, 2011). Other research has indicated that GS accuracy diminishes as the number of QTL increases (Shengqiang et al., 2009).

### Optimizing GS Prediction

Complementary to evaluating how traits may respond to GS, we also examined how the training population could be optimized to achieve the best results when combining data across breeding programs. While all germplasm originated from TLI material, UMN-C1 was only a subset of the entire TLI program and UMN-C2 was selected for MN conditions, which differ from those in KS. Additionally, from the founding lines (TLI-C3), two and five generations of selection had occurred for UMN and TLI, respectively, allowing for potential population divergence.

We evaluated models using a leave-one-out approach for all cycles, which should result in the largest training population available for GS prediction. This leave-one-out strategy ensured that the models were not biased by the size or the relationship of the training population (Desta and Ortiz, 2014) in comparison to GS predictions made from individual cycles. Additionally, we used PCA to develop a subset of data more related to each breeding program to ensure any large population structure differences did not influence GS prediction (Norman et al., 2018). Finally, predictions were also developed using data specific to each breeding program.

The results from these models were inconsistent, with the leave-one-out model performing as well as or better than the alternatives the majority of the time. Often breeding program-specific or PCA-specific subsets performed well, but there was no clear pattern to this performance (**Table 4**). For example, the optimized training set using PCA for UMN provided the best prediction for TLI-C8 free threshing, whereas the TLI-PCA optimized training set provided the best prediction for UMN-C1 seed mass. In these cases the training sets had optimal performance in data sets for which they were not specifically optimized. While developing highly optimized prediction sets has been shown to increase prediction accuracies (Isidro et al., 2015; Rutkoski et al., 2015), we did not observe this in this data set. This could result from the large amount of genetic variance compared to domesticated crops. While future breeding efforts may be enhanced by optimizing the training set, these data suggest that increasing the size of the training population, i.e., leave-one-out, is generally more useful than optimizing relatedness to prediction candidates.

### Implication for Future Development

As concerted efforts to develop new crops through domestication of crop wild relatives continue for food security and environmental benefits (Glover et al., 2010; Mayes et al., 2012), we have evaluated approaches for genomics-assisted breeding of neo-domesticated crops with insights into maximizing genetic gains. While plant breeding is both expensive and time consuming (Crews and DeHaan, 2015; DeHaan et al., 2016), genomic technologies provide a way to accelerate progress compared to phenotypic selection (Varshney et al., 2012; Unamba et al., 2015). Next-generation sequencing coupled with powerful tools such as GS and genome-wide association studies could allow for significant improvement of agronomic and domestication traits in short periods of time, especially in non-model plants. Within the TLI-IWG breeding program, GS has reduced the breeding cycle time from 2 years to 1 year, which should effectively double the rate of genetic gains if the predictability is roughly equivalent to the narrow-sense heritability. Additionally, the genetic resources generated can be used to better understand the genetic architecture of important agronomic and domestication traits (examples include Bajgain et al., 2019; Larson et al., 2019).
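The claim that halving the cycle time should double the rate of gain can be made explicit with the standard expression for expected response to selection per unit time (the breeder's equation). The symbols below are introduced here for illustration and do not appear elsewhere in this article: $i$ is the selection intensity, $r$ the selection accuracy, $\sigma_A$ the additive genetic standard deviation, and $L$ the cycle length in years.

$$\Delta G_{\mathrm{year}} = \frac{i\, r\, \sigma_A}{L}$$

Under phenotypic selection on single observations, $r = h$, the square root of the narrow-sense heritability. Assuming the commonly used approximation that GS accuracy equals predictive ability divided by $h$, a predictive ability near $h^2$ corresponds to $r_{GS} \approx h$, so the gain per cycle is unchanged and

$$\frac{\Delta G_{\mathrm{GS}}}{\Delta G_{\mathrm{PS}}} = \frac{i\, r_{GS}\, \sigma_A / 1}{i\, h\, \sigma_A / 2} = \frac{2\, r_{GS}}{h} \approx 2,$$

provided selection intensity and genetic variance are the same under both schemes.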

These results show that as plant species undergo early domestication, collaboration will accelerate progress, i.e., not every breeding program will have to solve the same domestication problems, and progress can be made across programs. As domestication traits are fixed, breeding programs can work toward developing adapted lines for targeted growing regions. DeHaan et al. (2016) suggest a pipeline strategy for new crop domestication where many candidates are tested and attrition occurs as information about candidates is gained. Cooperative efforts in early breeding stages, along with applied genomics, should result in more quickly advancing and developing promising species into commercially viable crops.

These data provide several potential use cases for breeding programs. If a program is beginning, there appears to be little downside in utilizing training data sets from across programs. As programs mature and have sufficient data from multiple years and locations, GS models can be developed within programs. This could be especially important for agronomic traits such as spike yield, as combining data across programs could result in negative predictions (**Figure 5**). However, when looking at GS models using program-specific data, the GS predictions were always positive, so program-specific models may be the most conservative way to ensure genetic gains. For domestication traits, predictions were usually similar regardless of the training population, suggesting minimal benefit to pooling multiple locations.

Our results show that GS can be a powerful tool in breeding programs, yet GS is not a single, stand-alone solution for quickly developing new crops. While we envision GS improving with larger data sets and new statistical model development, multi-environment predictions are extremely complex. To fully leverage genomic resources, GS should be integrated with phenomic and environmental data. High-throughput phenotyping is an emerging field that is providing dense phenomic measurements (White et al., 2012; Araus et al., 2018) that have been shown to increase GS model accuracy (Rutkoski et al., 2016; Crain et al., 2018). A further complement to better predict how the environment influences phenotype will include incorporating crop models to better understand plant development within a range of environments (e.g., the review of crop models in wheat by Chenu et al., 2017). Future advances in these areas, as well as their incorporation into unified prediction models, will allow scientists to drive genetic gain in novel crops across a range of environments.

# CONCLUSION

Domesticating crop wild relatives is a challenging and time-consuming task (Cox et al., 2002; DeHaan et al., 2014). Previous research at TLI has shown that a 77% increase in seed yield was achieved in two cycles of selection; however, to reach the yields of annual wheat, another 20 years of sustained breeding gains would be required, with even longer time intervals needed to achieve seed mass similar to wheat (DeHaan et al., 2014).

Perennial grains derived from the domestication of wild species hold much promise for environmental and human benefit. To achieve these benefits, specific traits of wild species will need to be modified. Within IWG, free-threshing and non-shattering seed types are two key domestication traits that must be improved for wide-scale adoption. In addition, the economic yield of IWG must be sufficient to incentivize the transition to new crops. Along with fixing key traits for domestication, breeding efforts should also ensure that crops are broadly adapted (DeHaan et al., 2016).

The ability to use molecular tools such as GS, combined with modern breeding methodologies, may allow perennial crops and crop wild relatives to compress the 10,000-year selection history of many annual crops into a few decades. While GS predictions for agronomic traits like spike yield were low between breeding sites and environments, significant synergies could be achieved by utilizing collective information about domestication traits. While site-specific or regional programs will be necessary to breed for the best locally adapted genets, progress made toward improving key domestication traits could be shared among all programs. This is especially important for resource-limited programs that are domesticating new crops, allowing improvement for traits that are less environmentally influenced and are essential for domestication. Early domestication work could be carried out by a single program or shared among programs, with each program phenotyping a few lines in diverse locations to quickly and efficiently improve key traits. As more programs are initiated for the breeding of IWG, they will be able to identify germplasm that has key domestication traits and focus breeding efforts toward achieving higher site-specific agronomic performance.

# DATA AVAILABILITY STATEMENT

The genotypic datasets analyzed for this study have been placed in the NCBI Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/bioproject/) under BioProject accession numbers PRJNA563706, PRJNA609095, and PRJNA608473. All phenotypic data and scripts for data analysis have been placed in the Dryad Digital Repository (https://doi.org/10.5061/dryad.3j9kd51d9).

### AUTHOR CONTRIBUTIONS

JC, LD, and PB conceived experimental ideas and methods. LD, JA, XZ, and PB conducted all field evaluations. LD, JP, PB, and XZ performed DNA extraction and genotyping. JC completed data analysis and wrote the manuscript. All authors read, reviewed, and approved the final manuscript.

# FUNDING

This work was funded in part by the Perennial Agriculture Project, in conjunction with the Malone Family Land Preservation Foundation and The Land Institute.

### ACKNOWLEDGMENTS


We acknowledge the excellent field and laboratory assistance of Shuangye Wu, Marty Christians, Brett Heim, and Professor Donald Wyse. The Thinopyrum intermedium Genome Sequencing Consortium provided pre-publication access to the IWG genome sequence. Computational work was completed on the Beocat Research Cluster at Kansas State University, which is funded in part by NSF grants CNS-1006860, EPS-1006860, and EPS-0919443.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00319/full#supplementary-material

FIGURE S1 | Performance of genomic selection (GS) across five cycles. Each panel represents one trait, A-N. The x-axis is the cycle that was used as the prediction population. Colored bars represent the prediction ability for training populations for other cycles. For comparison, the within cycle fivefold cross-validation is represented for each training and prediction site, which usually provides the highest predictive ability. The y-axis is the prediction ability, which is the correlation between the GS predicted value and the phenotypic best linear unbiased predictor (BLUP). Error bars represent the 95% confidence interval for the correlation value.

FIGURE S2 | Relationship of genomic selection (GS) predicted values (x-axis) and observed values (y-axis) for two traits, including line of best fit. Panel A shows free threshing (n = 2,496), a trait with high predictive ability, while panel B shows spike yield (n = 2,508), a trait with low predictive ability. The training population was TLI-C7 and the prediction population was TLI-C6.

FIGURE S3 | Genomic selection (GS) performance across five cycles where each panel represents one trait A-N. Within each panel the x-axis is grouped by the cycle of data that was predicted, with different training populations represented by colored bars. Leave-one-out (LOO) was a training population where the cycle of interest was left out and all other cycles were used to predict the cycle. Breeding program-specific training populations were developed for the Minnesota (MN) and Kansas (KS) breeding programs. Finally, UMN-PCA and TLI-PCA are training populations that were developed using principal component analysis (PCA) of the marker matrix. UMN-PCA is a training population more closely related to UMN genets while TLI-PCA is more closely related to TLI genets. The prediction ability is the correlation between the predicted GS value and the phenotypic best linear unbiased predictor (BLUP), with error bars representing the 95% confidence interval.

# REFERENCES


marker × environment interaction genomic selection model. G3 (Bethesda) 5, 569–582. doi: 10.1534/g3.114.016097




Tsvelev, N. (1983). Grasses of the Soviet Union. New Delhi: Oxonian Press Pvt. Ltd.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Crain, Bajgain, Anderson, Zhang, DeHaan and Poland. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Exploration of the Yield Potential of Mesoamerican Wild Common Beans From Contrasting Eco-Geographic Regions by Nested Recombinant Inbred Populations

Jorge Carlos Berny Mier y Teran<sup>1</sup>†, Enéas R. Konzen<sup>2</sup>†‡, Antonia Palkovic<sup>1</sup>, Siu M. Tsai<sup>2</sup> and Paul Gepts<sup>1</sup>\*†

<sup>1</sup> Department of Plant Sciences, University of California, Davis, Davis, CA, United States, <sup>2</sup> Cell and Molecular Biology Laboratory, Centro de Energia Nuclear na Agricultura, Universidade de São Paulo, Piracicaba, Brazil

### Edited by:

Petr Smýkal, Palacký University, Czechia

### Reviewed by:

Kirstin E. Bett, University of Saskatchewan, Canada; Juan M. Osorno, North Dakota State University, United States

### \*Correspondence:

Paul Gepts plgepts@ucdavis.edu

### †ORCID:

Jorge Carlos Berny Mier y Teran orcid.org/0000-0003-3709-9131; Enéas R. Konzen orcid.org/0000-0001-5176-7410; Paul Gepts orcid.org/0000-0002-1056-4665

### ‡Present address:

Enéas R. Konzen, Universidade Federal do Rio Grande do Sul, Campus Litoral Norte, Imbé, Brazil

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 31 October 2019 Accepted: 09 March 2020 Published: 03 April 2020

### Citation:

Berny Mier y Teran JC, Konzen ER, Palkovic A, Tsai SM and Gepts P (2020) Exploration of the Yield Potential of Mesoamerican Wild Common Beans From Contrasting Eco-Geographic Regions by Nested Recombinant Inbred Populations. Front. Plant Sci. 11:346. doi: 10.3389/fpls.2020.00346

Genetic analyses and utilization of wild genetic variation for crop improvement in common bean (Phaseolus vulgaris L.) have been hampered by yield evaluation difficulties, identification of advantageous variation, and linkage drag. The lack of adaptation to cultivation conditions and the existence of highly structured populations make association mapping of diversity panels suboptimal. Joint linkage mapping of nested populations avoids the latter constraint, while populations crossed with a common domesticated parent allow the evaluation of wild variation within a more adapted background. Three domesticated-by-wild backcrossed-inbred-line populations (BC<sub>1</sub>S<sub>4</sub>) were developed using three wild accessions representing the full range of rainfall of the Mesoamerican wild bean distribution crossed to the elite drought-tolerant domesticated parent SEA 5. These populations were evaluated under field conditions in three environments: two fully irrigated trials in two seasons and a simulated terminal drought in the second season. The goal was to test if these populations responded differently to drought stress and contained progenies with higher yield than SEA 5, not only under drought but also under well-watered conditions. Results revealed that the two populations derived from wild parents of the lower rainfall regions produced lines with higher yield compared to the domesticated parent in the three environments, i.e., both in the drought-stressed environment and in the well-watered treatments. Several progeny lines produced yields that, on average over the three environments, were 20% higher than the SEA 5 yield. Twenty QTLs for yield were identified in 13 unique regions on eight of the 11 chromosomes of common bean. Five of these regions showed at least one wild allele that increased yield over the domesticated parent. The variation explained by these QTLs ranged from 0.6 to 5.4% of the total variation and the additive effects ranged from −164 to 277 kg ha<sup>−1</sup>, with evidence suggesting allelic series for some QTLs. Our results underscore the potential of wild variation, especially from drought-stressed regions, for bean crop improvement as well as the identification of regions for efficient marker-assisted introgression.

Keywords: common bean, crop wild relative, eco-geographic adaptation, nested backcrossed inbred populations, quantitative trait loci, yield

### INTRODUCTION

Among pulses, common bean (Phaseolus vulgaris L.; 2n = 2x = 22) plays an important nutritional and economic role (Broughton et al., 2003; Gepts et al., 2008). The yields of pulses are usually lower than those of cereals, mainly because pulses are produced in more marginal cultivation niches, produce more energy-dense seeds, and bear the cost of association with nitrogen-fixing rhizobia (Sinclair and Vadez, 2012). Production is also constrained by biotic and abiotic factors, drought being one of the main causes of yield reduction and crop failure in beans (Singh, 2001; Beebe et al., 2013; Ramirez-Cabral et al., 2016). Furthermore, drought severity is likely to increase due to the effects of climate change (Prudhomme et al., 2014). Increasing crop yield and resilience is an essential goal of crop breeding and cultivar development, as well as a direct advantage to farmers and, ultimately, to consumers (Kissoudis et al., 2016). Strategies to improve yield include maximizing nitrogen fixation, photosynthesis and partitioning to grain, as well as minimizing water deficit impacts (Monteith and Moss, 2006; Zhu et al., 2010; Sinclair and Vadez, 2012).

Common bean is one of the five domesticated species among ∼80 wild Phaseolus species, all originating in the American continent (Freytag and Debouck, 2002; Delgado-Salinas et al., 2006). Wild common bean originated in Mexico and was dispersed by at least two long-distance movements to Ecuador and northern Peru (some 500,000 years ago) and to the southern Andes (southern Peru, Bolivia, and northwestern Argentina; some 100,000 years ago), leading to two additional gene pools in the Andes (Bitocchi et al., 2012; Rendón-Anaya et al., 2017; Ariani et al., 2018; Gepts, 2019). Common bean was domesticated independently within two of the three gene pools (Southern Andes and Mesoamerica), followed by divergence into six genetically distinct races (Kwak and Gepts, 2009). Mesoamerican beans were presumably domesticated in Western Mexico while Andean beans were domesticated in northern Argentina and southern Bolivia (Kwak et al., 2009; Rodriguez et al., 2016). Within the domesticated Mesoamerican gene pool, the larger-seeded races 'Jalisco' and 'Durango' are distributed in the subhumid and semi-arid highlands, respectively, in Central and Northern Mexico, while the small-seeded race Mesoamerica is distributed in the lowlands from southern Mexico to northeast Brazil (Singh et al., 1991). Within each race, there are thousands of landrace types, as well as fewer, but established, commercial market classes (Gentry, 1969; Singh et al., 1991; Moghaddam et al., 2016). In common bean breeding, the variation included in the breeding programs is mostly constrained within market classes and within races, as the inheritance of seed color, size, and shape and of plant architecture is highly polygenic and the underlying loci are dispersed throughout the genome (Park et al., 2000; McClean et al., 2002; Checa and Blair, 2008; Schmutz et al., 2014). Introgressions between market classes or gene pools have focused mostly on transferring disease resistance, modifying growth habit, and introducing drought resistance (Kelly, 2001; Acosta-Gallegos et al., 2007; Beebe, 2012; Porch et al., 2013; Dohle et al., 2019).

The use of wild variation has been even more limited. Wild variation has been identified as a source of resistance to bruchids (Kornegay et al., 1993; Osborn et al., 2003, 2006), common bacterial blight (Beaver et al., 2012), web blight (Beaver et al., 2012) and white mold (Mkwaila et al., 2011). However, it is possible that beneficial wild variation for important, highly quantitative traits, like grain yield and drought adaptation, is not present in the domesticated forms due to early genetic bottlenecks (Gepts et al., 1986; Tanksley and McCouch, 1997; Acosta-Gallegos et al., 2007). For example, within the Mesoamerican gene pool, a single domestication event originated in only one of the three genetically and geographically distinct groups of wild common beans (Kwak et al., 2009; Ariani et al., 2018). Although domesticated forms expanded into the areas of the non-domesticated wild groups and can outcross with their wild relatives, gene flow has been found to be highly asymmetrical, with more regions introgressed from the domesticated into the wild types (Papa and Gepts, 2003). Therefore, diversity unique to the non-domesticated groups might not be present in the domesticates. This is supported by a strong genetic diversity bottleneck, especially during the Mesoamerican bean domestication (Gepts et al., 1986; Sonnante et al., 1994; Schmutz et al., 2014). Previous efforts in genetic analyses in common bean using wild by domesticated crosses include: (a) two populations with the same Andean domesticated cultivar (ICA Cerinza) crossed to a wild type from Colombia (Blair et al., 2006) and to one from northern Mexico (Blair and Izquierdo, 2012); (b) a population resulting from a cross of Midas (an Andean snap bean) and G12873, a wild type from central Mexico (Koinange et al., 1996); and (c) a population involving Peruvian accessions both originating in the Andean gene pool (Singh et al., 2019). In the first two efforts, genomic regions carrying a positive-effect allele for yield from the wild parent were detected in the wild accession from a high rainfall area in Colombia (Blair et al., 2006), but not in the accession from a low rainfall region in northern Mexico (Blair and Izquierdo, 2012).

In the present investigation, the development of three backcrossed-inbred-line populations in a nested design is described, as is their evaluation under drought pressure in field conditions followed by a QTL analysis of grain yield.

The populations result from crosses of three wild Mesoamerican accessions, originating from areas with different levels of precipitation/evapotranspiration, with the same elite breeding line, also of Mesoamerican origin. The nested design allows the sampling of more diversity than single biparental populations and increases the power and mapping resolution but, most importantly, allows the testing of genetic effects among accessions in similar genetic backgrounds (Yu et al., 2008; Garin et al., 2017). Our two-fold hypothesis is that: (1) the three wild accessions have different adaptations to rainfall/evapotranspiration regimes; and (2) some of these contrasting adaptation mechanisms were not incorporated into the domesticated gene pool during domestication.

### MATERIALS AND METHODS

### Parental Materials and Population Development

Three populations were established, which resulted from a cross of the Mesoamerican elite breeding line SEA 5 with three wild accessions from the Mesoamerican gene pool. SEA 5 (PI 613166; Number and DOI of the CIAT gene bank: G51502, doi: 10.18730/PHA81) was obtained at the International Center for Tropical Agriculture (CIAT, Cali, Colombia) from an interracial (Durango × Mesoamerica) double cross (BAT 477/San Cristobal 83//Guanajuato 31/Rio Tibagi) within the Mesoamerican gene pool (Singh et al., 2001). SEA 5 was selected for high productivity under drought, has an indeterminate inclined growth habit IIb, photoperiod neutrality, resistance to Fusarium root rot and Bean Common Mosaic Virus (I gene), but shows susceptibility to anthracnose, common bacterial blight and rust (Singh et al., 2001; Terán and Singh, 2002). SEA 5 develops a vigorous and deep root system (Polania et al., 2017), has a high capacity of photoassimilate remobilization (Polania et al., 2016; Rao et al., 2017), and a high capacity for nitrogen fixation under drought (Devi et al., 2012).

Three wild accessions within the Mesoamerican wild gene pool (Ariani et al., 2018; Berny Mier y Teran et al., 2019a) were chosen to maximize genetic variation and the range of annual precipitation at their collection sites. From North to South and along a gradient of increasing temperature and rainfall, the wild accessions included PI 319441 (G10022, DOI: 10.18730/JRH59), collected in the state of Durango, Mexico (104.47°W, 24.24°N, mean annual temperature: 18°C, mean annual precipitation: 588 mm), PI 417653 (G12910, DOI: 10.18730/PGEK4) from Guanajuato, Mexico (101.72°W, 20.62°N, 19.4°C, 733 mm) and PI 343950 from Huehuetenango, Guatemala (91.82°W, 15.68°N, 23.5°C, 1600 mm). The seeds were obtained from the National Plant Germplasm System (NPGS) of the USDA at the Western Regional Plant Introduction Station in Pullman, WA, United States. Climatic conditions of each collection site were extracted from the WorldClim database<sup>1</sup> (Hijmans et al., 2005).

The three populations (henceforth called dw319441, dw417653, and dw343950) were developed in an identical way as follows: (1) SEA 5 was crossed to the wild accession using SEA 5 as the male parent; (2) The F<sub>1</sub> plant was then used as the male parent crossed back to SEA 5 plants to obtain at least 250 BC<sub>1</sub> seeds; (3) Each BC<sub>1</sub> seed was scarified before planting and allowed to self in the greenhouse; (4) BC<sub>1</sub>S<sub>1:2</sub> families were grown in the field in the summer months (June–September, short night/long day photoperiod) planting ∼50 seeds per family, to allow selection against photoperiod sensitivity. Seeds from a randomly chosen plant from each family were harvested; (5) The plants were grown in the greenhouse for two more cycles using single-seed-descent to obtain 220, 237, and 238 BC<sub>1</sub>S<sub>4</sub> plants from the dw319441, dw417653, and dw343950 populations, respectively.

<sup>1</sup>www.worldclim.org

### Trial Design and Phenotyping

The populations were evaluated under field conditions in 2014 and 2015 at the Plant Sciences Field Facility on the University of California, Davis, campus (38.53°N, 121.78°W). The soil type of the site belongs to the Yolo series, a member of fine-silty loam, mixed, non-acid, thermic Mollic Xerofluvents, considered well-drained, with slow to medium runoff and moderate permeability<sup>2</sup>. The seeding was carried out on the 8th of June in 2014 and on the 9th of June in 2015. The plants were harvested on the 22–26th of September in 2014 and the 14–25th of September in 2015. In 2014, the populations were evaluated only under full irrigation, using furrow water delivery as needed, in four irrigations. In 2015, a terminal drought water regime was added. Terminal drought was simulated by withholding the last two of the four irrigations of the fully irrigated treatment. Plants received water only from irrigations as there were no rain events during the experiments. The agricultural management was according to standard practices in California (Long et al., 2010).

In both years, the RILs were planted in un-replicated fashion and SEA 5 in replicated fashion (eight replicates) according to a modified augmented design (Lin and Poushinsky, 1985). The field was divided into blocks and planted with a main check in the middle of each block, as well as two to three secondary checks randomly distributed within each block. In 2014, 30 blocks of 3 plots × 9 plots were used and, in 2015, 21 blocks of 5 plots × 5 plots per treatment were used. UCD 9634, a pink-seeded breeding line, was included in both years as the main check due to its high yield and stability. In 2014, Tio Canela 75 (Rosas et al., 2010), Matterhorn (Kelly et al., 1999), and Flor de Mayo Eugenia (Acosta Gallegos et al., 2010) were included as secondary checks. In 2015, a small, black-seeded line L88-63 (Frahm et al., 2004) was added as a check. The experimental unit was a plot of 60 plants grown in a single 6-m-long row, with 0.76 m between rows (density of 131,578 plants per hectare). In 2014, 230, 238, and 237 progeny lines of the dw319441, dw417653, and dw343950 populations, respectively, were grown.

There was a bimodal distribution in days to flowering (**Figure 1**) with a smaller set of lines that showed late flowering. These later genotypes were not used for further evaluation and genotyping, as a wide variation in phenology can be a confounding effect (Pinto et al., 2010), especially in the case of late flowering and maturity, which are negatively correlated with yield in beans (Klaedtke et al., 2012; Berny Mier y Teran et al., 2019b). The final analyses used 171, 170, and 165 lines from the dw319441, dw417653, and dw343950 populations, respectively.

<sup>2</sup>https://soilseries.sc.egov.usda.gov

Grain yield was measured per experimental plot and extrapolated to kilograms per hectare. Flowering time was recorded when at least 50% of the plants in the plot had an open flower. Seed weight was evaluated in a random subsample of 100 seeds.
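The extrapolation from plot to hectare can be illustrated with a minimal sketch. The plot geometry (60 plants in one 6 m row, 0.76 m between rows) is taken from the trial description; the function names and the 1,000 g plot yield are hypothetical and serve only as an example.

```python
# Plot geometry from the trial description: one 6 m row, 0.76 m row spacing.
ROW_LENGTH_M = 6.0
ROW_SPACING_M = 0.76
PLANTS_PER_PLOT = 60
M2_PER_HECTARE = 10_000


def plot_area_m2(row_length_m=ROW_LENGTH_M, row_spacing_m=ROW_SPACING_M):
    """Ground area occupied by one single-row plot."""
    return row_length_m * row_spacing_m


def plants_per_hectare(plants=PLANTS_PER_PLOT):
    """Planting density implied by the plot geometry."""
    return plants / plot_area_m2() * M2_PER_HECTARE


def yield_kg_per_ha(plot_yield_g):
    """Extrapolate a plot yield in grams to kilograms per hectare."""
    return plot_yield_g / 1000.0 / plot_area_m2() * M2_PER_HECTARE


density = plants_per_hectare()
# Hypothetical plot yield of 1,000 g, used only to illustrate the conversion.
example_yield = yield_kg_per_ha(1000.0)
print(f"{density:.0f} plants/ha, {example_yield:.0f} kg/ha")
```

With these dimensions the density agrees with the figure reported for the experimental unit when the decimal part is dropped.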

### Statistical Analyses

To adjust for within-field spatial variation, a two-dimensional tensor product of penalized splines (P-splines) was fitted with the package SpATS (Rodríguez-Álvarez et al., 2018) in R (R Development Core Team, 2018). Briefly, the method fits the mixed model $y = X\beta + X_s\beta_s + Z_s s + Z_u u + Z_g g + e$, where $\beta$ and $\beta_s$ are vectors of fixed effects that include the intercept and the check cultivar effect, $X_s$ is the design matrix, $X_s\beta_s$ and $Z_s s$ are the fixed and random components of the mixed model, respectively, $s$ is the vector of random spatial effects, $u$ is a random row and column effect sub-vector accounting for discontinuous variation, and $g$ is the random genotypic effect (Malosetti et al., 2017; Rodríguez-Álvarez et al., 2018). The genotype was considered as a random effect in both years and irrigation treatment as a fixed effect in the second year. An analysis of variance was carried out as a linear mixed model using population, environment, and their interaction as fixed effects, and genotype within population as a random effect. The evaluations in 2014 under full irrigation and in 2015 under terminal drought and under full irrigation were considered as three separate environments. The statistical analyses were performed with JMP (SAS Institute Inc., 2016). The Drought Susceptibility Index (DSI) was calculated from the two irrigation treatments in 2015 as $\mathrm{DSI} = (1 - Y_{ds}/Y_{ns})/(1 - X_{ds}/X_{ns})$, where $Y_{ds}$ and $Y_{ns}$ are the yields of a given line in the drought stress and no stress environments, respectively, and $X_{ds}$ and $X_{ns}$ are the overall mean yields under the drought stress and no stress treatments, respectively (Fischer and Maurer, 1978; Beebe et al., 2013).
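As an illustration of the DSI calculation, the following minimal sketch applies the formula of Fischer and Maurer (1978) to a single line. The function name and all yield values are hypothetical and are not taken from the trial data.

```python
def drought_susceptibility_index(y_ds, y_ns, x_ds, x_ns):
    """DSI = (1 - Yds/Yns) / (1 - Xds/Xns).

    y_ds, y_ns: yield of one line under drought stress and no stress.
    x_ds, x_ns: overall mean yield of the trial under each treatment.
    """
    drought_intensity = 1.0 - x_ds / x_ns
    if drought_intensity == 0:
        raise ValueError("No yield reduction at the trial level; DSI is undefined.")
    return (1.0 - y_ds / y_ns) / drought_intensity


# Hypothetical yields in kg/ha, for illustration only.
trial_mean_ds, trial_mean_ns = 2000.0, 2500.0
line_dsi = drought_susceptibility_index(1800.0, 2400.0, trial_mean_ds, trial_mean_ns)
print(round(line_dsi, 2))
```

A value above 1 indicates that the line lost proportionally more yield under drought than the trial average, and a value below 1 indicates that it lost less.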

### Genotyping

The BC-RILs, parents, and F<sub>1</sub> hybrids were genotyped with the BARCBean6K\_3 BeadChip platform of 5,398 SNP markers (Song et al., 2015) at the Genome Center at the University of California, Davis. After filtering in GenomeStudio Module v1.8.4 (Illumina Inc., San Diego, CA, United States), SNP calling was performed with the software's cluster algorithm, with subsequent manual adjustments and quality control using a GenCall score cutoff of 0.15. The markers were filtered for less than 10% missing data and for polymorphism between parents, verified with the F<sub>1</sub> hybrids.
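The two marker filters described above (missing data below 10% and polymorphism between parents) can be sketched as follows. This is a simplified illustration, not the procedure run in GenomeStudio: the data layout, the function name, and the marker names are assumptions, and the verification against F<sub>1</sub> hybrids is omitted.

```python
def filter_markers(progeny_calls, parent_a, parent_b, max_missing=0.10):
    """Keep markers with < max_missing missing calls that differ between parents.

    progeny_calls: dict mapping marker name -> list of calls (None = missing).
    parent_a, parent_b: dicts mapping marker name -> parental call.
    """
    kept = []
    for marker, calls in progeny_calls.items():
        missing_rate = sum(call is None for call in calls) / len(calls)
        a, b = parent_a.get(marker), parent_b.get(marker)
        polymorphic = a is not None and b is not None and a != b
        if missing_rate < max_missing and polymorphic:
            kept.append(marker)
    return kept


# Hypothetical markers and calls, for illustration only.
progeny = {
    "m1": ["AA"] * 7 + ["BB"] * 3,               # complete data, polymorphic
    "m2": ["AA"] * 6 + ["BB"] * 2 + [None] * 2,  # 20% missing
    "m3": ["AA"] * 10,                           # parents carry the same allele
}
domesticated = {"m1": "AA", "m2": "AA", "m3": "AA"}
wild = {"m1": "BB", "m2": "BB", "m3": "AA"}
kept_markers = filter_markers(progeny, domesticated, wild)
print(kept_markers)
```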

## Map Construction

Linkage maps were constructed for each population in R (R Development Core Team, 2018) using the package ASMap (Taylor and Butler, 2017). Genetic distances were determined using the Kosambi function (Kosambi, 1944). The recombination fraction was estimated with the package qtl (Wu et al., 2003). A consensus map, combining results for the three individual maps, was developed with LPmerge (Endelman and Plomion, 2014), with the linkage maps of the three populations constructed without first filtering for co-located markers, to maximize the markers shared among populations. LPmerge uses linear modeling to keep the maximum interval with the lowest root mean squared error, applying weights according to population size (Endelman and Plomion, 2014). When there were co-localized markers in the consensus map, only one marker per bin was kept. Chromosomes were numbered Pv01 to Pv11, matching the standard numbering in P. vulgaris (Pedrosa-Harand et al., 2008).
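For reference, the Kosambi mapping function converts a recombination fraction $r$ into a map distance $d$ (in cM) as $d = 25 \ln[(1 + 2r)/(1 - 2r)]$. The sketch below implements the function and its inverse; the function names are illustrative and are not part of ASMap or qtl.

```python
import math


def kosambi_cm(r):
    """Map distance in centimorgans for a recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("Recombination fraction must be in [0, 0.5).")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse: recombination fraction for a map distance in centimorgans."""
    return 0.5 * math.tanh(2 * d_cm / 100.0)


distance = kosambi_cm(0.10)
print(f"r = 0.10 corresponds to {distance:.2f} cM")
```

For small recombination fractions the distance in cM is close to 100 × r, while larger fractions are inflated to account for undetected double crossovers under partial interference.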

## QTL Mapping

The QTL analysis was performed with the NAM function of QTL IciMapping version 4.1 (Meng et al., 2015). The NAM module implements joint inclusive composite interval mapping (JICIM), which uses generalized linear models with population and marker-by-population interaction as fixed effects, stepwise regression for marker selection, and subsequent interval mapping with phenotypes adjusted for the selected markers outside the current marker interval (Li et al., 2007, 2011; Wang et al., 2016). A probability of 0.01 and a step of 1 cM were used for the stepwise regression and the significance. The LOD threshold was calculated by 2,000 permutations in each environment at a significance level of 0.05. Furthermore, a QTL analysis was also conducted for the average yield across the three environments in 2014 and 2015 and for the drought susceptibility index (DSI).
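The logic of a permutation-based LOD threshold can be sketched as follows. This is a simplified single-marker regression scan on simulated data, not the JICIM procedure implemented in QTL IciMapping; the function names, the 0/1 marker coding, and the simulated population are all assumptions made for illustration.

```python
import math
import random


def lod_single_marker(genotypes, phenotypes):
    """LOD score from a linear regression of phenotype on a 0/1 marker code."""
    n = len(phenotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or rss_null == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    rss_marker = max(rss_null - sxy ** 2 / sxx, 1e-12)  # guard against a perfect fit
    return (n / 2) * math.log10(rss_null / rss_marker)


def permutation_threshold(markers, phenotypes, n_perm=2000, alpha=0.05, seed=1):
    """Genome-wide threshold: (1 - alpha) quantile of the maximum LOD under permutation."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the marker-phenotype association
        max_lods.append(max(lod_single_marker(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return max_lods[index]


# Simulated population: 100 lines, 20 unlinked markers, one large-effect marker.
rng = random.Random(42)
markers = [[rng.randint(0, 1) for _ in range(100)] for _ in range(20)]
phenotypes = [10.0 * g + rng.gauss(0.0, 1.0) for g in markers[0]]

threshold = permutation_threshold(markers, phenotypes, n_perm=200)
observed = lod_single_marker(markers[0], phenotypes)
print(f"threshold = {threshold:.2f}, LOD at causal marker = {observed:.2f}")
```

Taking the maximum LOD over all markers in each permutation is what makes the threshold genome-wide rather than per-marker.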

# Data Availability

The marker segregation and phenotypic evaluation data have been deposited in the Dryad public database: https://datadryad.org/stash/dataset/doi:10.25338/B8FW3M.

### RESULTS

### Sources of Variation

The effects of population, environment, and their interaction were highly significant (**Table 1**). Nevertheless, the environmental effect was larger (F = 1,672, P < 0.001) than the effects of population and of their interaction (F = 5.7 and 3.6, respectively, at P < 0.01). Across environments, the yield of the dw417653 population was the highest, being significantly higher (113 kg, +11%) than the yield of the dw343950 population, whereas the dw319441 population was not statistically different from the other two populations. Across populations, the yield in 2014 was significantly lower than under both water regimes in 2015, with a reduction of 56 and 47% relative to the drought and well-watered treatments, respectively. There was a significant yield reduction in 2015 because of terminal drought, with 17% lower yield relative to the irrigated treatment.

<sup>1</sup>Significance codes: \*\*\*<0.001, \*\*< 0.01. <sup>2</sup>Levels not connected with the same letter are significantly different according to the Tukey-Kramer HSD test (P < 0.05). <sup>3</sup>Treatments: full irrigation (C), terminal drought (D).

# Comparison of Populations Across Environments

Although the population by environment effect was significant (**Table 1**), the ranks of the performance of the populations were similar between environments. Across environments, the dw417653 population produced higher yield than the dw319441 population, which, in turn, was higher than the dw343950 population. However, population dw417653 was significantly different from population dw343950 only in 2015 (**Figure 2**). In all environments, the average of the population yields was lower than that of SEA 5. Across environments, SEA 5 produced 45% more yield than the average of the RILs (**Table 1**).

For breeding purposes, some RILs showed transgressive segregation for higher yield compared to the average of SEA 5 plots in 2015, although none were higher than the highest yielding SEA 5 plot in 2014 (**Figure 2**). However, in 2015 in both treatments, some RILs yielded more than the highest plot of SEA 5. Furthermore, there were high-performing RILs from each population in both years and treatments (**Figure 2**). Two populations, dw417653 and dw319441, included progeny lines with yields that were significantly higher (+20% on average across the three environments) than that of the domesticated control SEA 5 (**Table 2**). These lines, nevertheless, had a similar number of days to flowering (42 to 54 days vs. 47 for SEA 5) and only slightly smaller seeds than SEA 5 (15–20 g/100 seeds vs. 24 g/100 seeds for SEA 5; **Table 2**).

# Correlations Among Traits and Distribution

The correlations of yield between 2014 and the 2015 well-watered and terminal drought treatments were relatively low (R = 0.51 and 0.50, respectively; **Figure 3**), but the correlation between drought and well-watered conditions in 2015 was higher (R = 0.88). Yield among the RILs in the first year and in 2015 under drought was not normally distributed (P < 0.001, Shapiro–Wilk test) but was skewed toward the low-yielding side of the distribution, while yield in well-watered plots in 2015 was normally distributed (P = 0.17).

# Molecular Linkage Map and QTL Analyses

### Polymorphism, Recombination Rate and Allele Frequency

There were 1,554, 1,404, and 1,499 polymorphic markers between SEA 5 and PI 343950, PI 417653, and PI 319441, respectively. Jointly, there were 2,201 polymorphic markers between the wild accessions and SEA 5 and 1,858 markers shared among the three wild accessions. The trends in recombination rate were similar across populations. Higher recombination rates were observed in the distal parts of all the chromosomes, except on chromosome Pv06, where higher recombination was observed from the middle to the distal portion, and chromosome Pv09, where higher recombination was located in the second and fourth quarters of the chromosome (**Figure 4**).

The wild allele frequency was variable between and within chromosomes. After one backcross to the domesticated parent, one would expect an average wild allele frequency of 0.25. The average frequency was 0.23 across chromosomes, with a range of 0.17 to 0.4, in Pv01 and Pv02, respectively. The populations had similar frequencies across chromosomes; however, within chromosomes there were differences among populations. For example, in Pv02, compared to the other two populations, the dw417653 population showed higher wild allele frequency in the central region but very low frequencies at the chromosomal ends. In Pv10, the dw319441 population had higher wild allele frequency than the other populations. In some chromosome regions, all three populations showed a lower than expected wild allele frequency, like in the distal region of Pv01. There were two regions with markedly low wild allele frequency, the region mentioned before in Pv02 and another region in Pv07, both at the end of the chromosomes.
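The expected wild allele frequency of 0.25 follows from the crossing scheme: the F<sub>1</sub> carries one wild allele at every polymorphic locus (frequency 0.5), and each backcross to the domesticated parent halves that frequency, while selfing leaves the expected allele frequency unchanged in the absence of selection. A minimal sketch of this expectation is given below; the function names are illustrative, and the heterozygosity calculation assumes that the S<sub>4</sub> designation corresponds to four generations of selfing after the backcross.

```python
from fractions import Fraction


def expected_donor_allele_frequency(n_backcrosses):
    """Expected donor (wild) allele frequency after n backcrosses, without selection."""
    return Fraction(1, 2) ** (n_backcrosses + 1)


def expected_heterozygosity(n_backcrosses, n_selfings):
    """Expected fraction of polymorphic loci still heterozygous.

    Assumes each backcross and each selfing generation halves heterozygosity.
    """
    return Fraction(1, 2) ** n_backcrosses * Fraction(1, 2) ** n_selfings


bc1_frequency = expected_donor_allele_frequency(1)
bc1s4_heterozygosity = expected_heterozygosity(1, 4)
print(float(bc1_frequency), float(bc1s4_heterozygosity))
```

Regions where the observed frequency departs strongly from this expectation, such as the chromosome ends of Pv02 and Pv07, are therefore candidates for segregation distortion or selection during population development.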

### Consensus Molecular Linkage Map

The consensus map was built with 721 markers and had a genetic length of 925 cM (**Table 3**). Per chromosome, there were on average 66 markers, an average length of 84 cM, and an average spacing of 1.3 cM. The average maximum spacing was 9 cM, while the largest interval was 14 cM in Pv01. Through a comparison of the range of the markers on each chromosome and their physical position according to the G19833 reference genome version 2.1 (Schmutz et al., 2014), the genetic map spanned 510,318,067 bp, which represents 99% of the sequenced genome. The coverage ranged from 96.5% of Pv06 to 99.7% of Pv08.
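The per-chromosome averages quoted above can be recovered from the map totals. The short sketch below uses the reported totals (721 markers, 925 cM, 11 chromosomes); computing the average spacing as total length divided by marker number is an assumption about how the reported value was obtained.

```python
# Totals reported for the consensus map.
N_MARKERS = 721
TOTAL_LENGTH_CM = 925.0
N_CHROMOSOMES = 11

markers_per_chromosome = N_MARKERS / N_CHROMOSOMES
length_per_chromosome_cm = TOTAL_LENGTH_CM / N_CHROMOSOMES
# Assumed definition: total map length divided by the number of markers.
average_spacing_cm = TOTAL_LENGTH_CM / N_MARKERS

print(
    f"{markers_per_chromosome:.1f} markers and {length_per_chromosome_cm:.1f} cM "
    f"per chromosome, {average_spacing_cm:.2f} cM between markers"
)
```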

### Identification of Additive QTLs and Their Distribution on the Molecular Linkage Map

QTL analyses were performed for grain yield in each environment (full irrigation in both years and terminal drought in the second year) and for the average yield across environments. The same analysis was conducted for the DSI calculated from the drought and well-watered conditions in 2015 (**Table 4** and **Figure 5**). Significance thresholds for LOD scores, calculated by permutations for each environment, were 3.7 (environment 2014), 3.6 (2015c), and 3.6 (2015d). LOD scores for averages across environments had a threshold of 4.7. The threshold LOD score for DSI was 12.1. Twenty QTLs had LOD scores above the respective thresholds. They were distributed on eight chromosomes. These included three QTLs in 2014, five in 2015c, six in 2015d, five for the average across environments, and one for DSI.

The magnitude of these QTLs ranged from 0.6% to 5.4% with an outlier at 38% (for DSI on chromosome 9, observed in the dw319441 and dw343950 populations). The allelic effects of the chromosome regions marked by these QTLs varied considerably among the three accessions between negative and positive values. The most negative value (−164.8 kg ha<sup>−1</sup>) was for yield QTL Pv07.55 in 2015, inherited from PI 343950. The largest positive allelic effect (277.4 kg ha<sup>−1</sup>) was observed in well-watered 2015 conditions and was inherited from PI 417653. Overall, the average allele effect of PI 343950 was the lowest in all environments and in the mean across environments (**Figure 6** and **Table 4**). In contrast, the allele effect of PI 319441 was higher than that of PI 417653 in 2014 and 2015, both under full irrigation, while the average allele effect of PI 417653 was higher than that of PI 319441 under drought in 2015. In addition, the allele effects across genotypes were relatively higher under drought than under full irrigation in 2015. The average QTL significance interval was about 0.8 Mbp, ranging between 0.07 Mbp (on Pv08) and 2.18 Mbp (on Pv01). Many but not all QTL intervals appeared to be located toward the extremities of the chromosomes (**Table 4**).

### DISCUSSION

### Population Development and Segregation Distortion

The main objective of the investigation was to survey ecogeographic adaptive variation in wild beans as a source of novel alleles to increase productivity, as well as to test if these alleles have differential expression under drought constraints. Diversity panels are a resource commonly used for genetic studies; they allow a large sample of the variation to be tested and higher genetic resolution to be obtained. Prior to this study, a panel of wild accessions of the Mesoamerican gene pool was evaluated in a greenhouse setting, revealing phenotypic variation in root and shoot traits, as well as genomic regions controlling these traits (Berny Mier y Teran et al., 2019a). However, these studies are limited by the confounding effects of population structure and relatedness, as well as low power of detection of rare alleles (Anderson et al., 2011; Bazakos et al., 2017). Population structure is even more geographically constrained in wild bean populations than in domesticated forms, as dispersal and intercrossing between wild populations are limited compared with domesticated populations (Papa and Gepts, 2003; Zizumbo-Villarreal et al., 2005). In addition, some wild forms, including in common bean, are not adapted to cultivated conditions, due to their profuse and extended climbing growth habit and photoperiod sensitivity (Acosta-Gallegos et al., 2007).

Multiparent populations have several advantages over biparental populations for genetic studies: an increase in allelic diversity, sample size at a specific locus (increasing the power of detection), and mapping resolution (Garin et al., 2017; Holland, 2015). Various schemes for multiparent populations have been proposed; they are mostly divided into two groups: intermating of many parental lines and interconnected biparental populations (Fouilloux and Bannerot, 1988; Cavanagh et al., 2008; Buckler et al., 2009; Holland, 2015). Here, three populations were developed from domesticated by wild crosses using an elite domesticated breeding parent, selected for drought tolerance, as the common parent and three wild accessions from a range of rainfall conditions, from the driest part of the distribution in Durango, northern Mexico, to a high-rainfall region in highland Guatemala. A domesticated genotype as the common parent allows for comparison of QTLs contributed by the different wild genotypes relative to each other and to those of the domesticated parent. Domesticated × wild nested populations have been developed in maize (Liu et al., 2016) and barley (Schnaithmann et al., 2014; Maurer et al., 2015; Nice et al., 2016). In all cases, the populations were developed after one to four backcrosses to the domesticated parent.

Inbred-backcrossed lines facilitate QTL detection (Kaeppler, 1997) as the lines are more homogeneous and their interaction with other traits can be defined more precisely. For example, the benefit of earliness as an escape for terminal drought can be better assessed in this type of population after measuring earliness and yield under different irrigation regimes (Beebe et al., 2014; Polania et al., 2017). In addition, if superior lines are identified in domesticated × wild inbred backcross populations, few or no additional backcrosses are needed for breeding use (Tanksley and Nelson, 1996). However, if a trait is controlled epistatically by one or more loci without independent additive effect in the wild, it will be more difficult to identify such epistatic interactions in backcross populations (Tanksley and Nelson, 1996; Kaeppler, 1997; Johnson and Gepts, 2002). Only one backcross generation was used to limit the loss of detection power of the effect of wild alleles as every added backcross decreases the number of alleles from the wild parent and, hence, the power of QTL detection (Kaeppler, 1997).

Nested populations have been developed and used to study an array of traits in maize (Buckler et al., 2009; Wu et al., 2016; Xiao et al., 2016), soybean (Song et al., 2017), rice (Fragoso et al., 2017), barley (Saade et al., 2016), wheat (Bajgain et al., 2016), and common bean (Hoyos-Villegas et al., 2016), among others. While wild by domesticated nested populations exist in maize (Liu et al., 2016) and barley (Schnaithmann et al., 2014; Maurer et al., 2015; Nice et al., 2017), ours is the first domesticated by wild nested population in common bean. Besides the beneficial alleles for yield, these three populations could be of great use as breeding material and for future evaluations, as the domesticated parent and the wild accessions can be polymorphic for other potentially useful traits. In evaluations of wild germplasm, PI 319441 was found to have high sulfur amino acid content in the seed and a high degree of protein hydrolysis after cooking (Montoya et al., 2008), a high polyphenol content (Espinosa-Alonso et al., 2006), and a high iron concentration (Blair et al., 2013); it could thus be used as a source for improved nutrition. PI 417653 had a high root efficiency ratio (total P content: root area) in low nitrogen conditions (Araújo et al., 1998), a high level of resistance to cucumber mosaic virus (Griffiths, 2002), and medium tolerance to salinity during early vegetative growth (Bayuelo-Jiménez et al., 2002).

The average wild allele frequency in our populations was 0.23, very close to the expected frequency of 0.25 in a biparental population with a single backcross. Allele frequencies significantly different from the expected frequency can be due to genetic mechanisms of segregation distortion, selection against photoperiod sensitivity and late flowering, or unintended selection. Processes that lead to segregation distortion include gametic incompatibility, genetic load, and asymmetric allelic inheritance in heterozygotes (Bodénès et al., 2016; Lyttle, 1991) besides the selection applied to the populations during their development. Genomic regions of low wild allele frequency were found in various chromosomes. In Pv01, an area of low frequency in all three populations was found around the 48 Mb position. The low frequency of wild alleles is likely due to the presence of a photoperiod sensitivity gene, Ppd, identified in this region (Koinange et al., 1996; Kwak et al., 2008; Weller et al., 2019) and selected against during population development (**Figure 1**). Segregation distortion in this region has also been found in two biparental wild by domesticated common bean populations (Blair et al., 2006; Blair and Izquierdo, 2012).

At least two other photoperiod loci have been identified but have not yet been mapped: the locus Hr, which is recessive and hypostatic to Ppd (Gu et al., 1998; Kwak et al., 2008), and the locus Tip, which is also recessive and increases earliness at cooler temperatures in long daylength (White et al., 1996). Tip might be allelic to either Ppd or Hr (White et al., 1996). Other regions quantitatively controlling photoperiod sensitivity have been located on Pv03 and Pv04 (Wallach et al., 2018).

There were other regions almost devoid of wild alleles at the end of chromosome Pv02 in one population (dw417653) and at the end of Pv07 in two populations (dw319441 and dw417653). Distortion in the latter region was also observed by Blair et al. (2006). In contrast, some regions in Pv02 and Pv11 showed a wild allele frequency of 0.4, that is, higher than the expected wild allele frequency after a single backcross. Nevertheless, although QTL analyses assume low segregation distortion, including distorted markers does not necessarily increase false positives or bias the effect and position, especially in large populations (Xu, 2008).
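One common way to flag a distorted marker is a chi-square goodness-of-fit test of observed genotype counts against the expected 1:3 ratio of wild to domesticated homozygotes. The sketch below is illustrative only: the text does not state which test was applied, residual heterozygotes are ignored, and the population size of 100 lines is hypothetical (chosen so that 40 wild-homozygous lines correspond to the frequency of 0.4 mentioned above).

```python
import math


def segregation_distortion_test(n_wild, n_dom, expected_wild=0.25):
    """Chi-square goodness-of-fit test (1 degree of freedom) of observed
    homozygous genotype counts against the expected wild allele frequency."""
    total = n_wild + n_dom
    exp_wild = total * expected_wild
    exp_dom = total * (1 - expected_wild)
    chi2 = (n_wild - exp_wild) ** 2 / exp_wild + (n_dom - exp_dom) ** 2 / exp_dom
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical marker: 40 wild-homozygous and 60 domesticated-homozygous lines
chi2, p_value = segregation_distortion_test(40, 60)
print(chi2, p_value < 0.05)
```

For these hypothetical counts the statistic is 12.0, well above the 1-df critical value of 3.84 at the 5% level, so such a marker would be considered distorted.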


### Field Evaluation

The three populations were evaluated under field conditions in 2 years under optimal irrigation conditions. In the second year, a terminal drought stress was imposed by withdrawing the final two irrigations. Thus, the three populations were tested in a total of three environments. The analysis of variance showed a significant effect of population, environment, and their interaction on yield. The environmental effect was larger than the population and interaction effects. The overall yield in 2014 was almost half that of 2015, and the correlation between drought and full irrigation in 2015 was higher than the correlations between full irrigation in 2014 and either 2015 treatment. This might be explained by the effect of hot weather experienced during flowering in 2014. In addition, terminal drought resulted in a marked 17% yield reduction relative to the irrigated treatment. Nevertheless, the ranks of the populations within environments were similar. The dw417653 population was higher-yielding than the dw319441 population, which, in turn, was higher-yielding than the dw343950 population.
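The 17% yield reduction corresponds to the drought intensity that scales a drought susceptibility index. The sketch below assumes the conventional Fischer and Maurer formulation, DSI = (1 − Y<sub>d</sub>/Y<sub>w</sub>) / (1 − mean Y<sub>d</sub>/mean Y<sub>w</sub>), which this section does not restate; the yield values are invented so that the trial-level reduction equals 17%.

```python
def drought_intensity(yield_drought, yield_irrigated):
    """Trial-level drought intensity: 1 - (mean drought yield / mean irrigated yield)."""
    mean_d = sum(yield_drought) / len(yield_drought)
    mean_w = sum(yield_irrigated) / len(yield_irrigated)
    return 1 - mean_d / mean_w


def drought_susceptibility_index(yield_drought, yield_irrigated):
    """Per-line DSI: relative yield loss of each line divided by the
    trial-level drought intensity (values above 1 indicate susceptibility)."""
    if len(yield_drought) != len(yield_irrigated):
        raise ValueError("both treatments need one value per line")
    intensity = drought_intensity(yield_drought, yield_irrigated)
    if intensity == 0:
        raise ValueError("no yield reduction: DSI is undefined")
    return [(1 - d / w) / intensity for d, w in zip(yield_drought, yield_irrigated)]


# Hypothetical yields (kg/ha) for three lines; not data from the study
irrigated = [2000.0, 2500.0, 3000.0]
drought = [1800.0, 2000.0, 2425.0]
intensity = drought_intensity(drought, irrigated)
dsi = drought_susceptibility_index(drought, irrigated)
print(round(intensity, 2), [round(v, 2) for v in dsi])
```

A line with a DSI below 1 lost proportionally less yield than the trial average, which is why the index is useful alongside yield per se.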

The wild parent of the latter population originated in the warmest and wettest climate (23.5°C, 1600 mm) of the three wild parental lines, suggesting that variation for increased productivity can be found in drier and cooler areas. However, the wild parent of population dw417653, which had the highest yield, originated in a slightly warmer (19.4°C vs. 18°C) and wetter (733 mm vs. 588 mm) environment than the wild parent of the second-ranked population (PI 319441). Thus, there may not be a linear relationship between the aridity of the environment of origin and the ability to increase yields in domesticated × wild crosses. Other factors may play a role in addition to aridity adaptation, such as genetic distance between the wild accession and the domesticated gene pool. PI 417653 (G12910) has been implicated in the Mesoamerican domestication of common bean (Kwak et al., 2009).

### Map Development and Joint Linkage QTL Analysis

A consensus map from the three populations was built, which was adequately dense for joint linkage analysis and subsequent QTL mapping (**Table 3**). Furthermore, the map covered 99% of the G19833 reference genome (Schmutz et al., 2014). Through joint linkage mapping, 20 QTLs for grain yield were identified in the individual environments (year and irrigation regime), the overall grain yield across environments, and the drought susceptibility index calculated from the drought effect in the second year. Among the 20 QTLs, there were 13 non-overlapping QTLs. Two QTLs on chromosome Pv01 at 76 and 81 cM were tightly linked genetically and physically and had similar additive effect patterns among wild alleles. It is possible, therefore, that they are the same QTL. On the same chromosome, a QTL was mapped at the 72 cM position; however, the pattern of additive effects was reversed from that at the 76 and 81 cM positions, suggesting a different QTL. On Pv07, QTLs at the 78 and 87 cM positions had different additive effect patterns for two of three parental loci suggesting these two QTLs are distinct.



TABLE 4 | Summary of the QTL analysis for yield (kg ha<sup>−1</sup>) and drought susceptibility index (DSI) evaluated in three environments (2014 and 2015 under full irrigation, and 2015 under terminal drought).

<sup>1</sup>Mean: average of the three treatments. <sup>2</sup>Percentage of variation explained. <sup>3</sup>For further explanations, see the section "Discussion."

Of the 20 QTLs, three were detected in 2014, five in 2015 under full irrigation, six in 2015 under terminal drought, five in the overall average across environments, and one for DSI. Of these, four QTLs were unique to 2015c and three to 2015d. The variation explained by the QTLs ranged from 0.6 to 5.4%, as expected for a highly polygenic trait such as yield (Johnson and Gepts, 2002; Blair et al., 2006). For all QTLs that were expressed in different environments, the additive effects were of similar sign, suggesting that there were no tradeoffs between years or treatments. Although for most QTLs the effect of the wild alleles showed the same sign, for some, e.g., Pv08.87, the allele of PI 417653 had a positive effect (115 kg ha<sup>−1</sup>) while the alleles of PI 343950 and PI 319441 had a negative effect (−123 kg ha<sup>−1</sup> and −33 kg ha<sup>−1</sup>, respectively). This observation suggests that this QTL consists of an allelic series at one locus or represents several linked loci (Buckler et al., 2009). The detection of allelic series is one of the advantages of nested populations compared with biallelic variation in genome-wide association of diversity panels or other multiparent populations (Brachi et al., 2011).

Furthermore, within and across environments, the alleles of PI 343950, the wild population from the wettest location, had the lowest average additive effect. Conversely, the average allelic effects of the accessions from the drier habitats were higher: the PI 319441 alleles had the higher average effect under full irrigation in both 2014 and 2015, whereas the PI 417653 alleles had the higher average effect under drought in 2015. This suggests that drought tolerance could be a driver of local adaptation. Some traits, like deeper rooting, water use efficiency, and earliness in flowering and maturity (Berny Mier y Teran et al., 2019a), might also be beneficial under cultivation if growing seasons are shorter. Berny Mier y Teran et al. (2019a) observed that PI 417653, the wild accession with the strongest allelic effects under drought, has deeper roots and the fastest early growth (days to the V3 stage) compared to the other two wild accessions. In contrast, high rainfall conditions could increase selection pressure toward more vigorous vegetative growth and higher disease resistance.

There are two previously published wild by domesticated yield QTL analyses. Blair et al. (2006) evaluated a cross between a domesticated Andean cultivar (ICA Cerinza) and a wild type from Guatemala (G24404). They found nine QTLs for grain yield, with four of them having the wild allele increasing the trait. A QTL on Pv04 overlapped with our findings, which was detected in 2014 and as average across the three environments (**Table 4**). Their confidence interval was 0.4 to 9.5 Mbp while the QTL identified in this study was located within a 7.9 to 9.3 Mbp interval (**Table 4**). However, the additive effect of the G24404 accession was positive while the alleles from the three wild sources in our study had a negative effect. It is possible that the allele from the Andean gene pool in ICA Cerinza had a relatively smaller effect than that of SEA 5 and the wild accessions. Blair and Izquierdo (2012) evaluated a population of the same domesticated cultivar (ICA Cerinza) crossed to G10022 (PI 319441), one of the parents of the nested populations studied here. They found one QTL on Pv05, with the domesticated allele having the positive effect. This QTL did not overlap with Pv05.68 of the current investigation. By comparing the current results with our previous evaluation of a panel of wild germplasm (Berny Mier y Teran et al., 2019a), the QTL in Pv10 located in the interval of 41.2 to 42.1 Mb was situated near a SNP at 38.3 Mb associated with total biomass in the wild panel. This QTL showed a positive additive effect on yield resulting from one parental allele (PI 417653) but a negative effect from the other two parents, suggesting that this genomic location might be involved in local adaptation within the wild germplasm.
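The comparisons with published QTLs reduce to two positional checks on the reference genome: whether two intervals on the same chromosome overlap, and how far a single marker lies from an interval. A minimal sketch using the Pv04 and Pv10 coordinates quoted above (function names are illustrative; positions are in Mb):

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (start, end) and b = (start, end)
    on the same chromosome share at least one position."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def distance_to_interval(position, interval):
    """Distance from a single position to an interval (0 if it lies inside)."""
    start, end = interval
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0.0


# Pv04: interval of Blair et al. (2006) vs. the interval found in this study
pv04_overlap = intervals_overlap((0.4, 9.5), (7.9, 9.3))

# Pv10: biomass-associated SNP at 38.3 Mb vs. the yield QTL interval
pv10_gap = distance_to_interval(38.3, (41.2, 42.1))

print(pv04_overlap, round(pv10_gap, 1))
```

The Pv04 intervals overlap, whereas the Pv10 SNP lies about 2.9 Mb outside the QTL interval, consistent with the text describing it as near rather than within the QTL.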

Two domesticated by domesticated mapping populations have been developed using SEA 5 (Briñez et al., 2017; Mukeshimana et al., 2014). The latter found three QTLs for yield, with one QTL at Pv09 (SY9.1, confidence interval of 25.1–27.1 Mb) overlapping with our findings (Mukeshimana et al., 2014). The allele of SEA 5 had a negative effect compared to a parental line of the Andean gene pool, while SEA 5 had a positive effect when compared to the wild types in the present study. Briñez et al. (2017) developed a population based on SEA 5 crossed to an Andean accession. They found two QTLs for grain yield, which did not overlap with our findings. Within other published QTL analyses for yield, our findings overlapped with QTLs found by Trapp et al. (2016) in Pv01 (SY1.1) and Pv05 (SY5.1), by Hoyos-Villegas et al. (2016) in Pv07 (SY7.4) and Pv10 (SY10.1), and by Berny Mier y Teran et al. (2019b) in Pv09. The latter QTL also overlapped with the SY9.1 QTL found by Mukeshimana et al. (2014).

Of the 13 non-overlapping QTLs in this study, five had at least one wild allele with a significant positive additive effect. The allele of PI 417653 at Pv01.81 had the largest effect (277 kg ha<sup>−1</sup>), detected under drought in 2015. PI 417653 was the only parent that had effects close to zero, in three of the 13 QTLs. This wild parent is part of the putative Mesoamerican domestication center of common bean in west-central Mexico and is, therefore, the closest genetically to the domesticated parent of the three wild populations used in this study (Kwak et al., 2009). This observation suggests that the ancestor of this wild population contributed yield alleles to the Mesoamerican domesticated gene pool or that it could harbor an introgression from the domesticated gene pool (Papa and Gepts, 2003; Papa et al., 2005).

### Putative Candidate Genes for QTLs

In general, the QTLs detected in this study encompass genomic regions with a high number of genes. For example, the statistical significance region of the QTL at Pv09.33, which was detected in the 2014 well-watered treatment and for DSI, ranges from 24.43 to 25.98 Mb in version 2.1 of the common bean reference genome deposited in Phytozome<sup>3</sup>. Within this region, 105 distinct gene models have been identified. Their annotation (obtained from PFAM<sup>4</sup> and PANTHER<sup>5</sup>; see **Supplementary Table S1**) reveals genes implicated in a variety of cellular processes and molecular functions, such as response to stress, oxidative response, signal transduction, protein ubiquitination, chromosome modification, histone modification, metal ion binding, and RNA processing. For example, within this QTL region, Phvul.009G164600.1 was annotated as a serine carboxypeptidase, which was described as involved in oxidative stress in rice (Liu et al., 2008). Phvul.009G169400.2 is related to callose synthase, involved in callose deposition, a functional category that was also described by Recchia et al. (2018) in an experiment with BAT477 under drought and well-watered conditions, in the presence or absence of arbuscular mycorrhizal fungi. Moreover, this QTL region also contained members of a leucine-rich repeat family. This group of genes belongs to a larger protein family of receptor-like kinases, which play important roles in stress resistance (Ye et al., 2017), a category also found by Recchia et al. (2018) in common bean. However, because the genomic significance region of a low-heritability trait like yield is so large, an exhaustive list of gene models for each QTL identified in this study becomes quite long, as illustrated in **Supplementary Table S1**. It is difficult to focus on likely candidate gene models without further experimentation, which falls beyond the scope of this work.

An alternative approach is to map genes with a putative role in drought tolerance and examine to what extent they co-segregate with QTLs. Genome-wide categorizations have been published for a few families in common bean, such as aquaporins (AQP) (Ariani and Gepts, 2015) and Dehydration Responsive Element-Binding (DREB) genes (Cortés et al., 2012; Konzen et al., 2019). AQPs play important roles as water channel proteins in plants. The AQP gene PvPIP1;1 is located near a QTL detected under drought in 2015 on chromosome Pv01, at a distance of approximately 120 kbp from marker ss715645251. Two transcription factors belonging to the DREB gene family, traditionally characterized as involved in abiotic stress responses such as drought, were each located within a QTL. Phvul.001G136100 is located within the QTL at Pv01.45, a QTL detected in the 2015 well-watered treatment, and Phvul.003G241700 is located at Pv03.72, a QTL detected in the well-watered treatment in 2015. Both genes were previously categorized as DREB2 genes, which are normally involved in responses to abiotic stresses such as those caused by water deficit (Konzen et al., 2019).

### CONCLUSION

Our original hypothesis was that wild P. vulgaris populations from drier areas would be better sources of yield-enhancing genes in a domesticated line than those from more humid areas. We showed here that this is indeed the case. However, we also showed that these same wild populations from drier areas also increased yields under well-watered conditions. On average, the superior progeny lines from arid wild beans increased yield by around 20%. Taken together, various wild genomic regions were identified that had positive effects on yield under well-watered and drought-stress conditions. Our results have the potential to make future marker-assisted introgressions faster and more efficient. The alleles with positive (and negative) effects help explain the transgressive segregation found in this study and underscore the potential of wild variation to improve the productivity of domesticated beans. Future work is needed to validate the QTLs with a positive effect on yield after introgression into different domesticated genetic backgrounds (Des Marais et al., 2013; Shen et al., 2018). Variation within some QTLs in the magnitude or sign of the effect suggests allelic series associated with a range of phenotypic variation. This variation could be the basis of local adaptation (Buckler et al., 2009; Kronholm et al., 2012). In addition, further sampling of the wild variation is warranted (Stich, 2009), as well as evaluation in wetter and more humid conditions than in the field site in California.

### DATA AVAILABILITY STATEMENT

The datasets generated for this study can be found in Dryad: https://doi.org/10.25338/B8FW3M.

### AUTHOR CONTRIBUTIONS

JB and PG designed the experiments. JB, AP, and EK carried out the field trial and genotyping. JB analyzed the data and drafted the manuscript. JB, EK, AP, ST, and PG contributed to and edited the manuscript.

### FUNDING

Funding was provided by the Agriculture and Food Research Initiative (AFRI) Competitive Grant (2013-67013-21224) from the USDA National Institute of Food and Agriculture (NIFA).

### ACKNOWLEDGMENTS

We would like to thank the crew at the Plant Sciences Field Facility for assistance in the field experiments. Thanks to Laura Gamiño, Jamily Ramos de Lima, Arthur Martins Almeida Bernardeli, Poliana Silva Rezende, Bruno Lima Martins, Guilherme Coelho Portilho, Higor Da Costa Ximenes de Souza, Mayara Rocha, Adam Yang, Ninh Khuu and the rest of the Gepts Lab for the help during planting, harvest, and sample processing.

<sup>3</sup>https://phytozome.jgi.doe.gov/pz/portal.html

<sup>4</sup>https://pfam.xfam.org/

<sup>5</sup>http://pantherdb.org/

### REFERENCES


### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00346/full#supplementary-material





**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Berny Mier y Teran, Konzen, Palkovic, Tsai and Gepts. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# An Updated Checklist of the Sicilian Native Edible Plants: Preserving the Traditional Ecological Knowledge of Century-Old Agro-Pastoral Landscapes

Salvatore Pasta<sup>1</sup>, Alfonso La Rosa<sup>2</sup>, Giuseppe Garfì<sup>1</sup>, Corrado Marcenò<sup>3</sup>, Alessandro Silvestre Gristina<sup>1</sup>, Francesco Carimi<sup>1</sup>\* and Riccardo Guarino<sup>4</sup>

<sup>1</sup> Institute of Biosciences and Bioresources (IBBR), National Research Council of Italy (CNR), Palermo, Italy, <sup>2</sup> Cooperativa Silene, Palermo, Italy, <sup>3</sup> Department of Botany and Zoology, Faculty of Science, Masaryk University, Brno, Czechia, <sup>4</sup> Dipartimento STeBiCeF, Sezione Botanica, University of Palermo, Palermo, Italy

### Edited by:

Petr Smýkal, Palacký University Olomouc, Czechia

### Reviewed by:

Hanno Schaefer, Technical University of Munich, Germany Rosario Schicchi, University of Palermo, Italy

> \*Correspondence: Francesco Carimi francesco.carimi@ibbr.cnr.it

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 31 October 2019 Accepted: 18 March 2020 Published: 29 April 2020

### Citation:

Pasta S, La Rosa A, Garfì G, Marcenò C, Gristina AS, Carimi F and Guarino R (2020) An Updated Checklist of the Sicilian Native Edible Plants: Preserving the Traditional Ecological Knowledge of Century-Old Agro-Pastoral Landscapes. Front. Plant Sci. 11:388. doi: 10.3389/fpls.2020.00388

The traditional use of native wild food plants (NWFP) may represent a valuable supplementary food source for present and future generations. In Sicily, the use of wild plants in the human diet dates back to very ancient times and still plays an important role in some rural communities. Moreover, the natural and cultural heritage of this island is rich and diverse for several reasons. First, Sicily hosts a rich vascular flora, with 3,000 native plants, of which 350 are endemic. Second, due to its central position in the Mediterranean, the island has acted as a veritable melting pot for the ethnobotanical knowledge of the rural communities of the entire basin. We reviewed all the available literature and, starting from the resulting comprehensive checklist, partially improved with data from recent field investigations, critically revised the whole species list on the basis of field data from interviews and our expert knowledge. As a result, we provide a substantially updated list of 292 NWFP growing on the island. A further 34 species, reported as NWFP in previous papers, were discarded because they are not native to Sicily, while 45 species were listed separately because their identity, occurrence and local use as food are doubtful and need to be further investigated. Moreover, we tried to shed light on the ecology (growth form and preferential habitat) of the Sicilian NWFP, with special focus on crop wild relatives (CWR). Our preliminary ecological analyses point out that a high percentage of these plants are linked with the so-called 'cultural' landscapes, patchy semi-natural environments rich in ecotones, leading to the conclusion that the maintenance of century-old agro-pastoral practices may represent an effective way to preserve the local heritage of edible plants. Our study identified as many as 102 taxa of agronomic interest which could be tested as novel crops in order to face ongoing global changes and to comply with sustainable agriculture policies. Among them, 39 taxa show promising traits in terms of tolerance to one or more environmental stress factors, while 55 more are considered CWR and/or can be easily cultivated and/or show high productivity/yield potential.

Keywords: ethnobotany, agro-pastoral landscapes, sustainable agriculture, TEK (traditional environmental knowledge), Ellenberg Indicator Values (EIV)

# INTRODUCTION


Sicily is the largest Mediterranean island (**Figure 1**), with an area of c. 25,400 km<sup>2</sup>. Its territory is predominantly hilly or mountainous, with less than 20% of its surface below 300 m a.s.l. The geographical position of Sicily, its complex geological history (Carbone et al., 2018) and the sharp topographic, edaphic and climatic contrasts make this island one of the most heterogeneous Mediterranean territories. Moreover, Sicily and its satellite islands belong to the Tyrrhenian area, one of the main hot-spots of plant diversity in the whole Mediterranean basin (Médail and Quézel, 1999). Its rich flora counts about 3,000 native plant taxa, around 12% of which are endemics (Guarino and Pasta, 2018).

Thanks to its location, Sicily has also been a major pathway of human migration, acting as a cultural, economic and ethnic crossing point and melting pot. Human presence in Sicily has been continuous since 14–13 thousand years ago (Mannino et al., 2012), and over this long time span entire plant assemblages have undoubtedly been profoundly shaped by hunter-gatherers, and subsequently even wiped out by the early onset of agrosilvopastoral practices (Leighton, 1999; Tinner et al., 2016).

The vegetation of the island shows almost everywhere the traces of long-lasting land exploitation; not surprisingly, anthropogenic plant communities (Artemisietea vulgaris, Chenopodietea, Papaveretea rhoeadis, Parietarietea judaicae, Poetea bulbosae, Polygono-Poetea, etc.) characterize almost 50% of the island's vegetation (Guarino and Pasta, 2017). Furthermore, hard-wheat crop fields currently occupy a large portion of the island's territory, and other traditional forms of dry-land farming (olive, almond, carob tree, pistachio, ash tree, hazelnut, and chestnut groves) still characterize part of the Sicilian rural landscape. Along with traditional practices (e.g., seasonal transhumance) and crafts (e.g., charcoal burners, pipe manufacturers, cork and sumac gatherers, miners, millers, etc.), many man-made landscapes and habitats, such as dry-stone terraces, fruit orchards, dry-land groves and fallows, are fading or have already disappeared. Abandonment and land-use changes are responsible for the fast collapse of local natural and cultivated plant richness (La Mantia et al., 2011; Albrecht et al., 2016). Currently, those traditional landscapes and many others suffer from abandonment or, worse, undergo deep transformations due to agricultural intensification and urban sprawl. Intensive cultivations already cover around 25% of the island's surface, and they are still expanding. Two blatant examples are the ongoing replacement of Citrus orchards with greenhouses and the gradual replacement of dry groves with intensive vineyards. Mechanized agricultural practices and the massive input of chemical fertilizers and pesticides exert selection on weeds, to the detriment of Mediterranean plants and archaeophytes and in favor of many non-native tropical and subtropical species, which also take advantage of high nutrient and water input (Guarino et al., 2015).

With respect to this scenario, and considering the challenges related to future global climate change, Sicilian native wild food plants (hereafter NWFP) could play a significant role in both sustainable agriculture and dietary supplementation. In addition to raising awareness of the importance of the Mediterranean diet (Willett, 2006; Gamboni et al., 2012), there are many other good reasons to pursue the investigation and valuation of the ethnobotanical knowledge on NWFP throughout the Mediterranean area (Guijt, 1998). In fact, during the last 20 years, an increasing number of scientific papers (e.g., Grivetti and Ogle, 2000; Trichopoulou et al., 2000; Simopoulos, 2004; Pieroni and Price, 2005; Schaffer et al., 2005; Bharucha and Pretty, 2010; Ranfa et al., 2014; Disciglio et al., 2017; Renna, 2017) has highlighted the strong correlation between wild food consumption and health.

An increasing amount of field data highlights the fact that the Mediterranean countries were not only important as a target of the spread of the so-called 'Neolithic revolution' but played an active role in the early cultivation of the wild ecotypes of several woody species, such as Olea europaea (Diez et al., 2015), Ceratonia siliqua (Viruel et al., 2019), and Vitis vinifera (De Michele et al., 2019), as well as of many vegetables (e.g., Allium spp., Brassica spp., Cynara spp., etc.) and cereal crops (e.g., Avena spp., Hordeum spp., and Triticum spp.), even far earlier than expected (Snir et al., 2015). The current change of paradigm about the history and geography of early agriculture underlines the importance of studying, conserving and effectively exploiting the germplasm of Mediterranean CWR.

In this paper, we aimed to critically review the lists of traditional Sicilian NWFP available so far, and to provide a first evaluation of the ecological requirements of these plants, paying special attention to the natural and semi-natural ecosystems where they grow and stressing the importance of managing and maintaining them in view of the ongoing global changes. These ecosystems represent important potential reservoirs of food because they host most of the NWFP and many CWR. Additionally, we evaluated some ecological features of NWFP that could be regarded as traits of interest for the detection of novel crops and/or the improvement of the existing ones. These plants may help to face the future challenges of Mediterranean agriculture due to global change, such as extreme heat events, water shortage, salinization and adaptation to nutrient-poor soils.

# MATERIALS AND METHODS

### Data Set

The list of the Sicilian NWFP provided by Pasta et al. (2011), based on a critical review of the whole regional ethnobotanical literature concerning edible plants up to 2010, was improved and updated by consulting all the most recent and authoritative works on this topic, i.e., Arcidiacono (2016), Schicchi and Geraci (2016), Morreale (2018), and Geraci et al. (2018). Coria (1989), Lentini and Mazzola (1998), Arcidiacono et al. (2007, 2010), as well as all the Sicilian papers published after 2010, i.e., Aleo et al. (2013), Tuttolomondo et al. (2014a,b), Mazzola et al. (2015), Signorello (2015), Licata et al. (2016), Quave and Saitta (2016), Schicchi and Geraci (2016), Cucinotta and Pieroni (2018), Gargano et al. (2018), and Monello (2018) were consulted, too.

The works of Ghirardini et al. (2007) and Guarrera and Savo (2016) were used as basic reference for the whole Italian territory. The assessment of the native status of the Sicilian edible plants was mostly based on Galasso et al. (2018), and non-native food plants were not taken into account for further elaborations. For each considered taxon, we reported the scientific name according to the second edition of the Flora of Italy (Pignatti et al., 2017–2019), the synonym(s) encountered in the consulted literature on NWFP, and the plant family according to Chase et al. (2016), while their growth form was assessed according to Pignatti et al. (2017–2019). We decided to skip the analysis of the life forms and chorotypes of the Sicilian NWFP and we provided just basic information concerning other interesting ethnobotanical issues, such as edible part(s) and traditional use(s) as food, because all these topics have been already treated in the most recent and comprehensive papers concerning Sicilian edible plants (Lentini and Venza, 2007; Pasta et al., 2011; Geraci et al., 2018). Basic information on the potential risks induced by toxic or poisonous compounds contained in the Sicilian NWFP is provided, too.

FIGURE 1 | Sicily (black) and its position in the Mediterranean Basin. The numbers refer to the localities where interviews were carried out in order to perform an additional assessment of the harvest frequency of Sicilian NWFP and to update the available knowledge on their use (see Supplementary Table S1), i.e., Joppolo Giancaxio (1), Caltanissetta (2), San Cataldo (3), Sutera (4), Piedimonte Etneo (5), Sant'Alfio (6), Pergusa (7), Barcellona Pozzo di Gotto (8), Lipari Island (9), Patti (10), Salina Island (11), Castelbuono (12), Chiusa Sclafani (13), Gangi (14), Palermo (15), Avola (16), Calatafimi (17), Favignana Island (18), Marettimo Island (19), Pantelleria Island (20), and Trapani (21).

Additionally, a preliminary semi-quantitative evaluation of the importance of Sicilian NWFP to local harvesters has been performed by comparing the data reported by Geraci et al. (2018) with those issuing from the interviews carried out across the whole Sicilian territory between 2016 and 2019 (**Figure 1**).

Finally, we annotated whether the listed NWFP were considered CWR by Heywood and Zohary (1995) or may be deemed CWR following the definition: 'A crop wild relative is a wild plant taxon that has an indirect use derived from its relatively close genetic relationship to a crop' (Maxted et al., 2006).

### Ecological Assessment

In order to outline the ecological preferences of the Sicilian NWFP, the altitudinal range and the Ellenberg Indicator Values (EIVs) concerning each species, extracted from the database of Pignatti et al. (2017–2019), were adopted.

The preferential habitat types of the Sicilian NWFP were assessed following the list of diagnostic species of phytosociological classes of the European vascular plant communities (Mucina et al., 2016, Appendix S6). The phytosociological classes were associated with the EUNIS habitats recorded from Sicily in Carta della Natura (ISPRA, 2012) according to the crosswalk proposed by Angelini et al. (2009). In order to obtain more clues about the correlation between NWFP and different habitat types, we also used the habitats assigned in the database of Pignatti et al. (2017–2019), which recognizes only 24 habitat types belonging to three main ecological groups, i.e., 'terrestrial,' 'aquatic,' and 'anthropogenic.' In both cases, all the habitats where a given species may occur were assigned according to fuzzy logic, with a numerical score ranging from 0 to 1. In this way, the fuzzy value was equal to 1 for the stenoecious species, whereas for the euryoecious species the fuzzy values were >0 and <1 depending on the number of habitats assigned to a species.
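The scoring rule can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the text, that a taxon recorded in k habitats receives an equal score of 1/k in each of them; the function name and the habitat labels are hypothetical.

```python
def fuzzy_habitat_scores(habitats):
    """Return a fuzzy membership score per habitat for one taxon.

    Assumption (illustrative only): a taxon recorded in k distinct
    habitats gets the same score 1/k in each of them.
    """
    unique = sorted(set(habitats))
    if not unique:
        raise ValueError("a taxon needs at least one habitat")
    weight = 1.0 / len(unique)
    return {habitat: weight for habitat in unique}


# A taxon restricted to a single habitat scores 1 there.
single_habitat = fuzzy_habitat_scores(["rocks"])

# A taxon recorded in several habitats scores between 0 and 1 in each.
several_habitats = fuzzy_habitat_scores(
    ["crops", "fallows", "dry grasslands", "edges"]
)

print(single_habitat)
print(several_habitats)
```

Under this assumption the scores of each taxon sum to 1, so taxa with many habitats do not weigh more than specialists in later comparisons.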

We used the non-metric multidimensional scaling (NMDS) analysis to summarize the similarity in habitat composition among NWFP, using Bray–Curtis distance measure to calculate the NWFP distance. This analysis was performed by CANOCO 5 (ter Braak and Smilauer, 2012).
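As an illustration of the distance measure only (the ordination itself was run in CANOCO 5), the following sketch computes Bray–Curtis distances between habitat-score profiles in plain Python. The taxon labels and score vectors are hypothetical placeholders, not data from the inventory.

```python
def bray_curtis(u, v):
    """Bray-Curtis distance: sum|u_i - v_i| / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("distance undefined for two all-zero profiles")
    return numerator / denominator


# Hypothetical habitat-score profiles (one column per habitat type).
profiles = {
    "taxon_A": [1.0, 0.0, 0.0],
    "taxon_B": [0.5, 0.5, 0.0],
    "taxon_C": [0.0, 0.0, 1.0],
}

# Pairwise distance table, the usual input to an NMDS routine.
distances = {
    (a, b): bray_curtis(profiles[a], profiles[b])
    for a in profiles
    for b in profiles
}

print(distances[("taxon_A", "taxon_B")])
print(distances[("taxon_A", "taxon_C")])
```

Taxa sharing no habitat are at the maximum distance of 1, while taxa with identical profiles are at distance 0.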

We refrained from correlating the relative abundance of edible plants with the habitat coverage reported in the already mentioned Sicilian map of EUNIS habitats, because any attempt to do this without a statistically significant amount of plot data may lead to unreliable and even misleading results. In fact, a plant may be very rare or very frequent in a given habitat regardless of the area occupied by the habitat itself on the island.

# RESULTS

### Inventory of Sicilian NWFP

A critical and updated inventory is provided and counts a total of 292 Sicilian NWFP (**Supplementary Table S1**). The taxa are listed in alphabetical order according to their scientific name. Synonyms, plant family, growth form, EIVs, altitudinal range, preferential habitat types, and CWR status are reported. Basic information on edible part(s), raw vs. cooked consumption, potential risks due to toxic or poisonous compounds, importance to local harvesters are provided, too.

According to the data obtained from our field investigations and those published by Geraci et al. (2018), the most commonly harvested Sicilian NWFP are Asparagus acutifolius, Beta vulgaris subsp. maritima, Borago officinalis, Brassica rapa subsp. campestris, Cichorium intybus, Foeniculum vulgare s.l., Sonchus oleraceus and Sonchus tenerrimus, while the least commonly harvested are Centranthus ruber, Narcissus tazetta s.l., Papaver setigerum, Rorippa sylvestris, Rumex crispus, and Tordylium apulum.

Our inventory includes 28 species which should probably be considered as archaeophytes in Sicily, but they are not mentioned in the checklist of the vascular alien flora of Italy (Galasso et al., 2018), and their regional status remains uncertain. The scientific name of these plant taxa is followed by an asterisk in the **Supplementary Table S1**. Moreover, many species commonly cultivated in Sicily and sometimes spreading from cultivations into semi-natural habitats were also taken into account, because palaeobotanical records testify that they occurred in Sicily throughout the Holocene. Hence, their native status cannot be discarded, even if their current distribution does not mirror the distribution range of their original wild populations. This is the case of Castanea sativa, Celtis australis, Ceratonia siliqua, Corylus avellana, Ficus carica, Laurus nobilis, Mespilus germanica, Olea europaea, Pinus pinea, Sorbus domestica, and Vitis vinifera.

On the contrary, a set of 34 plant species, despite being mentioned in the Sicilian ethnobotanical literature, were not considered in our analyses because they are not native (23 archaeophytes and 11 neophytes, see **Supplementary Table S2**). We also treated as archaeophytes several species which are native to other Italian regions but have certainly been introduced by man in Sicily. This is the case of Asparagus officinalis, Rhus coriaria, Ruscus hypophyllum, and Salvia officinalis, never spreading far from rural and disturbed areas, Origanum onites, whose Sicilian distribution range is restricted to the ruins of the ancient Syracuse, and Cercis siliquastrum, recently experiencing some successful colonization even in semi-natural habitats, but always next to the urban areas where it has been introduced.

The list of Sicilian NWFP was revised taking into account also the misleading information contained in some previous studies. In fact, due to identification errors, to the misinterpretation of Sicilian vernacular plant names and/or to the inappropriate inclusion of congeneric species, some authors reported plants which, either: (1) do not occur in Sicily or at least not in the study areas the papers were focused on; (2) have an extremely narrow niche and distribution range, so they could not be considered of any interest for local rural communities; (3) have been assumed to be food plants because they belong to the same genus of some NWFP, but their local exploitation needs to be better documented; (4) sometimes are used in Sicily as healing plants but cannot be used raw as food because they contain stinky, disgusting, toxic or even lethal compounds.

As a result of all the considerations above, 45 wrong, questionable and doubtful records were excluded from our NWFP inventory. These are listed in **Supplementary Table S3**, along with the reasons for the rejection and the source(s) of the wrong/doubtful records.

Indeed, numerous reliable sources attest that Sicilian people harvest and eat at least 30 potentially toxic and 11 poisonous plant species. The Apiaceae (11 taxa) and Polygonaceae (6) account for more than 40% of them. Other families rich in dangerous food plants are Amaryllidaceae, Asteraceae, Fabaceae, and Ranunculaceae (3 taxa each), and Asphodelaceae, Iridaceae, and Solanaceae (2).

### Diversity and Ecology of Sicilian NWFP

The 292 taxa listed in **Supplementary Table S1** belong to 51 plant families; **Figure 2** shows the families including more than 5 NWFP. Around 2/3 of the Sicilian NWFP belong to six families only, i.e., Asteraceae (80, i.e., 27.4% of the whole list), Brassicaceae (39, 13.4%), Lamiaceae (18, 6.2%), Fabaceae (18, 6.2%), Apiaceae (16, 5.5%), and Rosaceae (16, 5.5%). The number of species and genera mirrors the overall taxonomic richness of the main NWFP families, except for Fabaceae, which showed a rather low percentage of edible species despite being one of the richest families of the Sicilian vascular flora. None of the Sicilian NWFP belongs to Poaceae, even though this family is the second richest of the Sicilian vascular flora, counting 293 different species on the island.
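The arithmetic behind these shares can be checked with a short sketch. It uses only the family counts and the inventory total quoted above; the variable and function names are illustrative.

```python
# Counts per family as quoted in the text; 292 is the inventory total.
TOTAL_NWFP = 292
FAMILY_COUNTS = {
    "Asteraceae": 80,
    "Brassicaceae": 39,
    "Lamiaceae": 18,
    "Fabaceae": 18,
    "Apiaceae": 16,
    "Rosaceae": 16,
}


def share(count, total=TOTAL_NWFP):
    """Percentage of the whole inventory, rounded to one decimal."""
    return round(100.0 * count / total, 1)


family_shares = {name: share(n) for name, n in FAMILY_COUNTS.items()}
combined = sum(FAMILY_COUNTS.values())
combined_share = share(combined)

for name, pct in family_shares.items():
    print(f"{name}: {FAMILY_COUNTS[name]} taxa, {pct}%")
print(f"Six families together: {combined} taxa, {combined_share}%")
```

The six families together account for 187 of the 292 taxa, i.e., about 64%, which is the "around 2/3" mentioned in the text.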

If we focus on the five growth forms which count more than 10 Sicilian NWFP, the rosulate edible herbs are proportionally more represented than in the whole regional flora, and the same pattern has been observed for climbing herbs and lianas (**Figure 3**). The percentage of scapose species among wild food plants is slightly higher, whereas that of caespitose species and geophytes is slightly lower than in the whole Sicilian flora.

FIGURE 4 | Non-metric multidimensional scaling (NMDS) ordination plot of the NWFP of Sicily with respect to the habitats reported in the Flora of Italy (Pignatti et al., 2017–2019). Only the habitats explaining >3% of the total variance are displayed. EugBofor, woodlands and forests; EugPrard, dry grasslands; EugRamar, edges, clearings, deciduous shrubberies; EugRocce, rocks, barks, small outcrops; EugSand, sands; SinCaorv, crops, vegetable gardens, orchards, vineyards, olive groves; and SinIncur, fallows, ruderal and urban habitats.

The relationship between the NWFP and habitat types is shown in the NMDS ordination plots (**Figure 4** and **Supplementary Figure S1**). In both of the considered habitat classifications, most species are concentrated near the center of the plots, proving to be rather euryoecious, with a clear preference for anthropogenic habitats. The first two axes of the NMDS explain 80.7% (EUNIS) and 72.18% (Flora of Italy) of the total variance, and in both cases, they appear to be related to a gradient of anthropogenic disturbance and a gradient of edaphic humidity, respectively.

Non-metric multidimensional scaling ordination plots show the prevalence of ruderal-nitrophilous species linked to anthropogenic vegetation and man-made habitats and ecotones, which represent the core of local edible plants. Yet, three further species groups cluster rather clearly: (1) the fruit-bearing woody species linked to forest and shrubland communities, (2) a group rich in chamaephytes and hemicryptophytes adapted to the harsh conditions of bare and nutrient-poor soils mostly occurring in rocky habitats, and (3) a group including many species adapted to sandy or salty soils.

### Good Performers and Stress-Tolerant Sicilian NWFP

As far as EIVs are concerned, some significant differences in the U-, R-, and N-values of the Sicilian NWFP with respect to the whole regional flora (data not shown) have been detected. In particular, even if the mean values are comparable, the EIVs assigned to the NWFP tend to be less dispersed than those of the regional flora. Based on the EIVs, the Sicilian NWFP displaying the maximum tolerance to heat, drought, soil salinity and adaptation to nutrient-poor soils are reported in **Table 1**. This list includes 39 plant species performing well under harsh environmental conditions, being able to overcome at least one out of four severe stress factors, i.e., two edaphic (low soil nutrient content and high soil salinity) and two climatic (lack of water and extreme thermal events). In particular, Capparis species can be considered well adapted to harsh climatic and edaphic conditions (Mercati et al., 2019), with Capparis sicula showing a high ecological plasticity, as it can survive both in poorly aerated, salty and clayey soils and in secondary habitats such as roadsides (Gristina et al., 2014), while C. spinosa mainly colonizes coastal rocky cliffs. Wild cabbages (Brassica spp.) can thrive on nutrient-poor soils and tolerate intense heat and water stress growing on base-rich cliffs (from the coastline up to 1,000–1,200 m a.s.l.), while Diplotaxis crassifolia and Thymbra capitata can stand the enduring summer drought stress typical of Mediterranean grasslands and shrublands. Many NWFP belonging to Asteraceae, Brassicaceae, Lamiaceae, and Valerianaceae tolerate very low soil nutrient availability and water stress but cannot stand thermal stress. Cakile maritima, Crithmum maritimum, Echinophora spinosa, and Juncus acutus, all growing in coastal habitats, are well adapted to salt-rich soils, the last two being resistant to nutrient-poor soils and only the last one also tolerant to water stress.
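A screening of this kind could be sketched as follows. The cut-off values, the taxon labels and their indicator values are hypothetical placeholders chosen for illustration; they are not the criteria or the data behind Table 1. The only elements taken from the paper are the four indicators involved (T for temperature, U for edaphic humidity, N for nutrients, S for salinity) and the direction in which each one signals tolerance.

```python
# Hypothetical cut-offs: high T = heat, low U = drought,
# low N = nutrient-poor soil, high S = salinity.
THRESHOLDS = {
    "T": ("min", 10),
    "U": ("max", 2),
    "N": ("max", 2),
    "S": ("min", 2),
}
STRESS_NAMES = {
    "T": "heat",
    "U": "drought",
    "N": "nutrient-poor soil",
    "S": "salinity",
}


def tolerated_stresses(eiv):
    """List the stress factors a taxon tolerates under the cut-offs above."""
    tolerated = []
    for key, (kind, cutoff) in THRESHOLDS.items():
        value = eiv[key]
        if kind == "min" and value >= cutoff:
            tolerated.append(STRESS_NAMES[key])
        elif kind == "max" and value <= cutoff:
            tolerated.append(STRESS_NAMES[key])
    return tolerated


# Placeholder taxa with invented indicator values.
taxon_x = {"T": 11, "U": 2, "N": 5, "S": 0}
taxon_y = {"T": 7, "U": 5, "N": 6, "S": 0}

result_x = tolerated_stresses(taxon_x)
result_y = tolerated_stresses(taxon_y)
print(result_x)
print(result_y)
```

A taxon would enter a shortlist of good performers when the returned list is not empty, i.e., when it meets the cut-off for at least one of the four stress factors.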

### Sicilian NWFP and CWR

Based on literature reviews, we found that as many as 55 Sicilian NWFP (i.e., 18.8% of the total) are CWR (**Table 2**). Many of these plant taxa are of high interest for their promising agronomic traits and/or for their high stress tolerance. The richest families in CWR are Brassicaceae (9 taxa), Rosaceae (8), Asteraceae (7), Lamiaceae (6), Fabaceae (5), Apiaceae and Asparagaceae (4 each), and Amaryllidaceae and Capparaceae (2 each). With four taxa each, the genera Asparagus and Brassica are the richest ones, followed by Allium, Capparis, Cichorium, Mentha, Rubus, Salvia, and Sorbus (2 taxa each).

The family Asteraceae includes the wild progenitors of globe artichoke (Cynara scolymus L.), lettuce (Lactuca serriola L.) and chicories (Cichorium intybus and Cichorium pumilum), while the family Brassicaceae includes different species of wild cabbages (Brassica incana, B. nigra, B. rupestris and B. tournefortii) and wild rockets (Eruca sativa). Lamiaceae count many relatives of common spices such as mints (Mentha spicata and Mentha suaveolens), oregano (Origanum vulgare subsp. viridulum), rosemary (Rosmarinus officinalis), and sages (Salvia fruticosa subsp. thomasii and Salvia sclarea). Other noteworthy CWR are the wild fennel (Foeniculum vulgare), wild strawberry (Fragaria vesca), wild garlic (Allium ampeloprasum and Allium commutatum) and wild capers (Capparis sicula and C. spinosa).

# DISCUSSION

### A Matter of Natural and Land-Use History

The landscapes of Sicily are the result of anthropogenic disturbances occurring over several millennia. Through the centuries, rural communities have managed their environment and farmed the land in their natural way, creating a rich diversity of landscapes, a choral representation of the historical identity of the territory and human cultural heritage (Guarino and Pasta, 2017, and references therein). One of the first effects of human land use was an increased fire frequency: wildfires were the easiest way to obtain grass-dominated terrains, which could be used as rangelands or, eventually, cultivated. The prevalence of NWFP in semi-natural and anthropogenic habitats could be seen as the result of a process of selection and adaptation lasting at least 10,000 years. The plants growing in habitats of this kind provide most of the harvest, because it is easy to identify them and because they are usually found in dense and homogeneous populations. For instance, the relatively high percentage of rosulate plants among NWFP could mirror their ecological adaptation to grazing disturbance. This growth form reduces the damage (namely defoliation) caused by the bites and by the trampling activity of domestic herbivores (Xu et al., 2013, and references therein). Most of the edible climbers, vines and lianas (e.g., Clematis, Lathyrus, Rubia, Smilax, Tamus, and Vicia) are instead common in ecotones, which have been created and shaped by humans.

As far as we know, no literature data explain why rosette-bearing and scapose (erect) taxa are more common than caespitose ones among wild food plants, whereas the relatively low percentage of edible geophytes is likely to depend on the high frequency of poisonous species within this group. As for EIVs, the values of 'U' (edaphic humidity) of the Sicilian edible plants appear to be relatively low. This pattern matches the fact that, based on the bioclimatic classification of Sicily (Bazan et al., 2015), most of the Sicilian NWFP thrive under thermo-mediterranean bioclimatic conditions, characterized by long-lasting summer drought stress. The values of 'R' and 'N' indicate that many NWFP prefer base- and nutrient-rich soils, typical of the Sicilian hilly landscapes prone to human (namely agro-pastoral) practices. Interestingly, the high rate of ruderal-nitrophilous species among the edible plants appreciated for their fleshy stem or foliage was confirmed in the study carried out in northern Croatia by Vitasović Kosić et al. (2017), who also used EIVs to investigate the ecology of local wild food plants. Moreover, the altitudinal distribution pattern of the Sicilian NWFP (data not shown) suggests that they are slightly more concentrated in the lowlands and hilly areas of the island, where most of the permanent settlements and arable lands have been concentrated for millennia.

TABLE 1 | An overview on the Sicilian NWFP which show the best performance with respect to four major stress factors, as suggested by the numerical values of the selected EIVs, i.e., T (temperature, range 1–12), U (edaphic humidity, range 1–11), N (nutrients, range 1–9), and S (salinity, range 0–3). Best responses to one or more stress factors are in bold.

TABLE 2 | Families of Sicilian NWFP – including CWR and/or stress-tolerant taxa – to be tested in order to detect and develop new crops for their high potential as resources for agronomic, genetic, pharmaceutical, and nutritional purposes.

Non-metric multidimensional scaling provides interesting clues on the spatial and temporal distribution of the food resources afforded by NWFP. For instance, most of the woody species are linked to forests, riverine plant communities and shrublands. They produce large amounts of fruits or berries that can be eaten raw, and represent an important food resource between the end of spring and the end of summer when all the annual and most of the perennial herbaceous NWFP are not available anymore.

Most of the edible Lamiaceae group together and are well adapted to nutrient-poor and rocky habitats (**Figure 4** and **Supplementary Figure S1**), where they cope with water and nutrient shortage by producing many aromatic compounds that play the twofold role of helping them to save water and to inhibit predation by herbivores. Other stress-tolerant plants typical of sandy and/or salty soils form another cluster rich in species adapted to high salt input and long-lasting water shortage.

From an ecological perspective, nutrient-poor habitats promote selection for traits allowing efficient resource conservation, while nutrient-rich environments select for species with acquisitive trait profiles (Reich, 2014). Domesticated plants, and in particular herbaceous crops, share few common traits and represent a small portion of the phenotypic spectrum displayed by their wild ancestors: the majority of those living on nutrient-rich soils bear soft, large and short-lived leaves, are fast-growing and proficient competitors, with high leaf nitrogen concentration and tall canopies (Freschet et al., 2015; Milla et al., 2018). As cultivation generally involved higher and more regular nutrient and water supply rates, humans probably focused their interest on NWFP bearing resource-acquisition trait profiles (Bogaard et al., 2013; Araus et al., 2014). Some of them, like the ancestors of several cereal crops, were chosen because they showed a rapid shift of their functional traits under domestication, being able to invest more energy in leaf biomass and height growth than other wild grasses that were used by hunter-gatherers, but were never domesticated (Cunniff et al., 2014). Many other cultivated plants descend from wild ancestors which were pre-adapted for cultivation thanks to several favoring traits (Triboullois et al., 2015; Martin-Robles et al., 2019). These considerations fit perfectly with the fact that most of the Sicilian NWFP, commonly used as vegetables and concentrated near the center of **Figure 4** and **Supplementary Figure S1**, share many morphological and ecological traits: they colonize nutrient-rich soils and ecotones, they bear large and tender leaves, they grow very fast, and they are good competitors.

The massive number of edible Asteraceae (subfam. Cichorioideae) is a common pattern in Mediterranean countries (Della et al., 2006; Leonti et al., 2006; Rivera et al., 2006; Hadjichambis et al., 2008). Interestingly, Cichorioideae appear to be strongly associated with pastoral activities since ancient times (Florenzano et al., 2015). Also, Brassicaceae are an essential component in wild harvesting in south Italian regions (Biscotti et al., 2018). These plants have probably been gathered since the very first stages of agriculture by Neolithic farmers, as they behaved as weeds.

All these ecological clues lead to the same conclusion: the majority of the Sicilian NWFP may be considered as 'old companions' of local human communities. Early Sicilians probably discovered very soon that many 'proto-weeds' colonizing the open spaces created after the disruption of pristine woodlands were useful and tasty, as has recently been shown also for Israel (Snir et al., 2015). Also, the available palaeoecological studies (Sadori et al., 2008; Stika et al., 2008; Noti et al., 2009; Tinner et al., 2009) suggest that from the beginning of the Holocene (∼10 Ka) onward, human impact has been an important – if not the main – factor inducing the final opening of the Sicilian landscape. With their activities (burning, clearing, cutting, farming, plowing, etc.) local Neolithic communities not only fostered the success of many non-native pioneer light-demanding plants inadvertently introduced with crop species (the so-called 'archaeophytes') but also completed the shaping of the regional natural and semi-natural landscape, giving rise to a complex mosaic of prevalently open habitats with scattered nuclei of woodlands, shrublands and species-rich garrigues and grasslands (Guarino et al., 2005; Guarino, 2006; Brullo and Guarino, 2007).

### The Importance of a Multidisciplinary Approach

An unexpected result of our investigation concerned the detection of 45 wrong, questionable and doubtful records of plants that cannot be included among the Sicilian NWFP.

There are two distinct groups of experts dealing with useful plants, plant uses and plant names: on one side the 'practitioners,' such as farmers, shepherds, artisans, cooks, forest workers, fishermen, hunters, etc.; on the other side the 'scholars,' like ethnobotanists, ethno-anthropologists, philologists, historians, biochemists, veterinarians, etc. Until recently, these two groups have hardly shared their knowledge. This is the main reason why we frequently find plant misidentifications in the papers written by non-botanists, while 'pure' botanists often fail to spell vernacular names correctly, to trace their roots and/or to report adequately the uses of the plants they study. Such 'communication gaps' may cause risky misidentifications due to the transfer of erroneous information concerning the alimentary use and therapeutic properties of some poisonous plants (i.e., Corsi and Pagni, 1979; Provitina, 1991; Atzei, 2003; Giardina et al., 2007; Łuczaj et al., 2012; see **Supplementary Table S3**).

The use of toxic and poisonous plants is a common pattern worldwide, especially where seasonality strongly affects food availability, and local human communities learned to exploit even the less attractive resources. This fact is not only a consequence of famine but also a question of timing and a result of century-old folk knowledge: in fact, some plants may contain a much lower concentration of poisonous compounds in some periods of the year and/or in some specific organs. Moreover, the fact that some communities use a species does not necessarily mean it is safe to eat it. Hence, detailed data on the gathering and cooking procedures are of paramount importance, and such information should be taken into account before discarding any poisonous plant from the list of edible ones: this is the case of several species which are cooked before being consumed as vegetables (like the tender shoots of Asphodeline lutea, Carlina gummifera, Clematis vitalba, and Tamus communis) or used to prepare jams (such as the fruits of Rubia peregrina).

The data reported by Geraci et al. (2018) are based on interviews carried out throughout the main island of Sicily and only focused on the plants and plant portions consumed as vegetables, while c. 25% of our interviews were carried out on the circum-Sicilian islets and regarded the whole spectrum of NWFP, including 90 aromatic and fruit-bearing plant species. The same value of harvest frequency was assigned to 42 out of the 202 plants (20.8%) of the common pool, while for 49 taxa (24.2%) the evaluation was consistently different (2–4 points of the adopted scale). Most of the differences between the two assessments may depend on the way (total number, geographical distribution and scope) the interviews were carried out by the two research groups.

Our results point out the need for further efforts to retrieve and homogenize the information already available in the gray ethnobotanical literature, mostly published in Italian in regional and local papers which are often hard to find and to consult. To overcome errors and to improve the quality and effectiveness of their research, ethnobotanists, ethnologists, and all the skilled persons involved in wild plant harvesting should start a tighter collaboration and launch an ambitious multidisciplinary research program focused on Sicilian traditional environmental knowledge (TEK: Heckler, 2012) and edible plants' use.

In order to avoid the transfer of erroneous information, in our opinion, the international scientific community should perform a more severe review of the data concerning the human diet.

As a rule of thumb, papers concerning edible plants should be not only written but also reviewed (and eventually rejected) by skilled local botanists, so that wrong information is not published in scientific journals and does not reach (and eventually kill) uninformed readers.

### Useful, Yet at the Brink of Oblivion

Taking into account the worrying forecasts regarding food security for the forthcoming years due to climate warming and global change (Ford-Lloyd et al., 2011), during the last decades an increasing effort has been devoted to the detection and conservation of CWR (Valdés et al., 1997; Heywood et al., 2007; Maxted et al., 2010; Dempewolf et al., 2014). Regional and national inventories have been strongly encouraged, and seed banks are currently being created and implemented (Maxted et al., 2007, 2012; Vincent et al., 2013). These efforts aim at preserving wild food plants, restoring old varieties, and finding new technologies to enhance their use and new solutions to manage modern agricultural systems (Hajjar and Hodgkin, 2007) more sustainably (Caneva et al., 2013). In this framework, we are convinced that in-depth ecological insights on NWFP may innovate and increase the spectrum of cultivated vegetables.

With a view to implementing sustainable agriculture under the current global change scenario, each of the 39 plants reported in **Table 1**, which show the most promising traits in terms of tolerance to several stress factors (water shortage, high temperatures, and edaphic constraints), could be cultivated for experimental purposes. Also, given that a considerable number of Sicilian NWFP can be considered CWR, they could be tested in ad hoc genetic improvement programs of traditional crops, as has already been done for Allium (Odeny and Narina, 2011), Asparagus (Falavigna et al., 2008; Kanno and Yokoyama, 2011), Beta (McGrath et al., 2011), Daucus (Grzebelus et al., 2011; Iorizzo et al., 2013), Lactuca (Davey and Anthony, 2011), Vicia (Bryant and Hughes, 2011), and for many Brassicaceae (Brassica: Branca and Cartea, 2011; Diplotaxis: Pignone and Martínez-Laborde, 2011; Eruca: Pignone and Gómez-Campo, 2011) and Rosaceae (Fragaria: Hummer et al., 2011; Malus: Ignatov and Bodishevskaya, 2011; Prunus: Potter, 2011; Pyrus: Bell and Itai, 2011; Rubus: Graham and Woodhead, 2011). Moreover, further research should focus on the many species-rich and promising families and genera of NWFP (**Table 2**) whose members could be used in domestication programs aiming at developing new crops: for example, new fragrances such as thymes (e.g., Thymbra capitata and Thymus spinulosus) and other Lamiaceae such as Clinopodium nepeta, Lavandula stoechas, and Micromeria juliana, and new vegetables such as several wild species of Asparagus and wild rockets (e.g., Diplotaxis spp., Erucastrum virgatum).

Any future breeding activity concerning the most promising NWFP should start from a better understanding of their ecological traits, preferring those species that may provide additional ecosystem services and belong to genera that proved to be prone to a sensitive shift of above- and below-ground biomass allocation during the domestication process (Denison, 2012; Milla et al., 2015, 2017).

Despite their potential and prominent interest, especially for future generations, most Sicilian NWFP seem doomed to oblivion. Due to ongoing social and economic changes in the Mediterranean area, the vast majority of the last custodians of ethnobotanical knowledge are more than 70 years old. As a consequence of their aging, not only the traditional knowledge but also entire cultural landscapes (shaped by humans for millennia and providing the most suitable habitat for plenty of NWFP) are doomed to be lost within the next 10–20 years. We will lose most of the diversity of NWFP if we stop eating them. Interviews with the last living custodians of this cultural heritage (De Gregorio, 2008; Sottile and Genchi, 2010) are urgently needed before traditional knowledge fades forever (Quave and Saitta, 2016; Cucinotta and Pieroni, 2018), in addition to the collection and conservation of Sicilian wild and cultivated germplasm (Hammer and Perrino, 1995; La Mantia and Pasta, 2005; Portis et al., 2005; Hammer and Laghetti, 2006; Schicchi et al., 2008; Forconi and Guidi, 2013; Marino et al., 2013).

# CONCLUSION

In this paper, we provide some preliminary clues about the ecology of all Sicilian NWFP, but knowledge on this topic needs to be improved, in order to detect any discrepancy between their real frequency, their altitudinal range, their distribution pattern and their local use as a food resource. To do that, we still need to analyze the data of all the published vegetation surveys carried out in Sicily and to cover the ethnobotanical knowledge gaps in some sectors of the island.

Tightly connected with the traditional agro-pastoral practices and landscapes, many NWFP are becoming rarer and rarer as a consequence of the ongoing processes of land-use change and abandonment. The only way to maintain both TEK and NWFP is to envisage concrete and shared measures aiming at promoting the self-sustainment of traditional agro-silvopastoral practices at the European, the national and the regional level.

Humans have shaped natural landscapes worldwide since prehistoric times. However, they often succeeded in modifying ecosystem services and functioning without destroying them (Kareiva et al., 2007; Willis et al., 2007). Instead of trying to come back to nature, which may prove a hard and even anachronistic target (Willis and Birks, 2006), we should try to inherit and replicate the past practices that combined extensive and sustainable land use with the conservation of species diversity and ecosystem services (Plieninger et al., 2006; Guarino and Pignatti, 2010). In the end, TEK in general, and NWFP use in particular, stem from a wise and dynamic combination of silvopastoral and crop farming activities and land use practices, which allowed people and landscape to co-evolve and co-occur over millennia (Mercuri et al., 2019).

We should try to learn more about the best solutions adopted by mankind throughout history, focusing on the co-evolution between weeds, plant harvesters and farmers. At the same time, we should make more efforts in order to detect and value the TEK of the last Mediterranean bio-cultural refugia (Barthel et al., 2013).

### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/**Supplementary Material**.

### AUTHOR CONTRIBUTIONS


SP, FC, and RG conceived and supervised the project. AL carried out interviews across the whole Sicilian territory and built up the revised checklist of Sicilian NWFP with the help of SP and RG. SP, CM, RG, FC, and AG analyzed and interpreted the data. SP, RG, AG, GG, CM, and FC wrote the first draft. All authors made a substantial, direct and intellectual contribution to this work. All authors approved the final version of the manuscript.

### FUNDING

This research was supported by Regione Siciliana (PO FESR 2014/2020, Azione 1.1.5 - Project: Sicily Seeds: Metodologie e tecnologie innovative per il recupero, la moltiplicazione, la valorizzazione e l'utilizzo di piante spontanee commestibili della flora siciliana, Grant No. 08PA6201000062). CM was supported by the grant no. 19-28491X of the Czech Science Foundation.

### ACKNOWLEDGMENTS

The kind support of Francesca La Bella (IBBR-CNR, Palermo), Tommaso La Mantia and Giovanna Sala (Department of Agricultural, Food and Forest Sciences, University of Palermo) during our bibliographic research was much appreciated. We are grateful to Giovanni Salerno (Department of Environmental Biology, University of Rome III) for his advice about the best way to treat the doubtful records concerning potentially dangerous species reported as edible by other authors coping with Sicilian wild food plants.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00388/full#supplementary-material

### REFERENCES

farmers. Proc. Natl. Acad. Sci. U.S.A. 110, 12589–12594. doi: 10.1073/pnas.1305918110


a hotspot in the middle of the Mediterranean Basin. Front. Plant Sci. 10:1506. doi: 10.3389/fpls.2019.01506



Nutraceuticals. Forum Nutr, eds M. Heinrich, W. E. Müller, and C. Galli (Basel: Karger), 18–74. doi: 10.1159/000095207


Sottile, R., and Genchi, M. (2010). Lessico Della Cultura Dialettale Delle Madonie. 1. L'alimentazione. Centro di Studi Filologici e Linguistici Siciliani-Dipartimento di Scienze Filologiche e Linguistiche. Palermo: L'ALS per la scuola e il territorio.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Pasta, La Rosa, Garfì, Marcenò, Gristina, Carimi and Guarino. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Lost and Found: Coffea stenophylla and C. affinis, the Forgotten Coffee Crop Species of West Africa

Aaron P. Davis<sup>1</sup> \*, Roberta Gargiulo<sup>1</sup> , Michael F. Fay<sup>1</sup> , Daniel Sarmu<sup>2</sup> and Jeremy Haggar<sup>3</sup> \*

<sup>1</sup> Royal Botanic Gardens, Kew, Richmond, United Kingdom, <sup>2</sup> Welthungerhilfe, Freetown, Sierra Leone, <sup>3</sup> Department of Agriculture, Health and Environment, Faculty of Engineering and Science, Natural Resources Institute, University of Greenwich, Medway, United Kingdom

Coffea arabica (Arabica) and C. canephora (robusta) almost entirely dominate global coffee production. Various challenges at the production (farm) level, including the increasing prevalence and severity of disease and pests and climate change, indicate that the coffee crop portfolio needs to be substantially diversified in order to ensure resilience and sustainability. In this study, we use a multidisciplinary approach (herbarium and literature review, fieldwork and DNA sequencing) to elucidate the identity, whereabouts, and potential attributes of two poorly known coffee crop species: C. affinis and C. stenophylla. We show that despite widespread (albeit small-scale) use as a coffee crop species across Upper West Africa and further afield more than 100 years ago, these species are now extremely rare in the wild and are not being farmed. Fieldwork enabled us to rediscover C. stenophylla in Sierra Leone, which previously had not been recorded in the wild there since 1954. We confirm that C. stenophylla is an indigenous species in Guinea, Sierra Leone, and Ivory Coast. Coffea affinis was discovered in the wild in Sierra Leone for the first time, having previously been found only in Guinea and Ivory Coast. Prior to our rediscovery, C. affinis was last seen in the wild in 1941, although sampling of an unidentified herbarium specimen reveals that it was collected in Guinea-Conakry in 2015. DNA sequencing using plastid and ITS markers was used to: (1) confirm the identity of museum and field collected samples of C. stenophylla; (2) identify new accessions of C. affinis; (3) refute hybrid status for C. affinis; (4) identify accessions confused with C. affinis; (5) show that C. affinis and C. stenophylla are closely related, and possibly a single species; (6) substantiate the hybrid C. stenophylla × C. liberica; (7) demonstrate the use of plastid and nuclear markers as a simple means of identifying F1 and early-generation interspecific hybrids in Coffea; (8) infer that C. liberica is not monophyletic; and (9) show that hybridization is possible across all the major groups of key African Coffea species (Coffee Crop Wild Relative Priority Groups I and II). Coffea affinis and C. stenophylla may possess useful traits for coffee crop plant development, including taste differentiation, disease resistance, and climate resilience. These attributes would be best accessed via breeding programs, although the species may have niche-market potential via minimal domestication.

Keywords: agronomy, climate change, coffee, West Africa, crop wild relatives (CWRs), DNA, Sierra Leone, speciality coffee

### Edited by:

Petr Smýkal, Palacký University Olomouc, Czechia

### Reviewed by:

Alan C. Andrade, Brazilian Agricultural Research Corporation (EMBRAPA), Brazil Benoit Bertrand, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), France

\*Correspondence:

Aaron P. Davis a.davis@kew.org Jeremy Haggar J.P.Haggar@greenwich.ac.uk

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 19 December 2019 Accepted: 21 April 2020 Published: 19 May 2020

### Citation:

Davis AP, Gargiulo R, Fay MF, Sarmu D and Haggar J (2020) Lost and Found: Coffea stenophylla and C. affinis, the Forgotten Coffee Crop Species of West Africa. Front. Plant Sci. 11:616. doi: 10.3389/fpls.2020.00616


# INTRODUCTION


Coffee is a globally significant crop that supports a multibillion-dollar global industry (International Coffee Organization (ICO), 2019), over a lengthy value chain from farmer to consumer. Coffee farming alone involves around 100 million people worldwide (Vega et al., 2003). Two species dominate global coffee production: Arabica (Coffea arabica) and robusta (C. canephora), providing c. 60% and c. 40% of traded coffee, respectively (International Coffee Organization (ICO), 2019). Liberica coffee is cultivated worldwide in small quantities, and is insignificant in terms of global trade, although production in the Philippines and Malaysia can be substantial. Aside from C. arabica, C. canephora, and C. liberica, there are another 121 coffee species known to science (Davis et al., 2006, 2011, 2019). Some of these are used to make the beverage coffee, such as C. congensis, C. eugenioides, and C. racemosa, some have been used in breeding programs, and others have been used as high-performing pest- and disease-resistant rootstocks (Davis et al., 2019). Many more are used on a small, local scale, or are harvested directly from the wild, in Africa, Madagascar, and Asia. In previous centuries, and particularly at the end of the 1800s and early 1900s, there was considerable interest in, and use of, a range of beverage-producing coffee species, more than there is today (Davis et al., 2019). It is also of note that since those times, a substantial proportion of the world's coffee species diversity has been discovered and named by science, particularly from the 1960s onward (Bridson, 1988; Davis et al., 2006; Davis and Rakotonasolo, 2009). The decline in the interest in these 'other' coffee species has been largely due to the overwhelming success of robusta coffee, which was itself transformed from a wild plant, and minor African crop species, to a major global commodity in around 150 years (Davis et al., 2019). 
Robusta gained market share against Arabica from the early 1900s onward due to its resistance to coffee leaf rust (CLR; Hemileia vastatrix) (Wrigley, 1988), a broader agroecological envelope (Davis et al., 2006), higher productivity (Wellman, 1961; Wrigley, 1988), lower purchase price (International Coffee Organization (ICO), 2019), and other specific attributes (Davis et al., 2019). Recently, however, there has been renewed interest in underutilized, forgotten, and little known coffee species, both cultivated and wild, due to their potential to counter specific pests and diseases, and provide resilience in an era of accelerated climate change (Davis et al., 2019). There is also an increasing curiosity in lesser known coffee species from the specialty coffee sector, in its quest to discover new and differentiated sensory experiences in coffee.

Among those of particular interest are two West African species: C. stenophylla and C. affinis, mainly due to historical reports of a superior taste, particularly for C. stenophylla (Cheney, 1925) but also C. affinis (De Wildeman, 1904). Given that these two species occur in Upper West Africa at relatively low elevations (see below) there may also be the potential for climate resilience. Both species fall within Coffee Crop Wild Relative Priority Group II, which includes species closely related to the main crop species, for which gene transfer to the crop is proven or assumed (with low to high post-crossing fertility rates) (Davis et al., 2019). Priority Group II includes all African species, apart from the main coffee crop species and their progenitors (C. arabica, C. canephora, C. liberica, and C. eugenioides: Priority Group I) and African species of Priority Group III. Priority Group III includes all the short-styled Coffea species (previously assigned to the genus Psilanthus) from Africa, Asia and Australasia, and all Madagascan species and Mascarene species (Davis et al., 2019).

Our recent knowledge of C. stenophylla and C. affinis is principally limited to germplasm surveys. Coffea stenophylla is recorded as a living plant in several (ex situ) coffee research collections (Anthony et al., 2007; Engelmann et al., 2007; Bramel et al., 2017); C. affinis is included in the most recent of these reviews (Bramel et al., 2017) but only as an entry based on our knowledge of accepted coffee species (Govaerts et al., 2019). Contemporary evaluations of coffee species diversity (Davis et al., 2006, 2011, 2019; Maurin et al., 2007; Hamon et al., 2017) clearly show that our knowledge of C. stenophylla and C. affinis is inadequate. Initial review of literature for C. affinis showed almost no extra knowledge of this species has been gained since 1937 (Portères, 1937a), with the exception of work in Ivory Coast and Guinea in the 1980s (Berthaud, 1983, 1986; Le Pierrès et al., 1989). It is imperative that we improve our knowledge of these two species, both in cultivation (including any commercial production) and in the wild.

In this study, our main objectives were to elucidate the current cultivated and wild status of C. stenophylla and C. affinis, to establish the taxonomic identity and systematic position of the poorly known C. affinis, and to assemble available information on crop plant attributes. To achieve these objectives we undertook: (1) a literature review; (2) a survey of herbarium and economic botany collections; (3) field surveys in Sierra Leone, visiting farms, research stations and natural forest locations; and (4) DNA sequencing of recently collected material, historical samples (herbarium and economic botany collection samples), and known interspecies hybrids, analyzed together with a reference set of previously published Coffea sequences.

### MATERIALS AND METHODS

### Literature Review

We examined all key literature pertaining to C. affinis and C. stenophylla. Knowledge of ex situ cultivation in research collections was gleaned from published works (Anthony, 1982, 1992; Anthony et al., 2007; Engelmann et al., 2007; Bramel et al., 2017; Davis et al., 2019), supported by personal observations (A. Davis, J. Haggar) and personal communication.

# Review of Herbarium Collections and Economic Botany Collections

Herbarium specimens are well suited to this type of study because they are verifiable in space (location), time (date) and form (species identity), and are often accompanied by additional information on the herbarium label (e.g., ecology, elevation, geology, and uses). We consulted herbarium specimen records from nine herbaria (BM, BR, K, MO, P, UPS, WAG), including those in Sierra Leone (SL, FBC). Herbarium codes follow standard abbreviations (Holmgren et al., 1990; Thiers, 2019). The specimen data were disaggregated into unique records and duplicate specimens. Unique records comprise the combination of collector's name and number (e.g., Chillou 2381) or collector's name and date (e.g., Cope s.n., 7 iii 1912); s.n. is an abbreviation for sine numero, and lowercase Roman numerals represent the month. Duplicate specimens possess the same unique identifier, i.e., they are from the same plant or possibly nearby individuals, but are found on separate herbarium sheets; these may either be found in the same herbarium or across two or more herbaria.
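The disaggregation into unique records and duplicates can be illustrated with a minimal Python sketch. It is not the authors' workflow: the function names, the tuple layout, and the herbarium codes assigned to each sheet are hypothetical and chosen only for illustration; the two identifiers (Chillou 2381 and Cope s.n., 7 iii 1912) are the examples given in the text.

```python
from collections import defaultdict

# Hypothetical sheets: (collector, number or None, date or None, herbarium code).
# The herbarium codes assigned here are illustrative, not taken from the study.
sheets = [
    ("Chillou", "2381", None, "P"),
    ("Chillou", "2381", None, "K"),
    ("Cope", None, "7 iii 1912", "K"),
]


def record_key(collector, number, date):
    """Build the unique-record identifier for one herbarium sheet.

    Numbered collections use collector + number; unnumbered (s.n.)
    collections fall back to collector + date.
    """
    if number is not None:
        return (collector, number)
    return (collector, "s.n.", date)


def disaggregate(sheets):
    """Group sheets by unique record; extra sheets in a group are duplicates."""
    groups = defaultdict(list)
    for collector, number, date, herbarium in sheets:
        groups[record_key(collector, number, date)].append(herbarium)
    return dict(groups)


records = disaggregate(sheets)
n_unique = len(records)
n_duplicates = sum(len(herbaria) - 1 for herbaria in records.values())
```

In this toy input, the two Chillou 2381 sheets collapse into one unique record with one duplicate, while the Cope sheet forms its own record.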

### Fieldwork in Sierra Leone

During 2014 and 2016 we made a request for samples of C. affinis and C. stenophylla and any atypical coffee morphotypes, from NGOs and farmer associations representing 10,000 coffee farmers across Kenema, Kailahun, and Kono Districts, which represent the major coffee producing region of Sierra Leone. We also visited the Sierra Leone Agricultural Research Institute (SLARI) research collection at Pendembu, Kailahun District (**Table 1**), to sample putative examples of C. stenophylla and C. affinis. In addition, 50 A4 posters showing the most obvious morphological differences (leaf shape and size) between the two cultivated coffee species, robusta (C. canephora) and Liberica (C. liberica), and C. stenophylla were printed and distributed to district agriculture offices with coffee farming communities in southern Sierra Leone, between Freetown and Kenema. The aim was to provide an additional means of identifying farms that might be cultivating C. stenophylla or C. affinis. Sites in northern Sierra Leone where C. stenophylla had been recorded in cultivation (based on the herbarium survey) were visited in 2017. In December 2018, we followed up on the poster survey by visiting five farms that had reported cultivating C. stenophylla. On the same trip, we visited the last known (1954) forest sites for C. stenophylla in the Kasewe Hills (Southern Province), and several possible locations: around Freetown (Western Area), near Moyamba Junction (Southern Province), and the forest area of Kambui Hills, adjacent to Kenema (Eastern Province). Follow-up visits to the Kambui Hills were made throughout 2019 and in early 2020.

### Assembly of DNA Reference Collection

The most taxonomically comprehensive DNA dataset for wild coffee species is for plastid (trnL–F intron, trnL–F intergenic spacer (IGS), rpl16 intron and accD–psaI IGS) and the internal transcribed spacer (ITS) region of nuclear rDNA (ITS 1/5.8S/ITS 2) (Maurin et al., 2007; Davis et al., 2011). These markers have the ability to distinguish between African coffee species, and identify recently formed hybrids via differential inheritance of plastid and nuclear genomes (Maurin et al., 2007). Thirty-one accessions (**Tables 1**, **2**) were sequenced with the four markers: 14 collected by us in Sierra Leone; nine reference samples from the museum collections of RBG Kew (K), including three for C. stenophylla, four for C. affinis (including two farmed accessions), and four other coffee species; five samples associated with the production of the artificial hybrid C. arabica × C. racemosa (Medina Filho et al., 1977a,b); and one unpublished sequence of C. zanguebariae (**Table 2**). The five museum collections of C. stenophylla and C. affinis were selected to represent authentic material, i.e., that being cultivated at the end of the 19th and beginning of the 20th centuries (respectively) in Sierra Leone. The published reference sequence dataset (Maurin et al., 2007; Davis et al., 2011) included a single verified example of C. stenophylla. Two known interspecific hybrids, as identified in a previous study using the same markers (Maurin et al., 2007), were included in the sampling: C. arabica (C. canephora × C. eugenioides) and C. liberica × C. eugenioides [originally accessioned (Maurin et al., 2007) as C. heterocalyx]. Initial analyses were conducted using the study species (see above) and a global data set of African, Madagascar, Mascarene, and Asian coffee species (Maurin et al., 2007; Davis et al., 2011). 
Following this analysis and confirming general placement of accessions, this was reduced to African taxa, excluding short-styled Coffea species (former Psilanthus), equating to Coffee Crop Wild Relative (Priority) Groups I and II (Davis et al., 2019).
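The idea of detecting recent hybrids from the differential inheritance of plastid and nuclear genomes can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the phylogenetic procedure used in the study: it assumes that each accession has already been assigned a plastid placement and one or more ITS placements, and the function name, labels, and accession data are hypothetical.

```python
def classify_accession(plastid_match, its_matches):
    """Compare plastid and nuclear (ITS) placements for one accession.

    plastid_match: species label suggested by the plastid markers.
    its_matches: species labels suggested by the ITS sequence signal.
    Returns "concordant" when both genomes point to the same single
    species, otherwise "candidate hybrid".
    """
    if set(its_matches) == {plastid_match}:
        return "concordant"
    return "candidate hybrid"


# Hypothetical accessions, for illustration only.
pure = classify_accession("C. stenophylla", ["C. stenophylla"])
mixed = classify_accession("C. stenophylla", ["C. stenophylla", "C. liberica"])
conflict = classify_accession("C. liberica", ["C. stenophylla"])
```

A "candidate hybrid" flag here only marks a plastid/nuclear conflict worth examining; in practice the placements come from the tree topologies, and incongruence can have causes other than recent hybridization.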

# DNA Extraction, Sequencing, and Data Analysis

Total DNA was extracted from silica-dried leaves, fresh seeds, and seeds from herbarium specimens and other archival material (**Tables 1**, **2**) using a modified CTAB method (Doyle and Doyle, 1987) and purified using the QIAquick PCR purification kit (QIAGEN). Genetic variation among the accessions was assessed by employing four regions: nuclear internal transcribed spacers (ITS1 and ITS2), plastid trnL–trnF (trnL intron and trnL–trnF intergenic spacer), rpl16 intron and accD–psaI intergenic spacer. Amplifications were carried out following the protocol of Maurin et al. (2007); PCR products were purified using the QIAquick PCR purification kit (QIAGEN) and sequenced following the methods employed by Maurin et al. (2007). Capillary electrophoresis was conducted on an ABI3730 DNA Analyzer (Applied Biosystems). Sequencing results were inspected in GENEIOUS v. 8.1.7 (Kearse et al., 2012). Newly sequenced accessions and unpublished sequences held at RBG Kew were referenced against GenBank accessions of Coffea species (Maurin et al., 2007; Davis et al., 2011). The sequences were aligned using MUSCLE (Edgar, 2004), as implemented in GENEIOUS. Gaps were treated as missing data and ambiguities were scored with IUPAC ambiguity codes. The model of character evolution was assessed in jModelTest v. 2.1.10 (Posada, 2008). Relationships among the taxa were reconstructed in MrBayes v. 3.2.7a (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003) as implemented on the CIPRES Scientific Gateway v3.3; C. rhamnifolia was used as the outgroup. Analyses were conducted separately for the ITS and the plastid DNA datasets, and regions difficult to align were excluded from the phylogenetic analysis. We also conducted a separate analysis on the concatenated ITS/plastid matrix for the Upper Guinea (UG) clade (including C. togoensis, C. affinis, and C. stenophylla), with C. canephora and C. liberica as outgroups. MCMC sampling was performed with two runs and four chains for 2 × 10<sup>7</sup> generations, with a sampling frequency of 1,000 and a relative burn-in of 25%; the specified model of character evolution was GTR+G. Convergence was visually assessed with Tracer v. 1.6 (Rambaut et al., 2013), by combining trace files to confirm mixing and high effective sample size (ESS). Maximum clade credibility trees were drawn in FigTree v. 1.4.4 (Rambaut, 2018). Clade and species alliance terminology follow Maurin et al. (2007) and Davis et al. (2011).

TABLE 1 | List of material examined, with origin, source of material, and (DNA) identification.

The taxa in the furthest right-hand column are those used on the phylogenetic trees (Figures 4, 5); to find the accession identifier (as received and as discussed in the text) use the left-hand column. For previously sequenced accessions included in Figures 4, 5, see Table 2. Seed, single seed from herbarium sample. Leaf, leaf sample in silica gel.

TABLE 2 | Continued

Newly sequenced material in bold. See Table 1 for details of accessions sequenced for this study. See Maurin et al. (2007) for details of previously sequenced material. \*Accessions sequenced from herbarium material; <sup>+</sup>Accession listed as Coffea heterocalyx in Maurin et al. (2007).
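As a rough check on what the MCMC settings reported in the Methods imply, the short Python sketch below works out how many sampled states remain after burn-in. The function name is hypothetical and the arithmetic is an approximation under stated assumptions: it ignores the initial state that some programs also write to file and assumes the burn-in fraction is applied to the samples of each run.

```python
def mcmc_sample_counts(generations, sample_freq, burnin_frac, runs):
    """Approximate sampled, discarded and retained states for an MCMC analysis.

    Assumes one state is recorded every `sample_freq` generations and that
    the relative burn-in is applied per run (initial state not counted).
    """
    per_run = generations // sample_freq
    discarded = int(per_run * burnin_frac)
    kept_per_run = per_run - discarded
    return per_run, kept_per_run, kept_per_run * runs


# Settings as reported in the Methods: two runs, 2 x 10^7 generations,
# sampling every 1,000 generations, 25% relative burn-in.
per_run, kept_per_run, kept_total = mcmc_sample_counts(
    generations=2 * 10**7, sample_freq=1000, burnin_frac=0.25, runs=2
)
```

Under these assumptions each run records about 20,000 states, of which about 15,000 are kept, giving roughly 30,000 post-burn-in samples across the two runs.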

### RESULTS

### Literature Review

### Coffea stenophylla (Highland Coffee of Sierra Leone, Rio-Nunez Coffee, Senegal Coffee, Sierra Leone Coffee) (Figures 1, 2A)

Knowledge of C. stenophylla and its commercial potential dates back to at least 1794, based on reports by Adam Afzelius (1750–1837), who worked in, and collected plants from, Sierra Leone (Hiern, 1876). Coffea stenophylla was described as new to science in 1834 (Don, 1834) and was characterized on the basis of having narrow leaves (hence the species epithet) and black fruits (most coffee species have red fruits). Don (1834) stated that it was a 'Native of Sierra Leone, where it is cultivated'. . . and that 'The seeds of this species are roasted and used as the common coffee, and are even considered superior to it.' De Wildeman (1904) reported C. stenophylla as indigenous in Guinea (in the forests that border the Southern Rivers area) and Ivory Coast. From at least the 1850s, the seeds of C. stenophylla were disseminated from Sierra Leone, with the accompanying vernacular names of 'Highland coffee of Sierra Leone' or 'Sierra Leone coffee' (Hooker, 1896). Commercial (cultivated) samples reached the Royal Botanic Gardens, Kew (England) in 1856 (specimens in the Economic Botany Collection, Kew). According to Chevalier (1929), C. stenophylla was being cultivated in quantity in Sierra Leone in the 1890s (c. 1893), and in Guinea, to the extent that it was exported as a commercial product to France. In France it apparently received an exceptionally favorable market price (Scott Elliot, 1893). Living material (seeds) of C. stenophylla was sent to the Royal Botanic Gardens, Kew in May 1894, and from there it was sent to India, Sri Lanka (then known as Ceylon), Trinidad (**Figure 1**), and Java (Cheney, 1925; Scott Elliot, 1893). From Guinea it was sent to Vietnam (Chevalier, 1929) and probably other countries under French colonial rule at that time. In 1904, De Wildeman (1904) provided a summary of the cultivation of C. stenophylla in Guinea, where it seems to have been cultivated in some quantity as Rio-Nunez coffee, after the Nunez River (a major river in Guinea). 
It was also cultivated in Ghana, Senegal (where it was known as Senegal coffee), and Ivory Coast, possibly through early intervention by the Portuguese (Haarer, 1962), and in Uganda (Tothill, 1940). Chevalier (1929) states that the export of C. stenophylla from Sierra Leone and Guinea amounted to around three to five tons (3,000 to 5,000 kg) per year, although this does not include the amount of coffee consumed in these producing countries, which may have been substantial. Coffea stenophylla appears to have been a prominent feature of agriculture in Sierra Leone up until at least the 1920s (Dudgeon, 1922), but it may have been in decline after that time (Chevalier, 1929), perhaps due to a fall in coffee prices (Dudgeon, 1922). Elsewhere, despite all the reports of an excellent flavor (Don, 1834; Scott Elliot, 1893; De Wildeman, 1904, 1906b; Watt, 1908; Macmillan, 1914; Dudgeon, 1922; Cheney, 1925; Chevalier, 1929; Wellman, 1961; Haarer, 1962) and a range of potential agronomic attributes (see below), it did not prevail as a coffee crop species. Chevalier (1929) reported that although it was considered by many to be an exquisite coffee ("Suivant beaucoup de dégustateurs, c'est un café exquis") it was not widespread in global cultivation, due to low yields. Likewise, despite being introduced to Uganda in 1919 and then again in 1931, small bean size and low yield prevented it from being a sustainable coffee crop plant (Tothill, 1940). Despite this, a good agronomic performance at low elevations (e.g., c. 150 m) has been reported for C. stenophylla (Watt, 1908; Cheney, 1925; Bramel et al., 2017), and there is a report of potential resistance to CLR (Cheney, 1925). Given that the natural, and once-cultivated, environments of these two species in Upper West Africa were at relatively low elevations (150–610 m) (Watt, 1908) and that C. stenophylla is reported to withstand dry conditions (Wellman, 1961; Wrigley, 1988), there may also be some resilience to high temperatures and low rainfall, compared to the main crop species.

FIGURE 1 | Coffea stenophylla, cultivated in Trinidad Botanical Garden, with Demerara sugarcanes, photograph taken around 1900. The man in the photograph is 5 ft. 8 in. (1.72 m) tall. Image: Royal Botanic Gardens, Kew.

Since the 1940s, most of the literature on C. stenophylla has been restricted to the recycling of information from previous publications (Wellman, 1961; Haarer, 1962; Wrigley, 1988; Stoffelen, 1998; Davis et al., 2006). The exceptions to this are the reports of dedicated coffee collection missions in Upper West Africa undertaken in the 1980s. Reporting on various missions to Ivory Coast between 1984 and 1987, Le Pierrès et al. (1989) record two wild populations of C. stenophylla from the main forest block of the Sud-Est region of Ivory Coast (north east of Abidjan); and from Guinea, 114 examples of small-scale cultivation of this species in the gardens of local houses (between Boffa and Boke, and around Boke). Berthaud (1986) demonstrated the existence of populations of C. stenophylla in Ivory Coast, from Ira Forest (Forêt L'Ira) and three other localities (populations) in the east of the country. In addition, Berthaud (1983) recorded this species at a dry forest site in the Ouellé area in western Ivory Coast. Berthaud (1986) reported C. stenophylla, C. liberica, and C. canephora in Ira Forest; C. stenophylla was restricted to the upper, drier parts of the hills, whereas the other two species were found in the valley bottoms (lower, wetter areas).

### Coffea affinis (Kamaya Coffee)

This species was first reported in c. 1900 from the coffee research garden of M. Boery in Guinea, having been originally collected from nearby native forests (De Wildeman, 1904). On first inspection, De Wildeman considered these plants to be similar in many characteristics to C. stenophylla (i.e., the presence

FIGURE 2 | Coffea affinis and C. stenophylla. (A) C. stenophylla in fruit, at Centre National de Recherche Agronomique (CNRA), Ivory Coast (image: Charles Denison); (B) C. affinis in flower; (C) C. affinis, fruits and seeds (partially dried); (D) C. affinis, leaves. Images (B–D), from Kambui Hills, Sierra Leone (images: Daniel Sarmu).

of black fruits, rather than the usual red) but different in other respects, particularly leaf size and shape (De Wildeman, 1904). When De Wildeman visited Guinea (De Wildeman, 1904) to observe these plants, which were being grown collectively as Rio-Nunez coffee, he declared that there were two species, C. stenophylla and another species, which he named as new to science: C. affinis. According to De Wildeman (1904), C. affinis was akin to C. stenophylla in the color and shape of the fruit, and shape of the seeds, but differed in its vegetative characters (e.g., stems, leaves, stipules) and mainly by its larger leaves. De Wildeman (1904) believed that C. affinis was of considerable importance as a new coffee crop species, due to its vigor, the quality (high value) of the coffee, and general resistance to disease (compared to Arabica coffee). A contemporaneous photograph of C. affinis (De Wildeman, 1906b) shows a coffee plant that differs from the narrow-leaved C. stenophylla by having larger, broader leaves.

Contrary to the viewpoints of De Wildeman (1904, 1906b), Chevalier (1905) suggested that C. affinis was native in Sierra Leone, as he had no knowledge of it growing wild in Guinea. Subsequently, Chevalier (1929) considered C. affinis to be a hybrid between C. liberica and C. stenophylla. The potential hybrid status of C. affinis was discussed at length by Portères (1937a), who argued against a hybrid origin, particularly in relation to a new coffee plant he considered indigenous to the Ivory Coast, known locally as 'Kamaya.' He drew a close association between 'Kamaya' and C. affinis, but owing to the uncertainty over the application of C. affinis, decided to name this plant C. stenophylla var. camaya. Portères (1937a) reported that C. stenophylla var. camaya was found as a single example in a coffee plantation near Abengourou (**Figure 3**), but that it originated from the wild at a nearby location close to Niabli (6° 39′ N, 3° 16′ W) and that a few small plantations were established in Ivory Coast. A decade later, Chevalier (1947) suggested that C. stenophylla var. camaya and C. affinis were the same species and, in contrast to his earlier report, that it was found in its wild state only in Ivory Coast. Wellman (1961) considered C. affinis to be indigenous to Guinea and Ivory Coast, and suggested that it was a fixed mutation of C. stenophylla. Other workers (Cramer, 1957; Stoffelen, 1998) referred back to the earlier opinion of Chevalier (1905), i.e., that C. affinis is a hybrid between C. liberica and C. stenophylla (Davis et al., 2006).

general collection sites [Boké, Friguiagbé, Coyah (Saliya Forest Reserve), Kasewe (Kasewe Hills Forest Reserve, near Moyamba), Kambui (Kambui Hills Forest Reserve, near Kenema), Ira [Ira Forest (Forêt L'Ira), north of Man], Abengourou and Singrobo]. Species overlap indicates where C. affinis and C. stenophylla occur in the same location.

# Herbarium Collection Survey

For C. affinis the herbarium survey yielded 12 unique records (seven cultivated and five wild records) with a total of 28 herbarium specimens (including duplicates), and for C. stenophylla 50 unique records (29 cultivated, 12 wild, and 9 with no data) and 72 herbarium specimens. Herbarium records associated with these species include the hybrid C. liberica × C. stenophylla, of which there were two unique records, both from cultivated material found in research collections. Compared to many other coffee species the number of herbarium specimens is low, especially for C. affinis. We examined two commercial seed collections of C. stenophylla, one with and one without parchment (pre-milling stage, with endocarp attached) and one of clean (green, pre-roasted) coffee (endocarp removed), and one fruit collection (whole, sun-dried fruits) from the Economic Botany Collection of the Royal Botanic Gardens, Kew (K). Examination of herbarium specimens confirms many aspects of the literature survey (see above).
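The record counts in the paragraph above can be cross-checked with simple arithmetic. The following is a minimal sketch using only the numbers stated in the text; the dictionary layout and the function names are illustrative assumptions and are not part of the study's workflow.

```python
# Herbarium record counts transcribed from the paragraph above.
# The data layout and function names are illustrative assumptions.
UNIQUE_RECORDS = {
    "C. affinis": {"cultivated": 7, "wild": 5},
    "C. stenophylla": {"cultivated": 29, "wild": 12, "no data": 9},
}
REPORTED_TOTALS = {"C. affinis": 12, "C. stenophylla": 50}
# Specimen totals as reported (the C. affinis figure includes duplicates).
SPECIMENS = {"C. affinis": 28, "C. stenophylla": 72}


def summed_records(records):
    """Sum the per-status counts to get unique records per species."""
    return {species: sum(by_status.values()) for species, by_status in records.items()}


def specimens_per_record(specimens, totals):
    """Average number of herbarium sheets per unique record."""
    return {species: specimens[species] / totals[species] for species in specimens}


if __name__ == "__main__":
    print(summed_records(UNIQUE_RECORDS))
    print(specimens_per_record(SPECIMENS, REPORTED_TOTALS))
```

The per-status counts sum to the reported totals for both species, and the specimens-per-record ratio gives a rough sense of how many duplicate sheets accompany each record.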

From the herbarium survey, C. stenophylla is confirmed as an indigenous (wild) species of Guinea, Sierra Leone and Ivory Coast (Davis et al., 2006), and as having also been farmed and otherwise cultivated (e.g., research stations and farms) in these countries, as per the literature (see above). Most of the herbarium specimens date from the late 1800s and early 1900s. The most recent collections for these countries are as follows: Guinea [from the wild, 1941; from (small-scale) farm cultivation, 1961]; Sierra Leone [wild, 1954; from (small-scale) farm cultivation, 1963]; Ivory Coast (wild, 1932; from farms, no data). By comparison, the literature survey reveals small-scale cultivation of C. stenophylla in Guinea and Ivory Coast in the mid to late 1980s (Berthaud, 1983, 1986; Le Pierrès et al., 1989). Herbarium data also show that this species was cultivated in coffee research collections, and other germplasm collections, in Africa (Ivory Coast, Guinea, Nigeria, São Tomé, Sierra Leone, Tanzania, Ghana, and the Democratic Republic of Congo) and in Asia (Vietnam, Java).

The herbarium survey reveals that C. affinis is an indigenous (wild) species of Guinea and Ivory Coast (De Wildeman, 1904). Chevalier could not find any evidence of its wild status in Guinea, referring only to cultivated material from the gardens of Conakry and Cameyenne (Chevalier, 1905), but herbarium data provide clear evidence of collections from natural forests in Guinea (see below). In contrast to the views of Chevalier (1905), and even more recent opinion (Davis et al., 2006), we could not find any evidence of wild C. affinis in Sierra Leone, which is also the case for the literature survey (but see Fieldwork in Sierra Leone, below). A herbarium collection exists from a coffee plantation [Cope s.n., 7 iii 1912 (K)], labeled as C. affinis, which has the key leaf characteristics of this species, but the flowers are absent and the fruit color is not noted. Regarding the native status of C. affinis in Guinea, there are two collections in the Paris herbarium (P) collected from the environs of Boké (Boké Prefecture) that are identifiable as C. affinis and the labels clearly state that the plants were spontaneous (wild): Chillou s.n., 20 xii 1923 (three duplicates); Chillou s.n., 17 xii 1923 (two duplicates); a third specimen collected from Friguiagbé (Kindia Prefecture), by the same collector (Chillou 2381, 3 ii 1941) is also

likely to be spontaneous, although the native/cultivated status is not indicated on the specimen. The most recent collections for these countries are as follows: Guinea (from the wild, 1941; from farms, 1905); Sierra Leone (from a coffee plantation, 1912); Ivory Coast (wild 1930; from farms 1934).

# Fieldwork in Sierra Leone

A collection of 20 samples (leaf samples, images, and DNA samples) was made between 2014 and 2016, resulting from our request for samples of C. affinis and C. stenophylla and of any atypical coffee morphotypes (see the section "Materials and Methods"). Of these, seven were selected for DNA analysis (**Table 1**); the remaining samples conformed to regular variants of C. robusta and C. liberica. Three samples of putative C. stenophylla and C. affinis coffee were collected from the SLARI research collection (**Table 1**). The 10 samples were either considered as potential candidates for C. stenophylla or C. affinis, or hybrids between these species. Visits to sites (in 2017) where C. stenophylla had been recorded in cultivation in northern Sierra Leone failed to produce any coffee sightings. In December 2018, we followed up on the farm survey, visiting five farms that had stated cultivation of C. stenophylla, but no plants of this species or C. affinis were located (only C. canephora). Our visit to Kasewe Hills (Southern Province) resulted in the collection of a single sterile (no flowers or fruits) immature plant, which we preliminarily identified as C. stenophylla. We did not find any plants matching C. stenophylla in forest locations within the Western Peninsula National Park (near Freetown) or near Moyamba Junction, but located a small population (with mature trees up to 7 m tall) matching this species in the forested area of Kambui Hills. At both localities the plants were collected in humid evergreen (lowland) forest at c. 400 m elevation: on a ridge top in the case of Kasewe Hills and on the side of a ridge on steeply sloping ground at Kambui Hills. Further visits to Kambui Hills throughout 2019 and early 2020 yielded further C. stenophylla, and trees provisionally identified as C. affinis, in both flower and fruit (**Figures 2B–D**). DNA samples from the two locations (Kasewe Hills and Kambui Hills) were added to the DNA analyses (see below and **Table 1**).

### DNA Analyses

A total of 120 sequences from four DNA regions, from 31 accessions, were generated for this study and their sequences deposited in GenBank (with NCBI accession numbers; see **Table 2**); 35 species-level reference sequences were downloaded from GenBank, from two previous studies (Maurin et al., 2007; Davis et al., 2011). The ITS alignment had a length of 804 bp, whereas the concatenated plastid dataset (trnL–trnF, rpl16 and accD–psaI) had a total length of 3253 bp. A merged ITS/plastid matrix for a subset of species containing members of the Upper Guinea (UG) Clade, plus two outgroup species, had a length of 3989 bp. Some sequences were missing due to poor sequencing quality: rpl16 in C. stenophylla (2) SL cult. and C. stenophylla (6) SL; and accD–psaI in C. stenophylla (2) SL cult. and C. affinis (1) SL. Accession information is given in **Tables 1**, **2**.
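To illustrate how a concatenated matrix can be built when some regions are missing for some accessions, here is a minimal sketch. It is not the authors' pipeline: the function name, the use of `?` as a missing-data symbol, and the toy sequences are all assumptions made for illustration, and the real alignments are far longer (e.g., 3253 bp for the concatenated plastid dataset).

```python
def concatenate_loci(loci, accessions, missing="?"):
    """Join per-region alignments end to end, padding absent regions.

    loci: dict mapping region name -> {accession: aligned sequence}.
    accessions: accessions to include in the combined matrix.
    missing: symbol used to fill a region an accession lacks (assumed "?").
    """
    widths = []
    for name, alignment in loci.items():
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are not aligned to one length")
        widths.append(lengths.pop())

    matrix = {}
    for accession in accessions:
        parts = [
            alignment.get(accession, missing * width)
            for alignment, width in zip(loci.values(), widths)
        ]
        matrix[accession] = "".join(parts)
    return matrix


# Toy alignments: invented sequences, only a few bases long.
toy_loci = {
    "trnL-trnF": {"acc1": "ACGT", "acc2": "ACGA"},
    "rpl16": {"acc1": "GG-TT"},  # acc2 has no rpl16 sequence
    "accD-psaI": {"acc1": "TTA", "acc2": "TTG"},
}
toy_matrix = concatenate_loci(toy_loci, ["acc1", "acc2"])
```

Every row of the resulting matrix has the same length (the sum of the region lengths), so accessions with a missing region can stay in the analysis rather than being dropped.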

### ITS Analysis (Figure 4)

The results obtained are consistent with previous ITS analyses (Maurin et al., 2007; Davis et al., 2011), in terms of species relationships and their placement into geographically delimited clades. All tetraploid hybrids (4n = 44) between C. arabica and C. racemosa are placed in the East African (EA) Clade (2) with C. racemosa (BS = 1). The diploid hybrid (2n = 22) C. liberica × C. eugenioides is placed (BS = 1) with species of the East-Central Africa (EC-Afr) Clade, in an unresolved position with C. eugenioides. The natural tetraploid hybrid (4n = 44) C. arabica (two accessions) is placed with species of the Lower Guinea/Congolian (LG/C) Clade in the 'Canephora Alliance' (BS = 1), sister to two accessions of C. canephora (BS = 1). Specimens of C. stenophylla collected from the wild in Sierra Leone [specimens (5) SL & (6) SL], other wild accessions of this species, and wild accessions of C. affinis from Sierra Leone [specimens (1) SL & (2) SL], all fall within the Upper Guinea (UG) Clade (BS = 0.93) with C. humilis and C. togoensis. Three cultivated collections from Sierra Leone originally accessioned as C. arabica and C. affinis (×2) were placed in the Upper Guinea (UG) Clade [labeled in **Figure 4** as C. liberica × C. stenophylla SL cult. (1), (2), and (3)]. The historical accessions of farmed C. affinis from Sierra Leone (C. affinis in **Table 1**) were not placed with C. stenophylla or C. affinis, but in a clade with C. liberica and C. montekupensis (BS = 0.79), within the LG/C Clade [labeled as C. sp. (1) and C. sp. (2) in **Figure 4**].

### Plastid Analysis (Figure 5)

The results obtained are consistent with previously obtained plastid analysis using the same markers (Maurin et al., 2007; Davis et al., 2011), in terms of the relationships between species and their placement into geographically delimited clades. The F1 tetraploid hybrids (4n = 44) C. arabica × C. racemosa are placed in the East-Central Africa (EC-Afr) Clade, sister to a clade comprising C. arabica, C. eugenioides, C. kivuensis, and C. anthonyi. In contrast to the ITS analysis, the backcrossed C. arabica × C. racemosa is placed in the EA clade with wild and cultivated C. racemosa accessions. Coffea liberica × C. eugenioides is placed in one of the Lower Guinea/Congolian (LG/C) Clades with two species, C. liberica and C. magnistipula (BS = 0.66) in an unresolved position with three C. liberica accessions (BS = 0.99). The tetraploid (4n = 44) hybrid species C. arabica is placed with species of the EC-Afr Clade, viz. C. eugenioides, C. kivuensis, and C. anthonyi (BS = 0.97). Three cultivated collections from Sierra Leone originally accessioned as C. arabica and C. affinis (×2) were placed within one of the two LG/C clades [labeled in **Figure 5** as C. liberica × C. stenophylla SL cult. (1), (2), and (3)] in an unresolved position in a clade with various C. liberica accessions (BS = 0.97), and a subclade of C. liberica × C. eugenioides, C. liberica and C. magnistipula (BS = 0.66).

All wild and cultivated accessions of C. stenophylla, and wild accessions of C. affinis fall within the UG Clade (BS = 1), which includes C. togoensis and C. humilis. A separate clade containing only all wild and cultivated accessions of C. stenophylla and all

Country abbreviations: CAR, Central African Republic; DRC, Democratic Republic of Congo; IC, Ivory Coast; Moz, Mozambique; SL, Sierra Leone. Clade terminology follows Maurin et al. (2007) and Davis et al. (2011): EA, East Africa; LG/C, Lower Guinea/Congolian; EC-Afr, East-Central Africa; UG, Upper Guinea. All known and identified interspecies hybrids are marked in blue text and with a star (\*).

wild accessions of C. affinis, plus C. humilis, is unresolved (BS = 0.76). Our historical accessions of farmed C. affinis (C. affinis in **Table 1**) from Sierra Leone are not placed with C. stenophylla or C. affinis, but in a clade with C. liberica (BS = 0.98), sister to C. montekupensis (BS = 1), within one of the two LG/C clades [accessions are labeled as C. sp. (1) and C. sp. (2) in **Figure 5**].

### Incongruence Between ITS and Plastid Trees

There are several points of substantial incongruence between the ITS and plastid analyses (**Figures 4**, **5**), including those taxa of known (manmade) or proven (via DNA study) hybrid origin: C. arabica [C. canephora × C. eugenioides (Maurin et al., 2007)], C. arabica × C. racemosa (Medina Filho et al., 1977a,b), and C. liberica × C. eugenioides (Maurin et al., 2007). These incongruencies are anticipated based on the knowledge, DNA and otherwise, that they are hybrids. In this study we identified three samples, originally accessioned (**Table 1**) as C. arabica and C. affinis (×2), whose placements were substantially incongruent between the two analyses. Given the specific positions of these accessions in the analyses (see ITS and plastid results, and **Figures 4**, **5**), and their morphological features, we suggest that they represent the hybrid C. stenophylla × C. liberica. All known and identified interspecies hybrids are marked in the phylogenetic trees (**Figures 4**, **5**) in blue text and with a star (\*).
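The logic of flagging candidate hybrids from conflicting placements can be sketched at clade level. The placements below are transcribed from the ITS and plastid results reported above; the data structure and function name are illustrative assumptions, and the comparison is deliberately coarse (it compares major clades only, so it would not detect conflict within a clade).

```python
# (ITS clade, plastid clade) for selected accessions, transcribed from the
# results text. The layout and function name are illustrative assumptions.
PLACEMENTS = {
    "C. arabica": ("LG/C", "EC-Afr"),
    "C. arabica x C. racemosa (F1)": ("EA", "EC-Afr"),
    "C. liberica x C. eugenioides": ("EC-Afr", "LG/C"),
    "C. liberica x C. stenophylla SL cult.": ("UG", "LG/C"),
    "C. affinis (wild)": ("UG", "UG"),
    "C. stenophylla (wild)": ("UG", "UG"),
    "C. sp. (Cope s.n.)": ("LG/C", "LG/C"),
}


def flag_incongruent(placements):
    """Return accessions whose nuclear (ITS) and plastid clades differ."""
    return sorted(
        name
        for name, (its_clade, plastid_clade) in placements.items()
        if its_clade != plastid_clade
    )


if __name__ == "__main__":
    for name in flag_incongruent(PLACEMENTS):
        its_clade, plastid_clade = PLACEMENTS[name]
        print(f"{name}: ITS={its_clade}, plastid={plastid_clade}")
```

Under this check the known hybrids and the Sierra Leone cultivated accessions are flagged, whereas wild C. affinis and C. stenophylla are placed in the UG Clade by both marker sets.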

### Combined Analysis for the Upper Guinea (UG) Clade (Figure 6)

Combining ITS and plastid data sets for all members of the Upper Guinea (UG) clade, C. humilis, C. togoensis, C. affinis, and C. stenophylla, produced relationships congruent with previous analyses using the same data (Maurin et al., 2007; Davis et al., 2011). The four species of the UG Clade are monophyletic (BS = 1), with C. humilis sister to C. togoensis, C. affinis and C. stenophylla (BS = 1); C. affinis and C. stenophylla form a clade (BS = 0.88) but neither species is monophyletic. The Kambui Hills accessions of C. affinis [(1) SL and (2) SL] and C. stenophylla accessions from Kasewe Hills and Kambui Hills [(5) SL and (6) SL, respectively] are monophyletic (BS = 0.92), and are sister to two accessions of C. stenophylla [(3) IC and (4) IC] from Ivory Coast (BS = 0.76). The C. affinis accession [(3) Guinea] from Guinea falls (BS = 0.78) with C. stenophylla accessions [(1) SL cult. and (7) Brazil cult.], which originate from unknown localities in Sierra Leone.

### DISCUSSION

# Historical and Present-Day Status of C. affinis and C. stenophylla

In 2018, we rediscovered C. stenophylla in two locations in Sierra Leone, one from where it had been collected before (Kasewe Hills, in 1954) and one a new location (Kambui Hills) (**Figure 3**). In both locations, C. stenophylla is extremely localized, and seemingly threatened. In the Kasewe Hills, near Moyamba, we were only able to locate a single plant, in an area of high deforestation. In the Kambui Hills, near Kenema, we located a small population, the extent of which is as yet unknown, but there are ongoing threats from logging, human encroachment, and artisanal gold mining. In late 2019, we located C. affinis in the Kambui Hills, not far from the populations of C. stenophylla. This is the first record of this species from the wild in Sierra Leone. The present-day status of C. stenophylla and C. affinis in Guinea and Ivory Coast is poorly known. In Guinea, C. stenophylla was recorded as being under limited small-scale cultivation in the 1980s (Le Pierrès et al., 1989), but there were no records of wild plants at that time. From a basic survey of remaining forest cover in Guinea, using satellite imagery (Google Earth, 2019), the likelihood of finding C. affinis and C. stenophylla in many of the localities where they were previously recorded as indigenous plants (see the section "Results"; **Figure 3**) is limited, although possible. Field survey in more remote locations in Guinea, in appropriate environments and elevations, may reveal wild populations of these species. Indeed, we report here on a recent collection [2015; Couch 757 (K)] of a sterile (no flowers or fruit) coffee specimen from Guinea, which was identified on the basis of our DNA sequencing and morphology as C. affinis (see below). Despite this encouraging find, deforestation rates in Guinea are very high and ongoing (Couch et al., 2019); in 1992 it was calculated that 96% of the original forest had already been destroyed (Sayer et al., 1992).
In Ivory Coast, the likelihood of finding more extensive wild populations of C. stenophylla and C. affinis is better, particularly as satellite data (Google Earth, 2019) show the existence of native remnant vegetation in localities where these species were previously recorded, and where they are likely to be located. That said, one of the best known forest sites for C. stenophylla in Ivory Coast (Berthaud, 1986), i.e., Ira Forest (Forêt L'Ira), was largely destroyed around 2008. In summary, C. stenophylla and C. affinis are threatened throughout their indigenous ranges, and particularly in Guinea. On the IUCN Red List C. stenophylla is assessed as Vulnerable (VU); C. affinis is currently Data Deficient (DD) (International Union for Conservation of Nature (IUCN), 2020). On the basis of our literature, herbarium, and field surveys, and DNA analysis, C. stenophylla and C. affinis are indigenous species of Guinea, Ivory Coast and Sierra Leone (**Figure 3**).

Historical data show that C. stenophylla and C. affinis were farmed in some quantity in Upper West Africa, and especially C. stenophylla in Guinea and Sierra Leone. Coffea stenophylla was also widely cultivated in research stations across Africa and in various Asian countries; the presence of C. affinis in research stations appears to have been restricted to Upper West Africa (Guinea, Ivory Coast) during the early part of the last century. Our field surveys in Sierra Leone indicate that neither C. stenophylla nor C. affinis is under commercial cultivation (i.e., being farmed), or otherwise cultivated, in the present day. The last confirmed record of C. stenophylla production in Sierra Leone may have been the small plantation at Njala University grounds, which was apparently cut down after being abandoned in the 1980s (A. Lebbie pers. comm.). In Guinea, C. stenophylla was recorded as being under limited small-scale cultivation in the 1980s (Le Pierrès et al., 1989). Coffea stenophylla has

been recorded recently in several coffee research collections, including the Centre National de Recherche Agronomique (CNRA), in Ivory Coast (**Figure 2A**); L'Institut de recherche pour le développement (IRD), France (A. Davis pers. observ.); and Entebbe Botanical Gardens, Uganda (A. Davis pers. observ.). It is also likely to exist in other ex situ collections. However, across the ex situ coffee germplasm network there is the problem of duplication, i.e., the same genotype(s) being represented in multiple sites (Anthony et al., 2007; Bramel et al., 2017; Davis et al., 2019). There also seems to be some confusion with the narrow-leaved variant of C. arabica 'Angustifolia,' which owing to its narrow leaves is sometimes mistakenly accessioned as C. stenophylla. Coffea affinis has not been recorded in coffee research collections since the beginning of the twentieth century. Genotyping by sequencing, or a similar method, is required to make a full assessment of ex situ collections of C. stenophylla, and all other coffee species.

## Natural Habitat (Growing Environment) of C. affinis and C. stenophylla

Knowing the location and associated habitat of crop wild relatives is important, as it can provide an initial assessment of environmental suitability as a crop plant. Our literature survey indicates that C. stenophylla may have drought tolerance characteristics (Portères, 1937b; Wellman, 1961; Wrigley, 1988). Both species occur at relatively low elevations (150–700 m) (De Wildeman, 1906a; Watt, 1908; Portères, 1937b). In Ivory Coast (at Ira Forest) C. stenophylla occurs on the upper, drier parts of hills; in the same location, C. canephora and C. liberica were found in the valleys (i.e., the lower, wetter areas). The locations for this species in Ivory Coast are generally drier than Sierra Leone (see below), with rainfall in the region of 1,500–1,700 mm per year, a 3–4 month dry season (Portères, 1937b), and an average annual temperature of c. 25.5°C. In Guinea, De Wildeman (1906a) reported that in its natural state C. affinis occurs in gallery forest (forest associated with rivers) bordering waterfalls and in humid (evergreen) forest, and that it was frequently found at elevations of 400–700 m at a distance of 100 to 300 km from the sea.

In Sierra Leone our fieldwork located C. stenophylla at c. 400 m at two locations (Kasewe and Kambui Hills), even though there was sufficient forest down to 200 m. We visited higher elevation locations within the Western Peninsula National Park (up to 600 m), but did not locate either of the two species. Sierra Leone is generally not a mountainous country, and most of its land is below 500 m. At Kasewe and Kambui Hills the climate is tropical monsoonal, with an annual rainfall average of 2,350–2,650 mm, an average annual temperature of c. 26°C, and a distinct 3–4 month dry season (November to March/April). It should be noted that there is a considerable difference in rainfall between the wild locations of C. affinis and C. stenophylla in Ivory Coast and Sierra Leone, although this requires careful verification.

Our herbarium survey did not provide a great deal of further information on the habitat of either of these two species. Notes on herbarium specimens for C. stenophylla infrequently state 'hills'; one specimen states 'very common [on] more open places on augite hills,' and another 'dans les montagnes de Sierra Leone,' which suggests an association with high ground.

# Species Status and Systematic Affinities of C. affinis and C. stenophylla

A recent hybrid origin for C. affinis, as a result of crossing between C. liberica and C. stenophylla (Chevalier, 1929; Cramer, 1957; Stoffelen, 1998), is ruled out on three counts: (1) C. affinis is clearly fertile and productive, as evidenced from literature records (Portères, 1937a), herbarium specimens (with plentiful seed), and recent (2020) field observation (D. Sarmu pers. observ.; **Figure 2C**). Diploid interspecies hybrids of coffee are usually sterile, and while they may produce flowers, fruit set and production of viable seed are minimal unless fertility is restored via polyploidization (usually tetraploids) (Carvalho and Monaco, 1968; Charrier, 1978). (2) The fruits of C. affinis are always described as black; the hybrid C. liberica × C. stenophylla (see below) has purple fruits. (3) Our samples of C. affinis do not change position between the ITS and plastid analyses, as known hybrids do [C. arabica, C. arabica × C. racemosa, C. liberica × C. eugenioides, and C. liberica × C. stenophylla (see below)], but instead are consistently resolved as sister to C. stenophylla. The idea that C. affinis is a fixed mutation of C. stenophylla (Wellman, 1961) is also ruled out because of the variation evident in this taxon.

Our DNA analyses infer that C. stenophylla and C. affinis are closely related. Separate ITS (**Figure 4**) and plastid analyses (**Figure 5**) fail to resolve the systematic positions of the four Upper Guinea (UG) clade species (i.e., including C. humilis and C. togoensis), but a combined analysis of these data sets retrieves monophyly for C. affinis and C. stenophylla together (**Figure 6**). Coffea affinis and C. stenophylla share specific characters, including: the habit of a small tree; obovate leaves with a distinct apical tip (acumen) and drying green; 2 to 4 flowers per axil; 6- to 8-merous flowers (i.e., six to eight corolla lobes and anthers per flower); and black or black–purple fruits (see **Figure 2**). Coffea togoensis, from Ghana and Togo, is also a small tree, and has leaves with a distinct apical tip (acumen), 6- to 8-merous flowers and black fruits, but generally has elliptic leaves, drying grayish or gray-green, 1 or 2 flowers per axil, and smaller fruits and seeds. Coffea humilis is unlike C. affinis, C. stenophylla and C. togoensis, as it is a monocaul dwarf (single-stemmed woody plant, up to 1 m high), with large (up to 22 cm long) obovate leaves, 5- to 7-merous flowers and red fruits.

Coffea stenophylla and C. affinis exhibit considerable morphological variation, particularly with regard to leaf size and shape. Some examples of C. stenophylla approach C. affinis in terms of leaf shape and dimensions. Further work, including morphological and more detailed molecular study, is required to determine the precise relationship between these two species. There could be grounds for subsuming C. affinis within C. stenophylla, for example as C. stenophylla var. camaya (Portères, 1937a).

Our historical accession of farmed C. affinis from a coffee plantation in Sierra Leone (two genotypes from a single accession [Cope s.n., 7 iii 1912 (K)]) is not related to either C. affinis or C. stenophylla. ITS and plastid marker data place this accession in the Lower Guinea/Congolian Clade. The specimens of this accession are a reasonable morphological match for C. affinis, with leaves of the same dimensions and possessing an acuminate leaf tip, but the seeds are somewhat larger and narrower than those of C. affinis; flower morphology and fruit color are unknown. The Cope s.n. accession is compelling, as it is placed with the Liberica Alliance but does not conform to any of the known species in this alliance. The distinct, elongated leaf tip (acumen) immediately sets it apart from all variants of C. liberica (Stoffelen, 1998).

In both ITS (**Figure 4**) and plastid (**Figure 5**) analyses C. liberica is not monophyletic. Further molecular data are required for C. liberica and closely related taxa. As currently circumscribed, C. liberica is highly polymorphic (Bridson, 1985, 1988; Stoffelen, 1998; Davis et al., 2006), encompassing a broad range of morphological variation.

## Identification of Coffea Species Hybrids

Identification of interspecies hybrids or introgressed plants on the basis of incongruence between biparental (e.g., nuclear) and uniparental (plastid) phylogenetic trees is well established (Linder and Rieseberg, 2004). In most flowering plants plastid DNA is maternally inherited, whereas nuclear DNA, including the ITS region of ribosomal DNA, is not (Chase et al., 2005). Previous analyses of ITS and plastid markers in Coffea have been used to identify the parents of the allotetraploid C. arabica (C. eugenioides × C. canephora) and the diploid hybrid C. eugenioides × C. liberica (Maurin et al., 2007). As part of this study, we undertook further tests of this method using an additional interspecies hybrid of known crossing history, viz. C. arabica × C. racemosa (Medina Filho et al., 1977a,b), which demonstrated the utility of the method, at least for recently produced hybrids (**Figures 4**, **5**). Using this method, we were able to identify the interspecies hybrid C. liberica × C. stenophylla (**Figures 4**, **5**) from cultivated material collected by us in Sierra Leone (see **Table 1**). The hybrid C. liberica × C. stenophylla has been reported before (Cramer, 1957), but never authenticated. The accession of this hybrid from Sierra Leone appears to be partially or totally sterile; in some years it produces a few fruits, but neither the viability nor the fertility of the seeds has been tested. The low level of fruit setting suggests that this hybrid is a diploid (2n = 22) rather than a polyploid; tetraploids generally have higher fertility or restored fertility (Carvalho and Monaco, 1968; Charrier, 1978). We have no means of knowing whether this hybrid was the result of a cross in cultivation or in the wild. Either origin is plausible: both species were grown together in a number of research stations in Africa and Asia (Cramer, 1957), and in situ crossings have been reported. In Ivory Coast hybrid seedlings (but not mature plants) of C. liberica × C. stenophylla were detected in the wild (Berthaud, 1986). Generally, interspecific hybridization in natural populations of Coffea is a rare phenomenon (Charrier, 1978; Berthaud, 1986).

Interspecies hybrids, once fertility (and thus yield) is restored via conversion to the tetraploid (4n = 44) state (Carvalho and Monaco, 1968; Nagai et al., 2008), are valuable for coffee crop development, as they provide the possibility of introducing useful traits. Examples for Arabica coffee include CLR resistance (C. canephora × C. arabica; Clarindo et al., 2013; Avelino et al., 2015) and leaf-miner resistance (C. arabica × C. racemosa; Medina Filho et al., 1977b). In our DNA survey of long-styled African species [Coffee Crop Wild Relative Priority Groups I and II (Davis et al., 2019)] we confirm that hybridization is possible across all the major African coffee clades (lineages), indicating the potential to create custom interspecies hybrids across this wide spectrum of Coffea species diversity.

## CONCLUSION

Coffea affinis and C. stenophylla may possess useful attributes for coffee crop plant development, including taste, disease resistance, and climate resilience. These attributes would be best accessed via breeding programs, including those involving interspecies crossing, followed by tetraploidization. Here, we confirm that (initial) hybridization is possible across all the major clades of long-styled African Coffea species (Maurin et al., 2007; Davis et al., 2011; Hamon et al., 2017), i.e., Coffee Crop Wild Relative (Priority) Groups I and II (Davis et al., 2019). For C. stenophylla we confirm via DNA sequencing that a cross can be made with C. liberica, supporting the work of Louarn (1992), who also demonstrated that C. stenophylla can be crossed with C. canephora, C. congensis, and C. pseudozanguebariae. Development of C. stenophylla and C. affinis via minimal domestication (e.g., the selection of trait-specific genotypes) may be possible, although this route would probably only be feasible for high-value markets, such as the upper end of the speciality coffee sector, based on the historical reports of superior taste. Productivity (green coffee yield) appears to be lower than that of the major commercial species (C. arabica, C. canephora and C. liberica). A key caveat here is that C. stenophylla has not undergone sensory or agronomic evaluation in a contemporary setting. Despite the shortfall in our understanding of these species, the available evidence, as summarized and reviewed here, is more than sufficient to warrant further research on C. affinis and C. stenophylla, and to take measures to ensure their survival in the wild (in situ) and in cultivation (ex situ). Deforestation, and other forms of land-use change, are threatening the survival of these species in the wild, in Sierra Leone, Ivory Coast and Guinea.

The decline in the use of C. stenophylla as a crop plant in Upper West Africa was dramatic, from once widespread (although comparatively small-scale) use in the late 19th and early 20th centuries, to apparently nothing today. Our survey of local farming communities in Sierra Leone reported an absence of indigenous knowledge for this species. One of the other main reasons for the decline in its use may have been the considerable agronomic and commercial success of robusta coffee (C. canephora), which was introduced into global cultivation around the same time as C. stenophylla and greatly surpassed the comparatively meagre productivity of other underutilized coffee species (Davis et al., 2019). Following on from our fieldwork in Sierra Leone (2018–2020), wild stock of C. affinis and C. stenophylla is now being propagated in quantity, for sensory and agronomic evaluation, and to safeguard its existence. Fieldwork in Guinea and Ivory Coast is required to further ascertain the present-day indigenous and cultivated (farmed) status, respectively, and to provide a conservation management plan to ensure its survival in the wild. An in-depth review of coffee research collections, including genotyping (genome banking), is required to formulate an effective ex situ conservation management strategy for C. affinis and C. stenophylla, and indeed many other coffee species. African coffee species provide key resources for the sustainability of the global coffee sector (Davis et al., 2019) and should receive appropriate conservation measures in the wild (in situ) and in cultivation (ex situ).

# DATA AVAILABILITY STATEMENT

The accession numbers for the sequencing data presented in this article can be found in **Table 2** within the article.

# AUTHOR CONTRIBUTIONS

AD devised the non-fieldwork elements of the study, undertook the literature and herbarium, participated in fieldwork, and was the lead on writing the manuscript. RG undertook the DNA sequencing and analyses, and contributed toward the writing of the manuscript. MF helped to devise the study, provided assistance with DNA sequencing, and contributed toward the writing of the manuscript. DS undertook the bulk of the fieldwork in Sierra Leone. JH devised the overall framework of the study, devised and undertook substantial fieldwork in Sierra Leone, and contributed toward the writing of the manuscript.

# FUNDING

Funding for this research was provided from different sources including the EU funded Agriculture for Development Programme - Robusta Coffee Development Project contract FED/2013/322-213 administered by the Government of Sierra Leone; and a Darwin Initiative Scoping Project DARSC196 - Conservation and use of native coffee species in Sierra Leone.

### ACKNOWLEDGMENTS

We acknowledge the support and collaboration of the Department for Forestry, Ministry of Agriculture, and Sierra Leone Agricultural Research Institute, in Sierra Leone. We would like to offer our sincere gratitude to Welthungerhilfe (WHH), and to staff at their offices in Freetown and Kenema (Sierra Leone) for various support during this project. At WHH we would particularly like to thank Franz Moestl, Manfried Bischofberger, and Derek Wambulwa Makokha. We would also like to thank: Arnaud Havet, at WARC, in Sierra Leone; Martin Cheek at the Royal Botanic Gardens, Kew (UK) for advice on deforestation in Upper West Africa and for showing us project material from Guinea; Charles Denison for providing images of C. stenophylla; and Iris Wiegant and László Csiba for their assistance with lab work at Kew.

### REFERENCES




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Davis, Gargiulo, Fay, Sarmu and Haggar. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Chromosome-Scale Assembly of the Garden Orach (Atriplex hortensis L.) Genome Using Oxford Nanopore Sequencing

Spencer P. Hunt<sup>1</sup>, David E. Jarvis<sup>1</sup>, Dallas J. Larsen<sup>1</sup>, Sergei L. Mosyakin<sup>2</sup>, Bozena A. Kolano<sup>3</sup>, Eric W. Jackson<sup>4</sup>, Sara L. Martin<sup>5</sup>, Eric N. Jellen<sup>1</sup>\* and Peter J. Maughan<sup>1</sup>

<sup>1</sup> Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, United States, <sup>2</sup> M.G. Kholodny Institute of Botany, National Academy of Sciences of Ukraine, Kyiv, Ukraine, <sup>3</sup> Institute of Biology, Biotechnology and Environmental Protection, Faculty of Natural Sciences, University of Silesia in Katowice, Katowice, Poland, <sup>4</sup> 25:2 Solutions, Rockford, MN, United States, <sup>5</sup> Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada

### Edited by:

Eric Von Wettberg, University of Vermont, United States

### Reviewed by:

Namshin Kim, Korea Research Institute of Bioscience and Biotechnology (KRIBB), South Korea

Shanshan Chen, Zhengzhou University, China

> \*Correspondence: Eric N. Jellen jellen@byu.edu

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 28 February 2020 Accepted: 22 April 2020 Published: 25 May 2020

### Citation:

Hunt SP, Jarvis DE, Larsen DJ, Mosyakin SL, Kolano BA, Jackson EW, Martin SL, Jellen EN and Maughan PJ (2020) A Chromosome-Scale Assembly of the Garden Orach (Atriplex hortensis L.) Genome Using Oxford Nanopore Sequencing. Front. Plant Sci. 11:624. doi: 10.3389/fpls.2020.00624

Atriplex hortensis (2n = 2x = 18, 1C genome size ∼1.1 gigabases), also known as garden orach and mountain-spinach, is a highly nutritious, broadleaf annual of the Amaranthaceae-Chenopodiaceae alliance (Chenopodiaceae sensu stricto, subfam. Chenopodioideae) that has spread in cultivation from its native primary domestication area in Eurasia to other temperate and subtropical regions worldwide. Atriplex L. is a highly complex but, as understood now, a monophyletic group of mainly halophytic and/or xerophytic plants, of which A. hortensis has been a vegetable of minor importance in some areas of Eurasia (from Central Asia to the Mediterranean) at least since antiquity. Nonetheless, it is a crop with tremendous nutritional potential due primarily to its exceptional leaf and seed protein quantities (approaching 30%) and quality (high levels of lysine). Although there is some literature describing the taxonomy and production of A. hortensis, there is a general lack of genetic and genomic data that would otherwise help elucidate the genetic variation, phylogenetic positioning, and future potential of the species. Here, we report the assembly of the first high-quality, chromosome-scale reference genome for A. hortensis cv. "Golden." Long-read data from Oxford Nanopore's MinION DNA sequencer was assembled with the program Canu and polished with Illumina short reads. Contigs were scaffolded to chromosome scale using chromatin-proximity maps (Hi-C) yielding a final assembly containing 1,325 scaffolds with an N50 of 98.9 Mb – with 94.7% of the assembly represented in the nine largest, chromosome-scale scaffolds. Sixty-six percent of the genome was classified as highly repetitive DNA, with the most common repetitive elements being Gypsy- (32%) and Copia-like (11%) long-terminal repeats. The annotation was completed using MAKER which identified 37,083 gene models and 2,555 tRNA genes. Completeness of the genome, assessed using the Benchmarking Universal Single Copy Orthologs (BUSCO) metric, identified 97.5% of the conserved orthologs as complete, with only 2.2% being duplicated, reflecting the diploid nature of A. hortensis. A resequencing panel of 21 wild, unimproved and cultivated A. hortensis accessions revealed three distinct populations with little variation within subpopulations. These resources provide vital information to better understand A. hortensis and facilitate future study.

Keywords: Amaranthaceae, Atriplex hortensis, Hi-C, orach, orphan crop, proximity-guided assembly

## INTRODUCTION

Atriplex hortensis L. (2n = 2x = 18), also known as garden orach or mountain-spinach, is a highly nutritious, leafy annual plant. It is a moderately xero-halophytic species that is resistant to salinity, a wide range of temperatures, and drought. Originating in Eurasia, A. hortensis has been a minor vegetable food source in multiple areas of the Trans-Himalayan region and has since become naturalized throughout the Americas. It exhibits incredible variation in pigmentation as a result of its variable content of betalains, as well as substantial differences in height and seed production (Tanaka et al., 2008; Simcox and Stonescu, 2014).

Atriplex hortensis has been recognized for its medicinal properties which were shown to improve digestion, increase circulation and boost the immune system (Rinchen et al., 2017). Additionally, A. hortensis has been used in land rehabilitation projects because of its ability to establish well, grow rapidly, reduce soil erosion and compete with native plants (McArthur et al., 1983; Simon et al., 1994; Wright et al., 2002). As a result, A. hortensis is important for both domestic and wild browsing animals where other forage crops are lacking. Despite its affinity for low to moderate saline areas where it has little competition from non-halophytes, A. hortensis can also grow where total soluble salts are low, making it well suited to a multitude of different environments (Welsh and Crompton, 1995).

As the world continues to search for new ways to feed its ever-growing population, new food sources have gained popularity that have helped provide diversity to diets while capitalizing on less desirable, underutilized or even fallow landscapes for agriculture. Given its xero-halophytic characteristics, A. hortensis is an intriguing candidate for contributing to world food security, especially in areas rich in saline soils. In comparison to other leafy vegetable crops, A. hortensis seeds and leaves are both edible and have protein contents of 26% (dry weight) in seeds, which is comparable to some legumes (Wright et al., 2002), and 35% (dry weight) in leaves, which is higher than spinach (Spinacia oleracea L.), a close relative of A. hortensis also belonging to the same subfamily Chenopodioideae, but to a different tribe (Anserineae; see Fuentes-Bazan et al., 2012). However, the seeds contain antinutritional saponins that must be removed by washing and/or seed abrasion. In this respect A. hortensis resembles its distant relative quinoa (Chenopodium quinoa Willd., a name recently formally proposed for nomenclatural conservation; Mosyakin and Walter, 2018), which also contains saponins. Interestingly, sweet varieties of quinoa have been identified that have a nonsense mutation in the regulator of the saponin biosynthetic pathways (Jarvis et al., 2017) – suggesting similar pathways could be targeted to remove antinutritional saponins in A. hortensis. The seeds of A. hortensis have higher fat, ash, fiber and lysine contents than most cereal grains (Wright et al., 2002). Its high protein content, which includes an essential amino acid profile that meets the WHO and UN-FAO recommended adult levels, also makes A. hortensis very attractive as a novel protein source.

Atriplex hortensis belongs to the family Chenopodiaceae in the strict sense, which is now often included in the extended family Amaranthaceae sensu lato; this group (Chenopodiaceae+Amaranthaceae) is phylogenetically nested in the core clade of the order Caryophyllales, which in turn, belongs to core eudicots, the largest and most diverse clade of angiosperms (for an overview of high-level phylogeny of the group, see Hernández-Ledesma et al., 2015; APG IV, 2016, and references therein).

The merger of the traditionally recognized families Chenopodiaceae and Amaranthaceae sensu stricto into one family under the priority name Amaranthaceae sensu lato, proposed already in the first version of the APG system (APG (Angiosperm Phylogeny Group), 1998), remained unchanged in all other APG modifications (see APG IV, 2016, and references therein). It was widely followed by many researchers and users of botanical nomenclature, but usually not by the experts in taxonomy of Chenopodiaceae (s. str.), who mainly continued to accept the two families. Without discussing here the reasons and arguments for the two concepts of familial and subfamilial delimitation in the Amaranthaceae/Chenopodiaceae alliance (which will be discussed in a separate article, now in progress), we note that the merger of the two families resulted in some confusion and miscommunication in recent literature regarding the usage of family names, and especially names of infrafamilial suprageneric entities (such as subfamilies and tribes). For example, some authors use the subfamily name Amaranthoideae in its traditional sense for just one group in Amaranthaceae s. str., while others may use it to cover all formerly recognized groups in Amaranthaceae s. str. (including Amaranthoideae, Gomphrenoideae, etc.). To avoid any uncertainty, we conventionally use in the present article the following nomenclature (both formal and informal names): (1) the group uniting Chenopodiaceae and Amaranthaceae s. str. (forming together the extended Amaranthaceae sensu APG) is referred to under an informal designation "Amaranthaceae/Chenopodiaceae alliance;" (2) the family-rank names Amaranthaceae and Chenopodiaceae refer to the groups corresponding to the two traditionally recognized families; and (3) the subfamily-rank name Chenopodioideae (in parallel with other recognized subfamilies, such as Betoideae, Salsoloideae, etc.) corresponds to just one group of Chenopodiaceae s. str., but not to the group covering the whole family Chenopodiaceae in its traditional circumscription; similarly, Amaranthoideae refers to the subfamily-rank subdivision of Amaranthaceae s. str., comparable to Gomphrenoideae.

**Abbreviations:** BLAST, basic local alignment search tool; Mbp, megabase pairs; MYA, million years ago; ONT, Oxford Nanopore Technologies; SNP, single nucleotide polymorphism.

Recent molecular phylogenetic and taxonomic studies have led to considerable improvements in taxonomy and in our understanding of phylogenetic relationships in the order Caryophyllales in general and Atriplex and its closer relatives in particular (Kadereit et al., 2010; Zacharias and Baldwin, 2010; Brignone et al., 2019; Morales-Briones et al., 2020). However, few molecular studies have been focused specifically on A. hortensis in recent years.

As it is viewed now, Atriplex is nested in the larger clade corresponding to the tribe Atripliceae (including Chenopodieae, which is the correct name for the group if placed in Chenopodiaceae, not Amaranthaceae s.l.) as outlined by Fuentes-Bazan et al. (2012), and/or to a smaller clade corresponding to the tribe Atripliceae in a narrower sense, as outlined by Kadereit et al. (2010). The Atripliceae in the narrow sense is sister to another clade (informally called Chenopodieae I; see Kadereit et al., 2010) containing Chenopodium s. str. (including Australian Rhagodia R. Br. and Einadia Raf.; see Fuentes-Bazan et al., 2012; Mosyakin and Iamonico, 2017) in its much restricted sense, excluding taxa formerly placed in Chenopodium sensu lato but now recognized in phylogenetically more distant genera Blitum L. (which is close to Spinacia L.), Chenopodiastrum S. Fuentes, Uotila & Borsch, Dysphania R. Br., Lipandra Moq., Oxybasis Kar. & Kir., and Teloxys Moq. (see Fuentes-Bazan et al., 2012; Hernández-Ledesma et al., 2015).

The clade of Atripliceae (sensu Kadereit et al., 2010) contains two main subclades (informally named as the Archiatriplex-clade and Atriplex-clade) with several smaller lineages, some of which are currently recognized as separate genera. As circumscribed now, the phylogenetically coherent and monophyletic Atriplex includes several groups that were earlier described and recognized as separate genera, such as Obione Gaertn. and some Australian and North American groups. Despite morphological distinctiveness of some of those groups, they are phylogenetically deeply rooted in Atriplex and thus their recognition as separate genera is not recommended. In contrast, several genera are recognized in the Archiatriplex-clade, namely Archiatriplex G.L.Chu, Exomis Fenzl ex Moq., Extriplex E.H. Zacharias, Grayia Hook. & Arn., Holmbergia Hicken, Manochlamys Aellen, Microgynoecium Hook. f., Proatriplex (W.A.Weber) Stutz & G.L. Chu, and Stutzia E.H. Zacharias (Kadereit et al., 2010; Brignone et al., 2019). They represent relicts of earlier diversification events in the group. Some additional early-branching ("basal") lineages of the Atriplex-clade can probably also be recognized as separate genera. For example, in addition to the currently recognized genera Halimione Aellen and Atriplex s. str. (Kadereit et al., 2010), such groups as Cremnophyton Brullo & Pavone from Malta containing C. lafrancoi Brullo & Pavone (=Atriplex lafrancoi (Brullo & Pavone) G. Kadereit & Sukhorukov; see Kadereit et al., 2010) and the mainly Central Asian Sukhorukovia Vasjukov (2015) with S. cana (C.A. Mey.) Vasjukov (Atriplex cana C.A. Mey. = Cremnophyton canum (C.A. Mey.) G.L. Chu) may probably be assigned the generic rank after further research.

Since A. hortensis is the nomenclatural type of the genus, it naturally belongs to Atriplex subgen. Atriplex sect. Atriplex (Art. 22.1 of the International Code for Nomenclature of algae, fungi and plants: Turland et al., 2018). This section houses at least two other species, A. sagittata Borkh. (earlier often known under the synonymous name A. nitens Schkuhr) and A. aucheri Moq., which seem to be most closely related to A. hortensis (Sukhorukov, 2006). The clade of A. hortensis and its two close relatives belongs to the grade of early-branching clades of Atriplex s. str. containing taxa with C<sub>3</sub> photosynthesis (Brignone et al., 2019).

The geographic and taxonomic origins of domesticated A. hortensis remain elusive because at present the species is known mostly (or exclusively?) in cultivated and escaped (and locally naturalized) populations. It probably originated somewhere within the geographic ranges of its closest relatives, in Central Asia or adjacent regions, or it could be native in the Mediterranean region and/or Asia Minor (Sukhorukov, 2014).

Although recent studies have tested the limits of salt-tolerance of A. hortensis (Vickerman et al., 2002; Sai Kachout et al., 2011), there has been little to no research conducted to develop genetic tools necessary for accelerating A. hortensis breeding. One phenotypic characteristic in need of improvement for seed production is the panicle, which consists of two types of flowers usually mixed on the same plant. One type produces 3–5 mm diameter seed that are encased within large, papery bracteoles that are not retained well under windy conditions at maturity. The other flower type produces 1–2 mm black fruits/seeds that have no bracteoles but are instead subtended by easily removed tepals.

To better understand the underlying genetic basis of the xero-halophytic, nutritive and unique pigmentation characteristics of A. hortensis, and to more accurately assess phylogenetic relationships within its family and genus, we sequenced the A. hortensis genome. We show that ultra-long reads produced by the portable, real-time Oxford Nanopore Technologies (Oxford, United Kingdom) MinION sequencing system (Lu et al., 2016) with short-read polishing and chromatin-contact mapping is an effective approach to generate a high-quality genome assembly in a moderately large and complex genome of a diploid plant species. We annotated the genome with a deeply sequenced transcriptome from various A. hortensis plant tissues, and we demonstrate the quality of the chromosome-level genome assembly and annotation using Benchmarking Universal Single-Copy Orthologs (BUSCO) (Simão et al., 2015) to assess the completeness of the assembled genome. Genomic comparison to other Caryophyllales within the Amaranthaceae-Chenopodiaceae family identified highly syntenic and orthologous chromosomal relationships. Together, these resources provide an initial, important foundation for accelerated genetic improvement to neodomesticate this potentially valuable crop.

# MATERIALS AND METHODS

# Plant Material

Atriplex hortensis cv. "Golden" was obtained from Wild Garden Seed (Philomath, Oregon) and used for whole-genome sequencing and assembly. Sterilized seed were grown hydroponically in a growth chamber at BYU. An 11-h photoperiod was maintained using broad-spectrum light sources. Growing temperatures ranged from 18 °C (night) to 20 °C (day). Hydroponic growth solution, changed weekly, was made from MaxiBloom® Hydroponics Plant Food (General Hydroponics, Sebastopol, CA, United States) at a concentration of 1.7 g/L.

The resequencing panel consisted of 21 A. hortensis accessions: 15 from the United States Department of Agriculture collection (USDA, ARS, NALPGRU;<sup>1</sup>); five from two separate commercial seed vendors (Baker Creek Heirloom Seed Company, Mansfield, Missouri and Wild Garden Seed, Philomath, Oregon); and one accession collected in the wild in Utah (BYU 1317 from Park City, Utah). Plants used in the resequencing panel were originally collected from across Europe (France, Poland, countries of the former Soviet Union, former Serbia/Montenegro and Norway) and North America (United States and Canada). A complete list of all plant materials including passport information is provided in **Table 1**.

### DNA Extraction, Library Preparation, and Oxford Nanopore Sequencing

The Golden variety of A. hortensis was grown hydroponically in a growth chamber at BYU as previously described. Plants were dark-treated for 72 h at which point young leaf tissue was harvested and extracted for high molecular weight (HMW) genomic DNA using the Qiagen (Germantown, MD) Genomic-tip protocol. The DNA concentration was checked using the dsDNA High Sensitivity DNA Assay on the Qubit® 2.0 Fluorimeter (Invitrogen, Merelbeke, Belgium).

Samples for DNA sequencing were prepared with and without fragmentation using Covaris g-TUBEs (Woburn, MA) and the ZYMO DNA Clean & Concentrator-5 column (Irvine, CA, United States). Samples were fragmented using both the ZYMO DNA kit and Covaris g-TUBEs following manufacturer's instructions. Samples prepared with the Covaris g-TUBEs were fragmented at several centrifugation speeds, including 3,800, 4,000, and 4,200 RPM. In total, nine libraries from the original DNA stock were prepared for sequencing using the 1D Genomic DNA by Ligation MinION library preparation kit. Libraries were sequenced on R9 flow cells on a MinION for 48 h using MinKNOW 2.0 software with the following settings: DNA, PCR-free, no multiplexing, SQK-LSK109 kit (Oxford Nanopore Technologies, Ltd., Oxford, United Kingdom). No alterations were made to voltage or time. Albacore v2.3.1, part of the MinKNOW package, was used for base calling.

<sup>1</sup>https://npgsweb.ars-grin.gov/

## Read Cleaning, Draft Genome Assembly, and Polishing

MinIONQC (Lanfear et al., 2019) was used with default settings to summarize sequence data. NanoFilt (De Coster et al., 2018) was then used to trim and filter reads using the following options: -q 8, --headcrop 25, and -l 2000. Porechop v.0.2.3 (Wick et al., 2017) was used to trim adaptors from sequence data with the default options. Draft genomes were assembled using multiple assemblers, specifically Canu v.1.7.1 (Koren et al., 2017), MaSuRCA v.3.2.8 (Zimin et al., 2013), Flye v.2.3.6 (Kolmogorov et al., 2019) and wtdbg2 (Ruan and Li, 2020). Illumina reads were used to polish the Canu assembly using Nanopolish (Loman et al., 2015) and Pilon v.1.22 (Walker et al., 2014). The completeness of each of the draft genome assemblies was assessed using BUSCO v4 (Simão et al., 2015) using the flowering plant (embryophyta\_odb10) orthologous gene data set. Specific commands and flags for each assembly program used are provided in **Supplementary File 1**.
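As a rough illustration of the read-filtering thresholds above, the following Python sketch applies a minimum mean quality of 8, a minimum length of 2,000 bases and a 25-base head crop to a single read. It is not NanoFilt itself: the function names, the in-memory representation of a read (a sequence string plus a list of Phred scores), the use of the mean error probability to average qualities, and the order of operations (filter first, then crop) are all assumptions made for illustration.

```python
import math

# Thresholds taken from the text; everything else here is illustrative.
MIN_MEAN_QUALITY = 8   # minimum mean read quality
HEAD_CROP = 25         # bases removed from the start of each read
MIN_LENGTH = 2000      # minimum read length in bases


def mean_phred(qualities):
    """Average Phred score, computed via the mean error probability."""
    if not qualities:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in qualities) / len(qualities)
    return -10 * math.log10(mean_error)


def filter_read(sequence, qualities):
    """Return the cropped (sequence, qualities) pair, or None if the read fails."""
    if len(sequence) < MIN_LENGTH:
        return None
    if mean_phred(qualities) < MIN_MEAN_QUALITY:
        return None
    return sequence[HEAD_CROP:], qualities[HEAD_CROP:]
```

A read that passes both thresholds is returned with its first 25 bases removed; a read that fails either threshold yields `None` and would be dropped before assembly.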

# Hi-C Scaffolding

Atriplex hortensis plants (cv. Golden) were dark-treated for 72 h prior to flash-freezing young leaf tissue in liquid nitrogen. Tissue samples were then shipped to Dovetail Genomics (Scotts Valley, CA, United States) for Chicago® and Hi-C proximity ligation sequencing. Dovetail Chicago® libraries are similar to Hi-C libraries but differ in that they rely on library preparation from in vitro rather than in vivo reconstituted chromatin that has been cross-linked and subsequently sheared (Moll et al., 2017). Chromosome-scale scaffolds were generated using Dovetail Genomics' HiRise™ assembler.

### Illumina Sequencing and Transcriptome Assembly

The "Golden" variety of A. hortensis was grown hydroponically as previously described. Plants were either grown in a control hydroponic solution or in hydroponic solution supplemented with NaCl. For the salt treatment, NaCl was added daily at 50 mM increments to the hydroponic solution of 21 day old plants until a concentration of 350 mM NaCl was reached (7 days). Tissue for RNA extraction was harvested 24 h after 350 mM NaCl concentration was reached. Root, stem and leaf tissue was harvested from both control and treated plants. One-week old whole plantlet and inflorescence (tissue and immature seed) tissues from untreated plants were also collected.

The Orach Genome

fpls-11-00624 May 20, 2020 Time: 17:23 # 5

TABLE 1 | Identification and passport information for plant materials used for the genome sequencing and the resequencing panel.

Accessions of A. hortensis originated throughout Europe and North America. For the whole genome sequencing, Hi-C scaffolding and transcriptome data, the source tissue and/or library type is provided in the Latitude/Longitude column. <sup>1</sup>Meters above sea level. <sup>2</sup>Sequence read archive accession number for each resequenced line. Maintained by National Center for Biotechnology Information (https://dataview.ncbi.nlm.nih.gov/objects?linked\_to\_id=PRJNA607334). <sup>3</sup>N/A indicates no data. <sup>4</sup>Treated with 350 mM NaCl (see section "Materials and Methods").

In total, seven libraries were prepared with 180-bp inserts. Sequencing was conducted using the Illumina HiSeq platform at the Beijing Genomics Institute (Shenzhen, China). Reads were trimmed and quality controlled using the program Trimmomatic v0.35 (Bolger et al., 2014). RNA-seq data were aligned to the Hi-C assembly using HISAT2 v2.2.1 with the max intron length set to 50,000 bp (Kim et al., 2015). Data were then assembled into potential transcripts using StringTie (Pertea et al., 2015) with default parameters.

### Repeat Analysis and Annotation

Atriplex hortensis-specific repeats were identified using RepeatModeler v.1.0.11 (Smit and Hubley, 2008–2015). RepeatMasker v.4.0.7 (Smit et al., 2013–2015) was used to classify A. hortensis-specific repeats using the RepBase database version 20160829. The MAKER2 v2.31.10 pipeline (Holt and Yandell, 2011) was used to annotate the A. hortensis genome with ab initio gene predictions using AUGUSTUS (Stanke et al., 2004) species-specific gene models for A. hortensis. Additional evidence sources for the annotations included expressed sequence tags (EST) and protein homology from the transcriptomes of C. quinoa (Jarvis et al., 2017) and C. pallidicaule Aellen (Mangelson et al., 2019) as well as the A. hortensis transcriptome produced from the RNA-seq data previously described. The uniprot\_sprot database (downloaded 11/13/2018) was used for Basic Local Alignment Search Tool (BLASTp)-based annotation of the gene models.

### Resequencing

Genomic DNA from each of the 21 A. hortensis accessions was extracted using the mini-salts protocol reported by Todd and Vodkin (1996). The DNA concentrations and quality were checked using the dsDNA BR Assay on the Qubit® 2.0 Fluorimeter. Libraries were sent to Novogene (San Diego, CA, United States) for whole-genome Illumina HiSeq X Ten sequencing (2 × 150-bp paired-end). Reads were trimmed with Trimmomatic using default parameters (Bolger et al., 2014). Reads from each accession were then aligned to the final A. hortensis reference genome with Bowtie2 using the very-sensitive-local flag (Langmead and Salzberg, 2012) to produce BAM files, in which PCR duplicates were then marked using the MarkDuplicates subroutine in the Picard package.<sup>2</sup> Single nucleotide polymorphism (SNP) genotype likelihoods and covariances were then determined from the 21 accessions with ANGSD, using a p-value of 10E-06 for a site being variable (Korneliussen et al., 2014), to produce a genotype likelihood (beagle) file. Multivariate analysis of the covariance data was accomplished using PAST4 (Hammer et al., 2001), while population structure and admixture were then inferred using PCAngsd (Meisner and Albrechtsen, 2018) at K = 3 based on the DeltaK method described by Evanno et al. (2005). Bootstrapped (n = 1000) UPGMA phylogenies based on Euclidean similarity indices were produced using PAST4 (Hammer et al., 2001).
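To make the clustering step more concrete, the sketch below builds a UPGMA (average-linkage) tree from Euclidean distances between a handful of points. It is only a minimal illustration and not the PAST4 implementation: the accession labels and coordinates are hypothetical, no bootstrapping is performed, and the tree is returned as nested tuples without branch lengths.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two coordinate lists of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(points):
    """Cluster labelled points by average linkage; return a nested-tuple tree."""
    # Each cluster is a pair: (tree so far, list of member labels).
    clusters = [(label, [label]) for label in sorted(points)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                members_i, members_j = clusters[i][1], clusters[j][1]
                # UPGMA distance: mean of all pairwise member distances.
                total = sum(
                    euclidean(points[a], points[b])
                    for a in members_i
                    for b in members_j
                )
                dist = total / (len(members_i) * len(members_j))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical coordinates for four accessions forming two tight pairs.
example = {"A": [0.0, 0.0], "B": [0.0, 1.0], "C": [5.0, 5.0], "D": [5.0, 6.0]}
tree = upgma(example)
```

With these made-up coordinates the two close pairs are joined first and the pairs are merged last, which is the behavior expected of average linkage.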

# Cytogenetics and Genome Size Estimation

Atriplex hortensis cv. Golden seeds were germinated on petri dishes for 36 h. Root meristems were collected and immersed in ice water for 24 h. Root meristems were then treated for another 24 h in a 3:1 mixture of ethanol (95%) – glacial acetic acid. Root tips were prepared under a dissecting microscope where they were placed on slides, treated with iron-acetocarmine, warmed on an alcohol burner, and squashed. Chromosomes were examined using a Zeiss Axioplan 2 phase-contrast microscope and images were captured on an Axiocam (Carl Zeiss, Jena, Germany) CCD camera. Fluorescent in situ hybridization (FISH) rDNA images of mitotic chromosome preparations of A. hortensis cv. "Triple Purple" were taken using yellow-green fluorescing digoxigenin to highlight the NOR-35S region and red fluorescing rhodamine to highlight the 5S region using the protocol described by Maughan et al. (2006). Chromosome spreads and DNA probes for FISH were prepared using the protocol described in Kolano et al. (2012).

Genome-size estimation was conducted using a Beckman Coulter (Miami, FL, United States) Gallios flow cytometer by Agriculture and Agri-Food Canada (AAFC) as described by Yan et al. (2016). Samples were analyzed in triplicate (technical replicates) conducted over three different days. Characteristics of the fluorescence peaks including mean, nuclei numbers, and coefficients of variation were determined using the R package flowPloidy (Smith et al., 2018). The 2C DNA value of each sample was calculated as: (mean of sample G1 peak/mean of standard G1 peak) × 2C DNA content (pg) of the radish (Raphanus raphanistrum subsp. sativus, estimated 515 Mb) standard.
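The calculation above, together with the conversion of 1 pg = 978 Mbp used for Table 3, can be sketched in a few lines of Python. The function names are hypothetical, and the peak means and the 2C content of the standard are left as inputs because their measured values are not given in the text.

```python
MBP_PER_PG = 978  # 1 pg of DNA = 978 Mbp (Dolezel et al., 2003)


def sample_2c_pg(sample_g1_mean, standard_g1_mean, standard_2c_pg):
    """2C DNA content (pg) of a sample from its G1 peak relative to a standard."""
    return (sample_g1_mean / standard_g1_mean) * standard_2c_pg


def haploid_size_mbp(two_c_pg):
    """Haploid (1C) genome size in Mbp from a 2C DNA content in pg."""
    return (two_c_pg / 2) * MBP_PER_PG


# The 2.4 pg value reported in Table 3 corresponds to roughly 1.17 Gb.
size_mbp = haploid_size_mbp(2.4)
```

Halving the 2.4 pg value gives 1.2 pg for the haploid genome, and multiplying by 978 Mbp per pg gives about 1,174 Mbp, in line with the reported estimate of approximately 1.17 Gb.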

# RESULTS

### Library Fragmentation

Since Oxford Nanopore Technologies (ONT) sequencing is still relatively new, we tested the relationship between fragmentation strategies, read length and total sequence output to discover the optimal sample preparation method. To achieve sufficient coverage, we developed nine different libraries that were each sequenced independently on different flow cells. In total, the nine libraries yielded 65.4 Gb of data from 5,525,447 reads with a read length N50 of 22,087 bp, a mean read length of 13,487 bp and a mean quality score of nine (**Table 2**). Individual DNA libraries prepared with fragmentation (Covaris g-TUBEs and ZYMO DNA concentrator-5 column kit) or without fragmentation produced dramatically varied results in terms of read lengths and total sequence yield. Not unexpectedly, the library prepared without fragmentation produced the longest read lengths (N50 = 40,434 bp) but also exhibited the lowest overall

<sup>2</sup>http://broadinstitute.github.io/picard/


TABLE 2 | Oxford Nanopore library preparation and sequencing statistics. Non-fragmentation and fragmentation techniques were used in sample preparation.

sequence yield (1.26 Gb). Fragmentation using the Covaris g-TUBEs at different centrifugation speeds (3,800, 4,000, and 4,200 RPM) produced variable results, but with general trends, specifically: (1) Covaris g-TUBE fragmented libraries always outproduced the non-fragmented library (average yield = 8.67 Gb), but the N50 of the read lengths of these libraries was always smaller (average N50 = 17,166 bp), and (2) lower centrifugation speeds produced longer read lengths (3,800 RPM = 17,979 bp vs 4,200 RPM = 15,939 bp), but with lower yield (3,800 RPM = 8.5 Gb vs 4,200 RPM = 10.9 Gb; **Table 2**). The two libraries fragmented with the ZYMO DNA kit were intermediate between the non-fragmented library and the Covaris-fragmented libraries, with an average yield of 3.93 Gb and a read length N50 of 28,495 bp, suggesting that the ZYMO DNA kit only minimally fragmented the DNA.
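
Because the comparison above rests on the read length N50, a minimal sketch of how that statistic is usually computed may be helpful. The function name and the toy read lengths are illustrative only and are not taken from the sequencing runs.

```python
def read_n50(lengths):
    """Return the N50: the largest length L such that reads of
    length >= L together contain at least half of all bases."""
    if not lengths:
        raise ValueError("at least one read length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with made-up read lengths (not real data).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(read_n50(toy_lengths))
```

A few very long reads can raise the N50 without raising yield, which is why the two metrics are reported side by side in Table 2.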

### Genome Assembly

Flow cytometry indicated that the A. hortensis genome is approximately 1.172 Gb (**Table 3**), while karyotyping of cell nuclei showed that A. hortensis carries nine pairs of chromosomes (2n = 2x = 18). In the A. hortensis karyotype, chromosomes were metacentric to slightly submetacentric (**Figure 1**), and similar in length.

Multiple assemblers were tested to determine which would best assemble the A. hortensis genome. These assemblers included Canu (Koren et al., 2017), MaSuRCA (Zimin et al., 2013), Flye (Kolmogorov et al., 2019) and wtdbg2 (Ruan and Li, 2020). All assemblers were run with default parameters. The wtdbg2 assembler produced the

TABLE 3 | Flow cytometry results of A. hortensis (cv. "Golden") leaf tissue. A C-value of 2.4 picograms yielded a genome size estimate of 1.17 Gb.


<sup>1</sup>Haploid genome size was calculated as 1 pg = 978 Mbp per Dolezel et al. (2003).

largest number of contigs and the Flye assembler produced the smallest N50 (**Table 4**). Both the wtdbg2 and the Flye assemblers produced smaller (total genome size) assemblies relative to the MaSuRCA and Canu assemblers. Both the MaSuRCA and Canu assemblers produced >1 Mb contig N50s, with the Canu assembler producing the least collapsed assembly (relative to the predicted genome size of 1.2 Gb).

The MaSuRCA assembler uses a hybrid approach for assembly that initially utilizes high-quality short-reads to produce super-reads that are then scaffolded and gap-filled with the long reads to produce high-quality scaffolds that do not require error correction (Zimin et al., 2013). The Flye, Canu and wtdbg2 assemblers are based solely on the error-prone long reads and are thus considered unpolished assemblies and require polishing to correct the inherently high sequencing error rate associated with the ONT technology. We polished the Flye, Canu and wtdbg2 draft assemblies with Nanopolish, which uses the original ONT reads for consensus correction along with two rounds of Pilon, which in turn uses the high-quality Illumina short reads for correction to produce a high-quality, polished set of draft assemblies for comparison (**Table 4**). We evaluated the final assemblies using Benchmarking Universal Single Copy Orthologs (BUSCO) (Simão et al., 2015) which quantifies gene content completeness based on a large core set of highly conserved orthologous genes (COGs). After polishing, BUSCO

TABLE 4 | Assembly and Benchmarking Universal Single Copy Orthologs (BUSCO) statistics for the MaSuRCA, Flye, wtdbg2, Canu and Hi-C scaffolded Canu assemblies.


%Complete COGS found [%single, %duplicate] 90.0% [85.8%, 4.2%] 90.6% [88.1%, 2.5%] 86.0% [83.5%, 2.5%] 97.3% [95.1%, 2.2%]<sup>2</sup> 96.7% [95.0%, 1.7%]

<sup>1</sup>Scaffold statistics are reported for the MaSURca and Canu Hi-C assembly. All other assembly statistics are contig metrics.

was used to identify complete COGs within the various assemblies, which ranged from a low of 86.0% in the Flye assembly, to a high of 97.3% in the Canu assembly. The necessity of the polishing steps was reflected in the increasing BUSCO scores after successive rounds of polishing. For example, the BUSCO scores for complete COGs identified for the original, Nanopolished, Nanopolished+Pilon and Nanopolished+Pilon+Pilon Canu assemblies were 50.5, 73.8, 90.1, and 97.3%, respectively.

Both the MaSuRCA and Canu assemblers produced superior assemblies based on the total size of the contigs, contig N50, and BUSCO scores; however, the polished Canu assembly was ultimately chosen as the draft genome for Hi-C scaffolding due to concerns of repeat collapse within the MaSuRCA assembly as reflected in the smaller total size of the contigs. The polished Canu assembly resulted in 3,183 contigs, spanning 965 Mb, a contig N50 of 1.114 Mb, an L50 of 223, and a BUSCO score of 97.3% (**Table 4**).

### Chromosome-Scale Scaffolding

To further improve the Canu assembly, contigs were scaffolded using chromatin-contact maps from Dovetail Chicago® and Hi-C libraries. Chicago® library contact maps are based on in vitro reconstituted DNA and are ideal for detecting and correcting misjoins in de novo assemblies as well as short-range scaffolding (Putnam et al., 2016). A total of 163 million read pairs (70X coverage) were generated from the Chicago® library and were used to detect misalignments and scaffold the Canu assembly using the HiRiSE™ scaffolder. In total, 429 breaks and 1,421 joins were made, resulting in a net decrease in the total number of scaffolds to 2,191 and a slight decrease in N50 (817 kb) for the assembly. Whenever a join was made between contigs, an "N" gap, consisting of 100 Ns, was created. The total percent of the genome in gaps was less than 0.1%.

The Chicago®-based assembly was then further scaffolded using an in vivo Hi-C library created from native chromatin to produce ultra-long-range (10–10,000 kb) mate-pairs. A total of 200 million mate-pair reads, representing a physical coverage of 62×, were generated and scaffolded using the HiRiSE™ scaffolder. In total, 868 joins and no breaks were made, producing a final assembly containing 1,325 scaffolds, spanning a total sequence length of 965 Mb with an N50 and L50 of 98.9 Mb and 5 scaffolds, respectively. Nine chromosome-scale scaffolds were assembled containing 94.7% of the total sequence length. The chromosome-scale scaffolds ranged in size from 93.6 to 113.5 Mb and were numbered sequentially based on scaffold length (e.g., Ah1–Ah9). Scaffold joins produced by the Hi-C mate-pairs introduced new "N" gaps in the assembly, thereby increasing the number of gaps in the assembly to 2,295. The final number of "N" nucleotides in the final Hi-C assembly was 229,050 (<0.1%; **Table 2**).
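
The scaffold bookkeeping for the Chicago step and the final gap fraction can be checked with simple arithmetic. The sketch below uses only figures quoted in the text; the function names are hypothetical, and the 965 Mb assembly size is a rounded value, so the percentage is approximate.

```python
def scaffolds_after(start, breaks, joins):
    """Each break adds one sequence; each join removes one."""
    return start + breaks - joins


def gap_percent(n_bases, assembly_bp):
    """Percentage of the assembly made up of 'N' gap bases."""
    return 100.0 * n_bases / assembly_bp


# Polished Canu contigs, then the Chicago breaks and joins.
after_chicago = scaffolds_after(3183, 429, 1421)

# 'N' bases in the final Hi-C assembly over a ~965 Mb assembly.
final_gap_pct = gap_percent(229_050, 965_000_000)

print(after_chicago, f"{final_gap_pct:.3f}%")
```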

A BUSCO analysis of the final Hi-C assembly identified 1,331 (96.8%) complete COGs from the Embryophyta database (n = 1375), of which only 1.7% (23) were duplicated – reflecting the diploid nature of the Atriplex genome and suggesting that only minor paralogous duplications have occurred. Another nine (0.7%) fragmented COGs were identified. Only 35 COGs were missing, which is indicative of a highly complete assembly.
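
For readers who want to reproduce the percentages, the following sketch recomputes them from the counts given above. The function and key names are hypothetical and are not part of BUSCO itself; the counts are the ones stated in the paragraph.

```python
def busco_summary(complete, duplicated, fragmented, missing):
    """Summarize BUSCO-style counts as percentages of the database size.

    'duplicated' is a subset of 'complete', so the database size is
    complete + fragmented + missing.
    """
    total = complete + fragmented + missing

    def pct(count):
        return round(100.0 * count / total, 1)

    return {
        "total": total,
        "complete_pct": pct(complete),
        "duplicated_pct": pct(duplicated),
        "fragmented_pct": pct(fragmented),
        "missing_pct": pct(missing),
    }


summary = busco_summary(complete=1331, duplicated=23, fragmented=9, missing=35)
print(summary)
```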

### Repeat Features

The RepeatModeler and RepeatMasker pipelines were used to annotate and mask the repeat fraction of the Hi-C assembly. Approximately 66% (639.6 Mb) of the genome was annotated as repetitive, which is slightly higher than the repetitive fraction classified for other members of the Amaranthaceae/Chenopodiaceae alliance with reference genomes [48% in Amaranthus hypochondriacus L. (Lightfoot et al., 2017), 64% in Spinacia oleracea (Li et al., 2019), 63% in Beta vulgaris L. (Flavell et al., 1974), 64.5% in C. quinoa (Jarvis et al., 2017)]. The most common repeat elements identified were long-terminal repeat retrotransposons (LTR-RT). The LTR-RTs are the most abundant genomic component in flowering plants (Du et al., 2010; Galindo-Gonzalez et al., 2017) and their frequency is strongly correlated with increased genome size (Michael, 2014).


TABLE 5 | Repetitive element classification for final assembly (Canu Hi-C) as reported by RepeatMasker.

<sup>a</sup>LINE, long interspersed nuclear elements; LTR, long terminal repeat; RC, Rolling circle. <sup>b</sup>The most common mono- di-, tri-, and tetra- nucleotide repeat motifs were (T)n, (TA)n, (ATT)n, (TTTA)n, respectively.

Of the various LTR-RTs present in the A. hortensis genome, Gypsy-like (31.90%) and Copia-like (11.13%) elements represent more than 40% of the genome and are in a 3:1 (Gypsy:Copia) ratio, similar to the 2.9:1 ratio reported for 50 sequenced plant genomes (Ou and Jiang, 2018). An additional 5.14% (49.5 Mb) of the genome was classified as low-complexity (satellites, simple repeats, and rRNAs), while 14.86% (143 Mb) of the genome was characterized as unclassified repetitive elements (**Table 5**) – presumably representing Atriplex-specific repeat elements that will undoubtedly be important for understanding the evolution of the A. hortensis genome.

To locate the 35S rRNA gene (NOR) in the A. hortensis genome, a BLAST search (Altschul et al., 1990) was conducted using the complete rRNA gene sequence of C. quinoa (DQ187960.1) as the query. The 35S rRNA locus was located on chromosome Ah6. Another BLAST search was conducted to identify matches for the 5S rRNA gene locus in A. hortensis, again using the homologous 5S rDNA repeat sequence in C. quinoa (DQ187967.1) as the query. The 5S rDNA sequences mapped primarily to chromosome Ah4 and to several other smaller unscaffolded contigs that did not assemble into specific chromosomes. The appearance of these smaller scaffolds in the BLAST search results was not surprising as 5S rDNA repeats are highly repetitive and of low complexity, and thus extremely difficult to assemble and scaffold accurately. A FISH analysis of mitotic chromosome preparations for A. hortensis cv. Triple Purple revealed a physical location of a single NOR-35S (green) locus and of a single 5S (red) rRNA gene tandem repeat array locus (**Figure 1**). The identification of the cytological and genomic position of the 5S rRNA and 35S rRNA gene loci gives unique identities to two of the nine chromosome pairs in the A. hortensis karyotype (specifically Ah4 and Ah6).

The sequence for telomeric repeats in plants is highly conserved and has been identified as TTTAGGG (Richards and Ausubel, 1988). A BLAST search of this sequence motif against the nine A. hortensis chromosomes identified tandemly repeated telomeric sequences on at least one end of each of the nine chromosome assemblies with a total of 13 telomere-like repetitive regions identified (**Figure 2**). Four of the nine chromosomes had telomere-to-telomere assemblies (telomeres identified on both ends of the chromosome assembly).
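
As an illustration of what a tandem telomeric array looks like in sequence data, the sketch below scans a sequence for runs of the TTTAGGG motif and its reverse complement. It uses a regular expression rather than the BLAST search used in this study; the function name, the `min_copies` threshold and the toy sequence are all assumptions made for the example.

```python
import re


def find_telomere_arrays(seq, min_copies=3):
    """Return (start, end, strand, copies) for each run of at least
    min_copies tandem TTTAGGG units ('+') or CCCTAAA units ('-')."""
    seq = seq.upper()
    hits = []
    for strand, unit in (("+", "TTTAGGG"), ("-", "CCCTAAA")):
        pattern = re.compile("(?:%s){%d,}" % (unit, min_copies))
        for match in pattern.finditer(seq):
            copies = (match.end() - match.start()) // len(unit)
            hits.append((match.start(), match.end(), strand, copies))
    return sorted(hits)


# Toy sequence (not real data): arrays at both ends of a short spacer.
toy = "CCCTAAA" * 4 + "ACGTACGTAC" + "TTTAGGG" * 5
print(find_telomere_arrays(toy))
```

An exact-match scan like this misses degenerate repeat units, which is one reason an alignment-based search is the more forgiving choice on real assemblies.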

### Genome Annotations

A de novo A. hortensis transcriptome, derived from 30–40 million RNA-seq reads each from stem, leaf, floral and whole plantlet tissues, consisted of 272,255 isoforms with an N50 of 3,325 bp and a mean length of 1,956 bp. The A. hortensis transcriptome, along with the EST and peptide models from C. quinoa and C. pallidicaule and the UniProt/Swiss-Prot database, were provided as primary evidence for annotation in the MAKER pipeline. The RNA-seq data mapped with high efficiency to the final genome assembly, with an overall alignment rate of 92% and with 81.5% of the paired reads aligning concordantly exactly one time, with only 4.26% aligning more than once concordantly – suggestive of a high-quality genome assembly and reflective of the diploid nature of the A. hortensis genome. The MAKER pipeline identified a total of 39,540 gene models and 2,555 tRNA genes. The average length of genes identified by MAKER was 1,750 bp. The completeness of the annotation was assessed by BUSCO which identified 1,278 (92.9%) complete COGs from the transcript annotation (Complete: 92.9% [Single Copy: 90.4%, Duplicated: 2.5%], Fragmented: 4.7%, Missing: 2.4%). To assess the quality of the annotations, we used the mean Annotation Edit Distance (AED) which is calculated by combining annotation values corresponding to specificity and sensitivity. AED values of 0.5 and below are considered good annotations, and values of 0.30 and below are considered high quality annotations (Holt and Yandell, 2011). Over 90% of the gene models have an AED

value <0.5, with the majority (51.7%) of the models having AED values below 0.325 (**Figure 3**). An analysis of the completeness of the gene models was further assessed by comparing the matched length of the transcripts with orthologous C. quinoa transcripts. Orthologs were determined using BLAST analysis (e-value < 1e-20) with the max target set to 1. Of the 18,657 orthologs identified with C. quinoa, ∼80% (14,764) covered at least 70% of the C. quinoa orthologs. The AED score coupled with the BUSCO assessment and ortholog analysis are suggestive of a high-quality genome assembly and annotation. In addition, the observed chromosomal distribution of the annotated genes, with higher

gene density near the ends of chromosomes and lower gene density in the centromeric regions (**Figure 2**), is suggestive of a high-quality genome assembly and annotation. An examination of the self-synteny map (not shown but accessible via CoGe) revealed no obvious blocks of paralogous genes.
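
The AED used above combines sensitivity and specificity into a single score. A minimal nucleotide-level sketch, assuming the common definition AED = 1 − (sensitivity + specificity)/2, is given below. The interval representation and function names are hypothetical, and MAKER's own implementation works on its evidence alignments rather than on this simplified input.

```python
def covered_positions(intervals):
    """Set of base positions covered by half-open (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def annotation_edit_distance(annotation, evidence):
    """AED = 1 - (sensitivity + specificity) / 2 at the nucleotide level.

    sensitivity: fraction of evidence bases covered by the annotation
    specificity: fraction of annotation bases supported by evidence
    """
    ann = covered_positions(annotation)
    evi = covered_positions(evidence)
    if not ann or not evi:
        return 1.0  # no support at all
    shared = len(ann & evi)
    sensitivity = shared / len(evi)
    specificity = shared / len(ann)
    return 1.0 - (sensitivity + specificity) / 2.0


# Toy gene model (not real data): half of it overlaps the evidence.
print(annotation_edit_distance([(0, 100)], [(50, 150)]))
```

Under this definition a score of 0 means the model and the evidence agree base for base, and a score of 1 means no overlap, which is why lower AED values indicate better-supported gene models.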

### Genomic Comparison and Features

Several species within the Amaranthaceae/Chenopodiaceae alliance have chromosome-scale genome assemblies, including the ancient allotetraploids C. quinoa (Jarvis et al., 2017) (2n = 4x = 36) and Amaranthus hypochondriacus (Lightfoot

et al., 2017) (2n = 4x = 32), and the diploid Beta vulgaris (Funk et al., 2018) (2n = 2x = 18). Previous phylogenetic research using chloroplast DNA (rbcL gene, atpB-rbcL spacer) and nuclear rDNA internal transcribed spacer (ITS), clearly demonstrated that Atriplex is more closely related to Chenopodium and Beta, which are both found in the same family Chenopodiaceae s. str. but in different subfamilies, Chenopodioideae (Chenopodium and Atriplex) and Betoideae (Beta), while Amaranthus is more distantly related to these chenopods and is found within the family Amaranthaceae s. str., subfamily Amaranthoideae (Kadereit et al., 2003, 2010; Fuentes-Bazan et al., 2012; Morales-Briones et al., 2020). Syntenic relationships between A. hortensis and these other genomes were explored using DAGChainer (Haas et al., 2004), which identifies syntenic blocks of collinear homologous gene pairs between genomes.

A synteny analysis of A. hortensis and B. vulgaris identified 226 shared syntenic blocks between the genomes, with 11,697 collinear gene pairs (averaging 52 gene pairs/block) spanning 469 and 616 Mb of the B. vulgaris and A. hortensis genomes, respectively. Moreover, syntenic block sizes between the species were also correlated (R<sup>2</sup> = 0.36), further reflecting a shared ancestry of the two species. One-to-one orthologous relationships between A. hortensis and B. vulgaris chromosomes (**Figure 4** and **Table 6**) were clearly ascertained for six of the nine A. hortensis chromosomes: Ah1 = Bv5 (100% shared syntenic block sequence); Ah2 = Bv4 (99%); Ah4 = Bv8 (100%); Ah5 = Bv3 (100%); Ah7 = Bv7 (100%); Ah8 = Bv2 (100%). The remaining three A. hortensis chromosomes shared substantial levels of synteny with multiple B. vulgaris chromosomes, suggestive of intergenomic rearrangements (i.e., reciprocal translocations), with Ah3 = Bv6 (51%), Bv9 (49%); Ah6 = Bv6 (52%), Bv1 (48%); and Ah9 = Bv9 (54%), Bv1 (44%). We note that we cannot exclude that these rearrangements are possible misassemblies – although our Hi-C data strongly support the current placements.
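
The percentages quoted above express, for each A. hortensis chromosome, how its syntenic bases are divided among B. vulgaris chromosomes. The sketch below shows one way such shares could be derived and summarized. The function names, the 95% cutoff for calling a one-to-one relationship and the toy base counts are all assumptions chosen to mirror the reported pattern; they are not the values in Table 6.

```python
def syntenic_shares(bases_by_target):
    """Percentage of a chromosome's syntenic bases per target chromosome."""
    total = sum(bases_by_target.values())
    if total == 0:
        return {}
    return {target: 100.0 * bases / total
            for target, bases in bases_by_target.items()}


def describe_orthology(shares, one_to_one_cutoff=95.0):
    """Label a chromosome as one-to-one or split (cutoff is an assumption)."""
    ranked = sorted(shares, key=shares.get, reverse=True)
    if shares[ranked[0]] >= one_to_one_cutoff:
        return "one-to-one with " + ranked[0]
    return "split across " + ", ".join(ranked[:2])


# Toy syntenic base counts in Mb (illustrative only).
ah2_like = syntenic_shares({"Bv4": 99, "Bv1": 1})
ah3_like = syntenic_shares({"Bv6": 51, "Bv9": 49})
print(describe_orthology(ah2_like))
print(describe_orthology(ah3_like))
```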

Atriplex hortensis and B. vulgaris are both diploid and share a haploid chromosome number (n = 9), whereas C. quinoa is an allotetraploid member (showing amphidiploid inheritance) of the subfamily Chenopodioideae, having experienced an ancient allopolyploidization event (Storchova et al., 2015). Our analysis of synteny between C. quinoa and A. hortensis identified a combined total of 24,710 syntenic gene pairs, spanning 1.1 Gb and 1.3 Gb of the C. quinoa and A. hortensis genome, respectively, using a tetraploid-to-diploid (2:1) analysis. The synteny observed among the A. hortensis and C. quinoa chromosomes suggests several orthologous relationships with known homeologous C. quinoa chromosome pairs, including Ah1 = Cq5A (51% shared syntenic block sequence), Cq5B (49%); Ah2 = Cq4A (50%), Cq4B (48%); Ah4 = Cq8A (48%), Cq8B (51%); Ah5 = Cq3A (50%), Cq3B (50%); Ah7 = Cq7A (49%), Cq7B (43%); Ah8 = Cq2A (52%), Cq2B (43%). As with B. vulgaris, A. hortensis chromosomes Ah3, Ah6 and Ah9 have large rearrangements showing synteny to Cq1A&B, Cq6A&B and Cq9A&B.

Atriplex hortensis, B. vulgaris and C. quinoa share a base chromosome number of x = 9, whereas the base number in Amaranthus is x = 8, due to a chromosome loss (Am5) and a chromosome fusion (Am1; Lightfoot et al., 2017). The amaranths belong to the family Amaranthaceae s. str., subfamily Amaranthoideae, and were thus expected to be the most divergent of the three genomes compared. Indeed, while our genome comparison of A. hortensis with A. hypochondriacus clearly showed synteny (**Figure 4** and **Table 6**), the size of the 410 syntenic blocks (12,306 syntenic gene pairs) observed was the smallest of the three genomes (Bv: 2.1 Mb/block; Cq: 2.7 Mb/block; Am: 0.84 Mb/block), accompanied by the lowest block size correlation between the species (Bv: R<sup>2</sup> = 0.36; Cq: R<sup>2</sup> = 0.42; Am: R<sup>2</sup> = 0.04). These decreases are reflective of the more distant evolutionary relationship between Atriplex and Amaranthus within the family. We confirm the chromosome

fusion event in Amaranthus as seen by the synteny plot of Ah3, where Ah3 aligns twice with Am1 (**Figure 5**; red arrow). Although many additional rearrangements are present which obscure one-to-one orthologous chromosome relationships with the known homeologous amaranth chromosomes (Lightfoot et al., 2017), several can be confirmed: Ah2 = Am4 (46%), Am6 (52%); Ah4 = Am9 (52%), Am14 (48%); Ah7 = Am8 (58%), Am15 (42%; **Table 6**).

To elucidate the timing of the evolutionary events that separate Atriplex from C. quinoa, B. vulgaris and A. hypochondriacus, we calculated the rate of synonymous substitutions per synonymous site (Ks) in duplicate gene-pairs between the species (**Figure 6**) using the CodeML (Yang, 2007) tool on the CoGe platform (genomevolution.org/coge). As expected, C. quinoa is most closely related to A. hortensis, with a clear peak present at Ks = 0.25, followed by B. vulgaris (Ks peak = 0.55), while A. hypochondriacus is the most distantly related, with a Ks peak of 0.7. The timing of the divergence events (time to last common ancestor) can be established using the Ks peak values and synonymous mutation rates, such as the core eukaryotic rate (8.1E-09) proposed by Lynch and Conery (2000) or with lineage-specific rates, calibrated to the fossil record. Kadereit et al. (2003) used three paleobotanical fossils to establish a much lower synonymous substitution rate for Chenopodioideae (2.8–4.1E-09), which showed rate constancy among the lineages studied,

TABLE 6 | Orthologous genes were identified between A. hortensis and beet (A), C. quinoa (B) and Am. hypochondriacus (C) to detect orthologous chromosome relationships.

Total syntenic bases are shown between all chromosome comparisons. Syntenic relationships are colored red and transition to white as the amount of synteny decreases.

suggesting that the Amaranthaceae-Chenopodiaceae have a lower nucleotide substitution rate than other angiosperms, including the Arabidopsis rate (1.5E-08). The CodeML workflow in CoGe identifies syntenic gene pairs between species, extracts coding sequences, and aligns protein sequences using the Needleman–Wunsch alignment algorithm, which is then back-translated to a codon alignment that is then used for Ks estimation. Using the lower substitution rates calculated by Kadereit et al. (2003), we date the last shared common ancestor between A. hortensis and C. quinoa, B. vulgaris, and

FIGURE 5 | Genomic comparison of A. hortensis with B. vulgaris, Am. hypochondriacus and C. quinoa. Synteny dotplot showing syntenic coding sequences between A. hortensis and B. vulgaris (A), Am. hypochondriacus (B) and C. quinoa (C) coding sequences. Increasing color intensity is associated with increasing homology.

A. hypochondriacus to approximately 30.4–44.6 MYA, 67.1–98.2 MYA, and 85.3–125 MYA, respectively.
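
The reported date ranges are consistent with the standard relation T = Ks / (2r), where r is the synonymous substitution rate per site per year and the factor of two accounts for substitutions accumulating along both lineages. The sketch below assumes that this relation was the one applied; the function and variable names are illustrative, while the Ks peaks and rate bounds are the values given in the text.

```python
def divergence_time_mya(ks_peak, rate_per_site_per_year):
    """Divergence time in millions of years: T = Ks / (2 * r)."""
    return ks_peak / (2.0 * rate_per_site_per_year) / 1e6


# Chenopodioideae rate bounds from Kadereit et al. (2003), as cited above.
RATE_SLOW, RATE_FAST = 2.8e-9, 4.1e-9

ks_peaks = {
    "C. quinoa": 0.25,
    "B. vulgaris": 0.55,
    "A. hypochondriacus": 0.7,
}

for species, ks in ks_peaks.items():
    youngest = divergence_time_mya(ks, RATE_FAST)  # faster rate, younger date
    oldest = divergence_time_mya(ks, RATE_SLOW)    # slower rate, older date
    print(f"{species}: {youngest:.1f} to {oldest:.1f} MYA")
```

The width of each range comes entirely from the uncertainty in the rate, so any recalibration of the rate shifts all three estimates proportionally.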

### Resequencing

A diversity panel consisting of 21 diverse accessions of A. hortensis (**Table 1**) was re-sequenced using Illumina paired-end sequencing, resulting in an average of 13X coverage (13.2 Gb) per accession. Following alignment and genotype likelihood calling with ANGSD (Korneliussen et al., 2014), a total of 17,711,684 SNPs were filtered from the 846,491,542 sites analyzed using a 5% minimum minor allele frequency. In a principal components analysis of the covariance data, PC1 and PC2 together explained 99.92% of the total variation and clearly identified three clusters of Atriplex accessions, which also agreed with our DeltaK analysis of the number of groups in the data set (K = 3; **Figure 7B**). Analysis of the 1000-bootstrap, consensus tree identified three distinct clades, with two accessions including the commercial cultivar Triple Purple and a wild accession collected in Alberta, Canada forming the first clade. The second clade consisted of four cultivated accessions of Serbia/Montenegro origin with a single wild accession at their root originating from France. The last and largest clade consisted of two subgroups with the first subgroup consisting of five cultivated lines from Serbia/Montenegro and a second subgroup consisting of four

commercially available cultivars (obtained from Wild Garden Seed and Baker Creek Heirloom Seeds) and four accessions from disparate localities across Europe (Poland, Uzbekistan, Norway and Serbia) that were rooted by a wild accession collected in Utah, United States (**Figure 7A**). The structure plots (**Figure 7B**) indicate little to no admixture among the subpopulations, suggesting three distinct subpopulations with little to no interbreeding.
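
To make the 5% minor allele frequency filter concrete, the sketch below applies it to hard-called genotypes. This is a deliberate simplification: ANGSD estimates allele frequencies from genotype likelihoods rather than from called genotypes, and the function names, site labels and toy genotypes here are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as 0, 1 or 2 alternate alleles.

    Missing calls are given as None and are ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2.0 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_sites(sites, min_maf=0.05):
    """Keep only sites whose minor allele frequency is at least min_maf."""
    return {site: g for site, g in sites.items()
            if minor_allele_frequency(g) >= min_maf}


# Toy panel (not real data); 'rare' has one alternate allele in 21 samples.
toy_sites = {
    "monomorphic": [0, 0, 0, 0],
    "common": [0, 1, 2, None],
    "fixed_alt": [2, 2, 2, 2],
    "rare": [0] * 20 + [1],
}
print(sorted(filter_sites(toy_sites)))
```

With 21 diploid accessions, a single heterozygous call gives a frequency of 1/42, or about 2.4%, so singleton variants fall below a 5% threshold.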

# DISCUSSION

Multiple different libraries were prepared for ONT sequencing, including with and without fragmentation, to ascertain the influence of fragmentation on sequencing yield and read length – both important components of successful genome assembly. Fragmentation consistently improved throughput and yield, with the Covaris g-TUBEs producing the most effective and least variable fragmentation (i.e., based on sequencing yield and read length variation). The effect of centrifugation speed (3,800, 4,000, and 4,200 rpm) was also an important, albeit less controllable, factor. In general, higher centrifugation speeds produced higher yields, but concomitantly with decreased read lengths. Indeed, flow cell nanopores remained active for longer periods with fragmented libraries as compared to those without fragmentation. Kubota et al. (2019) demonstrated a similar correlation between DNA length and nanopore inactivity, with inactivity increasing exponentially in relation to increasing DNA molecule size. Nivala et al. (2013) suggested that one possible reason for this could be that longer molecules correlate with an increased presence of secondary and/or tertiary structures in the DNA molecules. Nanopores are restricted to the width of one DNA molecule at a time; thus, if secondary and/or tertiary structures are present in the DNA molecules, they increase the probability of clogging the nanopores, rendering them inactive. The combination of Covaris g-TUBE libraries prepared with differing centrifugation speeds resulted in a dataset with enough yield to provide ample coverage to compensate for the high error rate of ONT sequencing while still yielding long reads needed to span repetitive or otherwise problematic genomic regions (**Table 2**).

Canu (Koren et al., 2017), MaSuRCA (Zimin et al., 2013), Flye (Kolmogorov et al., 2019) and wtdbg2 (Ruan and Li, 2020) assemblers were used to assemble the ONT sequence data to ascertain which assembly program would perform best with the A. hortensis ONT sequence data. There were substantial differences in the overall time to finish an assembly, with wtdbg2 being the fastest of the assemblers tested. However, the MaSuRCA and Canu assemblies produced superior assemblies in terms of total contig size, N50, L50 and BUSCO statistics (**Table 4**), with the polished Canu assembly ultimately being chosen as the draft genome for Hi-C scaffolding due to concerns of repeat collapse within the MaSuRCA assembly as reflected in its smaller total size of contigs assembled – a concern noted by Kolmogorov et al. (2019) who demonstrated the difficulty of assembling telomeric and centromeric chromosome regions

with the MaSuRCA assembler. Three rounds of polishing were conducted utilizing Nanopolish followed by two rounds of Illumina read-based Pilon correction. Nanopolish uses an index to detect misassemblies based on sequencing-generated signal levels generated from the original nanopore sequence data that correspond to likelihood ratios, while Pilon uses read alignments of high-quality Illumina reads to consensus-correct the draft genome (Walker et al., 2014; Loman et al., 2015). RACON (Vaser et al., 2017), another popular long-read consensus polisher that can use ONT and Illumina sequence for consensus correction, was also tested as a substitute for both Nanopolish and Pilon but showed no significant enhancement to the final BUSCO statistics (data not shown). We note that over-polishing an assembly can also be problematic, as seen by a decrease in BUSCO scores, and should therefore be avoided. In our assembly, a third round of polishing did not improve BUSCO scores.

Unsurprisingly, the B subgenome of C. quinoa, which is approximately 25% larger than the assembled A subgenome, shared more and longer syntenic blocks (209 vs 189; 2.9 vs 2.4 Mb average) with A. hortensis. The higher synteny with the B subgenome of C. quinoa may also reflect a closer ancestry of A. hortensis with the B subgenome – whose closest extant known Chenopodium species (C. suecicum Murr or C. ficifolium Sm.) are of Old-World origin, similar to that of A. hortensis. We note that the A-subgenome of C. quinoa is suspected to be of New World origin with its closest known extant species being C. watsonii A. Nelson, which is native to the southwestern United States (Jellen et al., 2019). It should, however, be noted that at least one diploid with the A-genome, C. bryoniifolium Bunge, is native to eastern Siberia (Walsh et al., 2015; Mandák et al., 2018a). Moreover, a new allohexaploid species containing the A-subgenome from C. bryoniifolium has been recently described from the Far East

of Russia as C. luteorubrum Mandák & Lomonosova (Mandák et al., 2018b). Thus, an East Asian origin of the A-genome lineage in Chenopodium, with its subsequent trans-Beringian migration and explosive diversification in the Americas, cannot be excluded at the present state of our knowledge; however, that scenario is less parsimonious than the New World origin of the A-genome lineage.

The genome of A. hortensis is highly repetitive with approximately 66.3% of the sequence containing interspersed repetitive sequence. By comparison, the genome of quinoa is 64.5% repetitive (Jarvis et al., 2017). Genomes that contain substantial repeat fractions can be difficult to assemble correctly. To overcome this challenge, Hi-C chromosome-contact maps were used for genome scaffolding which dramatically increased the continuity of the assembly, producing nine chromosome-sized scaffolds presumably representing each of the haploid chromosomes in A. hortensis (n = 9). Additionally, the Hi-C chromatin contact maps leverage the spatial orientation of the chromatin to identify and correct misassemblies in the overlap-layout-consensus assembly produced by Canu that potentially would have gone unnoticed. The nine chromosome pairs in A. hortensis are metacentric to slightly submetacentric (**Figure 1**). Due to the difficulty in assembling highly conserved and repetitive sequence regions within telomeres, the identification of 13 of the possible 18 telomeric ends is indicative of a highly complete, chromosome-scale genome assembly (**Figure 2**). The unexpected location of telomeric sequences in the subtelomeric region of one of the arms of chromosome Ah5 could reflect a potential assembly error – although careful inspection of the chromatin maps for this region does not show any indications of misassembly. Similar paracentric inversions have been seen in other species which result in telomere-specific tandem repeats being present in abnormal locations in plant chromosomes (Tek and Jiang, 2004). Nonetheless, additional research, potentially including optical mapping (e.g., Bionano Genomics) and/or high-density linkage map development (neither of which has been developed for A. hortensis) should be targeted to this region to verify the orientation of this segment of the chromosome.
Such investigations will also help verify the assemblies of chromosomes Ah3, Ah6, and Ah9, which show syntenic relationships with multiple B. vulgaris and C. quinoa chromosome arms, thus obscuring their orthologous relationships. Such research would undoubtedly provide additional insight into the chromosomal evolution that characterizes the family Chenopodiaceae s. str. and the whole Amaranthaceae/Chenopodiaceae alliance – such as the homeolog loss and chromosomal fusion reported in Amaranthus hypochondriacus (Lightfoot et al., 2017).

It is not surprising that the North American-derived materials grouped with European accessions, as it is commonly understood that the center of origin of A. hortensis is the Trans-Himalayan (central Asia and Siberia) and Southeast European regions and that it was likely introduced during the third century B.C. into the Mediterranean littoral and from there to the Americas in Colonial times (Ruas et al., 2001). The species has become locally naturalized along riverbanks, roadsides, and ditches in parts of the Great Basin of North America (personal observations). There is also evidence of its use as a food in Switzerland as early as the Neolithic Age (Andrews, 1948), suggesting widespread, albeit ancient use of the species. Unfortunately, the United States National Plant Germplasm System curates only 45 A. hortensis accessions, of which very few are publicly available. The identification of three highly distinct clades, showing only limited admixture in our results, emphasizes the need for additional collections of wild and cultivated germplasm from throughout its native range, particularly in its presumed center of origin. Indeed, of the 45 curated accessions at the USDA, nearly two-thirds (28) are derived from a single European region corresponding to the Balkan Peninsula in Southeast Europe (Serbia-Macedonia). Subsequent phylogenetic analysis using materials from much broader geographic collections should improve our understanding of extant genetic variation and speciation processes within A. hortensis.

# DATA AVAILABILITY STATEMENT

The raw sequences are deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive database under the BioProject ID PRJNA607334 with the following sequence read archive (SRR) accession numbers: SRR11147376 (Nanopore data), SRR11147367 – SRR11147368 (Hi-C), SRR11147369 – SRR11147375 (transcriptome) and SRR11123164 – SRR11123184 (resequencing panel; **Table 1**). Bulk data downloads, including annotations and BLAST analysis, and JBrowse viewing of the final Hi-C assembly are available at CoGe (https://genomevolution.org/coge/; Genome ID 56906). The scaffold names in CoGe corresponding to specific pseudo chromosome assemblies are as follows: Scaffold\_552\_HRSCAF\_710 = Ah1; Scaffold\_579\_HRSCAF\_742 = Ah2; Scaffold\_1312\_HRSCAF\_2063 = Ah3; Scaffold\_481\_HRSCAF\_623 = Ah4; Scaffold\_390\_HRSCAF\_510 = Ah5; Scaffold\_1281\_HRSCAF\_1836 = Ah6; Scaffold\_1313\_HRSCAF\_2064 = Ah7; Scaffold\_1311\_HRSCAF\_2062 = Ah8; Scaffold\_291\_HRSCAF\_384 = Ah9. In addition, the variant call file (VCF) for the diversity panel is available as a download from CoGe (ID# 15277).
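
For users who download the assembly, the scaffold-to-chromosome correspondence listed above can be applied programmatically. The sketch below is one possible approach; the function name is hypothetical, and it assumes that FASTA headers carry the scaffold names exactly as written in this statement, which should be checked against the downloaded file.

```python
# Scaffold names as listed in the Data Availability Statement.
SCAFFOLD_TO_CHROMOSOME = {
    "Scaffold_552_HRSCAF_710": "Ah1",
    "Scaffold_579_HRSCAF_742": "Ah2",
    "Scaffold_1312_HRSCAF_2063": "Ah3",
    "Scaffold_481_HRSCAF_623": "Ah4",
    "Scaffold_390_HRSCAF_510": "Ah5",
    "Scaffold_1281_HRSCAF_1836": "Ah6",
    "Scaffold_1313_HRSCAF_2064": "Ah7",
    "Scaffold_1311_HRSCAF_2062": "Ah8",
    "Scaffold_291_HRSCAF_384": "Ah9",
}


def rename_fasta_headers(lines, mapping=SCAFFOLD_TO_CHROMOSOME):
    """Yield FASTA lines with mapped scaffold names replaced by Ah1-Ah9.

    Headers not present in the mapping are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name, _, description = line[1:].partition(" ")
            new_name = mapping.get(name, name)
            yield ">" + new_name + (" " + description if description else "")
        else:
            yield line


# Toy FASTA content (not real sequence).
toy_fasta = [">Scaffold_552_HRSCAF_710 len=1", "ACGT", ">contig_7", "GGCC"]
renamed = list(rename_fasta_headers(toy_fasta))
print(renamed)
```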

### AUTHOR CONTRIBUTIONS

PM, ENJ, and EWJ conceived and designed the study. SH and DL performed the sequencing experiments and managed the plant material. SM performed the flow cytometry experiments and provided plant material, taxonomic information, and phylogenetic analysis. EWJ provided the assembly bioinformatics and expertise. BK performed the fluorescent in situ hybridization experiments. PM, SH, and ENJ wrote the manuscript. All authors read and approved the final manuscript.

### FUNDING

This work was supported by funding from General Mills, Inc., and internal funds of Brigham Young University.

### ACKNOWLEDGMENTS


We gratefully acknowledge the assistance of the USDA, ARS, NALPGRU at Parlier, California, and Frank Morton at Wild Garden Seed Co., for seed contributions and to General Mills, Inc., for financial assistance in carrying out this project. We are also indebted to Isaac Clouse and other undergraduate students who helped take care of plants and performed DNA extractions. We are also grateful to Dr. Daniel J. Fairbanks (Utah Valley University, Orem, Utah, United States) for sharing his extensive insight of the Atriplex genus.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00624/full#supplementary-material




**Conflict of Interest:** The authors declare that this study received funding from General Mills, Inc. The funder had provided financial assistance based on an initial interest in investigating development of new specialty crops. The authors declare that the funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication. EJ was initially employed by General Mills and subsequently left to form 25:2 Solutions.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Hunt, Jarvis, Larsen, Mosyakin, Kolano, Jackson, Martin, Jellen and Maughan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Seed Structural Variability and Germination Capacity in Passiflora edulis Sims f. edulis

### Nohra Rodríguez Castillo<sup>1</sup> , Luz Marina Melgarejo<sup>1</sup> \* and Matthew Wohlgemuth Blair<sup>2</sup> \*

<sup>1</sup> Laboratory of Plant Physiology and Biochemistry, Department of Biology, Universidad Nacional de Colombia, Bogotá, Colombia, <sup>2</sup> Department of Agricultural and Environmental Sciences, Tennessee State University, Nashville, TN, United States

Purple passion fruit, Passiflora edulis Sims f. edulis, is an important semi-perennial, fruit-bearing vine originating in South America that produces a commercial tropical juice pulp for international and national consumption. Within the round purple passion fruit are a large number of membranous seed sacs, each containing an individual seed. Little is known about the seed anatomy of the commercial passion fruit, differences between wild-collected and commercial types, and its effect on seedling germination. Therefore, our main objective for this study was to analyze the variability in seed anatomy of different germplasm as well as its effect on the viability and germination of the seeds of this species. Germplasm was evaluated from three sources: (1) commercial cultivars grown in current production areas, (2) genebank accessions from the national seed bank, and (3) landraces collected across different high and mid-elevation sites of the Andean region. A total of 12 morphometric descriptors related to seed anatomy were evaluated on the 56 genotypes, of which three were most informative in separating the seed according to its source of origin: the angle to the vertex, which is related to the shape of the seed; the thickness of the tegument; and the horizontal length. Germination was found to be positively correlated with the number (r = 0.789) and depth (r = 0.854) of seed pitting. Seeds of the commercial cultivars had more seed pits and higher germination compared to seeds of landraces or genebank accessions, showing a possible effect of domestication on the crop. Interestingly, passion fruits often germinate during the rainy season as escaped or wild seedlings, especially in the disturbed landscapes of coffee plantations, so some dormancy is needed in such settings, whereas faster germination is needed for intensive cultivation.
Harnessing passion fruit diversity would be useful, as the semi-domesticated landraces have valuable adaptation characteristics to combine with the rapid germination selected in the commercial cultivars. The variability of seed pitting, with cultivars more pitted than landraces and possibly germinating faster as a result, may indicate that purple passion fruit is still undergoing a process of selection and domestication for this trait.

Keywords: morphometric variability, seed anatomy, germination, viability, descriptors

### Edited by:

Petr Smýkal, Palacký University Olomouc, Czechia

### Reviewed by:

Alma Orozco Segovia, National Autonomous University of Mexico, Mexico

Anca Macovei, University of Pavia, Italy

### \*Correspondence:

Luz Marina Melgarejo lmmelgarejom@unal.edu.co

Matthew Wohlgemuth Blair mblair@tnstate.edu

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 11 November 2019 Accepted: 03 April 2020 Published: 29 May 2020

### Citation:

Castillo NR, Melgarejo LM and Blair MW (2020) Seed Structural Variability and Germination Capacity in Passiflora edulis Sims f. edulis. Front. Plant Sci. 11:498. doi: 10.3389/fpls.2020.00498

# INTRODUCTION

Seed structural variability is related to germination capacity and can be a step in the process of domestication that distinguishes wild accessions from semi- or fully domesticated crops (Koornneef et al., 2002). Fruit crops propagated by seed may be subject to this type of selection given that they are often grown in hedgerows rather than in row crop agriculture. Purple passion fruit, Passiflora edulis Sims f. edulis, is an important seed-propagated, semi-perennial, fruit-bearing vine, which bears fleshy, spherical "pepo type" berries with purple rinds that are filled with up to 200 small black seeds (Morton, 1987). The species originated in South America, and exports from there and other tropical regions make it a major source of fruit pulp (Thokchom and Mandal, 2017). While passion fruit species can be propagated clonally (Salomão et al., 2002; Gelape et al., 2019), to date there are no seedless varieties and all are seed propagated or grafted onto seedlings. Thus, seed physiology is important to study.

Within the purple passion fruit are many membranous seed sacs containing individual seeds (Morton, 1987). These are produced within a single ovary after self-pollination assisted by carpenter bees (Garcia, 2010; de Lira et al., 2016). The number of seeds per fruit and the seed size and shape typically vary within and between species of Passiflora (Pérez-Cortéz et al., 2002). When the fruit is pressed for juice, the pithy rind is cracked and liquid is removed from the sacs, with the seed remaining as a by-product (Macedo et al., 2015). Although the seeds are edible, they are poorly studied but contain interesting nutritional and biochemical properties (Devi et al., 2018). The species is of great importance for varietal maintenance and production (Morton, 1987) but has spatula type embryos that are inhibited from germination by physical and physiological dormancy (Black, 1972; Baskin and Baskin, 1998; Koornneef et al., 2002).

Little is known about the seed anatomy of purple passion fruit and its effect on seedling germination or its variability among landraces and wild collected accessions. Seed characteristics are key to the seed dormancy, seedling propagation and plant survival of passion fruits (de Oliveira et al., 2010), and plants in general (Baskin and Baskin, 1998). The sweet flesh around the seeds of passion fruits is important for seed dispersal by birds, serving as an attractant and motivator for them to pick through the thick egg-like rind around the fruit (Torres, 2018). The round fruits can fall off naturally from the vine after a long period of drying on the plant and can roll downhill to settle in a moist ditch, where they are likely to germinate as a mass of seedlings (Veiga et al., 2013). This form of whole fruit dispersal can be considered advantageous in the natural environment of Passiflora spp., which grow mostly in hillside cloud forests of South America (Delanoy et al., 2006).

Dormancy and germination rates are other factors that can control the number of progeny a plant can produce and are variable in both space and time, with large environmental effects (Smýkal et al., 2014). Semi-domesticated passion fruit has been shown to have strong dormancy effects (de Oliveira et al., 2010; Torres, 2018). Slow germination for wild passion fruit seed can be overcome by stratification and temperature (Ramírez et al., 2015). The effect of seed structure on dormancy is less well studied than biochemical factors that inhibit germination in cultivated passion fruit (Araújo et al., 2007). Seed coat shape and thickness have been shown in some other plant families to affect germination (Chernoff et al., 1992; Gabr, 2014). In this study, we observed differences in the seed coat pitting abundance of the lignified tegmen of purple passion fruit genotypes, which relate to viability and ultimately to germination. Seed coat characteristics of dry seed influence electrical conductivity and desiccation tolerance, which in turn affect seed preservation strategies as well as viability/germination (Baskin and Baskin, 1998; Veiga et al., 2013; Mira et al., 2015).

Dry seed characteristics, the focus of this study, are important for plant establishment and can be related to fruit characteristics as well (Macedo et al., 2015). The ratio of seed to pulp and juice has been investigated in purple passion fruit by comparing fresh mucilaginous and dry seed weight and is one factor that distinguishes commercial cultivars from landraces and affects the economic yield of the crop (Rodríguez et al., 2019, 2020). Seed/pulp ratio along with fruit size and fruit wall thickness are important parameters for crop improvement in breeding programs in East Africa and Asia (Matheri et al., 2016; Liu et al., 2017), but little work has been done on seed size, germination rate and their relation to pulp mass.

The objective of this research was to evaluate purple passion fruit genotypes from different sources for seed coat morphology and other seed characteristics, and to associate these with seed viability as reflected in dormancy and time to, or capacity for, germination. The germplasm sources used were commercial cultivars, genebank accessions and landraces collected from around Colombia. The hypothesis of lower dormancy and faster germination for commercial cultivars of purple passion fruit was tested by comparing the average for these genotypes with those of the other two groups. The seed was not pretreated and therefore differences may be a result of selection during commercialization or even during domestication of purple passion fruit genotypes. While seed morphology has been studied as a characteristic to distinguish different species within the Passifloraceae family, this is the first study to evaluate intraspecific variability for seed morpho-anatomical traits and adds to the data for tropical plants found in Baskin and Baskin (1998). This study showed that it was important to understand the processes of seed trait selection in cultivar development and to conserve the purple passion fruit landraces and genetic resources in the Andean region for both conservation and production of the crop. In Colombia, the purple passion fruit is the fourth most important export in the category of fresh or processed fruits, after bananas, berries, and pineapples (Fischer et al., 2009). Despite its importance, the purple passion fruit remains poorly investigated agronomically, especially compared to the yellow passion fruit, which is more prevalent throughout tropical lowland areas of Brazil, Central America, and the Caribbean (Cerqueira et al., 2014; Marostega et al., 2017). Although purple passion fruit is now found in the highland tropics of Africa and sub-tropical regions, its original range was in mid to higher altitude elevations of the Andes (Franco et al., 2014).

# MATERIALS AND METHODS

### Plant Material and Sites of Collection

Germplasm of purple passion fruit (P. edulis f. edulis Sims) was collected from three different sources: 8 commercial cultivars from farmers' fields in the departments of Cundinamarca and Boyacá; 14 genebank accessions from the field-grown in situ genebank collection kept by Agrosavia (ex. Corpoica) at the Llanogrande Experiment Station in Rionegro, Antioquia; and 34 landraces from all over Colombia collected in the year prior to the study. This gave a total of 56 genotypes. The landraces were collected from backyard gardens, roadsides or small farms where the purple passion fruits were semi-wild or isolated plants rather than the main crop. This was common to many departments in the country since purple passion fruit is widely distributed in both the coffee region and vegetable or potato/corn and bean farms of the highland region. Departments of Colombia represented by the landraces included Antioquia (2 genotypes), Boyacá (6), Cauca (2), Cundinamarca (8), Huila (7), Nariño (5), Norte de Santander (2), Putumayo (4), Quindío (2), Risaralda (2), Santander (1), and Tolima (1) as found in **Supplementary Table 1**.

### Seed Viability and Germination

Seed viability was determined for the 56 genotypes by staining with 1% tetrazolium, an accepted process according to Sawma and Mohler (2002). Percentage seed germination was evaluated on a standard medium based on the method of Suárez and Melgarejo (2010) using 100 seeds from every accession grown on experimental plots described by Rodríguez et al. (2019, 2020). We were studying dried seed with mucilage and pulp removed and therefore our focus was on the pitted seed coat/integument alone.

### Anatomical and Morphological Traits

Dry seed from the same source described above was evaluated for a total of 12 morphological traits: seven of these were morphometric descriptors suggested by Tangarife et al. (2009), Santos et al. (2014), and Marostega et al. (2017), while five were characteristics of seed shape and seed pitting that were new to this study. Two traits were evaluated qualitatively, namely FOR = seed form and COL = seed color. Seed form was based on shapes being divided into elliptical or semi-elliptical, and closer to oval. Given the pointed ends of the passion fruit seed, none were truly oval but rather semi-elliptical. Seed color was ranked as black or brown, and seeds had no arils on them. Seed pits were present on all accessions, so they were evaluated for their prevalence or area coverage of the seed coat and their depth into the seed coat (integument). These pits consist of semi-circular indentations in the seed coat. After this, 10 traits were measured as quantitative variables and were given the following abbreviations: PCI = fresh weight of 100 seed; LOH = seed width; LOV = seed height; ASE = seed area; PSE = seed perimeter; NSU = number of seed pits per side; ASU = area of seed pits; PSU = perimeter of seed pits; GTE = integument width; and AVE = angle of seed tips. This last trait measured the seed shape as characterized by the seed angle between the two vertices formed by the seed tips. Each quantitative trait was measured on 100 seed in three replicates, and averages were reported for the 56 genotypes.

### Statistical Analysis

The data were analyzed for descriptive statistics comparing the three groups of purple passion fruit genotypes. Quantitative data were tested for normality and significance in analyses of variance (ANOVA) along with positive or negative correlation values with R software. The quantitative trait values for each group were then compared with mean separation using Duncan tests in SAS version 9.4 (SAS Institute, Cary, NC, United States). The full set of traits was also used for multidimensional scaling (MDS) tests done with average values to evaluate the diversity between characteristics, genotypes and germplasm groups, based on RWizard software<sup>1</sup>. The MDS defined each trait as a vector whose proximity to other vectors for other traits reflected associations between traits confirming correlations, and their relative importance for the phenotyping, individually or in conjunction. Variance inflation factors (VIF) were estimated to determine which traits explained the major portion of variability and subsequently a principal component analysis (PCA) was done with the same software to separate germplasm groups (cultivars, genebank accessions, and landraces) and to identify outlier genotypes. Note that seed was dried to determine seed water content comparing fresh to dry weight of 100 seed (PCIs) and expressing on a wet-weight basis moisture content in percent (%) with the formula: [(W2–W3)/(W2–W1)] × 100; considering that W1 = weight of container with lid; W2 = weight of container with lid and sample before drying; and W3 = weight of container with lid and sample after drying.
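The wet-weight moisture formula given above can be sketched in Python as follows. This is only an illustration of the arithmetic, not the authors' script; the function name `moisture_content_wet_basis` and the example weights are hypothetical.

```python
def moisture_content_wet_basis(w1: float, w2: float, w3: float) -> float:
    """Moisture content (%) on a wet-weight basis: [(W2 - W3) / (W2 - W1)] x 100.

    w1: weight of container with lid
    w2: weight of container with lid and sample before drying
    w3: weight of container with lid and sample after drying
    """
    fresh_sample = w2 - w1  # weight of the fresh (wet) sample alone
    if fresh_sample <= 0:
        raise ValueError("w2 must be greater than w1")
    water_lost = w2 - w3  # water removed by drying
    return water_lost / fresh_sample * 100.0


# Hypothetical weights in grams, chosen only to illustrate the calculation.
example = moisture_content_wet_basis(w1=10.0, w2=12.5, w3=12.25)
print(f"{example:.1f}%")
```

Dividing by the fresh sample weight (W2 − W1) rather than the dry weight is what makes this a wet-weight basis.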

# RESULTS

### Seed Viability and Germination Tests

Seed viability varied significantly (p < 0.001) based on the source of seeds tested and group of genotypes (**Table 1**), being highest on average for commercial cultivars (91%), followed by landraces (71%), and genebank accessions (65%). Similarly, germination rates presented highly significant differences (p < 0.001) between germplasm groups with 80%, 63%, and 59% average germination for commercial cultivars, landraces and genebank accessions, respectively (**Table 2**). Skewing, a form of non-normality in a population, was found for the distribution of genotypes' seed viability in the genebank group but not in the landraces or cultivars. For the germination rate, skewing was much less evident. Slight negative kurtosis was found for the distribution of values for both variables.

# Morphological Seed Variability

Seeds of purple passion fruit were found to vary in seed shape from elliptical to semi-elliptical or oval, and in seed color from brown to black (**Table 3**). Determination of elliptical or oval shape was based on measurements of the vertical and horizontal cross section widths and lengths, the angle between the vertices at the seed tips and other morphological characteristics. Observable differences in seed length, seed width and resulting seed size, measured as the area of one side of each genotype's seeds, gave an overall picture of the morphological differences among passion fruit types. These variables were associated with each other and with the qualitative differences seen in the figure part of **Table 3**. Larger elliptical black seed was the only kind of seed in commercial cultivars and genebank entries. All seed had some degree of seed pitting.

<sup>1</sup>http://www.ipez.es/RWizard

TABLE 1 | Sum of Squares from the Analyses of Variance (ANOVAs) for 10 seed characteristics evaluated on 56 genotypes of purple passion fruit (Passiflora edulis f. edulis Sims) in three germplasm groups made up of 8 commercial cultivars, 14 genebank accessions and 34 landraces.


Abbreviations: ns = not significant, \*significant (0.01 < P < 0.05), \*\*high significance (0.001 < P < 0.01), and \*\*\*highly significant (P < 0.001), Df = degrees of freedom; Variables: ASE = seed area, ASU = area of seed pitting, LOH = seed length on the horizontal axis, LOV = seed width on the vertical axis, PCI = fresh dry weight of 100 seeds, NSU = number of seed pits, PSE = seed perimeter, and PSU = seed pit perimeters.

TABLE 2 | Average seed viability and germination percentage based on three germplasm groups of purple passion fruit (Passiflora edulis f. edulis Sims) made up of 8 commercial cultivars, 14 genebank accessions and 34 landraces.


P < 0.01, Based on n = 100 seed per accession.

TABLE 3 | Qualitative traits measured on the seed of 56 genotypes of purple passion fruit (Passiflora edulis f. edulis Sims).


n(cultivars) = 8; n(genebank) = 14; and n(landraces) = 34. Photograph showing the angle between the central axis of the purple passion fruit seed and the seed tips at each vertex (AVE) for semi-elliptical (a) and elliptical seed (b). Seed pitting is observable on both elliptical and oval seeds as well as the difference between color variants.

Semi-elliptical black seeds were found in two landraces, BUN009 from Boyacá and BUN037 from Huila (for geographic coordinates see **Supplementary Table 1**). Meanwhile, elliptical brown seed was found for four genotypes, namely BUN002 from Cundinamarca, BUN032 from Norte de Santander, BUN036 from Cauca, and BUN040 from Huila. All other genotypes were black seeded with elliptical seed, the only phenotype found among commercial cultivars and genebank accessions. Seed pits were common across colors.

A diagram was drawn based on typical seed structure from our histological analysis to show the seed coat and cell layers below (left part of **Figure 1**). The first layer consisted of an external wax coating that was continuous and highly resistant to cracking and fractures. This was subtended by an exotesta and mesotesta layer of sclereid cells, then by a layer of palisade cells presumed to have originated from the integument based on previous studies (Cárdenas-Hernández et al., 2011). The palisade cells were oriented perpendicular to the sclereid cells. Seed pits consisted of an embossed pattern of indentations found in the sclereid cell layer, with fewer and narrower cells subtended directly by the exotegmen tissue and a narrow layer of hyaline cells. Seed pitting was found on both sides of the seed and was evaluated for pit number, perimeter and area. Altogether, the three layers making up the seed coat varied in thickness based on the pattern of pitted and non-pitted areas. Every layer presented differences in the form of the cells, color and thickness between the sections of the basal and medium parts of the seeds, with differences in width notable between accessions. Overall, the seed coat was 0.45 mm thick on average below the seed pits, while in non-pitted areas it was 0.50 mm thick on average. This is thicker than what has been found in P. ligularis, the granadilla studied by Cárdenas-Hernández et al. (2011), upon which our studies were based.

### Correlation Values

During the seed viability testing, the seed thickness was found to differ and influence seed germination percentage and time to completely germinate, which fluctuated between 1.5 and 3.5 months. The seed coat also varied in amount, location and depth of seed pitting which in turn affected the seed coat thickness at the seed pits and between them. This was observable upon inspection by visible microscopy.


Highly significant (P < 0.001) positive correlation values were found for the germination percentage with the area covered in seed pitting (r = 0.854) as well as number of seed pits (r = 0.789, P < 0.001; **Table 4**); but not between percentage germination and seed size or length. Seed germination was also correlated significantly and positively with perimeter of seed pits (r = 0.432, P = 0.047) as well as fresh seed weight (r = 0.547, P = 0.023).

The high correlation values showed that genotypes with the most seed pitting germinated much more quickly than those without much pitting. Microscopy showed that seed pits were the areas where the seed coat was thinner and might capture water or allow water entry and faster seed imbibition. The seed pitted area, number of seed pits and seed pit perimeter were all positively correlated with each other, as were fresh and dry seed weights and seed perimeter, but not with the angle between seed vertices, a measure of seed shape.
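The correlation values discussed above are Pearson-type coefficients between pairs of traits. As a minimal sketch of how such a coefficient is computed, the following Python function implements the standard formula; the trait values are invented for illustration only and are not data from this study, and the authors report using R software rather than this code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-genotype values (NOT measurements from the study).
pit_counts = [12, 15, 18, 22, 25, 30]
germination_pct = [55, 60, 58, 70, 75, 82]
r = pearson_r(pit_counts, germination_pct)
print(f"r = {r:.3f}")
```

A value of r close to +1 indicates that genotypes with more pits tend to have higher germination percentages, which is the pattern reported for NSU and ASU against GER.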

As expected, angle between vertices as a quantitative trait was related to the shape of the seed as a qualitative trait. Classification of the passion fruit genotypes into two groups found semi-elliptical seed with seed vertices at an angle of ~140° and elliptical seed with seed vertices at an angle of ~110°. Therefore, visual observation of the trait of semi-elliptical versus elliptical seed was just as good as measuring the angle between seed vertices.
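The correspondence between the measured angle and the visual shape class can be sketched as a nearest-value rule. The two reference angles come from the text above; assigning a seed to whichever reference angle is closer (an effective cut-off near 125°) is an assumption made for this illustration and is not a threshold stated by the authors.

```python
# Typical vertex angles reported in the text for each shape class (degrees).
SEMI_ELLIPTICAL_ANGLE = 140.0
ELLIPTICAL_ANGLE = 110.0


def classify_seed_shape(angle_deg: float) -> str:
    """Assign the shape class whose typical vertex angle is nearest.

    The nearest-angle rule is a hypothetical convention for illustration.
    """
    if not 0.0 < angle_deg <= 180.0:
        raise ValueError("angle must be in the range (0, 180] degrees")
    dist_semi = abs(angle_deg - SEMI_ELLIPTICAL_ANGLE)
    dist_elliptical = abs(angle_deg - ELLIPTICAL_ANGLE)
    return "semi-elliptical" if dist_semi < dist_elliptical else "elliptical"


print(classify_seed_shape(138.0))
print(classify_seed_shape(112.0))
```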

### Principal Component Analysis Relationships

In the PCA we could distinguish the seed morphological characteristics most related to classifying the genotypes of purple passion fruit (**Figure 2**). Principal component 1 was explained by the angle of vertices and seed coat (tegument) thickness, while principal component 2 was explained by the variables seed length and seed width as well as fresh and dry seed weight.

These first two components and therefore these main variables, explained 85% of the variability found for seed phenotypes. In looking at the relationships between individual traits based on the proximity of their vectors we found relationships between the number of seed pits, the area covered by seed pits and percentage germination. These were the most important traits in our study in terms of variability.

Meanwhile, component traits of seed size were associated, such as seed length on vertical and horizontal axis as well as fresh and dry seed weight. Perimeter of the seeds and perimeter of the seed pits were associated with the primary group of traits as well and they were closely linked to seed viability.

Angle between vertices and seed coat thickness were outliers that were not associated with other traits, showing that the genetic control for these traits is likely to be different from that for the number of seed pits and the area covered by seed pitting, or from the genes controlling seed size. The vector for perimeter of seed pits was associated with those for vertical length and horizontal length as well as fresh and dry seed weight, showing some association with seed size factors. These were also associated with the area and perimeter of the seed. Epistatic interactions between traits are likely for the seed size traits as one group and the seed pitting traits as another group.

The most distinct genotypes in the PCA were BUN001, BUN002, BUN003, BUN004, BUN005, BUN006, and BUN007 in the left bottom quadrant; while BUN10, BUN12, BUN14, BUN41, BUN44, and BUN46 were distinct in the right bottom quadrant. Apart from these only BUN17, BUN22, and BUN35 were distinct toward the left upper quadrant.

Among the three germplasm groups, the commercial cultivars had significantly greater seed pitting, more seed pitted area, larger seed size, and associated longer seed perimeter traits than landraces or genebank accessions. Landraces, on the other hand, had the smallest values for all these characteristics, and genebank accessions were intermediate between landraces and commercial cultivars for the seed size and seed pitting traits. Meanwhile, no significant differences (P = 0.8) were found between groups for seed moisture content and fresh or dry seed weight (PCI and PCIs, respectively).

### Variance Inflation and Multi-Dimensional Scaling

The VIF was determined for each of the quantitative morphological variables in order to establish which ones explained the diversity among the accessions. The most significant variables, including average angle of divergence, seed coat thickness, length of the seed on the horizontal axis and seed fresh weight, were then used in an MDS analysis (**Figure 3**).
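For reference, the variance inflation factor of a variable is conventionally defined from the coefficient of determination obtained when that variable is regressed on all the other variables. The expression below is this standard textbook definition, given as background; the source does not state the exact computation performed in the software, and the symbol $R_j^2$ is introduced here for illustration.

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
$$

Here $R_j^2$ is the coefficient of determination from regressing trait $j$ on the remaining traits, so a VIF near 1 indicates a trait that carries information largely independent of the others.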

The advantages of an MDS were that qualitative and quantitative traits could be analyzed together by comparing and contrasting their distributions across the population of genotypes, without any preconditions or assumptions made about their relationships based on the sampled population.

TABLE 4 | Correlation values (above diagonal) and corresponding significance (p-values, below diagonal) for seed traits measured on 56 genotypes of purple passion fruit (Passiflora edulis f. edulis Sims).


n(cultivars) = 8; n(genebank) = 14; and n(landraces) = 34. Color intensity used to indicate significance values of P < 0.05, P < 0.01, and P < 0.001. Abbreviations: ASE = total seed area; ASU = area covered by seed pits; AVE = angle between vertices; GER = percentage germination; GTE = seed coat/tegument thickness; LOV = seed length on its vertical axis; LOH = seed length on its horizontal axis; NSU = number of seed pits; PCIs = dry seed weight; PCI = fresh seed weight; PER = seed perimeter; PSU = perimeter of seed pits; and VIA = seed viability percentage.

# Geographical Distribution of Genotypes by Seed Size

The seed size descriptor was plotted by geographical coordinates of latitude and longitude for the collection site of each genotype on a map of Colombia (**Figure 4**). The main trend was that larger seeded genotypes with an area of 0.21 to 0.27 mm<sup>2</sup> were distributed only in the Central Andes Mountains, while smaller seeded genotypes with an area of 0.1 to 0.16 mm<sup>2</sup> were distributed throughout all other regions.

Comparing the geographical origins of the genotypes used in the study by regions shown on the map figures, we found that seeds were larger in the Central region of the Colombian Andes, probably due to the type of climate or soils in this region; and to the commercial cultivars originating in this region. Landraces from other parts of the country are more likely to be from ecologically less favorable regions outside the production zones and therefore, have unselected and smaller sized seed.

### DISCUSSION

A major accomplishment of this study was to characterize seed of a large number of genotypes of purple passion fruit for characteristics important to establishment of seedlings and potentially domestication. Seed germination, dormancy, and viability plus subsequent seedling vigor are important limitations for many wild and semi-domesticated plant species (Baskin and Baskin, 1998). Among our observations about seed, we found that the purple passion fruit genotypes from our collection required at least 1.5 months to germinate but could take as long as 3.5 months or could remain dormant. Late and delayed germination was also found by Montaña et al. (2014), who observed that 50% of their seed germinated after 2 months. The rate gradually increases with longer time. Some seeds in our study only germinated after 4 months but were not used for evaluation. Previous studies of purple passion fruit also have observed long physiological dormancy periods prior to germination (Delanoy et al., 2006; Araújo et al., 2007; Barbosa et al., 2012; Veiga et al., 2013).

Another finding of our study was that the low germination values found in purple passion fruit were especially pronounced in landraces compared to commercial cultivars. Montaña et al. (2014) found similar slow germination for landraces, with 50% of seeds germinating at 48 days after sowing. Current studies of passion flower seed germination are trying to determine the type of dormancy in P. edulis and its relatives. Some authors have suggested that passion fruit seeds are intermediate between the rapid germination typical of fully domesticated annual crops and the slow germination typical of many wild or semi-domesticated seeds (Silva et al., 2015). The intermediate nature of passion fruit seed is important for conservation efforts; more importantly, however, new protocols for successful germination are needed (Meletti et al., 2007; Magalhães, 2010; Veiga et al., 2013; Moreno et al., 2015).

In support of a hypothesis of semi-recalcitrance, this study found that seed viability was slightly higher (cultivars 91%; landraces 71%, and genebank accessions 65%) than seed germination (cultivars 80%; landraces 63%, and genebank accessions 59%), suggesting that many seeds were alive but dormant. Barbosa et al. (2012) and Mira et al. (2015) reported percentage viability greater than 75% using conductivity tests, despite low germination rates of 40%. Germination capacity and seed viability are thought to vary depending on seed type among families of plants, or within species and genera (Baskin and Baskin, 1998). Ramírez et al. (2015) have suggested a role for mycorrhizae in germination of passion fruit seeds. Overall, the purple passion fruit can be considered to have strong dormancy that is both physical and physiological in nature (Baskin and Baskin, 1998). The low germination values seen for some genebank accessions in our study (average 59%) could be due to seed shipment and storage conditions, whereas the commercial and landrace types were tested as fresh seed.

Among the groups of genotypes used in this study, we observed that viability and germination were higher in the commercial cultivars (91% and 80%, respectively) compared to the landraces (74% and 63%) or genebank accessions (65% and 59%). This difference could be explained by selection during the domestication process, whereby commercial cultivars have been selected for higher seed germination and are therefore better genotypes for crop establishment than landraces. Other possibilities could relate to genetic diversity among the groups: passion fruit flowers are known to be primarily inbreeding but with up to 70% outcrossing (Bruckner et al., 1995), so crosses between low and high germination genotypes are needed. Seed quality can also relate to plant physiological differences during fruit production (Carvalho and Nakagawa, 2012; Lima et al., 2017; Fischer et al., 2018). Genebank accessions were briefly stored prior to the tests, so some of the difference from landraces could be due to seed age (Larré et al., 2007).

(100) fresh seed weight; and PCIs = one hundred (100) dried seed weight.

In other observations from this study, seed size was variable between the groups of genotypes. Seed size and weight compared to fruit weight is a determinant of juice yield (Macedo et al., 2015). Juicing success is also based on fruit size and weight or on the percentage of the fruit made up of the fruit wall or shell. Tradeoffs in aril versus seed production determine plant nutritional traits, since seeds are higher in proteins and starches, requiring more nitrogen and remobilized carbon, respectively, while the pulp is higher in sugar content (Devi et al., 2018). Given the many seeds per fruit and the sweet arils surrounding each seed, evolutionary forces would tend to increase the number of seeds for dispersal by birds but decrease the number of seeds for germination from fallen and rotting fruits. Bird dispersal could also scarify seed through the digestive tract, making seed coat thickness and dormancy important issues in the evolution of passion fruit species. In contrast, seedlings germinating from fallen fruit would rely on natural weathering for seed germination. The influence of all these interrelated factors on landraces, which often grow naturally in coffee plantations, would be expected to be stronger than on commercial cultivars, where farmers select seed harvested from fruits dried and processed in artificial settings, separate from natural processes. These evolutionary forces, among others (Baskin and Baskin, 1998), could explain the variations in seed anatomy and morphology that were observed.

Passion fruit species are known to be in the process of diversification (Bruckner et al., 1995) and evolution (Cerqueira et al., 2014; Silva et al., 2017), aided by the high capacity for interspecific hybridization and karyotype plasticity (Melo et al., 2001; Melo and Guerra, 2003). Although purple passion fruit are thought to be mostly inbreeding, some outcrossing (Bruckner et al., 1995; Brum et al., 2011; de Lira et al., 2016) may have resulted in morphological variability (Marostega et al., 2017; Ocampo and d'Eeckenbrugge, 2017). Our study confirms the importance of seed descriptors in evaluating genetic variability.

Seed pitting, in addition to seed shape, was found to be a critical seed descriptor in our study. In the case of the Passifloraceae family's evolutionary record, seed coats have varied from foveolate (notable seed pitting), to coarsely foveolate, reticulate-foveolate or transversely grooved (Martinez, 2017). Seed lengths varied greatly from very short seeds (1.5 mm) to longer seeds (14 mm), although seed shape tended to be ovoid, obovoid to elliptic. Seed pitting and seed size could have affected the seeds' relationship with the soil in which they are planted and the water available for germination. Seed surface characteristics can be determining factors in water absorption capacity and resulting germination across many plant species (Koornneef et al., 2002). The greater the contact, the more water can be absorbed. Therefore, the structure of the seed coat is an important factor to measure. In the case of the purple passion fruit seed, the exposure area was amplified by the level of seed pitting of the foveolate seed coats, increasing the contact surface for water and its retention and absorption. Therefore, another important result of our study was the identification of a direct relationship between germination and seed pitting irrespective of seed size, with germination showing its highest correlations with the number of pits (0.789) and the area of pits (0.854).

We observed that germination was correlated with three traits of seed pitting (number, perimeter and area), which indicates that if the seed has a large number of pits over a large area of the seed surface, or if the seed pits are large, then seed germination can be accelerated by several months. Although this was not part of our study, seed pits reduce the thickness of the seed coat in specific portions of the seed, likely favoring the water uptake that leads to germination. The variability in the expression of seed pitting among landraces and cultivars is probably related to selection and domestication processes for the species. The increase in fruit size may indirectly select for larger seed size, and even if the density of seed pitting remains the same, the number of seed pits increases.
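The correlations between germination and pitting traits discussed above can be illustrated with a short sketch. It assumes a Pearson coefficient, which the text does not specify, and the input values are hypothetical placeholders rather than data from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five genotypes (NOT measurements from the study)
pit_counts = [40, 55, 62, 70, 85]
germination_pct = [52, 60, 61, 74, 80]

r = pearson_r(pit_counts, germination_pct)
print(f"r = {r:.3f}")
```

The same function could be applied to pit perimeter or pit area in place of pit number.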

Water content could also play a role in germination capacity; however, in our study we found no significant between-group (p-value = 0.8) or within-group (p-value = 0.27) variation for hydration based on the weight of 100 fresh seeds (PCI) compared to 100 dried seeds (PCIs). Water in seed can be of four types: (1) free water in between seed structures; (2) capillary water circulating in seed tissues; (3) cellular water in seed organs; and (4) molecular water bound to other metabolites, molecules or macromolecules of the seed (Craviotto et al., 2009). All of these in conjunction affect seed conservation strategies and the seed quality traits of germination rate, viability and vigor. Water content of our seeds varied between 10 and 13% based on seed drying with silica gel. This value was similar to values reported for many crop species (soybean, for example, in the previous reference) and for other types of passion fruit. Posada et al. (2012) found that the seed of granadilla, maracuya or yellow passion fruit, and gulupa or purple passion fruit (Passiflora ligularis, P. edulis f. flavicarpa and P. edulis f. edulis, respectively) all had between 10 and 12% moisture and that excess drying only affects germination when water content falls to 6%. In future studies we hope to examine the effect of seed desiccation and moisture content on seed germination and viability in genebanks; the priorities of the present study were seed morphology and germination under normal moisture content levels of 10 to 12%.
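As a worked illustration of how a moisture percentage can be derived from the two 100-seed weights, the sketch below assumes that moisture is expressed on a fresh-weight basis, which the text does not state explicitly. The function name and the example weights are hypothetical.

```python
def moisture_content_pct(fresh_weight_g, dry_weight_g):
    """Moisture content in percent, on a fresh-weight basis (an assumption)."""
    if fresh_weight_g <= 0:
        raise ValueError("fresh weight must be positive")
    if not 0 <= dry_weight_g <= fresh_weight_g:
        raise ValueError("dry weight must lie between 0 and the fresh weight")
    return 100.0 * (fresh_weight_g - dry_weight_g) / fresh_weight_g


# Hypothetical 100-seed weights in grams (NOT measurements from the study)
pci = 2.50   # 100 fresh seeds
pcis = 2.20  # the same 100 seeds after drying

mc = moisture_content_pct(pci, pcis)
print(f"moisture content = {mc:.1f}%")
```

With these placeholder weights the result falls inside the 10 to 13% range reported above.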

In summary, the most important findings of our study were that (1) passion fruit seed coats had a rigid structure that was associated with the long dormancy periods; (2) landraces differed from commercial cultivars and genebank accessions in germination time; and (3) germination capacity was positively correlated with the number of seed pits and the area of seed pitting. Seeds with a high number of seed pits over a larger area had faster germination. Seed structure of purple passion fruit was similar to that of P. ligularis based on histological analysis (Cárdenas-Hernández et al., 2011). Three well differentiated layers were likely to be the internal exotegmen, the middle mesotesta, and the external exotesta. Each layer had a different tissue type with hyaline, sclereid and/or palisade cells. In closing, we can see from this study that seed morphology is relevant both today (Crochemore et al., 2003) and in the study of the fossil record of passion fruits (Martinez, 2017). Seed coat structure is critical for survival of the species, and seed descriptors are often used along with above ground foliar and floral characteristics to characterize genetic diversity. Viability, meanwhile, is important as it can be lost in storage, especially in genebank conservation, owing to over-drying of seed or power failure. In vivo conservation has its own set of advantages and disadvantages. It is therefore necessary to carry out further morphological and physiological evaluations of the seeds found for landraces and commercial varieties as a complement to the determinations of diversity in passion fruit genotypes held by genebanks. All morphological and genotyping information can be used to find associations with productivity, yield and other morphoagronomic characteristics. These analyses are also useful in plans for crop improvement or seed certification after varieties have been established in breeding programs.

### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/**Supplementary Material**.

### AUTHOR CONTRIBUTIONS

NC carried out research conceived by LM and MB. All authors wrote sections of the manuscript. MB edited the manuscript. NC and MB prepared figures and tables.

### FUNDING

We are grateful to the Colombian funding agency, Colciencias; to the Universidad Nacional de Colombia (UNAL); and to Tennessee State University (TSU) for providing funding and support for this project. UNAL-TSU have a memorandum of understanding (2019–23) for student exchange. The grant numbers involved were Colciencias RC 0459-2013; UNAL research and extension division grant Hermes 37360; and the TSU Evans Allen fund TEN-X07.

### ACKNOWLEDGMENTS

We are thankful to Dr. Gustavo Morales (Botanical Garden José Celestino Mutis- Bogotá) and Prof. Fredy Ramos (UNAL), for the collaboration in taxonomic determinations and support in field trips. Profs. Gustavo Ligarreto, Maria Isabel Chacon and Gerard Fischer (UNAL) are acknowledged for helpful advice.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00498/full#supplementary-material





**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Castillo, Melgarejo and Blair. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Weedy Rice as a Novel Gene Resource: A Genome-Wide Association Study of Anthocyanin Biosynthesis and an Evaluation of Nutritional Quality

Wenjia Wang† , Minghui Zhao† , Guangchen Zhang, Zimeng Liu, Yuchen Hua, Xingtian Jia, Jiayu Song, Dianrong Ma\* and Jian Sun\*

Rice Research Institute, Shenyang Agricultural University, Shenyang, China

### Edited by:

Thomas M. Davis, University of New Hampshire, United States

### Reviewed by:

Dongying Gao, University of Georgia, United States Kanako Bessho-Uehara, Carnegie Institution for Science (CIS), United States

### \*Correspondence:

Dianrong Ma, madianrong@syau.edu.cn; Jian Sun, sunjian811119@syau.edu.cn

†These authors have contributed equally to this work

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 31 January 2020 Accepted: 28 May 2020 Published: 11 June 2020

### Citation:

Wang W, Zhao M, Zhang G, Liu Z, Hua Y, Jia X, Song J, Ma D and Sun J (2020) Weedy Rice as a Novel Gene Resource: A Genome-Wide Association Study of Anthocyanin Biosynthesis and an Evaluation of Nutritional Quality. Front. Plant Sci. 11:878. doi: 10.3389/fpls.2020.00878

The pericarp color of rice grains is an important agronomic trait affected by domestication, and the color pigment, anthocyanin, is one of the key determinants of rice nutritional quality. Weedy rice, also called red rice because its pericarp is often red, may be a novel gene resource for the development of new rice. However, the genetic basis and nutritional quality of anthocyanin are poorly known. In this study, we used a genome-wide association study (GWAS) to find novel and specific QTLs of red pericarp in weedy rice. The known key gene site of red pericarp Rc was detected as the common genetic basis of both weedy and cultivated rice, and another 13 associated signals of pericarp color that were identified may contribute specifically to weedy rice pericarp color. We then nominated three pericarp color genes that may contribute to weedy rice divergence from cultivated rice based on selection sweep analysis. After clarifying the distribution and growth dynamics of pigment in weedy rice caryopsis, we compared its nutritional quality with cultivated rice. We found that sampled weedy rice pericarps had much greater quantities of anthocyanin, beneficial trace elements, free amino acids, and unsaturated fatty acids than the cultivated rice. In conclusion, the gene resources and novel genetic systems of rice anthocyanin biosynthesis explored in this study are of great value for the development of nutritious, high anthocyanin content rice.

Keywords: weedy rice, genome-wide association study, anthocyanin biosynthesis, nutritional quality, pericarp color

### INTRODUCTION

Weedy rice (Oryza sativa f. spontanea) refers to rice plants that grow in rice fields or in surrounding fields as a weed. Weedy rice is spread throughout the rice paddy system worldwide, with temperate and subtropical areas being the hardest hit (Brespatry et al., 2001; Tai, 2002; Vaughan et al., 2017). Research on the origin and evolution of weedy rice has made some progress; however, there is still no definite conclusion about its origin (Li L. F. et al., 2017; Qiu et al., 2017; Sun et al., 2019).

**Abbreviations:** C3G, cyanidin-3-glucoside; GMO, genetically-modified organism; GWAS, genome-wide association study; KNN, k-nearest neighbor algorithm; P3G, paeoniflorin-3-glucoside; QTL, quantitative trait locus; WR, weedy rice; WRAH, weedy rice from Asian high latitudes; WRSC, weedy rice from South China.

In general, the morphological characteristics of weedy rice fall between the wild rice species (Oryza rufipogon) and cultivated rice (Oryza sativa L.) (Ma et al., 2008). Weedy rice has distinctive biological characteristics such as a short growth period, black hull, strong granulation, long dormancy period, long awning, and red pericarp (Chen et al., 2004; Pipatpongpinyo et al., 2019). Because most weedy rice has a colored hull and red pericarp, it is often called red rice (Ma et al., 2008). The layers of weedy rice caryopsis, from the outside to the inside, are pericarp, seed coat, nucellus, aleurone layer, and endosperm (Sellappan et al., 2009; Juliano and Tuaño, 2019). The colored pericarp is an important feature distinguishing it from ordinary cultivated rice (Cui et al., 2016).

Brown rice refers to the caryopsis after the rice husk is removed, and the pericarp, seed coat, and nucellus are intact. Milled rice refers to rice grains with pericarps and seed coats removed. Colored rice such as red rice, gold rice, and black rice, refers to brown rice with colored pericarp (Kim et al., 2008), and the caryopsis color is mainly due to the accumulation of anthocyanins in the pericarp and seed coat (Liu et al., 2011). Colored rice contains high levels of anthocyanins and proanthocyanidins, which have strong antioxidant properties (Lei et al., 2006). Some studies have shown that rice color is related to its nutritional quality. For example, brown rice is not only rich in protein, amino acids, vitamins, vegetable fats, and trace elements such as Ca, Fe, Zn, and Se, it is also rich in biologically active substances such as flavonoids that have anti-oxidation, anti-tumor, free radical-scavenging, and hypoglycemic effects (Gu, 1992; Meng et al., 2005; Guo et al., 2011; Yue-Ting et al., 2016).

With recent developments in genetic analysis, some molecular mechanisms regulating rice pigmentation are well known. For instance, the red color of the pericarp is affected by the interaction of two genes, Rc and Rd, and the purple color of the pericarp is controlled by the joint action of the genes Pb and Pp (Furukawa et al., 2007; Rahman et al., 2013). Furukawa et al. (2007) found that a 14 bp deletion in the coding region of the Rc gene caused the loss of Rc gene function, resulting in a white pericarp. Two SNP mutations in the coding region of the Rd gene caused early termination of translation, resulting in the loss of Rd function (Furukawa et al., 2007; Sweeney et al., 2007). Studies have shown that the pericarp color of the weedy rice in southern China is mainly regulated by the Rc gene. The same results were found in American weedy rice (Yang et al., 2009; Gross et al., 2010). As a companion weed of cultivated rice, weedy rice infests rice fields worldwide and is believed to have multiple evolutionary origins from distinct ancestors. Thus, whether genes other than the four known ones regulate its anthocyanin content is still unknown.

The pericarp color is an important agronomic trait affected by domestication, and the color pigment, anthocyanin, is one of the important determinants of rice nutritional quality (Wang et al., 2018). Weedy rice may be a novel gene resource for the development of new rice (Sun et al., 2013). However, the genetic basis and nutritional quality of anthocyanin are poorly known in weedy rice. Therefore, we used genome-wide association analysis to explore the main sites regulating the color of weedy rice pericarp. A follow-up study was carried out on the locations and timing of anthocyanin deposition in weedy rice with different pericarp colors. Also, nutritional quality was measured and compared between cultivated and weedy rice.

### MATERIALS AND METHODS

### Materials

We collected 297 rice samples from six subgroups, including 46 weedy rice from Asian high latitudes (WRAH) and 14 from middle latitudes (WRSC), 69 temperate japonica, 12 tropical japonica, 145 indica and 11 Aus. The samples were divided into two sets for genome-wide association analysis (GWAS): set 1 for the red pericarp in weedy rice samples and set 2 for the genetic background of the red pericarp in cultivar samples. All rice samples were maintained and cultivated in the germplasm resources field of Shenyang Agricultural University, China.

In order to further understand the production and distribution of weedy rice pericarp pigments, the typical red pericarp weedy rice WR07-14, WR07-47, WR03-32, WR07-141, WR07-142 and WR03-29 and the white pericarp cultivated rice Shennong265, Akihikari, Nipponbare, and Qishanzhan were selected from the GWAS panel for the observation of pigment components and their deposition process and for a nutrient quality analysis.

### Methods

### Genome-Wide Association Study

For GWAS, the phenotype value of pericarp color was defined as one of four levels: 0 for white, 1 for orange, 2 for red, and 3 for dark red. Short Oligonucleotide Alignment Program (SOAP) software (Li et al., 2008) was used to align the reads of each sample with the reference genome IRGSP v1.0 (Li et al., 2008). The reads that were successfully and uniquely aligned with the reference genome at both ends were considered reliable and used to define SLAF tags. We called population SNPs based on the corresponding genome-wide SLAF tags. A total of 122,777 unimputed SNPs with a minor allele frequency > 0.05 and integrity > 0.5, developed for evolutionary study, were reported in our previous study (Sun et al., 2019).
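The phenotype coding and the SNP filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: genotypes are assumed to be coded as the count of alternative alleles (0, 1, 2) with `None` for missing calls, and "integrity" is assumed to mean the proportion of samples with a non-missing call. All function names are hypothetical.

```python
# Four-level pericarp color score as defined in the text
PERICARP_SCORE = {"white": 0, "orange": 1, "red": 2, "dark red": 3}


def snp_stats(genotypes):
    """Return (minor allele frequency, integrity) for one SNP.

    genotypes: list of 0/1/2 alternative-allele counts, None if missing.
    """
    called = [g for g in genotypes if g is not None]
    integrity = len(called) / len(genotypes) if genotypes else 0.0
    if not called:
        return 0.0, integrity
    alt_freq = sum(called) / (2 * len(called))  # diploid allele frequency
    return min(alt_freq, 1.0 - alt_freq), integrity


def passes_filter(genotypes, min_maf=0.05, min_integrity=0.5):
    """Keep a SNP only if MAF and integrity both exceed the thresholds."""
    maf, integrity = snp_stats(genotypes)
    return maf > min_maf and integrity > min_integrity


# Hypothetical SNPs across four samples (NOT data from the study)
snps = {
    "snp_a": [0, 0, 2, None],        # polymorphic, 75% called
    "snp_b": [0, 0, 0, 0],           # monomorphic
    "snp_c": [2, None, None, None],  # mostly missing
}
kept = [name for name, g in snps.items() if passes_filter(g)]
print(kept)
```

Raising `min_integrity` to 0.9 would mirror the integrity level reported after imputation.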

In the present study, we performed imputation to fill in the missing genotypes for the 122,777 unimputed SNPs by using the k-nearest neighbor algorithm (KNN) model in the two sets of GWAS populations (Roberts et al., 2007; Huang et al., 2010). Consequently, the quality of these 122,777 SNPs was improved, with integrity > 0.9. Genome-wide association analyses were conducted using a compressed MLM model that could effectively reduce false positives. The equation of the compressed MLM model is y = Xα + Pβ + Kµ + e, in which y is the phenotype, X is the genotype, P is the population structure matrix (Q matrix), and K is the kinship matrix. The P matrix was built from the top five principal components for population structure correction. The K matrix was built from the matrix of simple matching coefficients. The analyses were performed using TASSEL 5 software (Bradbury et al., 2007). The threshold of the significant P-value was determined based on the Bonferroni correction method. In order to increase the possibility of overlap with the selection signal, we lowered the threshold of the P-value by an order of magnitude from the Bonferroni correction threshold. Both thresholds of the significant P-value were used in the present study.
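Two ingredients of the analysis above lend themselves to a short sketch: a kinship matrix built from simple matching coefficients, and a Bonferroni threshold. The code is illustrative only and does not reproduce TASSEL's internal computation. Genotypes are assumed to be coded 0/1/2 with `None` for missing calls, and the family-wise error rate of 0.05 is an assumption, since the text does not state the value used.

```python
def simple_matching(a, b):
    """Share of loci with identical genotypes, over loci called in both."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def kinship_matrix(samples):
    """Symmetric matrix of pairwise simple matching coefficients."""
    n = len(samples)
    return [[simple_matching(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical genotypes for three samples at four loci (NOT study data)
samples = [
    [0, 2, 2, 0],
    [0, 2, 0, 0],
    [2, 0, 0, None],
]
K = kinship_matrix(samples)

# 122,777 SNPs as reported in the text; alpha = 0.05 is an assumption
threshold = bonferroni_threshold(0.05, 122_777)
print(f"Bonferroni threshold = {threshold:.2e}")
```

A second threshold one order of magnitude away from this value, as described in the text, can be obtained by scaling `threshold` by a factor of ten.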

### Selection Sweep Analysis and Candidate Gene Nomination

In this study, red pericarp rice (including landrace and weedy rice) was considered as the unimproved rice population, and modern cultivars with white pericarp were considered as the improved population. Nucleotide diversity (π) and Tajima's D were obtained for 500 kb sliding windows in the unimproved and improved populations by using vcftools 1.0.3 software (Danecek et al., 2011). The selection sweeps were then defined by the π ratio (πunimproved population/πimproved population) and Tajima's D per 500 kb genomic sliding window, and both selection signals were represented by heat maps. The cut-off of the π ratio for defining genomic regions under strong selection was set at 2.5, and the selection signal defined by a negative value of Tajima's D was used as a secondary reference. We then nominated the candidate genes according to key motifs related to anthocyanin synthesis from the Rice Genome Annotation Project<sup>1</sup>. We also searched for nutrient-related genes around the genomic regions of GWAS peaks.
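The window-based π ratio described above can be illustrated with a minimal sketch. It assumes that per-window π values have already been computed for each population and are keyed by window start position; whether the 2.5 cut-off is inclusive is also an assumption. The function name and the example values are hypothetical.

```python
def pi_ratio_windows(pi_unimproved, pi_improved, cutoff=2.5):
    """Return (window_start, ratio) for windows whose pi ratio reaches cutoff.

    Both arguments map a window start position (bp) to nucleotide diversity.
    Windows missing from the improved population, or with zero diversity
    there, are skipped because the ratio is undefined.
    """
    hits = []
    for start in sorted(pi_unimproved):
        pi_imp = pi_improved.get(start)
        if pi_imp is None or pi_imp <= 0:
            continue
        ratio = pi_unimproved[start] / pi_imp
        if ratio >= cutoff:
            hits.append((start, ratio))
    return hits


# Hypothetical pi values for three 500 kb windows (NOT data from the study)
pi_red = {0: 0.0040, 500_000: 0.0030, 1_000_000: 0.0020}
pi_white = {0: 0.0010, 500_000: 0.0020, 1_000_000: 0.0}

sweeps = pi_ratio_windows(pi_red, pi_white)
for start, ratio in sweeps:
    print(f"window {start}-{start + 500_000}: pi ratio = {ratio:.2f}")
```

In this toy input only the first window passes the cut-off; the third is skipped because diversity in the improved population is zero.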

### Developmental Dynamics of Pericarp Color and the Determination of Pigment Distribution

The weedy rice that we used to observe the development of pericarp color were sampled every 2 days after flowering. The caryopsis to be used for hand-sliced sections were sampled every week after flowering. A digital SLR camera (Canon 550D) was used to record pericarp color and pigment distribution (Liu et al., 2011).

We further sampled the pericarp, seed coat, and endosperm of weedy rice 1, 2, and 3 weeks after flowering respectively in order to study the distributions of their pigments. Referring to the method of Yu et al. (2010) we dissected these parts by paraffin section. The sample was dehulled and fixed with FAA fixative, and then dehydration, transparency, waxing, embedding, slicing, dipping, dewaxing and rehydration, dyeing, and sealing of the sample were carried out in sequence (Dai et al., 2009; Yu et al., 2010). The histological structure of the weedy rice caryopsis was recorded using a stereo microscope (Zeiss Lumar. V12).

### Determination of Pigment Composition and Content

We collected seeds of each sample (WR07-14, WR07-32, WR07-47 and WR07-141) at 5, 10, 15, 20, and 25 days after flowering, and we threshed, shelled, and ground the rice into flour. The total content of anthocyanins was determined by the pH differential method (Li et al., 2014). The components of the pigment in the mature weedy rice were identified through high-performance liquid chromatography-mass spectrometry (HPLC-MS). The column temperature was set to 35 °C. The binary mobile phase consisted of a formic acid-ammonium formate solution. The mobile phase gradient decreased from 99% to 60%. The flow rate was 0.3 mL/min (Li R. et al., 2017).
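For readers unfamiliar with the pH differential method, the sketch below shows the calculation in its commonly used form, expressed as cyanidin-3-glucoside equivalents. The wavelengths (520 and 700 nm), the buffers (pH 1.0 and pH 4.5), the molecular weight (449.2 g/mol) and the molar absorptivity (26,900 L/mol/cm) are assumptions taken from that common form, not parameters reported in this article, and the absorbance readings are hypothetical.

```python
def total_anthocyanin_mg_per_l(a520_ph1, a700_ph1, a520_ph45, a700_ph45,
                               dilution_factor, mw=449.2, epsilon=26900.0,
                               path_cm=1.0):
    """Total monomeric anthocyanin (mg/L) by the pH differential method.

    Defaults for mw and epsilon assume cyanidin-3-glucoside equivalents.
    """
    # Absorbance difference between the two buffers, corrected for haze
    a = (a520_ph1 - a700_ph1) - (a520_ph45 - a700_ph45)
    return a * mw * dilution_factor * 1000.0 / (epsilon * path_cm)


# Hypothetical absorbance readings (NOT measurements from the study)
conc = total_anthocyanin_mg_per_l(
    a520_ph1=0.80, a700_ph1=0.02,
    a520_ph45=0.20, a700_ph45=0.02,
    dilution_factor=10,
)
print(f"total anthocyanin = {conc:.1f} mg/L (C3G equivalents)")
```

Converting the result to a per-gram basis would additionally require the extraction volume and sample mass, which are not given here.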

### Determination of Functional Nutrient Quality

Metal element concentrations in the pericarp of weedy rice and control samples were measured by using inorganic mass spectrometry (Gross, 2017). The amino acid concentrations were determined by chromatography using the Hitachi L8800 automatic amino acid analyzer (Hua et al., 2016). The fat was separated by Soxhlet extraction for methyl esterification and then analyzed by using gas chromatography–mass spectrometry (Agilent Technologies 7890A GC System and Agilent Technologies 240 Ion Trap). Resistant starch concentration was measured using the Shanghai Rongsheng Biotechnology ELISA kit.

### Statistical Analyses

Statistical analyses for the comparison of phenotypic values (ANOVA) were carried out with the statistical software IBM SPSS 2.00 (IBM Corp, Armonk, NY, United States). The level of significance was taken as P < 0.05.
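The comparison above was run in SPSS. As an illustration of what a one-way ANOVA computes, the following sketch derives the F statistic by hand for hypothetical groups. It is not a reproduction of the authors' analysis and it stops short of the P-value, which requires the F distribution.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and n_total > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    if ss_within == 0:
        raise ValueError("F is undefined when within-group variance is zero")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical trait values for three groups (NOT data from the study)
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The resulting F would then be compared against the F distribution with the same degrees of freedom at the 0.05 level.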

### RESULTS

### Genome Wide Association Study of Weedy Rice Pericarp Color

The red pericarp phenotype was observed in all subgroups (**Supplementary Figure S1**). All weedy rice has colored pericarps, and some of these are extremely dark. Aus, indica, and tropical japonica have red pericarps and white pericarps. Most temperate japonica have white pericarps, and a few have red pericarps (**Figure 1**).

WRAH is a branch of temperate japonica, and WRSC is grouped with indica, as reported in our previous research (Sun et al., 2019). In addition, phenotypic variation in pericarp color was observed in all subgroups and did not correspond with subgroup differentiation. For these reasons, false positives in GWAS due to population differentiation may have little impact, and we conducted GWAS across the six subgroups. In the compressed MLM of set 1 (weedy rice), we identified 13 association signals of pericarp color (P < 1 × 10<sup>−7</sup>), and in set 2 (cultivated rice), we identified four signals with clear peaks (P < 1 × 10<sup>−7</sup>) (**Figure 2**). The results showed that the genetic basis of pericarp color had both commonalities and differences between weedy rice and cultivated rice. The genetic basis of the pericarp color of weedy rice was more complex than that of the cultivated rice, which is reflected in the greater number of significant genomic association peak signals (**Figure 2**). Finally, we nominated five candidate genes according to the annotation information from the Rice Genome Annotation Project (**Table 1**). The known key gene site of red pericarp, Rc (Furukawa et al., 2007), was detected in both GWAS panels with significant peak signals. We also found that the association signals of the red pericarp synergistic gene Rd could be detected in the weedy rice GWAS panel. The association signals of the purple rice genes Pp and Pb could not be detected in either the weedy rice or the cultivated rice GWAS panel.

<sup>1</sup>http://rice.plantbiology.msu.edu/


### Selection Sweep Analysis and Candidate Genes

Selection sweep analyses were conducted in the "unimproved" red pericarp population and in the "improved" cultivated white pericarp population. The selection sweep parameter πunimproved population/πimproved population per 500 kb genomic sliding window was highlighted from yellow (0.17) to magenta (4.22) (**Figure 3A**). Another artificial selection parameter, Tajima's D, represented by a black and white heat map, supported the selection sweep in defining the selected genomic regions (**Figure 3B**). In order to nominate candidate genes that may contribute to the red pericarp color of weedy rice, we considered whether the GWAS peaks were covered by the selection signal. Finally, seven GWAS peaks with a strong selection signal were detected, including the positions of the known red pericarp gene Rc and of two new candidate genes, LOC\_Os03g08930 and LOC\_Os12g37419 (**Figure 3C** and **Table 1**). In addition, a Glutelin-family-protein gene, Os03g0188500 (without MSU ID), was linked with LOC\_Os03g08930; together they may contribute to grain quality and nutrition. Although the GWAS peak related to the Rd gene was detected, its selection signal was weak. The genome-wide GWAS peaks combined with the selection sweep are shown in **Supplementary Figure S2**.

### Developmental Dynamics of Weedy Rice Pericarp Color

The typical red pericarp weedy rice WR07-14, WR07-47, WR03-32, and WR07-141 were selected from the GWAS panel for the observation of pigment components and the deposition process. The pericarp color of all sampled weedy rice was green in the early stage of development (0–5 days after flowering). The pigment began to deposit at both ends of the caryopsis (6–8 days after flowering), then accumulated along the vascular bundle at the back of the caryopsis (9–15 days after flowering), and finally occurred throughout the entire caryopsis (16–18 days after flowering). When the seeds were mature, the pigmented pericarps were slightly faded (26–28 days after flowering) (**Figure 4A**).

We dissected the weedy rice WR07-14 caryopsis, including pericarp, seed coat, and endosperm (including aleurone layer) at different stages of development. At 1–7 days after flowering, the caryopsis was green, the seed coat was transparent, and the endosperm was white. At 8–13 days after flowering, the accumulated pigment in the pericarp was a light color, the thin seed coat was lighter in color than the pericarp, and the endosperm was still white. At 14–21 days after flowering, the pericarp presented dark red, the seed coat was colored but thinner than the pericarp, and the endosperm was white. Based on these observations, the pigment was mainly deposited in the pericarp, a small amount was deposited in the seed coat, and the pigment was absent in the endosperm (**Figure 4B**).

At 2 weeks after flowering, the pericarp and seed coat cells of the weedy rice began to shrink, and the binding between them gradually became tight. The pericarp cells began to shrink, and the pigment began to accumulate mainly in the lower part of the mesocarpal cells. A small amount of pigment was deposited in the outer pericarp. Pigments were difficult to detect in the seed coat. The aleurone cells began to form out of the outer layer of endosperm, and they arranged themselves into a rectangular shape. At 3 weeks after flowering, the cell binding between pericarp and seed coat was more compact, and the seed coat layer was completely flattened. A large amount of pigment was deposited in the pericarp, and a small amount was deposited in the seed coat. The gap between the aleurone layer and the endosperm cells shrank, and neither the aleurone layer nor the endosperm portion was pigmented (**Figure 4C**).

### Developmental Dynamics of Pigment Concentration and Composition in Weedy Rice Pericarp

To determine changes in pericarp pigmentation with weedy rice seed growth, we further measured the changes in anthocyanin concentration and composition in four typical weedy rice accessions, WR07-14, WR07-47, WR03-32, and WR07-141, since anthocyanin is the main component of the pericarp pigment. The concentration of anthocyanins rose and then fell until it stabilized, with the peak appearing between the 15th and 20th day after flowering (**Figure 5A**). As expected, the anthocyanin concentration in the dark red pericarp weedy rice WR07-14 was higher and accumulated more quickly than that in red pericarp weedy rice during the whole development period (**Figure 5A**). The ranking of the rice strains by anthocyanin concentration and color, from deep to light, was WR07-14 > WR07-47 > WR03-32 > WR07-141.

The anthocyanin components of weedy rice were detected using the hydrochloric acid-vanillin method and high-performance liquid chromatography-mass spectrometry (HPLC) (Baldi et al., 1995; Pei et al., 2017). Based on HPLC, peonidin-3-glucoside (P3G) occurred only in dark red pericarp weedy rice (WR07-14 and WR07-47); based on the hydrochloric acid-vanillin method, proanthocyanidins occurred only in red pericarp weedy rice (WR03-32 and WR07-141) (**Table 2**). By comparing the peak positions of the four weedy rice accessions with the standards in the HPLC spectrum, we determined the anthocyanin composition of the samples (**Figure 5B**). Cyanidin-3-glucoside (C3G) was detected in all four weedy rice accessions, and its concentration was higher in dark red pericarp weedy rice than in red pericarp weedy rice (**Table 2**).

# Trace Element Differences Between Weedy Rice and Cultivated Rice

Cultivated red rice is considered to be a beneficial, healthful food (Rhanissa et al., 2011). However, the functional and nutritional qualities of weedy rice with red and other pericarp colors are poorly known. The concentrations of metal elements were measured in weedy rice accessions WR07-14, WR07-47, WR03-32, WR07-141, and WR07-142 and in cultivated rice accessions Shennong265, Akihikari, Nipponbare, and Qishanzhan. The concentrations of various metal elements in weedy rice were significantly higher than those in the control cultivated rice, sometimes 2–3 times greater (**Table 3**). The concentration of Mg was higher than that of other metal elements in both weedy rice and cultivated rice, followed by Ca > Zn and Mn > Cu > Mo > Se > Ge.

### Free Amino Acid Content Differences Between Weedy Rice and Cultivated Rice

Free amino acids are important biologically active molecules that play a role in protein synthesis and in providing energy for the body and for brain activity (Zarei et al., 2017). A total of 17 free amino acids were detected in the five weedy rice and four cultivated rice varieties analyzed (**Table 4**).

Glutamate (Glu) was the most abundant free amino acid in both weedy and cultivated rice. The total amino acid concentration was higher in the weedy rice than in the cultivated rice, and most individual free amino acids (including Asp, Glu, Val, and Thr) were significantly higher in concentration in the weedy rice than in the cultivated rice. However, Cys and Gly were significantly lower in the weedy rice than in the cultivated rice (**Table 4**).

# Fatty Acid Content Differences Among Color Pericarp Weedy Rice

Saturated fatty acids (palmitic acid and stearic acid) and unsaturated fatty acids (oleic acid and linoleic acid) were detected in weedy rice and cultivated rice in this study. The overall ratio of saturated to unsaturated fatty acids in the weedy rice was slightly lower than that in cultivated rice. Saturated and unsaturated fatty acids accounted for 20.53% and 79.47% of the total in weedy rice, and for 23.03% and 76.97% in cultivated rice. There was no significant difference in the proportion of palmitic acid (a saturated fatty acid) between weedy rice and cultivated rice. The proportion of stearic acid (a saturated fatty acid) in weedy rice was significantly lower than that of control cultivated rice (**Table 5**). These indicators suggest that the fatty acid profile of weedy rice is superior to that of the control cultivated rice.
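The comparison of saturated to unsaturated fatty acids can be checked with simple arithmetic. The sketch below uses only the four percentages reported in the text; the dictionary layout and the function name are illustrative choices, not part of the study's analysis.

```python
# Saturated and unsaturated fatty acid shares (% of total fatty acids),
# taken from the values reported in the text.
shares = {
    "weedy": {"saturated": 20.53, "unsaturated": 79.47},
    "cultivated": {"saturated": 23.03, "unsaturated": 76.97},
}


def saturated_to_unsaturated(group: dict) -> float:
    """Return the saturated:unsaturated ratio for one group of accessions."""
    return group["saturated"] / group["unsaturated"]


# Ratio for each group (hypothetical helper structure for illustration).
ratios = {name: saturated_to_unsaturated(group) for name, group in shares.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
# weedy: 0.258
# cultivated: 0.299
```

On these figures the saturated:unsaturated ratio is about 0.26 in weedy rice and about 0.30 in cultivated rice, consistent with the statement that the ratio is slightly lower in weedy rice.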

# DISCUSSION

Our research has shown certain differences in the development of pigmentation in weedy rice compared to previous studies. Liu et al. (2011) found that the growth and endosperm development of colored rice caryopsis are basically the same as those of conventional rice varieties except for pigments such as anthocyanin. Four to five days after flowering, anthocyanin began to deposit in the pericarp of the caryopsis, first at both ends of the caryopsis and at the vascular bundle at the back, and finally covered the entire caryopsis 10 days after flowering. The anthocyanidin in the caryopsis mainly accumulated 7–20 days after flowering (Liu et al., 2011). Han et al. (2009) observed that black rice pigment was synthesized and deposited only in the pericarp of the caryopsis, with no deposition in the seed coat or aleurone layer. The anthocyanidin was deposited in the pericarp 3 days after pollination and increased rapidly 5–6 days after pollination. After 7 days, the anthocyanidins filled the entire pericarp. Anthocyanidin was first deposited at the far end of the caryopsis, then gradually extended to the embryo end and stained the entire skin (Heo et al., 2006). The pigment deposition process and location in weedy rice found in the present study were basically consistent with these studies, although pigment deposition in weedy rice was delayed by 1–2 days compared to cultivated rice. In our study, pigment accumulation in weedy rice began at both ends of the caryopsis, continued along the vascular bundle on the back of the caryopsis, and eventually covered the entire caryopsis. Pigment deposition in weedy rice began 6–8 days after flowering, slightly later than in colored cultivated rice. Most of the pigment was deposited in the pericarp, with only a small amount in the seed coat.



C3G stands for cyanidin-3-glucoside and P3G stands for peonidin-3-glucoside. Different lowercase letters within a column indicate a significant difference in the content of that component among the weedy rice accessions at the p < 0.05 level. Because the pericarp of the control cultivated rice is white (colorless), its content is recorded as 0. The unit of the anthocyanins is mg/100 g.

It has been reported that anthocyanin accumulation is positively correlated with superoxide dismutase (SOD) in seeds, and that anthocyanin accumulation leads to a darker seed coat color (Yi et al., 2019). In the present study, the anthocyanin content of grains increased during the filling stage but decreased after seed maturation. We speculate that the decrease in grain vigor after seed maturation leads to a decrease in anthocyanin content, specifically that a decrease in self-oxidant activity leads to a rapid decline in the accumulation rate of anthocyanins. The underlying mechanisms merit further study.

### Genetic Basis of Weedy Pericarp Color and Its Significance to Domestication

During the domestication of Oryza sativa, pericarp color was an important trait that humans targeted to improve rice quality (Qiu et al., 2017). Oryza rufipogon and some of the early landraces exhibited red pericarp. However, modern rice cultivars appear white due to a lack of red pigmentation in the pericarp. Rc and Rd are two key genes related to pericarp color. The wild type Rc allele encodes a basic helix–loop–helix (bHLH) transcription factor, and the loss-of-function mutant rc allele causes the change from red to white pericarp (Sweeney et al., 2006). Rc gene loci were shown to have contributed to rice domestication (Sweeney et al., 2007). Rd is another synergistic gene of rice pericarp color that enhances the effect of Rc gene to promote proanthocyanidin synthesis (Furukawa et al., 2007). The purple seed coat gene Pb and its complementary gene Pp are located on chromosomes 4 and 1, respectively. When the mutant allele of Pb is present, the pericarp is white. When the Pb gene is present alone, the pericarp is brown. When the genes Pb and Pp are present at the same time, the seed coat is purple (Wang et al., 2014).
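The Pb/Pp interaction described above amounts to a small decision rule. The following sketch encodes it as summarized from Wang et al. (2014); the function name and boolean flags are hypothetical, and treating a plant with functional Pp but mutant Pb as white is an assumption drawn from the statement that the mutant Pb allele gives a white pericarp.

```python
def pericarp_color(has_functional_pb: bool, has_functional_pp: bool) -> str:
    """Map Pb/Pp status to pericarp color following the rule summarized in the text.

    Illustrative only: real pericarp color also depends on Rc, Rd, and
    other loci that are not modeled here.
    """
    if not has_functional_pb:
        # Mutant Pb allele: white pericarp. Assumed here to hold
        # regardless of Pp status.
        return "white"
    if has_functional_pp:
        # Pb and Pp present together: purple.
        return "purple"
    # Pb present alone: brown.
    return "brown"


for pb in (False, True):
    for pp in (False, True):
        print(f"Pb={pb}, Pp={pp} -> {pericarp_color(pb, pp)}")
```

The rule makes the complementary relationship explicit: Pp has a visible effect only when a functional Pb allele is also present.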

In this study, the association signals of the purple rice genes Pp and Pb could not be detected in red pericarp rice (weedy or cultivated) through the GWAS panel. This implies that the two genes did not contribute to the red pericarp of weedy rice because they are rare alleles specific to purple rice. On the other hand, we found that weedy rice may have a novel and complex genetic system of pericarp color that involves Rc, Rd, and other unexplored genes. Rice pericarp color is a typical domestication-related trait of cultivated rice with a relatively simple genetic basis, i.e., the Rc gene experienced intense artificial selection on pericarp color. We found some minor-effect or synergistic alleles, which suggests that weedy rice may retain undomesticated pericarp color genes. Therefore, semi-domesticated weedy rice populations may harbor an unexplored genetic basis for various agronomic traits that could further clarify the process of rice domestication. For instance, a nutrient gene (Os03g0188500) related to glutamic acid biosynthesis is linked to a candidate pericarp color gene (LOC\_Os03g08930), which implies a potential selection pressure related to pigments and nutrients in weedy rice.

# Significance of Weedy Rice Anthocyanin Biosynthesis

Rice anthocyanins could be a beneficial part of human diets due to their high antioxidant activities. Lei et al. (2006) pointed out that red rice pigment was synthesized and deposited only in the pericarp of the caryopsis and was absent in the seed coat and aleurone layer. However, a breakthrough was made by Zhu et al., who recently engineered a high-efficiency vector system for transgene stacking to enable anthocyanin biosynthesis in endosperm. They made a construct containing eight anthocyanin-related genes driven by endosperm-specific promoters and generated a novel biofortified germplasm, "Purple Endosperm Rice," with a high anthocyanin content (Zhu et al., 2017). However, the use of such a genetically modified organism (GMO) as a staple food still requires a long process of research and approval. Alternatively, developing natural germplasm resources with high anthocyanin contents can be achieved in the short term. In particular, marker-assisted breeding can accelerate this process once the genetic basis of desired traits is determined.

WR (Weedy rice), AKI (Akihikari), SN265 (Shennong265), QSZ (Qishanzhan), NIP (Nipponbare), WR-AVE (Weedy rice Average), CR-AVE (Cultivated Rice Average). AKI, SN265, QSZ, and NIP are control cultivated rice. The different lowercase letters in the columns of the table indicate the significant difference in the content of each component between weedy rice and control cultivated rice at p < 0.05 level. Capital letters indicate the significant difference in the content of each component between weedy rice and control cultivated rice at p < 0.01 level. The unit of the anthocyanins is (mg/kg).

WR (Weedy rice), AKI (Akihikari), SN265 (Shennong265), QSZ (Qishanzhan), NIP (Nipponbare), WR-AVE (Weedy rice Average), CR-AVE (Cultivated Rice Average). AKI, SN265, QSZ, and NIP are control cultivated rice. The different lowercase letters in the columns of the table indicate the significant difference in the content of each component between weedy rice and control cultivated rice at p < 0.05 level. Capital letters indicate the significant difference in the content of each component between weedy rice and control cultivated rice at p < 0.01 level. The unit of the anthocyanins is (mg/g).

WR (Weedy rice), AKI (Akihikari), SN265 (Shennong265), QSZ (Qishanzhan), NIP (Nipponbare), WR-AVE (Weedy rice Average), CR-AVE (Cultivated Rice Average). AKI, SN265, QSZ, and NIP are control cultivated rice. The value represents the percentage of each fatty acid component in the total. The different lowercase letters in the columns of the table indicate the significant difference in the content of each component between weedy rice and control cultivated rice at p < 0.05 level. Capital letters indicate the significant difference in the content of each component between weedy rice and control cultivated rice at p < 0.01 level.

Rice nutritional quality has attracted more attention in the traditional growing areas of Asia, where monotonous consumption of rice may lead to deficiencies of essential trace elements, amino acids, and other nutrients (Bouis, 2003). In this study, we found that the sampled weedy rice had much greater quantities of anthocyanin, beneficial trace elements, free amino acids, and unsaturated fatty acids than the control cultivated rice. Unsaturated fats are considered healthier for consumption than saturated fats (Yasumatsu and Moritaka, 2008). Therefore, the gene resources and novel genetic systems of rice anthocyanin biosynthesis explored in this study are of great value for the development of high-anthocyanin rice.

### DATA AVAILABILITY STATEMENT

The original contributions presented in the study are included in the article/**Supplementary Material**; further inquiries can be directed to the corresponding authors.

### AUTHOR CONTRIBUTIONS

DM, JS, and MZ conceived the project and experiment. JS, WW, and MZ performed the SLAF sequencing and population genetic analysis. WW observed the process of pigmentation in weedy rice and cultivated rice. WW and JyS examined the composition and content of pigment in weedy rice and cultivated rice, and analyzed the nutritional quality of weedy rice and cultivated rice. WW, JS, DM, MZ, GZ, ZL, YH, and XJ provided the germplasm and performed the germplasm management. JS and WW conducted a selection strength analysis. WW, JS, MZ, and DM interpreted the data and wrote the manuscript. All authors contributed to the manuscript and approved the submitted version.

### FUNDING

This work was supported by the National Key R&D Program of China (grant number 2017YFD0100501), Program of Liaoning Revitalization (XLYC1808003), and the National Natural Science Foundation of China (grant number U1708231).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00878/full#supplementary-material

FIGURE S1 | QQ plots of the GWAS results for the improved and unimproved red rice populations. (A) Red pericarp samples of cultivated rice. (B) Red pericarp samples of weedy rice. In both panels, the close fit in the lower part of the plot indicates that the model is reasonable, and the upward deviation in the upper part indicates the influence of loci related to pericarp color.

FIGURE S2 | Genome-wide selection sweep with GWAS peak. (A) Selection sweep plot of the π ratio (π of the unimproved population / π of the improved population); the black line indicates the threshold used to distinguish strong from weak selection signals. (B) Dark green indicates the selection sweep based on Tajima's D in the unimproved population, and light green indicates the selection sweep based on Tajima's D in the improved population. The red dotted line indicates the GWAS peaks.

TABLE S1 | Material information of the GWAS populations.




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Wang, Zhao, Zhang, Liu, Hua, Jia, Song, Ma and Sun. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# New Food Crop Domestication in the Age of Gene Editing: Genetic, Agronomic and Cultural Change Remain Co-evolutionarily Entangled

David L. Van Tassel<sup>1</sup>, Omar Tesdell<sup>2</sup>, Brandon Schlautman<sup>1</sup>, Matthew J. Rubin<sup>3</sup>, Lee R. DeHaan<sup>1</sup>, Timothy E. Crews<sup>1</sup> and Aubrey Streit Krug<sup>1</sup>\*

<sup>1</sup> The Land Institute, Salina, KS, United States, <sup>2</sup> Department of Geography, Birzeit University, Birzeit, Palestine, <sup>3</sup> Donald Danforth Plant Science Center, St. Louis, MO, United States

### Edited by:

Eric Von Wettberg, The University of Vermont, United States

### Reviewed by:

Ashley DuVal, Mars, United States

Edward Marques, The University of Vermont, United States

\*Correspondence: Aubrey Streit Krug streitkrug@landinstitute.org

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 06 January 2020 Accepted: 18 May 2020 Published: 11 June 2020

### Citation:

Van Tassel DL, Tesdell O, Schlautman B, Rubin MJ, DeHaan LR, Crews TE and Streit Krug A (2020) New Food Crop Domestication in the Age of Gene Editing: Genetic, Agronomic and Cultural Change Remain Co-evolutionarily Entangled. Front. Plant Sci. 11:789. doi: 10.3389/fpls.2020.00789

The classic domestication scenario for grains and fruits has been portrayed as the lucky fixation of major-effect "domestication genes." Characterization of these genes plus recent improvements in generating novel alleles (e.g., by gene editing) have created great interest in de novo domestication of new crops from wild species. While new gene editing technologies may accelerate some genetic aspects of domestication, we caution that de novo domestication should be understood as an iterative process rather than a singular event. Changes in human social preferences and relationships and ongoing agronomic innovation, along with broad genetic changes, may be foundational. Allele frequency changes at many loci controlling quantitative traits not normally included in the domestication syndrome may be required to achieve sufficient yield, quality, defense, and broad adaptation. The environments, practices and tools developed and maintained by farmers and researchers over generations contribute to crop yield and success, yet those may not be appropriate for new crops without a history of agronomy. New crops must compete with crops that benefit from long-standing participation in human cultural evolution; adoption of new crops may require accelerating the evolution of new crops' culinary and cultural significance, the emergence of markets and trade, and the formation and support of agricultural and scholarly institutions. We provide a practical framework that highlights and integrates these genetic, agronomic, and cultural drivers of change to conceptualize de novo domestication for communities of new crop domesticators, growers and consumers.
Major gene-focused domestication may be valuable in creating allele variants that are critical to domestication but will not alone result in widespread and ongoing cultivation of new crops. Gene editing does not bypass or diminish the need for classical breeding, ethnobotanical and horticultural knowledge, local agronomy and crop protection research and extension, farmer participation, and social and cultural research and outreach. To realize the ecological and social benefits that a new era of de novo domestication could offer, we call on funding agencies, proposal reviewers and authors, and research communities to value and support these disciplines and approaches as essential to the success of the breakthroughs that are expected from gene editing techniques.

Keywords: domestication, coevolution, gene editing, agronomy, cultural evolution, new crops

# INTRODUCTION


De novo domestication of new crops from currently wild plants could help solve a wide range of problems, including genetic and species diversification of agricultural systems (Fernie and Yan, 2019); expansion of agricultural production onto degraded sites, stressful environments, or regions highly vulnerable to climatic extremes (Zhang et al., 2018); improvement of the fit between crops and particular local ecological niches (Fernie and Yan, 2019); and intensification of the range of vital ecosystem services provided by crops (Weißhuhn et al., 2017). Many new food crops will need to produce larger, more harvestable tubers, roots, fruits and seeds. Because this kind of change is both visually obvious, including in archeological records, and something that evolved independently many times in the past, it is not surprising that these are the core traits of the domestication syndrome for food plants (Doebley et al., 2006; Dong et al., 2019; Woodhouse and Hufford, 2019) and attractive targets for genetic modification.

In much of the recent de novo food crop domestication literature, domestication and the fixation of alleles conferring the domestication syndrome are used as interchangeable concepts (discussed below). However, our definition is closer to that of Harlan (1992), who describes broader changes: "Domesticated plants are those brought into the domus [Latin for household] which may mean the dooryard, garden, field, orchard, vineyard, pasture or ranch. It may also include yards, parks, cemeteries, golf courses, roadsides, forests, and other managed areas. In ecological terms it is the change in habitat that is critical.... domestication tends in the direction of making the plant populations dependent on human interference and man-made habitats. Since the processes of domestication are evolutionary in nature, all intermediate degrees and conditions may be expected, but a fully domesticated plant is entirely dependent on human intervention for survival."

Advances in gene editing technologies, such as CRISPR/Cas9, could enable a new era of de novo domestication (Østerberg et al., 2017; Chen et al., 2019; Eshed and Lippman, 2019; Fernie and Yan, 2019; Khan et al., 2019; Wolter et al., 2019), defined as "the introduction of domestication genes into nondomesticated plants" (Fernie and Yan, 2019). Several labs have independently and successfully modified domestication-related genes in wild species as proof-of-concept (Lemmon et al., 2018; Li et al., 2018; Zsögön et al., 2018). Discovering or creating favorable alleles of relatively simply inherited "domestication genes" (DGs) (Doebley et al., 2006; Østerberg et al., 2017; Khan et al., 2019), whether through classic selective breeding, mutagenesis or genome-editing, is helpful during de novo domestication.

Such research could contribute new understanding and opportunities, particularly in well-characterized genes such as the seed dormancy gene G, which appears to have been selected in parallel during the domestication of several crops in at least three plant families (Rendón-Anaya and Herrera-Estrella, 2018). Gene editing and similar technologies may be necessary to produce effective DG alleles on a realistic timeframe for the de novo domestication of some wild plant taxa (Eshed and Lippman, 2019; DeHaan et al., 2020) and some authors recognize that novel variation must then be introduced into diverse germplasm (Lemmon et al., 2018) and "tuned" by traditional selection on standing variation of many small-effect loci (Eshed and Lippman, 2019).

However, other recent gene editing papers may inadvertently oversimplify the rich, complex process of plant domestication. Some imply that prehistoric domestication happened when "simple choices ultimately led to the pyramiding of valuable mutations and re-combinants in key genes" (Fernie and Yan, 2019) and emphasize the importance of monogenic domestication traits (Li et al., 2018). Zsögön et al. (2018) state that the editing of six genes demonstrates "that targeted reverse genetic engineering of wild plants could rapidly create new crops."

The availability of new tools such as CRISPR, and the possibility of applying them toward de novo domestication of novel crops, could be misunderstood by society to mean that gene editing can produce domestication as a nearly instant and/or inexpensive event, failing to recognize the many years of foundational research that was required to characterize and identify candidate genes and develop transformation and tissue culture regeneration systems (Van Eck, 2018). The tendency to emphasize the speed and ease of domestication of plants such as Solanum pimpinellifolium (currant tomato) via gene editing is illustrated in the following headlines and statements: "CRISPR can speed up nature... It took thousands of years for humans to breed a pea-sized fruit into a beautiful beefsteak tomato. Now, with gene editing, scientists can change everything" (Hall, 2018); "Gene editing can potentially cram millennia of agricultural progress into the blink of an eye" (Keats, 2019); "Plant breeding at the speed of light: The power of CRISPR/Cas. . ." (Wolter et al., 2019); "CRISPR/Cas brings plant biology and breeding into the fast lane" (Schindele et al., 2020); "The CRISPR technique quickly tamed 'unruly' ground cherries" (Saey, 2018); "For the first time, researchers have created, within a single generation, a new crop from a wild plant. . .by using a modern process of genome editing" (University of Munster, 2018); "Tweaking just a few genes in wild plants can create new food crops" (Hereward and Curtis, 2018).

While new gene editing technologies may accelerate some genetic aspects of domestication, we caution that de novo domestication resulting in widespread and ongoing cultivation should be more accurately understood as an ongoing, iterative process rather than a singular event. New genetic techniques may allow us the opportunity to begin – not to bypass – this "lengthy and tedious" (Wolter et al., 2019) work. Whereas earlier researchers emphasized the ways that humans had discovered loss-of-function plant mutations during domestication, leaving plants dependent upon farmers for their defense and dispersal (Zohary and Hopf, 2000; Anderson, 2005), the emerging consensus is that domestication is a long-term, co-evolutionary deepening of a mutualistic relationship, involving cultural, technological, biological, and ecological factors (Zeder et al., 2006; Gepts, 2010; Meyer et al., 2012; Meyer and Purugganan, 2013; Gremillion et al., 2014; Larson et al., 2014; Allaby et al., 2017; Swanson et al., 2018; and see especially the review by Zeder, 2015).

Historically, the genetic changes that have come to distinguish domesticated plants from their wild relatives are the outcome of an interaction of sociocultural and environmental pressures reflecting broader changes in human preference and agronomic intervention. In domestication, culturally innovative changes in human behavior that result in environmental modification (such as preparing soil and plant harvesting and processing) are "entangled" with biologically innovative changes in plant genetics (Fuller et al., 2010). Domestication is an ongoing process since people and plants live in dynamic social-ecological systems.

A more interdisciplinary approach is therefore needed to inform the review and implementation of new gene editing research on de novo domestication. We provide a practical framework that highlights and integrates multiple drivers of change (**Figure 1**) to conceptualize de novo domestication for communities of new crop domesticators, growers and consumers. Various magnitudes of genetic, agronomic, and cultural change drive cycles of selective pressures that result in the fixation of domestication genes – and broad genetic changes, innovations in management, and social support may all be required to enable domestication processes to continue at different pivotal moments and for the long term across human generations and geographies (**Table 1**). To realize the ecological and social benefits that a new era of de novo food crop domestication could offer, it is necessary to recognize how genetic, agronomic, and cultural changes remain co-evolutionarily entangled (**Figure 2**).

### GENETIC CHANGE BEYOND DOMESTICATION GENES

Wild plants can be regarded as reservoirs of useful genes (Zhang et al., 2018) needed to create "climate-smart crops" (Li et al., 2018). However, the view that this genetic resource can be tapped by cleanly switching out a few domestication genes, hopefully "without causing an associated drag on other useful traits" (Li et al., 2018) is overly optimistic because it underestimates the additional genetic changes that will be required to improve wild plants' ability to respond to favorable environments, high plant density, increased population, and human care. Domestication is not a single genetic transformation. Other kinds of plant genetic research and genetic change are needed beyond and/or preceding modification of DGs.

# Identification and Preparation of Candidates

Research may be needed to identify potential candidates for domestication (Ciotir et al., 2019), to rank species or subspecies by predicted breedability (e.g., genome size, ploidy, mating system), to identify barriers to human use (e.g., toxicity), and to prepare wild germplasm for gene-editing (e.g., genomic sequencing, mapping) (DeHaan et al., 2016). The foundation of a domestication program includes obtaining and characterizing the wild germplasm (Schlautman et al., 2020) and ascertaining that additive genetic variation for crop yield and disease resistance is available in the gene pool. A successful domestication candidate should come from a sufficiently large primary or secondary gene pool to ensure sustained breeding, as domestication by any means is only the beginning of continuous cycles of breeding for yield, quality and resistance.

# Core Germplasm Development and "Tuning" of DGs

Domestication via gene-editing almost certainly implies a severe genetic bottleneck because it currently requires an efficient plant transformation/regeneration protocol (Chen et al., 2019; Fernie and Yan, 2019). Tissue culture protocols are not available for most wild plants or every genotype of crop plants; even where many amenable genotypes are available, genetic modification is laborious and may be constrained by intellectual property protection issues, thereby limiting the number of transformed lines that will be rapidly available to farmers and breeders. Mutational load could increase by drift during such a bottleneck (or later through hitchhiking with DGs) as appears to have happened during maize domestication (Wang et al., 2017). Pre-domestication inbreeding could be attempted to estimate mutation load and thus identify candidates with lower levels of pre-existing load, or even to begin to purge it. Alternatively, edited genes could be immediately introgressed into pre-identified genetic diversity panels to reverse the bottleneck effect (Eshed and Lippman, 2019; Fernie and Yan, 2019), although recovering homozygosity for the edited allele while avoiding genetic bottlenecking will require sophisticated crossing strategies. Reversing bottlenecks, identifying unpredicted genetic background effects (Wolter et al., 2019), and the general need to "tune" the expression of novel major-effect alleles (Eshed and Lippman, 2019) all argue that extensive post-domestication breeding should be expected in addition to pre-domestication work.
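To give a rough sense of scale for such a bottleneck, the sketch below applies the textbook Wright-Fisher expectation for the loss of heterozygosity under drift, H_t = H_0 (1 - 1/(2N))^t. This is an idealized illustration rather than an analysis from the cited papers: it tracks neutral diversity, not mutational load, and the starting heterozygosity, founder numbers, and generation count are hypothetical values.

```python
def expected_heterozygosity(h0: float, n_individuals: int, generations: int) -> float:
    """Expected heterozygosity after drift in an idealized diploid population.

    Assumes a constant population of `n_individuals` randomly mating
    diploids (Wright-Fisher model), with no selection, mutation, or migration.
    """
    return h0 * (1.0 - 1.0 / (2.0 * n_individuals)) ** generations


# Hypothetical scenario: starting heterozygosity of 0.30, followed for
# 10 generations at two different population sizes.
H0 = 0.30
GENERATIONS = 10

for n in (5, 500):  # e.g., a handful of transformed lines vs. a diverse panel
    h_t = expected_heterozygosity(H0, n, GENERATIONS)
    print(f"N={n}: retained fraction = {h_t / H0:.3f}")
```

Under these assumptions a population maintained from a few founder lines loses diversity far faster than a larger one, which is one reason for introgressing edited alleles into a diverse panel early.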

TABLE 1 | Common evidence, processes, and participants and practices involved in the genetic, agronomic, and cultural drivers of de novo domestication and crop improvement.




# Breeding for Vigor and Broad Adaptation

All the world's major crops are, almost by definition, grown far beyond the environment of their origin. They have adapted to many soil types, climates and daylengths. To provide a positive return on the research investment and to attract sufficient cultural notice as to become part of human cuisine, de novo domesticates will arguably need to be broadly adapted. Candidate species should be evaluated in multiple environments to identify those capable of broad adaptation or those with genetic variation for yield stability in multiple years and environments. Evaluation of breeding lines in multiple environments permitted the development of broadly adapted wheat varieties. Norman Borlaug advanced only varieties "that withstood the rigors of both environments," and considered broad adaptation to have been one of the three greatest successes of wheat breeding in the late twentieth-century (Borlaug, 1983). With increasing climate instability, new crops must be able to withstand different rigors even in a single location, within or across years.

### Breeding for Biotic and Abiotic Stress Tolerance

Winter hardiness (for perennials and winter annuals) and disease/insect resistance are specific adaptation traits that DG editing will not improve for undomesticated plants. Editing/engineering of resistance (R) genes could help, of course, although R genes are more likely to be species-specific "orphan" genes than DGs (Woodhouse and Hufford, 2019) and therefore researchers may rarely be able to take advantage of the gene discovery work already accomplished for existing crops. Crop pathogens reduce harvests by about 20% globally (Bebber and Gurr, 2015). Wild plants are not immune to pathogens; fungicide application increased plant biomass by 31% in a native North American grassland (Mitchell, 2003). The cultivation of wild plants could create conditions that favor the spread of some of their pests and pathogens, for example, by increasing the number, density, size and uniformity of host plants (Chen et al., 2018). Breeding for increased biotic stress resistance may therefore be necessary for new crop domestication and plant pathologists recommend that breeders include selection for quantitative and broad-spectrum resistance traits (Bebber and Gurr, 2015) in addition to R gene mediated resistance.

FIGURE 2 | Genetic, agronomic, and cultural changes are co-evolutionarily entangled in crop domestication. A reticulate biological/cultural "evolutionary tree" is shown in cartoon form (not to scale) to illustrate how "lineages" of plants, agronomic technology/traditions, and inherited human ethnobotanical culture "hybridize" to create different kinds of crops: (1) De novo incipient new crops from wild plants by breeding or gene editing (e.g., silphium), (2) New food crops developed from non-food crops by breeding or gene editing (e.g., intermediate wheatgrass), (3) Forage, fiber, and energy crops (e.g., alfalfa), (4) Major world food crops (e.g., maize, rice), (5) Cultivated crops with little genetic change (e.g., cranberry), (6) Culturally important wild-crafted food and medicinal plants (e.g., 'akkoub). Major evolutionary innovations are numbered: (1) Cultural references to plants (names, stories, recipes, etc.) appear, (2) Agronomic practices appear, (3) Genetically distinct cultigens appear. Narrow lines show "horizontal" influences such as accidental or deliberate gene flow between cultigens and wild relatives (narrow green lines) or changing cultural uses of harvested plants (narrow purple lines) or the influence of new agronomic practices and technologies (narrow brown lines). Dotted narrow lines indicate a very recent or emerging influence.

### Breeding for Yield and Quality

Traits controlled by many genes are poor candidates for genome editing, and their improvement may require more conventional cycles of breeding for the foreseeable future. While some important agronomic traits are monogenic, domestication traits in some crops are polygenic (Hämälä et al., 2019). Flavor, ripening, postharvest physiology, and nutritional profile are important traits for food crops. They are likely to be genetically complex and are unlikely to be completely satisfactory in "crops" that differ from their wild ancestors at only a few DG loci. For example, Zhang et al. (2015) found 28 volatile chemicals to be involved in tomato flavor/aroma and associated with 125 markers. Seed set (or seed fertility), the percentage of florets producing a viable seed, has been found to be quite low in some wild plants both in their wild habitats and under cultivation (Cornelius, 1950). However, recurrent phenotypic selection can improve this trait (Marshall and Wilkins, 2003), which would improve the yield and breedability of crops developed by gene editing. The conventional modeling of yield as a genetically complex trait with many small-effect loci continues to be supported (e.g., Martínez et al., 2016). Evidence that genetic correlations between agronomic traits constrained and slowed maize domestication and that genetic constraints increased during domestication (Yang et al., 2019) suggests that, for some species at least, it will be hard to identify many agronomic targets for genetic modification that have large, positive, and independent effects.

### Value of Recognizing Broad Genetic Changes Involved in Domestication

The previous section shows that recognizing genetic change in the broader context of ongoing domestication can lead DG-based project teams to invest appropriate time, resources, and strategic planning to domestication efforts in the form of careful wild candidate selection and screening, collection, characterization and pre-adaptation of germplasm. Funding agencies should recognize the need to support multilocation, multi-trait breeding of new domesticates even after successful DG improvement.

One practical recommendation for accelerating de novo domestication of food crops and bypassing some stages described above is to focus on species that have been domesticated as forage, ornamental, medicinal, bioenergy or timber crops. These are species that have responded to agricultural conditions and human management and may have been further selected by breeders for adaptation to managed environments and tolerance of multiple soils and climates. Specific agronomic recommendations and laboratory protocols may be available. Last, but not least, the plant may already be culturally familiar to farmers and policy makers. Presumably, if it has become a successful forage or other kind of crop, this species has been found by humans to be pleasant to work with, non-invasive, and aesthetically compatible with rural landscapes. Forage crops do not require most of the traits conferred by DGs, yet traditional breeding has improved forage yield and usefulness to farmers.

In contrast, even locally adapted native species have been difficult to use as planted forages without some domestication (Morrison, 2016). Despite a wealth of native grass species, not a single native species is used widely enough to merit keeping statistics on the acreage of seed production in the United States (National Agricultural Statistics Service, 2019). Interestingly, the National Agricultural Statistics Service primarily differentiates hay acreage as "tame," "wild," or "alfalfa" (National Agricultural Statistics Service, 2019), supporting our view that few forage species have achieved deep cultural value (**Figure 2**). Poor seedling establishment and vigor are examples of traits that constrain adoption of forages (Vogel, 2000). DGs leading to increased seed size could improve these traits but about half of the variation in seedling vigor is unrelated to seed size (Vogel, 2000). The value of recurrent selection for the performance of both native and introduced forages illustrates the fact that the micro-environments created by the human-crop mutualism are novel (DeHaan et al., 2007; Purugganan, 2019).

Thinopyrum intermedium (Host) Barkworth & D.R. Dewey (intermediate wheatgrass, hereafter) is being developed as a new perennial cereal grain marketed under the trade name Kernza (DeHaan and Ismail, 2017). This new crop is an example of a candidate for DG editing (DeHaan et al., 2020) that builds upon an established forage with extensive germplasm enhancement and multitrait breeding. Intermediate wheatgrass was collected from the wild in the former USSR in the 1930s (Hanson, 1959). Forage varieties were developed in the United States and Canada from 1945 onward by selection from the wild germplasm for vigor and seed fertility (Vogel and Hendrickson, 2019). Recurrent mass selection with controlled interpollination of selected genotypes was used to increase seed yield in the 1950s and other programs emphasized lodging resistance, plant health, productivity, forage quality and broad adaptation (Vogel et al., 1993). By 1988, 500 tons per year of seed was being produced for sale in Saskatchewan alone (Saskatchewan forage seed development commission, 1998). In response to the suggestion that new perennial grains could conserve soil, the Rodale Institute began evaluating wild perennial grasses as candidates in 1983 (Wagoner and Schaeffer, 1990; Wagoner and Schauer, 1990). After scoring ten agronomic traits, including seed size, seed shattering, and stem lodging, intermediate wheatgrass was identified as the top candidate and two cycles of recurrent selection for improved yield and harvestability were performed at the USDA-NRCS Big Flat Plant Materials Center (Cox et al., 2002). Forage varieties were included along with wild accessions in the starting population from which the Rodale Institute began its selections (Wagoner and Schauer, 1990). 
In the early 2000s, The Land Institute revived the intermediate wheatgrass domestication program and has since performed nine cycles of recurrent selection; three cycles have been completed at the University of Minnesota (Crain et al., 2020). Recent cycles of selection use genomic selection to accelerate genetic gains for domestication traits and cereal grain yield (Zhang et al., 2016).

Intermediate wheatgrass is a challenging species for genome editing. No transformation protocol is available, the species is highly heterozygous, the genome contains more than 11 billion base pairs, and the genome is a complex allohexaploid (DeHaan et al., 2020). Domestication phenotypes due to knockouts in this species may require obtaining plants with all six alleles in the edited non-functional form. Such an effort involves not only completing gene edits but also possibly breeding them to homozygosity. Clearly, intermediate wheatgrass would not be a candidate for DG editing if the crop had not previously undergone more than three decades of breeding and agronomic work with the objective of obtaining a successful perennial grain crop. In essence, DG editing of this species has potential to dramatically improve its functionality as a grain crop by providing breakthrough improvements in one or two domestication traits because the species has already been the target of numerous breeding, agronomy, and utilization efforts (Tyl et al., 2020).
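The scale of the breeding effort implied by "all six alleles" can be illustrated with simple Mendelian arithmetic. The sketch below is not taken from DeHaan et al. (2020). It assumes, purely for illustration, disomic inheritance at three independently segregating homoeologous loci and a cross between two parents that each carry one edited and one unedited allele at every locus. The function names are hypothetical.

```python
from fractions import Fraction
import math


def fraction_fully_edited(n_loci: int = 3) -> Fraction:
    """Expected share of progeny homozygous for the edited allele at every locus.

    Assumption (illustrative only): both parents are heterozygous at each locus,
    inheritance is disomic, and the loci segregate independently, so each locus
    contributes a 1/4 chance of the edited/edited genotype.
    """
    per_locus = Fraction(1, 4)
    return per_locus ** n_loci


def progeny_to_screen(target: Fraction, confidence: float = 0.95) -> int:
    """Smallest number of progeny giving at least `confidence` probability
    of recovering one or more plants of the target genotype."""
    p = float(target)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


share = fraction_fully_edited(3)   # three homoeologous loci in a hexaploid
needed = progeny_to_screen(share)  # progeny to screen at 95% confidence
print(f"Expected share with all six alleles edited: {share}")
print(f"Progeny to screen for a 95% chance of at least one: {needed}")
```

Under these assumptions only 1 in 64 progeny would carry the edited allele in all six copies, so roughly 190 seedlings would need to be genotyped for a 95% chance of recovering one. Linkage, incomplete editing, or self-incompatibility would change these numbers.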

Genetic studies with intermediate wheatgrass thus far have revealed both the activity of known domestication genes from other grain crops in this new species and the potential challenges to utilizing such genes. Larson et al. (2019) identified 42 candidate genes in intermediate wheatgrass that can influence traits relevant to domestication. These genes are potential targets for selection or editing. However, even for the traits with highest heritability, such as free threshing ability, 16 significant markers associated with the trait were detected. This indicates that a large number of genomic regions were found to influence the trait, rather than a single high-impact locus. Furthermore, the full marker set only explained about 25% of variation in some environments, and marker effects depended upon environment in many cases. Therefore, even for simple domestication traits, there can be substantial interaction between genes and environment, and influence of a particular allele may depend on genetic "background," or interaction with the rest of the genome. These realities could greatly increase the complexity of genome editing approaches; perhaps particular allele forms will be necessary to override gene by environment interaction, or careful breeding will be necessary to preserve a complementary background genome that will not override the novel domestication allele introduced through editing.

## AGRONOMIC CHANGE

The increased value of a domesticated plant to humans, compared with its wild ancestor, depends on its altered phenotype (e.g., large seeds, high seed yield, and reduced branching). Variation in plant phenotypes results from differences in environment (E), differences in genes (G), and the interaction of the two (G × E). From the perspective of growers, the exact contribution of G or E or G × E is irrelevant; the final phenotype is relevant. In addition to G × E, agronomic management (M) changes highly relevant aspects of the environment (Hatfield and Walthall, 2015; Wang et al., 2019). There is variation in management and this variation can be transmitted across human generations, constituting a necessary component of domestication processes.
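The partition of a phenotype into G, E, and G × E can be made concrete with a small worked example. The sketch below uses invented yield values for two hypothetical genotypes ("A", "B") grown under two hypothetical management regimes ("unweeded", "weeded"); it assumes a complete, balanced table of cell means and an additive model, and it is not drawn from any study cited here.

```python
def decompose(pheno):
    """Split cell means into grand mean, genotype (G) effects,
    environment (E) effects, and G x E interaction residuals.

    `pheno` maps (genotype, environment) -> mean phenotype and is assumed
    to hold one value for every genotype-environment combination.
    """
    genotypes = sorted({g for g, _ in pheno})
    envs = sorted({e for _, e in pheno})
    grand = sum(pheno.values()) / len(pheno)
    # Main effects: deviation of each row or column mean from the grand mean.
    g_eff = {g: sum(pheno[g, e] for e in envs) / len(envs) - grand
             for g in genotypes}
    e_eff = {e: sum(pheno[g, e] for g in genotypes) / len(genotypes) - grand
             for e in envs}
    # Interaction: whatever the additive model (grand + G + E) fails to explain.
    gxe = {(g, e): pheno[g, e] - (grand + g_eff[g] + e_eff[e])
           for g in genotypes for e in envs}
    return grand, g_eff, e_eff, gxe


# Hypothetical yields (arbitrary units); here the environments differ only
# in management (M), showing how M acts through E.
yields = {
    ("A", "unweeded"): 10.0, ("A", "weeded"): 14.0,
    ("B", "unweeded"): 12.0, ("B", "weeded"): 24.0,
}

grand, g_eff, e_eff, gxe = decompose(yields)
print("grand mean:", grand)
print("G effects:", g_eff)
print("E effects:", e_eff)
print("G x E:", gxe)
```

In this invented table, genotype B gains far more from weeding than genotype A, so the interaction terms are non-zero: the best genotype cannot be chosen without knowing how the crop will be managed.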

Agronomic change occurs when farmers learn to produce specific abiotic and biotic environmental changes that enhance plant productivity and harvestability (Altieri, 2004). Some of these changes are more-or-less permanent, whereas new seed must be planted each year. Perhaps this is one reason why we more often think about the genetic than the agronomic contributions to domestication. We can grow crops side by side with their wild ancestors, whereas our own ancestors modified vast landscapes to increase and stabilize crop yield. They drained wetlands, terraced mountains, cleared forests, and killed off large wild herbivores, and it is difficult to experimentally replicate these changes at research stations.

Genetic change in plant domestication is interdependent with the actions taken by early farmers (intentionally or unintentionally) to prepare an environment that was consistently favorable for plants they preferred (Doebley et al., 2006; Clement et al., 2015; Altman and Mesoudi, 2019; Edwards et al., 2019; Mueller, 2019). The iterative development of plants makes their final size, shape and fecundity extremely plastic compared with animals. The influence of environment upon the realized yield of all crops – regardless of their yield potential in optimal environments – is so intuitively understood by any gardener or farmer, so central to the purpose of entire agronomy departments, and so inherently complicating to plant breeding that perhaps it is easy to overlook when considering de novo domestication. As an example, weed control alone doubles soybean and corn yields in the corn/soy region of North America (Soltani et al., 2016, 2017) and this estimate was made for crops otherwise managed according to modern best-practices. Without high quality seed, uniform seed placement and row spacing, well-managed fertility, etc., weeds would likely have been more competitive.

Agronomic change encompasses a wide range of ecosystem alterations, from the straightforward replacement of native vegetation with crop species, to dramatic interventions in a wide range of factors that affect crop productivity, including water, temperature, nutrient, weed, insect, and pathogen management. Before the fossil fuel era, agronomic changes were primarily "ecological" in that farmers manipulated key ecosystem patterns and processes to favor productivity per unit human or animal labor (Smil, 2018). Examples of such practices included the use of crop rotations, the integration of legumes, use of periodic flooding, and as mentioned, elimination of competing vegetation with weeding (Gliessman, 2015). The discovery and rapid increase in availability of fossil fuels profoundly relaxed the energetic constraints on possible agronomic practices. People have successfully employed fossil-fuel based strategies to relax or eliminate almost every limiting factor to crop productivity, including synthetic nitrogen, biocides, plastic row covers, pumping and transportation of water for irrigation, and more (Pimentel and Pimentel, 2007). As we once again move toward cultural expectations of reduced fossil fuel dependence, we see agronomic practices shifting as well from approaches that maximize fossil fuel-dependent agronomic practices in order to maximize yields, toward agronomic solutions that maximize ecological intensification in order to optimize yields (Crews et al., 2016).

### Value of Recognizing Agronomic Change Involved in Domestication

Wild species being introduced into an agricultural environment are likely to exhibit dramatic plasticity as they are released from some forms of herbivory and competition from other plant species. However, not all species or genotypes are equally plastic; some may require additional environmental modification to thrive. Since farmers are in the business of modifying the environment to benefit plants, it makes sense to screen de novo domestication candidates in conditions characteristic of the target agricultural system, whether that be an irrigated paddy or an orchard, a high-input cash plantation or a low-input farm. Plants without the plasticity to adapt to the target environment are poor candidates for domestication.

Learning the ideal conditions for each candidate will require either rigorous agronomy or trial-and-error experimentation by farmers. Time and funding to permit adequate characterization of crop candidate response to variations in soil texture, fertility, water availability, cold stratification and vernalization as well as other ecological factors should be included in any de novo domestication project.

For wild plants never previously used as crops, the research investment in agronomic drivers of domestication may be substantial and time-consuming – just as with the research investment into genetic change. Recognizing this may prompt de novo domestication research teams to attend to agronomic change and introduce a completely wild plant species as a crop for forage, erosion-control, pollinator habitat, or niche high-value use of seeds or fruits (medicinal, specialty vegetable or flavoring, cosmetics) prior to investing in genetic changes necessary for a major food crop.

Intermediate wheatgrass domesticators in the 1980s benefitted from planting practices, weed control, and fertilization that are recommended to farmers and researchers (Wills et al., 1998). Some of these recommendations may turn out to be counterproductive for seed producers vs. their intended audience of forage producers, but valuable insights for driving domestication via managing a new perennial cereal crop were available from forage grass seed producers (Horton et al., 1990; Saskatchewan forage seed development commission, 1998; Hybner and Jacobs, 2012; Kruger, 2015).

## CULTURAL CHANGE

From an evolutionary perspective, domestication is an example of co-evolution between mutualists. Genetic changes in human partners during this co-evolutionary process are modest, though documented (Altman and Mesoudi, 2019). However, human culture can evolve by accumulating complex behaviors and transmitting them to successive generations (Altman and Mesoudi, 2019). Some of these behaviors relate to the knowledge, practices and tools used in tending crops, i.e., resulting in agronomic change as described above.

Beyond applied horticultural tools and knowledge, other cultural changes are required to enable and sustain a deepening, long-term human relationship with a new crop. Past human cultural evolution enabled initial domestication and agricultural niche construction (see review in Laland, 2017). Now, cultural change describes the human community valuation of a species in social, economic, and culinary terms that may be necessary for de novo domestication processes to begin, continue, and succeed in bringing a new crop into widespread cultivation and use.

Human management of plants includes a variety of activities – gathering, tolerating, enhancing, and protecting in situ as well as ex situ sowing and transplanting – which suggests multiple possible routes for domestication processes (Casas et al., 1996; Lins Neto et al., 2014). Sociocultural factors shape these human choices about management and thus, for example, which plant species are engaged in incipient landscape domestication (Betancurt et al., 2017). Diverse cultural knowledge and methods of enculturation also shape how innovation, including domestication processes and products, are pursued and maintained across generations (Larson et al., 2014) with both negative and positive social and ecological implications (Scott, 2017; Swanson et al., 2018), such as the development of cooperative behavioral norms (Zeder, 2016).

The ongoing co-evolution that defines the domestication relationship must be sustained, on the human side, by society. Some degree of cultural change in response to a potential new crop, such as at the community scale, may be required in order to influence human resource allocation (Zeder, 2015) and successfully build investments to pursue agronomic and/or genetic change in terms of breeding and management experimentation, training of farmers, and the development of food processing, storage and distribution techniques.

Vaccinium macrocarpon (cranberry, hereafter) is an example of successful, ongoing crop domestication accomplished primarily through human cultural and agronomic change with minimal plant genetic change so far. A native North American fruit crop long wild-harvested by indigenous peoples and then adopted by European settler colonists, cranberries were valued as winter sources of calories and nutrients, especially vitamin C. They were so important that laws were passed banning the collection of the wild fruits prior to a certain date each fall (Klingbeil and Rawson, 1975). Though formal documentation is lacking, we consider it likely that indigenous peoples managed wild cranberry stands for increased yield (Doolittle and Mabry, 2006).

Agronomic change in cranberry domestication was advanced when transplanting of wild vines began in 1816 (Eck, 1990) and as cultivation spread with increasing economic returns by midcentury (Peltier, 1970). Over the next 100 years, the "big four" cultivars of selected clones from wild genotypes were involved in increasing agronomic and cultural change, including: managing cranberry hydrology using dikes and canals, plant propagation through sanding, protection from biotic (insects) and abiotic pressures (cold) by flooding, and harvest using specially designed cranberry rakes, barrels, crates, and eventually machines (Cole and Gifford, 2009); and organizing into cooperatives (e.g., Ocean Spray), which in turn developed new cranberry products (juices, sauces), creating cranberry growers' associations, establishing cranberry experimental stations, planning festivals to celebrate cranberry flowering and harvest, and establishing traditions like crowning an annual cranberry queen (Peltier, 1970; Eck, 1990; Schlautman, 2016).

Nearly all the agronomic and most of the cultural change widely recognized and practiced were accomplished by the time genetic change (artificial selection) produced any sort of germplasm that was adopted by cranberry growers. The first cranberry breeding program was initiated in 1929 after growers and scientists had been unable to address the problem of cranberry false blossom disease through agronomic domestication (Eck, 1990). In 1950 the first cranberry cultivars were released after one cycle of selection from the 1929 program. A few of these varieties were eventually adopted because of their increased productivity and their perceived improved yield stability, especially cultivar "Stevens." Most cranberry breeding programs were abandoned shortly after the 1950 releases. Rutgers University started a program in the late 1980s, the University of Wisconsin started a small program in the 1990s, and a Wisconsin grower independently started a private breeding program in the 1990s (Schlautman, 2016). The next set of varieties with improved productivity were released beginning in the early 2000s (Fajardo et al., 2013), and those varieties are being adopted and planted (Gallardo et al., 2018). However, cranberry growers still plant older varieties and even continue to plant and harvest beds of cranberry clones selected from the wild in the early 1800s (Vorsa and Zalapa, 2019).

DG traits have yet to be identified in cranberry. A recent survey revealed that cranberry growers perceived fruit quality traits and, specifically, fruit firmness as needing the most attention from breeders. This is likely in direct response to premiums being paid by cranberry processors for fruits that meet standards for sweetened dried cranberry processing, while prices for cranberries used for juice concentrates (i.e., normal to low fruit quality) have decreased. Of secondary importance were traits related to disease resistance (i.e., fruit rot), abiotic plant stresses, and insects. Interestingly, yield and productivity were not rated as important priorities, nor were fruit quality traits related to shelf-life, flavor, and sweetness (Gallardo et al., 2018). The development of a "sweet" cranberry could be the DG and the genetic change the cranberry industry needs to propel the emergence of a new cultural change via the creation of a cranberry fresh fruit market and consumer demand.

As this extended consideration of cranberry shows, integration of a de novo domesticate into a culture's cuisine and food rituals and norms may be needed to continue the domestication process and sustain research investments in terms of agronomic and/or genetic change. While the exact size or appearance of newly domesticated food crops is difficult to predict prior to domestication, investigations into flavor, digestibility, toxicity and nutritional profile can help identify candidates or reveal additional biochemical pathways that need to be targeted for genetic change. Genetic and agronomic change remains central to work on intermediate wheatgrass, as described earlier. But a small amount of cultural change was seeded early on when the Rodale Institute began testing the nutritional and cooking properties of this new grain (Becker et al., 1991) and marketed it as "wild triga" (Wills et al., 1998). Likewise, The Land Institute and collaborators also conducted food science research (Marti et al., 2016; Banjade et al., 2019; Tyl et al., 2020). In line with previous efforts that led to the name wild triga, current marketers have expressed the need for a brief and memorable name. The Kernza trade name was initiated for this purpose, and Kernza branded grain has now been used in several restaurants, food products and beers. Customer demand for Kernza grain-containing products could, in the future, motivate corporate and governmental investment in further genetic and agronomic components of domestication.

Tesdell, O. (editor). 2018 Palestinian Wild Food Plants.

### Value of Recognizing Cultural Change Involved in Domestication

As with agronomic change, attention to cultural change invites essential research and engagement with the humans who will grow, eat, trade, and continue to develop new crops. Without social engagement and support, appropriate tools and agronomic practices, or broadly adapted varieties, new crops will remain local and/or niche crops. Without a loyal constituency advocating for sustained research investment, they could easily be abandoned when new diseases appear or as the climate or the political and economic landscape changes. To produce the food security and agroecological benefits that new crops could provide, new crop acreage must eventually be substantial.

In order to enable ongoing social and economic support and later adoption, researchers should identify de novo domestication candidates that have the best chance of producing a positive return on the investments needed to domesticate and market them. Cultural and culinary uses of potential candidate plants may be documented in ethnobotany (Schlautman et al., 2018). From ethical, legal, and political standpoints as well as from a practical perspective, human relationships with candidate plants are relevant considerations in decisions about plant species' suitability for domestication. Acknowledging the reality of cultural change affirms the ongoing importance of in situ conservation of landraces and of wild and weedy species on farms with farmers for new crop domestication, as human social factors are components of these complex systems (Casas et al., 2007; Bellon et al., 2017).

Recognition of established cultural components relevant to domestication can help identify promising opportunities for new crops. In Palestine, site of some of the earliest plant and animal domestication, wild plant gathering has continued to support human communities for millennia (Ali-Shtayeh et al., 2008; Eghbarieh, 2017; Research Collaborative, 2018). Wild food plants (**Table 2**) are important for traditional Palestinian cuisine and for sustenance during climatic and political crises (Tesdell et al., 2019). Palestinian people already value these plants in social, economic, and culinary terms (Tukan et al., 1998; Marouf et al., 2015). Recognition that these relevant cultural drivers for domestication may be in place is leading local groups to begin to collect seeds of edible wild plants and to explore ways to encourage more widespread cultivation beyond recent existing cultivation and transplanting practices (Doolittle and Mabry, 2006).

Several wild food plants could be used as grain legumes or oilseed grains, in addition to their current use as vegetables (Tesdell et al., 2020). Examples include Lotus palaestinus Blatt., Pisum fulvum Sibth. & Sm., and Gundelia tournefortii L. ('akkoub, hereafter). 'Akkoub, a wildcrafted and increasingly cultivated plant, offers opportunities as a perennial vegetable and possible oilseed among other uses. In addition, many crops remain widely foraged in Palestine and several surrounding countries. **Table 2** offers a few prominent examples of Palestinian foraged plants from a list that could include more than 200 documented wild food plants to date. Their Arabic local names vary by region, but these wild plants persist as highly culturally important foods. Existing communities' cultural valuation of these candidate plants could help enable and support pursuit of agronomic and genetic changes involved in de novo domestication.

In other de novo domestication efforts, it may be necessary to catalyze cultural change. Public interest, motivation, and support for de novo domestication may not be gained through simple, one-time, or purely rational means. Instead, building and sustaining the cultural change needed to support new crops may involve navigating a complex web of ongoing processes that include the affective, educational, ethical, legal, narrative, political, social, and other dimensions of human experiences and structures. Like scientific legitimacy more generally, agroecological legitimacy may be accomplished by meeting "credibility tests" and wisely engaging in this complex web of social life (De Wit and Iles, 2016). Honest acknowledgment and critical examination of value influences across the scientific research and development process, from inquiry to application, will likely be necessary for successful domestication of new crops – especially via gene editing (Elliott, 2017).

Participatory research and citizen science methods (Mendez et al., 2013; Ryan et al., 2018) – such as the involvement of many, geographically distributed farmers at an early stage of domestication – could help build initial communities that inform broader cultural support for the plant species. Introducing a potential food crop as a forage or specialty crop first may have the advantage of greatly increasing the number of farmers and other social groups involved and, at the same time, increasing the number of agricultural environments in which the species can be evaluated and selected.

One example of public support for new crop valuation is the Forever Green Initiative, developed at the University of Minnesota based on the principle that universities must engage with multiple stakeholders to realize their mission. When multiple stakeholders are considered, the broader societal impacts of agriculture must be considered. When agriculture threatens wildlife habitat or the provision of clean water or becomes a source of greenhouse gases, a broad base of stakeholders has the opportunity to support research directed at finding solutions to these challenges. The Forever Green Initiative has undertaken new domestication programs that aim to develop crops that will enable agricultural production to provide broad public goods. These efforts are gaining widespread interest and support (Runck et al., 2013; Anderson, 2014; Kleinhuizen, 2017; Haspel, 2019).

Documenting the ability of domestication candidates to provide regulating, cultural, and supporting ecosystem services (such as maintenance of soil and water quality, wildlife habitat, pollinator resources) could help build the case for social and economic valuation of new crops, including governmental support for continued research, farm payments, or the ability to charge consumers premium prices (Leakey and Asaah, 2013; Runck et al., 2014; Smýkal et al., 2018). As Kernza perennial grain has entered the market through the introduction of a few specialty products, abundant opportunities have emerged for storytelling around the new grain and its benefits. Messages about the ecological benefits of the grain have been placed directly on packaging such as beer cans and cereal boxes. Niche products have created a crowd funding project (Pierce, 2019) and generated numerous news reports (Bland, 2016; Garfield, 2016; Ostrander, 2017). Although this grain needs much additional research to enable profitable production at larger scale, niche specialty marketing boosts awareness and support, potentially inducing sustained funding.

In the case of de novo domestication that utilizes gene editing, special consideration should be given at an early stage to barriers to cultural adoption. Technological improvements in traditional crops have sometimes failed for cultural reasons. The reaction against transgenic foods leading to their non-adoption in Europe and elsewhere is an excellent example of failure to achieve cultural modification in support of a genetic approach (Kloppenburg, 2010). Social and educational researchers should investigate public perception and knowledge of the potential future domesticates via gene editing and consider strategies to foster informed public engagement in social decisions related to new crop domestication (Østerberg et al., 2017).

### INTEGRATING DRIVERS OF CHANGE IN DOMESTICATION

In contrast to the idea that gene editing can nearly instantly and/or inexpensively produce domestication, the established literature and our examples of intermediate wheatgrass, 'akkoub, and cranberry demonstrate the multiple components involved in long-term domestication. Genetic, agronomic, and cultural changes interact over time to drive domestication processes. To illustrate lessons learned about how these changes are entangled in complex ways, we turn to a final personal narrative example of a current de novo domestication process. This narrative is not intended to provide recommendations for other new crop candidates, as each will enter the domestication "pipeline" (DeHaan et al., 2016) at a different stage and with different liabilities.

The Land Institute (Salina, KS, United States) has been domesticating Silphium integrifolium Michx. (silphium, hereafter) as a perennial oilseed crop since the early 2000s (Van Tassel et al., 2017). Human cultural valuation and ecological knowledge informed the decision to begin domestication, as local silphium populations were observed to have large, good-tasting seeds and to perform well during seasonal or year-long droughts. The latter observation was confirmed anecdotally by botanists in Texas (James Manhart, personal communication) and during the droughts of the 1930s by ecologist John Weaver (Weaver et al., 1935). Seeds from several wild populations in central Kansas were collected and grown in observation plots in Salina and allowed to intermate to produce seeds. The experimental plants grew much larger than plants in nearby prairies, suggesting that silphium is well-adapted to agricultural conditions and amenable to agronomic change.

However, domestication efforts first focused on driving genetic change. The disk florets of silphium are staminate, limiting the number of seeds per head to 15–25 (equal to the number of ray florets). Therefore, the first breeding target in 2004 was to generate and identify a mutation causing bisexual disk florets or to use recurrent selection to increase the number of ray florets (Van Tassel et al., 2014). Mutagenesis was attempted but no mutants with the phenotype of interest were recovered. However, several cycles of recurrent visual selection on ligule number (ray floret number) were successful in feminizing the heads of silphium to a point where some individuals were almost male-sterile. In 2012, parents for new cycles of selection were made for partial feminization plus increased achene (hereafter "seed") size and head diameter. Recurrent selection for non-dormant seed was attempted and reduced shattering was a breeding goal. In short, traits comprising the classic domestication syndrome were targeted, albeit mostly with the expectation of finding highly polygenic control of these traits as reported in sunflower (Burke et al., 2002). These included a greater number of larger heads and seeds, non-dormant seeds, and higher seed oil content.

In 2014 and every year thereafter, plants in research plots in Salina, Kansas, were infected with rust (Puccinia silphii) and other pathogens that have yet to be conclusively identified. Eucosma giganteana, a specialist moth whose larvae are head and crown parasites of several Silphium species, also appeared and became a serious pest, with colonization of heads approaching 100% in many situations. These pests and pathogens may be more common in more humid environments and did not become common in Salina until we had created favorable microclimates (lush, dense stands) and large populations of host plants (Turner et al., 2018; Vilela et al., 2018).

The pest and pathogen pressures have made selection for yield or domestication traits difficult in Salina. In genetic terms, improved strains and breeding populations at The Land Institute have little variation for resistance to several pathogens and pests and although these populations indeed have higher seed yield than wild silphium, increased yield is accompanied by increased plant size. Standard harvesting machinery does not work as well with taller plants with stouter stalks, and increased biomass almost certainly implies greater water use. In agronomic terms, attempts to scale up plot sizes by direct seeding have had highly variable success and even transplanting in new fields has occasionally failed for unknown reasons. Together these limitations have made it hard to obtain quantities of seed needed for food or feed science research – i.e., as a strategy to drive cultural valuation.

Renewed collection of wild silphium from throughout its natural range has revealed individual plants or populations that appear to have greater resistance to a number of pests or pathogens; some also appear to be shorter and have more slender stems that may be more compatible with combine harvesters. However, these ecotypes from south or east of Kansas have much smaller seeds and smaller, less feminized heads.

Looking back, we see that the western (Kansas) ecotype attracted us (humans) with its unusually large heads and seeds. However, the initial decision to focus exclusively on genetic change for classic domestication traits, especially seed and head size, at this point distracted us from first diversifying the genetic base of the breeding populations (which would have meant reducing average seed and head size) or discovering specialized agronomic management practices as would have been necessary to first achieve broad adaptation or cultural adoption of silphium as a forage or other specialty crop.

In the United States, this kind of work has been done for many new crops, beginning with the USDA's Section of Seed and Plant Introduction in 1898 and in partnership with germplasm "banks" and agricultural universities (Hyland, 1977). However, many native plants, including Silphium integrifolium, have not received such attention, and perhaps we should have recognized the need to perform this work ourselves and spent several years collecting more diverse germplasm, increasing it, and testing it in multiple environments to identify vigorous, resistant, and stable lines.

Another alternative scenario would have been to prioritize agronomic physiology and horticultural research over genetic selection during the initial stage of domestication. Although average seed yield in Kansas was about 300 kg/ha (prior to pest outbreaks), 1600 kg/ha was achieved in one year and one location in Minnesota (Schiffner, 2018). Clearly, the species already has the genetic potential to yield well. We might have been better served spending the time to understand what environmental conditions contribute to the "yield gap" between this potential and average yield than rushing ahead with selection for increased yield potential.

Some domesticated plant phenotypes can be produced through careful management of wild germplasm. For example, surgical manipulation of plant size and branching pattern is an ancient art still actively practiced with many woody tree and vine crops. We have had some promising results trimming silphium stalks to reduce stalk height and improve architecture, but the procedure is very sensitive to pruning height and timing. This is an excellent example of how humans develop detailed, species-specific horticultural practices, formally or informally, much of it by trial-and-error.

Furthermore, there may be some advantages in retaining developmental plasticity for plant height and branching pattern, relying on skilled farmer management to produce harvestable, seed-productive phenotypes. This is because silphium has nutritious, palatable foliage and left unpruned could be used as a forage crop. Being able to use management to switch between forage and oilseed production could allow farmers to respond to weather or market fluctuations. Deliberately alternating between allowing heads to produce seeds (oilseed) and chopping the biomass prior to seed set (forage) may interrupt the lifecycle of specialist insects.

Finding another use for silphium prior to pursuing genetic changes necessary for domestication as a perennial oilseed crop could allow us to expand the number of people interacting with this species and the number of environments in which it could be tested. As an expanding ornamental, pollinator crop, or forage crop, seed production practices would still need to be developed to make seed more available and affordable. This demand could stimulate agronomic innovation in crop management and agricultural engineering and begin the process of building traditions, networks, companies and other social structures needed for the long-term success of a new crop.

These lessons learned about the public interest and support that will be needed to sustain agronomic and genetic change in silphium have encouraged us to experiment with methods to investigate and initiate cultural change. Citizen science has been identified as a way to meet "grand challenges" in agriculture (Ryan et al., 2018). Providing people with access to potential new crops-in-process could help build interest and increase knowledge and awareness. Participants could become a publicly visible and informed network of collaborators. Learning outcomes gained through long-term citizen science projects related to new crop domestication could catalyze and fuel the cultural change needed to enable and sustain domestication.


TABLE 3 | Summary of examples of genetic, agronomic, and cultural change for plant domestication projects discussed in this article.

We began a silphium civic science pilot community in 2019. More than 40 people in 18 states joined. Like farmers who participate in on-farm research, participants in our silphium civic science community grow seedlings in backyards, gardens, farms, and public spaces and collect data over multiple seasons in a range of environments. They also share and reflect on their own personal experiences, interests, and questions about silphium's relationship with insects, soils, mammals, and humans – including its future uses by humans. Participants receive educational materials and can interact and learn with plants, with each other, and with scientists and educational researchers. While further research is needed, preliminary results indicate positive engagement.

Silphium is not currently a good candidate for gene editing because its large, intractable genome has limited genetic and genomic research in the species, because obvious monogenic DGs have not been identified in its closest crop relative, H. annuus (Dowell et al., 2019), and because pests, diseases and soil fertility prevent silphium from achieving its existing seed yield potential in field conditions. Rather, silphium reaffirms the persistent need for science and society alike to understand the complexity of new food crop domestication. Successful de novo domestications will likely be the result of diverse, adaptive approaches to integrating genetic, agronomic, and cultural drivers of change over time.

### CONCLUSION

In the preceding sections we have highlighted how de novo crop domestication, ranging from historical to contemporary, can be viewed as the varied and ongoing interaction of three factors of innovation: genetic, agronomic, and cultural. Gene editing approaches to domestication using DGs risk misrepresenting de novo crop domestication as something humans do to wild plants in a simple one-time event, in a single gene or generation, using increasingly cheap technology, and drawing from just a few scientific disciplines. In contrast, de novo domestication literature and examples (**Table 3**) understand domestication as co-evolutionary interactions between plants and peoples that are complex, stretch across generations indefinitely, may require significant institutional and infrastructural investments, and can involve many disciplines and ways of knowing. This knowledge contextualizes gene editing approaches by remembering the additional work and integrated knowledge needed to accomplish goals that are shared across research communities: the accelerated domestication and widespread cultivation of a new generation of soil-conserving and climate-smart crops.

In the age of gene editing and at this moment of decision-making about domestication pathways, new food crop domestication might be viewed as analogous to a new public health campaign, in all its complexity. Narratives calling for new vaccines are exciting and easily communicated to private and public investors. But advances in vaccine molecules must be accompanied by sustained monitoring of virus evolution and transmission, development of improved vaccine delivery technology, support for clinics and vaccination campaigns and cultural work toward acceptance and use of vaccinations. Investment in novel recombinant or DNA vaccines could accelerate vaccine development or enable vaccination for new diseases, but this investment will not be worthwhile if vaccines are rejected on cultural or religious grounds, if vaccines are not reformulated to reflect pathogen evolution, or if vaccines can't be efficiently delivered to vulnerable populations. Similarly, recognizing the continuing importance of interacting genetic, agronomic, and cultural drivers to successful de novo domestication, we call on funding agencies, proposal reviewers and authors, and research communities (of new crop domesticators, growers, and consumers) to value and support agroecology, agronomy, plant breeding, and participatory, interdisciplinary, and transdisciplinary approaches alongside genetics and genomics.

# AUTHOR CONTRIBUTIONS

DV, AS, BS, OT, MR, LD, and TC: framework conceptualization, research, writing, and editing. DV: silphium and intermediate wheatgrass case studies. AS: silphium case study. BS: cranberry case study. OT: Palestinian wild foods case study. LD: intermediate wheatgrass case study.

# FUNDING

This work was made possible through the charitable donations of multiple persons ("Friends of the Land") and organizations given to The Land Institute, Salina, KS, United States.







**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Van Tassel, Tesdell, Schlautman, Rubin, DeHaan, Crews and Streit Krug. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Analysis of Domestication Parallels in Annual and Perennial Sunflowers (Helianthus spp.): Routes to Crop Development

Sean R. Asselin<sup>1,2</sup>\*, Anita L. Brûlé-Babel<sup>1</sup>, David L. Van Tassel<sup>3</sup> and Douglas J. Cattani<sup>1</sup>

<sup>1</sup> Department of Plant Science, University of Manitoba, Winnipeg, MB, Canada, <sup>2</sup> Agriculture and Agri-Food Canada, Swift Current Research and Development Centre, Swift Current, SK, Canada, <sup>3</sup> The Land Institute, Salina, KS, United States

### Edited by:

Petr Smýkal, Palacký University Olomouc, Czechia

### Reviewed by:

Aleksandra Dimitrijevic, Institute of Field and Vegetable Crops, Serbia Alejandro Presotto, Universidad Nacional del Sur, Argentina

\*Correspondence:

Sean R. Asselin Sean.Asselin@Canada.ca; sean.asselin@gmail.com

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 23 March 2020 Accepted: 25 May 2020 Published: 12 June 2020

### Citation:

Asselin SR, Brûlé-Babel AL, Van Tassel DL and Cattani DJ (2020) Genetic Analysis of Domestication Parallels in Annual and Perennial Sunflowers (Helianthus spp.): Routes to Crop Development. Front. Plant Sci. 11:834. doi: 10.3389/fpls.2020.00834

Parallels exist between the domestication of new species and the improvement of various crops through selection on traits which favor the sowing, harvest and retention of yield potential and the directed efforts to improve their agronomics, disease resistance and quality characteristics. Common selection pressures may result in the parallel selection of orthologs underlying these traits and homologies between crop species can be exploited by plant breeders to improve germplasm. Perennial grains and oilseeds are a class of proposed crops for improving the diversity and sustainability of agricultural systems. Maximilian sunflower (Helianthus maximiliani Schrad.) is a perennial crop wild relative of sunflower (Helianthus annuus L.) and a candidate perennial oilseed species. Understanding parallels between cultivated H. annuus and H. maximiliani may provide new tools for the development of Maximilian sunflower and other wild relatives of sunflower as crops to enhance functional diversity in cropping systems. F<sup>2</sup> populations of Maximilian sunflower segregating for traits associated with the domestication ideotype of cultivated sunflower including branching architecture, capitulum morphology and flowering time were developed to investigate parallels between H. maximiliani and H. annuus. Genotype-by-sequencing (GBS) was employed to genotype novel Maximilian sunflower populations and perform quantitative-trait-loci (QTL) analysis. A total of 11 QTL in five regions were identified across 21 linkage groups using 4142 GBS derived single nucleotide polymorphism markers called using the sunflower reference genome as a guide. A major QTL on linkage group 17b, associated with aspects of floral development and apical dominance, was discovered and corresponds with a known domestication QTL hotspot in H. annuus and candidate genes were identified. This suggests the potential to exploit orthologs for neo-domestication of H. maximiliani for traits such as branching architecture, timing of anthesis, and capitulum size and morphology for the development of a perennial oilseed crop from wild relatives of cultivated sunflower.

Keywords: domestication, oilseeds, perennial grains, comparative genomics, ecosystem services, genotype-by-sequencing, functional diversity, sunflower

# INTRODUCTION


Annual crops comprise an estimated 60–80% of global cropland and approximately 75% of calories consumed by humans come from four annual grain crops: maize, wheat, rice, and soybean (Lobell et al., 2011). Targeting perennial species and integrating them into agroecosystems dominated by annual crops has been suggested as a method of enhancing functional diversity (i.e., the number of functionally disparate species), ecosystem function (Isbell et al., 2011), and ultimately, productivity through the introduction of new ecosystem services (Asbjornsen et al., 2014). A number of candidate perennial grain and oilseed crops have been proposed through both hybridization with annual crops and the identification of wild species with favorable characteristics for neo-domestication (Wagoner and Schaeffer, 1990; DeHaan et al., 2016).

Maximilian sunflower (Helianthus maximiliani Schrad.) is a perennial crop wild relative of cultivated sunflower (Helianthus annuus) and is native to much of the interior plains of North America, with a range extending from southern Canada to northern Mexico (Kawakami et al., 2011, 2014; Tetreault et al., 2016). Maximilian sunflower has been used in range seeding mixtures for high quality livestock forage, as a perennial filter strip to reduce agricultural run-off, and as a source of wildlife food and habitat (Dietz et al., 1992; USDA-NRCS, 2017). Maximilian sunflower produces an oil-rich seed that is safe for human consumption and primarily composed of linoleic and oleic acids, and surveys of wild populations have shown an oil content of 31.1% (Dorrell and Whelan, 1978; Seiler, 1994; Seiler and Brothers, 1999). The natural range of Maximilian sunflower, along with its phenotypic divergence (Kawakami et al., 2011; Asselin et al., 2018), high seed production potential, and documented resistance to known common pathogens of annual sunflower (H. annuus L.) such as Sclerotinia rot [Sclerotinia sclerotiorum (Lib.) de Bary] (Taski-Ajdukovic et al., 2006; Liu et al., 2011), has attracted attention to the species for the development of a perennial oilseed. Maximilian sunflower is a candidate species for the development of a dual-use perennial crop for edible oil, forage and bioenergy applications and is targeted for domestication and improvement (Cox et al., 2002; Van Tassel et al., 2014; Asselin et al., 2018).

Efforts to improve Maximilian sunflower have, until recently, focused on its use for conservation and rangeland applications or as a trait donor for H. annuus. Two open-pollinated commercial cultivars, "Aztec" and "Prairie Gold", were released by the United States Department of Agriculture Natural Resources Conservation Service (USDA-NRCS) in 1978 (Texas Agricultural Experiment Station, 1979; Dietz et al., 1992; USDA-NRCS, 2017). Aztec was developed for wildlife feed, livestock forage cover, use as a natural hedge, as filter-strips, and as an ornamental landscape plant. Prairie Gold was released for landscape reclamation and wildlife food plantings. Both cultivars were selected for vigor and stand establishment, in Oklahoma and Texas (Aztec) or in Kansas and further north (Prairie Gold). Agronomic research on Maximilian sunflower as a perennial grain candidate began at The Land Institute in the 1980s (Jackson, 1990). The first breeding program focused on developing Maximilian sunflower as a perennial grain was launched in 2002 at The Land Institute (Cox et al., 2002) with the intention of domesticating the species as a perennial oilseed. Selection for seed size and apical dominance has been effective: following three rounds of recurrent selection, the average seed size was increased by 240%, and plants exhibiting a highly restricted branching habit were developed by 2012 (Van Tassel et al., 2014).

Crop domestication shows many parallels between species. Vavilov in the development of the Law of Homologous Series first recognized that heritable variation in common traits will occur in different species based on parallel selection pressures (Vavilov, 1922). The Law of Homologous Series states that closely related species, genera and families exhibit parallel variation in shared traits. In grains and oilseeds such as sunflower, parallel selection for a more determinate growth habit, larger inflorescence size, and increased grain size has occurred through domestication (Zohary, 2004; Purugganan and Fuller, 2009; Lin et al., 2012). A number of domestication syndrome and quality traits in annual grains are controlled by alleles with large effects and relatively simple genetic control. Domestication has led to the selection of common orthologs such as sh1 genes which contribute to shattering tolerance in sorghum, rice and maize (Lin et al., 2012); Q genes in wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.) which confer the brittle rachis/free-threshing trait (Simons, 2005); and variations in fatty acid desaturase genes in oilseeds such as canola (Brassica napus L.) and sunflower (Peng et al., 2010; Chapman and Burke, 2012). Strong apical dominance and restricted branching were key traits identified for the domestication of H. annuus (Burke et al., 2002; Wills and Burke, 2007). Wild type Maximilian sunflower has many similarities to wild type H. annuus, which exhibits profuse branching, small achenes, indeterminate flowering and lacks shattering resistance (Burke et al., 2002; Wills and Burke, 2007). Maximilian sunflower individuals with restricted branching, resembling the unbranched ideotype of cultivated sunflower, have been identified in germplasm developed at The Land Institute (Cox et al., 2010; Van Tassel et al., 2014) and offer a unique opportunity to examine parallels in the domestication process in Helianthus. H. annuus and H. maximiliani are both diploid species (2n = 2x = 34) capable of hybridization and cytogenetic analysis has shown a high proportion of normal bivalents in hybrids, suggestive of high homology between parental chromosomes (Jan, 1997; Binsfeld et al., 2001). As an extension of Vavilov's Law of Homologous Series, knowledge of the domestication and genomic resources available in annual sunflower could be applied as a model for the neo-domestication and improvement of Maximilian sunflower through marker-assisted selection (MAS) for domestication orthologs.

Genotype-by-sequencing (GBS) has successfully been utilized for species in which heterologous reference genomes are available for single nucleotide polymorphism (SNP) discovery, such as for wild crop relatives of soybean (Chang et al., 2014), wheat (Edae et al., 2016), and sunflower (Baute et al., 2016; Asselin et al., 2018). The goals of this research were to examine parallels between the domestication of cultivated sunflower and the development of Maximilian sunflower as a perennial oilseed crop. This work involved applying GBS to Maximilian sunflower using annual cultivated sunflower as a reference genome and performing quantitative-trait-loci (QTL) analysis in a novel Maximilian sunflower population segregating for apical dominance.

# MATERIALS AND METHODS

## Development of Maximilian Sunflower With Restricted Branching at the Land Institute

Seed from 96 wild Kansas populations was collected in the autumns of 1999 and 2000. Most of these were roadside populations in areas dominated by native rangeland. Ten plants from each population were transplanted to the field at The Land Institute in 2001 and evaluated for two growing seasons. One hundred and fifty individuals with higher than average seed production were selected and clonally replicated by rhizome division in 2003 for further assessment. Seed yield and seed mass data from the 2004–2005 growing seasons were used to select 20 genotypes for polycrossing. Polycrosses were performed by pollen bulking and subsequent manual pollination in 2006. Polycross seed from each of the 20 genotypes was sprouted in the greenhouse and 112 vigorous seedlings from each maternal family were transplanted to a new field in 2007. Although the purpose of this cycle was originally to estimate trait heritability using the 20 half-sib families, a single unbranched individual was found in 2007. Assuming that the unbranched trait was recessive, we planted 100 open-pollinated (half-sibling) seeds from this genotype in 2008, none of which exhibited restricted branching. Half-sibling plants were allowed to intermate and progeny were evaluated in the 2009 growing season for branching characteristics. Reduced branching individuals reappeared in the progeny; however, many of these plants exhibited reduced seed fertility. Plants with the least amount of branching and normal capitulum morphology were polycrossed in 2009, and repeated cycles of visual selection for restricted branching and intermating occurred in 2010 and 2011 to develop seed stock for populations with restricted branching. In 2012, crosses were performed between plants exhibiting restricted branching to generate full-sibling families exhibiting limited to no branching.

## Development of F<sup>2</sup> Mapping Populations

The mapping populations described in this study were produced by crossing highly branched wild-type H. maximiliani plants from Manitoba to advanced, restricted branching populations developed by The Land Institute (TLI), Salina, Kansas. Manitoban plants were selected as parents to introduce genetic diversity, to alleviate potential inbreeding depression observed in TLI germplasm, and to initiate the introduction of the restricted branching trait into an earlier flowering genetic background better suited for northern growing conditions. Maximilian sunflower exhibits clinal variation in life history traits, most notably timing of flowering, which occurs in July–August in Manitoban genotypes and September–October in Kansas genotypes (Heiser et al., 1969; Kawakami et al., 2011; Cattani and Van Tassel, personal communication). Full-sib families were developed for mapping traits associated with branching, capitulum size and flowering time. Five restricted branching populations from TLI ("o/i," "I/P," "I/H," "h/L," and "h/e") were initially screened under controlled environments. Plants exhibiting a complete restriction of branches, a late flowering genetic background and a single large capitulum (30–40 mm in diameter) were selected as parents for the development of the mapping populations. The Manitoba parents consisted of wild plants collected from roadside sites across southern Manitoba. Manitoban populations, previously described under growth chamber conditions in Asselin et al. (2018), exhibit extensive branching (∼11–15 branches), an early flowering genetic background and a large number (∼12–20) of small capitula which are ∼7–21 mm in diameter. A series of crosses were produced and F<sup>1</sup> seed derived from a cross between an individual from the TLI population "o/i" and a wild Manitoban accession collected near Brunkild, Manitoba (49.49°N, 97.57°W) were grown for sib-mating. 
The F<sup>1</sup> plants were clonally propagated from rhizome cuttings to produce materials for sib-crossing. Three F<sup>1</sup> individuals (herein denoted as F1A, F1B, and F1C) were intercrossed through reciprocal sib-mating to generate two F<sup>2</sup> populations which segregated for branching, herein denoted as crosses AB (F1A/F1B) and BC (F1B/F1C).

## Phenotypic Evaluation

### Plant Propagation

Seeds of the F<sup>2</sup> were surface sterilized using a 70% ethanol solution for 10 min, allowed to air dry, and seed coats were then manually removed to break seed dormancy. Seeds were placed in 9 cm petri dishes on filter paper moistened with distilled water containing a 0.1% solution of plant preservative mixture (PPM™, Plant Cell Technology, Washington, DC, United States). Petri plates were placed in the dark at room temperature for 48 h to germinate. Germinated seeds were transplanted once radicles had emerged. All seedlings were started in 6-well 122.9 mL volume seeding trays filled with Sunshine #4™ soilless potting mix (SunGro Horticulture Ltd., Agawam, MA, United States) and transferred to 1 L pots containing a 2:2:1 ratio of soil:sand:peat by volume once they had reached the three to four true-leaf stage. Plants were grown in a growth chamber under a 23°C 16-h day/18°C 8-h night cycle. Plant positions were assigned randomly and rotated on a weekly basis to account for potential differences in airflow, humidity, and light intensity across benches within the growth chamber. Phenotypic traits measured can be broadly classified as capitulum size traits, traits relating to anthesis, traits associated with branching architecture, and traits related to total plant size. Seed yield and size traits were not examined directly in this study due to the constraints of the testing location and the inability to cross-pollinate all F<sup>2</sup> progeny examined.

### Trait Evaluation

Sixteen traits were phenotyped on 163 and 177 individuals from the AB and BC crosses, respectively, for a total of 340 F<sup>2</sup> plants (**Table 1**). As the parental and F<sup>1</sup> individuals underwent photoperiod induction (resulting in skewed phenotypes), complete phenotypic analysis was conducted solely on the F<sup>2</sup> generation. The appearance of reproductive buds and the dates on which the first through fifth capitula reached anthesis were recorded three times a week. As plants segregated for flowering time, the date of fifth anthesis was employed as a common physiological milestone at which all other traits were assessed. All nodes were counted starting at the cotyledonary node and moving upwards along the main stem. Branches were measured as the total number of branches on the main stem greater than 2 cm in length. The highest branching node, a measurement of apical branching, was measured as the total number of unbranched nodes found above the highest branch on the apical portion of the stem. The lowest branching node, a measurement of basal branching, was measured as the number of unbranched nodes on the basal region of the stem between the cotyledonary node and the lowest branch. Capitulum morphology was characterized to infer potential traits associated with shattering resistance. Shattering in wild H. annuus is in part due to the continued growth of the capitulum, resulting in a convex shape (increased depth:width ratio). Domesticated H. annuus exhibits a low depth:width ratio, resulting in a flatter capitulum that is less prone to shattering (Burke et al., 2002). As plants were not evaluated for seed traits, the depth:width ratio was employed as a proxy for shattering resistance.

TABLE 1 | Descriptions of 16 phenotypic traits measured under growth chamber conditions for 340 H. maximiliani F<sup>2</sup> plants following a 23°C 16-h day/18°C 8-h night cycle.

Data were analyzed for normality using the SAS UNIVARIATE procedure to ensure that they met the normality assumptions of downstream analyses. Principal component analysis was conducted on centered and scaled data using the R function prcomp to explore trait relationships. The number of principal components to retain was determined by examining the eigenvalues following Cattell's rule (Cattell, 1966) and by Horn's parallel analysis (Horn, 1965), which was run using 1000 permutations in the R package paran (Dinno, 2012); components with adjusted eigenvalues >1 were retained for analysis.
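The retention rule described above can be illustrated with a minimal sketch. It assumes NumPy in place of the R functions prcomp and paran, uses simulated trait data rather than the study's measurements, and assumes that the adjusted eigenvalue is the observed eigenvalue minus the mean bias estimated from random normal data; the function names and the simulated data are hypothetical.

```python
import numpy as np

def pca_eigenvalues(data):
    """Eigenvalues of the correlation matrix, i.e. PCA on centered, scaled data."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = (z.T @ z) / (z.shape[0] - 1)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]  # largest first

def parallel_analysis(data, n_iter=1000, seed=0):
    """Horn-style parallel analysis against uncorrelated random normal data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = pca_eigenvalues(data)
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        random_eigs[i] = pca_eigenvalues(rng.standard_normal((n, p)))
    # Assumed adjustment: subtract the sampling bias (mean random eigenvalue - 1).
    adjusted = observed - (random_eigs.mean(axis=0) - 1.0)
    n_retain = int(np.sum(adjusted > 1.0))
    return observed, adjusted, n_retain

# Simulated example: 340 plants, six traits driven by two latent factors.
rng = np.random.default_rng(42)
n_plants = 340
factor_a = rng.standard_normal((n_plants, 1))
factor_b = rng.standard_normal((n_plants, 1))
traits = np.hstack([
    factor_a + 0.5 * rng.standard_normal((n_plants, 3)),
    factor_b + 0.5 * rng.standard_normal((n_plants, 3)),
])
observed, adjusted, n_retain = parallel_analysis(traits, n_iter=200)
print(n_retain, np.round(adjusted, 2))
```

Because the simulated traits fall into two tightly correlated blocks, only the first two adjusted eigenvalues exceed 1 in this example.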

### DNA Extraction

Leaf tissues from all plants grown under growth chamber conditions were sampled for DNA extraction. Samples from each plant were labeled with an identity number and frozen in liquid nitrogen within 1 h of collection prior to lyophilization and storage at room temperature. Genomic DNA was extracted from the parental and F<sup>1</sup> plants and from a subset of 190 F<sup>2</sup> individuals, comprising 78 individuals from the AB cross and 112 individuals from the BC cross. A modified single-tube cetyltrimethylammonium bromide (CTAB) extraction protocol (Doyle, 1987; Li et al., 2007) was employed with 1 µL of a 10 mg mL<sup>−1</sup> solution of proteinase K (Promega) added to the initial CTAB incubation step. DNA quantity was determined using a dsDNA Broad Range Fluorescence Assay kit and a Qubit 2.0 Fluorometer (Life Technologies) following the manufacturer's instructions for 1 µL sample sizes. DNA quality was assessed using 260/280 and 260/230 nm wavelength absorbance ratios measured using a Nanodrop 1000 spectrophotometer (Thermo Fisher Scientific). Samples that fell below a minimum DNA concentration of 50 µg/µL or had absorbance ratios below 1.7 for either 260/230 or 260/280 nm were discarded, and extractions were repeated to meet quality requirements.

# Genotype-by-Sequencing and SNP Calling

One hundred and ninety-five DNA samples consisting of the Manitoba and Kansas parents, the F<sup>1</sup> parents and 190 F<sup>2</sup> plants were submitted to Data2bio LLC (Ames, IA, United States) for GBS and SNP calling. A tunable genotype-by-sequencing (tGBS) protocol was employed, which differs from conventional GBS through the use of a more selective genome reduction procedure (Ott et al., 2017). Fewer sites are sequenced in tGBS relative to conventional GBS, resulting in greater read depth. The advantage of this approach is that greater depth of coverage allows for more effective calling of heterozygous genotypes and reduces false homozygote calls, leading to a lower level of missing data per site.

Sequencing via tGBS was carried out using a three selective base (TGT) protocol as described in Ott et al. (2017) for genome reduction. Fragments were sequenced via eight runs on an Ion Proton™ sequencer (Thermo Fisher Scientific, Waltham, MA, United States). The conservative three selective base approach was chosen as H. maximiliani is an obligate outcrossing species and exhibits a high degree of heterozygosity (Asselin et al., 2018). Trimmed reads were aligned to the cultivated H. annuus reference genome, HA412.v.1.1.bronze.20142015,<sup>1</sup> using GSNAP (Wu and Nacu, 2010), with polymorphisms within the first and last 3 bp of each read being ignored (Ott et al., 2017). Polymorphisms with base calls with a PHRED score below 20 were removed from the analyses. SNPs were called as homozygous if the most common allele was supported by a minimum of 80% of aligned reads, while heterozygous SNPs were called if the two most common alleles were each supported by a minimum of 30% of all aligned reads and together accounted for a minimum of 80% of the aligned reads. SNPs that could be genotyped in a minimum of half of the samples, that had exactly two alleles, a minor allele frequency >1% and a heterozygosity rate of <70% were retained for downstream analysis. Imputation was conducted separately on the AB and BC crosses using Beagle 4.1 (Browning and Browning, 2016) with default parameters.
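The calling and filtering thresholds above can be expressed as a short sketch. This is not the Data2bio pipeline; the function names are hypothetical, the inputs are assumed to be per-allele read counts and per-sample genotype tuples, and the 30% rule is interpreted as applying to each of the two most common alleles.

```python
from collections import Counter

def call_genotype(allele_counts, hom_frac=0.80, het_frac=0.30, het_sum=0.80):
    """Call one genotype from a dict of allele -> aligned read count.

    Returns a tuple of two alleles, or None when no call can be made.
    """
    total = sum(allele_counts.values())
    if total == 0:
        return None
    ranked = sorted(allele_counts.items(), key=lambda kv: kv[1], reverse=True)
    top_allele, top_n = ranked[0]
    if top_n / total >= hom_frac:
        return (top_allele, top_allele)  # homozygous call
    if len(ranked) >= 2:
        second_allele, second_n = ranked[1]
        if (top_n / total >= het_frac
                and second_n / total >= het_frac
                and (top_n + second_n) / total >= het_sum):
            return tuple(sorted((top_allele, second_allele)))  # heterozygous call
    return None  # ambiguous site, treated as missing

def keep_site(genotypes, min_call_rate=0.5, min_maf=0.01, max_het=0.70):
    """Apply the site-level filters to a list of genotype calls (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if len(called) < min_call_rate * len(genotypes):
        return False
    alleles = Counter(a for g in called for a in g)
    if len(alleles) != 2:  # keep biallelic sites only
        return False
    maf = min(alleles.values()) / sum(alleles.values())
    het_rate = sum(g[0] != g[1] for g in called) / len(called)
    return maf > min_maf and het_rate < max_het

print(call_genotype({"A": 18, "G": 2}))
print(call_genotype({"A": 11, "G": 9}))
print(call_genotype({"A": 14, "G": 5, "T": 1}))
```

In the third call the most common allele has 70% of reads and the second only 25%, so neither rule is met and the genotype is left missing.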

### Linkage Map Construction

The AB and BC crosses were examined jointly as a single population, employing markers segregating in the same fashion between the two crosses, for analysis in JoinMap 5.0 (Van Ooijen, 2018). Common segregation patterns between the crosses were determined using the F<sup>1</sup> genotypes and observed segregation patterns in the F<sup>2</sup> generation to allow joint analysis. Markers were coded as three possible segregation types (<lmxll>, <nnxnp>, and <hkxhk>) corresponding to backcross-like or F2-like segregation patterns. Homozygous allele states accounting for less than 10% of the observed total calls were considered likely genotyping errors and set to missing data prior to analysis. Marker classes in which the parental genotype could not be accurately determined or which were non-informative were removed. The remaining markers were tested for distorted segregation using a chi-square goodness-of-fit test through the test segregation function of JoinMap at p < 0.05. Due to the presence of a high degree of segregation distortion, markers exhibiting distortion were retained within the dataset and initially grouped based upon their assigned linkage groups in the cultivated H. annuus HA412.v.1.1.bronze.20142015 reference genome for ordering. Segregation distortion has been reported as having little effect on marker order (Zhang et al., 2010), but erroneously called genotypes, which may result in the appearance of highly distorted markers, may cause pseudo-linkages between biologically unlinked sequences (Ronin et al., 2010).
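As an illustration of the segregation test, the sketch below computes a chi-square goodness-of-fit statistic against the expected 1:1 (backcross-like) or 1:2:1 (F2-like) ratios. It is a stand-alone approximation of what the JoinMap test reports, not JoinMap itself; the genotype class labels follow the usual cross-pollinated coding, and the counts are invented for the example.

```python
import math

# Expected Mendelian ratios for the three segregation types.
EXPECTED_RATIOS = {
    "lmxll": {"lm": 1, "ll": 1},
    "nnxnp": {"nn": 1, "np": 1},
    "hkxhk": {"hh": 1, "hk": 2, "kk": 1},
}

def chi_square_sf(stat, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are needed here")

def segregation_test(seg_type, observed):
    """Return (chi-square statistic, p-value) for one marker."""
    ratios = EXPECTED_RATIOS[seg_type]
    n = sum(observed.get(g, 0) for g in ratios)
    if n == 0:
        raise ValueError("no genotype calls for this marker")
    ratio_total = sum(ratios.values())
    stat = 0.0
    for genotype, ratio in ratios.items():
        expected = n * ratio / ratio_total
        stat += (observed.get(genotype, 0) - expected) ** 2 / expected
    return stat, chi_square_sf(stat, len(ratios) - 1)

# Hypothetical counts: one marker fitting 1:2:1, one distorted 1:1 marker.
fit_stat, fit_p = segregation_test("hkxhk", {"hh": 45, "hk": 90, "kk": 45})
dist_stat, dist_p = segregation_test("lmxll", {"lm": 120, "ll": 60})
print(fit_stat, fit_p)
print(dist_stat, dist_p < 0.05)
```

Under the approach described above, a marker flagged as distorted at p < 0.05 would still be retained for mapping.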

Linkage map construction was conducted using the cross-pollinated (CP) analysis option in JoinMap 5.0. Identical markers were removed, and reference genome positions were utilized in initial maps using the "start order" option to position markers. Linkage groups were numbered based upon their reference genome analog. Markers were ordered using the regression mapping algorithm, with map distances in cM calculated using Kosambi's mapping function. Markers which produced negative map distances were excluded from the map and maps were recalculated. In instances where insufficient linkage was present within a linkage group, markers were split into separate groups (e.g., LG 3a and 3b) for analysis to produce the final map.
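For reference, Kosambi's mapping function converts a recombination fraction r into a map distance d = (1/4) ln[(1 + 2r)/(1 − 2r)] Morgans. A minimal sketch of the conversion and its inverse follows; the function names are hypothetical and this is not JoinMap code.

```python
import math

def kosambi_cm(r):
    """Map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def kosambi_recombination(d_cm):
    """Inverse function: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)

distance = kosambi_cm(0.10)
print(round(distance, 2))
print(round(kosambi_recombination(distance), 4))
```

At small recombination fractions the distance in cM is close to 100r, and the two diverge as r approaches 0.5.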

### QTL Analysis

Restricted multiple-QTL model (rMQM) analysis was conducted using MapQTL 6.0. As a subset of the genotyped F<sup>2</sup> plants exhibited an abnormal "die-back" of the central capitulum, 15 plants were removed from the dataset, leaving 175 individuals for QTL analysis. An initial analysis was conducted to identify putative QTL and possible cofactors for rMQM using the interval mapping procedure at 1 cM intervals. A non-parametric Kruskal–Wallis (K–W) rank-sum test, the rank-based analog of one-way analysis of variance, was also conducted. The K–W test does not require the incorporation of a genetic map, provides marker-trait associations independently of neighboring markers, and was used to confirm putative cofactors and QTL. To account for multiple testing, a stringent alpha of 0.005 was used to reduce potential spurious associations as suggested by Van Ooijen (2018). Possible rMQM cofactors were limited to no more than two per linkage group per trait, and an automatic cofactor selection procedure in MapQTL 6.0 was performed using default settings. Subsequent rounds of rMQM mapping were performed for each trait until cofactors stabilized. To account for multiple testing and control the family-wise type I error rate, genome-wide LOD thresholds were calculated for each trait independently (Churchill and Doerge, 1994) using 1000 permutations and an alpha of 0.05. QTL exceeding the genome-wide LOD threshold (p < 0.05) were declared significant. Confidence intervals were calculated using the two-LOD drop-off method, which approximates a 95% confidence interval of QTL position (Lander and Botstein, 1989). Strongly overlapping/pleiotropic QTL between two or more traits were given a common name to designate the QTL region.
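The permutation threshold and the two-LOD drop-off interval can be sketched as follows. This example assumes NumPy and substitutes a simple single-marker regression LOD for the interval mapping and rMQM models of MapQTL; the simulated chromosome, effect size and function names are hypothetical.

```python
import numpy as np

def marker_lod(genotypes, phenotype):
    """Single-marker regression LOD for each column of an allele-dosage matrix."""
    n = phenotype.size
    y = phenotype - phenotype.mean()
    g = genotypes - genotypes.mean(axis=0)
    r2 = (g.T @ y) ** 2 / ((g ** 2).sum(axis=0) * (y @ y))
    return -(n / 2.0) * np.log10(1.0 - r2)

def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Chromosome-wide LOD threshold from the permutation null of the maximum LOD."""
    rng = np.random.default_rng(seed)
    max_lods = np.empty(n_perm)
    for i in range(n_perm):
        max_lods[i] = marker_lod(genotypes, rng.permutation(phenotype)).max()
    return np.quantile(max_lods, 1.0 - alpha)

def lod_drop_interval(positions, lods, drop=2.0):
    """Contiguous region around the peak where LOD stays within `drop` of the maximum."""
    peak = int(np.argmax(lods))
    cutoff = lods[peak] - drop
    left = right = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]

# Simulated F2-like chromosome: 175 plants, 50 markers spaced 2 cM apart.
rng = np.random.default_rng(1)
n_plants, n_markers = 175, 50
positions = np.arange(n_markers) * 2.0

def simulate_gametes():
    # Random starting allele, then a 2% chance of switching at each interval.
    switches = rng.random((n_plants, n_markers)) < 0.02
    switches[:, 0] = rng.random(n_plants) < 0.5
    return np.cumsum(switches, axis=1) % 2

genotypes = simulate_gametes() + simulate_gametes()  # allele dosage 0, 1 or 2
phenotype = 1.5 * genotypes[:, 25] + rng.standard_normal(n_plants)

lods = marker_lod(genotypes, phenotype)
threshold = permutation_threshold(genotypes, phenotype)
ci_low, ci_high = lod_drop_interval(positions, lods)
print(round(float(threshold), 2), round(float(lods.max()), 2), ci_low, ci_high)
```

Taking the maximum LOD across all markers in each permutation is what controls the family-wise error rate, since the threshold is then calibrated against the best score expected by chance anywhere on the map.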

To infer candidate genes and provide functional annotation of the candidate SNPs generated from the analyses, H. maximiliani SNPs falling within the two-LOD QTL confidence interval were compared with the cultivated H. annuus reference genome HA412.v.1.1.bronze.20142015 assembly using JBrowse (Skinner et al., 2009), available via the HeliaGene bioinformatics portal (Carrere et al., 2008). Significant SNPs found within 1 kb of one another were considered a common entry for annotation. SNPs were compared to their positions within the H. annuus reference genome and a search was conducted for possible candidate genes within 100 kb of each entry. To annotate the presence of putative orthologs, a literature search of sunflower domestication studies was conducted to determine if QTL controlling similar traits have previously been reported on the same linkage groups in H. annuus.

<sup>1</sup>www.sunflowergenome.org
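The grouping of neighboring SNPs into common entries and the 100 kb candidate search can be sketched as below. The annotation itself was performed manually in a genome browser; this sketch assumes that "within 1 kb" means a gap of at most 1 kb between adjacent SNPs, and the positions and gene names are invented for illustration.

```python
def cluster_snps(positions, max_gap=1_000):
    """Group SNP positions (bp) so that adjacent SNPs in a group are <= max_gap apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters

def genes_near(cluster, genes, flank=100_000):
    """Names of genes overlapping the cluster extended by `flank` bp on each side.

    `genes` is a list of (name, start, end) tuples on the same chromosome.
    """
    lo, hi = cluster[0] - flank, cluster[-1] + flank
    return [name for name, start, end in genes if start <= hi and end >= lo]

# Hypothetical SNP positions and gene models on one chromosome.
snps = [5_000, 5_400, 9_000, 250_000]
genes = [("geneA", 60_000, 62_000), ("geneB", 400_000, 405_000)]

entries = cluster_snps(snps)
print(entries)
print([genes_near(entry, genes) for entry in entries])
```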

# RESULTS

## Phenotypic Analysis


The initial TLI populations screened for restricted branching exhibited either a single capitulum and a complete lack of branching, or a complete lack of mid-plant and basal branches and a small number (2–3) of short apical branches bearing capitula. It has been suggested that branching restriction in Maximilian sunflower exhibits a leaky phenotype, and this observation is consistent with previous observations from restricted branching populations (Van Tassel, personal communication). The F<sup>1</sup> plants were uniform in appearance and exhibited a branched phenotype consistent with that of wild Manitoban populations. The range of phenotypes observed in both crosses was extensive in the F<sup>2</sup> generation, and the phenotypic distributions are suggestive of quantitative genetic control (**Table 2**). The unbranched phenotype observed in the TLI parent was not recovered in the F<sup>1</sup> or F<sup>2</sup> generations, although a number of plants exhibiting highly restricted branching were observed in the F<sup>2</sup> generation. Branch number ranged from one to 21 branches and the percentage of branch-bearing nodes ranged from 2.38 to 48.97%. Timing of first anthesis ranged from 61 to 198 days. Plant height ranged from 90 to 192 cm and diameter of the central capitulum ranged from 6.2 to 25.6 mm. The total number of capitula exhibited the highest coefficient of variation relative to other traits examined, ranging from 4 to 126 capitula per plant, likely due to the presence of secondary and tertiary branches.

Principal component analysis revealed four significant components and patterns suggestive of a resource allocation trade-off relating to the degree of apical dominance (**Figure 1** and **Supplementary Table S1**). Along the first two principal component axes, a negative relationship between capitulum size traits (greater apical dominance) and the total number of branches and capitula (reduced apical dominance) is apparent. The first component (PC1), which explained 30.9% of the phenotypic variance, appears to be dominated by trait differences in the parental materials, with positive loadings for later anthesis and increased capitulum size and negative loadings for capitulum and branch number. The second component (PC2), explaining 21.64% of the phenotypic variance, revealed patterns between timing of anthesis and capitulum size traits which contradict the first component, suggesting these traits are not strongly associated and may represent recombination between parental types. The third and fourth components (**Supplementary Figure S1**), explaining 9.84 and 8.36% of the phenotypic variance, respectively, were dominated by branching characteristics (PC3) and by other aspects of apical dominance (highest branching node, average branch length and capitulum count) as well as stem diameter (PC4).

# SNP Calling and Linkage Map Development Using the Homeologous H. annuus Genome as a Reference Guide

A total of 6094 polymorphic SNPs spanning all linkage groups were aligned to the homeologous reference genome of H. annuus for QTL identification. Following filtering, 5323 SNPs were considered for joint linkage map development and QTL analysis. Due to insufficient linkage, more than the expected 17 linkage groups of H. maximiliani were recovered: linkage groups 3, 5, 13, and 17 were split into separate groups, resulting in 21 linkage groups (**Supplementary Figure S2**). To date, this is the first reported linkage map of Maximilian sunflower. It spans a total of 3159.29 cM of the Maximilian sunflower genome, with an average marker interval of 1.31 cM. A total of 4142 markers were incorporated into the map, with 42–300 markers per linkage group (**Supplementary Table S2**).

TABLE 2 | Phenotypic mean, standard error of the mean (SEM) and coefficient of variation (CV) and other summary statistics of 340 F<sup>2</sup> H. maximiliani plants phenotyped for 21 traits under growth chamber conditions following a 23°C 16-h day/18°C 8-h night cycle.

# QTL Analysis and Identification of Putative Candidate Genes Associated With Domestication

Genome-wide QTL scans detected 11 significant QTL supported by both the rMQM and K–W analyses, corresponding to five genomic regions. A major co-localized QTL (herein referred to as ultra1) was detected on linkage group 17b, explaining 12.1–21.5% of the variation in traits associated with timing of anthesis (timing of reproductive budding, first anthesis and average date of anthesis) as well as aspects of apical dominance (number of branches, average leaf length, average capitulum size, and average capitulum depth) (**Table 3**). Further QTL were found for timing of reproductive budding on linkage groups 6 (rb1) and 15 (rb2), and for total capitulum count on linkage groups 2 (cc1) and 3b (cc2).

To identify putative genes associated with the domestication phenotype of H. annuus, 100 kb flanking regions surrounding SNPs within the two-LOD QTL support intervals were manually examined using the sunflower genome JBrowse (INRA Sunflower Bioinformatics Resources, 2014). None of the QTL-associated SNPs were found within gene sequences, but several were in close proximity to annotated genes (**Supplementary Table S3**).

The SNPs associated with the QTL ultra1 on linkage group 17b of H. maximiliani correspond to a gene-rich region of the H. annuus reference genome. A variety of putative functions were found within this region relating to hormone response and regulation and meristem development. Nearby genes in the region of ultra1 include plant GH3 auxin-responsive promoters (IPR004993) and a ULTRAPETALA developmental regulator (IPR020533) (**Supplementary Table S3**), which is known as a key negative regulator of cell accumulation in shoot and floral meristems in Arabidopsis thaliana (Fletcher, 2001; Carles et al., 2004). Genes encoding Cytochrome P450 (IPR001128) and F-box domain proteins (IPR001810), both of which are candidate genes for branching in H. annuus (Mandel et al., 2013; Nambeesan et al., 2015), were also identified.


TABLE 3 | Significant genome-wide QTL in the tested Maximilian sunflower populations (p < 0.05).

<sup>1</sup>CI, two-LOD confidence interval as approximated using the method of Lander and Botstein (1989); <sup>2</sup>%Var, percent of variation explained by the QTL; <sup>3</sup>LOD, logarithm of the odds.

Interestingly, the SNPs associated with ultra1 are approximately 200 kb downstream of a region containing a number of potentially influential genes associated with AUX/IAA proteins (IPR003311), auxin response factors (IPR010525) and self-incompatibility S1 proteins (IPR010264), suggesting potential genetic linkage between genes associated with the domestication phenotype of H. annuus and self-compatibility, as noted in previous studies (Burke et al., 2002; Gandhi et al., 2005; Tang et al., 2006).

The SNPs associated with cc1 were found to be in proximity to genes which encode proteins related to cell proliferation, differentiation and development (ARID/BRIGHT DNA-binding domain, IPR001606) and plant hormone production and regulation (AUX/IAA protein, IPR003311; Auxin response factor, IPR015300; Terpenoid synthase, IPR0089494 and Cytokinin riboside 5′-monophosphate phosphoribohydrolase LOG, IPR005269). The annotations for cc2 included a fibroblast growth factor receptor (IPR016248). While SNPs associated with rb1 were not found to be within 100 kb of putative candidate genes, these SNPs flank a genomic region containing a phytochrome domain (IPR001294), which is associated with plant light response and the control of photoperiodism and flowering. The presence of a flowering-time QTL on this linkage group is also in line with previous studies which have identified a cluster of paralogous HaFT genes on LG 6 which are major contributors to photoperiod sensitivity in H. annuus (Blackman et al., 2011). Annotations for rb2, which corresponds to LG 15 of H. annuus, did not yield obvious functional candidates in close proximity to the associated SNPs, though a QTL for flowering is known on this linkage group in wild × landrace crosses of H. annuus (Wills and Burke, 2007; Blackman et al., 2011).

# DISCUSSION

### Genetic Parallels With H. annuus and Putative S-Locus Linkage

The release of the sunflower reference genome has enhanced the ability to generate SNP libraries for analyzing both natural and segregating populations of cultivated sunflower as well as their wild crop relatives. The use of the sunflower reference genome greatly assisted in the development of a genetic map of Maximilian sunflower and in the identification of candidate genes.

Several promising candidate genes within 100 kb of ultra1 were identified in this study, one of which is documented in Arabidopsis to influence the phenotypic traits associated with ultra1. ULTRAPETALA (Ult1) has been shown to regulate a number of developmental processes including the timing of the vegetative to floral transition, maintenance of the shoot apical meristem and termination of floral meristem development in A. thaliana (Fletcher, 2001; Carles et al., 2004). This is in line with the associations between ultra1 and timing of anthesis, overall capitulum size and branching in the present study, which presents a strong case for a causative candidate gene. Plants that carry a Ult1 loss-of-function mutation show restricted branching and an increase in floral meristem size similar to the Kansas populations utilized in this study.

Another promising candidate gene in proximity to ultra1 is a GH3 auxin-responsive promoter. The role of auxins, cytokinins, and strigolactones in the control of shoot branching and maintenance of apical dominance is well-documented (Bennett et al., 2006; Shimizu-Sato et al., 2009). Several candidate genes have been identified in sunflower domestication studies relating to the production, regulation, and transport of these hormones (Nambeesan et al., 2015).

Additional candidates underlying ultra1 include cytochrome P450 and F-box proteins. The cytochrome P450 and F-box gene families include several well-known genes associated with the control of axillary meristem initiation and growth such as the more axillary growth (MAX) genes of A. thaliana (Stirnberg et al., 2002). One of these genes, MAX2 encodes an F-box protein which has been shown to co-localize with a branching QTL on LG 17 of H. annuus (Mandel et al., 2013).

An interesting parallel between H. maximiliani and H. annuus was observed in the QTL analysis. The QTL ultra1 on LG 17b corresponds to a region in the H. annuus reference genome which harbors annotations for a number of S-locus associated proteins, suggesting potential linkage between ultra1 and self-compatibility in H. maximiliani. Previous studies looking at wild × crop crosses in H. annuus have observed a preponderance of crop alleles on this linkage group and a clustering of upwards of 20 domestication and post-domestication QTL in tight linkage with the S-locus on LG 17 (Burke et al., 2002; Gandhi et al., 2005; Tang et al., 2006).

The role of ULTRAPETALA (Ult1) genes in sunflower has yet to be explored, but several studies have identified domestication QTL in tight linkage with the S-locus on LG 17 of H. annuus. Burke et al. (2002) observed a clustering of QTL for flowering time, stem diameter, capitulum number, number of capitula/branch, plant height, number of leaves, peduncle length, achene weight, shattering and total number of selfed seeds on LG 17. Gandhi et al. (2005) subsequently mapped the S-locus to the same region on LG 17. Tang et al. (2006) observed several overlapping QTL for seed size and seed oil concentration in close proximity to the S-locus on LG 17. Wills and Burke (2007) also observed overlapping QTL for capitulum diameter, branch number, ray number, total capitula number and number of selfed seeds, again in the same region on LG 17.

The presence of an S-locus on LG 17b in Maximilian sunflower may impact breeding efforts and is one possible explanation for the reduced fertility observed in the initial Kansas populations selected for apical dominance. This observation warrants further investigation, as other forms of inbreeding depression reducing fertility cannot be ruled out at this stage. Tight linkage between domestication QTL such as ultra1 and an S-locus could impose restrictions on domestication if linkage cannot be broken. Stringent selection for a given variant of an ultra1 allele may result in a genetic bottleneck and reduced S-locus allelic diversity. Limited genetic diversity in breeding pools of Maximilian sunflower may limit seed yield through a lack of compatible pollen sources, reducing the ability to generate novel germplasm through directed crosses. Fine-scale genetic mapping of ultra1 and the S-locus and development of further SNP markers in Maximilian sunflower would provide an efficient way to break the putative genetic linkage between these loci through the identification of recombinant progeny carrying different S-locus alleles.

# Phenotypic Trade-Offs in the Context of Neo-Domestication

A trade-off between capitulum size and capitulum number was observed along both the first and second principal component axes in the F<sup>2</sup> populations and is suggestive of a resource allocation trade-off between these traits. In H. annuus, unbranched biotypes have larger capitula and seeds than their branched counterparts.

The presence of a major QTL such as ultra1 influencing flowering time, branching and capitulum morphology would explain the first principal component of the PCA which is dominated by type differences observed in the parental materials. The presence of ultra1 helps explain the phenotypic correlations between traits such as later flowering and increased capitulum width and depth. Interestingly, the second principal component shows the opposite relationship between these traits suggesting this trait correlation is not absolute.

While selection for a single large capitulum is a defining domestication syndrome characteristic of cultivated annual sunflower, the path perennial oilseeds will take through domestication will ultimately depend on how standing genetic variation contributes to yield. In sunflower, branching not only interacts with capitulum size and total capitula number but also with seed weight and oil content, amongst other traits (Fick et al., 1974; Dedio, 1980; Tang et al., 2006; Bachlava et al., 2010).

While there are inherent benefits to restricting branching, such as the facilitation of mechanical harvest and uniform maturity, restricting branching may also limit yield potential if selected for too stringently. Branched Maximilian sunflower plants are capable of producing a large number of capitula per stem (>100), while completely unbranched plants exhibited a single, central capitulum. The relationship between capitulum size and capitula number in Maximilian sunflower is not a 1:1 ratio. Plants with restricted branching tend to exhibit larger capitula, though this increase in capitulum size does not appear to fully compensate for the loss of capitulum number, and it has been suggested that this may limit seed yield (Van Tassel et al., 2014; DeHaan et al., 2016). Selection for a single central capitulum, akin to the domestication ideotype of cultivated sunflower, could decrease seed production in H. maximiliani if the loss of capitula through restricted branching is not compensated for by a reciprocal increase in capitulum size. Conversely, the single capitulum ideotype may be better suited for polyculture use as opposed to a monoculture, as the loss of yield associated with an unbranched stem may be compensated for by diversification within the field (Picasso et al., 2008; Picasso et al., 2011).

Helianthus annuus was initially domesticated for its edible seed, pigments and medicinal compounds (Heiser, 1951), with selection for oil content and composition occurring more recently (Burke et al., 2005; Chapman and Burke, 2012). Therefore, the defining characteristics of its domestication may not necessarily apply to the neo-domestication of Maximilian sunflower as an oilseed crop. Selection for a Maximilian sunflower ideotype as an oilseed may parallel other small-seeded Compositae oilseeds such as safflower (Carthamus tinctorius) and noug (Guizotia abyssinica), both of which exhibit contrasting domestication syndromes to annual sunflower (Pearl et al., 2014; Dempewolf et al., 2015), having retained a branched architecture and undergone selection for greater seed production. Similarly, successful oilseed crops in Western Canada such as canola (Brassica napus L.) retain a branched architecture and small seed size, and are still prone to considerable shattering losses under certain conditions (Gulden et al., 2003; Cavalieri et al., 2016). Despite these characteristics, these crops are capable of sustaining economic yields, supporting their use as crops. Increased branching and seed number may constrain seed size in H. annuus, and smaller seeds tend to bear a higher concentration of oil. Estimates of 6.7–8.5% greater oil concentration in the seed of branched individuals relative to unbranched individuals in segregating populations have been observed, presumably due to a thinner pericarp (Tang et al., 2006; Bachlava et al., 2010). Therefore, increasing branching and seed number may prove beneficial in increasing oil yield per unit area.

Though selection for restricted branching may limit the total production of capitula, and by extension yield potential on a single plant basis, branching restriction may enhance yield on a per unit area basis. It has been observed that in species such as safflower, lower order branches are often infertile and that, in general, plants that produced many branches produced smaller capitula, fewer florets per branch, and smaller seeds in smaller quantities (Bidgoli et al., 2006). Branching restriction may enhance harvestable yield indirectly through greater synchronicity of flowering and capitulum maturity and through changes in capitulum morphology, both of which may alleviate shattering losses. In the present study, the capitulum depth:width ratio appears to decrease with branching restriction and the increase in capitulum size. Plants with a wider, flatter capitulum, more akin to those of domesticated sunflower, may exhibit reduced shattering losses if these patterns are common between species. Furthermore, it may be possible to compensate for the loss of secondary capitula by increasing stem density per unit area, either through selection for a greater number of stems per plant or through higher density plantings coupled with tolerance to crowding. Ultimately, increasing the harvest index of Maximilian sunflower through conventional as well as marker-assisted breeding approaches appears to be possible through multiple pathways, and further investigation into these factors under realistic field conditions is required to define the optimal ideotype for production.

In summary, we were successful in identifying QTL for important traits associated with domestication in Maximilian sunflower and identified candidate genes using the existing genomic resources of cultivated sunflower. Results presented in this study have implications for defining an appropriate ideotype for perennial oilseed crops in the Compositae and for the development of genomic resources in crop wild relatives. Further work is needed to compare contrasting Maximilian sunflower ideotypes, to verify the presence of putative S-locus linkage, and to develop markers for this and other domestication QTL. Future work will focus on the field-based evaluation of yield components in proposed Maximilian sunflower ideotypes, the development of a de novo genetic map for Maximilian sunflower, and fine mapping of genomic regions surrounding ultra1. Understanding both the genetic and agronomic parallels of domestication will play an important role in the neo-domestication of Maximilian sunflower and related species.

# REFERENCES


# DATA AVAILABILITY STATEMENT

The SNP dataset generated in this study is available in Dryad digital repository (doi: 10.5061/dryad.tdz08kpwt). Any other datasets and germplasm used and/or analyzed during the current study will be made available through contact with the corresponding author on reasonable request.

## AUTHOR CONTRIBUTIONS

SA generated the mapping populations, conducted the greenhouse and laboratory work, analyzed the data, and wrote the manuscript as a graduate student. DC and AB-B provided laboratory resources and supervision of experimental design and aided in the editing of the manuscript. DV developed the initial restricted branching populations and provided germplasm and feedback on the manuscript. All authors have reviewed and agreed on the contents of the manuscript.

# FUNDING

This work was supported by the Western Grains Research Foundation graduate scholarship, by grants from Manitoba Agriculture [Advanced Rural Development Initiative (ARDI) Project No. 12-1136], and The Land Institute, Salina, KS.

# ACKNOWLEDGMENTS

The authors would like to acknowledge Ardelle Slama, Robert Visser, Kathy Bay, and Ian Brown for providing greenhouse support throughout the length of the experiment and Dr. Maria Stoimenova and Dr. Curt McCartney for advice and technical support in the areas of molecular biology and genetic mapping respectively. Dr. Robert Gulden, Dr. Barbara Sharanowski, and Dr. Khalid Rashid are acknowledged for serving on SA's graduate student committee and providing feedback throughout the length of the experiment.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00834/full#supplementary-material





**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Asselin, Brûlé-Babel, Van Tassel and Cattani. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Common Vetch: A Drought Tolerant, High Protein Neglected Leguminous Crop With Potential as a Sustainable Food Source

### Vy Nguyen<sup>1,2</sup>, Samuel Riley<sup>1,2</sup>, Stuart Nagel<sup>3</sup>, Ian Fisk<sup>4</sup> and Iain R. Searle<sup>1,2</sup>\*

<sup>1</sup> School of Biological Sciences, School of Agriculture, Food and Wine, The University of Adelaide, Adelaide, SA, Australia, <sup>2</sup> Shanghai Jiao Tong University Joint International Centre for Agriculture and Health, The University of Adelaide, Adelaide, SA, Australia, <sup>3</sup> South Australian Research and Development Institute, Adelaide, SA, Australia, <sup>4</sup> Division of Food Science, Nutrition and Dietetics, School of Biosciences, University of Nottingham, Nottingham, United Kingdom

### Edited by:

Thomas M. Davis, University of New Hampshire, United States

### Reviewed by:

Hamid Khazaei, University of Saskatchewan, Canada

Tomas Vymyslicky, Agricultural Research, Ltd., Czechia

> \*Correspondence: Iain R. Searle iain.searle@adelaide.edu.au

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 27 February 2020 Accepted: 21 May 2020 Published: 19 June 2020

### Citation:

Nguyen V, Riley S, Nagel S, Fisk I and Searle IR (2020) Common Vetch: A Drought Tolerant, High Protein Neglected Leguminous Crop With Potential as a Sustainable Food Source. Front. Plant Sci. 11:818. doi: 10.3389/fpls.2020.00818

Global demand for protein is predicted to increase by 50% by 2050. To meet the increasing demand whilst ensuring sustainability, protein sources that generate low greenhouse gas emissions are required, and protein-rich legume seeds have the potential to make a significant contribution. Legumes like common vetch (Vicia sativa) that grow in marginal cropping zones and are drought tolerant and resilient to changeable annual weather patterns will be in high demand as the climate changes. In common vetch, the inability to eliminate the γ-glutamyl-β-cyano-alanine (GBCA) toxin present in the seed has hindered its utility as a human and animal food for many decades, leaving this highly resilient species an "orphan" legume. However, the availability of the vetch genome and transcriptome data together with the application of CRISPR-Cas genome editing technologies lay the foundations to eliminate the GBCA toxin constraint. In the near future, we anticipate that a zero-toxin vetch variety will become a significant contributor to global protein demand.

Keywords: legume, common vetch, Vicia sativa, vetch toxin, γ-glutamyl-β-cyano-alanine, plant-based protein, sustainable

## INTRODUCTION

Global demand for protein is predicted to increase by a staggering 50% by 2050 (Westhoek et al., 2011; Henchion et al., 2017). With an increasing global population and increasing demand for animal-derived protein, the sustainability of agricultural systems has been brought into question. Over the last two centuries, the expanding livestock industry has led to significant deforestation and overgrazing of natural grasslands, which has decreased terrestrial biodiversity, increased greenhouse gas emissions, and contributed to climate change and global warming (Henchion et al., 2017). In order to meet the increasing protein demand and protect our environment, more sustainable protein food sources are required. Cheap plant-based protein sources, such as legume seeds, represent an environmentally sustainable option that is well suited to developing countries with rapidly growing populations (Asgar et al., 2010). Moreover, to cope with an increasingly unpredictable climate and the expansion of marginal cropping areas, breeding strategies for more drought tolerant and resilient crops will be vital (Sivakumar et al., 2005; Lobell and Gourdji, 2012). A legume that could be exploited for this scenario is the common vetch (V. sativa). Common vetch is able to grow in marginal cropping zones whilst being resilient to variable annual weather patterns, mainly through superior drought tolerance (White et al., 2005). One study demonstrated that vetch could withstand water deficit for up to 24 days and show full restoration of biotic function once regular watering had resumed (Tenopala et al., 2012). Drought and heat tolerant crops are increasingly desirable in the face of rising global temperatures and increasingly prolonged periods of drought brought on by climate change (Mba et al., 2018).

For many decades, the inability to remove the γ-glutamyl-β-cyano-alanine (GBCA) seed toxin has hindered common vetch's use in agriculture (Pfeffer and Ressler, 1967; Roy et al., 1996), leaving this resilient plant as an "orphan" legume. We envisage that the development of a zero-toxin vetch variety would facilitate its use for animal feed, specifically for chickens and pigs, and for human consumption (Ressler et al., 1997; Collins et al., 2002). We estimate that the production costs of common vetch are approximately 50% lower than those of competing legumes such as lentils, and predict that zero-toxin varieties would rapidly surpass lentil in pig and poultry production. Zero-toxin common vetch will immediately generate new domestic markets, such as feed for the poultry industry, but it will also open new export markets for Australia and other countries, thereby increasing export revenue and farm profitability and indirectly increasing investment in local communities.

## COMMON VETCH: A VERSATILE PASTURE CROP THAT PROVIDES MULTIPLE BENEFITS FOR THE FARM

Common vetch (V. sativa), which is shown in **Figure 1**, belongs to the genus Vicia within the Fabaceae (legume) family. This genus contains about 140 species, including woolly-pod vetch (V. villosa) and faba bean (V. faba). Other Fabaceae genera also contain so-called vetches; two examples are Astragalus (containing the milkvetches) and Lathyrus (containing L. ochrus, the Cyprus-vetch). Nowadays, common vetch is found in both natural and agricultural settings across Europe, Asia, North America, some parts of South America, Africa, the Mediterranean, and Australia (Navrátilová et al., 2003; Ford et al., 2008).

Like other legumes, common vetch forms a symbiosis with nitrogen-fixing bacteria (Rhizobia) that fix atmospheric nitrogen into nitrogenous compounds available to the plant, hence reducing the need to apply expensive nitrogen fertilizer to the vetch crop and to subsequent rotation crops. Often, common vetch is used as a green manure which, when incorporated into the soil, provides valuable carbon and nitrogen for rotation crops such as wheat and barley. Additional soil carbon often increases water-holding capacity and the ability to bind nutrients including nitrate (Reeves, 1997; Bünemann et al., 2018). Furthermore, common vetch biomass can also be used for forage, fodder, pasture, silage, or hay, and the seed may safely be used as a protein-rich feed component for ruminant animals (Enneking, 1995). Common vetch is well suited as a pasture species because it forms many adventitious shoots that are either buried or close to the soil surface, making it resilient to heavy grazing (Rathjen, 1997).

Despite common vetch's versatile uses, the production of vetch is still limited. Data collected by the Food and Agriculture Organization (FAO) in 2017 showed that, globally, the area harvested and the production of vetch were about 0.6 million ha and 0.9 million tons, respectively<sup>1</sup>. Based on FAO 2017 data for legumes, vetch occupied about 0.3% of land usage and accounted for only 0.2% of the production. This is 12 times less than the area harvested and 8 times less than the production of lentils (**Figure 2**). Common vetch's limited production is mainly attributed to the anti-nutritional compounds present in the seeds, which are discussed further in the next section.
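The rounded FAO figures quoted above can be cross-checked with a short back-of-envelope calculation. The sketch below is illustrative only: the variable and function names are hypothetical, and the lentil and all-legume totals are back-calculated from the rounded ratios and shares in the text rather than read from FAOSTAT, so they should be treated as approximations (for example, the vetch figures imply a mean yield of roughly 1.5 t/ha).

```python
# Rounded 2017 figures quoted in the text (FAO); everything derived below is
# a back-calculation from these rounded values, not FAOSTAT data.
VETCH_AREA_MHA = 0.6        # million hectares harvested
VETCH_PRODUCTION_MT = 0.9   # million tons produced
VETCH_AREA_SHARE = 0.003    # about 0.3% of legume land usage
VETCH_PROD_SHARE = 0.002    # about 0.2% of legume production
LENTIL_AREA_RATIO = 12      # lentil area is ~12 times the vetch area
LENTIL_PROD_RATIO = 8       # lentil production is ~8 times vetch production


def mean_yield_t_per_ha(production_mt: float, area_mha: float) -> float:
    """Mean yield in tons per hectare (the 'million' units cancel)."""
    return production_mt / area_mha


def implied_total(part: float, share: float) -> float:
    """Total implied by a part and its fractional share of that total."""
    return part / share


vetch_yield = mean_yield_t_per_ha(VETCH_PRODUCTION_MT, VETCH_AREA_MHA)
lentil_area_mha = VETCH_AREA_MHA * LENTIL_AREA_RATIO
lentil_production_mt = VETCH_PRODUCTION_MT * LENTIL_PROD_RATIO
legume_area_mha = implied_total(VETCH_AREA_MHA, VETCH_AREA_SHARE)
legume_production_mt = implied_total(VETCH_PRODUCTION_MT, VETCH_PROD_SHARE)

print(f"Vetch mean yield: {vetch_yield:.1f} t/ha")
print(f"Implied lentil area: {lentil_area_mha:.1f} million ha")
print(f"Implied lentil production: {lentil_production_mt:.1f} million tons")
print(f"Implied legume area: {legume_area_mha:.0f} million ha")
print(f"Implied legume production: {legume_production_mt:.0f} million tons")
```

Because every input is rounded to one significant figure, the derived totals only indicate orders of magnitude.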

## COMMON VETCH SEEDS: IMPORTANT NUTRITIONAL ATTRIBUTES

Common vetch seed appeared in the diets of hunter-gatherers as early as 12,000–9,000 BP, as evidenced by archaeobotanical analysis of samples from the Santa Maira cave in Alicante, Spain (Aura et al., 2005; Mikić, 2016). Today, common vetch is globally distributed and its spread is thought to have occurred through the inadvertent selection and trading of vetch seeds as a weedy contaminant of other legume seeds (Erskine et al., 1994). This has led to the suggestion that selection of common vetch amongst other palatable legumes, such as lentils, led to vegetative and seed mimicry of lentils by vetch (Erskine et al., 1994). Although intact lentil and vetch seed can be differentiated after close examination of the seed shape and size, the split seed is harder to distinguish. Due to the similarity of split vetch and lentil seeds, there was a period of time when split vetch seeds of the variety "Blanchefleur" were oil coated and inappropriately substituted and sold as lentils. However, Tate and Enneking (1992) raised the issue of vetch toxicity in relation to this substitution and indicated that vetch seeds were unsuitable to be eaten by monogastric animals, including humans. Since then, vetch has been restricted to use in pastures, as the seeds can be safely eaten by ruminants and, in low amounts, by some monogastric animals such as chickens (less than 40% w/w) and pigs (less than 20% w/w) (Rathjen, 1997).

Common vetch exhibits high concentrations of crude protein (w/w) in the seed, between 24 and 32% (Francis et al., 2000). This concentration is comparable to faba bean (V. faba) and lupin (Lupinus angustifolius L.) seed, which have about 32% (Crépon et al., 2010; Caliskanturk et al., 2017) and 30% crude protein (Valentine and Bartsch, 1996), respectively. Common vetch seeds contain eighteen amino acids and the ratio of essential to non-essential amino acids is about 0.7 (Mao et al., 2015), which is significantly higher than the 0.38 recommended by WHO (Joint, 2007). Additionally, the principal essential amino acids are arginine and leucine at average concentrations of 2.4 and 2.1%, respectively. Glutamic and aspartic acids are the predominant non-essential amino acids within common vetch seed, averaging levels of 5.5 and 3.7%, respectively (Mao et al., 2015). Vetch seed also contains a much lower proportion of lipids (only 1.5–2.7%) compared with soybean seed (Glycine max), and this is primarily composed of unsaturated fatty acids (Mao et al., 2015).

<sup>1</sup>http://www.fao.org/faostat/en/#data

Presently, common vetch is mainly used as animal feed for ruminants, and it can be argued that higher value returns could be achieved by repurposing common vetch as a human food crop. Obstructing the development of the higher value product is a range of anti-nutritional factors, the most significant of which are the dipeptide GBCA and the free amino acid β-cyano-L-alanine (BCA), which exist in relatively high concentrations in the seed at approximately 2.6 and 0.9%, respectively (Tate et al., 1999). Such compounds are toxic to monogastric species, such as chickens or pigs, but have no obvious effect upon ruminant species, including beef cattle (Valentine and Bartsch, 1996). The proportion (w/w) of common vetch seed within feeds that can be safely consumed without deleterious effects is 10% for piglets, 20% for adult pigs, and about 40% for chickens (Harper and Arscott, 1962; Rathjen, 1997). In addition to the main toxin GBCA, other anti-nutritional compounds found in common vetch include vicianine, vicine, convicine, and tannins (Ritthausen and Kreusler, 1870; Rathjen, 1997).

Genetic variation and relatively high heritability of vicianine levels in common vetch accessions have allowed breeders to select and produce cultivars free of this toxin. One early developed cultivar with no vicianine was Blanchefleur (Delaere, 1996; Rathjen, 1997). Most modern common vetch cultivars are vicianine free. Vicine and convicine are well studied in faba bean, and in humans cause the potentially fatal disease favism (Bottini, 1973). Vicine and convicine are hydrolysed by native β-glucosidases in the cotyledons of seeds to form divicine and isouramil, which, when consumed by humans, can cause oxidation of glutathione in red blood cells. In individuals who cannot generate glutathione at normal rates due to deficient glucose-6-phosphate dehydrogenase activity, this results in haemolysis and the disease favism (Arese et al., 1981). In contrast to the situation for vicianine and vicine, wild accessions and breeding lines with very low GBCA levels have not been identified, and the GBCA toxin levels in current cultivars are still deemed to be too high for monogastric consumption. Chickens can only tolerate feed with less than 20% (w/w) common vetch (∼0.2% GBCA in the feed). Confounding toxic effects of other compounds with vicine and GBCA were proposed (Rathjen, 1997); however, the chemical basis of these compounds is currently unknown. Finally, antinutritional tannins present in vetch seed coats are often removed during the dehulling process and hence are considered less significant (Rathjen, 1997).

## LIMITATION OF GBCA DETOXIFICATION USING CONVENTIONAL METHODS

Without the toxic compounds in the seeds, vetch would be highly nutritious and a valuable animal feed. Therefore, a number of methods have been investigated to remove the toxins, mainly focusing on GBCA. Post-harvest processing efforts to lower the GBCA toxin levels within the seed have previously involved simple soaking, continuous flow-through soaking, and boiling methods (Rathjen, 1997). The seed soaking method alone was insufficient to lower GBCA levels, as consumption of soaked seed by chickens during feeding trials reduced egg production, and daily food consumption and feed conversion ratios were also diminished (Farran et al., 1995). In contrast, the boiling method reduced the toxin levels in the seed such that the seed could be included in chicken feed at levels of up to 25% without negative effects on growth rates (Kaya et al., 2013). However, consumption of boiled vetch seed that had the broth periodically discarded during boiling resulted in 20% reduced growth rates in chickens when compared with conventional feed with similar protein amounts (Ressler et al., 1997). Boiling combined with periodic discarding of the water decreased seed mass by 45%, as water-soluble vitamins (like vitamin B), water-soluble proteins, and carbohydrates were leached during processing (Ressler et al., 1997), and this correlated with the reduced chick growth rate. Autoclaving the seed as a processing method has also been investigated and assessed in feeding trials of laying hens. In the laying hen trial, overall growth rate was not significantly different between animals fed with autoclaved or raw vetch seed, suggesting that non-heat-labile toxins like GBCA were still bioactive (Farran et al., 1995). The lack of success of these seed processing methods in improving animal growth or health strongly indicates the need for genetic approaches to detoxify common vetch seeds.

## UNSUCCESSFUL SEARCHES FOR ZERO-TOXIN VETCH ACCESSIONS

Using conventional breeding methods and, more recently, molecular marker-assisted breeding for genomic selection, plant breeders have prioritized the search for common vetch varieties that have biotic and abiotic stress resistance traits, as well as selecting for increased yield and seed nutritional quality (Francis et al., 2000). However, no concerted effort has been made to select for low or zero GBCA toxin levels in Australian or overseas breeding programs. This has resulted in varieties that are only used by farmers for pasture, green or brown silage, or ruminant feed (Francis et al., 2000; Dong et al., 2019; Huang et al., 2019; Mikić et al., 2019). This is mainly due to very limited natural variation in GBCA toxin levels amongst common vetch accessions (Rathjen, 1997). Rathjen (1997) screened over 1,700 V. sativa accessions and failed to identify a single accession with no GBCA toxin. Later, screening of a total of 3,000 accessions identified only one line, IR28, with a low (0.3–0.4%) GBCA level, but no zero-toxin line has yet been identified (Ford et al., 2008). Backcrossing IR28 to Jericho white, a spontaneous white flowered mutant of the French commercial variety Languedoc, over seven generations produced a near homozygous line named Lov 9 (Tate and Searle, unpublished). However, the GBCA levels in the Lov 9 seed from plants grown in shade houses or field conditions ranged from 0.4–1.2%, respectively (Tate and Searle, unpublished). These GBCA levels in Lov 9 seed were deemed too high for commercial release of the variety as a low toxin variety. Another strategy to develop a zero GBCA toxin common vetch variety was to make interspecies crosses between the zero-toxin species V. villosa and V. pannonica and common vetch, but these resulted in embryo abortion and no viable hybrids were recovered (Searle, unpublished). Considering the limited success to date, other pathways to produce a zero-toxin common vetch variety are required.

## APPLICATION OF BIOTECHNOLOGY TO PRODUCE ZERO-TOXIN VETCH

Applications of biotechnology have promised to accelerate crop improvement (Moose and Mumm, 2008). The emergence of genomics, transcriptomics, metabolomics, and proteomics data has led to the establishment of publicly available databases for most major crops. For example, the Legume Information System (LIS) (Dash et al., 2015) and the eFP browser (Patel et al., 2012; Hawkins et al., 2017) now contain data for legumes. By combining the information available in these databases with new bioinformatic tools, we now have the ability to dissect complex traits and determine the underlying gene architecture in a more comprehensive way. In 2018, the 1.8 Gb common vetch genome and seed transcriptome sequencing projects were initiated at the University of Adelaide, Australia, opening the opportunity to determine the genetic basis of vetch toxin accumulation. Using this transcriptome data, we could identify the genes involved in toxin production, and in the future we could investigate their functions by overexpressing or mutating candidate genes. Moreover, we envisage that CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR-associated protein) genome editing, which has been used to modify agronomically important traits in crops such as wheat, barley, rice, and tomato (Liu et al., 2017), will soon be applied to more challenging species including common vetch. CRISPR-Cas genome editing has recently been used to improve the nutritional profiles of many crops. For example, in tomato, knocking down genes in the carotenoid metabolic pathway led to a 5-fold increase in lycopene (Li et al., 2018), and in rice, generating mutations in the starch branching enzyme (Berrens et al., 2017) genes increased amylose content by up to 25% and resistant starch to 9.8% (Sun et al., 2017). One of the most significant impacts of CRISPR-Cas genome editing is the potential improvement of a key trait in a commercially released cultivar within 6 months. In contrast, conventional breeding of the trait may take 5–7 years to release the new cultivar. Importantly, it only takes one generation to obtain an edited plant using genome editing.
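As an illustration of what an early computational step in such a genome editing project might look like, the sketch below lists candidate target sites in a DNA sequence. It assumes the widely used Streptococcus pyogenes Cas9, which requires a 5'-NGG-3' protospacer adjacent motif (PAM) immediately 3' of a 20-nucleotide protospacer; the function name and the example sequence are hypothetical placeholders, not vetch gene sequence, and only the forward strand is scanned. It is not the method used by the authors, and a real design workflow would also scan the reverse strand and screen for off-target matches across the genome.

```python
import re


def find_spcas9_targets(seq: str, protospacer_len: int = 20):
    """List candidate SpCas9 sites on the forward strand of `seq`.

    Returns (start, protospacer, pam) tuples, where `start` is the 0-based
    position of the protospacer and `pam` is the NGG motif that follows it.
    """
    seq = seq.upper()
    # A lookahead keeps the match zero-width, so overlapping sites are found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Placeholder sequence for illustration only (an assumption, not real data).
EXAMPLE_SEQ = "ACGT" * 5 + "AGG" + "CC"

for start, protospacer, pam in find_spcas9_targets(EXAMPLE_SEQ):
    print(start, protospacer, pam)
```

Candidate sites found this way would still need to be checked against the assembled vetch genome once it is available, since guide specificity depends on the rest of the genome.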

A major challenge in non-model plant systems like common vetch is the delivery of transgenes, or of reagents such as the CRISPR-Cas ribonucleoprotein complex, into plant cells and subsequent plant regeneration. Unlike crops such as rice and barley, where transformation and plant regeneration systems are standardized (Sahoo et al., 2011; Harwood, 2014), efficient transformation and plant regeneration systems are lacking for common vetch (Ford et al., 2008). Unfortunately, common vetch's GBCA toxin levels have lowered the priority for research and development of these necessary biotechnological tools, for example a transformation system. Further investment in common vetch is required to develop transformation and plant regeneration systems to facilitate the application of genome editing for trait improvement.

## FUTURE WORK AND EXPECTATIONS

The environmental benefits, the versatile growth habit, and the rich nutritional profile of common vetch make the legume an appealing crop to meet future protein food requirements for humans and animals while sustainably contributing to our agricultural system. However, the failure of conventional breeding to develop a zero-toxin common vetch variety requires a new strategy to be employed. The recent availability of new vetch genomic resources and tools for genome editing increases the likelihood of solving the vetch GBCA toxicity problem. To make this plausible, we suggest the following steps are important for vetch.

To solve the vetch GBCA toxicity, we suggest the following experiments should be prioritized:



Although the first three objectives above can be readily achieved, the last three could take many years to be accomplished due to the absence of a demonstrably routine in vitro transformation and regeneration system for vetch. Therefore, expertise from the plant tissue culture field will be invaluable to achieving objective four and ultimately the final goal.

### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

### AUTHOR CONTRIBUTIONS

VN and IS initially conceived the manuscript. All authors contributed to the writing and editing of the manuscript.

### ACKNOWLEDGMENTS

We would like to thank the Hermon Slade Foundation and The University of Adelaide for the initial funding awarded to IS and VN, respectively. We also thank Jeremy Timmis for useful feedback and editing of the manuscript. This manuscript has been released as a pre-print at https://doi.org/10.1101/2020.02.11.943324 (Nguyen et al., 2020).



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Nguyen, Riley, Nagel, Fisk and Searle. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Southern Species From the Biodiversity Hotspot of Central Chile: A Source of Color, Aroma, and Metabolites for Global Agriculture and Food Industry in a Scenario of Climate Change

### Edited by:

Eric Von Wettberg, University of Vermont, United States

### Reviewed by:

Kim E. Hummer, United States Department of Agriculture, United States

Amparo Monfort, Institute of Agrifood Research and Technology (IRTA), Spain

Petr Smýkal, Palacký University, Czechia

> \*Correspondence: Raúl Herrera raherre@utalca.cl

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 18 December 2019 Accepted: 19 June 2020 Published: 02 July 2020

### Citation:

Letelier L, Gaete-Eastman C, Peñailillo P, Moya-León MA and Herrera R (2020) Southern Species From the Biodiversity Hotspot of Central Chile: A Source of Color, Aroma, and Metabolites for Global Agriculture and Food Industry in a Scenario of Climate Change. Front. Plant Sci. 11:1002. doi: 10.3389/fpls.2020.01002

### Luis Letelier<sup>1,2</sup>, Carlos Gaete-Eastman<sup>1</sup>, Patricio Peñailillo<sup>1</sup>, María A. Moya-León<sup>1</sup> and Raúl Herrera<sup>1</sup>\*

<sup>1</sup> Laboratorio de Fisiología Vegetal y Genética Molecular, Instituto de Ciencias Biológicas, Universidad de Talca, Talca, Chile, <sup>2</sup> Núcleo Científico Multidisciplinario, Dirección de Investigación, Universidad de Talca, Talca, Chile

Two interesting plants within the Chilean flora (wild and crop species) have a history related to modern fruticulture: Fragaria chiloensis subsp. chiloensis (Rosaceae) and Vasconcellea pubescens (Caricaceae). Both species have a wide natural distribution, which extends from the Andes mountains to the sea (East-West), and from the Atacama desert to the South of Chile (North-South). The growing locations are included within the Chilean Winter Rainfall-Valdivian Forest hotspot. Global warming is of great concern as it increases the risk of losing wild plant species, but at the same time it provides an opportunity for genetic improvement, usually a longer-term effort, using naturally adapted material, and a source for generating healthy foods. Modern agriculture intensifies the attractiveness of native undomesticated species as a way to provide compounds like antioxidants or plants tolerant of climate change scenarios. F. chiloensis subsp. chiloensis, as the mother of the commercial strawberry (Fragaria × ananassa), is an interesting genetic source for the improvement of fruit flavor and stress tolerance. On the other hand, V. pubescens produces fruit with high levels of antioxidants and proteolytic enzymes of interest to the food industry. The current review compiles the botanical, physiological and phytochemical description of F. chiloensis subsp. chiloensis and V. pubescens, highlighting their potential as functional foods and as a source of compounds with several applications in the pharmaceutical, biotechnological, and food sciences. The impact of global warming scenarios on the distribution of the species is also discussed.

Keywords: Chilean strawberry, Fragaria chiloensis subsp. chiloensis, mountain papaya, new crops, Vasconcellea pubescens

# INTRODUCTION

Few grain species were initially used in the human diet. Several other vascular plant species have since been incorporated, and plant breeding efforts have selected inbred lines with interesting and desirable characteristics, with a focus on productivity (Khoshbakh and Hammer, 2008; Şerban et al., 2008). In the case of fruit species, characteristics such as fruit size, color, and texture have been the main quality attributes pursued by breeders during the last century, but there is now great interest in new traits, such as good aroma, pathogen resistance, high levels of antioxidants, healthy compounds, and better postharvest life. Nowadays, fruits from native species could be incorporated into our diet due to their contribution as functional foods. In particular, strawberries are rich in secondary metabolites, as well as other attributes; however, the fruit is susceptible to several diseases. Hancock et al. (2010) recognize the importance of octoploid species such as F. chiloensis and F. virginiana for enriching traits such as disease resistance or better performance against biotic or abiotic stresses in new commercial lines of F. × ananassa. Global warming has become a matter of great importance in recent years, as the risk of losing wild plant species is certain due to changes in ecological conditions (Feng et al., 2010; Lippmann et al., 2019; Moran, 2020). Chile is a world center of origin for cultivated plants and is also considered a biodiversity hotspot. In this sense, two interesting taxa with a history related to modern fruticulture can be mentioned: Fragaria chiloensis (L.) Mill. subsp. chiloensis (Rosaceae), known as the Chilean strawberry, and Vasconcellea pubescens A. DC. (Caricaceae), known as mountain papaya. The present article describes the development, use and potential future of both species.

# MOUNTAIN PAPAYA

Vasconcellea pubescens A. DC. (Caricaceae) (synonyms: Carica pubescens Lenné & K. Koch, C. candamarcensis Hook. f., and V. cundinamarcensis V.M. Badillo) (Badillo, 2000; Van Droogenbroeck et al., 2002; Hassler, 2019), known as highland or mountain papaya, is a diploid dicotyledonous species (2n = 2x = 18). It is native to the subtropical Andean mountains of South America, in particular to low dry mountain forest areas in Colombia and Ecuador (2,000 to 3,000 m altitude). Ecuador, which has 15 of the 21 Vasconcellea species, is a center of genetic biodiversity (Van den Eynden et al., 1999; Scheldeman et al., 2011). Some genotypes of V. pubescens were already well adapted to central Chile before the Spanish colonization, suggesting that its introduction occurred during the Inca Empire or even earlier (Latcham, 1936). An area of 180 ha of commercial orchards (ODEPA-CIREN, 2019), mostly located between 30° and 33° latitude south (Figure 1; Supplementary Table 1), highlights Chile as the only country in the southern hemisphere producing the fruit at commercial scale and using it as a source of processed products.

Mountain papaya is an arborescent plant reaching 1 to 8 m tall, with one central stem and palmate leaves with long petioles at the top (Figure 2A). Most individual plants are dioecious, but a wide range of hermaphrodite phenotypes can be identified in the field, so formally it is a sub-dioecious species with three sex phenotypes: female, male, and hermaphrodite (Figure 2B) (Badillo, 1993). V. pubescens presents a case of andromonoecy, where bisexuality is observed in male plants, which bear a high proportion of male flowers together with hermaphrodite ones. Bisexual flowers are male flowers bearing viable ovules, leading to the production of small fruits (Scheldeman et al., 2011). Sex in V. pubescens is determined by two factors, the Ypm chromosome and the pm cytoplasmic factor (Horovitz and Jimenez, 1967), which are primitive sex chromosomes and a model for the evolution of sex determination (Na et al., 2014). Reproduction is through seeds; plants take between 10 and 12 months to reach reproductive age and are maintained for 5 years at commercial scale. After that, plants progressively produce less fruit of lower quality; nevertheless, plants older than 20 years are found in Chilean orchards. The mountain papaya tree shows slow growth, with continuous flower and leaf production; as lower leaves get older they fall to the ground (Sánchez, 1994).

V. pubescens is sensitive to cold: stems and leaves can be damaged, leading to complete plant death, when temperatures fall below 2°C, and even fruit ripening can be altered by cold. Cold episodes have become more frequent in Chile in recent years, even in coastal areas where chilling injury in commercial orchards was unknown 10 years ago. Additionally, extended drought stress promotes continuous leaf loss (Sánchez, 1994).

The mountain papaya tree bears fruit in the leaf axils, which are spirally arranged along the trunk, so fruit from immature (small, green) to ripe (large, bright yellow) stages can be observed on a single plant. In natural populations, birds eat ripe and overripe fruits, allowing seed dispersal and germination. Seeds have a high germination rate (60% in 30 days) and do not require a dormancy period (Sánchez, 1994). The fruit is oblong, truncated at the base, with five pronounced ridges; a ripe fruit is about 8–15 cm long and 5–6 cm in diameter, with a mean weight of 200 g (García, 1975) (Figure 3). The fruit has juicy yellow flesh with a strong, aromatic flavor, a high content of vitamins (A, B, and C), a high level of antioxidants, and a milky latex (highly abundant in immature stages) that contains a mixture of proteolytic enzymes commonly referred to as papain (Van Droogenbroeck et al., 2002; Moya-León et al., 2004). The fruit is commonly used to prepare preserves, jam, candies and nectar, while its latex is widely used in industry as a meat tenderizer. Mountain papaya is frequently compared with the common, globally known papaya (Carica papaya L.): V. pubescens is smaller and less succulent than C. papaya but has a stronger aroma and flavor (Van den Eynden et al., 1999; Scheldeman et al., 2003; Van den Eynden et al., 2003).

### Diversity and Genetic Structure

Some phenotypic variation is observed in mountain papaya, mainly in plant height, number of branches and fruit size. Nevertheless, the most important differences are in sex determination. Although no environmental variation can be observed for pistillate and staminate plants, the proportion of male and hermaphrodite flowers in andromonoecious plants depends on climate conditions. On the other hand, the incompatibility barrier is labile in Vasconcellea, which makes it possible to increase genetic variation through the formation of spontaneous hybrids between different species (Badillo, 1993; Sánchez, 1994). Interspecific hybrids have been found in natural populations (De Zerpa, 1959); however, only one has had commercial success: the babaco (Vasconcellea × heilbornii (V.M. Badillo) V.M. Badillo). Babaco results from the cross between V. pubescens and V. stipulata (V.M. Badillo) V.M. Badillo (Horovitz and Jimenez, 1967; Badillo, 1971; Scheldeman et al., 2011). Studies using molecular markers suggest that V. × heilbornii originated through a complex evolutionary process in which V. pubescens, V. stipulata, and V. weberbaueri (Harms) V.M. Badillo are parental species (Scheldeman et al., 2011). Molecular analysis has also confirmed hybrids obtained by crossing V. pubescens with V. monoica (Desf.) A. DC. (Kyndt et al., 2005; Kyndt et al., 2006), as well as interspecific crosses with V. stipulata, V. monoica, V. microcarpa (Jacq.) A. DC. and V. horovitziana (V.M. Badillo) V.M. Badillo (Horovitz and Jimenez, 1967; Badillo, 1971).

FIGURE 1 | Distribution of Fragaria chiloensis (wild and domesticated) and Vasconcellea pubescens in Chile. The map shows a partial extension (area) of the Chilean Winter Rainfall-Valdivian Forests hotspot. Images located on the left side of the map, from North to South, correspond to V. pubescens plants grown at the localities of Limarí and Lipimávida, and to F. chiloensis f. patagonica grown at Termas de Chillán and Cucao, respectively.

FIGURE 2 | Morphological characteristics of V. pubescens tree, flowers and fruit. (A) Mountain papaya tree at three growing stages: a two-month-old tree, half a meter tall, producing its first flowers (left); after one year, the tree reaches one meter in height and produces fruit (center); after three years, the tree is two meters tall and in full fruit production (right); interestingly, fruit at different developmental stages can be observed on the same tree. (B) Sexual forms in V. pubescens. Male plants produce staminate flowers, which have a characteristic elongated form with the stamens inside the flower (left). Female plants produce pistillate flowers, with the characteristic enlargement of the flower base caused by ovary growth (center). Andromonoecious plants produce hermaphrodite and staminate flowers; the former differ from pistillate flowers in showing a wider enlargement of the flower base (upper right). The most noticeable change during fruit ripening is the color of the fruit skin, which changes from intense green to bright yellow (lower right).

High genetic diversity is assumed across the natural distribution in the centres of origin (Ecuador and northern Perú), but little information exists on the history and genetic diversity of the V. pubescens introduced into Chile. Low genetic variability but high genetic polymorphism was reported using ISSR markers (Carrasco et al., 2009). Based on these results, the authors suggested that only a few introductions of genetic material from the original populations took place over time, and that this material constitutes the founding genotype of the Chilean material.

### Phytochemistry

Aroma is one of the most important attributes of mountain papaya fruit and is due to a complex mixture of volatile compounds produced by the fruit (Balbontín et al., 2007). The main volatile compounds are esters and alcohols (linear and branched), and their production increases as ripening progresses (Balbontín et al., 2007). Most of the esters identified in mountain papaya fruit are potent odour compounds, such as ethyl butanoate, ethyl acetate, ethyl hexanoate, and ethyl 2-methylbutanoate, and the most important alcohol is butanol. The dynamics of their production during ripening depend on ethylene, consistent with the classification of mountain papaya as a climacteric fruit (Moya-León et al., 2004; Balbontín et al., 2007). Several other volatile compounds, such as methyl cis-hex-3-enoate, isopentyl acetate, methyl 3-hydroxyhexanoate and ethyl nicotinoate, have been reported in mountain papaya fruit grown in Chile but have not been detected in fruit grown in Colombia (Morales and Duque, 1987). On the other hand, many fruits accumulate antioxidant compounds during ripening, which is a desirable feature for functional foods. Active antioxidant phenolics from mountain papaya fruit include quercetin glycosides, rutin, and manghaslin, which are not produced by common papaya (C. papaya) (Simirgiotis et al., 2009). This suggests a metabolic divergence between the two species.

FIGURE 3 | Morphological characteristics of mountain papaya fruit at the ripe stage. Fruit shows its characteristic shape, thin pulp layer, with an inside cavity full of maternal tissue that contains many seeds (left), and the typical yellow skin color of ripe fruit with five ridges (right). The fruit was harvested at Lipimávida in seasons 2010 (A) and 2018 (B).

All members of the Caricaceae family have laticifers and produce milky latex when plant tissue is damaged. This latex serves as a defense mechanism against predators and helps to heal wounded sites (Konno et al., 2004). Latex composition and proteinase content differ between species. The latex of C. papaya has been characterized and consists of a mixture of hydrolytic enzymes such as chitinases and proteases, the most characteristic being papain, chymopapain, caricain (proteinase Ω) and glycyl endopeptidase (proteinase IV) (El Moussaoui et al., 2001). There is some evidence that freeze-dried latex from V. pubescens has 5 to 8 times more proteolytic activity than that obtained from C. papaya (Baeza et al., 1990). In addition, the extraction of V. pubescens latex and its separation into four fractions, named CC-I to CC-IV, has been reported (Walraevens et al., 1993); two of them, CC-I and CC-III, have been characterized and correspond to papain and chymopapain, respectively. The mixture of latex proteinases has been used to treat gastric ulcers in rodent models (Mello et al., 2008). More recently, two different fractions were obtained from V. pubescens latex: one showing high proteinase activity (CMS) and one showing moderate to low proteinase activity (Teixeira et al., 2008). The fraction called P1G10 has been used to treat gastric ulcers and diabetic foot in several wound models (Tonaco et al., 2018). In addition, a reduction in tumour mass and metastasis level was observed in melanoma-bearing animals treated with this fraction (Lemos et al., 2018).

### THE CHILEAN STRAWBERRY

Fragaria L. is a member of the Rosaceae family. There are about 20 species distributed in temperate Eurasia and North and South America. Based on morphological traits, Staudt recognizes four subspecies of F. chiloensis (L.) Mill., distributed worldwide: two in North America (F. chiloensis subsp. lucida (E.Vilm. ex J. Gay) Staudt, and F. chiloensis subsp. pacifica Staudt), one in Hawaii (F. chiloensis subsp. sandwicensis (Decne.) Staudt) and one in South America (i.e., Ecuador, Bolivia, Perú, Argentina and Chile) (F. chiloensis subsp. chiloensis Staudt) (Staudt, 1962; Bringhurst, 1990; Staudt, 1999). Staudt also proposed two botanical forms for F. chiloensis subsp. chiloensis based on characters such as plant size, leaf texture, color, and fruit size: F. chiloensis (L.) Mill. subsp. chiloensis f. chiloensis Staudt, a landrace that produces large fruit with a white/pink receptacle and white flesh; and F. chiloensis (L.) Mill. subsp. chiloensis f. patagonica Staudt, which produces small red fruits (Staudt, 1962; Staudt, 1999). Shortly after the conquest of central-southern Chile, where the Mapuche people lived, F. chiloensis subsp. chiloensis was taken as a war trophy by the Spaniards and introduced into Perú. Historical reports indicate that in 1557 Garcilaso de la Vega observed a new strawberry species in the fields and markets of Cuzco, different in size from the European species (Popenoe, 1921). Thereafter, the species was introduced into Ecuador by the Spaniards; although there are no exact records, this seems to have occurred before 1789 (Popenoe, 1921). Interestingly, it was not until 1712 that F. chiloensis subsp. chiloensis was introduced into Europe by explorers returning from Chile (Popenoe, 1921; Darrow, 1966; Liston et al., 2014). From an agronomic perspective, F. chiloensis subsp. chiloensis was cultivated as a fruit crop, mainly in Chile and to a lesser extent in Perú and Ecuador, during the first half of the 20th century; however, when new cultivars arrived from Europe and the USA, the species was displaced and its cultivation declined (Finn et al., 2013).

In botanical terms, F. chiloensis subsp. chiloensis is a perennial herb with strong, well-developed runners, trifoliate leaves, and one or a few flowers, which can be unisexual or hermaphrodite. The berry is an aggregate fruit that develops from a flower containing several ovaries (Marticorena, 2019). The fertilized ovaries develop into achenes, which are embedded in the surface of the receptacle. The edible receptacle, or thalamus, is usually considered the most attractive part in sensory terms (Figure 4).

Nowadays, the two botanical forms of F. chiloensis subsp. chiloensis grow in central-southern Chile (Figure 1) and can be found between the O'Higgins and the Aysén Regions (35°05'–45°32' latitude South) (Lavin et al., 2000; Rodríguez et al., 2018), in habitats ranging from forest understory to open vegetation (Kalkman, 2004). These areas present contrasting conditions for plant development, with differences in light, temperature and nutrient availability (Moreira-Muñoz, 2011). The fruit phenotype of Chilean accessions is quite diverse. To demonstrate this, plants collected from different geographic areas were established under a single edaphoclimatic condition (Linares) (Supplementary Table 1). Cultivated F. chiloensis f. chiloensis plants from Huelón Alto, Pelluhue, Purén, and Contulmo produce fruit of uniform medium size with white flesh and a white/pink receptacle (Figures 4A–D). In contrast, wild plants of F. chiloensis f. patagonica collected in Termas de Chillán, Curiñanco, Cuesta Gutiérrez and Cucao produce small fruit with a red receptacle (Figures 4E–H). The white-fleshed fruit is commercially produced at small scale, mainly for fresh consumption or to prepare beverages (Pardo and Pizarro, 2013).

FIGURE 4 | Morphological characteristics of Chilean strawberry fruit at the ripe stage. (A–D) correspond to Fragaria chiloensis f. chiloensis fruit distributed from North to South, from the localities of Huelón Alto, Pelluhue, Purén and Contulmo, respectively. (E–H) correspond to wild F. chiloensis fruit distributed from North to South, from the localities of Termas de Chillán (see Figure 1), Curiñanco, Cuesta Gutiérrez and Cucao (see Figure 1), respectively.

F. chiloensis f. chiloensis (white fruit) is considered one of the parents of the commercial strawberry (F. × ananassa (Duchesne ex Weston) Duchesne ex Rozier), along with F. virginiana Mill., a parentage that is also supported by molecular analysis (Njuguna et al., 2013; Tennessen et al., 2014; Edger et al., 2019). F. chiloensis f. chiloensis has been widely studied, particularly for its distinctive exotic white-pink color and characteristic aroma (Hancock et al., 1999; Retamales et al., 2005; González et al., 2009b). Several studies have reported on the softening of this fruit, showing that the reduction in firmness is accompanied by the breakdown of cell wall polymers (Figueroa et al., 2008; Moya-León et al., 2019). Several genes and enzymes involved in cell wall disassembly have been characterized (Figueroa et al., 2008; Figueroa et al., 2009; Pimentel et al., 2010; Opazo et al., 2013; Méndez-Yáñez et al., 2017; Méndez-Yáñez et al., 2020). Softening requires the fine coordination of several molecular activities, including the participation of plant hormones and transcription factors (Handford et al., 2014; Carrasco-Orellana et al., 2018; Moya-León et al., 2019). The key role played by these molecular coordinators in the expression of cell wall-degrading genes affects softening and the postharvest quality of the fruit.

The plant and fruit of F. chiloensis f. chiloensis show tolerance to pathogens and diseases (González et al., 2013). In addition, the species can grow under different abiotic stress conditions (low temperatures, saline soils), which gives it great potential for breeding purposes (González et al., 2009a; Aceituno-Valenzuela et al., 2018). In this sense, the species can be considered a genetic resource for strawberry breeding programs.

F. chiloensis subsp. chiloensis is located in one of the world's biodiversity hotspots. A hotspot is defined as an area with a high diversity of endemic plant species (over 1,500) that retains 30% or less of its original natural vegetation, and 34 such areas are recognized globally (Arroyo et al., 1999; Myers et al., 2000; Mittermeier et al., 2004). The Chilean Winter Rainfall-Valdivian Forests hotspot covers a great part of northern and southern Chile and contains 1,957 endemic species (Mittermeier et al., 2004). A reduction of plant communities has been observed as a direct result of extensive land-use change from forests to forest plantations or agriculture (Arroyo et al., 1999). Several species native to this area are used for human consumption (Table 1). For the geographic area of this hotspot, climate change models predict a decrease in the distribution range of plant species, together with displacements of their distribution towards the south or into refuges in the Andes Mountains (Pliscoff et al., 2012; Bambach et al., 2013).

TABLE 1 | Native plants from the Chilean winter rainfall-Valdivian forest hotspot with potential in agriculture.

### Diversity and Genetic Structure

A center of diversity assures sustainable genetic improvement for plant species. The wide range of environmental conditions covered by the Chilean hotspot can be used as a source of genetic adaptation to different habitats, conferring tolerance to several stresses or environmental factors. Besides, the loss of genetic diversity within a species can result in the loss of useful and desirable traits and may eliminate options to use untapped resources for food production, industry and medicine (Hoisington et al., 1999; Govindaraj et al., 2015).

Genetic diversity in Fragaria has been studied using molecular markers such as microsatellites (SSR) or dominant markers (ISSR, AFLP and RAPD), and more recently using high-throughput systems such as single nucleotide polymorphism (SNP) arrays (Bassil et al., 2015; Hilmarsson et al., 2017; Jung et al., 2017). SSR markers, for example, have been used to identify F. × ananassa cultivars (Monfort et al., 2006; Shimomura and Hirashima, 2006; Honjo et al., 2011). The IStraw90 Axiom array is a genotyping platform for Fragaria that detects SNPs and indels and is also used for mapping (Mahoney et al., 2016; Nagano et al., 2017; Sooriyapathirana et al., 2019; Oh et al., 2020). In the case of F. chiloensis subsp. chiloensis, only a small number of accessions have been analyzed, and low genetic diversity and no genetic structure were reported (Carrasco et al., 2007; Oñate et al., 2018). The vegetative propagation of the species has been put forward to explain the low genetic variability found throughout Chile (Lavin et al., 2000); despite this, a breeding and improvement program for the species was considered feasible. Several crosses between the Chilean strawberry and other Fragaria spp. have been performed and evaluated for the generation of elite genotypes (Hancock et al., 2001; Finn et al., 2013). Recently, new hybrids were successfully obtained by crossing four different F. × ananassa cultivars with F. chiloensis f. chiloensis, increasing diversity and providing new bridge species for breeding (Luque et al., 2019). Selection within breeding programs can be facilitated and optimized by the use of molecular techniques (Edger et al., 2019; Oh et al., 2019).

### Phytochemistry

F. chiloensis subsp. chiloensis fruit has an interesting phenolic composition and high antioxidant activity. Ellagic acid and cinnamic acid glycosides are present in the white Chilean strawberry fruit (Cheel et al., 2005), and the fruit is also rich in phenolic antioxidants (Cheel et al., 2007; Simirgiotis et al., 2009). In addition, the fruit is characterized by its outstanding aroma (González et al., 2009b). Differences in volatile profiles have been described among commercial strawberry cultivars (Staudt et al., 1975; Forney et al., 2000) and, most importantly, between cultivated and wild species, with wild species showing the highest concentration of volatiles and the best aroma properties (Ulrich et al., 2007). Commercial strawberries produce numerous volatile compounds, including esters, aldehydes, ketones, alcohols, terpenes, furanones, and sulfur compounds (Latrasse, 1991; Zabetakis and Holden, 1997); nevertheless, esters are the most abundant class of compounds (25% to 90% of total volatiles) (Pérez et al., 1992) and provide the fruity notes of fresh ripe fruit (Pyysalo et al., 1979; Schreier, 1980). Esters identified in the white Chilean strawberry fruit include ethyl acetate, methyl butanoate, 2-methyl acetate, octyl acetate, octyl butanoate, hexyl acetate, ethyl heptanoate, 2-hexenyl butanoate, benzyl acetate, and hexyl 2-methyl butanoate, which have also been described in several F. × ananassa cultivars (Pyysalo et al., 1979; Zabetakis and Holden, 1997; Azodanlou et al., 2003; Berna et al., 2007; González et al., 2009b). Importantly, esters such as hexyl propanoate, ethyl 4-decenoate, 2-phenylethyl propanoate, and ethyl 2,4-decadienoate have been identified in F. chiloensis f. chiloensis but not in F. × ananassa (González et al., 2009b).

Anthocyanins are the color pigments of strawberry fruit. Red strawberries normally contain glycosylated anthocyanins derived from pelargonidin and cyanidin, with pelargonidin 3-glucoside being the most abundant (Tulipani et al., 2008). In the white Chilean strawberry, the main anthocyanin is cyanidin 3-O-glucoside, which is mainly present in the achenes (Cheel et al., 2005; Salvatierra et al., 2010; Salvatierra et al., 2014). Importantly, anthocyanins have potential benefits and broad health-promoting effects for humans owing to their antioxidant activity (Chamorro et al., 2019; Hwang et al., 2019; Mazzoni et al., 2019).

The potential health benefits of strawberry fruit consumption have been analyzed, and some in vivo effects on oxidative status in human and animal models have now been reported. Aqueous extracts of the white Chilean fruit have a protective effect against platelet aggregation (Parra-Palma et al., 2018) and can protect human gastric epithelial cells against free radical-induced damage (Avila et al., 2017). In addition, dietary supplementation of rats with white Chilean strawberry juice favors the normalization of oxidative and inflammatory responses to liver injury induced by lipopolysaccharides (Molinett et al., 2015). These results support considering the Chilean strawberry fruit a functional food.

### MEDITERRANEAN CROP AND CLIMATE CHANGE

In general, papaya plants grow mostly in the tropical and subtropical Andes. In Chile, as shown in Figure 1, mountain papaya (V. pubescens) is cultivated in coastal areas between latitudes 30° and 36° South, under a climate with subtropical influence characterized by moderate thermal ranges and a low occurrence of frosts. Recently, Sarricolea et al. (2017) carried out a climatic classification of Chile, under which the cultivated populations of V. pubescens are distributed across four Köppen-Geiger climate types (see Figure 5 and Supplementary Table 1). Regarding strawberry cultivation in Chile, both F. × ananassa and F. chiloensis f. chiloensis grow in coastal areas between latitudes 32° and 37° South, where the climate is of Mediterranean type, with eight variants according to the Köppen-Geiger climate classification (see Figures 6 and 7 and Supplementary Table 1; Sarricolea et al., 2017).

The Intergovernmental Panel on Climate Change (IPCC) predicts that mid-latitude areas will be affected by increasing drought, with irregular rainfall regimes and rising temperatures (Parajuli et al., 2019). These areas concentrate the world's main crops and also include the two species under study. The average temperature and precipitation within the cultivation areas of V. pubescens and F. chiloensis can be estimated using the information provided by climate datasets such as WorldClim. WorldClim 1 is based on the interpolation of observed climatic data representative of the period 1960–1990 (Hijmans et al., 2005), and WorldClim 2 on climatic data for 1970–2000 (Fick and Hijmans, 2017). Although WorldClim defines 19 bioclimatic variables, only Annual Mean Temperature (BIO 1), Min Temperature of Coldest Month (BIO 6) and Annual Precipitation (BIO 12) were used in this analysis, as they provide robust differences between sites. The aridity index (AI), which relates precipitation (annual or monthly) to mean temperature (also annual or monthly), was determined from BIO 1 and BIO 12 and allows study sites to be classified as desert, meadow or forest (Wang and Takahashi, 1999). High aridity index values indicate a site with more water availability. Because an increase in frosts could affect crops in mid-latitude areas, the BIO 6 variable was also considered (Parajuli et al., 2019).
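As an illustration of how such an index is obtained from the two bioclimatic variables, the caption of Figure 5 names the De Martonne aridity index, which in its annual form is commonly written as AI = P / (T + 10), with P the annual precipitation in mm (BIO 12) and T the annual mean temperature in °C (BIO 1). The Python sketch below assumes that annual form; the function name and the input values are hypothetical placeholders chosen for the example, not data from this study.

```python
def de_martonne_aridity_index(annual_precip_mm: float, annual_mean_temp_c: float) -> float:
    """Annual De Martonne aridity index, AI = P / (T + 10).

    annual_precip_mm   -- annual precipitation in mm (BIO 12)
    annual_mean_temp_c -- annual mean temperature in degrees Celsius (BIO 1)
    """
    denominator = annual_mean_temp_c + 10.0
    if denominator <= 0:
        # The index is not defined when the annual mean temperature is -10 °C or lower.
        raise ValueError("annual mean temperature must be above -10 °C")
    return annual_precip_mm / denominator


# Hypothetical placeholder sites (illustrative values only, not study data).
dry_site_ai = de_martonne_aridity_index(annual_precip_mm=100.0, annual_mean_temp_c=15.0)
wet_site_ai = de_martonne_aridity_index(annual_precip_mm=1500.0, annual_mean_temp_c=10.0)

print(f"dry site AI: {dry_site_ai:.1f}")
print(f"wet site AI: {wet_site_ai:.1f}")
```

Under this formula, a site with more rainfall at a similar temperature receives a higher index value, which matches the reading that high values indicate more water availability.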

Across the distribution of V. pubescens, both WorldClim versions show a similar precipitation pattern for each location over the years, with average values ranging from 90 mm in Limarí in the north to 1,515 mm at the southern end of the distribution (Lleu-Lleu) (Supplementary Figure 1). Interestingly, Squeo et al. (1999) indicated that the annual rainfall for the area of La Serena (north of Limarí) fell from 180 mm at the beginning of the 1900s to about 80 mm by the year 2000, a precipitation drop of more than 50% during the last century. On the other hand, a slight decrease in the average annual temperature (BIO1) and a drop in the minimum temperature of the coldest month of at least 2.4°C can be observed (Figure 5, Supplementary Figure 1); this drop is mainly observed in southern locations. Finally, no differences in aridity index are observed between the data provided by WorldClim 1 and WorldClim 2 for mountain papaya locations (Figure 5, insert).

The climatic behavior of the growing locations of domesticated F. chiloensis f. chiloensis is shown in Supplementary Figure 2. Average annual temperature (BIO1) for these locations is the same in both WorldClim versions; however, the minimum temperature of the coldest month (BIO6) is about 1.7°C lower in WorldClim 2 than in WorldClim 1 (Figure 6). In addition, a dramatic reduction in rainfall (around 100 mm) is observed in the southern locations (Purén) in WorldClim 2 compared to WorldClim 1 (Supplementary Figure 2). In general terms, there is a reduction in aridity index, especially in the southern locations of domesticated strawberry (Figure 6, insert), which is accompanied by a reduction in minimum temperatures during winter. As suggested by Bambach et al. (2013), these climate changes could affect the current distribution of the species.

FIGURE 5 | Climatic characterization of cultivated localities of V. pubescens distributed from North to South. The comparison of Bioclimatic variable 6 (Min temperature of coldest month) between WorldClim 1 and 2 for each locality is shown in the map. In the insert, De Martonne Aridity Index (AI) for each locality. On the X axis, the localities of Limarí and Lipimávida are marked as reference, as previously shown in Figure 1.

Finally, the analysis of climate change for the locations of F. chiloensis f. patagonica is shown in Supplementary Figure 3. It is more complex to interpret, as there is a latitudinal effect (North–South distribution of the localities) in addition to a longitudinal component (i.e., East–West; mountain–seaside). Therefore, we focus the comparison on the localities of Termas de Chillán and Cucao (Figures 1 and 7). Populations distributed between 36° and 40° South are located in the Andes Mountains (near Termas de Chillán), whereas those distributed further south (42–46° South) are located at sea level (close to Cucao) (Supplementary Figure 3). A marked reduction of 4.2°C in the minimum temperature of the coldest month is observed for Termas de Chillán in WorldClim 2 compared to WorldClim 1 (Figure 7); a smaller reduction (2.4°C) is observed for Cucao (Supplementary Figure 3). On the other hand, there are no significant differences in the average annual temperature for either location, nor in the annual precipitation in Termas de Chillán (Supplementary Figure 3); nevertheless, an increase of about 230 mm is observed in Cucao in WorldClim 2 compared to WorldClim 1. In terms of aridity index, there is a marked increase of around sixteen points for Cucao in WorldClim 2 compared to WorldClim 1; a smaller increase (4 points) is also observed in Termas de Chillán (Figure 7, insert). The particular climate properties of the Andes Mountains (high precipitation, lower annual average temperature) could constitute a good refuge for this wild species, restricting current distributions to these areas (Bambach et al., 2013; Alarcón and Cavieres, 2015). In the case of cultivated species, the Andes Mountains can also constitute a refuge, while their displacement towards southern regions could be used for agricultural development (Parajuli et al., 2019).

### CROP POTENTIAL

Vasconcellea species have interesting potential as new crops, considering their potential for breeding for fruit and other sub-products and their domestication in specific geographic areas (Scheldeman et al., 2011). The two most important species are the babaco (Vasconcellea × heilbornii) in Ecuador and Colombia, and mountain papaya (Vasconcellea pubescens), found in all Andean countries and particularly important at small commercial scale in Colombia (Scheldeman et al., 2011) and Chile (Moya-León et al., 2004). Notably, the cultivated area of mountain papaya in the Maule Region almost disappeared after the earthquake and tsunami of 2010, mainly because the natural growing areas of the species were salinized and damaged by seawater. In the Maule Region, toxic levels of soil salinity inhibit the growth of V. pubescens plants, because the salt content reduces water availability to plants. In addition, soil pH decreased at depth (1.4 units) and at the surface (0.7 units) as a consequence of the ionic strength generated and the coarse-textured soil (Casanova et al., 2016). Remarkably, an increase in the cultivated area of this species has recently been observed in the northern extreme of Chile (Arica and Parinacota Region, ~18°30' latitude South), where 13.6 ha were recorded, representing 9% of the total cultivated area for the species in Chile (ODEPA-CIREN, 2019).

F. chiloensis subsp. chiloensis also has potential as a new crop. Firstly, its production of phytochemicals with health benefits highlights the opportunity to use the species as a functional food. Secondly, the species has great resistance to several diseases (Rojas et al., 2013). Although aphid-borne viruses (ABVs) have been detected in F. chiloensis subsp. chiloensis plants, no visual symptoms of disease were observed, whereas these viruses severely affect F. vesca plants. Similarly, F. chiloensis f. chiloensis shows tolerance to Botrytis cinerea, a fungal pathogen that severely affects several fruit species (González et al., 2013); the Chilean strawberry was also reported to be more tolerant to the fungus than F. × ananassa.

Ploidy and hybridization are the basis of currently cultivated crops. Importantly, wild relatives have had a positive impact on crop availability, allowing the expansion of cultivation areas or increasing tolerance to pests and diseases. In a climate change scenario, breeding programs are needed to obtain smart cultivars that can face the new challenges and ensure sustainable production in the future (Borrell et al., 2020). Fruit homogeneity and other phenotypic characters have been the main focus of plant breeders, but new characteristics must be incorporated into modern crops, such as tolerance to cold and high temperatures, tolerance to drought or salinity, and disease resistance (Doebley et al., 2006). By taking into account the genetic variability of native species for those traits and using strategies such as introgression or hybridization, the genetic improvement of these valuable crops may be possible.

Genomic selection has complemented marker-assisted selection, saving time during crop improvement (Heffner et al., 2010). Moreover, the incorporation of new tools for molecular breeding, such as genome sequencing, SNPs, or genome editing, facilitates the identification of new inbred lines; this strategy has been defined as design breeding (Heffner et al., 2011; Fernie and Yan, 2019).

Domestication of non-domesticated plants is a great opportunity to obtain cultivars well adapted to different edaphoclimatic niches. The long period of time required for selection in breeding programs can be shortened by the use of biotechnological approaches and new genome editing techniques (Osterberg et al., 2017; Scheben et al., 2017; Wallace et al., 2018). Clearly, under the climate change scenario, new types of cultivars are needed. Better adaptability to a variety of climatic conditions is currently a priority as a way to reduce the negative impact of lower food production in vulnerable environments.

Polyploidy can be a challenge for Chilean strawberry, but combining the information from genetic maps, the functional characterization of specific genes, and genome sequencing should be the starting point for applying this modern strategy (Rousseau-Gueutin et al., 2008; Davik et al., 2015; Vining et al., 2017; Edger et al., 2019; Moya-León et al., 2019). In the case of Vasconcellea pubescens, no genomic or functional genomic data are available so far, which can limit the breeding effort. However, wild plants are already adapted to a wide range of climatic conditions, considering their native habitat at high altitude in the Andes mountains. Chilean papaya is attractive as a functional food and as a source of natural by-products such as papain proteases, which can be the target of molecular breeding. The implementation of what is called breeding 4.0 or "de novo domestication" should provide new, well-adapted smart cultivars.

### AUTHOR CONTRIBUTIONS

The present work was conceived and designed by RH, CG-E, LL, PP, and MM-L. LL and CG-E contributed collecting information and preparing figures. All authors contributed to the article and approved the submitted version.

### ACKNOWLEDGMENTS

This study was funded by FONDECYT grant 1171530. The funders had no part in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.01002/full#supplementary-material



Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Letelier, Gaete-Eastman, Peñailillo, Moya-León and Herrera. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Utilizing Wild Cajanus platycarpus, a Tertiary Genepool Species for Enriching Variability in the Primary Genepool for Pigeonpea Improvement

### Shivali Sharma<sup>1</sup>\*, Pronob J. Paul<sup>1</sup>, CV Sameer Kumar<sup>2</sup> and Chetna Nimje<sup>3</sup>

### Edited by:

Eric Von Wettberg, University of Vermont, United States

### Reviewed by:

Paul Kiprotich Kimurto, Egerton University, Kenya; Sivakumar Sukumaran, International Maize and Wheat Improvement Center, Mexico; Petr Smýkal, Palacký University, Olomouc, Czechia

> \*Correspondence: Shivali Sharma shivali.sharma@cgiar.org

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 09 January 2020 Accepted: 26 June 2020 Published: 23 July 2020

### Citation:

Sharma S, Paul PJ, Sameer Kumar CV and Nimje C (2020) Utilizing Wild Cajanus platycarpus, a Tertiary Genepool Species for Enriching Variability in the Primary Genepool for Pigeonpea Improvement. Front. Plant Sci. 11:1055. doi: 10.3389/fpls.2020.01055

<sup>1</sup> Theme Pre-breeding, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India, <sup>2</sup> Regional Agricultural Research Station, Professor Jayashanker Telangana State Agricultural University, Palem, India, <sup>3</sup> Grain Quality Lab., International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India

The use of crop wild relatives in breeding programs has been well recognized as a way to diversify the genetic base along with introgression of useful traits. Cajanus platycarpus (Benth.) Maesen, an annual wild relative belonging to the tertiary genepool of pigeonpea, possesses many useful traits such as early maturity, high protein content, photoperiod insensitivity, and pod borer tolerance for the genetic improvement of cultivated pigeonpea. Using this cross-incompatible wild Cajanus species, an advanced backcross population was developed following the embryo rescue technique. In the present study, a pre-breeding population consisting of 136 introgression lines (ILs) along with five popular varieties (used as checks) was evaluated for important agronomic traits during the 2016 and 2017 rainy seasons and for grain nutrient content during the 2016, 2017, and 2018 rainy seasons. Large genetic variation was observed for agronomic traits such as days to 50% flowering, number of pods per plant, pod weight per plant, grain yield per plant, and grain nutrients [protein content, grain iron (Fe), zinc (Zn), calcium (Ca), and magnesium (Mg)] in the pre-breeding population. Significant genotype × environment interaction was also observed for agronomic traits as well as grain nutrients, indicating the sensitivity of these traits to the environments. No significant correlations were observed between grain yield and grain nutrients except grain Zn content, which was negatively correlated with grain yield. Overall, 28 promising high-yielding ILs with high grain nutrient content were identified. These ILs, in particular ICPP # 171012, 171004, 171102, 171087, 171006, and 171050, flowered significantly earlier than the popular mega variety ICPL 87119 (Asha) and thus hold potential for developing new short-duration cultivars. The comprehensive multi-site assessment of these high-yielding, nutrient-rich accessions would be useful in identifying region-specific promising lines for direct release as cultivars. Moreover, these ILs are expected to replace the popular existing cultivars or to serve as new and diverse sources of variation in hybridization programs for pigeonpea improvement.

Keywords: Cajanus platycarpus, pre-breeding, introgression lines, wild Cajanus species, pigeonpea, grain nutrients, short-duration, photo-insensitivity

# INTRODUCTION

Pigeonpea [Cajanus cajan (L.) Millspaugh] is an important often-cross-pollinated grain legume crop of the semi-arid tropics grown under subsistence agriculture. Globally, it is cultivated on a 7.02 m ha area with an annual production of 6.81 m t (FAOSTAT, 2017), mainly in Asia, Africa, and Latin America. India is the largest producer and consumer of pigeonpea in the world, where dry, dehulled split seeds are consumed as "daal", a protein-rich (22–24%) food. Besides protein, pigeonpea seeds are also rich in carbohydrates, minerals, crude fibre, iron (Fe), sulphur, calcium (Ca), potassium (K), manganese (Mn), and water-soluble vitamins, especially thiamine, riboflavin, and niacin (Saxena et al., 2010). In India, pigeonpea is the second most important legume after chickpea, accounting for 5.39 m ha area and 4.87 m t of production (FAOSTAT, 2017). Besides mature seeds, immature green tender pods and seeds are consumed as vegetables, mainly in Kenya, Tanzania, and Malawi. The crop is also grown for other uses such as fodder, medicine, rearing lac-producing insects, fuelwood, and improving soil fertility through biological nitrogen fixation.

The narrow genetic base of cultivated pigeonpea and the repeated use of a few elite breeding lines such as T-1 and T-90 (Kumar et al., 2004) in breeding programs are the major factors hindering its genetic improvement. Further, various biotic and abiotic stresses cause huge yield losses in pigeonpea worldwide, and high levels of resistance/tolerance are not available in the cultivated genepool. As a result, despite large breeding efforts in India and elsewhere, pigeonpea productivity is stagnant at around 0.8–0.9 t ha-1. The major biotic stresses affecting pigeonpea are pod borers (Helicoverpa armigera Hubner, Maruca vitrata Geyer) and pod fly (Melanagromyza chalcosoma Spencer) among insect pests, and fusarium wilt (Fusarium udum Butler), sterility mosaic disease (SMD), and phytophthora blight (Phytophthora drechsleri Tucker) among diseases. The pigeonpea crop is also sensitive to abiotic stresses such as terminal drought, water-logging, salinity, and frost/cold. The Protein Advisory Group of the United Nations has emphasized improvement of the nutritional quality of proteins in addition to improving the productivity, adaptability, and yield stability of grain legumes.

One of the key factors for a successful crop improvement program is the availability of sufficient genetic variability. Over 13,200 accessions of cultivated pigeonpea and 555 accessions of wild species belonging to the genus Cajanus from 60 countries are conserved in the ICRISAT genebank. Based on their crossability with cultivated pigeonpea, these germplasm accessions are grouped into three genepools. The primary genepool (GP 1) comprises the cultivated germplasm. The secondary genepool (GP 2) comprises all cross-compatible species: C. acutifolius (F.Muell.) Maesen, C. albicans (Wight & Arn.) Maesen, C. cajanifolius (Haines) Maesen, C. cinereus (F.Muell.), C. confertiflorus F. Muell., C. lanceolatus (W.Fitzg.) Maesen, C. latisepalus Maesen, C. lineatus (Wight & Arn.) Maesen, C. reticulatus (Dryand.) F.Muell., C. scarabaeoides (L.) Thouars, C. sericeus (Baker) Maesen, and C. trinervius (DC.) Maesen. The tertiary genepool (GP 3) comprises the cross-incompatible species C. crassus (King) Maesen, C. goensis Dalzell, C. mollis (Benth.) Maesen, C. platycarpus (Benth.) Maesen, C. rugosus (Wight & Arn.) Maesen, C. heynei, C. kerstingii, and C. volubilis, together with other Cajaninae such as Rhynchosia Lour., Dunbaria W. and A., and Eriosema (DC.) Reichenb.

Wild Cajanus species are reservoirs of many useful genes/alleles and can be used to enrich variability in the primary genepool for developing new broad-based cultivars with increased plasticity (Sharma et al., 1993; Rao et al., 2003; Sharma and Upadhyaya, 2016; Sharma et al., 2019). Introgression of useful genes/alleles from the wild Cajanus species would help to break the yield plateau in pigeonpea. In chickpea, interspecific derivatives having high yield and resistance to wilt, foot rot, and root rot diseases (Singh et al., 2005) as well as to cyst nematode (Malhotra et al., 2002) were developed from crosses involving Cicer reticulatum. Similarly, high-yielding, cold-tolerant lines with high biomass (ICARDA, 1995) and resistance to phytophthora root rot were developed from interspecific crosses involving Cicer echinospermum (Knights et al., 2008). Frequent utilization of wild Cajanus species in breeding programs is hindered by cross-incompatibility barriers and linkage drag. Among wild species, C. platycarpus is of particular interest to pigeonpea breeders because of several useful traits such as extra-early flowering and maturity (Saxena et al., 1996), photoperiod insensitivity, prolific flowering and podding, high harvest index, annuality and rapid seedling growth, and resistance/tolerance to biotic and abiotic stresses such as pod borer (Sujana et al., 2008), Fusarium wilt (Saxena et al., 1990), phytophthora blight (Ariyanayagam and Spence, 1978; Pundir and Singh, 1987; Dundas, 1990), nematodes (Sharma, 1995), sterility mosaic (Lava Kumar et al., 2005), and salinity (Subbarao, 1988). Using this cross-incompatible wild Cajanus species, C. platycarpus, a backcross population was developed following the embryo rescue technique (Mallikarjuna et al., 2011).

Linkage drag is the most common problem associated with the utilization of wild species in breeding programs. Hence, utilization of wild species in creating new genetic variability will be successful only when introgression lines (ILs) with useful traits and acceptable agronomic performance are developed and made available to breeders for direct use in breeding programs. Therefore, the present investigation was carried out a) to study the genetic variability in the advanced backcross population derived from C. platycarpus for important agronomic traits and grain nutrients, and b) to identify stable, promising trait-specific ILs with minimum linkage drag for ready use in breeding programs to develop new cultivars with a broad genetic base.

# MATERIALS AND METHODS

## Plant Material and Field Evaluation

Using a cross-incompatible wild Cajanus species, C. platycarpus accession ICPW 68, and a popular pigeonpea cultivar, ICPL 85010, a backcross population was developed following the embryo rescue technique (Mallikarjuna et al., 2006). ICPW 68, which originated from Uttar Pradesh, India, is an extra-early flowering accession with high seed protein content (Saxena et al., 1996) and pod borer resistance (Sujana et al., 2008). ICPL 85010, also known as "Sarita", is a short-duration, determinate pigeonpea variety with medium seed size (9.5 g 100-seed weight) that is cultivated in the Indian subcontinent (Dahiya et al., 2001). Embryo rescue and tissue culture techniques were followed as described by Mallikarjuna and Moss (1995). The details of the population development have been documented by Mallikarjuna et al. (2006). The advanced backcross population consisting of 136 ILs in the BC4F10 generation was used in this study. The 136 ILs along with five popular varieties (used as checks) of different maturity durations [ICPL 87119 (also known as "Asha"), ICP 8863 (Maruti), ICPL 20325, ICPL 85010, and ICPL 88039] were evaluated for different agronomic traits during the 2016 and 2017 rainy seasons and for grain nutrients during the 2016, 2017, and 2018 rainy seasons at ICRISAT, Patancheru, Telangana, India (17°51′N, 78°27′E; 545 m). Among the checks, ICPL 87119 is a popular mega variety in the medium maturity duration group that has been widely cultivated in India over the past two decades; ICP 8863 is a medium-duration, high-yielding pigeonpea variety resistant to fusarium wilt that is popular in Karnataka, India; ICPL 20325 and ICPL 88039 are super-early and early maturing pigeonpea varieties, respectively. The accessions were planted in a black soil (Vertisol) precision field in the first week of July in all years in an augmented design. The lines were divided into three blocks, and each check was placed after every 10 entries in each block. Each accession was sown in a single 4 m long row in a ridge and furrow system with a plant-to-plant spacing of 20 cm and a row-to-row spacing of 75 cm. A standard package of practices was followed to raise a healthy crop. Manual weeding and insecticide sprays were used to control weeds and insect-pest damage. The weather data for the 2016 and 2017 crop seasons at ICRISAT, Patancheru, India are given in Supplementary Figure 1. Data were recorded on eight agronomic traits [days to first flowering, days to 50% flowering, plant height (cm), number of primary branches, number of secondary branches, number of pods per plant, pod weight per plant (g), and grain yield per plant (g)] and five grain nutrients [protein content (%), iron (Fe in mg kg-1), zinc (Zn in mg kg-1), calcium (Ca in g kg-1), and magnesium (Mg in g kg-1) content]. Data on grain nutrients and on two agronomic traits, namely days to first flowering and days to 50% flowering, were recorded on a plot basis, whereas data on the remaining agronomic traits (plant height, primary branches per plant, secondary branches per plant, pods per plant, pod weight per plant, and grain yield per plant) were recorded on five randomly selected representative plants per plot following the pigeonpea descriptors (IBPGR and ICRISAT, 1993). All the lines were harvested and threshed manually.

## Estimation of Grain Nutrient Content

For estimating the Fe, Zn, Mg, Ca, and protein contents of grains, seeds of the 136 ILs along with the five checks were cleaned thoroughly, with special care taken during cleaning to prevent contamination of the seeds with dust and metal particles. Seeds were washed with distilled water for a few seconds to remove dust and metal particles and then dried in hot air at 40°C for 2 h. Well-cleaned random seed samples were used for estimating grain protein, Fe, Zn, Ca, and Mg contents at the Charles Renard Analytical Laboratory, ICRISAT, Patancheru, India. The contents of the four dietary minerals (Fe, Zn, Ca, and Mg) were assessed by nitric acid and hydrogen peroxide digestion followed by inductively coupled plasma optical emission spectrometry (ICP-OES) (Wheal et al., 2011). For grain protein, the sulfuric acid-selenium digestion method was used, followed by the estimation of total nitrogen (N) in a SKALAR SAN++ SYSTEM autoanalyzer; protein percentage was calculated as N percent × 6.25 (Sahrawat et al., 2002).
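The nitrogen-to-protein conversion described above is a single multiplication, which the following minimal Python sketch illustrates. The function name and the example nitrogen value are hypothetical and are not taken from the study; only the 6.25 factor comes from the text.

```python
def protein_percent(total_n_percent: float, factor: float = 6.25) -> float:
    """Convert total grain nitrogen (%) to crude protein (%).

    The 6.25 factor is the one stated in the text; the function itself
    is an illustrative helper, not part of the laboratory protocol.
    """
    if total_n_percent < 0:
        raise ValueError("nitrogen percentage cannot be negative")
    return total_n_percent * factor


# Hypothetical sample: 3.36% total N corresponds to roughly 21% protein.
example_n = 3.36
example_protein = protein_percent(example_n)
print(f"N = {example_n:.2f}%  ->  protein = {example_protein:.1f}%")
```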

## Statistical Analysis

Eight agronomic traits and five grain nutrients were analyzed separately for each rainy season and pooled over the two seasons for agronomic traits and over the three seasons for grain nutrients, using residual maximum likelihood (REML) in GenStat 15 (https://www.vsni.co.uk/) in a mixed-model approach with genotypes as a random effect and environments as a fixed effect. The significance of environments was tested using the Wald statistic. Variance components due to genotype (σ<sup>2</sup><sub>g</sub>) and genotype × environment interaction (σ<sup>2</sup><sub>ge</sub>), and their standard errors (SE), were estimated. Best linear unbiased predictors (BLUPs) were obtained for agronomic traits and grain nutrients for each accession in each environment as well as pooled over the environments. Based on the BLUPs, the range, mean, variances, and broad-sense heritability (H<sup>2</sup>) were estimated. Phenotypic correlations were estimated in GenStat 15 to determine trait associations. Path analysis was performed to estimate the direct effects of the traits on grain yield using R version 3.5.3 (R Project for Statistical Computing, http://www.r-project.org/). To avoid multicollinearity, two independent traits, days to 50% flowering and pod weight per plant, were excluded from the path analysis. Using the R package cluster (Patterson and Thompson, 1971), a Euclidean dissimilarity matrix was constructed from the agronomic traits and grain nutrients, and the accessions were clustered following Ward's method. Further, accessions with high grain yield and high grain Fe, Zn, Ca, Mg, and protein content were identified. Using the Euclidean distance matrix, the most diverse accession pairs were identified for potential use as parents in pigeonpea crossing programs.
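The clustering step can be sketched in Python as follows. This is only an illustration: the study used the R package cluster, whereas the sketch swaps in SciPy, uses randomly generated placeholder values instead of the BLUPs, and assumes that traits are standardized before distances are computed (the text does not state whether this was done). The choice of 10 groups mirrors the number of clusters reported in the Results.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Placeholder data: 141 entries (136 ILs + 5 checks) x 13 traits
# (8 agronomic + 5 grain nutrients). Real BLUPs would replace this.
rng = np.random.default_rng(0)
traits = rng.normal(size=(141, 13))

# Assumption: put every trait on a comparable scale before computing distances.
z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)

# Condensed Euclidean dissimilarity matrix between all pairs of entries.
dist = pdist(z, metric="euclidean")

# Agglomerative clustering with Ward's minimum-variance criterion.
tree = linkage(dist, method="ward")

# Cut the tree into at most 10 groups.
labels = fcluster(tree, t=10, criterion="maxclust")
print(np.bincount(labels)[1:])  # number of entries per cluster
```

With real data, the rows of `traits` would be the pooled BLUPs per line, and `labels` would give the cluster membership of each line.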

# RESULTS

### Variance Components and Trait Variability

The REML analysis showed significant variation among ILs (σ<sup>2</sup><sub>g</sub>) for all eight agronomic traits in both the 2016 and 2017 rainy seasons and for all five grain nutrients in the 2016, 2017, and 2018 rainy seasons, indicating the presence of significant variability among the ILs for these traits. The pooled analysis also showed significant genetic variance (σ<sup>2</sup><sub>g</sub>) and significant G×E interactions for all agronomic traits and grain nutrients (Table 1).

Large variation was observed among the ILs for all the agronomic traits and grain nutrients in each season as well as in the pooled analysis (Table 2). Some of the ILs clearly performed better than the popular variety ICPL 87119 and the recurrent parent ICPL 85010 for these traits. A flowering window of around 60 days (range 76–141 days) was observed for days to 50% flowering, showing substantial variability in the population, whereas the popular variety ICPL 87119 took ~123 days to reach 50% flowering. None of the ILs flowered earlier than the recurrent parent ICPL 85010 (67 days to 50% flowering). Large variation was also noted among the ILs for plant height (above 100 cm: ~128–272 cm) when compared with the popular variety ICPL 87119 (on average 222 cm tall) (Table 2). Similarly, the number of pods per plant was much higher in the ILs (up to 386 pods per plant in 2016 and 303 in 2017) than in ICPL 87119 (average 171 pods per plant) and the cultivated parent ICPL 85010 (average 126 pods per plant) in both seasons. A similar pattern was observed in 2016, 2017, and the pooled analysis for other traits such as grain yield per plant (up to 83 g in ILs compared to 48 g in ICPL 87119 and 25 g in ICPL 85010), protein content (up to 23% in ILs compared to 19% in ICPL 87119 and 21% in ICPL 85010), Fe (up to 43 mg kg-1 in ILs compared to 32 mg kg-1 in ICPL 87119 and 33 mg kg-1 in ICPL 85010), Zn (up to 42 mg kg-1 in ILs compared to 29 mg kg-1 in ICPL 87119 and 41 mg kg-1 in ICPL 85010), Ca (up to 3.1 g kg-1 in ILs compared to ~1.0 g kg-1 in both ICPL 87119 and ICPL 85010), and Mg (up to 1.6 g kg-1 in ILs compared to ~1.3 g kg-1 in ICPL 87119 and 1.1 g kg-1 in ICPL 85010) (Table 2).

### Associations Between Agronomic Traits and Grain Nutrients

The correlation analysis in 2016 (Table S1), 2017 (Table S1), and pooled analysis over years (Figure 1) showed that grain yield per plant was significantly and positively associated with days to first flowering (r = 0.38 in 2016, 0.42 in 2017 and 0.49 in pooled), days to 50% flowering (r = 0.40 in 2016, 0.39 in 2017 and 0.48 in pooled), plant height (r = 0.24 in 2016, 0.42 in 2017 and 0.37 in pooled), number of secondary branches (r = 0.50 in 2016, 0.24 in 2017 and 0.42 in pooled), pods per plant (r = 0.92 in 2016, 0.84 in 2017 and 0.90 in pooled), and pod weight per plant (r = 0.99 in 2016, 0.95 in 2017 and 0.98 in pooled) (Table S1, Figure 1). Number of primary branches had significantly positive correlation with grain yield per plant in 2017 (r = 0.45) and pooled over years (r = 0.25) but no correlation was observed in 2016 rainy season.

Similarly, among the grain nutrients, protein content in seeds showed significantly positive association with three nutrients, Fe (r = 0.23 in 2016, 0.32 in 2017, 0.34 in pooled), Zn (r = 0.44 in 2016, 0.32 in 2017, 0.40 in pooled), and Mg (r = 0.19 in 2016, 0.21 in 2017, 0.19 in pooled). Significantly positive correlation was observed between Fe and Zn content (r = 0.37 in 2016, 0.27 in 2017 and 0.37 in pooled) and between Ca and Mg content (r = 0.64 in 2016, 0.60 in 2017 and 0.67 in pooled) in year-wise and pooled analysis.

TABLE 1 | Variance components due to genotypes (σ<sup>2</sup><sub>g</sub>) and genotype × environment (σ<sup>2</sup><sub>ge</sub>) interactions and their standard errors (SE) for agronomic traits and grain nutrients of C. platycarpus derived introgression lines evaluated during 2016, 2017, and 2018 rainy seasons at ICRISAT, Patancheru.

\*Significant at P ≤ 0.05; \*\*Significant at P ≤ 0.01.

In the present study, no significant correlation was found between grain yield per plant and grain nutrients (grain protein, Ca, Fe) in 2016 (Table S1), 2017 (Table S1), or pooled over years (Figure 1). Grain yield per plant showed a significantly negative correlation with Zn content (r = -0.28 in 2016, -0.42 in 2017, and -0.43 in pooled). Further, the path analysis in 2016 (Table S2), 2017 (Table S2), and pooled over the years revealed that pods per plant is the major direct contributor to grain yield (0.872 in 2016, 0.765 in 2017, and 0.813 in pooled). The other two traits that showed a major contribution towards yield were number of secondary branches per plant and plant height (Table S2).
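For readers unfamiliar with path analysis, the sketch below shows one common way of obtaining direct effects: solving the system formed by the correlations among the predictor traits and their correlations with yield. It is a hedged illustration in Python with NumPy on synthetic data, not the R procedure used in the study; the function name, the three placeholder predictors, and the simulated coefficients are all assumptions.

```python
import numpy as np


def path_coefficients(x: np.ndarray, y: np.ndarray):
    """Return direct effects, indirect effects, and the residual effect.

    x: (n_lines, n_traits) predictor traits; y: (n_lines,) grain yield.
    Direct effects p solve R_xx @ p = r_xy, where R_xx holds correlations
    among predictors and r_xy their correlations with yield.
    """
    corr = np.corrcoef(np.column_stack([x, y]), rowvar=False)
    r_xx = corr[:-1, :-1]
    r_xy = corr[:-1, -1]
    direct = np.linalg.solve(r_xx, r_xy)
    # indirect[i, j] = effect of trait i on yield acting through trait j
    indirect = r_xx * direct[np.newaxis, :]
    np.fill_diagonal(indirect, 0.0)
    residual = float(np.sqrt(max(0.0, 1.0 - direct @ r_xy)))
    return direct, indirect, residual


# Synthetic example: three placeholder predictors for 136 lines, with the
# first one simulated as the dominant driver of yield.
rng = np.random.default_rng(1)
x = rng.normal(size=(136, 3))
y = 0.8 * x[:, 0] + 0.3 * x[:, 1] + rng.normal(scale=0.3, size=136)

direct, indirect, residual = path_coefficients(x, y)
print("direct effects:", np.round(direct, 3))
print("residual effect:", round(residual, 3))
```

By construction, each trait's correlation with yield equals its direct effect plus the sum of its indirect effects through the other traits, which is a useful check when reading tables of path coefficients.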

### Identification of Promising Introgression Lines (ILs)

To understand the potential of these ILs derived from wild species in improving cultivated pigeonpea, the average performance of these lines over two years was compared with the popular variety, ICPL 87119 and the recurrent parent, ICPL 85010. None of the ILs flowered earlier than ICPL 85010 (days to 50% flowering <70 days). However, 104 ILs flowered significantly earlier (days to 50% flowering: 81–119 days) than ICPL 87119 (123 days). The majority of the ILs (> 100 lines out of 136 lines) were found significantly better than recurrent parent ICPL 85010 for most of the agronomic traits such as pods per plant, pod weight per plant and grain yield per plant (Table 3). In comparison with ICPL 87119 (~48 g), 28 ILs (~20% of the backcross population) had significantly higher grain yield per plant (~52–83 g) (Table 3). Besides, many ILs were found significantly better than ICPL 87119 in terms of plant height (50 ILs with 139–218 cm height), number of pods per plant (53 ILs with 189–321 pods) and pod weight per plant (55 ILs with 67–123 g). A few ILs also had a higher number of primary branches compared to ICPL 87119 (Table 3).

For grain nutrients, all 136 ILs had higher protein content (~19–23%) than the popular variety ICPL 87119, whereas 104 ILs for grain Fe content (~34–42 mg kg-1), 102 ILs for grain Zn content (~32–42 mg kg-1), 74 ILs for grain Ca content (1.35–3.09 g kg-1), and 45 ILs for grain Mg content (1.35–1.59 g kg-1) were significantly better than ICPL 87119 (Table 3). More than 50 ILs were more promising than ICPL 85010 for most of the grain nutrient contents except Zn content (Table 3). The top five trait-specific ILs for each agronomic trait and grain nutrient are given in Table 3.

A total of 28 promising high-yielding ILs, which performed better than the popular variety ICPL 87119, were identified (Table 4). Most of these high-yielding ILs also had higher contents of the five grain nutrients. Remarkably, one line, ICPP 171012, was early (50% flowering: 109 days) and had a high number of pods per plant (301), pod weight per plant (108 g), and grain yield per plant (73 g), along with better grain nutrient contents such as high protein (21%), grain Fe (35 mg kg-1), grain Zn (36.16 mg kg-1), grain Ca (1.39 g kg-1), and grain Mg (1.30 g kg-1) compared to the best check, ICPL 87119. Similarly, ILs ICPP # 171004, 171102, 171087, 171006, and 171050 were the most promising in terms of early flowering (109–116 days), high yield (~62–65 g), and other agronomic traits, along with higher grain nutrient contents than ICPL 87119 (Table 4).

### Cluster Analysis

Cluster analysis is performed to categorize lines into distinct groups/ clusters wherein genotypes in different clusters are more diverse than within a cluster (Ward, 1963) and is useful in selecting the most diverse genotypes to be used as parents in crossing programs.

FIGURE 1 | Correlation analysis of various agronomic and grain nutrient traits in pigeonpea pre-breeding population derived from C. platycarpus at ICRISAT, Patancheru. Days to flowering (DF & DF50), plant height (PH), primary branches (NPB), secondary branches (NSB), pods per plant (PPP), and pod weight per plant (PWPP) were positively associated with grain yield per plant (GYPP). Zn concentration was correlated negatively with grain yield per plant. On the other hand, no significant relationship was found for grain yield per plant with grain protein content, grain Fe, Ca and Mg content. It was observed that grain Fe and Zn as well as grain Ca and Mg were positively correlated. \*\*Significant at P ≤ 0.01; \*Significant at P ≤ 0.05.

In this study too, a hierarchical cluster analysis based on all eight agronomic traits and five grain nutrients over two seasons was performed to group the introgression lines into different clusters. The cluster analysis following Ward's method resulted in 10 clusters (Table S3; Figure 2). Cluster 6 was the largest cluster, consisting of 27 ILs, followed by cluster 3 (21 ILs) and cluster 7 (20 ILs). Cluster 2 had only two genotypes, ICPL 85010 and ICPL 20325; both are early maturing cultivars (50% flowering: 68 days) (Figure 2). All early maturing ILs were grouped into cluster 4 (50% flowering: 89 days). Cluster 3 had the highest cluster mean for grain yield per plant (55.12 g) followed by cluster 6 (53.33 g), whereas cluster 6 had the highest number of pods per plant (225) followed by cluster 3 (213). Cluster 10 exhibited the lowest means for grain yield per plant (~21 g) and pod weight per plant (~32 g) (Table S3). The popular mega variety ICPL 87119 was grouped into cluster 3 along with the highest yielding introgression line, ICPP 171119. Cluster 10 was the best cluster in terms of Fe content (40 mg kg-1) but had the lowest grain yield per plant. Cluster 4 had the highest mean for Zn content (38 mg kg-1; cluster 2 was not considered because it does not contain any ILs), and cluster 8 had the maximum mean values for grain Ca (2.15 g kg-1) and Mg (1.48 g kg-1) content (Table S3).

The Euclidean distance matrix was estimated to identify the most diverse pair of ILs in the advanced backcross population as well as the ILs most similar to and most diverse from the popular variety ICPL 87119 and the recurrent parent ICPL 85010 (Table S4). ICPP 171027 was the most diverse (9.09) when compared with ICPL 87119, while ICPP 171031 (2.60) was the IL most similar to ICPL 87119. Likewise, ICPP 171119 was the most diverse (12.42) and ICPP 171033 the most similar (5.68) accession to the recurrent parent ICPL 85010. Among the 136 ILs, the most diverse pair of accessions was ICPP 171119 and ICPP 171078, with a distance of 10.76. The top 10 most diverse pairs of accessions are given in Table S4; ICPP 171119 was the most diverse IL among these top 10 pairs.
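Ranking pairs of lines by distance, as described above, can be sketched with a few lines of Python. The helper name, the line labels, and the trait values below are hypothetical placeholders; the sketch only shows the ranking logic on already standardized trait values and does not reproduce the distances reported in Table S4.

```python
from itertools import combinations

import numpy as np


def most_diverse_pairs(names, z, top=10):
    """Rank all pairs of lines by Euclidean distance, largest first.

    names: list of line identifiers; z: (n_lines, n_traits) array of
    trait values (assumed to be on a comparable scale already).
    """
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        distance = float(np.linalg.norm(z[i] - z[j]))
        pairs.append((distance, names[i], names[j]))
    pairs.sort(reverse=True)
    return pairs[:top]


# Hypothetical three-line example with two traits.
names = ["line_A", "line_B", "line_C"]
z = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
for distance, first, second in most_diverse_pairs(names, z, top=2):
    print(f"{first} vs {second}: {distance:.2f}")
```

Pairs at the top of such a ranking are the candidates one would consider as contrasting parents in a crossing program.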

# DISCUSSION

Grain legumes are an excellent and unique source of dietary protein for human beings in many parts of the world. The dietary importance of legumes is expected to increase over the years due to an increased demand for protein and other nutrients by the growing world population (Duranti, 2006). Moreover, the development of climate-resilient, nutrient-rich crop varieties is expected to reduce the number of malnourished people worldwide, especially in South Asia and Sub-Saharan Africa (Varshney et al., 2019). The crop varieties with improved nutrition may also be useful in addressing the UN Millennium goal of zero hunger and malnutrition, particularly in those parts of the world where plant-based protein is in high demand.

TABLE 3 | Identification of promising trait-specific introgression lines derived from pigeonpea tertiary genepool species C. platycarpus.

TABLE 4 | Performance of 28 promising high-yielding introgression lines for important agronomic traits and grain nutrients.

DF50, days to 50% flowering; PPP, number of pods per plant; PWPP, pod weight per plant; GYPP, grain yield per plant. \*Significantly better than popular variety ICPL 87119 (Asha) at P ≤ 0.05; † Better than ICPL 87119 on per se basis.

FIGURE 2 | Cluster diagram depicting different clusters formed using 136 introgression lines derived from C. platycarpus and five popular varieties following Ward's method based on agronomic and grain nutrient traits. "Cluster 2" consisted of only two lines: the recurrent parent ICPL 85010 along with another popular variety, ICPL 20325. The highest yielding introgression line, ICPP 171119, grouped in "Cluster 3" along with the mega variety ICPL 87119.

Pigeonpea is an excellent protein-rich legume crop that is mainly grown under rainfed conditions on marginal lands with minimal inputs and plays an important role in subsistence agriculture (Bohra et al., 2017; Obala et al., 2018). Modern pigeonpea cultivars have a narrow genetic base due to the frequent use of a few promising lines in hybridization programs over the years (Kumar et al., 2004; Bohra et al., 2010; Sharma et al., 2019). To meet the growing demand for plant-based nutrition and the need for soil health rejuvenation by implementing a proper cropping system, pigeonpea breeding programs must embrace a few novel approaches (Anitha et al., 2019). High-yielding, nutrient-rich varieties will attract farmers not only in developing countries where the crop is cultivated traditionally but will also find a niche in new environments in developed countries (Atlin et al., 2017). Crop wild relatives (CWRs) are an excellent source of new alleles for different useful traits required for pigeonpea improvement (Khoury et al., 2015; Sharma and Upadhyaya, 2016; Sharma et al., 2019). However, because of hindrances such as cross incompatibility, late maturity, undesirable pod traits, poor agronomic performance, and high photoperiod sensitivity, breeders are disinclined to use these CWRs in crop improvement programs (Sharma et al., 2013). In this context, pre-breeding plays a vital role in enhancing the use of wild relatives in breeding programs by providing ready-to-use ILs with superior alleles for different traits introgressed from wild species.

The pre-breeding population consisting of 136 ILs used in the present study was derived from a cross-incompatible tertiary genepool species, C. platycarpus, using the embryo rescue technique (Mallikarjuna et al., 2011) with a view to introgressing important traits such as early maturity, high protein content, and photoperiod insensitivity into cultivated pigeonpea. A large genetic variation was observed for important agronomic traits such as days to 50% flowering, number of pods per plant, and grain yield, as well as for the grain nutrients, viz., protein, Fe, Zn, Ca, and Mg content. The genetic variation found in this backcross population is expected to be novel, with a broad genetic base, as it is derived from a wild species.

Based on the days to maturity (duration from planting to 75% maturity), pigeonpea cultivars/varieties are grouped into super-early (50–60 days to flowering and/or <100 days to maturity), extra-early (60–80 days to flowering and/or 101–120 days to maturity), early (81–100 days to flowering and/or 121–140 days to maturity), mid-early (101–120 days to flowering and/or 141–160 days to maturity), medium (111–130 days to flowering and/or 161–180 days to maturity), and late (>130 days to flowering and/or >180 days to maturity) maturity duration groups (Srivastava et al., 2012). In general, pigeonpea cultivation is dominated by varieties in the medium-maturity group (Choudhary and Nadarajan, 2011). Under changing climatic conditions, there is an emphasis on developing short-duration pigeonpea cultivars with photo- and thermo-insensitivity to fit into multiple cropping systems as well as to expand pigeonpea cultivation into new niche areas (Saxena et al., 2019).
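The days-to-maturity thresholds listed above amount to a simple lookup. A minimal sketch follows; the function and table names are hypothetical. Because the stated ranges leave exactly 100 days unassigned (<100 versus 101–120), placing it in the super-early group is an assumption made here.

```python
# Upper bounds in days to maturity, taken from the grouping described above.
# Assumption: a value of exactly 100 days is treated as super-early.
MATURITY_GROUPS = [
    (100, "super-early"),
    (120, "extra-early"),
    (140, "early"),
    (160, "mid-early"),
    (180, "medium"),
]

def maturity_group(days_to_maturity):
    """Return the maturity duration group for a given number of days to maturity."""
    for upper_bound, group in MATURITY_GROUPS:
        if days_to_maturity <= upper_bound:
            return group
    return "late"  # more than 180 days to maturity

for days in (95, 130, 170, 181):
    print(days, maturity_group(days))
```

Only days to maturity is used in this sketch; the days-to-flowering ranges overlap between some groups and would need a separate rule.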

Though short duration lines are available, there is a huge yield penalty compared to popular medium-maturing varieties such as ICPL 87119. Most of the high-yielding ILs identified in the present study flowered early and had significantly higher yield and better grain nutrient contents than all the control cultivars used in this study including popular variety, ICPL 87119. The most promising high-yielding ILs such as ICPP # 171012, 171004, 171102, 171087, 171006, and 171050 having early flowering, high yield, and better grain nutrient contents hold great potential for ready use in pigeonpea breeding programs.

Dwarf plant type is not an advantageous feature for pigeonpea as it attracts Helicoverpa armigera, and lines with a dwarf bushy growth habit have shown 40% damage due to H. armigera (Mallikarjuna et al., 2011). Interestingly, all the ILs in this study were tall with semi-spreading secondary and tertiary branches and an indeterminate growth habit. Plant height in pigeonpea is a complex and quantitative trait (Byth et al., 1981). The ILs evaluated were taller than their recurrent parent ICPL 85010 and were similar to ICPL 87119, which puts the ILs in an advantageous position. Also, the range of yield-contributing traits such as number of pods per plant, pod weight per plant, and grain yield per plant was very high, indicating a high level of recombination in these ILs.

Grain nutrients were found to be highly influenced by genotype and genotype × environment interactions. The stability of these traits in different environments is therefore important in crop improvement programs to improve the nutritional quality of pigeonpea. Non-significant correlations were observed between grain yield and grain nutrients, except grain Zn content. This indicates that it is possible to develop nutrient-rich pigeonpea varieties without trade-offs for yield. In addition, positive associations were observed among the grain nutrients, such as that of protein with all four other grain nutrients (Fe, Zn, Ca, and Mg), suggesting that varieties can be improved simultaneously for multiple nutrient contents. Grain Fe and Zn content also showed a significant correlation, which has been reported in several crops including wheat (Morgounov et al., 2007), sorghum (Upadhyaya et al., 2016; Phuke et al., 2017), pearl millet (Kanatti et al., 2014), proso millet (Vetriventhan and Upadhyaya, 2018), and finger millet (Upadhyaya et al., 2011).

Further, it is important to study the similarity/dissimilarity of ILs with control cultivars for use in breeding programs. The cluster analysis grouped 136 ILs with 5 control cultivars into 10 clusters wherein similar ILs were placed in the same cluster based on agronomic traits and grain nutrients. This will help the breeders to choose trait-specific and diverse ILs for use in a breeding program to introduce new useful alleles derived from wild species into their working collection and/or newly developed cultivars/varieties. In addition, the most diverse pairs of ILs have been identified based on the mean phenotypic diversity index. Involving most diverse ILs in hybridization programs would be helpful in generating new and useful recombinants. Apart from this, a few ILs such as ICPP 171119, ICPP 171098, and ICPP 171045 showing maximum diversity with the recurrent parent ICPL 85010; and ICPP 171027, ICPP 171082, and ICPP 171024 having maximum diversity with the popular variety, ICPL 87119 have been identified for use in breeding programs to develop new cultivars with a broad genetic base.

## CONCLUSION

Significant variability was observed in the pre-breeding population, derived from the cross-incompatible tertiary genepool species C. platycarpus, for agronomic traits and grain nutrients. Moreover, it is noteworthy that many ILs performed better than the existing mega varieties not only in terms of yield but also for nutrient contents. The most promising high-yielding ILs, such as ICPP # 171012, 171004, 171102, 171087, 171006, and 171050, having early flowering, high yield, and better grain nutrient contents compared to the best variety, ICPL 87119, hold great potential for ready use in pigeonpea breeding programs. A thorough multi-location evaluation of these promising trait-specific ILs will help identify region-specific promising lines for possible direct release as cultivar(s) with a diverse genetic base, especially in the short-duration group to fit into multiple cropping systems. Further, as no correlations were observed between grain yield and grain nutrients, it should be possible to develop nutrient-rich pigeonpea varieties without trade-offs for yield. Positive associations of grain protein with all four other grain nutrients suggest that varieties can be improved simultaneously for multiple nutrient contents. In addition, the most diverse promising ILs can be included in hybridization programs as potential sources of new and diverse variation. Finally, as C. platycarpus is reported to possess photoperiod insensitivity, these ILs hold great potential for evaluation across locations and seasons to identify photo-insensitive lines for use in breeding programs.

### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/Supplementary Material.

# AUTHOR CONTRIBUTIONS

SS conceived the idea and evaluated the material. CS was involved in field evaluation. PP analyzed the data and assisted in preparing the first draft. CN analyzed the seed samples for grain nutrient contents in quality lab. SS prepared the final manuscript. CS, PP, and CN provided their inputs. All authors contributed to the article and approved the submitted version.

### FUNDING

Funding support provided by the Global Crop Diversity Trust (GCDT) Grant Numbers GS15020 and GS18010 and CGIAR Research Program on Grain Legumes and Dryland Cereals (GLDC).


### ACKNOWLEDGMENTS

This work is part of the initiative "Adapting Agriculture to Climate Change: Collecting, Protecting and Preparing Crop Wild Relatives" which is supported by the Government of Norway. The project is managed by the Global Crop Diversity Trust. For further information, visit the project website: http://www.cwrdiversity.org/. The partial funding support provided by the CGIAR Research Program on Grain Legumes and Dryland Cereals (GLDC) is duly acknowledged.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.01055/full#supplementary-material


Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer PK declared a past co-authorship with one of the authors SS to the handling editor.

Copyright © 2020 Sharma, Paul, Sameer Kumar and Nimje. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Crop Wild Relatives as Germplasm Resource for Cultivar Improvement in Mint (Mentha L.)

Kelly J. Vining<sup>1</sup>\*, Kim E. Hummer<sup>1,2</sup>, Nahla V. Bassil<sup>1,2</sup>, B. Markus Lange<sup>3</sup>, Colin K. Khoury<sup>4,5</sup> and Dan Carver<sup>5,6</sup>

<sup>1</sup> Department of Horticulture, Oregon State University, Corvallis, OR, United States, <sup>2</sup> National Clonal Germplasm Repository, USDA-ARS, Corvallis, OR, United States, <sup>3</sup> Institute of Biological Chemistry and M.J. Murdock Metabolomics Laboratory, Washington State University, Pullman, WA, United States, <sup>4</sup> Decision and Policy Analysis, International Center for Tropical Agriculture (CIAT), Cali, Colombia, <sup>5</sup> National Laboratory for Genetic Resources Preservation, Agricultural Research Service, United States Department of Agriculture, Fort Collins, CO, United States, <sup>6</sup> Colorado State University, Geospatial Centroid, Fort Collins, CO, United States

### Edited by:

Thomas M. Davis, University of New Hampshire, United States

### Reviewed by:

Benoit Bertrand, Institut National de la Recherche Agronomique (INRA), France

Kateřina Smékalová, Crop Research Institute (CRI), Czechia

### \*Correspondence:

Kelly J. Vining, kelly.vining@oregonstate.edu

### Specialty section:

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science

Received: 31 October 2019 Accepted: 27 July 2020 Published: 19 August 2020

### Citation:

Vining KJ, Hummer KE, Bassil NV, Lange BM, Khoury CK and Carver D (2020) Crop Wild Relatives as Germplasm Resource for Cultivar Improvement in Mint (Mentha L.). Front. Plant Sci. 11:1217. doi: 10.3389/fpls.2020.01217

Mentha is a strongly scented herb genus of the Lamiaceae (formerly Labiatae) and includes about 30 species and hybrid species that are distributed or introduced throughout the globe. These fragrant plants have been selected over millennia for use by humans as herbs and spices and for pharmaceutical needs. The distilling of essential oils from mint began in Japan and England, and mint oil has since become a significant industrial product for the US, China, India, and other countries. The US Department of Agriculture (USDA), Agricultural Research Service, National Clonal Germplasm Repository (NCGR) maintains a mint genebank in Corvallis, Oregon. This facility preserves and distributes about 450 clones representing 34 taxa, hybrid species, advanced breeder selections, and F1 hybrids. Mint crop wild relatives are included in this unique resource. The majority of mint accessions and hybrids in this collection were initially donated in the 1970s by the A.M. Todd Company, located in Kalamazoo, Michigan. Other representatives of diverse mint taxa and crop wild relatives have since been obtained from collaborators in Australia, New Zealand, Europe, and Vietnam. These mints have been evaluated for cytology, oil components, verticillium wilt resistance, and key morphological characters. Pressed voucher specimens have been prepared for morphological identity verification. An initial set of microsatellite markers has been developed to determine clonal identity and assess genetic diversity. Plant breeders at private and public institutions are using molecular analysis to determine identity and diversity of the USDA mint collection. Evaluation and characterization include essential oil content, disease resistance, male sterility, and other traits for potential breeding use. These accessions can be a source of parental genes for enhancement efforts to produce hybrids, or for breeding new cultivars for agricultural production. Propagules of Mentha are available for distribution to international researchers as stem cuttings, rhizome cuttings, or seed, which can be requested through the GRIN-Global database of the US National Plant Germplasm System, subject to international treaty and quarantine regulations.

Keywords: mint, peppermint, spearmint, verticillium wilt, monoterpene

### INTRODUCTION

The millennia of human effort involved in the domestication of economically important agricultural and horticultural crops is documented and broadly discussed among plant evolutionary biologists (Darwin, 1859; Darlington, 1963; Zohary, 1969; Zohary, 1984; Zohary, 2004). Plant domestication, the genetic modification of a wild form to create an altered plant to meet human needs, has produced many plants incapable of existing in the wild (Doebley et al., 2006). This "domestication syndrome" (Hammer, 1984) involves a combination of traits that are different from those of the wild progenitors. These domesticated plants may have larger fruit or grains, robust growth habits, loss of sexual fertility, loss of bitterness, or synchronous flowering. These plants may not compete successfully in the natural world.

As Zohary (2004) points out, seed-propagated agronomic crops, many of which display domestication syndrome, have undergone stabilizing selection to protect fertility. Grain crops have rigid protection for sexual reproduction with streamlined development of flowers, fruits, and seed. Chromosomes behave normally at meiosis and deviants do not survive to reproduce. Chromosomes are balanced with little pollen or seed sterility. In contrast, clonally propagated fruit crops, which also display domestication syndrome, not only tolerate but also promote the reduction of pollen and seed fertility and lower chromosome stability. Parthenocarpy, unequal ploidy levels, aneuploidy, and other innovative mass production solutions reduce seed set without reducing fruit production.

At the next level, crops maintained by clonal propagation and grown for their non-reproductive organs have the most drastic disruptions to their flowering and fruiting systems. This group, according to Zohary (2004), demonstrates bizarre chromosomal segregation and unusual ploidy levels. Mint species are an exemplar of this category.

Multiple species of mint have been used for medicinal purposes by humanity from prehistory. From savory herbs produced in monasteries to single-family needs of the kitchen garden, commercial peppermint production for menthol, and essential oil extraction from other mint species, mints represent a significant global economic commodity.

The objectives of this manuscript are to describe the domestication of mint and its uses to humanity. We present trait and genotype examples and summarize the preservation of Mentha genetic resources and global genebank operations. We also discuss current breeding improvements and the future possibilities opened up by genetic analysis.

### DOMESTICATION OF MINT

### Taxonomy

Mentha is a strongly scented herb genus of the Lamiaceae (formerly Labiatae) and includes 18 species, 31 subspecies or botanical varieties, and 11 recognized hybrid species (Tucker and Naczi, 2007; GRIN-Global, 2020). The mint family includes diverse additional aromatic genera, such as mountain mint (Pycnanthemum L.), lavender (Lavandula L.), sage (Salvia L.), rosemary (Rosmarinus L.), and oregano (Origanum L.), which are grown commercially for essential oils that are distilled from leaves and stems. Mint shoots and leaves are used for medicinal and aromatic purposes, such as dried organic extracts, distillates, condiments, and food flavorings. Likewise, Mentha includes many commercially valuable species, including peppermint (Mentha ×piperita), Scotch spearmint (M. ×gracilis), native spearmint (M. spicata), American wild mint (M. canadensis), and corn mint (M. arvensis), which are cultivated in different parts of the world for their culinary and medicinal properties. Plants in this genus are herbaceous, rhizomatous, perennial, and aromatic, with smooth, wide-spreading underground stems, which are square in cross-section. The leaves are arranged in opposite pairs, and the white, pink, or purple flowers are produced in clusters.

The name Mentha is derived from Classical Greek mythology, from Minthe (Minthê), who was a beautiful Cocytian river nymph beloved by Hades (Pluto), god of the underworld. Minthe was metamorphosed into dust by Hades's wife, Demeter (Persephone) (Rosengarten, 1969), but Hades caused the fragrant mint plant to grow from the dust. The etymology of "mint" is from the Old English minte (= mint plant), which is derived from Proto-Germanic, through Latin from the Ancient Greek, and is akin to Old Norse.

At the base of Mt. Minthe, there was a temple dedicated to Hades and a grove for Demeter (Ovid Met. 10). Mt. Minthe, near Pylos, Greece, is thought to be a location of the origin of triploid sterile spearmint (2n = 3x = 36), which likely arose from the introgression of the conspecific endemic ancient amphidiploid Mentha spicata (2n = 4x = 48) and the diploid M. longifolia (2n = 2x = 24) (Kokkini, 1991; Lawrence, 1991).

Since the time of Linnaeus, more than 3,000 specific epithets have been reported for Mentha (Tucker and Naczi, 2007). As a measure of mint species taxonomic diversity, the Global Biodiversity Information Facility (GBIF) network includes more than 740,000 locality data points from 454 Mentha taxa with occurrences throughout the world (GBIF, 2020). The plethora of names and occurrences of hybrid and naturalized mints has created confusion in the literature. Wild species of mint hybridize readily, and over evolutionary time, native hybrid-species swarms developed in conspecific regions. Subsequently, minor variants have achieved species rank.

The Plants of the World database (http://www.plantsoftheworldonline.org/) (Kew Science, 2020) describes 39 taxa, including 24 species and 15 hybrid species (Figure 1). Tucker and Naczi (2007) prepared a more conservative list of the number of mints, synonymizing many to recognize 18 species, 31 subspecies or botanical varieties, 11 hybrid species, and one excluded species. They chose to consider Mentha cunninghamii (Benth.) Benth. as Micromeria cunninghamii Benth. GRIN-Global (2020), taking a moderate course, includes 20 Mentha species.

For this review, we applied the determination by Tucker and others (Tucker and Chambers, 2002; Tucker and Naczi, 2007) for 18 recognized species, with one exception: we considered Mentha repens (J.D. Hook.) Briq. as M. pulegium L. subsp. repens.

### Species Distribution

Mentha L. has a cosmopolitan native range (Kew Science, 2020). The Mentha distribution map (Kew Science) included 163 countries, provinces, and regions for native distribution and 43 regions of introduction (Figure 1). Considering the tendency for species hybridization within this genus and the successful expansion strategies of mint around the globe, we used a conservative estimation to develop global richness maps. We considered only the taxa accepted by Tucker and Naczi (2007). We prepared global richness maps searching GBIF data for the 17 mint species (Figures 2 and 3).

We separately examined world occurrences of point data within GBIF data for each of the 17 species. We filtered out the introduced, naturalized, and cultivated locality data, keeping only natural endemic occurrences. We used climatic and topographic predictors to model likely occurrence for the 17 individual species maps. The data for the separate species maps were merged to produce the global species richness maps (Figures 2 and 3). These maps can be used for applied research or conservation planning and investigating the processes that have shaped these patterns.
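One common way to merge per-species maps into a richness map is to sum presence/absence grids cell by cell. The sketch below illustrates that idea only; the grids are tiny hypothetical examples, and treating each species map as a binary grid on a shared raster is an assumption, since the text does not specify how the maps were combined.

```python
def richness_map(presence_grids):
    """Sum per-species presence/absence grids (1 = predicted present) cell by cell.

    All grids are assumed to share the same shape and geographic alignment.
    """
    n_rows = len(presence_grids[0])
    n_cols = len(presence_grids[0][0])
    richness = [[0] * n_cols for _ in range(n_rows)]
    for grid in presence_grids:
        for r in range(n_rows):
            for c in range(n_cols):
                richness[r][c] += grid[r][c]
    return richness

# Hypothetical 2 x 3 grids for three species.
species_a = [[1, 0, 0],
             [1, 1, 0]]
species_b = [[1, 1, 0],
             [0, 1, 0]]
species_c = [[0, 0, 0],
             [0, 1, 1]]

richness = richness_map([species_a, species_b, species_c])
for row in richness:
    print(row)
```

Each cell of the result counts how many species are predicted to occur there, which is the quantity a richness map displays.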

While the highest diversity of present-day species occurs in Western Europe (Figure 2), significant endemic mint species occur in Eastern and Western North America, Asia, Southern Australia, and Tasmania (Figure 3). In addition to this natural species diversity, hundreds of thousands of data points of introduced, naturalized, and cultivated mint occur throughout five continents (GBIF, 2020). The multitude of global mint occurrences (Figure 1) speaks to the global success of this genus, starting from Centers of Diversity in Europe, Asia, North America, and Australia and spreading throughout the globe.

The ease of propagation by seeds and clonal propagules, such as rhizomes and cuttings, and the survivability of the plants during harsh and undesirable climatic conditions, allow for spread and diversification. The utilitarian application of mint for human pharmaceutical, food, and cosmetic needs encouraged domestication and cultivation throughout the world and across multiple cultures.

### Prehistory

Throughout human history, medicinal and aromatic plants have been used for culinary flavoring and for medicinal purposes in folk medicine (Sonmezdag et al., 2017). Mentha has been of great importance because of its unique aroma and nutritional value (Naghibi et al., 2010). Archeological excavations have shown that lavenders, sage, and mints, harvested locally from the wild, were used in prehistoric times (Nuńez and De Castro, 1992). In western traditions, mints have been cultivated in the dry, mild, and cold districts of Asia, Europe, and North Africa since antiquity (Jamzad et al., 2003). Native peoples in America have documented traditions of uses of mints as cold remedies (Chehalis, Cowlitz), treatments for gastrointestinal ailments (Kiowa), food (Kiowa, chewing leaves), and ceremonial medicine (Navaho) (Vestal and Schultes, 1939; Elmore, 1944; Gunther, 1973).

### Antiquity

Multiple mint species are referenced (Bostok, 1885) throughout Greek and Roman herbals (Greeks: Aeschylus, Hippocrates, Krataeus, Dioscorides, and Galen; Romans: Cato, Ovid, and Pliny the Elder), as well as in Asian medicinal traditions, including traditional Chinese medicine and the Ayurvedic tradition of India.

The earliest published mint species images in existence can be found on gr. III fol. 129r and 132r of the Juliana Anicia Codex, or Vindobonensis Codex (Janick and Hummer, 2012). The Juliana Anicia Codex is a magnificently illustrated manuscript with written information based on the Peri Ylis Iatrikis (De Materia Medica in Latin; Of Medical Matters of Dioscorides). This codex, one of the earliest books, was presented to the imperial Princess Juliana Anicia in Constantinople in 512 CE (Collins, 2000). In the 15th century, it was purchased by the Emperor Maximilian for the Imperial Library and was moved to Vienna, where it now resides. The translation of the uses for mints, as described in the Byzantine Greek written in the background of the images, was prepared by Lily Beck (Beck, 2005) and is presented in Table 1. These uses described in the Juliana Anicia Codex were reiterated, revised, and interpreted in many recensions of Dioscorides' works and in subsequent herbals for the next several millennia (Collins, 2000; Figures 4 and 5).

TABLE 1 | English translation of reference to species of Mentha in De Materia Medica of Dioscorides (Beck, 2005).

When drunk, it draws the menses, afterbirth, and embryos/fetuses. It brings up from the long phlegm when drunk with salt and honey and it helps people with spasms, and with sour wine mixed with water it relieves nausea and gnawing pains of the stomach. With wine, it drives down the bowel dark matter and helps those bitten by wild animals and when applied to the nostrils with vinegar, it revives those who fainted.

Ground up dry and burned, it also strengthens the gums; plastered on with barley groats, it soothes all inflammations; it is suitable to use all by itself on the gouty until the skin surface becomes irritated and when used with a cerate, it checks facial eruptions; it also helps patients with spleen disease when plastered on with salt. Its decoction used as a wash stops itching and is suitable in a sitz bath for uterine inflations, indurations, and twistings. Some people call it blechon because sheep that taste it when in bloom bleat continuously.

III, 34 [ἡδύοσμον Mentha sp. L. Green mint] The green mint, but some call it minthe: it is a well-known little herb having warming, astringent and drying properties; it is for this reason that its juice, when drunk with vinegar, staunches blood, destroys the round intestinal worm, rouses sexual desire, and when two or three little sprays are drunk with the juice of sour pomegranate, stops hiccups, vomiting, and cholera. Applied as a plaster with barley groats, it dissipates abscesses, placed on the forehead, it assuages headaches, and it abates distension and swelling of the breasts. With salt, it is a plaster for people bitten by dogs, and its juice with hydromel is suitable for earaches. Used by women as a pessary before sexual intercourse, it causes barrenness, and if rubbed on a rough tongue, it smooths it; it keeps milk from curdling when little sprays are stirred about in it, and it is through and through wholesome and spicy.

There is also a wild green mint which has thicker leaves; all told it is larger than bergamot mint, rather foul smelling, and less useful for health purposes.

Dioscorides mentions about 500 plants (Singer, 1927). About 130 of these were referenced in the Hippocratic collection. Thus, many of these treatments, including peppermint, were utilized in the Greek world for more than four centuries before Dioscorides' writings and have persisted in medicinal applications to the present day.

Mint is mentioned in the Bible: Matthew 23:23, "For you tithe mint, dill, and cumin, and have neglected the weightier matters of the law: justice and mercy and faith. It is these you ought to have practiced without neglecting the others" and Luke 11:42, "Woe to you Pharisees, because you give God a tenth of your mint, rue and all other kinds of garden herbs, but you neglect justice and the love of God. You should have practiced the latter without leaving the former undone." Parry (1925) suggests that these quotations refer to Mentha longifolia, long leafed mint, native to North Africa. Robert Gunther (1934) prepared a translation of John Goodyer's 1655 version of Eduosmos agrios in Dioscorides as: "But ye wilde mint, which ye Romans call Mentastrum, is round in the leaves and altogether greater than Sisymbrium, but poisonous in smell, and lesse fitting for use in health."

Mentha in the traditional Galenic medicine in the Islamic period (Table 2) can be traced to the writings of Dioscorides and other Greek authors.

FIGURE 4 | M. pulegium, pennyroyal, Ḡleḵon (γλήχων), Vienna Dioscorides, Juliana Anicia Codex, dating to 512 CE. This illustration is Book III, plate 31. The Arabic writing on the right is translated "type of thyme". The Arabic on the left is the translation of Ḡleḵon. The faded Byzantine Greek in the background is translated by L. Beck (Table 1).

### Egyptian

The Ebers Papyrus, dating to 1550 BCE and purchased by Georg Ebers in Luxor in 1874, is one of the earliest surviving Egyptian medical papyri. It is kept in the library at the University of Leipzig, Germany. This 110-page scroll is about 20 m long, and the text is scribed in hieratic Egyptian, likely copied from earlier manuscripts. One treatment suggests that peppermint be mixed with flour, incense, wood of the waneb plant, a stag's horn, sycamore seeds, mason's plaster, seeds of zart, and water to create a curative paste for headaches, supporting later Anglo-Saxon and Greek prescriptions for headache and migraine pain relief (Aboelsoud, 2010; Elansary and Ashmawy, 2013).

In the Hearst Papyrus, dated to approximately 2000 BCE, peppermint is recommended as a treatment for rhinitis, where a plaster is applied directly to the nose. In prescription 171, peppermint is mixed with wine and is used to treat what might be edema of the legs. After the mint and wine mixture is consumed, some sort of bloodletting is required, though the physician does not provide much specificity. Mint is still used to treat edema today, though most modern references deal with topical creams for livestock, particularly cattle (Aboelsoud, 2010).

FIGURE 5 | Image of M. aquatica, labeled Eduosmos agrios, from the Pierpont Morgan recension of Dioscorides' De Materia Medica. This book dates to about 1050 CE, and the plants are alphabetically arranged. Morgan Library, New York.

TABLE 2 | Application of mint in Arabic traditional texts (Iranian encyclopedia online, 2020).

\*Persian "sherbet", sekanjabin, a concoction of honey and vinegar flavored with mint extract.

### European Medieval and Renaissance 1400–1600

The majority of Medieval and Renaissance herbals elaborated the traditions described by ancient Greek and Roman texts (Hummer and Janick, 2007). Mint species played a prominent role in many treatments. Adams et al. (2011) compared malaria treatments in Fuchs (1543) and Brunfels (1532), both of whom suggested the use of mints against this disease. Fuchs (1543) also recommended taking pennyroyal (M. pulegium) in vinegar against epilepsy. Essential oil extracted from pennyroyal was among the pro-convulsive essential oils discussed by Duke (1985), Burkhard et al. (1999), and Riddle and Estes (1992).

Nicholas Culpeper's work, the Complete Herbal (Culpeper, 1652), documents the use of several mint species. Unlike the Egyptian papyri or Anglo-Saxon Leech books, Culpeper's work provides in-depth descriptions and growth habits of the plant entries. Culpeper discusses spearmint, which he refers to alternatively as "heartmint." The plant is labeled an "herb of Venus," which directly refers to Dioscorides' De Materia Medica, implying that mint possesses healing, binding, and drying qualities. Much of the entry borrows directly from this text but has some additions. Culpeper includes treatments using mint for a sore and itchy scalp, pain of the ears, venomous bites, headache, indigestion, and flatulence. He also includes prescriptions for the resolution of bad breath and soreness of the gums and palate. Horse or wild mint is then referenced. Due to menthol's innate analgesic qualities, mint is described as a remedy for headaches, migraines, and general pain and swelling. Mint is also recommended as a digestive aid. Mint relaxes stomach muscles, allowing food and flatulence to pass more easily.

Mint was employed topically to treat dry, itchy skin, and insect and animal bites and lessen the appearance of scars, bruises and scabs, particularly on the scalp and face. Due to its antimicrobial and antifungal properties, mint was used for millennia to treat fungal infections, such as ringworm, and to treat parasites, such as roundworm. Mint is also a diuretic and, as such, can be used to stimulate urination and possibly even treat hypertension.

# INDUSTRIAL PRODUCTION OF MINT

Mint was significant to humans from early history, being collected from the wild and brought to the convents, monasteries, and kitchen gardens for direct use. Japan preceded western countries in the cultivation of mint and extraction of menthol from M. arvensis, begun before 0 CE (Flueckiger, 1891; Hayden, 1929). Tamba Yasudori (984 CE), in the oldest surviving Japanese medical text, stated that menthol was used in preparation of an aqueous eye medication. Japan began exporting menthol in 1873 (Gildemeister and Hoffman, 1913). In Japan, this oil was extracted from corn mint (Mentha arvensis L.).

Production of peppermint for menthol in large areas in the western world did not occur until 1750 in England (Parry, 1925). By 1796, about 40.5 ha (100 acres) of peppermint (M. ×piperita) were grown near Mitcham, England. The plant material was harvested, bundled, and sent to London for distilling. This early production of peppermint oil yielded 900–1360 kg (2,000–3,000 lb) (Parry, 1925). Over the next 20 years, the production area expanded to 10 townships surrounding Mitcham. By 1940, peppermint cultivation in the original Mitcham area ceased (Parry, 1925).

Peppermint plants were imported from England to the United States with early settlers. By 1812, peppermint was raised commercially near Ashfield, Massachusetts (Lawrence, 1991). Enterprising farmers and distillers, such as Archibald Burnett and Hiram G. Hotchkiss, brought mint westward to New York. By 1830, mint production was sufficient for nine distilleries. These stills also processed spearmint, tansy, wintergreen, spruce, and hemlock oils (Lawrence, 1991). Mint production began to move further west in the US. Both spearmint and peppermint were introduced in New York, Ohio, Michigan, and Indiana. In the western states, mint from Oregon and Washington spread into Idaho, Montana, California, Nevada, and Utah. Lawrence (1991) provides details of early mint production in the US.

In 1853, Albert M. Todd imported 'Black Mitcham' roots from England and began production in St. Joseph County, Michigan. 'White Mitcham,' the original introduction, had become known as 'American peppermint.' Todd found 'Black Mitcham' to be superior to 'White Mitcham' for production in Michigan. Todd's production and distillery in Kalamazoo became a major mint producer. By 1915, Todd had begun growing several hundred acres of Scotch spearmint (M. × gracilis Sole) because of its higher productivity compared with native spearmint (M. spicata).

The main mint crops produced in the US are peppermint and spearmint, the latter including Scotch and native spearmints. 'Black Mitcham' peppermint has been the most widely grown peppermint cultivar since at least the mid-20th century (Lawrence, 1991).

The US does not report peppermint production to the Food and Agriculture Organization of the United Nations (FAOSTAT, 2018). In 2018, the countries reporting the largest peppermint production included Morocco, Argentina, Mexico, Bulgaria, and Spain. Oil distilled in China, India, and Japan is from M. arvensis var. piperescens and is not counted in the FAO peppermint database. In 2018, the FAO reported world mint oil production at 106,728 MT.

The annual farm gate value of peppermint production in the US has been between \$100 and \$150 million, while that of spearmint has ranged between \$45 and \$50 million during the past decade (NASS, 2020).

Recent US mint oil production was highest in 2011, at about 7 million pounds (3,175 MT) of peppermint oil and 2.2 million pounds (998 MT) of spearmint oil. Since then, US production has dropped to under 6 million pounds (2,721 MT), as production of (-)-menthol from oils of M. arvensis has been increasing in China and India. The demand for mint oil by Southeast Asian countries has been driving the expansion of the market in recent years (Liu and Lawrence, 2006).

# PROTECTING MENTHA GENETIC RESOURCES

The FAO manages a list of world genebanks or botanical gardens and the crops that they preserve. This information is publicly available in two global databases: Genesys PGR2 and CATIE. We queried both of these databases on 14 April 2020, totaled the entries for Mentha, and combined the results. This query identified 65 genebanks in 42 countries as preserving Mentha genetic resources. The largest mint collections are held in the US, Ukraine, Germany, Japan, Portugal, and the United Kingdom (Supplemental Table 1).

### USDA NATIONAL CLONAL GERMPLASM REPOSITORY AT CORVALLIS, OREGON

The Mentha collection of the USDA, NCGR in Corvallis, Oregon, includes representatives of approximately 450 accessions of 13 species and 10 hybrid species plus cultivated types (Table 3). This working genebank maintains the primary collections as plants in containers in greenhouse or lath house environments (Hummer et al., 2020). Most of these plants represent diverse wild species collections from 57 countries.

Most mints in this collection were originally donated to the USDA by Dr. Merrit J. Murray, breeder at A.M. Todd Company, Kalamazoo, Michigan, dating back to the 1970s. At that time, the Todd Company decided to discontinue its mint breeding program and donated the collection to Dr. Chester Ellsworth Horner, USDA researcher in Corvallis, Oregon. Dr. Al Haunold assumed responsibility upon Dr. Horner's retirement and subsequently donated the collection to the NCGR in the mid-1980s.

At present, 216 mint accessions at the NCGR are species or advanced breeder selections originally donated from M. J. Murray. Other representatives of diverse mint taxa including wild accessions have since been obtained from breeders and taxonomic collaborators in the US, Australia, New Zealand, Europe, and Vietnam.

Besides maintaining diverse species representatives, the NCGR collection includes advanced breeder lines, cultivar and germplasm releases, and what are referred to as "donor-named selections," where the NCGR has kept the names provided by donors. In addition, virologists have donated 16 pathogen-positive mints for virus identification studies and certification programs.

### DISTRIBUTION FROM THE NCGR MINT GENE BANK

The NCGR was dedicated in 1981. Since that time, the NCGR has distributed > 9,700 mints as plants, cuttings, rhizomes, tissue cultures, or seed lots in 1,277 orders. About 31% of the orders were sent to US researchers (including government agencies, universities, and non-profit organizations), 33% to US individuals and private companies, and 36% were shipped to foreign requestors. Additional plant and tissue culture shipments were made for secure remote backup of the collection at the USDA National Laboratory for Genetic Resource Preservation in Ft. Collins, Colorado, the designated base germplasm collection within the US National Plant Germplasm System (NPGS). Some of the most requested cultivars included M. × piperita 'Chocolate,' 'Pineapple,' 'Black Mitcham,' 'f. lavanduliodora,' 'Todd's Mitcham,' M. aquatica var. citrata 'Eau de Cologne,' M. × gracilis 'Scotch Spearmint,' and M. spicata 'Kentucky Colonel' (Table 4).

TABLE 3 | Mentha L. species, distribution, habitat, and number of accessions in the USDA ARS National Clonal Germplasm Repository-Corvallis.

Many researchers noted that their purpose for requesting the germplasm was to seek diverse essential oil profiles or to perform genetic analyses. Many universities and companies requested propagules from the entire mint collection for this use. Private company results are unpublished and not publicly available. Other scientists were interested in the taxonomy of specific accessions and systematic determinations (Tucker and Chambers, 2002). Others requested germplasm to seek disease resistance (Vining et al., 2005).

The policy of the US NPGS is to distribute plant genetic resources freely for crop improvement. The quarantine importation regulations on international shipment of Mentha are relatively minor compared to those for shipment of fruit, nut, and other more restricted horticultural crops. Fewer importation requirements ease the transfer of germplasm for foreign requests.

# GERMPLASM EVALUATION AND CHARACTERIZATION

The primary objectives of a plant genebank include the acquisition, maintenance, distribution, evaluation, and characterization of the assigned genetic resources (Chambers and Hummer, 1992). During the past 35 years, staff at the NCGR, working in collaboration with scientists at other institutions, have examined cytology, essential oil content, and disease resistance, and have developed and used molecular markers for identity confirmation and diversity determination. Data from these studies have been added as descriptors to the GRIN-Global database.

TABLE 4 | The 50 most distributed Mentha accessions from 1980 through 2020 from the USDA ARS National Clonal Germplasm Repository-Corvallis.


### CYTOLOGY

Harley and Brighton (1977) defined five sections within the genus Mentha (Mentha sect. Audibertia, sect. Eriodontes, sect. Mentha, sect. Preslia, and sect. Pulegium) including 19 species and 13 named hybrids. They also listed chromosome counts for sect. Mentha including most of the mint taxa recognized today. Chambers and Hummer (1994) summarized other documentation of chromosome counts and surveyed 73 Mentha accessions from the NCGR collection.

Many mint species have a monoploid (base) number of x = 12, though diverse species can have monoploid numbers of x = 9, x = 10, x = 18, or x = 25. In addition, ploidy series of several species are reported as diploid, tetraploid, hexaploid, octoploid, enneaploid, decaploid, or aneuploid.

Chambers and Hummer (1994) reported that ploidy determinations were obtained by cytological counts of dividing pollen mother cells in flower buds or meristematic cells in root tips. Unusual species counts included M. requienii with 2n = 2x = 18; M. pulegium with diploid (2n = 2x = 20), triploid (2n = 3x = 30), and tetraploid (2n = 4x = 40) cytotypes; and M. japonica with 2n = 2x = 50. This survey included chromosome counts for M. australis (two accessions), M. japonica, M. diemenica, and M. cunninghamii. For the majority of mint species with x = 12, diploid, triploid, tetraploid, hexaploid, octoploid, and decaploid members were observed (Chambers and Hummer, 1994). Vining et al. (2019) recently reported several unusual ploidy counts, finding diploid and triploid clones in M. suaveolens and octoploid and enneaploid clones of M. aquatica in the NCGR mint collection.
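The relationship between the monoploid number (x), the ploidy level, and the somatic chromosome count used in the counts above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `somatic_count` and `infer_ploidy` are hypothetical, and the only values used are the counts quoted in the text.

```python
# Somatic chromosome count = ploidy level * monoploid (base) number x.
# Function names are hypothetical; the counts checked below are those quoted in the text.
PLOIDY_NAMES = {
    2: "diploid", 3: "triploid", 4: "tetraploid", 6: "hexaploid",
    8: "octoploid", 9: "enneaploid", 10: "decaploid",
}


def somatic_count(base_number: int, ploidy: int) -> int:
    """Return the expected somatic chromosome count for a given x and ploidy level."""
    return base_number * ploidy


def infer_ploidy(count: int, base_number: int) -> str:
    """Name the ploidy level of a count, or 'aneuploid' if it is not a whole multiple of x."""
    level, remainder = divmod(count, base_number)
    if remainder != 0:
        return "aneuploid"
    return PLOIDY_NAMES.get(level, f"{level}x")


if __name__ == "__main__":
    print(somatic_count(9, 2))    # M. requienii, x = 9, diploid
    print(infer_ploidy(30, 10))   # M. pulegium count of 30 with x = 10
    print(infer_ploidy(50, 25))   # M. japonica count of 50 with x = 25
```

A count that is not a whole multiple of the base number is reported as aneuploid, which matches the way the ploidy series is described above.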

### ESSENTIAL OILS

The Lamiaceae, and particularly Mentha, produce aromatic "essential" oils. These lipophilic substances, predominantly terpenes, are produced by capitate or peltate glandular trichomes that occur on the leaf and stem surfaces (Amelunxen, 1964). They can be simply extracted by crushing the leaves and stems or more completely through distillation. The volatile composition is affected by the short-chain terpenes that constitute the main fraction, especially C10 monoterpenes and C15 sesquiterpenes, which overwhelmingly affect the flavor and taste of these species.

Mints with unique essential oil profiles are valued by the herb, nursery, and pharmaceutical trades. More than 275 accessions of the mint collection of the NCGR were screened by gas chromatography for 37 separate essential oil profiles (Bergstein, 1983). The signature component of native spearmint oil is the C6-oxygenated p-menthane monoterpene (-)-carvone (Figure 6). The oil distilled from 'Black Mitcham' peppermint contains predominantly C3-oxygenated monoterpenes, with (-)-menthone and (-)-menthol being the most prominent metabolites. Essential oil analyses indicated that about half of the NCGR accessions labeled as M. aquatica accumulate (+)-menthofuran as the most abundant monoterpene (with occasionally high levels of (-)-limonene and/or 1,8-cineole as well), while the remainder had variable oil profiles and need to be further investigated (Vining et al., 2019). Oils from M. suaveolens accessions are high in either (-)-carvone, piperitenone oxide, or trans-piperitone oxide (Vining et al., 2019). The oil types of M. longifolia accessions are quite diverse, with different C6- or C3-oxygenated monoterpenes dominating the profile (Vining et al., 2005).

While the oil composition varies significantly depending on growing location, regional weather, harvest dates, and processing technology (Lawrence, 2006), recent assessments of NCGR accessions and cultivars were performed under controlled conditions to ensure comparability. These data sets can be searched online as descriptors through GRIN-Global (https://npgsweb.ars-grin.gov/gringlobal/method.aspx?id=496325).

### DISEASE RESISTANCE

Verticillium wilt disease perpetually plagues the mint industry, and therefore breeding for durable wilt resistance is a longstanding goal. Verticillium wilt is a vascular wilt disease caused by the soil-borne fungus Verticillium dahliae. 'Black Mitcham' peppermint is highly wilt susceptible. However, the cultivar is also a sterile hexaploid, which hinders traditional breeding techniques. Other genetic mutation and recombination techniques have been performed over the years.

An irradiation program, working in collaboration with nuclear facilities in the 1960s, resulted in the release of two new peppermint cultivars with relatively higher wilt resistance: 'Todd's Mitcham' (Todd et al., 1977) and 'Murray Mitcham' (Murray and Todd, 1972). In the 1990s–2000s, the Mint Industry Research Council initiated a genetic engineering project with the objective of genetically transforming peppermint cultivars to confer verticillium wilt resistance. However, despite remarkable technical successes in improving oil yield and composition (but not verticillium resistance) (Lange et al., 2011; Lange, 2015), this research was discontinued due to concerns about market acceptance of genetically modified organisms.

### MOLECULAR ANALYSIS

Multi-locus dominant markers such as amplified fragment length polymorphism (AFLP), random amplified polymorphic DNA (RAPD), and inter-simple sequence repeat (ISSR) markers have been used with a small number of mint accessions for assessment of genetic diversity and relationships (Khanuja et al., 2000; Gobert et al., 2002; Shasany et al., 2002; Shiran et al., 2004; Vining et al., 2005; Sabboura et al., 2016; Jedrzejczyk and Rewers, 2018; Choupani et al., 2019) and for cultivar identification (Fenwick and Ward, 2001). Species-specific start codon targeted (SCoT) markers were also developed and used to assess genetic diversity in 12 accessions from four mint species (Khanm et al., 2017). Few co-dominant simple sequence repeat (SSR) markers are available for mint (Kumar et al., 2015; Vining et al., 2019). Kumar et al. (2015) identified 54 SSRs from publicly available expressed sequence tag (EST-SSR) sequences of M. ×piperita that generated clear amplicons in 13 accessions from this species. Three SSRs with long core repeats amplified in representative samples from four species including M. arvensis, M. citrata, M. longifolia, and M. spicata, indicating transferability among species.

FIGURE 6 | C6-oxygenated p-menthane monoterpene (-)-carvone.

We recently screened these three SSRs, in addition to 48 other primer pairs designed from a draft genome assembly of M. longifolia CMEN 585 from South Africa (Vining et al., 2017), in an eight-member testing panel (three accessions each of M. suaveolens and M. aquatica and two accessions of M. longifolia) (Vining et al., 2019). Screening the eight-member testing panel with these 51 SSRs identified nine new primer pairs that were polymorphic and appeared easy to score (Figure 7). These nine SSRs were thus used in two multiplexes to genotype 49 accessions propagated from the NCGR collection that included 24 accessions of M. aquatica, 23 accessions of M. suaveolens, and the two M. longifolia plants in the testing panel. The nine SSRs developed in this study separated the accessions mostly according to species and identified each accession as unique. Three pairs of M. aquatica accessions were closely related, with the members of each pair distinguished by a single allele: CMEN 116 and CMEN 117; CMEN 110 and CMEN 111; and CMEN 121 and CMEN 122. The consecutive local numbers within each pair indicate that the plants are kept in neighboring pots, which may suggest plant contamination; this will be investigated further.
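The idea of flagging accessions that are "distinguished by a single allele" can be illustrated with a minimal Python sketch. The accession labels, locus names, and allele sizes below are invented for illustration and are not data from the NCGR collection, and the scoring rule (alleles found in one genotype but not the other, summed over shared loci) is an assumption rather than the method used in the study.

```python
from itertools import combinations

# Hypothetical SSR genotypes: locus name -> set of allele sizes in base pairs.
# Labels, loci, and sizes are made up for illustration only.
genotypes = {
    "acc_A": {"ssr1": {150, 154}, "ssr2": {201}, "ssr3": {310, 318}},
    "acc_B": {"ssr1": {150, 154}, "ssr2": {201, 205}, "ssr3": {310, 318}},
    "acc_C": {"ssr1": {146, 158}, "ssr2": {209}, "ssr3": {302, 318}},
}


def allele_differences(g1, g2):
    """Count alleles present in one genotype but not the other, over shared loci."""
    return sum(len(g1[locus] ^ g2[locus]) for locus in g1.keys() & g2.keys())


def pairwise_differences(table):
    """Return {(name_a, name_b): allele difference count} for every pair of accessions."""
    return {
        (a, b): allele_differences(table[a], table[b])
        for a, b in combinations(sorted(table), 2)
    }


def near_identical_pairs(table, max_diff=1):
    """List accession pairs separated by at most max_diff alleles."""
    return [pair for pair, d in pairwise_differences(table).items() if d <= max_diff]


if __name__ == "__main__":
    for pair, diff in pairwise_differences(genotypes).items():
        print(pair, diff)
    print("check for possible mix-ups:", near_identical_pairs(genotypes))
```

Pairs returned by `near_identical_pairs` would be candidates for the kind of follow-up described above, such as checking whether neighboring pots have been cross-contaminated.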

Single nucleotide polymorphism (SNP) markers have not yet been reported in Mentha, and the number of SSRs remains low. More SSRs and new SNPs need to be identified and evaluated across Mentha species; this is now possible owing to the availability of a genome sequence (M. longifolia) (Vining et al., 2017) and the low cost of re-sequencing.

## USES OF THE COLLECTION IN BREEDING EFFORTS

From the mint industry viewpoint, improved oil quality and verticillium wilt disease resistance are two priority traits for new cultivar development. Both traits are multigenic with complex mechanisms. Mentha germplasm evaluations have therefore focused on those traits and efforts are ongoing to understand the underlying genetics. Vining et al. (2019) also reported species accessions with undeveloped anthers, resulting in apparent male sterility. This trait could be useful for developing male-sterile cultivars. The USDA NCGR Mentha collection is valuable as a source of genetic diversity for mint breeding toward these goals. Several private companies have had, or currently have, mint breeding programs that have requested accessions from the Mentha collection. While information about these programs is strictly proprietary, patent documents show cultivar releases for M. spicata: (https://patents.google.com/patent/USPP8645P/en; https://patents.google.com/patent/US20120204283A1/en; https://patents.google.com/patent/US20050044600), M. arvensis (https://patents.justia.com/patent/PP10935), and an interspecies hybrid: (https://patents.google.com/patent/USPP12030). Besides patents, future cultivars could be released under Plant Variety Protection (PVP), or as germplasm releases through the USDA.

The origin of 'Black Mitcham' peppermint, the most widely grown American cultivar in the US and Europe, is unknown, because it resulted from natural hybridization events. However, phylogenetic studies (Tucker et al., 1980; Tucker and Naczi, 2007) suggest that genotypes of M. longifolia and M. suaveolens are the diploid ancestors of M. spicata, which then further hybridized with M. aquatica to give rise to M. ×piperita. Therefore, efforts over the past 50 years have focused on evaluation and crossing of USDA NCGR accessions of M. longifolia, M. suaveolens, and M. aquatica. M. longifolia showed the most obvious phenotypic diversity, with variation in growth habit, leaf morphology, and oil type (Vining et al., 2005). M. suaveolens and M. aquatica showed relatively lower morphological and oil type diversity (Vining et al., 2019).

In V. dahliae inoculation tests, both M. longifolia and M. aquatica showed a range of wilt disease resistance to susceptibility, with a few accessions showing high susceptibility, a few showing high resistance, and most displaying mild to moderate susceptibility levels (Vining et al., 2005; Vining et al., 2019). M. suaveolens accessions, in contrast, were highly resistant, with the exception of one moderately susceptible accession (Vining et al., 2019). The contrast in overall wilt disease resistance between M. suaveolens and the other two species led to speculation that verticillium wilt disease resistance was genetically linked to a spearmint oil type. However, one carvone-type M. longifolia accession, CMEN 584, was among the most highly wilt-susceptible accessions of that species. Segregating F1 and F2 populations derived from crossing CMEN 584 with wilt-resistant M. longifolia accession CMEN 585 also showed a range of wilt resistance to susceptibility, with wilt susceptibility not associated with any particular oil phenotype (Hummer, pers. comm.). It is possible that the higher wilt resistance in M. suaveolens results from homozygosity at particular loci or even a different genetic resistance from the other two species.

Transgressive segregation, a phenomenon in which progeny traits differ from those of either parent, is well documented in oil types resulting from Mentha crosses (Tucker, 2012). Since both monoterpene profile and verticillium wilt resistance are polygenic traits, transgressive segregation can be expected for verticillium wilt resistance as well. With the advent of the M. longifolia reference genome, the genetic underpinnings of these traits are now actively being studied. The complex regulatory mechanisms governing both monoterpene profile and verticillium wilt resistance will be a challenge to breeding.
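Why polygenic traits can segregate transgressively can be seen in a deliberately simplified model, sketched here in Python. The assumptions are ours and not from the cited studies: a diploid cross, four unlinked loci, purely additive allele effects (each "plus" allele adds one unit), and two parents that carry their plus alleles at different loci. All names in the code are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Toy additive model (an assumption for illustration): each genotype is a list of
# loci, each locus a pair of alleles coded 1 ("plus") or 0 ("minus").
parent_1 = [(1, 1), (1, 1), (0, 0), (0, 0)]  # plus alleles at loci 1 and 2
parent_2 = [(0, 0), (0, 0), (1, 1), (1, 1)]  # plus alleles at loci 3 and 4


def score(genotype):
    """Trait value under the additive model: the number of plus alleles."""
    return sum(a + b for a, b in genotype)


def gametes(genotype):
    """All equally likely gametes, assuming independent assortment of unlinked loci."""
    return list(product(*genotype))


def cross_scores(p1, p2):
    """Trait values of all equally likely offspring from the cross p1 x p2."""
    return [score(list(zip(ga, gb))) for ga in gametes(p1) for gb in gametes(p2)]


# Both parents are homozygous, so every F1 plant has the same genotype.
f1 = list(zip(gametes(parent_1)[0], gametes(parent_2)[0]))
f2_scores = cross_scores(f1, f1)

low, high = sorted((score(parent_1), score(parent_2)))
transgressive = Fraction(
    sum(s < low or s > high for s in f2_scores), len(f2_scores)
)

if __name__ == "__main__":
    print("parent scores:", score(parent_1), score(parent_2))
    print("F1 score:", score(f1))
    print("F2 range:", min(f2_scores), "to", max(f2_scores))
    print("fraction of F2 outside the parental range:", transgressive)
```

In this toy case both parents and the F1 have the same trait value, yet the F2 spans the full range because plus alleles from both parents can be combined or lost. Real Mentha crosses involve polyploidy, linkage, and non-additive effects that this sketch ignores.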

In conclusion, mint has a long history of human use and influence and has received relatively recent (within the past century) and increasing attention, mostly from private breeding programs. With the advent of genome sequencing, efforts are underway to develop molecular markers for alleles of monoterpene biosynthesis genes and for verticillium wilt disease resistance genes. Going forward, marker assisted selection will increase in importance to mint breeding programs.

FIGURE 7 | Example of SSR genotyping with M. aquatica.

### AUTHOR CONTRIBUTIONS

All authors contributed equally to this manuscript.

### FUNDING

Funding for the collection, maintenance, distribution, and evaluation of the NCGR Mentha collection was provided by USDA ARS PWA CRIS 2072-21000-049-00D.

### REFERENCES


Doebley, J. F., Gaut, B. S., and Smith, B. D. (2006). The molecular genetics of crop domestication. Cell 127, 1309–1321. doi: 10.1016/j.cell.2006.12.006


### ACKNOWLEDGMENTS

We appreciate the taxonomic assistance of Henrietta Chambers and Arthur Tucker in species determinations of the NCGR Mentha collections. We appreciate the technical assistance of James Oliphant, Jeanine DeNoma, Debra Hawkes, Melissa Fix, and Sunny Green.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.01217/full#supplementary-material


Fuchs, L. (1543). De historia stirpium commentarii insignes (Basel, Switzerland).


Yasudori, T. (984). 醫心方, Ishinpō Tokyo, Japan.


Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling editor declared past co-authorship with several of the authors [NB, KV].

Copyright © 2020 Vining, Hummer, Bassil, Lange, Khoury and Carver. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
