EDITED BY: Isabel Allona, Matias Kirst, Wout Boerjan, Steven Strauss and Ronald Sederoff

PUBLISHED IN: Frontiers in Plant Science

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-178-0 DOI 10.3389/978-2-88963-178-0

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journal Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# FOREST GENOMICS AND BIOTECHNOLOGY

Topic Editors:

Isabel Allona, Universidad Politécnica de Madrid, Spain

Matias Kirst, University of Florida, United States

Wout Boerjan, Ghent University, Belgium

Steven Strauss, Oregon State University, United States

Ronald Sederoff, North Carolina State University, United States

Top: Stem cross-section of three-month-old *P. tremula x alba* genotype INRA 717-1B4 plantlets. Cell walls are stained with toluidine blue and vessels are white. Black bar represents 1 mm. Picture taken by Cintia Leite Ribeiro from Matias Kirst's lab.

Middle: The SEM images are stem cross-sections of six-month-old *P. trichocarpa* solubilized by *Caldicellulosiruptor bescii*, for wild type (left) and transgenic lines (right) with downregulated expression of PtrCAD1 and PtrCAD2 (Straub et al., 2019; Nat. Commun. 10:2548). Black bar represents 5 µm. The SEM pictures were taken by Dr. Ilona Peszlen (NCSU).

Bottom: *E. grandis x urophylla* hybrid plantation of five-year-old trees, VERACEL CELULOSE S.A., Municipality of Eunapolis, Southern Bahia, Brazil. Picture taken by Steve Strauss.

Image Design: Jack Wang and Ronald Sederoff

This Research Topic addresses research in genomics and biotechnology to improve the growth and quality of forest trees for wood, pulp, biorefineries and carbon capture.

Forests are the world's greatest repository of terrestrial biomass and biodiversity. Forests provide critical ecological services, supporting the preservation of fauna, flora, and water resources. Planted forests also offer a renewable source of timber, of fiber for pulp and paper production, and of feedstocks for biorefineries. Despite their fundamental role in society, thousands of hectares of forests are lost annually to deforestation, pests, pathogens, and urban development. As a consequence, there is an increasing need to develop trees that are more productive under lower inputs, and to understand how trees adapt to the environment and respond to biotic and abiotic stress.

Forest genomics and biotechnology, disciplines that study the genetic composition of trees and the methods required to modify them, began over a quarter of a century ago with the development of the first genetic maps and establishment of early methods of genetic transformation. Since then, genomics and biotechnology have impacted all research areas of forestry. Genome analyses of tree populations have uncovered genes involved in adaptation and response to biotic and abiotic stress. Genes that regulate growth and development have been identified, and in many cases their mechanisms of action have been described. Genetic transformation is now widely used to understand the roles of genes and to develop germplasm that is more suitable for commercial tree plantations. However, in contrast to many annual crops that have benefited from centuries of domestication and extensive genomic and biotechnology research, in forestry the field is still in its infancy. Thus, tremendous opportunities remain unexplored.

This Research Topic aims to briefly summarize recent findings, to discuss long-term goals, and to think ahead about future developments and how they can be applied to improve the growth and quality of forest trees.

Citation: Allona, I., Kirst, M., Boerjan, W., Strauss, S., Sederoff, R., eds. (2019). Forest Genomics and Biotechnology. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-178-0

# Table of Contents

*Editorial: Forest Genomics and Biotechnology*

Isabel Allona, Matias Kirst, Wout Boerjan, Steven Strauss and Ronald Sederoff

## 1 GENE DISCOVERY

Alexander A. Myburg, Steven G. Hussey, Jack P. Wang, Nathaniel R. Street and Eshchar Mizrachi

## 2 GENE TRANSFER AND GENETIC ENGINEERING

William Patrick Bewg, Dong Ci and Chung-Jui Tsai

*Strategies for Engineering Reproductive Sterility in Plantation Forests*

Steffi Fritsche, Amy L. Klocko, Agnieszka Boron, Amy M. Brunner and Glenn Thorlby

## 3 GENETIC MAPPING AND QUANTITATIVE TRAIT ANALYSIS

*Genome-Wide Association Studies to Improve Wood Properties: Challenges and Prospects*

Qingzhang Du, Wenjie Lu, Mingyang Quan, Liang Xiao, Fangyuan Song, Peng Li, Daling Zhou, Jianbo Xie, Longxin Wang and Deqiang Zhang

*Quantitative Genetics and Genomics Converge to Accelerate Forest Tree Breeding*

Dario Grattapaglia, Orzenil B. Silva-Junior, Rafael T. Resende, Eduardo P. Cappa, Bárbara S. F. Müller, Biyue Tan, Fikret Isik, Blaise Ratcliffe and Yousry A. El-Kassaby

## 4 GROWTH AND DEVELOPMENT

*Manipulation of Growth and Architectural Characteristics in Trees for Increased Woody Biomass Production*

Victor B. Busov

*Engineering Tree Seasonal Cycles of Growth Through Chromatin Modification*

Daniel Conde, Mariano Perales, Avinash Sreedasyam, Gerald A. Tuskan, Alba Lloret, María L. Badenes, Pablo González-Melendi, Gabino Ríos and Isabel Allona

## 5 BIOTIC AND ABIOTIC STRESS

## 6 CYBERINFRASTRUCTURE

*Cyberinfrastructure to Improve Forest Health and Productivity: The Role of Tree Databases in Connecting Genomes, Phenomes, and the Environment*

Jill L. Wegrzyn, Margaret A. Staton, Nathaniel R. Street, Dorrie Main, Emily Grau, Nic Herndon, Sean Buehler, Taylor Falk, Sumaira Zaman, Risharde Ramnath, Peter Richter, Lang Sun, Bradford Condon, Abdullah Almsaeed, Ming Chen, Chanaka Mannapperuma, Sook Jung and Stephen Ficklin

# Editorial: Forest Genomics and Biotechnology

#### *Isabel Allona1,2\*†, Matias Kirst3, Wout Boerjan4,5, Steven Strauss6 and Ronald Sederoff7\*†*

*1 Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo, Pozuelo de Alarcón, Madrid, Spain, 2 Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain, 3 School of Forest Resources and Conservation, University of Florida, Gainesville, FL, United States, 4 Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium, 5 Center for Plant Systems Biology, VIB, Ghent, Belgium, 6 Department of Forest Ecosystems and Society, Oregon State University, Corvallis, OR, United States, 7 Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, United States*

Keywords: tree, forest plantation, conifers, angiosperms, wood

**Editorial on the Research Topic**

**Forest Genomics and Biotechnology**

#### *Edited and reviewed by:*

*Peng Zhang, Shanghai Institutes for Biological Sciences (CAS), China*

#### *\*Correspondence:*

*Isabel Allona isabel.allona@upm.es Ronald Sederoff ron\_sederoff@ncsu.edu*

#### *†ORCID:*

*Isabel Allona orcid.org/0000-0002-7012-2850 Ronald Sederoff orcid.org/0000-0001-6819-7047*

#### *Specialty section:*

*This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science*

*Received: 04 August 2019 Accepted: 29 August 2019 Published: 11 October 2019*

#### *Citation:*

*Allona I, Kirst M, Boerjan W, Strauss S and Sederoff R (2019) Editorial: Forest Genomics and Biotechnology. Front. Plant Sci. 10:1187. doi: 10.3389/fpls.2019.01187*

Forest biotechnology can be said to have begun with the construction of the first transgenic tree: a bacterial gene imparting glyphosate resistance (EPSP synthase) was introduced into a hybrid poplar (Fillatti et al., 1987). This achievement required the development of three technologies: gene discovery, gene transfer, and *in vitro* plant regeneration. These powerful tools advanced the investigation of the unique biology of forest trees, including their unusual reproductive features, their woody and perennial growth habit, and their mechanisms of adaptation to abiotic and biotic stress. In addition, biotechnology has advanced the practical application of molecular genetics to tree breeding by expanding options for the production of pulp, wood, and energy products (Huang et al., 1993; Baucher et al., 2010; Allwright and Taylor, 2016).

While biotechnology established the tools necessary to modify genes, genomics provided a new platform for the high-throughput genetic analysis of forest trees. Genomics was founded on two technologies: genetic mapping and DNA sequencing. Genetic mapping provided the location of genes, allowing their positions to be associated with function. Previously, genetic maps were made using isozyme loci as markers (Conkle, 1980), but the number of loci that could be sampled was limited. Restriction fragment length polymorphisms (RFLPs), PCR amplification, and high-throughput DNA sequencing led to an extraordinary expansion in the regions of the genome that could be surveyed (Botstein et al., 1980; Williams et al., 1990; Vos et al., 1995; Wang et al., 1998). Consequently, genetic maps can now be readily constructed for forest trees, in which genetic analysis of quantitative traits had previously been impossible (Kirst et al., 2004). Through genetic mapping and segregation analysis, a trait can readily be shown to be monogenic, oligogenic, or polygenic in its genetic architecture.
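Positions on such genetic maps are estimated from the frequency of recombination between markers in segregating progeny. The sketch below is a minimal illustration that assumes Haldane's mapping function, which ignores crossover interference (Kosambi's function is a common alternative). The function names are hypothetical and the counts in the usage line are invented.

```python
import math


def recombination_fraction(recombinants: int, progeny: int) -> float:
    """Fraction of progeny that are recombinant between two markers."""
    if progeny <= 0 or not 0 <= recombinants <= progeny:
        raise ValueError("need 0 <= recombinants <= progeny and progeny > 0")
    return recombinants / progeny


def haldane_cm(r: float) -> float:
    """Convert a recombination fraction r (0 <= r < 0.5) to map distance in
    centimorgans with Haldane's mapping function: d = -50 * ln(1 - 2r)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical example: 12 recombinants among 100 progeny.
r = recombination_fraction(12, 100)
print(f"r = {r:.2f}, distance = {haldane_cm(r):.1f} cM")
```

For small recombination fractions the distance is close to 100·r cM; the correction grows as r approaches 0.5, where markers behave as unlinked.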

Large-scale genetic mapping of forest trees was first carried out using haploid genetic analysis of conifer megagametophytes and PCR-based anonymous markers (Carlson et al., 1991; Grattapaglia et al., 1992; O'Malley et al., 1996). The concept of mapping with anonymous dominant or codominant markers was extended to diploid crosses using "pseudotestcross" strategies (Grattapaglia and Sederoff, 1994). As sequencing technology advanced, anonymous markers were replaced by sequence-based markers: markers associated with genes, single nucleotide polymorphisms (SNPs) (Eckert et al., 2009), or variation in repeated sequences (Echt et al., 2011).
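In a pseudotestcross, a dominant marker that is heterozygous in one parent and absent from the other is expected to segregate 1:1 (present:absent) among the progeny, and markers that deviate strongly from that ratio are usually examined before map construction. A minimal sketch of the corresponding chi-square goodness-of-fit test follows, using only the Python standard library; the function name and the counts are hypothetical. With one degree of freedom, the upper-tail p-value equals erfc(√(χ²/2)).

```python
import math


def chi_square_1_to_1(present: int, absent: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test against a 1:1 segregation ratio.

    Returns (chi2, p_value). With two classes there is one degree of freedom,
    for which the upper-tail probability is erfc(sqrt(chi2 / 2)).
    """
    n = present + absent
    if n == 0:
        raise ValueError("no progeny scored")
    expected = n / 2
    chi2 = ((present - expected) ** 2 + (absent - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical marker scored in 100 progeny: 52 with the band, 48 without.
chi2, p = chi_square_1_to_1(52, 48)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```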

Forest biotechnology has been strongly influenced by the Human Genome Project (HGP) (Venter et al., 2001). The new technology of the HGP advanced the study of many species and was important for species such as forest trees, which had previously been difficult to study because of their large size and long generation times. Genome sequencing of forest trees (Tuskan et al., 2006) has brought about a host of new technologies ("omics") with which gene expression and function can be studied for single genes, for large gene families, and even for all of the expressed genes in a specific tissue or cell type. Omics approaches for populations of RNA molecules (transcriptomics), proteins (proteomics), and metabolites (metabolomics) have been or are being applied to forest tree species (Wagner et al., 2012; Wang et al., 2014; Wang et al., 2018).

While the new "omics" technologies characterize the activity of biological systems, the understanding about the relationship among them is complex and require mathematical modeling to provide predictive power (Wang et al., 2014). Systems and synthetic biology of forest trees reflect the growing collaboration of engineering and molecular biology. Systems biology integrates levels of information and technology deriving from genomics, such as genome sequencing, transcriptomics, epigenomics, proteomics, metabolomics, and imaging. While systems biology uses information from existing species, synthetic biology goes beyond what already exists in nature by reassembling or inventing novel gene sequences and functions. The goals of systems and synthetic biology are to improve the efficiency of metabolic flux, to redesign pathways, or to create novel ones.

A thorough understanding of metabolic and developmental pathways may not only provide innovations in systems design, but could ultimately prove critical for the survival of natural and planted forests. Forests around the world continue to be threatened by the increasing introduction of nonnative pests and pathogens through world trade and travel. Epidemics of native pests and pathogens have also increased due to the destabilizing effects of climate change, which imposes increased abiotic stress on tree populations. Many species of forest trees may soon be lost, affecting entire ecosystems and leading to losses in ecosystem services and biodiversity (NASEM, 2019). Biotechnology could increase our understanding of host-pathogen interactions and could also aid in the development of new genotypes able to resist new biotic and abiotic stresses. This will require a major new allocation of resources to studies of forest tree biology, and the genetic modification of forest tree populations will require major changes to highly restrictive regulations and to the exclusion of genetically modified trees from markets (Strauss et al., 2015).

This volume is organized into six sections: (1) Gene Discovery, (2) Gene Transfer and Genetic Engineering, (3) Genetic Mapping and Quantitative Trait Analysis, (4) Growth and Development, (5) Biotic and Abiotic Stress, and (6) Cyberinfrastructure.

(1) Gene Discovery. Great attention has been paid to the discovery and manipulation of genes involved in wood formation. Because of the industrial importance of pulp and paper processing, and the use of wood for biofuels and solid wood products, it has become critical to identify and characterize the genes that determine the chemical and physical properties of wood. Wang et al. (in this volume) have reviewed the status of protein-protein interactions in lignin precursor biosynthesis, and Chanoca et al. (in this volume) have reviewed efforts to reduce biomass recalcitrance by engineering lignin quantity and composition. Engineering of noncellulosic polysaccharides in wood is discussed by Donev et al. (in this volume), with the goal of modifying the chemical content of wood (particularly sugars) and its physical properties, which affect the interactions of hemicelluloses with lignin and cellulose. Many forest trees accumulate high levels of secondary metabolites for defense against pests and pathogens. Conifers and other trees, such as eucalypts, accumulate terpenes in wood that can be extracted and used as a renewable chemical feedstock. Peter (in this volume) has reviewed the status of genetic engineering and breeding approaches to increase terpene abundance and thereby increase the value of plantation forest trees. Myburg et al. (in this volume) integrated systems biology, systems genetics, and synthetic biology to propose a new paradigm for the production of chemical feedstocks from woody biomass and for a multitude of other wood products.

(2) Gene Transfer and Genetic Engineering. The slow and inefficient transfer, incorporation, and expression of specific genes in forest trees continue to be a major barrier to progress. Even more demanding is the subsequent requirement for the transformed cell to dedifferentiate, divide, and regenerate organs or embryos expressing the inserted gene. Only a small number of tree genotypes and species have *in vitro* regeneration systems able to withstand the stress of DNA insertion and express the plasticity needed for embryogenesis or organogenesis. Nagle et al. (in this volume) have reviewed the challenges and opportunities for DNA transformation in forest trees. The use of development-stimulating genes such as WUSCHEL appears highly promising, as do *in vivo* approaches. Bewg et al. (in this volume) summarized the status of gene editing in trees using CRISPR-Cas9 technology. CRISPR technology is highly efficient for making knock-outs and other alterations in trees, with few off-target effects. Sterility has been the leading strategy to mitigate gene flow from future plantations that contain genetically modified trees. The minireview by Fritsche et al. (in this volume) outlines approaches to the containment of modified genes, and of exotic and invasive forest tree species, through sterility.
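Designing CRISPR-Cas9 knock-outs starts with locating candidate target sites. For the widely used *Streptococcus pyogenes* Cas9 (SpCas9), a site is a 20-nt protospacer lying immediately 5' of an NGG protospacer-adjacent motif (PAM). The sketch below only enumerates such sites on both strands; it does not score on-target efficiency or search a genome for off-targets, which dedicated design tools handle. The function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, protospacer_len: int = 20):
    """List candidate SpCas9 target sites: a protospacer of `protospacer_len`
    nucleotides immediately 5' of an NGG PAM, on either strand.

    Each hit is (strand, start, protospacer, pam). For the '-' strand, `start`
    is a coordinate on the reverse-complement sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Zero-width lookahead so that overlapping PAMs (e.g. "AGGG") are all found.
        for match in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = match.start()
            if pam_start >= protospacer_len:
                start = pam_start - protospacer_len
                hits.append((strand, start, s[start:pam_start], match.group(1)))
    return hits


# Hypothetical exon fragment.
example = "ATGCTAGCTAGGCTAGCTAGCTAGCATCGATCGGATCGATCGCCTAGC"
for hit in find_spcas9_sites(example):
    print(hit)
```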

(3) Genetic Mapping and Quantitative Trait Analysis. One major benefit of genomics has been the resulting integration of quantitative and molecular genetics. Now that genome sequencing can, in principle, identify all the genes, the genetic basis of complex quantitative traits can potentially be identified and characterized more directly. Wood properties are under moderate to strong genetic control (Porth et al., 2012) but are also influenced by environmental factors such as season, rainfall, and variation of the gravity vector (Plomion et al., 2001). Du et al. (in this volume) use wood formation as a model system for investigating the genetic architecture and regulatory mechanisms of quantitative traits in forest trees. They reviewed recent progress in genome-wide association studies (GWAS) of wood properties as a tool for functional genomics and their potential for molecular breeding. Grattapaglia et al. (in this volume) reviewed the application of genomics to tree breeding and describe how genomic information may be used to improve selection. Genomic selection (GS) may allow predictive markers to accelerate the selection of elite genotypes and the discovery of genetic factors contributing to quantitative traits.
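Genomic selection is often implemented with whole-genome regression models such as ridge-regression BLUP (RR-BLUP), in which all marker effects are estimated jointly and shrunk toward zero. The following is a minimal NumPy sketch, assuming genotypes coded 0/1/2, simulated data, and a fixed shrinkage parameter `lam` (in RR-BLUP proper, λ = σ²e/σ²u comes from variance components, for example estimated by REML). The function names are hypothetical, and this is not the method of any specific study in this volume.

```python
import numpy as np


def fit_ridge_gs(genotypes, phenotypes, lam):
    """Ridge regression of phenotypes on marker genotypes (coded 0/1/2).

    All marker effects are shrunk toward zero by the same penalty `lam`,
    as in RR-BLUP; here `lam` is supplied by the user rather than estimated.
    Returns (intercept, marker_means, marker_effects).
    """
    Z = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    marker_means = Z.mean(axis=0)
    Zc = Z - marker_means                      # center each marker
    intercept = y.mean()
    n_markers = Z.shape[1]
    lhs = Zc.T @ Zc + lam * np.eye(n_markers)  # penalized normal equations
    effects = np.linalg.solve(lhs, Zc.T @ (y - intercept))
    return intercept, marker_means, effects


def predict_gebv(model, genotypes):
    """Genomic estimated breeding values for new (unphenotyped) individuals."""
    intercept, marker_means, effects = model
    Z = np.asarray(genotypes, dtype=float)
    return intercept + (Z - marker_means) @ effects


# Hypothetical simulated data: 300 trees, 200 SNPs, 10 causal markers.
rng = np.random.default_rng(42)
Z = rng.integers(0, 3, size=(300, 200))
true_effects = np.zeros(200)
true_effects[rng.choice(200, size=10, replace=False)] = rng.normal(0, 1, size=10)
y = Z @ true_effects + rng.normal(0, 1.5, size=300)

model = fit_ridge_gs(Z[:200], y[:200], lam=50.0)  # training population
gebv = predict_gebv(model, Z[200:])               # selection candidates
print("prediction accuracy r =", np.corrcoef(gebv, y[200:])[0, 1].round(2))
```

Prediction accuracy, the correlation between predicted and observed values in individuals not used for training, is the usual yardstick; in practice it is estimated by cross-validation rather than a single split.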

(4) Growth and Development. The growth habit and architecture of forest trees have a major effect on their ecological roles and commercial value (Zobel and Jett, 1995; Holliday et al., 2017). Busov (in this volume) has reviewed the regulation of crown architecture, secondary growth, wood formation, and adventitious rooting, all complex traits governed by a number of molecular mechanisms.

Chromatin modification is thought to be the basis for a wide spectrum of changes in gene expression in response to developmental or environmental signaling. Only recently have the extent and roles of chromatin remodeling in forest trees been explored. In this volume, Conde et al. have reviewed this progress focusing on the signatures of chromatin regulation during active growth and seasonal dormancy.

Molecular switches, sensitive to day length and temperature, drive poplar phenology. Short days and low temperatures trigger the sequential induction of ethylene and abscisic acid signaling pathways, bud maturation and the establishment of dormancy. Transcriptional profiling and genetic association studies in poplar by Maurya et al. (in this volume) describe a platform for studying how the environment affects the molecular switches that drive phenology.

A major distinction in forest trees is based on the independent origin of gymnosperms (softwoods) and woody angiosperms (hardwoods). Tuskan et al. (in this volume) have described how, as genome sequence information increases and gene function is better understood, much will be learned about the evolution of these very different lineages with respect to their adaptation to variable environments.

(5) Biotic and Abiotic Stress. The world's forest tree species are under threat from the globalization of pests and pathogens. At the same time, tree species are increasingly susceptible to these threats due to the increased stress from global climate change. Increased efforts are being made to find or create genetic resistance, either through breeding or genetic engineering. Naidoo et al. (in this volume) have proposed a strategy for the investigation of defense mechanisms that could lead to the development of superior genotypes with enhanced resistance to biotic stress, integrating quantitative and qualitative resistance with the additional contributions of microbial endophytes and the root-associated microbiomes.

Cold hardiness can affect the natural range of forest trees or the establishment of exotic species in new environments (Hinchee et al., 2011). Wisniewski et al. (in this volume) have reviewed the current knowledge of cold hardiness, and the efforts to improve it through transgenic approaches. Cold hardiness is a complex trait involving avoidance, tolerance, seasonal stages, and dormancy.

Nitrogen availability, as ammonium or nitrate, and nitrogen use efficiency are limiting factors for the growth and development of forest trees. Nitrogen reserves also affect dormancy and nitrogen cycling in forest ecosystems. Nitrogen affects both primary and secondary metabolism. Canovas et al. (in this volume) summarize advances in forest trees in the functional characterization of genes affecting the molecular regulation of acquisition, assimilation, and internal recycling of nitrogen.

One of the major effects of climate change is manifested in the water cycle, where droughts are expected to become more frequent and more extreme. Polle et al. (in this volume) review approaches to identify genes that could modify drought tolerance, based on knowledge of the molecular physiology of responses to drought stress.

(6) Cyberinfrastructure. Databases have become essential to forest biotechnology as genomic analysis, transcriptomics, metabolomics, and image analysis become accessible tools for genetic engineering and systems biology of forest trees. Wegrzyn et al. (in this volume) describe the existing databases, the focus of each, and how they interact to provide a synergistic cyberinfrastructure for the forest tree biotechnology community.

Forest genomics and biotechnology is a highly diverse international endeavor, which is advancing rapidly as new technology enables novel approaches and insights. It is an exciting time, and the reviews in this volume provide an excellent update about where things are, and where they are likely to go. Enjoy reading this special issue on Forest Genomics and Biotechnology!

# AUTHOR CONTRIBUTIONS

All authors participated either in the writing or editing of the editorial.

## FUNDING

The work in IA's laboratory is supported by Grant PGC2018-093922-B-I00 and by SEV-2016-0672 (2017-2021) to the CBGP "Severo Ochoa Programme for Centres of Excellence in R&D" from the Agencia Estatal de Investigación of Spain. WB is indebted to the IWT-FISCH-SBO project ARBOREF (grant number 140894) and the IWT-SBO project BIOLEUM (grant number 130039).

## ACKNOWLEDGMENTS

We warmly thank all the authors who participated in this Research Topic.

#### REFERENCES


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Allona, Kirst, Boerjan, Strauss and Sederoff. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Enzyme-Enzyme Interactions in Monolignol Biosynthesis

*Jack P. Wang1,2, Baoguang Liu2,3, Yi Sun2, Vincent L. Chiang1,2 and Ronald R. Sederoff1\**

*1 Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, United States, 2 State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China, 3 Department of Forestry, Beihua University, Jilin, China*

The enzymes that make up the monolignol biosynthetic pathway have been studied intensively for more than half a century. A major interest has been the role of the pathway in the biosynthesis of lignin and the role of lignin in the formation of wood. The pathway has typically been conceived either as linear steps that convert phenylalanine into three major monolignols or as a network of enzymes in a metabolic grid. Potential interactions of enzymes have been investigated to test models of metabolic channeling or of higher-order interactions. Evidence for enzymatic or physical interactions has been fragmentary and limited to a few enzymes studied in different species. Only recently has the entire pathway been studied comprehensively in a single plant species. Support for interactions comes from new studies of enzyme activity, co-immunoprecipitation, chemical crosslinking, bimolecular fluorescence complementation, yeast two-hybrid functional screening, and cell type-specific gene expression based on laser capture microdissection. The most extensive experiments have been done on differentiating xylem of *Populus trichocarpa*, where genomic, biochemical, chemical, and cellular experiments have been carried out. Interactions affect the rate, direction, and specificity of both 3- and 4-hydroxylation in the monolignol biosynthetic pathway. Three monolignol P450 monooxygenases form heterodimeric and heterotetrameric protein complexes that activate specific hydroxylation of cinnamic acid derivatives. Other interactions include regulatory kinetic control of 4-coumarate:CoA ligases through subunit specificity and interactions between a cinnamyl alcohol dehydrogenase and a cinnamoyl-CoA reductase. Monolignol enzyme interactions with other pathway proteins have been associated with biotic and abiotic stress responses. Evidence challenging or supporting metabolic channeling in this pathway is discussed.

Keywords: monolignol biosynthesis, protein-protein interaction, lignin, enzyme kinetics, BiFC, co-immunoprecipitation

#### *Edited by:*

*Wout Boerjan, Flanders Institute for Biotechnology, Belgium*

#### *Reviewed by:*

*Markus Pauly, Heinrich Heine Universität Düsseldorf, Germany Chang-Jun Liu, Brookhaven National Laboratory (DOE), United States*

*\*Correspondence: Ronald R. Sederoff ron\_sederoff@ncsu.edu*

#### *Specialty section:*

*This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science*

*Received: 05 July 2018 Accepted: 13 December 2018 Published: 11 January 2019*

#### *Citation:*

*Wang JP, Liu B, Sun Y, Chiang VL and Sederoff RR (2019) Enzyme-Enzyme Interactions in Monolignol Biosynthesis. Front. Plant Sci. 9:1942. doi: 10.3389/fpls.2018.01942*

## INTRODUCTION

Metabolic pathways are typically conceived as sequences of enzymatic events that are linear, branched, or sometimes circular, which is how they are described in charts and textbooks. It is standard procedure in biochemistry to study enzyme reactions in highly dilute aqueous solutions, because that is what is needed to isolate and characterize specific enzymes. As a result, the roles of structure and compartmentation may not be adequately considered. We often assume that each enzyme acts independently, in part because we must purify each enzyme to learn its essential properties, unconfounded by the activity of other enzymes. In biochemical reality, enzymes and pathways occur in three-dimensional space and are likely to be associated with other proteins, polymers, membranes, and intracellular structures.

Much of our genetic theory of selection is based on the concept of genes acting independently, at least in part because it provides a computational simplification, although interactions of genes and proteins abound. Genomics provides a new platform for investigating molecular interactions and structure because all members of gene and protein families can now be identified and characterized, so that molecular interactions can be assayed in a far more comprehensive way. We can investigate the role of structure in pathways, beginning with protein-protein interactions, using new tools to detect genome-wide enzymatic and physical interactions.

FIGURE 1 | Interacting enzymes in the monolignol biosynthetic pathway. Colored boxes represent the interacting enzyme families. Red lines represent interactions reported in the literature that involve proteins outside the core monolignol pathway. Abbreviations: phenylalanine ammonia-lyase (PAL); cinnamate 4-hydroxylase (C4H); coumarate 3-hydroxylase (C3H); 4-coumarate:coenzyme A ligase (4CL); hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyl transferase (HCT); caffeoyl shikimate esterase (CSE); caffeoyl coenzyme A O-methyltransferase (CCoAOMT); cinnamoyl-CoA reductase (CCR); coniferaldehyde 5-hydroxylase (CAld5H); 5-hydroxyconiferaldehyde O-methyltransferase (COMT); cinnamyl alcohol dehydrogenase (CAD); membrane steroid-binding protein (MSBP); cytochrome P450 reductase (CPR); isoflavone synthase (CYP93C1); plant disease resistance gene 1 (Rp1); Rac family small GTPase 1 (Rac1).

The lignin biosynthetic pathway in vascular plants (**Figure 1**) has been intensively studied for the greater part of a century (Payen, 1839) because of its fundamental role in the formation of the plant secondary cell wall (Sarkanen and Ludwig, 1971; Harada and Côté, 1985; Boerjan et al., 2003; Weng and Chapple, 2010) and for its practical implications in wood formation (Sarkanen and Ludwig, 1971; Hu et al., 1999; Chiang, 2002; Rastogi and Dwivedi, 2006; Chen and Dixon, 2007; Vanholme et al., 2008; Novaes et al., 2010; Wang et al., 2014; Wang et al., 2018). The pathway begins in the cytoplasm with the deamination of phenylalanine and proceeds stepwise through modification of the phenyl ring and reduction of the propanoid side chain to produce the monolignols, which are translocated to the apoplast to form the lignin polymer (**Figure 1**) (Higuchi, 1985; Boerjan et al., 2003; Wang et al., 2014). Although each step of the pathway is a reversible reaction that can be described by Michaelis-Menten kinetics (Wang et al., 2014), the formation of the lignin polymer by oxidative polymerization is irreversible.
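To make the Michaelis-Menten description of a single pathway step concrete, the sketch below computes an initial rate for one enzyme. It is a hypothetical illustration, not the model of Wang et al. (2014): the function name `mm_rate` is invented here, it models only the forward rate (a reversible rate law would add a product-dependent term), and, as an assumption, it adds the standard competitive, uncompetitive, and substrate (self-) inhibition terms of the kind the kinetic models discussed in this review incorporate.

```python
import math


def mm_rate(s, vmax, km, i=0.0, kic=math.inf, kiu=math.inf, ksi=math.inf):
    """Initial forward rate of one enzymatic step (hypothetical helper).

    s    -- substrate concentration (same units as km)
    vmax -- maximal velocity
    km   -- Michaelis constant
    i    -- inhibitor concentration
    kic  -- competitive inhibition constant (math.inf means no such inhibition)
    kiu  -- uncompetitive inhibition constant (math.inf means none)
    ksi  -- substrate (self-) inhibition constant (math.inf means none)
    """
    # Competitive inhibition raises the apparent Km.
    competitive = 1.0 + i / kic
    # Uncompetitive inhibition scales the saturating term,
    # lowering the apparent Vmax (and the apparent Km).
    uncompetitive = 1.0 + i / kiu
    # Substrate inhibition adds an s**2 term that dominates at high s.
    return vmax * s / (km * competitive + s * uncompetitive + s * s / ksi)


# Without inhibitors, the rate is half-maximal when s equals km.
print(mm_rate(s=2.0, vmax=10.0, km=2.0))                  # 5.0
print(mm_rate(s=2.0, vmax=10.0, km=2.0, i=1.0, kic=1.0))  # competitive: below 5.0
print(mm_rate(s=2.0, vmax=10.0, km=2.0, ksi=4.0))         # 4.0
```

Leaving every inhibition constant at infinity reduces the expression to the plain Michaelis-Menten form, so one function covers both the uninhibited and inhibited cases.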

The lignin pathway has been variously described as linear, branched, or a metabolic grid. The pathway differs among taxa, reflecting the distinction between hardwoods (angiosperms), which make a mixed polymer from guaiacyl and syringyl monolignols, and softwoods (gymnosperms), which make a purely guaiacyl lignin (Mottiar et al., 2016). Different cell types have different lignin compositions (Fergus and Goring, 1970a,b; Zhou et al., 2011; Shi et al., 2017): vessel cells in poplar xylem have a guaiacyl-rich lignin, while adjacent fiber cells have a syringyl-rich lignin. Studies have typically not been comprehensive, in that the enzymes of the pathway have not all been studied in the same cell types, and comparisons across species have been both informative and confusing. The full extent of variation in lignin at the cell, tissue, and species levels has not yet been determined.

Our previous studies have shown that protein-protein interactions can have unexpected and dramatic effects on metabolic flux through the monolignol pathway (Chen et al., 2011, 2014; Naik et al., 2018; Yan et al., 2018). These interactions can affect both the extent and the direction of flux. The purpose of this short review is to describe the interactions found to date and to guide the discovery of new ones.

#### Early Concepts and Evidence for Interactions

The aqueous environment within cells differs substantially from that of typical *in vitro* biochemical reactions. Protein concentrations in cells are very high, whereas *in vitro* reaction mixtures are dilute by comparison (Mendes et al., 1995). In theory, compartmentation would increase the local concentrations of metabolites and proteins, leading to higher efficiencies (Ralston and Yu, 2006). In addition to compartmentation, the molecular organization of enzymes could increase efficiency, either through direct physical interaction of enzymes or through structures that bring metabolites into closer proximity. "Supramolecular complexes of sequential metabolic enzymes and cellular structural elements", called metabolons, have been proposed (Kuzin, 1970; Srere, 1985, 1987).

In 1974, Stafford proposed the existence of multienzyme complexes for phenylpropanoid metabolism because of the diversity of secondary products in the same cells and the need for a mechanism to regulate the complex series of biosynthetic pathways (Stafford, 1974). Support for multienzyme complexes came from observations of high-molecular-weight aggregates, differences in the utilization of metabolites of endogenous versus exogenous origin, and the existence of multiple isoforms of enzymes. None of this, however, was unequivocal direct evidence for multienzyme complexes (Stafford, 1974). Early evidence for membrane-bound enzyme complexes in monolignol biosynthesis came from studies of the formation of *p*-coumaric acid from phenylalanine in potato slices (Czichi and Kindl, 1975). Cinnamic acid added to a soluble reaction mixture was a less effective substrate for cinnamate 4-hydroxylase (C4H) than cinnamic acid formed by the phenylalanine ammonia-lyase (PAL) reaction (Czichi and Kindl, 1975, 1977). This observation indicated that endogenously formed cinnamic acid may be only partially equilibrated with exogenous cinnamic acid and that the interaction depends on microsomal integrity. PAL and C4H were subsequently proposed to interact and channel substrates for increased catalytic efficiency and regulation of flux (Rasmussen and Dixon, 1999; Achnine et al., 2004; Ralston and Yu, 2006). The concept of "metabolic channeling" refers to the transfer of the product of one enzyme directly to an interacting enzyme as its substrate, without the metabolite entering a metabolic pool (Winkel, 2004; Jorgensen et al., 2005). However, reconstitution of PAL and C4H in yeast revealed that metabolic channeling is not required for the conversion of phenylalanine to *p*-coumaric acid (Ro and Douglas, 2004). Measurements of the relative abundance of the 24 monolignol precursors also did not support the hypothesis of metabolic channels in this pathway (Chen et al., 2011). The intriguing hypothesis of channeling in monolignol biosynthesis remains to be substantiated.

Proteins in the endoplasmic reticulum (ER) are thought to act as anchors for multi-protein complexes (Ralston and Yu, 2006). Candidates for protein anchors in the phenylpropanoid pathway are the cytochrome P450s (Durst and Benveniste, 1993), the largest family of enzymes in plants. Three of the oxidative steps of the monolignol biosynthetic pathway are catalyzed by cytochrome P450s: C4H, coumarate 3-hydroxylase (C3H), and ferulate 5-hydroxylase (F5H), also known as coniferaldehyde 5-hydroxylase (CAld5H) (**Figure 1**). C4H and C3H act at early steps in the monolignol pathway, common to much of phenylpropanoid metabolism (Vogt, 2010), while CAld5H acts late, at lignin-specific steps (Osakabe et al., 1999). These P450s insert an oxygen atom into hydrophobic substrates and, as a result, increase their reactivity and solubility. Cytochrome P450s are monooxygenases that split molecular oxygen using electrons supplied by NADPH-cytochrome P450 reductases (CPRs), which need to be tightly coupled to the P450 (Omura, 2010). A hydrophobic membrane-spanning domain anchors the CPR to the ER. Evidence for a direct interaction of CPRs with P450s has been provided using atomic force microscopy and reconstituted phospholipid bilayer disks (Bayburt et al., 1998; Bayburt and Sligar, 2002). Surface residues of both the CPR and the P450 are thought to contribute to the CPR-P450 interaction. Because the number of P450s in plants far exceeds the number of CPRs, any given CPR might couple with many different P450s involved in other pathways (Ralston and Yu, 2006). Such interactions are suggested to be dynamic and transient because of this disparity in abundance.

#### Direct Physical Interaction of Ptr4CL3 and Ptr4CL5 in *Populus trichocarpa*

Genome sequence analysis identified 17 putative genes in *P. trichocarpa* with similarity to known 4CL-encoding genes (Shi et al., 2010). Of the 17, only two (*Ptr4CL3* and *Ptr4CL5*) encode functional enzymes for CoA ligation in stem differentiating xylem (Chen et al., 2013).

A simple experiment indicated that the activity of a mixture of the Ptr4CL3 and Ptr4CL5 isoforms was not additive (Chen et al., 2014). Further experiments demonstrated that the isoforms interact to form a tetramer, *in vitro* and *in vivo*, which appears to have a regulatory role (Chen et al., 2014). Laser microdissection was used to collect different cell types to determine whether both 4CL isoforms are present in the same cells. Most of the transcripts in differentiating xylem are in fiber cells (Chen et al., 2014; Shi et al., 2017; Wang et al., 2018), and isolated fiber cells obtained by microdissection showed that transcripts from both 4CL genes are co-expressed and abundant. Bimolecular fluorescence complementation (BiFC) was carried out by co-transfecting complementing fragments of yellow fluorescent protein (YFP), fused to Ptr4CL3 and Ptr4CL5, into *P. trichocarpa* protoplasts from differentiating xylem (Chen et al., 2014; Lin et al., 2014). Strong complementation was observed, indicating close spatial proximity (6–10 nm; Fan et al., 2008) between Ptr4CL3 and Ptr4CL5 in the protoplasts.

Further evidence of an interaction between Ptr4CL3 and Ptr4CL5 came from chemical crosslinking with dithiobis(succinimidyl propionate) (DSP), which forms crosslinks spanning the equivalent of an 8-carbon linkage, about 12 Å (Lomant and Fairbanks, 1976). A crosslinked mixture of Ptr4CL3 and Ptr4CL5 produced a band greater than 200 kDa on SDS-PAGE, consistent with a heterotetramer (Chen et al., 2014). Co-immunoprecipitation (Co-IP) also supported the existence of a protein complex involving Ptr4CL3 and Ptr4CL5 (Chen et al., 2014): antibody prepared against either Ptr4CL3 or Ptr4CL5 co-precipitated both proteins from extracts of differentiating xylem. Therefore, the complex can form both *in vitro* and *in vivo*. The stoichiometry of the complex was calculated to be a Ptr4CL3:Ptr4CL5 ratio of 3:1, based on a size estimate of ~240 kDa by SDS-PAGE and a monomer size of ~60 kDa (Chen et al., 2014). The ratio of Ptr4CL3 to Ptr4CL5 by molecular mass in the complex, determined by crosslinking, purification, and protein cleavage-isotope dilution mass spectrometry, was 2.69:1, supporting the conclusion of a heterotetramer with three subunits of Ptr4CL3 and one of Ptr4CL5. A mathematical model was developed that predicts the metabolic flux for mixtures of the two enzymes in the presence of multiple substrates, incorporating the activities of the monomers and tetramers, including competitive inhibition, uncompetitive inhibition, and self-inhibition (Chen et al., 2013, 2014). The model suggests that the Ptr4CL3-Ptr4CL5 complex improves the homeostatic properties of the pathway, increasing its stability by 22% (Naik et al., 2018).
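The subunit arithmetic behind the 3:1 conclusion can be made explicit. The sketch below is illustrative and rests on assumptions not stated in the source: that both isoforms have a monomer mass of about 60 kDa, so that the measured 2.69:1 ratio can be read as a subunit ratio, and that the ~240 kDa SDS-PAGE estimate corresponds to a whole number of subunits. The function names are hypothetical.

```python
def subunit_count(complex_kda, monomer_kda):
    """Nearest whole number of subunits for an estimated complex size."""
    return round(complex_kda / monomer_kda)


def best_integer_split(n_subunits, observed_ratio):
    """Return the (a, b) split of n_subunits whose a:b ratio is closest to observed_ratio.

    Assumes both subunit types have similar mass, so a mass ratio
    approximates a subunit (molar) ratio.
    """
    splits = [(a, n_subunits - a) for a in range(1, n_subunits)]
    return min(splits, key=lambda ab: abs(ab[0] / ab[1] - observed_ratio))


n = subunit_count(240, 60)          # ~240 kDa complex / ~60 kDa monomer
print(n)                            # 4
print(best_integer_split(n, 2.69))  # (3, 1)
```

For a tetramer the only candidate ratios are 1:3, 2:2, and 3:1, so a measured ratio of 2.69:1 lies far closer to 3:1 than to the 2:2 alternative.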

#### Membrane Protein Complexes Catalyze 4- and 3-Hydroxylation of Cinnamic Acids in *P. trichocarpa*

As discussed earlier, the entry steps of monolignol biosynthesis were hypothesized to involve metabolic channeling (Rasmussen and Dixon, 1999; Achnine et al., 2004; Ralston and Yu, 2006). Genome sequence and transcriptome analyses indicate the involvement of multiple P450 proteins in these steps in *P. trichocarpa* (Shi et al., 2010). Two paralogs of C4H, designated PtrC4H1 and PtrC4H2, are abundantly and specifically expressed in fiber and vessel cells of stem differentiating xylem (Chen et al., 2011; Shi et al., 2017; Wang et al., 2018). A single gene, *PtrC3H3*, encodes C3H activity. All three are ER-resident proteins (Chen et al., 2011).

In a yeast expression system (Urban et al., 1994), some combinations of the three co-expressed hydroxylases showed higher activities than the individual enzymes (Chen et al., 2011). Co-expression of PtrC4H1 and PtrC3H3 increased the *V*max of 4-hydroxylase activity up to 40-fold relative to the individual proteins, while co-expression of PtrC4H2 and PtrC3H3 increased 4-hydroxylase activity nearly 100-fold. The combination of all three enzymes had a lower *K*m for cinnamic acid and more than a 100-fold increase in catalytic efficiency (*V*max/*K*m) compared with the two C4H isoforms (Chen et al., 2011). Co-expression of PtrC4H1 and PtrC3H3 increased the catalytic efficiency of 3-hydroxylation of *p*-coumaroyl shikimic acid to caffeoyl shikimic acid nearly 500-fold, and co-expression of all three hydroxylases increased it more than 6,500-fold. Taken together with the activities in extracts of stem differentiating xylem, the results support two hydroxylation pathways: one converting *p*-coumaric acid to caffeic acid and the other converting *p*-coumaroyl shikimic acid to caffeoyl shikimic acid. These results suggest that, when co-expressed in the same membrane system, the hydroxylases interact through protein-protein interactions to modulate enzyme activity and metabolic flux (Chen et al., 2011). This inference was further supported by reciprocal co-immunoprecipitation of protein complexes in yeast microsomes.
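Because catalytic efficiency is the ratio *V*max/*K*m, a rise in *V*max and a drop in *K*m combine multiplicatively, which is how a lower *K*m together with a higher *V*max can yield the large fold increases reported above. The minimal sketch below shows only that arithmetic; the function names are hypothetical, and the parameter values in the example are invented for illustration and are not data from Chen et al. (2011).

```python
def catalytic_efficiency(vmax, km):
    """Catalytic efficiency Vmax/Km (units must match between compared enzymes)."""
    return vmax / km


def efficiency_fold_change(vmax_ref, km_ref, vmax_new, km_new):
    """Fold change in Vmax/Km of a new enzyme preparation relative to a reference.

    Algebraically equal to (vmax_new / vmax_ref) * (km_ref / km_new).
    """
    return catalytic_efficiency(vmax_new, km_new) / catalytic_efficiency(vmax_ref, km_ref)


# Hypothetical values: Vmax rises 40-fold and Km halves (8.0 -> 4.0).
fold = efficiency_fold_change(vmax_ref=1.0, km_ref=8.0, vmax_new=40.0, km_new=4.0)
print(fold)  # 80.0
```

A 40-fold gain in *V*max alone would give a 40-fold gain in efficiency; halving *K*m doubles that to 80-fold.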

When full-length His-tagged PtrC4H1 and untagged PtrC3H3 were co-expressed in yeast and the lysates were incubated with anti-His antibody, the PtrC4H1 immunoprecipitate was highly enriched in PtrC3H3 (Chen et al., 2011). The results were supported by reciprocal tagging followed by affinity purification and quantitative MS (spectral counting). Further experiments using BiFC, chemical crosslinking, MS, and reciprocal immunoprecipitation from stem differentiating xylem extracts provided strong evidence for a multi-protein complex of PtrC4H1, PtrC4H2, and PtrC3H3 in yeast microsomes, in differentiating xylem protoplasts, and in the differentiating xylem of the growing plant (Chen et al., 2011). The evidence for a trimeric protein complex is consistent with a structure for metabolic channeling; however, the complex was able to convert exogenous *p*-coumarate into caffeic acid, which argues against channeling.

#### Interaction of Monolignol Enzymes With Other Pathway Proteins

Protein interactions involving monolignol biosynthetic enzymes have been associated with plant defense signaling. Rice (*Oryza sativa*) CCR1 interacts with a Rac family small GTPase (Rac1) in yeast and *in vitro* in a GTP-dependent manner (Kawasaki et al., 2006). Rac1 is a signaling protein that regulates the production of reactive oxygen species by NADPH oxidase and has an important role in the defense response. The interaction of Rac1 with CCR1 (**Figure 2**) activates CCR1 enzymatically *in vitro* and in rice suspension cell cultures, resulting in higher accumulation of lignin (Kawasaki et al., 2006). Maize CCoAOMT and HCT interact with a plant disease resistance (R) protein, Rp1, a nucleotide-binding leucine-rich repeat (NLR) protein that confers pathogen resistance (Wang and Balint-Kurti, 2016). Physical interaction among CCoAOMT, HCT, and Rp1 in a multi-protein complex (**Figure 2**) suppresses the hypersensitive response to *Agrobacterium tumefaciens* infection conferred by Rp1. Downregulation of CCoAOMT or HCT in *Nicotiana benthamiana* disrupts the protein complex and re-activates Rp1, leading to a severe hypersensitive response to the infection (Wang and Balint-Kurti, 2016).

Recent work in Arabidopsis (Gou et al., 2018) has implicated two membrane steroid-binding proteins (MSBP1 and MSBP2) in the structural organization of the three lignin P450 hydroxylases. These MSBPs were shown to reside on the ER membrane and to interact with C4H, C3H, and F5H, forming MSBP-P450 complexes (**Figure 2**). The MSBPs are proposed to be essential for the stability and activity of the P450s and necessary for channeling metabolic flux through monolignol biosynthesis (Gou et al., 2018). In Arabidopsis, two cytochrome P450 reductases (ATR1 and ATR2) were shown to interact with the P450 enzymes (Sundin et al., 2014). One of these, ATR2, is associated with lignin biosynthesis and with other phenylpropanoid biosynthetic genes. Plants with loss-of-function mutations in ATR2 had slightly reduced (6%) total lignin, were enriched in *p*-coumaric acid derivatives, and had reduced levels of coniferyl alcohol derivatives; these results were attributed to reduced AtC3H and AtF5H activities (Sundin et al., 2014). A yeast two-hybrid (Y2H) screen found strong interactions of ATR2 with AtC4H and AtC3H. Y2H evidence also indicated direct interactions of AtC4H and AtC3H with At4CL-1, and of AtC4H with AtCCR1 (Gou et al., 2018). Affinity chromatography and co-purification of the three cytochrome P450 enzymes supported an *in vivo* association of these enzymes. All three P450s were found to interact with the MSBPs, presumably on the ER membrane, and BiFC experiments showed an association of the P450s with MSBPs on the ER. Simultaneous downregulation of both MSBP genes reduced lignin biosynthesis (Gou et al., 2018).

#### Outlook

A fundamental challenge of biochemistry is the reconstruction of an *in vitro* system that (1) reproduces *in vivo* function, (2) is consistent with the predictions of mathematical models, and (3) accounts for variation in place and time, as well as in the concentrations of enzymes and metabolites (Mendes et al., 1995; Faraji et al., 2018). We are still far from being able to recreate such *in vivo* systems. One long-term strategy would be to build models based on *in vitro* evidence and to make predictions that can be verified in transgenic plants. Reconstruction of the pathway *in vitro* could also provide the necessary evidence. At this time, models of lignin biosynthesis have defined most, but probably not all, of the components of the system for monolignol biosynthesis. The best models have incorporated the absolute concentrations of the enzymes and metabolites, including their roles as substrates and inhibitors (Wang Jack et al., 2016; Wang et al., 2018). The variation in cell types and the relative abundance of transcripts of all known monolignol biosynthetic genes have been described (Shi et al., 2017). However, the current models are built from data in which all enzymes are derived from a single wood-forming tissue (containing fiber, vessel, and ray cells) at the same place and stage of development, and they assume that enzymes function largely independently (Wang et al., 2014; Wang et al., 2018). Cell type–specific data have not yet been incorporated, except to show that proteins that interact *in vitro* are co-expressed in the same cell types (Shi et al., 2017; Wang et al., 2018). Discovery of new regulatory components, such as the recently identified heterodimeric interaction between CAD and CCR that activates the sequential reduction of phenolic CoA esters to aldehydes and monolignols (**Figure 1**; Yan et al., 2018), will continue to improve our understanding of the pathway. Incorporating the interactions of enzymes (**Figure 2**) and their metabolites requires a higher level of mechanistic insight based on dynamic interactomic models. Inducible promoters that allow gene-specific induction and repression of expression (Borghi, 2010) provide an experimental basis for such models.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### FUNDING

This work was supported by the National Natural Science Foundation of China (NSFC) Grant 31470672. We also acknowledge financial support from the U.S. National Science Foundation, Plant Genome Research Program Grant DBI-0922391, and the NC State University Forest Biotechnology Industrial Research Consortium.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling editor is currently organizing a Research Topic with one of the authors, RS, and confirms the absence of any other collaboration.

*Copyright © 2019 Wang, Liu, Sun, Chiang and Sederoff. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Lignin Engineering in Forest Trees

Alexandra Chanoca<sup>1,2</sup>, Lisanne de Vries<sup>1,2</sup> and Wout Boerjan<sup>1,2</sup>\*

<sup>1</sup> Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium, <sup>2</sup> VIB Center for Plant Systems Biology, Ghent, Belgium

Wood is a renewable resource that is mainly composed of lignin and cell wall polysaccharides. The polysaccharide fraction is valuable as it can be converted into pulp and paper, or into fermentable sugars. The lignin fraction, in turn, is increasingly considered a valuable source of aromatic building blocks for the chemical industry. The presence of lignin in wood is one of the major recalcitrance factors in woody biomass processing, necessitating harsh chemical treatments to degrade and extract it prior to the valorization of the cell wall polysaccharides, cellulose and hemicellulose. Over the past years, large research efforts have been devoted to engineering lignin amount and composition to reduce biomass recalcitrance toward chemical processing. We review the efforts made in forest trees and compare results from greenhouse and field trials. Furthermore, we address the value and potential of CRISPR-based gene editing in lignin engineering and its integration in tree breeding programs.

#### Edited by:

Chandrashekhar Pralhad Joshi, Michigan Technological University, United States

#### Reviewed by:

Sivakumar Pattathil, University of Georgia, United States

Kyung-Hwan Han, Michigan State University, United States

> \*Correspondence: Wout Boerjan woboe@psb.vib-ugent.be

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 29 March 2019 Accepted: 27 June 2019 Published: 25 July 2019

#### Citation:

Chanoca A, de Vries L and Boerjan W (2019) Lignin Engineering in Forest Trees. Front. Plant Sci. 10:912. doi: 10.3389/fpls.2019.00912

Keywords: lignin, forest trees, genetic engineering, CRISPR, field trial

## INTRODUCTION

Fossil resources are the main feedstock for energy and organic compounds, and their use results in the emission of greenhouse gases associated with climate change. The coming climate crash calls for an urgent transition from a fossil-based to a bio-based economy in which lignocellulosic biomass rather than oil is used for the production of fuels, chemicals and materials. Wood is an important source of lignocellulosic biomass; it is mainly composed of secondary-thickened cell walls rich in cellulose, hemicelluloses, and lignin. All three polymers can be valorized in the bio-based economy. Cellulose is a source for the pulp and paper industry, and both cellulose and hemicelluloses can be depolymerized to their monosaccharides for fermentation into, e.g., bio-ethanol, lactic acid and detergents (Vanholme et al., 2013b). As lignin negatively affects the efficiency of wood processing toward these applications, trees can be engineered to accumulate less lignin, making them more amenable to the production of paper and fermentable sugars. On the other hand, lignin is increasingly considered a valuable component of the bio-based economy. Indeed, given that lignin is the largest renewable source of aromatics on Earth, the economic viability of a bio-refinery can be significantly increased if lignin is also valorized and used as a resource for the production of chemicals (Holladay et al., 2007; Tuck et al., 2012; Davis et al., 2013; Ragauskas et al., 2014; Li C. et al., 2015; Van den Bosch et al., 2015; Rinaldi et al., 2016; Upton and Kasko, 2016; Schutyser et al., 2018).

The lignin polymer is composed of monolignols that are produced by the phenylpropanoid and monolignol biosynthetic pathways through a series of enzymatic reactions starting with the deamination of phenylalanine (**Figure 1**). The monolignols are synthesized in the cytoplasm and translocated to the apoplast, where they are dehydrogenated to monolignol radicals by the action of laccases and peroxidases (Berthet et al., 2011; Zhao et al., 2013). These monolignol radicals then couple with each other in a combinatorial fashion, generating a range of chemical bonds such as the aryl-ether bond (β-O-4), the resinol bond (β-β), and the phenylcoumaran bond (β-5) (Boerjan et al., 2003; Ralph et al., 2004; Vanholme et al., 2010). The most common monolignols are the hydroxycinnamyl alcohols p-coumaryl, coniferyl, and sinapyl alcohols, which generate the H, G, and S units, respectively, upon their incorporation into the lignin polymer (Bonawitz and Chapple, 2010; Ralph et al., 2019; Vanholme et al., 2019). The relative contribution of the lignin building blocks varies among taxa, developmental stages, tissues and cell types, and even cell wall layers; lignin from softwoods (gymnosperms) is composed almost entirely of G units with a minor fraction of H units, while lignin from hardwoods (angiosperms) has S units in addition to G units and traces of H units (Boerjan et al., 2003; Vanholme et al., 2010, 2019). Besides these traditional monolignols, a variety of other p-hydroxylated aromatic molecules can be incorporated into the lignin polymer at various levels (Vanholme et al., 2019).

Given that lignin is a major recalcitrance factor in wood delignification processes, large research efforts have been devoted to unravel the lignin biosynthetic pathway, and to study the effects of perturbations of the lignin biosynthesis genes on lignin amount and composition, and on wood processing efficiency. While modifications in genes ranging from those encoding transcription factors up to those encoding oxidative enzymes have resulted in altered lignin content, composition or deposition (Eriksson et al., 2000; Li Y.H. et al., 2003; Liang et al., 2008; Lu et al., 2013; Lin et al., 2016; Xu et al., 2017; Yang et al., 2017; Obudulu et al., 2018), this review will focus on the results obtained by engineering the lignin biosynthetic genes.

## ENGINEERING THE LIGNIN PATHWAY

**Table 1** provides an overview of the studies on downregulated or mutated lignin biosynthetic genes in poplar, pine, eucalyptus and birch, with the resulting effects on wood processing efficiency, where determined. Reducing the activity of any step of the lignin biosynthetic pathway, from PAL to CAD, may result in a reduction in lignin content (**Table 1**). Several parameters influence the degree of lignin reduction, such as the target gene and the degree of downregulation of the enzyme activity, which in turn depends on the efficiency of the silencing construct used, the size of the gene family, and redundancy within the gene family. Generally, downregulation of the steps from C4H to CCR results in a more dramatic reduction in lignin amount (Hu et al., 1999; Meyermans et al., 2000; Zhong et al., 2000; Li L. et al., 2003; Jia et al., 2004; Lu et al., 2004; Leplé et al., 2007; Coleman et al., 2008a,b; Bjurhager et al., 2010; Mansfield et al., 2012; Ralph et al., 2012; Min et al., 2014; Van Acker et al., 2014; Zhou et al., 2015, 2018; Saleme et al., 2017; Xiang et al., 2017) than downregulation of F5H, COMT and CAD (Van Doorsselaere et al., 1995; Baucher et al., 1996; Lapierre et al., 1999; Jouanin et al., 2000; Van Acker et al., 2014; Wang et al., 2018). Lignin reduction can be associated with an increase in S/G, as in C3'H- (Coleman et al., 2008a; Ralph et al., 2012) and CCoAOMT-downregulated trees (Meyermans et al., 2000), or with a decrease in S/G, as in CSE- (Saleme et al., 2017) and COMT-downregulated trees (Van Doorsselaere et al., 1995; Lapierre et al., 1999; Jouanin et al., 2000). Interestingly, low-lignin 4CL-downregulated poplars have been found to have an increased S/G (Min et al., 2014; Xiang et al., 2017), a decreased S/G (Voelker et al., 2010; Zhou et al., 2015), or an S/G comparable to wild type (Hu et al., 1999; Li L. et al., 2003). This variation cannot be attributed to the promoter or the method used for downregulation, suggesting that differences in the degree of silencing, growth conditions or developmental state influence this trait. On the other hand, the strongest effects on H/G/S lignin composition have been observed in trees downregulated in C3'H and HCT, which deposit lignin with large increases in H unit content (Coleman et al., 2008a; Ralph et al., 2012; Vanholme et al., 2013a), whereas trees that overexpress F5H produce lignin strongly enriched in S units (Franke et al., 2000; Li L. et al., 2003; Stewart et al., 2009), and trees downregulated in COMT have dramatically reduced S unit biosynthesis (Van Doorsselaere et al., 1995; Lapierre et al., 1999).

Both the reduced lignin content and the variation in H/G/S ratios can affect biomass processing efficiency. Consistent with the established role of lignin in determining biomass recalcitrance (Zeng et al., 2014; McCann and Carpita, 2015; Li et al., 2016; Wang et al., 2018), plants with reduced levels of lignin show increased chemical pulping and saccharification efficiency (Hu et al., 1999; Jouanin et al., 2000; Rastogi and Dwivedi, 2006; Wadenbäck et al., 2008; Wang et al., 2012; Sykes et al., 2015; Cai et al., 2016; Edmunds et al., 2017; Saleme et al., 2017; Xiang et al., 2017; Van Acker et al., 2017; Wang et al., 2018). An increased level of H units reduces lignin polymer length and hence facilitates the removal of lignin from the biomass (Mansfield et al., 2012; Sykes et al., 2015). Increased S/G results in lignin that is more easily cleaved and extracted under alkaline conditions, supposedly due to its lower degree of polymerization (Huntley et al., 2003; Stewart et al., 2009; Mansfield et al., 2012; Yoo et al., 2018).

The processing efficiency of the biomass can also be modified by increased incorporation of molecules that are generally minor components of the lignin of wild-type plants. The incorporation of ferulic acid in CCR-deficient trees results in the formation of acetal bonds in the lignin polymer, which are easily cleaved in acidic biomass pretreatments (Leplé et al., 2007; Ralph et al., 2008; Van Acker et al., 2014). Indeed, the level of ferulic acid in lignin correlated positively with saccharification efficiency (Van Acker et al., 2014). The incorporation of 5-hydroxyconiferyl alcohol and 5-hydroxyconiferaldehyde in the lignin of COMT-deficient poplars (Van Doorsselaere et al., 1995; Lapierre et al., 1999; Jouanin et al., 2000; Morreel et al., 2004; Lu et al., 2010) gives rise to benzodioxane bonds, potentially preventing covalent linkages between lignin and the hydroxyl groups of polysaccharides (Weng et al., 2010; Vanholme et al., 2012; Nishimura et al., 2018). However, COMT deficiency also results in a more condensed lignin when the S unit frequency drops, owing to relatively higher levels of the condensed β-β and β-5 bonds and lower levels of β-O-4 bonds. Chemical pulping of wood from poplars strongly downregulated for COMT resulted in a higher pulp yield, counterbalanced by the residual lignin content in the pulp; these trees had a lower lignin and a higher cellulose content (Jouanin et al., 2000). In contrast, poplars that were modestly downregulated for COMT had a large decrease in pulp yield, presumably because their lignin content remained normal while the lignin had a higher frequency of condensed bonds that hindered lignin extraction (Lapierre et al., 1999; Pilate et al., 2002). The incorporation of cinnamaldehydes into the lignin polymer of CAD-deficient trees results in shorter lignin chains, and hence a higher proportion of free phenolic end groups, which increase the solubility of the polymer in alkali. The incorporation of cinnamaldehydes presumably also reduces the covalent interaction of the aliphatic chain with hemicellulose, again rendering the lignin more soluble. In addition, because of the extended conjugated system generated when a cinnamaldehyde undergoes β-O-4 coupling with another monomer, the aromatic ether bond of the incorporated cinnamaldehyde becomes more susceptible to alkaline cleavage (Lapierre et al., 1989; Van Acker et al., 2017).

Lignin polymerization is a combinatorial radical coupling process, allowing a wide range of phenolic compounds to be naturally incorporated into the lignin polymer (Boerjan et al., 2003; Vanholme et al., 2019). Researchers have attempted to tailor lignin amount and composition to improve biomass processing by expressing heterologous genes, aiming at the biosynthesis and incorporation of various compatible phenolic compounds as alternative monolignols into the lignin polymer (Ralph, 2006; Vanholme et al., 2012; Mottiar et al., 2016; Mahon and Mansfield, 2018). One example is the introduction into pine of genes encoding enzymes needed for S unit biosynthesis; the simultaneous expression of F5H, COMT, and CAD successfully introduced S units in Pinus radiata (Wagner et al., 2015; Edmunds et al., 2017). The introduction of the gene encoding a monolignol 4-O-methyltransferase (MOMT4) into poplar leads to the formation of 4-O-methylated coniferyl and sinapyl alcohols, which cannot be incorporated into the growing lignin polymer because they lack the free aromatic hydroxyl group. This halts lignin polymerization and results in trees with lower lignin content and higher saccharification efficiency (Bhuiya and Liu, 2010; Cai et al., 2016). Poplars have also been engineered to contain ester linkages in the lignin polymer backbone. Coniferyl ferulate esters were introduced into the polymer via expression of a FERULOYL-CoA:MONOLIGNOL TRANSFERASE (FMT) gene derived from Angelica sinensis (Wilkerson et al., 2014), leading to improved saccharification efficiency under various pretreatment conditions (Wilkerson et al., 2014; Kim et al., 2017; Bhalla et al., 2018) and improved kraft pulping efficiency as compared to wild type (Zhou et al., 2017).
Monolignol p-coumarate esters have also been engineered in poplar, via expression of a rice p-COUMAROYL-CoA:MONOLIGNOL TRANSFERASE (PMT) gene, resulting in a higher frequency of resistant interunit bonds and a higher frequency of G and S terminal units with free phenolic groups (Smith et al., 2015; Sibout et al., 2016). While in Arabidopsis the heterologous expression of PMT resulted in a reduced lignin amount accompanied by an increased saccharification efficiency (Sibout et al., 2016), there was no decrease in lignin amount in poplar and the saccharification efficiency was not determined (Smith et al., 2015).

While several modifications of lignin amount and composition were shown to improve biomass processing, these modifications were often accompanied by a biomass yield penalty (Leplé et al., 2007; Wadenbäck et al., 2008; Wagner et al., 2009; Voelker et al., 2010; Stout et al., 2014; Van Acker et al., 2014; Sykes et al., 2015; Zhou et al., 2018). A recent systems study perturbed 21 lignin biosynthesis genes in P. trichocarpa and comprehensively integrated transcriptomic, proteomic, fluxomic, and phenomic data from 221 lines. The authors concluded that tree growth is not associated with lignin amount, subunit composition, or specific linkages (Wang et al., 2018). Instead, the yield penalty has been correlated with the presence of collapsed xylem vessels (Coleman et al., 2008a,b; Wagner et al., 2009; Voelker et al., 2010; Vargas et al., 2016; De Meester et al., 2018), the activation of a cell wall integrity pathway (Bonawitz et al., 2014), and/or the accumulation of chemical inhibitors (Gallego-Giraldo et al., 2011; Muro-Villanueva et al., 2019).

Whereas substantial efforts have been made to decrease lignin content by downregulating lignin biosynthetic genes, studies on the upregulation of the lignin pathway and the overproduction of lignin have been scarce. Reports on the overexpression of F5H show unchanged or even decreased lignin content (Huntley et al., 2003; Li L. et al., 2003; Stewart et al., 2009; Mansfield et al., 2012; Edmunds et al., 2017). Attempts to overexpress CAD and COMT resulted in gene silencing rather than upregulation, or had no detectable effect on expression levels (Baucher et al., 1996; Lapierre et al., 1999; Jouanin et al., 2000; Leplé et al., 2007; Van Acker et al., 2014). The overexpression of the R2R3-MYB transcription factors PtoMYB92, PtoMYB216, and PtoMYB74 resulted in additional xylem layers, thicker xylem cell walls, and ectopic lignin deposition, and the plants accumulated 13–50% more lignin (Tian Q. et al., 2013; Li C.F. et al., 2015; Li et al., 2018). The MYB overexpression lines constitutively upregulated the lignin biosynthesis pathway genes. While plants overexpressing MYB92 and MYB74 had a biomass penalty, the overexpression of MYB216 resulted in plants with up to 50% more lignin and no developmental phenotype. As lignin is increasingly considered an important resource for the sustainable production of chemicals (Cao et al., 2018), the engineering of plants that overproduce lignin should be further explored.

## FIELD TRIALS

The examples discussed above clearly show that lignin engineering via down- or upregulation of phenylpropanoid pathway genes, or via expression of heterologous genes, has the potential to increase the processing efficiency of lignocellulosic biomass. For practical and regulatory reasons, most studies report data obtained from trees grown in a greenhouse. However, greenhouse experiments typically do not take into account developmental processes such as growth cessation and dormancy. In addition, they do not provide sufficient insight into the interaction of the engineered plant with environmental factors such as soil type, wind, and pathogens. Understanding these interactions is an important step in translating research results toward commercial applications. Indeed, the studies for which permission to establish field trials was granted highlight important differences in phenotype between greenhouse- and field-grown trees. **Table 1** summarizes the reports on field trials performed with 4CL-, CCoAOMT-, CCR-, COMT-, and CAD-downregulated trees.

TABLE 1 | Overview of forest trees with modified expression of lignin biosynthesis genes.



fpls-10-00912 July 25, 2019 Time: 16:27 # 6




n.d., not determined; n/a, not applicable; S/G, syringyl/guaiacyl ratio; S/V, syringaldehyde/vanillin ratio; H/G, p-hydroxyphenyl/guaiacyl ratio. Papers reporting plants that have been used in independent studies of biotic or abiotic stress tolerance are marked with an asterisk (<sup>∗</sup>). For abbreviations of gene names, see the legend of Figure 1. Lignin amount was determined by various methods; see the corresponding reference for specific information. Readers are referred to Wang et al. (2018) for additional raw data on lines downregulated for monolignol biosynthesis genes in P. trichocarpa.

Confirming the potential of modified lignocellulosic biomass as a substrate for applications, several lignin-engineered, field-grown trees showed improvements in wood processing. Poplars downregulated for CCoAOMT and grown for 5 years in a field trial in Beijing (China) showed increased glucose and xylose release upon saccharification (Wang et al., 2012). Poplars downregulated for CCR and grown in a field trial in France proved more amenable to chemical kraft pulping (Leplé et al., 2007). Two additional field trials with CCR-downregulated poplars, conducted in France and Belgium, resulted in up to 160% improvement in ethanol production in simultaneous saccharification and fermentation (SSF) assays; however, the plants had up to 50% biomass reduction (Van Acker et al., 2014). Field trials with CAD-downregulated poplar also showed promising results. These trees had slightly less lignin than wild type and proved more amenable to kraft delignification (Lapierre et al., 1999). Consistent with this, the same lines grown in larger-scale field trials in France and the United Kingdom showed a mild decrease in lignin amount and an improvement in kraft pulping deemed commercially relevant, since the plants needed 6% less alkali to achieve a delignification similar to that of wild-type trees (Pilate et al., 2002).

However, conflicting reports on both biomass yield and downstream processing efficiency suggest that these parameters are strongly influenced by environmental factors. A field trial conducted in China with 4CL-downregulated poplars found that, despite a 28% decrease in lignin content compared to wild type, the trees were about 8% taller (Tian X.M. et al., 2013), consistent with greenhouse studies (Hu et al., 1999). Other field trials, however, found that 4CL-downregulated poplars had decreased biomass and were sometimes even dwarfed (Voelker et al., 2010; Stout et al., 2014; Marchin et al., 2017). Reports also diverge regarding the downstream processing efficiency of wood derived from these 4CL-downregulated field-grown poplars. While up to 100% increase in sugar recovery was found for 4CL1-downregulated trees (35S-driven antisense 4CL construct) grown at a mountain site in the United States (Xiang et al., 2017), field studies conducted in Oregon (United States) found that poplars in which 4CL1 was silenced with a Pt4CL1 promoter-driven antisense construct had no improvement in saccharification efficiency compared to wild type (Voelker et al., 2010). Likewise, a long-term study in Wenling (China) found that 4CL-downregulated poplars did not show a significant improvement in sugar yield compared to wild type (Wang et al., 2012). In both of the latter cases, the trees showed mild decreases in lignin amount that did not translate into higher processing efficiency, potentially because of a higher concentration of extractives that could interfere with enzymatic activity (Voelker et al., 2010).

Field trial studies have shown that environmental factors can influence lignification and restore traits to wild-type levels, as compared to the levels achieved when the same plants were grown in the greenhouse. While 4CL-downregulated trees had decreased lignin content when grown under greenhouse conditions, analysis of the same 4CL antisense poplars grown in the field has often shown that their lignin content was higher than that of the greenhouse-grown trees and sometimes even restored to wild-type levels (Stout et al., 2014; Xiang et al., 2017). Similarly, lignin levels were much less reduced in CCR-deficient poplars grown in the field than in those grown in the greenhouse (Van Acker et al., 2014). At least for the CCR-deficient poplars, the higher lignification level of field-grown trees may be due to the wood samples having been taken during winter. When growth ceases in autumn, field-grown trees still have time to fully lignify their cell walls before entering dormancy, whereas greenhouse-grown trees develop new xylem continuously. Lignin composition has also been shown to differ between greenhouse- and field-grown low-lignin trees: 4CL-downregulated poplars grown in a field in North Carolina had lignin with a lower S/G ratio than when the same lines were grown in the greenhouse (Stout et al., 2014).

FIGURE 2 | Patchy gene downregulation by RNAi. Patchy red xylem phenotype observed on trunks of CCR-deficient poplars (right) grown in a field trial in Belgium. The red xylem indicates areas of CCR downregulation. Wood from wild-type trees is whitish (left).

Taken together, these results show that data obtained from greenhouse-grown trees cannot easily be extrapolated to field-grown trees, underscoring the need for field trials at different locations. Some lines showed a yield penalty that renders them less interesting for applications, highlighting the need for a better understanding of the molecular basis of the yield penalty and for strategies to overcome this problem.

Lignin has been shown to play an important role in pathogen resistance (Miedes et al., 2014; Zhao and Dixon, 2014), and it plays a pivotal role in allowing the plant to transport water. This suggests that lignin modifications could have an impact on plant stress tolerance. While further investigation is needed to fully address this possibility, the downregulation of 4CL, COMT, and CAD in poplar did not dramatically alter the feeding performance of leaf-feeding herbivores (Tiimonen et al., 2005; Brodeur-Campbell et al., 2006; Hjalten et al., 2013). The effect of the downregulation of COMT and CAD in poplar on plant-insect interactions has also been assessed in field-grown trees, which showed that the lignin-modified trees had a normal incidence of visiting and feeding insects, as well as normal responses to microbial pathogens (Pilate et al., 2002; Halpin et al., 2007). These results indicate that trees with modified lignin do not necessarily suffer more than wild-type plants from pests and diseases. Nevertheless, profiling of the endosphere bacterial microbiome of wood harvested from field-grown, CCR-downregulated poplars demonstrated shifts in the bacterial community, presumably because of the altered abundance of particular phenolic metabolites in the xylem (Beckers et al., 2017).

[…] modification involves the stable integration of foreign DNA into the tree to overproduce (an) enzyme(s) or downregulate (a) gene(s). Combining the classical and New Breeding Techniques is needed to provide sufficient high-quality wood for society.

Considering the role of lignin in xylem function and structure, the water relations of a few low-lignin poplars have been assessed. 4CL-downregulated poplars were found to have reduced hydraulic conductivity, potentially interfering with plant growth (Marchin et al., 2017). Hydraulic stress experiments with poplars downregulated for CCR, COMT, or CAD showed that these plants had a lower resistance to cavitation while maintaining normal xylem hydraulic conductivity and water transport (Awad et al., 2012). These results suggest that the growth of low-lignin trees might be influenced by water availability. As for any new hybrid obtained from classical breeding, field tests are needed to evaluate the field performance and stress tolerance of lignin-engineered trees.

# PROSPECTS FOR LIGNIN ENGINEERING IN FOREST TREES

The performance of lignin-engineered plants appears to be strongly influenced by environmental conditions. It is unclear, however, whether the differences observed between greenhouse-grown and field-grown trees, or between trees grown at different field locations, result from different levels of gene suppression or from an interaction of the engineered trait with the environment (GxE). Indeed, unstable downregulation is a shortcoming of gene silencing techniques based on RNAi. This is evident from the variable red xylem phenotype observed when particular lignin biosynthesis genes, such as CAD, COMT, or CCR, are downregulated: the red coloration is often not uniform throughout the xylem but appears in patches that reflect variable levels of gene silencing (Leplé et al., 2007; Voelker et al., 2010; Van Acker et al., 2014; **Figure 2**). In addition, gene silencing methods can result in the concomitant silencing of closely related gene family members, perhaps to various degrees, clouding the interpretation and masking the effects of downregulating individual genes.

These issues can now be easily overcome with CRISPR-based gene editing technologies, which enable stable loss-of-function mutations (knock-outs) in specific target genes and thus allow the function of individual genes within families to be dissected. For example, the targeting of individual 4CL gene family members in poplar showed that 4CL1 is involved in lignification, whereas 4CL2 is involved in proanthocyanidin production (Zhou et al., 2015). In addition to knock-out alleles, CRISPR-based gene editing also allows the creation of new alleles that confer a partial reduction in enzyme activity. This opens the possibility to fine-tune the level of residual enzyme activity and to bypass the yield penalty that is often observed when lignin amount drops below a threshold level. Another promising avenue for lignin engineering in forest trees made possible by CRISPR-based genome engineering is the simultaneous editing of multiple genes (allele stacking) to optimize biomass processing efficiency. This is exemplified in Arabidopsis, where stacking the transaldolase (tra) and comt mutations, the c4h and comt mutations, or the 4cl and comt mutations resulted in additive and synergistic improvements in saccharification efficiency (de Vries et al., 2018). Indeed, a systems approach in P. trichocarpa predicts that the concomitant downregulation of PAL and CCoAOMT, or of PAL, C3'H, and CCoAOMT, will substantially improve wood properties and sugar release (Wang et al., 2018).

The use of CRISPR-based genome editing in tree improvement for the pulp and paper and biorefinery industries, as well as for the production of platform aromatics from the hydrogenolytic breakdown of lignin, will be most valuable when this technology is strategically combined with other breeding techniques (**Figure 3**). Indeed, large variation in lignin amount and S/G composition already exists in natural populations of forest trees (Studer et al., 2011). Given that both traits affect glucose release upon saccharification (Yoo et al., 2018), exploiting this genetic diversity by conventional breeding, aided by Genome-Wide Association Studies (GWAS) (Porth et al., 2013; Fahrenkrog et al., 2017; Liu et al., 2018), Breeding with Rare Defective Alleles (BRDA) (Vanholme et al., 2013a), or genomic selection (Yin et al., 2010; Muchero et al., 2015; Pawar et al., 2018; Xie et al., 2018), is a valuable strategy to obtain lines with improved wood processing efficiency. Once elite trees have been obtained by these breeding methods, genetic engineering and CRISPR-based editing of specific genes offer a very promising avenue to further improve these elite genotypes without breaking up their genetic constitution and without going through lengthy breeding cycles. Given the imminent climate crash, we have no more time to lose in adopting these new breeding techniques in our race to the biobased economy.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### FUNDING

We acknowledge partial funding from the IWT-SBO project BIOLEUM (Grant No. 130039) and from SBO-FISH through the ARBOREF project. AC has received funding from the FWO and the European Union's Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie Grant Agreement No. 665501. LdV was funded by the Institute for the promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen) for a predoctoral fellowship.

#### ACKNOWLEDGMENTS

We thank Annick Bleys for preparing this manuscript for submission.

#### REFERENCES

[…] an engineered monolignol 4-O-methyltransferase. Nat. Commun. 7:11989. doi: 10.1038/ncomms11989




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Chanoca, de Vries and Boerjan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Engineering Non-cellulosic Polysaccharides of Wood for the Biorefinery

#### Evgeniy Donev<sup>1</sup> , Madhavi Latha Gandla<sup>2</sup> , Leif J. Jönsson<sup>2</sup> and Ewa J. Mellerowicz<sup>1</sup> \*

<sup>1</sup> Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden, <sup>2</sup> Department of Chemistry, Umeå University, Umeå, Sweden

Non-cellulosic polysaccharides constitute approximately one third of the woody biomass usable for human exploitation. In contrast to cellulose, they are composed of several different types of monosaccharide units, and their backbones are substituted by various groups. Their structural diversity and recent examples of their modification in transgenic plants and mutants suggest that they can be targeted to improve wood-processing properties, thereby facilitating the conversion of wood in a biorefinery setting. Critical knowledge on their structure-function relationships is slowly emerging, although our understanding of the molecular interactions responsible for the observed phenomena is still incomplete. This review: (1) provides an overview of the structural features of the major non-cellulosic polysaccharides of wood, (2) describes the fate of non-cellulosic polysaccharides during biorefinery processing, (3) shows how non-cellulosic polysaccharides impact lignocellulose processing, with a focus on the yields of either sugars or polymers, and (4) discusses outlooks for improving tree species for the biorefinery by modifying the structure of non-cellulosic polysaccharides.

#### Edited by:

Ronald Ross Sederoff, North Carolina State University, United States

#### Reviewed by:

Jozef Mravec, University of Copenhagen, Denmark

Rosemary White, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia

#### \*Correspondence:

Ewa J. Mellerowicz ewa.mellerowicz@slu.se

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 13 July 2018 Accepted: 28 September 2018 Published: 23 October 2018

#### Citation:

Donev E, Gandla ML, Jönsson LJ and Mellerowicz EJ (2018) Engineering Non-cellulosic Polysaccharides of Wood for the Biorefinery. Front. Plant Sci. 9:1537. doi: 10.3389/fpls.2018.01537

Keywords: non-cellulosic polysaccharides, woody biomass, secondary cell wall, hemicellulose, pectin, galactan, tree genetic improvement, wood biorefining

#### NON-CELLULOSIC POLYSACCHARIDES IN WOOD

Plant cell walls constitute the most abundant renewable resource on Earth (Pauly and Keegstra, 2008). Wood, which essentially consists of cell walls, is naturally degradable and renewable, and technologies are currently being developed to utilize all organic wood components, i.e., cellulose, lignin, non-cellulosic polysaccharides, and extractives. Non-cellulosic polysaccharides, which include hemicelluloses, pectins, type II arabinogalactan (AG-II), and callose, account for roughly one third of wood dry weight. Among these four groups, hemicelluloses are the most abundant, contributing 26–33% of the dry weight in softwoods (conifer species) and 19–34% in hardwoods (dicotyledonous species), depending on species, cell type, developmental stage, and environmental conditions (Sjöström, 1993).

Non-cellulosic polysaccharides exhibit remarkable variability in different layers of wood cell walls and chiefly define these layers (Mellerowicz and Gorshkova, 2012). Thus, the middle lamella is dominated by pectins, the primary cell wall (PCW) layer by pectins and xyloglucan, the secondary cell wall (SCW) layers by xylans and mannans, and the gelatinous layer (G-layer), present as a tertiary layer in tension wood fibers, by galactans and AG-II. These different polysaccharides blend with the lignin matrix and cellulose microfibrils, and are involved in covalent, ionic, and hydrophobic interactions with other cell wall components and with themselves (Cosgrove, 2005; Scheller and Ulvskov, 2010). They are the main source of lignin carbohydrate complexes (LCCs), in which they are linked to lignin by ether, glycosidic, or ester bonds (Lawoko and Henriksson, 2006; Balakshin et al., 2007; Giummarella and Lawoko, 2016; Giummarella et al., 2016). Non-cellulosic polysaccharides thereby affect cell wall architecture, wood traits, and the properties of lignocellulosic biomass, which makes them favored targets for improving biomass properties (reviewed in Tavares et al., 2015; Damm et al., 2016; Marriott et al., 2016; Wang et al., 2016; Smith et al., 2017). They also represent a large pool of assimilated carbon for which novel applications are being sought (e.g., Zhao et al., 2015; Oinonen et al., 2016).

**Abbreviations:** AG-II, type II arabinogalactan; CML, compound middle lamella; GGM, galactoglucomannan; G-layer, gelatinous layer; GM, glucomannan; GT, glycosyltransferase family; GX, glucuronoxylan; HG, homogalacturonan; IRX, irregular xylem; KD, knock-down; LCCs, lignin carbohydrate complexes; PCW, primary cell wall; RES, reducing end sequence in xylan; RG-I, rhamnogalacturonan I; RG-II, rhamnogalacturonan II; S/G, syringyl/guaiacyl; SCW, secondary cell wall; Xylp, xylopyranosyl residue.

Among hemicelluloses, xylan, which includes the glucuronoxylan (GX) of hardwoods and the arabinoglucuronoxylan of softwoods (**Figure 1**), is a ubiquitous component of wood SCWs (Donaldson and Knox, 2012; Kim and Daniel, 2012). Approximately 60% of the xylopyranosyl residues (Xylp) of hardwood xylan are mono- or di-acetylated (Teleman et al., 2000, 2002). The acetyl groups compete with glucuronic acid for position 2 of Xylp, and a decrease in one of these substituents usually leads to an increase in the other (Chong et al., 2014; Lee et al., 2014). Mannans, which include water-soluble galactoglucomannan (GGM) and water-insoluble glucomannan (GM) (**Figure 1**), are the most abundant hemicelluloses in softwood SCWs, whereas hardwood SCWs contain lower fractions of GM (Teleman, 2009). Xyloglucan (**Figure 1**) is localized in the PCWs of hardwoods and softwoods (Bourquin et al., 2002; Donaldson and Knox, 2012; Kim and Daniel, 2013), where it may associate with hydrophobic cellulose surfaces or become entrapped inside cellulose fibrils (Park and Cosgrove, 2015).

Pectins, which include homogalacturonan (HG), rhamnogalacturonan I (RG-I), and rhamnogalacturonan II (RG-II) (**Figure 1**), are acidic polysaccharides. They constitute a large part of the middle lamella and PCW layers, jointly referred to as the compound middle lamella (CML) (Kim and Daniel, 2013). Reaction wood, such as tension wood in hardwoods and compression wood in softwoods, typically contains high mass fractions of β-1,4-galactans (**Figure 1**), presumably associated with RG-I.

Water-soluble softwood arabinogalactan, a variant of AG-II (**Figure 1**), is highly abundant in larch (25%). Other softwoods and hardwoods contain small amounts of AG-II. AG-II may be covalently linked to xylan and pectin (Tan et al., 2013). Callose (**Figure 1**), or laricinan, accumulates in hardwoods and softwoods in response to damage and stress (Teleman, 2009). It is abundant in pits and between cavities of the inner S2 layer in compression wood (Hoffman and Timell, 1970; Chaffey and Barlow, 2002; Altaner et al., 2010; Zhang et al., 2016).

This review addresses the importance of non-cellulosic polysaccharides in technological processes currently used in wood biorefining, and the prospects of altering them in trees to obtain either higher productivity or improved lignocellulose properties, such as extractability or more efficient biochemical conversion to sugars.

# FATE OF NON-CELLULOSIC POLYSACCHARIDES DURING WOOD BIOREFINING

Biorefining of wood includes the pulping (mechanical and chemical pulping, as well as combinations thereof), biochemical processes, and thermochemical processes. Mechanical pulping aims at high recovery of all major wood constituents, including non-cellulosic polysaccharides (Sjöström, 1993; Ek et al., 2009). In contrast, chemical pulping and subsequent bleaching steps are designed to target the lignin and preserve the cellulose, whereas the fate of the non-cellulosic polysaccharides is strongly dependent on the aim and the process technology, which can be Kraft (sulfate), sulfite, soda, or organosolv pulping (Sjöström, 1993; Ek et al., 2009). For example, dissolving pulp manufacture based on sulfite pulping, or, sometimes, Kraft pulping, aims at producing a cellulose of relatively high purity, which means that most of the non-cellulosic polysaccharides are degraded and removed together with the lignin. In many other processes, such as conventional Kraft pulping for manufacturing of paper products, preservation of hemicelluloses is beneficial, as the pulp yield will then be higher.

Biochemical conversion is typically based on saccharification of the polysaccharides using pretreatment and enzymatic hydrolysis. This creates monosaccharides, which are then refined further by microbial fermentation or chemical catalysis. The aim of the pretreatment is to make the cellulose susceptible to enzymatic hydrolysis, which would otherwise be too slow and give too low sugar yields. Among the many different pretreatment methods (Yang and Wyman, 2008; Sun et al., 2016; Jönsson and Martín, 2016), hydrothermal pretreatment under acidic conditions, with or without externally added acid and with or without the disruptive effect of steam explosion, is a common approach. Because of autohydrolysis and the formation of carboxylic acids, which are derived mainly from the non-cellulosic polysaccharides, the process is acidic even without externally added acid (Jönsson and Martín, 2016). The main target of hydrothermal pretreatment under acidic conditions is the hemicellulose. Cellulose and lignin are also affected, but typically to a much lesser degree than hemicelluloses, which can be degraded almost quantitatively in a well-optimized pretreatment (Wang et al., 2018). The severity of the hydrothermal pretreatment (time, temperature, and acidity) needs to be adapted to the feedstock: softwoods require higher severity, whereas hardwoods can be processed at lower severity. In either case, pressurized reactors and temperatures in the range 160–240 °C (Sun et al., 2016) are typically used to create a pretreated material that is suitable for subsequent enzymatic saccharification.
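The review describes pretreatment severity qualitatively, as a combination of time, temperature, and acidity, and gives no formula. The sketch below assumes a convention widely used in the pretreatment literature that is not taken from this article: the severity factor log R0, commonly attributed to Overend and Chornet, with R0 = t·exp((T − 100)/14.75) (t in minutes, T in °C), and the "combined severity" CS = log R0 − pH, which adds acidity to the measure. The function names are illustrative only.

```python
import math

# Assumed convention (not from the source article): Overend-Chornet severity
# factor R0 = t * exp((T - T_ref) / omega), with T_ref = 100 degC and
# omega = 14.75 degC, and combined severity CS = log10(R0) - pH.
T_REF_C = 100.0
OMEGA_C = 14.75


def log_severity_factor(temp_c: float, time_min: float) -> float:
    """Return log10(R0) for an isothermal hold of time_min minutes at temp_c degC."""
    if time_min <= 0:
        raise ValueError("residence time must be positive")
    # Work in log space to avoid overflow at high temperatures:
    # log10(R0) = log10(t) + (T - T_ref) / (omega * ln 10)
    return math.log10(time_min) + (temp_c - T_REF_C) / (OMEGA_C * math.log(10))


def combined_severity(temp_c: float, time_min: float, ph: float) -> float:
    """Return the combined severity, which folds acidity (pH) into log10(R0)."""
    return log_severity_factor(temp_c, time_min) - ph


if __name__ == "__main__":
    # Compare a 10-min hold at the two ends of the 160-240 degC window cited above.
    for temp in (160.0, 200.0, 240.0):
        print(f"{temp:.0f} degC, 10 min: log R0 = {log_severity_factor(temp, 10.0):.2f}")
    print(f"200 degC, 10 min, pH 2: CS = {combined_severity(200.0, 10.0, 2.0):.2f}")
```

Under this convention, raising the temperature by 14.75 °C multiplies R0 by e, so each additional 10 °C adds roughly 0.29 to log R0. A longer residence time or a lower pH raises the combined severity in the same way, which shows how the higher severity required for softwoods can be reached through any of the three process variables.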

Thermochemical conversion processes, such as combustion, gasification, and pyrolysis, will degrade all organic wood constituents and are less relevant within the context of the current review.

FIGURE 1 | Schematic illustration of types of non-cellulosic polysaccharides of wood, including hemicelluloses (gray background), pectins (blue background), callose (yellow background), and AGs-II (orange background), and hardwood fibers and softwood tracheids (inset). Polymer structures were based on different sources: hardwood GX (Teleman, 2009; Smith et al., 2017), softwood arabinoglucuronoxylan (Teleman, 2009; Martínez-Abad et al., 2017; Smith et al., 2017), hardwood and softwood glucomannan (GM), softwood GGM, tension and compression wood galactans, callose (Teleman, 2009), xyloglucan (Carpita and McCann, 2000; Teleman, 2009), HG (Atmodjo et al., 2013), RG-I and -II (Edashige and Ishii, 1996, 1997, 1998; Atmodjo et al., 2013), AG-II (Carpita and McCann, 2000; Hijazi et al., 2014), softwood arabinogalactan (Ponder and Richards, 1997; Teleman, 2009). Polymer localization is based on the following sources: hardwood GX and mannans (Kim and Daniel, 2012; Gorshkova et al., 2015; Guedes et al., 2017), softwood arabinoglucuronoxylan (Altaner et al., 2010; Donaldson and Knox, 2012), callose (Altaner et al., 2010; Zhang et al., 2016), xyloglucan (Bourquin et al., 2002; Sandquist et al., 2010; Nishikubo et al., 2011; Donaldson and Knox, 2012; Kim and Daniel, 2013; Guedes et al., 2017), HG (Kim and Daniel, 2013), RG-I/compression wood galactan/tension wood galactan (Gorshkova et al., 2015; Zhang et al., 2016; Guedes et al., 2017), AG-II/softwood arabinogalactan (Altaner et al., 2010; Guedes et al., 2017). PM, pit membrane; CML, compound middle lamella; S, secondary wall layer (S-layer); G, gelatinous layer (G-layer); C, cavities; S2i, inner S2 layer; S2L, outer lignified S2 layer.

# ROLE OF NON-CELLULOSIC POLYSACCHARIDES IN RECALCITRANCE AND ATTEMPTS TO IMPROVE CONVERTIBILITY

#### Improvement of Xylan Structure

#### Xylan Content and Length Affect Saccharification and Plant Growth

Xylan is a key factor in recalcitrance, mainly because it reduces cellulose accessibility (Bura et al., 2009; De Martini et al., 2013), which has prompted efforts to reduce its content in hardwoods. Attempts have been made using Arabidopsis as a model, targeting either the xylan synthase complex or the enzymes synthesizing the reducing end sequence (RES) (Smith et al., 2017). However, strong reductions in xylan content led to mechanical failure of vessels [the so-called irregular xylem (IRX) phenotype] and stem weakening. Subsequent work with Populus indicated that xylan content can be reduced by 5–50% by knocking down (KD) different xylan biosynthetic genes, including GT47C (Lee et al., 2009), two GAUT12/GT8D paralogs (Lee et al., 2011; Li et al., 2011; Biswal et al., 2015), GT43A and GT43B (Lee et al., 2011), and GT43B together with GT43C clade genes (Ratke et al., 2018), or by expressing the fungal xylanase HvXyl1 targeted to cell walls (Kaida et al., 2009; **Table 1**). Such reductions either did not affect growth (Lee et al., 2009) or stimulated it (Biswal et al., 2015; Ratke et al., 2018), and they decreased xylem cell wall thickness (Li et al., 2011; Ratke et al., 2018), sometimes coupled with a mild IRX phenotype (Lee et al., 2009, 2011). Besides xylan, cellulose content decreased in GT47C KD lines, coupled with increased lignification (Lee et al., 2009). A similar increase in lignin, together with stem brittleness, was observed in strong GAUT12-1 and -2 KD lines (Li et al., 2011), but not when only GAUT12-1 was reduced (Biswal et al., 2015). The lignin syringyl/guaiacyl (S/G) ratio was increased in GAUT12-1 KD poplar (Biswal et al., 2015) and in severe GT43B KD poplar (Lee et al., 2011). However, the S/G ratio was reduced, without a change in lignin content, in mildly affected GT43B and GT43C KD lines (Ratke et al., 2018).
Thus, it is difficult to predict how lignin might be affected in transgenic lines with reduced xylan content, and these changes should be carefully monitored, since they affect saccharification.

For non-pretreated wood, downregulation of GT43 genes resulted in a 30% increase in glucose yield in enzymatic saccharification (Lee et al., 2011; Ratke et al., 2018), but the benefits were less apparent after acid pretreatment (Ratke et al., 2018; **Table 1**). Reducing xylan content by downregulating GAUT12/GT8D did not improve saccharification without pretreatment (Lee et al., 2011) or improved it only slightly after steam pretreatment (Biswal et al., 2015), whereas post-synthetic xylan reduction resulted in an approx. 50% increase in glucose yield in saccharification after steam pretreatment (Kaida et al., 2009; **Table 1**). Several, possibly opposing, factors may be at play. For example, cell wall thickness, lignin content and composition, and tension wood content can all affect glucose yields (Escamez et al., 2017). KD GAUT12/GT8D poplar had fewer LCCs, which contribute to recalcitrance (Min et al., 2014a). Clearly, manipulation of xylan induces indirect effects, some of which, such as increased growth (Biswal et al., 2015; Derba-Maceluch et al., 2015; Yang et al., 2017; Ratke et al., 2018) or increased drought tolerance (Keppler and Showalter, 2010), are interesting for biotechnological applications.

#### Xylan Acetylation Affects Cell Wall Architecture

Deacetylation of lignocellulosic biomass prior to enzymatic saccharification results in improved sugar yields (reviewed by Pawar et al., 2013). For lignocellulosic biomass with high acetyl content such as hardwoods, reduction of acetylation might have an added benefit for ethanolic fermentation processes, as high concentrations of acetic acid are inhibitory to fermenting microorganisms (reviewed by Jönsson and Martín, 2016).

Modest reductions in acetylation in RWA KD aspen (Pawar et al., 2017b) and in aspen expressing the fungal xylan acetyl esterase AnAXE1 targeted to cell walls (Pawar et al., 2017a; **Table 1**) were well tolerated by the plants. These plants yielded 20–25% more glucose in enzymatic saccharification without pretreatment. Results with Arabidopsis (Pawar et al., 2016) suggested that deacetylation in planta reduces recalcitrance by mechanisms other than relieving the inhibition of enzymatic xylan hydrolysis by acetyl groups. Indeed, aspen expressing AnAXE1 exhibited increased lignin solubility and reduced xylan content, xylan chain length, and lignin S/G ratio (Pawar et al., 2017a). Increased extractability of lignin and xylan agrees with the proposed xylan models (Ruel et al., 2006; Busse-Wicher et al., 2014), in which the minor xylan domain (**Figure 1**) interacts with lignin. This domain, characterized by acetylation of consecutive Xylp residues, would become (after deacetylation) more susceptible to hydrolysis by wall-residing enzymes such as XYN10A (Derba-Maceluch et al., 2015), leading to loosening of the xylan fraction interacting with lignin.

Overexpression of PdDUF231A, which is similar to AtPMR5, in P. deltoides resulted in increased xylan acetylation and, surprisingly, in improved saccharification without pretreatment (Yang et al., 2017; **Table 1**). Decreased lignin content and increased cellulose content in the transgenic lines might have affected the sugar yield.

#### Significance of Glucuronosylation

Glucuronosylation makes the xylan backbone less prone to hydrolysis by GH10 and GH11 xylanases, and α-glucuronidases are required for its saccharification (Mortimer et al., 2010). It is also associated with LCCs of hardwoods (Min et al., 2014a,b; Bååth et al., 2016). The majority of glucuronate in SCW xylan is methylated (Teleman, 2009), and KD of the GX methyltransferase DUF579-3/GXM3 in poplar not only reduced methylation but also reduced xylan glucuronosylation and growth (Song et al., 2016). Both the xylose yield of acid pretreatment and the glucose yield of enzymatic saccharification increased. Thus, methylation of glucuronate is a promising target, but some means of avoiding the growth penalty needs to be designed.

To reduce ester links between GX and lignin in aspen, a fungal glucuronoyl esterase was expressed and targeted to cell walls (Gandla et al., 2015; **Table 1**). Increased cellulose-to-glucose conversion was observed, but the plants exhibited premature leaf senescence and impaired growth (Gandla et al., 2015). There was also a decrease in extractives and an increase in lignin (Gandla et al., 2015), and the mechanisms underlying these responses are not understood.

#### Prospects for Mannan Structure Improvement

GGM is tightly associated with cellulose microfibrils (Schröder et al., 2009; Salmén, 2015), and increasing its extractability could be beneficial for saccharification. In planta mannan hydrolysis by overexpression of the plasma-membrane-bound mannanase MAN6 induced production of active galactoglucomannooligosaccharides that modified poplar growth and inhibited SCW formation (Zhao et al., 2013), making this approach problematic. However, overexpression of the extracellular GM-active endoglucanase AtCel1 in poplar led to an approx. 30% increase in sugar yield and in cellulose conversion in saccharification after steam pretreatment (Kaida et al., 2009).

Mannan biosynthetic genes identified to date include a GM synthase (CSLA clade) (Suzuki et al., 2006), a mannan galactosyltransferase (GT34) (Edwards et al., 2004), and the GT65 members of unknown function AtMSR1 and AtMSR2 (Wang et al., 2013); these genes could be employed for biomass modification. For example, increasing the degree of mannan galactosylation by raising expression of mannan galactosyltransferase might lead to higher GGM solubility (Edwards et al., 2004).

#### Xyloglucan and Pectins – Minor Wood Components With High Impact on Properties

Several lines of evidence indicate that xyloglucan and pectins, despite being minor wood components (below 5% of dry weight), have significant effects on wood properties, including recalcitrance. Increasing xyloglucan content in aspen by overexpressing XTH34/XET16A affected neither growth (Nishikubo et al., 2011) nor saccharification (Escamez et al., 2017), but reducing it in poplar by expressing the fungal xyloglucanase AaXEG2 strongly stimulated growth and increased wood cellulose content, density, and mechanical strength in greenhouse-grown plants (Park et al., 2004; **Table 1**). Cellulose microfibrils were larger in the transgenic plants (Yamamoto et al., 2011), and their lignocellulose yielded almost 50% more glucose per wood weight and per cellulose weight in saccharification after steam pretreatment (Kaida et al., 2009). Similar effects on growth and saccharification were observed in Acacia mangium (Kaku et al., 2011; Hartati et al., 2011). However, in a 4-year field trial, transgenic poplars expressing AaXEG2 displayed substantially reduced biomass (Taniguchi et al., 2012). Furthermore, the plants were unable to bend upward when placed horizontally, despite forming tension wood as in the wild type, suggesting that xyloglucan is essential for generating tensile stress (Baba et al., 2009).

Suppression of aspen pectin methyl esterase PME1 resulted in highly methylesterified HG in developing wood (Siedlecka et al., 2008; **Table 1**). The glucose yields in saccharification without pretreatment increased, but there was no improvement after acid pretreatment (Escamez et al., 2017).

Overexpression of the aspen wood-expressed pectate lyase PL1-27 increased the solubility of xylan and xyloglucan, suggesting that HG constrains the solubility of the main non-cellulosic polysaccharides (Biswal et al., 2014; **Table 1**). The transgenic lines showed a positive effect on glucose yield, but only after acid pretreatment. Interestingly, decreasing HG biosynthesis in poplar by KD of the GAUT genes involved in pectin and xylan biosynthesis (Mohnen, 2008) led to substantial growth stimulation and a small increase in glucose yields in saccharification after acid or hot-water pretreatment (Biswal et al., 2015, 2018; **Table 1**).

# FUTURE PROSPECTS FOR IMPROVING NON-CELLULOSIC POLYSACCHARIDES FOR BIOREFINING OF WOOD

Two decades of research on modifying non-cellulosic polysaccharides have provided some insight into the role of these polysaccharides in cell wall architecture and their importance for the efficiency of pretreatment and enzymatic saccharification, and have identified some off-target effects. This research has also identified the most promising targets for achieving better growth and saccharification. Most work has focused on xylan modification, identifying xylan chain length, degree of acetylation, glucuronosylation, and glucuronosyl methylation as possible targets. The discovery of microbial enzymes that hydrolyze ester links between glucuronosyl units and lignin has opened new prospects for directly reducing LCCs in cell walls and should be explored further. Modification of the minor wood components HG and xyloglucan had some of the highest impacts on saccharification, pointing to the crucial role of these polymers in wood integrity, but it sometimes led to growth inhibition. Using wood-specific promoters, such as the GT43B promoter (Ratke et al., 2015), to express transgenes can prevent off-target modifications in meristems, root hairs, and other tissues with primary walls, and may avoid a growth penalty. Overall, there have been few attempts in trees to use different promoters for expressing transgenes. Heat-inducible promoters and heat-active enzymes have not yet been explored in trees for modifying wood properties during post-harvest heat treatment. Modification of other non-cellulosic components, for example mannans, RG-I, and RG-II, has not yet been investigated in trees and would help reveal the role of these polymers in the organization of woody biomass.

Currently, there is little understanding of the molecular mechanisms responsible for the observed phenotypes. In most cases, primary and secondary effects of transgene expression cannot be distinguished. Interestingly, some types of xylan modification lead to increased growth (Biswal et al., 2015; Derba-Maceluch et al., 2015; Ratke et al., 2018), which might be mediated by an SCW integrity sensing mechanism (Ratke et al., 2018). On the other hand, transcriptome analyses of Arabidopsis mutants impaired in xylan biosynthesis did not reveal any changes indicative of SCW integrity sensing (Faria-Blanc et al., 2018). For successful modification of SCW, it will be important to establish whether such signaling exists and, if so, what triggers it.

Almost all results reviewed here are based on greenhouse experiments. Experience with xyloglucanase-modified poplar (Taniguchi et al., 2012) points to a need for early field experiments to pinpoint possible problems of transgenic lines. Field-grown trees will also provide sufficient biomass for testing pulping properties.

Finally, the tension wood of hardwoods appears to be particularly suitable for saccharification (Brereton et al., 2012; Sawada et al., 2018). Progress in identifying the pathways involved in tension wood induction (Felten et al., 2018) will make it possible to design strategies to stimulate tension wood formation under normal growth conditions in dedicated biorefinery feedstocks.

#### AUTHOR CONTRIBUTIONS

ED, LJJ, and EJM wrote the paper. MLG prepared the Figure and the Table. All authors read and approved the final submission.

#### FUNDING

SSF project ValueTree, the Swedish Energy Agency, and the Bio4Energy program (www.bio4energy.se).

#### REFERENCES

Balakshin, M., Capanema, E. A., and Chang, H. (2007). MWL fraction with a high concentration of lignin-carbohydrate linkages: isolation and 2D NMR spectroscopic analysis. Holzforschung 61, 1–7. doi: 10.1515/HF.2007.001

Biswal, A. K., Atmodjo, M. A., Li, M., Baxter, H. L., Yoo, C. G., Pu, Y., et al. (2018). Sugar release and growth of biofuel crops are improved by downregulation of pectin biosynthesis. Nat. Biotechnol. 36, 249–257. doi: 10.1038/nbt.4067

reduced mechanical strength and xylan content in wood. Tree Physiol. 31, 226–236. doi: 10.1093/treephys/tpr008

include useful promoters for wood modification. Plant Biotechnol. J. 13, 26–37. doi: 10.1111/pbi.12232


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Donev, Gandla, Jönsson and Mellerowicz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Breeding and Engineering Trees to Accumulate High Levels of Terpene Metabolites for Plant Defense and Renewable Chemicals

#### Gary F. Peter\*

School of Forest Resources and Conservation, Genetics Institute, University of Florida, Gainesville, FL, United States

Plants evolved the capacity to synthesize highly diverse sets of secondary metabolites that are important for plant adaptation and health. In forest trees, many classes of compounds, particularly ones related to defense against insects, fungi, and bacteria, accumulate to levels that enable their recovery and commercial use. Conifer terpenes are one of the oldest examples, but terpenes are also important secondary products of other tree species, including eucalypts. Because terpenes, latex, and natural gums are synthesized and stored in mostly pure form in specialized secretory glands, ducts, and laticifers, they can be collected from live trees in addition to being extracted during industrial processing of wood. This minireview describes the potential of breeding and genetic engineering approaches to increase the quantities of terpene secondary metabolites and thereby the value of planted and managed forest trees. I advance the view that breeding and genetic engineering of metabolic pathways and specialized secretory cell structures can dramatically increase tissue terpene content.

#### Edited by:

Ronald Ross Sederoff, North Carolina State University, United States

#### Reviewed by:

Bartosz Adamczyk, Natural Resources Institute Finland (Luke), Finland

Heather D. Coleman, Syracuse University, United States

> \*Correspondence: Gary F. Peter gfpeter@ufl.edu

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 05 July 2018 Accepted: 26 October 2018 Published: 20 November 2018

#### Citation:

Peter GF (2018) Breeding and Engineering Trees to Accumulate High Levels of Terpene Metabolites for Plant Defense and Renewable Chemicals. Front. Plant Sci. 9:1672. doi: 10.3389/fpls.2018.01672

Keywords: resin duct, terpene, conifer, oil gland, Eucalyptus, defense, biosynthesis, cell specialization

# INTRODUCTION

Plants evolved the ability to synthesize a large diversity of carbon-rich compounds, including fatty acids, lipids, phenolics, sterols, tannins, and terpenes. Some of these hydrocarbons can accumulate to high percentages of dry mass; for example, oil seeds accumulate 10–75% triacylglycerol by dry mass, depending on the species. With the goal of producing hydrocarbons directly in plants for renewable chemicals and biofuels, tobacco was engineered using a push-pull-protect strategy to accumulate 15% of leaf dry mass as triacylglycerol (Vanhercke et al., 2014). In addition to hydrocarbons synthesized via primary metabolic pathways, many are synthesized via secondary metabolic pathways. Terpenes are one of the most diverse sets of plant hydrocarbons. Plant terpenes are synthesized via the plastid-localized 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway and the cytosolic mevalonate (MVA) pathway (Kirby and Keasling, 2009; Vranova et al., 2013). Terpenes are critical components of plant defense against herbivores, insects, pathogenic fungi, and bacteria (Franceschi et al., 2005). For terpene defenses to be effective, sufficient amounts need to accumulate before invasion; however, at these levels terpenes are phytotoxic. To minimize this toxicity, plants evolved the ability to sequester them in extracellular spaces formed during specialized differentiation of secretory tissues (Fahn, 1979, 1988). In a wide variety of plant species, terpenes accumulate in extracellular cavities, or lumens, that form during differentiation of glandular trichomes (Glas et al., 2012; Huchelmann et al., 2017), secretory cavities or oil glands (OG) (Mewalal et al., 2017), and resin ducts (Zulak and Bohlmann, 2010). These tissues are specialized for the biosynthesis and storage of terpenes.

In many forest tree species, terpenes are vital chemical and physical defense compounds, and specialized tissues are crucial for their biosynthesis and storage at high concentrations. For example, in Eucalyptus species, mono- and sesquiterpenes are important for defense against insects and herbivores, and are synthesized and accumulate in secretory cavities, or OG, within leaves (King et al., 2006; Naidoo et al., 2014). Eucalypt leaf terpene content ranges from 0.5 to 8% of dry mass; these terpenes are commercially extracted for the fragrance and pharmaceutical industries and have potential as biofuels (Mewalal et al., 2017). Conifer terpenes are a primary component of a durable, quantitative defense mechanism against stem-boring insects and fungal pathogens (Trapp and Croteau, 2001; Franceschi et al., 2005). Constitutive and induced terpenes are critical for resistance (Hodges et al., 1985; Strom et al., 2002; Klepzig et al., 2005). Upon wounding, preformed, constitutively produced terpenes are released at the wound site, where the insects and fungi are contained and their growth is inhibited. The toxic terpenes can not only kill the attacking organisms but also flush and seal the wound site, forming a physical barrier that protects the stem from further infection (Trapp and Croteau, 2001). Conifer terpenes are synthesized and accumulate in primary and secondary resin ducts and form a viscous oleoresin secretion, a complex mixture of volatile monoterpene olefins and non-volatile diterpenoid resin acids that naturally accumulates to between 2 and 40% of the dry mass of wood (Trapp and Croteau, 2001).

The native levels of terpenes in eucalypt leaves and conifer wood are adequate for commercial extraction and use as renewable chemicals and biofuels. In conifers, terpenes are collected commercially today from live pine trees by tapping, by solvent extraction of stumps, and during chemical pulping of wood, with global production from all sources of about 3.5–4.0 million tons y<sup>−1</sup>. Southeastern United States pulp mills recover ∼900,000 tons y<sup>−1</sup> of pine terpenes and fatty acids as crude tall oil (CTO). This supports specialty chemical biorefineries that compete in markets with petroleum-derived feedstocks, and justifies the concept that biofuels from pine hydrocarbons could be profitable without subsidy if supply were increased (American, 2011). In the European Union, CTO is converted to renewable diesel fuel at two industrial-scale biorefineries, one of which has a capacity of 120 million liters y<sup>−1</sup> (UPM-Kymmene, 2018). Pine monoterpenes can be efficiently dimerized to make jet fuel (Harvey et al., 2010; Mewalal et al., 2017). Globally, pine chemicals are the largest non-food renewable hydrocarbon supply. However, the pine terpene supply is limited by the relatively low average terpene content of dry wood (2–4%).

Here, I discuss three synergistic approaches to increase tissue terpene content that can lead to forest trees with improved resistance to herbivores, insects, and pathogens as well as greater supplies of terpenes for renewable chemicals and biofuels: (1) increase the number of specialized secretory glands and ducts in plant organs to create greater biosynthetic and storage capacity; (2) increase the luminal storage volume of specialized secretory tissues; and (3) enhance carbon flux into terpene biosynthetic pathways.

# INCREASE SECRETORY TISSUES IN PLANT ORGANS

In conifers, primary resin ducts (RD) form in needles and non-woody shoots, and secondary RD form in the secondary xylem, or wood (Larson, 1994). Although there is limited knowledge of primary RD formation, secondary RD formation and function are better studied (Larson, 1994; Zulak and Bohlmann, 2010). Within a network of secondary RDs, terpenes are synthesized by heterotrophic epithelial cells that line the duct and are secreted into the lumen, where they accumulate (Fahn, 1979; Trapp and Croteau, 2001; Zulak and Bohlmann, 2010; **Figure 1**). The lumens of axial and radial RDs are interconnected, so that upon wounding the terpene oleoresin can flow from within the wood toward the stem surface (Larson, 1994; **Figure 1A**). The interconnectedness of the network is also important for the RDs to obtain sucrose and nutrients from the phloem at the periphery of the stem through the living ray cell files. During stem diameter growth, axial and radial RDs differentiate in the cambial zone from derivatives of fusiform and ray initials (Larson, 1994). Two adjacent derivatives begin dividing anticlinally to form a multicellular pre-duct structure, which subsequently forms the lumen schizogenously by separation of the cells at the middle lamella (Larson, 1994; **Figure 2**).

The genetic mechanisms that control constitutive RD cell fate and differentiation are not known. In loblolly pine, we showed that constitutive axial resin duct number is under moderate genetic control, is inherited as a polygenic trait, is positively genetically correlated with xylem growth, and has sufficient variation for breeding to increase constitutive axial RD number by about 10% in a single generation (Westbrook et al., 2015). Picea species (spruce) have emerged as the best-studied experimental system for dissecting the developmental signals and biochemical basis of induced terpene resin formation (Franceschi et al., 2005; Keeling and Bohlmann, 2006; Zulak and Bohlmann, 2010). Insect feeding, wounding, ethylene, and methyl jasmonate (MeJ) treatments induce development of traumatic resin ducts (TRD), which develop as a single contiguous row of elements derived from xylem mother cells in the cambial zone (Franceschi et al., 2005). Anatomical time-course studies show that axial TRD take about 9–36 days to differentiate, depending on the species (Larson, 1994; Nagy et al., 2000). In Norway spruce, a detailed time course of TRD formation and terpene synthesis in wood after MeJ application showed that geranylgeranyl pyrophosphate synthase, monoterpene synthase, and diterpene synthase activities began increasing after 5 days and peaked at ∼15 days, while increases in wood monoterpene and diterpenoid content were observed after 10 days and peaked between 15 and 25 days (Martin et al., 2002). New TRD are observed at 10 days, which supports the conclusion that in spruce wood, newly synthesized terpene is present only in the newly formed TRD and not in undifferentiated ray cells. Thus, increasing RD number should be an effective strategy to increase wood terpene content. Exogenous application of MeJ induces axial TRD in the cambial region (Martin et al., 2002; **Figure 1C**), and application of inhibitors of ethylene biosynthesis suppresses jasmonate-induced TRD formation, suggesting that jasmonate induces ethylene synthesis, which then induces TRD formation (Hudgins and Franceschi, 2004). Consistent with this model, MeJ application to Sitka spruce bark rapidly upregulates genes coding for the rate-limiting enzyme of ethylene biosynthesis, 1-aminocyclopropane-1-carboxylic acid synthase (Ralph et al., 2007). These observations suggest that genetically increasing ethylene or jasmonate biosynthesis or signaling in the cambial zone should increase RD number and lead to greater wood terpene content.

To identify RD-specific and jasmonate-regulated genes, the transcriptomes of cortical RDs isolated from Norway spruce bark were compared before and after MeJ treatment (Celedon et al., 2017). In untreated bark, 863 cortical RD-specific transcripts (Tau score > 0.75) were identified, and in MeJ-treated bark, 766. About 648 of these 766 transcripts were also specific to untreated cortical RD, indicating that 85% of the cortical RD-specific genes remain RD-specific 8 days after MeJ treatment (Celedon et al., 2017). As expected, the RD-specific transcripts in untreated cortical RD included genes coding for all MEP pathway enzymes and a large number of cytochrome P450s and terpene synthases. They also included 34 transcripts annotated as transcription factors or encoding putative DNA-binding domains, which could potentially mediate RD specification and terpene pathway gene expression. In bark, ∼3,700 genes were differentially expressed 8 days after MeJ treatment; however, the number of jasmonate-responsive genes in cortical RD was not reported. Expression of the terpene pathway genes in cortical RD was reduced 8 days after MeJ treatment relative to controls (Celedon et al., 2017). Overall, these data show that in both control and jasmonate-treated bark, the genes coding for terpene biosynthesis are specifically expressed in cortical RDs.
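The Tau threshold above scores how restricted a transcript is to one tissue, from 0 (equal expression in all tissues) to 1 (expression in a single tissue). The sketch below assumes the usual definition, τ = Σ(1 − x<sub>i</sub>/max x)/(n − 1) over n tissues; the exact preprocessing used by Celedon et al. (2017), such as log transformation or the set of tissues compared, is not given here, and the expression values are hypothetical. The last line reproduces the 85% overlap quoted above.

```python
from typing import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tissue-specificity index Tau for one gene (usual definition, assumed).

    `expression` holds one non-negative value per tissue. The result is 0
    when expression is equal in all tissues and 1 when the gene is expressed
    in a single tissue only.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1.0 - x / peak for x in expression) / (n - 1)


TAU_CUTOFF = 0.75  # specificity threshold quoted in the text

# Hypothetical values for: cortical RD, RD-depleted bark, needle, root.
rd_enriched_gene = [120.0, 8.0, 2.0, 0.0]
housekeeping_like_gene = [50.0, 45.0, 40.0, 55.0]

for name, values in [("RD-enriched", rd_enriched_gene),
                     ("housekeeping-like", housekeeping_like_gene)]:
    score = tau(values)
    print(f"{name}: tau = {score:.2f}, RD-specific = {score > TAU_CUTOFF}")

# Share of MeJ-treated RD-specific transcripts that were also RD-specific
# in untreated bark (648 of 766, Celedon et al., 2017).
print(f"overlap = {648 / 766:.0%}")
```

Because Tau depends only on ratios to the peak tissue, it is insensitive to overall expression level, which is what makes a single cutoff usable across genes.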

Manipulating ethylene or jasmonate may induce more secondary RDs but could also have unintended effects on tracheids, so targeted genetic manipulation may be preferable. Targeted manipulation is limited by our lack of knowledge of the molecular mechanisms that regulate resin duct cell fate, differentiation, and function. **Figure 2** shows a simple conceptual model for secondary axial and radial RD formation. During normal stem diameter growth, constitutively formed secondary axial RDs appear to occur randomly throughout the wood volume but are typically adjacent to one or more ray cell files (**Figure 1**). While derivatives of fusiform initials typically differentiate into tracheids, those adjacent to ray initials can also differentiate into axial resin ducts. Because the vast majority of fusiform derivatives differentiate into tracheids, a first step for axial RD formation is to inhibit the regulators that specify tracheid cell fate. Although the molecular control of tracheid cell specification is presently unknown in conifers, vascular-related NAC domain transcription factors function as master switches in angiosperm xylem (Zhang et al., 2014; Jokipii-Lukkari et al., 2017). By analogy with xylem cell specification, it is reasonable to hypothesize that a master transcriptional regulator specifies RD identity. Based on parsimony, this regulator would also be responsive to ethylene and/or jasmonate, and therefore these hormones could be important for constitutive RD formation in addition to TRD formation. However, the overlap between the traumatic and constitutive RD pathways is not known. It is also unknown whether fusiform derivatives first need to acquire ray cell identity to differentiate into RDs or whether they can directly acquire RD cell fate. Constitutive radial RDs also form in the cambial zone; however, it is unclear whether they can differentiate within existing ray cell files deeper in the stem.
Once a radial RD forms, typically next to an axial RD, the radial RD fate extends within that ray cell file out to the cambial meristem (**Figure 1**). To discover genes involved in RD formation, we are using association genetics in loblolly pine. Using single nucleotide polymorphisms (SNPs) in 4,027 genes, we identified 16 well-supported candidate genes associated with axial RD number, some of which were also significant for oleoresin flow (Westbrook et al., 2015). More recently, we genotyped ∼70,000 SNPs in this population and identified significant SNPs in an additional 18 genes for constitutive axial RD number (Peter, 2018, Unpublished). We are investigating the importance of some of these candidate genes in RD specification through genetic engineering.

# INCREASE LUMINAL VOLUME OF SECRETORY TISSUES

Conifer species differ in the organization of terpene resin producing cells and tissues. Abies species form both resin cells and multicellular resin blisters, while some Pinaceae species form constricted resin passages and other Pinaceae species form non-occluded resin ducts (Penhallow, 1907; Lewinsohn et al., 1991). Although no rigorous comparison amongst these species is available to separate the significance of resin structure number and extracellular volume, in general, wood terpene content is greater in species with resin ducts compared with species with resin cells (Lewinsohn et al., 1991).

Increasing luminal volume in addition to RD number should lead to wood with greater terpene content. What controls RD luminal diameter is unknown, but lumen volume tends to be greater in RDs that form from a greater number of early cell divisions during differentiation (**Figure 2B**). Possibly, if four instead of two cambial derivatives could be induced to divide and commit to RD fate, a larger-diameter lumen would form. In loblolly pine, axial RD length ranges from 5 to 20 cm and averages 10.4 cm, and in slash pine average RD length more than doubled (49.8 cm vs. 20 cm) between 10 and 20 years (Larson, 1994). In southern pines, axial RDs are 10–100 times longer than individual fusiform initials (3–5 mm), suggesting that a large number of contiguous fusiform initial derivatives within a single file need to commit to form an axial RD. The mechanisms that mediate axial RD length are unknown, but I hypothesize that cell-to-cell communication signals that specify RD identity are important for coordinating differentiation of tens to hundreds of cambial meristem derivatives. If this hypothesis is correct, then discovery of this transmissible signal could enable engineering of larger RDs and thereby increase wood terpene content.
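The estimate of how many derivatives must coordinate their fate is simple length arithmetic. The sketch below divides the duct lengths quoted above by the 3–5 mm fusiform initial length, under the simplifying assumption (not stated in the source) that each committed derivative is about as long as its initial and that derivatives abut end to end without overlap.

```python
# Lengths quoted in the text, in millimetres.
FUSIFORM_INITIAL_MM = (3.0, 5.0)
AXIAL_RD_MM = {
    "loblolly pine, shortest": 50.0,
    "loblolly pine, mean": 104.0,
    "loblolly pine, longest": 200.0,
    "slash pine, mean at 10 years": 200.0,
    "slash pine, mean at 20 years": 498.0,
}


def derivatives_spanned(rd_length_mm: float, initial_length_mm: float) -> float:
    """Approximate number of end-to-end derivatives along one axial RD.

    Simplifying assumption: each derivative is as long as its fusiform
    initial and derivatives do not overlap.
    """
    return rd_length_mm / initial_length_mm


for label, length_mm in AXIAL_RD_MM.items():
    # Longer initials mean fewer derivatives per duct, and vice versa.
    fewest = derivatives_spanned(length_mm, max(FUSIFORM_INITIAL_MM))
    most = derivatives_spanned(length_mm, min(FUSIFORM_INITIAL_MM))
    print(f"{label}: about {fewest:.0f}-{most:.0f} derivatives")
```

The result runs from about 10 derivatives for the shortest loblolly pine ducts to more than 100 for 20-year-old slash pine, roughly in line with the 10–100-fold range given above.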

In Eucalyptus polybractea, the number of OG correlates well with leaf terpene content (King et al., 2006; Goodger and Woodrow, 2012). Analysis of individual OG shows that they have very similar capacities, suggesting that the storage volume of the cavity is relatively tightly controlled, and it was hypothesized that gland capacity is controlled by leaf expansion (King et al., 2006). If King's hypothesis is correct, then increasing leaf thickness could increase cavity volume. Because leaf terpene content is limited not only by the number of OG but also by the luminal volume of each gland, it could be possible to increase this volume by engineering OG to be more duct-like, similar to primary RDs in conifers. I hypothesize that the difference between the formation of spherical glands and elongated ducts is related to a cell-to-cell signaling mechanism, which evolved in Pinaceae, that mediates duct formation.
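The potential gain from a more duct-like gland can be illustrated with idealized geometry. Treating an OG as a sphere of radius r and a duct as a cylinder of the same radius and length L, the volume ratio is πr²L / ((4/3)πr³) = 3L/(4r), so the duct stores more once L exceeds 4r/3. This is a geometric sketch only: the shapes and any dimensions plugged in are assumptions, not measurements from the cited studies, and the function names are hypothetical.

```python
import math


def sphere_volume(r: float) -> float:
    """Volume of an idealized spherical gland cavity of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3


def cylinder_volume(r: float, length: float) -> float:
    """Volume of an idealized cylindrical duct lumen of radius r and given length."""
    return math.pi * r ** 2 * length


def duct_to_gland_ratio(r: float, length: float) -> float:
    """Lumen volume of a duct relative to a sphere of the same radius (= 3L / 4r)."""
    return cylinder_volume(r, length) / sphere_volume(r)
```

For example, a duct four radii long holds three times the volume of a spherical gland of the same radius, and the advantage grows linearly with length.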

#### ENHANCE CARBON FLUX TO TERPENES

Plants synthesize monoterpenes and diterpenes via the plastid-localized 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway, and sesquiterpenes and triterpenes via the cytosolic mevalonate (MVA) pathway (Kirby and Keasling, 2009; Vranova et al., 2013; Banerjee and Sharkey, 2014). The MEP pathway is considered the high-flux pathway, in part because many species, including Eucalyptus, are rich in monoterpenes, and conifers are rich in monoterpenes and diterpenoids. The first committed step in the MEP pathway is catalyzed by 1-deoxy-D-xylulose 5-phosphate synthase (DXS). Constitutive overexpression of DXS enzymes elevated terpenoid alkaloids in C. roseus (Peebles et al., 2011), taxadiene in Arabidopsis thaliana expressing a taxadiene synthase (Botella-Pavia et al., 2004), and terpenes 2- to 3.5-fold in leaves of L. latifolia (Munoz-Bertomeu et al., 2006). DXS is sensitive to feedback inhibition by the isoprenoid precursors isopentenyl diphosphate and dimethylallyl diphosphate, which can control carbon flux through the pathway (Banerjee et al., 2013; Banerjee and Sharkey, 2014). A mutant form of poplar DXS that is less sensitive to feedback regulation has been described; however, it has yet to be tested in plants (Banerjee et al., 2016). In spruce, the DXS gene family codes for type I and type II enzymes. Type II DXS genes are involved in secondary metabolism and in spruce are upregulated by wounding and fungal infection, suggesting that DXS overexpression should enhance wood terpene synthesis (Phillips et al., 2007). Upregulation of 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR) led to a 50% increase in terpene-based essential oils stored in glandular trichomes of peppermint (Mahmoud and Croteau, 2001), and increased isoprenoid products, e.g., chlorophyll, β-carotene, and taxadiene, in A. thaliana expressing taxadiene synthase (Carretero-Paulet et al., 2006).
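Why a feedback-insensitive DXS could matter can be sketched with a textbook rate law: a Michaelis-Menten term scaled by a simple non-competitive inhibition factor 1/(1 + [I]/K_i), where [I] stands in for the IPP/DMAPP pool. The inhibition form and all parameter values are hypothetical, chosen for illustration rather than taken from the cited kinetic studies; raising K_i mimics a variant that is less sensitive to feedback.

```python
def dxs_rate(substrate: float, inhibitor: float,
             vmax: float = 1.0, km: float = 1.0, ki: float = 1.0) -> float:
    """Michaelis-Menten rate scaled by a generic non-competitive inhibition term.

    `inhibitor` stands in for the IPP/DMAPP pool. All parameters are
    hypothetical and in arbitrary units; this is not a fitted DXS model.
    """
    saturation = substrate / (km + substrate)
    inhibition = 1.0 / (1.0 + inhibitor / ki)
    return vmax * saturation * inhibition


if __name__ == "__main__":
    pool = 2.0  # same hypothetical IPP/DMAPP level for both enzymes
    wild_type = dxs_rate(substrate=1.0, inhibitor=pool, ki=1.0)
    less_sensitive = dxs_rate(substrate=1.0, inhibitor=pool, ki=10.0)
    print(f"wild type: {wild_type:.3f}, less sensitive: {less_sensitive:.3f}")
```

With these made-up numbers the less sensitive enzyme retains 2.5 times more activity at the same precursor level, which is the qualitative effect a feedback-resistant DXS would be expected to have on flux.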

The plant MEP pathway is conserved with the 1-deoxy-D-xylulose 5-phosphate (DXP) pathway in microbes, which has been improved through extensive genetic engineering (Ajikumar et al., 2008; Wang et al., 2009; Kirby et al., 2016) and synthetic biology approaches (Keasling, 2014). The results from many of these studies show that larger increases in isoprenoid products occur when multiple genes are manipulated in combination (Ajikumar et al., 2008; Wang et al., 2009). Changes to multiple genes in plants will require improvements in genetic engineering efficiency and more effective gene editing tools, particularly for conifers, which have very large genomes and large gene families (Birol et al., 2013; Nystedt et al., 2013; Neale et al., 2014; Wegrzyn et al., 2014; Zimin et al., 2014). A particularly interesting finding was the identification of two Escherichia coli genes that can bypass the DXS-catalyzed step (Kirby et al., 2015). Bypassing DXS should improve carbon efficiency, because DXS releases CO<sub>2</sub> when it condenses pyruvate and glyceraldehyde 3-phosphate into the five-carbon product DXP. The wild-type yajO gene product and mutant forms of the ribB gene product were shown to convert ribulose 5-phosphate into DXP in vitro and, upon expression in E. coli, to improve terpene synthesis (Kirby et al., 2015). Transgenic loblolly pine constitutively expressing plastid-localized YajO and RibB(G108S) showed significantly increased terpenes in the wood of 1-year-old seedlings (Kirby et al., 2018, Unpublished).
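The carbon argument can be made concrete by counting atoms: pyruvate (C3) plus glyceraldehyde 3-phosphate (C3) gives DXP (C5) and CO<sub>2</sub> (C1), whereas the YajO/RibB route converts ribulose 5-phosphate (C5) into DXP (C5). The sketch counts only this single step; it ignores carbon lost upstream (for example, in making ribulose 5-phosphate via the oxidative pentose phosphate pathway), so it overstates any whole-pathway gain. The helper names are hypothetical.

```python
# Carbon atoms per molecule for the species involved in the two routes to DXP.
CARBONS = {
    "pyruvate": 3,
    "glyceraldehyde 3-phosphate": 3,
    "ribulose 5-phosphate": 5,
    "DXP": 5,
    "CO2": 1,
}


def carbon_retained(substrates, kept_products):
    """Fraction of substrate carbon that ends up in the kept products of one step."""
    carbon_in = sum(CARBONS[m] for m in substrates)
    carbon_kept = sum(CARBONS[m] for m in kept_products)
    return carbon_kept / carbon_in


# DXS condenses two C3 substrates and releases one carbon as CO2.
dxs_route = carbon_retained(["pyruvate", "glyceraldehyde 3-phosphate"], ["DXP"])
# The bypass converts a C5 sugar phosphate into DXP with no carbon released.
bypass_route = carbon_retained(["ribulose 5-phosphate"], ["DXP"])

if __name__ == "__main__":
    print(f"DXS step: {dxs_route:.1%} of carbon kept; bypass step: {bypass_route:.0%}")
```

At the level of this one reaction, the DXS step keeps five of six substrate carbons (about 83%), while the bypass keeps all five.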

#### CONCLUSION

There is great opportunity to increase terpene hydrocarbon content in eucalypts and conifers through breeding and genetic engineering. Continued discovery of the molecular mechanisms that control OG and RD specification and cell-to-cell signaling has the potential to enable genetic engineering and targeted breeding to increase the number and size of resin ducts and OG, leading to trees with greater levels of constitutively produced terpenes. Capitalizing on advances in genetic engineering and synthetic biology of the MEP pathway in microbes should further enable greater carbon flux and carbon efficiency.

#### AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and has approved it for publication.

#### FUNDING

U.S. Department of Energy DE-AR0000209, DE-SC0019099, and U.S. Department of Agriculture 2011-68002-30185 provided funds for GP research on pine terpenes. University of Florida provided publication fees.

#### ACKNOWLEDGMENTS

I wish to thank J. Kirby for many discussions on MEP pathway engineering and Y. Wang for the cross section images in **Figure 1**.

#### REFERENCES

fpls-09-01672 November 16, 2018 Time: 14:58 # 6


studies of conifer wood development. New Phytol. 216, 482–494. doi: 10.1111/nph.14458


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Peter. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Systems and Synthetic Biology of Forest Trees: A Bioengineering Paradigm for Woody Biomass Feedstocks

*Alexander A. Myburg<sup>1</sup>\*, Steven G. Hussey<sup>1</sup>, Jack P. Wang<sup>2</sup>, Nathaniel R. Street<sup>3</sup> and Eshchar Mizrachi<sup>1</sup>*

*<sup>1</sup> Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Hatfield, South Africa; <sup>2</sup> Forest Biotechnology Group, Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, United States; <sup>3</sup> Umeå Plant Science Center, Department of Plant Physiology, Umeå University, Umeå, Sweden*

#### *Edited by:*

*Wout Boerjan, Flanders Institute for Biotechnology, Belgium*

#### *Reviewed by:*

*Kelly Mayrink, University of Florida, United States; Daniel Conde, University of Florida, United States*

*\*Correspondence:* 

*Alexander A. Myburg zander.myburg@up.ac.za; zander.myburg@fabi.up.ac.za*

#### *Specialty section:*

*This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science*

*Received: 22 August 2018 Accepted: 28 May 2019 Published: 20 June 2019*

#### *Citation:*

*Myburg AA, Hussey SG, Wang JP, Street NR and Mizrachi E (2019) Systems and Synthetic Biology of Forest Trees: A Bioengineering Paradigm for Woody Biomass Feedstocks. Front. Plant Sci. 10:775. doi: 10.3389/fpls.2019.00775*

Fast-growing forest plantations are sustainable feedstocks of plant biomass that can serve as alternatives to fossil carbon resources for materials, chemicals, and energy. Their ability to efficiently harvest light energy and carbon from the atmosphere and sequester this into metabolic precursors for lignocellulosic biopolymers and a wide range of plant specialized metabolites makes them excellent biochemical production platforms and living biorefineries. Their large sizes have facilitated multi-omics analyses and systems modeling of key biological processes such as lignin biosynthesis in trees. High-throughput 'omics' approaches have also been applied in segregating tree populations, where genetic variation creates abundant genetic perturbations of system components, allowing construction of systems genetics models linking genes and pathways to complex trait variation. With this information in hand, it is now possible to start using synthetic biology and genome editing techniques in a bioengineering approach based on a deeper understanding and rational design of biological parts, devices, and integrated systems. However, the complexity of the biology and interacting components will require investment in big data informatics, machine learning, and intuitive visualization to fully explore multi-dimensional patterns and identify emergent properties of biological systems. Predictive systems models could be tested rapidly through high-throughput synthetic biology approaches and multigene editing. Such a bioengineering paradigm, together with accelerated genomic breeding, will be crucial for the development of a new generation of woody biorefinery crops.

Keywords: synthetic biology (synbio), systems biology, systems genetics, woody biomass, biorefinery, bioeconomy, lignin biosynthesis, wood formation

## INTRODUCTION

Compared to herbaceous plants, forest trees afford numerous advantages to plant biologists interested in studying growth and development. Most obviously, trees produce vast quantities of wood comprising multiple cell types produced from the meristematic cambial initials of the vascular cambium. In developmental studies of secondary growth, greater size provides greater spatial resolution for profiling stages of development during the formation of secondary phloem and xylem, as recently demonstrated for aspen (Obudulu et al., 2016; Sundell et al., 2017) and Norway spruce (Jokipii-Lukkari et al., 2017). Such studies have been instrumental in identifying the individual genes, proteins, metabolites, and pathways comprising the molecular components of biological processes such as wood formation. Many wood formation genes have been targeted using transgenic approaches to demonstrate the ability to modify wood properties (Chang et al., 2018). Unfortunately, when tested in field trials, some of these single-gene modifications have had adverse effects on growth, or transgene effects that differed from greenhouse results (Leplé et al., 2007; Voelker et al., 2010). Such outcomes suggest a need for greater understanding of the interaction of systems components with intrinsic and extrinsic factors that differ between the greenhouse and field (Beckers et al., 2016).

In the context of industrial application, woody biomass is a renewable, carbon-neutral source of lignocellulosic materials for construction, pulp, bioenergy and, increasingly, for advanced biomaterials such as nanocellulose (Kunaver et al., 2016; Thomas et al., 2018). These and other biorefinery products, including novel biopolymers and biochemicals, are key to the emerging plant-based bioeconomy (Vanholme et al., 2013a; Van de Wouwer et al., 2018). Most woody biomass traits of commercial value are genetically complex. As such, it is challenging to devise strategies for determining which genes contribute to these complex phenotypes and to associate them with causal mechanisms. In contrast to genomic selection, which can be used as a "black box" route to improve complex traits, bioengineering approaches require systems-level understanding of the molecular basis of complex traits. "Multi-omics" analyses, together with extensive transgenic perturbation of a biological pathway and mathematical modeling of pathway dynamics, have recently been applied to generate the most detailed systems biology model yet of lignin biosynthesis in a forest tree (Wang et al., 2018). The large amount of genetic diversity retained in tree breeding programs has also facilitated the use of natural variation to dissect complex traits (Mizrachi and Myburg, 2016; Mizrachi et al., 2017). Such systems genetics approaches create opportunities for rapid advances in genomic breeding and provide diverse genetic backgrounds for genetic engineering of biomass traits.
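For contrast with the systems-level route, the "black box" character of genomic selection can be seen in one common formulation, ridge regression on genome-wide markers (often called RR-BLUP): marker effects are estimated purely from their statistical association with the phenotype, with no mechanism implied. The sketch below is a minimal version under stated assumptions (genotype and phenotype arrays already centered, a single user-chosen penalty `lam`); the function names are hypothetical.

```python
import numpy as np


def ridge_marker_effects(genotypes: np.ndarray, phenotypes: np.ndarray,
                         lam: float) -> np.ndarray:
    """Estimate one additive effect per marker by ridge regression.

    genotypes:  (n_trees, n_markers) marker matrix, assumed already centered.
    phenotypes: (n_trees,) trait vector, assumed already centered.
    lam:        shrinkage penalty; larger values pull effects toward zero.
    Solves (X^T X + lam * I) b = X^T y.
    """
    n_markers = genotypes.shape[1]
    lhs = genotypes.T @ genotypes + lam * np.eye(n_markers)
    rhs = genotypes.T @ phenotypes
    return np.linalg.solve(lhs, rhs)


def predict_breeding_values(genotypes: np.ndarray, effects: np.ndarray) -> np.ndarray:
    """Genomic estimated breeding values: the sum of marker effects for each tree."""
    return genotypes @ effects
```

In practice the penalty would be tuned (for example, by cross-validation), and candidates are ranked by predicted value without any model of which genes or pathways cause the trait, which is the gap a systems-level approach aims to fill.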

Synthetic biology ("SynBio") aims to reduce biologically complex systems to discrete functional components that can be combined in numerous ways to create efficient and novel biological products or properties (for a synopsis of SynBio development, see Cameron et al., 2014). This transdisciplinary field draws on engineering principles and standardization of the fabrication process (i.e., gene circuit assembly) (Heinemann and Panke, 2006). Not only can synthetic biology incorporate existing genetic components (biological "parts") from across the three domains of life, but chemical gene and whole-genome synthesis, directed evolution, model-informed protein engineering, and synthetic regulatory machinery design also enable the creation of xenobiological systems. The long-lived nature of trees and their vast reserves of carbon-rich sink tissues present both a challenge and an opportunity for accumulating and storing synthetically produced compounds (Wilkerson et al., 2014; Mottiar et al., 2016). Synthetically modified trees could serve as renewable factories for the production of large quantities of custom-designed biomass and biochemicals that do not exist naturally in trees.

We present a brief review of recent progress and our perspective on the application of systems biology approaches to understanding complex tree biology, from the molecular to the organismal level, which is crucial for the sustainable production of woody biomass. We propose a bioengineering paradigm based on rational design, drawing on systems-level understanding of biological processes and the application of synthetic biology and genome editing technologies, to engineer such processes in combination with accelerated, genome-assisted breeding of forest trees.

## SYSTEMS BIOLOGY: INTEGRATIVE UNDERSTANDING OF BIOLOGICAL PROCESSES IN FOREST TREES

Systems modeling of biological processes requires extensive perturbation of systems components to produce the comprehensive experimental data required for modeling (**Figure 1**). The lack of extensive mutant collections in forest trees has necessitated the use of transgenic approaches to perturb genes. This has limited such studies to a relatively small number of genes, typically in the same biological pathway. Lignin biosynthesis has emerged as one of the best-studied biological processes in forest trees, largely because of interest in reducing lignin content or altering lignin composition to reduce the recalcitrance of woody biomass during industrial processing (Pilate et al., 2002; Leplé et al., 2007; Voelker et al., 2010; Mansfield et al., 2012; Bryan et al., 2016; Saleme et al., 2017). The first comprehensive, systems-level analysis of the lignin biosynthetic pathway was performed in *Arabidopsis* (Vanholme et al., 2012). This resulted in the discovery of additional components of the biological process, in particular a novel enzyme, caffeoyl shikimate esterase (CSE), which hydrolyzes caffeoyl shikimate into caffeate and, when mutated, leads to a fourfold increase in glucose release from cellulose during saccharification without pretreatment in *Arabidopsis* (Vanholme et al., 2013b). In poplar downregulated for CSE, 60% more glucose was released from wood without pretreatment, as a consequence of lower lignin amount and higher cellulose content (Saleme et al., 2017).

The most extensive systems biology analysis in forest trees has been done on the lignin biosynthetic pathway in differentiating xylem of *Populus trichocarpa* (Wang et al., 2018). Transgenic trees systematically perturbed in the expression of lignin genes were produced, and quantitative genomic, transcriptomic, fluxomic, biochemical, chemical, and cellular data were integrated to construct a mathematical description of the pathway. The lignin model calculates how changing the expression of any pathway gene or gene combination affects protein abundance, metabolic flux, and phenotypic traits, including lignin content and composition, tree growth, wood density, and saccharification efficiency (sugar release). The model predicts how any of these traits can be improved, individually or in combination, by modifying the expression of specific lignin genes. The lignin pathway is amenable to modeling because major pathway components have been identified and methodologies for their quantification have been developed (Shi et al., 2010; Vanholme et al., 2010, 2019; Wang et al., 2014, 2018). Genomic sequence information (Goujon et al., 2003; Raes et al., 2003; Tuskan et al., 2006; Myburg et al., 2014; Carocha et al., 2015) defined the gene families of the known proteins that are components of the lignin biosynthetic pathway. Some enzymes are encoded by single genes and others by members of large gene families (Goujon et al., 2003; Shi et al., 2010), in which some members are not associated with lignification in time or place. This highlights the importance of identifying the family members that are *bona fide* components of the systems model of wood lignification. The abundance of lignin enzymes *in vivo* has been measured in *P. trichocarpa* to determine the quantitative relationship between transcripts and proteins, and the enzymes in the pathway have been purified and characterized biochemically and genetically so that reliable kinetic data could be obtained for substrates and inhibitors (Wang et al., 2014).
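As a simple illustration of relating transcript and protein measurements for one enzyme, the sketch below fits a least-squares line of protein abundance on transcript abundance with `numpy.polyfit`. A linear relationship is an assumption made for illustration; the cited measurements may call for a different functional form, and the function names are hypothetical.

```python
import numpy as np


def fit_transcript_to_protein(transcripts, proteins):
    """Fit protein ~ slope * transcript + intercept by least squares.

    Returns (slope, intercept) in whatever units the inputs use.
    """
    slope, intercept = np.polyfit(np.asarray(transcripts, dtype=float),
                                  np.asarray(proteins, dtype=float), deg=1)
    return float(slope), float(intercept)


def predict_protein(transcript, slope, intercept):
    """Predict protein abundance for a new transcript level from the fitted line."""
    return slope * transcript + intercept
```

A per-gene mapping of this kind is one way a pathway model could translate a simulated change in transcript level into a change in enzyme abundance before computing flux.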

The systems analysis of lignin biosynthesis in *P. trichocarpa* has provided valuable information on the formation and function of lignin in forest trees, such as where genetic regulation differs significantly from that in non-woody species (e.g., *Arabidopsis*). For example, in the case of 4-coumarate CoA ligase (4CL), *Arabidopsis* has a single gene and enzyme for 4CL, while *P. trichocarpa* has two distinct *4CL* genes, one encoding a protein with a regulatory function (Chen et al., 2014). Similar regulatory functions have been demonstrated for genes upstream of lignin biosynthesis in the shikimate pathway (Xie et al., 2018), supporting the 4CL finding. *In silico* simulations further predicted the consequences for flux and wood traits of every possible combination of multigene perturbations in which expression of lignin genes was upregulated, downregulated, or kept at the wild-type level. Systems modeling therefore provides a unique opportunity to guide multi-trait engineering and breeding strategies to create novel tree feedstocks optimized for fast growth and conversion to energy and materials. The models also inform the identification of alleles or combinations of alleles for breeding superior forest tree varieties (Wang et al., 2018).
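The combinatorial scope of such simulations follows directly from the three expression states: n genes give 3^n perturbation combinations. The sketch enumerates them with `itertools.product`; the gene names are an illustrative subset, and scoring each combination (the actual model) is omitted because the published model is not reproduced here.

```python
from itertools import product

# Three expression states considered for each gene.
LEVELS = ("down", "wild-type", "up")


def perturbation_combinations(genes):
    """Yield every assignment of a down/wild-type/up state to each gene."""
    for states in product(LEVELS, repeat=len(genes)):
        yield dict(zip(genes, states))


if __name__ == "__main__":
    genes = ["PAL", "C4H", "4CL"]  # illustrative subset of lignin genes
    combos = list(perturbation_combinations(genes))
    print(len(combos))  # 27, i.e. 3**3
    print(combos[0])    # {'PAL': 'down', 'C4H': 'down', '4CL': 'down'}
```

The count grows quickly (3^20 is about 3.5 billion), which is why exhaustive evaluation is feasible only *in silico*, with transgenic trees reserved for the most promising predicted combinations.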

## SYSTEMS GENETICS: DISSECTING SYSTEMS COMPONENTS AND INTERACTIONS UNDERLYING COMPLEX TRAITS

While systems biology studies have advanced our understanding of developmental and stress response pathways (especially in identifying key regulators within these processes), they frequently are based on severe phenotypes that represent an aggregate of a cascade of cellular and physiological reactions to the adverse effect of the perturbation. Furthermore, such studies do not resolve the molecular basis or mechanisms of trait variation among individuals within populations. Revealing how complex trait variation arises from genetic polymorphisms affords opportunities to discover novel approaches to inducing trait variation in addition to advancing understanding of the evolutionary processes and the mechanistic basis of variation in complex traits (**Figure 1**).

Complex traits are the emergent outcome of the actions and interactions of hundreds to thousands of genes, functioning within pathways in combination with environmental effects. Variation at any biological layer from genome to trait can contribute to the final outcome. Full understanding of complex phenotypes requires approaches that can model these individual actions and interactions at the population level. Systems genetics provides a clearer path linking the genome to complex phenotypes by deducing the mechanisms by which genetic variation affects intermediate biological layers (e.g., transcript, protein, and/or metabolite) and ultimately the complex trait (Civelek and Lusis, 2014; Feltus, 2014). Systems genetics has benefited substantially from recent improvements in sequencing technology that have enabled population-wide genome resequencing and global transcript profiling *via* mRNA and small RNA sequencing. While SNP genotyping typically provides adequate genetic resolution for controlled crosses (QTL analysis using pedigrees), genome-wide association studies (GWAS) using natural populations often require whole-genome resequencing because of the rapid decay of linkage disequilibrium (LD) in many tree species (McKown et al., 2014; Zhang et al., 2018). A number of studies in trees have now mapped the genetic architecture of gene expression variation (expression quantitative trait loci, eQTLs) in tree pedigrees (Kirst et al., 2004, 2005; Drost et al., 2010) and natural populations (Porth et al., 2013; Mähler et al., 2017; Zhang et al., 2018); however, there is only one reported true systems genetics study (Mizrachi et al., 2017), in addition to genetical genomics approaches (Street et al., 2006; Drost et al., 2015).
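Why rapid LD decay pushes association studies toward denser markers can be seen from the usual pairwise LD statistic, r² = D² / [p_A(1 - p_A) p_B(1 - p_B)] with D = p_AB - p_A p_B. A marker tags a causal variant only if r² between them is high, and where LD decays over short distances a causal site is tagged only by very close markers. The sketch assumes phased haplotypes with alleles coded 0/1; the function name is hypothetical.

```python
def ld_r_squared(haplotypes):
    """Pairwise LD r^2 between two biallelic sites.

    haplotypes: list of (allele_at_site_1, allele_at_site_2) pairs from phased
    data, alleles coded 0/1; frequencies below refer to allele 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom
```

Perfectly co-segregating sites give r² = 1, and sites whose alleles combine independently give r² = 0; in populations with rapid LD decay, most marker pairs more than a short distance apart fall close to the latter.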

These studies have reported contrasting results between pedigrees and natural populations, suggesting that genetic background is an important consideration, especially when extrapolating findings to a wider context. Owing to developments in sequencing technologies, there has been greater focus on eQTL studies; however, allelic variation in protein structure, epigenomic variation (Cheung et al., 2017), variation in protein abundance (Consoli et al., 2002) and post-translational modifications (Cesnik et al., 2016), and metabolome variation (Morreel et al., 2006; Joseph et al., 2014; Matsuda et al., 2015) fit equally well within the systems genetics framework and could equally underlie trait variation. While systems genetics studies can reveal, for example, genomic loci linked to gene expression variation that contributes to trait variation, it can be problematic to place these loci within a biological framework to understand how these novel mechanisms influence the trait. One approach to overcoming this barrier is to integrate systems biology studies, such as developmental studies, to provide context for these novel genes. Combining systems genetics with systems biology co-expression networks is also a powerful approach, as biological inference can be aided by considering the neighborhood of genes connected to a novel candidate (Mähler et al., 2017).
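The neighborhood idea can be sketched with the simplest kind of co-expression network, in which two genes are linked when the absolute Pearson correlation of their expression profiles passes a threshold. The threshold of 0.8, the use of absolute (unsigned) correlation, and the function name are illustrative assumptions; published networks often use more elaborate measures.

```python
import numpy as np


def coexpression_neighbors(expr, genes, candidate, threshold=0.8):
    """Return genes linked to `candidate` in a thresholded co-expression network.

    expr:  (n_genes, n_samples) array; row i holds the profile of genes[i].
    genes: list of gene identifiers in row order.
    Two genes are linked when |Pearson r| >= threshold (an unsigned network).
    """
    corr = np.corrcoef(np.asarray(expr, dtype=float))  # gene-by-gene correlation matrix
    i = genes.index(candidate)
    return [g for j, g in enumerate(genes)
            if j != i and abs(corr[i, j]) >= threshold]
```

Annotations of the returned neighbors (for example, known secondary cell wall genes) can then suggest a biological context for an uncharacterized candidate from a systems genetics screen.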

To date, one level of complexity that has not been considered in eQTL and systems genetics studies is transcriptional plasticity. There is now extensive knowledge of alternative splicing and transcript usage, which can vary among tissues, during development or among genotypes (Bao et al., 2013; Xu et al., 2014; Zhao et al., 2014). Mapping expression at the transcript level will provide greater resolution to the link between genome and phenotype. However, fine-scale expression analysis at the transcript level remains challenging, mainly because most tissue samples contain a complex mix of transcripts and splicing variants from different cell types or cells at different stages of development. There is equally a need for improvement in network inference methods, with an understanding that no single current method is adequate to capture the range of interactions present in biological networks (Marbach et al., 2012) and with few tools available to facilitate aggregate inference approaches (Schiffthaler et al., 2018). Similarly, no studies have yet integrated genome-wide assays of DNA modifications or accessibility despite increasing evidence of the additional insight such information brings. There is also a paucity of large-scale transcription factor binding or protein-protein interaction data for plants in general, which further limits comprehensive understanding.
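One simple way to combine several network inference methods, in the spirit of the aggregate approaches mentioned above, is to rank each method's edge scores and average the ranks. The sketch below assumes every method scores the same set of candidate edges and does not handle ties specially; it is not the algorithm of any specific published tool, and the names are hypothetical.

```python
def rank_edges(scores):
    """Map each edge to its rank within one method (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: rank for rank, edge in enumerate(ordered, start=1)}


def aggregate_network(method_scores):
    """Return edges ordered by mean rank across methods (best consensus first).

    method_scores: list of dicts mapping an edge (e.g. a (regulator, target)
    tuple) to that method's confidence score. Every dict must score the
    same edges; ties are broken arbitrarily by sort order.
    """
    ranked = [rank_edges(scores) for scores in method_scores]
    edges = ranked[0].keys()
    if any(r.keys() != edges for r in ranked):
        raise ValueError("all methods must score the same edges")
    mean_rank = {e: sum(r[e] for r in ranked) / len(ranked) for e in edges}
    return sorted(mean_rank, key=mean_rank.get)
```

Working with ranks rather than raw scores sidesteps the fact that different methods report confidences on incomparable scales.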

## SYNTHETIC BIOLOGY: A NEW BIOENGINEERING PARADIGM FOR FOREST TREES

SynBio has made its greatest advances in prokaryotes and single-celled eukaryotes such as yeast, but plant synthetic biology is catching up (Patron et al., 2015; Schaumberg et al., 2016; de Lange et al., 2018; Hanson and Jez, 2018; Pouvreau et al., 2018). In addition to a large number of modifications made by conventional transgenic approaches (Chang et al., 2018), there have been some notable successes in the synthetic modification of trees using single-gene strategies, such as the introduction of chemically labile ester linkages into the lignin backbone of poplar trees through the xylem-specific expression of an exogenous feruloyl-coenzyme A monolignol transferase (FMT) from *Angelica sinensis* (Wilkerson et al., 2014). Future strategies will attempt to evaluate far more complex designs, relying to a large extent on the ability to assemble DNA fragments idempotently (that is, the flexibility to assemble basic parts into constructs of increasing complexity using a universal method and without having to re-modify each intermediate). There is currently a scarcity of freely available standardized biological parts suitable for plant biology, akin to the International Genetically Engineered Machine (iGEM) BioBrick parts collection<sup>1</sup>. Encouragingly, the Phytobrick synthetic biology standard, with a universal lexicon for plant gene elements coupled to powerful Type IIS idempotent assembly methods (Engler et al., 2014; Patron et al., 2015), is fast gaining traction, with an increasing number of Phytobricks now included in the iGEM Standard Registry<sup>2</sup>. Tree biologists and biotechnologists should adopt this conceptual framework as soon as possible to keep up with synthetic biology development in annual crops. We recently developed an open-access synthetic panel of 221

<sup>1</sup> http://parts.igem.org

<sup>2</sup> http://parts.igem.org/Collections/Plants

FIGURE 2 | A proposed bioengineering paradigm based on forest tree synthetic biology. (i) Systems biology models inform the initial design of synthetic multigene constructs. Individual components (DNA parts) are sourced from existing or novel genetic resources (e.g., bioprospecting), synthesized, and submitted to biological parts collections as standardized Phytobricks (ii). The fabrication and testing phase (iii) involves high-throughput idempotent construct assembly followed by transformation and testing in a simplified chassis such as a cell culture derived from the target tree species (iv). Construct expression and phenotypic data are then integrated into a machine learning model to optimize the construct design, an iterative process that produces a reduced number of semi-optimized constructs for validation in a genotype of the target species that is selected to perform well in laboratory conditions and is amenable to transformation ("lab rat"). Such validation may include greenhouse trials of juvenile trees or field trials of mature trees. (v) Successful constructs may be introduced into a preferred elite parental genotype for intraspecific or interspecific hybrid breeding. A possible avenue to rapidly mobilize synthetic gene constructs into more diverse genetic backgrounds would be to introduce an early flowering construct (such as overexpression of the *FLOWERING LOCUS T* gene; FT-OX) into a number of elite parental genotypes. Such genotypes can then be transformed with the optimized synthetic gene constructs and used as female parents in crosses with non-transgenic (wild-type, WT) parents to produce F1 progeny segregating for both constructs. If the two constructs are on different chromosomes, approximately 25% of the progeny should be WT for growth and flowering but contain the synthetic gene construct. Early flowering parental genotypes can be propagated *in vitro* to be transformed with various synthetic constructs and, if unrelated, different parental genotypes could be crossed for transgene stacking.

secondary cell wall-related *Eucalyptus grandis* transcription factors and 65 promoter sequences in partnership with the Department of Energy Joint Genome Institute, most of which were designed as Phytobricks (Hussey et al., in preparation). Accessibility of high-throughput DNA synthesis services will ensure that a growing number of standardized parts become available under open material transfer agreements.
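Idempotent Type IIS assembly can be sketched as a bookkeeping problem: after digestion, each part carries a 4-nt overhang at each end, parts ligate only where overhangs match, and the finished composite is itself a part that can enter the next assembly round. The sketch below models only that overhang logic, not enzymes, sequences, or vectors. The level-0 overhangs (GGAG, AATG, GCTT, CGCT) are assumed to follow the common plant syntax positions for promoter, coding sequence, and terminator and should be checked against the published standard; the outer overhangs used for the second round are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Part:
    name: str
    left: str   # 4-nt overhang exposed at the 5' end after Type IIS digestion
    right: str  # 4-nt overhang exposed at the 3' end


def assemble(parts: Iterable[Part], start: str, end: str,
             outer: Optional[Tuple[str, str]] = None) -> Part:
    """Chain parts from `start` to `end` by matching overhangs.

    Returns the composite as a new Part. `outer` gives the overhangs the
    composite exposes in the next round (set by the acceptor vector in a real
    hierarchical scheme); by default the composite keeps (start, end).
    """
    by_left = {}
    for part in parts:
        if part.left in by_left:
            raise ValueError(f"two parts share left overhang {part.left}")
        by_left[part.left] = part

    order, overhang = [], start
    while overhang != end:
        part = by_left.pop(overhang, None)
        if part is None:
            raise ValueError(f"no part carries left overhang {overhang}")
        order.append(part.name)
        overhang = part.right
    if by_left:
        unused = ", ".join(p.name for p in by_left.values())
        raise ValueError(f"unused parts: {unused}")

    left, right = outer if outer is not None else (start, end)
    return Part("+".join(order), left, right)


if __name__ == "__main__":
    pro = Part("promoter", "GGAG", "AATG")
    cds = Part("cds", "AATG", "GCTT")
    ter = Part("terminator", "GCTT", "CGCT")
    # Round 1: two transcription units, each given hypothetical outer overhangs.
    tu1 = assemble([ter, pro, cds], "GGAG", "CGCT", outer=("ACTG", "TTAC"))
    tu2 = assemble([cds, ter, pro], "GGAG", "CGCT", outer=("TTAC", "GTCA"))
    # Round 2: the composites are assembled with the same function.
    print(assemble([tu2, tu1], "ACTG", "GTCA").name)
```

Because the composite has the same shape as its inputs, one function handles every round regardless of part order in the input; that reuse is the property the text calls idempotent assembly.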

One considerable challenge to tree synthetic biology is precise spatiotemporal control of complex multigene constructs, especially in woody tissues where inducing gene expression with external agents is impractical. Such constructs must function optimally in a tissue of interest, be resistant to eukaryotic silencing mechanisms such as RNAi or epigenetic silencing, be somatically stable such that somatic mutations that disrupt a synthetic construct should not be selectively favored, take into account compartmentalized plant cell biology, and have built-in biosafety mechanisms preventing transgene escape. Furthermore, synthetic gene circuits should consist of *composable* parts (de Lange et al., 2018) that individually encode defined and transferrable functions (a property known as *modularity*) and that function independently of endogenous processes to avoid unwanted interference, a property known as *orthogonality*. Designer transcription factors based on zinc finger, TALE, and dCas9 technologies targeting endogenous or synthetic promoters (Liu and Stewart, 2016) are ideal orthogonal synthetic tools that allow considerable transgene regulation flexibility but may require extensive testing and optimization.

Currently, thousands of iterations of multigene constructs can be produced by robotics-assisted DNA Foundry services. However, it is not feasible to transform and phenotype thousands of transgenic trees for iterative design-build-test-learn cycles envisioned for plant synthetic biology (Pouvreau et al., 2018). Early synthetic designs will therefore have to be tested in a simplified system (or "chassis" in synthetic biology terminology) until optimized constructs can be evaluated in target tree species. This will necessitate the development of experimental chassis derived from the target tree species that are easier to transform and phenotype *en masse*, such as protoplasts, cell cultures, or agroinfiltrated leaves, or "lab rat" genotypes that perform well in tissue culture and have high transformation rates (**Figure 2**). *In vitro* tracheary element transdifferentiation approaches relying on hormonal induction (Fukuda and Komamine, 1980; Kubo et al., 2005; Saito et al., 2017), or the VND7 inducible system (Yamaguchi et al., 2010; Goué et al., 2013), for example, could be used to induce secondary cell wall formation in suspension culture cells or explant tissues of a target tree species and thus evaluate the phenotypes of many cell wall-modifying constructs before semi-optimized constructs are tested in a multicellular tree model. Successful constructs may either be directly introduced into an elite genotype of the target species or rapidly crossed into elite breeding material after co-transformation of parental genotypes with an early flowering construct (**Figure 2**).
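The roughly 25% figure for recovering progeny that carry the synthetic construct but not the early flowering construct (see the Figure 2 legend) can be checked with standard segregation arithmetic. A female parent hemizygous for both transgenes is crossed to a non-transgenic male, so each progeny class is set by the female gamete alone. The sketch also shows why unlinked insertion sites matter: if the constructs were linked, the desired class would depend on the recombination fraction and on whether the constructs sit on the same homolog (coupling) or opposite homologs (repulsion). The function name is hypothetical.

```python
from fractions import Fraction


def fraction_syn_without_ft(rf, phase="coupling"):
    """Fraction of F1 progeny carrying the synthetic construct but not FT-OX.

    The transgenic female parent is hemizygous for both constructs and is
    crossed to a non-transgenic male, so progeny genotype at these loci is set
    by the female gamete alone.
    rf:    recombination fraction between the insertion sites (1/2 = unlinked).
    phase: "coupling" if both constructs sit on the same homolog,
           "repulsion" if they sit on opposite homologs.
    """
    rf = Fraction(rf)
    if not 0 <= rf <= Fraction(1, 2):
        raise ValueError("recombination fraction must lie in [0, 1/2]")
    if phase == "coupling":
        return rf / 2        # the synthetic-only gamete is a recombinant class
    if phase == "repulsion":
        return (1 - rf) / 2  # the synthetic-only gamete is a parental class
    raise ValueError("phase must be 'coupling' or 'repulsion'")


if __name__ == "__main__":
    print(fraction_syn_without_ft(Fraction(1, 2)))  # 1/4, the unlinked case
```

With tight linkage in coupling (a recombination fraction near 0), almost no progeny would carry the synthetic construct without FT-OX, which is why insertion on different chromosomes is the favorable case.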

## FUTURE PERSPECTIVE

Most fundamental discoveries, including proof-of-concept cell wall and growth modifications (and even extrapolations on biomass processing efficiency), are still derived from the analysis of *Arabidopsis* inflorescence stems, which remain a poor representation of large tree stems composed mainly of wood. A short-term priority is to test genetic perturbations as much as possible in a woody model such as *Populus* and, if possible, directly in target species of interest. Several requirements must be met here, such as enhancing the transformation efficiency of commercial species or genotypes, building capacity for large-scale transformation experiments, and (crucially) conducting field trials that confirm greenhouse phenotypes in mature trees. Truly realizing this will require large consortia and industry collaborations, as well as engagement and an improvement in the regulatory landscape.

Also in the short to medium term, the convergence of high-resolution technologies that capture genomics, epigenomics, and other cell 'omics', phenomics, and environment (including microbiome) data, as well as computational modeling of the interactions among these, will require transdisciplinary innovation and probably the application of artificial intelligence methods. Combined with genome editing (with broader-scale synthetic biology applications), this makes the field of forest biotechnology ripe for a new wave of creativity, especially in thinking of the tree itself as a living biorefinery and as a stable and continuous producer of specialized high-value compounds or polymers in sustainably harvestable tissues and organs such as leaves, secondary phloem, and bark. Higher-resolution knowledge of metabolite precursors, tissue-specific pathway engineering, and knowledge of novel high-value derivatives that can be discovered using bioprospecting methods and produced in trees have the potential to yield a new generation of relatively low-volume, but high-value, forest products.

How far does the application of these technologies go? Given the long rotation times of forest trees as harvestable biomass crops, it is unlikely (and indeed not essential) that we will see movement toward a "bottom up" approach that builds on a synthetic minimal tree genome. It is much more important to optimize the precise introduction of complex regulatory circuits and metabolic pathways that remain stable through breeding generations, a nascent field of research in itself. Such a bioengineering paradigm, combined with advanced genomic breeding approaches and accelerated flowering technologies, may enable rapid development of woody biomass crops tailored for diverse biorefinery, biomaterials, and timber construction products. In many forest-growing countries, an advanced forest products industry will be one of the cornerstones of the bioeconomy and key to achieving global sustainable development goals.

#### AUTHOR CONTRIBUTIONS

All authors contributed to the drafting and editing of the manuscript.

#### FUNDING

This study was supported by the National Research Foundation of South Africa (Research Funding, NRF Bioinformatics and Functional Genomics Programme grant UID 97911), the Department of Science and Technology of South Africa (Research Funding), the Technology Innovation Agency (TIA) of South Africa (Research Support), Sappi and Mondi (Forestry Industry Partners and Research Support), and the University of Pretoria (Research Facilities and Support). NRS is supported by the Trees for the Future (T4F) project (Sweden).

#### REFERENCES

proteogenomics and global post-translational modification (G-PTM) search strategy. *J. Proteome Res.* 15, 800–808. doi: 10.1021/acs.jproteome.5b00817

profiles reveal uncharacterized modularity of wood formation in *Populus tremula*. *Plant Cell* 29, 1585–1604. doi: 10.1105/tpc.17.00153

lignin in *Populus trichocarpa*. *Plant Cell* 26, 894–914. doi: 10.1105/tpc.113.120881


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling editor declared a past co-authorship with one of the authors, NS.

*Copyright © 2019 Myburg, Hussey, Wang, Street and Mizrachi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Opportunities for Innovation in Genetic Transformation of Forest Trees

Michael Nagle<sup>1</sup>†, Annabelle Déjardin<sup>2</sup>†, Gilles Pilate<sup>2</sup>\* and Steven H. Strauss<sup>1</sup>\*

<sup>1</sup> Forest Ecosystems and Society, Molecular and Cellular Biology, Oregon State University, Corvallis, OR, United States, <sup>2</sup> BioForA, INRA, ONF, Orléans, France

The incorporation of DNA into plant genomes followed by regeneration of non-chimeric stable plants (transformation) remains a major challenge for most plant species. Forest trees are particularly difficult as a result of their biochemistry, aging, desire for clonal fidelity, delayed reproduction, and high diversity. We review two complementary approaches to transformation that appear to hold promise for forest trees.

Keywords: transformation, regeneration, WUSCHEL, BABY BOOM, Populus, organogenesis, embryogenesis, Agrobacterium

#### Edited by:

Chandrashekhar Pralhad Joshi, Michigan Technological University, United States

#### Reviewed by:

Chung-Jui Tsai, University of Georgia, United States

Jae-Heung Ko, Kyung Hee University, South Korea

#### \*Correspondence:

Gilles Pilate, gilles.pilate@inra.fr

Steven H. Strauss, steve.strauss@oregonstate.edu

†Co-senior authors

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 06 July 2018

Accepted: 11 September 2018

Published: 02 October 2018

#### Citation:

Nagle M, Déjardin A, Pilate G and Strauss SH (2018) Opportunities for Innovation in Genetic Transformation of Forest Trees. Front. Plant Sci. 9:1443. doi: 10.3389/fpls.2018.01443

#### SUMMARY

Developmental genes (DGs) may be useful tools for promoting transformation. DGs, which can act through a wide variety of developmental mechanisms to promote regeneration of transgenic cells, have been widely employed in model plants to promote embryogenesis and in some cases organogenesis. Following initial experimental demonstration in dicots, the DGs WUSCHEL and/or BABY BOOM have formed the basis of a high-efficiency method for a variety of monocot genotypes and species. In dicots, however, the utility of these genes as the basis of a robust transformation system has not yet been demonstrated. Many additional DGs that appear capable of promoting regeneration have not been systematically explored as transformation tools.

Because in vitro plant transformation systems are costly and must be customized for each new genotype and species, in vivo approaches to transformation hold much appeal. It is possible to produce stable transgenic plants by agro-inoculation of seeds or vegetative/floral buds, but as yet these approaches have not been used routinely in any plant species except for the Arabidopsis floral dip. We will discuss how the Arabidopsis system, and other in planta techniques, may be tailored for forest trees, taking into account variations in biology of different taxa.

#### DEVELOPMENTAL GENES AS TOOLS FOR TRANSFORMATION IMPROVEMENT

When overexpressed, transcriptional or epigenetic regulators of embryo and meristem development (referred to as developmental genes, DGs) have been shown to improve in vitro regenerability. Recent molecular evidence places these genes within a genetic regulatory network, connected by cascades and feedback loops of transactivation. Knowledge of these interactions, detailed in **Supplementary Table 1** and **Figure 1**, can inform the selection of individual genes and combinations of genes that may be most effective for improving regeneration. In this mini-review, however, we do not consider the many genes that may enhance regeneration via epigenetic mechanisms, or by affecting the rate of gene transfer or incorporation of DNA into the genome.

# Candidate Genes and Their Modes of Action

The first gain-of-function mutation for a transcriptional regulator of shoot induction was observed in 1941. Bryan and Sass discovered a heritable trait in maize that causes leaves to develop ectopic shoot meristems, or "knots" (Bryan and Sass, 1941). Transposon tagging later revealed the responsible locus to be Knotted-1, the first homeodomain transcription factor identified in plants (Hake et al., 1989). Knotted-1 was found to be overexpressed in mutants as a result of a transposon insertion (Smith et al., 1992), and overexpression of maize Knotted-1 in tobacco and Arabidopsis was also reported to trigger development of ectopic shoot meristems (Sinha et al., 1993; Lincoln et al., 1994).

A loss-of-function mutation of SHOOT MERISTEMLESS (STM), an Arabidopsis homolog of Knotted-1, was reported to cause premature termination of shoot meristems and a phenotype of twig-like plants lacking lateral shoots (Long et al., 1996). Transcription of STM is regulated in part by a positive feedback loop between STM and the transactivator CUP-SHAPED COTYLEDON 1 (CUC1) (**Supplementary Table 1** and **Figure 1**; Takada et al., 2001; Spinelli et al., 2011). As with STM, overexpression of CUC1 enhances regeneration through embryogenic and organogenic pathways (Takada et al., 2001; Hibara et al., 2003). Overexpression of CUC1/2 in Arabidopsis led to large increases in the number of shoots regenerated from calli, while knockout had the opposite effect (Daimon et al., 2003). Directly upstream of CUC1 is LEAFY COTYLEDON1 (LEC1), which was first identified as a regulator of embryogenesis when its overexpression induced embryos to develop on leaves of Arabidopsis (Lotan et al., 1998).

A mutant screen for shoot meristem defects led to the discovery of WUSCHEL (WUS) loss-of-function mutants, which had phenotypes similar to those of stm mutants (Laux et al., 1996). WUS is expressed in the organizing center adjacent to meristematic stem cells (Mayer et al., 1998), then trafficked into the central zone of the stem cell niche (Yadav et al., 2010; Daum et al., 2014), where it is required for maintenance of stem cell identity. In contrast, STM is expressed and active in the peripheral zone (PZ) (Williams et al., 2005; Gordon et al., 2007). In Arabidopsis, overexpression of WUS was reported to initiate ectopic organogenesis in vivo, although differentiation into organs was incomplete unless STM was also overexpressed (Gallois et al., 2002). WUS was rediscovered in a T-DNA activation mutagenesis screen for gain-of-function mutations conferring cytokinin-independent potential for induced embryogenesis in vitro (Zuo et al., 2002).

Downstream of WUS and STM are overlapping sets of diverse genes that balance differentiation and dedifferentiation to promote progressive development of meristems. These genes include members of the somatic embryogenesis (SE)-associated receptor kinase (SERK) family, as well as enzymes for hormone biosynthesis, cell cycle regulators, and numerous transcriptional regulators, some of which function not only downstream but also upstream of WUS/STM (Spinelli et al., 2011; Balkunde et al., 2017; Ikeuchi et al., 2018; Scofield et al., 2018).

WUS transcription is activated by a complex of factors including B-type ARABIDOPSIS RESPONSE REGULATOR (ARR) and HDIII-ZIP family proteins (**Supplementary Table 1** and **Figure 1**). In separate experiments, overexpression of ARR2 and ARR12 led to roughly twofold increases in the number of shoots regenerated from callus in vitro (Dai et al., 2017; Zhang et al., 2017). In contrast to B-type ARR genes, overexpression of A-type ARRs led to strong suppression of in vitro organogenic capacity (Osakabe et al., 2002; Buechel et al., 2010).

The combined knockout of several HDIII-ZIP factors in a wus background rescued shoot apical meristem (SAM) development, indicating that these factors may simultaneously promote and inhibit stem cell differentiation via pathways both dependent on and independent of WUSCHEL (Lee and Clark, 2015). Orthologs of these transcription factors in Populus trichocarpa are expressed in the SAM, and their overexpression promotes stem cell proliferation and inhibits development of shoot primordia into mature organs (Du et al., 2011; Robischon et al., 2011; Zhu et al., 2013).

ENHANCER OF SHOOT REGENERATION 1/2 (ESR1/2) acts directly upstream of WUS and indirectly upstream of STM (**Supplementary Table 1** and **Figure 1**) and of numerous other, poorly characterized genes (Chandler et al., 2007; Ikeuchi et al., 2018). ESR1 overexpression conferred cytokinin-independent competence for regeneration, although constitutive overexpression inhibited differentiation of SAMs; recovery of transgenic plants was enabled by deactivating a chemically inducible ESR1 construct after shoot primordium development (Banno et al., 2001). ESR2 is transactivated by ESR1 and shares many of its downstream targets (Ikeuchi et al., 2018); overexpression of either gene leads to a remarkable improvement in shoot regeneration capacity in Arabidopsis (Ikeda et al., 2006; Mase et al., 2007). Unlike ESR1 overexpression (Banno et al., 2001), overexpression of the upstream gene WOUND-INDUCED DEDIFFERENTIATION 1 (WIND1) enhances formation of callus in addition to shoots (Iwase et al., 2016); however, callus develops into shoots when plants carrying a chemically inducible WIND1 construct are transferred to media without inducer (Iwase et al., 2011). Several WIND homologs are known to transcriptionally activate ESR1 (**Supplementary Table 1**).

PLETHORA (PLT) 3/5/7 are responsible for direct transactivation of WUS and indirect transactivation of both WUS and STM via CUP-SHAPED COTYLEDON (CUC) 1/2 (**Supplementary Table 1** and **Figure 1**). In Arabidopsis, overexpression of PLT5/7 enables cytokinin-independent shoot regeneration, although at a very low rate (Kareem et al., 2015).

BABY BOOM (BBM) overexpression confers the ability for cytokinin-independent in vitro somatic embryogenesis in Arabidopsis (Boutilier et al., 2002). In contrast, expression of a BBM homolog in tobacco enhanced regeneration via shoot organogenesis without affecting embryogenesis (Srinivasan et al., 2007, 2011).
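Because the interactions above are framed as a network of transactivation cascades and feedback loops, one way to reason about candidate gene combinations is to encode the named interactions as a directed graph and ask which genes lie upstream of WUS, STM, or both. The sketch below is a hypothetical simplification built only from interactions mentioned in this section: direct and indirect activation are merged, repression (e.g., by A-type ARRs) is ignored, and the edge list is not a transcription of Supplementary Table 1 or Figure 1.

```python
from collections import defaultdict, deque

# Hypothetical, simplified (regulator, target) edges drawn only from the
# interactions named in the text; direct and indirect activation are merged.
EDGES = [
    ("LEC1", "CUC1"),
    ("CUC1", "STM"), ("STM", "CUC1"),            # STM-CUC1 positive feedback loop
    ("CUC1", "WUS"), ("CUC2", "WUS"), ("CUC2", "STM"),
    ("B-type ARR", "WUS"), ("HDIII-ZIP", "WUS"),
    ("WIND1", "ESR1"), ("ESR1", "ESR2"),
    ("ESR1", "WUS"), ("ESR2", "WUS"),
    ("ESR1", "STM"), ("ESR2", "STM"),            # described as indirect in the text
]
for plt_gene in ("PLT3", "PLT5", "PLT7"):
    # PLT3/5/7 activate WUS directly and WUS/STM indirectly via CUC1/2.
    EDGES += [(plt_gene, "WUS"), (plt_gene, "CUC1"), (plt_gene, "CUC2")]


def upstream_of(target: str, edges: list[tuple[str, str]]) -> set[str]:
    """Return every gene with a directed path to `target`, found by
    breadth-first search over reversed edges; the target itself is excluded."""
    parents: dict[str, set[str]] = defaultdict(set)
    for regulator, tgt in edges:
        parents[tgt].add(regulator)
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for regulator in parents[node]:
            if regulator not in seen:
                seen.add(regulator)
                queue.append(regulator)
    seen.discard(target)  # feedback loops can otherwise list a gene as its own regulator
    return seen


if __name__ == "__main__":
    wus_up = upstream_of("WUS", EDGES)
    stm_up = upstream_of("STM", EDGES)
    print("Upstream of WUS:", sorted(wus_up))
    print("Upstream of STM:", sorted(stm_up))
    # Genes feeding both pathways are one place to look when balancing
    # WUS activity against STM activity.
    print("Upstream of both:", sorted(wus_up & stm_up))
```

In this toy encoding, B-type ARRs and HDIII-ZIP factors feed only WUS, whereas the PLT, CUC, ESR, WIND1, and LEC1 nodes reach both WUS and STM; adding or removing edges to reflect the full literature would change these sets.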

# Effectiveness of Developmental Genes in Non-model Species

WUS and related genes have been found to be effective at promoting regeneration in crop and forest species. Overexpression of the rice homolog of WUS in seedlings was found to cause de novo organogenesis of shoots in rice (Kamiya et al., 2003). Populus tomentosa transformed to overexpress any of four WUSCHEL or WUSCHEL-ASSOCIATED HOMEOBOX orthologs showed increased adventitious rooting (Liu et al., 2014; Li et al., 2018). Overexpression of AtWUS led to increased embryo and callus formation in vitro in coffee (Arroyo-Herrera et al., 2008) and increased embryo formation in cotton (Bouchabké-Coussa et al., 2013), and enabled in vitro ectopic embryogenesis in the otherwise completely recalcitrant Capsicum chinense (Solís-Ramos et al., 2009). Overexpression of homologs of WUS, or of WUS in combination with BABY BOOM (BBM), enhances in vitro transformation and shoot regeneration in a variety of monocots, including rice, sorghum, and maize. Several completely recalcitrant maize inbred lines became responsive to transformation and regeneration when overexpressing either WUS or BBM homologs (Lowe et al., 2016; Mookkan et al., 2017).

STM and related genes are also active in crop and forest species. Expression of STM/Knotted-1 orthologs from apple or maize enhanced shoot regeneration from leaf explants in the absence of exogenous cytokinins in tobacco, though they were not effective in plum under the conditions studied (Srinivasan et al., 2011). In citrus, expression of maize Knotted-1 enhanced in vitro regeneration after transformation, with rates varying widely among varieties (Hu et al., 2016). In the gymnosperm forest tree Picea abies (Norway spruce), overexpression of a KNOTTED-1 ortholog similarly promoted in vitro somatic embryogenesis (Belmonte et al., 2007).

Other DGs can also stimulate regeneration in non-model species. Overexpression of maize LEC in wheat and maize enabled efficient transformation without the use of selectable markers (Lowe et al., 2002). AtLEC1 overexpression in white spruce, however, had no effect on somatic embryogenesis (Klimaszewska et al., 2010). Overexpression of an ESR1 ortholog led to a doubling of shoot regeneration during transformation in hybrid poplar (Yordanov et al., 2014). Overexpression of a BBM ortholog in Capsicum annuum (sweet pepper) enabled in vitro somatic embryogenesis and efficient transformation of a genotype which was previously recalcitrant (Heidmann et al., 2011). In P. tomentosa, expression of a Brassica homolog of BBM led calli to develop somatic embryos, which is otherwise rarely seen with poplar regeneration systems (Deng et al., 2009).

# Strategies for Using Developmental Genes for Transformation Improvement

When overexpressed during vegetative development, BBM, WUS, LEC1, and other genes can promote various regeneration pathways, but, unsurprisingly, they then cause defects in further development, such as disorganized shoot and floral meristems (Gallois et al., 2002), infertility, and shoot necrosis (Lowe et al., 2016). To be useful, their expression must therefore be carefully controlled in strength and timing. Three main options exist for targeted expression of transgenes: induced expression using external stimuli such as chemical or physical inducers (e.g., heat or drought); controlled excision of the genes from the genome using similar inducer options; and use of promoters whose innate expression pattern closely resembles that of native meristematic genes, and which will therefore have much attenuated expression after meristems or embryos are initiated. More complex options include using gene editing to engineer the promoters of native DGs to be inducible, or to add miRNA-resistance mutations into the genes' transcribed regions, thus achieving some level of derepression without affecting genomic context and other local regulatory cues (Zhang et al., 2017). Heat-shock, drought-stress, and meristematic promoters have been shown to be effective for driving Cre excision of BBM and WUS orthologs in monocots (Lowe et al., 2016, 2018; Mookkan et al., 2017), and a heat-shock promoter was used to drive Flp-FRT excision of BBM after BBM induced somatic embryogenesis in P. tomentosa (Deng et al., 2009). Many options clearly exist, but the most effective DGs or combinations thereof, as well as gene expression/removal options, need to be explored in parallel for specific crop and forest tree species.

In vitro culture and selection conditions will strongly affect the value of DGs for promoting regeneration. For example, plant hormones added to culture media can mask or help to amplify DG effects, depending on the specific taxa, tissues, and hormone species and concentrations, and duration of treatments (Hill and Schaller, 2013; Irshad et al., 2018; Kumar et al., 2018). With highly effective regeneration, use of selective agents such as antibiotics can be reduced or eliminated entirely, greatly reducing physiological stresses that retard regeneration (Mookkan et al., 2017; Lowe et al., 2018). The optimal combinations, and their impacts on transformation rate as well as chimerism in the resulting plants, can only be determined by trial and error with specific taxa, and may need to be customized to specific genotypes in highly genetically variable and recalcitrant species such as most forest trees.

Future improvements to the system are likely to include growing use of multiple DGs, in part to complement and balance their differing activities, and to make DG "reagents" effective across a wide range of taxa. For example, strong overexpression of WUS or upstream ESR1 without sufficient balancing activity of the STM pathway in the PZ has reportedly led to failure of PZ cells to differentiate (Banno et al., 2001; Ikeda et al., 2006), and to necrosis of shoot primordia (Lowe et al., 2016). Studies of gene combinations have to date received little attention, but may lead to major improvements in transformation efficiency.

# TOWARD THE DEVELOPMENT OF IN PLANTA TRANSFORMATION METHODS IN FOREST TREES

A major limitation to genetic transformation is the need to develop in vitro propagation and regeneration systems, which for many plant species are very time-consuming and require a high level of technical expertise. Moreover, such systems must be customized for each new genotype and species, and many remain recalcitrant to regeneration and/or transformation. As a result, methods that bypass the need for in vitro systems are highly desirable.

In planta transformation techniques take advantage of natural biological processes to produce and regenerate transgenic plants (**Figure 2**), and are thus in theory applicable to a large panel of genotypes and species. The target tissues are diverse and can include secondary meristems. The induced somatic sector analysis (ISSA) approach is an example of what can be achieved in various tree species (Pinus, Eucalyptus, Populus; Spokevicius et al., 2005; Van Beveren et al., 2006). A "cambial window" is cut through the bark with a sharp razor blade to access the cambial/young xylem tissues, which are then inoculated with an Agrobacterium tumefaciens suspension. After wound closure and cambium reestablishment, the transformed cells divide and differentiate, producing somatic sectors of transformed cells. Within a few months, and without any in vitro steps or complex manipulations, it is possible to analyze transgenes and promoters directly in the woody stem tissues of trees by comparing transformed sectors with adjacent non-transformed ones (Hussey et al., 2011; Creux et al., 2013; Baldacci-Cresp et al., 2015). ISSA has great potential for studying cell fate and pattern formation during secondary growth and xylogenesis, thanks to the development of microspectroscopic techniques such as Raman or ATR-FTIR microscopy, which can provide spectroscopic information at the cellular or cell wall level. Although very useful for research, ISSA cannot, however, be used to regenerate transgenic plants.

To this end, other tissues can be targeted for transformation, including vegetative meristems protected in axillary or apical buds, as was investigated in sugarcane (Mayavan et al., 2015), Populus (Yang et al., 2010), and grapevine (Fujita et al., 2009). A. tumefaciens was the DNA vector and reached the meristems after mechanical wounding, possibly complemented by sonication or vacuum infiltration (Mayavan et al., 2015). This approach required an efficient adventitious rooting system to regenerate plants from transgenic shoots excised from mother plants. However, repeated rounds of selection generally failed to avoid chimeric transgenic shoots.

To avoid chimeras, some protocols have been developed on germinating seeds or seedlings, with the goal of reaching apical meristems as early as possible in plant development. Indeed, these meristems will ultimately give rise to reproductive meristems that may produce transformed germ cells. Several attempts were at least partly successful in producing transgenic T1 plants from T0 transformed chimeric embryos or seedlings using A. tumefaciens (Lin et al., 2009; Shah et al., 2015; Ahmed et al., 2018) or particle bombardment (Hamada et al., 2017). However, this method would be difficult to apply in forest trees as the elimination of chimeras usually requires sexual reproduction to the T1 generation, requiring a long wait for flowering.

To speed up the process while ensuring the generation of non-chimeric plants, some authors targeted reproductive meristems before flowering and fertilization, with the aim of transforming future germ cells. Arabidopsis flowers were successfully transformed by A. tumefaciens vacuum infiltration (Bechtold et al., 1993); Clough and Bent (1998) further improved this method and found that dipping flowers was sufficiently efficient as long as a surfactant was also used. Even though the overall efficiency was less than 1%, the method remained viable thanks to the high fertility and small size of Arabidopsis, which allow hundreds to thousands of germinating seeds to be efficiently screened using selectable markers. This method has been developed with limited success in other species (Raphanus sativus, Curtis and Nam, 2001; Brassica napus, Qing et al., 2000; Medicago truncatula, Trieu et al., 2000).

An alternative method is to target pollen grains, using techniques such as sonication to penetrate pollen apertures in a DNA-containing sucrose solution. Southern blot analyses demonstrated the successful transfer of transgenes to progenies by pollination in sorghum (Wang et al., 2007), Brassica juncea (Wang et al., 2008), and maize (Yang et al., 2017). Pretreatment of pollen grains by aeration (e.g., 20 min at 4°C) increased pollen viability, mitigating a common adverse effect of sonication (Yang et al., 2017). Zhao et al. (2017) recently investigated an innovative method in which DNA was delivered to pollen via magnetic nanoparticles; stable transformants of cotton, pepper, and pumpkin were generated. Pollen-mediated transformation could easily be tested in forest trees, as controlled pollination for crosses or seed production is a common part of conventional breeding. However, to be feasible and to conform to most regulatory requirements for containment of transgenic pollen, such crosses would need to be carried out on detached or grafted floral branches in greenhouses, which is possible for only a limited number of forest tree species.

Unfortunately, most of the procedures described above require a high level of technical expertise, as reflected by how rarely they have been transferred to other laboratories. The only exception is the Arabidopsis floral dip, which has been used in numerous laboratories worldwide. Could in planta techniques such as this be tailored for forest trees? Several convergent studies have shown that the ovules, not the pollen, are the direct targets for transformation through floral dip: manual outcrossing experiments produced transgenic progenies only when A. tumefaciens was applied to pollen-recipient plants, not to pollen-donor plants (Ye et al., 1999; Desfeux et al., 2000). Bechtold et al. (2000) elegantly reached the same conclusion using a genetic approach.

Floral dip has been shown to give rise to transgenic seedlings with genetically independent insertions that are typically hemizygous (carrying the T-DNA at only one allele of a given locus) (Bechtold et al., 1993). The transformation rate is very dependent on the flower developmental stage (Desfeux et al., 2000), with the optimal stage being when the gynoecium is still open (Irepan Reyes-Olalde et al., 2013), thus allowing agrobacteria to penetrate and transform the ovule primordia.

The CRABS-CLAW mutant, which maintains an open gynoecium, gives a sixfold increase in transformation rate (Desfeux et al., 2000). Access by Agrobacterium to the locule containing the ovules therefore appears to be critical for transformation. Trees are perennial species in which flower initiation takes place the year before flowering, so injection of the Agrobacterium suspension into female floral buds at a stage when the gynoecium is still open would need to occur weeks to months before seed release. However, trees also produce very large numbers of seeds; thus the in planta approach, if it can be optimized and applied to many buds at the right times, together with an efficient selection system for germinating seeds, may be a realistic option for some tree species.
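The argument that a large seed output could compensate for a low per-seed transformation rate can be made concrete with simple binomial arithmetic. The sketch below assumes, hypothetically, that each seed is transformed independently with the same probability (consistent with floral dip producing genetically independent insertions); the 0.5% rate in the example is an illustrative assumption loosely matching the "less than 1%" reported for Arabidopsis floral dip, not a measured rate for any tree species.

```python
import math


def prob_at_least_one(n_seeds: int, rate: float) -> float:
    """Probability that at least one of n_seeds is transformed, assuming each
    seed is transformed independently with the same probability `rate`."""
    return 1.0 - (1.0 - rate) ** n_seeds


def seeds_needed(rate: float, confidence: float = 0.95) -> int:
    """Smallest number of seeds to screen so that P(at least one transformant)
    reaches `confidence`, under the same independence assumption."""
    if not (0.0 < rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("rate and confidence must lie strictly between 0 and 1")
    n = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))
    # Nudge n to correct any floating-point rounding at the boundary.
    while prob_at_least_one(n, rate) < confidence:
        n += 1
    while n > 1 and prob_at_least_one(n - 1, rate) >= confidence:
        n -= 1
    return n


if __name__ == "__main__":
    rate = 0.005  # hypothetical per-seed transformation rate (0.5%)
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} seeds: expected {n * rate:.1f} transformants, "
              f"P(>=1) = {prob_at_least_one(n, rate):.3f}")
    print("Seeds for a 95% chance of >=1 transformant:", seeds_needed(rate))
```

Under these assumptions, screening 1,000 seeds yields about five expected transformants; in practice, bud-to-bud variation in developmental stage and chimerism checks would make real screening requirements larger.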

## CONCLUSION

In summary, both DG and in planta approaches to transformation hold promise for solving major problems in plant and tree transformation. DGs appear to hold most promise where a basic in vitro regeneration system is already in place, which might then benefit from a large increase in transformation efficiency using an established transformation pathway. To date, the DG approach has been most effective in species with embryogenic rather than organogenic regeneration systems. In planta systems hold most promise where in vitro approaches are extremely difficult or impossible and alternative pathways are therefore required. They will also be most easily pursued as part of a large-scale breeding program, enabling large numbers of floral buds to be treated and seedling populations screened for transformation and chimerism. In planta transformation of cambium and axillary/apical buds has been successful, but is prone to chimerism; research on germline transformation (i.e., transformation of mother cells within floral buds and embryos within seeds) may help to reduce this problem. In planta and DG overexpression approaches to efficient transformation might not be mutually exclusive; research is warranted to elucidate the potential for DGs to enhance in planta systems. Given the importance of regeneration as a bottleneck to transformation and gene editing of forest trees in research and application (Chang et al., 2018), acceleration of research using both approaches is warranted.

# AUTHOR CONTRIBUTIONS

MN and AD drafted the sections "Developmental Genes as Tools for Transformation Improvement" and "Toward the Development of in planta Transformation Methods in Forest Trees," respectively, after extensive consultation with SS and GP. SS and GP revised the drafted text and prepared initial drafts of the sections "Introduction" and "Conclusion."

# FUNDING

AD and GP research on in planta transformation is funded by the French PIA project "Genius".

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01443/full#supplementary-material

## REFERENCES

Bryan, A. A., and Sass, J. E. (1941). "Knotted" maize plants. J. Hered. 32, 343–346.



CUP-SHAPED COTYLEDON 1 at the transcriptional level and controls cotyledon development. Plant Cell Physiol. 47, 1443–1456. doi: 10.1093/pcp/pcl023



development of tobacco (Nicotiana tabacum L) and European plum (Prunus domestica L). Plant Cell Rep. 30, 655–664. doi: 10.1007/s00425-006-0358-1


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Nagle, Déjardin, Pilate and Strauss. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genome Editing in Trees: From Multiple Repair Pathways to Long-Term Stability

William Patrick Bewg<sup>1</sup>, Dong Ci<sup>1,2</sup> and Chung-Jui Tsai<sup>1</sup>\*

<sup>1</sup> Warnell School of Forestry and Natural Resources, Department of Genetics, and Department of Plant Biology, University of Georgia, Athens, GA, United States, <sup>2</sup> Department of Bioscience and Biotechnology, Beijing Forestry University, Beijing, China

The CRISPR technology continues to diversify with a broadening array of applications that touch all kingdoms of life. The simplicity, versatility and species-independent nature of the CRISPR system offers researchers a previously unattainable level of precision and control over genomic modifications. Successful applications in forest, fruit and nut trees have demonstrated the efficacy of CRISPR technology at generating null mutations in the first generation. This eliminates the lengthy process of multigenerational crosses to obtain homozygous knockouts (KO). The high degree of genome heterozygosity in outcrossing trees is both a challenge and an opportunity for genome editing: a challenge because sequence polymorphisms at the target site can render CRISPR editing ineffective; yet an opportunity because the power and specificity of CRISPR can be harnessed for allele-specific editing. Examination of CRISPR/Cas9-induced mutational profiles from published tree studies reveals the potential involvement of multiple DNA repair pathways, suggesting that the influence of sequence context at or near the target sites can define mutagenesis outcomes. For commercial production of elite trees that rely on vegetative propagation, available data suggest an excellent outlook for stable CRISPR-induced mutations and associated phenotypes over multiple clonal generations.

#### Edited by:

Ronald Ross Sederoff, North Carolina State University, United States

#### Reviewed by:

Eva Stoger, University of Natural Resources and Life Sciences Vienna, Austria

Victor Busov, Michigan Technological University, United States

> \*Correspondence: Chung-Jui Tsai cjtsai@uga.edu

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 06 July 2018 Accepted: 07 November 2018 Published: 23 November 2018

#### Citation:

Bewg WP, Ci D and Tsai C-J (2018) Genome Editing in Trees: From Multiple Repair Pathways to Long-Term Stability. Front. Plant Sci. 9:1732. doi: 10.3389/fpls.2018.01732

Keywords: mutagenesis, genome engineering, allele dose effect, Populus, biallelic, monoallelic, knockout

# INTRODUCTION

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-based genome editing is rapidly becoming the system of choice for targeted mutagenesis in a growing variety of woody species, including forest trees. Forest trees are an invaluable commodity, providing fiber, energy, materials and climate buffering to the global community, and CRISPR has the potential to further enhance these important traits. Previous-generation methods for gene silencing in plants relied on the expression of antisense RNAs, small interfering RNAs or microRNAs that base-pair with target mRNAs to trigger their degradation, often with unpredictable and unstable outcomes (Alessandra and Shihshieh, 2010). The specificity and efficiency of CRISPR for targeted DNA mutations, and the ease of adoption in virtually any species, are behind the current revolution in genome editing (Jiang and Doudna, 2017). Meanwhile, CRISPR's popularity is driving the discovery and characterization of new CRISPR-associated (Cas) endonucleases with novel properties that make the system even more versatile (Burstein et al., 2016; Murovec et al., 2017). This review covers recent applications of CRISPR in woody species, with special emphasis on forest trees, the mutation patterns observed at target sites, and the long-term stability of CRISPR/Cas9-edited outcomes.

# CRISPR APPLICATIONS IN WOODY SPECIES

Phytoene desaturase (PDS) has been a popular marker for evaluating CRISPR in new study systems (**Table 1**). Its mutation disrupts chlorophyll biosynthesis, allowing for visual assessment of knockout (KO) efficiency. CRISPR/Cas9-induced albino mutants have been reported in poplar (Fan et al., 2015), citrus (Jia and Wang, 2014; Zhang et al., 2017), apple (Nishitani et al., 2016), grape (Nakajima et al., 2017), cassava (Odipio et al., 2017), coffee (Breitler et al., 2018), and kiwifruit (Wang Z. et al., 2018). Successful implementation of CRISPR has also been demonstrated by targeting potential developmental and biosynthetic pathway genes in grape (Ren et al., 2016) and the tropical tree Parasponia andersonii (van Zeijl et al., 2018; **Table 1**). New CRISPR reagents have been developed to expand genome editing capabilities. One such reagent, SaCas9 from Staphylococcus aureus, was shown to effectively generate mutations in Duncan grapefruit (Jia et al., 2017a). Compared with the most commonly used SpCas9 from Streptococcus pyogenes, SaCas9 is considerably smaller and recognizes a distinct 5′-NNGRRT protospacer adjacent motif (PAM) sequence (versus 5′-NGG for SpCas9). Using alternative CRISPR/Cas systems such as SaCas9 can increase the number of potential guide-RNA (gRNA) target sites, especially in AT-rich regions, which may facilitate promoter editing.
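The point about PAM choice can be made concrete by scanning a sequence for SpCas9 (NGG) and SaCas9 (NNGRRT) PAMs on both strands. The sketch below is a minimal illustration, not a gRNA design tool: the spacer lengths (20 nt for SpCas9, 21 nt for SaCas9) are assumptions, and the example sequence is made up.

```python
import re

# IUPAC codes needed for the two PAMs discussed above (N = any base, R = A/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

# Assumed spacer lengths: 20 nt for SpCas9; 21 nt is one commonly used SaCas9 length.
NUCLEASES = {"SpCas9": ("NGG", 20), "SaCas9": ("NNGRRT", 21)}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_target_sites(seq: str, pam: str, spacer_len: int) -> int:
    """Count PAMs on both strands that have a full-length spacer on their 5' side."""
    # Zero-width lookahead so overlapping PAMs (e.g. 'AGGG' holds two NGGs) are all counted.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in pam) + ")")
    seq = seq.upper()
    total = 0
    for strand in (seq, reverse_complement(seq)):
        total += sum(1 for m in pattern.finditer(strand) if m.start() >= spacer_len)
    return total


if __name__ == "__main__":
    promoter_like = "ATTATAAATTGAATTTAAATAAGAATAATTTTGGAATTAT"  # made-up AT-rich sequence
    for name, (pam, spacer_len) in NUCLEASES.items():
        print(name, count_target_sites(promoter_like, pam, spacer_len))
```

Real design tools also score specificity and allele-level polymorphisms; this only shows why a PAM lacking GG can open up more candidate sites in AT-rich sequence.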

Besides proof-of-concept studies, the CRISPR/Cas9 system has been used to develop disease-resistant fruit trees with promising results (**Table 1**). The devastating citrus canker disease is caused by Xanthomonas citri subsp. citri (Xcc) through effector activation of the canker susceptibility gene LOB1, a member of the Lateral Organ Boundaries transcription factor family (Hu et al., 2014). When the LOB1 promoter was targeted by CRISPR/Cas9 to disrupt the effector-binding element, canker symptoms after Xcc infection were reduced in Duncan grapefruit (Jia et al., 2016) and Wanjincheng orange (Peng et al., 2017). CRISPR-KO of LOB1 also increased Xcc resistance in Duncan grapefruit (Jia et al., 2017b). KO mutations in other susceptibility genes for powdery mildew and fire blight disease have also been achieved in grape and apple protoplasts, respectively (Malnoy et al., 2016), potentially allowing the regeneration of disease-resistant plants. Several WRKY transcription factors involved in defense regulation have also been targeted for mutagenesis. CRISPR-KO of two positive regulators, PtrWRKY18 and PtrWRKY35, compromised resistance to Melampsora rust in Populus (Jiang et al., 2017), whereas KO of grape VvWRKY52 increased resistance to necrotrophic Botrytis cinerea (Wang X. et al., 2018).

To date, the greatest progress in woody species has been made with poplar, the first stably transformed tree to be genome-edited by CRISPR with high efficiency (Zhou et al., 2015). Allele-sensitive bioinformatics resources to facilitate genome editing in heterozygous species quickly followed, again based on the poplar system (Xue and Tsai, 2015; Xue et al., 2015). The majority of CRISPR studies in poplar have targeted phenylpropanoid metabolism or cell wall traits (**Table 1**). Mutations of individual 4-coumarate:CoA ligase (4CL) genes decreased the levels of structural (lignin) or non-structural (proanthocyanidin) phenylpropanoid polymers. CRISPR-KO of MYB transcription factors either increased (PtoMYB156 and PtrMYB57) or decreased (PtoMYB115 and PtoMYB170) phenylpropanoid flux, affecting in turn lignin deposition (PtoMYB156 and PtoMYB170) or flavonoid accrual (PtrMYB57 and PtoMYB115), respectively (Wan et al., 2017; Wang et al., 2017; Xu et al., 2017; Yang et al., 2017). Secondary cell wall synthesis was also compromised by CRISPR-KO of a brassinosteroid biosynthetic gene, supporting a role for brassinosteroids in wood formation (Shen et al., 2018). CRISPR-KO of BRANCHED1-1 (BRC1-1) and BRC1-2, belonging to the TCP family of transcription factors, resulted in altered shoot architecture, and revealed an additional role of BRC2 in leaf development not previously reported for its Arabidopsis ortholog (Muhr et al., 2018). A recent study reported successful mutation of essential flowering genes in both male and female poplar genotypes (Elorriaga et al., 2018). The study also collated a large mutation dataset from over 500 transgenic events (Elorriaga et al., 2018), which should prove valuable for understanding CRISPR/Cas editing patterns (see below). Although phenotypic evaluation of the flowering traits will require follow-on, multiyear studies in the field, the work underscores a powerful social application of CRISPR in the containment of transgenic trees.

# DIVERSE INDEL PROFILES INDICATIVE OF cNHEJ, MMEJ, AND TMEJ ACTIVITIES

Small frameshift indels are the most common repair outcomes of single gRNA-directed Cas9 cleavage in trees, with 1 bp insertions (+1), especially +T and +A, predominant in many cases, similar to findings from other plants and animals (Bortesi et al., 2016). However, considerable variation and case-dependent repair outcomes have also been noted, suggesting potential influences of target site sequences and/or their genomic contexts (Jacobs et al., 2015; Xu et al., 2015). Meta-analysis of mutation patterns across published tree studies is necessary to gain further insight, but this is complicated by different reporting formats (not all studies report multi-allele data) and by the use of detection methods that differ in their sensitivity, accuracy, and allele discrimination (Sentmanat et al., 2018). We combined amplicon sequencing data from CRISPR-edited P. tremula x alba INRA 717-1B4 (717) generated in our lab (Zhou et al., 2015) with the large 717 dataset from Elorriaga et al. (2018), along with manual inspection of other published tree studies, for mutation profile analysis (**Figure 1**). In aggregate, +1 insertions constituted the greatest fraction of mutation types, followed by −1 and then −2 deletions, although stereotyped repair patterns are evident (**Figure 1A**). Interestingly, insertions were limited to +1 and +2 across all sites, whereas deletions spanned a much broader size range, though with decreasing frequencies for larger deletions.
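A tally of the kind summarized in Figure 1A can be sketched as follows, assuming each edited allele has already been reduced to a net indel size and, for insertions, the inserted bases. The record format and the example records are hypothetical; they are not the 717 data.

```python
from collections import Counter


def tally_indels(alleles):
    """Summarize edited alleles given as (net_size, inserted_bases) records.

    net_size > 0 is an insertion, net_size < 0 a deletion (e.g. (1, "T") is a +T allele).
    Returns the fraction of each size class, size classes ranked by frequency,
    and the share of +1 insertions that are +T.
    """
    alleles = list(alleles)
    sizes = Counter(size for size, _ in alleles)
    total = sum(sizes.values())
    fractions = {size: n / total for size, n in sizes.items()}
    ranked = sorted(fractions, key=fractions.get, reverse=True)
    plus_one = [ins for size, ins in alleles if size == 1]
    t_share = plus_one.count("T") / len(plus_one) if plus_one else 0.0
    return fractions, ranked, t_share


# Invented records for illustration only; real inputs would come from amplicon-seq allele calls.
example = [(1, "T"), (1, "T"), (1, "A"), (-1, ""), (-2, ""), (1, "T")]
fractions, ranked, t_share = tally_indels(example)
```

Pooling alleles across studies this way is only meaningful if the detection methods are comparable, which is the caveat raised above.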


TABLE 1 | Summary of published CRISPR/Cas9-mediated knockout in woody species.

∗ Indicates off-target mutations were assessed. All transformations performed via Agrobacterium unless otherwise stated. <sup>+</sup>Xcc (Xanthomonas citri subsp. citri)-facilitated agroinfiltration. <sup>∧</sup>Direct delivery of Cas9-gRNA ribonucleoproteins.

Small mutagenic indels have often been ascribed to the classical non-homologous end-joining (cNHEJ) DNA repair pathway, but recent studies have demonstrated involvement of the alternative end-joining (alt-EJ) pathway as well (Rodgers and McVey, 2016). It now appears that cNHEJ contributes to the most common +1 insertions and other small indels, whereas larger indels are due to alt-EJ. This is based on studies in which impaired cNHEJ drastically changed the repair outcomes of CRISPR/Cas9 in yeast, human cells and Arabidopsis: the typically predominant +1 insertions and other small (<3 bp) indels were greatly reduced, while rates of large indels increased, apparently independently of cNHEJ (van Overbeek et al., 2016; Shen et al., 2017; Lemos et al., 2018). In yeast, the vast majority of the +1 insertions from cNHEJ were templated from 1 bp 5′ overhangs at the Cas9 cleavage site (fourth base from the PAM) and depended on POL4, a low-fidelity X-family DNA polymerase with terminal transferase activity (Lemos et al., 2018). POL4-deficient yeast also lost +2 and +3 insertions, many of which are homonucleotides and apparently also templated from the Cas9 cleavage site (Lemos et al., 2018). Templated insertions could also explain the majority of +1 events in a large Cas9-induced indel dataset from human cells (van Overbeek et al., 2016), suggesting a conserved mechanism underlying +1 insertions in CRISPR/Cas9-edited organisms (Lemos et al., 2018). In the combined 717 dataset, the majority of +1 insertions were +T, as reported in many CRISPR studies. However, evidence in support of templated +1 insertions was weak and appeared to be target site-dependent (**Figure 1A**). Clearly, much more data with greater target site diversity and coverage are necessary before a conclusion can be drawn, and generating such data for trees will require significant and perhaps community-wide efforts. Regardless, the small target site collection used in our analysis supports the involvement of more than one mechanism in the commonly observed +T insertions, at least in Populus.
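The templated +1 model described above (fill-in of a 1 nt 5′ overhang, duplicating the fourth base upstream of the PAM) makes a simple, testable prediction per target site. The sketch below assumes a 20 nt SpCas9 protospacer written 5′→3′ on the PAM-containing strand and a cut 3 bp upstream of the PAM; the function names and the example protospacer are hypothetical.

```python
PROTOSPACER_LEN = 20  # SpCas9 spacer length assumed here


def templated_plus_one_base(protospacer: str) -> str:
    """Base expected for a templated +1 insertion: the 4th nt 5' of the PAM."""
    if len(protospacer) != PROTOSPACER_LEN:
        raise ValueError("expected a 20-nt protospacer written 5'->3' on the PAM strand")
    return protospacer[-4]


def predicted_plus_one_allele(protospacer: str, pam: str) -> str:
    """Target site after duplicating that base at the cut site (3 bp upstream of the PAM)."""
    cut = len(protospacer) - 3
    base = templated_plus_one_base(protospacer)
    return protospacer[:cut] + base + protospacer[cut:] + pam


def is_templated(protospacer: str, inserted_base: str) -> bool:
    """True if an observed +1 base matches the overhang-templated prediction."""
    return inserted_base == templated_plus_one_base(protospacer)
```

Comparing `is_templated` across many sites is one way to ask whether +T insertions exceed what templating alone predicts; the model ignores staggered cuts at other positions and template-independent additions.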

sequences are shown on the right, with allele number for each noted in parentheses. The percentages shown inside the +1 (1 bp insertion) bar indicate the fraction that were T insertions. The fraction of templated +1 insertions that deviate from T is shown in parentheses. (B) Representative examples of different mutation types and the potential DNA repair pathway involved in each case. PAM sequences are bold underlined, triangles denote predicted Cas9-cleavage sites, indels are shown in red, yellow-shaded regions denote microhomologies, and gray sequences in (6) and (7) were appended from P. tomentosa cDNA (GenBank accession KC954700) and P. tremula x alba 717 genomic sequences (Xue et al., 2015), respectively. Note that the region in (7) contains two overlapping target sites. For (8), there are five possible in trans template sites within introns of Phytozome (v12) gene model GSVIVG01016650001, the nearest one 640 bp upstream of the target site.

cNHEJ-independent repair likely involves different alt-EJ pathways, including microhomology-mediated end-joining (MMEJ), single-strand annealing (SSA), or polymerase theta (POLQ)-mediated end-joining (TMEJ) (Rodgers and McVey, 2016). Both MMEJ and SSA require end resection or unwinding to expose short homologous sequences for annealing (up to ∼20–30 bp for MMEJ and longer for SSA) and subsequent repair, and always result in deletions (Sfeir and Symington, 2015). The presence of microhomologous sequences at the deletion junctions can therefore serve as evidence of MMEJ/SSA repair. Indeed, microhomologies of 1–5 bp are readily identifiable in most of the deletion alleles (≥5 bp) we examined, but are rarely found for small deletions (3–4 bp) that might have arisen from cNHEJ (**Figure 1B**). MMEJ has also been associated with abnormal chromosomal translocations and inversions (Sfeir and Symington, 2015). Such modifications have been reported in several studies, including two from Populus, in which two or more gRNAs were designed to target the same gene to produce large deletions (Fan et al., 2015; Elorriaga et al., 2018). Moreover, large deletions are sometimes accompanied by small insertions (Fan et al., 2015; Nakajima et al., 2017), a pattern that is characteristic of the recently discovered TMEJ pathway (Koole et al., 2014). TMEJ depends on POLQ, an error-prone A-family DNA polymerase that can extend microhomologies in a template-dependent (either in cis or in trans) or template-independent manner (Kent et al., 2016). TMEJ is the essential repair pathway in animal germ cells, as embryos of zebrafish polq mutants are hypersensitive to DSB-inducing treatments, with low levels of repair producing only +1 insertions (Thyme and Schier, 2016). In Arabidopsis, TMEJ is required for T-DNA integration following Agrobacterium transformation of either flowers or roots (van Kregten et al., 2016). We found evidence of in cis or in trans templated insertions in the complex indels reported for poplar and grape (**Figure 1B**; Fan et al., 2015; Nakajima et al., 2017), supporting an active TMEJ pathway in somatic cells of plants.
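The junction inspection described above can be sketched in code: measure the microhomology flanking a deletion, then assign a putative pathway with the size-based heuristic used in this section. This is a rough illustration under stated assumptions, not a validated classifier: the deletion is given as 0-based coordinates on the reference, `min_mh=1` follows the 1–5 bp range mentioned above, and the thresholds and labels are illustrative.

```python
def microhomology_length(ref: str, start: int, length: int) -> int:
    """Length of microhomology flanking a deletion of ref[start:start+length]."""
    right = 0  # prefix of the deleted region repeated just after the deletion
    while (right < length and start + length + right < len(ref)
           and ref[start + right] == ref[start + length + right]):
        right += 1
    left = 0  # suffix of the deleted region repeated just before the deletion
    while (left < length and start - left - 1 >= 0
           and ref[start - left - 1] == ref[start + length - left - 1]):
        left += 1
    return min(left + right, length)


def putative_pathway(ref, del_start, del_len, inserted="", min_mh=1):
    """Rough pathway label following the size/microhomology heuristic (not a definitive call)."""
    if del_len == 0 and not inserted:
        return "unedited"
    if del_len > 0 and inserted:
        return "TMEJ-like complex indel"
    if inserted:
        return "cNHEJ" if len(inserted) <= 2 else "unassigned insertion"
    if del_len <= 4:
        return "cNHEJ"
    if microhomology_length(ref, del_start, del_len) >= min_mh:
        return "MMEJ/SSA"
    return "unassigned deletion"
```

Because a 1 bp microhomology often arises by chance, a stricter `min_mh` may be preferable when interpreting real alleles.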

Examination of published mutation profiles of Populus and other tree species suggests differential involvement of multiple repair pathways, probably with cNHEJ contributing to +1 and +2 insertions and small (1–4 bp) deletions, MMEJ (and SSA) to larger deletions, and TMEJ to complex indels (**Figure 1B**). The varying dependency of these pathways on sequence context (microhomologies) likely underpins the non-random nature of CRISPR/Cas9 repair outcomes reported in many studies, including those in trees (Jacobs et al., 2015; van Overbeek et al., 2016; Vu et al., 2017; Elorriaga et al., 2018). Incorporating microhomology modeling into the gRNA design workflow (Bae et al., 2014; Segar et al., 2015) should enable prediction of likely DNA repair outcomes and thus informed selection of target sites.
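Microhomology modeling for gRNA design can be sketched by enumerating every pair of identical k-mers that flank a cut site and treating each pair as a candidate deletion. The weighting below is loosely in the spirit of the pattern score of Bae et al. (2014), but the exact form (exponential length penalty with factor 20, G/C counted double) is an assumption here and should be checked against the original before use. The function names and test sequence are hypothetical.

```python
import math


def mh_deletion_outcomes(ref, cut, window=15, min_k=2):
    """Enumerate microhomology-mediated deletions spanning a cut site.

    `cut` is the index of the first base 3' of the break. Returns
    {deleted_product: (weight, deletion_length)}, keeping the best-scoring
    microhomology for each distinct product.
    """
    outcomes = {}
    lo, hi = max(0, cut - window), min(len(ref), cut + window)
    for k in range(min_k, window + 1):
        for i in range(lo, cut - k + 1):          # left copy ends at or before the cut
            mh = ref[i:i + k]
            for j in range(cut, hi - k + 1):      # right copy starts at or after the cut
                if ref[j:j + k] != mh:
                    continue
                del_len = j - i                   # one copy of the microhomology is kept
                gc = sum(base in "GC" for base in mh)
                # Assumed weighting: longer, GC-rich microhomology and shorter deletions score higher.
                weight = math.exp(-del_len / 20) * (2 * gc + (k - gc))
                product = ref[:i] + ref[j:]
                if weight > outcomes.get(product, (0.0, 0))[0]:
                    outcomes[product] = (weight, del_len)
    return outcomes


def out_of_frame_fraction(outcomes):
    """Weighted share of predicted deletions that would cause a frameshift."""
    total = sum(w for w, _ in outcomes.values())
    if total == 0:
        return 0.0
    return sum(w for w, d in outcomes.values() if d % 3) / total
```

A site with a high out-of-frame fraction would be favored for KO, since its dominant microhomology-driven deletions are predicted to shift the reading frame.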

# LONG-TERM STABILITY OF CRISPR-EDITED TREES THROUGH VEGETATIVE PROPAGATION

For many herbaceous species where CRISPR editing efficiencies are low, or where monoallelic/mosaic mutations predominate in first-generation (T0) transformants, multi-generation progeny screening is necessary to obtain homozygous mutants (Xu et al., 2015). Although initial transmission rates vary depending on the study system and the nature of the CRISPR-induced (somatic or germinal) mutations carried by the founder plant, stable mutation inheritance can be expected once homozygous lines are obtained, as reported for Arabidopsis, rice, tomato and potato (Brooks et al., 2014; Feng et al., 2014; Zhou et al., 2014; Butler et al., 2015). For woody perennials, however, the issues are rather different. Cross-generational screening is difficult to implement for transgenic trees owing to their long generation times and the strict regulation of flowering transgenic trees (Strauss et al., 2015). The predominantly outcrossing nature of trees, many of which are also dioecious, adds further challenges to rapid-cycle breeding and to introgression of CRISPR-derived mutations into elite germplasm. While advances in early-flowering induction in contained environments (Hoenicka et al., 2014; Klocko et al., 2016) promise to accelerate progress, commercial production of many forest, fruit and nut trees relies on clonal propagation of elite genotypes. For woody perennials, therefore, it is pertinent to address the long-term stability of CRISPR editing, both on-target and off-target, in vegetatively propagated T0 transformants.

In theory, CRISPR-induced DNA modifications should lead to permanent mutations in edited cells that can be inherited mitotically during clonal propagation, yet experimental data are rare. One study used tissue culture to clone CRISPR-derived mutations from T0 diploid and tetraploid potato, and reported stable maintenance of targeted mutations across clonal generations, and in three selected cases through the germline as well (Butler et al., 2015). In that study, however, somatic mutations were prevalent in T0 plants, as fewer than half of the originally detected mutation types were captured as single mutations in clonally propagated plants (Butler et al., 2015). The high levels of somatic mutations likely reflect a high proportion of chimeras, a common problem in tissue culture when plants are regenerated from multiple cells, in this case with heterogenomic modifications. Fortunately for Populus, the proven efficiency of CRISPR (Zhou et al., 2015; Elorriaga et al., 2018) means that null mutations with biallelic KO can be readily obtained in T0 transformants and stably inherited through clonal propagation. Wang et al. (2017) reported faithful maintenance of PtoMYB115 mutations in tissue culture-propagated Populus tomentosa somaclones, though in one case low frequencies of new mutations not seen in the parent line were detected, indicative of chimeras. Similarly, the CRISPR editing outcomes of BRC1-1 and BRC2-1 were stable over multiple cycles of vegetative propagation in tissue culture (Muhr et al., 2018). We have maintained a subset of the previously characterized 717 mutants (Zhou et al., 2015) in the greenhouse for over 4 years by repeatedly cutting back the original transformants and/or propagating them as rooted cuttings. The reddish-brown wood discoloration of lignin-reduced 4cl1 mutants has been stable in all re-sprouted shoots and clonally propagated plants. Repeated amplicon sequencing of randomly selected lines re-confirmed the targeted 4CL1 mutations 4 years later, with no off-target changes to the paralogous 4CL5 (**Supplementary Table S1**). Another group of transgenic plants harbors a non-functional gRNA for 4CL5 due to SNPs (one per allele) between the genome-sequenced P. trichocarpa and the transformation host 717 that prevented Cas9 cleavage, as confirmed by amplicon sequencing (Zhou et al., 2015). It should be noted that one of the 717 SNPs alters the PAM site from NGG to NGA, the latter being a non-canonical PAM of SpCas9 thought to cause off-target cleavage in human cells (Zhang Y. et al., 2014b). Reanalyzing this group of plants could therefore reveal whether the imperfectly matched 4CL5-gRNA exhibited any off-target activity over the long term. We found no evidence of CRISPR/Cas9 cleavage after 4 years of coppicing and regrowth (**Supplementary Table S1**). These findings echo other tree studies that showed no or very rare off-targeting (Jia et al., 2017a; Nakajima et al., 2017; Elorriaga et al., 2018; see also **Table 1**), as well as reports from Arabidopsis, rice and tomato based on whole-genome re-sequencing (Feng et al., 2014; Zhang et al., 2014a; Peterson et al., 2016; Rodríguez-Leal et al., 2017). The data provide support for the long-term stability and specificity of CRISPR/Cas9-mediated mutagenesis, with extremely low off-target potential during vegetative propagation in poplar.
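The 4CL5 case turns on two checks: how many positions of the gRNA spacer mismatch the host allele, and whether the allele still carries a usable PAM. A minimal sketch of both checks follows, assuming a 20 nt SpCas9 spacer and a 23 nt host site (protospacer plus PAM) on the same strand. The classification of NAG/NGA as "non-canonical" is a simplification, and the example sequences in the test are invented, not the real 4CL5 alleles.

```python
def compare_target(spacer: str, site: str):
    """Compare a 20-nt SpCas9 spacer with a 23-nt host target (protospacer + PAM).

    Returns (mismatch distances from the PAM, PAM class); distance 1 is PAM-proximal.
    """
    if len(spacer) != 20 or len(site) != 23:
        raise ValueError("expected a 20-nt spacer and a 23-nt site on the same strand")
    protospacer, pam = site[:20].upper(), site[20:].upper()
    mismatches = [20 - i for i, (a, b) in enumerate(zip(spacer.upper(), protospacer)) if a != b]
    if pam[1:] == "GG":
        pam_class = "canonical NGG"
    elif pam[1:] in ("AG", "GA"):
        pam_class = "non-canonical (NAG/NGA)"
    else:
        pam_class = "no recognized SpCas9 PAM"
    return mismatches, pam_class
```

Running such a check for each allele of the transformation host, rather than only against the reference genome, is the kind of allele-aware screening that would flag gRNAs likely to fail, as happened with 4CL5 in 717.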

# BROAD-SPECTRUM MUTAGENESIS BEYOND KO

Nullizygous mutations harboring either identical (homozygous) or distinct (heterozygous) mutations in all alleles of the genome are the ideal repair outcomes for gene KO investigation. However, monoallelic, in-frame and/or mosaic mutations can expand the phenotypic spectrum to enhance the power of functional analysis. For instance, transgenic grapevine with monoallelic mutations of a defense-related WRKY gene exhibited intermediate levels of disease resistance between WT and biallelic mutants (Wang X. et al., 2018). Similarly, monoallelic or in-frame mutations of PDS led to partial albino phenotypes in both poplar and apple (Fan et al., 2015; Nishitani et al., 2016). Given the abundance of duplicate genes in plant genomes, and the proven successes of CRISPR in multi-allele as well as allele-specific editing (Jia et al., 2016; Elorriaga et al., 2018), there is exciting potential to exploit CRISPR for development of allelic series mutations to address functional redundancy of duplicate genes or tandem repeats, and to investigate the allele-dose response of agronomic traits. Thus, the ability to generate novel germplasms is invaluable not only for tree improvement but also for basic functional genomics research.
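An allelic series like the one described above can be summarized by classifying each edited allele and counting the remaining functional dose. The sketch below makes a deliberate simplification, stated as an assumption: a net indel that is not a multiple of 3 is treated as a putative null, and a multiple of 3 as in-frame. Real alleles can escape this rule (for example through splicing changes or late frameshifts). It also enumerates WT/null combinations for two paralogs in a diploid.

```python
from collections import Counter
from itertools import product


def classify_allele(net_indel: int) -> str:
    """Frameshift indels are treated as putative nulls; multiples of 3 as in-frame."""
    if net_indel == 0:
        return "WT"
    return "null" if net_indel % 3 else "in-frame"


def allele_dose(net_indels):
    """Count allele classes across all copies (e.g. 2 paralogs x 2 alleles = 4 entries)."""
    counts = Counter(classify_allele(x) for x in net_indels)
    return {cls: counts.get(cls, 0) for cls in ("WT", "in-frame", "null")}


# Number of WT/null genotype combinations for two paralogs in a diploid,
# grouped by how many functional (WT) alleles remain.
series = Counter(g.count("WT") for g in product(("WT", "null"), repeat=4))
```

The binomial spread of `series` shows why editing a duplicated gene pair can yield a graded set of lines for testing allele-dose responses.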

In contrast to CRISPR-mediated KO, site-specific gene targeting or replacement remains a major challenge in plants because the homology-directed repair pathway is inefficient. Geminivirus replicons have been shown to increase site-specific gene knockin (KI) efficiencies by orders of magnitude in tobacco, tomato and hexaploid wheat (Baltes et al., 2014; Čermák et al., 2015; Gil-Humanes et al., 2017). In animal systems, the MMEJ and SSA pathways, along with the newer CRISPR/Cpf1 system, have been harnessed for targeted KI with success (Sakuma et al., 2016; Tóth et al., 2016). These and other emerging approaches represent promising options for developing efficient KI systems in trees. Finally, many economically important tree species or genotypes remain recalcitrant to transformation and/or tissue culture regeneration, hindering applications of CRISPR. Recent breakthroughs in morphogenic regulator-mediated regeneration (Lowe et al., 2016, 2018) have already stimulated similar research in trees. Direct delivery of pre-assembled Cas9-gRNA ribonucleoproteins into protoplasts for genome editing, as already deployed in apple and grape, offers a transgene-free alternative to Agrobacterium transformation (Malnoy et al., 2016). At present, however, protoplast regeneration remains a challenge for other tree species. There is strong incentive to overcome this challenge, since avoiding the footprint of foreign DNA and the associated negative perceptions will improve the outlook for integrating CRISPR technology with the commercial deployment of designer trees.

#### AUTHOR CONTRIBUTIONS

C-JT conceived the idea. WPB, DC, and C-JT collected background information and analyzed data. WPB and C-JT wrote the manuscript with contributions from DC. All authors approved the manuscript.

#### FUNDING

The CRISPR research and associated genomic resource development in the Tsai Lab were supported by the National Institute of Food and Agriculture in the Department of Agriculture (2015-67013-22812), the National Science Foundation (IOS-1546867), and The Center for Bioenergy Innovation, a Department of Energy's Research Center funded by the Office of Biological and Environmental Research in the DOE Office of Science. DC was supported by the Fundamental Research Funds for the Central Universities (BLYJ201603) from China.

#### ACKNOWLEDGMENTS

We thank current and former lab members and collaborators who contributed to the CRISPR research, Gilles Pilate of INRA, France for providing poplar clone 717, Estefania Elorriaga and Steve Strauss at Oregon State University for sharing their data, and Scott Harding for critical review of the manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01732/full#supplementary-material

#### REFERENCES


resistance to citrus canker. Plant Biotechnol. J. 15, 817–823. doi: 10.1111/pbi.12677



CRISPR/Cas9 in rice. Nucleic Acids Res. 42, 10903–10914. doi: 10.1093/nar/gku806

Zhou, X., Jacobs, T. B., Xue, L.-J., Harding, S. A., and Tsai, C.-J. (2015). Exploiting SNPs for biallelic CRISPR mutations in the outcrossing woody perennial Populus reveals 4-coumarate:CoA ligase specificity and redundancy. New Phytol. 208, 298–301. doi: 10.1111/nph.13470

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Bewg, Ci and Tsai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Strategies for Engineering Reproductive Sterility in Plantation Forests

#### Steffi Fritsche<sup>1</sup> , Amy L. Klocko<sup>2</sup> , Agnieszka Boron<sup>1</sup> , Amy M. Brunner<sup>3</sup> and Glenn Thorlby<sup>1</sup> \*

<sup>1</sup> Scion, Rotorua, New Zealand, <sup>2</sup> Department of Biology, University of Colorado Colorado Springs, Colorado Springs, CO, United States, <sup>3</sup> Department of Forest Resources and Environmental Conservation, Virginia Tech, Blacksburg, VA, United States

A considerable body of research exists concerning the development of technologies to engineer sterility in forest trees. The primary driver for this work has been to mitigate concerns arising from gene flow from commercial plantings of genetically engineered (GE) trees to non-GE plantations, or to wild or feral relatives. More recently, there has been interest in the use of sterility technologies as a means to mitigate the global environmental and socio-economic damage caused by the escape of non-native invasive tree species from planted forests. The current sophisticated understanding of the molecular processes underpinning sexual reproduction in angiosperms has facilitated the successful demonstration of a number of control strategies in hardwood tree species, particularly in the model hardwood tree poplar. Although gymnosperm softwood trees, such as pines, make up the majority of the global planted forest estate, only pollen sterility, via cell ablation, has been demonstrated in softwoods. Progress has been limited by the lack of an endogenous model system, the long timescales required for testing, and key differences between softwood reproductive pathways and those of well-characterized angiosperm model systems. The availability of comprehensive genome and transcriptome resources has allowed unprecedented insights into the reproductive processes of both hardwood and softwood tree species. This increased fundamental knowledge, together with the implementation of new breeding technologies such as gene editing, which potentially face a less oppressive regulatory regime, is making the implementation of engineered sterility in commercial forestry a realistic possibility.

Keywords: sterility, reproduction, forest trees, gene editing, genetic engineering, containment

# DRIVERS FOR ENGINEERING STERILE FOREST TREES

Increasing global population coupled with the transition to a sustainable bio-based economy is predicted to lead to growing pressure on forests to deliver wood-based products, energy, food, and ecosystem services whilst maintaining their role as major reservoirs of biodiversity. To accommodate this growing demand, it is estimated that the amount of wood we take from forests and plantations each year may need to triple by 2050 (WWF, 2015). Planted forests, which in 2015 made up 7% of forest lands, provide a means to sustainably increase production of forest products and reduce pressure on natural forests (FAO, 2015). Alongside improved silviculture, land management and other technological advances, biotech-based technologies offer tools to enhance the sustainability and productivity of planted forests (Al-Ahmad, 2018). Sterile trees could help overcome several obstacles to increasing productivity from planted forests.

#### Edited by:

Steven Henry Strauss, Oregon State University, United States

#### Reviewed by:

Ove Nilsson, Umeå Plant Science Centre, Sweden

Atsushi Watanabe, Kyushu University, Japan

Matthias Fladung, Johann Heinrich von Thünen-Institut, Germany

\*Correspondence: Glenn Thorlby glenn.thorlby@scionresearch.com

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 12 August 2018 Accepted: 26 October 2018 Published: 15 November 2018

#### Citation:

Fritsche S, Klocko AL, Boron A, Brunner AM and Thorlby G (2018) Strategies for Engineering Reproductive Sterility in Plantation Forests. Front. Plant Sci. 9:1671. doi: 10.3389/fpls.2018.01671

## Containment of Genetically Engineered Trees

Genetic engineering (GE) can provide solutions to many of the challenges forestry faces in sustainably increasing forest production. Improved wood quantity and quality, processability, biotic and abiotic stress tolerance, and herbicide tolerance (Harfouche et al., 2011; Porth and El-Kassaby, 2014; Etchells et al., 2015; Ault et al., 2016; Zhou et al., 2017) are amongst the traits successfully demonstrated. The recent approval by the Brazilian regulator of GE Eucalyptus that can grow 15–20% faster than the best existing clonal lines (Nature Biotechnology News, 2015) seems likely to lead to the first large-scale commercial planting of GE trees.

There remain well-documented regulatory and social challenges associated with commercial planting of GE trees (Porth and El-Kassaby, 2014; Strauss et al., 2016). Gene flow from transgenic trees remains a major concern, particularly as forest trees are virtually undomesticated and pollen can disseminate over great distances (DiFazio et al., 2004). Seeds can also spread, either locally or over long distances, depending on the species. Transgene containment through the production of trees that are unable to produce fertile reproductive propagules could mitigate these concerns and prevent, or severely reduce, gene flow via sexual reproduction.

## Invasive Tree Species

Increasing attention is being paid to the ecological, economic, and cultural damage caused by invasive tree species that have "escaped" by seed dispersal from planted forests (Breton et al., 2008; Nuñez et al., 2017). Globally, Pinus species are recognized as among the most widespread and influential of all invasive plants (Richardson and Rejmánek, 2004). These escapes, or wildings, are a particular problem in the Southern Hemisphere, where a large percentage of tree plantations are composed of exotic species (Franzese and Raffaele, 2017). South Africa, New Zealand, and Australia, which were early adopters of exotic conifer plantations, have more recently been joined by several South American nations in facing wilding challenges (Simberloff et al., 2010). For example, in New Zealand several exotic conifer species have become established and now occupy ∼1.8 million ha, expanding by about 6% per annum (Froude, 2011). Economic and ecological damage resulting from these wildings is challenging the license to operate with commercially advantageous but wilding-prone species such as Douglas-fir (Pseudotsuga menziesii). The ability to generate trees that are unable to reproduce would allow control programs to focus on existing populations and give forest owners freedom to operate for new plantings.
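To put the New Zealand figures in perspective, a short worked calculation helps. It assumes, purely for illustration, that the ∼1.8 million ha wilding area keeps expanding at a constant compound rate of 6% per year. Under that assumption the area would double roughly every 12 years. This is an arithmetic illustration, not a forecast; real spread rates depend on control effort and site conditions.

```python
import math


def projected_area(area0: float, annual_rate: float, years: int) -> float:
    """Area after `years` of constant compound spread (assumption: fixed rate)."""
    return area0 * (1 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years for the area to double at a constant compound rate."""
    return math.log(2) / math.log(1 + annual_rate)


# Figures from the text: ~1.8 million ha expanding by ~6% per year.
area_10y = projected_area(1.8, 0.06, 10)   # million ha after 10 years
years_to_double = doubling_time(0.06)
```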

## Increased Wood Production and Other Benefits

The ability to either prevent reproduction or limit the development of reproductive propagules is predicted to boost growth and increase wood production in forest trees by redirecting energy and nutrients to vegetative growth (Strauss et al., 1995; Luis and José, 2014). Conclusive evidence for such a reproductive cost is lacking, but it is supported by evidence that in conifers cone production may use a significant proportion of the tree's energy and assimilates (Cremer, 1992; Sala et al., 2012; Kramer et al., 2014). Unsurprisingly, in conifers the long-lived female cones are more energy-demanding than the generally more transient male cones (Obeso, 2002). These observations suggest that engineered sterility, particularly female sterility, is likely to have a positive impact on vegetative growth and wood production. Long-term growth comparisons between sterile and reproductive trees would provide direct evidence for this and allow growth differences to be quantified.

Pollen from many trees causes allergic reactions, and symptoms correlate with exposure (Buters et al., 2012). Planted forests can be a major source of seasonal allergens (D'amato et al., 2007). For example, allergy to sugi (Cryptomeria japonica) pollen is reported to affect 26.5% of the Japanese population (Taniguchi, 2018). The ability to prevent or limit pollen production from planted forests would provide relief to allergy sufferers and mitigate potential challenges to the social license to operate.

# CURRENT UNDERSTANDING OF REPRODUCTIVE PROCESSES IN FOREST TREES

Both angiosperm (hardwood) and gymnosperm/conifer (softwood) trees are used as plantation species. Although they share broad similarities in their reproductive processes, there are distinct differences between them.

# Angiosperm Trees

Perhaps no other plant developmental process has been studied more than flowering. For Arabidopsis, the in planta functions of a large number of flowering genes, as well as their regulatory network context, are known, and studies in plants such as rice and petunia have revealed broad functional conservation (Pajoro et al., 2014). These include genes that regulate the transition to flowering, floral organ identity, and pollen and ovule development. Although advances in sequencing have enabled the identification of flowering gene homologs in diverse angiosperm trees, in planta functions have been characterized in trees in only a few cases (Brunner et al., 2017; Klocko et al., 2018). This is because the juvenile (non-flowering) period can last years to decades and, for most species, genetic transformation is a formidable hurdle. Trees also differ from herbaceous plants in the prolonged period between the floral transition and anthesis.

In tropical species such as Eucalyptus, this occurs within one season, whereas temperate species exhibit indirect flowering, with flower development initiated in one year and completed the following year (Vining et al., 2015; Brunner et al., 2017). Thus, demonstrating sterility or delayed flowering typically requires multi-year field trials that involve monitoring large trees and collecting flowers from the upper portion of the crown.

Selecting candidate genes for genetic containment in trees based on homology to Arabidopsis flowering genes and on gene expression might seem straightforward, but such conservation does not necessarily translate into the predicted or desired phenotype. Flowering time genes are attractive targets because prevention of flowering is easier to monitor (e.g., there is no need to demonstrate that flowers are sterile) and prevents resource allocation to reproduction. However, accumulating evidence indicates that tree homologs of various flowering time and floral meristem identity genes have roles in both vegetative and reproductive phenology (Bohlenius et al., 2006; Bielenberg et al., 2008; Hoenicka et al., 2008; Mohamed et al., 2010; Hsu et al., 2011; Azeez et al., 2014; Tylewicz et al., 2015; Parmentier-Line and Coleman, 2016). Targeting such genes can thus produce undesired vegetative effects, such as delayed bud flush, together with either the predicted effects on flowering or no effect on flowering at all (Hoenicka et al., 2012). Nevertheless, promising results have also been achieved, such as delayed flowering without growth reduction through overexpression of the poplar ortholog of the floral repressor SHORT VEGETATIVE PHASE (SVP) (Klocko et al., 2018). Manipulation of floral organ identity genes might be less likely to cause vegetative effects, as these genes may show stronger conservation of an exclusively reproductive function. For example, considerable evidence supports reproductive functions for the AGAMOUS (AG) subgroup of MADS-box genes not only in angiosperms but also in gymnosperms (Dreni and Kater, 2014). However, even in these cases, results can differ from expectations. For example, downregulation of the conserved floral meristem identity gene LEAFY (LFY) in a male poplar genotype induced bisexual and female flowers (Klocko et al., 2018).
Despite these challenges, the knowledge gained from gene function and sterility studies in trees, together with more detailed and extensive genome-wide expression studies in diverse angiosperm trees, will enable more accurate selection of genes whose manipulation affects only reproductive traits.

# Gymnosperm Trees

Unlike angiosperms, for which there is extensive knowledge of the molecular factors involved in reproduction, relatively little is known about gymnosperms. A number of putative genes have been identified through comparative analyses of orthologous angiosperm genes, tissue-specific expression analysis, or genome sequencing. However, several key floral genes, including FD, SQUAMOSA-like (SQUA-like) and SEPALLATA-like (SEP-like) genes, appear to be absent (Becker, 2003; Abe et al., 2005; Zahn et al., 2005; Melzer et al., 2010; Karlgren et al., 2011; Jaeger et al., 2013). Initial research detected orthologs of only the B- and C-genes involved in controlling meristem formation and organ identity in the developing cones (Tandre et al., 1995, 1998; Mouradov et al., 1998; Rutledge et al., 1998; Fukui et al., 2001; Sundström and Engström, 2002; Gramzow et al., 2014; Katahata et al., 2014; Uddenberg et al., 2015). More recently, prototypes of the ABCE-model transcription factors, which define floral organ identity under the ABC(DE) model in angiosperms, have been confirmed in gymnosperms (Chen et al., 2017). Conifer-specific genes such as DEFICIENS-AGAMOUS-LIKE (DAL) and NEEDLY have also been identified, but functional knowledge is limited owing to the lack of angiosperm orthologs (Carlsbecker et al., 2003, 2004; Rudall et al., 2011). Moreover, it is generally not possible to predict which of the many vegetative meristems will undergo the reproductive bud transition before changes are initiated, making research on reproductive initiation a bold venture (Williams, 2009).

The biggest bottleneck for conifer reproduction research is the inability to carry out functional characterization in an endogenous system, which prevents definitive elucidation of gene function. Testing gene function in angiosperm model systems has produced inconclusive results. Whilst some such studies have confirmed the function of putative orthologs, others failed to find flowering-related differences, found multiple phenotypic alterations, or were unable to complement mutants, highlighting the need for a reliable conifer testing system (Rutledge et al., 1998; Tandre et al., 1998; Shindo et al., 2001; Sundström and Engström, 2002; Carlsbecker et al., 2003, 2004; Nilsson et al., 2007; Shiokawa et al., 2008; Klintenas et al., 2012; Katahata et al., 2014; Liu et al., 2018). As discussed above for angiosperm trees, it may be challenging to identify conifer homologs of flowering time genes that lack vegetative roles; such roles would make their manipulation for reproductive sterility problematic. For example, members of the gymnosperm FLOWERING LOCUS T-like subfamily have been suggested to have roles in both vegetative and reproductive phenology (Klintenas et al., 2012; Karlgren et al., 2013; Nystedt et al., 2013; Liu et al., 2016). However, because they have not been characterized in an endogenous system, the functions of the subfamily members remain unresolved.

# ENGINEERED STERILITY IN TREES

The increasingly sophisticated understanding of the molecular processes underpinning sexual reproduction described above has facilitated the successful demonstration of a number of sterility strategies in plants. Chief amongst these are strategies using ablation of reproductive cells or structures and the inactivation or suppression of genes essential for normal reproductive processes. Here, we highlight only a selection of sterility approaches and refer readers to Brunner et al. (2007) and Hoenicka et al. (2016b) for additional examples.

The use of cell- or tissue-specific promoters to direct the expression of cytotoxic genes (Palmiter et al., 1987) to reproductive tissues has been widely used to investigate and modify reproductive development in plants (Goldman et al., 1994; Beals and Goldberg, 1997). There are numerous examples of this technology being used to generate male and female sterility in plants (Mariani et al., 1990; Goldman et al., 1994; De Block et al., 1997). Complete (dual male and female) sterility has also been achieved using either independent male- and female-specific promoters or a single promoter targeting both tissues simultaneously (Liu and Liu, 2008; Huang et al., 2016). Male sterility using this cell ablation strategy has been demonstrated in both hardwood and softwood trees via expression of the BARNASE gene from Bacillus amyloliquefaciens under the control of reproductive tissue-specific promoters (**Table 1**). A key requirement for a cell ablation strategy is a promoter that tightly restricts expression of the cytotoxin to the desired reproductive tissue, preventing pleiotropic effects on non-reproductive tissues. The conserved expression of some floral genes has facilitated the use of a number of well-characterized promoters across species (Strauss et al., 1995). Indeed, an anther-specific promoter derived from Pinus radiata has been used to express the BARNASE gene in both a softwood (pine) and a hardwood (Eucalyptus) tree to deliver male sterility (Zhang et al., 2012). We are unaware of dual male/female BARNASE-mediated sterility having been demonstrated in trees without negative pleiotropic effects (Lemmetyinen et al., 2004), but this should be possible if a suitable promoter is used.

RNA interference (RNAi) is a well-proven, homology-dependent gene silencing technology that uses double-stranded RNA directed against a target gene or its promoter region (Mansoor et al., 2006). Engineered sterility through the suppression of genes essential for normal reproduction has been demonstrated many times in angiosperm species (Wang et al., 2012). In angiosperm trees, RNAi constructs targeting LFY and AG have successfully produced sterile trees (**Table 1**). The production of male- and female-sterile plants using chimeric repressors that target transcription factors involved in flower development has also been demonstrated (Mitsuda et al., 2006; Katahata et al., 2014). In conifers, the use of gene suppression methods to prevent reproduction has not been demonstrated, even though such methods have been widely used to investigate wood quality traits (Wagner et al., 2005, 2009; Souza et al., 2007; Trontin et al., 2007). Expression of conifer flowering-associated genes in an endogenous system has been reported (Karlgren et al., 2013), but these studies did not directly address sterility.

This lack of success in conifers reflects both limited fundamental knowledge of conifer reproduction and the inherent difficulties of working with conifers, including the long timescales required for testing. For example, attempts to investigate the effects of overexpressing the Arabidopsis LFY gene in P. radiata were uninformative because neither modified nor control plants initiated reproduction during the 8 years that the trees were grown (NZ-EPA, 2008; Lottmann et al., 2010).

# FUTURE OUTLOOK AND CHALLENGES

The social, legal and ecological impacts of sterile trees are still controversial (Williams, 2005; Kazana et al., 2015; Strauss et al., 2017). Although sterility mitigates some of the social and ecological objections to deploying both GE trees and species with the potential to become invasive, this benefit may be challenged if the sterility technology is itself GE.

The recent development of a number of new breeding technologies, including gene editing, which are already seeing widespread application in crop species (Nekrasov et al., 2017; Waltz, 2018), holds great potential for forest trees. Site-directed mutagenesis would allow the inactivation of genes that are essential for normal reproductive processes and thus the generation of sterile trees. Gene editing-mediated mutagenesis would be particularly advantageous in forest trees, where mutagenesis breeding has so far played an extremely limited role. The permanent inactivation of a gene would provide assurance of enduring containment and reduce concerns about the long-term stability of transgene expression associated with silencing or over-expression technologies (Li et al., 2008). Site-directed mutagenesis via CRISPR-Cas9 has been demonstrated in a number of tree species, including poplar (Fan et al., 2015), in which mutagenesis of genes involved in flowering has also been shown (Elorriaga et al., 2018). To date, no gene editing in conifer species has been published. However, the existence of a small number of natural spontaneous sterile conifer mutants (Orr-Ewing, 1977; Wilson and Owens, 2003; Rudall et al., 2011) suggests that a targeted-mutagenesis strategy would be successful if suitable targets can be identified.

Although the regulatory landscape for gene editing technologies remains complex, it is likely that in many jurisdictions, versions of the technology that leave no foreign DNA in the final organism will not be regulated as GMOs (Waltz, 2016; Davison and Ammann, 2017; Ishii and Araki, 2017). This would provide a more straightforward and less costly route to commercial release than currently exists for products of GE technology (Waltz, 2018). This regulatory approach would hold particular promise in applications where sterility is a standalone trait, such as the control of invasive tree species, rather than a means of containing other (GE) traits. Such a strategy would require DNA-free editing technologies, because sterile trees cannot be outcrossed to segregate away the editing transgenes.

The second major challenge has been the inability to carry out timely prototyping of sterility constructs in commercially important species. To facilitate testing in conifers, it is desirable to develop a system analogous to the poplar model system (Jansson and Douglas, 2007; Douglas, 2017), which has allowed relatively rapid prototyping of sterility constructs (Klocko et al., 2018). Although effective transformation systems exist for a number of commercially important conifers, including P. radiata, P. taeda, and Picea abies, these species have long pre-reproductive juvenile periods that limit their use as sterility-testing platforms (Tang and Newton, 2003; Uddenberg et al., 2015). Some conifer species are able to reproduce at a much younger age (Righter, 1939; Pharis et al., 1987; Uddenberg et al., 2013) or can be induced to reproduce early. Such precocious reproduction has been achieved in both hardwoods and softwoods by grafting onto older rootstock (Simak, 1978; Zhang et al., 2012) and by applying external stimuli such as hormone treatments (Pharis et al., 1965; Ross and Pharis, 1985; Meilan, 1997). Stable introduction of FT transgenes induced precocious fertile flowers in Eucalyptus (Klocko et al., 2016b) and, when combined with a low-temperature treatment, in Populus (Hoenicka et al., 2016a). In fruit trees, viral vectors that express flowering-promoting genes or silence flowering repressors have induced early flowering (Velázquez et al., 2016; Yamagishi et al., 2016). Although these approaches offer potential routes to earlier testing of sterility strategies, developing the required tissue culture and transformation capabilities for a new tree species remains a significant barrier.

The increasing availability of genome and transcriptome resources for forest trees is providing new insights into reproductive processes. This is reducing the reliance on non-tree model systems and providing novel species-specific knowledge of reproductive processes and candidate genes for modification. The development of gene-editing-based targeted mutagenesis is likely to be the most attractive route to engineered sterility, as it offers precise and predictable modifications combined with assurance of phenotypic stability. The lack of global consensus on the regulation of gene editing technology remains a barrier to research investment and commercialization, and it complicates the public debate that must go hand-in-hand with progress toward implementation.

#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### FUNDING

This work was supported by Scion's Strategic Science Investment Funding (SSIF) from the Science and Innovation Group, Ministry of Innovation, Business and Science.

#### REFERENCES

flowering and seasonal growth cessation in trees. Science 312, 1040–1043. doi: 10.1126/science.1126038

transgenic male and female early flowering poplar (Populus tremula L.). Tree Physiol. 36, 667–677. doi: 10.1093/treephys/tpw015

plant reproduction: a two-decade history. J. Exp. Bot. 65, 4731–4745. doi: 10.1093/jxb/eru233



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Fritsche, Klocko, Boron, Brunner and Thorlby. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genome-Wide Association Studies to Improve Wood Properties: Challenges and Prospects

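Qingzhang Du<sup>1,2,3</sup>, Wenjie Lu<sup>1,2,3</sup>, Mingyang Quan<sup>1,2,3</sup>, Liang Xiao<sup>1,2,3</sup>, Fangyuan Song<sup>1,2,3</sup>, Peng Li<sup>1,2,3</sup>, Daling Zhou<sup>1,2,3</sup>, Jianbo Xie<sup>1,2,3</sup>, Longxin Wang<sup>1,2,3</sup> and Deqiang Zhang<sup>1,2,3</sup>\*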

<sup>1</sup> Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, China, <sup>2</sup> National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China, <sup>3</sup> Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China

#### Edited by:

Ronald Ross Sederoff, North Carolina State University, United States

#### Reviewed by:

Ahmad M. Alqudah, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Germany

Veronique Storme, Ghent University, Belgium

> \*Correspondence: Deqiang Zhang DeqiangZhang@bjfu.edu.cn

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 05 July 2018 Accepted: 10 December 2018 Published: 21 December 2018

#### Citation:

Du Q, Lu W, Quan M, Xiao L, Song F, Li P, Zhou D, Xie J, Wang L and Zhang D (2018) Genome-Wide Association Studies to Improve Wood Properties: Challenges and Prospects. Front. Plant Sci. 9:1912. doi: 10.3389/fpls.2018.01912

Wood formation is an excellent model system for quantitative trait analysis due to the strong associations between the transcriptional and metabolic traits that contribute to this complex process. Investigating the genetic architecture and regulatory mechanisms underlying wood formation will enhance our understanding of the quantitative genetics and genomics of complex phenotypic variation. Genome-wide association studies (GWASs) represent an ideal statistical strategy for dissecting the genetic basis of complex quantitative traits. However, elucidating the molecular mechanisms underlying many favorable loci that contribute to wood formation and optimizing GWAS design remain challenging in this omics era. In this review, we summarize the recent progress in GWAS-based functional genomics of wood property traits in major timber species such as Eucalyptus, Populus, and various coniferous species. We discuss several appropriate experimental designs for extensive GWAS in a given undomesticated tree population, such as omics-wide association studies and high-throughput phenotyping technologies. We also explain why more attention should be paid to rare allelic and major structural variation. Finally, we explore the potential use of GWAS for the molecular breeding of trees. Such studies will help provide an integrated understanding of complex quantitative traits and should enable the molecular design of new cultivars.

Keywords: GWAS, omics, functional genomics, wood formation, systems biology

# INTRODUCTION

Wood, the secondary xylem of long-lived perennial plants, is produced through cell division in the vascular cambium, followed by cell expansion, cell wall thickening, programmed cell death, and heartwood formation (Plomion et al., 2001; Mellerowicz and Sundberg, 2008). In general, the chemical and ultrastructural properties of wood depend on the components of the secondary cell walls, which allow wood to fulfill highly specialized functions that are essential for tree growth and development (Du et al., 2013a; Mizrachi and Myburg, 2016). Wood also represents a major carbon sink with a crucial role in terrestrial carbon cycling, and it is an important renewable resource for the production of lumber, pulp, paper, and biofuels (Mellerowicz and Sundberg, 2008).

Progress has recently been made in modifying the major wood biopolymers (i.e., lignin and cellulose) in several model plants (Robischon et al., 2011; Chen et al., 2014; Ye and Zhong, 2015), but much remains to be learned about the biosynthetic machinery that determines the chemistry and ultrastructure of wood cell walls. Most studies to date indicate that secondary wall biosynthesis requires transcriptional networks that coordinately regulate the diverse metabolic pathways producing polysaccharide and lignin biopolymers (Coleman et al., 2009; Mewalal et al., 2014). This intricate biological process involves a diverse set of xylem-forming genes, most of unknown function (Zinkgraf et al., 2017). The genetic architecture and functional mechanisms that directly affect wood properties have not yet been fully identified and dissected.

Numerous studies have examined the genetic variation underlying complex polysaccharides (alpha cellulose, hemicelluloses, and holocellulose), lignin (insoluble, soluble, syringyl, and total lignin), cell wall sugars (arabinose, glucose, mannose, and xylose), and ultrastructural traits (average density, crystallinity, fiber length, and microfibril angle) in Eucalyptus, Populus, and various coniferous species, using forward genetic approaches such as quantitative trait locus (QTL) mapping (Sewell et al., 2000; Novaes et al., 2009; Thumma et al., 2010; Yin et al., 2010) and candidate gene-based association mapping (Thumma et al., 2009; Dillon et al., 2010; Wegrzyn et al., 2010; Beaulieu et al., 2011; Du et al., 2013b; Guerra et al., 2013; Porth et al., 2013). However, the QTLs, single nucleotide polymorphism (SNP) loci, and candidate genes identified to date explain only a small proportion of the genetic variation in wood components. Genes do not act in isolation; instead, multiple genes within complex biological pathways are often jointly involved in phenotypic variation (Du et al., 2015; Zinkgraf et al., 2017). Therefore, a more holistic approach encompassing whole-genome variation is needed to understand and improve wood property traits.

Given the high genetic diversity, largely undomesticated status, rapid decay of linkage disequilibrium (LD), and minimal genetic structure of forest tree populations, such populations should represent ideal systems for conducting association studies and for breeding using marker-assisted selection (MAS). LD is fundamentally important for any genome-wide association study (GWAS) when genotyping does not cover all sequence variants in a genome. The advantage of rapid LD decay is that, once an association is detected, the marker is likely to be physically close to the causal variant (likely within the gene itself) or even to be the causal variant (Ingvarsson et al., 2016). Thanks to the large populations and high-throughput sequencing technology now available, dissection of genotype–phenotype associations through GWAS in a variety of systems is thus expected to efficiently bridge the gap between QTLs and causal genes in most forest trees (Porth et al., 2013). Here, we (1) summarize the recent progress in functional genomics of wood property traits via GWAS; (2) discuss the statistical methods and experimental designs needed to improve the use of GWAS in trees; and (3) explore opportunities for the use of GWAS in the molecular breeding of trees.
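To make the role of LD more concrete, the following minimal Python sketch computes one common LD measure, r², between two markers. It assumes unphased genotypes coded as allele dosages (0, 1, 2) and uses their squared Pearson correlation, which approximates, but is not identical to, haplotype-based r² from phased data. The function name and the example genotypes are hypothetical and are not taken from the cited studies.

```python
from statistics import mean


def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between allele dosages (0, 1, 2) at two SNPs.

    Genotype-based approximation of LD r^2 that needs no phasing; it is not
    identical to the haplotype-based r^2 computed from phased data.
    """
    if len(snp_a) != len(snp_b) or len(snp_a) < 2:
        raise ValueError("need two equal-length dosage lists with at least two individuals")
    mean_a, mean_b = mean(snp_a), mean(snp_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        # A monomorphic marker carries no LD information; return 0 by convention.
        return 0.0
    return cov * cov / (var_a * var_b)


# Hypothetical dosages for six trees at three SNPs.
snp1 = [0, 1, 2, 0, 1, 2]
snp2 = [0, 1, 2, 0, 1, 2]   # identical to snp1: complete LD in this sample
snp3 = [1, 0, 1, 1, 2, 1]   # uncorrelated with snp1 in this sample
print(genotype_r2(snp1, snp2), genotype_r2(snp1, snp3))   # prints 1.0 0.0
```

Under rapid LD decay, r² between a marker and the causal variant falls off quickly with physical distance, which is why a significant marker in a tree population is expected to lie close to, or within, the causal gene.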

# RECENT PROGRESS IN GWAS FOR EXPLORATION OF WOOD PROPERTY TRAITS

Trees have wide geographical distributions and large wild populations and thus exhibit diverse responses to environmental change. Association mapping principally exploits historical recombination at the natural population level. Thus, collections of cultivars or natural individuals of unknown ancestry, as well as newly designed nested association mapping (NAM) populations, are often used for association analysis in trees. Fast, accurate methods for estimating variance components are a prerequisite for performing GWAS. Yu et al. (2006) proposed a mixed linear model (MLM) method that better controls for population structure and for unequal kinship among individuals (Pritchard et al., 2000). Genome-wide rapid association using mixed model and regression (GRAMMAR) was subsequently developed to approximate the random effects. In contrast to such approximate models, efficient mixed-model association (EMMA) estimates the genetic and residual variance of a population more accurately and may also speed up computation (Kang et al., 2008). EMMA eXpedited (EMMAX) and Population Parameters Previously Determined (P3D) are two further estimation methods that reduce computational demands (Kang et al., 2010; Zhang et al., 2010). Factored spectrally transformed linear mixed models (FaST-LMM) (Lippert et al., 2011) and genome-wide efficient mixed-model association (GEMMA) (Zhou and Stephens, 2012) were developed subsequently; their advantage is that they allow variance components to be estimated directly. Meta-GWAS and joint GWAS are used to obtain greater statistical power when analyzing multiple tree populations (Wu et al., 2016; Müller et al., 2018).
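The kinship component of these MLMs can be illustrated with a genomic relationship matrix. The sketch below assumes VanRaden's first estimator, G = ZZ′ / (2 Σ_j p_j(1 − p_j)), where Z holds allele dosages centred by twice the allele frequency p_j; this is one common choice, and the cited tools and studies may use other kinship estimators. In an MLM of the form y = Xβ + u + e, such a G would serve as the covariance structure of the random polygenic effects, Var(u) = σ²_g G. The function name and genotypes are hypothetical.

```python
def vanraden_kinship(dosages):
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j * (1 - p_j)).

    dosages: one list per individual of allele counts (0, 1, 2) at each SNP.
    p_j is the frequency of the counted allele at SNP j, and Z holds the
    dosages centred by 2 * p_j (VanRaden's first method; an assumed choice).
    """
    n = len(dosages)
    m = len(dosages[0])
    if any(len(row) != m for row in dosages):
        raise ValueError("every individual must have the same number of SNPs")
    p = [sum(row[j] for row in dosages) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in dosages]
    # G[i][k] is the scaled cross-product of the centred genotypes of trees i and k.
    return [[sum(zi[j] * zk[j] for j in range(m)) / scale for zk in z] for zi in z]


# Hypothetical dosages for three trees at four SNPs.
trees = [[0, 1, 2, 1],
         [0, 1, 2, 2],
         [2, 1, 0, 0]]
for row in vanraden_kinship(trees):
    print([round(g, 3) for g in row])
```

Trees that share more alleles than expected from population frequencies receive positive off-diagonal values (here, the first two trees), and the mixed model uses this structure to avoid attributing shared background relatedness to individual markers.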

Next-generation sequencing (NGS) and SNP arrays have opened up new possibilities for capturing nearly all SSR and SNP variants within the gene space or even genome-wide (Evans et al., 2014; Ingvarsson et al., 2016). Moreover, NGS has enabled genome-wide discovery of structural variants, insertions/deletions (InDels), and copy number variants (CNVs) (Marroni et al., 2014) in a growing number of whole-genome resequencing studies of several model tree species (Evans et al., 2014; Silva-Junior and Grattapaglia, 2015; Wang J. et al., 2015; Wang et al., 2016; Du et al., 2016). These GWAS have yielded important insights into the genetic basis of complex quantitative traits in woody species, primarily focusing on wood composition and wood property traits (**Table 1**) (Cappa et al., 2013; Porth et al., 2013; McKown et al., 2014; Allwright et al., 2016; Lamara et al., 2016; Gong et al., 2017; Resende et al., 2017; Zinkgraf et al., 2017).




Porth et al. (2013) performed the first GWAS of key wood chemistry and ultrastructure traits in a population of 334 unrelated black cottonwood (Populus trichocarpa) individuals and found 141 SNPs significantly associated with cell wall traits. Only 40% of these associations involved genes previously known to function in wood formation (Wegrzyn et al., 2010). For example, a synonymous SNP within the FRA8 ortholog explained 21.0% of the total genetic variance in fiber length, suggesting that GWAS can provide insight into the genetic basis of wood traits. Further integration of datasets from transcription factor binding, transcriptome profiling, and GWAS experiments in Populus provided an effective means of annotating conserved gene co-expression modules under diverse environmental conditions (Zinkgraf et al., 2017). Combining QTL and chromosome-wide association mapping enabled Muchero et al. (2015) to identify six genes (encoding the transcription factors ANGUSTIFOLIA C-terminus Binding Protein [CtBP] and KANADI, a Ca2+-transporting ATPase, an amino acid transporter, the copper transporter ATOX1 related, and a protein kinase) of functional relevance to cell wall recalcitrance in Populus. These putative causal regulators may be used for selective breeding of trees.
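Statements such as "explained 21.0% of the total genetic variance" rest on a variance-explained calculation. As a minimal sketch under simplifying assumptions (one additively coded SNP, ordinary least squares, and no covariates or kinship correction, which the cited GWAS would normally include), the proportion of phenotypic variance explained (PVE) is the R² of a single-marker regression; dividing it by the trait's heritability re-expresses it as a share of genetic variance. All function names and values below are hypothetical.

```python
from statistics import mean


def single_marker_pve(dosages, phenotypes):
    """Additive single-SNP regression: returns (effect per allele, R^2).

    R^2 is the proportion of phenotypic variance explained (PVE) by the marker.
    """
    if len(dosages) != len(phenotypes) or len(dosages) < 3:
        raise ValueError("need matched dosage and phenotype lists (n >= 3)")
    mx, my = mean(dosages), mean(phenotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        raise ValueError("marker or phenotype has no variance")
    beta = sxy / sxx              # least-squares additive effect per allele copy
    pve = (beta * sxy) / syy      # regression sum of squares / total sum of squares
    return beta, pve


def share_of_genetic_variance(pve, heritability):
    """Re-express a phenotypic PVE as a fraction of the genetic variance."""
    if not 0 < pve <= heritability:
        raise ValueError("expected 0 < pve <= heritability")
    return pve / heritability


dosages = [0, 0, 1, 1, 2, 2]
fiber_length = [1.00, 1.10, 1.05, 1.20, 1.15, 1.30]   # hypothetical values (mm)
beta, pve = single_marker_pve(dosages, fiber_length)
print(f"effect per allele = {beta:.4f}, PVE = {pve:.1%}")
print(f"share of genetic variance if H2 = 0.8: {share_of_genetic_variance(pve, 0.8):.1%}")
```

With only six trees the estimate is, of course, extremely noisy; the example only shows how the quantity is defined.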

These GWAS have largely focused on common genetic variants using MLM-based methods, and identifying all functional variants that contribute to a given phenotype has been infeasible. However, novel statistical models that jointly fit the genetic effects of all variable loci at the whole-genome level have recently become the most popular, allowing breeders to use genomic information to advance breeding programs (Yang N. et al., 2014; Resende et al., 2017; Xiao et al., 2017). For example, regional heritability mapping (RHM) has been proposed to provide heritability estimates for sequence segments with common or rare genetic effects (Nagamine et al., 2012). RHM was highly powerful for detecting true QTLs in 768 hybrid Eucalyptus trees (Resende et al., 2017), suggesting that complex traits in Eucalyptus are controlled by multiple allelic variants with rare effects.

Regardless of the statistical method used, large sample sizes are especially important for detecting loci with small effects. Fahrenkrog et al. (2017) conducted GWAS for wood composition traits and detected a combination of common and rare SNPs by targeted resequencing of 18,153 genes in 391 unrelated Populus deltoides individuals. Low-frequency SNPs were associated with several bioenergy traits, suggesting that both common and rare variants must be considered to obtain a comprehensive picture of the genetic dissection of complex traits (Resende et al., 2017). Non-SNP allelic variation is another frequently proposed explanation for the "missing heritability" of complex traits. Gong et al. (2017) used InDel-based GWAS to detect causal variants underlying growth and wood properties in 435 unrelated Populus tomentosa accessions and identified regulatory InDels that explained, on average, 14.7% of the phenotypic variance. This contribution is higher than that of SNP loci (median of c. 5% explained), supporting the notion that InDels represent an effective marker system for MAS.
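Why sample size matters so much for small-effect loci can be shown with a rough power calculation. The sketch below is an approximation under stated assumptions, not the method of any cited study: it treats the single-marker test statistic as normal with non-centrality n·PVE/(1 − PVE), and it ignores imperfect LD with the causal variant, population structure and allele-frequency effects. The function name, threshold and sample sizes are hypothetical.

```python
from statistics import NormalDist


def approx_power(n, pve, alpha):
    """Approximate power of a two-sided single-marker test.

    Assumes a 1-df test whose statistic is approximately normal with mean
    sqrt(n * pve / (1 - pve)) under the alternative (a simplification).
    """
    if not 0 <= pve < 1:
        raise ValueError("pve must be in [0, 1)")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (n * pve / (1 - pve)) ** 0.5
    # Probability that |Z + shift| exceeds the critical value.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


# Hypothetical scenario: a locus explaining 2% of phenotypic variance,
# tested at a Bonferroni-style threshold of 0.05 / 100,000 markers.
for n in (400, 1000, 2000):
    print(n, round(approx_power(n, 0.02, 0.05 / 100_000), 3))
```

For a fixed PVE, power rises steeply with the number of trees, consistent with the point above that large populations are needed to detect small-effect loci.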

Genome-wide association studies might also be feasible in coniferous species, even though their genomes are generally very large (often as much as 20 Gb). Uchiyama et al. (2013) examined the potential of GWAS in conifers using 367 unrelated plus trees of Cryptomeria japonica D. Don and identified six novel QTLs significantly associated with variation in wood property traits and the quantity of male strobili. All six SNPs were located in sequences sharing similarity with known genes, such as genes encoding microtubule-associated protein RP/EB family members and a CLIP-associated protein. In white spruce (Picea glauca), a xylem co-expression network was reconstructed based on 180 wood-associated genes, and integrating GWAS with co-expression networks identified several known NAC and MYB regulators as network hubs (Lamara et al., 2016). The genome sequence of Norway spruce (Picea abies) (Nystedt et al., 2013) provides a basis for functional multi-locus GWAS of wood properties in this species. A recent study used a multi-locus LASSO penalized regression method to identify 39 candidate genes involved in the formation of both earlywood and latewood, as well as in the dynamic processes of juvenility, which could help explain the temporal regulation of secondary growth (Baison et al., 2018).
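Multi-locus penalized regression fits all markers jointly while shrinking most marker effects to exactly zero. The following is a minimal sketch of plain LASSO fitted by cyclic coordinate descent, not the implementation used by Baison et al. (2018); it assumes centred genotypes and phenotypes, a fixed penalty `lam`, and a fixed number of sweeps instead of a convergence check. Function names and data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam (the LASSO proximal step)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: n rows (trees) by p columns (markers); y: n phenotypes. Both are assumed
    to be centred already, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = [float(v) for v in y]          # residuals y - X b for the current b
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue                    # a constant marker carries no signal
            # Correlation of marker j with the partial residual excluding its own term.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Hypothetical centred genotypes for four trees at three markers,
# with a phenotype driven only by the first marker.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.0, 3.0, -3.0, -3.0]
print(lasso_coordinate_descent(X, y, lam=1.0))   # prints [2.0, 0.0, 0.0]
```

The penalty both selects the causal marker and shrinks its effect (from 3 to 2 here). In practice the penalty would be chosen by cross-validation, and an established implementation such as scikit-learn's `Lasso`, which minimises the same objective with `alpha` in place of `lam`, would be preferred for genome-scale marker sets.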

Over the past 5 years, new whole-genome resequencing and GWAS approaches have been used to dissect ecologically relevant traits and wood traits in forest trees (Evans et al., 2014; Ingvarsson et al., 2016). Zhang et al. (2018) recently performed GWAS and expression QTL (eQTL) analyses of 917 P. trichocarpa accessions and found that hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyl transferase 2 (PtHCT2) controls the levels of chlorogenic acid (CGA) and of partially characterized metabolites, providing novel insights into omics-based inference of gene function in trees. Broader practical application of GWAS will require the development of novel statistical methods, databases, and experimental designs for forest trees.

# FUTURE APPLICATIONS OF GWAS TO ANALYZE COMPLEX QUANTITATIVE TRAITS IN TREES

Complex phenotypic variation, such as wood formation in long-lived tree populations, involves a series of dynamic biological processes orchestrated in a precise, quantitative manner, including transcriptional and translational regulation and the flux of metabolic intermediates through diverse biochemical pathways (Mizrachi and Myburg, 2016). GWAS can be used to explore single-marker effects, pleiotropic relationships, genetic interactions among loci, and their interactions with different environmental factors (Ingvarsson et al., 2016; Plomion et al., 2016). Based on recent advances in GWAS for identifying loci that affect wood property traits, we discuss several statistical methods and experimental designs that could facilitate the broader application of GWAS in undomesticated tree populations.

## Drive Omics-Wide Association Study (OWAS) in Trees

Knowledge of the genetic basis and hierarchical interactions of intermediate levels of variation between genotype and phenotype, such as transcriptomic, proteomic, and metabolomic variation, will be crucial for dissecting the functional pathways underlying complex quantitative traits in trees; such "molecular phenotypes" have proven to be a valuable resource (**Figure 1**). The joint use of omics-wide data will help reveal candidate genes and functional pathways underlying target traits beyond what GWAS alone can provide (Feltus, 2014; Mizrachi and Myburg, 2016). Comprehensive reconstruction of global biochemical networks across multiple omic layers would require omics-wide association study (OWAS) techniques that combine multi-omic measurements with computational data integration. A systematic OWAS approach should therefore be adopted to map interacting eQTLs, protein QTLs (pQTLs), and metabolite QTLs (mQTLs) underlying these different profiles for quantitative traits (**Figure 1**). This new OWAS era in trees may allow us to identify minor QTLs that are masked by major loci (Luo, 2015; Xiao et al., 2017).

As a prerequisite, accurate sequencing and rigorous annotation of complete reference genomes are needed for large, highly heterozygous tree genomes, which remains challenging. Work is also needed to refine new sequencing technologies and to develop relevant bioinformatics approaches, which would lay the foundation for identifying genome-wide allelic variation and performing massively parallel RNA sequencing (RNA-seq) of thousands of genotypes (Alonso-Blanco and Méndez-Vigo, 2014). In particular, non-coding RNAs are a component of the transcriptome worthy of attention, as they could provide new insights into the regulatory mechanisms underlying eQTLs (Zhou et al., 2015; Plomion et al., 2016; Quan et al., 2018). Many challenges remain in this new omics era for trees. For example, conventional mapping methods that do not account for differences between tissues and developmental stages will not be suitable for identifying new higher-level omics variation.

## Development of High-Throughput Phenotyping Technologies

In addition to complete genome sequence information, improved phenotyping methods and high-quality trait data are also priorities for successful GWAS in forest trees (Li Z. et al., 2014; Du et al., 2016). Trees are usually large and long-lived, making it difficult to phenotype them in outdoor test plantations at the biological, structural, and temporal levels (for example, tree biomass, wood properties, and biotic and abiotic stress responses; **Figure 1**). Conventional procedures for phenotyping tree populations therefore represent a "phenotyping bottleneck" (Furbank and Tester, 2011). The highly dynamic nature of gene expression, protein, and metabolite levels along temporal or developmental trajectories points to the need for careful experimental design and sampling conditions to ensure that the samples will provide relevant insights into the phenotype of interest (Alonso-Blanco and Méndez-Vigo, 2014).

More attention should be paid to developing high-throughput phenotyping (HTP) technologies that increase the precision of time-series data for functional plant traits. Emerging plant HTP platforms use a variety of imaging methods, including visible imaging, imaging spectroscopy, thermal infrared imaging, fluorescence imaging, 3D imaging, and tomographic imaging, to collect data for quantitative studies of complex traits related to plant growth, yield, and adaptation, as well as morphological and physiological traits (Li L. et al., 2014; Gómez-Candón et al., 2016; Yendrek et al., 2017).

The computational ecosystem built around these platforms provides tremendous opportunities for solving foundational problems in predictive phenomics and accelerating breeding efforts (Singh et al., 2016; Shakoor et al., 2017). For example, high-throughput 3D imaging has been used to map QTLs underlying variation in root architecture and seed traits in herbaceous plants (Moore et al., 2013; Topp et al., 2013). Gómez-Candón et al. (2016) used thermal infrared data to characterize the responses of individual apple trees to drought, an approach that can identify genotypes with differential phenotypic responses to water limitation. Plant phenomics facilities have been opening worldwide (Brown et al., 2014). For long-lived woody species, careful selection of appropriate sensors and HTP time points for field traits is important for data collection. For example, networks of RGB color cameras can be deployed to collect images continuously over long periods to assess biomass, growth rate, and disease progression. In this emerging era of phenomics, tree HTP platforms combining high-performance computing software with measurement hardware are needed to collect multi-dimensional phenotypic data in different environments over long periods of time. Consequently, phenotyping systems for long-lived trees will need higher resolution and precision than the HTP systems developed for annual herbaceous plants (Meijón et al., 2014; Slovak et al., 2014; Yang W. et al., 2014).
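As one illustration of how an RGB time series could be turned into a growth trait, the sketch below estimates green canopy cover with the excess-green index (ExG = 2g − r − b on normalized colour channels) and fits a linear growth rate. ExG is a commonly used vegetation index swapped in here for illustration, not a method from the cited studies; the threshold of 0.05, the function names, and the synthetic images are assumptions, and real field images would need calibration for illumination and background.

```python
import numpy as np


def green_cover(rgb, threshold=0.05):
    """Fraction of vegetation pixels in an RGB image, classified with the
    excess-green index ExG = 2g - r - b on chromatic (normalized) coordinates."""
    img = np.asarray(rgb, dtype=float)
    total = img.sum(axis=2)
    total[total == 0] = 1.0                  # avoid division by zero on black pixels
    r, g, b = (img[..., k] / total for k in range(3))
    exg = 2.0 * g - r - b
    return float(np.mean(exg > threshold))


def cover_growth_rate(days, covers):
    """Slope of a least-squares line through green cover over time (cover per day)."""
    slope, _intercept = np.polyfit(days, covers, 1)
    return float(slope)


def synthetic_image(n_green, size=10):
    """Hypothetical test image: soil-coloured pixels plus n_green green pixels."""
    img = np.tile(np.array([120, 100, 80]), (size * size, 1))
    img[:n_green] = [0, 200, 0]
    return img.reshape(size, size, 3)


days = [0, 10, 20]
covers = [green_cover(synthetic_image(k)) for k in (10, 20, 30)]
print(covers, cover_growth_rate(days, covers))
```

A per-tree growth rate obtained this way could then be used as a phenotype in GWAS, provided each image can be assigned to a single genotype.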

## Increase the Focus on Rare Variation and Major Structural Variation

Over the past few years, population-scale resequencing of the human genome has enabled more comprehensive analysis of allelic effects on certain complex diseases. Some rare variants (i.e., minor allele frequency [MAF] < 1% or 5%) are likely to have larger effects than common variants on susceptibility to diseases with high and low heritability (Genomes Project et al., 2010; Nelson et al., 2012; Tennessen et al., 2012). Most association studies in plant species have removed rare variants because of the limited power of statistical approaches to detect their contribution to the phenotypic variation of complex traits (Porth et al., 2013; Du et al., 2015).
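The MAF thresholds mentioned above can be made concrete with a small sketch. It computes MAF from a dosage matrix and labels variants as rare (MAF < 1%), low-frequency (1% to 5%), or common; these labels, the function names, and the genotypes are illustrative assumptions, and the final mask mimics the MAF ≥ 5% filter that many plant GWAS apply, which discards exactly the variants discussed here.

```python
import numpy as np


def minor_allele_frequency(G):
    """G: individuals x SNPs matrix of allele dosages (0, 1, 2)."""
    p = np.asarray(G, dtype=float).mean(axis=0) / 2.0   # frequency of the counted allele
    return np.minimum(p, 1.0 - p)


def classify_variants(G, rare=0.01, low=0.05):
    """Label each SNP as rare (MAF < rare), low-frequency (rare <= MAF < low) or common."""
    maf = minor_allele_frequency(G)
    return np.where(maf < rare, "rare", np.where(maf < low, "low-frequency", "common"))


# Hypothetical panel of 100 diploid trees and four SNPs.
G = np.zeros((100, 4), dtype=int)
G[0, 0] = 1           # a single heterozygote: MAF = 0.005
G[:6, 1] = 1          # six heterozygotes: MAF = 0.03
G[:40, 2] = 1         # forty heterozygotes: MAF = 0.20
G[:, 3] = 2
G[0, 3] = 1           # counted allele nearly fixed: MAF = 0.005
labels = classify_variants(G)
keep_common = minor_allele_frequency(G) >= 0.05   # a typical MAF >= 5% filter
print(labels, keep_common)
```

Note that the fourth SNP is rare even though the counted allele is almost fixed: MAF always refers to the less frequent allele.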

Progress has been made in several crops and model plants. For example, Wang S. et al. (2015) used rare variation mapping to discover a TCP transcription factor gene vital for tendril development in cucumber (Cucumis sativus L.). Xing et al. (2015) found that a low-frequency SNP within Brachytic2 can moderately increase yield and reduce stem height in maize. Because the genomes of woody species have not experienced domestication-related reductions in genetic diversity, rare variants are abundant in tree genomes but are less tractable in GWAS. These abundant rare variants are thought to be important for explaining the missing heritability of complex traits (Resende et al., 2017).

Rare beneficial alleles are commonly exploited in current tree breeding; once proven useful, such alleles undergo selective sweeps and become fixed in all major cultivars. Balancing samples across population subdivisions and increasing the sample size can homogenize allele frequencies, so that globally rare variants may reach common frequencies in some subpopulations. In addition, the use of multiple bi-parental crosses can break up population structure, which could increase the power to detect multiple rare alleles underlying natural variation. Joint mapping with association panels and multiple bi-parental crosses is also a promising approach for identifying low-frequency or small-effect alleles in trees (Du et al., 2016). Decreasing sequencing costs and improved genotyping technologies have promoted the use of exceptionally large, diverse populations to identify and analyze rare variants. Other statistical models and approaches, such as regional heritability mapping (RHM; Nagamine et al., 2012) and the sequence kernel association test (SKAT) package for estimating the joint effect of multiple rare variants (Fahrenkrog et al., 2017), should be used to detect and dissect rare variation in tree populations.
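SKAT tests the joint effect of many rare variants in a region rather than testing each one separately. The sketch below computes a SKAT-style score statistic with Beta(MAF; 1, 25) weights that up-weight rarer variants, but it is a simplification: it assumes an intercept-only null model (no covariates or population-structure correction) and obtains the p-value by permutation instead of the analytical mixture-of-chi-square method used by the SKAT package. Function names and data are hypothetical.

```python
import numpy as np


def skat_q(G, y, w):
    """SKAT-style variance-component score statistic Q = sum_j (w_j * G_j' r)^2,
    with residuals r from an intercept-only null model (no covariates)."""
    r = y - y.mean()
    score = G.T @ r
    return float(np.sum((w * score) ** 2))


def skat_permutation_pvalue(G, y, n_perm=2000, seed=0):
    """Permutation p-value for the joint effect of the variants in G (one region)."""
    rng = np.random.default_rng(seed)
    p = G.mean(axis=0) / 2.0
    maf = np.minimum(p, 1.0 - p)
    w = 25.0 * (1.0 - maf) ** 24      # Beta(MAF; 1, 25) density: up-weights rarer variants
    q_obs = skat_q(G, y, w)
    exceed = sum(skat_q(G, rng.permutation(y), w) >= q_obs for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)


# Hypothetical region: 20 rare variants; the first five raise the trait in carriers.
rng = np.random.default_rng(42)
n, m = 400, 20
G = rng.binomial(2, 0.01, size=(n, m)).astype(float)
carrier = G[:, :5].sum(axis=1) > 0
y = 1.5 * carrier + rng.normal(0.0, 1.0, n)

p_value = skat_permutation_pvalue(G, y)
print(f"permutation p-value: {p_value:.4f}")
```

Because the statistic squares each weighted score, variants with opposite effect directions do not cancel out, which is the main motivation for kernel tests over simple burden (carrier-count) tests.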

Advances in sequencing technology and bioinformatics algorithms make it possible to identify structural variation that has previously gone undetected in resequenced populations of a species (Xiao et al., 2017). Unlike SNPs and SSRs, copy number variants (CNVs), which can be categorized into deletions, duplications, and insertions, have not been given much attention in tree species (Feltus, 2014; Ingvarsson et al., 2016). Hence, investigating the roles, structures, and functions of CNVs, both as markers for mapping efforts and as potential functional variants in woody plants, should provide important insights into phenotypic variation (Marroni et al., 2014). The high levels of heterozygosity in perennial forest trees reduce the power of CNV detection. To address this issue, the accuracy of genome sequencing should be improved and resequencing depth increased, which would enable finer read-depth-based CNV identification.
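Read-depth CNV detection rests on the idea that, after normalization, the depth in a genomic window is proportional to the number of copies present. A minimal sketch, assuming fixed-size windows, median normalization, and arbitrary copy-number cut-offs (real callers also correct for GC content and mappability); `call_cnvs` is a hypothetical name. In a diploid tree a heterozygous deletion only halves the depth, which is why higher coverage makes such calls more reliable.

```python
import numpy as np


def call_cnvs(depth, ploidy=2, del_cn=1.5, dup_cn=2.5):
    """Read-depth CNV sketch. depth: mean read depth per fixed-size window
    along a chromosome. Returns (state, first_window, last_window, mean_cn)."""
    depth = np.asarray(depth, dtype=float)
    cn = ploidy * depth / np.median(depth)     # copy-number estimate per window
    state = np.where(cn < del_cn, "DEL", np.where(cn > dup_cn, "DUP", "NORMAL"))
    calls, i = [], 0
    while i < len(state):
        if state[i] == "NORMAL":
            i += 1
            continue
        j = i
        while j + 1 < len(state) and state[j + 1] == state[i]:   # merge adjacent windows
            j += 1
        calls.append((str(state[i]), i, j, float(cn[i:j + 1].mean())))
        i = j + 1
    return calls


# Hypothetical depth profile (30x baseline): a heterozygous deletion halves the
# depth (one copy left); a duplication on both homologues doubles it (four copies).
depth = [30] * 10 + [15] * 3 + [30] * 5 + [60] * 4 + [30] * 3
print(call_cnvs(depth))
```

With noisy real data, the fixed thresholds would usually be replaced by a segmentation step (for example, circular binary segmentation or a hidden Markov model) before calling states.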

## Dissect the Often-Ignored Epistasis Effects in GWAS

Genetic variation in quantitative traits is partitioned into additive, dominance, and epistatic effects, which are conferred by numerous genes/alleles acting in multiple biological networks (Du et al., 2015). It is likely that non-additive interactions between separate mutations (epistasis) often contribute to the missing heritability and to the lack of inter-population validation of causal variants (Manolio et al., 2009). However, detecting locus-locus epistasis is experimentally, statistically, and computationally challenging because of the large number of possible interactions to be tested in GWAS (Mackay, 2014). Several association algorithms, such as BiForce (Gyenesei et al., 2012), genome-wide interaction studies (GWIS; Goudey et al., 2013), the Kempthorne model (Mao et al., 2007), and FastEpistasis (Schüpbach et al., 2010), make it possible to test for two-way epistasis between all pairs of loci. Such exhaustive pairwise tests could shed light on genetic architecture.
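A two-way epistasis test compares, for each SNP pair, an additive model with one that adds the product term. The sketch below does this by ordinary least squares with an F statistic for the interaction term on simulated data; it is not an implementation of BiForce, GWIS, or FastEpistasis, which use faster specialized algorithms, and the function names are hypothetical. With m SNPs there are m(m − 1)/2 pairs, so 30 SNPs give 435 tests, while 100,000 SNPs give about 5 × 10⁹, which illustrates the computational burden noted above.

```python
from itertools import combinations

import numpy as np


def interaction_f(y, a, b):
    """F statistic for the a x b term: y ~ 1 + a + b + a*b versus y ~ 1 + a + b."""
    n = len(y)
    X_add = np.column_stack([np.ones(n), a, b])
    X_full = np.column_stack([X_add, a * b])
    rss = []
    for X in (X_add, X_full):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        rss.append(resid @ resid)
    df_resid = n - X_full.shape[1]
    return (rss[0] - rss[1]) / (rss[1] / df_resid)


def scan_pairs(G, y):
    """Exhaustive two-way scan; returns (F, i, j) tuples sorted by decreasing F."""
    results = [(interaction_f(y, G[:, i], G[:, j]), i, j)
               for i, j in combinations(range(G.shape[1]), 2)]
    return sorted(results, reverse=True)


# Hypothetical data: 30 SNPs and a purely epistatic effect between SNPs 3 and 17.
rng = np.random.default_rng(5)
n, m = 500, 30
G = rng.binomial(2, 0.5, size=(n, m)).astype(float)
Gc = G - G.mean(axis=0)
y = 1.0 * Gc[:, 3] * Gc[:, 17] + rng.normal(0.0, 1.0, n)

ranked = scan_pairs(G, y)
print(len(ranked), ranked[0])
```

The F statistic can be converted to a p-value with an F(1, n − 4) distribution, and the number of pairs tested then sets the multiple-testing burden.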

Additionally, exhaustively dissecting the genome-wide genetic basis of gene/allele epistasis in different populations could provide insight into the role of epistasis in local adaptation in trees (Du et al., 2018). Once a gene–gene interaction network has been constructed using GWAS (**Figure 2**), communities of genes that interact strongly within the network (gene modules) can be validated using RNA profiling, molecular biology, and biochemistry (Mackay, 2014; Gong et al., 2017). Quantitative traits controlled by multiple gene networks are often regulated by higher-order gene–gene interactions that are too complex to be analyzed by standard two-way epistatic tests (Du et al., 2015). In the future, more attention should be paid to the pleiotropic effects of epistasis on gene expression, metabolites, and development in different geographical populations. Such knowledge would provide insights into the mechanisms and functions of the gene networks underlying plant traits.

# PERSPECTIVES FOR POST-GWAS AND MOLECULAR BREEDING IN TREES

A deeper knowledge of the genetic basis of complex traits, such as wood properties, can be achieved through the integrated use of diverse statistical models and experimental tools. Systems genetics could be used to understand the molecular mechanisms of candidate genes underlying traits of interest. Some promising studies demonstrate the advantages of this approach across a tree's lifespan (Feltus, 2014; Ingvarsson et al., 2016; Plomion et al., 2016). However, functional validation and annotation of causal loci remain challenging, especially because most loci detected by GWAS are located in intergenic regions or are not canonical components of previously identified pathways (Civelek and Lusis, 2013; Xiao et al., 2017). Given the growing number of genomic variations mapped by GWAS for a given trait, a systems biology approach may be needed to validate the function of these alleles using holistic data that provide evidence for causality. Family-based accession intercrosses using different alleles could be performed to clearly determine whether an allelic variant of a specific gene is causal for the observed natural trait variation (Du et al., 2016). Promising advances in genetic transformation and genomic technologies as well as statistical and computational methods may help to address these issues.

Tree MAS breeding may depend on the use of a pleiotropic allele or a favorable combination of alleles for multiple traits of interest (Lamara et al., 2016), such as the MYB transcription factor and lncRNA genes identified for lignin and polysaccharide traits (Zhou et al., 2017; Quan et al., 2018). Thus, it is particularly important to link specific alleles to corresponding traits. High-throughput whole-genome sequencing should enable alternative breeding approaches, including genomics-assisted selection (GAS) and genomic selection (GS) strategies, as well as genome-based phenotypic prediction and design breeding using rare, recessive, large-effect mutations (Resende et al., 2012; Evans et al., 2014; Varshney et al., 2016). Specifically, GWAS tools may make it possible to identify many more causative genes and their roles in multiple linked traits, highlighting the potential of GWAS for systematically uncovering the balanced regulation between these traits and identifying the key hub genes that link them (Feltus, 2014; Lamara et al., 2016). However, QTL and GWAS information has not yet been successfully used for the molecular breeding of perennial trees.

FIGURE 2 | Identification of single-marker effects and epistasis for wood traits using the systems genetics integrative framework. (A) Manhattan plot displaying genome-wide association study (GWAS) results for lignin content. The most significant SNP associated with lignin content (SNP\_G1) is located in exon 1 of a gene (Gene01). (B) GWAS results for Gene01 expression using expression QTL (eQTL) mapping. The most significant SNP associated with Gene01 expression (SNP\_G2) is located in exon 9 of a gene (Gene02). The x-axis shows chromosome positions and the y-axis shows significance expressed as –log10(P). The dotted horizontal line depicts the Bonferroni-adjusted significance threshold (0.05/n). Three linked genes are shown at the bottom (red rectangles, coding sequences; black lines, introns; green rectangles, 5′ and 3′ untranslated regions). (C) Box plots for lignin content (red) and Gene01 expression (sky blue) by genotype at the lead SNP. The horizontal line represents the mean and the vertical lines mark the range from the 5th to the 95th percentile of the data. (D) Pairwise correlations among lignin content, Gene01 expression, and Gene02 expression across the genotype classes. The r-value is the Pearson correlation coefficient; the P-value was calculated using the t approximation. (E) Pairwise interaction between SNP\_G1 and SNP\_G2 controlling microfibril angle (MFA) for different genotypic combinations at the two loci. (F) Expression network constructed from these GWAS and eQTL data. Gene02 (blue dot) is located at the eQTL significance peak and is the network hub that regulates the downstream Gene01 (red dot), which in turn affects lignin content. Gene02 also interacts with Gene01 in the network, an interaction associated with MFA.
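Panels (A) and (B) of Figure 2 use a Bonferroni-adjusted threshold of 0.05/n, where n is the number of markers tested. The sketch below reproduces that logic on simulated data with a simple single-marker scan; it is an illustration under stated assumptions, not the authors' method: it ignores population structure and kinship (tree GWAS usually fits mixed models), it approximates the t distribution by a normal distribution, and `single_marker_scan` is a hypothetical name. Using an expression level instead of a trait as `y` gives an eQTL scan of the kind shown in panel (B).

```python
import math

import numpy as np


def single_marker_scan(G, y):
    """Per-SNP association p-values. Pearson r is converted to
    t = r * sqrt((n - 2) / (1 - r^2)); the two-sided p-value uses a normal
    approximation to the t distribution (adequate for large samples)."""
    n = len(y)
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    r = (Gc.T @ yc) / np.sqrt((Gc ** 2).sum(axis=0) * (yc @ yc))
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return np.array([math.erfc(abs(tj) / math.sqrt(2.0)) for tj in t])


# Hypothetical scan: 600 trees, 2,000 SNPs, one causal SNP (index 1234).
rng = np.random.default_rng(11)
n_trees, n_snps = 600, 2000
G = rng.binomial(2, 0.3, size=(n_trees, n_snps)).astype(float)
y = 0.8 * G[:, 1234] + rng.normal(0.0, 1.0, n_trees)

p = single_marker_scan(G, y)
threshold = 0.05 / n_snps                 # Bonferroni: 0.05 divided by the number of SNPs tested
hits = np.flatnonzero(p < threshold)
neg_log10_p = -np.log10(p)                # y-axis of a Manhattan plot
print(f"threshold = {threshold:.1e}; significant SNPs: {hits}")
```

Bonferroni correction is conservative when markers are in linkage disequilibrium, which is one reason some studies use an "effective number of independent tests" instead of the raw SNP count.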

High-throughput GWAS combined with precision genome editing has huge potential for combining currently available causal alleles in the gene pool (Zhou et al., 2015; Xiao et al., 2017). This GAS strategy could be used to accelerate the improvement of wood traits, as well as plant growth and stress resistance. In addition, whole-genome prediction of hybrid performance will be very important for tree breeding design, as heterosis has been successfully exploited for many major forest species. Selection and crosses based on integrated genomics approaches, as well as genetic modification via transformation, could potentially be used to develop new and superior cultivars.

#### REFERENCES


## AUTHOR CONTRIBUTIONS

DZhang planned the review. QD, DZhou, WL, MQ, and LX designed the figures and revised the review. QD, FS, PL, JX, and LW collected the data. QD wrote the paper. All authors read and approved the manuscript.

#### FUNDING

This work was supported by the Project of the Natural Science Foundation of Beijing (Grant No. 6172027), the Project of the National Natural Science Foundation of China (Grant Nos. 31500550 and 31670333), and the Beijing Nova Program (Grant No. Z181100006218024).

## ACKNOWLEDGMENTS

We thank Prof. Ronald R. Sederoff (North Carolina State University, Raleigh, NC, United States) for his detailed edits and specific suggestions for improving the manuscript.


insights for thermal acquisition and calibration. Precis. Agric. 17, 786–800. doi: 10.1007/s11119-016-9449-6


fpls-09-01912 December 20, 2018 Time: 16:19 # 9

resequencing and single nucleotide polymorphism genotyping unlock the evolutionary history of Eucalyptus grandis. New Phytol. 208, 830–845. doi: 10.1111/nph.13505


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Du, Lu, Quan, Xiao, Song, Li, Zhou, Xie, Wang and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Quantitative Genetics and Genomics Converge to Accelerate Forest Tree Breeding

Dario Grattapaglia<sup>1,2,3,4</sup>\*, Orzenil B. Silva-Junior<sup>1,2</sup>, Rafael T. Resende<sup>1</sup>, Eduardo P. Cappa<sup>5,6</sup>, Bárbara S. F. Müller<sup>1,3</sup>, Biyue Tan<sup>7</sup>, Fikret Isik<sup>4</sup>, Blaise Ratcliffe<sup>8</sup> and Yousry A. El-Kassaby<sup>8</sup>

<sup>1</sup> EMBRAPA Recursos Genéticos e Biotecnologia, Brasília, Brazil, <sup>2</sup> Programa de Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil, <sup>3</sup> Departamento de Biologia Celular, Universidade de Brasília, Brasília, Brazil, <sup>4</sup> Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, United States, <sup>5</sup> Centro de Investigación de Recursos Naturales, Instituto de Recursos Biológicos, INTA, Buenos Aires, Argentina, <sup>6</sup> Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina, <sup>7</sup> Biomaterials Division, Stora Enso AB, Stockholm, Sweden, <sup>8</sup> Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, BC, Canada

#### Edited by:

Steven Henry Strauss, Oregon State University, United States

#### Reviewed by:

Vincent Segura, Institut National de la Recherche Agronomique (INRA), France

Deqiang Zhang, Beijing Forestry University, China

#### \*Correspondence:

Dario Grattapaglia dario.grattapaglia@embrapa.br

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 19 June 2018 Accepted: 31 October 2018 Published: 22 November 2018

#### Citation:

Grattapaglia D, Silva-Junior OB, Resende RT, Cappa EP, Müller BSF, Tan B, Isik F, Ratcliffe B and El-Kassaby YA (2018) Quantitative Genetics and Genomics Converge to Accelerate Forest Tree Breeding. Front. Plant Sci. 9:1693. doi: 10.3389/fpls.2018.01693

Forest tree breeding has been successful at delivering genetically improved material for multiple traits based on recurrent cycles of selection, mating, and testing. However, long breeding cycles, late flowering, variable juvenile-mature correlations, emerging pests and diseases, climate, and market changes all pose formidable challenges. Genetic dissection approaches such as quantitative trait mapping and association genetics have failed to effectively drive operational marker-assisted selection (MAS) in forest trees, largely because of the complex multifactorial inheritance of most, if not all, traits of interest. The convergence of high-throughput genomics and quantitative genetics has established two new paradigms that are changing contemporary tree breeding dogmas. Genomic selection (GS) uses a large number of genome-wide markers to predict complex phenotypes. It has the potential to accelerate breeding cycles, increase selection intensity and improve the accuracy of breeding values. Realized genomic relationship matrices, on the other hand, provide innovations in the estimation of genetic parameters and in breeding approaches by tracking the variation arising from random Mendelian segregation in pedigrees. In light of a recent flow of promising experimental results, here we briefly review the main concepts, analytical tools and remaining challenges that currently underlie the application of genomics data to tree breeding. With easy and cost-effective genotyping, we are now on the brink of extensive adoption of GS in tree breeding. 
Areas for future GS research include optimizing strategies for updating prediction models, adding validated functional genomics data to improve prediction accuracy, and integrating genomic and multi-environment data for forecasting the performance of genetic material in untested sites or under changing climate scenarios. The buildup of phenotypic and genome-wide data across large-scale breeding populations and advances in computational prediction of discrete genomic features should also provide opportunities to enhance the application of genomics to tree breeding.

Keywords: genomic selection (GS), tree breeding, quantitative genetics, whole-genome regression, single nucleotide polymorphisms (SNP), marker assisted selection (MAS), realized genomic relationship

# INTRODUCTION

Forest tree breeding encompasses a number of steps to increase the frequency of advantageous alleles for several traits concurrently in a target population. Recurrent cycles of selection ultimately result in genetically improved planting material by maximizing genetic gain per unit time in the most cost-effective way (Namkoong et al., 1988; White et al., 2007). However, long breeding cycles, late and poor flowering, weak juvenile-mature correlations, and changes in climate, market demands, and emerging pest and disease pressures pose daunting challenges. The progress and ultimate output of tree breeding programs are therefore highly dependent on the length of the breeding cycle. To maximize genetic gain per unit time, extensive efforts in tree breeding have been devoted to the two fundamental means of shortening a breeding cycle, namely, early selection and accelerated breeding. The former relies on understanding juvenile-mature correlations and practicing selection on juvenile traits (Williams, 1988), whereas the latter involves inducing early flowering with hormone and stress treatments and top grafting (Greenwood et al., 1991; Hasan and Reid, 1995). In the late 1980s, the advent of DNA markers, together with two seminal papers on the dissection of discrete Mendelian factors underlying quantitative traits (Lander and Botstein, 1989) and on marker-assisted selection (MAS) (Lande and Thompson, 1990), raised the prospect of powerful tools to overcome the time challenge of tree breeding (El-Kassaby, 1982; Neale and Williams, 1991; Grattapaglia et al., 1992; Williams and Neale, 1992).

Here we cover the current state of the science on optimizing and accelerating tree breeding using genomic technologies. We first present a brief overview of the path across QTL (quantitative trait loci) and association mapping, providing a quick historical perspective on how and why we reached the point of convergence between quantitative genetics and genomics. This overview also substantiates the fact that reductionist "genetic dissection" approaches, or attempts to use single candidate genes or diffuse, indirect information from transcriptomics, have not proven useful for breeding practice; they are therefore not discussed further. We focus on the factors that affect, and the challenges that remain in, fully integrating genomic data in tree breeding in light of the recent promising results of whole-genome prediction. Although making predictions is difficult, "especially about the future," as Niels Bohr and others once amusingly said, we attempt to look at the near future of tree breeding, when genotyping, whole-genome sequencing, and computational prediction of genomic features for thousands of trees will not be limiting. We anticipate a future where the progressive advances made possible by routine genomic selection (GS) in multiple large populations will provide a more powerful platform to revisit the discovery of discrete genomic elements that may further enhance whole-genome phenotype prediction and eventually allow direct, discrete interventions at the DNA sequence level.

# THE PATH FROM GENETIC DISSECTION TO GENOMIC SELECTION

The prospects of MAS for forest trees were rightly questioned early on: its potential value was expected to be limited to specific genetic backgrounds because of the linkage equilibrium prevailing in forest tree populations, QTL-by-environment interactions, and changes in allele frequencies across generations (Strauss et al., 1992). Notwithstanding this sound advice, a number of QTL mapping experiments in the major conifers and eucalypts went ahead, encouraged by the promising results of QTL mapping in inbred crops and model systems. In retrospect, it is startling to consider how far removed from real-life tree breeding those biparental QTL mapping studies in forest trees were (Grattapaglia, 2017). The motivating hypothesis was that it would be possible to locate and estimate the effects of most individual QTLs underlying complex traits in every population and environment and to implement them in tree breeding practice. A substantial number of studies reported hundreds of QTLs in forest trees (reviewed in Kirst et al., 2004; Grattapaglia et al., 2009; Neale and Kremer, 2011). Although several supposedly "major effect" QTLs were found in those early studies, their effect sizes proved to be largely overestimated and their numbers underestimated. Indeed, subsequent multi-family experiments with larger sample sizes revealed significantly larger numbers of QTLs with correspondingly smaller effects and inconsistent performance across environments and genetic backgrounds (Ukrainetz et al., 2008; Novaes et al., 2009; Thumma et al., 2010; Gion et al., 2011).

To overcome the perceived shortcomings of QTL detection in single mapping families, association genetics was put forward as a way to provide population-wide marker–trait associations applicable to breeding (Neale and Savolainen, 2004). The limited methods then available to interrogate DNA polymorphisms initially allowed only candidate-gene approaches (Thumma et al., 2005; Gonzalez-Martinez et al., 2007), which were followed by genome-wide association studies (GWAS) in several forest tree species (Beaulieu et al., 2011; Cumbie et al., 2011; Cappa et al., 2013; Porth et al., 2013; Mckown et al., 2014). However, irrespective of the marker density, population size, and improved analytical methods used to account for low-frequency variants (Fahrenkrog et al., 2017; Müller et al., 2017, 2018; Resende et al., 2017a), only a few polymorphisms of very modest effect have been detected, and most still lack independent validation, the cornerstone of the scientific credibility of GWAS results. In effect, after 25 years of research based on the principles and experimental approaches of genetic dissection of quantitative traits, these efforts have not been translated into operational tree breeding (Grattapaglia et al., 2009; Grattapaglia, 2014; Isik, 2014).

The ineffectiveness of fully dissecting complex traits and the limitations of MAS have not been exclusive to forest trees; they have also been recognized in crops (Bernardo, 2008) and domestic animals (Dekkers, 2004). This realization has caused a significant shift in the paradigm and technical approach to plant and animal MAS. These fields have now moved from the a priori discovery of discrete marker-trait associations to capturing whole-genome effects with DNA marker data, in harmony with the multifactorial polygenic nature of quantitative traits as predicted by Fisher's infinitesimal model (Fisher, 1918). This shift was only possible following the development of improved and accessible genomic technologies that allow thousands of genome-wide single-nucleotide polymorphisms (SNPs) to be interrogated on cost-effective platforms. The concept of using the "total allelic" (Nejati-Javaremi et al., 1997) or "total genomic" (Haley and Visscher, 1998) relationship derived from marker data to estimate breeding values was later termed "Genomic Selection" (GS) in the seminal paper of Meuwissen et al. (2001), which demonstrated that "the selection on genetic values predicted from markers could substantially increase the rate of genetic gain per unit time in animals and plants, especially if combined with techniques to shorten the generation interval."
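A realized relationship matrix can be computed directly from SNP dosages. The sketch below uses VanRaden's first method, a widely used formulation swapped in for illustration (it is not necessarily the one used in the papers cited above), on a hypothetical population with unlinked loci; all names and data are assumptions. Full sibs have an expected relationship of 0.5 from the pedigree; the marker-based value deviates slightly from that expectation, partly because it reflects the alleles the sibs actually received through Mendelian segregation.

```python
import numpy as np


def vanraden_grm(M):
    """Realized genomic relationship matrix (VanRaden's first method).
    M: individuals x SNPs allele dosages (0, 1, 2); allele frequencies are
    estimated from M itself."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p                                   # centre each SNP
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


# Hypothetical population: 200 unrelated trees plus two parents and two full sibs.
rng = np.random.default_rng(7)
m = 5000
freqs = rng.uniform(0.05, 0.95, m)


def haplotypes(k):
    """k individuals x 2 haplotypes x m unlinked loci, in Hardy-Weinberg proportions."""
    return rng.binomial(1, freqs, size=(k, 2, m))


def offspring(mother, father):
    """Transmit one random allele per locus from each parent (no linkage)."""
    loci = np.arange(m)
    return np.stack([mother[rng.integers(0, 2, m), loci],
                     father[rng.integers(0, 2, m), loci]])


unrelated = haplotypes(200)
mother, father = haplotypes(2)
sibs = np.stack([offspring(mother, father) for _ in range(2)])
dosages = np.concatenate([unrelated, mother[None], father[None], sibs]).sum(axis=1)

G = vanraden_grm(dosages)
print("mean self-relationship (unrelated):", np.diag(G)[:200].mean().round(3))
print("full-sib relationship:", G[202, 203].round(3))
```

In a breeding population, replacing the pedigree-based matrix with such a G in a mixed model is the step that turns marker data into genomic estimated breeding values.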

GS employs a genome-wide panel of markers, typically SNPs, whose effects on the phenotype are estimated in a "training" population. In forest trees, such a training set usually consists of one to a few thousand individuals sampled from progeny trials derived from mating a few dozen parents that constitute the elite germplasm targeted by the breeding program. The SNPs are used to build prediction models that are later applied to "selection candidates", for which only genotypes are collected and phenotypes are predicted from the genomic data. The prediction models are cross-validated against a "validation" population, a set of individuals genetically related to the training set that were not used to estimate marker effects. A prediction model that delivers a high correlation between observed and predicted breeding values is subsequently used in the breeding phase to calculate the genomic estimated breeding values or total genotypic values of the selection candidates (**Figure 1**). GS fundamentally exploits the genetic relationships between the training population and the prospective selection candidates and, to a lesser extent, the linkage disequilibrium (LD) between markers and QTLs. By avoiding prior selection of discrete markers based on stringent significance tests, and by estimating marker effects in a larger, breeding-representative population of trees, GS captures a substantial proportion of the heritability contributed by the large number of genomic effects that QTL mapping or GWAS are, in principle, neither able nor intended to capture.
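The training/validation workflow can be sketched with ridge regression on all markers, which shrinks every marker effect equally, as in RR-BLUP. In this minimal sketch the population, the 800/200 training/validation split, the function names, and the shrinkage parameter (derived from an assumed known heritability) are all hypothetical; operational GS models typically also include fixed effects, pedigree or spatial terms, and related families. The reported predictive ability is the correlation between predicted values and observed phenotypes in the validation set.

```python
import numpy as np


def fit_ridge_gs(M_train, y_train, lam):
    """Ridge regression of phenotypes on centred marker dosages: all marker
    effects shrunk equally (RR-BLUP-like, with a fixed shrinkage parameter lam)."""
    centre = M_train.mean(axis=0)
    Z = M_train - centre
    mu = y_train.mean()
    p = Z.shape[1]
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ (y_train - mu))
    return mu, centre, beta


def predict_gebv(model, M):
    """Genomic predictions for genotyped-only selection candidates."""
    mu, centre, beta = model
    return mu + (M - centre) @ beta


# Hypothetical population: 1,000 trees, 500 SNPs, 50 of which are QTLs; h2 = 0.6.
rng = np.random.default_rng(3)
n, m, n_qtl, h2 = 1000, 500, 50, 0.6
freqs = rng.uniform(0.1, 0.9, m)
M = rng.binomial(2, freqs, size=(n, m)).astype(float)
qtl = rng.choice(m, n_qtl, replace=False)
g = M[:, qtl] @ rng.normal(0.0, 1.0, n_qtl)
g = (g - g.mean()) / g.std() * np.sqrt(h2)           # true genetic values
y = g + rng.normal(0.0, np.sqrt(1.0 - h2), n)        # phenotypes

train, valid = np.arange(800), np.arange(800, 1000)
p_hat = M[train].mean(axis=0) / 2.0
lam = np.sum(2 * p_hat * (1 - p_hat)) * (1 - h2) / h2   # assumes h2 is known
model = fit_ridge_gs(M[train], y[train], lam)
pred = predict_gebv(model, M[valid])
predictive_ability = np.corrcoef(pred, y[valid])[0, 1]
print(f"predictive ability in the validation set: {predictive_ability:.2f}")
```

Dividing predictive ability by the square root of heritability gives a rough estimate of the accuracy r that enters the breeder's equation.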

# PERSPECTIVES OF GENOMIC SELECTION IN TREE BREEDING

GS can have a substantial impact on the rate of genetic gain. Recall Falconer's breeder's equation, $\Delta G = i\, r\, \sigma_A / L$ (Falconer, 1989), where:

- $i$ is the selection intensity;
- $r$ is the accuracy of selection, that is, the correlation between estimated and true breeding values (in Falconer's original expression for mass selection, this is the square root of heritability);
- $\sigma_A$ is the additive genetic standard deviation of the trait under inspection;
- $L$ is the generation interval.

GS can increase the rate of genetic gain by increasing $i$. Marker data allow the phenotypes of many more nursery seedlings to be predicted than the number of trees that can be tested in conventional field progeny trials. Additionally, the use of realized genomic relationships is associated with increased accuracy in estimating $\sigma_A$ and breeding values ($r$) (Hayes et al., 2009; El-Dien et al., 2018). Yet, in forest trees, the greatest potential impact of GS on the rate of genetic progress will come from decreasing $L$. Phenotypes of the selection candidates can be predicted at very early ages, for example, when the seedlings are a few weeks old. GS could eliminate progeny testing, or at least make it more efficient. It would also optimize clonal testing phases by advancing a smaller number of pre-selected trees to expanded multi-site clonal trials (Resende et al., 2012a) (**Figure 1**). In conifers, GS coupled to somatic embryogenesis for clonal propagation of elite genotypes could allow elite zygotic embryos to be selected on the basis of their genomic value. This would save a significant amount of time and avoid the costs and uncertainties currently involved in rescue from cryopreservation (Resende et al., 2012b). Additionally, GS will allow simultaneous and early selection for multiple traits in large numbers of individuals. This is an impossible task in conventional tree breeding, which currently relies largely on tandem selection.
The final impact of GS would therefore be a significant improvement in the general efficiency of a tree breeding program, provided, of course, that genotyping is inexpensive and GS models are accurate.
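To make the role of each term concrete, the sketch below evaluates $\Delta G = i\, r\, \sigma_A / L$ for two hypothetical scenarios: a conventional cycle and a GS cycle. All parameter values are illustrative assumptions, not estimates from the studies cited; this includes the proportions selected, accuracies, $\sigma_A$ and generation intervals. The selection intensity is derived from the proportion selected using the standard large-population truncation-selection formula $i = \varphi(z)/p$, where $z$ is the truncation point and $\varphi$ the standard normal density.

```python
from statistics import NormalDist

def selection_intensity(p):
    """Truncation selection of the top fraction p (large-population approximation):
    i = phi(z) / p, where z is the standard-normal truncation point."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - p)
    return nd.pdf(z) / p

def gain_per_year(i, r, sigma_a, L):
    """Breeder's equation expressed per unit time: Delta G = i * r * sigma_A / L."""
    return i * r * sigma_a / L

sigma_a = 10.0  # hypothetical additive SD, in arbitrary trait units

# Hypothetical scenarios for illustration only:
# conventional: top 20% selected, accuracy 0.8 from progeny tests, 20-year cycle;
# genomic: top 5% of many genotyped seedlings, lower accuracy 0.6, 8-year cycle.
conventional = gain_per_year(selection_intensity(0.20), r=0.8, sigma_a=sigma_a, L=20)
genomic = gain_per_year(selection_intensity(0.05), r=0.6, sigma_a=sigma_a, L=8)

print(f"conventional: {conventional:.2f} units/year")
print(f"genomic:      {genomic:.2f} units/year")
print(f"ratio:        {genomic / conventional:.1f}x")
```

Even with a lower assumed accuracy, the shorter generation interval dominates the gain per year in this toy comparison. This mirrors the argument in the text that reducing $L$ is where GS has the most leverage in trees.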

What distinguishes GS from what tree breeders have done so far is this: instead of relying solely on the expected pedigree, which is frequently prone to errors, breeders can use DNA data to build additive and non-additive genomic relationship matrices. These matrices specify the relationships among individuals more accurately and simultaneously account for contemporary as well as historical pedigree. This procedure not only allows pedigree errors to be corrected but, critically, also captures the within-family variation resulting from random Mendelian segregation (the Mendelian sampling term). Accordingly, the realized genetic covariances are now based on the actual fraction of the genome that is identical by descent or by state between individuals (Vanraden, 2008). A number of studies in forest trees have shown that realized genomic relationships can produce more accurate predictions than pedigrees alone (Munoz et al., 2014; El-Dien et al., 2015, 2018; Bouvet et al., 2016; Cappa et al., 2017, 2018; Tan et al., 2018). Additionally, the realized genomic relationships of a small genotyped subset of the progeny testing population have been effectively combined with a substantially larger number of un-genotyped individuals in a single-step analysis (Legarra et al., 2009). This method was dubbed "HBLUP," since the best linear unbiased predictors (BLUPs) of breeding values are derived using a single (**H**) genetic covariance matrix. This matrix combines the pedigree-based average numerator relationship matrix (**A**) with the marker-based relationship matrix (**G**). Recent studies with forest trees show that HBLUP increases the precision of genetic parameter estimates relative to those obtained from traditional pedigrees (Cappa et al., 2017, 2018; Ratcliffe et al., 2017).
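Below is a minimal sketch, assuming Python with NumPy, of how the **A**, **G** and single-step **H**⁻¹ matrices discussed above can be assembled. It makes the following choices:

- **A** is built with the tabular method from a pedigree.
- **G** follows VanRaden's (2008) first method.
- **H**⁻¹ uses the commonly applied inverse form, in which A⁻¹ has G⁻¹ - A₂₂⁻¹ added to its genotyped-by-genotyped block.

The six-tree pedigree, the random placeholder genotypes and the blending weight w = 0.05 are hypothetical choices for illustration. Blending G with A₂₂ keeps it invertible, because centring with allele frequencies from the same few trees makes G singular.

```python
import numpy as np

def numerator_relationship(pedigree):
    """Tabular method for the pedigree-based A matrix.
    pedigree: list of (sire, dam) indices (-1 = unknown); parents listed before offspring."""
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (s, d) in enumerate(pedigree):
        # Diagonal: 1 + inbreeding coefficient, where F_i = 0.5 * a(sire, dam).
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
    return A

def vanraden_g(M):
    """VanRaden (2008) method 1: G = ZZ' / (2 * sum p(1 - p)), M coded 0/1/2."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def h_inverse(A, G, genotyped, w=0.05):
    """Single-step H^-1: A^-1 plus (Gw^-1 - A22^-1) on the genotyped block,
    with Gw = (1 - w) G + w A22 (w is an assumed blending weight)."""
    A22 = A[np.ix_(genotyped, genotyped)]
    Gw = (1.0 - w) * G + w * A22
    Hinv = np.linalg.inv(A)
    Hinv[np.ix_(genotyped, genotyped)] += np.linalg.inv(Gw) - np.linalg.inv(A22)
    return Hinv

# Hypothetical pedigree: trees 0-2 are unrelated founders, 3 and 4 are full sibs
# from 0 x 1, and 5 is a progeny of 2 x 3.
pedigree = [(-1, -1), (-1, -1), (-1, -1), (0, 1), (0, 1), (2, 3)]
A = numerator_relationship(pedigree)

# Only trees 3, 4 and 5 are genotyped; their genotypes are random placeholders.
genotyped = [3, 4, 5]
rng = np.random.default_rng(7)
M = rng.binomial(2, 0.3, size=(len(genotyped), 500)).astype(float)
G = vanraden_g(M)

Hinv = h_inverse(A, G, genotyped)
print(np.round(A, 3))
print(np.round(Hinv, 3))
```

In a real analysis, the genotypes would come from a SNP array, and the resulting **H**⁻¹ would replace **A**⁻¹ in the mixed-model equations.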

FIGURE 1 | Genomic selection in forest trees. GS begins with the development of a predictive model for the traits of interest (Left panel), which is then used in the GS cycles (Right panel) and progressively updated. GS uses genome-wide markers whose effects on the phenotype are estimated concurrently in a large and representative "training population" of individuals, without applying stringent significance tests. Markers are retained as predictors of phenotypes in prediction models to be later applied to "selection candidates," for which only genotypes are collected. The prediction models are cross-validated against a "validation population," a set of individuals from the same reference population that were not used for the estimation of marker effects. Once a prediction model is shown to provide adequate accuracy, it can be used in the GS cycle. An array of selection candidates is genotyped. These are full- or half-sib families derived from crossing either the original elite parents of the training set or elite individuals selected in the training set. Their breeding values (GEBV) and/or genotypic values (GEGV; additive + non-additive effects) are estimated using the model developed earlier. Top-ranked seedlings for GEBV are subjected to early flower induction and inter-mated to create the next breeding generation. Top-ranked seedlings for GEGV are clonally propagated and tested in verification clonal trials, where elite clones are eventually selected for operational plantation. Additionally, all or subsets of the already genotyped selection candidates are planted in experimental designs and phenotyped at the target selection age. These trials provide genotype and trait data for updating the GS model as GS generations advance and the climate changes.

# GS: ADVANCES AND CHALLENGES IN FOREST TREES

A comprehensive, chronological list of empirical GS reports in forest tree species was recently published (Grattapaglia, 2017) and is updated here in **Table 1**. Prediction accuracies have largely been very good, matching or surpassing those obtainable by pedigree-based phenotypic selection, in line with earlier simulations (Grattapaglia and Resende, 2011; Iwata et al., 2011; Denis and Bouvet, 2013). When considering the practicalities of tree breeding, however, a number of factors that affect the prospects of GS must be considered:

- the composition of training populations;
- analytical methods;
- genotype × environment interaction (G×E);
- age-age correlations;
- long-term model performance;
- the cost and quality of DNA marker data.

All of these have been the subject of research and were reviewed in detail in the context of tree breeding by Grattapaglia (2017). They are briefly discussed below in light of the experimental results reported to date in forest trees.

TABLE 1 | Timeline summary of experimental genomic selection studies in forest tree species published to date. Studies that investigated different aspects of GS but used partially or totally the same breeding population data (genotypes and/or phenotypes) are listed in the same entry.

GS experiments in forest trees have capitalized on the existing structure and diversity of breeding populations. Their mating designs already account for the expected relationship between training sets and prospective selection candidates. Training populations of several hundred to a few thousand individuals have been sampled from existing progeny trials, with effective population sizes consistent with those used in operational breeding. These training populations have provided good predictions in essentially all studies and for all traits. Analytical methods differing with respect to the presumed trait architecture have been used and compared. In all studies, ridge regression best linear unbiased prediction (RR-BLUP) has been very efficient; it treats marker effects as random, normally distributed with a common variance. RR-BLUP has been equivalent to genomic BLUP (GBLUP) and has provided the best compromise between prediction efficiency and fast computation. It has also revealed that essentially all major traits in forest trees fit the infinitesimal model (Resende et al., 2012c; Beaulieu et al., 2014b; Ratcliffe et al., 2015; Isik et al., 2016; Müller et al., 2017; Tan et al., 2017; Chen et al., 2018). Still, additional research in this area is warranted, especially as prior functional information on genomic regions of slightly larger effect might emerge. Disease resistance traits are one example, as shown for prediction of fusiform rust resistance in loblolly pine (Resende et al., 2012c).
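The equivalence between RR-BLUP and GBLUP noted above can be checked numerically. With centred genotypes Z, the genomic relationship matrix G = ZZ'/k with k = 2Σp(1 - p), and shrinkage λ_g = λ_β / k, the marker model and the individual model return identical GEBVs. The NumPy sketch below demonstrates only this algebra, under these assumptions:

- the genotypes and phenotypes are random placeholders;
- λ_β is an arbitrary assumed variance ratio;
- the phenotypes are centred, so no fixed effects are fitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 300, 1000                      # trees, markers (illustrative sizes)
p = rng.uniform(0.1, 0.5, m)
M = rng.binomial(2, p, size=(n, m)).astype(float)
p_hat = M.mean(axis=0) / 2
Z = M - 2 * p_hat                     # centred genotypes
k = 2 * np.sum(p_hat * (1 - p_hat))   # VanRaden scaling constant
y = rng.normal(size=n)
y = y - y.mean()                      # centred placeholder phenotypes

lam_beta = 500.0                      # assumed ratio var_e / var_beta

# RR-BLUP (marker model): solve an m x m system for marker effects.
beta = np.linalg.solve(Z.T @ Z + lam_beta * np.eye(m), Z.T @ y)
gebv_rr = Z @ beta

# GBLUP (individual model): solve an n x n system with G = ZZ'/k.
G = Z @ Z.T / k
lam_g = lam_beta / k                  # var_e / var_g, since var_g = k * var_beta
gebv_g = G @ np.linalg.solve(G + lam_g * np.eye(n), y)

print("max |difference|:", np.max(np.abs(gebv_rr - gebv_g)))
```

GBLUP solves a system whose size is set by the number of trees rather than the number of markers. This is why it is usually the faster option when markers outnumber genotyped individuals.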

Ever since the first experimental GS studies in forest trees (Grattapaglia et al., 2011; Resende et al., 2012a,b), it has been clear that prediction accuracies are driven mainly by the genetic relationship between training and validation sets. They also depend on G×E and age-age correlations. Predictions will be most effective at the same age and in the same environment in which the prediction model was trained. Further studies in conifers (Zapata-Valenzuela et al., 2013; Beaulieu et al., 2014a,b; El-Dien et al., 2015; Ratcliffe et al., 2015; Thistlethwaite et al., 2017; Chen et al., 2018) and eucalypts (Müller et al., 2017; Tan et al., 2017) corroborated the key importance of genetic relationships and the impact of G×E and age-age correlations. These results are consistent with findings in domestic animals and crop plants (Lin et al., 2014; Van Eenennaam et al., 2014). Data from G×E or age-age correlation studies will shed light on what to expect from genomic prediction. Even so, ensuring that the target environment of future selection candidates will be equivalent to the one where models were originally trained remains a challenging issue for GS (Heslot et al., 2015). Regular retraining of GS models by incorporating phenotypes collected in breeding generations closer to the current one (Iwata et al., 2011) is expected to mitigate this problem. Retraining will be especially essential in light of climate fluctuations. Research efforts in this area are highly needed and will come as GS programs advance. They will be coupled to innovations in phenotyping platforms that integrate remote sensing, spatial data and geographic information systems (Dungey et al., 2018).

Notwithstanding the encouraging estimates of predictive ability, most studies in forest trees have used contemporary training and validation sets. They have thus not yet been able to adequately assess the realized performance of GS across generations at a larger scale, although results on this topic are imminent. DNA marker data accurately capture the relationships between parents and progeny, and environments should be relatively stable across close generations. Performance across generations is therefore expected to be equivalent to current estimates in contemporary sets. In Pinus pinaster, promising preliminary results of inter-generation prediction were reported by training models with parents and progeny in the same set (Isik et al., 2016). Later work used parents and grandparents to predict the subsequent generation, albeit with limited effective population sizes (Bartholome et al., 2016). However, this outcome was not observed in a three-generation study of Pseudotsuga menziesii (El-Kassaby, personal communication). Model updating strategies will therefore be crucial to counteract the decay of relatedness and LD between the original training set and selection candidates as generations of breeding advance. Simulations for eucalypt breeding have shown this (Denis and Bouvet, 2013).
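The decay mentioned above can be illustrated with two textbook idealizations:

- The expected additive relationship between an ancestor and a direct descendant halves every generation.
- Under random mating in a large population without selection, LD between two loci decays as D_t = D_0 (1 - c)^t for recombination fraction c. With constant allele frequencies, r² therefore scales by (1 - c)^(2t).

The Python sketch below tabulates both. The starting r² and recombination fractions are illustrative assumptions. Selection, drift and small effective population sizes in real breeding programs will make actual trajectories differ.

```python
def expected_relationship(t):
    """Additive relationship between an ancestor and a direct descendant t generations
    later, assuming no inbreeding and unrelated mates: (1/2) ** t."""
    return 0.5 ** t

def ld_r2(r2_0, c, t):
    """Expected r^2 after t generations of random mating (large population, no selection).
    D_t = D_0 * (1 - c) ** t; with constant allele frequencies r^2 scales by (1 - c) ** (2t)."""
    return r2_0 * (1.0 - c) ** (2 * t)

# Illustrative: initial r^2 = 0.5 for a tightly (c = 0.01) and a loosely (c = 0.2) linked pair.
print("gen  relationship  r2(c=0.01)  r2(c=0.2)")
for t in range(5):
    print(f"{t:>3}  {expected_relationship(t):>12.4f}  "
          f"{ld_r2(0.5, 0.01, t):>10.3f}  {ld_r2(0.5, 0.2, t):>9.3f}")
```

The relationship column shows why a model trained on one generation loses information about distant descendants. The LD columns show that only marker-QTL associations in very tight linkage persist over several generations.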

In the past 2 years, a number of additional experimental GS studies have been reported (**Table 1**; Cappa et al., 2017, 2018; Duran et al., 2017; Lenz et al., 2017; Müller et al., 2017; Ratcliffe et al., 2017; Resende et al., 2017b; Tan et al., 2017, 2018; Thistlethwaite et al., 2017; Chen et al., 2018; De Moraes et al., 2018; El-Dien et al., 2018; Kainer et al., 2018; Suontama et al., 2018). Many of them were in species of Eucalyptus, for which public high-throughput genotyping platforms based on DArT (Sansaloni et al., 2010) and SNPs (Silva-Junior et al., 2015) have been available. Access to such resources for eucalypts has also allowed more precise genetic parameter estimates, pedigree reconstruction and inbreeding studies (Telfer et al., 2015; Klápště et al., 2017; Müller et al., 2017). This clearly shows that the advancement of research and the operational adoption of genomics in breeding depend strongly on the availability of public, robust, cost-accessible and portable SNP genotyping platforms. The success of GS, or of any other genomics-based breeding approach, will rely on high data quality. One must be able to genotype SNPs across generations with high reproducibility and negligible missing data. Shallow whole-genome sequencing (Kainer et al., 2018), genotyping-by-sequencing (GbS) (El-Dien et al., 2015) and sequence capture (Thistlethwaite et al., 2017; Chen et al., 2018; De Moraes et al., 2018) have also been used for GS in trees. Nevertheless, fixed SNP arrays currently provide the gold standard of data reproducibility across sample batches and laboratories. Additionally, SNP array data are breeder friendly and available from multiple service providers. They are also easily managed and stored without the cost and logistics of sequence data transfer, storage and analysis.
Array costs have also dropped significantly in recent years, making arrays as cost-effective as sequence-based methods. Together with the advantages above, this has motivated a large international effort to develop SNP arrays for all main planted conifers (F. Isik pers. comm.). A second-generation, higher-density optimized SNP array is also being developed for species of Eucalyptus and Corymbia (O.B. Silva-Junior and D. Grattapaglia pers. comm.). The use of a common SNP genotyping array across the breeding programs of different organizations will be key to providing the economy of scale necessary to integrate genomics into breeding.
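The data-quality requirements above (reproducible calls and negligible missing data) are usually enforced with simple filters before any GS model is fitted. Below is a minimal NumPy sketch, assuming genotypes coded 0/1/2 with NaN for missing calls. The thresholds (95% call rate, 1% minor allele frequency) are common illustrative choices, not values from the text. Genotype concordance between replicate assays of the same sample is shown as one simple way to quantify reproducibility across batches or laboratories.

```python
import numpy as np

def snp_qc(M, min_call_rate=0.95, min_maf=0.01):
    """Boolean mask of SNPs passing call-rate and minor-allele-frequency filters.
    M: individuals x SNPs, coded 0/1/2 with np.nan for missing calls (assumed thresholds)."""
    call_rate = 1.0 - np.isnan(M).mean(axis=0)
    p = np.nanmean(M, axis=0) / 2.0
    maf = np.minimum(p, 1.0 - p)
    return (call_rate >= min_call_rate) & (maf >= min_maf)

def concordance(g1, g2):
    """Fraction of identical calls between two replicate genotypings of one sample,
    computed over SNPs called in both replicates."""
    both = ~np.isnan(g1) & ~np.isnan(g2)
    return np.mean(g1[both] == g2[both])

# Toy data: 4 trees x 4 SNPs. SNP 2 is monomorphic; SNP 3 has 50% missing calls.
M = np.array([[0, 1, 2, np.nan],
              [1, 1, 2, 0],
              [2, 1, 2, np.nan],
              [0, 1, 2, 1]], dtype=float)
print(snp_qc(M))

rep_a = np.array([0, 1, 2, np.nan, 1], dtype=float)
rep_b = np.array([0, 1, 1, 2, 1], dtype=float)
print(concordance(rep_a, rep_b))
```

On the toy matrix, only the first two SNPs pass, and the two replicates agree at three of the four SNPs called in both.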

# A LOOK TO THE NEAR FUTURE

With easy access to SNP genotyping and positive results in essentially all major forest trees, we are now on the brink of widespread adoption of genomic prediction, thus realizing the early promise of MAS in forest tree breeding. Beyond the outstanding research challenges discussed above, a promising way to enhance the value of genomic data is to include environmental covariables in GS models, as already shown in crops (Jarquin et al., 2014; Saint Pierre et al., 2016). Integrating multi-environment trial data will be strategic for several purposes:

- predicting performance in unobserved environments;
- identifying suitable sites for evaluating or deploying genetic material;
- predicting climate change scenarios.

The performance of untested clones or families can be predicted accurately when their genomic relatedness is known. In the same way, performance in yet unobserved or future environments could be forecast if data about those environments are available. This has been shown for the recommendation of Eucalyptus clones (Marcatti et al., 2017). Resources such as ClimateNA (Wang et al., 2016) and the NASA POWER project (Stackhouse, 2014) offer a wealth of historical and predicted future environmental data. The environmental variables that define the correlation between growing conditions are trait-specific. Research on which variables are most appropriate for inclusion in genomic prediction models will therefore be essential.
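One way to bring environmental covariables into a genomic model follows the spirit of the reaction-norm approach of Jarquin et al. (2014). For each pair of observations, the genomic relationship between the two trees is multiplied element-wise (a Hadamard product) by the similarity of their environments.

The sketch below, assuming NumPy, builds such a G×E kernel for three genotypes tested at two hypothetical sites. The relationship matrix and the site covariates (mean temperature and annual rainfall) are made-up values. Fitting the mixed model that would use this kernel as a covariance structure is not shown.

```python
import numpy as np

def environmental_kernel(W):
    """Omega = Wc Wc' / q from column-standardised environmental covariates.
    W: observations x q covariates (each covariate must vary across observations)."""
    Wc = (W - W.mean(axis=0)) / W.std(axis=0)
    return Wc @ Wc.T / W.shape[1]

def gxe_kernel(G, geno_idx, Omega):
    """Interaction kernel (Z_g G Z_g') o Omega, where geno_idx maps each observation
    to its genotype (row of G) and 'o' is the element-wise (Hadamard) product."""
    ZGZ = G[np.ix_(geno_idx, geno_idx)]
    return ZGZ * Omega

# Hypothetical genomic relationships among three clones.
G = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.2],
              [0.1, 0.2, 1.0]])

# Six observations: clones 0, 1, 2 at site A, then again at site B.
geno_idx = [0, 1, 2, 0, 1, 2]
# Hypothetical covariates per observation: mean temperature (deg C), annual rainfall (mm).
W = np.array([[18.5, 1200.0]] * 3 + [[22.0, 900.0]] * 3)

Omega = environmental_kernel(W)
K = gxe_kernel(G, geno_idx, Omega)
print(np.round(K, 2))
```

By the Schur product theorem, the Hadamard product of two positive semi-definite matrices is itself positive semi-definite. The resulting kernel is therefore a valid covariance matrix for a random G×E term.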

Another area that will demand research arises from the evolution of sequencing technologies, as GS moves from sparse SNP data to sequence data. Managing massive next-generation sequencing data sets for large numbers of individuals in a breeding program is a challenge and a cost in itself. Beyond that, using sequence data instead of dense SNPs should in theory increase accuracy, because rare causal alleles would be better captured in predictive models. Until now, however, simulation and experimental studies in domestic animals have shown that whole-genome sequence data do not increase accuracy when LD decays slowly (Macleod et al., 2014; Forneris et al., 2017; Vanraden et al., 2017). The exception is when very precise prior estimates of the functionality of particular SNPs exist (Perez-Enciso et al., 2015). Increasing the availability and quality of functional data on specific genomic regions might therefore be warranted.

The success of whole-genome prediction and the poor outcome of dissection approaches in identifying functional quantitative trait nucleotides have nevertheless opened an exciting new perspective for the study of complex trait variation. A clear pattern has emerged in annual plants. In large samples, the association signal of common variants is spread across the entire genome, yet it is heavily concentrated in regulatory DNA in open chromatin marked by deoxyribonuclease hypersensitive sites (Sullivan et al., 2014; Rodgers-Melnick et al., 2016; Swinnen et al., 2016). In these plants, cis-regulatory elements (CREs) associated with open chromatin, such as promoters and enhancers regulating gene expression, may contain close to half of all variants influencing traits. As GS implementation advances, large datasets of several thousand trees across unrelated populations will be collected. Opportunities will then emerge for joint and meta-GWAS, as recently described in Eucalyptus (Müller et al., 2018). At the same time, chromatin accessibility and gene network data, as reported for Eucalyptus (Hussey et al., 2017) and Populus (Zinkgraf et al., 2017), will become increasingly available. Combined with data from highly powered SNP-trait association studies, these data should provide new avenues for computational predictive discovery of key regulatory elements in the genome. Such integrative approaches based on large genotype and phenotype datasets might thus yield additional clues toward understanding the complex connections and interactions between discrete genomic elements and continuous phenotypic trait variation. Ultimately, this progress should enhance tree breeding practice.

#### AUTHOR CONTRIBUTIONS

DG drafted the first version of the manuscript and all co-authors subsequently contributed to it by editing and formatting the final version.

# ACKNOWLEDGMENTS

This work was supported by (a) PRONEX-FAP-DF grant NEXTREE 193.000.570/2009 and NEXTFRUT 0193.001198/2016, CNPq grant 400663/2012/0 and a CNPq fellowship productivity grant 308431/2013/8 to DG; (b) NSERC Discovery Grant, Genome British Columbia, and Genome Canada to YE-K; (c) Agencia Nacional de Ciencia y Tecnología (PICT 2016 1048) to EC.

# REFERENCES

genotyping platform in Norway spruce. bioRxiv [Preprint]. doi: 10.1101/293696


taeda - prospects for genomic selection. Tree Genet. Genomes 6, 1307–1318. doi: 10.1007/s11295-012-0516-5


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Grattapaglia, Silva-Junior, Resende, Cappa, Müller, Tan, Isik, Ratcliffe and El-Kassaby. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Manipulation of Growth and Architectural Characteristics in Trees for Increased Woody Biomass Production

#### Victor B. Busov\*

School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, United States

Growth and architectural traits in trees are economically and environmentally important and are thus key targets for the improvement of forest and fruit trees. These traits are complex and result from the operation of a number of molecular mechanisms. This review focuses on the regulation of crown architecture, secondary woody growth and adventitious rooting. These traits and processes have a significant impact on the deployment, management and productivity of tree crops. Most of the work described comes from experiments in model plants, poplar, apple, peach and plum, because these species allow functional analysis of the genes involved and have significant genomic resources. Nevertheless, these studies convincingly show that the mechanisms underlying specific growth and architectural traits are conserved. This conservation suggests that these mechanisms can be used as a blueprint for improving these traits and processes in phylogenetically diverse tree crops. We specifically consider the involvement of flowering-time genes, transcription factors and hormone-associated genes. The review also discusses the impact of recent technological advances, as well as the challenges in dissecting these traits in trees.

Keywords: hormones, transcription factors, woody biomass growth, molecular mechanisms, crown architecture, adventitious rooting, tree biotechnology

# INTRODUCTION

Intensive forest plantations can alleviate the harvesting pressure on native forests by allowing production of the same or a larger amount of wood on a much smaller land base (Paquette and Messier, 2010). Improved genetics through breeding is one of the leading factors, if not the leading factor, in this increased productivity (Fenning and Gershenzon, 2002; Ruotsalainen, 2014). However, tree breeding is slow because of long generation times, traits that take a long time to evaluate, and the complex genetic architecture of these traits (Fenning and Gershenzon, 2002). Understanding the underlying genetic mechanisms could significantly accelerate the process through both conventional breeding and genetic engineering.

Here we review current knowledge of the molecular mechanisms that underpin three developmental processes in trees with a significant impact on intensive plantation deployment, management and growth. The review focuses on mechanisms and genes that can provide positive effects and are thus of breeding value, rather than exhaustively discussing progress in dissecting each process. Where available, the reader is pointed to reviews that treat these processes more comprehensively.

#### Edited by:

Isabel Allona, Universidad Politécnica de Madrid (UPM), Spain

#### Reviewed by:

Thomas Teichmann, Georg-August-Universität Göttingen, Germany Raili Ruonala, University of Helsinki, Finland

> \*Correspondence: Victor B. Busov vbusov@mtu.edu

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 29 June 2018 Accepted: 26 September 2018 Published: 16 October 2018

#### Citation:

Busov VB (2018) Manipulation of Growth and Architectural Characteristics in Trees for Increased Woody Biomass Production. Front. Plant Sci. 9:1505. doi: 10.3389/fpls.2018.01505

# MANIPULATION OF CROWN ARCHITECTURE


Crown architecture is a compound trait resulting from the position, size, periodicity, angle and density of the branches. Crown characteristics affect plantation density, interception of photosynthetic light and the quality of the derived wood. The direction and extent to which these characteristics need to be changed can vary with the purpose of the plantation. Branches originate in axillary meristems (AMs), so the establishment and outgrowth of AMs have a profound effect on branch characteristics and crown architecture. The genes controlling AM initiation have been characterized exclusively in herbaceous plants, and there is no information about their effects in trees. We therefore do not cover these developments here; excellent reviews discuss these genes and mechanisms in detail (Janssen et al., 2014; Yang and Jiao, 2016).

## Branch Outgrowth

Once an AM is established, its outgrowth is typically suppressed, a phenomenon known as apical dominance. Auxin is central to the establishment and maintenance of apical dominance (**Figure 1**). The regulatory role of auxin in apical dominance is indirect and is explained by the canalization and second messenger models (Domagalska and Leyser, 2011; Teichmann and Muhr, 2015). Only the latter, however, involves genes and mechanisms that have been manipulated in trees, so only it is covered here. According to this model, auxin synthesized in the shoot apex moves basipetally toward the roots and generates a secondary signal that travels acropetally to regulate bud outgrowth. Cytokinin was the first candidate second messenger because it has a strong positive effect on axillary bud outgrowth when applied exogenously (**Figure 1**). However, acropetal transport of cytokinin was not able to activate bud outgrowth (Faiss et al., 1997). This led to the discovery of the shoot-branching hormones strigolactones (SLs). SLs have strong negative effects on bud outgrowth (**Figure 1**). They are synthesized in roots and transported acropetally to shoots, and their biosynthetic genes are positively regulated by auxin. SL metabolic and signaling genes are strong regulators of bud outgrowth in a number of plant species, including several trees (Domagalska and Leyser, 2011; Muhr et al., 2016; Foster et al., 2018). RNAi knockdown of poplar and apple orthologs of SL biosynthetic genes resulted in increased sylleptic branching (Muhr et al., 2016; Foster et al., 2018). Sylleptic branches develop from lateral buds that have not undergone dormancy.

## Branch Angle

Significant progress has been made in trees in elucidating the mechanism underlying branch angle. Using an innovative sequencing approach and a distinct peach mutant with acute branch angles, researchers identified the causative gene as TILLER ANGLE CONTROL1 (TAC1) (Dardick et al., 2013). TAC1 was originally found to control tiller angle in rice (Yu et al., 2007). TAC1 belongs to a small gene family. All members of the family characterized to date in several plant species, including trees (poplar and plum), control branch or lateral root angles (**Figure 1**; Xu et al., 2017; Hollender et al., 2018). Depending on the presence of a conserved domain, members of the family can increase (TAC1) or decrease (LAZY1) branch angles (**Figure 1**; Dardick et al., 2013; Hollender and Dardick, 2015; Xu et al., 2017).

## Roles of Gibberellins

Gibberellins (GAs) control stem elongation but can also regulate crown characteristics (**Figure 1**). Overexpression of gibberellin 2-oxidase (GA2ox) leads to low levels of bioactive GAs. It also causes proliferation of long sylleptic branches at a wide, almost perpendicular angle to the main stem (Mauriat et al., 2011; Zawaski et al., 2011). After 2 years in the field, GA2ox overexpressors produced a wide, oval crown (Zawaski et al., 2011). A similar effect was also observed in turf grass and rice (Agharkar et al., 2007; Lo et al., 2008). These effects are possibly mediated via GA regulation of the abundance of PIN auxin efflux carriers (Willige et al., 2011; Lofke et al., 2013; Mauriat et al., 2014). In contrast, modifications of GA signaling via DELLA domain proteins produce a highly compact crown consisting of short branches with narrow, acute angles (Zawaski et al., 2011). The effect of DELLA domain proteins on branching may be due to their interactions with the transcription factor BRANCHED1 (Daviere et al., 2014).

## Flowering and Crown Architecture

The determinacy of the meristem is genetically programmed and heritable, and it significantly affects plant architecture, including crown characteristics in trees (McGarry and Ayre, 2012). Indeterminate meristems typically produce monopodial growth, characterized by a pronounced primary stem. In contrast, plants with determinate meristems show sympodial growth. In this process, the shoot apical meristem (SAM) is repeatedly lost through terminal differentiation and growth continues by lateral outgrowth from the axillary meristem, resulting in a compound shoot architecture. Monopodial and sympodial growth types result from differences in the gene expression and protein localization of members of the CENTRORADIALIS/TERMINAL FLOWER/SELF PRUNING (CETS) family that control flowering (McGarry and Ayre, 2012). CETS genes form a small gene family in Arabidopsis and other plant species. Very small changes in the protein sequence (a few amino acids) can reverse their function (Hanzawa et al., 2005). For example, FT promotes flowering, while a close family member, TFL1, inhibits it (Hanano and Goto, 2011). FT is a mobile signal originating in the leaf. It moves through the phloem to reach the shoot or axillary meristems, where it initiates terminal flower development (Pin and Nilsson, 2012). TFL1 plays a role antagonistic to FT in the SAM (Kobayashi et al., 1999). Low and high FT/TFL1 ratios in the SAM result in indeterminate and determinate growth, respectively (McGarry and Ayre, 2012; **Figure 1**). This model has been confirmed through transgenic overexpression of FT orthologs in several tree species (Hsu et al., 2006, 2011; Srinivasan et al., 2012; Klocko et al., 2016). FT overexpression leads to early flowering and highly branched, sympodial growth. Increasing the FT/TFL1 ratio via downregulation of TFL1/CEN genes in apple produces effects similar to those of FT overexpression (Kotoda et al., 2006; Flachowsky et al., 2012). RNAi downregulation of two TFL1/CEN-like homologs in poplar (PopCEN1 and PopCEN2) produced similar but more moderate flowering and architectural phenotypes (Mohamed et al., 2010).

Interacting factors and regulators of FT and TFL1/CEN can also produce changes in tree architecture. CsRAV1 is a chestnut ortholog of TEMPRANILLO, a regulator of FT (Castillejo and Pelaz, 2008). Its overexpression upregulated the expression of PttFT2, a poplar ortholog of FT (Triozzi et al., 2018). Consequently, branching increased under both greenhouse and field conditions (Moreno-Cortes et al., 2012, 2017). Under field conditions, increased branching led to increased biomass (Moreno-Cortes et al., 2017). Similarly, overexpression of a poplar ortholog of GIGANTEA, a positive regulator of FT, upregulated PttFT2 and increased sylleptic branching in poplar (Ding et al., 2018). FT interacts with FD to promote flowering. Overexpression of a poplar FD homolog led to precocious flowering and sympodial, highly branched growth (Parmentier-Line and Coleman, 2015).

# INCREASE OF SECONDARY WOODY GROWTH

Secondary growth originates in a lateral meristem known as the vascular cambium. In trees, compared with herbaceous plants, the cambium shows greatly enhanced and perennial activity, producing massive amounts of conductive and supportive tissues, referred to as wood (Helariutta and Bhalerao, 2003; Elo et al., 2009; Barra-Jimenez and Ragni, 2017). Bifacial periclinal division of the cambium cells, followed by growth and differentiation, produces phloem/bark to the outside and xylem/wood to the inside of the tree trunk (Helariutta and Bhalerao, 2003; Zhang et al., 2014). Excellent reviews discuss the process comprehensively (Groover, 2005; Demura and Fukuda, 2007; Groover et al., 2010; Spicer and Groover, 2010; Mizrachi and Myburg, 2016). Here we focus on genes and mechanisms that have positive effects on secondary woody growth and are thus of potential breeding/improvement value (**Figure 2A**). The only exception is genes that affect bark development. Bark is typically considered a waste byproduct, so decreased bark production would be favored. However, there are notable exceptions where increased bark would be the goal. These include special plantations for cork production, as well as breeding for resistance to pests, fire and drought.

## Gibberellins

The first demonstration of increased secondary growth was achieved via transgenic modification of gibberellin biosynthesis (Eriksson et al., 2000). Overexpression of the Arabidopsis GA 20-oxidase (GA20ox), a key biosynthetic enzyme, resulted in a significant (2-fold) increase in wood production (**Figure 2**). Similarly, overexpression of pine PdGA20ox1 in poplar resulted in a nearly 3-fold increase in woody biomass (Jeon et al., 2016; **Figure 2A**). In addition, overexpression of the poplar orthologs of the GA receptor PttGIBBERELLIN-INSENSITIVE DWARF1 (PttGID1) resulted in similar wood biomass enhancement (Mauriat and Moritz, 2009; **Figure 2A**). Increases in bioactive gibberellins also increased fiber length and cellulose/xylan content (Eriksson et al., 2000; Jeon et al., 2016). Increased GA signaling, however, did not change fiber length. In the transgenic poplars, increased GA synthesis and signaling had negative effects on root development. They also decreased the expression of defense-related genes and resulted in poor leaf development (Eriksson et al., 2000; Mauriat et al., 2014; Jeon et al., 2016). These negative pleiotropic effects of constitutive overexpression were mitigated by using a xylem-specific promoter, which produced similar increases in wood biomass (Jeon et al., 2016). However, tissue-specific upregulation of PttGID1 using a different xylem-specific promoter did not increase woody biomass (Mauriat and Moritz, 2009). This suggests that different promoter::gene combinations can have specific effects. Cisgenic modifications of several poplar GA 20-oxidase genes caused no pleiotropic effects and increased wood biomass and fiber length. However, the gains were more modest than those obtained with the constitutive and xylem-specific promoters (Han et al., 2011).

## Cytokinins

The regulatory role(s) of cytokinins in secondary woody growth have been recognized for some time (Nieminen et al., 2008). However, only recently was it demonstrated that modifying cytokinin biosynthesis can have a positive effect on secondary woody growth (Immanen et al., 2016). Transgenic poplars expressing the Arabidopsis AtIPT7 gene (a key cytokinin biosynthetic gene) under a xylem-specific promoter showed significant (nearly 2-fold) increases in secondary growth and no negative pleiotropic effects (Immanen et al., 2016) (**Figure 2A**).

## Brassinosteroids

Recent evidence suggests that both brassinosteroid (BR) biosynthesis and BR signaling have a positive effect on woody biomass production (Noh et al., 2015; Jin et al., 2017; Shen et al., 2018; **Figure 2**). Overexpression of key biosynthetic genes (PtoDWF4 and CYP85A3) increased brassinosteroid concentrations and woody biomass (Jin et al., 2017; Shen et al., 2018). The productivity gains, however, were much smaller than those observed with manipulations of GA and cytokinin biosynthesis/signaling. The increase in brassinosteroids led to longer fibers and had little or no impact on cell wall chemistry (Jin et al., 2017; Shen et al., 2018). Manipulating brassinosteroid signaling gave similar results. Overexpression of a poplar ortholog of BEE3 (Brassinosteroid Enhanced Expression 3), a transcription factor involved in BR signaling, increased stem, leaf, and root biomass. As with enhanced BR biosynthesis, the gains in wood biomass were smaller than those observed with GA and cytokinin and ranged between 25 and 50%. Neither the modification of BR biosynthesis nor that of BR signaling caused negative pleiotropic effects, despite the strong constitutive promoters used in both studies.

## Small Protein Signaling in the Cambium

Cambial cell division in Arabidopsis is controlled by a protein ligand–receptor complex (Fisher and Turner, 2007; Hirakawa et al., 2008; Etchells and Turner, 2010). The ligand is the small CLE41 protein. It is produced in the phloem and transported into the cambium, where it interacts with the PXY receptor to stimulate cambial cell division (Fisher and Turner, 2007; Hirakawa et al., 2008; Etchells and Turner, 2010). Recently, constitutive expression of aspen orthologs of the ligand and receptor in transgenic poplar trees resulted in highly pleiotropic, negative effects on growth and tissue organization (Etchells et al., 2015). However, when the ligand and receptor were simultaneously upregulated in their native tissue domains using tissue-specific promoters, the negative effects were completely mitigated. In addition, the double transgenic plants showed a nearly twofold increase in wood production (**Figure 2A**).

## Secondary Phloem and Bark Development

Secondary growth also yields secondary phloem and bark. The first gene known to regulate secondary phloem development was discovered using activation tagging in poplar (Yordanov et al., 2010). The gene encodes a transcription factor of the LATERAL ORGAN BOUNDARIES DOMAIN (LBD) family that positively regulates secondary phloem development (Yordanov et al., 2010; Yordanov and Busov, 2011). Transgenic plants overexpressing the gene produced more secondary phloem, whereas plants expressing a dominant-negative form of the protein produced less (**Figure 2A**).

# GENES PROMOTING ADVENTITIOUS ROOTING

Adventitious rooting (AR) is root formation from organs and tissues that typically do not produce roots. The process is particularly important in forestry and horticulture for the clonal propagation and deployment of elite germplasm. The cellular and molecular events underlying AR have been reviewed elsewhere (Diaz-Sala, 2014; Legue et al., 2014; Pacurar et al., 2014). Here we focus on several genes that have been functionally characterized in trees and have strong positive effects on AR formation.

## Controls of Cell Proliferation Provide Points for AR Manipulation

AINTEGUMENTA (ANT) and ANT-like (AIL) genes are a group of eight AP2 transcription factors in Arabidopsis with important functions in meristem establishment and maintenance, as well as in organ growth and size (Horstman et al., 2014). One member of the poplar AIL family, AIL1, is induced during AR primordium activation (Rigal et al., 2012). Overexpression of the gene increased the number of ARs (**Figure 2B**), while RNAi downregulation decreased it. AIL1 transcriptionally regulates Cyclin D3.1 by binding to its promoter (Karlberg et al., 2011). Thus, AIL1 promotes AR at least in part by activating cell proliferation.

The BIG LEAF/STERILE APETALA (BL/SAP) gene from poplar has a positive effect on AR formation when ectopically expressed (Yordanov et al., 2017; **Figure 2B**). BL is an F-box protein that regulates leaf size in poplar and Arabidopsis through control of cell proliferation (Wang et al., 2016; Li et al., 2018). BL/SAP targets for degradation proteins that negatively regulate AIL genes (PLETHORA 1 and 2) (Horstman et al., 2014; Wang et al., 2016; Li et al., 2018). Thus, BL likely regulates AR formation by promoting the degradation of one or more repressors of the AIL genes (**Figure 2B**), which has a positive effect on cell proliferation and meristem organization.

Both AIL1 and BL have significant pleiotropic effects when overexpressed (Rigal et al., 2012; Yordanov et al., 2017). To serve as biotechnological tools for increasing AR formation, they will require inducible or tissue-specific upregulation.

## Gibberellins

GAs inhibit AR, likely by interfering with polar auxin transport (Mauriat et al., 2014). Increased GA biosynthesis and signaling decrease AR, whereas decreased GA biosynthesis and signaling increase it (Busov et al., 2006; Gou et al., 2010; Elias et al., 2012). As mentioned earlier, GAs have strong positive effects on secondary woody growth. The resulting decrease in AR may therefore impede the clonal propagation of transgenics with increased GA biosynthesis. Conversely, reducing bioactive GAs or blocking GA signaling promotes AR formation but leads to various levels of dwarfism. Dwarfism is a desirable trait in fruit and ornamental tree crops, and increased AR formation would be an additional benefit for propagating these genotypes. In forestry, however, extreme dwarfism may reduce biomass productivity. This effect could be mitigated in two ways. One is to increase girth growth by stacking other transgenes that promote radial expansion (see above). The other is to select semi-dwarf genotypes (Elias et al., 2012).

## Activation Tagging Discovery of AR-Involved Genes

Activation tagging (AT) showed that the poplar gene ETHYLENE RESPONSE FACTOR 3 (ERF003) has a positive effect on AR formation (Trupiano et al., 2013). Several other AT mutants that are affected in AR and associated with ethylene signaling and biosynthesis were also discovered (Trupiano et al., 2013). These genes, however, have not been validated through retransformation experiments. Their involvement in AR formation, and their utility for manipulating it, therefore remain tentative.

# FUTURE OUTLOOK

## Improvements in Transformation Technologies

Transformation is the gold standard for establishing gene function and the preferred method for delivering advanced editing tools such as the CRISPR/Cas9 system (Busov et al., 2005; Altpeter et al., 2016; Ran et al., 2017). However, transformation technologies are slow and inefficient, require cumbersome tissue culture processes, and remain largely genotype-specific (Busov et al., 2005; Altpeter et al., 2016; Baltes et al., 2017). This is true even in genera considered easy to transform, such as poplars. Thus, major strides in understanding and improving transformation technologies are needed (Altpeter et al., 2016).

## Understanding Promoter Architecture

The need for research on the isolation and engineering of artificial promoters for the precise targeting of transgenic manipulations has long been recognized. However, research in this area has lagged behind. Advances in gene editing and synthetic technologies will further require a better understanding of promoter architecture. Only then can the level and specificity of promoter activity be effectively modified and designed.

## Application of CRISPR Technology

New advances in gene editing technologies such as CRISPR/Cas9 promise to revolutionize tree improvement (Tsai and Xue, 2015). CRISPR/Cas9 has been successfully implemented in poplar (Fan et al., 2015; Zhou et al., 2015). When the same gene was targeted, CRISPR/Cas9 produced stronger and more uniform phenotypic effects than RNAi (Zhou et al., 2015). Most current CRISPR/Cas9 applications generate knockouts via non-homologous end joining. However, significant progress has also been made in applying CRISPR/Cas9 to gene editing through homologous recombination (Schaeffer and Nakata, 2015), although this approach is still at a developmental stage in plants. CRISPR/Cas9 can also ease the regulatory burden associated with field testing, because some countries regard CRISPR/Cas9-modified genotypes as a non-GMO type of modification.



## Using Induced and Natural Mutants

Natural or induced mutants have rarely been used in tree research. However, significant strides have been made with both approaches (Busov et al., 2010; Dardick et al., 2013). As described above, activation tagging in poplar led to the discovery of genes important for secondary growth (Yordanov et al., 2010, 2014, 2017) and AR formation (Trupiano et al., 2013). Many natural tree mutants also exist. New sequencing technologies allow efficient mapping of their causative mutations (Dardick et al., 2013). These approaches can be used further to identify genes affecting various aspects of tree growth and development.

## Understanding Integrative System Controls

Significant progress has been made in identifying individual genes and pathways that regulate different traits. However, it has long been known that these processes are highly coordinated at the tissue and organismal levels, as well as in response to environmental cues. Identifying the coordinating genes, signals, and mechanisms could enable more integrative manipulation of one or several traits.

# AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and has approved it for publication.

# FUNDING

This research was supported by grants from the Plant Feedstock Genomics for Bioenergy: A Joint Research Program of USDA and DOE (2009-65504-05767 and DE-SC0008462), USDA McIntire Stennis Fund (Grant 1001498), and USDA BRAG (grant 2016-33522-25626).





**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Busov. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

fpls-09-01505 October 13, 2018 Time: 10:4 # 8

# Engineering Tree Seasonal Cycles of Growth Through Chromatin Modification

Daniel Conde<sup>1</sup>† , Mariano Perales<sup>1</sup> , Avinash Sreedasyam<sup>2</sup> , Gerald A. Tuskan<sup>3</sup> , Alba Lloret<sup>4</sup> , María L. Badenes<sup>4</sup> , Pablo González-Melendi<sup>1,5</sup> , Gabino Ríos<sup>4</sup> and Isabel Allona<sup>1,5</sup> \*

<sup>1</sup> Centro de Biotecnología y Genómica de Plantas, Instituto de Investigación y Tecnología Agraria y Alimentaria, Universidad Politécnica de Madrid, Madrid, Spain, <sup>2</sup> HudsonAlpha Institute for Biotechnology, Huntsville, AL, United States, <sup>3</sup> Oak Ridge National Laboratory, Center for Bioenergy Innovation, Oak Ridge, TN, United States, <sup>4</sup> Instituto Valenciano de Investigaciones Agrarias, Moncada, Spain, <sup>5</sup> Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain

#### Edited by:

Raúl Alvarez-Venegas, Centro de Investigación y de Estudios Avanzados (CINVESTAV), Mexico

#### Reviewed by:

Serena Varotto, University of Padua, Italy

Andrea Miyasaka Almeida, Universidad Mayor, Chile

#### \*Correspondence:

Isabel Allona isabel.allona@upm.es orcid.org/0000-0002-7012-2850

#### †Present address:

Daniel Conde, School of Forest Resources and Conservation, University of Florida, Gainesville, FL, United States

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 08 November 2018 Accepted: 19 March 2019 Published: 05 April 2019

#### Citation:

Conde D, Perales M, Sreedasyam A, Tuskan GA, Lloret A, Badenes ML, González-Melendi P, Ríos G and Allona I (2019) Engineering Tree Seasonal Cycles of Growth Through Chromatin Modification. Front. Plant Sci. 10:412. doi: 10.3389/fpls.2019.00412

In temperate and boreal regions, perennial trees arrest cell division in their meristematic tissues during winter dormancy until environmental conditions become appropriate for renewed growth. Release from the dormant state requires exposure to a period of chilling temperatures, similar to the vernalization required for flowering in Arabidopsis. Over the past decade, genomic DNA (gDNA) methylation and transcriptome studies have revealed signatures of chromatin regulation during active growth and winter dormancy. To date, only a few chromatin modification genes have been functionally characterized in trees as candidate regulators of these developmental stages. In this work, we summarize the major findings on the role of chromatin remodeling during growth-dormancy cycles and explore the transcriptional profiles of vegetative apical bud and stem tissues during dormancy. Finally, we discuss genetic strategies designed to improve the growth and quality of forest trees.

#### Keywords: Populus, epigenetics, growth-dormancy, methylation, phenology, chromatin remodeling

# INTRODUCTION

In temperate and boreal regions, a perennial plant's interannual life cycle comprises multiple cycles of vegetative growth and dormancy. To guarantee survival, trees synchronize their growth and flowering times with the most favorable climatic conditions of the year by following the cues provided by annual photoperiod and temperature patterns (Ding and Nilsson, 2016; Singh et al., 2016). For example, before winter, cell division in meristematic tissues is arrested and a protective structure, the apical bud, is formed. Inside it, a quiescent shoot apical meristem (SAM) and embryonic leaves are sheltered during the winter. In several tree species, such as poplar (Populus sp.), photoperiod plays a major role in cell division arrest and bud formation (Fennell and Hoover, 1991; Cooke et al., 2012; Petterle et al., 2013). Such trees can sense the shortening of day length and thus anticipate the winter period. In other tree species, such as apple, growth cessation and bud formation are controlled by temperature (Tanino et al., 2010). Once endodormancy has been established, low non-lethal temperatures progressively lead to dormancy release (chilling requirement). Once this requirement is fulfilled, dormancy is released, but growth arrest is maintained by external signals, mainly low temperatures (ecodormancy) (Ding and Nilsson, 2016).

Finally, spring growth-promoting temperatures produce bud break in vegetative buds, followed by vegetative growth.

These developmental processes require the orchestration of specific temporal and spatial patterns of gene expression. Chromatin-modification-based regulation of gene expression has been proposed to help organize these patterns during dormancy-growth cycles. This proposal rests on spatio-temporal patterns of epigenetic marks and on the seasonal expression profiles of chromatin modification genes (Schrader et al., 2004; Ruttink et al., 2007; Santamaría et al., 2009; Karlberg et al., 2010; Conde et al., 2013, Conde et al., 2017a; Howe et al., 2015; Kumar et al., 2016). Epigenetic marks arise from covalent modifications of DNA and histones that determine the accessibility of chromatin to the transcription machinery. In animals, epigenetic marks are established during embryonic development. In plants, by contrast, epigenetic mechanisms also operate during post-embryonic developmental stages, contributing to developmental plasticity (Henderson and Jacobsen, 2007).

In this perspective article, we review the most recent evidence on the roles of DNA methylation and histone modification during annual growth-dormancy cycles in trees. In addition, we explore RNA-seq-based gene expression profiles in poplar vegetative apical bud and stem tissues, which reveal seasonal expression patterns of genes in the DNA methylation machinery. Finally, we discuss future strategies focused on chromatin remodeling for tree biotechnology applications.

# DNA METHYLATION AND GROWTH-DORMANCY CYCLES

## DNA Methylation Patterns During Winter Dormancy

Genomic DNA methylation refers to the addition of a methyl group to the carbon atom at the fifth position of a cytosine (5 mC). DNA methylation plays a major role in gene expression, in genome protection and stability through transposon silencing, in DNA recombination, and in other biological processes (Teixeira and Colot, 2010; Mirouze et al., 2012; Saze et al., 2012). Variation in DNA methylation affects plant phenotypic plasticity (Bossdorf et al., 2008; Bräutigam et al., 2013; Kooke et al., 2015). Several studies have revealed DNA methylation patterns during growth-dormancy cycles in both buds and stems. In chestnut, Santamaría et al. (2009) found higher levels of gDNA methylation and lower levels of H4 acetylation in dormant vegetative apical buds than in actively growing apices. Similarly, poplar stems showed higher levels of gDNA methylation and lower levels of acetylation of lysine 8 of histone H4 (H4K8) during winter dormancy than during active growth (Conde et al., 2013). Kumar et al. (2016) found that DNA methylation levels in apple decreased gradually from flower bud dormancy to fruit set. This dynamic was observed only when apple trees were grown under environmental conditions that satisfied the chilling requirement for winter dormancy release (Kumar et al., 2016). In almond flower buds, a search for differentially methylated genes by epi-genotyping by sequencing (epi-GBS) found more hypermethylated sequences in dormant buds than in dormancy-released samples (Prudencio et al., 2018). Recently, DNA methylation patterns during winter dormancy were examined weekly in SAM tissue from January until vegetative bud break (Conde et al., 2017a).
The results revealed a hypermethylation-hypomethylation wave. It began with gDNA hypermethylation, followed by a progressive reduction of 5 mC to minimum levels before vegetative bud break, and then a 5 mC increase coinciding with the reactivation of cell division (Conde et al., 2017a). Remarkably, a similar hypermethylation-hypomethylation wave has been described in the inflorescence SAM of sugar beet (Beta vulgaris) during cold treatment. This suggests that DNA methylation dynamics are comparable during vernalization and chilling requirements (Trap-Gentil et al., 2011; Conde et al., 2017a). Collectively, these findings suggest that DNA methylation marks are dynamically deposited and removed after embryogenesis in the SAM and stem tissues of woody perennials, in close relation to environmental factors.

## DNA Methylation Machinery Profile During Winter Dormancy

To investigate how the poplar DNA methylation machinery could create this winter hypermethylation-hypomethylation wave in vegetative SAM and stem tissues, we performed RNA-seq-based gene expression profiling. Vegetative apical buds and stems of hybrid poplar (Populus tremula × alba INRA clone 717 1B4) were grown under natural conditions in Pozuelo de Alarcón, Madrid, and collected weekly. Apical buds were sampled from January 13th to April 14th 2015, and stems from November 7th 2014 to April 9th 2015, coinciding with bud break. Weekly time points were grouped according to the Pearson correlation among samples, and the samples within each group were treated as biological replicates. A detailed list of sample names, dates, and groups is shown in **Supplementary Table S1**. These analyses yielded 6 groups of apical bud samples, from mid-winter to mid-spring, and 10 groups of stem samples, from late fall to mid-spring. The expression data from this experiment can be found in Phytozome<sup>1</sup>, under the expression tab for each gene.
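The text above says only that weekly time points were grouped "according to the Pearson correlation"; the exact procedure is not specified. The sketch below is a hypothetical illustration, not the authors' pipeline. It assumes three things: samples are ordered chronologically; expression vectors have already been normalized; and a sample joins the current group only if it correlates with every member of that group at r ≥ 0.95, a threshold chosen purely for illustration. The function names and toy data are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for a constant profile")
    return cov / (sx * sy)


def group_consecutive_samples(profiles, threshold=0.95):
    """Greedily group chronologically ordered samples (hypothetical procedure).

    A sample joins the current group if its profile correlates with every
    sample already in that group at r >= threshold; otherwise it starts a
    new group. Samples within a group would then be treated as replicates.
    """
    groups = []
    for i, profile in enumerate(profiles):
        if groups and all(pearson(profile, profiles[j]) >= threshold for j in groups[-1]):
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


# Toy data: 5 weekly samples x 4 genes (invented values, not real poplar data).
weekly_profiles = [
    [10, 20, 30, 40],  # week 1
    [11, 21, 29, 41],  # week 2: similar to week 1
    [40, 30, 20, 10],  # week 3: expression pattern reverses
    [41, 29, 21, 10],  # week 4: similar to week 3
    [12, 22, 31, 39],  # week 5: pattern returns toward weeks 1-2
]
print(group_consecutive_samples(weekly_profiles))  # [[0, 1], [2, 3], [4]]
```

Grouping only consecutive samples keeps each group contiguous in time. Week 5 therefore forms its own group even though it resembles weeks 1 and 2. A real analysis might instead use hierarchical clustering on the full correlation matrix.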

Our RNA-seq-based gene expression profiles revealed that poplar homologs of Arabidopsis genes involved in the de novo DNA methylation machinery, such as DOMAINS REARRANGED METHYLTRANSFERASE 2 (DRM2), are expressed at high, constant levels from autumn to spring, including winter dormancy, in apical bud and stem tissues (**Figures 1A,B**). In contrast, a plant-specific 5-methylcytosine DEMETER-like demethylase (DML10) showed a seasonal expression pattern. DML10 expression declined steadily during early dormancy, and its mRNA levels then rose progressively from mid-winter to a maximum at bud break in apical bud and stem tissues (**Figures 1A,B**). Similar results were reported in transcriptomic studies of poplar stems and lateral vegetative buds (Shim et al., 2014; Howe et al., 2015). In addition, poplar homologs of METHYLTRANSFERASE 1 (MET1) and CHROMOMETHYLASE 3 (CMT3) were induced just before the onset of bud break in apical bud and stem tissues (**Figures 1A,B**). MET1 and CMT3 operate in the CG and CHG contexts, respectively. According to Shim et al. (2014), CMT3 is also relatively highly expressed during growth resumption and active growth in stem tissues, whereas MET1 is more highly expressed during endodormancy and the start of ecodormancy. This difference in MET1 expression could reflect the different environmental conditions under which the two experiments were carried out.

<sup>1</sup>https://phytozome.jgi.doe.gov/pz/portal.html

Collectively, these gene expression patterns suggest a plausible scenario. DNA methylation levels would gradually increase during winter dormancy, likely through an imbalance between de novo DNA methylation and demethylation activity caused by the downregulation of DML10 during early dormancy (**Figure 1C**). After 5 mC reaches its maximal accumulation during winter dormancy, its progressive decline at the end of the dormancy period correlates with the induction of DML10 mRNA observed here (**Figure 1C**). In addition, the 5 mC increase during bud break coincided with the induction of MET1 and CMT3. Once cell division has been reactivated in the poplar SAM, these enzymes could help maintain heterochromatin and transposon methylation during DNA replication. Together, these findings point to a major contribution of the DML10 enzyme to the winter-specific 5 mC pattern.

DNA methylation levels agree closely with the expression of methylation enzymes. Nevertheless, we cannot rule out developmental and tissue-dependent effects on the transcriptional and post-transcriptional regulation of DML10 and other epigenetic modifier genes. Such effects could influence the methylation status of particular tissues and cells. Tissue- and cell-specific analyses would help clarify this point.

## DNA Methylation and Energy Status During Winter Dormancy

DNA methyltransferases use S-adenosyl methionine (SAMe) as a methyl group donor (Lyko, 2018). The end-product of this catalytic reaction is S-adenosyl homocysteine (SAH), which in turn inhibits SAMe-dependent methyltransferases. SAH hydrolase (SAHH) breaks down SAH into adenosine and homocysteine. Homocysteine is the precursor of methionine (Met), which in turn is used to produce SAMe. SAHH activity is linked to cell metabolism because it depends on NAD<sup>+</sup> (Grillo and Colombatto, 2008). During glycolysis and the TCA cycle, cells extract energy from the breakdown of glucose and pyruvate, and NAD<sup>+</sup> is reduced to NADH. Hence, a lower NAD<sup>+</sup>/NADH ratio could diminish SAHH activity (Luo and Kuo, 2009). In apple, the activity of the TCA cycle enzyme isocitrate dehydrogenase (ICDH) was lower in dormant vegetative buds than in non-dormant buds (Wang et al., 1991). Transcriptional profiling of Paeonia ostii highlighted the importance of glycolysis and TCA cycle induction for flower bud dormancy release (Gai et al., 2013). We suggest that SAHH activity could act as a bridge between energy status and DNA methylation. A higher NAD<sup>+</sup>/NADH ratio would favor SAHH activity and thereby contribute to the increase in DNA methylation observed during winter dormancy. We identified two poplar genes coding for SAHH enzymes: SAHH1 (Potri.001G320500) and SAHH17 (Potri.017G059400). Interestingly, Evans et al. (2014) genetically associated both genes with vegetative bud phenology during dormancy.

# HISTONE MODIFICATION AND GROWTH-DORMANCY CYCLES

## Gene-Specific Histone Marks in Buds

Histone modification through different biochemical mechanisms, together with chromatin remodeling, is a key element in regulating plant development and plant responses to environmental conditions (de la Paz Sanchez et al., 2015). Dormancy-associated MADS-box (DAM) genes are the master regulators of vegetative and reproductive bud dormancy in Rosaceae and other perennial species. Their histone modifications resemble the chromatin dynamics of known Arabidopsis regulators of vernalization and of seed dormancy and germination (Rios et al., 2014; Velappan et al., 2017). DAM1-6 were first identified in peach as a family of six tandemly arrayed genes coding for similar MADS-box transcription factors, which are partially deleted in the evergrowing non-dormant mutant of peach (Bielenberg et al., 2008). Related DAM genes involved in bud dormancy maintenance have also been described in leafy spurge (Horvath et al., 2010), Japanese apricot (Sasaki et al., 2011), pear (Niu et al., 2016), and apple (Wu et al., 2017), among other species. DAM genes have been postulated to modulate dormancy via transcriptional regulation of FLOWERING LOCUS T (Hsu et al., 2011) in vegetative buds of leafy spurge (Hao et al., 2015) and flower buds of pear (Niu et al., 2016). They may also regulate ABA biosynthesis in lateral flower buds of pear (Tuan et al., 2017). Concomitant with bud dormancy release, trimethylation of lysine 4 of histone H3 (H3K4me3) decreases in the chromatin of DAM1 in leafy spurge (Horvath et al., 2010), DAM6 in peach (Leida et al., 2012), and PpMADS13-1 in pear (Saito et al., 2015). This suggests that vegetative and flower buds of different species share chromatin modifications during dormancy development. Moreover, in peach, we observed a decrease in H3 acetylation in the region around the ATG of DAM6 and an increase in trimethylation of lysine 27 of H3 (H3K27me3) over a wider region of the gene (Leida et al., 2012).
These chromatin changes are commonly associated with gene repression and in fact coincide with the down-regulation of DAM-like genes. This suggests that flower bud dormancy is modulated through an ordered succession of epigenetic events in the chromatin of DAM genes (Rios et al., 2014). In addition, small interfering RNAs and microRNAs have also been postulated to regulate DAM-like expression and the floral dormancy transition in sweet cherry and pear, respectively (Niu et al., 2016; Rothkegel et al., 2017).

However, DAM genes are not the only known targets of chromatin modification during bud dormancy and development. The EARLY BUD-BREAK 1 (EBB1) gene encodes a putative APETALA2/Ethylene-responsive transcription factor that reactivates vegetative meristem growth and bud break after dormancy release in poplar (Yordanov et al., 2014), with conserved orthologs in other woody perennial species (Busov et al., 2016). In pear, two regions of the PpEBB gene, one in the promoter and one around the ATG, were found to be differentially trimethylated at H3K4, consistent with PpEBB up-regulation during the floral sprouting stage (Tuan et al., 2016). PpeS6PDH encodes a sorbitol-6-phosphate dehydrogenase involved in sorbitol synthesis in dormant axillary flower buds of peach (Lloret et al., 2017). PpeS6PDH expression was silenced in dormancy-released buds, concomitant with an H3K4me3 decrease and an H3K27me3 increase in a particular regulatory region of the gene near the translation start (**Figure 2**). We postulated that sorbitol acts as a cryoprotectant and compatible solute in dormant buds. Hence, dormancy regulation by DAM6 and abiotic stress tolerance by PpeS6PDH could share a common mechanism of gene repression through concerted H3K27 trimethylation (Lloret et al., 2018).

## Histone Modifiers During Winter Dormancy

Trimethylation of H3K27 is achieved by the polycomb repressive complex 2 (PRC2), containing components conserved in animals and plants (Alvarez-Venegas, 2010; Derkacheva and Hennig, 2014). In peach, several genes coding for such subunits of PRC2 complexes co-localize with quantitative trait loci for the chilling requirement and bloom date traits, providing genetic evidence of the role of these complexes in dormancy regulation (Zhebentyayeva et al., 2014). The PRC2 component gene fertilization independent endosperm (FIE) was sharply up-regulated under short-photoperiod treatments correlating with growth cessation and dormancy induction in poplar (Ruttink et al., 2007). Moreover, FIE suppression by RNAi prevented the establishment of dormancy in transgenic hybrid aspen (Populus × spp.), even though growth cessation and bud formation were not affected (Petterle, 2011). In peach buds, genome-wide stretches enriched in H3K27me3 were found associated with GA-repeat sequences (de la Fuente et al., 2015), suggesting that basic pentacysteine (BPC) factors able to bind GA-repeats could mediate the recruitment of PRC2 and thus H3K27me3 modification in flower bud dormancy dependent genes such as DAM family and PpeS6PDH, as was recently reported in Arabidopsis (Xiao et al., 2017).

Additional chromatin-related genes, such as the chromatin remodeler PICKLE (PKL), and putative modifiers involved in histone deacetylation (HDA14 and HDA08), histone lysine methylation (SUVR3), and histone ubiquitination (HUB2) are also up-regulated during the transition to dormancy in Populus (Ruttink et al., 2007; Karlberg et al., 2010). Interestingly, downregulation of PKL expression restores photoperiod-induced dormancy in abi1-1 hybrid aspen mutants with a defective abscisic acid (ABA) response, suggesting that ABA promotes dormancy by repressing PKL (Tylewicz et al., 2018).

In axillary flower buds, dormancy changes are concomitant with flower developmental processes such as gametogenesis and organ development, which may thus account for part of the observed regulation of modifier genes and histone modifications. Strong changes in gene expression associated with microsporogenesis have been found in peach in parallel with dormancy release (Ríos et al., 2013), although no histone changes have been reported so far during this or other developmental processes in trees. However, in the model species Arabidopsis and rice, several epigenetic mechanisms involving chromatin remodeling and histone modification by PRC2 and other complexes modulate floral initiation and development at different steps (Guo et al., 2015). This suggests that processes other than dormancy may contribute to the changes in gene expression, DNA methylation, and histone modifications measured in tree buds. Detailed tissue- and organ-specific studies will be required to assess the contribution of particular organs and processes to these biochemical and molecular observations.

# BIOTECHNOLOGY OF THE EPIGENOME IN TREES

Modification of the plant epigenome contributes substantially to variation in plant growth, morphology, and plasticity (Johannes et al., 2008). In temperate and boreal trees, several lines of evidence point to environmentally guided DNA and histone modification profiles as critical regulators of chromatin function that control the tempo of annual growth-dormancy cycles.

Recently, Kumar et al. (2016) reported that DNA demethylation in apple trees, which precedes floral bud break and fruit formation, only occurs in environments that fulfill the chilling requirement. Moreover, Conde et al. (2017a) observed that DML10 expression is induced before poplar bud break only if the chilling requirement has been fulfilled. These observations indicate that the decrease in 5mC observed during winter dormancy is closely linked to the transition from endodormancy to ecodormancy and is a precondition for growth resumption of vegetative and reproductive meristems. Therefore, modifying this DNA methylation pattern to relax or tighten the chromatin state could serve as a biotechnological strategy to alter phenology and the annual growth-dormancy cycle in trees.

Based on the hyper- and hypomethylation wave of 5mC reported during winter in poplar, it could be hypothesized that disrupting this pattern by creating hypermethylated poplar lines would delay growth resumption after dormancy. Conversely, generating hypomethylated lines could result in rapid growth resumption following the dormancy period. Candidate genes that alter winter dormancy DNA methylation/demethylation patterns can be inferred from our analyses. Thus, it could be possible to engineer poplars showing DNA hypermethylation through upregulation of DRM2 or silencing of DML10 or SAHH. On the other hand, upregulation of DML10 or SAHH, or silencing of DRM2, could yield hypomethylated poplar lines. Accordingly, Conde et al. (2017a) described an RNAi strategy to reduce DML gene expression in hybrid poplars, and transgenic poplars with DML10 downregulation showed significantly higher levels of DNA methylation, which delayed bud break. Apart from this specific effect, these DML10-RNAi lines showed negligible alterations in growth and development. Similarly, overexpression of a chestnut (Castanea sativa) DML in hybrid poplar yielded transgenic lines with accelerated apical bud formation during dormancy establishment and no other visible alterations (Conde et al., 2017b). Nevertheless, the consequences of epigenome engineering need to be tested gene by gene to check for possible pleiotropic phenotypes. Alternatively, detailed knowledge of the tissue- and time-dependent expression and activity of key modifier genes, combined with suitable specific promoters, may allow the targeted activation or repression of epigenetic regulators without undesirable effects on the many other biological processes they affect.

The functional role of histone modifications during tree annual growth-dormancy cycles needs further clarification. The PRC2 complex seems to play a key role in dormancy regulation through H3K27me3 deposition on regulatory genes, although only the function of the PRC2 component FIE has been explored so far in poplar (Petterle, 2011). Tree orthologs of other subunits of this complex, such as CURLY LEAF (CLF) and SWINGER (SWN), could provide additional evidence for PRC2 participation in dormancy mechanisms and could also serve as candidates for phenological manipulation. However, the putative targets of PRC2 activity may be more suitable and specific objectives for biotechnological approaches.

A genomic deletion of several tandemly arranged DAM genes causes a non-dormant phenotype in peach, highlighting the functional relevance of these genes in regulating dormancy (Bielenberg et al., 2008). Because DAM genes have been postulated to integrate environmental signals, particularly chilling accumulation, through an epigenetic mechanism involving H3K27me3 and other histone modifications, targeted mutations in DAM cis-elements that promote the binding of histone modifier complexes could be used to specifically modulate the response of DAM genes to chilling, with potential use in manipulating the adaptability of stone fruit crops to changing climatic conditions. GA repeats in the large intron of DAM6 are tentative candidate cis-elements for this approach. In addition to GA repeats, the telobox cis-element has been shown to recruit the PRC2 complex in Arabidopsis through telomere-repeat-binding factors (TRBs) (Zhou et al., 2018). Thus, putative telobox motifs in DAM and other regulatory genes could also be modified to reprogram the environmental input into dormancy cycles.
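Locating candidate cis-elements is a natural first step before designing such targeted mutations. The Python sketch below scans a DNA sequence for GA-repeat runs and telobox-like motifs on both strands. It is an illustration under stated assumptions, not a pipeline used in the studies cited here: the minimum GA-repeat length (four GA units) and the telobox core sequence (AAACCCTA) are assumed definitions, and the input is a made-up toy fragment rather than real DAM6 sequence.

```python
import re

# Assumed motif definitions (illustrative, not taken from the cited studies)
MIN_GA_UNITS = 4        # a "GA repeat" = at least 4 consecutive GA dinucleotides
TELOBOX = "AAACCCTA"    # assumed telobox core sequence

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_ga_repeats(seq: str, min_units: int = MIN_GA_UNITS) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of (GA)n runs.

    A (TC)n run on the given strand is a (GA)n run on the opposite strand,
    so both patterns are reported.
    """
    pattern = re.compile(rf"(?:GA){{{min_units},}}|(?:TC){{{min_units},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def find_motif_both_strands(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return sorted (start, strand) hits of an exact motif on both strands."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif.upper()), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical toy sequence: telobox (+), spacer, (GA)4, spacer, telobox (-)
toy = "TT" + "AAACCCTA" + "GG" + "GAGAGAGA" + "CC" + "TAGGGTTT" + "AA"
print(find_ga_repeats(toy))
print(find_motif_both_strands(toy, TELOBOX))
```

Any hits would only be candidates for site-directed mutation; whether a given GA repeat or telobox actually recruits BPC or TRB proteins in trees would still have to be tested experimentally.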

Additional research is needed to characterize seasonal growth cycles in detail, with frequent sampling along the growth-dormancy-growth cycle, because subtle changes in chromatin modifications may occur at time points not covered by published data sets. In addition, emerging high-throughput techniques for characterizing DNA methylation, histone acetylation, and chromatin accessibility (e.g., ATAC-seq) should be applied to such periodically sampled tissues.

Other genes mentioned in this study that are involved in DNA methylation, RNA interference, chromatin remodeling, histone modification, and transcriptional regulation of meristem growth and dormancy are interesting candidates for biotechnological applications in tree phenology. The CRISPR/Cas9 system is emerging as a promising technique owing to its simplicity, design flexibility, and high efficiency. However, few studies have used this technique in tree species so far, and none has examined the impact of modifying epigenetic marks (Fan et al., 2015; Zhou et al., 2015). Future studies designed to edit specific epigenetic regulator genes will help unravel the impact of particular epigenetic modifications on the annual cycle of trees.
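Part of the design flexibility of CRISPR/Cas9 comes from the fact that the widely used Streptococcus pyogenes Cas9 only needs an NGG protospacer-adjacent motif (PAM) next to a roughly 20-nt target. The Python sketch below lists such candidate sites on both strands of a sequence, for example around a putative regulatory element. It is a minimal illustration, not a guide-design tool: it ignores off-target scoring, GC content, and allelic variation, and the input is a made-up placeholder sequence.

```python
import re

PROTOSPACER_LEN = 20  # typical SpCas9 guide length

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, length: int = PROTOSPACER_LEN) -> list[tuple[int, str, str]]:
    """List (start, strand, protospacer) for every NGG-adjacent target site.

    `start` is the 0-based forward-strand coordinate of the protospacer;
    the protospacer is always reported 5'->3' on its own strand.
    Lookahead regexes are used so that overlapping sites are all found.
    """
    seq = seq.upper()
    sites = []
    # Forward strand: protospacer followed by N-G-G
    for m in re.finditer(rf"(?=([ACGT]{{{length}}})[ACGT]GG)", seq):
        sites.append((m.start(), "+", m.group(1)))
    # Reverse strand: C-C-N followed by the protospacer on the forward strand
    for m in re.finditer(rf"(?=CC[ACGT]([ACGT]{{{length}}}))", seq):
        sites.append((m.start() + 3, "-", reverse_complement(m.group(1))))
    return sorted(sites)


# Hypothetical toy sequence with one site on each strand
toy = "A" * 20 + "TGG" + "CCA" + "T" * 20
for start, strand, protospacer in find_spcas9_sites(toy):
    print(start, strand, protospacer)
```

The reverse-strand search relies on the fact that an NGG PAM on the bottom strand appears as CCN immediately 5' of the target on the top strand.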

# AUTHOR CONTRIBUTIONS

DC, MP, AL, MB, PG-M, GR, and IA participated in the discussions described here. DC, MP, AS, GT, and IA were involved in the RNA-seq analysis. DC, MP, GT, GR, and IA wrote the manuscript.

# FUNDING

This study was supported by grants AGL2014-53352-R, AGL2010-20595, PCIG13-GA-2013-631630, and INIA-FEDER RF2013-00043-C02-02 awarded to IA, MB, and MP. The work conducted by the U.S. Department of Energy Joint Genome Institute (JGI) was supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The work of MP was supported by the Ramón y Cajal MINECO program (RYC-2012-10194).

#### ACKNOWLEDGMENTS

We greatly appreciate the help of the Gene Atlas group: Jeremy Schmutz, Jeremy Phillips, and Joe Carlson. We apologize to all colleagues whose work has not been cited here because of space restrictions. We acknowledge the Severo Ochoa Programme for Centres of Excellence in R&D 2017–2021.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00412/full#supplementary-material

forest tree species to the environment. Ecol. Evol. 3, 399–415. doi: 10.1002/ece3.461

apical bud maturation in poplar. Plant Cell Environ. 40, 2806–2819. doi: 10.1111/pce.13056

of epigenetic regulation during winter dormancy in apple (Malus × domestica Borkh.). PLoS One 11:e0149934. doi: 10.1371/journal.pone.0149934



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Conde, Perales, Sreedasyam, Tuskan, Lloret, Badenes, González-Melendi, Ríos and Allona. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Environmentally Sensitive Molecular Switches Drive Poplar Phenology

Jay P. Maurya<sup>1</sup>† , Paolo M. Triozzi<sup>2</sup>† , Rishikesh P. Bhalerao<sup>1</sup> \* and Mariano Perales<sup>2</sup> \*

<sup>1</sup> Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden, <sup>2</sup> Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Madrid, Spain

#### Edited by:

Ronald Ross Sederoff, North Carolina State University, United States

#### Reviewed by:

Hairong Wei, Michigan Technological University, United States; Maria Veronica Arana, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina

#### \*Correspondence:

Rishikesh P. Bhalerao, Rishi.Bhalerao@slu.se; Mariano Perales, mariano.perales@upm.es

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 30 July 2018 Accepted: 04 December 2018 Published: 17 December 2018

#### Citation:

Maurya JP, Triozzi PM, Bhalerao RP and Perales M (2018) Environmentally Sensitive Molecular Switches Drive Poplar Phenology. Front. Plant Sci. 9:1873. doi: 10.3389/fpls.2018.01873

Boreal and temperate woody perennials are highly adapted to their local climate, which delimits the length of the growing period. Moreover, seasonal control of growth-dormancy cycles impacts tree productivity and geographical distribution. Therefore, traits related to phenology are of great interest to tree breeders and particularly relevant in the context of global warming. The recent application of transcriptional profiling and genetic association studies to poplar species has provided a robust molecular framework for investigating molecules with potential links to phenology. The environment dictates phenology by modulating the expression of endogenous molecular switches, the identities of which are currently under investigation. This review outlines the current knowledge of these molecular switches in poplar and covers several perspectives concerning the environmental control of growth-dormancy cycles. In the process, we highlight certain genetic pathways which are affected by short days, low temperatures and cold-induced signaling.

Keywords: poplar, adaptive response, cold response, circadian clock, short day, low ambient temperature, bud set, winter dormancy

## INTRODUCTION

When the photoperiod falls below the critical day length, poplars undergo growth cessation, culminating in bud set and the acquisition of cold hardiness. The fact that components of light signaling, the circadian clock, and orthologs of Arabidopsis flowering time regulators have all been implicated in this stage suggests an interplay between environmental signals and diurnal gene expression in the dormancy switch. Thereafter, the sequential induction of ethylene and abscisic acid signaling pathways promotes bud maturation, cessation of meristematic activity, and the establishment of dormancy. Once dormant, the meristem becomes insensitive to growth-promoting signals. Release from dormancy strongly depends on the accumulation of a defined number of chilling hours, a mechanism whose molecular basis is unknown. Under spring conditions, the low temperature (LT)-mediated activation of growth promoters reactivates meristem growth. Thus, poplar phenology is controlled by the orchestrated activity of molecular switches following exposure to environmental cues. This mini-review provides insight into the genetic network underlying poplar bud set, dormancy establishment, and bud break, with a focus on photoperiod- and temperature-dependent signaling. Moreover, we show evidence that bud set is highly sensitive to low ambient temperatures and identify candidate genes that may participate in this response.

# SHORT DAYS PROMOTE ACCLIMATION TO WINTER CONDITIONS DURING BUD SET

Strong experimental evidence for the short day (SD) requirement for cold acclimation comes from studies of photoperiod-insensitive poplars overexpressing oat PHYA (Junttila and Kaurin, 1990; Olsen et al., 1997; Welling et al., 2002). When transgenic and wild type plants grown under SDs were subjected to freezing conditions, poplars overexpressing oat PHYA did not develop cold hardiness (Olsen et al., 1997; Welling et al., 2002). In contrast, neither transgenic nor wild type poplars grown under long days (LDs) were able to survive freezing temperatures (Junttila and Kaurin, 1990). This indicates that exposure to SDs acclimates poplars to cold weather conditions, and possibly to other environmental stresses, suggesting that SDs may activate adaptation pathways.

The transcriptional and metabolomic profiling of poplar shoot apices and stem tissues grown under LD and SD conditions has been used to investigate the molecular signatures underlying SD-driven adaptive responses. These studies revealed that SD conditions significantly alter the transcription of certain genes (Rohde et al., 2007; Ruttink et al., 2007; Karlberg et al., 2010; Hoffman et al., 2010; Resman et al., 2010). For example, a reduction in day length affects the transcription of several light signaling- and circadian clock-regulated genes (Ruttink et al., 2007). Moreover, SDs also induce genes related to dehydration and cold adaptation, such as LATE EMBRYOGENESIS-ABUNDANT (LEA), DEHYDRIN (DHN), and COLD REGULATED (COR) genes, as well as a set of transcription factors that mediate responses to abiotic stimuli, among others (Ruttink et al., 2007; Karlberg et al., 2010). Continued SD exposure stimulates ethylene signaling and a burst of abscisic acid (ABA) (Rohde et al., 2002; Ruttink et al., 2007; Karlberg et al., 2010). Ethylene and ABA are abiotic stress-related phytohormones required for adaptive responses in plants and are also functionally associated with the regulation of many aspects of bud development in poplar and birch (Rohde et al., 2002; Ruonala et al., 2006; Shi and Yang, 2014; Müller and Munné-Bosch, 2015; Tylewicz et al., 2015; Trivedi et al., 2016; Tylewicz et al., 2018). Thus, acclimation to winter conditions is mediated by the temporal orchestration of SD-induced transcriptional responses that partly originate from light signaling and circadian clock-dependent pathways. Consistent with this, poplars with modified circadian clock sensitivity showed altered tolerance to freezing temperatures (Ibañez et al., 2010).

# TEMPERATURE MODULATES BUD SET

The effect of temperature on bud set has been poorly investigated at the molecular level. Earlier physiological studies have shown that bud set is a thermosensitive process in trees (reviewed in Tanino et al., 2010). Furthermore, recent phenotyping studies using clonally replicated poplars grown under natural conditions show that the timing of bud set depends on the local climate (Rohde et al., 2011; Evans et al., 2014). Rohde et al. (2011) showed that sub-optimal growth temperatures delay the timing of bud set. Accordingly, we observed that the timing of bud set in hybrid aspen (Populus tremula × P. alba) grown under controlled conditions is highly sensitive to small changes in ambient temperature (**Figures 1A,B**). Bud set was delayed by approximately 7 days in plants grown at 18°C relative to plants grown at 21°C, and this difference was amplified when the temperature was further reduced to 15°C (**Figures 1A,B**). This indicates that ambient temperature can modulate the SD responses controlling bud set timing. Thus, it can be hypothesized that the activation or repression of thermosensitive genes underlies the regulation of pathways that control bud set. Numerous biological processes are governed by the endogenous clock (Greenham and McClung, 2015); thus, bud set in response to low ambient temperature may additionally be mediated by the modulation of clock-related pathways. For instance, the transcription of cold tolerance genes, e.g., C-REPEAT BINDING FACTOR (CBF) genes, follows circadian rhythms in Arabidopsis (Fowler et al., 2005; Dong et al., 2011).

To identify poplar candidate genes that could affect the timing of bud set under low ambient temperature, we investigated how diurnal photothermocycles affect genes carrying bud set-associated single nucleotide polymorphisms (SNPs) (Mockler et al., 2007; Filichkin et al., 2011; Evans et al., 2014). Specifically, we analyzed the diurnal oscillation (cut-off 0.8) of bud set-associated SNP genes under two conditions, photothermocycles (LDHC; day 25°C/night 12°C) and photocycles (LDHH; day 25°C/night 25°C), in Populus trichocarpa using the Diurnal database<sup>1</sup> (Mockler et al., 2007; Filichkin et al., 2011). A total of 134 genes associated with bud set showed robust diurnal transcription patterns over a 48 h period under LDHC conditions but not under constant temperature (LDHH), which suggests that the lower night temperature promotes their rhythmic expression (**Figures 1C–E**). Within the clusters of genes affected by photothermal cycles, we identified diurnal expression of the poplar ortholog of Arabidopsis PHOSPHOLIPASE D DELTA (PLDDELTA; Potri.007G060300), which is involved in phospholipid metabolism, freezing tolerance, and stomatal closure (**Figure 1D**; Chen et al., 2008a,b; Distéfano et al., 2012; Uraji et al., 2012). Moreover, the poplar ortholog of C-REPEAT-BINDING FACTOR 4 (CBF4; Potri.012G134100), which mediates the response to decreased temperatures in Arabidopsis (Wang and Hua, 2009), is also induced by cold temperatures in poplar and birch (**Figure 1E**; Benedict et al., 2006; Welling and Palva, 2008). These examples indicate that the rhythmic expression of genes associated with bud set, stimulated by diurnal oscillations in temperature, may be required for cold acclimation. In contrast, we identified 172 genes with bud set-associated SNPs that showed diurnal transcription patterns under constant temperature (LDHH) but not under photothermocycles (LDHC). This suggests that reduced night-time temperatures can disrupt the rhythmic expression of certain genes (**Figures 1C,F,G**). Within the clusters of genes with diurnal expression disrupted by photothermocycles, we highlight the poplar ortholog of Arabidopsis COLD, CIRCADIAN RHYTHM, AND RNA BINDING 1 (CCR1; Potri.009G116400) (**Figure 1F**). CCR1 contains an RNA recognition motif (RRM) and promotes alternative splicing coupled to degradation by nonsense-mediated decay (NMD) (Schöning et al., 2008). CCR1 is also regulated by both cold and circadian rhythms in Arabidopsis (Carpenter et al., 1994). Furthermore, we found that the diurnal expression of the poplar ortholog of Arabidopsis PURIN 7 (PUR7; Potri.017G051500), which is required to generate purine-dependent cofactors in tissues with high rates of cell division (Senecoff et al., 1996), was disrupted by low night-time temperatures. These examples indicate that the impairment of key, diurnally regulated nucleic acid metabolism processes by decreased night-time temperatures could be important for timely bud set in poplar. Future functional studies will hopefully elucidate the genetic network involved in bud set regulation when poplars are exposed to LTs.

<sup>1</sup>http://diurnal.mocklerlab.org
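The comparison between photothermocycles and photocycles can be made concrete with a small sketch. The Python code below scores each expression series by its best Pearson correlation with a 24 h cosine template tested at every whole-hour phase, applies the 0.8 cut-off mentioned above, and sorts genes into groups that are rhythmic under LDHC only, LDHH only, both, or neither. This is a simplified stand-in, not the pattern-matching procedure used by the Diurnal database itself; the 4 h sampling over 48 h, the gene IDs, and the toy expression values are assumptions for illustration.

```python
import math

SAMPLE_TIMES_H = list(range(0, 48, 4))  # assumed: 12 samples, every 4 h over 48 h
PERIOD_H = 24.0
CUTOFF = 0.8  # correlation cut-off quoted in the text


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 for constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_cosine_fit(expr: list[float]) -> tuple[float, int]:
    """Best correlation with a 24 h cosine over whole-hour phases 0..23."""
    best_r, best_phase = -1.0, 0
    for phase in range(24):
        template = [math.cos(2 * math.pi * (t - phase) / PERIOD_H) for t in SAMPLE_TIMES_H]
        r = pearson(expr, template)
        if r > best_r:
            best_r, best_phase = r, phase
    return best_r, best_phase


def classify(genes: dict[str, tuple[list[float], list[float]]]) -> dict[str, list[str]]:
    """Group genes by the condition(s) under which they look rhythmic.

    `genes` maps a gene ID to (LDHC series, LDHH series).
    """
    groups = {"LDHC_only": [], "LDHH_only": [], "both": [], "neither": []}
    for gene, (ldhc, ldhh) in genes.items():
        in_ldhc = best_cosine_fit(ldhc)[0] >= CUTOFF
        in_ldhh = best_cosine_fit(ldhh)[0] >= CUTOFF
        if in_ldhc and in_ldhh:
            groups["both"].append(gene)
        elif in_ldhc:
            groups["LDHC_only"].append(gene)
        elif in_ldhh:
            groups["LDHH_only"].append(gene)
        else:
            groups["neither"].append(gene)
    return groups


# Toy data (hypothetical gene IDs): a cosine peaking at hour 8 vs. a flat profile
peak8 = [math.cos(2 * math.pi * (t - 8) / PERIOD_H) for t in SAMPLE_TIMES_H]
flat = [5.0] * len(SAMPLE_TIMES_H)
print(classify({"gene_a": (peak8, flat), "gene_b": (flat, peak8)}))
```

A single cosine template misses sharply peaked or asymmetric profiles, which is one reason dedicated tools compare expression against several waveform models.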

# COLD DISRUPTION OF CIRCADIAN CLOCK AND BUD SET

The circadian clock creates endogenous 24 h rhythms that help plants and animals anticipate daily and seasonal environmental changes (Schultz and Kay, 2003). The circadian clock controls physiology, growth, and development; temporal transcriptional profiling has revealed that more than 30% of genes in Arabidopsis and poplar show circadian rhythms (Harmer et al., 2000; Covington et al., 2008; Michael et al., 2008; Hoffman et al., 2010). Recent research has revealed a role for the circadian clock in the genetic network regulating growth-dormancy cycles. It is widely accepted that circadian rhythms play a central role in the photoperiodic mechanism that controls poplar shoot apical growth (reviewed in Triozzi et al., 2018). However, the involvement of the biological clock in the control of growth cessation and bud set has emerged only recently (Ibañez et al., 2010; Kozarewa et al., 2010; Ding et al., 2018). The circadian rhythms of several pathways involved in growth cessation and bud set can persist for 8–10 weeks after exposure to continuous SDs (Triozzi et al., personal communication). This is supported by the finding that downregulation of the poplar clock-related genes LATE ELONGATED HYPOCOTYL (LHY) and TIMING OF CAB EXPRESSION 1 (TOC1) delays growth cessation and bud set (Ibañez et al., 2010). Furthermore, arrhythmic expression of clock-related genes under winter conditions was first shown in chestnut (Ramos et al., 2005; Ibañez et al., 2008). Moreover, chestnut and poplar clock-related genes respond to cold temperatures (4°C), showing high and constant expression irrespective of photoperiod (Ramos et al., 2005; Ibañez et al., 2010). It has been suggested that circadian clock disruption may facilitate bud set under cold temperatures (Johansson et al., 2015). Accordingly, cold-induced disruption of circadian rhythms caused a wide transcriptional rearrangement of cold response genes in Arabidopsis (Bieniawska et al., 2008). Additionally, when exposed to freezing temperatures, LHY-RNAi poplars showed far more severe stem injuries than control plants, indicating that the circadian clock is pivotal in the development of cold hardiness during bud set (Ibañez et al., 2010). Nevertheless, the functional implications of cold-induced disruption of clock-regulated pathways during bud set need further investigation.

# DORMANCY ESTABLISHMENT

Prolonged SD exposure after growth cessation and bud set results in dormancy establishment in various plants (Heide, 1974; Espinosa-Ruiz et al., 2004; Ruttink et al., 2007; Maurya and Bhalerao, 2017). However, until recently, our knowledge of the molecular basis of bud dormancy was rudimentary. Conclusive evidence on dormancy establishment has come from studies of the plant hormones ethylene and ABA, both of which are required for apical bud formation (**Figure 2**). For example, ethylene-insensitive etr1-1 birch plants (carrying a dominant mutant allele) failed to develop apical buds under SD conditions (Ruonala et al., 2006). Ethylene has also been suggested to participate in dormancy induction in plants such as leafy spurge and Chrysanthemum (Sumitomo et al., 2008; Doğramacı et al., 2013). Nevertheless, genetic evidence for how ethylene affects bud dormancy is missing.

Abscisic acid levels increase following SD exposure (Ruttink et al., 2007; Karlberg et al., 2010). Interestingly, the apical buds of etr1-1 mutant birches failed to accumulate ABA under SD conditions, suggesting that the defects in these plants may stem from an inability to increase ABA levels. Moreover, under SD conditions, ABI3oe plants display apical buds with defects in bud maturation (Rohde et al., 2002; Ruttink et al., 2007). Although poplar ABI3 has not been demonstrated to be ABA responsive or to contribute to the ABA response per se, this finding may suggest a role for ABA, or at least ABI3, in bud maturation. Other research suggests that molecules associated with the opening and closing of plasmodesmata are involved in dormancy establishment (Rinne et al., 2011). It has been proposed that plasmodesmata closure in the shoot apical meristem (SAM) is important for dormancy establishment (Rinne et al., 2011; Singh et al., 2017). Growth-promoting signals such as florigen move symplastically through plasmodesmata to the SAM to promote growth under appropriate conditions, yet molecular and genetic evidence for this mechanism was only recently published (Singh et al., 2017; Tylewicz et al., 2018).

Abscisic acid-insensitive hybrid aspen plants overexpressing the dominant negative abi1-1 allele failed to establish dormancy under short photoperiods (SPs) (Tylewicz et al., 2018). Both WT and abi1-1 plants underwent growth cessation and apical bud formation when grown under SPs. However, when transferred from 11 weeks of SP to growth-promoting long photoperiods (LPs), WT plants did not resume growth, indicating that they were dormant, whereas abi1-1 lines restarted growth 11–15 days after transfer to LPs. Transcriptomic analysis of WT and abi1-1 plants after 0, 6, and 10 weeks of SPs revealed thousands of differentially regulated genes, many of which are associated with plasmodesmata closure and opening. Plasmodesmata closure-related genes such as GERMIN-LIKE 10, REMORIN-LIKE 1 and 2, and CALLOSE SYNTHASE 1 were upregulated in the apices of WT plants, whereas genes related to plasmodesmata opening, such as GH17\_39, were downregulated. The transcript levels of these genes differed between abi1-1 and WT plants. Microscopic analyses showed that 83% of plasmodesmata in WT plants, but less than 3% in abi1-1 plants, were closed after 10 weeks of SPs (Tylewicz et al., 2018). Flowering Locus T (FT) protein acts as a mobile signal that can move from leaves to the apex (Corbesier et al., 2007) and promotes growth of poplar trees even under SPs (Böhlenius et al., 2006). Furthermore, when rootstocks of Flowering Locus T1 (FT1) plants were grafted to scions of WT and abi1-1 plants grown for 10 weeks under SPs, the rootstocks activated growth of abi1-1 scions even under SP conditions, but not of WT scions.

Many of the transcripts associated with plasmodesmata closure and opening also responded to SPs (Tylewicz et al., 2018). Overexpression of plasmodesmata-located protein 1 (PDLP1) in abi1-1 plants leads to dormancy establishment, as abi1-1/PDLP1oe plants transferred from SPs to LPs did not resume growth. Certain genes involved in chromatin remodeling, such as FERTILIZATION INDEPENDENT ENDOSPERM (FIE) and PICKLE (PKL), are upregulated after SD exposure (Ruttink et al., 2007). FIE is a component of the polycomb repressive complex 2 (PRC2), whereas PKL acts as its antagonist. PRC2 plays important roles in keeping genes in a repressed state (Margueron and Reinberg, 2011). The transcript level of PKL was upregulated in abi1-1 plants, and its downregulation in the abi1-1 background restored dormancy by closing plasmodesmata (Tylewicz et al., 2018).

Taken together, these results suggest that SP influences ABA levels, which then differentially regulate plasmodesmata opening and closure-related genes to induce dormancy. Once the plasmodesmata are closed, growth-promoting signals cannot enter the SAM due to the formation of dormancy sphincters through callose deposition in plasmodesmata.

# DORMANCY RELEASE AND BUD BURST

As mentioned earlier, dormancy establishment makes the SAM impermeable to growth-promoting signals, even under LD conditions. Release from dormancy requires that buds be exposed to LT for a prolonged period (Saure, 1985; Hannerz et al., 2003; Brunner et al., 2014; Fu et al., 2015). How temperature regulates dormancy release remains poorly studied. Interestingly, exposure of chilled buds to warm temperatures is sufficient to induce dormancy release and subsequent bud burst. By monitoring the timing of dormancy release in plants with altered basal expression of certain genes, we can infer their roles in this process. Since plasmodesmata closure is required for dormancy establishment (Tylewicz et al., 2018), their reopening could be expected to result in dormancy release. It has been suggested that LT treatment leads to the opening of plasmodesmata (Rinne et al., 2011), but this has not yet been experimentally proven. However, some of the genes needed to remove plasmodesmatal dormancy sphincters by degrading the deposited callose, such as 1,3-β-glucanases (glucan hydrolase family 17 [GH17]), are upregulated after chilling treatment (Rinne et al., 2011).
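The dependence of bud burst on prior chilling is often formalized in phenology models as a sequential requirement: chilling first, then warm-temperature forcing. The Python sketch below implements such a model on hourly temperatures. It only illustrates the idea described here and is not a model from the studies cited: the chilling window (0 to 7.2°C, the classic "chilling hours" definition), the 5°C forcing base temperature, and both requirements are assumed parameters.

```python
from typing import Optional, Sequence

# Assumed parameters (illustrative values, not from the cited studies)
CHILL_MIN_C = 0.0      # lower bound of the effective chilling range
CHILL_MAX_C = 7.2      # upper bound (classic "chilling hours" model)
FORCING_BASE_C = 5.0   # base temperature for warm (forcing) accumulation


def chilling_fulfilled_at(temps_c: Sequence[float], chill_req_h: int) -> Optional[int]:
    """Return the hour index at which the chilling requirement is met, or None."""
    chill_h = 0
    for hour, t in enumerate(temps_c):
        if CHILL_MIN_C <= t <= CHILL_MAX_C:
            chill_h += 1
            if chill_h >= chill_req_h:
                return hour
    return None


def bud_burst_at(temps_c: Sequence[float], chill_req_h: int,
                 forcing_req_gdh: float) -> Optional[int]:
    """Sequential model: forcing (growing degree hours) only counts after chilling."""
    released = chilling_fulfilled_at(temps_c, chill_req_h)
    if released is None:
        return None  # chilling never fulfilled: warmth alone cannot trigger bud burst
    forcing = 0.0
    for hour in range(released + 1, len(temps_c)):
        forcing += max(0.0, temps_c[hour] - FORCING_BASE_C)
        if forcing >= forcing_req_gdh:
            return hour
    return None


# Toy hourly record: 10 h at 4 °C, then 5 h at 15 °C
temps = [4.0] * 10 + [15.0] * 5
print(chilling_fulfilled_at(temps, chill_req_h=6))
print(bud_burst_at(temps, chill_req_h=6, forcing_req_gdh=20.0))
print(bud_burst_at([15.0] * 15, chill_req_h=6, forcing_req_gdh=20.0))
```

The last call shows the key property of the sequential model: a warm record without prior chilling never reaches bud burst. Real chilling requirements are much larger (hundreds of hours or more) and genotype-dependent; the toy numbers only keep the example short.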

Gibberellins (GAs) and FT are positive regulators of growth, and LT induces the transcription of FT1 and of genes implicated in GA metabolism, namely those encoding members of the GA3 and GA20 oxidase families (Rinne et al., 2011). GAs may promote GH17 expression to reopen plasmodesmata and thus restart symplastic, growth-promoting cell-to-cell signaling within the SAM (**Figure 2**). However, genetic and experimental proof of this mechanism is still lacking. CONSTANS (CO) expression was shown to be upregulated almost threefold after 2 weeks of chilling. This suggests that the CO/FT module may be initiated by LT, yet CO expression remains at a similar level throughout the cold period and is only further enhanced when chilled buds sense LDs and warmer temperatures (Rinne et al., 2011). In contrast, the expression of CENL1/TFL1 (CENTRORADIALIS-LIKE1/TERMINAL FLOWER 1), a negative regulator of growth, remains low in chilled buds throughout the LT period (**Figure 2**; Rinne et al., 2011). CENL1oe and CENL-RNAi lines showed delayed and early bud burst, respectively, relative to wild type plants (Mohamed et al., 2010), suggesting that CENL1 represses dormancy release and bud burst. In chilled buds, CENL1 expression is only upregulated once the buds are transferred to warmer temperatures.

The process of dormancy release in trees is comparable to vernalization in Arabidopsis (Chouard, 1960). Many MADS-box transcription factors, termed DORMANCY ASSOCIATED MADS-BOX (DAM), are known to be induced in the dormant buds of many plants and are similar to the SHORT VEGETATIVE PHASE (SVP) and AGAMOUS-LIKE (AGL) transcription factors that control flowering in Arabidopsis (Ríos et al., 2014). Like Flowering Locus C (FLC) in Arabidopsis, DAM genes are also epigenetically regulated by cold conditions. Cold-induced epigenetic silencing of the floral repressor FLC is required during vernalization to induce flowering in Arabidopsis (Amasino, 2004; Bastow et al., 2004). The finding that DAM gene expression decreases after cold treatment suggests that these genes are repressors of dormancy release (Shim et al., 2014; Howe et al., 2015). However, the exact roles of DAM genes can only be elucidated through future genetic studies.

As with dormancy release, our knowledge of bud burst in perennial plants is very limited. A large number of genes are differentially regulated during dormancy release (Shim et al., 2014; Howe et al., 2015), but we do not yet have enough genetic evidence to conclusively describe their roles. Very recently, a genetic network mediating dormancy release and bud break has been described (Singh et al., 2018). This study demonstrated that SHORT VEGETATIVE PHASE-LIKE (SVL), a tree ortholog of Arabidopsis SVP, acts as a central regulator of dormancy release and bud burst in poplar. After cold treatment, SVL negatively regulates dormancy release and bud burst by promoting the expression of negative regulators of growth and repressing that of positive regulators (Singh et al., 2018). The expression of EARLY BUD-BREAK 1 (EBB1), a putative APETALA2/ETHYLENE RESPONSIVE FACTOR, also increases after cold periods, during dormancy release and before bud burst, whereas EBB1 transcripts are undetectable during dormancy. Transgenic lines overexpressing and downregulating EBB1 show early and delayed bud burst, respectively, relative to wild-type plants. This suggests that EBB1 is a positive regulator of bud burst (**Figure 2**; Yordanov et al., 2014).



# CONCLUDING REMARKS

The effect of temperature on tree phenology is an important topic in the context of climate change. Extended seasonal growth and shifts in latitudinal distribution demonstrate that plants are already adapting to increasing global temperatures. Although previous research has attempted to elucidate how temperature influences growth cessation and bud set in trees, little is still known about the molecular mechanisms involved in these phenomena. Understanding how environmentally sensitive molecular switches regulate the perception and transduction of temperature signals in woody plants will therefore be crucial for predicting how plants will adapt to warmer environments and for designing appropriate breeding programs for this scenario.

# AUTHOR CONTRIBUTIONS

PT and MP analyzed the data. JM, PT, RB, and MP contributed to the writing of the manuscript.

# FUNDING

The work of MP was supported by the Ramón y Cajal Programme of MINECO (RYC-2012-10194) to PT. The Ph.D. project of PT was supported by a program at the CEI campus of the Universidad Politécnica de Madrid (L1UF00-47-JX9FYF). The work in the lab of RB was funded by grants from Vetenskapsrådet and Knut and Alice Wallenberg Foundation.

## ACKNOWLEDGMENTS

We thank Dr. Allona for the critical reading of the manuscript. We apologize to all colleagues whose work has not been cited because of space limitations.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Maurya, Triozzi, Bhalerao and Perales. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Hardwood Tree Genomics: Unlocking Woody Plant Biology

Gerald A. Tuskan<sup>1</sup>\*, Andrew T. Groover<sup>2</sup>, Jeremy Schmutz<sup>3,4</sup>, Stephen Paul DiFazio<sup>5</sup>, Alexander Myburg<sup>6</sup>, Dario Grattapaglia<sup>7,8</sup>, Lawrence B. Smart<sup>9</sup>, Tongming Yin<sup>10</sup>, Jean-Marc Aury<sup>11</sup>, Antoine Kremer<sup>12</sup>, Thibault Leroy<sup>12,13</sup>, Gregoire Le Provost<sup>12</sup>, Christophe Plomion<sup>12</sup>, John E. Carlson<sup>14</sup>, Jennifer Randall<sup>15</sup>, Jared Westbrook<sup>16</sup>, Jane Grimwood<sup>3</sup>, Wellington Muchero<sup>1</sup>, Daniel Jacobson<sup>1</sup> and Joshua K. Michener<sup>1</sup>

<sup>1</sup> Center for Bioenergy Innovation, Biosciences Division, Oak Ridge National Laboratory (DOE), Oak Ridge, TN, United States, <sup>2</sup> Pacific Southwest Research Station, USDA Forest Service, Davis, CA, United States, <sup>3</sup> HudsonAlpha Institute for Biotechnology, Huntsville, AL, United States, <sup>4</sup> Joint Genome Institute, Walnut Creek, CA, United States, <sup>5</sup> Department of Biology, West Virginia University, Morgantown, WV, United States, <sup>6</sup> Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute, University of Pretoria, Pretoria, South Africa, <sup>7</sup> Embrapa Recursos Genéticos e Biotecnologia, Brasília, Brazil, <sup>8</sup> Universidade Católica de Brasília, Brasília, Brazil, <sup>9</sup> Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY, United States, <sup>10</sup> The Key Laboratory for Poplar Improvement of Jiangsu Province, Nanjing Forestry University, Nanjing, China, <sup>11</sup> Commissariat à l'Energie Atomique, Genoscope, Institut de Biologie François-Jacob, Evry, France, <sup>12</sup> BIOGECO, INRA, Université de Bordeaux, Cestas, France, <sup>13</sup> ISEM, CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France, <sup>14</sup> Schatz Center for Tree Molecular Genetics, Department of Ecosystem Science and Management, Pennsylvania State University, University Park, PA, United States, <sup>15</sup> Department of Entomology, Plant Pathology and Weed Science, New Mexico State University, Las Cruces, NM, United States, <sup>16</sup> The American Chestnut Foundation, Asheville, NC, United States

#### Edited by:

Ronald Ross Sederoff, North Carolina State University, United States

#### Reviewed by:

Hua Cassan Wang, Université de Toulouse, France

Deqiang Zhang, Beijing Forestry University, China

> \*Correspondence: Gerald A. Tuskan gtk@ornl.gov

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 15 May 2018 Accepted: 19 November 2018 Published: 17 December 2018

#### Citation:

Tuskan GA, Groover AT, Schmutz J, DiFazio SP, Myburg A, Grattapaglia D, Smart LB, Yin T, Aury J-M, Kremer A, Leroy T, Le Provost G, Plomion C, Carlson JE, Randall J, Westbrook J, Grimwood J, Muchero W, Jacobson D and Michener JK (2018) Hardwood Tree Genomics: Unlocking Woody Plant Biology. Front. Plant Sci. 9:1799. doi: 10.3389/fpls.2018.01799

Woody perennial angiosperms (i.e., hardwood trees) are polyphyletic in origin and occur in most angiosperm orders. Despite their independent origins, hardwoods share physiological, anatomical, and life history traits distinct from those of their herbaceous relatives. New high-throughput DNA sequencing platforms have provided access to numerous woody plant genomes beyond the early reference genomes of Populus and Eucalyptus; these references now include willow and oak, with pecan and chestnut soon to follow. Genomic studies within these diverse and undomesticated species have successfully linked genes directly to ecological, physiological, and developmental traits. Moreover, comparative genomic approaches are providing insights into speciation events, while large-scale DNA resequencing of native collections is identifying population-level genetic diversity responsible for variation in key woody plant biology across and within species. Current research is focused on developing genomic prediction models for breeding, defining speciation and local adaptation, detecting and characterizing somatic mutations, revealing the mechanisms of gender determination and flowering, and applying systems biology approaches to model complex regulatory networks underlying quantitative traits. Emerging technologies such as single-molecule, long-read sequencing are being employed as additional woody plant species, and genotypes within species, are sequenced, thus enabling a comparative ("evo-devo") approach to understanding the unique biology of large woody plants.
Resource availability, current genomic and genetic applications, new discoveries and predicted future developments are illustrated and discussed for poplar, eucalyptus, willow, oak, chestnut, and pecan.

Keywords: tree habit, somatic mutations, evolutionary ecology, quantitative genetics, adaptive traits, comparative genomics

# INTRODUCTION


Forest trees have historically been classified as softwoods or hardwoods for practical purposes. Botanically, softwood trees are gymnosperms and hardwood trees are angiosperms (i.e., flowering plants), and, as the names indicate, there are differences in reproductive biology and overall habit between these two groups. Around the globe, hardwood trees underpin complex ecosystems, sequester carbon (e.g., rainforests), provide raw materials for the forest products industry, are a source of primary energy in developing countries, and are increasingly used as a source of renewable bioenergy, biomaterials, and other bioproducts (Ragauskas et al., 2014; FAO, 2016).

Woody perennial angiosperms, i.e., hardwood trees, are remarkably diverse in morphology, life habit, and physiology, among many other traits. Hardwood trees show extensive variation in wood anatomy, leaf morphology, whole-tree architecture, secondary metabolism, and numerous adaptive traits (Groover and Crook, 2017). The classification of "tree" or arboreal form reflects several distinctive features, e.g., extensive wood formation produced by a vascular or secondary cambium, persistent interannual growth, and single or multiple stems exceeding two meters in height, accompanied by perennial lateral branches. However, each of these defining features can also occur individually in plants not classified as trees; e.g., some annuals can produce woody tissue or reach heights greater than 2 m. Exceptions aside, hardwood trees can establish extensive root systems that enable vigorous interannual growth, store energy and nutrients over winter in stem and root tissues, and, in many cases, form dominant ecosystem overstories.

The ancestral habit for angiosperms is believed to be reflected in the most basal extant angiosperm, Amborella trichopoda (Soltis et al., 2008; Albert et al., 2013), which is a small tree native to New Caledonia. During angiosperm evolution, woody perennial habit has been accentuated in some lineages, lost in others (e.g., monocots), and lost and then regained in still others (Spicer and Groover, 2010). To a large extent, these evolutionary and developmental changes are contained in the genomes of the extant hardwood tree species.

Reflecting their evolutionary history, hardwood tree genomes display divergent chromosomal architectures, with multiple ancestral whole-genome duplications and rearrangements, gene family expansions, and many segmental deletions across lineages (Tuskan et al., 2006; Soltis et al., 2008; Dai et al., 2014; Myburg et al., 2014; Plomion et al., 2018). Understanding the comparative connections among genomes and the associated characteristics defining hardwood trees remains a substantial research challenge. In this review we will cover the current state of the science in five major groups of hardwood trees, outlining the current assembly and annotation metrics and presenting ongoing applications of these resources. We will highlight online resources for these genomes and provide comparative resources for other angiosperm species. We will provide projections of near-term technologies and analytics that will increase the availability and research value of hardwood tree genomes, and conclude with a summary of several unanswered questions related to macro- and micro-evolution of woody perennials that can be addressed using genomic approaches in the future.

# STATE OF THE SCIENCE

## Populus

When the Populus trichocarpa genome, 'Nisqually-1,' was released in 2006, it was the first plant genome to use shotgun sequencing assembly approaches, the first genome to create chromosome-level assemblies based on genetic mapping, the first woody perennial genome to be assembled and annotated, and the first metagenomic assembly of associated endophytic bacteria and fungi (Tuskan et al., 2006). Today, v3.2 contains 423 Mb [of the ∼485 Mb genome] in 1,446 scaffolds with 2,585 gaps<sup>1</sup>. The scaffold N50 is 8 Mb and the contig N50 is 205 kb, with roughly 98% of the genome in 181 scaffolds. The annotation, trained on deep RNA-seq data from the Gene Atlas project<sup>2</sup>, includes 41,335 predicted loci and 73,013 protein-coding transcripts. Methylation maps are available for 10 tissue types, including bud tissue, callus tissue, female catkins, internode explants, male catkins, phloem, xylem, and roots (Vining et al., 2012). Over 28 million single nucleotide polymorphisms (SNPs) from resequencing data representing over 880 native P. trichocarpa genotypes covering the core distribution of the species range are publicly available (Geraldes et al., 2013)<sup>3</sup>. This SNP resource has been used to characterize geographic structure and linkage disequilibrium (Slavov et al., 2012), detect signatures of selection across the genome (Evans et al., 2014), and identify genetic loci associated with various phenotypes through genome-wide association approaches (Porth et al., 2013; Evans et al., 2014; McKown et al., 2014; Muchero et al., 2015; Fahrenkrog et al., 2017; Liu et al., 2018). The resequencing data are currently being assembled into a Populus pan-genome, with preliminary data suggesting that there may be as many as 20,000 additional gene models that are not included in the current P. trichocarpa reference genome annotation (Pinosio et al., 2016).
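Assembly contiguity in this review is summarized with scaffold and contig N50 values, and sometimes with the number of scaffolds that hold most of the sequence (e.g., 98% in 181 scaffolds for v3.2). As a reminder of what these metrics measure, here is a minimal Python sketch, not the pipeline used by JGI or any of the cited projects: sort the sequence lengths from longest to shortest and accumulate them until a chosen fraction of the total is reached. The function name `nx_lx` and the toy lengths are illustrative assumptions. The sketch uses the assembly's own total length; the related NG50 statistic would use an estimated genome size instead.

```python
from typing import Sequence, Tuple


def nx_lx(lengths: Sequence[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for an assembly given its contig or scaffold lengths.

    Nx is the length of the shortest sequence in the smallest set of longest
    sequences that together cover at least `fraction` of the total assembly
    length; Lx is the number of sequences in that set (N50/L50 for 0.5).
    """
    if not lengths or not 0 < fraction <= 1:
        raise ValueError("need non-empty lengths and 0 < fraction <= 1")
    target = fraction * sum(lengths)
    covered = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        covered += length
        if covered >= target:
            return length, count
    raise AssertionError("unreachable: the cumulative sum reaches the total")


if __name__ == "__main__":
    toy_scaffolds = [8, 5, 4, 3, 2, 1]  # hypothetical lengths, e.g. in Mb
    print(nx_lx(toy_scaffolds))         # N50 and L50
    print(nx_lx(toy_scaffolds, 0.9))    # N90 and L90
```

For the toy lengths (total 23), the two longest scaffolds already cover 13 of the 11.5 units needed, so N50 = 5 and L50 = 2. Reaching 90% (20.7 units) takes five scaffolds, which gives N90 = 2.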
First draft genomes of Populus deltoides 'WV94' and the Populus tremula × alba hybrid '717-1B4' have been sequenced, assembled, and annotated<sup>1</sup>. The WV94 assembly spans approximately 446 Mb in 1,375 scaffolds, with a scaffold N50 of 21.7 Mb and a contig N50 of 590 kb, and contains 44,853 protein-coding loci. CRISPR-related PAM (protospacer adjacent motif) sites for P. tremula × alba 717 have been published and are available online at http://aspendb.uga.edu/s717. Highly contiguous de novo genome assemblies have also been produced for P. euphratica (Ma et al., 2014) and P. tremula<sup>4</sup>, which greatly widen the phylogenetic sampling of the genus. Ongoing resource development work in Populus includes efforts to expand the tissue types and experimental conditions in the Gene Atlas, to enlarge the number of resequenced genotypes from the southern and eastern portions of the range, and to release V4.0 of the P. trichocarpa genome based on long-read sequencing and dense genetic maps derived from resequencing 1,000 progeny from 49 half-sib families. Dozens of genome-wide association studies continue across a broad array of phenotypes, and efforts to estimate the somatic mutation rate in old-growth P. trichocarpa trees are currently underway. Methods to integrate SNP, gene expression, transcription factor binding, gene dosage, methylation, metabolite, phenome, and co-evolutionary data have been developed to generate a systems biology view of the molecular and regulatory interactions that lead to organismal-scale traits (Henry et al., 2015; Liu et al., 2015; Joubert et al., 2017, 2018; Weighill et al., 2018). New algorithms are being developed that use signal processing techniques and explainable artificial intelligence to build better systems biology models, including methods to discover and model genome-wide epistasis and pleiotropy. Efforts are also underway to gain a comprehensive view of chromatin structure and transcription factor binding sites (Rao et al., 2014).

<sup>1</sup>https://phytozome.jgi.doe.gov

<sup>2</sup>https://jgi.doe.gov/doe-jgi-plant-flagship-gene-atlas/

<sup>3</sup>https://cbi.ornl.gov/

<sup>4</sup>http://popgenie.org/aspseq

## Eucalyptus

Eucalyptus species and hybrids represent the most widely cultivated hardwood biomass crop globally, and Eucalyptus grandis (rose gum) clonal genotype 'BRASUZ1' was selected as the reference for the genus. Published in 2014, the E. grandis genome was the first for the rosid order Myrtales and for the family Myrtaceae. Key features of the genome included a very high proportion (34%) and number (over 12,500) of genes in tandem duplicate arrays, an ancient (109 Mya) genome-wide duplication event, and a high diversity of gene families encoding plant specialized metabolites such as phenylpropanoids and terpenes, which are important for plant defense and for pharmaceutical uses such as eucalyptus oils (Mewalal et al., 2017). BRASUZ1 was one of the last plant genomes produced exclusively with Sanger technology combined with extensive BAC-end coverage, resulting in a high-quality assembly (currently V2.0<sup>1</sup>) comprising 691 Mb in 4,943 scaffolds, with a scaffold N50 of 57.5 Mb, a contig N50 of 67 kb, and 94% of the estimated 640 Mb genome (Grattapaglia and Bradshaw, 1994) in 11 pseudomolecules containing 288 scaffolds longer than 50 kb. A total of 36,349 protein-coding loci are annotated, and the assembly has served as the reference for extensive transcriptomic studies targeting the regulation and biosynthesis of lignocellulosic biomass (Mizrachi et al., 2014; Carocha et al., 2015; Hussey et al., 2015; Soler et al., 2015) and for comparative analyses with Arabidopsis (Davin et al., 2016) and Populus (Hefer et al., 2015; Pinard et al., 2015). Other research targets have included terpene biosynthesis (Myburg et al., 2014; Kulheim et al., 2015), reproductive biology (Vining et al., 2015), plant defense (Christie et al., 2015; Mangwanda et al., 2015; Oates et al., 2015; Meyer et al., 2016), and abiotic interactions (Plett et al., 2015; Spokevicius et al., 2017). High-throughput transcriptome sequencing of hundreds of segregating interspecific E. grandis × E. urophylla hybrids has been used to perform the first comprehensive systems genetics analysis of wood development in Eucalyptus (Mizrachi et al., 2017). The E. grandis genome has also served as a reference for whole-genome analysis of the causes of inbreeding depression (Hedrick et al., 2016) and for the development of a commercial, multi-species 60K SNP genotyping chip tagging 96% of the genome with 1 SNP every 12–20 kb (Silva-Junior et al., 2015). The EuCHIP60K has been a key resource to investigate genome-wide recombination, linkage disequilibrium, and nucleotide diversity (Silva-Junior and Grattapaglia, 2015), to carry out genome-wide association studies (Resende et al., 2016; Muller et al., 2017), and to drive genomic selection in different Eucalyptus species (Duran et al., 2017; Resende et al., 2017; Tan et al., 2017, 2018). Eucalyptus transcriptomics resources are available at https://eucgenie.org/. Ongoing and future efforts are focusing on understanding genome diversity and evolution in this species-rich genus and in other members of the Myrtaceae (Grattapaglia et al., 2012), including sister genera such as Corymbia, for which a genome assembly, soon to be released, revealed both conservation and dynamic evolution of terpene genes relative to Eucalyptus (Butler et al., 2018).

## Salix

Whole-genome analysis in willow (Salix spp.), including the development of reference genome assemblies, has focused on four species: Salix purpurea L. (Section Helix), S. suchowensis Cheng (Section Helix), S. wilsonii Seemen (Section Wilsonianae), and S. viminalis L. (Section Vimen). The genome of a female S. purpurea '94006' was assembled at the J. Craig Venter Institute from Illumina sequence generated at the DOE Joint Genome Institute, including mate-pair library reads, and scaffolds were ordered using a genetic map generated from an F<sub>2</sub> mapping population genotyped by genotyping-by-sequencing at Cornell University<sup>1</sup>, placing ∼70% of the genome (276 Mb) in chromosome-scale pseudomolecules. The S. suchowensis genome was sequenced using a combination of Roche 454 technology and Illumina HiSeq 2000 reads (Dai et al., 2014). Additional genome sequencing projects are currently underway in S. purpurea, S. wilsonii, and S. viminalis using deep PacBio sequencing, which should result in much more contiguous and complete genomes. Sequence information and gene assemblies for multiple Salix species are now available at TreeGenes<sup>5</sup>.

The genome sequencing projects have revealed three notable differences between the Populus and Salix genomes. First, the Salix genomes are clearly smaller than the Populus genomes. Using 17-mer analysis, the S. suchowensis genome size was estimated at ∼425 Mb (Dai et al., 2014), while the S. purpurea genome was estimated at ∼379 Mb by 25-mer analysis; both values are smaller than the flow cytometry estimates of ∼429 and ∼450 Mb for the respective species. In any case, all of these estimates are smaller than the genome size of P. trichocarpa (∼485 Mb). The Salix genomes also show evidence of more extensive fractionation following the whole-genome Salicoid duplication shared by Salix and Populus (Tuskan et al., 2006), resulting in fewer predicted genes and an overall smaller genome size (Dai et al., 2014). Second, the large chromosome 1 in Populus clearly corresponds to portions of chromosomes 1 and 16 in Salix, suggesting that a series of major chromosomal fission and fusion events occurred after the Salicoid duplication, leading to the divergence of these lineages (Berlin et al., 2010; Hou et al., 2016). Finally, the sex determination locus is located on chromosome 15 in Salix, whereas it is on chromosome 19 in all Populus species studied to date, supporting the hypothesis that there has been greater genome fractionation in Salix following the Salicoid duplication event. Furthermore, most Populus species have an XY sex determination system (Tuskan et al., 2012; Geraldes et al., 2015), whereas all sequenced Salix species have a ZW system (Hou et al., 2015; Pucholt et al., 2015; Chen et al., 2016).

<sup>5</sup>https://www.treegenesdb.org/
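The 17-mer and 25-mer genome size estimates above rest on the k-mer spectrum idea: count every k-mer in the reads, find the depth at which most genomic k-mers occur (the main peak of the histogram), and divide the total number of counted k-mers by that depth. The Python sketch below illustrates this under simplifying assumptions that are ours, not those of Dai et al. (2014): a random haploid toy genome, error-free reads tiled at a fixed step, k = 21, and a hard `min_depth` cutoff meant to drop low-count (error-like) k-mers. All function names are hypothetical; real analyses use dedicated k-mer counting and modelling tools that also account for heterozygosity and repeats.

```python
import random
from collections import Counter
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_histogram(reads: List[str], k: int) -> Dict[int, int]:
    """Map each depth to the number of distinct canonical k-mers seen that many times."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 2) -> float:
    """Genome size ~ (k-mers counted at depth >= min_depth) / (depth of the histogram peak)."""
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


def tile_reads(genome: str, read_len: int, step: int) -> List[str]:
    """Toy 'sequencing': error-free reads starting every `step` bases."""
    return [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]


rng = random.Random(42)
genome = "".join(rng.choices("ACGT", k=20_000))
reads = tile_reads(genome, read_len=100, step=4)  # ~25x read coverage
hist = kmer_histogram(reads, k=21)
estimate = estimate_genome_size(hist)

if __name__ == "__main__":
    print(f"true size: {len(genome)}, k-mer estimate: {estimate:.0f}")
```

With these settings every interior 21-mer is covered by exactly 20 reads, so the histogram peak sits at depth 20 and the estimate lands within about 1% of the true 20,000 bp. Sequencing errors would add a spike of depth-1 k-mers, which is what the `min_depth` cutoff is meant to discard.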

Given the apparently dynamic nature of sex determination in this dioecious family, rigorous efforts to understand this trait have been undertaken in recent years. Extensive differential gene expression has been documented in shoot tips of S. purpurea, which display an abundance of female-biased gene expression (Carlson et al., 2017). Expression analysis in S. viminalis suggests a predominance of male-biased gene expression, perhaps due to masculinization of the Z chromosome (Pucholt et al., 2017). Analysis of differentially expressed genes in unflushed buds of S. suchowensis identified 806 differentially expressed genes between males and females (Liu et al., 2015).

There are also extensive genetic mapping resources to support efforts to relate genotype to phenotype and to accelerate willow breeding using molecular tools. In addition to the F<sub>2</sub> map used to anchor the S. purpurea genome assemblies (Kopp et al., 2002), more than a dozen linkage mapping populations have been developed for S. viminalis (Karp et al., 2011), as well as an association mapping population (Hallingbäck et al., 2016). At Cornell University, new mapping populations have also been developed using the S. purpurea male '94001' and female '94006' reference individuals as common parents crossed with a number of other species, including S. viminalis, S. suchowensis, S. integra, S. koriyanagi, and S. udensis. At Nanjing Forestry University, an effort is underway to establish recombinant inbred lines for S. suchowensis, with an inbred F<sub>4</sub> pedigree currently established in the field.

## Quercus

A highly contiguous haploid genome assembly of a heterozygous pedunculate oak (Quercus robur) was recently generated based on Illumina synthetic long reads and 454 (Roche) sequences (Plomion et al., 2018). The assembly contains 1,409 scaffolds totaling 814 Mb (N50, 1.34 Mb)<sup>6</sup>. Overall, 871 scaffolds, representing 96% of the physical size of the genome, were anchored to the 12 chromosomes. Based on RNA-seq data and protein homologies, 25,808 genes were predicted. Two other draft genomes, of lower quality, have also been published (Sork et al., 2016; Schmid-Siegert et al., 2017). As in Eucalyptus, one of the main features of the oak genome is its remarkably high level of proximal tandem duplication (35.6% of the gene models). The close relationship between duplicated genes and lineage-specific selection already reported in other species was found to be particularly pronounced in oak, as three quarters of the genes in expanded orthogroups were also found in tandem duplications. Another interesting feature of long-lived tree species lies in the accumulation of somatic mutations. The presence and transmission of somatic mutations were recently demonstrated in oak (Schmid-Siegert et al., 2017; Plomion et al., 2018), which raises interesting questions about the role of somatic mutation in the evolution of long-lived species with a potentially high mutational load.

One of the major challenges in plant biology is identifying and characterizing the genes responsible for variation in ecologically and economically important traits. The genetic architecture of traits related to growth, phenology, water metabolism, and defense against pathogens has been explored in different full-sib oak pedigrees (e.g., Brendel et al., 2008) and natural populations (e.g., Alberto et al., 2013). Linkage and association mapping revealed many underlying loci with low to moderate contributions to the traits of interest. Characterization of the oak transcriptome is more recent. Gene repertoires were first constructed (Ueno et al., 2010; Pereira-Leal et al., 2014; Cokus et al., 2015; Lesur et al., 2015), and RNA-seq data were then used to (1) analyze molecular plasticity under abiotic stress (e.g., drought, Spieß et al., 2012) and biotic interactions (e.g., with insects, Kersten et al., 2013) and (2) identify genes and gene networks underlying developmental mechanisms (e.g., bud phenology, Ueno et al., 2013; acorn development, Miguel et al., 2015) and evolutionary processes, including reproductive isolation (Le Provost et al., 2012, 2016) and local adaptation (Gugger et al., 2016a). Recent studies provide evidence for a role of epigenetic variation in phenotypic plasticity and evolutionary processes in oaks (Gugger et al., 2016b), opening new avenues for research into the role of epigenetics in trait plasticity in these long-lived species.

Shortlisted among the 'botanical horror stories' (Rieseberg et al., 2006), the genus Quercus constitutes an ideal taxon to investigate the dynamics of lineage diversification along a wide and fluid continuum of speciation. Several efforts to illuminate oak evolutionary histories using population genomic approaches have recently emerged (Ortego et al., 2016, 2018; Leroy et al., 2017, 2018). In European white oaks, for example, the best-supported scenario of divergence is consistent with a long-term pervasive effect of interspecific gene flow, with the exception of some narrow genomic regions responsible for reproductive isolation. Further work will be required to identify genes targeted by intrinsic or ecological selection, but early attempts appear promising and could enable predictions of the evolutionary responses of oaks to climate change (Gugger et al., 2016a; Rellstab et al., 2016). An exciting perspective in population genomics is offered by allochronic approaches that compare ancient and modern DNA to infer evolutionary changes at targeted genomic regions. Wagner et al. (2018) have suggested that ancient DNA could be retrieved from waterlogged archeological or fossil samples, thus enabling the incremental reconstruction of evolutionary trajectories.

# CHESTNUT, PECAN, AND OTHER EARLY DRAFT ANGIOSPERM GENOMES

Other reference genome projects for hardwoods are underway, including efforts to produce reference sequences for chestnut and pecan. The chestnut project aims to produce reference genomes of both Castanea mollissima (Chinese chestnut), the donor of resistance to Cryphonectria parasitica (chestnut blight), and Castanea dentata (American chestnut), the species destroyed by the blight in the early 1900s. The Forest Health Initiative (Nelson et al., 2014) supported a project to sequence the genome of the Chinese chestnut donor parent tree 'Vanuxem' used by The American Chestnut Foundation (TACF). The first draft of the Chinese chestnut genome assembly was released in January 2014 at https://www.hardwoodgenomics.org/chinese-chestnutgenome and consists of 41,260 scaffolds averaging 39.6 kb in length and covering 98% (∼724 Mb) of the genome. Sequences of targeted regions spanning mapped QTL resistance loci from various clones have also been produced (Staton et al., 2015). In addition, work is underway to produce chromosome-length sequences for the Chinese chestnut genome: long, single-molecule PacBio reads are being used to merge contigs, and the resulting scaffolds are then anchored to high-density linkage maps for cv. Vanuxem. Fluorescent in situ hybridization is being used to validate the chromosome-level assemblies and subsequently to identify structural rearrangements among genotypes and species (Islam-Faridi et al., 2016). The Chinese chestnut reference genome is being used to develop genomic selection models for blight resistance in American chestnut backcross populations (Westbrook, 2018). The project to sequence the American chestnut has produced a PacBio-based contig assembly with a contig N50 of 8.1 Mb, and efforts are now underway to use new technologies (see below) and genetic mapping to produce chromosome-scale assemblies for other C. mollissima and C. dentata genotypes.

<sup>6</sup>http://www.oakgenome.fr/

Initial efforts to generate a Carya illinoinensis (pecan) reference genome, released in 2013 and based on Illumina short-read sequencing (Jenkins et al., 2015), resulted in a contig N50 of 6.5 kb and a scaffold N50 of 11.2 kb. The Mexican accession '87MX3-2.11' from the USDA-ARS facility in Somerville, Texas, was selected as the reference genotype because of its reduced heterozygosity. Efforts are underway to generate PacBio-based genomes for pecan that include 87MX3-2 and three production cultivars<sup>7</sup>. The current version of the pre-chromosome assembly for 87MX3-2 covers 705.6 Mb of the genome with a contig N50 of 2.5 Mb, and 36,489 genes are annotated<sup>1</sup>. Multiple early efforts are also underway to sequence the European ash (Fraxinus excelsior) and American ash (F. americana)<sup>8,9</sup>, as well as several birch genomes (Betula nana, B. pubescens, and B. pendula) (Wang et al., 2013). Finally, several angiosperm genomes are in early draft form, including species of ash<sup>5</sup> and walnut<sup>9</sup>, as well as several comparative angiosperms from horticultural species<sup>10</sup>.

# A LOOK TO THE NEAR FUTURE

The continued development of new technologies for generating high-quality hardwood reference genomes has been driven by single-molecule, real-time (SMRT) sequencing and related assembly algorithms from PacBio (Eid et al., 2009). These long reads average 10–15 kb in length but are frequently 20–50 kb long, and they can be used to resolve complex genomic repeats and improve the contiguity of genome assemblies. In an early example, the assembly of the small, inbred grass genome of Oropetium thomaeum achieved a contig N50 of 2.4 Mb while capturing almost the entire genome space (244 of 245 Mb) (VanBuren et al., 2015). PacBio long reads were also used to produce a reference for the large, complex 3.6 Gb sunflower (Helianthus annuus) genome with a contig N50 of 524 kb (Badouin et al., 2017). An improved reference genome for Zea mays B73 (Jiao et al., 2017) used PacBio Iso-Seq sequencing to collect full-length cDNAs that enhanced the reference annotation; despite large-scale nested transposon activity, this genome was completed with a contig N50 of 1.18 Mb. Because individual PacBio reads have a low accuracy rate, they must be combined with an alternative technology, such as Illumina short reads, to polish the consensus sequence. For hardwood tree genomes, the longer reads make it possible to sort haplotypes in outbred individuals and produce a single reference sequence despite high heterozygosity (see Q. robur, P. deltoides WV94, and Castanea dentata described above). In addition to long-read sequencing, which has restored the completeness of Sanger-based sequencing, new tools are emerging to achieve accurate chromosome-scale contiguity. These tools for genomic mapping and assembly include improved single-molecule optical mapping from BioNano (Staňková et al., 2016), in vitro reconstituted chromatin (Putnam et al., 2016), and binned (linked-read) sequencing approaches such as 10x Genomics (Coombe et al., 2016).
The most promising of these technologies for ordering large contigs from genome assemblies is Hi-C. It was originally developed to perform "contact mapping" in human cell lines, revealing genes in close physical proximity to promoters or regulatory elements (Suhas, 2014). Hi-C has since been repurposed as a general solution for ordering genome assemblies (Dudchenko et al., 2017). A recent example, the 4.8 Gb genome of Hordeum vulgare, demonstrated that Hi-C can precisely place 95% of contigs on pseudomolecules (Mascher et al., 2017). All of these tools are actively being applied to improve hardwood genomic references as a foundation for accurate population and functional analyses.
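Hi-C ordering rests on the observation that contigs lying close together on a chromosome share more Hi-C read pairs than distant ones. The toy sketch below illustrates that idea with a simple greedy chaining heuristic. It is not the algorithm used by published scaffolders such as those in Dudchenko et al. (2017), and it ignores contig orientation, length normalisation and misjoin detection. The function name and the contact matrix are hypothetical.

```python
def greedy_order(contacts):
    """Order contigs by greedily chaining the strongest Hi-C contacts.

    contacts: symmetric list of lists; contacts[i][j] is the (normalised)
    number of Hi-C read pairs linking contig i and contig j.
    Returns contig indices in order; orientation is not resolved.
    """
    n = len(contacts)
    if n < 2:
        return list(range(n))
    # Seed the scaffold with the most strongly linked pair of contigs.
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda p: contacts[p[0]][p[1]])
    path = [i, j]
    unplaced = set(range(n)) - {i, j}
    while unplaced:
        # Best unplaced contig to attach at each end of the current scaffold.
        left = max(unplaced, key=lambda k: contacts[path[0]][k])
        right = max(unplaced, key=lambda k: contacts[path[-1]][k])
        if contacts[path[0]][left] >= contacts[path[-1]][right]:
            path.insert(0, left)
            unplaced.remove(left)
        else:
            path.append(right)
            unplaced.remove(right)
    return path


# Hypothetical true order of five contigs along one chromosome.
truth = [3, 0, 4, 1, 2]
pos = {contig: rank for rank, contig in enumerate(truth)}
# Toy contact matrix: contact frequency decays with distance along the chromosome.
contacts = [[0.0 if a == b else 1000 / (1 + abs(pos[a] - pos[b])) ** 2
             for b in range(5)] for a in range(5)]
print(greedy_order(contacts))  # the true order or its reverse
```

Real Hi-C data are far noisier than this idealised matrix. That is why production tools add normalisation, orientation and error-correction steps on top of the basic contact-decay signal.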

<sup>7</sup>pecantoolbox.nmsu.edu

<sup>8</sup>http://www.ashgenome.org/

<sup>9</sup>http://hardwoodgenomics.org

<sup>10</sup>https://www.rosaceae.org/

These new technologies will enable more within- and across-species comparisons. One example of distinctive tree biology that can now be explored is the characterization of somatic mutations. Because of their perennial habit, trees can accumulate somatic mutations in different vegetative lineages (as noted in the Populus and Quercus sections above). However, the effect of somatic variation on the generation of new genetic variation at the population level and/or during reproduction remains largely unknown. Because germlines in plants are not segregated from the vegetative lineages, somatic mutations can be transmitted to the next generation. The frequency with which a given somatic mutation propagates to the next generation depends both on the overall fitness of the tree and on the probability that the somatic sector gives rise to flowers and gametes. Because branches and tissue types do not contribute equally to the resulting gene pool, somatic mutations may alter the adaptive balance between branches. This mosaic genetic architecture raises the possibility that selection acts simultaneously on both the branch and the tree (Hadany, 2001; Clarke, 2011). Understanding the role of multi-level selection within a single tree will require answering several questions. First, to what extent is the effect of a somatic mutation in a single branch shared with the entire tree (Folse and Roughgarden, 2011)? Second, if a branch acquires a broadly beneficial mutation, such as one conferring resistance to herbivory (e.g., Edwards et al., 1990), does the resistant branch gain a larger fitness benefit than the remainder of the tree? Conversely, can a mutation increase the fitness of a single branch despite imposing a cost on the rest of the tree? For example, a mutation that increases flowering in a branch might increase seed production from that branch while depressing production from the tree as a whole by acting as a metabolic sink (Walbot, 1985). Finally, should such conflicts occur, do trees have mechanisms to mediate resource allocation among branches, and, if so, are these mechanisms shared among independent tree lineages? New techniques (described above) for finding or constructing whole-genome genetic mosaic and chimeric trees with phenotypically relevant somatic mutations are now allowing these questions to be addressed.
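The dependence of transmission on which sectors flower can be made concrete with a weighted average. Each branch contributes gametes in proportion to its flowering output, and only the branch carrying the mutation contributes mutant gametes. The sketch below assumes a heterozygous mutation fixed in one branch, so half that branch's gametes carry it. It ignores offspring fitness, and all numbers are hypothetical.

```python
def expected_transmission(branches):
    """Expected fraction of a tree's gametes that carry a somatic mutation.

    branches: iterable of (gamete_output, mutant_gamete_fraction) pairs, where
    gamete_output is the relative number of gametes a branch produces and
    mutant_gamete_fraction is the share of that branch's gametes carrying the
    mutation (e.g. 0.5 for a heterozygous mutation fixed in the branch).
    """
    branches = list(branches)
    total = sum(output for output, _ in branches)
    if total <= 0:
        raise ValueError("total gamete output must be positive")
    return sum(output * frac for output, frac in branches) / total


# Hypothetical tree with four equally flowering branches; one carries the mutation.
equal_flowering = [(100, 0.5), (100, 0.0), (100, 0.0), (100, 0.0)]
# Same tree, but the mutant branch produces three times as many gametes.
mutant_flowers_more = [(300, 0.5), (100, 0.0), (100, 0.0), (100, 0.0)]
print(expected_transmission(equal_flowering))      # 0.125
print(expected_transmission(mutant_flowers_more))  # 0.25
```

Under these assumptions, tripling the mutant branch's flowering doubles the mutation's share of the gene pool. This is the kind of branch-level advantage that could conflict with whole-tree fitness.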

In summary, the assembly and annotation of multiple hardwood tree genomes has facilitated an increase in (1) functional characterization of genes and gene networks related to tree habit, (2) GWAS and genomic selection investigations and applications, and (3) comparative genomic efforts among various tree species. Rapidly expanding new technologies will add an even greater number of hardwood species to these efforts. The power of comparative genomics will increase our understanding of how these highly dynamic genomes have evolved and given rise to the remarkable array of phenotypic diversity found among and within hardwood species. To expand the set of available hardwood genomes, resources should be directed toward additional candidate hardwood genera, including but not limited to Liriodendron, Liquidambar, Swietenia, and Acer. In sum, most hardwoods are undomesticated, long-lived plants that provide many ecological and commercial benefits, and their management, conservation and domestication for economic and ecological purposes will benefit from rich genomic resources. Ultimately, such resources will favorably impact the pressing problems of climate change, soil and water conservation, bioenergy and biomaterials production, and maintenance of healthy ecosystem functions. As a result, we may finally answer two questions: (1) why has the tree habit evolved repeatedly in the angiosperms, and (2) what is the connection between the genomes and the defining characteristics of long-lived perennial plants?

#### AUTHOR CONTRIBUTIONS

All authors contributed equally to the formatting and writing contained in this Mini-review.

#### FUNDING

Funding for the pecan genome was provided by grant USDA2016-51181-25408. The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

#### REFERENCES

methods for a keystone forest tree species: oak. BMC Genomics 11:650. doi: 10.1186/1471-2164-11-650


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Tuskan, Groover, Schmutz, DiFazio, Myburg, Grattapaglia, Smart, Yin, Aury, Kremer, Leroy, Le Provost, Plomion, Carlson, Randall, Westbrook, Grimwood, Muchero, Jacobson and Michener. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Road to Resistance in Forest Trees

*Sanushka Naidoo<sup>1</sup>\*, Bernard Slippers<sup>1</sup>, Jonathan M. Plett<sup>2</sup>, Donovin Coles<sup>2</sup> and Caryn N. Oates<sup>1</sup>*

*<sup>1</sup>Division of Genetics, Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, South Africa, <sup>2</sup>Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW, Australia*

In recent years, forests have been exposed to an unprecedented rise in pests and pathogens. This, coupled with the added challenge of climate change, renders forest plantation stock vulnerable to attack and severely limits productivity. Genotypes resistant to such biotic challenges are desired in plantation forestry to reduce losses. Conventional breeding has been a main avenue for obtaining resistant genotypes. More recently, genetic engineering has become a viable approach for developing resistance against pests and pathogens in forest trees. Tree genomic resources have contributed to advances in both approaches. Genome-wide association studies and genomic selection in tree populations have accelerated breeding, while the integration of various levels of omics information facilitates the selection of candidate genes for genetic engineering. Furthermore, tree associations with non-pathogenic endophytic and subterranean microbes play a critical role in plant health and may be engineered in forest trees to improve resistance in the future. We review recent studies in forest trees that describe defense mechanisms using such approaches and propose a way forward for developing superior genotypes with enhanced resistance against biotic stress.

#### *Edited by:*

*Steven Henry Strauss, Oregon State University, United States*

#### *Reviewed by:*

*Armand Seguin, Canadian Forest Service, Canada*

*Richard Buggs, Queen Mary University of London, United Kingdom*

*John M. Davis, University of Florida, United States*

*\*Correspondence: Sanushka Naidoo sanushka.naidoo@fabi.up.ac.za*

#### *Specialty section:*

*This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science*

*Received: 07 August 2018 Accepted: 19 February 2019 Published: 29 March 2019*

#### *Citation:*

*Naidoo S, Slippers B, Plett JM, Coles D and Oates CN (2019) The Road to Resistance in Forest Trees. Front. Plant Sci. 10:273. doi: 10.3389/fpls.2019.00273*

#### Keywords: genomics, breeding, candidate genes, endophytes, genetic engineering

# INTRODUCTION

Forest trees, whether growing in their native range or introduced and domesticated as plantation stock, encounter various biotic and abiotic factors that influence their persistence and productivity. In recent years, introduced pests and pathogens have been increasing at an alarming rate (Santini et al., 2013; Wingfield et al., 2015; Hurley et al., 2016). In some cases, these invasions threaten tree species with local extinction. For example, the fungal pathogen *Cryphonectria parasitica*, which causes chestnut blight, has all but eliminated the American chestnut from North America and has prompted novel transgenic strategies to save this tree (Newhouse et al., 2014). More recently, *Austropuccinia psidii* has been introduced into various regions of the world and threatens some Myrtaceae with extinction (Roux et al., 2013; Granados et al., 2017; McTaggart et al., 2017). Additionally, various pine plantations have suffered losses from pitch canker disease caused by *Fusarium circinatum* (Gordon and Reynolds, 2017). Invasive pests can also cause major losses to forestry operations; for example, introduced *Eucalyptus* plantations have been devastated by the gall wasp *Leptocybe invasa* (Dittrich-Schröder et al., 2012). At the same time, global climate change will exacerbate disease and pest incidence (Sturrock et al., 2011; Das et al., 2016), as is the case with the devastating outbreaks of the mountain pine beetle (Six et al., 2018). The rapid increase in biotic stress factors in many forests and forest plantations requires a matching increase in options to mitigate their impact.

Plant defenses are complex, and multiple signaling pathways may be induced following recognition of the attacker through effectors or molecular patterns specific to the invader (Jones and Dangl, 2006; reviewed in Naidoo et al., 2014). Resistance genes interact with effectors to induce effector-triggered immunity (ETI), accompanied by a hypersensitive response and containment of the invader. In the absence of such specific pathogen recognition, a basal level of defense may be induced during pathogenesis, although this may be too weak to fight off the invading microbe. Dissection of plant biotic stress interactions in model herbaceous plants has extended our knowledge of the sophisticated defense systems deployed against different invaders. A challenge in forest trees is that the underlying resistance mechanisms are not always the same as those described or predicted in model systems. This makes it necessary to study the defense pathways and strategies of tree species directly. Resistance may be polygenic (quantitative) or the result of a single major gene (qualitative; reviewed in Kovalchuk et al., 2013). The latter type of resistance may become ineffective over time because pathogens evolve rapidly. Durable resistance that persists over time and across environments is needed in long-lived forest trees.

Long-term solutions to biotic stress interactions are clearly needed, and an important component is the inclusion of strategies that utilize plant-encoded genetic resistance to a particular stress. Genomic resources for forest trees have supported tree breeding and selection efforts in the form of genome-wide association studies (GWAS) and genomic selection (GS). In addition, genomic resources have improved our ability to infer defense pathways and genes important for producing superior forest trees with resilience against biotic stress. In this minireview, we will highlight recent discoveries in tree systems employing such approaches and propose a way forward for the insightful selection of candidate genes for genetic engineering and improvement of trees.

#### Breeding for Resistance

Conventional tree breeding, based on phenotype selection, has been a successful means of attaining resistance against pests and pathogens (**Figure 1A**). Sniezko and Koch (2017) review some examples of successful resistance breeding programs around the world. An example from Africa is the hybridization of *Eucalyptus grandis* with *Eucalyptus urophylla* which has improved resistance against the fungal pathogen, *Chrysoporthe austroafricana* (Wingfield, 2003).

Several resistance loci have been identified in forest trees by mapping with molecular markers, e.g. *Ppr1* (*Puccinia psidii* resistance 1; Junghans et al., 2003) and loci conferring resistance to fusiform rust. Despite this, marker-aided selection (MAS) has seen limited application in forest systems. Most studies reported quantitative trait loci (QTL) discovery without validating those QTL for MAS itself (reviewed in Sniezko and Koch, 2017), and MAS has recently been superseded by genomic selection (GS). Grattapaglia et al. (2018) provide a critical review of GS in this special issue. In GS, associations are made between the phenotypes and genotypes of a structured training population, and a predictive model is built that incorporates as many desirable traits as possible. The model is refined using additional populations and is then used to predict desired phenotypes from genotypic information alone (Isik, 2014).
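The core GS step of fitting all marker effects jointly in a training population, then predicting untested individuals from genotypes alone, can be sketched with ridge regression (an RR-BLUP-style shrinkage model). This is an illustrative assumption, not the model used in any study cited here. The simulated data, the shrinkage value `lam=100.0`, and the function names are hypothetical. Real programmes tune shrinkage, handle multiple traits and account for population structure.

```python
import numpy as np


def fit_ridge_gs(X_train, y_train, lam):
    """Fit all marker effects jointly with ridge regression (RR-BLUP-style).

    X_train: (individuals, markers) genotype dosages coded 0/1/2.
    y_train: phenotypes of the training population.
    lam: shrinkage parameter (an assumed value; normally tuned by
         cross-validation or derived from variance components).
    """
    means = X_train.mean(axis=0)
    Z = X_train - means                           # centre each marker
    mu = y_train.mean()
    # Solve (Z'Z + lam * I) beta = Z'(y - mu) for every marker effect at once.
    A = Z.T @ Z + lam * np.eye(Z.shape[1])
    beta = np.linalg.solve(A, Z.T @ (y_train - mu))
    return means, mu, beta


def predict_gebv(X, means, mu, beta):
    """Genomic estimated breeding values from genotypes alone."""
    return mu + (X - means) @ beta


# Simulated (hypothetical) data: 1,000 trees, 200 SNPs, 20 causal loci, h2 ~ 0.5.
rng = np.random.default_rng(1)
n_trees, n_snps = 1000, 200
X = rng.binomial(2, 0.5, size=(n_trees, n_snps)).astype(float)
effects = np.zeros(n_snps)
causal = rng.choice(n_snps, size=20, replace=False)
effects[causal] = rng.normal(0.0, 1.0, size=20)
genetic_value = X @ effects
y = genetic_value + rng.normal(0.0, genetic_value.std(), size=n_trees)

# Train on the first 800 trees; predict the remaining 200 from genotypes only.
means, mu, beta = fit_ridge_gs(X[:800], y[:800], lam=100.0)
gebv = predict_gebv(X[800:], means, mu, beta)
predictive_ability = np.corrcoef(gebv, y[800:])[0, 1]
print(f"predictive ability r = {predictive_ability:.2f}")
```

The correlation between predicted values and observed phenotypes in the held-out trees (predictive ability) is the usual yardstick for comparing GS models.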

In GWAS, associations are made between phenotype and genotype in a structured breeding population (Isik, 2014), and it is imperative that these associations are validated in a second population. The method relies on a high density of markers and a large discovery population; however, owing to limited genome information, high marker density is not always achievable. In the case of white pine blister rust, a preliminary analysis using a small set of single-nucleotide polymorphism (SNP) markers was undertaken to identify partial resistance to the disease in *Pinus lambertiana* (Vázquez-Lobo et al., 2017), and significant associations were shown for four of the SNP markers. In Norway spruce, exon-capture sequencing was applied genome-wide to identify genes associated with susceptibility to *Heterobasidion parviporum* (Mukrimin et al., 2018). Muchero et al. (2018) identified *Populus trichocarpa* loci associated with the response to the fungus *Sphaerulina musiva* based on genome resequencing. Three loci were associated with resistance, and one, encoding a G-type D-mannose–binding receptor-like kinase, was associated with susceptibility. This example demonstrates that candidate defense or susceptibility genes can be identified in a GWAS. For SNP markers, candidate genes within the genomic loci associated with resistance may be inferred from the genome sequence. Further information on the relevance of each gene during a biotic stress challenge can then be obtained using functional genomics, e.g. transcriptomics or proteomics. If genes in these associated regions are differentially expressed under the biotic stress challenge, they may be contributing to the resistance phenotype.
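The two steps described above can be sketched as follows. First, each SNP is tested separately for association with the phenotype. Second, genes near significant SNPs are intersected with genes that are differentially expressed under the stress challenge. The sketch assumes a simple per-SNP linear regression with a normal approximation to the test statistic and a Bonferroni threshold. Unlike real GWAS software, it does not correct for population structure or kinship. Gene names, coordinates, window size and simulated data are all hypothetical.

```python
import math

import numpy as np


def single_marker_scan(X, y):
    """Two-sided p-value for each SNP from a simple regression y ~ dosage.

    Uses a normal approximation to the t statistic (reasonable for large
    samples). Population structure and kinship are NOT modelled here.
    """
    n = len(y)
    yc = y - y.mean()
    pvals = np.ones(X.shape[1])
    for j in range(X.shape[1]):
        x = X[:, j] - X[:, j].mean()
        sxx = x @ x
        if sxx == 0:                      # monomorphic SNP: nothing to test
            continue
        beta = (x @ yc) / sxx
        resid = yc - beta * x
        se = math.sqrt((resid @ resid) / (n - 2) / sxx)
        pvals[j] = math.erfc(abs(beta / se) / math.sqrt(2))
    return pvals


def candidate_genes(hit_positions, genes, window, de_genes):
    """Genes within +/- window bp of a significant SNP that are also
    differentially expressed (single chromosome assumed)."""
    return sorted(
        name for name, (start, end) in genes.items()
        if name in de_genes
        and any(start - window <= p <= end + window for p in hit_positions)
    )


# Hypothetical data: 500 trees, 100 SNPs spaced 10 kb apart; SNP 42 is causal.
rng = np.random.default_rng(3)
X = rng.binomial(2, 0.4, size=(500, 100)).astype(float)
y = 0.8 * X[:, 42] + rng.normal(0.0, 1.0, size=500)
snp_pos = np.arange(100) * 10_000

pvals = single_marker_scan(X, y)
hits = np.flatnonzero(pvals < 0.05 / len(pvals)).tolist()   # Bonferroni
genes = {"geneA": (100_000, 105_000), "geneB": (418_000, 421_000),
         "geneC": (700_000, 702_000)}
de_genes = {"geneB", "geneC"}      # e.g. from an RNA-seq experiment
print(hits, candidate_genes([snp_pos[j] for j in hits], genes, 20_000, de_genes))
```

Requiring both positional support (GWAS) and expression support (transcriptomics) narrows an associated interval to a short list of candidates worth functional testing.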

These approaches can be made even more powerful with regional heritability mapping (RHM). RHM captures variance not accounted for in GWAS. It examines short segments of the genome, combining the effects of rare and common SNP variants among individuals to estimate the trait variance explained by each region (Nagamine et al., 2012). Resende et al. (2017) compared RHM and GWAS approaches in *Eucalyptus* for mapping wood traits and *Puccinia psidii* disease phenotypes. RHM was considered superior, revealing more genomic regions that could be interrogated for underlying candidate defense genes. Additionally, the authors suggest that the genomic architecture revealed by the RHM-QTLs could be used to enhance genomic selection models.
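The idea behind RHM is to build a genomic relationship matrix from the SNPs in one short window and ask how much phenotypic variance that window explains. The sketch below estimates regional heritability with Haseman–Elston regression: products of standardised phenotypes for each pair of trees are regressed on their regional relatedness. This is a simple method-of-moments substitute chosen for brevity. Published RHM analyses typically fit variance components by REML instead. Window sizes, allele frequencies and simulated effects are hypothetical.

```python
import numpy as np


def standardise(X_region):
    """Centre and scale each SNP by its allele frequency; drop monomorphic SNPs."""
    p = X_region.mean(axis=0) / 2
    keep = (p > 0) & (p < 1)
    return (X_region[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))


def regional_grm(X_region):
    """Genomic relationship matrix built only from the SNPs in one window."""
    Z = standardise(X_region)
    return Z @ Z.T / Z.shape[1]


def he_regional_h2(y, A):
    """Haseman-Elston regression: slope of y_i * y_j (standardised phenotypes)
    on the regional relationship A_ij over all pairs i < j."""
    ys = (y - y.mean()) / y.std()
    iu = np.triu_indices(len(y), k=1)
    a, prod = A[iu], np.outer(ys, ys)[iu]
    return np.cov(a, prod)[0, 1] / np.var(a, ddof=1)


# Hypothetical data: 400 trees, 5 windows of 40 SNPs; only window 2 carries
# genetic variance, explaining about 40% of the phenotypic variance.
rng = np.random.default_rng(7)
n, n_windows, w = 400, 5, 40
freqs = rng.uniform(0.1, 0.9, size=n_windows * w)
X = rng.binomial(2, freqs, size=(n, n_windows * w)).astype(float)
Zc = standardise(X[:, 2 * w:3 * w])
g = Zc @ rng.normal(size=Zc.shape[1])
g = g / g.std() * np.sqrt(0.4)
y = g + rng.normal(0.0, np.sqrt(0.6), size=n)

h2_by_window = [he_regional_h2(y, regional_grm(X[:, k * w:(k + 1) * w]))
                for k in range(n_windows)]
print(np.round(h2_by_window, 2))
```

Because each window's SNPs are pooled into one relationship matrix, many small-effect or rare variants in a region can add up to a detectable signal even when no single SNP passes a GWAS threshold.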

An exciting prospect in forest trees would be to follow the approach of Thoen et al. (2017), who performed GWAS in *Arabidopsis* for responses to 11 different stresses separately and examined combinations of biotic and abiotic stress responses as phenotypes to explore the genetic architecture underlying plant immunity. Contrasting and overlapping SNP associations were identified in biotic and abiotic stress combinations. It would be interesting to see whether similar patterns are apparent in forest trees for which high-density SNP markers are available, e.g. *Eucalyptus*.

next steps are to test the function of such genes in model systems, and this information is updated in the database. Successful candidates conferring a degree of resistance can be introduced into the forest tree of interest, followed by intensive laboratory and field testing (blue trees). The transgenic tree conferring resistance may be bred into various clones selected through breeding programs to pyramid resistance.

## Genetic Engineering for Tree Resistance

In the absence of natural genetic variation for resistance against a particular biotic stress, genetic engineering may become necessary. The steps toward engineering resistance include gene discovery, candidate gene selection, transformation, and testing (**Figure 1B**). Gene discovery has been aided by omics approaches in forest trees.

#### Comparative Genomics

A trend observed in recently sequenced tree genomes suggests an increase in the number of nucleotide-binding leucine-rich repeat (NLR) genes compared with herbaceous plants. This may be attributed to their greater exposure to pathogens over their longer lifetimes (Yang et al., 2008; Neale et al., 2017). Specific pathogenesis-related genes were also expanded in different forest tree species, such as *Pinus tecunumanii*, compared to other sequenced genomes (Visser et al., 2018).

#### Transcriptomics and Proteomics

Host-pathogen interaction studies have benefited from transcriptomic approaches. These have gained popularity in tree species where comparisons are made between species with disparate tolerance to pathogens (Mangwanda et al., 2015; Meyer et al., 2016). The dual RNA-sequencing approach allows host and pathogen or pest to be studied together, as transcripts from both organisms are captured simultaneously (Naidoo et al., 2017). Host tree responses, including defense mechanisms, and the corresponding virulence mechanisms in the pathogen can thus be identified. Examples include studies on *Notholithocarpus densiflorus* and the oomycete *Phytophthora ramorum* (Hayden et al., 2014), *Populus* and the fungal pathogen *Septoria musiva* (Liang et al., 2014), *E. grandis* and *Chrysoporthe austroafricana*, and *Eucalyptus nitens* and *Phytophthora cinnamomi* (Mangwanda et al., 2015; Meyer et al., 2016). Proteomics offers another level of functional relevance for genes involved in plant defense. iTRAQ (Isobaric Tags for Relative and Absolute Quantitation) was applied in *Eucalyptus* challenged with *C. austroafricana* and *Calonectria pseudoreteaudii* (Chen et al., 2015; Zwart et al., 2017). In the interaction with *C. austroafricana*, proteins specific to cell death, salicylic acid signaling and systemic resistance were more apparent than in the corresponding transcriptomic study (Mangwanda et al., 2015). Thus, proteomics may provide novel insight into plant-microbe interactions, as there is not a one-to-one relationship between transcript levels and protein abundance.
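In a dual RNA-seq library, pathogen reads are often a small fraction of the total. A common practice is therefore to split the count table by organism and normalise each organism's counts separately. The minimal sketch below uses counts per million (CPM) for that step. It assumes gene IDs can be assigned to host or pathogen by a prefix, and the IDs and counts are hypothetical. Real analyses would use dedicated normalisation and differential-expression tools.

```python
def split_and_cpm(counts, host_prefix, pathogen_prefix):
    """Split a dual RNA-seq count table by organism and normalise each
    organism's counts to counts per million (CPM) separately."""
    result = {}
    for organism, prefix in (("host", host_prefix), ("pathogen", pathogen_prefix)):
        subset = {gene: c for gene, c in counts.items() if gene.startswith(prefix)}
        total = sum(subset.values())
        # Library size is computed per organism, not over the whole library.
        result[organism] = {gene: (c / total * 1e6 if total else 0.0)
                            for gene, c in subset.items()}
    return result


# Hypothetical counts from one infected sample.
counts = {"Eucgr.A001": 900, "Eucgr.A002": 100,
          "Pathogen.0001": 30, "Pathogen.0002": 10}
cpm = split_and_cpm(counts, host_prefix="Eucgr.", pathogen_prefix="Pathogen.")
print(cpm["host"]["Eucgr.A001"], cpm["pathogen"]["Pathogen.0001"])
```

With whole-library normalisation, the pathogen genes here would look almost silent (30 of 1,040 reads). Per-organism scaling keeps changes in pathogen gene expression visible across samples with different infection loads.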

#### Metagenomics

Metagenomic studies in forest trees have typically found thousands of taxa associated with individual trees, from tropical forests to the boreal region, and have linked these microbiomes to tree health (e.g. Kemler et al., 2013; Kembel and Mueller, 2014; Hacquard and Schadt, 2015; Agler et al., 2016; Jakuschkin et al., 2016; Vivas et al., 2017; Bullington et al., 2018). Above- and below-ground endophytic microbes can contribute to host immunity through multiple direct and indirect (*via* the host) interactions. Indirectly, endophytes can affect pathogens and pests by changing host physiology in ways that alter growth or prime the immune response *via* induced systemic resistance (ISR) and plant volatile production (Hardoim et al., 2015; Brader et al., 2017). Endophytic microbes can also affect pathogens and pests directly through competition, exclusion, and antibiosis (Agler et al., 2016; Bamisile et al., 2018). A further mechanism by which endophytic microbes, especially mycorrhizal fungi, may affect plant pathogens and pests is by altering the nutritional status of the plant (Gange et al., 2005; Lehr et al., 2007; Pfabel et al., 2012; Kaling et al., 2018). In future, therefore, we need to deepen our understanding of how mutualistic colonization of tree tissues affects pest and pathogen interactions by combining a range of omic approaches to understand the molecular mechanisms at play.
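When a metagenomic survey returns thousands of taxa per tree, a common first summary is an alpha-diversity index per sample. The sketch below computes the Shannon index, H' = -Σ p_i ln p_i, where p_i is the proportion of reads assigned to taxon i. The read counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per taxon for two leaf samples.
even_sample = [10, 10, 10, 10]   # four equally abundant taxa
dominated_sample = [37, 1, 1, 1]  # one taxon dominates
print(round(shannon_index(even_sample), 4))   # 1.3863, i.e. ln(4)
print(round(shannon_index(dominated_sample), 4))
```

The index is highest when reads are spread evenly across many taxa and falls toward zero as a single taxon dominates. Shifts in it are one simple way to flag microbiome disturbance before a symptom appears.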

Advances in our knowledge of how beneficial microbes contribute to plant health, and in our understanding of holistic plant systems, are opening new avenues. We can begin to consider actively selecting tree genotypes that foster microbiomes which improve plant health (Mueller and Sachs, 2015). Past research has demonstrated that various aspects of the root and/or foliar microbiome can be controlled through single-gene alterations of different metabolic pathways within the tree (Beckers et al., 2016; Veach et al., 2018). These examples are a proof of concept that minute changes to a tree's genome can have significant impacts on microbial populations. The question now remains whether we can intentionally breed or engineer tree genotypes that specifically foster or repress the growth of specific microbes within plant tissues. While much research is needed in this area for tree species of economic importance, advances have been made in agricultural crops (Besserer et al., 2006; Fierer et al., 2007; Carvalhais et al., 2015). Should similar pathways be targeted in tree systems, we might similarly be able to improve tree health through naturally present microbial populations.
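Testing whether a single-gene alteration shifts a tree's microbiome usually involves comparing community composition between modified and control trees. The sketch below uses Bray–Curtis dissimilarity, one widely used choice among several: 1 - 2 Σ min(a_i, b_i) / (Σ a_i + Σ b_i). It ranges from 0 (identical communities) to 1 (no shared taxa). The taxa and counts are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon-count profiles
    (dicts mapping taxon -> read count)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    return 1 - 2 * shared / total


# Hypothetical root-associated bacterial counts for a control and a modified tree.
control = {"Pseudomonas": 50, "Bacillus": 30, "Streptomyces": 20}
modified = {"Pseudomonas": 20, "Bacillus": 30, "Streptomyces": 50}
print(round(bray_curtis(control, modified), 2))  # 0.3
```

In practice such pairwise dissimilarities are computed across many replicate trees and tested statistically, because microbiomes vary considerably even among genetically identical hosts.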

#### Metabolomics

Metabolomics studies of forest tree-biotic stress interactions have been limited to specific secondary metabolites, e.g. terpenes (Oates et al., 2015; Naidoo et al., 2018). They have used methods such as gas or liquid chromatography coupled to mass spectrometry (GC-MS and LC-MS, respectively) and near-infrared reflectance (NIR) technologies. Metabolomics in European common ash, *Fraxinus excelsior*, revealed different profiles associated with resistance and tolerance to the fungus *Hymenoscyphus fraxineus* (Sambles et al., 2017). Where possible, combining different approaches, e.g. transcriptomics and metabolomics, enhances the selection of pathways implicated in defense. This was applied in pedunculate oak (Kersten et al., 2013). Oaks resistant and susceptible to the leaf roll miner herbivore had specific volatile profiles, and susceptible trees produced a volatile blend more attractive to the pest. At the transcriptome level, resistance was attributed to constitutive rather than induced defenses. Plant-plant interactions play an important role in defense signaling against pathogens (for a recent review, see Subrahmaniam et al., 2018, who identify candidate genes for plant-plant interactions among cell wall modification and defense pathways). This phenomenon has been studied in trees, to a limited extent, in the form of indirect defenses. Indirect defenses involve the release of volatile compounds that attract natural enemies of the pest or warn neighboring plants of the threat (Unsicker et al., 2009). This has been demonstrated in elm (*Ulmus minor*) responses to the elm leaf beetle, *Xanthogaleruca luteola*, where oviposition altered volatile profiles, attracting the specialist egg parasitoid *Oomyzus gallerucae* (Meiners and Hilker, 2000). *Eucalyptus* also uses these signals to prime neighboring plants; e.g. feeding by *Ctenarytaina eucalypti* induced volatile responses that increased defense-related compounds in unwounded plants (Troncoso et al., 2012). Oviposition- or herbivory-induced chemical cues released by plants have been shown to serve as reliable indicators of the presence of prey to predators and parasitoids of the pest (Hilker et al., 2002; Bruinsma et al., 2009).

While these applications have progressed rapidly in forestry research, phenomics still lags behind. Phenomics involves the high-throughput capture of an organism's phenotypes, as influenced by its genotype and by genotype × environment interactions. Phenomics informs the genotype-phenotype map (reviewed in Houle et al., 2010), and new imaging technologies are being pursued to achieve this goal. Ludovisi et al. (2017) describe the use of unmanned aerial vehicle (UAV) remote sensing and imaging to enable high-throughput field phenotyping. The authors applied this to black poplar populations to capture precise phenotypes for drought responses. These techniques are equally promising for biotic stress phenotyping to accelerate the development of superior genotypes.
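Image-based phenotyping typically reduces raw multispectral reflectance to per-tree indices. As one illustration, the sketch below computes the normalised difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), per pixel and averages it over a tree crown. NDVI is a generic vegetation index chosen here for illustration. It is not claimed to be the index used by Ludovisi et al. (2017). The reflectance values and crown mask are hypothetical.

```python
import numpy as np


def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); pixels with zero total
    reflectance are set to 0 instead of dividing by zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    return np.divide(nir - red, total, out=np.zeros_like(total), where=total != 0)


def crown_mean_ndvi(index_map, crown_mask):
    """Average index over the pixels belonging to one tree crown."""
    return float(index_map[crown_mask].mean())


# Hypothetical 2 x 3 reflectance tiles from a multispectral UAV image.
nir = [[0.50, 0.60, 0.45], [0.40, 0.55, 0.0]]
red = [[0.10, 0.20, 0.15], [0.20, 0.05, 0.0]]
crown = np.array([[True, True, False], [False, True, False]])
index_map = ndvi(nir, red)
print(np.round(index_map, 3), round(crown_mean_ndvi(index_map, crown), 3))
```

Repeating such per-crown summaries across flights yields time series for every tree in a trial. A disease or pest phenotype would then be a change in those series, which still needs ground-truthing against field scores.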

With the generation of multiple levels of omics data, systems biology views of organisms become possible, providing a framework for linking molecular interactions with complex traits (Mizrachi et al., 2017). Information integrated from different experiments can be used to generate a network that models the dynamics and complexity of a biological system. Model development is iterative: each model is refined as new data are incorporated. Studies in *Arabidopsis* show the power of this approach to uncover key mechanisms in plant defense (Mukhtar et al., 2011). A related approach is systems genetics (also known as genetical genomics), which combines the genetic variation among organisms with systems-level phenotyping to dissect complex traits (Jansen and Nap, 2001). Tree breeding programs provide the structured populations needed for such studies (**Figure 1B**). Molecular mechanisms underlying wood properties in *Eucalyptus* were characterized in this manner (Mizrachi et al., 2017), and systems genetics studies of tree biotic stress interactions are not far behind.
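A simple entry point to the network modeling described above is a gene co-expression network. Here genes are nodes, and an edge links two genes whose expression profiles are strongly correlated across samples. The sketch below uses a hard threshold on the absolute Pearson correlation. The threshold of 0.9, the gene names and the expression values are illustrative assumptions. Published pipelines use many more samples and more robust similarity measures.

```python
import numpy as np


def coexpression_edges(expr, genes, threshold=0.9):
    """Edges between genes whose profiles have |Pearson r| >= threshold.

    expr: (n_genes, n_samples) array of normalised expression values.
    genes: gene names in the same row order as expr.
    """
    r = np.corrcoef(expr)                 # gene-by-gene correlation matrix
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(r[i, j]) >= threshold:
                edges.append((genes[i], genes[j], round(float(r[i, j]), 3)))
    return edges


# Hypothetical expression of four genes across six infection time points.
genes = ["g1", "g2", "g3", "g4"]
expr = np.array([[1, 2, 3, 4, 5, 6],       # steadily induced
                 [3, 5, 7, 9, 11, 13],     # induced in step with g1
                 [6, 5, 4, 3, 2, 1],       # repressed as g1 rises
                 [2, 1, 2, 1, 2, 1]],      # unrelated oscillation
                dtype=float)
edges = coexpression_edges(expr, genes)
print(edges)
```

Keeping negative correlations as edges, as here, retains genes that are repressed while defense genes are induced. Many analyses treat the sign separately.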

## SELECTION OF CANDIDATE DEFENSE GENES

The wealth of omics data being generated for trees during pest and pathogen challenge requires concomitant development of resources that give the scientific community access to it. User-friendly databases for analyzing and visualizing omics data in woody perennials are increasingly available and can assist with identifying key genes and pathways for resistance. The TreeGenes database (Wegrzyn et al., 2008<sup>1</sup>) is a forest tree genomics resource for the integration and analysis of genome sequences, transcriptomes, genetic maps, molecular markers, and phenotypic data for 1,749 species. The Plant Genome Integrative Explorer (PlantGenIE; Sundell et al., 2015<sup>2</sup>) provides access to genomic and transcriptomic data for *Populus*, *Arabidopsis*, *Eucalyptus*, and conifers. PlantGenIE contains tools for expression analyses and visualization of gene co-expression networks within and across forest species (Netotea et al., 2014). Gramene (Tello-Ruiz et al., 2016<sup>3</sup>) contains the Plant Reactome database for analysis of plant metabolic and regulatory pathways. MorphDB (Zwaenepoel et al., 2018<sup>4</sup>) assists in identifying missing genes in pathways and regulators such as transcription factors or other signaling genes. Such approaches can place potential resistance genes in a functional context relative to known resistance genes and thus assist in identifying and prioritizing candidates (Rhee and Mutwil, 2014). These databases should feed into the development of predictive network models that capture the potential functional roles of genes within a pathway (**Figure 1B**). Mewalal et al. (2014) provide an insightful review of identifying and prioritizing genes using omic approaches to guide new hypothesis-driven experiments. Once candidate genes are prioritized and selected, they are tested in model systems for a disease- or pest-tolerant phenotype. This has been demonstrated for *P. trichocarpa*: overexpression of a salicylic acid-inducible gene, *PtrWRKY73*, in *Arabidopsis* increased resistance to *Pseudomonas syringae* (biotroph) but reduced resistance to *Botrytis cinerea* (necrotroph) (Duan et al., 2015). Another study used *Nicotiana tabacum* overexpressing the antimicrobial protein Sp-AMP2 (PR-19) from *Pinus sylvestris* L., which enhanced resistance to *B. cinerea* (Jaber et al., 2017). Following successful or unsuccessful demonstration of a candidate gene's function, the model can be refined and testing repeated until a desired phenotype is observed (**Figure 1B**). The gene is then introduced into the desired forest tree species *via* genetic transformation. Candidate genes may be overexpressed, knocked down, or knocked out. Jiang et al. (2017) used CRISPR/Cas9 knockouts of candidate *WRKY* transcription factors to unravel regulatory mechanisms in *Populus* during interaction with *Melampsora* rust. Once successful transgenic lines are obtained, extensive functional testing under different conditions is carried out to determine whether the candidate gene confers biotic stress resistance. One limitation of producing transgenic trees is that transformation may be optimized only for specific clonal backgrounds. This can be circumvented by using transgenic trees as parents in breeding programs to transfer the desired resistant phenotype to other genetic backgrounds (**Figure 1**).
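One simple way to place candidates "in a functional context relative to known resistance genes" is guilt-by-association. Each candidate is ranked by the fraction of its network neighbors that are already known defense genes. The sketch below assumes an undirected network given as an edge list, for example from co-expression. All gene names and edges are hypothetical, and real prioritization would combine this score with other evidence.

```python
def prioritise(candidates, edges, known_defense):
    """Rank candidates by the fraction of their network neighbours that are
    known defense genes (simple guilt-by-association score)."""
    neighbours = {}
    for a, b in edges:                       # build an undirected adjacency map
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    scores = {}
    for gene in candidates:
        nbrs = neighbours.get(gene, set())
        scores[gene] = len(nbrs & known_defense) / len(nbrs) if nbrs else 0.0
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical network around three candidate genes.
edges = [("candA", "known_R1"), ("candA", "known_R2"), ("candA", "geneX"),
         ("candB", "geneX"), ("candB", "geneY"), ("candC", "known_R2")]
known_defense = {"known_R1", "known_R2"}
ranking = prioritise(["candA", "candB", "candC"], edges, known_defense)
print(ranking)  # candC, candA, candB in that order
```

Note that candC outranks candA on a single edge. Weighting scores by the number of neighbours, or by edge confidence, is a common refinement to avoid over-trusting sparsely connected genes.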

## ADVANCES TOWARD ENGINEERING RESISTANCE

Forest biotechnology companies have been developing transgenic *Populus* and *Eucalyptus* trees with enhanced biomass and stress resistance. ArborGen Inc. has developed transgenic freeze-tolerant *Eucalyptus* (Zhang et al., 2012), and SweTree Technologies has developed transgenic hybrid aspen (*Populus tremula* × *Populus tremuloides*) with enhanced growth properties (Eriksson et al., 2006). FuturaGene Ltd. has developed and commercialized the first genetically modified *Eucalyptus* tree, with 20% more biomass ("Brazil approves transgenic eucalyptus," 2015), and developments toward commercializing disease-resistant trees (Avisar et al., 2013) are underway. Additionally, genome editing holds great promise for accelerated breeding in forestry (Bewg et al., 2018). Therefore, once genes or metabolic pathways crucial to disease resistance are identified, we will be able to manipulate these pathways quickly and make improved plants available to industry faster than ever before.

# CONCLUSION

With the unprecedented increase in forest pests and pathogens, both avenues for generating resistant trees are necessary, and breeding and genetic engineering could be combined to accelerate the achievement of resistance in forestry. Genomic tools are enhancing our rate of gene discovery in forest trees; however, the vast amounts of data must be more centralized and accessible to identify candidate genes for testing. As in model systems, tree immunity should increasingly be viewed from a holistic, systems-based perspective that incorporates the complexity of the biotic interactions outside, within and below the tree. This includes the potential for direct manipulation of host-associated microbes for tree resistance. It also includes the recognition that genetic engineering of the host affects the microbiome, with potentially important consequences for tree resistance. A concerted effort is needed to make engineered resistance a reality in forestry.

<sup>1</sup> https://treegenesdb.org/Drupal/

<sup>2</sup> http://plantgenie.org/

<sup>3</sup> http://www.gramene.org/

<sup>4</sup> http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/

#### AUTHOR CONTRIBUTIONS

SN determined the scope of the review. SN, BS, JMP, CNO and DC wrote different sections of the review and contributed equally. SN and CNO designed the figure and all authors contributed to final edits.

#### FUNDING

SN is supported through the Department of Science and Technology grant for Forest Genomics and Biotechnology, the South African National Research Foundation Grant for Y-rated researchers (UID105767), Incentive funding for rated researchers (UID95807), and the Technology and Human Resources for Industry Programme (THRIP, Grant ID 96413). Opinions expressed and conclusions arrived at are those of the author(s) and are not necessarily to be attributed to the NRF. JMP's research is funded by the Australian Research Council (DP160102684).

#### ACKNOWLEDGMENTS

We thank the three reviewers for their constructive comments.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Naidoo, Slippers, Plett, Coles and Oates. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Cold Hardiness in Trees: A Mini-Review

#### Michael Wisniewski<sup>1</sup> \*, Annette Nassuth<sup>2</sup> \* and Rajeev Arora<sup>3</sup> \*

<sup>1</sup> United States Department of Agriculture – Agricultural Research Service, Kearneysville, WV, United States, <sup>2</sup> Department of Molecular and Cellular Biology, University of Guelph, Ontario, ON, Canada, <sup>3</sup> Department of Horticulture, Iowa State University, Ames, IA, United States

Significant advances have been made in our understanding of the regulation of cold hardiness. The existence of numerous biophysical and biochemical adaptive mechanisms in perennial woody plants, and the complexity of their regulation, have made it very difficult to develop methods for managing and improving cold hardiness in these plants. This may be partially attributed to viewing cold hardiness as a single-dimensional response rather than as a complex phenomenon that involves different mechanisms (avoidance and tolerance) and different stages (mid-winter vs. late winter), and that overlaps intimately with the genetic regulation of dormancy. In particular, separating the molecular regulation of cold hardiness from growth processes has been challenging. ICE (Inducer of CBF expression) and CBF (C-repeat binding factor, also known as CRT-binding factor) transcription factors have been shown to play an important role in the regulation of cold-induced gene expression. Evidence has emerged, however, that they are also intimately involved in the regulation of growth, flowering, dormancy, and stomatal development. This evidence includes the presence of CBF-binding motifs in genes regulating these processes, as well as cross-talk between the pathways that regulate them. Recent changes in climate that have resulted in erratic episodes of unseasonal warming followed by more seasonal patterns of low temperatures have also highlighted the need to better understand the genetic and molecular regulation of deacclimation, a topic that has only recently begun to be addressed. Environmentally induced epigenetic regulation of stress responses and of seasonal processes such as cold acclimation, deacclimation, and dormancy has been documented but is still poorly understood.
Advances in the ability to efficiently generate large DNA and RNA datasets, and in genetic transformation technologies, have greatly increased our ability to explore the regulation of gene expression and genetic diversity. Greater knowledge of the interplay between epigenetic and genetic regulation of cold hardiness, along with the application of advanced genetic analyses such as genome-wide association studies (GWAS), is needed to develop strategies for addressing the complex processes associated with cold hardiness in woody plants. A cautionary note is also warranted regarding the time-scale needed to examine and interpret plant responses to freezing temperatures if progress is to be made in developing effective approaches for manipulating and improving cold hardiness.

Keywords: freezing tolerance, ice nucleation, cold acclimation, deacclimation, dormancy, C-repeat binding factor (CBF), DAM genes, antifreeze protein (AFP)

#### Edited by:

Steven Henry Strauss, Oregon State University, United States

#### Reviewed by:

David Horvath, Red River Valley Agricultural Research Center, Agricultural Research Service (USDA), United States

Sofia Valenzuela, Universidad de Concepción, Chile

#### \*Correspondence:

Michael Wisniewski michael.wisniewski@ars.usda.gov

Annette Nassuth anassuth@uoguelph.ca

Rajeev Arora rarora@iastate.edu

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 19 July 2018 Accepted: 03 September 2018 Published: 20 September 2018

#### Citation:

Wisniewski M, Nassuth A and Arora R (2018) Cold Hardiness in Trees: A Mini-Review. Front. Plant Sci. 9:1394. doi: 10.3389/fpls.2018.01394

# INTRODUCTION


Ever since the first microscopic observations of the freezing response of cells were made in the late 19th and early 20th centuries (Molisch, 1897; Wiegand, 1906), and it was discovered that plant cells undergo cytorrhysis rather than plasmolysis in response to freezing, researchers have pursued a complete and integrated understanding of cold hardiness and freezing tolerance in plants, a goal that has remained elusive (Wisniewski et al., 2003; Gusta and Wisniewski, 2013; Arora, 2018). Despite thousands of reports and countless reviews, reliable approaches to improving freezing tolerance without affecting other aspects of plant development have yet to be developed, at either the molecular/genetic or the physiological level. New technologies have allowed us to understand plant responses to low temperatures in ever greater detail, but the picture has also become far more complex.

The lack of progress may be partially attributable to two factors. The first is the tendency to interpret cold hardiness as a single on/off response rather than as a combination of many diverse mechanisms that involve significant structural, biochemical, and genetic adjustments, compounded by the difficulty of manipulating cold hardiness without negatively affecting other plant developmental processes. The characteristics of these components are species-specific (often genotype-specific) and potentially under separate genetic control. Therefore, when investigating plant cold hardiness, it is essential to be aware of which aspect of the process is being studied and of its potential impact on the aspect of cold hardiness deemed critical for survival. The second factor is the difficulty of studying the biology of organisms at low temperatures, where the kinetics of reactions and the time required for processes to reach equilibrium can be problematic when conducting experiments. As noted in Gusta and Wisniewski (2013), the admonition made by Felix Franks in his book on the biophysics of water at low temperature (Franks, 1985) is very relevant: "Too frequently experimental observations on highly complex systems are based on measurements performed under nonequilibrium conditions and rationalized in terms of elementary textbook science. The degree of undercooling (mostly presented using the incorrect terminology, supercooling), the mechanism of ice nucleation, the growth and type of crystals, their size and distribution, the flow properties of the unfrozen matrix, and longterm effects of aging, all need to be taken into account." Franks's (1985) book still serves as an invaluable primer on low-temperature biology.

Cold hardiness adaptations in plants have been divided into two general categories: tolerance and avoidance. The former involves transcriptomic reprogramming and a host of subsequent biochemical changes that allow plants to tolerate freezing temperatures and the presence of ice in their tissues, while the latter involves mechanisms that allow pockets of water to remain undercooled (deep supercooling) to very low sub-zero temperatures (−20 to −40 °C), so that the supercooled cells are not exposed to the dehydrative effects associated with a freeze-tolerance response (often referred to as extracellular freezing). Deep supercooling is characteristic of the dormant buds of many woody perennials and of the xylem parenchyma cells of many temperate tree species. The terms freeze tolerance and avoidance, however, though widely used, are somewhat inaccurate, as in both cases cells are avoiding freezing. In the case of freezing tolerance, this is accomplished by the loss of cellular water to extracellular ice, which lowers the freezing point of the cytoplasm. In the case of deep supercooling, water is not relocated to sites of extracellular ice, even though extracellular ice is present, but instead remains in a metastable condition, prone to "flash" intracellular freezing (Fujikawa et al., 2009; Wisniewski et al., 2014b). Processes relevant to these strategies include ice nucleation and propagation (Wisniewski et al., 2009, 2014b), the ability to specifically determine where ice crystals are initiated in plant tissues and what shape they form as they grow (McCully et al., 2004), and the formation of cryoprotective and antifreeze compounds (Duman and Wisniewski, 2014).

Despite the complexity of plant cold hardiness, considerable progress has been made in understanding its various components (Gusta and Wisniewski, 2013). This mini-review highlights one area in which considerable progress has been made, the genetic regulation of cold acclimation, and another, deacclimation, that deserves considerably more attention because of the erratic patterns of warming and cooling temperatures that have developed in the context of climate change. These highly variable weather patterns have had a major impact on dormancy, cold acclimation, and chilling requirements.

# THE MOLECULAR REGULATION OF PLANT COLD HARDINESS

Because plants cannot move, they must adapt to stressful environments. Genes encoding transcription factors constitute 6–10% of the genome of the model plant Arabidopsis, compared with 5% in humans, so it is not surprising that adaptation to stress in plants involves dramatic changes in transcriptional cascades (Pireyre and Burow, 2015). The C-repeat binding factor (CBF) transcription factor pathway has been demonstrated to play an exceptionally important role in plant cold acclimation, a process in which low temperatures lead to biochemical and physiological changes that confer freezing tolerance. These changes are largely associated with the expression of so-called Cold Responsive (COR) genes. In Arabidopsis, two or three CBFs co-regulate, often together with other transcription factors, more than two-thirds of COR genes (Shi et al., 2017). The increase in frost tolerance under ambient conditions demonstrated in many plants as a result of CBF overexpression, and the decrease in frost tolerance in CBF triple mutants (Jia et al., 2016; Zhao et al., 2016), further underscore the importance of CBF genes. CBF overexpression in herbaceous and tree species, however, can also reduce growth and induce dormancy (Wisniewski et al., 2011, 2014a); it is therefore not surprising that CBF activity is tightly regulated and that CBFs are present at elevated levels in an active form only for short periods. This regulation occurs at various levels, including transcriptional (transcript quantity and variant), translational (protein quantity), and post-translational (protein activity), and can have an immediate effect because many changes are made to pre-existing molecules. Detailed insights into the regulation of the CBF pathway in Arabidopsis have only recently begun to emerge (**Figure 1**), while limited information for other plant species suggests that they have similar, but distinct, regulatory processes of their own. This mini-review presents an overview of the information on Arabidopsis and, where available, on woody plants.

Low-temperature-induced chromatin modification provides physical access to certain genes and allows their transcription. This apparently includes access to, and activation of, COR genes by CBFs (Park et al., 2018). CBF gene expression itself is regulated by a large number of transcription factors (Shi et al., 2018). Transcriptional activators include ICE1 and 2 (Inducer of CBF expression 1 and 2), CAMTA3 and 5 (Calmodulin binding transcription activator 3 and 5), CESTA, BZR1 (Brassinazole-resistant 1), and CCA1/LHY1 (circadian clock-associated 1/late elongated hypocotyl). In contrast, CBF transcriptional repressors include MYB15, EIN3 (Ethylene insensitive 3), PIF3 (phytochrome-interacting factor 3), and PIF4/7. The activity of these transcription factors is modulated by low temperature, light, and/or the circadian clock in such a way that CBFs are expressed only at specific times during constant low-temperature conditions (**Figure 1**). Each signal transduction pathway most likely also involves one or more hormones (Eremina et al., 2016; Barrero-Gill and Salinas, 2017; Li et al., 2017b; Zhou et al., 2017), and CBFs in turn affect hormone levels (Li et al., 2017c), but details are currently sparse. The fact that gradual and rapid decreases in temperature have slightly different effects makes the CBF pathway even more complicated (Kidokoro et al., 2017).

In addition to transcript levels, the type of transcripts can also be altered in response to a cold period. Approximately 60% of intron-containing genes in Arabidopsis have been reported to undergo alternative splicing (AS) (Marquez et al., 2012), especially under stress conditions. A recent RNA-seq analysis of Arabidopsis identified the often rapid cold induction of AS in over 2,400 genes, over 1,600 of which were regulated only at the AS level and therefore not detected in most previous analyses (Calixto et al., 2018). CBF genes do not have introns, so AS does not affect them directly; however, alternatively spliced transcripts have been detected for PIF7, PHYB, and CAMTA3 (Calixto et al., 2018), which may alter CBF expression.

Once produced, CBF1/3 proteins are destabilized by their interaction with cold-induced, CRPK1-phosphorylated 14-3-3 proteins (Liu et al., 2017) and stabilized by their interaction with cold-induced, OST1-phosphorylated basic transcription factor 3/BTF3-like proteins (BTF3/BTF3L; Ding et al., 2018). While some reports have suggested that Arabidopsis CBF1–3 are equally important, others suggest that CBF2 and CBF3 play a more important role in directing the cold response (Jia et al., 2016; Zhao et al., 2016; Shi et al., 2017) and in adaptation to low temperature in natural populations (Gehan et al., 2015). This apparently occurs because different CBFs employ different regulons. Whereas the main genes upregulated in CBF2-overexpressing plants were related to lipid localization, starch metabolic process, light stimulus response, and regulation of transcription, the genes regulated by CBF3 were mainly related to oxidative stress response (Li et al., 2017c).

The presence of a similar pathway in perennial woody plants, such as poplar, apple, grape, Prunus sp., and eucalyptus, is suggested by the identification of ICE- and CBF-like genes, usually in larger numbers than in Arabidopsis, which increases the potential for further functional specialization (Wisniewski et al., 2014a). Investigations into their regulation found that ICE RNA levels are not much affected by treatments, suggesting that ICE is regulated mainly through post-translational modifications (Wisniewski et al., 2014a). CBF expression is often induced by low temperatures and/or drought or high salt (Wisniewski et al., 2014a), can be affected by the circadian clock (Artlip et al., 2013), and, for some CBFs, is induced later and/or for a longer period during a continuous cold treatment than reported for the Arabidopsis CBFs (Xiao et al., 2006, 2008; Artlip et al., 2013; Leyva-Pérez et al., 2015; Li et al., 2017d). AS was found to be prevalent in the transcription of many genes in apple, orange, and grape, with grape showing the most AS events (Sablok et al., 2017). Little is currently known, however, about AS events in ICE or COR genes under low-temperature conditions, except for ICE transcripts in grape (Rahman et al., 2014). AS has nevertheless been suggested to regulate responses to environmental stresses in many plants, including Western poplar (Populus trichocarpa) (Filichkin et al., 2018). Amino acid motifs thought to be involved in post-translational modifications have been identified in predicted ICE and CBF protein sequences, and functional studies suggest that they are important (Feng et al., 2012; Nguyen et al., 2016; Carlow et al., 2017), but many more studies are needed to determine when and how they regulate ICE and CBF activity.
Together, these studies suggest that woody perennial plants have a CBF-like pathway similar to that of Arabidopsis (Benedict et al., 2006), even including a trade-off between growth and cold stress tolerance (Ibanez et al., 2010; Tillett et al., 2011; Nguyen et al., 2016), but knowledge of the details of CBF regulation in woody plants is still very limited.

## PROSPECTS FOR THE GENERATION OF PLANTS WITH ENHANCED FREEZING TOLERANCE

The tight regulation of CBF activity is largely lost in transgenic plants carrying a CBF construct driven by the constitutive 35S promoter; thus, the use of natural promoters is preferred. Recent reports suggest that plants with a constitutive brassinosteroid (BR) response display higher CBF expression but no signs of growth retardation (Li et al., 2017b). Therefore, it may be possible to avoid the reduced growth associated with CBF overexpression (Artlip et al., 2014) if the "correct" CBF or CBF regulators are utilized. Recently, Dong et al. (2017) conducted a meta-analysis of the effect of CBF overexpression on temperature stress tolerance and related responses. In that study, data from 75 published articles were analyzed to determine the impact of a host of factors, such as the origin of the CBF gene, the promoter used to drive expression, and the method of stress evaluation, on temperature response and associated indicators, such as electrolyte leakage, growth, chlorophyll fluorescence, and sugar and proline levels. Results indicated that 7 of the 8 measured variables were significantly modulated in CBF (DREB)-transgenic plants, although two of the eight parameters were modulated only in non-stressed plants. The measured parameters were modulated by 32% or more by various experimental variables. The modulating variables included acclimation status (acclimated vs. non-acclimated), type of promoter, duration and severity of stress, source of the donor gene, and whether the donor and recipient belonged to the same genus. CBF overexpression consistently reduced plant height and electrolyte leakage and improved survival. The impact was evident in both acclimated and non-acclimated plants, although the greatest impact was observed in acclimated plants. Such analyses may provide a more comprehensive understanding of how best to utilize CBF genes with modified promoters to improve freezing stress tolerance.

FIGURE 1 | Overview of the regulation of the CBF pathway in Arabidopsis. Low temperatures trigger plasma membrane rigidification, which leads, presumably via a COLD1-like protein, to the opening of Ca<sup>2+</sup> channels. The resulting higher calcium levels activate CRLK1/2 (calcium/calmodulin-regulated receptor-like kinase; Yang et al., 2010a,b). In turn, CRLK1/2 triggers the MEKK1-MKK2-MPK4 cascade, which ultimately increases ICE activity because it inhibits the phosphorylation of ICE1 by MPK3/6 and the subsequent ICE1 ubiquitination and degradation (Li et al., 2017a; Zhao et al., 2017). ICE activity is further regulated by low temperatures via OST1 (open stomata 1)-induced phosphorylation (Ding et al., 2015) and SIZ1-induced sumoylation (Miura et al., 2007), both of which interfere with HOS1 (high expression of osmotically responsive protein 1)-directed ubiquitination and subsequent degradation of ICE (Dong et al., 2006). The resulting active ICE directs CBF expression. Low-temperature-activated phosphorylation of 14-3-3 proteins by CRPK1 causes the degradation of CBF proteins (Liu et al., 2017). In contrast, cold-induced OST1-directed phosphorylation of BTF3s promotes their binding to CBFs and thereby prevents CBF degradation (Ding et al., 2018). Photoperiod regulates CBF expression via red light perception by PhyB and subsequent degradation of PIF3 (phytochrome-interacting factor 3), thereby relieving its inhibition of CBF expression (Jiang et al., 2017), whereas the circadian clock regulates CCA1 and LHY activity (Dong et al., 2011). Interestingly, PIF3 stability is increased by low temperatures, presumably at a later time to downregulate CBF expression (Jiang et al., 2017). PIF4/7 and EIN3 (Ethylene insensitive 3) downregulate, whereas BZR and CESTA upregulate, CBF expression, but how this is triggered is not yet known (Shi et al., 2018). Phosphorylation, sumoylation and ubiquitination events are indicated by P, S and U, respectively, with activating modifications in green and inhibiting modifications in red.

An alternative, enterprising approach is to modify endogenous CBF genes into variants that confer higher frost tolerance, using a clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 nuclease (Cas9)-like system, which has recently been optimized for use in vegetatively propagated perennial plants (Chen et al., 2018). For example, existing sequences could be modified based on variants present in more frost-tolerant cultivars or species (Carlow et al., 2017; Li et al., 2017d), thereby changing their regulation. Because the change is precise, the resulting plants may not be considered genetically modified by government regulatory agencies and may be more acceptable to the general public.

# DEACCLIMATION (DA) RESPONSE, A CRITICAL FACTOR FOR WINTER-SURVIVAL

The ability of temperate- and boreal-zone woody perennials (fruit and forest tree species) to increase freeze tolerance via autumnal cold acclimation is undoubtedly their first line of defense against harsh and long winters. Seasonally induced freeze tolerance is lost under relatively warm conditions in a process called 'deacclimation', which typically occurs in response to spring warming. Maintaining a sufficient level of cold-induced freeze tolerance until the danger of killing frosts has passed, however, is imperative to avoid frost damage. For example, erratic temperature fluctuations, i.e., sudden winter warming or premature spring-like conditions followed by more "normal" freezing temperatures, could render partially or fully deacclimated tissues vulnerable to freeze damage. Indeed, the frequency of such fluctuations has been increasing (Jentsch et al., 2007; IPCC, 2014), and some of the most devastating killer frosts across North America have been attributed to such events, e.g., the Easter freeze of 2007 (Gu et al., 2008), the Mother's Day freeze of 2010, the killer frost of 2012, and the polar vortex of 2014. Field simulations of winter-warming events have also confirmed their damaging effects on overwintering perennials (Taulavuori et al., 1997; Bokhorst et al., 2009, 2010).

#### DORMANCY STATUS AND SPRING PHENOLOGY (BUDBREAK) IN RELATION TO DA

Temperate trees have evolved the ability to tolerate harsh winters by undergoing a period of endodormancy (rest), during which cold-acclimated meristems are less prone to DA when trees are exposed to unseasonal episodes of warming (Kalberer et al., 2006, 2007a,b). Buds of native temperate and boreal trees must satisfy a genetically defined chilling requirement to exit endodormancy (Richardson et al., 1974). After endodormancy, buds enter an ecodormant state in which they must be exposed to a genetically defined threshold of warming ('heat units') (Charrier et al., 2011) for meristematic activity and growth to resume (budbreak or spring phenology). Ecodormant buds are substantially more sensitive to warmer temperatures and DA than endodormant buds (Kalberer et al., 2007a). This sensitivity increases progressively as the period of ecodormancy lengthens, finally culminating in complete DA and spring budbreak (Kalberer et al., 2006; Arora and Taulavuori, 2016). Any shift in this annual cycle of spring phenology could potentially increase the risk of frost injury to trees (Vitasse et al., 2014 and references therein). Whether the risk actually increases depends on a variety of internal factors, including species, chilling requirement, the genetic ability to resist deacclimation in response to transient, unseasonal episodes of warm temperatures, and the capacity to reacclimate. External (environmental) modulating factors include temperature fluctuations (intensity and timing) and the region/site (latitude and altitude) where the trees are located (Pagter and Arora, 2013; Arora and Taulavuori, 2016; Vitasse et al., 2018b). The sensitivity of ecodormant buds to deacclimating temperatures has also been reported to increase with increasing photoperiod in spring in species such as European beech (Fagus sylvatica) (Vitasse and Basler, 2013).

There is ample evidence that warming trends in recent decades have advanced spring budbreak and leaf development in many plant species growing in cold regions (Penuelas and Filella, 2001; Menzel et al., 2006, and references therein). Some studies have indicated, however, that the degree of advancement in spring phenology appears to have been declining in recent years (Yu et al., 2010; Fu et al., 2015).
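The sequence described above, in which a chilling requirement must be met to end endodormancy and heat units must then accumulate during ecodormancy until budbreak, can be sketched as a simple sequential phenology model. This is a minimal illustration, not a model from the studies cited here: the hourly chill weights loosely follow the breakpoints commonly quoted for the Utah chill-unit model (Richardson et al., 1974), while the function names, the 4.5 °C base temperature for heat units, and the requirement values are hypothetical.

```python
from typing import Optional, Sequence


def chill_weight(temp_c: float) -> float:
    """Hourly chill-unit weight; breakpoints modeled on the Utah model (illustrative)."""
    if temp_c <= 1.4:
        return 0.0   # near or below freezing: no chilling credit
    if temp_c <= 2.4:
        return 0.5
    if temp_c <= 9.1:
        return 1.0   # most effective chilling range
    if temp_c <= 12.4:
        return 0.5
    if temp_c <= 15.9:
        return 0.0
    if temp_c <= 18.0:
        return -0.5  # warm hours partly negate accumulated chill
    return -1.0


def budbreak_hour(
    hourly_temps: Sequence[float],
    chill_requirement: float,
    heat_requirement: float,
    base_temp_c: float = 4.5,  # hypothetical base temperature for heat units
) -> Optional[int]:
    """Return the hour index at which budbreak is predicted, or None.

    Phase 1 (endodormancy): sum chill units until chill_requirement is met.
    Phase 2 (ecodormancy): sum degree-hours above base_temp_c until
    heat_requirement is met.
    """
    chill = 0.0
    heat = 0.0
    endodormant = True
    for hour, temp in enumerate(hourly_temps):
        if endodormant:
            chill += chill_weight(temp)
            if chill >= chill_requirement:
                endodormant = False  # chilling satisfied; bud is now ecodormant
        else:
            heat += max(0.0, temp - base_temp_c)
            if heat >= heat_requirement:
                return hour
    return None  # chill or heat requirement never satisfied


if __name__ == "__main__":
    # 1,000 h at 5 °C (chilling) followed by 500 h at 10 °C (forcing).
    temps = [5.0] * 1000 + [10.0] * 500
    print(budbreak_hour(temps, chill_requirement=800, heat_requirement=200))  # → 1018
```

In this sketch, raising the chill requirement, or inserting warm hours above about 16 °C during the chilling phase (which subtract chill units), delays the predicted budbreak. Published models are calibrated per species and differ in their chill and forcing functions.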

Advances in spring phenology due to climate change can occur under two scenarios. In the first scenario, ecodormant buds accumulate heat units faster than normal because of earlier and warmer spring-like temperatures (Cleland et al., 2007). This could render prematurely deacclimated buds vulnerable to subsequent spring frosts. The second, somewhat ignored and paradoxical scenario involves more rapid fulfillment of the chilling requirement because winter temperatures in certain regions are warmer than in more typical, historical winters. For example, tree species at northern latitudes or high elevations could experience a greater level of 'dormancy-breaking chill units', since warmer winters could expose trees to temperatures above 0 °C, which are more effective in breaking endodormancy, and reduce their exposure to sub-freezing temperatures, which do not contribute to chill-unit accumulation (Hanninen, 2006). Based on this premise, spring phenology could be expected to advance more rapidly in historically colder areas under a warming climate. This would result in premature deacclimation and a greater risk of spring freeze damage. Indeed, Vitasse et al. (2018a) reported that spring phenology in fruit (apple, cherry) and forest (Norway spruce and European beech) trees advanced at a faster rate during 1975–2016 at fifty high-elevation locations in Switzerland than in other temperate locations. The authors argued that even if the frequency and severity of late spring frosts remain unchanged in the future, or change less than the spring phenology of plants, deacclimated organs may be exposed to a greater number of freeze-damage events (Vitasse et al., 2018a).
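The paradoxical scenario above, in which a warmer winter at a cold site yields more effective chilling, can be illustrated with the simple chilling-hours model, which counts hours in a band just above freezing (commonly taken as 0 to 7.2 °C). This is a minimal sketch with invented temperature series; it is not the analysis used in the studies cited here, and the function name and band limits are illustrative assumptions.

```python
from typing import Sequence


def chilling_hours(hourly_temps: Sequence[float], lower_c: float = 0.0, upper_c: float = 7.2) -> int:
    """Count hours with lower_c < T <= upper_c (band limits vary between authors)."""
    return sum(1 for t in hourly_temps if lower_c < t <= upper_c)


# Hypothetical cold, high-elevation winter: hours alternate between -8 °C and -2 °C.
cold_winter = [-8.0, -2.0] * 1000
# The same winter warmed uniformly by 4 °C: hours alternate between -4 °C and +2 °C.
warm_winter = [t + 4.0 for t in cold_winter]

print(chilling_hours(cold_winter))  # → 0 (sub-freezing hours earn no chill)
print(chilling_hours(warm_winter))  # → 1000 (half of the warmed hours fall in the chilling band)
```

Under these assumptions, the warmed winter satisfies a given chilling requirement sooner, which is the mechanism by which warming could advance budbreak, and hence deacclimation, at historically cold sites.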

On the other hand, proponents of the view that a warming climate may slow the advancement of, or even delay, spring phenology make their case as follows. They suggest that elevated winter temperatures may result in a chilling deficit, i.e., a reduced duration and/or sum of cold. Because the heat-unit requirement for spring phenology is believed to be inversely correlated with chill accumulation during dormancy (Harrington et al., 2010; Laube et al., 2014), any reduction in accumulated chilling would result in a higher heat-unit requirement, thus slowing the advance of, or delaying, spring leaf unfolding (Yu et al., 2010; Fu et al., 2015). One of these studies (Fu et al., 2015) noted that while spring phenology of seven deciduous forest tree species in Europe had advanced by ∼4 d per °C of warming during 1980–1994, this response decreased by ∼40% (to 2.3 d per °C) during 1999–2013 (Pan European Phenology Network). For such an observation to be practically and widely applicable, however, temperatures during warmer winters must be high enough to cause a real chilling deficit, i.e., either to negate accumulated chilling or to be ineffective in meeting the chilling requirement.
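The argument that a chilling deficit raises the heat-unit requirement can be made concrete with an inverse chill–heat relationship. The sketch below assumes an exponential-decay form, F(C) = a + b·exp(−k·C), which is one common way of expressing such a relationship in phenology models; the parameter values, function names, and the constant daily forcing rate are hypothetical and are not fitted to any species discussed here.

```python
import math


def heat_requirement(chill_units: float, a: float = 100.0, b: float = 900.0, k: float = 0.003) -> float:
    """Heat units needed for budbreak, decreasing with accumulated chill.

    F(C) = a + b * exp(-k * C); a is the floor reached after ample chilling.
    All parameter values are hypothetical.
    """
    return a + b * math.exp(-k * chill_units)


def days_to_budbreak(chill_units: float, heat_per_day: float) -> int:
    """Days of constant spring forcing needed to meet the chill-dependent heat requirement."""
    return math.ceil(heat_requirement(chill_units) / heat_per_day)


# A well-chilled winter versus one with a chilling deficit, under the same spring warmth.
print(days_to_budbreak(1200, heat_per_day=10.0))  # → 13
print(days_to_budbreak(600, heat_per_day=10.0))   # → 25
```

With these assumed parameters, halving the accumulated chill roughly doubles the forcing needed, so a warm winter delays rather than advances budbreak, but only if the warmth is sufficient to create a real chilling deficit, as the caveat above notes.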

Although the focus of this mini-review is not on bud dormancy, it is relevant to note that several studies have associated the genetic regulation of chilling requirement and dormancy with Dormancy Associated MADS-box (DAM) genes in peach (Bielenberg et al., 2008; Li et al., 2009), apple (Wu et al., 2017a), pear (Saito et al., 2013), apricot (Sasaki et al., 2011), leafy spurge (Euphorbia esula) (Horvath et al., 2010), kiwifruit (Wu et al., 2011, 2017b), and tea plant (Camellia sinensis) (Hao et al., 2017). Bud-dormancy-associated candidate genes have also been identified in blackcurrant (Ribes nigrum) by Hedley et al. (2010), and Yordanov et al. (2014) identified an early budbreak (EBB) gene in poplar, encoding an APETALA2/Ethylene responsive transcription factor, that is responsible for early bud flush.

C-repeat binding factor-binding motifs have been identified in the promoters of several DAM genes and of an EBB homolog in apple (Wisniewski et al., 2015a), as well as in other plant species. A comprehensive analysis of DAM genes in the ornamental woody plant P. mume (Chinese plum, Japanese apricot) demonstrated an interaction between CBFs and DAM genes, especially PMCBF1 and PMDAM1 (Zhao et al., 2018). In almond flower buds, CBF expression decreased whereas expression of MADS-box genes 1 and 3 increased after bud break (Barros et al., 2012). Wisniewski et al. (2015a) noted that overexpression of a peach CBF gene (PpCBF1) in apple altered the expression of DAM, EBB, and RGL (DELLA) genes, and that some members of each of these gene families contained C-repeat regions, the target sites for CBF, in their promoters. They provided a model linking CBF expression with the regulation of dormancy, bud-break, freezing tolerance, and growth. Interestingly, a subsequent study indicated that the effect of a transgenic apple rootstock overexpressing CBF was not graft-transmissible and thus did not affect the cold hardiness or dormancy of the scion cultivar grafted onto the transgenic 'M.26' rootstock, although growth and flowering were significantly affected (Artlip et al., 2016).

Current research has highlighted the impact of dormancy status and spring phenology on the propensity of trees to deacclimate. Spring phenology is influenced by both the chilling and the heat-unit requirements of overwintering tree species. The fact that, in certain species, chill and heat units can be satisfied by the same temperatures (Cooke et al., 2012 and references therein) makes their combined effect on spring phenology even more complex. It is therefore critical to include dormancy status and the interactions between chilling and heat requirements as key parameters in models designed to predict the relationship between deacclimation response and freezing tolerance.
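How a single temperature can count toward both chilling and heat requirements can be illustrated with a minimal sketch. It assumes two deliberately simple accumulation rules that are not taken from the studies cited here: a chilling-hours rule that counts each hour between 0 and 7.2 °C, and a linear heat rule that sums degree-hours above a 4.5 °C base. The function name `accumulate` and all thresholds are hypothetical choices for illustration; published phenology models and their species-specific parameters differ considerably.

```python
from typing import Iterable, Tuple

# Illustrative thresholds (assumptions, not values from the cited studies):
# one chill unit per hour between CHILL_MIN_C and CHILL_MAX_C (inclusive),
# and heat accumulated as degree-hours above HEAT_BASE_C.
CHILL_MIN_C = 0.0
CHILL_MAX_C = 7.2
HEAT_BASE_C = 4.5


def accumulate(hourly_temps_c: Iterable[float]) -> Tuple[float, float, int]:
    """Return (chill_hours, growing_degree_hours, shared_hours).

    shared_hours counts hours that advance BOTH the chill and the heat
    totals, i.e. temperatures inside the overlap of the two ranges.
    """
    chill = 0.0
    heat = 0.0
    shared = 0
    for t in hourly_temps_c:
        is_chill_hour = CHILL_MIN_C <= t <= CHILL_MAX_C
        heat_gain = max(0.0, t - HEAT_BASE_C)
        if is_chill_hour:
            chill += 1.0
        heat += heat_gain
        # An hour between 4.5 and 7.2 °C contributes to both requirements.
        if is_chill_hour and heat_gain > 0:
            shared += 1
    return chill, heat, shared


if __name__ == "__main__":
    temps = [-2.0, 1.0, 5.0, 6.5, 10.0]  # hypothetical hourly temperatures
    chill, heat, shared = accumulate(temps)
    print(f"chill hours={chill}, GDH={heat:.1f}, shared hours={shared}")
```

With the hypothetical series above, the script prints `chill hours=3.0, GDH=8.0, shared hours=2`: the hours at 5.0 and 6.5 °C advance both counters at once, which is the kind of overlap that makes the separate effects of chilling and forcing hard to disentangle.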

# FUTURE DIRECTIONS

The past 50 years of research have provided a wealth of information on the genetic and molecular regulation of plant cold hardiness, as well as on the regulation of dormancy. These advances have been fueled by new technologies associated with high-throughput sequencing, genetic mapping, and transformation. In particular, the regulatory role of CBF genes in freezing tolerance and of DAM genes in the regulation of chilling requirement stand out as major advances. The discovery that CBF activity is regulated continually and at various levels helps explain, at least in part, why slightly different treatments of plants (for example, with respect to light, or to the duration and rate of low-temperature exposure) lead to different outcomes in frost tolerance. Notable advances have also been made in our understanding of ice nucleation and propagation, through the use of high-resolution infrared thermography (Wisniewski et al., 2014b, 2015b), and of the properties of antifreeze proteins (Duman and Wisniewski, 2014). Despite these advances, significant improvements in plant cold hardiness have remained elusive and problematic because of the complexity of this trait and its intimate connection to other plant developmental processes, especially growth and flowering. In addition, the relatively new field of epigenetics has demonstrated the key role that the environment can play in imprinting plant responses to abiotic stress (Kumar, 2018).

Future studies will need to clarify the cross-talk that occurs between different plant developmental processes and how it can be manipulated in a prescribed manner. A key question will be whether the processes that determine cold hardiness can be separated from those that restrict growth. Can CBF genes be regulated in a manner that removes their negative impact on growth and reproductive output? What genetic mechanisms can be used to tease these processes apart? Which CBF gene, or gene variant present in more frost-tolerant species, is best targeted for manipulation, and how can epigenetic modifications affecting their activity best be harnessed?

Although the topic is less glamorous, a comprehensive understanding of the underlying biophysical mechanisms responsible for freeze avoidance, especially in woody plants, is still lacking. Deep supercooling of xylem parenchyma (Wisniewski, 1995; Fujikawa et al., 2009) and floral buds (Kuprian et al., 2016, 2017) is an integral aspect of the cold hardiness of many temperate tree species, especially fruit trees; however, few advances have been made on this topic over the past 30 years. What new technologies can be applied to better understand how, when, and where ice is initiated in plants, how it propagates, and how the size and shape of ice crystals are regulated? Genetic studies of the inheritance of avoidance traits, such as supercooling, have yet to be conducted, but would provide very useful information.

An integrated approach that takes into account the complexity of traits that contribute to plant cold hardiness will be needed to achieve advances that can be translated into practical solutions that address the challenges of a rapidly changing climate.

#### AUTHOR CONTRIBUTIONS

MW was responsible for the general overview of the opinions stated in the manuscript, and any faults or shortcomings in logic fall directly on him. AN contributed the overview of CBF regulation, and RA the information on the importance of deacclimation in a changing climate. All authors reviewed and agreed with the final version of the submitted manuscript.

#### FUNDING

Research on CBF in fruit trees in the Wisniewski lab was supported by the United States Department of Agriculture – Agricultural Research Service (USDA-ARS).

#### ACKNOWLEDGMENTS

The authors apologize to all colleagues whose relevant work is not quoted due to space limitations.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Wisniewski, Nassuth and Arora. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Nitrogen Metabolism and Biomass Production in Forest Trees

Francisco M. Cánovas\*, Rafael A. Cañas, Fernando N. de la Torre, María Belén Pascual, Vanessa Castro-Rodríguez and Concepción Avila

Grupo de Biología Molecular y Biotecnología de Plantas, Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Málaga, Spain

Low nitrogen (N) availability is a major limiting factor for tree growth and development. N uptake, assimilation, storage and remobilization are key processes in the economy of this essential nutrient, and its efficient metabolic use largely determines vascular development, tree productivity and biomass production. Recent advances have improved our knowledge of the molecular regulation of the acquisition, assimilation and internal recycling of N in forest trees. In poplar, a model tree widely used for molecular and functional studies, the biosynthesis of glutamine plays a central role in N metabolism, influencing multiple pathways in both primary and secondary metabolism. Moreover, the molecular regulation of glutamine biosynthesis is particularly relevant for the accumulation of N reserves during dormancy and for the N remobilization that takes place at the onset of the next growing season. The characterization of transgenic poplars overexpressing structural and regulatory genes involved in glutamine biosynthesis has provided insights into how glutamine metabolism may influence the N economy and biomass production in forest trees. Here, a general overview of this research topic is outlined, recent progress is analyzed, and challenges for future research are discussed.

#### Edited by:

Steven Henry Strauss, Oregon State University, United States

#### Reviewed by:

Chung-Jui Tsai, University of Georgia, United States

Victor Busov, Michigan Technological University, United States

#### \*Correspondence:

Francisco M. Cánovas canovas@uma.es orcid.org/0000-0002-4914-2558

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 25 June 2018 Accepted: 12 September 2018 Published: 28 September 2018

#### Citation:

Cánovas FM, Cañas RA, de la Torre FN, Pascual MB, Castro-Rodríguez V and Avila C (2018) Nitrogen Metabolism and Biomass Production in Forest Trees. Front. Plant Sci. 9:1449. doi: 10.3389/fpls.2018.01449

Keywords: Populus, nitrogen acquisition, nitrogen recycling, glutamine biosynthesis, transgenic trees

# INTRODUCTION

Forest trees include a large group of gymnosperm and angiosperm species that play a crucial role in the overall balance of ecosystems. Forest species also have great economic importance in the production of wood, paper pulp, biofuels and a variety of resins and secondary metabolites. The biorefinery and nanotechnology of forest products are emerging areas of industrial interest in Europe within the so-called bioeconomy of the forestry sector (Horizons – Vision 2030 for the European Forest-based Sector<sup>1</sup>). Despite the relevance of forest species from the environmental, economic and social points of view, our knowledge of the mechanisms underlying forest growth, development, and productivity is still limited compared with that of crop plants. However, recent developments in genomics and biotechnology are providing new tools to unravel key regulatory processes in fundamental tree biology (Tuskan et al., 2006; De la Torre et al., 2014; Plomion et al., 2016; Tsai et al., 2018).

<sup>1</sup>http://www.forestplatform.org/

Sustainable management of forest resources is needed to satisfy the increasing demand for forest-derived products and to preserve natural forest stands. For example, highly productive plantations with increased levels of tree biomass production are necessary to meet the demands of second-generation bioenergy and other forest resources (Hinchee et al., 2009; Allwright and Taylor, 2016). These new forests will require a sustainable use of fertilizers, with N as one of the most relevant components. N use efficiency (NUE) is defined in general terms as the amount of plant product per unit of N fertilizer supplied (Good et al., 2004), and trees with improved NUE will be required to enhance the yield of future plantations. Nitrogen acquisition and metabolism are therefore important targets for improving forest biomass production, and key genes involved in N acquisition from soil and its assimilation into amino acids have been studied (Cánovas et al., 2007; de la Torre et al., 2014; Castro-Rodríguez et al., 2016a, 2017). In addition, processes of N storage and recycling are particularly relevant in forest species with long life cycles exhibiting seasonal periods of growth and development (Cantón et al., 2005; Minocha et al., 2015). In deciduous trees, most of the leaf N present in the stromal and thylakoid proteins and chlorophylls of green plastids is allocated to and stored in the stem during seasonal dormancy; these N reserves are rapidly mobilized to sustain metabolic activities in the next growing season (Babst and Coleman, 2018).
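The general definition of NUE given above can be written as a simple ratio. The symbols are introduced here for illustration only: $Y$ is the amount of plant product (for example, dry biomass) and $N_{\mathrm{s}}$ the amount of N fertilizer supplied, both per unit area. The second line is a decomposition commonly used in crop studies, added here rather than taken from this article, with $N_{\mathrm{up}}$ the N actually taken up by the plant. Because it is an algebraic identity, it shows that NUE can be raised either by acquiring more of the supplied N (uptake) or by producing more biomass per unit of N taken up (utilization), which mirrors the two targets named in the text: N acquisition and N metabolism.

```latex
\begin{aligned}
\mathrm{NUE} &= \frac{Y}{N_{\mathrm{s}}} \\
             &= \underbrace{\frac{N_{\mathrm{up}}}{N_{\mathrm{s}}}}_{\text{uptake efficiency}}
                \times
                \underbrace{\frac{Y}{N_{\mathrm{up}}}}_{\text{utilization efficiency}}
\end{aligned}
```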

The genus Populus (poplars) includes a variety of fast-growing tree species from temperate habitats. In fact, poplars are widely used for biomass production and are considered one of the most important bioenergy crops (Ye et al., 2011; Allwright and Taylor, 2016). In addition, Populus has become a model tree owing to its favorable characteristics for experimental analyses and the advances made during the last 15 years in its structural and functional genomics (Jansson and Douglas, 2007; Douglas, 2017). In this article, the functional genomics of N metabolism in Populus is reviewed, and the relevance of these studies for enhancing biomass production is highlighted. Finally, potential avenues for future research on this topic are discussed.

# N ACQUISITION AND METABOLISM IN POPLAR

Poplars are able to acquire inorganic N forms such as NH<sub>4</sub><sup>+</sup> and NO<sub>3</sub><sup>−</sup> from soil, and their relative preference depends on the soil pH (Rennenberg et al., 2010). The genome of Populus trichocarpa contains an abundant repertoire of genes encoding low- and high-affinity transporters involved in N uptake and allocation. NO<sub>3</sub><sup>−</sup> uptake is mediated by a large family of transporters consisting of 68 PtNPF genes encoding nitrate and peptide transporters, and by a smaller family of 11 PtNRT2/NRT3 genes also encoding nitrate transport systems (Bai et al., 2013). NH<sub>4</sub><sup>+</sup> transporters (AMT) are encoded by 22 genes distributed in two separate subfamilies (AMT1 and AMT2) (Couturier et al., 2007; Wu et al., 2015; Calabrese et al., 2017). Interestingly, the number of AMT2 members in poplar is much higher than in Arabidopsis, suggesting differences in the way that these plants incorporate and transport NH<sub>4</sub><sup>+</sup> ions (Castro-Rodríguez et al., 2017; Gojon, 2017). The specific expression patterns of these genes strongly suggest that they play non-overlapping roles in N 2017). Regardless of the source of inorganic N taken up by the roots, NH<sub>4</sub><sup>+</sup> is the ultimate N form to be assimilated into amino acids. The main step for N entry into plant metabolism is catalyzed by the enzyme glutamine synthetase (GS; EC 6.3.1.2) and involves the ATP-dependent condensation of NH<sub>4</sub><sup>+</sup> and glutamate to form glutamine. Unlike in other plant species, GS in Populus is encoded by a duplicated gene family consisting of four groups of genes, three of which code for cytosolic GS isoforms (GS1.1, GS1.2, and GS1.3) and one that codes for a plastid-located isoform (GS2). Duplicated GS genes display similar structures, with well-conserved regulatory regions and intron-exon boundaries. However, they are dispersed in the poplar genome and distributed on separate chromosomes (Castro-Rodríguez et al., 2011). Functional analyses of recombinant duplicates revealed that they exhibit similar molecular and kinetic properties and are therefore functionally equivalent enzymes (Castro-Rodríguez et al., 2015). The specific spatial and seasonal patterns of expression of GS genes support non-overlapping roles in poplar N metabolism (Castro-Rodríguez et al., 2011).

In situ hybridization and laser capture microdissection of the whole GS family revealed that the expression of duplicates is confined to specific cell types, confirming and extending previous findings (Castro-Rodríguez et al., 2015). Expression studies supported a relevant role of the GS1.1 isoforms in the N metabolism of photosynthetic cells, in coordination with the chloroplast-located GS2, for example in the reassimilation of NH<sub>4</sub><sup>+</sup> released during photorespiration (Betti et al., 2006). In contrast, the role of the GS1.2 enzymes could be related to N mobilization during seasonal N recycling at the onset of dormancy and with senescence associated with pathogen attack. Finally, the expression patterns of the GS1.3 isoforms are consistent with an essential role in the biosynthesis of glutamine for N transport (Cánovas et al., 2007) and in the reassimilation of NH<sub>4</sub><sup>+</sup> released by phenylalanine metabolism during wood formation (Craven-Bartle et al., 2013). Interestingly, genes encoding glutamate synthases (GOGAT; EC 1.4.7.1 and EC 1.4.1.14) and cytosolic isocitrate dehydrogenase (ICDH; EC 1.1.1.42) are also duplicated in poplar. The main players in N uptake and metabolism are shown in **Figure 1**. Gene IDs are listed in **Supplementary Table S1**.

Enzyme redundancy in glutamine biosynthesis occurs in particular cell types of poplar. Because their expression is restricted to specific cell types, paralogous genes may have been retained in the poplar genome to increase the amount of enzyme, and the accumulation of a GS isoform could contribute to maintaining the homeostasis of N metabolism in a particular cell type. Glutamine biosynthesis is at the crossroads of many metabolic pathways, and according to the above hypothesis, functions associated with glutamine-derived metabolic products would be enhanced in specialized tissues such as meristems, photosynthetic parenchyma, and the xylem and phloem of vascular bundles. The availability of enhanced levels of organic N in the form of glutamine could have boosted the growth and vigor of these plants, favoring adaptability to changes in environmental conditions and colonization of new habitats.

FIGURE 1 | [...] glutamine synthetase, isoform 1; GS1.2, cytosolic glutamine synthetase, isoform 2; GS1.3, cytosolic glutamine synthetase, isoform 3; GS2, glutamine synthetase, chloroplastic isoform; Fd-GOGAT, ferredoxin-dependent glutamate synthase; NADH-GOGAT, NADH-dependent glutamate synthase; ICDH, cytosolic isocitrate dehydrogenase; NiR, nitrite reductase; NR, nitrate reductase.

## GENE FUNCTIONAL ANALYSIS IN TRANSGENIC TREES

Classical breeding has been widely used for tree improvement, but new developments in genomics and biotechnology can accelerate the process. With regard to N nutrition, the above results point to GS as a key enzyme in N metabolism; however, it is important to elucidate which GS isoform contributes most to poplar growth and biomass production. Transgenic hybrid poplars (P. tremula × P. alba) overexpressing a cytosolic GS from pine exhibited enhanced growth and increased levels of proteins and chlorophylls (Fu et al., 2003; **Figure 2A**). Furthermore, the phenotype resulting from transgene expression was related to the correct assembly of GS1 subunits in the cytosol of photosynthetic cells. Further characterization showed that GS transgenics had better NUE (Man et al., 2005; Castro-Rodríguez et al., 2017) and enhanced tolerance to abiotic stress (El-Khatib et al., 2004; Pascual et al., 2008; Shestibratov et al., 2010; Molina-Rueda and Kirby, 2015). A similar approach, overexpressing GS1, has also been used to improve NUE in birch species (Lebedev et al., 2017).

Enhanced growth of GS transgenics was associated with increased transcript and protein levels of anthranilate synthase, the enzyme catalyzing the first step of tryptophan biosynthesis; tryptophan is a precursor in auxin biosynthesis (Man et al., 2011). These findings highlight the paramount importance of GS1 in poplar growth and biomass production, which has also been demonstrated in herbaceous plant models such as maize (Hirel et al., 2007) and rice (Tabuchi et al., 2007). In contrast, chloroplastic GS plays a well-established role in the reassimilation of NH<sub>4</sub><sup>+</sup> released in the photorespiratory pathway, as determined by the characterization of photorespiratory mutants lacking GS2 (Blackwell et al., 1987; Betti et al., 2006).

The ability of a regulatory gene to influence growth and biomass production has also been tested in poplar. Dof factors are regulators of N metabolism and potential targets to enhance N assimilation and plant growth (Rueda-López et al., 2008; Tsujimoto-Inui et al., 2009; Wang Y. et al., 2013). The transcription factor Dof5 that regulates GS1 isoforms in maritime pine (Rueda-López et al., 2008) was overexpressed in hybrid poplars (**Figure 2B**). In comparison to untransformed controls, young transgenic plants exhibited enhanced growth, an increased capacity for inorganic N uptake, and accumulated significantly more carbohydrates and lignin (Rueda-López et al., 2017).

The assimilation of NH<sub>4</sub><sup>+</sup> into amino acids by the GS/GOGAT pathway also requires the provision of carbon skeletons in the form of 2-oxoglutarate (2-OG) (Hodges, 2002). The role of ICDH, a key enzyme in the provision of 2-OG, has been investigated in hybrid poplar, and ICDH overexpression alters vascular development (Pascual et al., 2018). Transgenic trees with higher levels of ICDH also displayed increased expression of GS1.3 and other genes associated with vascular differentiation. Phenotypic characterization of the transgenic plants showed increased height growth, longer internodes, and enhanced development of young leaves and the apical region of the stem (**Figures 2C,D**). ICDH overexpression altered the contents of organic acids, including citrate, malate and 2-OG, as well as the levels of glutamate and γ-aminobutyric acid. These results show that the provision of carbon skeletons for NH<sub>4</sub><sup>+</sup> assimilation and glutamine biosynthesis is a key metabolic process for growth and vascular development in poplar.
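The dependence of NH<sub>4</sub><sup>+</sup> assimilation on 2-OG supply can be made explicit by writing out the textbook stoichiometry of the enzymes named above. These are standard enzyme reactions, not data from the cited poplar studies, and they are simplified: protons are omitted and only the ferredoxin-dependent GOGAT (EC 1.4.7.1) is shown; NADH-GOGAT (EC 1.4.1.14) uses NADH in place of two reduced ferredoxins (Fd). ICDH (EC 1.1.1.42) is the NADP<sup>+</sup>-dependent enzyme. Summing the GS and GOGAT reactions cancels glutamine and one glutamate, leaving a net balance in which each NH<sub>4</sub><sup>+</sup> assimilated consumes one 2-OG and one ATP, which is why carbon-skeleton supply can limit the pathway.

```latex
\begin{aligned}
\text{GS:}\quad & \text{Glu} + \mathrm{NH_4^+} + \mathrm{ATP} \longrightarrow \text{Gln} + \mathrm{ADP} + \mathrm{P_i} \\
\text{Fd-GOGAT:}\quad & \text{Gln} + \text{2-OG} + 2\,\mathrm{Fd_{red}} \longrightarrow 2\,\text{Glu} + 2\,\mathrm{Fd_{ox}} \\
\text{ICDH:}\quad & \text{isocitrate} + \mathrm{NADP^+} \longrightarrow \text{2-OG} + \mathrm{CO_2} + \mathrm{NADPH} \\
\text{GS + GOGAT (net):}\quad & \text{2-OG} + \mathrm{NH_4^+} + \mathrm{ATP} + 2\,\mathrm{Fd_{red}} \longrightarrow \text{Glu} + \mathrm{ADP} + \mathrm{P_i} + 2\,\mathrm{Fd_{ox}}
\end{aligned}
```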

Field trials of genetically modified trees are extremely important to assess transgene behavior under natural conditions and the potential risks of transgene spread before commercialization. A field trial of independently transformed lines expressing GS1 was established, and the performance of these lines was studied under natural conditions over 3 years (Jing et al., 2004; **Figure 2E**). The transgene was stably expressed in the field, resulting in enhanced vegetative growth; transgenic poplars reached average heights 41% greater than those of non-transformed controls (Jing et al., 2004; Cánovas et al., 2006). These results likely reflect a higher capacity of transgenic trees for N remobilization and N recycling, resulting in better exploitation of nutrient resources. Interestingly, analysis of wood samples from these 3-year-old trees revealed alterations in cell wall characteristics resulting in improved attributes for pulp and paper production (Coleman et al., 2012). GS1 transgenics also tolerate high levels of NO<sub>3</sub><sup>−</sup> supply, exhibiting greater NUE and accumulating more biomass, particularly enriched in cellulose, in the above-ground part of the plant (Castro-Rodríguez et al., 2016b). These results are consistent with efficient N allocation and metabolism in the transgenics. Transcriptomic analysis revealed that transgenic trees are able to reprogramme their transcriptome in response to N excess through the differential expression of a greater number of genes than untransformed plants. The above findings strongly support the potential use of these genetically modified trees for phytoremediation of NO<sub>3</sub><sup>−</sup> pollution, combined with enhanced production of biomass and cellulose for bioenergy applications.

The performance of transgenic trees overexpressing Dof5 was also studied in a field trial over two growing seasons (**Figure 2D**). Interestingly, these transgenic lines showed attenuated growth and no modification of carbon or N metabolism when grown under natural conditions. As the expression of the transgene was stable during the study period, the observed differences in the performance of the transgenic trees were attributed to the low levels of N nutrients available in the soil (Rueda-López et al., 2017). These findings reinforce the importance of field studies and indicate that the manipulation of structural rather than regulatory genes has been more effective for increasing biomass production and forest productivity. **Figure 2** illustrates the phenotypes of genetically modified poplars growing under controlled and natural conditions.

## PERSPECTIVES AND FUTURE DEVELOPMENTS

Recent advances made in model and crop plants highlight the importance of NO<sub>3</sub><sup>−</sup> and NH<sub>4</sub><sup>+</sup> transporters as key components of NUE. Consequently, manipulation of N acquisition and intercellular transport should be addressed to explore potential benefits in growth and productivity. In rice, overexpression of NO<sub>3</sub><sup>−</sup> transporters such as OsNRT1.1B (Hu et al., 2015) and OsNRT2.3a (Fan et al., 2017) led to the accumulation of more biomass and increased yield. The identification of potential orthologs of these genes in poplar and their subsequent functional analysis deserve special attention as a route to increasing tree productivity. Additional efforts are also needed to characterize amino acid transporters, particularly those involved in N allocation and recycling (Babst and Coleman, 2018). Strict coordination between N transporters and GS isoforms must exist to sustain the glutamine flux necessary for the biosynthesis of all nitrogenous compounds during poplar growth and development. A comparative analysis of gene expression in poplar showed co-expression profiles for several AMT1, AMT2, and GS1 genes in young leaves, mature leaves and stems (Castro-Rodríguez et al., 2017). The coordinated function of N transporters and GS1 in different poplar tissues needs to be investigated in future studies. Interestingly, Zhang et al. (2018) reported that under low nitrogen, excess carbon is redirected to the biosynthesis of aromatic amino acids and lignin, resulting in improved NUE that could be of practical value for biomass production. The regulatory networks involved in the response of roots to nitrogen availability also deserve special attention (Wei et al., 2013; Dash et al., 2015; Luo et al., 2015).

As previously discussed, the manipulation of glutamine biosynthesis is a reasonable strategy to improve NUE and biomass production in poplar. According to currently available data, the overproduction of GS1 isoforms in particular cell types is largely beneficial; however, further studies are necessary to fully understand how increased glutamine biosynthesis influences tree growth and biomass production. For example, the specific contributions of the GS1 duplicates GS1.1, GS1.2, and GS1.3 should be explored through functional studies using classic transformation strategies or, alternatively, the powerful CRISPR-Cas9 technology (Zhou et al., 2015). A recent study reported that overexpression of poplar GS1.2 in tobacco altered secondary cell wall and fiber characteristics and accelerated auxin biosynthesis (Lu et al., 2018), but the impact of GS1.2 manipulation in poplar itself remains unknown. It will be of particular interest to elucidate how the GS1.1 duplicates are associated with photosynthetic primary and/or secondary NH<sub>4</sub><sup>+</sup> assimilation in mature leaves. The impact of GS1 overproduction can be further studied with refined approaches using tissue-specific and inducible promoters (Filichkin et al., 2006; Wang L. et al., 2013).

Nevertheless, the existence of intrinsic regulatory mechanisms in planta cannot be ruled out. GS1 expression driven by the constitutive 35S promoter is potentially modulated in photosynthetic tissues through interaction with a microRNA, leading to improved biomass production (Fu et al., 2012). Furthermore, co-transformation of GS1 transgenics with cellulase genes driven by inducible promoters could facilitate the processing of feedstocks for bioenergy applications. Another challenge for the future is the identification and molecular dissection of QTLs potentially associated with GS1/glutamine biosynthesis. QTLs for biomass production under nitrogen limitation and excess have been mapped in poplar, but candidate genes of N metabolism were not identified (Novaes et al., 2009). In contrast, interactions have been shown between genes involved in glutamine and glutamate metabolism and QTLs associated with yield traits in maize (Hirel et al., 2007) and rice (Yamaya et al., 2002).

It is worth mentioning that the observed effects of GS1 transgene expression are explained by the altered expression of other genes involved in primary and secondary metabolism. Significant changes in the leaf transcriptome were observed when trees were grown at high NO<sub>3</sub><sup>−</sup> levels, with a large number of differentially expressed genes, including those involved in photosynthesis, cell wall formation and phenylpropanoid biosynthesis (Castro-Rodríguez et al., 2016b). In turn, the upregulation of transcription factors strongly suggests that chromatin organization differs between transgenic and wild-type plants, particularly in the response of trees to N availability. Genome-wide identification of regions containing target genes involved in nutritional responses can be achieved by comparative ChIP-Seq analysis in poplar (Liu et al., 2015). A recent genomic resource will facilitate this task: the whole-genome assembly of the P. tremula × P. alba clone INRA 717-1B4, which is used as a model tree in transgenic experiments (Mader et al., 2016). New knowledge derived from these studies and/or from the molecular dissection of QTLs will facilitate the identification of genes linking GS1/glutamine biosynthesis and biomass production. Using gene capture approaches (Seoane-Zonjic et al., 2016), structural variability in these genes can be analyzed in poplar genotypes with contrasting abilities for biomass production. In fact, a recent study has confirmed substantial structural variation in the poplar pan-genome (Pinosio et al., 2016).

#### CONCLUSION

A combination of functional genomic approaches, including transgenic and gene editing technology, chromatin analysis, and systematic identification of genome regions involved in NUE, will facilitate the exploration of the molecular basis of how N metabolism influences biomass production in forest trees.

# AUTHOR CONTRIBUTIONS

FC and CA conceived and wrote the manuscript. RC, FdlT, MP, and VC-R made additional contributions and edited the manuscript. RC and VC-R composed the figures.

# FUNDING

Research work in the authors' laboratory was supported by grants from "Ministerio de Economía y Competitividad" (BIO2015-69285-R) and Junta de Andalucía (BIO-474). RC was supported by a grant from Ministerio de Economía y Competitividad (BIO2015-73512-JIN; MINECO/AEI/FEDER, UE).

## ACKNOWLEDGMENTS

We would like to thank Prof. Edward G. Kirby for many years of research collaboration on poplar N metabolism.

We are indebted to Prof. Pedro J. Aparicio for his excellent collaboration in the field trials of transgenic trees.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01449/full#supplementary-material

TABLE S1 | Gene IDs and accession numbers for N transporters and enzymes involved in N acquisition and metabolism in poplar.







**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Cánovas, Cañas, de la Torre, Pascual, Castro-Rodríguez and Avila. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Engineering Drought Resistance in Forest Trees

#### Andrea Polle<sup>1,2,3</sup>\*, Shao Liang Chen<sup>1</sup>, Christian Eckert<sup>2</sup> and Antoine Harfouche<sup>4</sup>

<sup>1</sup> Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China, <sup>2</sup> Forest Botany and Tree Physiology, University of Goettingen, Göttingen, Germany, <sup>3</sup> Centre of Biodiversity and Sustainable Land Use, University of Goettingen, Göttingen, Germany, <sup>4</sup> Department for Innovation in Biological, Agrofood and Forest systems, University of Tuscia, Viterbo, Italy

Climatic stresses limit plant growth and productivity. In the past decade, tree improvement programs have mainly focused on yield, but it is clear that enhanced stress resistance is also required. In this review, we highlight important drought avoidance and tolerance mechanisms in forest trees. Genomes of economically important tree species with divergent resistance mechanisms can now be exploited to uncover the mechanistic basis of long-term drought adaptation at the whole-plant level. Molecular tree physiology indicates that osmotic adjustment, antioxidative defense and increased water use efficiency are important targets for enhanced drought tolerance at the cellular and tissue levels. Recent biotechnological approaches have focused on overexpression of genes involved in stress sensing and signaling, such as the abscisic acid core pathway, and of downstream transcription factors. With this strategy, a suite of defense systems was recruited, generally enhancing drought and salt stress tolerance under laboratory conditions. However, field studies are still scarce. Under field conditions, trees are exposed to combinations of stresses that vary in duration and magnitude. Variable stresses may override the positive effect achieved by engineering an individual defense pathway. To assess the usability of distinct modifications, large-scale experimental field studies in different environments are necessary. To optimize the balance between growth and defense, stress-inducible promoters may be useful. Future improvement programs for drought resistance will benefit from a better understanding of the intricate networks that ameliorate molecular and ecological traits of forest trees.

Keywords: water limitation, antioxidative systems, genetic engineering, forest tree species, isohydric, anisohydric, avoidance, tolerance

# INTRODUCTION

Forests cover about 30% of the terrestrial land (FAO, 2016). They strongly affect local climate (Li et al., 2015) through their interactions with biogeochemical water cycles (Ellison et al., 2017). When forest trees die or forests are cleared across large-scale landscapes, the negative consequences of drought are aggravated (Allen et al., 2015; Reyer et al., 2015), as shown for many areas worldwide (Laurance, 1998, 2004; van der Werf et al., 2008; Malone, 2017). Over-utilization of forests as a feedstock for energy, construction materials, or value-added products for the chemical industry intensifies the problem.

The negative consequences of drought are becoming even more pressing under current climate change, because projections suggest that such events will occur more frequently and be more extreme (Allen et al., 2010; Reyer et al., 2015). In the past decades, global warming has drastically reduced the ice-covered northern polar area during summer (NSIDC, 2018). Over the reduced polar ice cover, air cools less, which lowers the temperature differences between boreal, temperate and tropical regions. One possible climatic consequence of this atmospheric situation is an effect on jet-stream oscillation, which in turn may prolong stable high- and low-pressure (anticyclone/cyclone) conditions; these conditions manifest as extended periods of precipitation on the one hand and of drought on the other (Schaller et al., 2018). During lows, flooding events are frequent, whereas long-lasting highs lead to water scarcity in many regions worldwide (FAO, 2016). The dry spells promote salt accumulation in upper soil layers, soil degradation, and erosion (Polle and Chen, 2015). Salt and drought are, thus, often co-occurring stresses with which plants have to cope, although their physiological implications differ to some extent (Chen and Polle, 2010; Polle and Chen, 2015). This review focuses on tree responses to drought and on tree improvement for drought resistance by genetic engineering. Because most studies aimed at improving tree stress resistance included both drought and salt, salinity cannot be ignored completely.

#### Edited by:

Wout Boerjan, Flanders Institute for Biotechnology, Belgium

#### Reviewed by:

Christine Helen Foyer, University of Leeds, United Kingdom

Chung-Jui Tsai, University of Georgia, United States

> \*Correspondence: Andrea Polle apolle@gwdg.de

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 10 August 2018 Accepted: 04 December 2018 Published: 08 January 2019

#### Citation:

Polle A, Chen SL, Eckert C and Harfouche A (2019) Engineering Drought Resistance in Forest Trees. Front. Plant Sci. 9:1875. doi: 10.3389/fpls.2018.01875

In this review we highlight the molecular physiology of drought-stressed forest trees and present an overview of recent biotechnological approaches to improve the drought tolerance of trees, with a focus on yield and enhanced stress resistance. Drought effects on woody plants and measures for tree improvement have been reviewed regularly (Wang et al., 2003; Polle et al., 2006; Rennenberg et al., 2006; Fischer and Polle, 2010; Harfouche et al., 2014). Therefore, this review only briefly recapitulates the molecular physiology of drought and salt tolerance mechanisms. We summarize studies published in the past 5 years on the performance of trees engineered for better osmotic stress resistance. We also pinpoint research gaps that need to be addressed for future improvement of drought resistance in trees.

## Concepts and Strategies

Growth and reproduction of plants require access to water. Water is the solvent for nutrients in soil, the transport medium for nutrients in the plant and the solvent for cellular solutes. Because water is essentially the "stuff of life", plant water status is tightly controlled by a multitude of general and specific measures, such as stomatal control of water loss (Buckley, 2005; Daszkowska-Golec and Szarejko, 2013), osmotic adjustment (Harfouche et al., 2014), anatomical adjustment of the water-conducting system (Sperry and Love, 2015; Leuschner and Meier, 2018), deposition of cuticular waxes (Hadley and Smith, 1990) and morphological adjustments such as leaf shedding to avoid uncontrolled desiccation (Munné-Bosch and Alegre, 2004; Fischer and Polle, 2010). Periods of severe and long-lasting drought threaten the existence of plants when they exceed their acclimation capacity. These examples show that drought responses act at different spatial scales, from inside the plant body to the level of populations, and at different time scales, invoking short- and long-term adjustments that can be flexible or reflect evolutionary adaptation (**Figure 1**). Accordingly, drought resistance can be achieved by avoidance (homeostasis of tissue water status) or by tolerance mechanisms (acclimation that enables metabolism at low water potential) (Levitt, 1980; Jones, 1993). These distinctions are important when considering strategies for engineering drought resistance in tree species.

Avoidance mechanisms generally act at the scale of organs or at the whole-plant and species level (**Figure 1**). Drought adaptation is characterized by ecological traits such as leaf thickness, root morphology, leaf shedding, etc. These traits result from a distinct developmental repertoire in a given species. Owing to their presumed complexity, they have rarely been incorporated in molecular breeding programs. However, the availability of an increasing number of tree reference genomes may open new avenues to better understand and exploit these ecological traits. For example, the genomes of European beech (Fagus sylvatica) and pedunculate oak (Quercus robur) have recently been published (Mishra et al., 2018; Plomion et al., 2018). These two species are closely related members of the Fagaceae but exhibit strongly divergent ecological behavior (Aranda et al., 2015; Roman et al., 2015). Beech has a shallow fine root system, whereas oak has a deep root system (Leuschner and Meier, 2018). Deeper roots make a valuable contribution to drought resistance; thus, root morphology is one of the traits targeted for improving water use by capturing subsoil water. At the whole-plant level, drought stress avoidance depends on the capability of the tree to minimize loss and maximize uptake of water (Chaves et al., 2003) through stomatal control and extensive, deep root systems (Nguyen and Lamant, 1989; Brodribb et al., 2010). We envisage that, by exploiting genomic information, for instance by comparing the molecular basis of root development in important forest species such as beech and oak, novel approaches to direct breeding for drought avoidance may become available.

Another interesting example of drought avoidance is leaf shedding, a common phenomenon in tropical dry forests (Wolfe et al., 2016). Leaf shedding is controlled by an intricate interplay of phytohormones, including ethylene, abscisic acid (ABA), and auxin (González-Carranza et al., 1998; Chen et al., 2002a,b; Jin et al., 2015; Paul et al., 2018), which could be harnessed to improve tree drought resistance. In polyploid poplars regenerated from protoplast fusion, accelerated drought-induced leaf shedding was observed, which resulted in increased tree survival under extreme drought (Hennig et al., 2015). The exact genetic basis of this phenotype is not known, but it is apparently associated with partial genome duplication (Hennig, 2016).

Trees must persist in their environment over decades and centuries and therefore require not only adaptation to drought but also the metabolic flexibility to adjust to changing conditions. Drought tolerance is usually achieved by biochemical modification of cellular metabolism (**Figure 1**). Acclimation to drought by an individual plant invokes changes in membrane composition, protection of protein folding, osmotic adjustment, scavenging of reactive oxygen species (ROS), etc. (Harfouche et al., 2014) and thus acts at levels from cells to organs (**Figure 1**). An important feature of plant drought tolerance is an increase in osmotic pressure, which maintains water flux into the plant as soil water potential declines. Osmolyte production is costly, however, because it diverts carbohydrates from growth to defense. A striking example of how woody species from arid, saline deserts can economize their carbon budget is the succulent xerophyte Zygophyllum xanthoxylum (Janz and Polle, 2012). This species exploits sodium as a "cheap" osmolyte, thereby improving photosynthesis and growth under harsh environmental conditions (Ma et al., 2012). The discovery of such mechanisms provides an important basis for improving drought tolerance in trees (Bao et al., 2015; see below) and deepens our understanding of the physiological consequences of novel traits, which is crucial for harnessing the key molecular mechanisms of drought acclimation and adaptation.
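The point that osmotic adjustment keeps water flowing into the plant as the soil dries can be illustrated with a minimal Python sketch. It assumes a simple linear flux law (flux proportional to the difference between soil and tissue water potential), a constant conductance and a constant turgor; the function names and all numbers are hypothetical and chosen only to show the sign of the flux.

```python
# Toy model of water uptake (potentials in MPa, flux in arbitrary units).
# Assumptions (illustrative only): constant conductance, constant turgor,
# and uptake driven purely by the soil-to-tissue water-potential difference.


def tissue_water_potential(turgor_mpa: float, osmotic_mpa: float) -> float:
    """Tissue water potential = pressure (turgor) component + osmotic component."""
    return turgor_mpa + osmotic_mpa


def water_flux(conductance: float, psi_soil_mpa: float, psi_tissue_mpa: float) -> float:
    """Flux = conductance * (psi_soil - psi_tissue); positive values mean uptake."""
    return conductance * (psi_soil_mpa - psi_tissue_mpa)


if __name__ == "__main__":
    TURGOR = 0.3       # hypothetical turgor pressure, MPa
    CONDUCTANCE = 1.0  # hypothetical, arbitrary units

    scenarios = [
        ("moist soil, no adjustment", -0.3, -1.0),
        ("dry soil, no adjustment", -1.0, -1.0),
        ("dry soil, osmotic adjustment", -1.0, -1.6),
    ]
    for label, psi_soil, psi_osmotic in scenarios:
        psi_tissue = tissue_water_potential(TURGOR, psi_osmotic)
        flux = water_flux(CONDUCTANCE, psi_soil, psi_tissue)
        print(f"{label}: psi_tissue={psi_tissue:.2f} MPa, flux={flux:+.2f}")
```

In this sketch the unadjusted tissue (about -0.7 MPa) would lose water to soil at -1.0 MPa (negative flux), whereas lowering the osmotic component to -1.6 MPa restores a positive, uptake-directed gradient at the same turgor. Real tissues also change turgor and conductance, which the model deliberately ignores.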

# MOLECULAR PHYSIOLOGY OF OSMOTIC STRESSES

#### Roots

Roots are the first organ to sense and signal soil water deficits (Hamanishi and Campbell, 2011; Brunner et al., 2015). Because enhanced salinity decreases water availability to roots by lowering the osmotic potential (i.e., increasing the osmotic pressure) of the soil solution, the consequences for water uptake are partly similar to those of drought. Both drought and salt result in a decline in root-to-shoot water flow in poplars (Chen et al., 1997, 2002b; Shi et al., 2010), but the consequences are generally less severe in salt-tolerant than in salt-sensitive species (Chen et al., 2002b,c, 2003).

At the biochemical level, increased ABA concentrations are a hallmark of osmotic stress across all organs (Wasilewska et al., 2008; Kuromori et al., 2018) (**Figure 2**). The stress signal ABA binds to pyrabactin resistance 1 (PYR1)/PYR1-like (PYL)/regulatory components of ABA receptors (RCAR) proteins, which can then form a complex with type 2C protein phosphatases (PP2Cs) and thereby inhibit them. This relieves the PP2C-mediated inhibition of SnRK2 kinases (sucrose non-fermenting 1-related protein kinases 2), which become phosphorylated and subsequently activate downstream transcription factors and target genes (Fujita et al., 2011; de Zelicourt et al., 2016). Transcriptomic analyses of pine and poplar roots under drought revealed upregulation of genes for ABA biosynthesis [9-cis-epoxycarotenoid dioxygenase (NCED)], signaling, and response factors such as DREB1, bZIP, AP2/ERF, MYB, NAC, and WRKY (Wilkins et al., 2009; Cohen et al., 2010; Lorenz et al., 2011; Perdiguero et al., 2012). Salinity and drought elicit similar response patterns in poplar roots, which are likely mediated by ABA (Chen et al., 1997, 2001, 2002b; Luo et al., 2009).
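The logic of this core pathway (ABA, receptor, PP2C, SnRK2, transcription factors) can be made explicit with a Boolean caricature in Python. This is a sketch under strong simplifying assumptions: each component is either on or off, receptor binding is all-or-none, and SnRK2 is taken to be active whenever PP2C is inactive. The names `CoreAbaInputs` and `core_aba_signaling` are hypothetical and do not come from any modelling library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreAbaInputs:
    """On/off state of the inputs to the simplified ABA core pathway."""
    aba_present: bool
    receptor_present: bool = True  # PYR/PYL/RCAR receptor available
    pp2c_present: bool = True      # type 2C protein phosphatase available


def core_aba_signaling(inputs: CoreAbaInputs) -> dict:
    """Boolean caricature of ABA core signaling.

    ABA-bound receptor sequesters PP2C -> PP2C no longer keeps SnRK2 inactive
    -> SnRK2 becomes active -> downstream transcription factors are induced.
    """
    receptor_pp2c_complex = (
        inputs.aba_present and inputs.receptor_present and inputs.pp2c_present
    )
    pp2c_active = inputs.pp2c_present and not receptor_pp2c_complex
    snrk2_active = not pp2c_active
    return {
        "receptor_pp2c_complex": receptor_pp2c_complex,
        "pp2c_active": pp2c_active,
        "snrk2_active": snrk2_active,
        "targets_induced": snrk2_active,
    }


if __name__ == "__main__":
    for aba in (False, True):
        state = core_aba_signaling(CoreAbaInputs(aba_present=aba))
        print(f"ABA={aba}: SnRK2 active={state['snrk2_active']}")
```

The Boolean form captures only the switch-like order of events; it cannot express dose dependence or the many combinatorial receptor/PP2C pairs that exist in planta.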

ABA in roots further promotes increased biosynthesis of proline (Davies and Bacon, 2003). High concentrations of proline may act as an osmolyte and contribute to osmotic adjustment; a further possible function of proline is to protect proper protein folding (Claeys and Inzé, 2013). However, only a few studies have demonstrated increases in proline concentrations in tree roots in response to drought stress (Cocozza et al., 2010; Naser et al., 2010). This casts doubt on a prominent role of proline as an osmolyte in roots. Another function of ABA biosynthesis and transport in roots may be the stimulation of endodermal suberization (Tan et al., 2003; Koiwai et al., 2004; Kuromori et al., 2010; Kanno et al., 2012; Zhang et al., 2014). A recent study demonstrated that the endodermis is reversibly impregnated with hydrophobic compounds such as suberin in response to abiotic stresses, which might restrict apoplastic movement of water (Barberon et al., 2016).

Evidence is accumulating that ABA regulates hydraulic conductance in roots, possibly via aquaporin activity (Parent et al., 2009; Almeida-Rodriguez et al., 2011). At the molecular level, aquaporins are important for the control of water uptake (Fox et al., 2017). Their regulation, however, remains unclear because contrasting responses to different osmotic stress factors have been observed. For example, changes in evaporative demand resulted in aquaporin upregulation (Parent et al., 2009; Almeida-Rodriguez et al., 2011), whereas drought or salt stress caused declines in their expression (Bogeat-Triboulot et al., 2007; Wang et al., 2017a). A decrease in root aquaporins is thought to enhance cellular water conservation by reducing membrane water permeability during periods of dehydration stress (Smart et al., 2001; Bogeat-Triboulot et al., 2007), which would be consistent with the observed restriction of apoplastic water loss by enhanced suberization (Barberon et al., 2016).

Mycorrhizal fungi also increase tree stress tolerance by regulating aquaporins and stress metabolites (Luo et al., 2009; Dietz et al., 2011; Xu et al., 2015; Peter et al., 2016), but an in-depth treatment of this aspect is beyond the scope of this review; the reader is referred to Brunner et al. (2015).

#### Stem

Physiological responses of trees to drought stress lead to hydraulic and carbon cycle adjustments (Parker, 1956; Bréda et al., 2006). The hydraulic architecture of the stem is important for maintaining water transport under drought and for re-establishing water flux after re-irrigation. Hydraulic acclimation can be achieved by increased vessel frequency and decreased vessel lumen size (Hacke et al., 2006). Drought-resistant trees decrease the ratio of vessel lumen to cell wall thickness to enhance wall strength under water stress (Hacke et al., 2001) (**Figure 2**). While the anatomy and biophysics of xylem adjustment to water-limited conditions have often been studied (Tyree and Ewers, 1991; Anderegg, 2015; Sperry and Love, 2015), our knowledge of the underlying molecular processes is limited. Aquaporins (PIP1 family) are important for refilling embolized vessels, thereby helping the tree to recover after drought (Secchi and Zwieniecki, 2011; Laur and Hacke, 2014).
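Why more but narrower vessels cost hydraulic capacity can be shown with the Hagen-Poiseuille relation, in which the conductivity of an ideal capillary scales with the fourth power of its radius. The Python sketch below assumes ideal, smooth, parallel cylindrical conduits, ignores end walls and pit resistance, and uses hypothetical vessel numbers and diameters; it also computes a thickness-to-span index, (t/b)², of the kind used to express wall reinforcement against implosion (t: wall thickness, b: lumen span).

```python
import math

WATER_VISCOSITY_PA_S = 1.0e-3  # approximate dynamic viscosity of water near 20 degC


def vessel_conductivity(diameter_m: float, viscosity: float = WATER_VISCOSITY_PA_S) -> float:
    """Hagen-Poiseuille conductivity of one ideal cylindrical conduit.

    Returns pi * r**4 / (8 * viscosity), in m^4 Pa^-1 s^-1.
    """
    radius = diameter_m / 2.0
    return math.pi * radius**4 / (8.0 * viscosity)


def xylem_conductivity(n_vessels: int, diameter_m: float) -> float:
    """Summed conductivity of n identical vessels conducting in parallel."""
    return n_vessels * vessel_conductivity(diameter_m)


def lumen_area(n_vessels: int, diameter_m: float) -> float:
    """Total lumen cross-sectional area of n identical vessels, m^2."""
    return n_vessels * math.pi * (diameter_m / 2.0) ** 2


def thickness_to_span_sq(wall_thickness_m: float, lumen_span_m: float) -> float:
    """(t/b)^2: larger values indicate walls better reinforced against implosion."""
    return (wall_thickness_m / lumen_span_m) ** 2


if __name__ == "__main__":
    wide = (10, 80e-6)    # hypothetical: 10 vessels of 80 um diameter
    narrow = (40, 40e-6)  # hypothetical: 40 vessels of 40 um (same total lumen area)
    print(f"lumen area ratio: {lumen_area(*narrow) / lumen_area(*wide):.2f}")
    print(f"conductivity ratio: {xylem_conductivity(*narrow) / xylem_conductivity(*wide):.2f}")
    print(f"(t/b)^2 wide: {thickness_to_span_sq(4e-6, 80e-6):.4f}, "
          f"narrow: {thickness_to_span_sq(4e-6, 40e-6):.4f}")
```

With the same total lumen area, halving vessel diameter while quadrupling vessel number leaves only a quarter of the theoretical conductivity, while the same wall thickness spans a narrower lumen and gives a higher (t/b)². This is the trade-off between hydraulic efficiency and mechanical safety that the anatomical adjustments above navigate.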

Anatomical and transcriptomic analyses of the developing xylem of poplars revealed that drought imposed changes similar to those found after salt exposure (Junghans et al., 2006; Bogeat-Triboulot et al., 2007; Janz et al., 2012; Wildhagen et al., 2018). For example, salt-stressed poplar trees reinforce cell walls by increasing wall thickness relative to lumen area and avoid a strong loss in hydraulic conductivity by enhancing vessel frequency (Janz et al., 2012). Transcript abundances of genes encoding fasciclin-like arabinogalactan proteins (FLA), COBRA-like proteins, xyloglucan endotransglycosylases and pectin methylesterases were jointly repressed in developing xylem, whereas those of genes activating stress and defense responses increased (Janz et al., 2012). Similarly, in water-stressed poplars, transcript abundances of several cellulose synthases, arabinogalactan proteins (AGP) and fasciclin-like proteins decreased (Berta et al., 2010). Wildhagen et al. (2018) also reported massive changes in the regulation of genes encoding cell wall-forming enzymes. Unexpectedly, drought decreased lignin content and increased the saccharification potential of the wood (Wildhagen et al., 2018), changes that are favorable for the biotechnological use of wood. It would therefore be worthwhile to test whether these changes can be achieved without the typical drought-induced growth-defense trade-off.

Drought further activates antioxidant defenses in the cambium of different poplar clones (Dvina, P. deltoides; I-214, P. × canadensis; Pallara et al., 2012). A distinct catalase isoform, CATALASE 3 (CAT3), was strongly induced under water deficit, suggesting an essential role for this enzyme in ROS control under drought stress (Pallara et al., 2012). Furthermore, increases in the concentrations of osmotically active solutes in the cambial region of P. alba accompanied reductions in predawn leaf water potential and stem dehydration (Pallara et al., 2012).

#### Leaves

Stomatal regulation is one of the most important mechanisms for adjusting water consumption to fluctuations in water availability (Tardieu and Simonneau, 1998; McDowell et al., 2008; Skelton et al., 2015). Regulation of stomatal aperture reduces water loss by leaf transpiration (Stålfelt, 1955; Barrs, 1971; Brodribb and Holbrook, 2003; Araújo et al., 2011), but there is a trade-off between transpirational water loss and CO<sub>2</sub> assimilation (Jarvis and Jarvis, 1963; Cowan, 1978). Pioneering studies involving poplar species and hybrids shed light on two drought stress response strategies, anisohydric and isohydric behavior, with divergent consequences for water flux and biomass production (Marron et al., 2003; Monclus et al., 2006; Giovannelli et al., 2007). Anisohydric plants keep their stomata relatively widely open and prevent dehydration by increasing the osmotic pressure in leaves (Gebre et al., 1994; Marron et al., 2002; Hanin et al., 2011; Barchet et al., 2014; Martorell et al., 2015); thereby, they are able to sustain growth and biomass production (Passioura, 2002). Isohydric plants limit water loss through sensitive stomatal regulation and closure and/or by leaf abscission (Couso and Fernández, 2012). Poplars are isohydric species but exhibit a suite of adaptive measures (Brignolas et al., 2000), such as variation in stomatal sensitivity (Hamanishi et al., 2012), leaf shedding (Marron et al., 2002) and growth decline (Giovannelli et al., 2007). Across tree species, a continuum of responses to water deficit between isohydric and anisohydric behavior can be found (Klein, 2014). Behavior can also vary within a species: beech generally exhibits isohydric behavior, but progenies from dry habitats showed stronger anisohydric behavior than those from wet habitats (Nguyen et al., 2017).
In poplar, heritability of stomatal responsiveness to water deficit is generally high, indicating that this trait is a useful target for genetic engineering (Orlovic et al., 1998; Al Afas et al., 2006; Monclus et al., 2006).
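Heritability statements of this kind rest on partitioning phenotypic variance among and within genotypes. A minimal Python sketch for a balanced clonal trial estimates individual-tree broad-sense heritability, H² = Vg/(Vg + Ve), from a one-way ANOVA. The function name and the data are hypothetical, and the cited studies may have used different designs and estimators (for example mixed models, or heritability on a clone-mean basis).

```python
from statistics import mean


def broad_sense_heritability(clones: list[list[float]]) -> float:
    """Individual-tree broad-sense heritability H^2 = Vg / (Vg + Ve).

    `clones` holds one list of ramet measurements per clone, in a balanced
    design (every clone has the same number n >= 2 of ramets). Variance
    components come from a one-way ANOVA:
    Vg = (MS_between - MS_within) / n and Ve = MS_within.
    """
    k = len(clones)
    if k < 2:
        raise ValueError("need at least two clones")
    n = len(clones[0])
    if n < 2 or any(len(c) != n for c in clones):
        raise ValueError("need a balanced design with >= 2 ramets per clone")

    clone_means = [mean(c) for c in clones]
    grand_mean = mean(clone_means)  # equals the overall mean in a balanced design
    ss_between = n * sum((m - grand_mean) ** 2 for m in clone_means)
    ss_within = sum((x - m) ** 2 for c, m in zip(clones, clone_means) for x in c)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))

    var_g = max(0.0, (ms_between - ms_within) / n)  # negative estimates truncated to 0
    var_e = ms_within
    if var_g + var_e == 0:
        raise ValueError("no phenotypic variance in the data")
    return var_g / (var_g + var_e)


if __name__ == "__main__":
    # Hypothetical stomatal-responsiveness scores: 4 clones x 3 ramets each.
    trial = [
        [0.42, 0.45, 0.40],
        [0.61, 0.58, 0.64],
        [0.30, 0.35, 0.33],
        [0.52, 0.49, 0.55],
    ]
    print(f"H^2 = {broad_sense_heritability(trial):.2f}")
```

A high H² means that most of the variation among trees in the trial is attributable to clone identity, which is what makes a trait attractive for selection or targeted modification.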

Endogenous ABA is rapidly produced upon water deficit, initiating a signaling cascade that results in downstream responses such as stomatal closure (Fujii et al., 2009). Besides roots, leaves also synthesize ABA (Kuromori et al., 2018). Stomatal responsiveness to ABA varies widely among species and is evolutionarily determined (Sussmilch et al., 2017). In angiosperms, ABA-induced stomatal closure is usually rapid and can occur within seconds or minutes (Geiger et al., 2011), and thus does not require de novo transcription. Overexpression of the ABA biosynthesis enzyme 9-cis-epoxycarotenoid dioxygenase 3 (NCED3) is beneficial for water-use efficiency (WUE) and results in enhanced drought resistance in several plant species (Iuchi et al., 2001; Tung et al., 2008). Drought-induced changes in stomatal development involve regulation of the transcript abundance of the poplar orthologs of STOMAGEN, ERECTA, and STOMATA DENSITY AND DISTRIBUTION 1 (SDD1) (Harfouche et al., 2014). Interestingly, WUE was increased by overexpression of a poplar ortholog of ERECTA in A. thaliana (Xing et al., 2011). ERECTA controls stomatal density, but the sequence of events leading to this effect is still unknown (Xing et al., 2011). Genes such as ERECTA, SDD1, or NCED3 should be the focus of future research programs aimed at developing transgenic or gene-edited trees that resist drought under field conditions.
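Water-use efficiency is a ratio of carbon gained to water spent. A minimal Python sketch, assuming the common leaf-level definitions (instantaneous WUE = A/E and intrinsic WUE = A/gs) and entirely hypothetical gas-exchange values, shows how partial stomatal closure can lower assimilation yet raise WUE, the kind of shift reported as increased WUE in the studies above. The class name `LeafGasExchange` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafGasExchange:
    """Leaf-level gas-exchange snapshot (hypothetical values, typical units)."""
    assimilation: float   # A, net CO2 assimilation, umol CO2 m^-2 s^-1
    transpiration: float  # E, transpiration, mmol H2O m^-2 s^-1
    conductance: float    # gs, stomatal conductance to water vapour, mol m^-2 s^-1

    def instantaneous_wue(self) -> float:
        """A/E, in umol CO2 per mmol H2O."""
        return self.assimilation / self.transpiration

    def intrinsic_wue(self) -> float:
        """A/gs, in umol CO2 per mol H2O."""
        return self.assimilation / self.conductance


if __name__ == "__main__":
    open_stomata = LeafGasExchange(assimilation=15.0, transpiration=4.0, conductance=0.30)
    partly_closed = LeafGasExchange(assimilation=12.0, transpiration=2.5, conductance=0.15)
    for label, leaf in (("open", open_stomata), ("partly closed", partly_closed)):
        print(f"{label}: A={leaf.assimilation}, A/E={leaf.instantaneous_wue():.2f}, "
              f"A/gs={leaf.intrinsic_wue():.1f}")
```

With these illustrative numbers, closing stomata cuts assimilation by a fifth but water loss by more than a third, so A/E rises from 3.75 to 4.8; whether such a shift is beneficial depends on whether the lost carbon gain matters more than the water saved.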

At the cellular level, biochemical protection measures are activated in response to drought to avoid the negative consequences of stress-induced ROS and to endure water deficit (Wang et al., 2003; **Figure 2**). Moderate water stress results in increased concentrations of soluble carbohydrates and polyols, which potentially help to maintain cell turgor in P. euphratica leaves through increased osmotic pressure (Bogeat-Triboulot et al., 2007). However, after salt acclimation, bulk soluble carbohydrates (including glucose, fructose and sucrose), sugar alcohols and organic acids mostly decrease or remain almost unaffected (Ottow et al., 2005; Dluzniewska et al., 2007; Ehlting et al., 2007; Brinker et al., 2010), suggesting that moderate salt accumulation in leaves may take over part of the osmotic adjustment, as observed in some halophytic species (Ma et al., 2012). Notably, amino acids, in particular proline, increase drastically in both water- and salt-stressed leaves (Brosché et al., 2005; Ottow et al., 2005; Dluzniewska et al., 2007; Ehlting et al., 2007; Pallara et al., 2012). Accordingly, the mRNA levels of genes encoding enzymes that catalyze rate-limiting steps of proline synthesis and degradation [delta-1-pyrroline-5-carboxylate synthase (PcP5CS) and proline dehydrogenase] accumulate under osmotic stress (Dluzniewska et al., 2007). However, the bulk rise of proline to µM levels is insufficient to explain the change in osmotic pressure that salt-exposed trees require to maintain water uptake (Ottow et al., 2005; Brinker et al., 2010). Therefore, increased proline may act as a protectant of protein integrity rather than as an osmolyte in leaves.
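Why a µM-range increase cannot account for osmotic adjustment can be checked with the ideal van 't Hoff relation, π = icRT. The Python sketch below takes the µM figure at face value and compares it with a hypothetical 100 mM uncharged solute and with 100 mM NaCl (i = 2, assuming complete dissociation). It assumes ideal, dilute-solution behavior, so it gives an order-of-magnitude estimate rather than a measured value; the function name is hypothetical.

```python
GAS_CONSTANT = 8.314  # J mol^-1 K^-1


def osmotic_pressure_mpa(conc_mol_per_l: float, temp_k: float = 298.15,
                         particles_per_molecule: int = 1) -> float:
    """Ideal van 't Hoff estimate pi = i * c * R * T, returned in MPa.

    c is converted from mol L^-1 to mol m^-3 so that pi comes out in Pa.
    """
    conc_mol_per_m3 = conc_mol_per_l * 1000.0
    pressure_pa = particles_per_molecule * conc_mol_per_m3 * GAS_CONSTANT * temp_k
    return pressure_pa / 1e6


if __name__ == "__main__":
    proline_1um = osmotic_pressure_mpa(1e-6)
    solute_100mm = osmotic_pressure_mpa(0.1)
    nacl_100mm = osmotic_pressure_mpa(0.1, particles_per_molecule=2)
    print(f"1 uM proline:      {proline_1um:.2e} MPa")
    print(f"100 mM solute:     {solute_100mm:.3f} MPa")
    print(f"100 mM NaCl (i=2): {nacl_100mm:.3f} MPa")
```

Under these assumptions, 1 µM contributes only about 2.5 Pa, roughly five orders of magnitude less than the ~0.25 MPa of a 100 mM solute, and a fully dissociated salt doubles the effect per mole. This is consistent with the argument that proline acts mainly as a protectant and that inorganic ions can serve as comparatively cheap osmolytes.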

Antioxidative systems also play an important role in the defense against negative consequences of drought stress (Hasanuzzaman et al., 2013). Therefore, one might expect populations of wild tree species such as beech from dry habitats to have greater ROS protection than those from mesic habitats. Unexpectedly, the opposite was observed: unstressed beech from a mesic habitat showed a higher antioxidative capacity than those from a dry habitat, and, moreover, those from the mesic habitat showed a stronger antioxidative response to drought than those from the dry habitat (Carsjens et al., 2014). These observations suggest that trees exposed infrequently to stress respond more flexibly, whereas trees adapted to long-term stress are protected by resistance measures that are already in place before the onset of acute stress. This view is also supported by the constitutively enhanced salt tolerance of P. euphratica compared to salt-sensitive poplars (Janz et al., 2010). The enhanced tolerance of P. euphratica is, for example, based on the expansion of the sodium:proton antiporter family in the genome of this species (Ma et al., 2013). These few selected examples highlight that divergent strategies may be required for improving drought resistance in short- or long-term water-limited environments.

# GENETIC APPROACHES FOR INCREASED STRESS TOLERANCE

Because drought and other osmotic stresses elicit multiple tolerance or avoidance mechanisms, simple strategies for improving the performance of trees in water-limited environments do not exist. To target a suite of genes that can enhance drought tolerance, recent attempts to improve plant performance have often focused on signal perception and transduction (**Table 1**), whereas overexpression of structural genes has found fewer applications (**Table 2**). Candidate genes were selected mainly on the basis of their inducibility under stress or their origin in a highly stress-tolerant species (**Tables 1**, **2**).

In most cases, candidate genes for stress tolerance were expressed under the 35S promoter, leading to high constitutive expression in the transgenic plant (**Table 1**). A drawback of this approach is that more drought-resistant plants often show a biomass yield trade-off (e.g., the dwarfed eui mutant, a mutant in the GA-regulating gene CYP714A3; Wang C. et al., 2016). The use of stress-inducible promoters may be a promising way to balance growth under non-stress conditions with defense activation under drought. For example, a novel zinc finger protein from the succulent, xerophytic species Z. xanthoxylum rendered transgenic plants more tolerant to osmotic stress (Chu et al., 2016; **Table 1**). Similarly, overexpression of DREB (dehydration-responsive element-binding protein) under the RD29 promoter increased osmolyte (sugar) levels and enhanced the performance of transgenic plants under drought stress (Zhou et al., 2012; **Table 1**). Other studies showed successful activation of antioxidants, reduced membrane leakage and increased photosynthesis when YUCCA6 (a flavin monooxygenase-like protein from Arabidopsis thaliana) or choline oxidase (from bacteria) was overexpressed under an oxidative stress-inducible promoter (Ke et al., 2015, 2016; **Table 2**). Overall, however, the use of stress-inducible promoters is still rare.
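The growth-defense argument for stress-inducible promoters can be made concrete with a toy daily-growth model in Python. All parameters are hypothetical, and the model assumes an idealized inducible promoter that switches the defense on instantly and only on drought days, with no leakiness or lag; it illustrates the reasoning, not any measured transgenic line.

```python
from typing import Sequence

# Hypothetical parameters (arbitrary growth units per day).
BASE_GROWTH = 1.0
DEFENSE_COST = 0.2              # growth diverted to defense while the transgene is active
DROUGHT_LOSS_UNPROTECTED = 0.8  # growth lost on a drought day without defense
DROUGHT_LOSS_PROTECTED = 0.3    # growth lost on a drought day with defense


def cumulative_growth(drought_days: Sequence[bool], mode: str) -> float:
    """Sum daily growth for mode 'none', 'constitutive' (35S-like) or 'inducible'."""
    total = 0.0
    for drought in drought_days:
        if mode == "none":
            defended = False
        elif mode == "constitutive":
            defended = True
        elif mode == "inducible":
            defended = drought  # idealized: on only (and instantly) under drought
        else:
            raise ValueError(f"unknown mode: {mode!r}")

        growth = BASE_GROWTH - (DEFENSE_COST if defended else 0.0)
        if drought:
            growth -= DROUGHT_LOSS_PROTECTED if defended else DROUGHT_LOSS_UNPROTECTED
        total += max(growth, 0.0)
    return total


if __name__ == "__main__":
    season = [False] * 7 + [True] * 3  # hypothetical: 7 wet days, then 3 drought days
    for mode in ("none", "constitutive", "inducible"):
        print(f"{mode:>12}: {cumulative_growth(season, mode):.1f}")
```

With seven wet and three drought days, this yields about 7.6 (no transgene), 7.1 (constitutive) and 8.5 (inducible) growth units: the constitutive line pays the defense cost on every wet day, which can outweigh its benefit when drought is infrequent, whereas the idealized inducible line pays only when protection is needed.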

Plant model species in which drought responses have often been studied and for which genomic information has been available for a decade or longer, such as A. thaliana and Populus spp., were often used as sources of inducible genes. In recent years, the gene pool of drought- and salt-tolerant woody species has increasingly been tapped. These species include the succulent xerophyte Z. xanthoxylum, the salt-tolerant and facultatively succulent poplar P. euphratica, the salt- and drought-tolerant Tamarix hispida, and the salt-tolerant Fraxinus velutina (**Tables 1**, **2**). Other crops and woody species increasingly used as gene sources are Diospyros kaki (a widely cultivated fruit tree in China), Phyllostachys edulis (bamboo), Morus spp. (mulberry, feed for silkworms), Hevea brasiliensis (rubber), Picea and Pinus (conifers), and Cicer arietinum (chickpea, a herbaceous legume crop). The target species were model species such as poplars, Arabidopsis and Nicotiana tabacum, but also crops such as alfalfa, cotton, lotus, tomato and rice. Transformation of non-model tree species for enhanced stress tolerance is still rare, but recent results are promising. Overexpression of DREB2A, a hub gene for drought stress-related gene expression, in Robinia pseudoacacia resulted in enhanced drought resistance (Xiu et al., 2016). The drought-resistant phenotype was associated with the formation of deeper roots and decreased oxidative stress, and was most likely mediated by effects on the phytohormone balance of the plants (Xiu et al., 2016; **Table 1**).

TABLE 1 | Functional characterization of drought- and salt-inducible protein kinases and transcription factors originating from or expressed in tree species.

Data were searched between 2013 and 2018 with the key words Tree and overexpression and drought in Web of Science. Polyethylene glycol (PEG); 1-naphthaleneacetic acid (NAA); 6-benzylaminopurine (6-BA); gibberellic acid (GA); abscisic acid (ABA); superoxide dismutase (SOD); peroxidase (POD); malondialdehyde (MDA); Cauliflower mosaic virus (CaMV); electrolyte leakage (EL); instantaneous water use efficiency (iWUE); ↑, increase; ↓, decrease.

TABLE 2

Data were searched between 2013 and 2018 with the key words Tree and overexpression and drought in Web of Science. Peroxidase (APX); salicylic acid (SA); peroxidase (POD); malondialdehyde (MDA); Cauliflower mosaic virus (CaMV); electrolyte leakage (EL); ↑, increase; ↓, decrease.

Succulence, which occurs in many drought- or salt-resistant species, is a complex trait that may prove useful for drought resistance. Leaf thickness and water content increase with increasing salinity and aridity (Ottow et al., 2005; Nguyen et al., 2017). Succulent leaves have a substantial water storage capacity and dilute internal salt concentrations (Ottow et al., 2005; Scholz et al., 2011; Han et al., 2013; Ishii et al., 2014). Overexpression of a putative xyloglucan endotransglucosylase/hydrolase from P. euphratica (PeXTH) contributed to salt-induced leaf succulence (Han et al., 2013) by improving cell wall properties to cope with water deficit and high salinity (Cho et al., 2006). Overexpression of a hot pepper (Capsicum annuum) CaXTH3 in guard cells reduced transpiration under dehydration stress, thus supporting a role of XTHs in drought resistance (Choi et al., 2011).

As highlighted above, the acclimatory responses of trees to drought invoke a multitude of molecular and biochemical changes. Consequently, many recent genetic approaches have focused on genes encoding protein kinases and transcription factors, to target whole signaling and biochemical pathways instead of single gene products. Overexpression of CPK (calcium-dependent protein kinase, Chen et al., 2013), bZIP (basic leucine zipper protein, Wang et al., 2010), DREB (dehydration-responsive element-binding protein, Chen et al., 2009, 2011; Zhou et al., 2012; Yang et al., 2017), CBL (calcineurin B-like protein, Li et al., 2013), NAC [NAM (no apical meristem), ATAF (Arabidopsis transcription activation factor) and CUC (cup-shaped cotyledon) superfamily; Wang et al., 2013; Wang L. et al., 2016, 2017; Lu et al., 2017], ERF (ethylene response factor, Wang et al., 2014; Yao et al., 2016), WRKY (Zheng et al., 2013), and ZFP (zinc finger proteins, Zang et al., 2015) often resulted in enhanced photosynthesis, higher WUE, higher activity of antioxidative enzymes, lower oxidative damage and improved growth under osmotic stress (**Table 1**). Examples in which drought and salt responses are not congruent are still rare (**Table 1**). For example, overexpression of MAPK1 of the MAPK C family resulted in plants that were more salt tolerant but less drought and heat resistant; the mechanisms underlying this difference remain speculative (Liu et al., 2017).

ABA is crucial in mediating plant drought responses. Most of the signal transduction and response factors used for stress amelioration are regulated by ABA (**Table 1**). The receptor RCAR is the first target of ABA and forms a complex with PP2C for stress signaling (Fujita et al., 2011; de Zelicourt et al., 2016). The situation is even more complex because multiple RCARs and PP2Cs exist, forming combinatorial interaction networks (Tischer et al., 2017). Arabidopsis and hybrid poplar overexpressing RCARs from P. trichocarpa were more drought tolerant than the wild type owing to decreased water loss and increased osmotic and antioxidative protection (Yu J. et al., 2016, 2017). However, there is also a fitness trade-off, because germination of seeds from the overexpressing Arabidopsis lines is inhibited (Yu J. et al., 2016). The transgenic poplars had a normal phenotype, and their biomass gain under strong drought stress was higher than that of the controls (Yu J. et al., 2017). A gene from P. euphratica was suggested to be the ortholog of HAB1, an Arabidopsis PP2C that negatively regulates ABA signaling and acts as a co-receptor for RCARs. Arabidopsis overexpressing this HAB1 gene from P. euphratica lost ABA sensitivity and became more drought sensitive than the wild type (Chen et al., 2015). Overall, functional characterization of these genes indicates that the core ABA signaling pathway is conserved in poplar and may be a suitable target for genetic engineering.
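The opposite outcomes of overexpressing an ABA receptor versus a PP2C co-receptor can be illustrated with a stoichiometric toy model in Python. It assumes, hypothetically, that each ABA-occupied receptor inactivates one PP2C and that SnRK2 activity declines hyperbolically with the remaining free PP2C; the abundances, occupancy values and the function name `snrk2_activity` are arbitrary illustrations, not measured or published parameters.

```python
def snrk2_activity(receptor_total: float, pp2c_total: float, aba_occupancy: float) -> float:
    """Toy relative SnRK2 activity (0-1, arbitrary units).

    Hypothetical assumptions, for illustration only:
    - each ABA-occupied receptor binds and inactivates one PP2C (1:1);
    - SnRK2 activity falls hyperbolically with the amount of free PP2C.
    """
    if not 0.0 <= aba_occupancy <= 1.0:
        raise ValueError("aba_occupancy must be between 0 and 1")
    occupied_receptor = receptor_total * aba_occupancy
    free_pp2c = max(0.0, pp2c_total - occupied_receptor)
    return 1.0 / (1.0 + free_pp2c)


if __name__ == "__main__":
    genotypes = {
        "wild type": (1.0, 1.0),
        "receptor overexpression": (2.0, 1.0),
        "PP2C overexpression": (1.0, 3.0),
    }
    for name, (receptor, pp2c) in genotypes.items():
        low = snrk2_activity(receptor, pp2c, aba_occupancy=0.0)
        high = snrk2_activity(receptor, pp2c, aba_occupancy=0.8)
        print(f"{name}: no ABA {low:.2f}, ABA {high:.2f}, response {high - low:+.2f}")
```

Under these assumptions, raising receptor abundance enlarges the ABA-induced rise in SnRK2 activity, whereas excess PP2C leaves free phosphatase even at high ABA occupancy and flattens the response, qualitatively matching the reduced ABA sensitivity described for HAB1 overexpression.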

Studies applying novel gene-editing methods (CRISPR/Cas9) to improve drought tolerance are still in their infancy but hold promise for new discoveries. For example, lignin deposition was reduced in poplars in which Myb170 was knocked out by CRISPR/Cas9 (Xu et al., 2017). Surprisingly, heterologous expression of Myb170 in Arabidopsis revealed its expression in guard cells, and the transgenic plants showed stronger stomatal closure at night, thereby enhancing drought protection (Xu et al., 2017). This study illustrates that novel gene functions can be detected by combining CRISPR/Cas9 knockout with overexpression.
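A first step in any CRISPR/Cas9 knockout design is locating candidate target sites. For the commonly used Streptococcus pyogenes Cas9 (SpCas9), a target is a 20-nt protospacer immediately followed by an NGG protospacer-adjacent motif (PAM). The Python sketch below scans both strands for such sites; it is a generic illustration, not the guide design used by Xu et al. (2017), and real guide selection also weighs off-target matches, GC content and position within the coding sequence. The function names and the example sequence are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (non-ACGT characters kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, protospacer_len: int = 20) -> list[tuple[str, int, str, str]]:
    """List candidate SpCas9 sites as (strand, start, protospacer, PAM).

    A site is `protospacer_len` nucleotides immediately followed by an NGG PAM.
    Start positions are 0-based on the strand scanned ("+" = input sequence,
    "-" = its reverse complement). A zero-width lookahead lets overlapping
    sites be reported.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    seq = seq.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    example = "ATGCCTGACGTAGCTAGGCTAGCTAGGTTACGATCGGCATGGTCAAGCCGTTA"  # hypothetical
    for strand, start, protospacer, pam in find_spcas9_sites(example):
        print(strand, start, protospacer, pam)
```

Scanning the reverse complement as well matters because a PAM on either strand can direct cleavage; each candidate would then be checked against the genome for off-target matches before use.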

# CONCLUSIONS AND RESEARCH NEEDS

Trees are capable of responding to drought stress through a wide variety of cellular and physiological acclimation strategies, which form the basis for genetic improvements of drought tolerance. In particular, overexpression of genes involved in drought sensing and signal transduction, and of drought-responsive transcription factors, can enhance drought tolerance in a variety of model systems and some economically important woody species. Our overview of transgenic modifications revealed that modifications at the cellular level were the main targets, often using genes from drought- or salt-tolerant woody species for overexpression. However, systematic studies clarifying whether these genes perform better than those from drought-sensitive species are lacking. Comparative studies suggest that amplification of distinct gene families such as the SOS pathway in P. euphratica, gene duplication, and evolutionary recruitment of distinct metabolites such as ABA for stomatal regulation could also be important avenues for future research. Furthermore, long-term studies under field conditions are still scarce. There is a clear need to test genetically modified trees in their natural environment, because combined stress factors such as heat and drought may override the effects of the single stressors applied under laboratory conditions.

At a wider scale, our mechanistic understanding of the interplay among osmotic regulation, hydraulic adjustment, and uptake systems for water and nutrients is still in its infancy. In particular, the root-to-shoot communication that sets off a suite of responses leading to morphological changes of the root system remains unclear. Therefore, an important future task will be to uncover the genetic basis for an optimized resource allocation between biochemical defenses and production of new structures, such as deep rooting systems, under stressful climatic conditions. Next-generation genomics and phenomics approaches will facilitate a better understanding of phenotype-genotype maps and help to formulate genomic-assisted breeding strategies in forest trees for resistance to drought stress and other osmotic cues.

# AUTHOR CONTRIBUTIONS

AP, SC, CE, and AH drafted and wrote the manuscript together. All authors agreed on the final version of this review.

# FUNDING

This work was supported by the European Commission's Seventh Framework Programme (FP7/2012-2017) under the Grant Agreement No. FP7-311929 (WATBIO), the Brain Gain (Rientro dei Cervelli) MIUR professorship (with tenure) for AH, and jointly by the National Natural Science Foundation of China (Grant Nos. 31770643 and 31570587), Beijing Municipal Natural Science Foundation (Grant No. 6182030), the Research Project of the Chinese Ministry of Education (Grant No. 113013A), the Program of Introducing Talents of Discipline to Universities (111 Project, Grant No. B13007) and the Beijing Advanced Innovation Center for Tree Breeding by Molecular Design (Beijing Forestry University).

# REFERENCES


phenotypes and increased tolerance to abiotic stress. Plant Physiol. Biochem. 94, 19–27. doi: 10.1016/j.plaphy.2015.05.003


from metabolite and transcriptional profiling into reprogramming for stress anticipation. Plant Physiol. 151, 1902–1917. doi: 10.1104/pp.109.143735


Parker, J. (1956). Drought resistance in woody plants. Bot. Rev. 22, 241–289.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Polle, Chen, Eckert and Harfouche. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Cyberinfrastructure to Improve Forest Health and Productivity: The Role of Tree Databases in Connecting Genomes, Phenomes, and the Environment

Jill L. Wegrzyn<sup>1</sup>\*, Margaret A. Staton<sup>2</sup>\*, Nathaniel R. Street<sup>3</sup>\*, Dorrie Main<sup>4</sup>\*, Emily Grau<sup>1</sup>\*, Nic Herndon<sup>1</sup>, Sean Buehler<sup>1</sup>, Taylor Falk<sup>1</sup>, Sumaira Zaman<sup>1</sup>, Risharde Ramnath<sup>1</sup>, Peter Richter<sup>1</sup>, Lang Sun<sup>1</sup>, Bradford Condon<sup>2</sup>, Abdullah Almsaeed<sup>2</sup>, Ming Chen<sup>2</sup>, Chanaka Mannapperuma<sup>3</sup>, Sook Jung<sup>4</sup> and Stephen Ficklin<sup>4</sup>

<sup>1</sup> Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, United States, <sup>2</sup> Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, Knoxville, TN, United States, <sup>3</sup> Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden, <sup>4</sup> Department of Horticulture, Washington State University, Pullman, WA, United States

Despite tremendous advancements in high throughput sequencing, the vast majority of tree genomes, and in particular those of forest trees, remain elusive. Although primary databases store genetic resources for just over 2,000 forest tree species, these are largely focused on sequence storage, basic genome assemblies, and functional assignment through existing pipelines. The tree databases reviewed here serve as secondary repositories for community data. They vary in their focal species, the data they curate, and the analytics provided, but they are united in moving toward a goal of centralizing both data access and analysis. They provide frameworks to view and update annotations for complex genomes, interrogate systems-level expression profiles, curate data for comparative genomics, and perform real-time analysis with genotype and phenotype data. The organism databases of today are no longer simply catalogs or containers of genetic information. These repositories represent integrated cyberinfrastructure that supports cross-site queries and analysis in web-based environments. These resources are striving to integrate across diverse experimental designs, sequence types, and related measures through ontologies, community standards, and web services. Efficient, simple, and robust platforms that enhance the data generated by the research community contribute to improving forest health and productivity.

Keywords: database, content management system, forest tree, bioinformatics, web services

#### Edited by:

Matias Kirst, University of Florida, United States

#### Reviewed by:

Matthew Aaron Gitzendanner, University of Florida, United States

Jason Holliday, Virginia Tech, United States

#### \*Correspondence:

Jill L. Wegrzyn, jill.wegrzyn@uconn.edu; Margaret A. Staton, mstaton1@utk.edu; Nathaniel R. Street, nathaniel.street@umu.se; Dorrie Main, dorrie@wsu.edu; Emily Grau, emily.grau@uconn.edu

#### Specialty section:

This article was submitted to Plant Biotechnology, a section of the journal Frontiers in Plant Science

Received: 16 August 2018 Accepted: 05 June 2019 Published: 25 June 2019

#### Citation:

Wegrzyn JL, Staton MA, Street NR, Main D, Grau E, Herndon N, Buehler S, Falk T, Zaman S, Ramnath R, Richter P, Sun L, Condon B, Almsaeed A, Chen M, Mannapperuma C, Jung S and Ficklin S (2019) Cyberinfrastructure to Improve Forest Health and Productivity: The Role of Tree Databases in Connecting Genomes, Phenomes, and the Environment. Front. Plant Sci. 10:813. doi: 10.3389/fpls.2019.00813

# INTRODUCTION

Starting in the Sanger sequencing era, significant investments were made to catalog genetic resources in primary repositories (Frishman et al., 1998). EMBL (the European Molecular Biology Laboratory), DDBJ (the DNA Data Bank of Japan), and NCBI (the National Center for Biotechnology Information) GenBank were initiated between 1980 and 1992 and remain freely accessible and federally funded (Benson et al., 1997; Tateno and Gojobori, 1997). The vast majority of data for these large, sequence-centric databases is sourced directly from researcher submissions, which are encouraged by peer-reviewed journals. These primary resources have evolved with today's data collection and curation needs, expanding in terms of both the sequence source and the associated metadata (Sayers et al., 2019). All three specialize in generating persistent identifiers to track a single sequence over an extensive network of resources. A genic identifier, for example, may link a reference genome in NCBI's Genome, an expression value in the Gene Expression Omnibus (GEO), and support for a UniRef90 cluster. These uniquely accessioned resources are increasingly integrated into secondary and tertiary repositories that subset or enhance these accessions with data specific to the communities they serve (Herrero et al., 2016).

As the data types and experimental designs contributing to these repositories diversified, a plethora of model organism databases (MODs) or clade organism databases (CODs) emerged. These databases sought to provide unique resources for the research communities they serve through layered curation and specialized integration. The AAtDB (An Arabidopsis thaliana Database), developed in 1991 to support the first model plant system, has since evolved into the widely accessed Arabidopsis Information Resource (TAIR) (Flanders et al., 1998; Huala et al., 2001). Around the same time, USDA-ARS funds were dedicated to developing some of the first informatics portals for economically important crop species, including RiceGenes (Cartinhour, 1997), GrainGenes (Triticeae) (Carollo et al., 2005), MaizeGDB (Lawrence et al., 2004), SoyBase (Grant et al., 2009), and the Dendrome Project for forest trees (Wegrzyn et al., 2008). Some of these databases remain independently funded entities, while others have merged into larger repositories or broadened their scope. There are hundreds of plant-focused organismal databases acting as secondary repositories today (Lai et al., 2012; Chen et al., 2018). The vast majority have moved beyond genetics and genomics data, providing advanced integration through stock centers, phenotypic evaluation, breeding resources, and metabolomic pathway integration.

Forest trees are unique among species represented in crop databases. Most are long-lived and outcrossing, with extensive natural distributions represented by large, diverse, and locally adapted populations (Holliday et al., 2017). They are of economic importance and are used for paper, pulp, biofuels, food, and timber production. At the same time, they serve as a foundation for watersheds and biodiversity and contribute substantially to carbon sequestration, with forests covering roughly 30% of the earth's surface (Houghton, 2005). Like many plants, forest trees have complex genomes, with challenges associated with ploidy and repetitive content. Additionally, gymnosperm tree genomes are exceptionally large, ranging from 10 to 40 Gbp in size (De La Torre et al., 2014). This, in combination with the need to broadly sample these large and diverse populations, has limited full genome representation. Of the nearly 60,000 forest tree species, fewer than 35 are associated with an assembled and annotated reference genome (Neale et al., 2013; Plomion et al., 2016). A survey of the primary databases reveals that over 2,000 species are associated with genetic information that is of value to the research community (Sayers et al., 2019).

In an era of high throughput technologies that assess genotype, phenotype, and environmental metrics at an unprecedented pace, the need for well-structured, efficient, and well-connected databases is apparent. In a recent survey of over 700 investigators in the biological sciences, access to analytical frameworks, long-term storage, and the ability to integrate data across disparate sources were of primary concern (Barone et al., 2017). The long-lived nature of trees, their pivotal role in local economies, and their role in ecosystem health require an integrated approach that leverages datasets from a variety of sources. To meet the needs of the research community, forest tree databases are moving away from independent database structures and toward integrated Content Management Systems (CMS) that support specific, shareable modules for query and analysis (Ficklin et al., 2011; Sundell et al., 2015). There is less focus on standardized database back ends, since web services allow users to expose their data and data definitions. The ability to provide this, and to query across sites, relies heavily on the curation of ontologies to describe the data housed in these frameworks. Initiatives that curate woody plant ontologies, describing plant structures and measured traits, are critical components of cross-talk between forest tree databases (Jaiswal et al., 2005; Lens et al., 2012; Cooper et al., 2018). When these terms are provided within the framework of recommended standards for data collection, such as the Minimal Information About a Plant Phenotyping Experiment (MIAPPE), analytical pipelines that evaluate complex data become feasible (Krajewski et al., 2015). In addition to these standards, all tree databases are integrated with existing analytical frameworks, such as Galaxy, that support and expose common bioinformatic workflows (Afgan et al., 2018).

In this review, four tree databases are described, including their history, scope, current resources, and analytic tools (**Figure 1**).
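To make the role of shared ontology terms in cross-site querying concrete, the following minimal Python sketch assumes two hypothetical databases that record the same trait under different local labels but map those labels to a common ontology term ID. The site names, trait labels, and term IDs (`ONTO:height` and so on) are illustrative placeholders, not real Trait Ontology accessions or any database's actual API.

```python
from collections import defaultdict

# Hypothetical mappings from each site's local trait label to a shared term ID.
SITE_A_TERMS = {"tree_height_m": "ONTO:height", "dbh_cm": "ONTO:stem_diameter"}
SITE_B_TERMS = {"Height": "ONTO:height", "BudBurstDay": "ONTO:bud_burst"}

# Hypothetical phenotype records as each site stores them.
site_a_records = [
    {"tree": "A-001", "trait": "tree_height_m", "value": 12.4},
    {"tree": "A-002", "trait": "dbh_cm", "value": 31.0},
]
site_b_records = [
    {"tree": "B-107", "trait": "Height", "value": 9.8},
    {"tree": "B-108", "trait": "BudBurstDay", "value": 121},
]


def index_by_term(records, term_map, site):
    """Group one site's records under the shared term each local label maps to."""
    index = defaultdict(list)
    for rec in records:
        term = term_map.get(rec["trait"])
        if term is None:
            continue  # unmapped labels cannot take part in cross-site queries
        index[term].append((site, rec["tree"], rec["value"]))
    return index


def cross_site_query(term, *indexes):
    """Collect every measurement annotated with `term`, whatever its source site."""
    hits = []
    for index in indexes:
        hits.extend(index.get(term, []))
    return hits


idx_a = index_by_term(site_a_records, SITE_A_TERMS, "siteA")
idx_b = index_by_term(site_b_records, SITE_B_TERMS, "siteB")
print(cross_site_query("ONTO:height", idx_a, idx_b))
# [('siteA', 'A-001', 12.4), ('siteB', 'B-107', 9.8)]
```

Records whose local label has no ontology mapping are simply excluded, which illustrates why curated ontologies matter for cross-talk between databases.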

# TREEGENES

TreeGenes<sup>1</sup>, previously known as Dendrome, was initially constructed to provide access to genetic data for forest trees in a relational framework. Early development included curation of molecular markers, genetic maps, Expressed Sequence Tags (ESTs), and species range maps. TreeGenes remained in a custom database schema and website, adopting components of the Generic Model Organism Database (GMOD) project for housing genetic maps (CMap) and genome assemblies (JBrowse) (Stein et al., 2002). Later development focused on the integration of genotyping resources, phenotypes, expression studies, and additional reference genome sequences (Wegrzyn et al., 2012; Falk et al., 2018).

TreeGenes currently represents just over 1,700 species from 16 orders and 124 genera (Falk et al., 2018). TreeGenes has 1,200 registered users with associated colleague accounts that enable access to data submission and analytical pipelines.

<sup>1</sup>http://www.treegenesdb.org

The database contains 27 reference genomes, 100 genetic maps, 36.7 M genotypes, 303 species with transcriptomes, 40 species with TreeGenes Unigenes, 306 unique phenotype measures, and 935,596 phenotypic measurements. Genomic data are sourced primarily from GenBank, 1KP, Phytozome, and PLAZA (Goodstein et al., 2012; Matasci et al., 2014; Proost et al., 2015; Sayers et al., 2019). Phenotypic data are integrated from TRY-DB and Dryad but come primarily from direct user submissions (Kattge et al., 2011). Environmental data are extracted from imported layers, including temperature, precipitation, and solar radiation from WorldClim (Fick and Hijmans, 2017), and a variety of metrics from the Harmonized World Soil Database (FAO et al., 2009).

TreeGenes runs on Tripal v3, which integrates the Drupal content management system with Chado, the relational schema of the Generic Model Organism Database (GMOD) project. This conversion, in 2017, aligned TreeGenes for the first time with over 30, primarily plant, databases (Ficklin et al., 2011; Sanderson et al., 2013). Recent focused development in Tripal, led by the tree and legume community, enabled cross-site communication, efficient data transfer, and the ability to interface with a local installation of Galaxy (Watts and Feltus, 2017; Wytko et al., 2017). Galaxy is an independent framework that provides an API to abstract command-line informatics software, develop workflows, and connect to high performance computing resources. Conversion to Tripal resulted in a complete overhaul of the database and has enabled the development of several analytical modules that allow researchers to search, filter, and funnel data directly into supported workflows.
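Chado stores sequence features of every kind in a single generic `feature` table, with each row typed by a controlled-vocabulary term (`cvterm`) and linked to an `organism`. The sketch below uses an in-memory SQLite database and a heavily reduced, assumed subset of those three tables (real Chado tables carry many more columns and constraints, and Tripal runs Chado on PostgreSQL) to show how a typed join answers a question such as "how many genes per organism?". The feature names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, illustrative subset of three Chado tables (not the full schema).
conn.executescript("""
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE cvterm   (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature  (
    feature_id  INTEGER PRIMARY KEY,
    organism_id INTEGER REFERENCES organism(organism_id),
    type_id     INTEGER REFERENCES cvterm(cvterm_id),
    uniquename  TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO organism VALUES (?, ?, ?)",
                 [(1, "Populus", "trichocarpa"), (2, "Pinus", "taeda")])
conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [(1, "gene"), (2, "mRNA")])
conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "ptr_gene_0001"),  # hypothetical identifiers
    (2, 1, 2, "ptr_mrna_0001"),
    (3, 1, 1, "ptr_gene_0002"),
    (4, 2, 1, "pta_gene_0001"),
])

# Count features typed as 'gene' for each organism.
rows = conn.execute("""
    SELECT o.genus || ' ' || o.species AS organism, COUNT(*) AS n_genes
    FROM feature f
    JOIN organism o ON o.organism_id = f.organism_id
    JOIN cvterm   c ON c.cvterm_id   = f.type_id
    WHERE c.name = 'gene'
    GROUP BY o.organism_id
    ORDER BY o.genus
""").fetchall()
print(rows)  # [('Pinus taeda', 1), ('Populus trichocarpa', 2)]
```

Because new feature types only require new vocabulary terms rather than new tables, the same query pattern extends to markers, transcripts, or any other typed feature.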

Following conversion, TreeGenes released a set of Tripal modules that can be utilized by researchers visiting the site or installed and customized for any Tripal-supported database. Tripal Sequence Similarity Search (TSeq) provides access to genomes, transcriptomes, and curated TreeGenes unigenes through traditional NCBI BLAST or optimized Diamond protein searches (Boratyn et al., 2013; Buchfink et al., 2015). The Tripal Plant PopGen Submit (TPPS) module presents a framework for researchers to submit their association genetics, landscape genomics, and related population studies by collecting any combination of molecular markers, phenotypes, and environmental measures. This module implements MIAPPE standards and the associated ontologies to ensure data integrity. The Tripal OrthoQuery module provides a framework for curating unigenes, executing OrthoFinder (Emms and Kelly, 2015), and generating interactive visualizations of gene families in a phylogenetic context. OrthoQuery enables both real-time orthologous gene family analysis and functional assessment of the resulting orthogroups.
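Sequence similarity results such as those returned by TSeq are commonly exchanged as BLAST+ tabular output (`-outfmt 6`), a 12-column layout that DIAMOND can also produce. As a hedged sketch, not TSeq's own code, the function below keeps the highest-bitscore hit per query after an E-value cutoff; the 1e-5 threshold and the example rows are assumptions for illustration.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6) in its default 12-field form.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Map each query ID to (subject ID, bitscore, E-value) of its best-scoring hit."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        evalue, bits = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # discard weak hits before ranking
        query = hit["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (hit["sseqid"], bits, evalue)
    return best


# Invented rows in the 12-column layout.
example = (
    "q1\tsubjA\t85.0\t300\t45\t0\t1\t300\t5\t304\t1e-80\t250.0\n"
    "q1\tsubjB\t60.0\t280\t112\t2\t10\t289\t1\t280\t1e-30\t120.0\n"
    "q2\tsubjC\t40.0\t100\t60\t1\t1\t100\t1\t100\t0.01\t35.0\n"
)
print(best_hits(io.StringIO(example)))  # {'q1': ('subjA', 250.0, 1e-80)}
```

Here `q2` drops out because its only hit fails the E-value cutoff; ranking by bitscore rather than E-value keeps the comparison independent of database size.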

Current development in TreeGenes focuses on CartograTree, which enables integration of genotype, phenotype, and environmental data for georeferenced forest trees (Herndon et al., 2018). This module provides a robust framework to query publication datasets, species, phenotypes, genotypes, and associations based on metadata collected at the time of submission. The data and metadata exposed in CartograTree are derived from published population-level studies submitted to TreeGenes via TPPS or curated from Dryad. Landscape genomics, association genetics, and population structure analyses are executed through the Galaxy framework.
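The kind of metadata-driven query that CartograTree supports can be sketched as a filter over georeferenced tree records. In this minimal Python example the record fields, the bounding-box query, and the coordinates are hypothetical and do not reflect CartograTree's actual data model; the point is that a tree is only useful for association or landscape analyses if location, phenotype, and genotype data are available together.

```python
from dataclasses import dataclass, field


@dataclass
class TreeRecord:
    """Hypothetical georeferenced tree with attached phenotype and genotype metadata."""
    tree_id: str
    species: str
    lat: float
    lon: float
    phenotypes: dict = field(default_factory=dict)
    genotyped_markers: int = 0


def query_trees(records, species, bbox, phenotype):
    """IDs of `species` trees inside bbox=(min_lat, max_lat, min_lon, max_lon)
    that carry the requested phenotype and at least one genotyped marker."""
    min_lat, max_lat, min_lon, max_lon = bbox
    return [
        r.tree_id
        for r in records
        if r.species == species
        and min_lat <= r.lat <= max_lat
        and min_lon <= r.lon <= max_lon
        and phenotype in r.phenotypes
        and r.genotyped_markers > 0
    ]


# Invented example records, for illustration only.
records = [
    TreeRecord("T1", "Pinus taeda", 33.1, -84.2, {"height_m": 21.0}, 5000),
    TreeRecord("T2", "Pinus taeda", 36.5, -78.9, {"height_m": 18.2}, 0),
    TreeRecord("T3", "Pinus taeda", 31.0, -83.5, {"height_m": 19.7}, 4800),
    TreeRecord("T4", "Quercus alba", 32.0, -84.0, {"height_m": 15.0}, 1200),
]
print(query_trees(records, "Pinus taeda", (30.0, 35.0, -86.0, -80.0), "height_m"))
# ['T1', 'T3']
```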

# HARDWOOD GENOMICS PROJECT

The Hardwood Genomics Project (HWG) provides access to genomic resources generated from angiosperm trees, including forest and urban trees of ecological and agricultural significance. The resource originated from the Fagaceae Genomics Web, built in 2007 to house transcriptomes, genomes, and genetic maps. As new collaborators joined the effort and the scope of species extended beyond Fagaceae, the site was rebuilt in 2011 as the HWG. HWG's mission is to host reference genomes and transcriptomes that are either not accessible elsewhere or only available as raw files without an associated, searchable functional annotation. In addition, HWG accepts molecular markers, genetic maps, germplasm and population descriptions, and community project descriptions. Current resources support species associated with pest or pathogen threats, including green ash, European ash, American chestnut, American beech, black walnut, and redbay, as well as trees with significant economic value, including white oak, black cherry, sugar maple, and tulip poplar.

For species with an available reference genome, HWG provides a workspace for accessing the annotation. This adds value to these sequence resources by performing and hosting functional annotation, including identification of Open Reading Frames (ORFs) from transcripts, BLAST annotations derived from the UniProt Swiss-Prot/TrEMBL plant protein databases, InterProScan domain searches (Jones et al., 2014), Gene Ontology (GO) term assignments (Ashburner et al., 2000), and predictions of Simple Sequence Repeats (SSRs) and primers. Researchers can download flat files, explore the spatial context of the assembly with JBrowse (Buels et al., 2016), search functional annotation for genes, and explore assigned GO terms through the ontology graphs. Additional genome-specific data, such as alternative splicing, variants, and molecular markers, are added to the JBrowse viewer when available.

Hardwood Genomics Project is currently running Tripal v3, and like TreeGenes, is responsible for the development of custom modules that can be installed on other Tripal-enabled sites. RNASeq data is poorly integrated in plant community databases despite the widespread use of expression studies to examine responses to biotic and abiotic stressors in plant systems. To address this limitation, HWG launched a framework devoted to the integration and analysis of gene expression experiments. BioSamples imported from GenBank, with the metadata describing the tissues, treatments, experimental design, and informatic methods, can be explored and compared. Each transcript, examined as part of an RNASeq experiment, has expression values that can be interrogated through interactive visualizations or downloaded for further analysis. The expression data displays can be customized interactively, grouping BioSamples by their tagged metadata values. A tool for comparing gene expression is also available, allowing the user to provide their own gene list and generate a heat map comparing expression of those genes across the relevant BioSamples (Chen et al., 2017). Current development in HWG is focused on supporting bioinformatic workflows, through Galaxy, to allow users to load their own datasets for analysis. HWG has also developed an Elastic Search module that enables search engine style cross-site query. This enables the discovery of relevant datasets within and across Tripal-enabled websites. The Aurora Galaxy Tripal module allows informatic tools to be wrapped in R Markdown which makes it possible to generate Galaxy workflow outputs as static websites.
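The gene-list heat map described above can be approximated by grouping BioSamples on a metadata tag and averaging log2-transformed expression per group. This minimal sketch uses made-up sample IDs, tags, and values, and the log2(x + 1) transform with a plain mean is an assumed choice, not necessarily what the HWG tool computes.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical BioSample metadata tags.
biosamples = {
    "S1": {"tissue": "leaf", "treatment": "control"},
    "S2": {"tissue": "leaf", "treatment": "drought"},
    "S3": {"tissue": "root", "treatment": "control"},
    "S4": {"tissue": "root", "treatment": "drought"},
}
# Hypothetical normalized expression values (gene -> sample -> value).
expression = {
    "geneA": {"S1": 10.0, "S2": 30.0, "S3": 1.0, "S4": 3.0},
    "geneB": {"S1": 0.0, "S2": 0.0, "S3": 15.0, "S4": 15.0},
    "geneC": {"S1": 7.0, "S2": 7.0, "S3": 7.0, "S4": 7.0},
}


def heatmap_matrix(gene_list, group_by):
    """Build a genes x groups matrix of mean log2(value + 1), grouping samples by a tag."""
    groups = defaultdict(list)
    for sample, tags in biosamples.items():
        groups[tags[group_by]].append(sample)
    columns = sorted(groups)
    rows = {}
    for gene in gene_list:
        values = expression.get(gene)
        if values is None:
            continue  # genes absent from the experiment are skipped
        rows[gene] = [
            round(mean(math.log2(values[s] + 1) for s in groups[col]), 3)
            for col in columns
        ]
    return columns, rows


cols, rows = heatmap_matrix(["geneB", "geneC", "geneX"], group_by="tissue")
print(cols)  # ['leaf', 'root']
print(rows)  # {'geneB': [0.0, 4.0], 'geneC': [3.0, 3.0]}
```

Switching `group_by` to `"treatment"` regroups the same samples without touching the expression data, which mirrors the interactive regrouping by tagged metadata.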

# GENOME DATABASE FOR ROSACEAE

The Genome Database for Rosaceae (GDR<sup>2</sup>; Jung et al., 2013) was initiated in 2003 to provide curated and integrated genomics, genetics, and breeding (GGB) data alongside analysis tools to enable basic, translational, and applied research. Rosaceae is an economically, nutritionally, and biologically important plant family that includes the majority of tree fruits (apple, apricot, blackberry, cherry, peach, plum), nuts (almond), and ornamentals (pear, crab apple). While not specifically focused on forest trees, GDR is included here for its role in developing Tripal breeding modules and for its comparative genomics utility for forest hardwoods.

GDR contains 21 genome assemblies and annotations for 14 species. The database houses a total of 528,890 genes, reference transcriptomes for all major species, 14,411 germplasm records, 313 genetic maps, 3.3 M molecular markers, 402,559 phenotype measurements, 3,902 QTL/MTL for 392 agronomic traits, 10.8 M genotypes, and 7,449 publications. GDR provides access to breeding management and analysis tools, pathway analysis through PlantCyc and Pathway Inspector, flexible front-end querying, genome annotations through JBrowse, and sequence similarity search functionality (Jung et al., 2016, 2017). GDR is participating in the development of new Tripal modules: visualization and analysis of genetic maps are available through the new Tripal Map Viewer module, while whole genome alignments can be visualized through the Tripal Synteny Viewer (Jung et al., 2018). GDR is currently expanding the analytic capabilities of its Breeding Information Management System (BIMS) and developing reference genome integration for the Tripal Map Viewer module.
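QTL and genetic map data of the kind GDR curates are often combined by asking which mapped markers fall inside a QTL interval on a given linkage group. The sketch below is a hypothetical illustration (marker names and centimorgan positions are invented, and it is not the Tripal Map Viewer's implementation): it sorts markers per linkage group once and then answers interval queries by binary search.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical genetic map rows: (marker, linkage group, position in cM).
genetic_map = [
    ("m01", "LG1", 0.0), ("m02", "LG1", 12.5), ("m03", "LG1", 18.3),
    ("m04", "LG1", 25.0), ("m05", "LG1", 40.2),
    ("m06", "LG2", 5.1), ("m07", "LG2", 19.9),
]


def build_index(rows):
    """Per linkage group, keep parallel lists of sorted positions and marker names."""
    by_lg = defaultdict(list)
    for marker, lg, cm in rows:
        by_lg[lg].append((cm, marker))
    index = {}
    for lg, points in by_lg.items():
        points.sort()
        index[lg] = ([cm for cm, _ in points], [m for _, m in points])
    return index


def markers_in_interval(index, lg, start_cm, end_cm):
    """Markers on `lg` whose position lies in [start_cm, end_cm], found by binary search."""
    positions, markers = index.get(lg, ([], []))
    return markers[bisect_left(positions, start_cm):bisect_right(positions, end_cm)]


idx = build_index(genetic_map)
print(markers_in_interval(idx, "LG1", 15.0, 26.0))  # ['m03', 'm04']
```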

<sup>2</sup>https://www.rosaceae.org

# PLANT GENOME INTEGRATIVE EXPLORER

The Plant Genome Integrative Explorer (PlantGenIE) began as the Populus Integrative Genome Explorer (PopGenIE; Sjödin et al., 2009), developed to overcome a lack of tools for routine tasks such as annotating gene lists, converting among sequence identifiers, and visualizing transcript abundance on the basis of EST sequencing. The Populus version was expanded to include visualization of poplar microarray data using the concept of the Arabidopsis electronic fluorescent pictograph (eFP) browser (Winter et al., 2007), gene set enrichment tests for Gene Ontology (Ashburner et al., 2000) and Pfam (Finn et al., 2010) annotations, genome synteny browsing, and sequence similarity searching. Later, a complementary comparative co-expression tool was developed to facilitate inference of functional orthologs on the basis of conserved co-expression (Netotea et al., 2014). The resulting networks were integrated within the Populus and Arabidopsis GenIE sites. With the release of the Norway spruce (Picea abies) genome (Nystedt et al., 2013), the associated resources were made available in a conifer database, ConGenIE, which also includes genomes for loblolly pine (Pinus taeda; Neale et al., 2014; Wegrzyn et al., 2014) and white spruce (Picea glauca; Birol et al., 2013).
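Gene set enrichment tests like those offered by the GenIE sites are often framed as a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given N background genes, K of them annotated with a term, and a list of n genes of which k carry the term, the p-value is P(X >= k). The Python sketch below implements that generic test with `math.comb` (Python 3.8+). The term names and genes are placeholders, and because the exact statistic and multiple-testing correction used by PlantGenIE are not described here, this should be read as an assumed, generic version.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def go_enrichment(gene_list, background, annotations):
    """Test each term for over-representation in gene_list; results sorted by p-value."""
    genes = set(gene_list) & background
    N, n = len(background), len(genes)
    results = []
    for term, members in annotations.items():
        members = members & background
        k = len(genes & members)
        if k == 0:
            continue  # a term absent from the list cannot be over-represented
        results.append((term, k, len(members), hypergeom_sf(k, N, len(members), n)))
    return sorted(results, key=lambda r: r[3])


# Placeholder background, annotations, and gene list.
background = {f"g{i}" for i in range(10)}
annotations = {
    "TERM_X": {"g0", "g1", "g2", "g3"},
    "TERM_Y": {"g4", "g5", "g6", "g7", "g8"},
}
print(go_enrichment(["g0", "g1", "g2"], background, annotations))
# TERM_X: k=3 of K=4, p = 4/120, roughly 0.0333
```

With many terms tested at once, the raw p-values would normally be adjusted (for example with the Benjamini-Hochberg procedure) before interpretation.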

The PlantGenIE umbrella resource (Sundell et al., 2015), which included the development of new and updated gene expression tools together with an integrated gene family analysis, is now available for all species. The primary aim is visualization of gene expression data, mainly from forest tree species but also including related sites for plant models such as Arabidopsis. As such, gene expression resources for aspen (AspWood; Sundell et al., 2017) and Norway spruce (NorWood; Jokipii-Lukkari et al., 2017) cryogenic tangential cuttings series profiling wood development are being integrated within the PopGenIE and ConGenIE sites, respectively. Dedicated sites have been developed to provide access to spatial transcriptomics (Giacomello et al., 2017) and laser capture microdissection (Canas et al., 2014) gene expression data. For a subset of species, community annotation is also provided via WebApollo (Stevens et al., 2016).
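Co-expression networks, such as those behind the comparative co-expression tool, are at their simplest built from correlated expression profiles across samples. The following deliberately simplified sketch uses Pearson correlation with a hard threshold of |r| >= 0.9 to draw edges; both the measure and the threshold are assumptions, and network inference in the GenIE resources is considerably more elaborate. Gene names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.9):
    """Undirected edges (gene1, gene2, r) for gene pairs with |r| >= threshold."""
    edges = []
    for (g1, x), (g2, y) in combinations(sorted(profiles.items()), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression profiles across five samples.
profiles = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 4, 3, 2, 1],
    "geneD": [3, 1, 4, 1, 5],
}
print(coexpression_edges(profiles))
# [('geneA', 'geneB', 1.0), ('geneA', 'geneC', -1.0), ('geneB', 'geneC', -1.0)]
```

Negative correlations are kept as edges here; whether anti-correlated genes should be linked is itself a modeling choice.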

The GenIE sites were originally developed using the Drupal CMS with many of the tools from GMOD (Stein et al., 2002). More recently, an alternative open-source CMS has been developed (GenIE-CMS<sup>3</sup>), and the PlantGenIE resource is currently being updated in this platform. GenIE-CMS includes web services, enabling end users to access genomic information from external interfaces such as R and Python analysis scripts. Alongside this update, new and improved versions of gene expression visualization tools have been developed and made available as plugins to GenIE-CMS. The PlantGenIE update includes integration with the PLAZA resource (Proost et al., 2015), integration of cross-GenIE gene lists using PLAZA gene orthology inference methods, new integrative gene expression explorer tools, new and updated gene expression networks inferred using seidr (Schiffthaler et al., 2018), and an updated functional enrichment tool. In addition to updating existing reference genomes, new genomes are being added, including a dedicated eucalyptus site, EucGenIE. The development of GenIE-CMS enables rapid and easy implementation of new GenIE resources and cross-linking among existing GenIE sites using PLAZA gene family and orthology information.

<sup>3</sup>https://github.com/PlantGenIE/GenIECMS

# FUTURE DIRECTIONS

Tree database cyberinfrastructure is supporting comparative genomics, population genetics, expression profiling, and genome annotation. These resources focus on a combination of model and non-model systems and integrate with established comparative resources to deliver value-added information. Despite their importance, the sustainability of cyberinfrastructure and of the related activities of curating and importing scientific data is always in question. The databases described here leverage larger open-source projects as their base framework and share web-based applications for common functionality, such as genome browsing and sequence similarity searching. For the forest tree community, most of the functionality described here has been deployed within the last 3 years and represents the first coordinated effort across these resources. Frameworks like Tripal and PlantGenIE focus on efficient deployment, web services for cross-talk, data visualization, and analytics to provide a robust environment for end users. For example, the Elastic Search module developed by HWG allows one to search for a gene, genome, marker, or other indexed object in one database and locate results in other Tripal databases without executing independent searches on each website. Sharing development across a larger community allows forest tree databases to focus on the specific needs of their users. Their independent value lies in the additional curation, metadata acquisition, indexing, analytics, and visualization that the primary repositories do not deliver. TreeGenes and HWG are focused on expression data integration for non-model trees and on metadata retrieval and cross-study analytics for population genetics studies. GDR is focused on improving access to, and visualization of, genetic maps as well as breeding tools.

The PlantGenIE framework provides a robust platform for species with a reference genome, as well as advanced visualization for expression data that integrates across studies. All of these databases are also seeking stronger connections to more broadly plant-focused repositories, such as Phytozome, PLAZA, and Planteome, which provide genetic and ontological resources that improve the utility of cross-site querying.

While tremendous advancements have been made through recent and focused development on these pivotal frameworks, several challenges remain for the forest tree community. As datasets become larger and more integrative, it is increasingly difficult for small database teams to keep up with data capture and curation. With increasing access to reference genomes, large-scale population studies, and high throughput environmental data, biological databases must develop more efficient metadata capture, storage, and query capacity. These repositories will be tasked with implementing advanced natural language processing, automated metadata capture, and ontological term assignment spanning not only genetic data but also the associated phenotypic and environmental data. These latter categories encompass an expansive range, from traditional growth traits to canopy metrics, soil profiles, microbiomes, and metatranscriptomics. The biomedical community has paved the way for some of this technology, but forest tree data and the associated genetic resources remain more heterogeneous (Koleck et al., 2019). This heterogeneity is combined with high throughput technologies, such as remote sensing, that challenge existing cyberinfrastructure in terms of efficient transfer, storage, and query (Côté et al., 2018). Capturing data for large forest tree populations may involve storing millions of genotypes across thousands of individuals or hundreds of pangenomes. It will also rely on a combination of sequencing and phenotyping technologies that continue to evolve (Bolger et al., 2019). Once storage and minimal reporting requirements are established, the frameworks on which the databases are built will need to help users determine the most appropriate analytics and provide the required formatting for the queried data. Although progress has been made in connecting data to workflows on high performance computing resources through frameworks such as Galaxy, systems that can recommend appropriate workflows are still in development. The future of biological databases for all plants lies in reproducible workflows that capture the metadata associated with the original studies. Concerted efforts in this area and the integration of new data types emerging from high throughput technologies will be key to advancing discovery for the forest tree community.

# AUTHOR CONTRIBUTIONS

JW, MS, NS, DM, and SF designed the databases and software described. EG, NH, SB, TF, SZ, RR, PR, and LS developed the core TreeGenes and TreeGenes Tripal modules. MS, BC, AA, and MC developed the core HWG and HWG Tripal modules. NS and CM developed the core PlantGenIE. SJ developed the core GDR and GDR Tripal modules. SF developed the core Tripal. JW, MS, NS, and DM wrote the manuscript. All authors read and approved the manuscript.

# FUNDING

We would like to acknowledge the funding provided through the National Science Foundation (ACI-1443040 and ACI-1444573) and United States Department of Agriculture (2016-67013-24469).

# REFERENCES


Arabidopsis genome initiative. Nucleic Acids Res. 26, 80–84. doi: 10.1093/nar/26.1.80


records: a systematic review. J. Am. Med. Inform. Assoc. 26, 364–379. doi: 10.1093/jamia/ocy173


formation in Populus tremula. Plant Cell 29, 1585–1604. doi: 10.1105/tpc.17.00153


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Wegrzyn, Staton, Street, Main, Grau, Herndon, Buehler, Falk, Zaman, Ramnath, Richter, Sun, Condon, Almsaeed, Chen, Mannapperuma, Jung and Ficklin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.