G Quadruplex in Plants: A Ubiquitous Regulatory Element and Its Biological Relevance

G quadruplexes (G4) are higher-order DNA and RNA secondary structures formed by G-rich sequences and built around tetrads of hydrogen-bonded guanine bases. Potential G quadruplex-forming sequences have been identified in G-rich telomeric and non-telomeric regions of eukaryotic genomes. Functionally, G4 formation is known to be involved in chromatin remodeling and gene regulation, and has been associated with genomic instability, genetic diseases, and cancer progression. The natural role and biological validation of G4 structures are starting to be explored and are of particular interest for therapeutic interventions in human diseases. However, the existence and physiological role of G4 DNA and G4 RNA in plant species have not yet been investigated in depth and are therefore of great interest for the development of improved crop varieties for sustainable agriculture. In this context, several recent studies suggest that these highly diverse G4 structures in plants can regulate the expression of genes involved in several pathophysiological conditions, including responses to biotic and abiotic stresses as well as DNA damage. In the current review, we summarize the recent findings regarding the emerging functional significance of G4 structures in plants and discuss their potential value in the development of improved crop varieties.


INTRODUCTION
Double helical B-DNA is the predominant nucleic acid structure of the genome. In addition, DNA may adopt various extrahelical, non-B-DNA secondary conformations depending on the nucleotide content. These secondary structures are prevalent in all living organisms and play a pivotal role in their physiology. G quadruplex, or G4 DNA, is one of these structures and is adopted by spontaneous folding of sequences containing multiple runs of guanines (Bochman et al., 2012). Structurally, G4 DNA comprises G-quartets, or G-tetrads, in which four guanine bases are bound together via Hoogsteen hydrogen bonds in a square planar conformation (Huppert and Balasubramanian, 2005). G-quartets stack on top of each other to form a higher-order nucleic acid structure, G4 DNA (Figure 1). Adding to the complexity of the G4 DNA structure, the stacks of G-quartets are connected by loops of variable sizes (1-7 nucleotides) and orientations (parallel or antiparallel) (Wang and Patel, 1993; Parkinson et al., 2002; Huppert, 2010). These secondary structures are stabilized by cations, preferably potassium ions (K+) (Largy et al., 2016). G quadruplex structures form in RNA as well as DNA and may be intermolecular or intramolecular, depending on the number of nucleic acid strands involved in quadruplex formation. For identification of potential G4-forming sequences in the genome, G quadruplex prediction algorithms such as Quadparser (Huppert and Balasubramanian, 2007), G4 calculator (Eddy and Maizels, 2006), and Quadbase (Dhapola and Chowdhury, 2016) are easily accessible and widely used.
FIGURE 1 | (Huppert and Balasubramanian, 2005). No permission is required for the modification and reproduction of this figure under the terms of the Creative Commons CC BY license.
G quadruplex forming sequences (GQFS) have been categorized into different types on the basis of the number of guanine repeats (G2, G3, and G4 types, with two, three, and four guanine repeats, respectively) and the number of nucleotides in the loop (1-3 nt, 1-7 nt, etc.) (Table 1). The stability of a G4 structure depends on loop length, with shorter loops predicted to give greater stability. For instance, G3 type GQFS are more stable with loops of 1-3 nt than with loops of 4-7 nt; similarly, G2 type GQFS are more stable with loops of 1-2 nt than with loops of 3-4 nt (Bugaut and Balasubramanian, 2008).
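The GQFS types described above map naturally onto a pattern search. The following is a minimal sketch, assuming the widely used Quadparser-style definition of a putative G4 motif (four G-runs separated by three loops); the function names, default parameters, and toy sequence are illustrative assumptions and are not taken from any of the tools cited above.

```python
import re


def build_gqfs_pattern(g_run=3, loop_min=1, loop_max=7):
    """Compile a Quadparser-style pattern: four G-runs joined by three loops.

    g_run    -- minimum consecutive guanines per run (2 for G2 type, 3 for G3 type)
    loop_min -- shortest loop allowed, in nucleotides
    loop_max -- longest loop allowed, in nucleotides
    """
    run = "G{%d,}" % g_run
    loop = "[ACGT]{%d,%d}" % (loop_min, loop_max)
    # Three (run + loop) units followed by a closing G-run.
    return re.compile("(?:%s%s){3}%s" % (run, loop, run))


def find_gqfs(sequence, g_run=3, loop_min=1, loop_max=7):
    """Return (start, end, motif) for each non-overlapping hit on the given strand."""
    pattern = build_gqfs_pattern(g_run, loop_min, loop_max)
    return [(m.start(), m.end(), m.group())
            for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "TTGGGAGGGTGGGAGGGTT"  # hypothetical sequence, for illustration only
    print(find_gqfs(toy_sequence))                            # G3 type, loops of 1-7 nt
    print(find_gqfs(toy_sequence, g_run=3, loop_max=3))       # G3 type, loops of 1-3 nt
```

This sketch scans only the strand it is given and reports non-overlapping matches; a genome-wide survey would also need to scan the complementary strand (equivalently, search for C-runs) and decide how to treat overlapping motifs.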
The study of G4 DNA has emerged as a forefront area of research because of its proposed role in several biological functions ranging from physiology to pathology. These secondary structures are found to be abundant in a wide range of eukaryotic and prokaryotic genomes. In bacteria, yeast, and humans, genome-wide analyses of GQFS have revealed the nonrandom distribution of these secondary structures (König et al., 2010). It is evident that GQFS are particularly abundant in, but not limited to, promoters (Evans et al., 1984; Kilpatrick et al., 1986), telomeres (Blackburn, 1994), ribosomal DNA (Sun et al., 1998), untranslated regions (UTR) of mRNA, micro- and minisatellite repeats (Nakagama et al., 2006), and immunoglobulin heavy chain switch regions (Yu et al., 2003).
In bacteria, GQFS are evolutionarily conserved and enriched non-randomly in the promoter regions of genes associated with specific functions such as transcription, secondary metabolite biosynthesis, and signal transduction, suggesting a global regulatory role of G4 DNA in prokaryotes as well (Rawal et al., 2006; Beaume et al., 2013). In addition, in the bacterial pathogen Neisseria gonorrhoeae, G4 DNA and RNA play a key role in the recombination-mediated antigenic variation mechanism that varies the amino acid sequence of the surface-expressed protein pilin and thus evades detection by the host adaptive immune system (Cahoon and Seifert, 2009). The emerging pattern of association of GQFS with specific genomic regions suggests a regulatory role of GQFS in biologically significant pathways (Garg et al., 2016). However, in contrast to the significant information available on the distribution and role of G quadruplex-forming sequences in humans and microbial pathogens, similar studies in plant systems have been very limited. Recent investigations showed that G4 DNA and RNA are also generally conserved across plant species. In this context, several bioinformatics analyses of plant genomes have been accompanied by identification and functional characterization of these secondary structures. In plants, analysis of the genome-wide distribution of G quadruplexes and their association with different genomic features led to the identification of putative G4-forming sequences within the gene body or promoter region of orthologous genes in monocot and dicot plant species (Garg et al., 2016). Given the significant regulatory roles ascribed to G4 DNA in multiple systems, understanding the mechanism of gene regulation through G quadruplexes in plants may provide significant information for crop improvement. Very recently, several studies have been conducted to identify GQFS in a wide variety of plant species, including many important crop plants.
Here, we will review the current state of understanding of the biological pathways where a significant role of G4 DNA is implicated.

DISTRIBUTION OF G QUADRUPLEX FORMING SEQUENCES IN PLANT SPECIES
Similar to other organisms, the prevalence and distribution of GQFS in plant genomes vary according to the specific GQFS type. G3 type GQFS were more abundant in intergenic regions, whereas G2 type GQFS were found to be located in genic regions (Table 1). The specific association of different types of GQFS with different genomic regions suggests vital roles in various cellular processes; for instance, G2 GQFS may play a role in the regulation of translation and transcription, while G3 GQFS may be important for promoter regulation. In plant genomes (Arabidopsis thaliana, Oryza sativa, Glycine max, Cypripedium arietinum), several G4 sequences were identified and confirmed to form parallel, antiparallel, intramolecular, or intermolecular G4 DNA conformations in vitro by circular dichroism (CD) spectroscopy and gel electrophoresis (Garg et al., 2016). Gene ontology (GO) enrichment analyses have shown that, in a variety of dicot plant species, orthologous genes harboring GQFS are involved in important biological pathways such as chromatin modification, regulation of phosphorylation and intracellular signaling, auxin transport, seed development, and GTPase activity. In monocot plant species, orthologous genes with GQFS are involved in biological processes such as development, ion transport, regulation of transcription, and protein folding (Garg et al., 2016).

EVOLUTIONARY CONSERVATION OF G QUADRUPLEX AMONG PLANT SPECIES
G4 DNA forming sequences are evolutionarily conserved from bacteria to single-celled eukaryotes to metazoans. Among closely related fungal species, GQFS are evolutionarily conserved at the nucleotide level and associated with distinct genomic features (Capra et al., 2010). To assess the evolutionary conservation of G4 sequences in plant species, a genome-wide analysis was conducted for monocot and dicot plant genomes. The results showed that G2 type GQFS were abundant, comprising more than 90% of the GQFS found in all the plant species analyzed, while G3 type GQFS were found less frequently, comprising 5% of the total GQFS in each of the plant species (Garg et al., 2016). In addition, the frequency of GQFS varied between monocots (∼80-1500 GQFS/Mb) and dicots (∼10-20 GQFS/Mb); this disparity may be due to the high GC content of monocot genomes (Garg et al., 2016). The evolutionary conservation of GQFS among plant species and their association with specific genomic features, as described below, suggest that G4 DNAs are integral parts of plant biology and are under evolutionary constraints.
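The GQFS/Mb figures quoted above are simple normalizations of a motif count by sequence length. A minimal sketch of that arithmetic follows; the function names and the numbers in the usage lines are hypothetical and are not values from Garg et al. (2016).

```python
def gqfs_density_per_mb(gqfs_count, region_length_bp):
    """Convert a raw GQFS count into a density in GQFS per megabase."""
    if gqfs_count < 0:
        raise ValueError("gqfs_count must be non-negative")
    if region_length_bp <= 0:
        raise ValueError("region_length_bp must be positive")
    return gqfs_count * 1_000_000 / region_length_bp


def fold_difference(density_a, density_b):
    """Ratio of two densities, e.g. monocot vs. dicot or TPR vs. coding region."""
    if density_b <= 0:
        raise ValueError("density_b must be positive")
    return density_a / density_b


if __name__ == "__main__":
    # Hypothetical counts and lengths, chosen only to illustrate the calculation.
    genome_a = gqfs_density_per_mb(gqfs_count=300, region_length_bp=2_000_000)
    genome_b = gqfs_density_per_mb(gqfs_count=30, region_length_bp=2_000_000)
    print(genome_a, genome_b, fold_difference(genome_a, genome_b))
```

Normalizing by length in this way is what makes genomes or genomic regions of very different sizes comparable; it does not by itself correct for differences in GC content.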

G QUADRUPLEX DISTRIBUTION AND THEIR GENOMIC POSITION: FUNCTIONAL RELEVANCE
Because G4 DNAs are considered a molecular switch for gene expression in metazoan cells (Eddy and Maizels, 2006), it is imperative to study the positional relationship of GQFS in plant genomes (Figure 2). In the genomes of A. thaliana, Vitis vinifera, O. sativa, and Populus trichocarpa, GQFS are frequently located near transcribed units or genes (Mullen et al., 2010). In particular, significant GQFS enrichment was observed in the transcription start site proximal regions (TPR), which are generally conserved across plant species. This suggests that G4 motifs in plants, similar to their proposed function in mammalian systems, play a role in regulating gene expression (Andorf et al., 2014). G4 motifs are also enriched at the 5′ UTR, the 3′ UTR, and the 5′ end of introns, implicating G quadruplexes in posttranscriptional regulation of genes (Andorf et al., 2014; Wang et al., 2015). Comparative analyses of the genomes of Oryza sativa japonica and O. sativa, widely cultivated Asian varieties of rice, showed enrichment of GQFS in the TPR (149.57 GQFS/Mb and 131.34 GQFS/Mb, respectively) relative to coding regions, introns, and 5′- and 3′-UTRs. The conserved pattern of high GQFS density at the TPR across a variety of plant species suggests a role of G quadruplexes in transcriptional regulation in these species. Overall, the density of GQFS among the monocot species studied was higher than that among the dicot species (Wang et al., 2015).
FIGURE 2 | Genome-wide occurrence of G quadruplex forming sequences in different parts of the gene (Garg et al., 2016).
G4 structures have also been identified in RNA in plants. For example, a combination of biophysical and biochemical assays confirmed G4 structure formation by a GQFS located in the 5′ UTR of ATR mRNA in A. thaliana (Kwok et al., 2015). The ATR gene encodes a protein kinase that is activated upon DNA damage and is required for the ensuing DNA damage response of the cell, including repair, cell cycle arrest, and telomere maintenance. Further study of the GQFS identified in the 5′ UTR of the ATR mRNA showed that it has an inhibitory effect during translation initiation. A search for ATR homologs among 31 plant species identified 35 ATR homologs. At least one GQFS was present in the 5′ UTR of 16 of the 35 ATR mRNAs (from 14 plant species). Whether the conserved GQFS present in these ATR mRNAs also have a conserved function, such as the negative regulation of translation seen in A. thaliana, is yet to be resolved. GQFS-mediated gene regulation appears to be prevalent and of functional importance in the plant kingdom. Table 2 describes the genome-wide distribution of GQFS in different plant species.
Several putative G quadruplex structures have been identified in microRNA by bioinformatics analyses. In the human genome, ∼16% of pre-miRNAs contain putative GQFS; these can fold into a G quadruplex in place of the canonical stem-loop structure and thereby impede Dicer-mediated cleavage of the pre-miRNA (Mirihana Arachchilage et al., 2015). In addition, the equilibrium between the G quadruplex structure and the stem-loop structure influences miRNA functionality: the Dicer enzyme recognizes the canonical stem-loop structure in pre-miRNA to produce mature miRNA, so formation of a G quadruplex in turn affects miRNA maturation (Pandey et al., 2015). In silico transcriptome-wide analyses have identified a significant number of G quadruplex motifs in human long noncoding RNA (lncRNA). Further, biophysical methods indicate that approximately 60% of these putative structures form stable quadruplexes in vitro, and further analyses of these secondary structures would give better insight into the functional relevance of G4 structures in cellular function (Jayaraj et al., 2012). Plant genomes have not been evaluated extensively for the presence of G quadruplexes in their non-coding RNAs.
Zea mays | G3 L1-7 | Higher in genic regions | Andorf et al., 2014
Frontiers in Plant Science | www.frontiersin.org

G QUADRUPLEX DURING STRESS AND DNA DAMAGE: BIOLOGICAL RELEVANCE
Under unfavorable conditions such as abiotic stress (environmental factors such as high salt or high or low temperature) and biotic stress (damage to plants mediated by living organisms), plants must adapt to survive. During abiotic stress such as drought, the cytosolic concentrations of cations become elevated (Leigh and Wyn Jones, 1984). A higher potassium (K+) level is known to facilitate G quadruplex formation, and under high salinity conditions the K+ ion concentration in the cell increases (Zhang and Blumwald, 2001). These elevated levels of K+ ions facilitate the formation of G quadruplexes genome-wide and might be involved in salinity tolerance. Differential gene regulation mediated by G4 DNA or RNA structure formation is therefore hypothesized to be a potential mechanism for coping with drought conditions. This hypothesis gained support from Mullen et al. (2010), who showed that GQFS are enriched in genes that are differentially regulated during drought. In that study, Arabidopsis transcriptome data originally generated by Matsui et al. (2008) were analyzed to conclude that 16% of all genes in A. thaliana are drought-responsive and that 45% of these genes contain at least one GQFS. Similar studies have followed; Andorf et al. (2014) demonstrated the abundance of GQFS in hypoxia-responsive genes in maize (Bailey-Serres et al., 2012). GQFS were also found to be frequent in genes associated with energy homeostasis signaling, as well as in many genes associated with the TOR, AMP kinase, and oxidative stress signaling pathways. Kinases in the TOR pathway are directly regulated by sugar availability and play a crucial role in nutrient and energy sensing. The occurrence of GQFS in genes encoding these kinases suggests that G quadruplexes play an important role in regulation, signaling, and metabolic adjustment to energy status (Xu et al., 2010; Robaglia et al., 2012; Dobrenel et al., 2013; Figure 3).
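Whether a proportion such as "45% of drought-responsive genes contain a GQFS" reflects enrichment depends on how common GQFS-containing genes are in the genome as a whole. One common way to frame that question is a one-sided hypergeometric test, sketched below under the assumption that each gene is scored simply as containing or lacking a GQFS. This is an illustrative technique, not necessarily the statistic used by Mullen et al. (2010), and all counts in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_gqfs,
                           responsive_genes, responsive_with_gqfs):
    """One-sided hypergeometric p-value for GQFS enrichment.

    Returns P(X >= responsive_with_gqfs), where X is the number of
    GQFS-containing genes expected if `responsive_genes` genes were drawn at
    random, without replacement, from a genome of `total_genes` genes of which
    `genes_with_gqfs` contain at least one GQFS.
    """
    upper = min(genes_with_gqfs, responsive_genes)
    without_gqfs = total_genes - genes_with_gqfs
    # Sum the probability mass over all outcomes at least as extreme as observed.
    favourable = sum(
        comb(genes_with_gqfs, k) * comb(without_gqfs, responsive_genes - k)
        for k in range(responsive_with_gqfs, upper + 1)
    )
    return favourable / comb(total_genes, responsive_genes)


if __name__ == "__main__":
    # Hypothetical toy genome: 1000 genes, 300 with a GQFS; 160 stress-responsive
    # genes (16%), of which 72 (45%) contain a GQFS.
    p_value = hypergeom_enrichment_p(1000, 300, 160, 72)
    print(p_value)
```

Exact integer binomial coefficients keep the sketch dependency-free; for real genome-scale counts a statistics library would be the more practical choice.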
In Sapium sebiferum, or Chinese tallow, an important agricultural crop species in East Asian countries, a bioinformatic analysis predicted enrichment of GQFS in genes of the lipid biosynthesis and stress response pathways (Yang et al., 2015). Overall, genes that are differentially regulated during various stress conditions are more likely to contain a GQFS, and formation of G quadruplexes may be one of multiple adaptive mechanisms utilized by plants during environmental stresses. Understanding how the stress response pathways are regulated in agriculturally important plant species, through identification and functional analyses of GQFS, can facilitate the development of stress-tolerant plant varieties, possibly through transgenic techniques, and ultimately lead to higher-yield crops (Yang et al., 2015). G4-forming synthetic oligonucleotides impeded DNA polymerase activity in vitro and produced truncated products in the presence of K+ ions, which stabilize the G4 structure (Garg et al., 2016). This result led to the postulation that replication would be obstructed by G4 DNA in vivo, causing replication fork stalling and collapse and thus genome instability (Figure 4).

TRANSPOSABLE ELEMENTS AND G QUADRUPLEX FORMATION
Transposable elements (TE) are a significant part of eukaryotic genomes, as they contain many regulatory sequences and serve as machinery to disseminate the genes present within them. Interestingly, GQFS have been found in the long terminal repeats (LTR) of plant transposons and retrotransposons and possibly affect not only transcription and translation but reverse transcription as well (Lexa et al., 2014; Kejnovsky et al., 2015). Formation of these secondary structures causes conformational changes in DNA, and DNA can become nucleosome-free when its conformation changes; it is well known that the preferred site of integration for TE is open chromatin (Huppert and Balasubramanian, 2005; Liu et al., 2009; Wong and Huppert, 2009). G quadruplexes in TE inhibit transcription, their formation in TE can serve as a hot spot for recombination, and TE serve as a vehicle for the spread of G4 structures in the genome. These secondary structures are not only formed inside TE owing to the open chromatin conformation but can also become genomic targets for insertion of new TE, during which the altered DNA conformations are recognized by transposase or integrase. Proteins originating from TE retain their affinity for the open configuration and prefer binding to secondary structures such as G4 DNA, e.g., the RAG1 protein (Nambiar and Raghavan, 2011). G quadruplex structures formed in TE-derived RNA participate in many important cellular processes (Kapusta et al., 2013).
FIGURE 4 | Impediment of DNA polymerase activity by G quadruplex.
Moreover, long stretches of guanines were identified upstream and downstream of the promoter region of retrotransposons. These sequences have been confirmed by CD spectroscopy to readily adopt parallel- or antiparallel-stranded G quadruplexes (Lexa et al., 2014). The occurrence of GQFS at these specific locations suggests a role during initiation of transcription and elongation of retrotransposon RNA. Alternatively, these GQFS might act as a checkpoint of transcription and reverse transcription. A biological role of GQFS in the transposable element life cycle has been established: long stretches of guanines are found in young, active LTRs, whereas fewer guanines are found in old elements, which has been attributed to suppression of RNA strand elongation (Lexa et al., 2014). Such enrichment of GQFS within specific regions of LTR retrotransposons suggests a functional role of G quadruplexes in genome instability (recombination-based reshuffling) of plant genomes (Lexa et al., 2014; Kejnovsky et al., 2015).

G QUADRUPLEX BINDING PROTEINS IN PLANTS
Several G quadruplex binding proteins have been identified in yeast and metazoans. Upon binding, these proteins may either help resolve the G4 structures or enhance the stability of the secondary structures. G4 resolving proteins include DNA helicases such as BLM (RecQ family), WRN (RecQ family), and FANCJ in humans (Sun et al., 1998; Fry and Loeb, 1999; Mohaghegh et al., 2001; Cheok et al., 2005; Wu et al., 2008), Dog-1 in the nematode Caenorhabditis elegans (Youds et al., 2008), and Sgs1 (RecQ family) and Pif1 in yeast (Eddy and Maizels, 2006).
RecQ helicases, conserved from bacteria to humans, are involved in unwinding a wide variety of DNA substrates, including G quadruplexes, and are thus important in maintaining genome integrity. In humans, there are five RecQ-family helicases (including BLM, WRN, and RTS/RECQ4), while yeast and bacteria possess only one each, Sgs1 and RecQ, respectively. In plants, the A. thaliana genome contains seven different genes that encode RecQ family helicases: RECQ1, RECQ2, RECQ3, RECQ4A, RECQ4B, RECQ5, and RECQsim (Hartung et al., 2000; Hartung and Puchta, 2006). AtRecQ4A and AtRecQ4B are two genes of this family that arose through a recent duplication and are 70% identical at the protein level to other members of this family. The role of AtRecQ4A has been suggested to be equivalent to that of yeast Sgs1 and human BLM. AtRecQ4B is distinct among all eukaryotic RecQ homologs, as it appears to promote rather than suppress crossover recombination (Hartung et al., 2007; Schröpfer et al., 2014). RecQsim contains a unique insertion of acidic amino acids in its helicase domain. Homologs of A. thaliana RecQsim have been found in other plant species, including rice and rape (Bagherieh-Najjar et al., 2003). Expression of the A. thaliana RecQsim gene in yeast lacking Sgs1 compensates for the loss of Sgs1 and rescues the hypersensitivity to the DNA-damaging drug methyl methanesulfonate (MMS), indicating functional conservation between these helicases.
In yeast, other non-helicase G4 DNA binding proteins, such as the co-transcription factor Sub1, also contribute to the stability of genomic loci containing GQFS (Lopez et al., 2017). In plants, very little information is available regarding potential G4-binding proteins. In maize (Zea mays), a G4 binding protein known as ZmNDPK1 has been identified in a ligand-binding screen of a cDNA expression library. ZmNDPK1, a nucleoside diphosphate kinase 1, interacts with folded GQFS-containing oligonucleotides with low nanomolar-range affinity (Kopylov et al., 2015). Electrophoretic mobility shift assays (EMSA) using nuclear extracts from rice plants revealed proteins that stably bind G4 DNA of both parallel and antiparallel conformations; the identities of these proteins, however, remain to be determined (Garg et al., 2016). Additionally, certain medicinal plant extracts, such as theaflavin-digallate from tea and saffron carotenoids from Crocus sativus, were shown to contain non-protein, small molecule ligands with G quadruplex binding activity (Hoshyar et al., 2012; Mikutis et al., 2013; Wang et al., 2015). As shown in yeast, the interaction of GQFS with proteins that have specific affinity for the non-canonical secondary structure is an important mechanism in G4-associated regulatory functions. Therefore, to fully understand the function of G4 in plant biology, the identification and further characterization of G4 binding proteins and small molecule G4 ligands in plant species are of high priority.

CONCLUSIONS AND PERSPECTIVES
Genome-wide analyses have identified numerous GQFS in several plant species, including A. thaliana, Z. mays, O. japonicum, and O. sativa, and showed the abundance of G2 type GQFS in genic and coding regions and of G3 type GQFS in intergenic regions. Biophysical characterization of a subset of these GQFS has also been accomplished. G quadruplexes play a regulatory role during cellular responses to DNA damage and to other internal and external cues such as sugar availability, metabolic and energy status, and stress. In addition, G quadruplex formation can induce fluorescence activation with high selectivity and sensitivity (DasGupta et al., 2015). Current agricultural production is heavily dependent on many biotic and abiotic factors; stress conditions such as drought and soil salinity are the main factors responsible for crop yield reduction. Further work on the identification and functional analysis of G quadruplexes in plants is therefore of particular interest, as they are potential targets in biotic and abiotic stress responses. It is not yet clear whether the tolerance mechanisms of plants in response to various kinds of stress are directly or indirectly regulated by G quadruplexes. Future work is expected to focus on defining in detail the molecular pathway(s) governed by G4 DNA in response to both biotic and abiotic stresses. Generating transgenic plants tolerant to drought and other stresses is a central goal for the agricultural industry. Understanding how the structural transformation of G4 DNA assembly is regulated, and how it in turn regulates gene expression, could therefore have valuable implications for the development of transgenic plant varieties with higher yield.

AUTHOR CONTRIBUTIONS
PY and VY contributed to the conception of the review article. PY and VY drafted the work. PY, VY, H, and NK wrote the review article. PY, VY, NT, and NK revised it critically. PY, NK, and NT helped in the literature search. PY, VY, H, NT, and NK gave final approval of the version to be published.