Unraveling the Regulatory G-Quadruplex Puzzle: Lessons From Genome and Transcriptome-Wide Studies

G-quadruplexes (G4s) are among the best-characterized DNA secondary structures and are enriched in regulatory regions, especially promoters, of several prokaryote and eukaryote genomes, indicating a possible role in cis regulation of genes. Many studies have focused on evaluating the impact of specific G4-forming sequences in the promoter regions of genes. However, the lack of correlation between the presence of G4s and the functional impact on cis gene regulation, evidenced by the variable expression fold change in the presence of G4 stabilizers, shows that not all G4s affect transcription in the same manner. This indicates that the regulatory effect of the G4 is significantly influenced by its position, the surrounding DNA topology, and other environmental factors within the cell. In this review, we compare individual gene studies with high-throughput differential expression studies to highlight the importance of formulating a combined approach that can be applied in humans, bacteria, and viruses to better understand the effect of G4-mediated gene regulation.


INTRODUCTION
Genomic DNA can adopt a myriad of alternative structures, such as cruciforms (Brazda et al., 2011), G-quadruplexes (G4s) (Kwok and Merrick, 2017), triplexes (Frank-Kamenetskii and Mirkin, 1995), and i-motifs (Abou Assi et al., 2018). These structures can form within double-stranded genomic DNA (B-DNA), as in the case of left-handed Z-DNA (Kim et al., 2009), or can require the opening of base pairs to generate single-stranded regions, as in cruciform DNA (Brazda et al., 2011) and G4s (Kreig et al., 2015). Genome-wide prediction of secondary structure-forming regions is possible because each structure favors specific sequence patterns. Z-DNA has been shown to favor purine-pyrimidine repeats flanked by specific sequences for B-DNA/Z-DNA junction formation (Bothe et al., 2011; Kim et al., 2018), whereas cruciform structures can form in palindromic regions (Leach, 1994). The abundance of secondary structures has led to attempts to identify their probable roles in replication, gene regulation, and DNA damage/repair. These structures have been implicated in several diseases, such as amyotrophic lateral sclerosis, frontotemporal dementia, Fanconi anemia, Bloom's syndrome, and fragile X disease (Wu and Brosh, 2010; Simone et al., 2015).
G4s are among the most widely studied DNA secondary structures and are formed from consecutive blocks of two or more guanines separated by a single-stranded region called a loop. Four consecutive G-runs form G-stacks with Hoogsteen bonds [(C2)NH2:N7 and O6:N1H], which are stabilized by several monovalent and divalent cations such as K+, Na+, Ca2+, and Sr2+, as reviewed elsewhere (Sannohe and Sugiyama, 2010; Bhattacharyya et al., 2016). K+ is the best stabilizer of the G4 due to its favorable ionic radius and Gibbs free energy of solvation (Zaccaria et al., 2016). Computational tools such as Quadparser, PQSFinder, G4Hunter, and QGRS Mapper have been developed to predict putative G4-forming sequences (Parveen et al., 2019). They are based on pattern matching and scoring algorithms using the schema GxNyGxNyGxNyGx, where N is any nucleotide, x is ≥2, and y is ≥1. However, y is usually restricted to 1-7 because longer loops are flexible and destabilize the G4 (Hazel et al., 2004). Several putative G4-forming sequences have been predicted in the genomes of prokaryotes and eukaryotes and are accessible through web servers such as QuadBase and NonBDB (Cer et al., 2011). In addition, high-throughput sequencing has been used to experimentally verify G4 formation in the genomes of many organisms (Marsico et al., 2019) and to construct whole-genome experimental G4 maps.
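The pattern-matching schema described above can be sketched as a regular expression. The following is a minimal illustration only, not the implementation of Quadparser or any other published tool: the function name `find_pqs` and its parameters are hypothetical, loops are allowed to contain any base (including G), matches on each strand are non-overlapping, and C-rich motifs on the given strand are reported as G4 candidates on the complementary ("-") strand.

```python
import re

def find_pqs(seq, min_run=3, max_loop=7):
    """Find putative G4 motifs matching GxNyGxNyGxNyGx.

    x >= min_run and 1 <= y <= max_loop. Returns a sorted list of
    (start, end, strand, motif) tuples with 0-based, half-open coordinates.
    """
    seq = seq.upper()
    hits = []
    # G-runs mark motifs on the given strand; C-runs mark motifs whose
    # G-rich sequence lies on the complementary strand.
    for base, strand in (("G", "+"), ("C", "-")):
        run = f"{base}{{{min_run},}}"          # e.g. G{3,}
        loop = f"[ACGTN]{{1,{max_loop}}}"      # e.g. [ACGTN]{1,7}
        pattern = re.compile(f"(?:{run}{loop}){{3}}{run}")
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand, match.group()))
    return sorted(hits)

# Hypothetical test sequence containing one motif with G-runs of three.
print(find_pqs("TTGGGAGGGTGGGAGGGTT"))
```

Setting `min_run=2` relaxes the search to two-guanine runs, which increases sensitivity at the cost of many more candidate sequences; scoring-based tools address this trade-off differently.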
The enrichment of G4s throughout several genomes, especially in the cis-regulatory regions (Chambers et al., 2015; Marsico et al., 2019), has led to the development of several small molecules that can bind and stabilize G4s, including porphyrins, benzoquinolines, and perylene diimide (Tian et al., 2018). Some of the most widely used G4 stabilizers are TMPyP4, NMM-IX, pyridostatin (PDS), and BRACO-19. To study the effects of these chemicals on G4-mediated gene regulation, individual reporter assays are performed by cloning specific regulatory regions into reporter vectors and analyzing reporter gene expression in the presence of a G4-stabilizing ligand (Halder et al., 2012a) (Figure 1). G4-mediated cis-regulatory activity has been confirmed by reporter assays in various genes such as CMYC, C-KIT, and BCL2 (Siddiqui-Jain et al., 2002; Ashman and Griffith, 2013; Le et al., 2013).
Although these ligands display considerable selectivity for G4 structures over single-stranded and double-stranded DNA, it is essential to study the effect of ligand binding to untargeted G4-forming regions in the genome (Figure 1). A key step toward understanding specificity and cis-regulatory impact is to identify the conditions that affect G4 formation. Apart from intracellular K+, Na+, and Mg2+, several other conditions can affect G4 formation within both prokaryotic and eukaryotic cells. Proteins that can interact with G4s have been described (Brazda et al., 2014; Hale et al., 2014). Some studies have shown that G4 formation is influenced by chromatin status and that euchromatin shows more G4-forming sequences than heterochromatin (Hansel-Hertsch et al., 2016), corroborating the idea that actively transcribed genes show a higher propensity to form G4s. Several broadly acting regulators, such as the transcription factors MYC and SP1 and the growth factor VEGF, are also regulated by G4s. Changes in the expression levels of such factors may affect the expression of genes regulated by them (Figure 1).
Cross-reactivity between G4 ligands and i-motifs has also been reported, since i-motifs can form on the strand opposite the G4. Ligands such as TMPyP4 and berberine were shown to bind to i-motifs, although they stabilized i-motifs less effectively than G4s (Fedoroff et al., 2000; Masoud and Nagasawa, 2018; Pagano et al., 2018). The prevalence of G4s in RNA has also been shown recently (Kwok et al., 2016; Yang et al., 2018), and the functional effects of RNA G4 stabilization have been reviewed before (Fay et al., 2017). DNA G4-binding ligands such as TMPyP4 have also been shown to destabilize RNA G4s (Ofer et al., 2009; Morris et al., 2012; Zamiri et al., 2014). However, the functional impact of the cross-reactivity of G4 ligands with DNA and RNA has not been well studied. A combination of high-throughput studies and individual RNA analysis could be used to understand the impact of this cross-reactivity. Since RNA G4s have been extensively discussed in previous reviews (Fay et al., 2017), we do not include studies on RNA G4s in this review.
Therefore, the addition of a G4-stabilizing ligand can be expected to impact multiple regions in the genome and affect transcription of multiple genes at the same time. High-throughput studies do not provide a fine-grained analysis of the dynamics of individual genes regulated by G4s, so individual reporter assays are required to analyze the effect of G4s on specific target genes (Figure 1). Conversely, individual reporter assays on all genes of an organism, combined with genome- or transcriptome-wide studies, can provide a better understanding of how individual G4s regulate gene expression (Figures 1A, B). In this review, we discuss the importance of combining high-throughput experiments with studies on individual G4s in humans, bacteria, and viruses to obtain a better picture of G4-mediated cis regulation.

Studies on Cis-Regulatory G4s in Humans
Quadparser-based computational analysis of the human genome for the prediction of G4-forming sequences based on the schema G3+N1-7G3+N1-7G3+N1-7G3+ initially revealed 370,000 G4 sequences (Huppert and Balasubramanian, 2005). Regulatory regions of the human genome were enriched in G4s (Huppert et al., 2008; Verma et al., 2008), and the distribution of G4-forming sequences also depended on the function of the gene; for example, tumor suppressors contained fewer G4-forming sequences than did proto-oncogenes (Eddy and Maizels, 2006). Later studies showed that promoter G4 regions overlapped with DNase hypersensitive sites in over 40% of human genes (Huppert and Balasubramanian, 2007). Experimental confirmation of the impact of G4 formation on transcription and translation was carried out initially in specific genes such as CMYC (Yang et al., 2017), KRAS (Cogoi and Xodo, 2006), HRAS (Membrino et al., 2011), and BCL2 (Nagesh et al., 2010). However, genome-scale experimental evidence of G4 formation to corroborate the computational analysis was still pending. Later, high-throughput sequencing studies in vitro showed that over 700,000 G4s can be formed in the genome in the presence of KCl and PDS (Chambers et al., 2015). These studies proposed that G4 formation in the regulatory regions may have an impact on gene regulation. However, they could not elucidate the impact of nonspecific binding of G4 stabilizers or the downstream impact of G4-mediated gene regulation on other genes. These limitations may be overcome by high-throughput transcriptome-wide differential expression studies.
Several studies have involved treating cells for specific periods of time with G4-stabilizing ligands and comparing gene expression before and after treatment (Table 1). In most studies, TMPyP4 was used as the G4-stabilizing ligand. Initial studies using the HeLa S3 cell line showed that the G4-binding ligand could cause changes in gene expression. These studies observed that proto-oncogenes, such as CMYC, CMYB, and CFOS, were downregulated under TMPyP4 treatment, but not under TMPyP2 treatment. Another study on the same cell line showed similar results and found that the promoter regions of differentially expressed genes, including CMYC, CMYB, and CFOS, contained G4-forming sequences (Verma et al., 2008). Interestingly, this study also found no statistically significant correlation between the presence of G4s and the expression fold change. The same group performed a subsequent study with TMPyP4 on the A549 cell line and compared the results with BMVC and a TMPyP4 analog, TyPy (Verma et al., 2009). This study observed 863 significantly upregulated and 298 significantly downregulated genes, similar to their previous study on HeLa S3 cells. They therefore shortlisted 12 genes containing G4s in their promoters from the microarray results and analyzed them individually by quantitative real-time PCR (qRT-PCR), demonstrating that the genes were indeed affected by G4-stabilizing ligands.
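One simple way to ask whether genes with promoter G4s respond more strongly to a ligand than genes without them is a permutation test on the expression changes. The sketch below is illustrative only: it is not the statistical method used in the cited microarray studies, the function name `permutation_test` is hypothetical, and the numbers in the usage example are invented absolute log2 fold changes, not data from any publication.

```python
import random
from statistics import mean

def permutation_test(g4_values, other_values, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value). Group labels are shuffled
    n_perm times; the p-value is the fraction of shuffles whose absolute
    mean difference is at least as large as the observed one, with the
    usual +1 correction so that p is never exactly zero.
    """
    rng = random.Random(seed)
    observed = mean(g4_values) - mean(other_values)
    pooled = list(g4_values) + list(other_values)
    n_g4 = len(g4_values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_g4]) - mean(pooled[n_g4:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

# Invented |log2 fold change| values for genes with and without promoter G4s.
with_g4 = [1.2, 0.4, 2.1, 0.9, 1.5, 0.3, 1.1, 0.8]
without_g4 = [1.0, 0.6, 1.9, 0.7, 1.4, 0.5, 1.3, 0.9]
diff, p_value = permutation_test(with_g4, without_g4)
print(f"difference in means = {diff:.3f}, p = {p_value:.3f}")
```

A large p-value in such a test would be consistent with the reported lack of correlation, but it would not by itself distinguish between G4s that do not form in cells, ligand effects at untargeted sites, and indirect regulation through other genes.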
Table 1 (fragment). "X" indicates "absent," and "V" indicates "present."
V | Custom algorithm, G3N1-12G3N1-12G3N1-12G3 and G2N1-12G2N1-12G2N1-12G2 | X | X | (Marsico et al., 2019)
Deinococcales and Thermales | V | Quadparser, G3N1-12G3N1-12G3N1-12G3 and G2N1-12G2N1-12G2N1-12G2 | X | X | (Ding et al., 2018)
List of G4 cis-regulatory studies in viruses
Custom script, G3-6N1-7G3-6N1-7G3-6N1-7G3-6 | V | X | (Ravichandran et al., 2018)
Alphaherpesviruses | V | QGRS Mapper, Quadbase, G2N1-7G2N1-7G2N1-7G2, G3N1-12G3N1-12G3N1-12G3 | V | X | (Frasson et al., 2019)
Frontiers in Genetics | www.frontiersin.org October 2019 | Volume 10 | Article 1002
A similar study on the effect of TMPyP4 on gene expression in the K562 cell line showed that only 33 genes were upregulated and 54 genes were downregulated, and proposed that TMPyP4 might act by repressing CMYC and activating MAPK family kinases (Mikami-Terao et al., 2008). The same group observed similar effects in retinoblastoma cell lines treated with TMPyP4 and demonstrated that the induction of p53 and activation of MAPK family kinases could contribute to the antitumor effects of TMPyP4. However, in both studies, they could only speculate that G4 stabilization by TMPyP4 affected the regulation of the differentially expressed genes. In addition, they observed telomere shortening in both K562 and retinoblastoma cell lines, where G4 stabilization prevents telomerase from binding to the 3′ end of the telomere and maintaining telomere length. It is therefore difficult to determine whether the effect of the ligand arises from gene regulation or from telomere shortening. TMPyP4 was developed to target telomeric G4s (Haq et al., 1999), but the transcriptome-wide studies showed nonspecific activities, underscoring the need for further genome-wide studies in cells.
Another study compared the effects of the bisquinolinium drugs 360A and PhenDC3 on gene expression in the HeLa S3 cell line and showed that 1157 genes were downregulated and 1529 were upregulated by PhenDC3. In the case of 360A, only 249 downregulated and 401 upregulated genes were observed (Halder et al., 2012b). This indicates that although the small molecules were developed as G4-binding ligands, they show significant nonspecific effects, which need to be explored further. In addition, the mechanisms that dynamically control G4 formation are yet to be understood, so combined genome- and transcriptome-wide mechanistic and functional analysis is required to unravel how G4 stabilization regulates genes.

Studies on Cis-Regulatory G4s in Bacteria
The bacterial genome is considerably simpler than the eukaryotic genome because it lacks the complex organization found in the human genome. However, various computational studies and individual promoter region analyses have shown that the genomes of several bacteria contain G4-forming sequences (Table 1). Genome-wide prediction has identified G4-forming sequences in the genomes of Escherichia coli (Rawal et al., 2006), Deinococcus radiodurans (Kota et al., 2015), Mycobacterium tuberculosis (Perrone et al., 2017), Xanthomonas sp., and Nostoc sp. (Rehm et al., 2015). These studies also showed that the G4-forming sequences were predominantly restricted to regulatory regions such as promoters (Rawal et al., 2006). In each study, individual luciferase assays carried out on selected promoter regions showed variable responses to the addition of G4 ligands. For example, in the case of D. radiodurans, some promoters showed higher activity when the bacterium was treated with NMM-IX, whereas others showed diminished activity, although all promoters contained G4-forming sequences (Kota et al., 2015). This lack of correlation in promoter luciferase assays and whole-transcriptome studies indicates that the landscape of gene regulation by G4s is more complex than expected even in prokaryotes, despite the simple organization of their genomes.
In the case of E. coli, a systematic study examined how the location of a G4 relative to the transcription start site (TSS) affects expression, using reporter assays in which G4-forming sequences were cloned into pQE luciferase reporter plasmids according to their genomic locations (Holder and Hartig, 2014). The results revealed that G4 formation in the 5′ UTR significantly affected reporter gene expression, whereas G4s in the 3′ UTR had a negligible effect. It was also observed that G4 sequences within 20 bp downstream of the TSS showed maximum upregulation when the G4 was formed on the antisense strand and maximum downregulation when it was formed on the sense strand. Interestingly, neither NMM-IX nor other G4-stabilizing ligands affected G4-mediated gene regulation (Holder and Hartig, 2014), indicating that more studies are required to explain gene regulation by G4s in E. coli.
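Comparing G4 positions across genes requires expressing each motif relative to the TSS and to the orientation of the gene. The sketch below shows one way to do this under stated assumptions: coordinates are 0-based and half-open on the forward strand, the function name `g4_relative_to_tss` is hypothetical, and a motif is labeled "sense" when its G-rich strand is the same as the gene strand. That labeling is a convention chosen here for illustration and may differ from the definition used in the cited study.

```python
def g4_relative_to_tss(g4_start, g4_end, g4_strand, tss, gene_strand):
    """Return (offset, orientation) of a G4 motif relative to a TSS.

    The motif occupies [g4_start, g4_end) on the forward strand.
    offset is the position of the motif's first base in the direction of
    transcription, measured from the TSS: positive values lie downstream
    of the TSS and negative values lie upstream.
    """
    if gene_strand == "+":
        offset = g4_start - tss
    elif gene_strand == "-":
        # For reverse-strand genes the last forward-strand base of the
        # motif is the first one reached by transcription.
        offset = tss - (g4_end - 1)
    else:
        raise ValueError("gene_strand must be '+' or '-'")
    orientation = "sense" if g4_strand == gene_strand else "antisense"
    return offset, orientation

# Hypothetical forward-strand gene with a motif starting 5 bp after the TSS.
offset, orientation = g4_relative_to_tss(105, 120, "+", tss=100, gene_strand="+")
print(offset, orientation, 0 <= offset <= 20)
```

The final expression in the usage line flags motifs that start within 20 bp downstream of the TSS, the window in which the reporter study described above saw the strongest strand-dependent effects.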
In a recent study on G4 sequences in the E. coli genome, the predicted G4-forming regions were aligned using ClustalW to identify repetitive sequence motifs (Kaplan et al., 2016). This was based on the idea that similar sequences will have similar regulatory roles. In this analysis, only 52 sequences matched the authors' stringent schema with G-tract length 1 to 3 and loop length 1 to 7. They further classified these into two groups of well-aligned sequences and performed reporter assays using representative sequences. Interestingly, all the sequences were within the regulatory regions flanking the open reading frame, and the group was able to identify two sequence motifs conserved in several bacteria. However, the functional impact of these sequences needs to be investigated.
The recent interest in bacterial G4s calls for a streamlined approach to interpreting the roles of G4s in gene regulation. Since bacteria have been extensively studied as model organisms, it will be interesting to combine high-throughput and individual-gene approaches to gain a comprehensive picture of bacterial G4s.

Studies on Cis-Regulatory G4s in Viruses
Viruses provide an exciting platform for studying the impact of G4s at the genomic and transcriptomic level due to their small genome size. Some viral DNA can be chromatinized as in eukaryotes, and the genetic material of some DNA and RNA viruses can be integrated into the human genome and affected by the same parameters as human genomic DNA. Therefore, they present an ideal platform for studies that focus on the holistic effects of G4-binding ligands, especially to understand G4-mediated gene regulation.
Genome-wide computational analyses of G4s in several viral genomes have revealed that DNA viruses have a higher number of G4s per 1 kb than RNA viruses (Lavezzo et al., 2018). Many computational and individual reporter studies have been performed in a number of viruses to evaluate the cis-regulatory effects of G4s (Table 1). One of the first systematic genome-wide studies on G4s in viruses was performed in human herpesvirus genomes (Biswas et al., 2016). Preliminary computational analysis of the herpesvirus genomes revealed that G4s were prevalent in regulatory and long terminal repeat regions.
Among regulatory regions, immediate-early gene promoters showed higher densities of G4-forming sequences than early or late gene promoters. Overall, alpha-herpesviruses such as herpes simplex virus-1 (HSV-1) and varicella-zoster virus had a higher G4 density than the human and mouse genomes, suggesting that G4 formation in these viruses could have a greater impact on gene regulation. In this study, the authors considered G4-forming sequences found in only three genes, namely, UL2 and UL24 of HSV-1 and K15 of Kaposi's sarcoma-associated herpesvirus, for further experimental study. It was observed that G4s could suppress gene expression in the presence of the G4-stabilizing ligands BRACO-19 and TMPyP4. However, since several G4-forming sequences were predicted, the questions of how many G4-forming sequences actually form G4s and how many G4s regulate viral gene expression still need to be explored.
In light of this question, a recent study systematically checked the effect of G4s on gene regulation in all genes of the human cytomegalovirus, which belongs to the beta-herpesvirus subfamily (Ravichandran et al., 2018). Unlike previous reports that tested only a few G4s, the genome-wide analysis for all conventional, long-loop, and bulged-G4 schema identified 36 G4-forming sequences associated with 20 viral genes, including immediate-early, early, and late genes. Most of these sequences formed G4s in vitro, and their stability could be further increased by NMM-IX treatment. Cell-based assays using reporter constructs with promoters containing G4s indicated that only 9 of the 20 genes were suppressed effectively by the G4-stabilizing ligand NMM-IX. This is interesting because, while all tested genes contained G4-forming sequences in their promoter regions (as evidenced by in vitro assays), fewer than half of these genes were affected by the ligand. Therefore, it was proposed that G4s influence viral genes through a context-dependent mechanism. It is also possible that other factors involved in controlling gene expression, such as the binding of human transcription factors, are implicated in G4 activity, as shown earlier for HPV (Carson and Khan, 2006). This is a promising area for further study and can facilitate the construction of G4-mediated regulatory networks. It also shows that although many promoters might contain G4s that can affect gene expression when tested individually, genome-wide studies are important for studying the collective impact of G4 stabilization.

FUTURE PERSPECTIVES
The role of G4s in the genomes of prokaryotes and eukaryotes is an exciting area of research because of the plethora of reports showing the influence of G4s on the expression of specific genes. However, studies on individual G4s have not yet been reconciled with high-throughput genome- and transcriptome-wide studies, especially for cis-regulatory G4s. With the advent of next-generation sequencing technologies and high-throughput reporter assays, it should be possible to construct complex G4 networks that incorporate computational and experimental analysis and present a combined view of G4-mediated regulation. For example, computational G4 prediction and individual gene reporter assays can be compared with high-throughput differential expression studies to identify candidates that show the maximum effect. The regions can be compared with chromatin immunoprecipitation (ChIP) studies to correlate them with transcription factor binding sites or chromatin binding sites. Construction of a central repository to store the results of functional analysis from various publications will also facilitate the comparison of data and provide a holistic picture of gene regulation by G4s.
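The comparison of computational predictions, reporter assays, and differential expression results proposed above could be prototyped as a simple join across the three data sources. The sketch below is one possible arrangement under stated assumptions, not an established pipeline: the function name `rank_g4_candidates`, the threshold `min_abs_log2fc`, and the gene identifiers and values in the usage example are all hypothetical.

```python
import math

def rank_g4_candidates(pqs_genes, reporter_fc, expression_log2fc,
                       min_abs_log2fc=1.0):
    """Rank genes supported by prediction, reporter assay, and expression data.

    pqs_genes: iterable of gene names with a predicted promoter G4.
    reporter_fc: gene -> reporter fold change (ligand / control, > 0).
    expression_log2fc: gene -> log2 fold change from a transcriptome study.
    Returns (gene, expression_log2fc, reporter_log2fc, concordant) tuples,
    sorted by decreasing absolute expression change. "concordant" is True
    when both assays change in the same direction.
    """
    candidates = []
    for gene in pqs_genes:
        if gene not in reporter_fc or gene not in expression_log2fc:
            continue  # keep only genes measured in both experiments
        reporter_log2 = math.log2(reporter_fc[gene])
        expr_log2 = expression_log2fc[gene]
        if abs(expr_log2) < min_abs_log2fc:
            continue  # expression change too small to prioritize
        concordant = (reporter_log2 > 0) == (expr_log2 > 0)
        candidates.append((gene, expr_log2, reporter_log2, concordant))
    return sorted(candidates, key=lambda c: abs(c[1]), reverse=True)

# Invented example data for four hypothetical genes.
predicted = ["geneA", "geneB", "geneC", "geneD"]
reporter = {"geneA": 0.25, "geneB": 2.0, "geneC": 0.5}
expression = {"geneA": -2.5, "geneB": -1.2, "geneC": -0.3, "geneD": 3.0}
for row in rank_g4_candidates(predicted, reporter, expression):
    print(row)
```

Genes flagged as discordant are the informative cases for the reconciliation discussed here, because they suggest that ligand effects in the cell are not explained by the promoter G4 alone.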

AUTHOR CONTRIBUTIONS
SR, J-HA, and KK wrote the manuscript.