The 5BSL3.2 Functional RNA Domain Connects Distant Regions in the Hepatitis C Virus Genome

Viral genomes are complexly folded entities that carry all the information required for the infective cycle. The nucleotide sequence of the RNA virus genome encodes proteins and functional information contained in discrete, highly conserved structural units. These so-called functional RNA domains play essential roles in the progression of infection, which requires their preservation from one generation to the next. Numerous functional RNA domains exist in the genome of the hepatitis C virus (HCV). Among them, the 5BSL3.2 domain in the cis-acting replication element (CRE) at the 3′ end of the viral open reading frame has become of particular interest given its role in HCV RNA replication and as a regulator of viral protein synthesis. These functionalities are achieved via the establishment of a complex network of long-distance RNA–RNA contacts involving (at least as known to date) the highly conserved 3′X tail, the apical loop of domain IIId in the internal ribosome entry site, and/or the so-called Alt region upstream of the CRE. Changing contacts promotes the execution of different stages of the viral cycle. The 5BSL3.2 domain thus operates at the core of a system that governs the progression of HCV infection. This review summarizes our knowledge of the long-range RNA–RNA interaction network in the HCV genome, with special attention paid to the structural and functional consequences derived from the establishment of different contacts. The potential implications of such interactions in switching between the different stages of the viral cycle are discussed.


INTRODUCTION
Predicting how RNA virus populations might evolve, a major public health goal, is a challenging task. RNA viruses, such as hepatitis C virus (HCV), replicate by virtue of a relatively low-fidelity RNA-dependent RNA polymerase encoded by the viral genome. It has been calculated that, for each round of replication, the mutation rate in HCV reaches a maximum of 1.3 nucleotides (nt) (Bartenschlager and Lohmann, 2000; Cuevas et al., 2009; Ribeiro et al., 2012). The variation introduced generates a dynamic genome pool composed of different but closely related sequences, referred to as a quasispecies (Martell et al., 1992). Quasispecies dynamics involve the constant sampling and selection of mutations that improve viral fitness. Such mutations are critical for escaping the host immune response, for drug resistance, for the infection of new host species, and for disease emergence. The acquisition of new mutations by viral genomes therefore provides a background for the action of natural selection. The magnitude of the effect of the acquired mutations depends on the environment: changes can be favorable for infection in a given host but may have important fitness costs in a different environment, even within the same infected individual. This phenomenon complicates the prediction of disease severity and impedes the development of novel therapies (Manrubia and Lazaro, 2016; Perales and Domingo, 2016).
The balance between sequence promiscuity and functional conservation in viral genomes is made possible by highly structured genomic regions that can absorb nucleotide variations, as long as these do not alter their active conformation. These regions are organized as discrete, cis-acting functional RNA domains that operate in an interconnected manner to perform functions essential to the execution of the viral cycle (Romero-López and Berzal-Herranz, 2013). Such domains are considered information-carrying units beyond the nucleotide sequence. The study of the cis-acting functional elements, including their localization, sequence and structural conservation, is key to understanding viral infections at the molecular level. In addition, the essential role of genomic functional RNA domains in viral propagation and persistence makes them potential therapeutic targets.
Hepatitis C virus infection, which has a global prevalence of 2.8% (more than 185 million people are infected worldwide; Mohd Hanafiah et al., 2013), causes severe chronic disease that in many patients may eventually make liver transplantation necessary (Hoofnagle, 2002). Currently, the standard of care (SOC) involves the use of pegylated α-interferon combined with the modified nucleosides ribavirin and sofosbuvir (Lawitz et al., 2013). Unfortunately, interferon is not always well tolerated; in addition, the variability of the viral genome prevents a sustained therapeutic response. As mutants resistant to the current SOC arise, new targets and therapeutic agents must be ready. In this context, recent advances in the development of direct-acting antiviral agents (DAAs) have led to a new combined therapy consisting of sofosbuvir plus the NS5A inhibitor velpatasvir and the NS3-NS4A inhibitor voxilaprevir (see below) (Bourliere et al., 2017). This strategy has two major benefits. First, the drug cocktail contains no α-interferon, which significantly reduces the secondary effects of the treatment. Second, both velpatasvir and voxilaprevir operate as pangenotypic inhibitors, which simplifies the customization of the therapy.
Hepatitis C virus is a member of the family Flaviviridae and the genus Hepacivirus. Based on the phylogenetic analysis of genomic sequences, up to seven genotypes have been described for HCV, showing more than 30% divergence at the nucleotide level. Closely related sets of subtypes and isolates have been identified as well (Smith et al., 2014). The viral genome is a positive-sense RNA molecule of ∼9.6 kb that encodes a single ORF flanked by highly conserved untranslated regions (5′ and 3′ UTR; Figure 1) (Choo et al., 1989). Viral proteins are produced as a single polypeptide that is co- and post-translationally processed by viral and cellular proteases to yield structural (core, E1, and E2) and non-structural proteins (p7, NS2, NS3, NS4A, NS4B, NS5A, and NS5B; Figure 1) (Grakoui et al., 1993). It is noteworthy that genetic variation does not occur evenly over the viral genome. On average, complete genome sequences differ at about 31-33% of positions (Smith et al., 2014), but only ∼10% of the entire viral genome is susceptible to positive selection (Patino-Galindo and Gonzalez-Candelas, 2017). The affected regions are intimately related to genotype prevalence (Messina et al., 2015). The 5′ end of the HCV genome and the viral capsid-encoding sequence are the most conserved regions, with around 80-90% sequence identity. The hypervariable region and the envelope protein-encoding sequence show the least sequence identity among isolates (Figure 1) (Le Guillou-Guillemette et al., 2007). These observations agree with the existence of highly conserved structural units throughout the HCV genome (Fricke et al., 2015; Mauger et al., 2015; Pirakitikulr et al., 2016). Some of these operate as essential functional domains, which makes them key targets for the development of new therapeutics and diagnostic agents.
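The identity and divergence percentages quoted above are position-by-position comparisons of aligned sequences. A minimal sketch of that calculation follows, assuming the two sequences are already aligned to the same length; the function name `percent_identity` and the two 10-nt fragments are hypothetical and are not real HCV sequence.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy fragments for illustration only (hypothetical, not taken from any HCV isolate).
reference = "GCCAGCCCCC"
variant = "GCCAGACCCU"

identity = percent_identity(reference, variant)
divergence = 100.0 - identity
print(f"identity = {identity:.1f}%, divergence = {divergence:.1f}%")
```

Real comparisons between genotypes would start from a multiple sequence alignment and would need an explicit policy for gap columns, which this sketch does not address.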
This review provides a brief overview of the cis-acting signals critical for HCV infection, with special emphasis on the core partner, i.e., the so-called 5BSL3.2 domain at the 3′ end of the viral ORF. It also focuses on the sequences and structural units of the viral genome essential to the preservation of the roles of 5BSL3.2 during the infective cycle.

FUNCTIONAL RNA DOMAINS IN THE HCV RNA UNTRANSLATED REGIONS
Functional RNA Domains in the 5′ End of the HCV Genome
The HCV 5′ UTR occupies the first 341 nucleotides of the HCV genome (Figure 2). This is one of the most conserved genomic regions, with ∼85% of sequence identity preserved across viral isolates (Bukh et al., 1992; Smith et al., 1995). Importantly, HCV protein synthesis and replication are governed by functional domains in the 5′ UTR. Viral translation is initiated by a mechanism different from the canonical cap-dependent pathway that dictates most cellular protein production. In HCV, an internal ribosome entry site (IRES), in the absence of any other factor, can direct the recruitment of the 40S ribosomal subunit (Tsukiyama-Kohara et al., 1992; Wang et al., 1993; Pestova et al., 1998; Lytle et al., 2002). The 48S particle is then constituted via interaction with eIF3 and the eIF2/GTP/Met-tRNA ternary complex (Sizova et al., 1998; Otto et al., 2002; Berry et al., 2011; Sun et al., 2013). The three-dimensional structure of the 48S complex allows for the proper positioning of the translation start codon in the P site (Berry et al., 2010, 2011; Quade et al., 2015; Angulo et al., 2016). Viral protein synthesis initiates after the binding of the 60S particle, plus the eIF5 and eIF5B factors, to the 48S complex, thus forming the productive 80S ribosomal complex (Yamamoto et al., 2014). This mechanism can be simplified by bypassing the recruitment of eIF3 and eIF2. In fact, eIF3-IRES binding is not essential for the initiation of viral translation per se. A recent publication has shown that this binding displaces eIF3 from the canonical 43S pre-initiation translation complex, thus releasing the IRES binding site in the 40S ribosomal subunit and promoting the initiation of protein synthesis (Hashem et al., 2013).
In addition, under certain stress conditions during which eIF2 is inactivated by phosphorylation, the HCV IRES can circumvent the eIF2-dependent delivery of the tRNA-Met by the GTPase-activating protein eIF5 (Terenin et al., 2008), or perhaps eIF2A (Kim et al., 2011) or 2D (Dmitriev et al., 2010). This strategy imitates the assembly of the translation machinery seen in prokaryotic cells.
The different proposed mechanisms provide important insights in the translation initiation pathway mediated by the HCV IRES (and other related viral and cellular IRESs). However, several questions remained unclear during past years. In a recent work (Jaafar et al., 2016), new roles for eIF1A were discovered, which may help to illustrate and complete the real stepwise pathway during HCV IRES-dependent translation initiation. The findings include the stabilization of the Met-tRNA binding to the IRES-40S pre-initiation complex, the help in the discrimination of improper AUG start codons and the stimulation of the eIF5B-dependent GTP hydrolysis. The authors proposed a revisited IRES-driven translation initiation model by which the HCV IRES would bind to a pre-assembled translation initiation complex, which is a remainder from the translation termination and ribosomal recycling events. This complex encompasses the 40S subunit, eIF3 and eIF1A. In the absence of the ternary complex eIF2/GTP/Met-tRNA, the IRES can easily dock to occupy the decoding groove, in good agreement with previous reports (Quade et al., 2015). In this step, the IRES can displace the eIF3 from its position in the 40S subunit (Hashem et al., 2013). The conformation of this new complex favors the recruitment of the Met-tRNA, in a process that should not necessarily require eIF2. This would be followed by GTP hydrolysis and the release of eIF2. Then, eIF1A and eIF5 could work together to confirm the use of the functional AUG codon, with the subsequent 60S ribosomal subunit binding. Both the model proposed by Jaafar et al. (2016) and the previous simplified version are supported by experimental data and can overlap to provide a wide view of the potential of the IRES to accomplish the complex task of translation initiation under adverse cellular conditions.
The extreme simplification of the translation initiation mechanism shown by the HCV IRES, compared to the canonical mechanism, is achieved through its complex, high-order structure, in which the canonical protein factors are substituted by functional RNA domains present in the viral genome. The minimal IRES region encompasses most of the 5′ UTR and spans to nucleotide 372 within the coding sequence (Figure 2) (Reynolds et al., 1995; Honda et al., 1996b). Under physiological magnesium conditions, the IRES folds autonomously into three major domains (II, III, and IV) defined by simple or branched stem-loops (Figure 2) (Kieft et al., 1999; Perard et al., 2013). Far from being a rigid entity, the HCV IRES is an articulated region in which different domains move collectively to achieve the initiation of translation (Perard et al., 2013). Domains II and III appear aligned at both sides of a double pseudoknot motif (PK1 and PK2; Figure 2) (Berry et al., 2010), which guides the correct positioning of the translation start codon (domain IV) into the P site (Berry et al., 2011). Domain II is mostly involved in the constitution of the pre-initiation 48S complex and in inducing changes in the conformation of the 40S ribosomal subunit to promote the first round of ribosomal translocation (Spahn et al., 2001; Filbin and Kieft, 2011; Filbin et al., 2013; Yamamoto et al., 2015).
FIGURE 2 | The HCV IRES region. Secondary structure of the 5′ UTR in the HCV genome including the minimal internal ribosome entry site (IRES). Domains involved in the interaction with eIF3 and the 40S ribosomal subunit are marked in cyan and yellow, respectively. The translation start codon is shown in enlarged blue lettering. PK, pseudoknot. Numbering corresponds to nucleotide positions of HCV Con1 isolate, genotype 1b.
The highly branched domain III bears critical partners for the execution of viral protein synthesis. A collection of three- and four-way junctions organize different stem-loops (designated subdomains IIIa to IIIf; Figure 2), which operate as recruiting platforms for the binding of eIF3 (in the JIIIabc junction) (Lukavsky et al., 2000) or the 40S ribosomal subunit (mainly anchored in the critical subdomain IIId; Figure 2) (Jubin et al., 2000; Kolupaeva et al., 2000; Babaylova et al., 2009).
In addition to the well-known IRES-protein interactions, three GGG residues within the essential apical loop of subdomain IIId specifically recognize (via canonical Watson-Crick base-pairing) the highly conserved CCC triplet in structural element helix 26 of the rRNA 18S (Malygin et al., 2013a;Matsuda and Mauro, 2014). This interaction favors efficient and stable HCV IRES-40S binding, and seems to be specific for HCV RNA. Certainly, mutations in the rRNA 18S disrupting the IRES-40S contact do not affect cap-dependent translation (Matsuda and Mauro, 2014). Further, changes in the GGG triplet also block IRES translational activity (Jubin et al., 2000;Matsuda and Mauro, 2014;Angulo et al., 2016). A direct consequence of the interaction between IIId and 18S rRNA is a conformational rearrangement in the region surrounding the universally conserved nucleotide G1639 in rRNA, which unleashes the tRNA discrimination mechanism and the subsequent initiation of translation (Malygin et al., 2013a).
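The GGG/CCC contact described above is a case of antiparallel Watson-Crick complementarity between two short RNA stretches. The following minimal sketch shows how such complementarity can be checked; the helper names `reverse_complement` and `can_pair` are illustrative assumptions, and the check deliberately ignores G-U wobble pairs and any structural context.

```python
# Canonical Watson-Crick partners for RNA (G-U wobble pairs are deliberately excluded).
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string written 5' to 3'."""
    return "".join(WATSON_CRICK[nt] for nt in reversed(rna))


def can_pair(strand_a: str, strand_b: str) -> bool:
    """True if two 5'->3' strands form a fully Watson-Crick antiparallel duplex."""
    return strand_a == reverse_complement(strand_b)


print(can_pair("GGG", "CCC"))  # True: the triplets named in the text are complementary
print(can_pair("GAG", "CCC"))  # False: a single substitution in the triplet breaks the match
```

The second call mirrors, in a purely sequence-level way, why changes in the GGG triplet would be expected to disturb the contact; it says nothing about binding affinity.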
The molecular and functional organization of the HCV 5′ UTR provides clear evidence of how information contained in genomic RNA domains governs the progress of the infective cycle.

Structural and Functional Organization of the HCV Genomic RNA 3′ UTR
Further proof exists of the essential nature of genomic functional domains in HCV RNA. The HCV 3′ UTR is a ∼200-250 nt-long sequence that contains essential functional domains required for viral replication and translational control (Kolykhalov et al., 1996; Fricke et al., 2015). It comprises three well-defined regions (Figure 3):
(i) The hypervariable region (HV). This is around 40 nts long and shows low-level sequence conservation among HCV isolates (Tanaka et al., 1995, 1996; Yamada et al., 1996). However, its secondary structure is preserved as a single stem-loop plus a 5′ region that partially overlaps the 5BSL3.4 domain (Figure 3) (Tanaka et al., 1996; Yamada et al., 1996; Ito and Lai, 1997). Though the hypervariable region seems to be dispensable for virus viability, its complete deletion leads to a significant reduction in HCV replication efficiency in cell culture (Kolykhalov et al., 2000; Friebe and Bartenschlager, 2002; Yi and Lemon, 2003).
(ii) The polyU/UC tract. This varies in length from one viral isolate to another, ranging from 30 to 80 nts (Figure 3; Kolykhalov et al., 1996). It consists of a homopolyuridine stretch interrupted by cytosine residues. Shortening the poly(U) tract below 27 nts, or interruption by CC dinucleotides, leads to the instability of the viral genome (Friebe and Bartenschlager, 2002; You and Rice, 2008). This suggests the recruitment to the polyU/UC tract of protein factors with special preferences for uridine over other pyrimidines. Interestingly, the HCV NS3 helicase, NS5A and NS5B (Figure 1) have been shown to preferentially bind poly(U) sequences in vitro (Gwack et al., 1996; Lohmann et al., 1997; Huang et al., 2005), which suggests a role for the poly(U/UC) tract in HCV replication.
(iii) The 3′X tail. Located at the very 3′ end of the HCV RNA genome, this is one of the most sequence- and structurally conserved regions of the entire genome (Figure 3) (Kolykhalov et al., 1996; Blight and Rice, 1997). This 98 nt-long sequence is of major structural interest given its dynamic folding, which supports the idea that the 3′X tail has an important regulatory function. Two mutually exclusive conformations (Figure 3) have been identified by chemical modification assays (Cristofari et al., 2004; Ivanyi-Nagy et al., 2006) and NMR (Cantero-Camacho and Gallego, 2015): a three stem-loop (3′SL3, 3′SL2, and 3′SL1) conformer (Figure 3, upper panel), which exposes the highly conserved k sequence motif complementary to the apical loop of the upstream 5BSL3.2 domain (see below) (Friebe et al., 2005), and a two stem-loop (3′SL2 and 3′SL1/3′SL1′) conformer (Figure 3, lower panel). Nevertheless, recent NMR and SAXS studies have led to the proposal of a predominant, relatively rigid conformation defined by two stem-loops exposing the palindromic sequence motif DLS (dimer linkage sequence; Figure 3) (Ivanyi-Nagy et al., 2006; Shetty et al., 2010; Cantero-Camacho and Gallego, 2015; Cantero-Camacho et al., 2017). The DLS motif is involved in the formation of homodimeric genomes. In the two stem-loop conformation, a fine-tuned balance between two further isoforms is achieved, mediated by long-distance RNA-RNA interactions (see below). This conformational switch works via a slight gliding of 3 nts at the base of 3′SL1 (Figure 3) (Cantero-Camacho and Gallego, 2015; Cantero-Camacho et al., 2017), allowing the formation of 3′SL1 (Figure 3). This conformation favors the initiation of primer-independent RNA synthesis by the viral RNA-dependent RNA polymerase NS5B protein (Kao et al., 2000).
FIGURE 3 | Nucleotide numbering is as in Figure 2.
All this is consistent with the fact that a homodimeric genome operates as a preferential template for HCV polymerase (Masante et al., 2015), suggesting that the acquisition of the two stem-loop conformation is a prerequisite for the progression of the infective cycle (Romero-López et al., 2014;Masante et al., 2015). Dimer genomic formation might also promote the generation of new recombinant variants during RNA synthesis (Morel et al., 2011;Galli and Bukh, 2014), helping to improve viral fitness.
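A palindromic RNA motif such as the DLS is one that equals its own reverse complement, which is what allows two copies of the same genome to pair with each other at that site. As a minimal sketch of this property, the code below scans a sequence for self-complementary windows of a given length. The function names and the toy sequence are hypothetical; the actual DLS sequence is not reproduced here.

```python
# Translation table mapping each RNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def palindromic_windows(rna: str, size: int) -> list[tuple[int, str]]:
    """Return (0-based start, window) for every self-complementary window of length `size`."""
    hits = []
    for start in range(len(rna) - size + 1):
        window = rna[start:start + size]
        if window == reverse_complement(window):
            hits.append((start, window))
    return hits


# Toy sequence for illustration only (hypothetical, not the HCV 3'X tail).
toy = "AAGCUAGCAA"
print(palindromic_windows(toy, 6))  # [(2, 'GCUAGC')]
```

Sequence-level self-complementarity is only a necessary condition for dimerization; whether a motif is exposed in a given conformer, as discussed above, is a structural question this scan cannot answer.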
As well as its essential role in RNA replication, the 3′ UTR functions as a translation enhancer (Mccaffrey et al., 2002; Bradrick et al., 2006; Song et al., 2006; Bung et al., 2010). This is mediated by the acquisition of a closed-loop topology by the viral genome, which resembles that adopted by mRNAs that are translated in a cap-dependent manner (Song et al., 2006; Weinlich et al., 2009; Bai et al., 2013). This conformation depends on the establishment of distant, direct long-range RNA-RNA contacts (Romero-López and Berzal-Herranz, 2009, 2012; Shetty et al., 2013). The interaction of the translational machinery with both ends of the viral genome is also critical for the viral genome to achieve a circular isoform, which helps to retain the 40S ribosomal subunit during the translation termination step and favors ribosome recycling for the next round of protein synthesis (Bai et al., 2013).
The HCV 3′ UTR thus plays various roles during the viral cycle, controlling the progress of infection via a collection of functional elements that can switch between different structural states.

CONSERVED STRUCTURAL AND FUNCTIONAL DOMAINS WITHIN THE CODING SEQUENCE
In addition to the RNA domains located in the untranslated regions, numerous cis-acting signals have been identified in the ORFs of RNA viruses (for a review, see Romero-López and Berzal-Herranz, 2013). The search for unique, highly conserved structural and functional elements encoded in the HCV ORF has been going on for 15 years. The use of bioinformatic tools and secondary structure mapping, combined with genetic strategies, has finally provided a good overview of the global and local folding of HCV RNA (Tuplin et al., 2002; Mcmullan et al., 2007; Davis et al., 2008; Diviney et al., 2008; Chu et al., 2013; Fricke et al., 2015; Mauger et al., 2015; Pirakitikulr et al., 2016). The HCV genome is a highly compact molecule with extensive base-pairing, which helps to protect viral RNA from degradation by cellular endonucleases. Two of these degradation systems, both of which are involved in the innate immune response, are RNase L, which is specific for single-stranded regions, and the double-stranded-specific interference pathway (Li and Lemon, 2013). Importantly, by constraining the length of the helix segments, the viral genome has reached a balance between single- and double-stranded regions that minimizes the effect of RNase L activity without unleashing the interference mechanism. This is consistent with the presence of alternate, extensive regions of compact folding throughout viral RNA genomes, the so-called GORS (genome-scale ordered RNA structure) elements. The existence of these elements correlates positively with host persistence and cell-to-cell movement (Davis et al., 2008; Witteveldt et al., 2014).
Up to 20 conserved RNA structural elements are scattered throughout the HCV ORF (Figure 1), expanding the functional repertoire of the viral genome via their participation in viral translation, replication, and infectivity (Tuplin et al., 2002; Mcmullan et al., 2007; Diviney et al., 2008; Chu et al., 2013; Fricke et al., 2015; Mauger et al., 2015; Pirakitikulr et al., 2016). Interestingly, the NS5B coding sequence is specifically enriched in functional RNA domains, with up to 10 distinct conserved stem-loop structural units able to drive HCV RNA and protein synthesis (Figure 1) (Smith and Simmonds, 1997; Walewski et al., 2001; Tuplin et al., 2002, 2004; Hofacker, 2004; Lee et al., 2004; You et al., 2004; Chu et al., 2013; Mauger et al., 2015; Pirakitikulr et al., 2016). One of the best-characterized functional RNA domains in the NS5B coding sequence is the so-called 5BSL3.2 domain (also known as SL9266 under the new standardized nomenclature system, which names each stem-loop after the genomic nucleotide position of its first 5′ paired residue). The following sections focus on the molecular and functional features of this domain and its relationships with distant RNA elements in the HCV genome.
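The standardized nomenclature mentioned above derives a stem-loop name from the genomic position of its first 5′ paired residue. A minimal sketch of that rule follows, assuming the structure is supplied in dot-bracket notation together with the genomic coordinate of its first character. The function name `stem_loop_name`, the toy dot-bracket string, and the start coordinate are all hypothetical; they were chosen only so that the result reproduces the name SL9266 given in the text.

```python
def stem_loop_name(dot_bracket: str, first_position: int) -> str:
    """Name a stem-loop after the genomic position of its first 5' paired residue.

    dot_bracket    -- secondary structure, '(' and ')' for paired, '.' for unpaired
    first_position -- genomic coordinate of the first character of dot_bracket
    """
    offset = dot_bracket.find("(")  # index of the first 5' paired residue
    if offset == -1:
        raise ValueError("structure contains no base pairs")
    return f"SL{first_position + offset}"


# Hypothetical window: two unpaired residues precede the stem, starting at position 9264.
print(stem_loop_name("..((((....))))..", 9264))  # SL9266
```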

THE 5BSL3.2 DOMAIN MAPS WITHIN THE NS5B CODING SEQUENCE
Using a combination of phylogenetic comparisons and thermodynamic prediction methods, Tuplin et al. (2002) defined the precise and conserved boundaries within the NS5B coding sequence that mark the limits of a set of genotypically well-conserved secondary structure elements. Five of these had already been proposed by other groups, either in whole or in part (Smith and Simmonds, 1997; Hofacker et al., 1998; Walewski et al., 2001; Tuplin et al., 2002). Nevertheless, it was not until 2004 that three independent groups described the existence of a preserved sequence and structural region composed of three stem-loops at the 3′ terminus of the coding sequence. These were named 5BSL3.1, 5BSL3.2, and 5BSL3.3 (or SL9217, SL9266, and SL9324, respectively; Figures 1, 4; Lee et al., 2004; You et al., 2004; Friebe et al., 2005). Reverse genetic analyses in hepatocytes bearing subgenomic replicon constructs, and in full-length viral replication models, showed these stem-loops to operate as a cis-acting replication element (CRE). The central domain 5BSL3.2 was shown to be indispensable for HCV propagation (Lee et al., 2004; You et al., 2004; Friebe et al., 2005; Masante et al., 2015; Tuplin et al., 2015).
FIGURE 4 | The CRE region. Sequence and secondary structure of the HCV CRE region, including the functional domains 5BSL3.1, 5BSL3.2, and 5BSL3.3 plus the 5BSL3.4 stem-loop. Conserved residues and covariant base pairs in the essential 5BSL3.2 domain, as described by You et al. (2004), are indicated in boxes. Colors denote the frequency for each nucleotide variation, as indicated. NP (non-paired) represents nucleotide variations impeding the formation of canonical base pairs. The translation stop codon is indicated in enlarged blue letters. Position numbering is as in Figure 2.
Bioinformatic prediction, biochemical structural mapping and NMR studies have all been undertaken to try to decipher the three-dimensional folding of 5BSL3.2 (Lee et al., 2004; You et al., 2004; Friebe et al., 2005; Tuplin et al., 2012), which was found to be a 48 nt-long imperfect hairpin with a 12 nt-long apical loop (Figure 4). The stem is interrupted at its 3′ end by an 8 nt-long bulge. Interestingly, both unpaired regions are phylogenetically conserved across different genotypes and show low synonymous site sequence variation (Figure 4) (You et al., 2004) that cannot be explained only by the need to preserve the NS5B coding sequence. Such a high degree of conservation points to a role in the functional control of the infective cycle, mediated by the apical loop and the bulge of the 5BSL3.2 domain (Lee et al., 2004; You et al., 2004; Friebe et al., 2005; Tuplin et al., 2012, 2015). Interestingly, the 5′-CACAGC-3′ sequence motif in the apical loop is found in other conserved elements of distantly related flaviviruses, such as Kunjin virus, West Nile virus or Dengue virus, where it operates as a single CRE (Markoff, 2003; Friebe et al., 2005). This observation points to the existence of common molecular mechanisms for viral RNA synthesis across different members of the family Flaviviridae.
Co-variation data confirm the existence in the basal part of the stem-loop of three out of the eight base pairs. In the upper stem, three of the six base pairs are invariable and two of the other remaining three can be predicted through compensatory base pair changes (Figure 4) (You et al., 2004). These phylogenetic data provide a convincing clue about the existence of the 5BSL3.2 hairpin in vivo.
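The reasoning behind co-variation support can be made concrete: a proposed base pair is invariant if every isolate carries the same pair, and covariant if both partners change between isolates yet pairing is preserved. The sketch below classifies a pair of alignment columns along those lines. It is an illustration under stated assumptions, not the method of You et al. (2004): the function name, the category labels, the decision to count G-U wobble pairs as pairable, and the three toy sequences are all hypothetical.

```python
# Pairs treated as able to form; including G-U wobble is an assumption of this sketch.
PAIRABLE = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair(alignment: list[str], i: int, j: int) -> str:
    """Classify the proposed base pair between alignment columns i and j (0-based)."""
    observed = {(seq[i], seq[j]) for seq in alignment}
    if not observed <= PAIRABLE:
        return "not supported"  # at least one sequence cannot form the pair
    if len(observed) == 1:
        return "invariant"  # same pair in every sequence
    left = {pair[0] for pair in observed}
    right = {pair[1] for pair in observed}
    if len(left) > 1 and len(right) > 1:
        return "covariant"  # both sides change, pairing preserved
    return "single-sided change"  # e.g. C-G in one sequence, U-G in another


# Toy hairpin alignment (hypothetical): columns 0-2 pair with columns 9-7.
toy_alignment = [
    "GCAGAAAUGC",
    "GCGGAAACGC",
    "GUAGAAAUGC",
]
for i, j in [(0, 9), (1, 8), (2, 7)]:
    print((i, j), classify_pair(toy_alignment, i, j))
```

With the toy alignment, the pair (0, 9) is invariant, (1, 8) shows a single-sided change, and (2, 7) is covariant, the last being the pattern described in the text as a compensatory base pair change.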
From a structural point of view, the molecular context surrounding the 5BSL3.2 domain is striking. It is embedded between two other stem-loops, 5BSL3.1 and 5BSL3.3, to yield a high-order structure that can be depicted as a cruciform element (Figure 5) (Lee et al., 2004; You et al., 2004; Friebe et al., 2005). Domains 5BSL3.1 and 5BSL3.3 were initially identified as evolutionarily conserved elements (Smith and Simmonds, 1997; Tuplin et al., 2004) that fold into stable stem-loop structures (Figure 4). RNase and chemical mapping have confirmed this secondary structure and support the proposed large cruciform structure (You et al., 2004). However, neither mutagenesis nor biophysical methods have yet confirmed the existence of this high-order structure in vivo. This might be due to the dynamic and relatively unstable nature of long-distance RNA-RNA contacts, which can promote the switch between different metastable structural states in the RNA molecule, depending on the presence of specific ligands or external stimuli. Thus, such contacts suggest a regulatory operation mode that might rely on the versatility and efficiency of the HCV CRE region.

THE 5BSL3.2 DOMAIN LIES AT THE CORE OF A COMPLEX RNA-RNA INTERACTION NETWORK
The remarkable structural features of the 5BSL3.2 domain, as well as its strong conservation, point to its critical involvement in the HCV infective cycle. As mentioned above, the observation that both the apical loop and the bulge of the 5BSL3.2 domain are highly preserved among different genotypes and viral isolates suggests their participation in interactions with other viral RNA elements. Several reports have provided substantial proof of the existence of long-distance RNA-RNA contacts, together forming a complex network of interactions that appears to be governed by 5BSL3.2 (Friebe et al., 2005; Romero-López and Berzal-Herranz, 2009; Tuplin et al., 2012, 2015; Shetty et al., 2013; Fricke et al., 2015). Such a network would organize the three-dimensional folding of the viral genome into a compact conformation that correlates with virulence and the persistence of infection (Davis et al., 2008). To date, two conformational rearrangements of the HCV genome are known to be mediated by the 5BSL3.2 domain: viral genome circularization and the structural tuning of the 3′ end of the HCV genome.

Viral Genome Circularization
Genome circularization is a crucial step in the initiation of viral protein and RNA synthesis in many positive-stranded RNA viruses (Villordo and Gamarnik, 2009). It depends on either direct, long-distance RNA-RNA interactions, or protein bridges that bring the ends of the viral RNA together. In most cases, a combination of both mechanisms is used. While no conclusive evidence exists for the acquisition of a closed-loop topology by the HCV genome, indirect data suggest it does indeed occur and depends on long-range RNA-RNA contacts (Romero-López and Berzal-Herranz, 2009; Shetty et al., 2013; Fricke et al., 2015). Using bioinformatic, biochemical, and biophysical methods, several independent groups have shown the establishment of a direct interaction involving the bulge of the 5BSL3.2 domain and the apical loop of subdomain IIId in the IRES element (Figure 5) (Romero-López and Berzal-Herranz, 2009; Shetty et al., 2013; Fricke et al., 2015). Importantly, this interaction occurs in the absence of protein factors. According to the proposed model, the essential nucleotide G263, located at the base of the apical loop of subdomain IIId, would establish the initial contact with C9301 on the 3′ side of the bulge in the 5BSL3.2 domain (Romero-López and Berzal-Herranz, 2009). This contact theoretically extends along the unpaired regions of the participating domains. Nucleotide complementarity runs through the flanking stems up to A288 for the 5′ end, and to A9275 for the 3′ end (Romero-López and Berzal-Herranz, 2009). However, no such extended complex has ever been detected.
Conformational consequences derived from the 5BSL3.2-IIId interaction have been mapped by chemical probing coupled to three-dimensional structure prediction (Figure 5) (Romero-López and Berzal-Herranz, 2015). The aromatic rings of the residues located in the apical loop of subdomain IIId have been reported to change their orientation with respect to the solvent because of their interaction with 5BSL3.2 (Figure 5). Such conformational rearrangements would affect IRES function by interfering with the efficient recruitment of the 40S ribosomal subunit mediated by subdomain IIId (Malygin et al., 2013a,b). This is in good agreement with the idea that the 5BSL3.2 domain operates as a specific and efficient negative regulator during HCV IRES-dependent translation (Romero-López and Berzal-Herranz, 2012). The 5BSL3.2 domain is therefore considered a versatile multifunctional genomic element that takes part in different steps of the infective cycle and controls transitions between them.

Structural Tuning of the 3′ End of the HCV Genome
The apical loop of the 5BSL3.2 domain establishes a long-range interaction with a complementary sequence, k, located in the apical loop of the downstream 3′SL2 element within the 3′X tail (Figures 3, 5) (Friebe et al., 2005; Cantero-Camacho et al., 2017). The biochemical and structural properties of this interaction have been studied in depth following different biophysical and biochemical strategies (Friebe et al., 2005; You and Rice, 2008; Tuplin et al., 2012; Palau et al., 2013; Shetty et al., 2013; Cantero-Camacho et al., 2017). These analyses showed that the 5BSL3.2-3′X contact occurs in the absence of RNA chaperone proteins and is preferred over the formation of homodimeric viral genomes (Cantero-Camacho et al., 2017).
FIGURE 5 | Middle panel: Predicted structure of subdomain IIId (Romero-López and Berzal-Herranz, 2015). Root mean-square deviation (RMSD) values reflect differences in the conformation of subdomain IIId in the presence of the IIId-5BSL3.2 interaction (right) compared to its absence (left). Color code: black, residues with an RMSD < 3.5 Å; orange, nucleotides with an RMSD ranging from 3.5 to 6.0 Å; red, residues with an RMSD of >6.0 Å. Lower panel: Structural reorganization of the 3′X tail depending on the establishment of different RNA-RNA contacts. The interaction of the 5BSL3.2 domain with the k motif in the 3′X region renders a two-stem-loop conformation in which the residues located at the 3′ end of the viral genome appear completely base-paired (left, closed isoform). However, contacts between 5BSL3.2 and upstream domains, such as the Alt sequence or subdomain IIId, release a three-nucleotide 3′ overhang in the 3′X tail (right, open isoform). DLS and k motifs are indicated according to Figure 3. Nucleotide numbering is as mentioned in Figure 2.
Recent studies have determined that even for the dimerizable conformation of the 3 X region, which partially occludes k, the interaction with the 5BSL3.2 domain is made possible by the induction of the structural disruption of the two stem-loop conformations of the 3 X tail, thus unfolding the motif k (Cantero-Camacho et al., 2017). Hence, 5BSL3.2 may act as a structural cofactor promoting the acquisition of a new, functionally active folding of the HCV genome.
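The RMSD-based color code used in the Figure 5 caption can be illustrated with a minimal Python sketch. It assumes that the two predicted structures have already been superimposed and that the atoms of each residue are listed in the same order in both. The function names and the example coordinates are hypothetical and are not taken from the cited work.

```python
import math


def residue_rmsd(coords_a, coords_b):
    """RMSD (in angstroms) between two equally ordered lists of (x, y, z) atoms.

    Assumption: both structures were superimposed beforehand, so no
    fitting step is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def rmsd_color(rmsd):
    """Map an RMSD value to the color classes given in the figure caption."""
    if rmsd < 3.5:
        return "black"
    if rmsd <= 6.0:
        return "orange"
    return "red"


# Hypothetical two-atom residue displaced by (3, 4, 0) between the two models.
free_state = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
bound_state = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
example_rmsd = residue_rmsd(free_state, bound_state)
example_color = rmsd_color(example_rmsd)
```

In this invented example every atom moves by 5 Å, so the residue falls in the intermediate (orange) class.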
Conversely, the 5BSL3.2 bulge element may interact with the complementary sequence motif centered on position 9110, the so-called Alt sequence (Figure 5) (Diviney et al., 2008; Tuplin et al., 2012; Shetty et al., 2013). This contact overlaps with that involving subdomain IIId. Both interactions show dissociation constant values in the same range and seem equally likely to occur (Shetty et al., 2013). The choice between them is affected by additional structural constraints and the presence of different cofactors (see below).
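Why similar dissociation constants imply that both contacts are about equally likely can be seen in a simple competitive-binding sketch. It is a deliberately simplified model: it treats IIId and Alt as separate partners competing in trans for one shared site on 5BSL3.2, with 1:1 binding and partners in large excess, whereas in the genome these contacts are intramolecular. All names and numerical values are hypothetical and are not the measured constants.

```python
def competitive_occupancy(conc_iiid, kd_iiid, conc_alt, kd_alt):
    """Return (fraction bound to IIId, fraction bound to Alt, fraction free).

    Assumptions: a single shared binding site on 5BSL3.2, 1:1 binding, and
    partner concentrations in large excess (free ~ total). Concentrations
    and Kd values must be given in the same units.
    """
    if conc_iiid < 0 or conc_alt < 0 or kd_iiid <= 0 or kd_alt <= 0:
        raise ValueError("concentrations must be >= 0 and Kd values > 0")
    weight_iiid = conc_iiid / kd_iiid  # statistical weight of the IIId-bound state
    weight_alt = conc_alt / kd_alt     # statistical weight of the Alt-bound state
    total = 1.0 + weight_iiid + weight_alt  # 1.0 is the weight of the free state
    return weight_iiid / total, weight_alt / total, 1.0 / total


# Equal (hypothetical) Kd values and concentrations give equal occupancies.
equal_case = competitive_occupancy(1.0, 1.0, 1.0, 1.0)
# Doubling the availability of IIId shifts the balance toward that contact.
shifted_case = competitive_occupancy(2.0, 1.0, 1.0, 1.0)
```

Under this model, any factor that changes the effective availability of one partner, such as ribosome occupancy of subdomain IIId, shifts the balance even when the dissociation constants are the same.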
A significant feature of the interactions mediated by 5BSL3.2 at the 3′ end of the viral RNA is that they can be established independently and simultaneously (Tuplin et al., 2012; Palau et al., 2013; Shetty et al., 2013). This suggests that the binding sites are structurally independent. The connections established at the 3′ end give rise to an intricate tertiary structure in which 5BSL3.2 forms the core of an extended and dynamic pseudoknot (Tuplin et al., 2012). Reverse genetic and biochemical structural data have revealed that this pseudoknot can promote two alternative conformations in the 3′X region (Tuplin et al., 2012; Cantero-Camacho and Gallego, 2015; Kranawetter et al., 2017). In the open form, contact with the Alt sequence or with subdomain IIId would favor the establishment of the preferred two stem-loop conformation bearing three overhang nucleotides at the 3′ end of the HCV RNA (Figures 3, 5). As mentioned above, this is a favored folding state for virus replication (Kao et al., 2000) and for genome dimerization (Palau et al., 2013; Romero-López et al., 2014; Cantero-Camacho et al., 2017). Alternatively, in the closed isoform, the interaction of 5BSL3.2 with the k motif in the 3′X tail would induce the local melting of the upper stem of 3′SL2 and conformational rearrangements in the base of 3′SL1, leading to the formation of the extended stem-loop 3′SL1 (Figures 3, 5) (Tuplin et al., 2012; Cantero-Camacho and Gallego, 2015). Interestingly, the thermodynamic equilibrium between these two isoforms seems to depend on the viral genotype, with the open conformation being favored by genotype 1b and the closed form by genotype 2a (Tuplin et al., 2012). These structural data suggest correlations between HCV RNA conformation, genotype-dependent virulence, and even the sustained viral response to treatment shown by the infected patient (Irshad et al., 2010; Carter et al., 2017).
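The idea of a genotype-dependent thermodynamic equilibrium between the open and closed isoforms can be made concrete with a two-state Boltzmann sketch. It assumes that only these two conformations are populated and that the free-energy difference between them is known. The function name and the energy values are hypothetical illustrations, not measured data for any genotype.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)


def open_fraction(delta_g_kcal, temp_k=310.15):
    """Equilibrium fraction of the open isoform in a two-state model.

    delta_g_kcal is G(open) - G(closed) in kcal/mol, so a negative value
    means the open isoform is the more stable one. Assumption: no other
    conformations are populated.
    """
    return 1.0 / (1.0 + math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temp_k)))


# Hypothetical free-energy differences, for illustration only.
balanced = open_fraction(0.0)       # both isoforms equally stable
open_favored = open_fraction(-1.0)  # open isoform more stable by 1 kcal/mol
closed_favored = open_fraction(1.0) # closed isoform more stable by 1 kcal/mol
```

At 37 °C a difference of only 1 kcal/mol already places roughly 84% of the molecules in the more stable isoform, which shows how small sequence-dependent changes in stability could bias the population toward one conformation.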
The creation of this complex network of RNA-RNA interactions promotes the structural remodeling of the viral genome, not only in the directly involved domains, but also in distant regions. This gives rise to complex, interconnected regulatory pathways that depend on the folding of the genomic RNA. Indeed, the conformational rearrangement mediated by the 5BSL3.2 domain at the 3′ end is regulated by the IRES region, most likely as a consequence of the IIId-5BSL3.2 interaction (Romero-López and Berzal-Herranz, 2009; Romero-López et al., 2014). Using chemical structural mapping coupled to secondary structure prediction, it has been shown that the presence of both the IRES and the CRE regions favors the acquisition of the dimerizable two stem-loop conformation in the 3′X tail, exposing the DLS motif (Romero-López et al., 2014). This led to the assumption that the IIId-5BSL3.2 contact improves the dimerization of the viral genome. However, it has recently been reported that while the two stem-loop isoform is a requisite, it is not the only determinant affecting genomic dimer formation (Cantero-Camacho et al., 2017; Romero-López et al., 2017). In summary, the IRES and the CRE regions might be considered both cofactors and chaperoning agents that finely tune the three-dimensional folding of the 3′ end of the HCV RNA in order to control the different stages of the infective cycle and the transitions between them.
The folding of the IRES region is also influenced by the presence of the 3′ end of the HCV genome. The 3′ UTR and the CRE region have been shown to effect remarkable changes in subdomains IIIe and IIIf (Romero-López et al., 2012), essential elements involved in the proper positioning of the translation start codon in the P site of the 40S ribosomal subunit (Berry et al., 2011). In addition, the three-dimensional organization of the eIF3 binding platform within the IRES can be modified by the IIId-5BSL3.2 interaction, turning the typical S-turn conformation (Collier et al., 2002) into a more rigid form with altered functionality. Domain IV appears less stable in the presence of the 3′ end of the HCV RNA, which might contribute toward the proper positioning of the translation start codon (Honda et al., 1996a). All these findings are in good agreement with the acquisition of a circular topology by the HCV genomic RNA.
The 5BSL3.2 domain thus participates in the establishment of different long-range RNA-RNA interactions in the absence of proteins. This helps create a complex network of contacts, the careful regulation of which allows the HCV genome to acquire different functional conformational states, facilitating proper switching between the different stages of the viral cycle.

HOST AND VIRAL COMPONENTS INTERACT WITH THE CRE
The recruitment of both host and viral components is important in the regulatory activities of the CRE region (Lourenco et al., 2008). The apical loop of the 5BSL3.2 domain binds to the viral polymerase NS5B protein (Zhang et al., 2005), favoring the positioning of the polymerase in the 3′X tail and thus the initiation of replication. The interaction of the NS5B protein with the apical loop of 5BSL3.2 might compete with the contact 5BSL3.2-3′X. Swapping from NS5B recruitment to 3′X interaction could therefore be used by the 5BSL3.2 domain as a regulatory mechanism for promoting transitions between steps during the infective cycle.
FIGURE 6 | Long-range RNA-RNA interactions regulate the different steps in the HCV infective cycle. The figure shows a working model demonstrating the long-range RNA-RNA interactions in the HCV genome described to date, and their role in the progression of the infective cycle. Briefly, during early infection, the viral genome is released to the cytosol and viral translation initiates in an IRES-dependent manner on the surface of the endoplasmic reticulum (ER). Subdomain IIId is then occupied by the 40S ribosomal subunit, which impedes the interaction IIId-5BSL3.2 and enhances the conformational rearrangement at the 3′ end mediated by the 5BSL3.2 domain. The accumulation of NS viral proteins induces the formation of replication complexes, which preferentially recruit viral genomes showing the interaction IIId-5BSL3.2 or Alt-5BSL3.2. This contact favors a translationally repressed state and enhanced replication dependent on the interaction Alt-5BSL3.2. It also interferes with the formation of dimeric genomic particles. The accumulation of newly synthesized HCV RNA genomes induces the initiation of new rounds of translation. The dimerization process is thermodynamically favored under these conditions, and the dimeric genomes produced offer optimal templates for viral replication. Alternatively, a fraction of the HCV RNA molecules is encapsidated and released to the extracellular medium by exocytosis.
In recent years, exhaustive lists of cellular proteins susceptible to recruitment by the CRE region have been produced (Oakland et al., 2013; Ríos-Marco et al., 2016). Interestingly, some have been shown to influence viral translation, replication or both (Oakland et al., 2013; Ríos-Marco et al., 2016). Oakland et al. (2013) showed Ewing's Sarcoma binding protein 1 (EWSR1) to interact with the CRE region; EWSR1 is a nuclear factor that regulates RNA synthesis and processing as well as the transport of pre-mRNAs involved in cell cycle progression and the response to DNA damage (Zinszner et al., 1994; Paronetto et al., 2011). It also binds other factors related to splicing, such as those belonging to the heterogeneous nuclear ribonucleoprotein family (hnRNPs). Its regulatory role during mitosis has also been reported (Wang et al., 2016). The involvement of EWSR1 in tumorigenesis has attracted much attention since HCV infection can lead to the development of hepatocellular carcinoma. Though EWSR1 is preferentially located in the nucleus, it can be translocated to the cytosol of HCV-infected hepatocytes (Oakland et al., 2013). There it binds to the viral genome in a manner dependent on the structure resulting from the 5BSL3.2-3′X interaction, promoting efficient HCV RNA replication, at least in cell culture (Oakland et al., 2013). Subsequent proteomic analyses have identified a collection of RNA-binding proteins with highly conserved RNA recognition motifs able to bind to the CRE region (Ríos-Marco et al., 2016). hnRNPA1 is a very abundant nuclear and cytosolic protein involved in the packaging of pre-mRNAs into spliceosomal particles, and in the translocation of processed poly(A) mRNAs from the nucleus to the cytoplasm (Dreyfuss et al., 2002).
During HCV infection, hnRNPA1 operates as an IRES-dependent translation initiation enhancer via its interaction with the IRES region (Lu et al., 2004), and as a negative regulatory partner of viral RNA synthesis, most likely by competing with NS5B for a common interacting site in the 5BSL3.2 domain (Ríos-Marco et al., 2016). Additionally, hnRNPA1 plays an important role as a splicing factor in the maturation of IRF3 (interferon regulatory factor 3), which is involved in interferon-mediated immunity (Guo et al., 2013). Sequestering of hnRNPA1 by 5BSL3.2 would, therefore, influence interferon production, helping HCV escape the cellular immune response. HMGB1 (high mobility group box 1 protein) also interferes with HCV replication by binding to the CRE (Jung et al., 2011; Ríos-Marco et al., 2016), and is considered an antiviral factor (Jung et al., 2011). Finally, host proteins with helicase activity, such as DDX3, DDX5, and DDX17, might recognize the 5BSL3.2 domain and promote HCV replication and/or translation (Ríos-Marco et al., 2016). This last observation is consistent with the role of the CRE as a regulator of viral protein and RNA synthesis (Ariumi et al., 2007; Randall et al., 2007; Oakland et al., 2013; Ríos-Marco et al., 2016).
MicroRNAs (miRNAs) produced by the host cell have been identified as agents that regulate viral infection (Bruscella et al., 2017). Via a little-understood mechanism, miRNAs influence HCV infection at different stages. For example, miR-122, a highly abundant miRNA in hepatocytes, promotes viral translation and replication via its interaction with the IRES region at different target sites (Jopling et al., 2005; Jangra et al., 2010; Roberts et al., 2011), while miR-199a* represses RNA replication (Murakami et al., 2009). Let-7b also acts as a negative regulatory agent of HCV replication via its direct interaction with the region connecting domains 5BSL3.2 and 5BSL3.3 (Cheng et al., 2012). Let-7b is involved in cell differentiation and has been intimately associated with the development of cancer (Takamizawa et al., 2004; Yu et al., 2007). This, along with the observation that cell cycle regulatory proteins bind to the CRE, provides additional evidence of the potential role of the CRE region in HCV-associated hepatocellular carcinoma.

THE CRE REGION IS A CRITICAL SWITCH COMPONENT IN THE HCV INFECTIVE CYCLE
The HCV cycle is extensively compartmentalized, both spatially and temporally: viral protein synthesis, replication, and encapsidation occur in different cellular localizations and do not overlap in time (Shulla and Randall, 2015). The switch between one stage and the next must therefore be carefully controlled. The machinery required for this is not well understood, but it is widely assumed that functional genomic domains, along with host and viral factors, together control the progression of the infective cycle.
During early infection, HCV virions are endocytosed and their genomes released into the cytosol by a little-understood mechanism (Figure 6; for a review see Scheel and Rice, 2013). Endoplasmic reticulum (ER)-associated translation is then initiated by the IRES region. During this stage, subdomain IIId of the IRES is preferentially occupied by the translational machinery, thus favoring the contacts 5BSL3.2-Alt and 5BSL3.2-3′X (Figure 6). The emerging HCV polyprotein is co- and post-translationally cleaved by cellular proteases and the viral NS2-NS3 and NS3-NS4A proteases to release 10 HCV proteins (Grakoui et al., 1993). The accumulation of non-structural viral proteins promotes significant rearrangements in the ER membrane to create optimized microenvironments for RNA replication (as seen for other positive-strand RNA viruses) (Egger et al., 2002; Moradpour et al., 2003; Meyers et al., 2016; Falcon et al., 2017). The binding of the replicase complex to the 3′X tail of the viral genome, and the concomitant recruitment of cellular components at the CRE region (Oakland et al., 2013; Ríos-Marco et al., 2016), releases the 5BSL3.2 domain for new interactions with distant genomic RNA elements such as subdomain IIId within the IRES (Figure 6). This contact promotes the viral genome's acquisition of a circular form. In addition, the IIId-5BSL3.2 interaction blocks the recruitment of 40S ribosomal particles by the IRES, contributing to the creation of a translationally repressed state. 5BSL3.2 is thus considered a specific inhibitor of HCV IRES function. The IIId-5BSL3.2 interaction also controls the structural switch at the 3′X tail and partially interferes with the formation of dimeric genomes (Romero-López et al., 2017). From a thermodynamic point of view, the IIId-5BSL3.2 interaction can swap with the Alt-5BSL3.2 contact (Shetty et al., 2013) to create a favorable replicative environment (Figure 6) (Diviney et al., 2008; Tuplin et al., 2012).
Viral RNA synthesis is then initiated using the positive genome strand as a template, yielding the complementary, negative strand, which is used for the generation of new progeny RNA genomes (for a review, see Kim and Chang, 2013). In a later stage of the cycle, viral genomes accumulate in the cytosol, where they serve as mRNAs for the production of HCV proteins, thus initiating new rounds of translation-replication (Figure 6). At this time, the IRES region is occluded by the translational machinery, favoring the Alt-5BSL3.2 interaction. Because of the high concentration of viral RNAs, genomic dimerization is thermodynamically favored (Figure 6). This results in viral RNA storage, but HCV dimeric genomes are also excellent templates for the initiation of replication (Cantero-Camacho and Gallego, 2015; Masante et al., 2015; Cantero-Camacho et al., 2017). According to this model, HCV genomic dimerization may play a replication-enhancing role during late infection. Finally, a fraction of the newly synthesized RNA molecules is shuttled for encapsidation and released to the extracellular medium (Figure 6).
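The statement that a high concentration of viral RNA thermodynamically favors dimerization follows from mass action, as the sketch below illustrates for a simple monomer-dimer equilibrium. It assumes an idealized reaction 2M ⇌ D with a single dissociation constant and ignores competing intramolecular contacts and protein cofactors. The function name and the numerical values are hypothetical.

```python
import math


def dimer_fraction(total_conc, kd_dimer):
    """Fraction of RNA strands found in dimers for the equilibrium 2M <-> D.

    Kd = [M]^2 / [D] and total strands C = [M] + 2[D], so the free monomer
    concentration solves 2[M]^2 / Kd + [M] - C = 0. Both arguments must be
    given in the same concentration units.
    """
    if total_conc < 0 or kd_dimer <= 0:
        raise ValueError("total_conc must be >= 0 and kd_dimer > 0")
    if total_conc == 0:
        return 0.0
    monomer = kd_dimer * (math.sqrt(1.0 + 8.0 * total_conc / kd_dimer) - 1.0) / 4.0
    return 1.0 - monomer / total_conc


# Hypothetical concentrations expressed in units of Kd.
low_rna = dimer_fraction(0.1, 1.0)
mid_rna = dimer_fraction(1.0, 1.0)
high_rna = dimer_fraction(10.0, 1.0)
```

In this idealized model the dimeric fraction rises from about 15% to 50% to 80% of strands as the total RNA concentration increases from 0.1 to 1 to 10 times the dissociation constant, in line with dimerization becoming dominant late in infection.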
The model proposed above recapitulates many of the findings made regarding the molecular biology of HCV. More importantly, it provides an overview of the critical role of long-distance RNA-RNA contacts in the progression of the infective cycle, and points to 5BSL3.2 as a multifunctional component encoded in the viral genome.

CONCLUDING REMARKS
The contrast between the high structural conservation and sequence variability of the HCV RNA genome reflects an efficient mechanism for retaining essential functionalities in certain structural elements while favoring the acquisition of novel capabilities. Whereas many genomic domains govern essential steps in infection, the control exerted by 5BSL3.2 at multiple levels reflects the functional versatility and fine regulation that can be achieved by a single stem-loop. This domain is thus an interesting target for novel therapeutic agents. Exploring the structural dynamics of functional RNA domains offers a powerful, informative methodology for future drug design. Extending the knowledge acquired in HCV to related viruses, such as flaviviruses, will allow conformational maps of common structural domains to be produced, help reveal similar viral mechanisms involved in the infective cycle, and perhaps contribute to the development of novel therapeutic and diagnostic strategies.

AUTHOR CONTRIBUTIONS
CR-L and AB-H designed and wrote the paper.

FUNDING
This work was supported by the Spanish Ministerio de Economía y Competitividad [BFU2015-64359-P] to AB-H. Work at our laboratory is partially supported by FEDER funds from the EU.