Long-Read Isoform Sequencing Reveals a Hidden Complexity of the Transcriptional Landscape of Herpes Simplex Virus Type 1

In this study, we used the amplified isoform sequencing technique from Pacific Biosciences to characterize the poly(A)+ fraction of the lytic transcriptome of the herpes simplex virus type 1 (HSV-1). Our analysis detected 34 formerly unidentified protein-coding genes, 10 non-coding RNAs, as well as 17 polycistronic and complex transcripts. This work also led us to identify many transcript isoforms, including 13 splice and 68 transcript end variants, as well as several transcript overlaps. Additionally, we determined previously unascertained transcriptional start and polyadenylation sites. We analyzed the transcriptional activity from the complementary DNA strand in five convergent HSV gene pairs with quantitative RT-PCR and detected antisense RNAs in each gene. This part of the study revealed an inverse correlation between the expressions of convergent partners. Our work adds new insights for understanding the complexity of the pervasive transcriptional overlaps by suggesting that there is a crosstalk between adjacent and distal genes through interaction between their transcription apparatuses. We also identified transcripts overlapping the HSV replication origins, which may indicate an interplay between the transcription and replication machineries. The relative abundance of HSV-1 transcripts has also been established by using a novel method based on the calculation of sequencing reads for the analysis.


INTRODUCTION
Herpes simplex virus type 1 (HSV-1) is a human pathogenic alphaherpesvirus from the Herpesviridae family. Herpes is a lifelong infection, which often has mild or no symptoms. The most common symptoms of viral infection are cold sores. HSV-1 can cause acute encephalitis in immunocompromised patients. According to WHO's first global estimates, worldwide more than 3.7 billion people under the age of fifty are infected with HSV-1 (Looker et al., 2015). The HSV-1 genome is composed of a unique long (UL) and a unique short (US) region, both being bracketed by inverted repeats (IRLs and IRSs, respectively). According to earlier annotations, the HSV-1 DNA contains 89 protein-coding, 10 long non-coding (lnc)RNA genes (Rajčáni et al., 2004;McGeoch et al., 2006;Macdonald et al., 2012;Lim, 2013;Hu et al., 2016) and several micro RNAs (Du et al., 2015).
Herpesvirus genes are expressed in a coordinated temporal cascade and grouped into three kinetic classes, immediate-early (IE), early (E), and late (L) (Harkness et al., 2014). The IE proteins are required for the transcription of both E and L genes. The E genes typically encode proteins that play a role in DNA replication, while the L genes specify the structural components of the virus. Kinetic analysis of the HSV transcriptome faces a significant challenge due to the overlapping nature of the viral genes. The typical architecture of polycistronic units is characterized by varying transcription start sites (TSSs) that are caused by the control of distinct promoters, and shared transcription end sites (TESs). As an example, the following transcripts are produced from a tetracistronic unit: 1-2-3-4, 2-3-4, 3-4, and 4, where "1" represents the most upstream gene, while "4" is the most downstream gene within the given unit. In a recent study on the transcriptome of another alphaherpesvirus, the pseudorabies virus (PRV), we demonstrated that additional gene combinations are also produced mostly in the form of low-abundance RNA molecules (Tombácz et al., 2016). The downstream genes on the polycistronic transcription units are generally thought to be untranslated because the eukaryotic RNAs-with some exceptions (Craigen and Caskey, 1987)-use cap-dependent ribosome-binding sites at their 5 ′ -ends (Merrick, 2004).
It has been demonstrated that a large part of the mammalian genome is transcribed, producing a large variety of noncoding (nc)RNA molecules (Bertone et al., 2004). There is a current debate on whether the genome-wide expression of ncRNAs merely represents transcriptional noise, or whether these molecules might have yet undisclosed functions (Mattick, 2004). The most abundant class of the non-coding transcripts are the lncRNAs (Mattick and Makunin, 2006), which are defined as RNA molecules that exceed 200 nucleotides. A large number of lncRNAs have recently been described in mice and humans (Wilusz et al., 2009). Several genomic regions containing protein-coding genes also encode antisense lncRNAs from the complementary DNA strand. The HSV LAT was described as the first viral lncRNA (Stroop et al., 1984). In a recent study, we have shown that both DNA strands along the entire PRV genome are transcriptionally active (Tombácz et al., 2016). Some evidence suggests that the extensive transcriptional read-throughs produce very long RNA molecules including some containing oppositely oriented ORFs, which have been classified as complex transcripts (Tombácz et al., 2016).
Alternative splicing expands the coding capacity of metazoan genomes by producing multiple messages from a single gene. The splice variants can have similar or antagonistic functions (Boise et al., 1993). Previously, only five HSV-1 genes had been reported to produce spliced transcripts (Rajčáni et al., 2004;Sedlackova et al., 2008).
Massively parallel DNA sequencing technologies have become useful tools for the analysis of the transcriptomes. However, the conventionally applied short-read platforms cannot reliably distinguish between transcript isoforms and overlapping RNA molecules. Currently, the most commonly used long-read technology is the single-molecule real-time (SMRT) sequencing platform developed by Pacific Biosciences (PacBio). This approach is capable of reading cDNAs generated from full-length transcripts in a single sequencing run permitting mapping of the transcript ends with base-pair precision.
Various methods have already been used for the analysis of the herpesvirus transcriptome including microarrays (Stingley et al., 2000), Illumina sequencing (Harkness et al., 2014;Oláh et al., 2015), multi-time-point real-time reverse transcription PCR (qRT-PCR) analysis (Tombácz et al., 2009), and PacBio SMRT sequencing (O'Grady et al., 2016;Tombácz et al., 2016Tombácz et al., , 2017. Next-generation sequencing platforms have only been used for analyzing the transcriptional activity along the viral genome (Harkness et al., 2014). The PacBio long-read sequencing approach used for transcriptome profiling of PRV has proven to be the most effective platform for the identification of polycistronic RNA molecules, complex transcripts, transcript isoforms, and transcript overlaps. Our multiplatform-based approach has revealed a previously unexplored complexity of the PRV transcriptome (Tombácz et al., 2016).
In this study, we report the application of PacBio longread sequencing technology for the characterization of the global lytic transcriptome of HSV-1. We used an amplified isoform sequencing (Iso-Seq) protocol that was based on PCR amplification of the cDNAs prior to sequencing. Besides the identification of novel transcripts and transcript isoforms, our intention was to map the TSSs and TESs of the viral RNAs with base pair precision. Long-read sequencing has proved to be efficient in the identification of overlapping and alternatively processed transcripts, as well as other long RNA sequences. Additionally, this method is particularly suitable for the sequencing of small genomes, GC-rich DNAs (HSV-1 has a 68% GC content) and with many repetitive sequences (Tombácz et al., 2014;Scott and Ely, 2015). The PacBio sequencing technique has an additional advantage over the short-read platforms in that if any random errors occurs in the raw reads, they are easily corrected thanks to the exceptionally high consensus accuracy of this platform (Miyamoto et al., 2014). In this report, we reevaluated the currently available datasets concerning the polyadenylated fraction of the HSV-1 transcripts generated in productively infected cells.

Cells and Viral Infection
An immortalized kidney epithelial cell line (Vero) isolated from African green monkey was used for the propagation of HSV-1 strain KOS. Vero cells were cultivated in Dulbecco's modified Eagle medium supplemented with 10% fetal bovine serum (Gibco Invitrogen) and 100 µl penicillin-streptomycin 10K/10K mixture (Lonza)/ml and 5% CO 2 at 37 • C. For the preparation of viral stocks, rapidly-growing semi-confluent Vero cells were infected at a multiplicity of infection (MOI) of 1 plaque-forming unit (pfu)/cell, followed by incubation until a complete cytopathic effect was observed. The infected cells were then frozen and thawed three times, followed by centrifugation at 10,000×g for 15 min. The titer of the virus stock was determined in Vero cells. For the sequencing analysis cells were infected with MOI = 1, incubated for 1 h, followed by removal of the virus suspension and washing with PBS. This was followed by the addition of fresh culture medium to the cells, which were incubated for 1, 2, 4, 6, 8, or 12 h. Samples taken from each experiment were then mixed for the sequencing analysis.
Pacbio RS II Sequencing-The Amplified Iso-Seq Protocol

Synthesis of cDNAs and Preparation of SMRTbell Template
Polyadenylated RNAs were isolated from total RNA by using the Oligotex mRNA Mini Kit (Qiagen) according to the manufacturer's instructions. The cDNA libraries were prepared using anchored oligo-dT primers for the reverse transcription. The cDNA production and the SMRTbell library preparation were performed with the Iso-Seq protocol of PacBio sequencing using the Clontech SMARTer PCR cDNA Synthesis Kit. We carried out either no Size Selection for the analysis of short viral transcripts, or performed SageELF TM and BluePippin TM Size-Selection Systems (Sage Science) for the isolation of long RNA molecules. The SMARTer PCR cDNA Synthesis Kit (Clontech) was used for the generation of the first-strand cDNAs. Subsequently, the single-stranded cDNAs were PCR-amplified using KAPA HiFi Enzyme (Kapa Biosystems) following PacBio's recommendations, which were as follows: initial denaturation was carried at −95 • C for 2 min, followed by 16 cycles (the optimal cycle was determined in the optimization step) at −98 • C for 20 s (denaturation), −65 • C for 15 s (annealing) −74 • C for 4 min (extension). The final extension was carried out at −72 • C for 5 min. From the non-size selected samples 500 ng of each cDNA sample was applied for the SMRTbell template preparation, using the PacBio DNA Template Prep Kit 2.0. Subsequently, PCR products were pooled then size-selected with the SageELF TM System according to the PacBio's protocol. Size-selected samples were amplified with KAPA enzyme using the conditions as above. The fraction of cDNAs with a size over 5 kb was run on BluePippin TM System to remove the short SMRTbell templates.

PacBio Sequencing
The purified SMRTbell templates were bound to v2 sequencing primers and polymerases by using the DNA/Polymerase Binding Kit P6 (P/N 100-356-300). The conditions for annealing of the primers and binding of polymerase were determined using the PacBio's Binding Calculator in RS Remote. The polymerasetemplate complexes were bound to magbeads using the PacBio MagBead Binding Kit. The samples were analyzed on an Agilent 2100 bioanalyzer. Sequencing was carried out by using the PacBio RS II sequencer with DNA Sequencing Reagent 4.0 (P/N 100-356-200). The length of each movie was 360 min. We applied the PacBio Iso-Seq protocol (SMRT Analysis version v2.3.0.; Chaisson and Tesler, 2012) for the analysis of the HSV-1 transcriptome.

PCR Analysis
The novel HSV transcripts identified by PacBio sequencing were validated by PCR analysis. SuperScript III reverse transcriptase (Life Technologies) was used to convert the RNA samples to single-stranded cDNAs according to the manufacturer's instructions. The cDNAs were amplified with the Veriti Thermal Cycler (Applied Biosystems), using AccuPrime TM GC-Rich DNA Polymerase (Invitrogen). The PCR reactions ran for 3 min at 95 • C (denaturation), followed by 30 cycles at 92 • C for 30 s, at 60 • C for 30 s (annealing), and at 72 • C for 10 s (extension). Finally, a 10 min elongation step ran at 72 • C. The primers used in this study are listed in Table S1.

Strand-Specific Real-Time RT-PCR
The relative amounts of transcripts were calculated by real-time reverse transcription (RT)-PCR. Three parallel RT reactions were carried out using 70 ng of total RNA as template, Superscript III enzyme (Life Technologies) and anchored oligo(dT) primers. In order to control for potential DNA contamination, No RT reactions were carried out for each sample by omitting the RT enzyme. The PCR reactions were carried out in a volume of 20 µl with Absolute QPCR SYBR Green Mix (Thermo Scientific) containing 7 µl of cDNA solution diluted 10-fold, 1.5 µl of forward and 1.5 µl of reverse primers (10 µM each). The normalized relative expression ratios (R x ) were calculated by the following formula: , where E is the amplification efficiency, Ct is the threshold cycle number. The mean expression value of all examined transcripts (total) in a given sample was used as a normalization factor for the transcripts (sample) to obtain R x values. This method is similar to that used by Mestdagh et al. (2009). However, instead of using the Ct values alone, we used E Ct for the calculation of the expression values. The Comparative Quantitation module of the Rotor-Gene Q software v2.3.1 (Qiagen) was used for the determination of Ct values. The efficiency of the reaction and the thresholds were automatically calculated by the software.

Transcript Quantitation Using PacBio Sequencing Reads
The following criteria were set to count the number of reads aligning to each transcript isoform: Reads of Inserts (ROIs) containing poly(A) tails were assigned to a transcript, if the poly(A) site (PAS) of the read was no longer than 10 nts from the PAS of the transcript (in either direction), and if the 5 ′ end of the read was no longer than 10 nts upstream or 10 nts downstream from the TSS of the transcript isoform. Reads with long trimmed ends (>100 nts), or with many deletions, insertions or mismatches (>5% of the alignment) were discarded. Only the reads that contained a single poly(A) tail were counted. A set of DNA ladders was sequenced to determine the read length preference of the sequencing method. Equimolar molecularweight size markers (Lambda DNA StyI, Lambda DNA PstI and phiX174 DNA HaeIII, resulting in 50 discrete fragments, between 15 and 19.329 nts) were sequenced along with the viral sequences.
The number of reads for each fragment length was determined and their distributions were used to normalize read counts of the HSV-1 transcripts. When assigning reads to transcript isoforms, each read was weighted by the probability of sequencing a read of that given length.

Prediction of cis-Regulatory Sequences of the HSV Genes
The poly(A) signals were predicted by Dragon PolyA Spotter (DPS) (Kalkatawi et al., 2012) using Artificial Neural Network classifier. We determined the common transcription factor binding sites (−100 nts from transcription start) for common binding sites including TATA-box, GC-Box etc., using JASPAR POLII database (Mathelier et al., 2016) and FIMO (Find Individual Motif Occurrences) software (Grant et al., 2011). The threshold was set to p < 0.001. The best hits are linked to the genes and are included in Tables S2, S3.

Analysis of the HSV Transcriptome with PacBio Isoform Sequencing
The annotation of the herpesvirus genome has traditionally been carried out in silico by detection of ORFs and other cis-motifs. Many HSV-1 transcripts have been identified by Northern-blot analysis (Costa et al., 1984;Sedlackova et al., 2008), while the 5 ′ and 3 ′ termini of the transcripts have been determined by S1 nuclease mapping (McKnight, 1980;Rixon and Clements, 1982) or by primer extension (Perng et al., 2002;Naito et al., 2005) techniques. For the investigation of the HSV-1 lytic transcriptome, we employed a PacBio polyadenylation sequencing approach based on Iso-Seq template preparation protocol utilizing the switching mechanism at the 5 ′ end of the RNA template, which produces full-length cDNAs (Zhu et al., 2001). We mapped the ROIs for both the HSV-1 (X14112) and the host genome (NC_023642.1) using the GMAP (Wu and Watanabe, 2005) alignment tool with default parameters. Altogether, 50,166 ROIs were mapped for the viral genome with a mean length of 1,461 nts, representing 7.32 million bases; and 35,089 ROIs were mapped for the host genome with mean length of 1,262 nts, representing 4.43 million nucleotides. Sizeselection was applied for the detection of transcripts exceeding 5 kbps. The criteria for accepting sequencing reads as existing transcripts were the presence of at least two independent ROIs which contained the same TSSs and TESs. Likewise, a novel splice site was accepted if the same site occurred in at least two independent ROIs. We ran 17 parallel sequencing reactions for providing independent sequencing reads. Additionally, in many cases, the same TSS, TES, or splice junction were also found in different transcripts detected within the same sequencing reaction, which further enhanced the number of independent ROIs. In most cases, even in low-abundance transcripts, far more than two independent ROIs contained the same transcript ends or splice junctions. Besides the poly(A) tails, a few sequencing reads were also produced from A-rich regions of the transcripts; these non-specific products were discarded from further analysis. A few ROIs possessed both poly(A) and poly(T) tails. This phenomenon was earlier described in human (Sharon et al., 2013) and PRV (Tombácz et al., 2016) transcriptomes that were sequenced using the PacBio technique. Altogether, we could predict TATA boxes for 77 HSV-1 transcripts ( Table S2).
The promoter regions of the other genes lack typical TATA boxes, but many of them contain GC-and/or CAAT boxes (Table S2).

Novel Putative Protein-Coding Genes
Our investigations revealed 34 novel putative protein-coding genes ( Figure 1, Table 1, Table S4), which is a fairly large number in a well-characterized viral genome. Each new gene is embedded into already annotated protein-coding genes. The mRNAs from these 5 ′ truncated genes are generated by alternative transcription initiation from an intragenic promoter within the larger host gene. Embedded genes are well-known in the herpesviruses, and their number is increasing; for example, the detection of the novel ul36.5 gene by PacBio sequencing has recently been described (Tombácz et al., 2016). The first in-frame AUG triplets are supposed to correspond to the first amino acids of the putative protein molecules. With the exception of seven genes (ul5.5; ul15.5; ul21.5; ul24.5; ul27.2; ul41.5; and ul53.5), the remaining novel genes were expressed at markedly lower levels than the host genes.

Novel Non-coding Transcripts
In this part of our study, we identified 10 novel ncRNAs, including a LAT variant, four antisense (as) and four 3 ′ truncated transcripts, as well as a putative RNA molecule overlapping the replication origin (Ori-L) of the virus (Figure 1, Table 2). The novel 0.7 kb LAT-S transcript is a TSS isoform of the 0.7-kb LAT (Zhu et al., 1999). We detected polyadenylated asRNAs (termed antisense transcripts, AST) transcribed from the complementary DNA strands (the predicted TATA boxes are presented in Table S2). The existence of ASTs was verified by strand-specific PCR (Figure 2). We also detected lncRNAs, which were produced from protein-coding genes, but their transcriptions were prematurely terminated, and as a result lack stop codons. The potential function of these 3 ′ truncated RNA molecules remains obscure. The homolog of NCS8 transcript has been described in PRV (Tombácz et al., 2016). Furthermore, we identified a putative Ori-L-overlapping ncRNA with uncertain orientation, TSS and TES. We cannot exclude the possibility that it is not an individual RNA molecule but is rather the upstream region of a putative longer TSS isoform of the UL30 transcript. We did not detect HSV ncRNAs homologous to the PRV CTO-S transcript located in the close vicinity of the Ori-L of this virus . The probable reason for this is that the Ori-L is situated at different genomic loci in the two viruses. We could however detect the Ori-S-overlapping RNA described by Voss and Roizman (1988).

Determination of the 5 ′ and 3 ′ Termini of the HSV Transcripts
In this report, we precisely mapped the 5 ′ and 3 ′ ends of the HSV transcripts (Table S3) and re-annotated those that had not been well-established earlier. Our amplified Iso-Seq technique is capable of identifying the full-length transcripts without any loss at the terminal sequences. We found that most of the TSSs were located between 28 and 33 nts downstream from the TATA box. Altogether, we were able to identify 46 novel TSSs and 6 TESs   All of the identified candidate genes are located within the coding region of an already annotated HSV-1 gene. This table shows the list of the putative genes and the positions of their transcriptional start (TSS) and end (TES) sites. We identified a novel LAT transcript, four antisense RNAs, four 3 ′ truncated transcripts, and a putative OriL RNA molecule. In this table, the transcripts are listed with their start (TSS) and end (TES) positions. The "?" indicates that we were unable to determine the exact 5 ′ ends of the given transcripts.
in already described transcripts, as well as 16 TSSs in the novel transcripts.
We could detect putative TATA boxes for six TSS variants; however, TATA-less genes have been reported to be common in eukaryotic organisms (Yang et al., 2007). Similar to PRV, we detected multiple TSSs for the HSV UL10 transcripts, which indicate that the complex regulation of this gene is conserved among herpesviruses. We found that the ul22 and the us7 gene contain two active TATA sequences, and seven genes (ul2; ul6.5;  In this part of our work, we detected novel TSS isoforms of both previously known and newly identified transcripts. This table shows the TSS and TES positions of the transcripts. The "?" indicates that we could not determine the exact 5 ′ end of the given transcripts. ul12; ul22; ul51; us1; us7) have at least two potential TATA boxes. We identified two transcripts that overlap the Ori-Ss of HSV (Figure 1) at their 5 ′ -UTRs; both are long TSS isoforms of us1 and us12 genes. In contrast to the PRV us1 gene that is located in the IRS, in the HSV only the promoter and a short 5 ′ -UTR of this gene map to the repeat region. The Ori-Soverlapping US1 transcript is homologous to the PRV PTO-US1. Our analysis has revealed extensive minor variations in ∼68% of the transcripts. These RNA molecules are controlled by the same promoter and vary in length from 1 to 10 nts at their 5 ′ -ends ( Table 3).

Transcription End Site Isoforms
It has earlier been shown that many eukaryotic genes produce TES isoforms using alternative PA signals (Edwalds-Gilbert et al., 1997;Shen et al., 2008;Proudfoot, 2011). We detected six HSV transcripts, which each had two TES isoforms (Figure 1, Table 4). Except for three transcripts, we found that the rest of the abundant RNA molecules containing the same PA signal exhibited considerable polymorphism at their 3 ′ -ends (Table 4).

Splice Isoforms
In this report, we detected 13 novel spliced transcripts (Figures 1, 3, Table 5). The most intricate splicing pattern was found in the ul41-42 and ul41-45 complex transcripts (Figure 3). We identified two splice variants of the AST-2 antisense transcript. We also detected a new splice isoform of the ul49 gene, which was found to be expressed in a much higher level than the non-spliced variant. The UL49 and UL49-48 transcripts are the only ones in which the splicing occurs within an ORF, which is translated (in other cases the splicing takes place in either non-coding transcripts or in the downstream genes in polycistronic transcripts, which are non-translated). The splicing within the ul49 ORF results an in-frame deletion.

Novel Polycistronic Transcripts
According to the current concept, the HSV genome is organized in such a way that the downstream genes in a tandem gene cluster are transcribed either as monocistronic transcripts or as downstream genes of polycistronic (bi-, tri-, tetra-, or pentacistronic) RNA molecules, whereas the upstream genes are expressed exclusively as parts of polycistronic transcripts. Our earlier investigations revealed that several upstream genes of 3 ′ -coterminal gene clusters of the PRV are also transcribed as monocistronic RNA molecules (Tombácz et al., 2016). However, in this study, with the exception of the transcripts expressed from the embedded genes, we only detected novel polycistronic molecules ( Table 6) that include transcripts terminated upstream of the co-terminal PA sites. Most of the newly discovered polycistronic RNA molecules are expressed at low levels, which explain why they had previously gone undetected.

Complex Transcripts
Complex transcripts are defined as containing at least two genes with opposite orientations. We identified 10 full-length complex transcripts in HSV, seven of which had been generated by alternative splicing of the RNA molecule encoded by the genes within the ul41-45 region (Figure 1, Table 6). Earlier we had detected the homologs of the UL41/42 and the UL52/51 complex transcripts of HSV-1 also in PRV (Tombácz et al., 2016). Our investigations revealed a widespread expression of very long complex transcripts in PRV whose upstream sequences could not be determined even with the long-read PA-Seq technique. We detected more full-length but fewer partial complex transcripts in HSV than in PRV; the reason for the latter may be that in our previous report we also used random primer-based sequencing that allowed for the capture of more distal upstream sequences than the PA-Seq technique alone. The RNA molecules with partial sequences were illustrated as if they were controlled by the promoters of the closest upstream genes and were given ad hoc names accordingly. However, it is possible that they are shorter and initiated by yet unidentified promoters, or even longer and driven by more distal cis-regulatory sequences.

Transcriptional Overlaps between HSV Genes
In this part of the study, we investigated the transcriptional overlaps along the entire HSV genome.

Parallel Overlaps
Tandemly-oriented adjacent HSV genes can either partially or fully overlap one another. We detected 115 novel parallel full Frontiers in Microbiology | www.frontiersin.org FIGURE 3 | Continued (B2) Illustrates the expanded view of ROIs, the read coverage along analyzed region and the schematic view of AST-2 and adjacent genes. (C2) Graph shows the read coverage, the ROIs and the schematic representation of the examined region. The numbers on the left side of panels A1, B1, and C1 indicate the sizes of DNA ladders while the numbers on the right side of these panels indicate the sizes of the PCR products amplified from the splice junction regions of the various splice isoforms (see Table 5). (  transcript overlaps, including the 34 5 ′ truncated genes, four 3 ′ truncated genes and 18 adjacent genes, as well as six partial tail-to-head overlaps between tandem genes. This latter kind of overlap is illustrated in Figure 4A, using ul42, ul43, and ul44 genes as examples. As a result of this analysis, we obtained that HSV contains fewer ORF-overlaps between the adjacent tandem genes than does PRV (7 vs. 8). However, there are many more embedded genes were detected in HSV.

Divergent Overlaps
Earlier annotations revealed that three out of 13 divergentlyoriented HSV gene pairs form head-to-head overlaps with each other (Figure 4C), including 5 ′ -UTR/5 ′ -UTR, the 5 ′ -UTR/ORF, and ORF/ORF overlaps. PacBio analysis revealed alternative TSSs for several genes, which generate overlaps or much longer overlapping regions than published earlier (12 out of 13).

Detection of Antisense RNAs with Sequence-Specific qRT-PCR
In order to determine the existence of additional asRNAs, which are undetected by sequencing due to the potential lack of PA tails or being too long to be sequenced by the PacBio technique, five convergent HSV gene pairs were selected for the analysis of transcriptional activity of the complementary DNA strand by qRT-PCR. The asRNAs are supposed to be produced by transcriptional read-through from the convergent partner genes. We detected varying levels of asRNAs from each examined locus (Figure 5). The standard deviation (SD) values of the three parallel experiments are presented in Table S5. The pairwise comparison of qPCR curves shows that the relative amounts of asRNAs in 8 out of 10 genes are much lower than those of their mRNA partners, except in ul54 gene where the mRNA level only slightly exceeds that of the asRNA, and in the us10 gene where the sense-antisense partners are expressed at a similar rate. The This table shows the newly discovered polycistronic and complex transcript isoforms. The "?" indicates that we were unable to determine the exact 5 ′ end of the given transcripts.
reason for this latter phenomenon could be that the us10 gene is expressed in low level, and therefore even a weak antisense RNA expression can produce a comparable level of transcripts.

The Expression of Convergent Genes Exhibits an Inverse Pattern
In this part of the study, we observed the phenomenon that high transcriptional activity of a gene is associated with a low level of mRNAs from the convergent genes. We found that the higher the expression of a gene, the lower the expression of the convergent partner was ( Figure 6A). Furthermore, we also observed that a gene with a high expression rate has a convergent partner with a low mRNA/asRNA ratio, and conversely, the partners of genes with low expression rates have higher values ( Figure 6B).

Read Count Normalization by Spike-In DNA
It has been reported in our recent publication (Tombácz et al., 2017) and by others (O'Grady et al., 2016) that library preparation methods for PacBio sequencing enrich and sequence 1-2 kb-long fragments. Consequently, the long (>3 kb) or the short (<700 nts) transcript isoforms are underrepresented in the raw number of reads. To account for this bias, raw read counts were normalized by spike-in DNA (Figure 7). This analysis represents global gene expression along the HSV genome in a sample containing a mixture of RNA molecules collected from six different time points of viral infection. We also calculated the transcriptional activity throughout the HSV genome using the same method (Figure 8).

DISCUSSION
Short-read sequencing has become a common technique for the annotation of transcriptome datasets (Mortazavi et al., 2008;Wang et al., 2009;Djebali et al., 2012). However, these methods are not optimal for resolving transcript structures, especially for the de novo characterization of complex transcript isoforms and long overlapping RNAs. In this study, a PCR-based long-read sequencing method was applied for the analysis of the HSV-1 transcriptome. Our investigations revealed a much higher level of transcriptomic complexity than has been captured by existing annotations. Here we provide a base-pair-precision annotation of already detected but not fully characterized RNA molecules.
We have also succeeded in identifying 34 previously undescribed putative protein-coding transcripts, which were all transcribed from the ORF region of already annotated HSV-1 genes. We were able to identify in-frame AUG triplets for all genes in relative close position to the TSSs. Ten putative ncRNAs have also been detected. Four of these were 3 ′ truncated RNA molecules expressed from the promoter of protein-coding HSV-1 genes, but they were prematurely terminated from an internal PA signal. These transcripts lack stop-codons, and therefore we assume that they are lncRNAs. The homolog of HSV-1 NCS8 transcript has also been described in PRV in a recent report (Tombácz et al., 2016). This study identified many polycistronic and complex transcripts. Two complex transcripts of HSV show homology to previously described PRV transcripts (Tombácz et al., 2016), while the rest of them appear to be unique to HSV. We have also detected pervasive asRNA expression throughout the entire HSV genome with both cDNA sequencing and PCR. The antisense RNAs either have their own promoters, or they are transcribed from the promoter of neighbor or of distal genes as being transcriptional read-through products. The question as to whether the long transcripts are the result of transcriptional noise, if they contribute to the viral proteome, or if they fulfill other roles, remains to be ascertained. Another possibility is that these long transcripts are mere byproducts of a gene regulation mechanism that is based on interference between the transcriptionally active genes (Boldogköi, 2012). No short ncRNAs homologous to CTO-S or PTO of PRV located near the replication origins were detected in HSV. However, we identified Ori-overlapping HSV transcripts, which were the long TSS variants of genes us1 and us12, as well as a putative short ncRNA overlapping Ori-L. The existence of these transcripts may indicate a crosstalk between the replication and transcription machinery.
Rutkowski and colleagues have recently investigated the translation of HSV-1 transcripts throughout the viral life cycle FIGURE 4 | Transcriptional overlaps. Transcriptional overlaps between adjacent genes. (A) Represents the parallel transcriptional overlap exemplified by the ul42, ul43, and ul44 genes, which is a unique genomic region of HSV-1 in the sense that the transcripts from these genes have their own transcription termination sites and (Continued) with a ribosome profiling technique using the RPKM (reads per kilobase per million mapped reads) model (Rutkowski et al., 2015). However, they mapped only the known HSV-1 transcripts, and provided no information from the translational start sites, and therefore we cannot check whether our novel transcripts (e.g., the 5 ′ truncated or the antisense RNAs) are translated or not. PacBio long-read sequencing revealed the existence of many transcript isoforms including TSS, TES, and splice variants produced from the same gene. Additionally, as it has been FIGURE 6 | Correlations between the expressions of convergent HSV genes. The relative copy numbers (R x values) were calculated for the transcripts encoded by five convergent gene pairs: ul3-ul4; ul30-ul31; ul45-ul46; ul55-ul56; us9-us10. (A) The less abundant mRNAs encoded by one of the convergent gene pairs are indicated by a red dashed line, while the more abundant partner mRNAs are depicted by a blue solid line. The figure shows that the higher the relative abundance of a transcript in the total PRV transcriptome, the lower the level of mRNA produced from the convergent gene. (B) Cluster analysis of the R x values of the examined mRNAs compared to the ratio between the mRNA/antisense transcripts produced by the convergent gene partner. This figure shows that if a high mRNA levels are produced from a gene, then it exerts a decrease of mRNA/asRNA ratio for the convergent gene. The reason for this is that asRNAs are supposed to be produced as a result of transcriptional read-throughs from the oppositely-oriented genes. The blue dots illustrate low-abundance, while the red dots represent the high-abundance mRNAs. detected in PRV (Tombácz et al., 2016), HSV transcripts also exhibit minor variations in both of their 5 ′ and 3 ′ termini.
Our investigations revealed an intricate meshwork of transcriptional read-throughs leading to overlapping RNA molecules. It turned out that HSV genes are transcribed in more combinations than it had been previously thought. The number of asRNAs and the complex transcripts of HSV are likely to be underestimated, because most of them may have been undetected due to their non-polyadenylated nature or because they are too long to be identifiable with even a long-read platform. This hypothesis was tested by examining the transcriptional activity of 10 HSV genes by qRT-PCR, which detected asRNAs in each case, suggesting that the entire HSV genome expresses long overlapping transcripts.
The analysis of antisense transcripts and their mRNA partners has disclosed the interconnected regulation of transcription (Katayama et al., 2005), and in many cases reciprocal patterns of gene expression (Lehner et al., 2002). Additionally, asRNAs have been shown to inhibit the transcription of complementary mRNA in human cells (Izant and Weintraub, 1985). In this work, we show that high mRNA expression from a gene was associated with low mRNA levels produced from the convergent partner. Additionally, high expression rates from genes were also associated with a low mRNA per asRNA ratio for convergent genes. We explain this phenomenon by transcriptional read-throughs, which are believed to result in the collision of the transcriptional machineries. The extensive transcriptional read-throughs and inverse correlations between the expression of convergent genes provide further support to our TIN hypothesis, which claims that genes interact with each other at the level of transcription throughout the entire genome (Boldogköi, 2012).
Finally, in this study, we developed a method for the quantification of RNA molecules based on calculations using ROI reads, which differs from the approach that we published earlier (Tombácz et al., 2017) in two important aspects: first, instead of the non-amplified SMRT method, here we used an amplified Iso-Seq technique, and second, we also sequenced spike-in DNAs for normalizing the size-dependent loading of cDNA molecules into ZMWs of the PacBio SMRT-cells, which allowed for comparison of the relative amounts of transcripts of different lengths.
In summary, our investigations essentially redefine the transcriptome of HSV-1. We demonstrated that HSV-1 exhibits a far higher transcriptional complexity than predicted from in silico ORF-based genome annotations or than those detected by techniques that map the termini of the transcripts. Most of the novel RNA molecules are produced at low levels which hinders their detection by gel-based assays or, because of their overlapping nature, even by PCR. Additionally, our analysis significantly increased the number of already known transcriptional read-throughs between the genes.

ACCESSION NUMBER
Raw sequencing files, processed data files as well as metadata have been submitted to the NCBI GEO repository and can be found with GenBank accession number GSE97785.

ACKNOWLEDGMENTS
We would like to thank Marianna Ábrahám (University of Szeged) for technical assistance, as well as to Kristin Mars (Pacific Biosciences) for helping to carry out the sequencing work. This study was supported by the European Union and the Hungarian State, co-financed by the European Social Fund in the framework of TÁMOP [4.2.2/B-10/1-2010-0012] and Swiss-Hungarian Cooperation Programme [SH/7/2/8] to ZBo. The work was also supported by the Bolyai János Scholarship of the Hungarian Academy of Sciences to DT and by the Centers of Excellence in Genomic Science-National Institutes of Health [5P50HG00773502] to MS.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb. 2017.01079/full#supplementary-material Table S1 | The list of the primers used in this study.