ORIGINAL RESEARCH article

Front. Microbiol., 02 July 2012

Sec. Evolutionary and Genomic Microbiology

Volume 3 - 2012 | https://doi.org/10.3389/fmicb.2012.00231

Diversity of Antisense and Other Non-Coding RNAs in Archaea Revealed by Comparative Small RNA Sequencing in Four Pyrobaculum Species

  • David L. Bernick 1

  • Patrick P. Dennis 2

  • Lauren M. Lui 1

  • Todd M. Lowe 1*

  • 1. Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA

  • 2. Janelia Farm Research Campus, Howard Hughes Medical Institute Ashburn, VA, USA

Abstract

A great diversity of small, non-coding RNA (ncRNA) molecules with roles in gene regulation and RNA processing have been intensely studied in eukaryotic and bacterial model organisms, yet our knowledge of possible parallel roles for small RNAs (sRNA) in archaea is limited. We employed RNA-seq to identify novel sRNA across multiple species of the hyperthermophilic genus Pyrobaculum, known for unusual RNA gene characteristics. By comparing transcriptional data collected in parallel among four species, we were able to identify conserved RNA genes fitting into known and novel families. Among our findings, we highlight three novel cis-antisense sRNAs encoded opposite to key regulatory (ferric uptake regulator), metabolic (triose-phosphate isomerase), and core transcriptional apparatus genes (transcription factor B). We also found a large increase in the number of conserved C/D box sRNA genes over what had been previously recognized; many of these genes are encoded antisense to protein coding genes. The conserved opposition to orthologous genes across the Pyrobaculum genus suggests similarities to other cis-antisense regulatory systems. Furthermore, the genus-specific nature of these sRNAs indicates they are relatively recent, stable adaptations.

Introduction

Archaeal species are known to encode a plethora of small RNA (sRNA) molecules. These sRNAs have a multitude of functions including suppression of messenger RNA (mRNA; Straub et al., 2009), targeting modifications to ribosomal (rRNA) or transfer RNA (tRNA; Omer et al., 2000; Bernick et al., 2012), specifying targets of the CRISPR immune defense system (Barrangou et al., 2007; Hale et al., 2008; Hale et al., 2009), cis-antisense regulation of transposase mRNA (Tang et al., 2002; Tang et al., 2005; Jager et al., 2009; Wurtzel et al., 2010), and encoding short proteins less than 30 amino acids in length (Jager et al., 2009).

Only a few previous studies have described sRNA genes in the phylum Crenarchaeota. In the Sulfolobus genus, C/D box and H/ACA-box guide sRNAs have been studied, including 18 guide sRNAs in Sulfolobus acidocaldarius (Omer et al., 2000), nine in S. solfataricus (Zago et al., 2005), and corresponding homologs detected computationally in S. tokodaii (Zago et al., 2005). These sRNAs form two distinct classes of guide RNAs: C/D box sRNAs which guide 2′-O-methylation of ribose, and H/ACA-box guide RNAs which direct isomerization of uridine to pseudouridine. Eukaryotes also share these two classes of guide RNAs with the same functions, but these homologs are dubbed small nucleolar RNAs (snoRNAs) because of their cellular localization. Recently, we employed high-throughput sequencing to identify ten conserved, novel families of H/ACA-like sRNA within the genus Pyrobaculum (Bernick et al., 2012).

Sulfolobus solfataricus has been further characterized using high-throughput sequencing (Wurtzel et al., 2010), revealing 18 CRISPR-associated sRNAs, 13 C/D box sRNAs, 28 cis-antisense encoded transposon-associated sRNAs, and 185 sRNA genes encoded antisense to other, non-transposon protein coding genes. It is unclear how many of the latter antisense transcripts are the result of transcriptional noise, overlapping but non-interacting gene products, or biologically relevant products of functional ncRNA genes. The diversity of sRNA genes is just beginning to be studied in depth in other members of the Crenarchaeota.

Genes that produce sRNA antisense to mRNA are known in all three domains of life and many of these sRNA have provided interesting examples of novel regulation. Within bacteria, antisense sRNAs are known and well-studied (Repoila et al., 2003; Aiba, 2007; Vogel, 2009). For example, utilization and uptake of iron in Escherichia coli is modulated by the sRNA RyhB that acts in concert with the ferric uptake regulator (Fur) protein (Masse et al., 2007). The sRNA is coded in trans to its regulatory targets, and the Sm-like protein Hfq is required for its function. In Pseudomonas aeruginosa, an analogous regulatory mechanism exists with the PrrF regulatory RNA (Wilderman et al., 2004).

In this study, we adapted techniques pioneered by researchers studying microRNA in eukaryotes (Lau et al., 2001; Henderson et al., 2006; Lu et al., 2006), to execute parallel high-throughput pyrosequencing of sRNAs across four Pyrobaculum species. This comparative transcriptomic approach enabled us to identify novel conserved sRNA transcripts among four related hyperthermophiles (Pyrobaculum aerophilum, P. arsenaticum, P. calidifontis, and P. islandicum). We provide an overview of the distribution of sRNAs across species, and focus on two major classes: the highly abundant C/D box sRNAs, and sRNAs antisense to three biologically important protein coding genes. We augment our transcriptional analyses further with comparative genomics utilizing two additional Pyrobaculum species with sequenced genomes: P. neutrophilum (recently renamed from Thermoproteus neutrophilus) and P. oguniense (NCBI GenBank accession NC_016885.1).

Materials and Methods

Culture conditions

Pyrobaculum aerophilum cells were grown anaerobically in media containing 0.5 g/L yeast extract, 1× DSM390 salts, 10 g/L NaCl, 1× DSM 141 trace elements, 0.5 mg/L Fe(SO4)2(NH4)2, pH 6.5, with 10 mM NaNO3. P. islandicum and P. arsenaticum cells were grown anaerobically in media containing 10 g/L tryptone, 2 g/L yeast extract, 1× DSM390 salts, 1× DSM88 trace elements, and 20 mM Na2S2O3. P. calidifontis cells were grown aerobically in 1 L flasks using 500 mL media containing 10 g/L tryptone, 2 g/L yeast extract, 1× DSM88 trace metals, 15 mM Na2S2O3, pH 6.8, loosely capped with moderate shaking at 125 rpm. Anaerobic cultures were grown in 2 L flasks with 1 L media, prepared under nitrogen with resazurin as a redox indicator at 0.5 mg/L; 0.25 mM Na2S was added as a reductant. All cultures were grown at 95°C to late log or stationary phase, monitored at OD600.

The 10× DSM390 salts comprise (per liter ddH2O) 1.3 g (NH4)2SO4, 2.8 g KH2PO4, and 2.5 g MgSO4·7H2O. The 100× DSM88 trace metal solution comprises (per liter 0.12 N HCl) 0.9 mM MnCl2, 4.7 mM Na2B4O7, 76 μM ZnSO4, 25 μM CuCl2, 12.4 μM Na2MoO4, 18 μM VOSO4, and 6 μM CoSO4. The 100× DSM141 trace metal solution comprises 7.85 mM nitrilotriacetic acid, 12.2 mM MgSO4, 2.96 mM MnSO4, 17.1 mM NaCl, 0.36 mM FeSO4, 0.63 mM CoSO4, 0.68 mM CaCl2, 0.63 mM ZnSO4, 40 μM CuSO4, 42 μM KAl(SO4)2, 0.16 mM H3BO3, 41 μM Na2MoO4, 0.1 mM NiCl2, and 1.14 μM Na2SeO3.
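As a worked example of the stock arithmetic, the sketch below converts the 10× DSM390 recipe into 1× working concentrations and computes the stock volume needed for a given final volume. The function names are hypothetical and the calculation assumes simple linear dilution.

```python
# Grams per liter in the 10x DSM390 stock, as listed in the text.
DSM390_10X_G_PER_L = {"(NH4)2SO4": 1.3, "KH2PO4": 2.8, "MgSO4·7H2O": 2.5}


def working_concentration(stock_g_per_l, fold):
    """Return per-liter amounts after diluting a concentrated stock `fold` times."""
    return {salt: grams / fold for salt, grams in stock_g_per_l.items()}


def stock_volume_ml(final_volume_ml, fold):
    """Volume of concentrated stock needed to reach 1x in the final volume."""
    return final_volume_ml / fold


dsm390_1x = working_concentration(DSM390_10X_G_PER_L, 10)
ml_for_one_liter = stock_volume_ml(1000, 10)
```

Under these assumptions, 1 L of 1× medium takes 100 mL of the 10× stock; the same helper applies to the 100× trace metal solutions with `fold=100`.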

cDNA library preparation

Two preparations were constructed for each of P. aerophilum, P. islandicum, P. arsenaticum, and P. calidifontis cultures, yielding a total of eight cDNA libraries. The following protocol was used for each preparation.

Total RNA was extracted from exponential or stationary cultures; 100 μg of each preparation was loaded onto a 15% polyacrylamide gel, and size-selected in the range 15–70 nt. The gel was post-stained with SYBR Gold and the tRNA band was used as the upper exclusion point. The lower exclusion point was set at 75% of the region between xylene cyanol (XC) and bromophenol blue (BP) loading dye bands (Ambion protocol). Samples were eluted, EtOH precipitated, and 3′ linker (5′-adenylated, 3′ ddC) was added as described (Lau et al., 2001; IDTDNA, Linker 1). A second gel purification was performed as above, excising the gel fragment above the XC dye band to remove excess 3′ linker. The recovered linked RNAs were reverse transcribed (RT) using Superscript III (Invitrogen) with a DNA primer complementary to Linker 1. Following RT, Exonuclease I (EXO1, Thermo) was added to the RT reaction mixture, and incubated for 30 min to remove excess primer. We utilized standard alkaline lysis treatment with NaOH-EDTA at 80°C for 15 min to remove any residual RNA, as well as to inactivate the reverse transcriptase and the EXO1 ssDNA nuclease. Neutralization and small fragment removal were performed with water-saturated G50 columns (Ambion NucAway). The recovered single-stranded cDNA was dried to near completion using a Servo SpeedVac, followed by a second 5′-adenylated linker addition (IDTDNA – Linker 2) to the cDNA using T4 RNA ligase (Ambion).

A 2 μL volume of this reaction was amplified by PCR (20 μL reaction, 16 cycles). This was followed by a second amplification (20 μL reaction, 16 cycles) using 2 μL from the first amplification as template using Roche 454-specific hybrid adapters based on the method described by Hannon (footnote 1). A four-base barcode was included in the 5′ hybrid primer. The final reaction was cleaned using the Zymo clean kit following the manufacturer’s protocol.

Sequencing and read mapping

Sequencing was performed using a Roche/454 GS FLX sequencer, and the GS emPCR Kit II (Roche). Sequencing reads described in this work are provided online via the UCSC Archaeal Genome Browser (footnote 2; Chan et al., 2012).

Reads that included barcodes and sequencing linkers were selected from the raw sequencing data and used to identify reads from each of the eight pooled cDNA libraries. Reads were further consolidated, combining identical sequences with associated counts for viewing with the Archaeal Genome Browser. Reads were mapped to the appropriate genome [P. aerophilum (NC_003364.1); P. arsenaticum (NC_009376.1); P. calidifontis (NC_009073.1); P. islandicum (NC_008701.1); P. oguniense (NC_016885.1); P. neutrophilum (T. neutrophilus: NC_010525.1)] using BLAT (Kent, 2002), requiring a minimum of 90% identity (-minIdentity), a maximal gap of 3 (-maxIntron) and a minimum score (matches minus mismatches) of 16 (-minScore) using alignment parameters for this size range (-tileSize=8 -stepSize=4). Reads that mapped equally well to multiple positions in the genome were excluded from this study. The remaining, uniquely mapped reads were formatted and visualized as BED tracks within the UCSC Archaeal Genome Browser.
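The consolidation and unique-mapping steps can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names and the `(read, locus, matches, mismatches)` tuple layout are hypothetical, only the score threshold is applied (the identity and gap filters are assumed to have been enforced by the aligner), and a read is dropped whenever its best score is shared by more than one locus.

```python
from collections import Counter, defaultdict


def collapse_reads(reads):
    """Combine identical read sequences into a sequence -> count table."""
    return Counter(reads)


def unique_best_hits(hits, min_score=16):
    """Keep reads whose best-scoring alignment is unique.

    `hits` is an iterable of (read, locus, matches, mismatches) tuples;
    the score is matches minus mismatches, as in the text.
    """
    by_read = defaultdict(list)
    for read, locus, matches, mismatches in hits:
        score = matches - mismatches
        if score >= min_score:
            by_read[read].append((score, locus))

    kept = {}
    for read, scored in by_read.items():
        best = max(score for score, _ in scored)
        best_loci = [locus for score, locus in scored if score == best]
        if len(best_loci) == 1:  # ties map "equally well" and are excluded
            kept[read] = best_loci[0]
    return kept


example_hits = [
    ("r1", "chr:100", 30, 1),  # best hit for r1 (score 29)
    ("r1", "chr:900", 20, 0),  # weaker secondary hit
    ("r2", "chr:5", 25, 0),    # r2 ties at two loci
    ("r2", "chr:50", 25, 0),
    ("r3", "chr:7", 10, 0),    # below the minimum score
]
example_kept = unique_best_hits(example_hits)
```

In this toy input only `r1` survives: `r2` is ambiguous and `r3` falls below the score cutoff.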

Of the 216,538 raw sequencing reads obtained, those that had readable barcodes and could be uniquely mapped to their respective genomes were: 39,294 in P. calidifontis, 30,827 in P. aerophilum, 31,206 in P. arsenaticum, and 42,951 in P. islandicum.
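For orientation, the per-species counts can be totaled against the raw read count. The snippet below only restates the numbers given above; the variable names are ours.

```python
RAW_READS = 216_538

# Uniquely mapped reads with readable barcodes, per species (from the text).
MAPPED_READS = {
    "P. calidifontis": 39_294,
    "P. aerophilum": 30_827,
    "P. arsenaticum": 31_206,
    "P. islandicum": 42_951,
}

total_mapped = sum(MAPPED_READS.values())
mapped_fraction = total_mapped / RAW_READS
```

The total is 144,278 reads, so roughly two thirds of the raw reads were retained for analysis.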

Northern analysis

Northern blots were prepared using ULTRAhyb-Oligo (Ambion) following the manufacturer’s protocol (footnote 3), using Hybond-N+ (GE Life Sciences) membranes to transfer 10 μg/lane denatured total RNA (45 min, 50°C with glyoxal loading buffer – Ambion). Size separation was conducted using 23 cm × 25 cm gels (1% agarose) in BPTE running buffer (30 mM bis-Tris, 10 mM PIPES, 1 mM EDTA, pH 6.5). The following DNA oligomers (Integrated DNA Technologies) were used as probes: TFBiiSense (CCTCCTCTGGAAAGCCCCTCAAGCTCCGA), TFBiiAnti (TCGGAGCTTGAGGGGCTTTCCAGAGGAGG), PAEsR53sense (GACCCCGATCGCCGAAAAATGACGAGTGGT).
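The relationship between the two TFB probes can be checked directly from their sequences. The sketch below uses only the probe sequences listed above and the P. aerophilum asR1 sequence as given in Table 2; the helper name is ours.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


TFBII_SENSE = "CCTCCTCTGGAAAGCCCCTCAAGCTCCGA"
TFBII_ANTI = "TCGGAGCTTGAGGGGCTTTCCAGAGGAGG"
# P. aerophilum asR1 sequence as listed in Table 2.
ASR1_PAE = "AACTCGGAGCTTGAGGGGCTTTCCAGAGGAGGGGGGATTTGAGACCGACATAGCGTGTT"

probes_are_complementary = reverse_complement(TFBII_SENSE) == TFBII_ANTI
anti_probe_matches_asr1 = TFBII_ANTI in ASR1_PAE
```

Both checks hold: the two probes are exact reverse complements, and the TFBiiAnti sequence is identical to a 29-nt stretch of the listed asR1 sequence, so TFBiiSense is complementary to that stretch.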

Computational prediction of orthologous gene clusters

Computational prediction of orthologous groups was established by computing reciprocal best BLASTP (Altschul et al., 1990; RBB) protein coding gene-pairs among pairs of four Pyrobaculum species. When at least three RBB gene-pairs select the same inter-species gene set (for example A pairs with B, B pairs with C, and C pairs with A), the cluster was considered an orthologous gene cluster.
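One way to read this rule is sketched below. It is an illustration under stated assumptions rather than the authors' implementation: genes are named `species:gene` (a hypothetical convention), `best_hit` is a precomputed table of top BLASTP hits, and a cluster is taken to be a connected set of genes linked by at least three reciprocal-best pairs.

```python
def species_of(gene):
    """Species prefix of a gene id written as 'species:gene' (assumed format)."""
    return gene.split(":", 1)[0]


def reciprocal_best_pairs(best_hit):
    """best_hit maps (query_gene, target_species) -> top-scoring gene there."""
    pairs = set()
    for (query, _target_species), hit in best_hit.items():
        if best_hit.get((hit, species_of(query))) == query:
            pairs.add(frozenset((query, hit)))
    return pairs


def ortholog_clusters(pairs, min_pairs=3):
    """Group genes linked by RBB pairs; keep groups with >= min_pairs pairs."""
    neighbors = {}
    for pair in pairs:
        a, b = tuple(pair)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    clusters, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk over the RBB graph
            gene = stack.pop()
            if gene not in component:
                component.add(gene)
                stack.extend(neighbors[gene] - component)
        seen |= component
        if sum(1 for pair in pairs if pair <= component) >= min_pairs:
            clusters.append(component)
    return clusters


example_best_hit = {
    ("A:g1", "B"): "B:g1", ("B:g1", "A"): "A:g1",
    ("B:g1", "C"): "C:g1", ("C:g1", "B"): "B:g1",
    ("C:g1", "A"): "A:g1", ("A:g1", "C"): "C:g1",
    ("A:g2", "B"): "B:g2", ("B:g2", "A"): "A:g2",
    ("A:g3", "B"): "B:g2",  # not reciprocated, so no pair is formed
}
example_clusters = ortholog_clusters(reciprocal_best_pairs(example_best_hit))
```

In the example, the A–B, B–C, and C–A pairs form one cluster, while the isolated `A:g2`/`B:g2` pair is not reported.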

Computational prediction of C/D box sRNA homolog families

C/D box sRNA homolog families were constructed from computational predictions with core C/D box features that were supported by transcripts from one or more of the four Pyrobaculum species (data from this study). Six Pyrobaculum genomes were searched for orthologs using these sRNA candidates as queries to BLASTN (Camacho et al., 2009). The highest scoring candidates were manually curated, then grouped into homologous C/D box sRNA families by multiple alignment.
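The text does not spell out the "core C/D box features" used. As a rough illustration only, the sketch below screens a candidate for the canonical box motifs from the general C/D box literature (C box RUGAUGA near the 5′ end, D box CUGA near the 3′ end). The motif choice, the window size, and the function name are our assumptions, and this is not the predictor used in the study.

```python
import re

# Canonical motifs written as DNA: C box RUGAUGA, D box CUGA (assumed here).
C_BOX = re.compile(r"[AG]TGATGA")
D_BOX = re.compile(r"CTGA")


def has_cd_core(seq, window=15):
    """True if a C box lies within the first `window` nt and a D box
    within the last `window` nt of the candidate sequence."""
    seq = seq.upper().replace("U", "T")
    has_c = C_BOX.search(seq[:window]) is not None
    has_d = D_BOX.search(seq[-window:]) is not None
    return has_c and has_d


synthetic_candidate = "GCATGATGAC" + "A" * 30 + "CCTGATT"
synthetic_result = has_cd_core(synthetic_candidate)
```

A screen this loose will produce false positives, which is consistent with the need for transcript support and manual curation described above.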

Results

Small RNA populations

We prepared eight barcoded sequencing libraries using sRNA fractions (size range 16–70 nt) from anaerobic cultures of P. aerophilum, P. arsenaticum, P. islandicum, and an aerobic culture of P. calidifontis. These libraries were prepared using a 5′-independent ligation strategy (Pak and Fire, 2007) which preserves RNA strand orientation, captures both the 5′ and 3′ ends of the sRNA, and does not impose a bias for molecule selection based on 5′-phosphorylation state. Pyrosequencing, followed by selection of uniquely mapped sequence reads, allowed detection of reads associated with both known and novel genomic features (Figure 1), including:

  • (i) snoRNA-like guide RNAs: C/D box sRNA and a new class of H/ACA-like sRNA (Bernick et al., 2012),

  • (ii) RNA sequences encoded cis-antisense (asRNA) to known protein coding genes,

  • (iii) RNA sequences derived from CRISPR arrays, thought to guide the CRISPR-mediated immune response,

  • (iv) unclassified novel sRNA, and

  • (v) degradation products of larger RNA including ribosomal RNA, messenger RNA and transfer RNA.

Figure 1

Most antisense-oriented sequencing reads are associated with coding regions (Figure 1) in each of the species and growth phases examined. Antisense-oriented reads are frequently the result of convergent expression of a protein coding gene and a snoRNA-like guide RNA (Tables A1–A4 in Appendix). We find, in some cases, that sequencing reads antisense to snoRNA-like RNAs appear to be fragments of novel 3′ untranslated regions (3′ UTRs) of a convergently expressed protein coding region. These antisense-oriented sRNA reads are counted as antisense to the associated snoRNA-like sRNA. We made use of this transcriptional pattern to find novel C/D box sRNA and H/ACA-like sRNA; in these cases, highly abundant antisense reads to coding transcripts often proved to be a hallmark of novel C/D box and H/ACA-like sRNA (Tables A2 and A4 in Appendix). In a few remaining cases, we found novel cis-encoded antisense reads that were not derived from known classes of sRNA. We note that the proportion of reads belonging to each type of classified RNA is relatively stable across species and conditions (Figure 1), with the exception of two conditions in which tRNA fragments were enriched (P. aerophilum stationary phase, P. islandicum exponential phase). We are further investigating these differences; however, the sequencing portion of this study was designed for qualitative discovery of novel sRNAs.

C/D box sRNA account for the largest fraction of reads in all species tested

In each of the eight small transcriptomes studied (four species sampled at exponential and stationary phase), C/D box sRNA accounted for the largest fraction of reads (Figure 1). A previous study (Fitz-Gibbon et al., 2002) has provided computational evidence for 65 C/D box sRNA candidates encoded in the genome of P. aerophilum. We now find an additional 23 C/D box sRNA candidates in that genome, representing a 35% increase in family size. By using transcriptional support from the four examined genomes (this study), combined with comparative genomic evidence that includes P. oguniense and P. neutrophilum, we find at least 74 C/D box sRNA in each Pyrobaculum spp. (Table 1). Of those genes, 70 appear to be conserved among all six genomes examined (Figure 2).

Table 1

Species            C/D box sRNAs
P. aerophilum      88
P. arsenaticum     83
P. calidifontis    88
P. islandicum      84
P. oguniense       83
P. neutrophilum    74

C/D box sRNA genes in each Pyrobaculum species based on transcriptional evidence or inferred by homology (P. oguniense and P. neutrophilum).

All loci are manually curated.
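The counts in Table 1 and the reported increase for P. aerophilum can be cross-checked with a few lines of arithmetic. The values are taken from the text and table; the variable names are ours.

```python
# C/D box sRNA counts per species, from Table 1.
CD_BOX_COUNTS = {
    "P. aerophilum": 88,
    "P. arsenaticum": 83,
    "P. calidifontis": 88,
    "P. islandicum": 84,
    "P. oguniense": 83,
    "P. neutrophilum": 74,
}

PREVIOUSLY_PREDICTED = 65  # Fitz-Gibbon et al., 2002 (P. aerophilum)
NEWLY_FOUND = 23           # additional candidates reported in this study

minimum_per_species = min(CD_BOX_COUNTS.values())
percent_increase = 100 * NEWLY_FOUND / PREVIOUSLY_PREDICTED
```

The 65 earlier candidates plus 23 new ones give the 88 listed for P. aerophilum (an increase of about 35%), and the smallest per-species count is the 74 quoted as the lower bound.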

Figure 2

Convergently oriented ncRNA are frequently found at the 3′ terminus of protein coding genes

It has been noted previously that in the genomes of S. acidocaldarius and S. solfataricus, C/D box sRNA genes occasionally exhibit antisense overlap to the 3′ end of protein encoding genes (Dennis et al., 2001). In the Pyrobaculum clade, we find numerous instances of a convergently oriented C/D box or H/ACA-like guide RNA gene that partially overlaps, by a few nucleotides, the 3′ end of a protein-coding gene (Tables A2 and A4 in Appendix).

To find conserved, novel cis-encoded antisense RNA, we ranked conserved transcript abundance that overlapped orthologous protein coding genes. Among the top 34 predicted ortholog groups of genes with well-annotated function and conserved 3′ antisense transcription (Table A2 in Appendix), 28 are convergent with C/D box sRNA and three are convergent with H/ACA-like sRNA. Among the top 19 predicted ortholog groups of unknown function with 3′ antisense transcription (Table A4 in Appendix), 11 are convergent with C/D box sRNA, four are convergent with H/ACA-like sRNA, and one is adjacent to a tRNA. Together, 87% of conserved, cis-antisense encoded sRNA are snoRNA-like guides, while only 2.6% are tRNA. In P. aerophilum, C/D box sRNA genes are nearly twice as abundant (88 compared to 46) as tRNA genes, but the sRNA genes are over 40-fold more likely to have a conserved overlap with the orthologous protein coding region. This may be an indication that these C/D box sRNA play a regulatory role with respect to the associated protein coding genes.
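The share of snoRNA-like guides can be tallied from the counts above. The sketch assumes the denominator is the combined 53 ortholog groups (34 of known function plus 19 of unknown function); the text does not state the denominator explicitly, so this is our reading.

```python
# Counts quoted in the text for the two ranked sets of ortholog groups.
KNOWN_FUNCTION = {"groups": 34, "cd_box": 28, "h_aca": 3}
UNKNOWN_FUNCTION = {"groups": 19, "cd_box": 11, "h_aca": 4}

total_groups = KNOWN_FUNCTION["groups"] + UNKNOWN_FUNCTION["groups"]
guide_like = sum(
    table[kind]
    for table in (KNOWN_FUNCTION, UNKNOWN_FUNCTION)
    for kind in ("cd_box", "h_aca")
)
guide_share_percent = 100 * guide_like / total_groups
```

Under that assumption, 46 of 53 groups involve a C/D box or H/ACA-like sRNA, which rounds to the 87% quoted.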

A notable example of a convergent ncRNA occurs at the 3′ terminus of the electron transport flavoprotein (etf) operon, where a C/D box sRNA, PAEsR53, overlaps the terminal gene (PAE0721 in P. aerophilum) in this four-gene operon. Like other operons within the Pyrobaculum genus, multiple promoters appear to drive expression of the etf operon (Figure 3). For this operon, an upstream promoter generates a 3400-nt-long full length etfDH-ferredoxin-etfB-etfA transcript. Two predicted internal promoters appear to generate, respectively, the ferredoxin-etfB-etfA ~2250 nt transcript and the etfA-only 1040 nt transcript.

Figure 3

The P. aerophilum sRNA sequencing data revealed a strong abundance of sequences mapping to PAEsR53, as well as sequences of the same general size and location, mapping to the opposite strand (the UTR of the etf operon). Northern hybridization was performed to determine the origin of these “anti-PAEsR53” reads. Figure 3 shows that these reads likely originate from the overlapping 3′ UTR of the etf operon, suggesting a possible interaction of the C/D box machinery with the etf mRNA. Predicted orthologs of this C/D box sRNA (PAEsR53) are syntenic with etfA in all Pyrobaculum species studied, overlapping the 3′ end of etfA orthologs by ~12 bases. The overlap positions the D box guide sequence of PAEsR53 over the etfA stop codon in all Pyrobaculum species. If the guide RNA interacts through complementarity with the etfA mRNA, it could enable a 2′-O-methyl modification of the central “A” nucleotide within the conserved TAA stop codon in all four species.

The transcription initiation factor B genes, tfb1 and tfb2

The genomes of Pyrobaculum species contain a pair of paralogous genes that encode alternate forms of transcription initiation factor B (TFB). This factor is required for the initiation of basal level transcription at archaeal promoters (Santangelo et al., 2007).

In every sequenced Pyrobaculum species, TFB1 (PAE1645 and orthologs) contains a short N-terminal extension (22 amino acids in P. aerophilum) that is not present in the TFB2 proteins (PAE3329 and orthologs). Sequencing data reveal the presence of an abundant sRNA (asR1) encoded on the antisense strand that overlaps the 5′ end of tfb1 (Figure 4A) in all four Pyrobaculum species examined (Table A1 in Appendix). Tfb1 also appears to have two promoters separated by 17–18 nt, such that the upstream promoter (Pu) is positioned to drive expression of full length tfb1, while the downstream promoter (Pd) generates transcripts that would lack a start codon near the start of the transcript.

Figure 4

In P. aerophilum, asR1 sRNA is about 59 nt in length (Table 2; Figure 4), with a well-defined 5′ end that overlaps the extension region of the tfb1 gene. The 3′ end of asR1 is located just upstream of the tfb1 translation initiation codon, precisely at the predicted start of transcription consistent with the Pu promoter. Importantly, there is an additional set of asR1 sRNA reads of 41 nt in length, starting at the same 5′ position but terminating early, at the 5′ end of tfb1 transcripts consistent with the alternate Pd promoter. Mirroring the two variants of the antisense asR1 transcript, deep sequencing revealed a large number of short sense strand sequencing reads, consistent with fragments representing the 5′ end of tfb1 transcripts generated by Pu and Pd, spanning 50 and 32 nt in length respectively.
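The read lengths quoted above can be compared with the reported promoter spacing. The snippet only restates the approximate lengths from the text; the variable names are ours.

```python
# Approximate lengths (nt) from the text: long and short variants.
ASR1_LENGTHS = (59, 41)            # antisense asR1 reads
SENSE_FRAGMENT_LENGTHS = (50, 32)  # sense reads attributed to Pu and Pd

antisense_gap = ASR1_LENGTHS[0] - ASR1_LENGTHS[1]
sense_gap = SENSE_FRAGMENT_LENGTHS[0] - SENSE_FRAGMENT_LENGTHS[1]
```

Both pairs differ by 18 nt, in line with the 17–18 nt separation reported for the Pu and Pd promoters, given that the lengths are approximations from the read population.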

Table 2

sRNA   Species   Len   Sequence
asR1   Pae       59    5’-AACTCGGAGCTTGAGGGGCTTTCCAGAGGAGGGGGGATTTGAGACCGACATAGCGTGTT
asR1   Par       79    5’-TATGCGGAGCTTTAGGGGCTTGCCGGAAGAAGGTAGGCTTGTACTCGACATAGCGTGTTTATAAGCTTTTCTAGCGTAT
asR1   Pca       33    5’-‥TACGGAATTTTAAGGGCTTGCCGGGCGGGGTAG
asR1   Pis       63    5’-.ATACGTAGCTTAAGGGGTTTCCCAGAAGACGTCGGACTTGACGACGACATAGCGAGTTTATAA
asR2   Pae       60    5’-GGGAGTCACTCTGTACCCCCTCTCCTTCAACGCTTGTACTAACTGGGCTGACTCCATCGT
asR2   Pca       54    5’-……GACGCGGTATCCCTTCTCCTTTAGCGTGGCGACGAGCTGTGCCGTCTCCATAAT
asR3   Pae       65    5’-ACCCCCGAATTGGGGGCAAATGAGCGGCGGACACTTAAGGCGGCCCCGCCGCGAGCGGTTTCGCC
asR3   Par       58    5’-.CCCCCGGA.CGGGGGCGAATGAGCGGCGGGCACCTGTGGCGGCTCCGCCGCGACTACT
asR3   Pca       63    5’-.ACCCCGGA.TGGGGGCGGATGAGCGGCAGACACCTAAGGCGGCCCTGCCGCGACCAAGGGCTT
asR3   Pis       59    5’-GACCCCCTGCTGGGGGCATATGAGCGGCGGGCACCTAAGGCGGCTCCGCCGCGACTGTA

Terminal cis-antisense encoded sRNA in Pyrobaculum species.

Position of start codon (on coding strand) shown underlined for asR1 and asR2 (CAT). Position of stop codon (on coding strand) underlined (CTA, TTA) for asR3. Pae (P. aerophilum); Par (P. arsenaticum); Pca (P. calidifontis); Pis (P. islandicum); len (length of sRNA approximated from sequence read population).

Northern analysis of total RNA from P. aerophilum confirmed the presence of a population of sense oriented transcripts of about 1000 nt in length, consistent with full length mRNA and another transcript population consistent with the sense oriented sRNAs described above (Figure 5A). When the antisense sRNA is probed, a population of short transcripts near 50 nt is detected (Figure 5B). The full length sense transcripts appear to be relatively constant in abundance across growth phase and culture conditions, consistent with data from a prior microarray study using the same RNA samples (Cozen et al., 2009). The correlated abundance of sense and antisense sRNA (Figures 5C–E) suggests that these sense::antisense pairs are associated, potentially as a double-stranded RNA. The elevated abundance of these pairs relative to the mRNA (Figure 5A) suggests that the sRNA pairs are stabilized within a dsRNA complex. The role of asR1 with respect to tfb1 transcripts is unclear, though the modulation of sRNA (both sense and antisense) while tfb1 mRNA remains at constant and low abundance is reminiscent of negative feedback control.

Figure 5

The presence of complementary sense and antisense transcripts has been observed in a previous RNA sequencing study (Tang et al., 2005). Those authors suggested that the presence of an antisense transcript might enhance the stability of the mRNA target. As exemplified with tfb1, the presence of cis-antisense transcripts in our data is often accompanied by the presence of complementary sense strand fragments of similar size. This observation suggests that formation of a dsRNA duplex between the antisense sRNA and the 5′ region of the mRNA target may trigger destabilization of the mRNA; or alternatively, that base pairing between the antisense sRNA and the 5′ end of the nascent mRNA early in elongation may trigger premature transcription termination. For either mechanism, the result appears to be a constant level of tfb1 mRNA under a variety of different culture conditions and growth phases.

The ferric uptake regulator gene (fur)

In a number of bacteria, the ferric uptake regulator FUR is a transcriptional regulator of genes encoding proteins involved in iron homeostasis and protection from the toxic effects of iron under aerobic conditions. Some bacteria also encode a FUR-associated sRNA, for example ryhB; its synthesis is negatively regulated by FUR. The ryhB sRNA functions as a negative regulator of genes whose transcription is indirectly activated by FUR. The mechanism of ryhB sRNA negative regulation involves base pairing followed by selective degradation of the targeted mRNA (Andrews et al., 2003).

A homolog of the fur gene is conserved in the genomes of all known Pyrobaculum species. Embedded in each of the associated genes and located about 75 nt downstream from the 5′ start codon is an antisense-oriented, promoter-like sequence. In the two studied facultative aerobes (P. aerophilum and P. calidifontis), we detected a novel 54 nt-long cis-antisense transcript (Table A1 in Appendix), designated as asR2, with precise transcription initiation consistent with the noted antisense promoter-like sequence. The 3′ end of the asR2 transcript (Table 2; Figure 4B) terminates just upstream of the fur translation start codon. Both the asR2 transcript and a complementary RNA fragment apparently derived from the 5′ end of fur mRNA were present at high levels in anaerobically grown P. aerophilum and at modest levels in P. calidifontis. In the strict anaerobes (P. islandicum, P. arsenaticum), it appears that sequencing depth was insufficient to resolve any antisense-sense pairs under the limited set of growth conditions; however, we note that the predicted promoter for asR2 in the facultative aerobes is equally well-conserved across all Pyrobaculum species.

The triose-phosphate isomerase (tpi) gene

The tpi gene encodes triose-phosphate isomerase, an enzyme that is central to the modified Embden–Meyerhof glycolytic pathway in Pyrobaculum species (Reher et al., 2007). We detected a 65-nt-long antisense transcript asR3 (Table A2 in Appendix) that overlaps the 3′ end of the tpi gene (Figure 4C) in all four of the species examined. Upon further examination of the 3′ terminal portion of tpi, we also detected a conserved sequence and associated secondary structure that is present in all sequenced Pyrobaculum spp. (Figure 6), which we term the tpi-element. In P. aerophilum, P. islandicum, and P. calidifontis, the tpi-element includes the stop codon of tpi, while the entire element is encoded immediately downstream of the tpi stop codon in the remaining Pyrobaculum spp.

Figure 6

A dsRNA formed by an interaction of asR3 with the tpi-element could potentially compete against the mRNA intramolecular structure, and thus modulate function of the highly conserved tpi-element. Alternatively, asR3 might itself be the active element of the pair, and in that case, presence of free tpi transcript might act as a repressor of asR3. In this model, asR3 may have other trans targets in the genome and play a more general role in coordination of glycolysis in Pyrobaculum species.

Discussion

Comparative transcriptomics has revealed compelling, conserved cases of novel cis-encoded transcripts that are antisense to core protein coding genes involved in transcription initiation and metabolism. We have considered these most obviously as potential regulators of their opposite strand partners, but they might also have broader regulatory roles.

We found that 28 of the top 34 cases of conserved 3′ antisense expression among orthologous Pyrobaculum proteins of known function coincide with convergent C/D box guide RNAs. This finding suggests that guide directed 2′-O-methylation of the mRNA in the region or downstream of the stop codon might be an unrecognized component of mRNA metabolism and gene regulation. It has been shown that pseudouridine modification of a stop codon can suppress termination of translation (Karijolich and Yu, 2011), but there are currently no studies of the possible implications of 2′-O-methyl modification on mRNA translation or stability. Alternatively, the presence of abundant mRNA fragments at the 3′ end may indicate that a sense-antisense interaction between the C/D box sRNA and mRNA terminus results in truncation of the mRNA by an unknown mechanism, leading to mRNA destabilization and degradation.

The coordinated regulatory program implemented by Fur and its regulatory sRNA ryhB in some bacteria provides a mechanism that yields both repression of some genes and activation (de-repression) of others. This program balances the needs of iron storage and utilization while protecting from iron-induced toxicity under oxic conditions. These dual roles are mediated by the inverse expression patterns of Fur and ryhB. Fur negatively regulates ryhB, which negatively regulates downstream genes. This circuit allows Fur to derepress (activate) those downstream genes. In published studies, active transcription in one direction can negatively regulate expression of the cis-encoded antisense partner (Lapidot and Pilpel, 2006), thus creating exclusive access to the shared genomic region. Likewise in Pyrobaculum, the cis-antisense transcription observed may yield the same type of inverse expression pattern. In this view, if the cis-antisense gene product is capable of repressing transcription or translation of targets in trans, then positive expression of Pyrobaculum fur, tfb1, or tpi may act through their corresponding antisense partner to activate (derepress) additional members of the associated regulon. Identification and verification of targets in trans is difficult in species that are not amenable to genetic manipulation such as Pyrobaculum, although future studies will explore computationally predicted targets.
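The double-negative logic of the Fur–ryhB circuit can be written as a toy Boolean model. This is a deliberate oversimplification for illustration (real expression is graded, and the function name is hypothetical), but it shows why repression of a repressor behaves as activation.

```python
def downstream_expressed(fur_active: bool) -> bool:
    """Toy Boolean model of the bacterial Fur -> ryhB -> target circuit."""
    ryhb_present = not fur_active  # Fur negatively regulates ryhB
    return not ryhb_present        # ryhB negatively regulates the targets


with_fur = downstream_expressed(True)
without_fur = downstream_expressed(False)
```

With Fur active, ryhB is absent and the downstream genes are expressed; without Fur, ryhB accumulates and the same genes are repressed. The cis-antisense model proposed for Pyrobaculum would follow the same pattern, with the sense transcript in the role of Fur and the antisense sRNA in the role of ryhB.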

The presence of asR1, a cis-encoded antisense RNA found within tfb1 but not tfb2, is of special interest when we consider these possible models of action for the cis-encoded antisense RNA. A specific TFB and TATA binding protein (TBP) pair in the archaeal halophile Halobacterium sp. NRC-1 has been shown to activate transcription under heat shock conditions (Coker and DasSarma, 2007). The observations that there are two instances of tfb in all Pyrobaculum genomes, and that only one harbors an antisense gene, suggest that tfb1 might be essential only under particular conditions and/or initiate transcription for a subset of Pyrobaculum genes. Under this model, tfb1 transcription might be held at low levels by the presence of asR1 and possibly a dsRNA-binding complex. Under the alternative view, the cis-encoded asR1 might facilitate activation of a trans-encoded regulon via de-repression. In the former view, the mechanism(s) that regulate sRNA transcription, stability, and mRNA interaction are central, while in the latter model, the sRNA is a downstream effector molecule of the independently regulated top-strand mRNA partner. In either case, resolving the molecular details of the sRNA’s interaction with tfb1 is needed to better understand this potential high-level mechanism for broad gene regulation in Pyrobaculum.

The tpi-element and its associated antisense partner, asR3, may provide a novel regulatory circuit acting from the 3′ UTR of tpi. The structure of the tpi-element (Figures 4C and 6) contains the stop codon in some species while in other species the conserved structure is just downstream of the tpi stop codon. Possibilities for the function of the tpi-element include early transcription termination or translation termination. In either case, the tpi-element could be acting as a novel 3′ UTR riboswitch by binding a small molecule, or alternatively may be involved in a protein-binding event. Just as described above, the cis-antisense element asR3, encoded opposite the tpi-element, may act as a repressor of tpi-element function, or may have a role in trans with other genes in the tpi regulon.

In this study, we have described 74 or more expressed C/D box sRNAs in each of four transcriptomes, most of which are conserved among multiple Pyrobaculum species. We have shown evidence that an unexpectedly large number of these sRNAs overlap protein-coding genes. Three novel sRNAs, asR1, asR2, and asR3, overlap genes involved in core transcription, iron regulation, and core metabolism. Sequencing data have revealed the presence of sRNA originating from both strands; these transcripts can be supported by promoter analysis and verified by northern analyses. By contrast, fewer than 1% of transcripts mapped to CRISPR arrays show any evidence of dual-strand transcription (Figure 1). We suggest that the presence of dual-stranded transcript reads indicates an interaction of an sRNA with a convergently oriented mRNA, potentially mediated by one or more unknown dsRNA-binding complexes.

Future RNA-seq studies employing deeper sequencing technologies, alternative growth conditions, and other archaeal species will likely uncover many more cases of candidate regulatory antisense RNA. This work suggests multiple new research directions that will require complementary methodologies to better understand the complexity of sRNA function in Archaea. Given the conserved patterns of cis-antisense RNA transcripts now apparent, we anticipate rapid progress from follow-up studies that will demonstrate new modes of gene regulation homologous or analogous to those found in bacteria and eukaryotes.

Statements

Author contributions

David L. Bernick designed and performed the experimental and computational analyses, and wrote the manuscript. Lauren M. Lui analyzed the C/D box sRNA sequencing data. Patrick P. Dennis provided assistance with the manuscript, collaborative review, and structure determination of C/D box sRNA. Todd M. Lowe provided scientific direction, contributed to interpretation of results, and edited the manuscript.

Acknowledgments

We are grateful to members of the Joint Genome Institute for making 454 sequencing possible (P. Richardson and J. Bristow for providing resources, and E. Lindquist and N. Zvenigorodsky for sample preparation and analysis). We thank Aaron Cozen for his generous procedural guidance and for the use of RNA blots used in the study. This work was supported by National Science Foundation Grant EF-082277055 (Todd M. Lowe and David L. Bernick); the Graduate Research and Education in Adaptive Bio-Technology (GREAT) Training Program sponsored by the University of California Bio-technology Research and Education Program (David L. Bernick); and by the National Science Foundation while Patrick P. Dennis was working at the Foundation. The opinions, findings, and conclusions expressed in this publication are ours and do not necessarily reflect the views of the National Science Foundation.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  • 1

    Aiba, H. (2007). Mechanism of RNA silencing by Hfq-binding small RNAs. Curr. Opin. Microbiol. 10, 134–139. doi: 10.1016/j.mib.2007.03.010

  • 2

    Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410. doi: 10.1006/jmbi.1990.9999

  • 3

    Andrews, S. C., Robinson, A. K., and Rodriguez-Quinones, F. (2003). Bacterial iron homeostasis. FEMS Microbiol. Rev. 27, 215–237. doi: 10.1016/S0168-6445(03)00055-X

  • 4

    Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P., Moineau, S., Romero, D. A., and Horvath, P. (2007). CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712. doi: 10.1126/science.1138140

  • 5

    Bernick, D. L., Dennis, P. P., Hochsmann, M., and Lowe, T. M. (2012). Discovery of Pyrobaculum small RNA families with atypical pseudouridine guide RNA features. RNA 18, 402–411. doi: 10.1261/rna.031385.111

  • 6

    Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T. L. (2009). BLAST+: architecture and applications. BMC Bioinformatics 10, 421. doi: 10.1186/1471-2105-10-421

  • 7

    Chan, P. P., Holmes, A. D., Smith, A. M., Tran, D., and Lowe, T. M. (2012). The UCSC Archaeal Genome Browser: 2012 update. Nucleic Acids Res. 40, D646–D652. doi: 10.1093/nar/gkr990

  • 8

    Coker, J. A., and DasSarma, S. (2007). Genetic and transcriptomic analysis of transcription factor genes in the model halophilic archaeon: coordinate action of TbpD and TfbA. BMC Genet. 8, 61. doi: 10.1186/1471-2156-8-61

  • 9

    Cozen, A. E., Weirauch, M. T., Pollard, K. S., Bernick, D. L., Stuart, J. M., and Lowe, T. M. (2009). Transcriptional map of respiratory versatility in the hyperthermophilic crenarchaeon Pyrobaculum aerophilum. J. Bacteriol. 191, 782–794. doi: 10.1128/JB.00965-08

  • 10

    Dennis, P. P., Omer, A., and Lowe, T. (2001). A guided tour: small RNA function in archaea. Mol. Microbiol. 40, 509–519. doi: 10.1046/j.1365-2958.2001.02381.x

  • 11

    Fitz-Gibbon, S. T., Ladner, H., Kim, U. J., Stetter, K. O., Simon, M. I., and Miller, J. H. (2002). Genome sequence of the hyperthermophilic crenarchaeon Pyrobaculum aerophilum. Proc. Natl. Acad. Sci. U.S.A. 99, 984–989. doi: 10.1073/pnas.241636498

  • 12

    Hale, C., Kleppe, K., Terns, R. M., and Terns, M. P. (2008). Prokaryotic silencing (psi)RNAs in Pyrococcus furiosus. RNA 14, 2572–2579. doi: 10.1261/rna.1246808

  • 13

    Hale, C. R., Zhao, P., Olson, S., Duff, M. O., Graveley, B. R., Wells, L., Terns, R. M., and Terns, M. P. (2009). RNA-guided RNA cleavage by a CRISPR RNA-Cas protein complex. Cell 139, 945–956. doi: 10.1016/j.cell.2009.07.040

  • 14

    Henderson, I. R., Zhang, X., Lu, C., Johnson, L., Meyers, B. C., Green, P. J., and Jacobsen, S. E. (2006). Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA methylation patterning. Nat. Genet. 38, 721–725. doi: 10.1038/ng1804

  • 15

    Jager, D., Sharma, C. M., Thomsen, J., Ehlers, C., Vogel, J., and Schmitz, R. A. (2009). Deep sequencing analysis of the Methanosarcina mazei Go1 transcriptome in response to nitrogen availability. Proc. Natl. Acad. Sci. U.S.A. 106, 21878–21882. doi: 10.1073/pnas.0909051106

  • 16

    Karijolich, J., and Yu, Y. T. (2011). Converting nonsense codons into sense codons by targeted pseudouridylation. Nature 474, 395–398. doi: 10.1038/nature10165

  • 17

    Kent, W. J. (2002). BLAT – the BLAST-like alignment tool. Genome Res. 12, 656–664. doi: 10.1101/gr.229102

  • 18

    Lapidot, M., and Pilpel, Y. (2006). Genome-wide natural antisense transcription: coupling its regulation to its different regulatory mechanisms. EMBO Rep. 7, 1216–1222. doi: 10.1038/sj.embor.7400857

  • 19

    Lau, N. C., Lim, L. P., Weinstein, E. G., and Bartel, D. P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858–862. doi: 10.1126/science.1065062

  • 20

    Lu, C., Kulkarni, K., Souret, F. F., Muthuvalliappan, R., Tej, S. S., Poethig, R. S., Henderson, I. R., Jacobsen, S. E., Wang, W., Green, P. J., and Meyers, B. C. (2006). MicroRNAs and other small RNAs enriched in the Arabidopsis RNA-dependent RNA polymerase-2 mutant. Genome Res. 16, 1276–1288. doi: 10.1101/gr.5530106

  • 21

    Masse, E., Salvail, H., Desnoyers, G., and Arguin, M. (2007). Small RNAs controlling iron metabolism. Curr. Opin. Microbiol. 10, 140–145. doi: 10.1016/j.mib.2007.03.013

  • 22

    Omer, A. D., Lowe, T. M., Russell, A. G., Ebhardt, H., Eddy, S. R., and Dennis, P. P. (2000). Homologs of small nucleolar RNAs in Archaea. Science 288, 517–522. doi: 10.1126/science.288.5465.517

  • 23

    Pak, J., and Fire, A. (2007). Distinct populations of primary and secondary effectors during RNAi in C. elegans. Science 315, 241–244.

  • 24

    Reher, M., Gebhard, S., and Schonheit, P. (2007). Glyceraldehyde-3-phosphate ferredoxin oxidoreductase (GAPOR) and nonphosphorylating glyceraldehyde-3-phosphate dehydrogenase (GAPN), key enzymes of the respective modified Embden-Meyerhof pathways in the hyperthermophilic crenarchaeota Pyrobaculum aerophilum and Aeropyrum pernix. FEMS Microbiol. Lett. 273, 196–205. doi: 10.1111/j.1574-6968.2007.00787.x

  • 25

    Repoila, F., Majdalani, N., and Gottesman, S. (2003). Small non-coding RNAs, co-ordinators of adaptation processes in Escherichia coli: the RpoS paradigm. Mol. Microbiol. 48, 855–861. doi: 10.1046/j.1365-2958.2003.03454.x

  • 26

    Santangelo, T. J., Cubonova, L., James, C. L., and Reeve, J. N. (2007). TFB1 or TFB2 is sufficient for Thermococcus kodakaraensis viability and for basal transcription in vitro. J. Mol. Biol. 367, 344–357. doi: 10.1016/j.jmb.2006.12.069

  • 27

    Straub, J., Brenneis, M., Jellen-Ritter, A., Heyer, R., Soppa, J., and Marchfelder, A. (2009). Small RNAs in haloarchaea: identification, differential expression and biological function. RNA Biol. 6, 281–292. doi: 10.4161/rna.6.3.8357

  • 28

    Tang, T. H., Bachellerie, J. P., Rozhdestvensky, T., Bortolin, M. L., Huber, H., Drungowski, M., Elge, T., Brosius, J., and Huttenhofer, A. (2002). Identification of 86 candidates for small non-messenger RNAs from the archaeon Archaeoglobus fulgidus. Proc. Natl. Acad. Sci. U.S.A. 99, 7536–7541. doi: 10.1073/pnas.052712999

  • 29

    Tang, T. H., Polacek, N., Zywicki, M., Huber, H., Brugger, K., Garrett, R., Bachellerie, J. P., and Huttenhofer, A. (2005). Identification of novel non-coding RNAs as potential antisense regulators in the archaeon Sulfolobus solfataricus. Mol. Microbiol. 55, 469–481. doi: 10.1111/j.1365-2958.2004.04428.x

  • 30

    Vogel, J. (2009). A rough guide to the non-coding RNA world of Salmonella. Mol. Microbiol. 71, 1–11. doi: 10.1111/j.1365-2958.2008.06505.x

  • 31

    Wilderman, P. J., Sowa, N. A., Fitzgerald, D. J., Fitzgerald, P. C., Gottesman, S., Ochsner, U. A., and Vasil, M. L. (2004). Identification of tandem duplicate regulatory small RNAs in Pseudomonas aeruginosa involved in iron homeostasis. Proc. Natl. Acad. Sci. U.S.A. 101, 9792–9797. doi: 10.1073/pnas.0403423101

  • 32

    Wurtzel, O., Sapra, R., Chen, F., Zhu, Y., Simmons, B. A., and Sorek, R. (2010). A single-base resolution map of an archaeal transcriptome. Genome Res. 20, 133–141. doi: 10.1101/gr.100396.109

  • 33

    Zago, M. A., Dennis, P. P., and Omer, A. D. (2005). The expanding world of small RNAs in the hyperthermophilic archaeon Sulfolobus solfataricus. Mol. Microbiol. 55, 1812–1828. doi: 10.1111/j.1365-2958.2005.04505.x

Appendix

Table A1

Product | P. aerophilum | P. arsenaticum | P. calidifontis | P. islandicum
Transcription initiation factor IIB | 1645 (225,409) | 1976 (12,16) | 0584 (1,1) | 1667 (8,39)
DNA-cytosine methyltransferase | 1659 (4,0) | 1839 (1,0) | 0576 | 1675 (2,0)
Rhomboid family protein | 1099 (3,0) | 0267 (1,0) | 0686 | 1249 (2,0)
Ferric uptake regulator, Fur family | 2309 (40,11) | 1526 | 1653 (1,0) | 1023
30S ribosomal protein S12P | 0670 | 2326 (2,0) | 2096 | 0698 (7,1)
Cobalamin adenosyltransferase | 1715 | 0782 | 0623a (86,4) | 1701 (0,3)
Thiol:disulfide interchange protein | 3152 | 1672 | 1794 | 0523 (32,0)
30S ribosomal protein S11P | 3179 (15,2) | 1654 (0,2) | 1813 | 0540
NAD-dependent deacetylase | 3500 | 1959a (12,3) | 1963 | 0793
NADH dehydrogenase subunit A | 3520 (9,0) | 1954 (0,1) | 1983 | 0847
30S ribosomal protein S3P | 1779 | 0769 | 0553 | 1729 (9,0)
Translation initiation factor IF-1A | 1072 (7,0) | 0278 | 0681 | 1256
Putative transcriptional regulator, GntR family | 2315 (0,10) | 1532 (4,2) | 1659 (0,2) | 1028
Valyl-tRNA synthetase | 2297 (4,0) | 1497 (0,1) | 1649 | 1019 (0,1)
Putative signal-transduction protein with CBS domains | 2961 (4,0) | 1332 (0,1) | 1143 | 0364
Major facilitator superfamily MFS_1 | 1550s (3,5) | 0660s (0,2) | 0530s (0,2)
Elongation factor 1, beta/beta’/delta chain | 0695 (3,1) | 2345 | 2114 | 0684
Egghead-like protein | 0042 (3,2) | 1076 | 2043 (0,1) | 0056
V-type ATP synthase subunit B | 1146 | 0237 (3,0) | 0698 | 1264
Conserved protein (possible ATP binding) | 0793 (3,11) | 0044 | 2138 | 1084
Putative transcriptional regulator, ModE family | 0813 (2,0) | 0057 | 0023 | 1100
50S ribosomal protein L18e | 0672 | 2328 (2,0) | 2098 | 0696 (0,1)
ABC transporter related | 1393 (2,0) | 0445 | 1879 | 1525
Peptidase M50 | 1702 | 2238 (2,1) | 0618 | 1696
Cation diffusion facilitator family transporter | 0568 (2,0) | 2239 | 1215 | 0125 (0,2)
paREP10 | 1480 | 0613 (2,0) | 0811 (0,1) | 1575
Exosome complex RNA-binding protein Rrp42 | 2206 (2,1) | 1938 | 0932 | 0835
Inner-membrane translocator | 3412 (2,0) | 1174 | 1046 | 0977
NADH-ubiquinone oxidoreductase subunit | 2274 | 2047 | 0329 (1,0)
CopG domain protein DNA-binding domain protein | 2357 (1,0) | 1561 | 1689 | 0622
Inner-membrane translocator | 3348 (1,0) | 1760 | 0444 | 0590
Amino acid-binding ACT domain protein | 2296 | 1510 | 1648 | 1018 (1,1)
Hydrogen sulfite reductase | 2596 (1,0) | 1213 | 1457
DNA-directed RNA polymerase subunit P | 2258 (0,1) | 1825 | 1624 (1,0) | 0899
DNA polymerase, beta domain protein | 1893 | 0821 (1,0) | 1502
Phosphate ABC transporter, inner membrane subunit PstC | 1396 | 0443 | 1881 | 1527 (1,0)
Nicotinamide-nucleotide adenylyltransferase | 1438 | 0405 (1,0) | 0794 | 1561
Peptidase S8 and S53, subtilisin, kexin, sedolisin | 1983 | 2056 | 1805 (1,0)
Glu/Leu/Phe/Val dehydrogenase, C terminal | 3438 | 1871 | 1031 | 0980 (1,0)
D-isomer specific 2-hydroxyacid dehydrogenase, NAD-binding | 3320 (1,1) | 1736 | 1741 | 0566
Electron transfer flavoprotein, alpha subunit | 0721 | 2372 (1,0) | 2132 | 0645 (0,1)
Sua5/YciO/YrdC/YwlC family protein | 2978 (1,0) | 1345 | 1129 | 0378 (0,1)
AAA ATPase | 3527 | 1626 (1,0) | 1978 | 0145
MazG nucleotide pyrophosphohydrolase | 1159 (1,0) | 0222 | 0722
30S ribosomal protein S8P | 2098 | 2009 (0,2) | 0176 | 1865 (1,0)
Transcriptional regulator, XRE family | 0783 | 0037 (1,0) | 2145 | 1076
Acyl-CoA dehydrogenase domain protein | 2070 | 2103 | 0199 | 1853 (1,0)
2-dehydropantoate 2-reductase | 3409 (1,0) | 2003 | 0383 | 1363
FHA domain containing protein | 0816 | 0060 (1,0) | 0026 | 1103
PaREP1 domain containing protein | 3235 | 0464 (0,3) | 1514 (1,0)
30S ribosomal protein S19e | 3043 (1,1) | 1790 | 0988 | 0440 (0,39)
CutA1 divalent ion tolerance protein | 2325 | 1539 | 1667 | 1044 (1,0)
Nitrilase/cyanide hydratase and apolipoprotein N-acyltransferase | 2075 (1,1) | 2019 | 0203 | 1857
Inner-membrane translocator | 2083 (1,0) | 1504 | 0826 | 0317 (0,2)
30S ribosomal protein S7P | 0733 (1,1) | 0001 | 0006 | 0655 (0,3)
Ribosomal protein L11 | 3104 (1,0) | 1602 | 1832 | 0464
Metallophosphoesterase | 3211 | 1639 (1,0) | 0239 | 1924
Acetolactate synthase, large subunit, biosynthetic type | 3300 | 1724 (1,1) | 1753 | 0554 (0,2)
NAD+ synthetase | 1219 (0,1) | 0310 (1,0) | 0793 | 1302
30S ribosomal protein S3Ae | 3472 (1,0) | 1852 | 1182 | 0771 (0,1)
Band7 protein | 0750 | 0015 | 2166 (1,0) | 1055
TGS domain protein | 1649 | 1844 | 0581 | 1670 (1,0)
MoaD family protein | 0727 (0,1) | 2368 | 2136 (1,0) | 0649
Putative circadian clock protein, KaiC | 0729 (1,1) | 2366 (0,2) | 0010 | 0651
Tryptophanyl-tRNA synthetase | 3091 (1,0) | 1612 | 1822 | 0454
Aldehyde ferredoxin oxidoreductase | 0622 (1,4) | 2285 | 2057 | 0738
Inner-membrane translocator | 3350 | 1761 | 0445 (0,1) | 0591 (1,0)
Tyrosyl-tRNA synthetase | 0630 | 2290 (1,0) | 2062 | 0733
NADH-quinone oxidoreductase, B subunit | 2928 | 1001 | 1957 | 0336 (1,0)
Prephenate dehydratase | 0893s (0,51) | 0111 | 0075 | 1150 (1,0)

Orthologous genes with 5′ sequencing reads. Orthologous groups are shown in each row where the locus tag number (e.g., 1645 for gene PAE1645) is followed by counts of (antisense, sense) reads. Groups are ranked by the total number of reads found within groupings formed by the number of species in a group with antisense sequencing reads. Read counts are accumulated by considering the largest region covered by at least one read in an overlapping region along a given strand, and assigning the read count to that region. Footnoted gene IDs have associated snoRNA-like sRNA (C/D box or H/ACA-like) – a, antisense oriented; s, sense oriented.
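
The read-accumulation rule stated in the caption can be illustrated with a short sketch. The Python code below is one possible reading of that rule, not the pipeline used in the study: reads that overlap on the same strand are merged into the largest contiguous covered region, and the number of merged reads is assigned to that region. The `Read` type, the half-open coordinate convention, and the function name `accumulate_regions` are assumptions introduced only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical mapped read; coordinates are half-open [start, end)."""
    start: int
    end: int
    strand: str  # "+" or "-"


def accumulate_regions(reads):
    """Merge overlapping same-strand reads and count the reads per region.

    Returns a sorted list of (start, end, strand, read_count) tuples.
    """
    by_strand = defaultdict(list)
    for read in reads:
        by_strand[read.strand].append(read)

    regions = []
    for strand, group in by_strand.items():
        group.sort(key=lambda r: r.start)
        cur_start = cur_end = None
        count = 0
        for read in group:
            if cur_start is not None and read.start < cur_end:
                # Read overlaps the current region: extend it and count the read.
                cur_end = max(cur_end, read.end)
                count += 1
            else:
                # No overlap: close the current region (if any) and start a new one.
                if cur_start is not None:
                    regions.append((cur_start, cur_end, strand, count))
                cur_start, cur_end, count = read.start, read.end, 1
        if cur_start is not None:
            regions.append((cur_start, cur_end, strand, count))
    return sorted(regions)


# Toy example (invented coordinates, not data from the study).
reads = [Read(10, 40, "+"), Read(30, 60, "+"), Read(100, 120, "+"), Read(20, 50, "-")]
regions = accumulate_regions(reads)
```

In this toy input the two overlapping plus-strand reads collapse into a single region with a count of 2, while the antisense read is kept separate because regions are built per strand.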

Table A2

Product | P. aerophilum | P. arsenaticum | P. calidifontis | P. islandicum
Electron transfer flavoprotein, alpha subunit | 0721a (381,54) | 2372a (258,14) | 2132a (2145,21) | 0645a (22,6)
DNA-directed RNA polymerase, M/15 kDa subunit | 3480a (153,0) | 1847 (1,0) | 1177a (94,0) | 0776a (31,0)
NAD-dependent deacetylase | 3500a (83,2) | 1959a (6,0) | 1963 (1,0) | 0793a (26,1)
SMC domain protein | 2280a (2,7) | 1811a (9,0) | 1637a (7,1) | 0884a (10,0)
Triosephosphate isomerase | 1501 (2,1) | 0622 (1,0) | 0817 (13,0) | 1585 (1,0)
Metallophosphoesterase | 2243a (287,0) | 1913 | 0956a (26,0) | 0802a (63,0)
Resolvase, N-terminal domain | 3513a (250,0) | 1963a (66,0) | 1967 | 0797a (27,0)
Succinate dehydrogenase subunit D | 0719a (94,10) | 2361a (129,0) | 2130 | 0668a (46,0)
HhH-GPD family protein | 0880a (23,2) | 0101a (10,3) | 0066 | 1140a (235,0)
Elongation factor EF-2 | 0332a (183,0) | 2139 | 0213a (15,0) | 1957a (3,0)
Twin-arginine translocation protein, TatA/E family subunit | 1546a (32,16) | 0666 | 0534a (53,1) | 1615a (7,2)
Aldo/keto reductase | 2929a (66,14) | 1002a (1233,1) | 0966
Putative agmatinase | 2260 | 1823 | 1626a (13,2) | 0897a (611,2)
MazG nucleotide pyrophosphohydrolase | 1159 (133,1) | 0222 (352,2) | 0722
Ferric uptake regulator, Fur family | 2309a (128,0) | 1526a (141,1) | 1653 | 1023
Seryl-tRNA synthetase | 3158a (50,0) | 1667 | 1802 | 0528a (39,36)
Uridylate kinase | 3159 | 1665 | 1804 (20,0) | 0530 (39,0)
Purine and other phosphorylases, family 1 | 1476 | 0610 | 0814a (24,0) | 1572 (23,0)
Isoleucyl-tRNA synthetase | 1617 | 1993a (5,0) | 0601a (2,1) | 1650
Transcriptional regulator, Fis family | 3027s (4,10) | 1779a (3,0) | 0999s (0,5) | 0429s (0,47)
GCN5-related N-acetyltransferase | 3246 | 1807 | 1556a (4,0) | 0488 (2,0)
Putative circadian clock protein, KaiC | 0729 | 2366a (5,0) | 0010 | 0651 (1,0)
Conserved protein (RNA polymerase related?) | 1975 | 2051 (1,0) | 1587 | 1800 (1,0)
Lysine exporter protein (LYSE/YGGA) | 2077 | 2018 | 0708a (5260,0) | 1858
Translation initiation factor IF-2 subunit gamma | 0064 | 1171 | 0242a (162,0) | 0078
Alpha-l-glutamate ligases, RimK family | 1818 | 0723 | 0506a (116,0) | 1747
Oxidoreductase, molybdopterin binding | 0389 | 0833 | 1263 | 1366a (84,0)
Ribosomal protein L25/L23 | 1972 | 2048 | 1585a (66,8) | 1798
Proliferating-cell nuclear antigen-like protein | 0720 | 2362 | 2131a (36,0) | 0667
3-dehydroquinate synthase | 1685 | 1827 | 0566a (25,0) | 1689
DNA polymerase, beta domain protein region | 1153 | 1067 | 0856 (23,0)
Haloacid dehalogenase domain protein hydrolase | 1785 | 0739 | 0554a (20,0) | 1734
Mn2+-dependent serine/threonine protein kinase | 2192 | 1948 | 0924 | 0825 (12,0)
Radical SAM domain protein | 2153 (0,1) | 0818 | 1068 | 0189a (9,0)
DNA polymerase I | 2180 | 0798 | 1087 | 0816 (6,0)
Ribonuclease HII | 1216 | 0312 | 0780 | 1305 (5,0)
Bifunctional GMP synthase/glutamine amidotransferase protein | 3369 | 1772 | 1723 | 0600 (4,0)
Band 7 protein | 0750 | 0015 | 2166 | 1055 (4,0)
Alpha-l-glutamate ligases, RimK family | 0645 (4,0) | 2302 | 2074 | 0721
Ribonucleoside-diphosphate reductase, adenosylcobalamin-dependent | 3155 (4,0) | 1670 | 1797 | 0525
Thermosome | 3273 (0,3) | 1704 (0,1) | 1771 | 0501 (3,5)
Metallophosphoesterase | 1087 (3,0) | 0270 | 1512 | 1254
Peptidase M24 | 2025 | 2086 | 1010 | 1836 (3,0)
Pyruvate/ketoisovalerate oxidoreductase, gamma subunit | 3279 | 1708 | 1767 | 0497 (3,0)
DNA polymerase, beta domain protein region | 0045 | 1137 (3,0) | 0385
Creatininase | 2983 | 1350 | 1124 | 0383 (2,0)
Acetyl-CoA acetyltransferase | 1220 | 0309 | 0781 | 1301 (2,2)
Indole-3-glycerol-phosphate synthase | 0570 (2,0) | 2240 | 1213 | 0124 (0,2)
Regulatory protein, ArsR | 0731 | 2364 | 0008 (2,0) | 0653
PaREP1 domain containing protein | 0002 | 1095 | 1373a (2,0)
Putative signal-transduction protein with CBS domains | 3588 | 1394 | 0254 (2,0)
Exosome complex exonuclease Rrp41 | 2207 (2,1) | 1937 | 0933 | 0836
ABC transporter related | 3413 (2,0) | 1175 | 1045 | 0976
Uroporphyrinogen III synthase HEM4 | 0589 | 2250 | 1712 | 0116 (2,0)
Potassium transport membrane protein, conjectural | 2422 (2,0) | 1446 | 0314 | 0883
Undecaprenyl diphosphate synthase | 2942 (2,0) | 1319 | 1157 | 0348
Nucleotidyl transferase | 0837 | 0080 | 0043 | 1119 (2,0)
Carbon starvation protein CstA | 1423 (2,0) | 0894 | 0860
Carboxypeptidase Taq | 0885 (1,2) | 0104 | 0069 (0,1) | 1143
Leucyl-tRNA synthetase | 1107 | 0260 | 0691 | 1246 (1,0)
HEPN domain protein | 1894 | 0820 (1,0) | 1501
DNA-directed RNA polymerase subunit E, RpoE2 | 3563 (1,0) | 2230 | 1991 | 0921
5-carboxymethyl-2-hydroxymuconate Δ-isomerase | 2688 | 0535 | 1503 (1,0)
ATPase | 1789 | 0736 | 1446 (1,0)
Oligosaccharyl transferase, STT3 | 3030 (1,0) | 1781 | 0997 | 0431
paREP7 | 0906 | 0492 | 0185 (1,0)
Haloacid dehalogenase domain protein hydrolase | 2017s (0,12) | 2080s (0,15) | 1016s (1,4) | 1830
Egghead-like protein | 0042 | 1076 | 2043 | 0056 (1,0)
Putative transcriptional regulator, CopG family | 1443 (0,2) | 0399 | 0796 | 1563 (1,1)
Asparaginyl-tRNA synthetase | 2973 (1,0) | 1342 | 1133 | 0375
Succinate dehydrogenase iron-sulfur subunit | 0717 | 2359 | 2128 | 0670 (1,0)
Peptidase T2, asparaginase 2 | 3083 (1,0) | 1892 | 0970 | 0908
Radical SAM domain protein | 0596 | 2255 | 1716 | 0113 (1,0)
30S ribosomal protein S25e | 2188 (0,1) | 0790 (0,1) | 1079 | 0808 (1,0)
Ribosomal-protein-alanine acetyltransferase | 2246 (1,0) | 0958 | 1001
PilT protein domain protein | 3561 (1,0) | 1614 | 1989 | 0923
Peptidase S8 and S53, subtilisin, kexin, sedolisin | 0712 | 2355 | 2124 (1,0) | 0674
Nitrilase/cyanide hydratase and apolipoprotein N-acyltransferase | 2075s (0,49) | 2019 | 0203 | 1857s (1,8)
Beta-lactamase domain protein | 2160 | 0810 | 1074 | 0803 (1,0)
Xanthine dehydrogenase accessory factor | 2669 | 0253 | 1324 (1,0)
ABC transporter related | 3269 | 1702 (1,0) | 1774 | 0503
tRNA CCA-pyrophosphorylase | 3325 | 1740 | 1737 | 0570 (1,0)
Starch synthase | 3429 | 1878 (1,0) | 1038 | 0968
Dual specificity protein phosphatase | 1536 | 0675 (1,0) | 0541 | 1603
Putative endoribonuclease L-PSP | 3003s (1,62) | 1258s (0,210) | 1096s (0,81) | 0414s (0,65)
Sulfite reductase, dissimilatory-type beta subunit | 2597 | 1212 (1,0) | 1456
Methyltransferase small | 0261 | 2199 | 0236 | 0747 (1,0)
Putative transcriptional regulator, AsnC family | 1507 (1,0) | 0627 (0,1) | 0822 | 1590
Methyltransferase type 11 | 1165 | 0216 (1,0) | 1364 | 1338
Serine/threonine protein kinase | 0815 | 0059 (1,0) | 0025 | 1102
Transcriptional regulator, PadR-like family | 0013 | 1087 (1,0) | 0038
Inner-membrane translocator | 3350 (1,0) | 1761 | 0445 | 0591
Geranylgeranyl reductase | 2989 | 1355 | 1119 | 0388 (1,0)
Extracellular solute-binding protein, family 5 | 2391 | 1494 | 0422 | 0602 (1,0)
2-methylcitrate synthase/citrate synthase II | 1689 (1,0) | 2234 | 0563 | 1692
30S ribosomal protein S6e | 1505 (0,1) | 0626 (1,0) | 0821 | 1589

Orthologous genes with 3′ sequencing reads. Orthologous groups, read counts, and footnotes displayed are as described in Table A1.
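
The cell notation and group ranking that this table shares with Table A1 can be sketched as follows. This Python example is a minimal illustration under stated assumptions, not the authors' code: each cell is read as a four-digit locus tag, an optional footnote letter (a or s), and an optional (antisense, sense) count pair, and groups are ordered first by the number of species with antisense reads and then by summed antisense reads. The Table A1 caption says only "total number of reads", so using antisense reads as the second key is an assumption, as are the names `parse_cell` and `rank_groups`.

```python
import re

# Locus tag, optional footnote letter, optional "(antisense,sense)" counts.
CELL = re.compile(r"^(\d{4})([as]?)(?:\s*\((\d+),(\d+)\))?$")


def parse_cell(cell):
    """Parse e.g. '0721a (381,54)' into (locus, footnote, antisense, sense)."""
    match = CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    locus, footnote, antisense, sense = match.groups()
    # A cell without a count pair is treated as having no reads.
    return locus, footnote, int(antisense or 0), int(sense or 0)


def rank_groups(groups):
    """Order products by (#species with antisense reads, summed antisense reads).

    `groups` maps a product name to its list of cell strings; both keys are
    sorted in descending order. The second key is an assumed interpretation.
    """
    def sort_key(item):
        parsed = [parse_cell(cell) for cell in item[1]]
        species_with_antisense = sum(1 for p in parsed if p[2] > 0)
        total_antisense = sum(p[2] for p in parsed)
        return (-species_with_antisense, -total_antisense)

    return [name for name, _ in sorted(groups.items(), key=sort_key)]


# Three rows taken from Table A2.
groups = {
    "Ferric uptake regulator, Fur family": ["2309a (128,0)", "1526a (141,1)", "1653", "1023"],
    "Triosephosphate isomerase": ["1501 (2,1)", "0622 (1,0)", "0817 (13,0)", "1585 (1,0)"],
    "SMC domain protein": ["2280a (2,7)", "1811a (9,0)", "1637a (7,1)", "0884a (10,0)"],
}
ranked = rank_groups(groups)
```

Under these assumptions the three example rows come out in the same relative order as they appear in the table, with the two groups supported in all four species ahead of the ferric uptake regulator group, which has many more reads but antisense support in only two species.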

Table A3

Product | P. aerophilum | P. arsenaticum | P. calidifontis | P. islandicum
Hypothetical protein | 3282 (3,3) | 1710 | 1765 | 0495 (11,6)
Hypothetical protein | 0301 | 2159a (673,0) | 0225 | 2002
Hypothetical protein | 3499 | 1958 | 1962 | 0792a (26,1)
Hypothetical protein | 0432 (6,0) | 0474 | 0140
Hypothetical protein | 1798 | 0175 (4,0) | 0509 | 1742
Protein of unknown function DUF107 | 0749 | 0014 | 2167 | 1056 (4,0)
Hypothetical protein | 1503 (3,0) | 0624 | 0819 | 1587
Hypothetical protein | 2934 | 1279 (3,0) | 1165 | 0340 (0,1)
Hypothetical protein | 1517 | 0632 (3,0) | 0877
Hypothetical protein | 3546 | 1625 (3,0) | 1977 | 0933
Hypothetical protein | 0433 (3,6) | 0479 | 0139
Hypothetical protein | 3051s (0,284) | 1797 | 0981s (0,152) | 0447s (3,129)
Hypothetical protein | 0838 (2,2) | 0081 | 0044 | 1120
Hypothetical protein | 1710 (0,3) | 0785 (2,3) | 0620 | 1698
Hypothetical protein | 0728 | 2367 (2,0) | 2137 | 0650
Hypothetical protein | 1147 | 0229 | 0706 | 1284 (2,0)
Hypothetical protein | 1522 | 0636 | 0874 | 1594 (2,0)
Hypothetical protein | 2941 (2,0) | 1318 | 1158 | 0347
Hypothetical protein | 2338 | 1549 | 1677 | 0634 (2,0)
Hypothetical protein | 2822 | 0279 | 1187 | 1388 (2,1)
Hypothetical protein | 1943 | 2025 (2,0) | 1297
Hypothetical protein | 2416 (2,0) | 1479 | 0319 | 0879 (0,1)
Hypothetical protein | 0746 (2,0) | 0012 | 2168 | 1006
Hypothetical protein | 1069 (0,13) | 0281 | 0680 | 1257 (2,0)
Hypothetical protein | 3081 (2,0) | 1891 | 0969 | 0907 (0,1)
Hypothetical protein | 0800 | 0050 | 0016 (0,1) | 1092 (2,4)
Protein of unknown function DUF77 | 1158 | 0223 (2,0) | 0711 | 1327
Hypothetical protein | 3135 (0,2) | 1683 | 1783 | 0512 (2,0)
Hypothetical protein | 1683 | 1828 (1,0) | 0567 (0,1) | 1688
Hypothetical protein | 0789 | 0040 (1,0) | 2142 | 1079
Protein of unknown function DUF72 | 2078 (1,0) | 2017 | 0205 | 1859
Hypothetical protein | 1641 (0,3) | 1979 (1,1) | 0587 | 1664 (0,1)
Protein of unknown function DUF54 | 2213 | 1931 | 0938 | 0842 (1,0)
Protein of unknown function DUF437 | 0638 (1,0) | 2296 | 2068 | 0727
Hypothetical protein | 3550 | 1622 | 1974 | 0930 (1,0)
Hypothetical protein | 3467 | 1857 | 1188 | 0993 (1,0)
Hypothetical protein | 1318 | 0471 | 1365 (1,1)
Hypothetical protein | 3556 (1,0) | 1619 | 1969 | 0927
Hypothetical protein | 2598 | 1211 (1,0) | 1455
Hypothetical protein | 1643 (1,0) | 1977 | 0585 | 1666
Hypothetical protein | 2824 | 0884 (1,0) | 0867 | 1387
Hypothetical protein | 2322 | 1537 | 1664 | 1037 (1,0)
Hypothetical protein | 2177 | 0799 | 1088 | 0817 (1,0)
Hypothetical protein | 1449 (0,1) | 0601 (0,1) | 0800 (1,0) | 1567 (0,1)
Protein of unknown function DUF52 | 0818 | 0062 (1,0) | 0028 | 1105
Hypothetical protein | 1173 | 0212 | 0743 | 1320 (1,0)
Hypothetical protein | 3004s (1,62) | 1259 | 1095 | 0415s (0,65)
Protein of unknown function DUF72 | 3079 (0,1) | 1889 (1,3) | 0967 | 0905
Hypothetical protein | 2268 (1,0) | 1820 | 1629 | 0894
Hypothetical protein | 3324 | 1739 | 1738 (0,1) | 0569 (1,0)
Protein of unknown function UPF0027 | 0998 | 0172 | 0141 (1,0) | 1219
Hypothetical protein | 2210 | 1934 (1,0) | 0941 | 0839
Hypothetical protein | 1797 | 0174 (1,1) | 0508 | 1741
Hypothetical protein | 0718 | 2360 | 2129 | 0669 (1,0)
Hypothetical protein | 1448 | 0600 | 0799 | 1566 (1,0)
Hypothetical protein | 1613 | 2156 | 0603 | 1648 (1,10)
Hypothetical protein | 1018 | 0184 | 0656 (0,1) | 1203 (1,0)
Hypothetical protein | 2429 | 1441 | 0309 | 1892 (1,0)
Hypothetical protein | 3148 | 1674 | 1792 | 0521 (1,0)
Hypothetical protein | 1676 | 1833 | 1682 (1,0)
Hypothetical protein | 2403 (1,0) | 1470 | 0327 | 0871 (0,3)

Hypothetical genes with 5′ sequencing reads. Orthologous groups, read counts, and footnotes displayed are as described in Table A1.

Table A4

Product | P. aerophilum | P. arsenaticum | P. calidifontis | P. islandicum
Protein of unknown function DUF6, transmembrane | 1545s (16,32) | 0667s (2,20) | 0535s (1,53) | 1614s (2,7)
Hypothetical protein | 1519a (550,1) | 0634a (197,0) | 0875a (202,0) | 1596
Hypothetical protein | 0577a (6,0) | 2243a (18,7) | 1195a (2,0) | 0121
Hypothetical protein | 1836 (0,2) | 0710 (1,0) | 0502 (5,0) | 1752 (2,0)
Hypothetical protein | 3249 (4,0) | 1805 | 1855 | 0485 (2148,0)
Hypothetical protein | 1234 (349,0) | 0295 (1,0) | 1511 | 0244
Protein of unknown function DUF1614 | 2020 | 2082 (15,0) | 1014 | 1832 (58,0)
Hypothetical protein | 1687a (48,4) | 2232a (17,3) | 0565 | 1690
Hypothetical protein | 3138a (15,23) | 1680a (27,2) | 1786 | 0515
Hypothetical protein | 0889 | 0108 | 0073 (4,0) | 1147a (28,3)
Protein of unknown function DUF192 | 2955a (15,1) | 1329 | 1147a (10,4) | 0358
Hypothetical protein | 3550 | 1622 | 1974 (1,0) | 0930s (2,44)
Hypothetical protein | 3005a (177,0) | 1260 | 1094 | 0416
Hypothetical protein | 3630 | 1030 | 2015 | 0003 (74,0)
Hypothetical protein | 3245a (63,1) | 1808 | 0487
Hypothetical protein | 2069 | 2102 | 0198a (44,0) | 1852
Hypothetical protein | 3468 | 1856 | 1186 | 0994 (31,0)
Hypothetical protein | 0730 | 2365 (18,0) | 0009 | 0652
Hypothetical protein | 3497 | 1956 | 1958 (12,6) | 0790
Hypothetical protein | 0936 | 0136 (8,0) | 0105 | 1172
Hypothetical protein | 3135 (0,1) | 1683 | 1783s (8,6) | 0512s (0,116)
Hypothetical protein | 3156 | 1669 | 1798 | 0526 (6,0)
Hypothetical protein | 0748 | 0013s (6,6) | 1058s (0,1982)
Protein of unknown function DUF62 | 3627 | 2209s (0,5) | 2013 | 0002 (6,0)
Hypothetical protein | 1177 | 0209 | 0746a (5,16) | 1317
Hypothetical protein | 3189 | 1645 | 1819 | 0545 (4,0)
Hypothetical protein | 3295 (4,0) | 1719 | 1758 | 0550
Hypothetical protein | 2549 | 0845 (3,0) | 1351 | 0257
Hypothetical protein | 1173 (3,0) | 0212 | 0743 | 1320
Protein of unknown function UPF0027 | 0998 | 0172 (3,0) | 0141 | 1219
Hypothetical protein | 2504 (3,0) | 1412 | 1927
Hypothetical protein | 2326 | 1541 | 1669 | 1042 (3,0)
Hypothetical protein | 1549s (3,5) | 0661s (0,2) | 0531s (0,5) | 1618s (0,4)
Hypothetical protein | 0611s (2,15) | 0813s (0,24) | 1573 (0,23)
Protein of unknown function DUF64 | 0371 | 1533 | 1367 (2,0)
Hypothetical protein | 2285 (2,0) | 1583 | 1640 | 1010
Hypothetical protein | 1307 | 0473 | 1351s (2,14)
Hypothetical protein | 1816 | 0724 (2,0) | 0422 (0,1)
Hypothetical protein | 3251 (2,0) | 1803 | 1853 | 0483
Hypothetical protein | 1497 | 0619 | 0805 | 1582 (2,0)
Hypothetical protein | 1895 | 0678 (2,0) | 0646
Protein of unknown function DUF224, cysteine-rich region domain protein | 1762 | 0754 | 0545 | 1721 (2,0)
Hypothetical protein | 3568 | 2225 | 1996 (2,0) | 0916
Hypothetical protein | 1998 | 2066 | 1030 | 1817 (2,0)
Protein of unknown function DUF115 | 2328 | 1542 | 1670 | 1041 (2,0)
Hypothetical protein | 2337 | 1548 | 1676 | 0635 (2,0)
Protein of unknown function DUF100 | 0944 (1,0) | 0141 | 0111 | 1181
Protein of unknown function DUF340, membrane | 1479 | 0612 | 0812 | 1574 (1,0)
Hypothetical protein | 3304 (1,0) | 1727 | 1750 | 0557
Hypothetical protein | 0882 (1,0) | 0102 | 0067 | 1141
Hypothetical protein | 1130 | 0243 (1,0) | 0697 | 1241
Hypothetical protein | 1449 | 0601 | 0800 | 1567 (1,0)
Hypothetical protein | 2190 | 1946 | 0927 | 0827 (1,0)
Hypothetical protein | 0927 (1,0) | 0131 | 0083 | 1161
Hypothetical protein | 0708 | 2353 | 2122 | 0676 (1,0)
Hypothetical protein | 2606 | 1232 (1,0) | 1431
Hypothetical protein | 2187 | 0791 (1,0) | 1080 | 0809 (0,1)
Hypothetical protein | 0239 | 1512 (1,0) | 0690
Protein of unknown function DUF1028 | 3380 | 1006 (1,0) | 0160
Hypothetical protein | 2154 (1,0) | 0817 | 1069 | 0942
Hypothetical protein | 3161 | 1666 | 1803 | 0529 (1,0)
Hypothetical protein | 2058 | 2311 | 0401 | 1850 (1,0)
Hypothetical protein | 0840 | 0083 (1,0) | 0046 | 1122

Hypothetical genes with 3′ sequencing reads. Orthologous groups, read counts, and footnotes displayed are as described in Table A1.

Summary

Keywords

antisense small RNA, archaea, transcriptome sequencing, comparative genomics, gene regulation, C/D box small RNA

Citation

Bernick DL, Dennis PP, Lui LM and Lowe TM (2012) Diversity of Antisense and Other Non-Coding RNAs in Archaea Revealed by Comparative Small RNA Sequencing in Four Pyrobaculum Species. Front. Microbiol. 3:231. doi: 10.3389/fmicb.2012.00231

Received

28 April 2012

Accepted

06 June 2012

Published

02 July 2012

Volume

3 - 2012

Edited by

Frank T. Robb, University of California, USA

Reviewed by

Mircea Podar, Oak Ridge National Laboratory, USA; Imke Schroeder, University of California Los Angeles, USA; Matthias Hess, Washington State University, USA; Lanming Chen, Shanghai Ocean University, China

Copyright

*Correspondence: Todd M. Lowe, Department of Biomolecular Engineering, University of California, 1156 High Street, Santa Cruz, CA 95064, USA. e-mail:

This article was submitted to Frontiers in Evolutionary and Genomic Microbiology, a specialty of Frontiers in Microbiology.

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
