Reannotation of the Ribonucleotide Reductase in a Cyanophage Reveals Life History Strategies Within the Virioplankton

Ribonucleotide reductases (RNRs) are ancient enzymes that catalyze the reduction of ribonucleotides to deoxyribonucleotides. They are required for virtually all cellular life and are prominent within viral genomes. RNRs share a common ancestor and must generate a protein radical for direct ribonucleotide reduction. The mechanisms by which RNRs produce radicals are diverse and divide RNRs into three major classes and several subclasses. The diversity of radical generation methods means that cellular organisms and viruses typically contain the RNR best-suited to the environmental conditions surrounding DNA replication. However, such diversity has also fostered high rates of RNR misannotation within subject sequence databases. These misannotations have resulted in incorrect translative presumptions of RNR biochemistry and have diminished the utility of this marker gene for ecological studies of viruses. We discovered a misannotation of the RNR gene within the Prochlorococcus phage P-SSP7 genome, which caused a chain of misannotations within commonly observed RNR genes from marine virioplankton communities. These RNRs are found in marine cyanopodo- and cyanosiphoviruses and are currently misannotated as Class II RNRs, which are O2-independent and require cofactor B12. In fact, these cyanoviral RNRs are Class I enzymes that are O2-dependent and may require a di-metal cofactor made of Fe, Mn, or a combination of the two metals. The discovery of an overlooked Class I β subunit in the P-SSP7 genome, together with phylogenetic analysis of the α and β subunits, confirms that the RNR from P-SSP7 is a Class I RNR. Phylogenetic and conserved residue analyses also suggest that the P-SSP7 RNR may constitute a novel Class I subclass. The reannotation of the RNR clade represented by P-SSP7 means that most lytic cyanophage contain Class I RNRs, while their hosts, B12-producing Synechococcus and Prochlorococcus, contain Class II RNRs. By using a Class I RNR, cyanophage avoid a dependence on host-produced B12, a more effective strategy for a lytic virus. The discovery of a novel RNR β subunit within cyanopodoviruses also implies that some unknown viral genes may be familiar cellular genes that are too divergent for homology-based annotation methods to identify.


INTRODUCTION
Viruses are the most abundant biological entities on the planet, with an estimated 10^31 viral particles globally (Suttle, 2005). While viruses are known to infect cellular life from all three domains, viruses largely influence ecosystems through the infection of microbial hosts. In the oceans, 10^23 viral infections are estimated to take place every second, resulting in the mortality of approximately 20% of marine microbial biomass each day (Suttle, 2007). Cell lysis resulting from viral infection influences ocean biogeochemical cycling by returning particulate and dissolved organic matter to the water column (Suttle, 2005; Jover et al., 2014), where it may be taken up by microbial populations to fuel new growth, or exported to the deep ocean (Suttle, 2007; Laber et al., 2018). Viral predation can also influence biogeochemical cycles through the restructuring of microbial populations (Rastelli et al., 2017), metabolic reprogramming of host cells (Lindell et al., 2005; Puxty et al., 2016), and horizontal gene transfer (Lindell et al., 2004).
While the importance of viruses within marine microbial communities is now commonly accepted, the biological and ecological details of viral-host interactions that influence the transformations of nutrient elements in ecosystems are largely unknown. Bridging the gap between genetic observations and ecosystem-level effects requires an understanding of the connections between genes and phenotypes. Among viruses infecting marine microbes, genes involved in nucleotide metabolism and viral replication are highly predictive of viral phenotype and evolutionary history (Iranzo et al., 2016; Kazlauskas et al., 2016; Dolja and Koonin, 2018). For example, a point mutation in motif B of the family A DNA polymerase gene (polA) is indicative of viral lifestyle (Schmidt et al., 2014; Chopyk et al., 2018).
All RNRs share a common catalytic mechanism in which a thiyl radical in the active site removes a hydrogen atom from the 3′ hydroxyl group of the ribose sugar, thereby activating the substrate (Licht et al., 1996; Logan et al., 1999; Lundin et al., 2015). The mechanism by which the thiyl radical is generated varies greatly among RNRs and provides the biochemical basis dividing the three major RNR classes. Extant RNRs are also commonly divided by their reactivity with O2 (Reichard, 1993): Class I RNRs are O2-dependent; Class II RNRs are O2-independent; and Class III RNRs are O2-sensitive (Figure 1A).
Class III RNRs are the most dissimilar of the extant types, bearing no sequence similarity to Class I and II RNRs despite a common ancestry (Aravind et al., 2000; Lundin et al., 2015). They consist of two subunits that create radicals by cleaving S-adenosylmethionine molecules using iron-sulfur clusters (Mulliez et al., 1993). Class III RNRs are inactivated by O2 (Eliasson et al., 1992; King and Reichard, 1995), and are therefore found only in strict or facultative anaerobes and their viruses (Fontecave et al., 2002). Class II RNRs are the only RNRs that do not require separate subunits for radical generation and catalysis (Nordlund and Reichard, 2006), and are instead encoded by a single gene, nrdJ. Class II RNRs require adenosylcobalamin (AdoCbl), a form of B12, to produce a radical (Blakley and Barker, 1964; Lundin et al., 2010). There are two types of Class II RNR: monomeric and dimeric (Nordlund and Reichard, 2006).
Class I RNRs are the most recent and the most complex of the extant RNRs (Figure 1B). Radical generation takes place on a smaller subunit (β or R2) and is transferred to a larger catalytic subunit (α or R1) (Jordan and Reichard, 1998). The α subunit is encoded by nrdA or nrdE and the β subunit is encoded by nrdB or nrdF. These genes form exclusive pairs: nrdA is found only with nrdB (nrdAB), and nrdE is found only with nrdF (nrdEF). Notably, the Class I α subunit is thought to have evolved directly from dimeric Class II RNRs, so they share several catalytic sites, though sequence similarity between the two classes remains low otherwise. The radical initiation mechanism of the β subunit further divides Class I RNRs into five subclasses (a-e) (Cotruvo et al., 2011, 2013; Blaesi et al., 2018; Rose et al., 2018; Srinivas et al., 2018). The subclasses are divided based on the identity of the metallocofactor (or absence thereof), the identity of the oxidant, and whether the β subunit contains (and utilizes) the tyrosine radical site (Figure 1B). Class I RNRs are generally presumed to be subclass Ia enzymes unless they can be assigned to another subclass based on sequence homology to a close relative that has been biochemically characterized (Berggren et al., 2017).
While the diversity of RNR biochemistry makes this enzyme an excellent marker for inferring aspects of viral biology, proper annotation of RNR genes is imperative for this purpose. Unfortunately, this same diversity has also fostered high misannotation rates, with one study reporting that 77% of RNRs submitted to GenBank had misannotations (Lundin et al., 2009). Most of those misannotations (88%) were due to RNR sequences being assigned to the wrong class. In response, a specialty database (RNRdb) was created for maintaining a collection of correctly annotated RNRs (Lundin et al., 2009). Even with resources such as the RNRdb, however, the complexity of RNR annotation remains daunting for non-experts. Class I RNRs can be particularly difficult to identify, as their classification relies largely on the annotation of both an α and β subunit.
Our prior work examining the phylogenetic relationships among RNRs from marine virioplankton revealed two large clades of cyanophage RNRs, the first made up of Class I enzymes and the second of Class II RNRs. The hosts of these cyanophage, marine Synechococcus and Prochlorococcus, carry Class II RNRs. Thus, the presence of such a large cyanophage clade with Class I RNRs was intriguing, and in contradiction to earlier findings that phage tend to carry an RNR gene similar to that of their host cell (Dwivedi et al., 2013). Now, the reanalysis of an RNR from the Class II-carrying cyanophage has revealed that the RNRs in this second clade are, in fact, Class I RNRs that were misannotated as Class II. The reannotation of the RNR from Prochlorococcus phage P-SSP7 from Class II to Class I implies that most known cyanophage carry RNRs that are not host-derived, nor dependent on B12. Additionally, our analysis suggests that the P-SSP7 RNR may represent a novel Class I RNR subclass.

The Cyano SP Clade
The RNR from Prochlorococcus phage P-SSP7 is a member of the 'Cyano II' RNR clade, as recognized by Sakowski et al. (2014) in a study of virioplankton RNRs. Based on our analysis, and to avoid confusion with the nomenclature for RNR classes, we have renamed the Cyano II clade to the Cyano SP clade, as RNRs in this clade are exclusively found within the cyanosipho- and cyanopodoviruses. We have also renamed the Cyano I clade to the Cyano M clade, as RNRs in this clade are exclusively seen in cyanomyoviruses. The aforementioned study included ten reference sequences from the (now) Cyano SP clade. Eight of those ten references were used in the current study (Table 1). Cyanophage KBS-S-1A was excluded because its genome has not been fully sequenced and Synechococcus phage S-CBP3 was excluded because its RNR was missing a conserved catalytic site. P-SSP7 was chosen as the clade representative because it is the most well-studied phage from this group, has a full genome available, and is the source of the original RNR misannotation.

Putative α and β Subunit Identification
Putative α and β subunit sequences were extracted from the genome of Prochlorococcus phage P-SSP7 (genome accession no. NC_006882.2). The putative Class I α subunit is the RNR currently identified in the P-SSP7 genome as ribonucleotide reductase class II (accession no. YP_214197.1) and was downloaded from NCBI in April 2018. As P-SSP7 has no annotated β subunit, candidate β sequences were identified based on length filtering of unannotated protein sequences. While Class I β subunits are typically between 350 and 400 amino acids (Kolberg et al., 2004), we expanded our search range to avoid excluding any potential Class I β subunits. Four unannotated candidate proteins between 200 and 500 amino acids in length were downloaded for analysis in May 2018. Candidate proteins were searched against the Conserved Domain Database using batch CD-Search (Marchler-Bauer et al., 2017). The P-SSP7 putative Class I RNR α subunit and four candidate β subunit proteins were imported into Geneious to analyze conserved residues. The putative α subunit peptide sequence was aligned with one representative of each of the known Class I subclasses (Supplementary Table 1) using the MAFFT v7.388 Geneious plug-in (Katoh and Standley, 2013) on the FFT-NS-i x1000 (iterative refinement method with 1,000 iterations) setting with the BLOSUM62 scoring matrix. If necessary, alignments were manually modified to ensure that annotated active sites in the subclass representatives were properly aligned. Where possible, the references have been biochemically characterized and have corresponding crystal structures. Active sites were annotated for each of the subclass representatives based on literature reports and crystal structures. Residues from the putative P-SSP7 Class I α subunit aligning with active sites in subclass representatives were recorded (Supplementary Table 2).
Candidate Class I β subunit proteins were analyzed individually in the same manner, using the β subunits corresponding to the Class I α subclass representatives (Supplementary Table 1). P-SSP7 candidate β subunit proteins lacking key residues were removed from the analysis. This left a single candidate β subunit protein (accession no. YP_214198.1). Putative active sites identified in the putative β subunit are recorded in Supplementary Table 2.
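The length-based candidate search described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run for this study: the `Protein` record, the `candidate_beta_subunits` function, and the choice to treat a product label of "hypothetical protein" (or an empty label) as "unannotated" are all assumptions made for the example. Only the 200 to 500 amino acid window comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    accession: str   # hypothetical identifier used only in this sketch
    product: str     # annotation string from the genome record
    sequence: str    # amino acid sequence, one-letter codes


def candidate_beta_subunits(proteins, min_len=200, max_len=500):
    """Return unannotated proteins whose length falls within [min_len, max_len]."""
    # Assumption: these labels mark a protein as unannotated.
    unannotated_labels = {"hypothetical protein", ""}
    return [
        p for p in proteins
        if p.product.strip().lower() in unannotated_labels
        and min_len <= len(p.sequence) <= max_len
    ]
```

Widening the window beyond the typical 350 to 400 residues trades a few extra candidates for a lower chance of missing a divergent β subunit; the residue-level checks that follow are what narrow the list.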

Phylogenetic Reference Sequence Curation
Sequences from the RNRdb were used as phylogenetic references. The RNRdb pulls RNRs from databases including RefSeq (Coordinators, 2014; O'Leary et al., 2016) and GenBank (Clark et al., 2016), and includes RNRs from cultured and isolated organisms and viruses as well as from environmental metagenomic samples. To create a reference sequence set for phylogenetic analyses, all available Class I α (NrdA and NrdE), Class I β (NrdB and NrdF), and Class II (NrdJ) sequences were downloaded from the RNRdb on August 20, 2018 (Lundin et al., 2009). Sequences were separated into three sets (Class I α, Class I β, and Class II) before sequence curation. Exact and sub-string matches were removed from each set using CD-HIT v4.6 (Li and Godzik, 2006; Fu et al., 2012). Sequences were then divided into smaller groups of similar sequences identified by the RNRdb. RNRdb group assignment is based on phylogenetic clade membership (Berggren et al., 2017; Rozman Grinberg et al., 2018a), so division increased sequence alignment quality. Group names and subclass membership are presented in Table 2. RNRdb sequences were aligned individually by group using the MAFFT v7.388 Geneious plug-in on AUTO setting with the BLOSUM62 scoring matrix. Sequence alignments were visualized and edited in Geneious v10.2.4. Inteins within RNRdb sequences were removed manually after the initial alignment step because they are evolutionarily mobile and confound phylogenetic analyses (Perler et al., 1997; Gogarten et al., 2002). After intein removal, sequences were realigned and those lacking essential catalytic residues were removed, as they are likely nonfunctional. Other than the two tyrosine residues involved in Class I radical transport (Y730 and Y731, E. coli), the same conserved residues were used for Class I α and Class II sequences (Supplementary Table 2). Both intein removal and catalytic residue identification for all groups were done with guidance from the annotated Class I subclass and Class II representatives (Supplementary Table 1).
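The filtering of sequences that lack essential catalytic residues was done by inspection of alignments in Geneious. As a rough programmatic equivalent, the sketch below maps a numbered residue in an annotated reference (for example, Y730 in the E. coli α subunit) to its alignment column and checks what the query carries there. The function names and the `required` mapping are hypothetical, and the sketch assumes a pairwise view of the alignment in which gaps are written as "-".

```python
def reference_position_to_column(aligned_ref, position):
    """Map a 1-based ungapped reference position to a 0-based alignment column."""
    seen = 0
    for column, residue in enumerate(aligned_ref):
        if residue != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError(f"reference has fewer than {position} residues")


def has_required_residues(aligned_ref, aligned_query, required):
    """Check the query at each required reference position.

    required maps a 1-based reference position to the set of residues
    accepted at that position (hypothetical input for this sketch).
    """
    for position, allowed in required.items():
        column = reference_position_to_column(aligned_ref, position)
        if aligned_query[column] not in allowed:
            return False  # a gap or a non-conserved residue fails the check
    return True
```

A query that aligns a gap to a required site fails the check, which matches the intent of discarding sequences that are likely nonfunctional.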

Sequence Preparation
Broadly, three categories of phylogenies were constructed from protein sequences: (i) Class I α-only, (ii) Class I β-only, and (iii) Class I α with Class II. All phylogenies included Cyano SP clade members (Table 1). Class I α and Class II proteins share a common ancestor, but are phylogenetically unrelated to Class I β proteins. Class I α and Class II proteins also share a common catalytic mechanism and several active sites, but are divergent enough that full-length protein sequences from both classes cannot be presented on the same phylogeny (Lundin et al., 2010). Thus, Class I α and Class II protein sequences in this analysis were trimmed to a previously defined region of interest that excluded regions not shared between the two groups (N437-S625, E. coli CQR81730.1). The Class I α-only phylogeny allowed for greater resolution, as the phylogeny could be based on a longer protein sequence segment, being trimmed only before C225 in E. coli (CQR81730.1). Class I β sequences were trimmed to the region between W48 and Y356 (E. coli, KXG99827.1). For Class I α-only and Class I β-only phylogenies, sequences were trimmed near the N-terminus to exclude evolutionarily mobile ATP cone domains (Aravind et al., 2000). Class I β sequences were also trimmed near the C-terminus to exclude any fused glutaredoxin domains (Rozman Grinberg et al., 2018b). In all cases, trimming was guided by annotated Class I subclass (a-e) and Class II subtype (mono- or dimeric) representatives (Supplementary Table 1).
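Trimming to a region defined by reference residues (such as N437-S625 in the E. coli α subunit) amounts to cutting every aligned sequence at the columns where those reference residues sit. The following sketch shows one way to do this; `trim_to_reference_region` is a hypothetical helper, the alignment is assumed to be a name-to-string mapping with "-" for gaps, and the toy coordinates in the usage note are not the real ones.

```python
def trim_to_reference_region(alignment, ref_name, start, end):
    """Cut every aligned sequence to the columns spanning reference residues start..end.

    start and end are 1-based positions in the ungapped reference sequence;
    both boundary residues are kept.
    """
    reference = alignment[ref_name]
    # Alignment columns that hold a reference residue rather than a gap.
    residue_columns = [i for i, char in enumerate(reference) if char != "-"]
    first = residue_columns[start - 1]
    last = residue_columns[end - 1]
    return {name: seq[first:last + 1] for name, seq in alignment.items()}
```

For example, with a toy alignment `{"ref": "MA-NGHS-K", "query": "MAQNG-SLK"}`, trimming to reference residues 3 through 6 keeps `NGHS` for the reference and `NG-S` for the query.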
In addition to trimming, sequences were clustered prior to phylogenetic analysis, as each group contained a large number of sequences (Class I α: 15,894 sequences, Class I β: 17,109 sequences, and Class II: 9,147 sequences). To avoid inter-group mixing within individual sequence clusters, sequences were clustered by RNRdb group (Table 2). Clustering of RNRdb sequences was performed at multiple identity thresholds (70%, 75%, and 80%) using CD-HIT v4.7 to ensure that the placement of the Cyano SP clade was not an artifact of the identity threshold, as Cyano SP members have grouped with Class II sequences in the past. Cyano SP sequences were not clustered before phylogenetic analysis. For Class I α-only and β-only phylogenies, sequences were clustered over 80% of the alignment length. For the Class I α with Class II phylogeny, sequences were clustered over 100% of the alignment length due to the short length of the trimmed region.
Two RNRdb groups, NrdABz and NrdEF, contained member sequences belonging to two Class I subclasses (Table 2). In these cases, the Class I β sequences (NrdBz and NrdF) were assigned to subclasses based on active sites. For NrdBz, Class I β subunit enzymes were classified as subclass Ia (NrdBza) by the presence of a Tyr residue in the Tyr radical site (Tyr122 in E. coli R2), or as subclass Ic (NrdBzc) by the presence of a Phe, Leu, or Val mutation in the Tyr radical site (Lundin et al., 2009). For NrdF, Class I β subunit enzymes were classified as subclass Ib (NrdFb) or Ie (NrdFe) if carboxylate residues were conserved or missing, respectively, from the second, fourth, and fifth metal-binding sites in relation to the subclass Ib representative (Supplementary Table 1). Class I α sequences (NrdAz and NrdE), which could not be assigned to subclasses based on primary sequence alone, were assigned to a subclass based on the assignment of their corresponding β subunits. Class I α subunit sequences that could not be paired with a β subunit, or that were paired with more than one β subunit, were excluded from further analysis. Excluded Class I α subunit sequences included 1,006 NrdAz and 2,921 NrdE sequences, or 31% and 45% of total curated NrdAz and NrdE sequences, respectively. The excluded sequences comprised a small percentage of overall RNR diversity (Supplementary Table 3). Thus, their exclusion is not expected to have affected the phylogenetic analyses. All other RNRdb groups exclusively belonged to a single subclass.
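The subclass rules stated above for the two mixed groups can be written as a small decision procedure. This is a sketch of the stated rules only: the function names are hypothetical, carboxylate residues are taken to be Asp (D) and Glu (E), and returning `None` for cases the text does not cover (another residue at the radical site, or a mix of conserved and missing carboxylates) is an assumption of the example rather than a documented step.

```python
def classify_nrdbz(radical_site_residue):
    """Assign an NrdBz β subunit from the residue aligned to the Tyr radical site."""
    if radical_site_residue == "Y":
        return "NrdBza"  # subclass Ia: tyrosine present
    if radical_site_residue in {"F", "L", "V"}:
        return "NrdBzc"  # subclass Ic: Phe, Leu, or Val substitution
    return None  # not covered by the stated rule (assumption: leave unassigned)


def classify_nrdf(metal_site_residues):
    """Assign an NrdF β subunit from the residues aligned to the second,
    fourth, and fifth metal-binding sites of the subclass Ib representative."""
    carboxylates = {"D", "E"}
    conserved = sum(residue in carboxylates for residue in metal_site_residues)
    if conserved == len(metal_site_residues):
        return "NrdFb"  # subclass Ib: carboxylates conserved
    if conserved == 0:
        return "NrdFe"  # subclass Ie: carboxylates missing
    return None  # mixed pattern (assumption: leave unassigned)
```

The α subunit assignment then follows from pairing: an NrdAz or NrdE sequence inherits the label of its single paired β subunit, and is dropped when no unique pairing exists.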

Phylogenetic Tree Construction
For all phylogenetic analyses and clustering identity thresholds, cluster representatives were aligned with correspondingly trimmed α or β subunits from the Cyano SP clade. All alignments were constructed in Geneious using the MAFFT v7.388 plug-in with setting FFT-NS-2 (fast, progressive method) and the BLOSUM62 scoring matrix. Trees were constructed using the FastTree v2.1.5 (Price et al., 2010) Geneious plug-in with default settings. Trees were visualized and customized in Iroki (Moore et al., 2018). Phylogenies inferred from sequences clustered at different identity thresholds can be found in Supplementary Figures 1-3.
Finally, a phylogeny was constructed from trimmed Class I α subunit and Class II sequences from only cyanobacteria and cyanophage. No clustering was performed. The phylogeny was constructed as described above from an alignment done using the MAFFT v7.388 plug-in with setting FFT-NS-i x1000 (iterative refinement method with 1,000 iterations).

Sequence Similarity Network
A protein sequence similarity network (SSN) was constructed with the same RNR Class I β subunit sequences used for phylogenetic analysis. The SSN was generated with the Enzyme Similarity Tool (EFI-EST) (Gerlt et al., 2015) as in Rose et al. (2018) (fraction: 1, minimum alignment score: 90). As the full network was too large to visualize in Cytoscape (Shannon et al., 2003; Smoot et al., 2011), the 90% identity representative node network was used (i.e., each node in the network contained sequences that shared at least 90% amino acid identity).
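Conceptually, an SSN keeps an edge between two nodes only when their pairwise alignment score meets the minimum, and the resulting subgraphs are the connected components of that thresholded graph. The sketch below illustrates this idea in plain Python; it is not how EFI-EST or Cytoscape are implemented, and `similarity_subgraphs` and its input format (a list of `(node_a, node_b, score)` tuples) are assumptions for illustration.

```python
from collections import defaultdict


def similarity_subgraphs(nodes, scored_pairs, min_score=90):
    """Group nodes into connected subgraphs using edges with score >= min_score."""
    neighbors = defaultdict(set)
    for a, b, score in scored_pairs:
        if score >= min_score:
            neighbors[a].add(b)
            neighbors[b].add(a)

    visited = set()
    subgraphs = []
    for start in nodes:
        if start in visited:
            continue
        # Depth-first traversal collects everything reachable from start.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        visited |= component
        subgraphs.append(component)
    return subgraphs
```

Subgraphs of size one are the singletons; a group of sequences that shares no above-threshold edge with any other group appears as its own set in the returned list.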

RESULTS
Prochlorococcus phage P-SSP7 is a cyanopodovirus that infects the marine cyanobacterium Prochlorococcus marinus subsp. pastoris str. CCMP1986 (Sullivan et al., 2005). The RNR discovered in P-SSP7 was initially annotated as Class II based on the apparent lack of a Class I β subunit in the phage genome. The RNR from P-SSP7 also lacks an ATP cone region, a domain that is common in Class I α subunits but rare in Class II enzymes (Aravind et al., 2000; Jonna et al., 2015). This was also the first cyanophage RNR of its kind to be annotated, and consequently this gene became the baseline annotation for closely related RNRs. Prior examination of RNRs in viral shotgun metagenomes (viromes) designated the phylogenetic clade containing the RNR from P-SSP7 as the 'Cyano II' clade, recognizing that member RNRs (Table 1), exclusively from cyanophage, were annotated as Class II and seemed to fall on the Class II side of the tree. This study also recognized a 'Cyano I' clade composed exclusively of cyanomyoviruses that carried Class I RNRs. The Cyano II clade has been renamed to Cyano SP, as the clade is composed solely of RNRs from cyanosipho- and cyanopodoviruses. The Cyano I clade has been renamed to Cyano M, as it consists of RNRs strictly from cyanomyoviruses.

P-SSP7 Class I α Subunit Identification
The first indication that the RNR from P-SSP7 was misannotated as a Class II RNR came from the observation of two consecutive tyrosine residues (Y730 and Y731 in E. coli) that are present in the C-terminus of Class I α subunits and participate in long-range radical transport between the α and β subunits of Class I RNRs (Uhlin and Eklund, 1994; Greene et al., 2017). These tyrosines are not present in Class II RNRs but are present in the P-SSP7 RNR peptide (Supplementary Table 2). To confirm the classification of the P-SSP7 RNR as a Class I enzyme, a phylogenetic tree was constructed containing Class I α subunits and Class II sequences from the RNRdb, together with the putative α subunits from the Cyano SP clade (formerly Cyano II) reported in Sakowski et al. (2014) (Figure 2). Trees were constructed at different clustering identities to ensure that the placement of Cyano SP sequences with a given RNR class was not an artifact of the clustering threshold (Supplementary Figure 1). The Cyano SP RNRs grouped with the Class I α subunit sequences in the phylogenies constructed from sequences clustered at 75% and 80% identity, but clustered with Class II sequences in the tree made from sequences clustered at 70% identity.
FIGURE 2 | Maximum-likelihood phylogenetic tree of Cyano SP clade α subunits with 80% clustered Class I α and Class II RNRdb sequences trimmed to a region of interest. Gray branches belong to Class II. Colored branches belong to one of the five Class I subclasses, or Cyano SP as indicated in the key. Light purple branches indicate RNRdb groups without characterized members, which are assumed to be subclass Ia enzymes. Trees were constructed using FastTree and visualized and customized in Iroki. Scale bar represents amino acid changes per 100 positions.

P-SSP7 Class I β Subunit Identification
While the tyrosine residues within the P-SSP7 RNR are indicative of a Class I RNR, the initial annotation of the P-SSP7 RNR was made primarily because no β subunit gene could be identified within the P-SSP7 genome. Class I RNRs require a β subunit for radical generation. Because the cyanobacterial host of P-SSP7 carries a Class II RNR, the phage would have to carry its own copy of the Class I β subunit gene in order for its α subunit to function. All unannotated proteins in the P-SSP7 genome that were approximately the length of a Class I β subunit were considered RNR β subunit candidates. Four predicted proteins within the genome matched this length criterion. A batch CD-Search (Marchler-Bauer et al., 2017) of the candidate β subunit peptide sequences was unable to identify any conserved domains in any of the sequences. Thus, we aligned the candidate P-SSP7 β subunit sequences with the sequences of biochemically characterized β subunits from each of the known Class I subclasses (Supplementary Table 1). Only one of the candidate sequences, accession no. YP_214198.1, was found to contain residues experimentally shown to be required for β subunit function (Supplementary Table 2). The hypothetical protein also resided directly downstream of the α subunit, where the β subunit is typically found (Dwivedi et al., 2013). Thus, YP_214198.1 was identified as the missing P-SSP7 β subunit.

Assignment of P-SSP7 RNR to a Class I Subclass
Class I subclasses are based on the mechanism of radical generation utilized by the β subunit. Alignment with representative Class I RNR β subunit sequences found that the P-SSP7 β subunit lacked the tyrosine residue (Y122 in E. coli R2) on which the stable protein radical is formed in subclasses Ia, Ib, and Ie (Figure 1B). The lack of the tyrosine residue seemed to indicate that the P-SSP7 β subunit belonged to subclass Ic, as Ic is the only described subclass that lacks this residue completely (the residue is conserved in Id but does not harbor a radical) (Högbom et al., 2004; Blaesi et al., 2018; Rose et al., 2018; Srinivas et al., 2018). Each subclass has a unique combination of metal-binding residues and uses a different metallocofactor (or does not bind metals at all, in the case of subclass Ie) (Srinivas et al., 2018). The residues in the putative P-SSP7 β subunit aligning with the first sphere of metal-binding residues of the subclass representatives (Table 3) were consistent with Class I RNRs that require metallocofactors (subclasses Ia-Id) and exactly matched subclasses Ic and Id (Srinivas et al., 2018). However, when considering second sphere binding residues, the overall pattern of metal-binding residues in the P-SSP7 β subunit did not match that of any subclass representative (Table 3), nor of any existing RNRdb group (Table 4).
Known Class I subclasses are either monophyletic or contain members that are closely related (Berggren et al., 2017; Rozman Grinberg et al., 2018a). Thus, phylogenetic trees were constructed to confirm proper subclass assignment of the P-SSP7 RNR using Class I β subunit sequences from the RNRdb clustered at 70%, 75%, and 80% and β subunits from the Cyano SP clade members. In a phylogenetic analysis of the 70% identity cluster representative sequences, the P-SSP7 β subunit and Cyano SP homologs were phylogenetically distinct from known RNRs, and did not clearly join with RNRdb groups, instead branching directly off the backbone of the tree (Figure 3). In the phylogenetic reconstructions at 75% and 80% identity, the Cyano SP group remained distinct but branched closely with either the NrdBg group (75% identity, subclass Ia) or the NrdBh group (80% identity, subclass Ia presumed) (Supplementary Figure 2). Notably, the Cyano SP β subunits branched away from subclass Ic members (NrdBzc subgroup) in all phylogenies (Supplementary Figure 2), making it unlikely that the Cyano SP clade belongs to subclass Ic.
Because Class I subclass assignment was inconclusive based on the β subunit metal-binding residues and phylogenetic analysis, we constructed a protein sequence similarity network (SSN) using the Enzyme Similarity Tool (EFI-EST) (Gerlt et al., 2015) as per Rose et al. (2018) with the same β subunit sequences used for phylogenetic tree construction (Figure 4). The SSN also provided an alignment-free method for viewing connections between RNR sequences, an especially important consideration for highly divergent peptides such as the Cyano SP clade RNRs (Gerlt et al., 2015). Most sequences were members of large, distinct subgraphs with sequences exclusively from a single RNRdb group (e.g., NrdBk and NrdBg). However, some RNRdb groups were evenly spread across multiple subgraphs of similar size (e.g., NrdBh and NrdBi), likely indicating a higher level of sequence heterogeneity than other groups. The Cyano SP clade representatives formed exclusive subgraphs not connected to other RNRdb sequences and were divided into three singletons and one non-singleton cluster, indicating that the clade representatives are divergent even from each other. Assignment of the Cyano SP RNRs to an existing Class I subclass could not be reliably made based on the analysis of β subunit metal-binding residues, phylogenies, or the protein SSN. Instead, the missing tyrosine radical residue, unique pattern of metal-binding sites, and phylogenetic divergence of the Cyano SP β subunits from RNRdb groups likely indicate that the Cyano SP clade represents a novel Class I subclass.

Origin of the P-SSP7 RNR
Class I α and β subunits tend to evolve in units, producing highly similar phylogenies (Lundin et al., 2010; Dwivedi et al., 2013). Because placement of the Cyano SP β subunits on phylogenetic trees changed with the percent amino acid identity used for clustering RNR sequences (Supplementary Figure 2), the Cyano SP α subunits were evaluated for clues to the origin of the RNR in P-SSP7. Class I α-only phylogenies were built from sequences longer than those used for the combined Class I α-Class II phylogenies, allowing greater phylogenetic resolution. Representative RNRdb Class I α subunit sequences from 70%, 75%, and 80% identity clusters were assessed. Regardless of the clustering identity, the Class I α subunit phylogenies showed consistent placement of the Cyano SP clade as an outgroup for the branch that contains RNRdb groups NrdAi (subclass Id) and NrdAk (subclass Ia presumed) (Figure 5 and Supplementary Figure 3). Like the Class I β phylogenies, the Cyano SP α subunit clade was distinct and was not surrounded by any RNRdb group. The phylogenetic placement of the Cyano SP Class I α sequences among RNRdb groups (Figure 5 and Supplementary Figure 3) was different from that seen for the Cyano SP Class I β sequences (Figure 3 and Supplementary Figure 2). Thus, a conclusive placement for the Cyano SP β subunits among RNRdb groups was not possible.
FIGURE 4 | Protein sequence similarity network of the Cyano SP clade and all RNRdb Class I β subunit sequences included in phylogenetic analysis. Nodes represent sequence clusters ≥90% similarity. Nodes are colored based on RNRdb group and match leaf dot colors on the cladogram in Figure 3. Edges connect nodes with minimum alignment score ≥90. Network was visualized and customized in Cytoscape.

The Cyano SP RNR Has Adapted to the Intracellular Environment
The perceived lack of a β subunit gene in the P-SSP7 genome may have led to the initial misannotation of the P-SSP7 RNR gene as a Class II RNR (Sullivan et al., 2005). Additionally, it seems unusual for a virus to carry a different class of RNR than its host (Dwivedi et al., 2013). Given that cellular organisms carry RNRs that are adapted to their environmental niche (Reichard, 1993; Cotruvo et al., 2011), viruses would also likely benefit from having the same RNR type as their host cell. The preference for a potentially iron-dependent Class I RNR enzyme among cyanophage seems puzzling considering that iron is often the primary limiting nutrient in the oceans, including in regions dominated by Synechococcus and Prochlorococcus (Moore et al., 2013; Browning et al., 2017), hosts infected by phage within the Cyano SP (cyanosipho- and cyanopodoviruses) (Table 1) and Cyano M (cyanomyoviruses) clades. Synechococcus and Prochlorococcus are also some of the few B12 producers in the oceans (Heal et al., 2016; Helliwell et al., 2016). Therefore, B12 availability would seem to be sufficient for viral replication with a B12-dependent Class II RNR, while iron availability for phage-infected cells could be too low to support the highly lytic phenotype displayed by many of these phage.

FIGURE 5 | Cladogram of near full-length Cyano SP and RNRdb Class I α subunit sequences clustered at 80%. Branch colors indicate Class I subclass and leaf dot colors correspond to RNRdb group. Colors matching clades in Figure 3 indicate α/β subunit pairs. Note there are α subunit clades that do not have corresponding, distinct β subunit clades, as the α subunits have diverged more than the β subunits. NrdAm β subunits belong to β subunit group NrdBh. NrdAq β subunits belong to β subunit subgroup NrdBza. Trees were constructed using FastTree and visualized and customized in Iroki. Scale bar represents amino acid changes per 100 positions.

However, carrying a Class I RNR would relieve marine cyanophage of their dependence on the host to produce sufficient levels of B12 for deoxyribonucleotide synthesis by a Class II enzyme. Although it is less limiting in ocean waters, B12 is likely to be more limiting than iron inside a cyanobacterial cell. In Cyanobacteria, B12 is used as a cofactor for two enzymes, the Class II RNR (NrdJ) and methionine synthase (MetH) (Heal et al., 2016). NrdJ is needed only while the cell is actively replicating; thus, transcription of this gene is closely tied to the cell cycle (Herrick and Sclavi, 2007; Mowa et al., 2009). Similarly, MetH expression is high during early growth of the B12-producing cyanobacterium Synechocystis but decreases when cells enter the stationary growth phase (Tanioka et al., 2009). Given that NrdJ and MetH are both tied to cellular growth, intracellular B12 concentrations are likely highly variable.
In addition, cobalt, the metal at the center of B12, is required almost exclusively for B12 formation and is tightly controlled because of its toxicity to cells (Waldron et al., 2009; Huertas et al., 2014). In contrast, both iron and manganese are required for numerous proteins and molecules within a cyanobacterial cell that are needed throughout the cell cycle (Palenik et al., 2003; Shcolnick and Keren, 2006). Cytoplasmic cyanobacterial iron and manganese quotas have been documented at 10^6 atoms/cell (Keren et al., 2002, 2004), and a study that aimed to identify and quantify metals in a cyanobacterium found that iron was present in high intracellular concentrations, while cobalt concentrations were below the detection limit (Barnett et al., 2012). Furthermore, some Prochlorococcus are able to maintain growth while taking up just one atom of cobalt per cell per hour (Hawco and Saito, 2018). Therefore, upon infection, a cyanophage would encounter an intracellular pool of iron many fold larger than that of B12.
The acquisition of B12 from the surrounding environment also seems unlikely. B12 is bulky and structurally complex, requiring special transporters which neither Prochlorococcus, Synechococcus, nor their phages are known to encode (Rodionov et al., 2003; Tang et al., 2012; Pérez et al., 2016). Furthermore, one study showed that while some organisms, such as eukaryotic microalgae, are able to import partial or finished forms of B12, Synechococcus and likely Prochlorococcus are unable to do this (Helliwell et al., 2016). Instead, Synechococcus is required to synthesize B12 start to finish (Helliwell et al., 2016), likely because both Prochlorococcus and Synechococcus produce a form of B12 that seems to be unique to Cyanobacteria (Heal et al., 2016).
Finally, B12 is energetically expensive to synthesize. B12 synthesis requires a long pathway made up of roughly twenty different enzymes (Warren et al., 2002). By comparison, some Class I RNR metallocofactors are known to self-assemble (Cotruvo et al., 2011). At most, a metallocofactor may require a flavodoxin (NrdI) for assembly. When considering that carrying a Class I enzyme relieves the phage of relying on a complex host-mediated pathway for a molecule that is not consistently produced throughout the cell cycle, the difference in RNR type between host and phage is not surprising.
The RNR from P-SSP7 also seems to have adapted to the environment inside the host cell in other ways. The P-SSP7 β subunit lacks the tyrosine residue used for radical generation in most Class I RNR subclasses (Figure 1B). The tyrosine residue harbors a stable protein radical and is a target of nitric oxide (Eiserich et al., 1995; Radi, 2004). Tyrosine-radical scavenging nitric oxide is hypothesized to be present inside Synechococcus cells as an intermediate in nitrate reduction (Preimesberger et al., 2017), which is widespread among freshwater and marine Synechococcus species and is coupled to photosynthesis (Guerrero, 1985; González et al., 2006; Klotz et al., 2015; Sunda and Huntsman, 2015). Thus, the loss of the tyrosine radical site in the Class I β subunit genes of cyanophage, such as P-SSP7, would enable these phages to avoid RNR inactivation by nitric oxide.

Connections Between RNR and Cyanophage Phenotype
RNR type appears to be predictive of cyanophage morphology, as membership in a particular RNR clade corresponds to phenotype. For example, most marine cyanophage belong to the Cyano M and Cyano SP clades. The Cyano M clade consists of subclass Ia RNRs belonging solely to cyanomyoviruses. The Cyano SP clade consists of RNRs from the proposed novel subclass (If) belonging solely to cyanosipho- and cyanopodoviruses. In addition, all sequenced Class II RNRs from marine cyanophage belong exclusively to the P60 clade, which contains only cyanosipho- and cyanopodoviruses. Furthermore, the myovirus Cyanophage S-TIM5 carries a subclass Id RNR, a subclass not carried by cyanosipho- or cyanopodoviruses. Differences in the RNRs carried by different morphological groups of cyanophage have been used to demonstrate possible niche exclusion in the diel infection dynamics of cyanomyovirus and cyanopodovirus populations in the Chesapeake Bay.
Most Class I RNR α subunits contain an ATP cone region. ATP cones are regulatory sites that essentially act as on/off switches for RNRs (Brown and Reichard, 1969; Aravind et al., 2000). When ATP is bound, the RNR holoenzyme enters a conformational state that allows for function (Eriksson et al., 1997). Once dNTP levels rise high enough, dATP binds the ATP cone and the holoenzyme enters a non-functional conformation (Eriksson et al., 1997; Mathews, 2006). Intriguingly, the Class I α subunits of the Cyano SP clade do not have ATP cones. This is unusual for Class I α subunits and likely represents an evolutionary loss, given that only two Class I α subunit clades (NrdAi/NrdAk and NrdEb/NrdEe) (Figure 5) lack ATP cones (Aravind et al., 2000; Jonna et al., 2015). In losing the ATP cone domain, the Cyano SP RNRs have lost this regulatory switch. As a consequence, the RNR of cyanopodo- and cyanosiphoviruses cannot be inactivated through dATP binding, which would be beneficial to a fast-replicating lytic phage (Chen et al., 2009).
The highly lytic nature of the Cyano SP clade is also reflected in the biochemistry of the family A DNA polymerase gene (polA) carried by some of the members of the clade (Supplementary Table 4). The amino acid residue at position 762 (E. coli numbering) plays a role in shaping the activity and fidelity of Pol I (polA peptide) and is hypothesized to be reflective of phage lifestyle (Schmidt et al., 2014). Prior work found that a mutation from phenylalanine to tyrosine at position 762 produced a 1,000-fold increase in processivity with a concomitant loss of fidelity (Tabor and Richardson, 1987). Three of the member phages within the Cyano SP clade carry a Pol I with a tyrosine at position 762, indicating that Cyano SP members are capable of fast DNA replication. Other members carry polA genes that contain a frameshift mutation, preventing identification of the 762 position. Pairing an unregulated RNR, such as the Cyano SP RNR, with a highly processive DNA polymerase would be advantageous for a highly lytic phage. This phenotype is thought to be characteristic of most cyanopodoviruses (Suttle and Chan, 1993; Wang and Chen, 2008; Schmidt et al., 2014). Observations of gene associations such as Tyr762 PolA and Cyano SP clade Class I RNR can thus inform predictions of the possible life history characteristics of unknown viruses.
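Reading the residue at a reference-numbered position such as 762 (E. coli numbering) requires mapping the ungapped reference coordinate onto a column of an alignment. The following Python sketch shows one way this could be done for a pairwise alignment. It is an illustration rather than the procedure used in this study: the function names, the toy sequences, and the toy position 4 (standing in for position 762) are all hypothetical.

```python
def residue_at_reference_position(ref_aligned, query_aligned, ref_position):
    """Return the query residue aligned to a 1-based, ungapped reference position."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    if ref_position < 1:
        raise ValueError("reference position must be 1 or greater")
    ungapped_index = 0
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char != "-":
            # Only non-gap reference columns advance the reference coordinate.
            ungapped_index += 1
            if ungapped_index == ref_position:
                return query_char
    raise ValueError("reference position lies beyond the aligned reference")


def describe_residue(residue):
    """Label the residue; only Phe and Tyr are distinguished in this sketch."""
    labels = {"F": "phenylalanine", "Y": "tyrosine"}
    return labels.get(residue.upper(), "undetermined")


# Hypothetical toy alignment; position 4 of the reference stands in for 762.
ref_aligned = "MK-TFLAG"
query_aligned = "MKQTY-AG"
toy_position = 4

residue = residue_at_reference_position(ref_aligned, query_aligned, toy_position)
print(residue, describe_residue(residue))
```

A gap character returned at the position of interest is reported as undetermined, which is analogous to the case described above where a frameshift prevents identification of the 762 position.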

A Novel Class I RNR in Cyanophage
Reannotation of the P-SSP7 RNR from Class II to Class I is based primarily on the discovery of a Class I β subunit in the P-SSP7 genome. The P-SSP7 β subunit was identified using conserved residues, as no conserved domains could be identified in the previously hypothetical protein. Our discovery of the Class I β subunit via active sites and genome location demonstrates that some unknown viral proteins (i.e., the viral genetic dark matter) (Krishnamurthy and Wang, 2017) could actually be well known proteins that are simply too divergent for annotation using homology searches or gene model approaches.
The reannotation is also supported by the presence of the consecutive tyrosine residues in the C-terminus of the newly annotated Class I α subunit, which are essential for radical transfer between Class I α and β subunits (Uhlin and Eklund, 1994; Greene et al., 2017) and are not found in Class II RNRs. Additionally, two trees constructed from Class I α and Class II sequences showed the Cyano SP clade (represented by P-SSP7) on the Class I side of the tree (Figure 2 and Supplementary Figure 1B). While the 70% Class I α with Class II tree showed the Cyano SP clade on the Class II side of the tree, we believe this is an artifact of the low identity threshold and short region of interest (Supplementary Figure 1A). Protein SSNs constructed from the same sequences used in the Class I α with Class II phylogeny showed the Cyano SP clade as being distinct from both Class I and Class II sequences (Supplementary Figure 4). Thus, the high divergence of the Cyano SP clade as compared to Class I α and Class II sequences in the RNRdb is likely contributing to the Cyano SP clade grouping with Class II sequences on the 70% tree. A study of gene transcription in P-SSP7-infected Prochlorococcus cultures lends experimental support for the presence of a Class I RNR in P-SSP7. Both the P-SSP7 Class I RNR α subunit (identified as nrd-020) and the neighboring β subunit (identified as nrd-021) were co-expressed during the second stage of phage infection, during which DNA replication typically takes place (Lindell et al., 2007).
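A simple residue-based screen of the kind described above can be sketched in Python. This is a hypothetical illustration, not the method used in this study: the function name, the 60-residue window that defines the "C-terminus", and the toy sequences are all assumptions, and a positive result would only flag a candidate for closer inspection.

```python
def has_c_terminal_tyrosine_pair(sequence, window=60):
    """Report whether two consecutive tyrosines occur in the last `window` residues.

    The window size is an assumed cutoff chosen for illustration only.
    """
    if window < 1:
        raise ValueError("window must be a positive number of residues")
    tail = sequence.upper()[-window:]
    return "YY" in tail


# Hypothetical toy sequences, not real RNR proteins.
toy_with_pair = "M" + "A" * 100 + "GYYLK" + "S" * 20
toy_without_pair = "M" + "A" * 100 + "GYALK" + "S" * 20
toy_pair_far_from_end = "MYY" + "A" * 200

for name, seq in [
    ("with pair", toy_with_pair),
    ("without pair", toy_without_pair),
    ("pair far from C-terminus", toy_pair_far_from_end),
]:
    print(name, has_c_terminal_tyrosine_pair(seq))
```

Restricting the search to a C-terminal window avoids counting tyrosine pairs elsewhere in the protein, although an alignment against a characterized Class I α subunit would be needed to confirm that a match corresponds to the radical transfer residues.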
Assignment of the P-SSP7 RNR to an existing Class I subclass was inconclusive as the radical-generating β subunit (Cotruvo et al., 2011) could not be clearly assigned based on conserved residues. β subunits are used for subclassification because, unlike α subunits that all have a consistent mechanism, the mechanisms of β subunits are variable. While the P-SSP7 β subunit contains all of the conserved residues required for function (Supplementary Table 2), it lacks the tyrosine residue (Y122 in E. coli) that harbors the stable protein radical or is conserved in subclasses Ia, Ib, Id, and Ie (Nordlund and Eklund, 1993; Cotruvo et al., 2013; Blaesi et al., 2018) (Figure 1B). Assignment also could not be made to subclass Ic, the only known subclass lacking the tyrosine residue (Högbom et al., 2004), based on the outcome of phylogenetic (Figure 3 and Supplementary Figure 2) and protein SSN analysis (Figure 4). We also examined the metal-binding sites in the P-SSP7 β subunit, as metallocofactor identity is used to discriminate between subclasses Ia-Id (Cotruvo et al., 2011; Rose et al., 2018). The metal-binding residues for the P-SSP7 and other Cyano SP clade member β subunits formed a different pattern than is seen in any of the RNRdb groups (Table 4). The combination of the unique metal-binding residues, the lack of a tyrosine residue on which to generate a protein radical, and the phylogenetic distance between the Cyano SP clade and subclass Ic (NrdBzc) sequences suggests that the P-SSP7 Class I β subunit may constitute a novel subclass of Class I RNRs.

Origin of the P-SSP7 RNR
Because P-SSP7's host, like most marine Synechococcus and Prochlorococcus, carries a Class II RNR, we were interested in the origin of the Class I RNR found in P-SSP7. The Class I β subunit phylogenies inconsistently placed the Cyano SP clade. Examination of Class I α subunit trees showed a consistent placement of the Cyano SP clade at the base of the branch harboring the RNRdb groups NrdAk (Ia presumed) and NrdAi (subclass Id) (Figure 5 and Supplementary Figure 3). This is perhaps to be expected as, like the NrdAk and NrdAi groups, the Cyano SP Class I α subunits do not contain ATP cone domains, a trait that is rare among Class I α subunits (Jonna et al., 2015).
The observation that the Cyano SP clade does not have the same placement on the Class I β-only and Class I α-only trees is highly unusual. In viruses and cellular organisms, Class I α and β subunits are thought to evolve as units (Dwivedi et al., 2013), producing trees with the same patterns (Lundin et al., 2010). However, viral genomes are known to be highly modular, consisting of genes from multiple sources (Iranzo et al., 2016; Krupovic et al., 2018). It seems possible that an ancestral phage of the Cyano SP clade incorporated the Class I α and β subunits separately. Given that Class I α and β subunits can only perform ribonucleotide reduction as a unit, i.e., both subunits are required for functionality, these acquisitions would have had to occur in quick succession to avoid loss by the phage. Perhaps in support of this hypothesis, the Cyano SP β subunits sometimes cluster with the NrdBg group (subclass Ia), which harbors the Cyano M clade, while the Cyano SP α subunits consistently cluster with the NrdAi group (subclass Id) that contains the Synechococcus phage S-TIM5. These phage groups (i.e., Cyano SP, S-TIM5, and Cyano M) all infect marine Synechococcus and Prochlorococcus, making it more likely that the Cyano SP RNRs are a mosaic of these cyanomyoviral groups, with the α subunit having been acquired from a cyanophage related to S-TIM5 and the β subunit from a member of the Cyano M clade.
A phylogeny constructed using all Cyanobacteria and cyanophage present in the RNRdb with the Cyano SP clade demonstrates that the majority of known cyanophage carry Class I RNRs (Figure 6). The Synechococcus or Prochlorococcus hosts of phages in the Cyano M and Cyano SP clades, of Synechococcus phage S-TIM5, and of phages in the Cyanophage P60 clade all carry Class II RNRs (Chen and Lu, 2002; Sabehi et al., 2012; Sakowski et al., 2014). Despite being a myovirus, S-TIM5 does not carry an RNR belonging to the Cyano M clade, likely because it is believed to represent a separate lineage of myoviruses (Sabehi et al., 2012). Cyanosipho- and cyanopodoviruses were found in two widely separated clades. Lytic cyanosipho- and cyanopodoviruses within the Cyanophage P60 RNR clade contain a Class II RNR, which is the same type carried by their hosts, whereas cyanosipho- and cyanopodoviruses in the Cyano SP clade contain a Class I RNR. The biological and ecological explanations behind this divergence remain a mystery. Prior work has indicated that cyanopodoviruses can be broadly divided into two clusters, MPP-A and MPP-B, based on whole genome analyses (Huang et al., 2015), but no single gene or gene group clearly distinguishes the two clusters. Nevertheless, RNRs belonging to the Cyano SP clade seem to be more common among cyanosipho- and cyanopodoviruses (Huang et al., 2015). Whether carrying a Class II RNR is the ancestral state of cyanosipho- and cyanopodoviruses could not be determined from our phylogenies.
The use of marker genes such as RNR in studying viral ecology is important in connecting genomic information to phenotypic traits. However, correct annotation of these genes is essential if accurate information is to be gained. This reannotation means that most marine cyanophage carry RNRs that did not come from their hosts (Figure 6), which has implications for our understanding about the acquisition of nucleotide metabolism genes by viruses. That Cyano SP clade members carry Class I RNRs and have lost the tyrosyl radical site in the β subunit is also a reminder that viruses have to adapt to the intracellular environment as well as the extracellular environment. Finally, the discovery of an overlooked β subunit implies that some unknown viral gene space may be composed of known genes that are too divergent for similarity-based annotation methods to detect but can still be identified by other means.

DATA AVAILABILITY STATEMENT
The datasets analyzed for this study can be found in the RNRdb. Accession numbers for the Cyano SP clade, including genome accession, can be found in the Supplementary Material. The Supplementary Material also contains accession numbers for the annotated RNR subclass representatives.

AUTHOR CONTRIBUTIONS
AH did the analysis and wrote the manuscript. RM created the sequence similarity networks, assisted with the analysis, and edited the manuscript. KW and SP contributed to study design, data interpretation, and manuscript preparation. All authors read and approved the final manuscript.