Single Cell Genomics Reveals Viruses Consumed by Marine Protists

The predominant model of the role of viruses in the marine trophic web is that of the “viral shunt,” where viral infection funnels a substantial fraction of the microbial primary and secondary production back to the pool of dissolved organic matter. Here, we analyzed the composition of non-eukaryotic DNA associated with individual cells of small, planktonic protists in the Gulf of Maine (GoM) and the Mediterranean Sea. We found viral DNA associated with a substantial fraction cells from the GoM (51%) and the Mediterranean Sea (35%). While Mediterranean SAGs contained a larger proportion of cells containing bacterial sequences (49%), a smaller fraction of cells contained bacterial sequences in the GoM (19%). In GoM cells, nearly identical bacteriophage and ssDNA virus sequences where found across diverse lineages of protists, suggesting many of these viruses are non-infective. The fraction of cells containing viral DNA varied among protistan lineages and reached 100% in Picozoa and Choanozoa. These two groups also contained significantly higher numbers of viral sequences than other identified taxa. We consider mechanisms that may explain the presence of viral DNA in protistan cells and conclude that protistan predation on free viral particles contributed to the observed patterns. These findings confirm prior experiments with protistan isolates and indicate that the viral shunt is complemented by a viral link in the marine microbial food web. This link may constitute a sink of viral particles in the ocean and has implications for the flow of carbon through the microbial food web.

In a previous study, single cell genomics of nanoeukaryotic cells was employed to elucidate the lineage-specific grazing preferences of aplastidic protists in the Gulf of Maine (GoM) (Martinez-Garcia et al., 2012). Individual cells were isolated using fluorescence-activated cell sorting (FACS), their genomic DNA was amplified, and the obtained single amplified genomes (SAGs) were PCR-screened for bacterial and eukaryotic rRNA sequences. Bacterial signatures were recovered from only 4% of the aplastidic protistan SAGs. However, by targeting bacterial rRNA, the authors made the assumption that bacteria would be the primary prey for this targeted protist population. In fact, a small number of investigations suggest that some protists also prey on viruses (Suttle and Chen, 1992;González and Suttle, 1993;Bettarel et al., 2005;Bouvy et al., 2011;Deng et al., 2014). This understudied process has potential implications for nutrient cycling and how viruses impact bacterial communities (Miki and Yamamura, 2005).
Since this previous study, single cell genomics has improved considerably. It is now possible to use more efficient whole genome amplification techniques and low coverage shotgun sequencing for PCR-free screening of SAGs, which provide a less biased view of the DNA present within individual cells (Stepanauskas et al., 2017). Here we employed these updated techniques to re-examine the composition of protistan prey items in samples collected from the GoM and to examine additional cells from the Mediterranean Sea (Blanes Bay Microbial Observatory; BBMO) (Gasol et al., 2016). For the GoM samples, the new data corroborated prior PCR-based findings that a relatively small fraction of aplastidic protists contained bacterial DNA while BBMO cells exhibited a higher prevalence of protists containing bacterial DNA. Interestingly, we found viral sequences associated with a large fraction of protistan SAGs from both locations. The lack of specificity of the many of these viral sequences to a particular protistan lineage and similarities of recovered viral sequences to bacterial viruses suggest that they are likely non-infectious to the analyzed protistan cells. We explore possible reasons why non-infecting viruses are observed in protistan SAGs and propose that protist feeding on free viral particles is a likely explanation for a number of observed protistvirus associations.

Field Sample Collection and Cell Sorting
The GoM sample was collected in Boothbay Harbor, ME, United States (43 • 50 39.87 N 69 • 38 27.49 W) at one meter depth on 19 July 2009, as previously reported (Martinez-Garcia et al., 2012). For sorting of aplastidic grazers (SAG plates AAA071, AAA072), sampled water was incubated at in situ temperature for 10-60 min with LysoTracker Green DND-26 (75 nmol L −1 ; Invitrogen, Carlsbad, CA, United States) within 3 h of collection. LysoTracker Green DND-26 is a pH-sensitive, fluorescent probe that stains food vacuoles (Rose et al., 2004;Heywood et al., 2011) in live protistan cells. Aliquots of the water samples were also cryopreserved with 6% glycine betaine (final concentration; Sigma-Aldrich) and stored at −80 • C. Plastidic protists from cryopreserved samples (SAG plate AG-605) were sorted based on their chlorophyll autofluorescence from a cryopreserved sample aliquot using an BD InFlux cell sorter (BD Biosciences), a 488 nm laser for excitation and a 692/40 bandpass filter for red fluorescence emission.
Mediterranean water samples were collected in 2016 at the BBMO in winter (19 January 2016) and summer (5 July 2016). The BBMO is located in a temperate oligotrophic coastal site in the North Western Mediterranean Sea (41 • 40 N, 2 • 48 E), and features low human and riverine influence (Gasol et al., 2016). Samples were taken at 1 m depth ∼1 km offshore in a zone with 20 m depth. Samples were pre-filtered in situ through a 200 µm nylon-mesh and transported to the laboratory in 50 ml falcon tubes on ice under dim light. Sub-samples (5 ml) were amended with 6% glycine betaine (final concentration; Sigma-Aldrich), flash-frozen in liquid nitrogen, and stored at −80 • C. The entire sampling process took ∼4 h.
Aplastidic protists from the Mediterranean sample were sorted from a cryopreserved and then thawed sample stained with a SYBR Green DNA stain (20, 21) [SAG plates AG-614 (sample collected on 19 January 2016) and AH-162 (sample collected on 5 July 2016)]. Plastidic protists from BBMO [SAG plate AG-601 (sample collected on 19 January 2016)] were sorted based on chlorophyll autofluorescence as described above.
Fluorescence-activated cell sorting was performed in a cleanroom environment on either a Legacy MoFlo (Beckman-Coulter) cell sorter with a 100 µm nozzle or a BD InFlux (BD Biosciences) cell sorter with a 70 µm nozzle, which deposited single protistan cells individually into 384-well microplates. Handling of samples pre-sorting varied between samples, with some sorted live within 3 h of sample collection (AAA071, AAA072) and others flash frozen and then thawed before sorting (Supplementary Table S1). For sorts of eukaryotic cells, cells in ∼1-20 µm diameter range, i.e., corresponding to the pico-and nano-plankton, were targeted. In addition, cells in the bacterial size range (∼0.1-1 µm) were sorted from the GoM samples collected from the same location as the GoM protists on 15 June 2011 (SAG plates AD-866, AD-867) and 19 July 2009 (SAG plates AAA076, AAA158, AAA160, AAA168, AAA164, AAA169).
Drop volumes for the BD InFlux flow cytometer were estimated by determining both the stream and sample volume run for 1 min using a microbalance and the drop frequency (59 KHz) of the instrument at a differential sample pressure of 0.8 psi and sheath pressure of 33 psi. The total volume of a drop was determined to be ∼ 1 nL, and the volume of sample within each drop was ∼ 21 pL. Previously published calculations were used for estimated drop and sample volumes for protist cell sorts on the Legacy MoFlo (Sieracki et al., 2005).

SAG Generation, Sequencing and Identification
Sorted cells were lysed with KOH and their cellular DNA was amplified; generating SAGs with either multiple displacement amplification using phi29 polymerase (Thermo Fisher) or WGA-X using phi29mut8 (Thermo Fisher) (Supplementary Table S1). Reaction conditions for amplification using both methods are previously described (Stepanauskas et al., 2017). Low coverage sequencing (LoCoS), read curation and de novo assembly, were performed on all SAGs as previously described (Stepanauskas et al., 2017). No-drop negative controls were processed in parallel, and resulted in no amplification and no generated sequences. Contiguous sequences for analyzed protist SAGs are available through OSF (Brown, 2020) and via NCBI BioProject ID PRJNA655200. The 18S rRNA sequences were retrieved and characterized from genomic assemblies by BLASTn comparison of SAG contigs to SilvaMod databases [v106 and v128, (Lanzén et al., 2012)] and extraction of matching regions. Additional PCR amplification and subsequent sequencing of the 18S rRNA gene were performed on aplastidic protistan SAG plates AAA071 and AAA072, as described previously (Martinez-Garcia et al., 2012). The obtained 18S rRNA sequences were used in the identification of protists by BLASTn comparisons to SilvaMod databases (v106 and v128) and least common ancestor classification (Lanzén et al., 2012). Additional taxonomic calls were made via examination of sequence regions resembling 16S sequences from eukaryote organellar genomes based on BLASTn comparison to the NCBI nt database. 18S rRNA phylogenetic trees (Supplementary Figure S1) were constructed using the SILVA ACT web interface, using search and classify parameters of a minimum query sequence identity of 0.8, and extracting five neighbors per query sequence. Trees were computed using the "add to neighbors" workflow, which utilizes RAxML with a GTR model and a Gamma rate model for likelihoods (Pruesse et al., 2012). Trees were visualized using iToL (Letunic and Bork, 2019). Assembly information and taxonomic assignments are available in Supplementary Table S2.

Detection of Bacterial and Viral Sequences in Protistan SAGs
Contiguous sequences (contigs) from eukaryotic SAGs were compared to a collection of bacterial SAGs (Supplementary Table S1) using MASH (Ondov et al., 2016). Contigs originating in protists were considered bacterial if they matched a contig from a bacterial SAG at a MASH distance of < / = 0.05 (∼ > / = 95% ANI). Additional contigs were identified as bacterial based on diamond BLASTp (Camacho et al., 2009) comparison of contig open reading frames (ORFs) to bacterial coding sequences from RefSeq (O'Leary et al., 2016). If the majority (>50%) of ORFs on the contig matched a RefSeq bacterial protein (>50% amino acid identity over >75% of the query open reading frame), contigs were subjected to an additional BLASTn comparison to NCBI's nt database to verify that the contig most closely matched a bacterial genome, rather than a mitochondrial or chloroplast genome.
Two pieces of information were considered in an initial screen for viral contigs, similar to a previous investigation (Labonté et al., 2015), using SCGC's "ViruSCope" pipeline: 1. Fraction of virus-like ORFs. ORFs were identified as viral if a putatively viral gene was within the top 10 BLASTn hits when compared to NCBI's nr database using MICA-accelerated BLASTn (Daniels et al., 2013;Yu et al., 2015).  (Hurwitz and Sullivan, 2013)] versus a bacterial metagenome [LineP prokaryotic metagenome; IMg/MER GOLD Project ID Gm00303; (Swan et al., 2013;Wright et al., 2014)], at a percent identity match of ≥50% using DIAMOND (default settings) (Buchfink et al., 2015). These values are recorded in Supplementary Table S3, column "viruscope_metag_vbr." Using k-nearest neighbors clustering (Pedregosa et al., 2011), these two variables were assessed against a training set of values from contigs that were manually identified as viral and non-viral, with an associated p-value, in a previous study (Labonté et al., 2015), recorded in Supplementary Table S3, column "viruscope_virus_probability." Additional viral contigs were identified using VirSorter (Roux et al., 2015). Contigs identified as viral within VirSorter categories 1 and 2 were considered viral (Supplementary Table S3, columns "virsorter_category" and "virsorter_id"). These methods failed to identify single stranded DNA (ssDNA) viruses. Therefore our analytical pipeline was supplemented with a tBLASTx search for matches to a collection of ssDNA virus sequences from several marine studies (Labonté and Suttle, 2013a,b) and additional sequences obtained via searches within NCBI for "microviridae, " "circoviridae, " and "nanoviridae" (GenBank accessions in Supplementary Data Sheet S1). Contigs were identified as originating from ssDNA viruses if they shared sequence identity with ssDNA virus genome sequences (e < 10 5 ) over at least 20% of the length of the contig (Supplementary Table S3, column "ssdna_virus_match"). All identified ssDNA viral contigs were below 10 kb in length. At this point, ORFs from all putative viral sequences were subjected to an additional blastp comparison against NCBI's nr database to confirm the presence of viral genes on identified contigs, and preliminary viral identities were assigned (Supplementary Table S3, column "blast_vir_term").
Next, a network of similar contigs was built, with nodes representing contigs and edges representing shared MASH distances of < / = 0.05 (∼ > / = 95% ANI). Connected contigs were assigned to clusters using the "community" network clustering algorithm within the Louvain Python package (Blondel et al., 2008), clusters included contigs associated to each other via single linkage. Resulting contig clusters were then categorized as viral, bacterial or eukaryotic based on the origins of the contigs contained in each cluster. Clusters were considered viral if they contained at least 25% contigs identified as viral based on the above methods. Clusters were identified as bacterial if they contained any contigs from bacterial SAGs that were not categorized as viral, if they contained a member with identifiable bacterial 16S rRNA genes or if they were considered bacterial based on comparison to refseq bacterial proteins, as described above. Contigs of protistan SAGs were assumed to be of eukaryotic origin if none of the other conditions were met. Eukaryote SAGs for which all contigs were identified as either viral or bacterial were removed from further analysis. Assigned identities of individual SAG contigs can be found in Supplementary Data Sheet S2.
To further determine identities of viral contigs, ORFs were extracted using Prodigal and compared to HMM profiles provided by the Viral Orthologous Groups database (VOGDB) via hmmscan (eval < / = 0.001), part of the HMMER package (HMMER, 2018). The VOGDB is an online resource that provides an automated construction of Viral Orthologous Groups from all viral genomes within NCBI RefSeq (VOGDB, 2018). Each VOG includes an assigned function and lowest common ancestor (LCA) that aids in further characterization of viral contigs. This database was chosen instead of other available viral databases because it encompasses a larger diversity of viruses (all viruses in RefSeq), rather than domain-specific viruses (Brister et al., 2014;Thannesberger et al., 2017). LCA assignment was determined as the most highly represented LCA amongst ORFs on the contig at a minimum frequency of 25%. If no LCA was found to satisfy these criteria, the contig LCA was categorized only as "Viral" (Supplementary Table S3, column "vtype"). Virus sequences found in multiple eukaryotic phyla were further characterized via manual examination. Protein sequences identified using prodigal were compared to NCBI's nr database via BLASTp. Viral taxonomic affiliations were determined based on examination of best hits to nr.

Non-eukaryote Contigs in Protistan SAGs
Sequences of bacterial origin were identified in 19% of protistan SAGs from the GoM (12% of aplastidic and 31% of plastidic SAGs, Figure 1A) and in 48% of protistan SAGs from the Mediterranean (58% of aplastidic and 32% of plastidic SAGs, Figure 1C). LoCoS increased the fraction of aplastidic GoM SAGs in which we could detect bacterial DNA, as compared to the 4% rate of bacterial 16S rRNA gene recovery from the same SAGs using PCR in our earlier study (Martinez-Garcia et al., 2012). However, even the new results indicated that only a minority of marine protists from the GoM contained remnants of bacterial prey at the time of field sample collection. By contrast, LoCoS of the Mediterranean SAGs revealed a higher prevalence of bacterial contigs in both plastidic and aplastidic protistan SAGs (Figure 1).
We identified virus-like contigs in 51% of SAGs from the GoM (43% of aplastidic and 65% of plastidic SAGs), and in 35% of SAGs from the Mediterranean (36% of aplastidic and 33% of plastidic SAGs). Many putative viral contigs were most closely related to bacterial virus sequences in public databases (present in 40% BBH SAGs and 19% Mediterranean SAGs). In addition, sequences from 3 to 6% of the GoM SAGs resembled ssDNA viruses from Microviridae and circular repencoding single-stranded DNA (CRESS DNA) viruses families (Supplementary Figure S2). The recovery of ssDNA viruses is likely enhanced by to the use of Phi29 polymerase for single cell genome amplification; but this bias would be consistent across cells in this study. The remaining virus-like contigs resembled diverse viruses previously identified to infect picoeukaryotes such as the Phycodnaviridae or could not be taxonomically annotated. Due to the LoCoS methods used for all SAGs, it is likely that some foreign DNA present within these cells was not detected.

Distribution of Viral Sequences
The occurrence of viral sequences varied among the protistan lineages, particularly in the GoM protists (Figure 1). In the GoM SAGs, virus-like contigs were identified in 100% Choanozoa (13/13), 100% Picozoa (9/9), 65% Chlorophyta/Prasinophyta (33/51), 47% Cercozoa (14/30), 39% Stramenopiles (36/91), and 23% Alveolata (22/97). The number of SAGs in which a viral sequence was identified was significantly different between protistan lineages from the GoM SAGs based on a chi-squared contingency test of independence [chi-square(5, N = 291) = 54.12, p < 0.001, Table 1]. In the Mediterranean SAGs, virus-like contigs were found in 46% of Stramenopiles (19/41), 37% of Chlorophyta (15/41), and 22% of Haptophyta (2/9). A chi-square test of independence was calculated to assess the independence of identifying a virus in these three most numerous protist groups from the Mediterranean, and a significant interaction was not found [chi-square(2, N = 91) = 2.071, p = 0.35, Table 1]. We then conducted similar chi-square tests of independence to test the frequency of bacterial virus sequences between taxonomic groups at each location. Similar relationships were observed -the distribution of bacteriophages was significantly different between groups sampled from the GoM, but not different amongst groups sampled from the Mediterranean ( Table 1). Both data sets indicate prevalence of viral sequences in both aplastidic and plastidic (potentially mixotrophic) protistan SAGs despite differences in isolation and genome amplification techniques.
The abundance of virus-like sequences ranged between 0 and 52 per cell. On average, protists from the GoM contained significantly more viral sequences per cell and specifically bacterial virus sequences per cell than protists collected from the Mediterranean (all viruses Mann-Whitney test, U = 253918.5, p < 0.001, Figure 2A, bacterial viruses only Mann-Whitney test, U = 250668, p < 0.001, Figure 2C). The abundance of virus sequences recovered from protist SAGs varied significantly between taxa based on a one-way ANOVA to test the effect of taxonomic group on the number of virus sequences observed within a SAG [F(7,384) = 122.51, p < 0.001]. A post hoc Tukey test showed that the abundance of viruses present within Choanozoa SAGs was significantly higher than all other groups (p < 0.001), with 28 sequences per cell on average. The abundance of virus sequences recovered from Picozoa SAGs, with an average of 5.7 sequences per cell, was significantly different from all groups (p < 0.001) except for the Cryptophyta/Katablepharidophyta which averaged 2.6 sequences per cell. All other groups were not significantly different from each other ( Figure 2B). An ANOVA test and post hoc Tukey's-HSD test examining the distribution of bacterial virus sequences specifically across taxa yielded similar results [F(7,384) = 105.9, p < 0.001, Figure 2D].
Near-identical (>95% ANI) virus-like contigs were found in phylogenetically diverse GoM SAGs (Figure 3). Viruses that were found in SAGs of multiple protistan phyla resembled either bacteriophages of the Caudovirales order (25 clusters) or ssDNA viruses belonging to the CRESS DNA viruses and Microviridae (16 clusters, Figure 3B). The largest cluster of similar sequences contained a 160 kbp contig with genes indicative of a bacteriophage, likely a myovirus. Contigs from this cluster were recovered from 27 SAGs, including 9 Choanozoa, 2 Chlorophyta, 1 Picozoa, 1 Katablepharidophyta, 3 Stramenopiles, and 11 unidentified protists from the GoM (Supplementary Figure S3).

Nature of Associations of Protists and Viruses
The presence and distribution of sequences resembling bacteriophages and ssDNA viruses within picoeukaryote SAGs is intriguing. Double-stranded DNA bacteriophages are prevalent among sequenced protists in this study, yet have never been found to infect eukaryotes, and their genomes are distinct from eukaryotic viruses (Koonin et al., 2015). Not much is known about the host identities of marine circular rep-encoding single-stranded DNA (CRESS DNA) viruses. However, the sharing of near-identical sequences (>95% ANI) across eukaryotic phyla makes it unlikely that they are infecting all cells in which they are found. Other ssDNA viruses found in protist SAGs resembled gokushoviruses, members of the Microviridae known to infect bacteria (Supplementary Figure S2; Roux et al., 2012Roux et al., , 2014Labonté and Suttle, 2013a;Zhong et al., 2015;Székely and Breitbart, 2016;Creasy et al., 2018). The sharing of nearly identical viral sequences across multiple eukaryote phyla within the GoM protists (Figure 3) also makes it unlikely that these viruses are integrated into protistan genomes. Earlier single cell genomics studies of protists in marine environments have reported the presence of bacteriophage, ssDNA and other virus sequences in individuals such as Picozoa (Yoon et al., 2011), Cercozoa (Bhattacharya et al., 2012), and Stramenopiles (Roy et al., 2014;Castillo et al., 2019). In these cases, identified viruses were hypothesized to be either infecting the protist (Yoon et al., 2011), bacterial viruses originating from infected bacterial prey or epibionts (Yoon et al., 2011;Bhattacharya et al., 2012), or simply removed from analysis without further consideration (Roy et al., 2014). The results from this study indicate prevalent viral associations amongst diverse protist SAGs and provide a unique opportunity to investigate the origins of apparently non-infecting viral sequences associated with marine protists. Non-specific associations of viral particles with protist cells offer one possible explanation for the observed patterns. Such associations include non-specific attachment of viral particles to cell surfaces, which cannot be discriminated from viruses located on the interior of the protists using our methods. Alternatively, viruses could have been introduced by accidental co-sorting, where a cell and a free viral particle are co-deposited into the same microplate well during FACS for SAG generation. Our FACS detection methods did not look specifically for viral particles and therefore may have failed to fully prevent their co-sorting. Previous investigations have addressed the possibility of viruses being co-sorted with microbial cells (Sieracki et al., 2005;Labonté et al., 2015). In this study, sorted protist cells were deposited within sample drop sizes of 21 pL on the BD InFlux flow cytometer and 28 pL on the Legacy MoFlo flow cytometer. Considering typical surface marine virus populations are found at concentrations between 10 7 and 10 8 viruses/mL [0.01-0.1 viruses/pL, (Wigington et al., 2016)], we would conservatively expect to find a viral co-sort somewhere between 1 virus in every 5 protist cells sorted to 2 viruses per cell sorted, assuming a complete absence of discrimination against viral particles during cell sorting. The number of SAGs containing viral sequences are within these bounds (51% of SAGs from the GoM, 35% of SAGs from the Mediterranean). However, the non-random distribution of viral sequences across lineages (Table 1), elevated presence of viral sequences within specific protist taxa, and elevated numbers of viral sequences within individuals belonging to nearly all taxa ( Figure 2) indicate that non-specific attachment and viral cosorting cannot fully explain the observed distribution of viruses. While it is possible that different lineages have different surface properties that lead to differences the frequency of random viral attachment to cell surfaces, the shared ingestion strategies of Picozoa and Choanozoa, discussed below, suggests a more plausible explanation.
All identified Choanozoa and Picozoa SAGs contained viral sequences ( Figure 1B and Supplementary Figures S1B,F) and had significantly higher numbers of viral sequences per cell among identified protistan phyla (Figures 2B,D). Many of the viral sequences recovered were shared between cells from diverse phyla (Figure 3) and many resembled viruses from lineages that do not infect eukaryotes (Supplementary Figures S2, S3). These results indicate that all identified Choanozoa and Picozoa universally accumulated viral sequences. Intuitively, suspensionfeeding strategies, such as those employed by Choanozoa, Cercozoa, Ciliophora, and Picozoa would be conducive to viral ingestion, as particle contact would be non-selective and limited only by a maximum morphologically digestible particle size (Boenigk and Arndt, 2002). Previous investigations indicate that members of these groups are capable of consuming viruses (Pinheiro et al., 2007;Hennemuth et al., 2008;Seenivasan et al., 2013;Deng et al., 2014). Collectively, the results of our study and previous reports indicate that grazing of free viral particles by Choanozoa and Picozoa may be an important process in planktonic communities.
As bacteria and small phytoplankton are assumed to be the primary prey of heterotrophic and mixotrophic protists in the ocean (Azam et al., 1983;Sanders et al., 2000;Sherr and Sherr, 2002;Zubkov and Tarran, 2008), it is likely that some of the viral sequences in the analyzed protistan SAGs originated from phage-infected bacteria that were either protistan prey items or symbionts. It is also possible that virus-infected bacteria were preferred over non-infected bacteria by protistan grazers, similar to what was observed for zooplankton predation of infected coccolithophore populations (Evans and Wilson, 2008). Accordingly, viral and bacterial sequences co-occurred in 12% of SAGs from the GoM and in 24% of SAGs from the Mediterranean (Figure 1). 29% of SAGs containing elevated levels of viruses also contained bacterial sequences. These frequencies of cooccurrence are substantially lower than the frequency of protistan SAGs containing viral but not bacterial contigs. One possibility is that preferential digestion of bacterial DNA results in the accumulation of more viral relative to bacterial DNA. However, experimental evidence for the resistance of viruses to digestion by protists is limited and inconsistent (Pinheiro et al., 2007;Akunyili et al., 2008;Hennemuth et al., 2008;Deng et al., 2014). Importantly, many viral sequences found in association with phylogenetically diverse protists (thus unlikely to be infective) were CRESS DNA viruses, known to infect eukaryotes (Figure 3). Because the target host population for many CRESS DNA viruses is unknown, we cannot rule out the possibility that some observed CRESS DNA viruses are infecting the protists in which they were found. However, the CRESS DNA viruses found in multiple phyla were nearly identical, and in some cases identical (95-100% ANI). To suppose that all CRESS DNA viruses are infecting the cells in which they were found implies that some can infect multiple phyla. Viruses with such a host range are unprecedented, and we argue that virus ingestion provides a more plausible explanation for their observed distribution.

Implications for the Functioning of the Marine Microbial Loop
Although early observations suggested protistan grazing as a mechanism for viral removal and nanoflagellate nutrition (González and Suttle, 1993), we are not aware of a study examining how widespread and significant this process is in nature. Though the nutrient content of a single viral particle is small compared to a bacterial cell, the elemental stoichiometry of viruses, with higher P:C and P:N ratios than those of cellular organisms would provide their grazers with needed P and N (Jover et al., 2014), potentially supplementing organic carbon obtained from cellular prey and photosynthesis. Given the prevalence and high nutrient content of viruses and our limited understanding of the ecology of marine protists, the potential of viruses to serve as a food source constitutes a major knowledge gap. Both Choanozoa and Picozoa are cosmopolitan members of marine protist communities (Thaler and Lovejoy, 2015), and can be significant components of microbial eukaryote communities (King, 2005;Monier et al., 2015). Choanozoa filter 10-25% of coastal surface water each day (King, 2005). Strong indications of virus ingestion falling on specific taxa, and notably higher abundances of viruses within Choanozoa SAGs, averaging 28 viral sequences per cell, suggests that the degree of viral loss due to protist predation would be variable and dependent upon protist community structure. A recent study demonstrated the efficient removal of planktonic viruses by marine sponges, highlighting a need to reconsider viral predation by non-host organisms even among metazoans (Welsh et al., 2020).
The "viral shunt" has been the paradigm for ecosystem models and theory, with viral loss uniformly formulated as a non-interactive decay term and/or an infection/adsorption term (Thingstad, 2000;Weitz et al., 2015;Middleton et al., 2017). When building from the standard Lotka-Volterra core equations, including a consumer and a virus that both target the same resource generally leads to a collapse, with one outcompeting the other (Weitz, 2016). To maintain coexistence, models have invoked complex, multi-taxa food web dynamics, such as in the "kill the winner" model (Thingstad, 2000), higher order mortality terms for stability (Talmy et al., 2019), or ecological trade-offs (Record et al., 2016). Direct consumption of viruses by protists adds a new interaction that actually stabilizes the three-equation virus-host-consumer model without requiring a more complex model (Hsu et al., 2015). Indeed, a modeling study that incorporates viral loss due to direct grazing of viruses and predation of virus-infected microbes found that this viral loss led to weakening of the viral loop and a reduction in bacterial species richness (Miki and Yamamura, 2005).
An alternative to the "viral shunt" is a proposed "viral shuttle" which posits that aggregation of organic material from viral lysates enhances carbon export (Weinbauer, 2004;Sullivan et al., 2017). A study examining correlations between metagenomic signatures and carbon flux found that viral sequences correlated with carbon export, supporting the existence of a viral shuttle (Guidi et al., 2016). Our results suggest an additional mechanism of virus-linked carbon export through the predation of viruses by small phagotrophs, forming a viral link in the microbial food web that would weaken the viral shunt and enhance carbon movement into higher tropic levels (Figure 4). A trophic link and a viral sink may be formed even if some viral accumulation within protists stems from viruses actively infecting bacterial prey, viruses consumed by protistan prey or non-specific attachment of viruses to a cell's surface, as each of these mechanisms would contribute to the removal of viruses from the water column. Given the abundance and rapid turnover of viruses in the ocean (Noble and Fuhrman, 2000), these findings call for more extensive field and experimental studies in order to determine what fraction of the energy and nutrient flux is channeled through the proposed viral link and how it affects marine ecosystems.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the OSF https://osf.io/7pm3u/, doi: 10.17605/OSF.IO/ 7PM3U. Protist SAG assemblies are available through NCBI Bioproject PRJNA655200.

AUTHOR CONTRIBUTIONS
JMB and RS conducted the analyses and interpreted the data. JMB generated all the figures. JL, JB, and NR developed initial virus detection workflows. NP and MS were involved with original experimental design and cell sorting. RL provided Mediterranean SAG data. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We thank the staff of the Bigelow Laboratory Single Cell Genomics Center for the generation of single cell genomics data, all members of the Blanes Bay Microbial Observatory sampling team and Haiwei Luo for providing bacterial SAG data. We additionally thank the thoughtful and thorough comments from reviewers.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb. 2020.524828/full#supplementary-material Supplementary Figure 1 | (A-F) Phylogenetic trees of 18S rRNA genes from SAGs from this study with closest relatives. Leaves representing sequences from this study are colored blue (Boothbay Harbor, Gulf of Maine) or yellow [Blanes Bay Microbial Observatory, Mediterranean Sea (BBMO)], and outer circles indicate SAGs that contained either a virus found in SAGs from multiple phyla (inter-phylum), viruses with similar sequences found in more than one SAG (shared virus), and singleton viruses found only in the indicated SAG (singleton virus).
Supplementary Figure 2 | Phylogenetic comparison of microvirus VP1 genes from microvirus genomes identified in picoeukaryote SAGs from the Gulf of Maine to other microvirus VP1 genes. In some cases, multiple microvirus contigs were identified within the same SAG, so sequences from this study are indicated in the inner bar, and the node text indicates "(SAG identifier)-(contig number)".
Supplementary Figure 3 | BLAST alignment of contigs from the largest cluster of viral sequences with the longest member of the cluster. Colors on the x-axis indicate the phylogeny of each SAG from which the contigs came.
Supplementary Table 1 | Information about samples used for this study.
Supplementary Data Sheet 1 | GenBank accession IDs used to identify ssDNA viruses within SAGs.