The “Neglected Viruses” of Taihu: Abundant Transcripts for Viruses Infecting Eukaryotes and Their Potential Role in Phytoplankton Succession

Drivers of algal bloom dynamics remain poorly understood, but viruses have been implicated as important players. Research addressing bloom dynamics has generally been restricted to the virus-infection of the numerically dominant (i.e. bloom forming) taxa. Yet this approach neglects a broad diversity of viral groups, limiting our knowledge of viral interactions and constraints within these systems. We examined hallmark virus marker genes in metatranscriptomic libraries from a seasonal and spatial survey of a Microcystis aeruginosa bloom in Lake Tai (Taihu) China to identify active infections by nucleocytoplasmic large DNA viruses (NCLDVs), RNA viruses, ssDNA viruses, bacteriophage, and virophage. Phylogenetic analyses revealed a diverse virus population with seasonal and spatial variability. We observed disproportionately high expression of markers associated with NCLDVs and ssRNA viruses (consistent with viruses that infect photosynthetic protists) relative to bacteriophage infecting heterotrophic bacteria or cyanobacteria during the height of the Microcystis bloom event. Under a modified kill-the-winner scheme, we hypothesize viruses infecting protists help suppress the photosynthetic eukaryotic community and allow for the proliferation of cyanobacteria such as Microcystis. Our observations provide a foundation for a little considered factor promoting algal blooms.


INTRODUCTION
Harmful algal blooms (HABs) are extreme biological events that have detrimental environmental and socioeconomic effects on both fresh and salt waters. In particular, freshwater HAB events during the last two decades have posed threats to potable water supplies and socioeconomic resources for population centers (Qin et al., 2010;Steffen et al., 2014;Bullerjahn et al., 2016).
Freshwater HABs commonly manifest as a summer bloom of cyanobacteria that follows from a winter/spring population of eukaryotic phytoplankton (Ke et al., 2008;Niu et al., 2011;Edgar et al., 2016). HAB research has historically focused on abiotic "bottom-up" controls, such as nutrient or temperature triggers that stimulate cyanobacterial growth (Tang et al., 2018). However, biological success can also arise during suppression of competitors, and there is less known about the decline of the eukaryotic population and factors that constrain these populations as cyanobacterial blooms form. And while grazing is often cited as the major driver of "top-down" regulation on communities (Vanderploeg et al., 2001), viruses also act as predators of algae.
Research over the past two decades has made it clear that viruses are central to ecosystem function (Wigington et al., 2016). There are many cases of direct viral modulation of phytoplankton populations in the literature, but these focus almost entirely on the lysis of dominant bloom-forming species (Hewson et al., 2001;Wilson et al., 2002;Tomaru et al., 2004;Brussaard et al., 2005;Steffen et al., 2017). In particular, there is less focus on numerically less-abundant protists that may be subject to virus-mediated suppression. In part this may be due to the general consensus that viruses of prokaryotes are the most abundant and active in aquatic systems (Weinbauer, 2004), or that many of the viruses infecting protists contain RNA genomes (Moniruzzaman et al., 2017) and are thus not detected by DNA-based metagenomics. These "neglected" viruses have the potential to control community dynamics, reducing the success of competitors. The "kill-the-winner" model invokes this important role of viral suppression of competitive hosts in microbial community assembly (Winter et al., 2010): however, the kill-thewinner model assumes equilibrium dynamics, and is historically thought of in the context of heterotrophic bacteria (Thingstad, 2000). Nevertheless, suppression of competing hosts by viruses, consistent with kill-the-winner, could be important in the selection of dominant, bloom-forming phytoplankton.
To explore the role of the "neglected" viruses, we examined metatranscriptomic data from a Lake Tai (China, or Taihu in Mandarin) bloom event during 2014. Lake Tai experiences yearly successions from photosynthetic protists to cyanobacteria (dominated by Microcystis) and therefore presents an opportunity to study the impact of viruses during and after that transition ( Supplementary Figure 1; Ke et al., 2008;Niu et al., 2011). We assessed gene markers (Moniruzzaman et al., 2017;Stough et al., 2018) for active infections by nucleocytoplasmic large DNA viruses (NCLDVs), ss/ds RNA viruses, ssDNA viruses, as well as bacteriophage that were infecting non-Microcystis populations. In parallel, we characterized the putative host community structure. We detected a broad spatial and temporal richness of viruses, including a shift between the two dominant virus types (Microcystis-infecting phage and ssRNA viruses). In addition to observations of active eukaryotic viruses, determinations of putative host diversity and statistical co-occurrence analyses of "who infects whom" reveal potential advantages for Microcystis that account for part of its ability to form large bloom events in fresh waters.

Sample Collection
Samples were collected monthly (June to October) from a Microcystis spp.-dominated bloom in Lake Tai, China in 2014 at several stations across the northern parts of the lake as described previously Stough et al., 2017). Biomass was collected by filtering between 25 and 180 mL (volume dependent on sample biomass density) of lake water through 0.2-µm nominal pore-size Sterivex TM filters (EMD Millipore Corporation, Burlington, MA, United States) and immediately treated with ∼2 mL of RNAlater (Qiagen) for preservation. Samples were collected in a manner consistent with the separation of bacteria and phytoplankton from dissolved materials: free virus particles generally pass through these filters. Environmental parameters and nutrient data were collected with each sample and have been reported elsewhere (Tang et al., 2018). Total RNA was extracted, quality checked, ribosomally reduced, and sequenced on the Illumina TM HiSeq platform at HudsonAlpha Institute Genomic Services Laboratory (Huntsville, AL, United States) as previously described Stough et al., 2017;Tang et al., 2018). Raw sequence data was retrieved from the HudsonAlpha servers and primarily handled in the CLC Genomics Workbench v. 10.1.1 suite (QIAGEN, Hilden, Germany). Quality scoring was used to remove reads with a score <0.3. Trimmed reads from all 35 samples were combined into a single file and assembled using MEGAHIT to reduce redundancy that can be seen within different contig fragments from identical genes (Li et al., 2015).

Virus Detection
We employed a marker gene-based approach to discover viruses within the Taihu samples (Moniruzzaman et al., 2017;Stough et al., 2018). This protocol is publicly available . A reference protein database was created from hallmark viral genes from a diverse range of taxonomic groups with sequences downloaded from the NCBI refseq database and Uniprot. Hallmark genes included the NCLDV major capsid protein (25 sequences), RNA virus RNA-dependent RNA polymerase (79 sequences), ssDNA viral replicase (11 sequences), and bacteriophage ribonucleotide reductase (1347 sequences) from characterized viral isolates. For virophage major capsid proteins, 9 sequences from virophage isolates as well as 7 from well-studied metagenomic assemblies were used. All data used to establish base phylogenies are available ( Table 1 and Supplementary Material), and were selected to capture the known diversity in each sample (or as much known diversity as available in databases). Several marker genes were used to discover bacteriophage, including gp20, gp23, integrase, and ribonucleotide reductase, with the results of the latter presented in this paper. The other marker genes revealed no additional phage, nor did the phage finding tool, VirSorter (Roux et al., 2015). DNA-directed RNA polymerase beta subunit (Rpb1/RpoB) sequences (3,112 total) from microbial isolates were used as a gene marker to identify potential eukaryotic and prokaryotic hosts (  The assembled library was queried with each virus-specific reference protein database in command line BLASTx v.2.6.0+ with an e-value maximum of e −10 for viral candidates and e −30 for host candidates. A lower stringency was applied to potential viral candidates given the limited number of viral isolates in reference databases. Our separate stringency cutoffs for virus and hosts attempt to eliminate the implicit bias toward broader diversity in the hosts caused by the larger available database. A hidden Markov model search was used to validate our blast method but yielded 11 fewer viral candidates and provided no new candidates (data not shown). A custom python script was used to identify the coordinates of the aligned sequence within the query contig and extract only this region: these extracted components of the contigs were subsequently considered "viral candidates" . Only viral candidates greater than 150 nucleic acids were kept for further analyses. These remaining candidates were then queried against the NCBI refseq database, and any candidates with non-viral hits were removed from the analysis and considered falsepositives. In preliminary work, we found that candidates less than 150 nucleic acids did not consistently produce positive blast hits to viral sequences in the refseq blast. Candidates were considered to represent "near complete" genes if they were larger than the smallest reference protein in length. Shorter candidates were considered and are discussed as "gene fragments." RNA virus candidates were further analyzed using the pfam v. 29 database to identify ORFs in the full-length contigs (contigs before extraction of target gene sequence). Contigs containing both structural and non-structural components were considered "near complete genomes" (data not shown) (Moniruzzaman et al., 2017).

Viral Phylogeny and Expression
The hallmark gene reference proteins and "near complete" gene candidates were used to reconstruct maximum likelihood phylogenies as described in Moniruzzaman et al., 2017 (Supplementary Table 2). Gene fragment candidates were placed on trees using pplacer and subsequently visualized in iTOL v.4 (Matsen et al., 2010;Letunic and Bork, 2019). All reference proteins used to reconstruct phylogenies came from viruses that have been isolated in labs and sequenced, with the exception of some virophage sequences. Mavirus, Sputnik, and Zamilon remain the only virophage that have been isolated, purified and sequenced to date (La Scola et al., 2008;Fischer and Suttle, 2011;Gaia et al., 2014), while three Chrysochromulina parva Virus Virophages (CpVVs denoted Curly, Larry, and Moe) have been isolated but not purified beyond unialgal lysates (Stough et al., 2019). Several intact virophage have been assembled from metagenomic analyses and were employed here to make our phylogenetics more robust (Zhou et al., 2013(Zhou et al., , 2015. Relative virus and host activity in each sample was quantified by mapping trimmed reads from each sample library to the extracted viral or host candidates using a 90% similarity fraction over a 90% length fraction in CLC Genomics Workbench 10.0 (Supplementary Table 3). The abundances for each viral type therefore only represent the expression of the singular target gene (and not other components of the contig). Transcript abundance data were normalized by the length of the extracted viral portion of the candidate contig and by the library size of that sample. Viral transcript expression was visualized using Heatmapper (Babicki et al., 2016). A nonmetric multidimensional scaling (nMDS) plot using Bray-Curtis dissimilarity was generated in Primer7 and used to describe the influence of environmental factors on the abundance data (Clarke and Gorley, 2015).

Cluster Analyses
We worked from the assumption that virus transcripts in cells represent active infections and that viruses require a host population to be present to generate these transcripts. Pearson correlation coefficients were established from square root transformed normalized transcript abundances using Primer 7 (Clarke and Gorley, 2015). A SIMPROF test (α = 0.5) was used to identify significant clusters of co-occurrences. Significant clusters containing both a putative host and putative virus candidate with correlation coefficients greater than 0.8 were visualized in Cytoscape (Smoot et al., 2011) with the nodes representing the viral or host candidate and the edges representing the strength of the correlation.

Data and Code Availability
Raw sequencing data is publicly available in the MG-RAST database (Keegan et al., 2016) under the name "Lake_ Taihu_metatranscriptome_project." Sequencing information and quality control details have been previously reported (Tang et al., 2018). Assembled data is also available in the MG-RAST database under the name "Taihu 2014 MegaHit Assembly 1." The python script used to extract the aligned portion of blast hits is available in GitHub at https://github.com/Wilhelmlab/generalscripts/blob/master/Pound2019_Extract_aligned.py.

RESULTS AND DISCUSSION
Our analyses revealed an active and diverse virus community in the Microcystis-dominated HAB that occurred in Taihu during the 2014 summer sampling season. In total 8.42 × 10 8 transcripts across 35 samples collected during the bloom season passed QA/QC assessments. Across all stations ∼70% of the transcripts were designated to be cyanobacterial in origin (Tang et al., 2018), highlighting the "bloom" nature of this data set. Of the residual transcripts, ∼1.92 × 10 5 were associated with viruses. Phage assigned to groups known to infect Microcystis hosts comprised 47.9% of the total viral reads. The remaining 52.1% consisted of transcripts associated with a broad diversity of viruses. We assigned 827 unique viral candidate contigs to groups that included the NCLDVs, single-stranded and double-stranded RNA viruses, ssDNA viruses, non-Microcystis phage, and virophage.
The NCLDV group displayed the most richness (i.e. number of unique viral contigs) amongst the identified viral groups, with 457 candidates assigned to families ( Figure 1A). The majority of these candidates were assigned to the "Extended-Mimiviridae" or the Mimiviridae clades, which have been associated to date with many marine algae (Claverie and Abergel, 2018). The most abundant NCLDV candidate contigs were assigned to the Iridoviridae, a NCLDV family whose members infect insects and cold-blooded vertebrates (Chinchar et al., 2009). Overall, NCLDVs made up 8.35% of the total reads in the virus dataset.
We also observed virophage signatures in our dataset, although with less richness (30 unique candidates) than the NCLDVs (Supplementary Figure 2). Most candidates were more closely related to an environmentally derived sequence identified as the Dishui Lake Virophage (DSLV) (Gong et al., 2016). Transcripts associated with virophage were the least abundant in our dataset, making up only 0.25% of the total viral reads or <0.0001% of the total reads.
The use of a metatranscriptomic data from size-selected (>0.2 µm) populations allowed us to detect infections by both DNA and RNA viruses, the latter of which should primarily be cell-associated. We detected 268 unique RNA virus candidates ( Figure 1B). Overall, RNA viruses comprised 42.45% of the total virus transcripts, rivaling the 47.9% made up by Microcystis phage. Of the RNA virus candidates, 72% of the expression was assigned to ssRNA viruses and 28% to dsRNA viruses. The distribution between the number of single-stranded and double-stranded candidates was coincidentally identical, with 193 ssRNA virus candidates representing 72% of the richness and 75 dsRNA virus candidates representing 28% of the richness (Figure 1B). The dsRNA virus contigs belonged to Partitiviridae and Picobirnaviridae. Most of our ssRNA virus candidates belonged to the Picornavirales order, with the majority belonging to the Marnaviridae or Dicistroviridae families. Marnaviridae viruses, whose members include the Chaetoceros tenuissimus virus and the Rhizosolenia setigera virus, commonly infect marine diatoms Shirai et al., 2008). In our dataset, Marnaviridae-like viruses represent 12% of the total RNA virus reads, while Dicistroviridae-like viruses, which are typically associated with insect hosts (Valles et al., 2017), comprise 43% of our total RNA virus reads. Interestingly, the majority (86%) of the Dicistroviridae-associated transcripts come from a single candidate, which was highly expressed in most of our samples.
We took the opportunity to examine the RNA virus contigs to determine if any represented near or full-length virus genomes. Of our original 268 contigs, 16 were determined to contain a "near full-length" genome based on our parameters (Supplementary Figure 3). All of these viruses were singlestranded, and most belonged to Marnaviridae or Nodaviridae. Each meta-assembled virus contained both an RNA-dependent RNA polymerase and a structural gene, most commonly a capsid protein specific to the phylogenetic group.
As part of our classification efforts, we also observed evidence for both ssDNA viruses and dsDNA bacteriophage not associated with Microcystis spp. ("other" phage). However, neither group contributed much to the overall transcript abundance. The ssDNA viruses had only 17 unique candidates, which comprised only 0.42% of the total viral reads or <0.0001% of the total reads (Supplementary Figure 4). Eight candidates belonged to Nanoviridae, which typically infect plants. Ribonucleotide reductase markers indicated the presence of 14 unique candidates that were closely related to Microcystis phage. The remaining, "other, " phage had only 41 unique candidates, representing 0.63% of the total viral reads and 0.001% of the total reads (Supplementary Figure 5). The majority of the "other" phage belonged to Caudovirales, with several candidates most closely related to other cyanophages, such as Synechococcus phage.
In addition to the virus community, we examined eukaryotic and bacterial members of the putative host community. With 1,946 unique candidates, we observed a vast richness that spanned cellular phylogeny (Figure 2). Host transcripts associated with DNA-dependent RNA polymerase beta-subunit made up 0.06% of the total reads in the dataset. Candidates closely related to Microcystis spp. were the most abundant, with 52 unique candidate sequences representing 46% of total host reads.
Our dataset also provided the opportunity to investigate variation in virus communities (based on transcript abundance for individual groups) over the course of the 5-month bloom. Figure 3 details the read abundances of each viral group we targeted, with each column representing a different sample and each row representing a different category of virus (alternative color scheme available, Supplementary Figure 6). The colors indicate the sum of viral transcripts for each viral type in each sample, normalized to individual library size. In some samples, the ssRNA viruses showed similar proportions to the Microcystis lytic and lysogenic phage infections, suggesting that these viruses might be just as active. Of these 35 samples, markers for Microcystis phage lytically infecting their host had the most abundant representation in 10 samples, Microcystis phage in a lysogenic infection had the most abundant representation in 15, ssDNA viruses had the most abundant representation in 9, and dsRNA viruses had the most abundant representation in one sample (Supplementary Figure 7). According to a Pearson correlation analysis, this pattern was best explained from the environmental data sets (Tang et al., 2018) by pH and salinity (Supplementary Figure 7A).  Co-occurrence networks described putative virus and host interactions across our 35 samples. We established 20 significant co-occurrence clusters that ranged in membership from 2 to 11 members, with each network containing at least one putative virus and putative host. Twelve clusters contained a bacterial host with an RNA virus or a NCLDV, while four clusters contained a eukaryotic host with an RNA virus, a NCLDV, or ssDNA virus (Supplementary Table 4). Three clusters had both bacterial and eukaryotic hosts with a combination of virus types (Figure 4). An additional three clusters contained a virophage but no NCLDV, which are typically thought to co-infect eukaryotic host cells. In all three cases, the virophage co-occurred with an RNA virus (Supplementary Table 4).
The gene marker approach used in this study allowed us to characterize the viral community within a cyanobacteriumdominated HAB. This technique extended existing virus-finding  programs, such as VirSorter or VirFinder, powerful tools which are currently limited to phage and metagenomic analysis (Roux et al., 2015;Ren et al., 2017). While a similar approach has been used to study marine eukaryotic phytoplankton systems and even Sphagnum peat bogs (Moniruzzaman et al., 2017;Stough et al., 2018), it has not been used to explore freshwater cyanobacterial blooms, to our knowledge. Our results are consistent with the Aureococcus-dominated HAB transcriptome analyzed by Moniruzzaman et al. (2017) in relation to the presence of both NCLDVs and RNA viruses.
Both studies showed abundant Mimiviridae and extended Mimiviridae group members, although Moniruzzaman et al. (2017) did not distinguish between the two groups. Our results also showed a similar, smaller representation of Asfarviridae and Phycodnaviridae. Our ssRNA virus results are consistent with previous work that observed putative transcripts/viruses in this group, although several of Marnaviridae viruses were labeled "unclassified marine viruses" in the previous publication (Moniruzzaman et al., 2017). That article did not discuss dsRNA viruses, so it is unclear if our observed abundance (28% of RNA FIGURE 3 | Heatmap of normalized viral abundances for each sample and virus type with histogram of total viral reads and total library reads in each sample. Each column describes a single sample. Color scale describes viral read abundance normalized to library size. viral reads) of Picobirnaviridae and Partitiviridae is unusual. We also observed many virophage candidates, most of whom were closely related to the DSLV meta-assembly (a lake approximately 60 miles from our study site) (Gong et al., 2016).
Previously the viral component of metatranscriptomic analyses of Microcystis blooms have focused on the Microcystis phage Stough et al., 2017). However, our findings indicate that gene markers for RNA viruses, particularly ssRNA viruses, can reach similar transcript abundances during a bloom event. In many of our early season samples (June and July), we see higher relative abundances of ssRNA virus transcripts than the Microcystis lytic phage. These viruses are mostly members of Marnaviridae, whose members can infect photosynthetic eukaryotes including C. tenuissimus, R. setigera, and Heterosigma akashiwo Shirai et al., 2008). This suggests that these "neglected" viruses might play a role in the transition from a winter diatom bloom to the summer cyanobacterial blooms. Seasonal successions of this nature have been well-recorded in Lake Erie, another shallow lake plagued by summer Microcystis blooms (Saxton et al., 2012). Other virus-driven community controls have been hypothesized to occur in Lake Erie, with viral infection of parasitic fungal hosts hypothesized to contribute to the maintenance of winter diatom blooms (Edgar et al., 2016). In both scenarios, viral suppression of a predator or competitor enables a particular species or group to thrive.
A virus-mediated transition from diatoms to cyanobacteria can be explained within the context of a modified version of the "kill-the-winner" framework. As originally proposed by Thingstad and Lignell (Thingstad and Lignell, 1997), the "killthe-winner" hypothesis posits that competition specialistsgroups that would typically achieve high abundance due to abilities to assimilate resources and grow faster -are held at low densities when top-down pressure is strong (Thingstad and Lignell, 1997). This theory has been invoked to explain why heterotrophic bacteria do not outcompete phytoplankton when mineral nutrients, but not carbon, are limiting (Thingstad and Lignell, 1997). It has also been widely invoked as a mechanism to maintain diversity among bacteria (Winter et al., 2010;Kirchman, 2016), and it has been suggested that this framework is not limited to heterotrophic bacteria and might be applied to other co-occurring species in aquatic systems (Våge et al., 2013). Yet, it is not widely thought of in the context of phytoplankton bloom dynamics.
Silica may act as the constraining nutrient that catalyzes the transition of the community. Reductions in silica deposition are hypothesized to occur in association with increased pH conditions during Microcystis blooms, constraining diatom growth (Krausfeldt et al., 2019). In marine environments, silica limitation has been linked to increases in viral infection of diatoms (Kranzler et al., 2019). Acting together, nutrient limitation and viral infection could act as a negative feedback loop on diatoms that might provide an advantage for competitive species such as Microcystis.
The kill-the-winner hypothesis assumes equilibrium dynamics, and therefore is not implicated (or perhaps even appropriate) in selection during transient phytoplankton blooms. Bloom forming species are often thought of as opportunists, able to take advantage of ephemeral increases in resource availability, predominately through rapid cell division (Grover, 1990;Litchman and Klausmeier, 2001). In classical quantitative ecological theory, feedbacks on the nutrient environment and seasonal changes in environmental conditions control successional bloom dynamics, from opportunists to other groups (Margalef, 1978). In Taihu, diatoms are the opportunists. Abundant viruses consistent with those that infect dominant groups like C. tenuissimus and R. setigera imply that early diatom dominance may be terminated and subsequently suppressed by virus-mediated lysis. Collapse of diatoms would lead to opportunities for other groups, especially groups able to resist predation and infection. In this interpretation of the "kill-the-winner" hypothesis (Thingstad and Lignell, 1997), the potential for virus-induced cell lysis of the "neglected" community members could be one of the factors that indirectly allow for cyanobacterial bloom species to achieve dominance.
The latter season dominance of Microcystis depends on an ability to both consume the available nutrients and itself resist viral infection. One mechanism of Microcystis resistance is through high number of restriction modification systems (Zhao et al., 2018), but Microcystis may also minimize losses by forming a lysogenic state . In other bacterial systems, lysogeny can cause homoimmunity and allow microbes to escape secondary infection (aka super-infection) by other phage, ultimately prolonging the existence of the host (Donnelly-Wu et al., 1993). In the case of the Microcystis populations, this generates an ability for cells to escape (at least temporarily) the significant top down pressure from free Microcystis-infecting phage.
While we observed transcripts for viruses potentially capable of controlling a variety of protists, we saw a surprisingly limited richness and abundance in phage. The majority of phage transcripts were associated with Microcystis phage, which was not a surprise, given the dominance of the Microcystis host. However, Microcystis blooms are commonly accompanied by an abundant heterotrophic bacterial microbiome, particularly proteobacteria, who can play a large role in nitrogen cycling in this system (Wilhelm et al., 2011;Krausfeldt et al., 2017). We expected to observe a corresponding virus community, yet non-Microcystis phage accounted for less than 1% of total virus reads. This contradicts previous findings that phage are the most active viruses in aquatic microbial systems (Weinbauer, 2004) although it is consistent with transcriptomic observations in at least one other environment (Stough et al., 2019). This disparity may be the result of underestimation of phage sequences, our methodology, or a disproportionate expression of RNA virus transcripts. We note that a number of phage candidate sequences were disqualified as false positives during the BLASTx check against the non-redundant database. We identified these sequences as putative prokaryotes and removed them from further analysis and quantification. However, even if every disqualified candidate was considered incorrectly annotated and included in our quantifications, there were still ∼5× more transcripts associated with the non-phage "neglected" viruses than with the non-Microcystis phage. Other possibilities, including much larger burst sizes associated with RNA viruses or increased expression of the Rdrp gene marker are also possible explanations. However, the literature on this subject is patchy, with little information available focusing on insect viruses. To normalize for variation in expression of RNA virus genes during infection, an isolated infection transcriptome would be required for each algal virus we found.
Regardless, there is precedent to our findings; a similar gene marker approach also revealed higher richness and transcript abundance associated with NCLDVs and RNA viruses in a metatranscriptomic study of the microbiome of Sphagnum peat moss relative to the diversity of markers for phage gene expression . Another nucleic acid weighting approach also indicated a higher abundance of RNA viruses compared to DNA viruses in aquatic systems (Steward et al., 2013). It is unclear if our results show contradicting biology or capture previously overlooked and abundant virus types such as NCLDVs and RNA viruses.
Our co-occurrence analyses revealed populations that behave (in terms of persistence and abundance) in similar manners over the course of the bloom. We resolved 23 clusters that contained both host and virus transcripts with a statistically significant co-occurrence of expression across all 35 samples. While this is not directly indicative of a viral infection of the host in the cluster, it does imply these groups were responding to similar pressures in the system, either abiotic or biotic, and to some degree, their ecologies are thus "linked". For example, we discovered several virophage that clustered significantly with RNA viruses and ssDNA viruses with no obvious NCLDV candidate. As virophage have only ever been identified co-infecting a host with a NCLDV, the observed clusters likely do not represent active co-infections, but hypothetically two different viruses benefiting from similar (or parallel) scenarios. We also observe clusters that contain only bacterial hosts and NCLDVs, the latter of which are to date only known to infect eukaryotes. Again, these co-occurrence observations are likely describing community dynamics (i.e. microbiomes) as opposed to direct infections. These smaller clusters of viruses and potential hosts can be used to generate hypotheses regarding predator-prey cascades in a community and interspecies viral interactions.
Given the above information, we looked for data and relationships consistent with the hypothesis that non-Microcystis viruses were involved in the establishment and/or maintenance of the Microcystis bloom. We observed two statically significant co-occurrences including cyanobacterial species and a Marnaviridae-like virus (consistent with viruses that infect C. tenuissimus and R. setigera) (Supplementary Table 4). By constraining competing members of the microbial community, these neglected viruses have the potential to create opportunities for Microcystis populations to compete for nutrient resources. Our findings highlight the activity of protistinfecting viruses -particularly ssRNA viruses -that have been overlooked. The evidence described here provides the rationale for future studies pertaining to the role of indirect infections influencing bloom-forming species, and the complicated collection of drivers that result in broad global persistence of Microcystis.

DATA AVAILABILITY STATEMENT
Raw sequencing data is publicly available in the MG-RAST database (Keegan et al., 2016) under the name "Lake_ Taihu_metatranscriptome_project." Sequencing information and quality control details have been previously reported (Tang et al., 2018). Assembled data is also in the MG-RAST database under the name "Taihu 2014 MegaHit Assembly 1."

AUTHOR CONTRIBUTIONS
HP and SW established the project direction. LK and XT performed the environmental sampling and subsequent RNA extractions. HP, EG, MH, and MS were responsible for computational processing. HP, DT, and SW were responsible for interpreting the data and composing the manuscript.

FUNDING
This work was supported by grants from the National Science Foundation (DEB 1240870, IOS 1451528, and DBI 1530975) to SW, the BGSU Great Lakes Center for Fresh Waters and Human Health (NSF Grant OCE-1840715 and NIH Grant 1P01ES028939-01), the Key Research Program of Frontier Sciences, CAS (QYZDJ-SSW-DQC008), and the Kenneth and Blaire Mossman Endowment to the University of Tennessee.

ACKNOWLEDGMENTS
We thank the Taihu Laboratory for Lake Ecosystem Research (TLLER) for their help in sample collection.