Novel Virophages Discovered in a Freshwater Lake in China

Virophages are small double-stranded DNA viruses that parasitize giant DNA viruses infecting unicellular eukaryotes. Here we identify a novel group of virophages, named Dishui Lake virophages (DSLVs), that were discovered in Dishui Lake (DSL), an artificial freshwater lake in Shanghai, China. Based on PCR and metagenomic analysis, the complete genome of DSLV1 was found to be circular and 28,788 base pairs in length, with a G+C content of 43.2% and 28 predicted open reading frames (ORFs). Fifteen of the DSLV1 ORFs have sequence similarity to known virophages. Two DSLV1 ORFs exhibited sequence similarity to those of prasinoviruses (Phycodnaviridae) and chloroviruses (Phycodnaviridae), respectively, suggesting that horizontal gene transfer occurred between these large algal DNA viruses and DSLV1. Forty-six other virophage-related contigs were also obtained, including six homologous major capsid protein (MCP) genes. Phylogenetic analysis of these MCPs showed that DSLVs are closely related to OLV (Organic Lake virophage) and to the YSLVs (Yellowstone Lake virophages) other than YSLV7, and especially to YSLV3. These results indicate that freshwater ecotopes are a hotbed for discovering novel virophages as well as for understanding their diversity and properties.


INTRODUCTION
Virophages are small double-stranded DNA (dsDNA) viruses that act as parasites of giant DNA viruses (La Scola et al., 2008; Fischer and Suttle, 2011; Yau et al., 2011; Gaia et al., 2014). Virophages are considered parasitic because they co-infect unicellular eukaryotes together with large DNA viruses and depend on the large DNA virus to replicate, which results in the deformation and abortion of the large DNA viruses (La Scola et al., 2008). The first virophage, Sputnik, was isolated in conjunction with the giant DNA virus mamavirus, a relative of mimivirus, from a water-cooling tower in Paris, France (La Scola et al., 2008). Several other virophages were subsequently identified, including Mavirus, a virophage infecting Cafeteria roenbergensis virus (CroV), and Zamilon, a close relative of Sputnik, which was isolated with the Mimiviridae-related virus Mont1 (Fischer and Suttle, 2011; Gaia et al., 2014). In 2015, the viral family Lavidaviridae was proposed for virophages; it currently contains two proposed genera, Sputnikvirus and Mavirus (Krupovic et al., 2016).
More recently, metagenomic analyses have been used to discover diverse and unique virophages from large datasets worldwide. Seven complete genomes of Yellowstone Lake virophages (YSLVs) were assembled from Yellowstone Lake metagenomic datasets (Zhou et al., 2013). The Organic Lake virophage (OLV) was identified from a hypersaline meromictic lake in Antarctica and was considered an essential regulator of carbon flux and energy cycling through the ecosystem (Yau et al., 2011). A nearly complete genome of Ace Lake Mavirus (ALM) was assembled from Ace Lake metagenomic datasets (Zhou et al., 2013). Early this year, rumen virophages (RVPs), a new group of virophage-Polinton hybrids, were discovered in a sheep rumen metagenome (Yutin et al., 2015). Interestingly, eukaryotic Polinton transposons appeared to bear viral characteristics (Krupovic et al., 2014) and to be involved in the evolution of eukaryotic viruses, transposons, and plasmids (Krupovic and Koonin, 2015). A cryoconite virophage was also assembled from dsDNA viromes from cryoconite hole ecosystems of Svalbard and the Greenland Ice Sheet (Bellas et al., 2015). To date, diverse virophages have been discovered in Europe (La Scola et al., 2008; Desnues et al., 2012; Gaia et al., 2014; Bellas et al., 2015), the Americas (Fischer and Suttle, 2011; Zhou et al., 2013, 2015; Campos et al., 2014), and even Antarctica (Yau et al., 2011; Zhou et al., 2013; Zablocki et al., 2014). However, reports of virophages are lacking in Asia, Africa, and Australia.
In this study, we used PCR and metagenomic analyses to shed light on the diversity and distribution of virophages in freshwater ecosystems in China by examining water samples from lakes and rivers in East China. From these data, a novel group of virophages was identified in Dishui Lake (DSL), Shanghai, China. They are more closely related to YSLVs, especially to YSLV3, than to other known virophages. The DSLVs were consistently detected in DSL, and their close relatives were found in freshwater environments neighboring DSL. The diversity and distribution of the virophages observed in DSL highlight the importance of virophages in China's freshwater ecosystems.

MATERIALS AND METHODS
Sampling and DNA Extraction
Surface water samples (depth <1 m, 500-1000 mL/site) were collected in sterile glass bottles using an electric aspirator pump and kept on ice during delivery to our laboratory. The longitude and latitude of each sampling site were recorded along with the sampling time (Table 1, Figure 1). Water samples were filtered through a 0.22 µm membrane (GSWP, Merck Millipore) immediately upon receipt. After drying at room temperature, the membrane filters were cut into small pieces, and DNA was extracted using the QIAamp Fast DNA Stool Mini Kit according to the manufacturer's instructions. The final DNA concentration of each sample was determined using a microplate reader (BioTek), and the DNA was stored at −20 °C before use.

Primer Design
Eight pairs of primers were designed based on the MCP genes of eight virophages (Ace Lake Mavirus, Mavirus, Sputnik, OLV, YSLV1, 2, 3, and 4; Table 2) and synthesized by Sangon Biotech. Primer specificity was checked in silico using NCBI BLAST.

PCR
PCR reactions (25 µL) contained 0.4 mM forward and reverse primers (Table 2), 12.5 µL of 2× Taq PCR master mix (Sangon Biotech), and 20-25 ng of genomic DNA. The PCR cycle conditions are given in Table 3. After agarose gel electrophoresis and ethidium bromide staining, PCR amplicons were visualized with a Gel Doc XR+ system (Bio-Rad). PCR products were purified from the gel using a universal DNA purification kit (Tiangen Biotech), then ligated into the pUCm-T vector (Sangon Biotech). Recombinant plasmids were transformed into TOP10 cells (Tiangen Biotech), which were streaked on agar plates containing 50 µg/mL ampicillin and grown overnight at 37 °C. Three positive colonies were selected for Sanger sequencing (Sangon Biotech).

Long-Term Detection of DSL Virophages
Dishui Lake water samples (depth <1 m, 500-1000 mL) were collected monthly from Oct. 2013 to Sep. 2014. DNA extraction, PCR with the YSLV3 MCP gene-specific primers, and sequencing were performed as described above.

Metagenomic Sequencing
Up to 6 µg of genomic DNA extracted from Dishui Lake water samples was used to construct two metagenomic libraries that contained 430 bp and 3.0 kb inserts, respectively. Paired-end sequencing (2 × 251 bp) was performed using an Illumina MiSeq sequencer (Shanghai Personal Biotechnology).

QC of Metagenomic Datasets
The quality of metagenomic datasets was determined using FastQC software (Andrews, 2010). The average quality of five neighboring nucleotides was larger than Q20, with read lengths of more than 50 bp. Following FastQC analysis, the NGS QC Toolkit (Patel and Jain, 2012) was used to filter the reads again, with the default parameters, to ensure high quality reads suitable for assembly.
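The quality criterion described above can be illustrated with a minimal Python sketch. It is not the FastQC or NGS QC Toolkit implementation; it assumes Phred+33 quality encoding and interprets the criterion as requiring every window of five neighboring bases to have a mean quality above Q20 and the read to be longer than 50 bp. The function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 is an assumption)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_filter(quality_string, window=5, min_mean_q=20, min_length=50):
    """Return True if the read is longer than min_length and every window of
    `window` neighboring bases has a mean quality strictly above min_mean_q."""
    scores = phred_scores(quality_string)
    if len(scores) <= min_length:
        return False
    threshold = min_mean_q * window  # compare sums to avoid repeated division
    window_sum = sum(scores[:window])
    if window_sum <= threshold:
        return False
    # Slide the window one base at a time, updating the running sum.
    for i in range(window, len(scores)):
        window_sum += scores[i] - scores[i - window]
        if window_sum <= threshold:
            return False
    return True
```

Under this interpretation, a single stretch of five low-quality bases is enough to reject a read, whatever the quality of the remaining bases.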

The Assembly of DSL Virophage Genomes
Python scripts (Supplementary Material), modified according to Metavir 2 (Roux et al., 2014), were used to perform sequence alignment for mining virophage-related reads from the DSL environmental metagenomic datasets. All reads in the DSL metagenomic datasets were first aligned onto 12 known virophage genomes using tblastx with an E-value cutoff of 10⁻⁵. Subsequently, the YSLV3 genome was used as a query to perform tblastx (E-value < 10⁻¹⁰, one hit allowed per read) against the DSL metagenomic datasets. The retrieved YSLV3-related reads were assembled de novo with a minimum overlap length of 25 bp and a minimum overlap identity of 80%. Each of the assembled contigs was then used as a reference sequence to which all reads from the DSL metagenomic datasets were aligned with a minimum overlap length of 25 bp and a minimum overlap identity of 90%. The reference assemblies were repeated until the assembled sequences stopped extending. All assemblies were performed with the bioinformatics software Geneious Pro (Biomatters).
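The read-recruitment step (E-value threshold, one hit per read) can be sketched as follows. This is not the script from the Supplementary Material; it assumes the tblastx results were written in the standard 12-column BLAST tabular format, in which the E-value is the 11th column, and the function name and example rows are hypothetical.

```python
import csv
import io


def best_hit_per_read(tabular_text, max_evalue=1e-10, read_column=1):
    """Keep the lowest-E-value hit per read from BLAST tabular output.

    read_column is the 0-based column holding the read ID: 1 (subject ID)
    when the reads form the database, 0 (query ID) when the reads are queries.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        read_id = row[read_column]
        evalue = float(row[10])
        if evalue >= max_evalue:
            continue  # the threshold is strict: E-value < max_evalue
        if read_id not in best or evalue < best[read_id][0]:
            best[read_id] = (evalue, row)
    return {read_id: row for read_id, (_, row) in best.items()}


# Made-up example rows: YSLV3 as query, reads as subjects.
example = (
    "YSLV3\tread_1\t45.0\t60\t33\t0\t100\t279\t1\t180\t1e-12\t70.1\n"
    "YSLV3\tread_1\t50.0\t60\t30\t0\t400\t579\t1\t180\t1e-20\t95.0\n"
    "YSLV3\tread_2\t30.0\t50\t35\t0\t900\t1049\t1\t150\t1e-6\t40.2\n"
)
hits = best_hit_per_read(example)
print(sorted(hits))
```

In this example only read_1 is retained, represented by its stronger hit; read_2 fails the E-value threshold. The IDs of the retained reads would then be used to pull the corresponding sequences for de novo assembly.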

PCR Verification of DSLV1 Genome
Eight pairs of primers (Table 4) were designed to verify regions of the DSLV1 genome where the read coverage was low or ambiguous. PCR reactions (25 µL) contained 0.4 mM forward and reverse primers (Table 4), 12.5 µL of 2× Taq PCR master mix (Sangon Biotech), and 20-25 ng of genomic DNA. The PCR cycle conditions were as follows: initial denaturation at 94 °C for 4 min, followed by 30 cycles of 94 °C for 30 s, 54 °C (primers 1-4 and 6-8) or 56 °C (primer 5) for 30 s, and 72 °C for 45 s. Sequences obtained by PCR and sequencing (Sangon Biotech) were then aligned with the assembled genomes to confirm the accuracy of the sequence assembly.

Phylogenetic Analysis
Three conserved protein-encoding genes from the DSLVs (ATPase, MCP, and PRO) were subjected to phylogenetic analysis. Amino acid sequences were aligned using MUSCLE (Edgar, 2004), and phylogenetic trees were reconstructed using the JTT model with PhyML and 100 iterations (Guindon et al., 2005).

Detection of DSLVs in Neighboring Water Environments
Water was sampled from Dianshan Lake, Dazhi River, Yangtze River Estuary, Yangshan harbor, and Gouqi island (Table 1, Figure 1). Sampling, DNA extraction, and PCR were performed as described above. The YSLV3 MCP gene-specific primers were used for screening since they completely matched the target sequences of the DSLV1 MCP gene.

RESULTS
PCR Detection of Virophages
Samples from five different lakes in China (Xi Lake, Qiandao Lake, Xuanwu Lake, East Lake, and Dishui Lake) were analyzed for virophages using the MCP gene-specific primers of eight known virophages (Figure 1, Table 2). No PCR products were detected with any of these primers in four of the lake water samples: Xi Lake, Qiandao Lake, Xuanwu Lake, and East Lake. However, PCR using the YSLV3 MCP gene-specific primers successfully amplified DNA from the Dishui Lake water samples. The length of the PCR products was almost identical to that of the theoretical target sequence of the YSLV3 MCP gene (Table 2). The PCR products from Dishui Lake were purified, cloned, and sequenced. The sequences were searched against the NCBI non-redundant database using the blastx algorithm. The best blastx hit was the MCP gene sequence of YSLV6, with an expect value of 8e−15 and 33% identity. When searched against a local dataset containing all translated ORFs of all known virophages, the obtained sequences had the highest sequence similarity with the MCP gene sequence of YSLV3 (89% identity, E-value: 4.3e−106).
An additional 12 Dishui Lake water samples, collected monthly from Oct. 2013 to Sep. 2014, were all virophage positive by PCR with the YSLV3 primers, and their sequences were identical. This indicates that the virophages persisted stably in Dishui Lake.

Metagenomic Sequencing
To ensure both the depth and quality of metagenomic sequencing, six independent runs of Illumina sequencing were performed, which yielded six datasets (Table 5). In total, 98,347,716 reads were obtained and subjected to sequence quality control analysis. As a result, 74,429,646 high-quality reads were obtained and used for subsequent analysis. The average read length was ∼177 bp (Table 5).

Sequence Alignment Analysis of Metagenomic Datasets
To examine the abundance of virophage-related sequences in the DSL metagenomic datasets, sequence alignment analysis was performed using 12 known virophage genomes. Reads similar to OLV and YSLVs (n = 3337) were much more abundant than those related to Mavirus, ALM, Sputnik, and Zamilon (n = 582) (Figure 2). Notably, YSLV3-related reads were the most abundant (n = 1199, 30.6% of all virophage-related reads, E-value: 1e−5), and 22.4% of reads were similar to the MCP gene of YSLV3 (Figure 2). Accordingly, the YSLV3-related reads were considered for subsequent genomic sequence assembly.
FIGURE 2 | Sequence alignment analysis of DSL metagenomic data sets. Red dots represent the recruited reads that shared significant sequence similarity with the virophage genomes. The numbers on the X-axis indicate the position and length (in base pairs) of the virophage genomes. The Y axis shows the percentage of sequence identity shared between the recruited reads and virophage genomic sequences.
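The proportion of YSLV3-related reads reported above can be checked with a few lines of Python. The sketch assumes that the denominator is the sum of the two read groups given in the text (3337 + 582 = 3919 reads recruited to the 12 virophage genomes); the function name is hypothetical.

```python
def share_of_reads(subset_count, group_counts):
    """Return the total number of reads and the fraction made up by the subset."""
    total = sum(group_counts)
    return total, subset_count / total


# 3337 reads similar to OLV/YSLVs, 582 similar to Mavirus/ALM/Sputnik/Zamilon,
# of which 1199 are YSLV3-related.
total, share = share_of_reads(1199, [3337, 582])
print(f"{total} virophage-related reads; YSLV3-related share: {share:.1%}")
# prints: 3919 virophage-related reads; YSLV3-related share: 30.6%
```

The result matches the reported 30.6%, which supports reading that figure as a share of the virophage-related reads rather than of the whole metagenome.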

Assembly of DSLVs
To be more stringent, the YSLV3 genome was used as the query to search against the DSL datasets with tblastx and an E-value of 1e−10. A total of 280 YSLV3-related reads retrieved from the DSL metagenomic datasets were assembled de novo, resulting in 47 contigs that ranged from 151 to 1560 bp in size (Table 6). All contigs exhibited sequence similarity to YSLV3 (Table 6). Therefore, these contigs were used as references for sequence assembly against the DSL metagenomic datasets. As a result, contig1 yielded the longest assembly, of about 28 kb (Table 6), which contained a direct repeat of 189 bp at both ends, indicating that the Dishui Lake virophage represented by contig1 has a circular genome. Eight additional assembled contigs were longer than 5 kb, including four longer than 10 kb.
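A direct repeat at both ends of a contig is the expected signature when an assembly has extended all the way around a circular molecule. The check can be sketched in Python as below; this is an illustration only, not the procedure used in Geneious Pro, and the function names and the minimum repeat length are assumptions.

```python
def terminal_repeat_length(contig, min_length=20):
    """Length of the longest sequence present at both the start and the end
    of the contig, or 0 if no such repeat of at least min_length exists."""
    for k in range(len(contig) // 2, min_length - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def circularize(contig, min_length=20):
    """Remove one copy of the terminal direct repeat so that each base of the
    circular genome is represented exactly once."""
    k = terminal_repeat_length(contig, min_length)
    return contig[:-k] if k else contig


# Toy example: the first six bases reappear at the end of the contig.
toy = "AAACCCGGGTTTACGTAAACCC"
print(terminal_repeat_length(toy, min_length=5), circularize(toy, min_length=5))
```

For contig1, such a check would be expected to report the 189 bp repeat described above, with one copy of the repeat removed when the circular genome length is stated.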

Genomic and Phylogenetic Analysis
The DSLV1 genome encodes 28 predicted ORFs, and more than half of the ORFs share higher sequence similarity with YSLV3 (33-70%) than with the other YSLVs, including five conserved virophage genes: putative DNA helicase (HEL), packaging ATPase (ATPase), cysteine protease (PRO), major capsid protein (MCP), and minor capsid protein (mCP) (Figure 3, Tables 7, 8). In addition, whole genome alignments revealed five highly conserved regions that are shared between DSLV1 and YSLV3 (Figure 4). These results suggest that DSLV1 is closely related to YSLV3. Interestingly, DSLV1 ORF2 shares 27% amino acid identity with the hypothetical protein BpV2_168 from the giant DNA virus Bathycoccus sp. RCC1105 virus BpV2. In addition, DSLV1 ORF25 shares 45% amino acid identity with the hypothetical protein NY2A_B677R from Paramecium bursaria Chlorella virus NY2A. Such an evident genetic link between giant algal viruses and virophages might result from horizontal gene transfer (Table 8).
To better understand the relationship between DSLV1 and previously identified virophages, the concatenated amino acid sequences of three conserved genes (ATPase, PRO, and MCP) were used to reconstruct the phylogenetic relationships among all the virophages. DSLV1 clustered with YSLV3 (Figure 5A), indicating that it is more closely related to YSLV3 than to the other virophages, consistent with the gene content and genome alignment analyses presented above.
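Concatenating several conserved genes into one alignment can be sketched as follows. The sketch assumes that each gene was aligned separately and that the per-gene alignments are then joined in a fixed order, with gaps filling in for any virophage lacking a gene; whether the sequences here were aligned before or after concatenation is not stated, and the function name and toy sequences are hypothetical.

```python
def concatenate_alignments(alignments, gene_order):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) into one
    concatenated alignment, padding missing genes with gap characters."""
    taxa = sorted(set().union(*(alignments[gene] for gene in gene_order)))
    concatenated = {taxon: "" for taxon in taxa}
    for gene in gene_order:
        alignment = alignments[gene]
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"sequences in the {gene} alignment differ in length")
        width = widths.pop()
        for taxon in taxa:
            concatenated[taxon] += alignment.get(taxon, "-" * width)
    return concatenated


# Toy aligned fragments (not real virophage sequences).
toy_alignments = {
    "ATPase": {"DSLV1": "MK-L", "YSLV3": "MKAL", "Sputnik": "MR-L"},
    "PRO": {"DSLV1": "GHC", "YSLV3": "GHC"},
    "MCP": {"DSLV1": "TT-NV", "YSLV3": "TTANV", "Sputnik": "ST-NI"},
}
supermatrix = concatenate_alignments(toy_alignments, ["ATPase", "PRO", "MCP"])
print(supermatrix)
```

Keeping the gene order fixed ensures that every column of the concatenated alignment compares homologous positions across all taxa.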
ORFs obtained from the other 46 contigs isolated from Dishui Lake were also analyzed to better understand the diversity of virophages in the Dishui Lake region. In total, we identified six complete MCP genes and six nearly complete MCP genes. All the complete MCP ORFs were used for the phylogenetic tree construction. As shown in Figure 5B, all seven Dishui Lake virophages along with YSLV3 formed a monophyletic group with 100% bootstrap support.

Distribution of DSLVs in Neighboring Water Environments
To determine whether DSLVs are unique to DSL, water samples were taken from three neighboring freshwater sites (Dazhi River, an inflow of Dishui Lake; Dianshan Lake; and the Yangtze River Estuary) and two coastal surface seawater sites (Yangshan harbor and Gouqi island), and screened by PCR for the presence of DSLVs using the YSLV3 MCP gene-specific primers. PCR products were obtained from all three freshwater samples, and when sequenced, these amplicons shared more than 90% sequence similarity with DSLV1. In contrast, DSLVs were not detected in the two seawater samples.

DISCUSSION
In our previous study, virophages were found to be distributed worldwide, and freshwater environments appeared to contain the highest abundance of virophages (Zhou et al., 2013). In addition, most of the known virophages are distantly related to each other (Bellas et al., 2015; Yutin et al., 2015; Zhou et al., 2015). Therefore, the genetic diversity of virophages and their potential roles in different environments remain enigmatic.
To understand the diversity of virophages in freshwater lakes in China, we performed PCR on samples from five different lakes using virophage MCP gene-specific primers. Since the MCP genes of the available virophage genomes are not conserved enough to design degenerate primers, eight pairs of primers were designed individually according to each genome. Virophages were detected in the surface water of one lake, Dishui Lake, where they remained steadily detectable for a whole year. These results indicate that Dishui Lake virophages persist in the lake over a long time, rather than being a transient phenomenon, and are worthy of further exploration. In addition, the failure to amplify virophage DNA by PCR from the other lakes surveyed here does not exclude the possibility that more distantly related virophages are present but were not detected by the methods used in this study.
To shed more light on the diversity of virophages in Dishui Lake, metagenomic analysis was then conducted using the DNA extracts that were virophage positive by PCR. Initially, the virophage-related MCP gene sequence obtained by PCR was used as a reference sequence for the assembly. However, the sequence assembly terminated after extending to 1.9 kb (data not shown). This suggests that the corresponding virophages were not the most abundant in Dishui Lake. Subsequently, the sequence alignment analysis of the DSL metagenomic datasets revealed that the YSLV3-related virophages were more abundant than virophages related to the other known virophages, such as Sputnik and Mavirus. The seven virophage MCP genes obtained from DSL also grouped phylogenetically with that of YSLV3, which supports the close relationship between DSLVs and YSLV3. These results also suggest that DSLVs and YSLV3 likely infect similar giant viruses and unicellular eukaryotes present in these two lakes, although there are significant differences between DSL and YSL with respect to origin, size, location, geography, physicochemical properties, and human activities (Zhang et al., 2014; http://www.nps.gov/yell/learn/nature/geology.htm). Coincidentally, RNV (Rio Negro virophage), which is associated with Samba virus and was detected in the Rio Negro River in Brazil (Campos et al., 2014), shared 100% sequence similarity with the MCP gene of Sputnik, which was discovered in a water-cooling tower in Paris, France (La Scola et al., 2008). Taken together, these results suggest that viruses move between different biomes, and that their diversity could be high on a local scale but relatively limited globally (Breitbart and Rohwer, 2005; Ignacio-Espinoza et al., 2013).
Interestingly, DSLVs and their close relatives appear to be widely distributed throughout the freshwater environments in Shanghai. In contrast, they were not detected in the two neighboring seawater samples, although we did observe some reads similar to Sputnik in the viral metagenomic datasets derived from Yangshan deep harbor and Gouqi island (Wang et al., unpublished). Accordingly, these results indicate that the giant viruses and the eukaryotic hosts of virophages are likely different in the freshwater and seawater ecotopes.
Both the giant virus and the protist hosts of the DSLVs we identified remain unknown. Some virophages share homologous genes with their associated giant viral hosts (Zhang et al., 2015). We identified homologs of genes from both Bathycoccus sp. RCC1105 virus BpV2 and Paramecium bursaria Chlorella virus NY2A in the DSLV1 genome (ORF2 and ORF25, respectively). Both of these are algae-infecting large dsDNA viruses, belonging to the genera Prasinovirus and Chlorovirus, respectively (King et al., 2011). In a separate study, putative virophage genes as well as giant virus inserts were detected in the nuclear genome of the unicellular alga Bigelowiella natans (Blanc et al., 2015). These data suggest that algae possibly serve as the prey of giant viruses as well as of their parasitic virophages, and that DSLVs may be parasites of giant viruses that infect algae. Further work needs to focus on such potential tripartite relationships in China's aquatic ecosystems.
In conclusion, novel virophages were discovered for the first time in an artificial freshwater lake in Shanghai, China, and were found to be widely distributed in the neighboring freshwater bodies. These results further confirm the global distribution and genetic diversity of virophages in freshwater environments.

AUTHOR CONTRIBUTIONS
Conceived and designed the experiments: SY, YW. Performed the experiments: CG, XZ. Analyzed the data: CG, WZ, YW. Contributed analysis tools: HW, GS, JX, YP. Wrote the paper: CG, WZ, YW. All authors have read and approved the final manuscript.