Ecological Factors Affecting Infection Risk and Population Genetic Diversity of a Novel Potyvirus in Its Native Wild Ecosystem

Increasing evidence indicates that there is ample diversity of plant virus species in wild ecosystems. The vast majority of this diversity, however, remains uncharacterized. Moreover, in these ecosystems the factors affecting plant virus infection risk and population genetic diversity, two traits intrinsically linked to virus emergence, are largely unknown. Along 3 years, we have analyzed the prevalence and diversity of plant virus species from the genus Potyvirus in evergreen oak forests of the Iberian Peninsula, the main wild ecosystem in this geographic region and in the entire Mediterranean basin. During this period, we have also measured plant species diversity, host density, plant biomass, temperature, relative humidity, and rainfall. Results indicated that potyviruses were always present in evergreen oak forests, with a novel virus species explaining the largest fraction of potyvirus-infected plants. We determined the genomic sequence of this novel virus and we explored its host range in natural and greenhouse conditions. Natural host range was limited to the perennial plant mountain rue (Ruta montana), commonly found in evergreen oak forests of the Iberian Peninsula. In this host, the virus was highly prevalent and was therefore provisionally named mediterranean ruda virus (MeRV). Focusing in this natural host–virus interaction, we analyzed the ecological factors affecting MeRV infection risk and population genetic diversity in its native wild ecosystem. The main predictor of virus infection risk was the host density. MeRV prevalence was the major factor determining genetic diversity and selection pressures in the virus populations. This observation supports theoretical predictions assigning these two traits a key role in parasite epidemiology and evolution. Thus, our analyses contribute both to characterize viral diversity and to understand the ecological determinants of virus population dynamics in wild ecosystems.


INTRODUCTION
Viruses are the most frequent causal agents of emerging infectious diseases in crops (Anderson et al., 2004;Woolhouse et al., 2005), and are responsible of yield losses that may have great economic and social impact (Oerke, 2006;Vurro et al., 2010). As a consequence, most of our knowledge on plant-virus interactions comes from the study of viruses that cause diseases in crops. However, viruses are widespread in wild ecosystems (Cooper and Jones, 2006;Roossinck, 2010;Prendeville et al., 2012), and there is increasing evidence indicating that they have co-evolved with their wild hosts long before they were domesticated (Lovisolo et al., 2003;Gibbs et al., 2008;Pagán and Holmes, 2010). The advent of next generation sequencing (NGS) has allowed exploration of the virome of wild plant communities (Roossinck, 2012;Stobbe and Roossinck, 2014), and these pioneering studies have shown that plant viruses in wild ecosystems are far more diverse than in agroecosystems. Thus, the detailed genomic and biological characterization of this viral diversity is central to fully understand plant-virus interactions (Pagán et al., 2016).
Plant viruses are not only highly diverse and widespread in wild ecosystems, they may also be important ecological agents. For instance, quantitative resistance of wild plants to viruses have been described (Gilbert, 2002;Moreno-Pérez et al., 2014), suggesting that these may affect the host population composition. Also, viral infection can drastically reduce the number of individuals in the host populations by decreasing the competitive and reproductive abilities of infected plants (Anderson et al., 2004;Malmstrom et al., 2005;Vijayan et al., 2017). Although these studies show evidence that in wild host populations plant viruses may have great impact, little is known on their epidemiology and evolution and on the associated determinants (Pagán et al., 2016). The vast majority of the studies on this subject focused in plant viruses that are typical crop pathogens (Sacristán et al., 2004;Cooper and Jones, 2006;Webster et al., 2007;Rodelo-Urrego et al., 2013;Tugume et al., 2016). However, crop viruses are not necessarily the most prevalent in wild ecosystems, and the impact of native plant viruses in the functioning of wild ecosystems could have been underestimated. For instance, wild ecosystems are abundant in long-lived perennial plants (Blanco et al., 2005). In contrast, agroecosystems are often dominated by annual plants -or managed as such -, and viruses adapted to wild perennials may over-compete crop viruses (Wren et al., 2006;Roossinck, 2010). In addition, human management of the host population has a great impact on plant virus populations (Pagán et al., 2012;Alexander et al., 2014;Rodelo-Urrego et al., 2015), so that plant viruses native from wild ecosystems may have different population dynamics than those typical in agroecosystems. Consequently, the factors driving the infection risk and population genetic diversity might be different than those identified for crop viruses (Fraile et al., 2017).
Analyzing the determinants of infection risk and population genetic diversity of native viruses in wild plant communities may be also relevant to understand their emergence. Emergent parasites are defined as those whose infection risk increases following its appearance in a new host population, or whose infection risk increases in an existing host population (Woolhouse, 2002). Consequently, understanding the determinants of parasite infection risk may contribute to understand the factors driving emergence (Keesing et al., 2006;Johnson et al., 2015). Because the rate of parasite emergence has accelerated in the last four decades (Anderson et al., 2004;Jones, 2009), understanding the determinants of emergence has become a long-standing goal of biological research. Changes in host ecology that increase infection risk have been proposed to be key determinants of emergence. Indeed, a recent theory known as the Dilution Effect links two such ecological factorsecosystem species diversity and host density -to this process. This theory posits that reduced species diversity increases the density of the focal host species, facilitating parasite transmission and increasing infection risk, and eventually leading to emergence (Keesing et al., 2006;Ostfeld and Keesing, 2012). For plant viruses infecting wild hosts, experimental analyses of these predictions are scant, and do not always support the Dilution Effect (Malmstrom et al., 2005;Borer et al., 2010;Pagán et al., 2012). These contradictory results have been in part attributed to differences in virus traits such as host range and viral vectors. Consequently, the virus life history strategy -specialist (narrow host range) vs. generalist (wide host range) -, and ecological factors affecting vector populations (temperature, relative humidity, and rainfall) could also affect virus infection risk (Pagán et al., 2012). However, the role of these ecological factors in virus emergence has not been analyzed to date. Finally, it is important to note that ecological changes that affect infection risk may also result in genetic changes in the virus population (Grenfell et al., 2004;Archie et al., 2009;Pybus and Rambaut, 2009). Higher virus infection risk would lead to larger parasite population size, accelerating evolutionary rates (Scholle et al., 2013;Lanfear et al., 2014), which could ultimately result in higher population genetic diversity. Therefore, ecological factors that affect infection risk may also be important determinants of virus population genetic diversity. Again, this relationship has been seldom tested for plant-virus interactions in wild ecosystems (Rodelo-Urrego et al., 2015).
The focal wild ecosystem in this study is the evergreen Holm Oak (Quercus ilex L.) forest. This is the most extended wild ecosystem in the Mediterranean basin, covering approximately three million ha (Patón et al., 2009). In this geographical area, evergreen oak forests have a great ecological importance. They host annual and perennial plant communities that are typically associated with holm oak trees, and are also the refuge for the characteristic Mediterranean fauna (Lumaret et al., 2002). Evergreen oak forests have also great economic value, as they are appreciated as hunting reserves and holm oak acorns are used for animal feeding (Costa et al., 2011). Despite this ecological and economic relevance, the factors affecting virus infection risk and population genetic structure in evergreen oak forests remain largely unexplored. About 60% of Mediterranean evergreen oak forests are located in the Iberian Peninsula, where they represent the most frequent wild ecosystem (Ramírez and Díaz, 2008).
Given that agricultural lands occupy 50% of the peninsula, evergreen oak forests are often adjacent to agroecosystems (Zamora et al., 2007), which favors plant virus dispersal between these two ecosystems (Alexander et al., 2014). Thus, analyzing virus diversity and population dynamics in this wild ecosystem may also be of interest to understand virus epidemics in crops.
We have focused in the species of the genus Potyvirus of plant viruses, which represent ∼30% of all known plant viruses. Potyviruses infect species from all major botanical families, and are transmitted by aphids (King et al., 2012). Most virus species in this genus are major crop pathogens (e.g., Walsh and Jenner, 2002;Quenouille et al., 2013), and some of them have been reported to infect wild plant species commonly found in evergreen oak forests (Malpica et al., 2006;. Together, these observations suggest that they may be important ecological agents in evergreen oak forests, and with potential to cause epidemics in crops. Potyviruses have tubular flexuous capsids, which encapsidate a monopartite single-stranded RNA genome ∼10,000 nucleotides. The genome is characterized by a single major open reading frame (ORF) encoding a large polyprotein that is processed into 10 functional proteins: the first protein (P1), helper component protease (HC-Pro), third protein (P3), 6K1, cylindrical inclusion protein (CI), 6K2, viral protein genome-linked (VPg), small nuclear inclusion protein (NIa), large nuclear inclusion protein (NIb), and coat protein (CP). Two additional proteins, P3N-PIPO and P3N-ALT, are originated through frameshifts in the P3 cistron (Chung et al., 2008;Hagiwara-Komoda et al., 2016), although they are not present in all members of the genus.
Here, we analyze the prevalence of species from the genus Potyvirus in evergreen oak forests of the Iberian Peninsula, as a measure of infection risk, based on surveys done between 2013 and 2016. This analysis identified a new potyvirus highly prevalent in its perennial host mountain rue (Ruta montana L.), commonly found in evergreen oak forests of the Iberian Peninsula. We provide information on the prevalence, genetic diversity and host range of this virus, provisionally named mediterranean ruda virus (MeRV). Using this host-virus interaction, we analyze the ecological factors affecting MeRV prevalence and population genetic diversity, and we explore the potential of MeRV for dispersing into crops by determining MeRV prevalence in cultivated fields located nearby evergreen oak forests where MeRV is present.

Field Sampling
Six locations of evergreen oak forest located in the Iberian Peninsula were visited between the summer of 2013 and the spring of 2016. The locations were distributed among a transect of 200 km (north-south) in the center of the Iberian Peninsula (Figure 1 and Supplementary Table S1). At each location, we defined a plot of 25 m × 25 m, which was divided in a grid of 1 m × 3 m. At each of these rectangles, leaves of one individual of the most abundant plant species (including herbaceous and non-herbaceous vegetation) were harvested. This sampling scheme allowed identifying the position of each sampled plant within the 25 m × 25 m plot, and therefore estimating plant density as the inverse of the mean Euclidean nearest neighbor distance (McGarigal et al., 2009). A total of 200 samples were collected at each location. Individuals representative of each collected sample were also harvested, inventoried in herbariums and their botanical family and species were determined. With this information, we calculated plant species richness (S) as the number of species at each location and visit, and plant species relative abundance (number of individuals of a given species/200). Samplings were performed in summer, autumn and spring to account for seasonal differences in plant species composition and phenological stage, and in aphid activity. A total of 10,800 samples were collected in these surveys. At each location and visit we also recorded maximum plant height and plant coverage in eight 1 m × 1 m squares within the plot. With these measures we calculated plant biovolume in the plot (m 3 ) by averaging values in the eight squares. We also collected from nearby weather stations information on minimal, maximal and average temperature ( • C) relative humidity (%), and on rainfalls (mm) in the months when each visit was done.
We also visited cultivated fields of melon (Cucumis melo L.), tomato (Solanum lycopersicum L.), and pepper (Capsicum annuum L.) located near to the sampled locations of evergreen oak forests (Figure 1 and Supplementary Table S1). At least two fields from each plant species were visited in summer, spring, and autumn, and at each visit between 20 and 50 individuals were sampled. For this aim, three leaves of different branches from one out of every three plants were collected along a fixed itinerary.

Potyvirus Detection and Identification
Infection by potyvirus species was detected in total RNA preparations from leaves using the cetyltrimethylammonium (CTAB) -polyvinyl pyrrolidone (PVP) method (Chang et al., 1993), which allowed efficient RNA extraction from all the collected plant species. The presence of virus species within the genus Potyvirus was analyzed by One-step SYBR Greenbased real-time RT-PCR in the LightCycler R 480 Real-Time PCR System (Roche Diagnostics, Mannheim, Germany). For each run, 10 ng of total RNA were added to the Brilliant III SYBR R Green Ultra-Fast QRT-PCR Master Mix (Agilent Technologies, Santa Clara, CA, United States) following the manufacturer's recommendations. Universal primers NIb2F and NIb3R (Zheng et al., 2010) for species of the genus Potyvirus were used to amplify a region of ∼350 nucleotides of the gene that encodes the NIb. The thermal profile consisted of a 5-min pre-incubation step at 65 • C, 10-min RT step at 50 • C and 5 s of Taq polymerase activation at 95 • C, followed by 50 cycles of PCR at 95 • C for 10 s (denaturation), 50 • C for 20 s (annealing), and 72 • C for 30 s (extension). RNA purified from turnip mosaic virus (TuMV), a well characterized potyvirus species, was always included as positive control. Amplification was confirmed in electrophoresis in 1.2% agarose gel stained with ethidium bromide in Tris-acetate-EDTA (TAE) buffer. PCR products of the expected length were purified using the StrataPrep DNA Gel Extraction Kit (Agilent Technologies) and sequenced. Sequences were assembled using MEGA 6 (Tamura et al., 2013). Nucleotide identity of the obtained sequences with the potyviral sequences available in GenBank was analyzed using BLAST 1 . Following the ICTV criteria (King et al., 2012), sequences with a nucleotide identity over 55% with any known species of the genus Potyvirus were considered as belonging to this genus, and sequences with nucleotide identity between 55 and 76% were considered as belonging to a non-previously described species of this genus (Adams et al., 2005). Total potyvirus prevalence was calculated as the percentage of infected individuals relative to the total number of analyzed plants. Prevalence for each virus species was calculated as the percentage of infected plants relative to the total number of plants collected from a given host species.

Biological Characterization of MeRV
To determine the host range and the symptoms induced by MeRV infection, different plant species of the botanical families Amaranthaceae, Asteraceae, Boraginaceae, Chenopodiaceae, Cistaceae, Cucurbitaceae, Fabaceae, Fagaceae, Poaceae, Rutaceae, and Solanaceae were inoculated ( Table 1). These species were chosen to include the most abundant species in the sampled 1 http://blast.ncbi.nlm.nih.gov/Blast.cgi evergreen oak forests, and the most common crops in the surrounding areas. For each plant species, four to seven plants (10-15 days old) were mechanically inoculated by applying purified virion RNA (100 ng/ul) in 0.1 M Na 2 HPO 4 onto the first two completely expanded leaves dusted with carborundum. The inoculated plants were maintained in a greenhouse (20-25 • C, and 16 h of light), and symptoms were weekly recorded over an 8-week period. Tissue from systemically infected leaves of symptomatic and asymptomatic plants was collected 20 days post-inoculation and tested for potyvirus infection by real-time RT-PCR using specific primers qCPFor (5 -GACTGACTATAGTTTAGCGCGC-3 ) and qCPRev (5 -GC CTCTGATAGCTGCTGCTTTC-3 ) that amplify a 111-bp region of the MeRV CP gene.

Sequencing of the MeRV Genome
Total plant RNA of a single field-collected mountain rue (Ruta montana L.) from El Pardo that tested positive for MeRV infection (MeRV-ParP17) was treated with TURBO DNAfree TM Dnase (Life Technologies, Carlsbad, CA, United States). After treatment, ribosomal RNA (rRNA) was removed using Ribo-Zero TM Plant Leaf kit (Epicenter-Illumina, Madison, WI, United States). For genome sequencing, 3 µg of total RNA was used for library preparation and subjected to high-throughput NGS using the Illumina platform (HiSeq2000, 2 × 125 bp length, at the Centre for Genomic Regulation, Barcelona, Spain), generating 20 million paired-end reads. Adapters and lowquality sequences from NGS data were removed using Seqtk 2 . MeRV genome was assembled using a reference-guided read mapping to the phylogenetically nearest potyvirus genomes as implemented in Bowtie 2.0 (Langmead and Salzberg, 2012). BLAST was used to identify and remove possible chimeric reads contained in the alignment during the assembly. The consensus sequence of the MeRV genome was extracted using the Fasta Alternate Reference Maker tool in the Genome Analysis ToolKit (GATK) (McKenna et al., 2010). Importantly, no other viral sequences were detected in NGS data, in accordance with RT-PCR and virion purification analyses (see Results).
2 https://github.com/lh3/seqtk/ The NGS-derived MeRV genomic sequence was confirmed using the same mountain rue plant RNA preparation as template for RT-PCR, utilizing different sets of specific primers. Primer pairs were designed to produce 12 fragments in such a way that adjacent fragments overlapped by at least 100 nt, covering the potyvirus polyprotein (Supplementary Table S2). Expand TM Reverse Transcriptase (Sigma-Aldrich, St. Louis, MO, United States) was used for retrotranscription and PCR Phusion R High-Fidelity DNA Polymerase (New England BioLabs, Beverly, MA, United States) for PCR amplification. The 5 and 3 ends of the virus were obtained by rapid amplification of cDNA ends (RACE) using Ambion TM FirstChoice R RLM RACE kit (Life Technologies). All fragments were sequenced, and assembled with MEGA 6, revealing 100% nucleotide identity between overlapping fragments and with the NGSderived nucleotide sequence. The same nucleotide sequence was also obtained using RNA from purified virions as template.
The ORF of the MeRV polyprotein was identified with ORF Finder 3 . PredictProtein (Rost et al., 2004) and the NCBI's conserved domain database (CDD)-Search service (Marchler-Bauer et al., 2007) were used to identify the putative cleavage sites and the conserved sequence domains of the polyprotein. Molecular weight of the putative viral proteins was predicted from protein sequence by using the Molecular Weight tool 4 . The genomic sequence of MeRV-ParP17 was deposited in GenBank under accession number MF953305.

Phylogenetic Analyses
Phylogenetic relationships between MeRV and the other members of the genus Potyvirus were analyzed using the nucleotide and amino acid sequences of the polyprotein. To study these relationships a collection of the reference sequences of the potyviruses compiled from GenBank were obtained (Supplementary Table S3). Nucleotide and amino acid sequences of the reference strain of each potyvirus were aligned with the corresponding sequences of MeRV-ParP17 using MUSCLE (Edgar, 2004). Alignments were used to construct phylogenetic trees utilizing Bayesian Markov Chain Monte Carlo (MCMC) methods as implemented in MrBayes v3.2.6 ( Ronquist and Huelsenbeck, 2003). Alignments were run using the general timereversible substitution model with invariant sites and a gamma distribution of among-site rate variation (GTR+I+ 4 ) for nucleotides, and the Whelan and Goldman (WAG) substitution model for amino acids. All analyses were run until relevant parameters converged, with 25% of the MCMC chains discarded as burn-in. Maximum clade credibility (MCC) trees, with Bayesian posterior probability values providing a measure of the robustness of each node, were also summarized from the MrBayes tree samples.
Nucleotide identities (%) of the complete MeRV genome with those of the two phylogenetically closest virus species were compared using SimPlot v3.5.1 (Lole et al., 1999). Plots of nucleotide identity were obtained using the MeRV genome as the query sequence, and a sliding window of 200 nt that was moved across the alignment in steps of 20 nt.

Genetic Diversity and Selection Pressures in the MeRV Population
The genetic diversity of the MeRV population was estimated based on the CP gene sequence of 69 MeRV isolates (Acc No. MF953306-MF953374). For this purpose, total RNA extracts of MeRV-infected plants were used to RT-PCR amplify the viral CP gene (837 nt), utilizing specific primers CPFor (5 -CCA AAGCTTGAACAAGAGAGAATTGTTTCG-3 ) and CPRev (5 -ACACCAAGCATGKTRTGCATAT-3 ), and PCR products were sequenced. Virus genetic diversity (π) as average pairwise nucleotide difference between sequences, and the mean number of non-synonymous (d N ) and synonymous (d S ) nucleotide substitutions per site, were estimated using the Kimura-2parameter nucleotide substitution models implemented in MEGA 6, as this was selected as the best-fitted nucleotide substitution model in jModelTest v. 2.1.10 (Darriba et al., 2012). Selection pressures were estimated as the d N /d S ratio. Standard errors of each measure were based on 1,000 bootstrap replicates. Also, the number of MeRV haplotypes (H), and the haplotype diversity , where x i is the frequency of an haplotype and n is the sample size (Nei and Tajima, 1981)] were calculated using DnaSP v.5 (Librado and Rozas, 2009).

Detection of Recombination in the MeRV Population
Potential recombination breakpoints in the CP and NIb genes were detected using either the MeRV sequences from 69 isolates or including sequences from these genes of all known potyviruses. Recombination was detected utilizing four different methods based on different assumptions (Posada, 2002) as implemented in the RDP4 package 5 : RDP, GENECONV, Bootscan, and Chimera, and employing the default parameters and a Bonferroni correction P-value cut-off of 0.05 (Martin et al., 2015). Only recombination signals detected by all methods were considered as positive to minimize false positives. With this criterion, no MeRV recombinants were detected. Analyses using the more relaxed criterion of considering as recombinants those detected by two or more methods to minimize false negatives yielded the same result.

Statistical Analyses
Differences in host susceptibility and in MeRV prevalence between seasons in the same sampling year (summer, autumn, and spring), between sampling years at each season, and between geographic locations were compared using Chisquare tests. These statistical analyses were done using SPSS 21.0 (SPSS, Inc., Chicago, IL, United States). Mixed effect multiple regression tests were used to analyze the association between ecological factors and MeRV prevalence and population genetic diversity parameters (Burnham and Anderson, 2002). We considered the following factors as 5 http://darwin.uvigo.es/rdp/rdp.html predictors of MeRV prevalence: host plant density and relative abundance, species richness, plant biomass, temperature, relative humidity and precipitations in the sampled locations (minimal, maximal, and average values), and season. The same ecological factors, with the addition of MeRV prevalence, were used as predictors of virus population diversity parameters. A set of models that included a global model containing all ecological factors as fixed predictors (except season that was considered as covariate), and nested models that contained all possible combinations of these predictors, were fitted for each response variable using general linear mixed models (R-library: ASreml-R 3). Models were constructed using a simultaneous autoregressive variance-covariance matrix to account for time-dependency and covariation between predictor variables. Global and nested models were ranked according to Akaike's Information Criteria (AIC), and the model with the lowest AIC score was selected as the best-ranked model. The relative importance of the predictors included in each model was calculated by variance components (R-library: ASreml-R 3).

Potyvirus Prevalence in Evergreen Oak Forests and Isolation of a New Virus Species
Infection by potyvirus was detected in every season during the monitored period, and prevalence was calculated as the percentage of infected individuals relative to the total number of analyzed plants. Average potyvirus prevalence in evergreen oak forests of the Iberian Peninsula ranged between 1.06 and 3.43%, depending on the season and sampling cycle. In the analyzed sampling cycles, potyvirus prevalence was higher in summer than in spring and autumn (χ 2 ≥ 5.03, P ≤ 0.025), prevalence being similar in the later two seasons (χ 2 ≤ 1.58, P ≥ 0.208).
Identification of the potyviruses present in the 171 infected plants indicated the presence of five virus species: Zucchini yellow fleck virus (ZYFV) (34/171), which was detected in six host plant species with a prevalence ranging between 4.76 and 61.54%; Clover yellow vein virus (ClYVV) (10/171), which was detected in two plant species with prevalence of 2.86-13.33%; and Endive necrotic mosaic virus (ENMV) and Turnip mosaic virus (TuMV) (2/171 each), which were found infecting individuals of a single plant species each, with prevalence of 3.51% and 4.65%, respectively. Interestingly, the remaining identified potyvirus accounted for the largest fraction of infected plants (123/171, 71.93%). This virus species was only detected in mountain rue (Ruta montana), a perennial plant that showed the highest potyvirus prevalence (41%) among the host plant species identified in evergreen oak forests. None of the other detected potyvirus was found in mountain rue plants. The comparison of the NIb of this rue-infecting virus with those of all known potyviruses revealed a nucleotide sequence identity of 75% with the closest species. This suggested that this was a novel species of genus Potyvirus that we provisionally named MeRV. For further characterizing the virus, tissue of one mountain rue plant sampled in El Pardo and infected with MeRV was selected. This plant tested negative for other species of the genus Potyvirus and for cucumber mosaic virus (CMV) by RT-PCR. More importantly, analyses of the NGS sequence data obtained from total RNA extracts of this field-infected mountain rue plant did not detect any other viral sequence. Tissue of the infected field plant was grinded in 0.1 M phosphate buffer pH 7, 0.02% sodium diethyldi-thiocarbamate (DIECA), and used to inoculate Nicotiana benthamiana plants. After 20 days, N. benthamiana plants developed a systemic mosaic. MeRV virions were purified from these plants according to the protocol described by Sánchez et al. (2013). Agarose gel electrophoresis of the virion preparation showed a single band, indicating that all virions in the preparation had the same size. Indeed, agarose gel electrophoresis of nucleic acid extracts from these virions showed a single band of ssRNA of about 9,000-10,000 nt (not shown). The same NIb nucleotide sequence amplified from the field-infected mountain rue plant was obtained from virion preparations using universal potyvirus primers. Together, these results indicated that a single virus species was transferred from mountain rue to N. benthamiana plants, and that this species belonged to the genus Potyvirus, discarding the possibility of mixed viral infection. This isolate was named MeRV-ParP17, and its virions were used for further characterization.

Analysis of the Complete Nucleotide Sequence of MeRV
The MeRV genomic sequence was obtained by NGS sequencing and confirmed by RT-PCR. The virus full-length genome sequence consists of 9,560 nt excluding the 3 terminal poly(A) tail. As for all members of the genus Potyvirus, the MeRV-ParP17 genome has a single ORF encoding a polyprotein of 3,077 amino acids with an estimated molecular weight of 350 kDa. This ORF encodes all typical proteins, consensus cleavage sites and conserved catalytic domains present in the potyviruses (Figure 2). The P1 protein, a serine protease, contains the strictly conserved catalytic triad His 208 -(X 8 )-Asp 217 -(X 30 )-Ser 248 with the conserved GHSG 246−249 motif around the active serine site. In the HC-Pro, the protease active site residues are located at Cys 344 , contained in the conserved sequence GYCH 342−345 , and at His 417 . Also, a conserved cysteine-rich region was identified in the N-terminal region of HC-Pro stretching from Cys 27 to Cys 58 . HC-Pro motifs involved in aphid transmission (RITC 52−55 and PTK 310−312 ), genome replication (IGN 152−154 ) and systemic movement (CCC 292−294 ) (Urcuqui-Inchima et al., 2001) are also conserved in MeRV-ParP17. The P3 cistron contains the P3N-PIPO ORF. The 5 end of this ORF starts with the highly conserved motif among potyviruses G 2 A 6 and it has a length of 70 codons, terminating in a UAG stop codon at nucleotides 672-674 of the P3 cistron. The cylindrical inclusion (CI) protein cistron includes the conserved RNA helicase motifs located between residues 85 and 359, and the NIa contains the conserved His 46 , Asp 81 , Cys 151 , His 167 cysteine protease catalytic tetrad. Conserved sequence motifs in RNA-dependent-RNA polymerases of positive strand RNA viruses were also identified in the MeRV-ParP17 NIb protein. Finally, the CP cistron contains a NAG motif, essential for aphid transmission in some potyviruses (Wylie et al., 2002), and located at residues 10-12 in the N-terminal region of the CP. The Arg 170 and Asp 214 residues, involved in virion assembly and cell-to-cell movement (Dolja et al., 1994(Dolja et al., , 1995, are located in the conserved core region of the MeRV-ParP17 CP. The virus 5 untranslated region (UTR) is 149 nt long, an average length for potyviruses (<200 nt). The 3 UTR is 180 nt long excluding the poly(A)-tail and is rich in AU segments, which is a common feature among potyviruses.
The complete nucleotide sequence of the MeRV-ParP17 polyprotein was aligned with those of available fully sequenced potyviruses (n = 107). The percentage of nucleotide identity of the MeRV-ParP17 genome with that of the other potyviruses ranged from 50 to 66%, showing highest identity with bean yellow mosaic virus (BYMV) (65%) and clover yellow vein virus (ClYVV) (66%). Accordingly, Bayesian phylogenetic trees revealed that MeRV-ParP17 clustered together with BYMV and ClYVV (Figure 3). Following the approach described by Mbanzibwa et al. (2011) SimPlot analyses were performed to compare the percentage of nucleotide identity between MeRV-ParP17 and BYMV/ClYVV across the whole genome (Figure 2). When each cistron was analyzed individually, the MeRV-ParP17 P1 and P3 cistrons showed the lowest nucleotide identity with the same cistrons of BYMV and ClYVV (Figure 2). The remaining cistrons showed a sequence identity with the corresponding ones of BYMV and ClYVV ranging on average between 57 and 69%. Equivalent results were obtained using the amino acid sequences (not shown).

MeRV Host Range
The MeRV-ParP17 host range was determined by inoculating purified virions of MeRV-ParP17 into 10 wild species belonging to seven botanical families, which represent the most abundant species in evergreen oak forests, and 13 plant species of six botanical families, including Cucurbitaceae and Solanaceae species that are commonly cultivated or found in the area surrounding the surveyed evergreen oak forests. The results are shown in Table 1. None of the 10 wild species from evergreen oak forests were hosts of MeRV, except its original reservoir mountain rue (6/6 plants infected), in which virus infection was asymptomatic. Only three Solanaceae species were infected by MeRV-ParP17: N. benthamiana (5/5), C. annuum (4/6), and S. lycopersicum (6/6), with infection in these three species causing mild mosaic symptoms. There was no variation in susceptibility across hosts (χ 2 = 0.81, P = 0.668).

Analysis and Ecological Determinants of MeRV Infection Risk in Evergreen Oak Forests
We used MeRV prevalence as a measure of virus infection risk.
MeRV prevalence in mountain rue plants of Iberian evergreen oak forests did not significantly varied between geographic   locations (χ 2 ≤ 2.74, P ≥ 0.098), and this factor was not considered in subsequent analyses. On the other hand, MeRV prevalence varied across seasons between 18% in spring 2015 and 74% in summer 2014 ( Table 2). Average MeRV prevalence did not vary between sampling cycles. However, during the first and second cycle the prevalence was higher in summer than in spring and autumn (χ 2 ≥ 11.43, P < 1 × 10 −5 ), whereas in the third sampling cycle the highest prevalence occurred in autumn (χ 2 ≥ 4.79, P ≤ 0.029) ( Table 2). Given that the MeRV host range included two crop species, tomato and pepper, commonly cultivated near the sampled evergreen oak forests, we analyzed MeRV prevalence in these crops during the same time span. We also included melon fields in our surveys, as, together with pepper and tomato, this was the major cultivated plant in the area. MeRV was never detected in melon tomato and pepper plants.
To further understand the ecological determinants of MeRV infection risk in evergreen oak forest, we considered the following factors as predictors of MeRV prevalence: host plant density and relative abundance, species richness, plant biomass, temperature and relative humidity (minimal, maximal, and average values), and rainfall in the sampled locations ( Table 2). We included season as a covariate to account for the dependency of MeRV prevalence over time. To analyze the association between these traits and MeRV prevalence, we used multiple regression model selection analyses (Table 3). The best-ranked model contained plant density as the only predictor, such that there was a positive association between mountain rue density and MeRV prevalence (r = 0.68; P = 0.046) (Figure 4A). This model closely competed with that containing rainfall as the only predictor, which showed a negative association between MeRV prevalence and rainfall (r = −0.68; P = 0.052) ( Table 3 and Figure 4B). The rest of the models showed much poorer predictive power ( i > 2) (see footnote of Table 3).

Analysis and Ecological Determinants of MeRV Population Genetic Diversity in Evergreen Oak Forests
Most of the MeRV sequenced isolates were collected at El Pardo, the location with the highest density of mountain rue plants. The low number of isolates from other locations prevented analyzing the genetic structure of the virus population according to geographic location. However, MCC trees indicated that isolates from location other than El Pardo did not form a monophyletic cluster (data available upon request). Hence, to test whether changes in ecological factors and MeRV prevalence across seasons affected the genetic diversity of the virus population, we only utilized the CP sequence of 69 MeRV isolates from El Pardo, collected along the three sampling cycles ( Table 4). Using this sequence data set, we estimated the number of haplotypes (H), the haplotype diversity (H d ) and the average nucleotide genetic diversity (π) of the MeRV population at each visit. MeRV population haplotype number, haplotype diversity and genetic diversity greatly varied between seasons (H = 2-9; H d = 0.50-1.00, and π = 0.001-0.007) ( Table 4). Model structures included host plant density, host relative abundance, plant species richness and plant biomass; and temperature, relative humidity, and rainfall in the sampled locations (minimal, maximal, and average values) as predictors, and season as covariate. MeRV prevalence was also included as predictor of virus evolution parameters. Best-ranked models are shown. * The relative importance (%) of each predictor variable is shown in parenthesis. † Correlation coefficient. Asterisks indicate significant correlations (P < 0.05). ‡ Akaike's Information Criterion. § i , is the difference between the AIC of a given model and that of the best-ranked model, and quantifies how models compete (best-ranked model: i = 0; substantial empirical support: i = 1-2; considerable less support: i = 2-7; and no support; i > 10) (Burnham and Anderson, 2002). ¶ AIC model weight as ω i = exp(−0.5 i )/ exp(−0.5 i ). The larger the ω, the greater the likelihood of the model relatively to the competing models. Maximum ω = 1.
We analyzed the ecological factors affecting MeRV population H d , π, d N , d S , and d N /d S using again multiple regression model selection analyses. We utilized the same variables as for predicting prevalence but in this case, we added MeRV prevalence as an additional predictor ( Table 3). The best-ranked model explaining H d contained prevalence as the only predictor, and indicated a positive association between both traits (r = 0.79; P = 0.011) ( Table 3 and Figure 4C). This model closely competed with that including MeRV prevalence and plant species richness as main predictors (r = 0.73; P = 0.055), virus prevalence having much higher relative importance than plant species richness (89.8 and 10.2%, respectively). The model containing MeRV prevalence and host relative abundance also showed i < 2 (r = 0.70; P = 0.046), with virus prevalence having again the highest relative importance (99.1%) ( Table 3). In addition, the model best explaining MeRV population genetic diversity (π) contained MeRV prevalence and plant species richness as predictors (r = 0.74; P = 0.022), prevalence having the highest relative importance (77.2%). This model closely competed with that containing MeRV prevalence as the sole predictor (r = 0.64; P = 0.042) (Figure 4D), or this variable together with rainfall (r = 0.43; P = 0.541). However, this later model did not show a significant association between the predictors and the response variable ( Table 3). None of the tested models accurately predicted d N (P ≥ 0.106) ( Table 3). On the other hand, the best ranked models predicting d S and d N /d S contained MeRV prevalence as predictor, solely (Figures 4E,F) or in combination with plant species richness and rainfall. In all these models, MeRV prevalence had always much higher predictive power (relative importance ≥ 72%) than the other predictors (relative importance ≤ 25%) ( Table 3).

DISCUSSION
Increasing evidence indicates that there is ample diversity of plant viruses in wild ecosystems. The analysis of the prevalence, genetic diversity and structure of plant virus populations has proven essential to understand the epidemiology and evolutionary biology of plant viruses, and to identify the determinants of virus populations dynamics and of the associated disease epidemics (reviewed by Burdon and Thrall, 2008;Pagán et al., 2016;Escriu, 2017). However, most of the virus species present in wild ecosystems remain uncharacterized FIGURE 4 | Bivariate relationships between MeRV prevalence/population evolutionary parameters and the best predictor variable. In each case, best predictor variable was identified by model selection analyses. Regressions of host density (A) and rainfall (B) on MeRV prevalence; and regressions of MeRV population haplotype diversity (C), population genetic diversity (D), number of synonymous mutations per site (E) and overall selection pressures d N /d S (F) on MeRV prevalence (% of mountain rue infected plants) are represented. Note the different scales on the axes depending on the specific parameter analyzed. (Roossinck, 2012;Stobbe and Roossinck, 2014), and the determinants of their epidemiology and evolution are largely unknown (Pagán et al., 2016). Here, we provide a detailed characterization of a new virus species isolated in evergreen oak forests of the Iberian Peninsula. This virus has the genome structure characteristic of the genus Potyvirus, for which the species demarcation criteria include: (1) different inclusion body morphology, (2) differences in host range, (3) overall nucleotide sequence identity not higher than 76%, and (4) capsid protein sequence identity not higher than 80% (Adams et al., 2005;King et al., 2012). Although we have not analyzed the inclusion body morphology, the reported characteristics of this virus fits with the other three demarcation criteria to be considered a new species in the genus Potyvirus, for which the name MeRV is proposed.
In evergreen oak forests, MeRV accounted for the largest proportion of potyvirus infections, and was highly prevalent in its only wild host: the perennial plant mountain rue (Ruta montana). Thus, in the analyzed ecosystem MeRV behaves as a specialist virus, i.e., viruses able to infect one or a few closely related host species (Futuyma and Moreno, 1988). This virus would therefore represent one of the first examples of its kind, as most plant viruses described in wild perennial plants naturally infect broader ranges of plants (Cooper and Jones, 2006). Specialist viruses are often extremely well adapted to their host(s). This means that these parasites are highly transmissible and efficiently exploit host resources to maximize its fitness (Woolhouse et al., 2001;García-Arenal and Fraile, 2013). Theoretical models of host adaptation assume that parasite transmission is positively correlated with its multiplication and negatively correlated with its virulence. In turn, multiplication is positively associated with virulence, which lead to a tradeoff between transmission and virulence. Thus, these models predict that parasite fitness would be optimal at intermediate levels of virulence (Anderson and May, 1982). Despite behaving as specialist, MeRV infection did not induced any apparent symptom in mountain rue, neither in the field nor in the greenhouse. Similarly, infections by any of the other potyviruses detected in evergreen oak forests, which also showed narrow host ranges, were also asymptomatic. This would be in agreement with previous studies that generally reported a lack of obvious symptoms associated with virus infections in wild plants (Cooper and Jones, 2006;Roossinck, 2010;Prendeville et al., 2012 Hence, our data strongly suggests that MeRV is an important modulator of the population dynamics of its host. On the other hand, this does not seem to be the case in crop hosts, as we failed in detecting the virus in pepper and tomato fields. Therefore, although a potential threat for these crops, MeRV infection in pepper and tomato is rare in the surveyed areas. Focusing in the MeRV-mountain rue interaction, we analyzed the ecological factors affecting virus infection risk. Multivariate models indicated that host plant density was the major predictor of MeRV infection risk, such that the higher the host density the higher the virus prevalence. Therefore, our results support theory attributing a key role in infection risk to host density (Keesing et al., 2006;Ostfeld and Keesing, 2012), and are also in agreement with previous work in other plant virus-wild host interactions that also identified host density as a key factor for virus infection risk (Malmstrom et al., 2005;Borer et al., 2010;Pagán et al., 2012;Rodelo-Urrego et al., 2013). These theoretical and experimental works also identified ecosystem species diversity as a key determinant of virus infection risk. At odds, species diversity was not a good predictor of MeRV infection risk in wild mountain rue populations of evergreen oak forests. However, it should be noted that the effect of species diversity on infection risk is linked to its association with host density/abundance (Johnson et al., 2015). Interestingly, we did not find a significant association between density/abundance of mountain rue and number of plant species in evergreen oak forests (r ≤ 0.15; P ≥ 0.699), which could explain the poor predictive power of species diversity. Also, previous analyses identifying species diversity as a determinant of plant virus infection risk focused in viruses with a generalist strategy or at least with several hosts in the monitored ecosystem. The absence of alternative host species for MeRV in evergreen oak forests could minimize the effect of species diversity. These observations would support the idea that the applicability of the Dilution and the Amplification Effect hypotheses would depend on the plant community composition (Johnson et al., 2015). Finally, rainfall was negatively associated with MeRV prevalence. Higher MeRV prevalence was generally observed in summer, when rainfall is lower (see Table 2) and density of aphids (the main vectors of potyviruses) peaks (Nebreda et al., 2004;Mondal et al., 2016). Although we did not characterize MeRV transmission and the specific MeRV aphid vector species involved, our results would be compatible with the maximum activity of the associated MeRV vectors. This highlights the relevance of considering factors associated with virus transmission to fully understand the determinants of infection risk.
Epidemiological changes may result in genetic modifications in the parasite population (Grenfell et al., 2004;Archie et al., 2009;Pybus and Rambaut, 2009). Changes in MeRV prevalence were accompanied by variations in the virus population genetic and haplotype diversities. Viral populations may modify their genetic diversity by changing fixation rates, population sizes, and/or selection pressures (Moya et al., 2000). We could not analyze MeRV fixation rates, as the monitored time span was not sufficiently large for a meaningful estimate. However, given that the observed fluctuations in MeRV population genetic diversity occurred in short periods of time, it is unlikely that these are associated with changes in fixation rates. In turn, multiple regression model analyses indicated that MeRV prevalence was the best predictor of population genetic/haplotype diversity (higher virus genetic/haplotype diversities were observed at increasing prevalence). Higher virus prevalence results in increasing parasite population sizes (Burdon and Thrall, 2008). Therefore, our results suggest that MeRV genetic diversity is likely modulated by fluctuations in virus population size. This would be in agreement with theoretical elaborations predicting that higher population sizes may lead to higher genetic diversity (Scholle et al., 2013;Lanfear et al., 2014). Interestingly, MeRV prevalence was the best predictor of d S , both variables showing a positive association, but not of d N . Accordingly, selection pressures, measured as d N /d S , were negatively associated with virus prevalence. This indicates that neutral evolution, rather than adaptive selection, might be responsible for the changes in MeRV population genetic diversity associated with fluctuations in virus prevalence. Similar results have been obtained in other plant virus populations infecting wild hosts (Lima et al., 2013;Rodelo-Urrego et al., 2015).
Some cautionary comments, however, are called upon our results. First, although we conclude that MeRV is a specialist virus, we cannot discard that this virus is able to infect host species that, due to their low frequency in evergreen oak forests, have been insufficiently sampled. We believe, however, that this is unlikely given the sampling effort done in this work (over 10,000 samples). Second, the analyses of the association between MeRV prevalence and ecological factors are based on the data from six mountain rue populations, and the analyses of the association between MeRV evolutionary parameters and ecological/epidemiological factors are based on data from a single host population. We are aware that this might be a small sample size. However, it was enough to detect significant, and in many cases strong, correlations between the studied parameters. Third, the analysis of the effect of MeRV infection on host mortality is based on the differential virus prevalence between adult and young plants. It should be noted that these results could be also explained if virus titer in adult plants would be much lower than in younger ones, such that it could lead to an underestimation of virus prevalence in adults. However, no significant differences in MeRV accumulation were observed between young and adult plants (F 1,8 = 0.63; P = 0.450). Fourth, although the bestranked models explained a large proportion of the variation in MeRV infection risk and population genetic diversity, other factors not considered here could also play a role in MeRV evolution and epidemiology. For instance, host genetic diversity has been also reported as an important determinant of virus disease risk and population genetic diversity in wild ecosystems (Pagán et al., 2012;Rodelo-Urrego et al., 2015). Unfortunately, the lack of information on the mountain rue genomic sequence prevented including this variable in our analyses. Analyses in other host-pathogen system would help to tests the generality of our observations. In summary, our results provide evidence of the relevant role that infections of native plant viruses may play in modulating the population dynamics of their hosts in wild ecosystems. The effect of the most important ecological factors driving MeRV infection risk and population genetic diversity in its wild host partially supports theoretical predictions. Deviations from these predictions could be at least partially attributed to the behavior of MeRV as specialist parasite. Because both infection risk and population genetic diversification have been proposed to be involved in the appearance of new infectious diseases (Keesing et al., 2006;Holmes, 2009), our results may contribute to understand the factors driving emergence.

AUTHOR CONTRIBUTIONS
CR-N collected and analyzed plant samples, obtained data on ecological factors and virus sequences, assembled the MeRV genome and wrote the manuscript. NM collected samples, performed multiple regression analyses, and participated in drafting the manuscript. IP designed and coordinated the study, helped collecting samples, and wrote the manuscript.

FUNDING
This work was supported by a Career Integration grant (PCIG11-GA-2012-322100), and by grant BIO2016-79165-R, Plan Nacional de I+D+i, from Ministerio de Economía y Competitividad, Spain.