Global genetic diversity and evolutionary patterns among Potato leafroll virus populations

Potato leafroll virus (PLRV) is a widespread and one of the most damaging viral pathogens causing significant quantitative and qualitative losses in potato worldwide. The current knowledge of the geographical distribution, standing genetic diversity and the evolutionary patterns existing among global PLRV populations is limited. Here, we employed several bioinformatics tools and comprehensively analyzed the diversity, genomic variability, and the dynamics of key evolutionary factors governing the global spread of this viral pathogen. To date, a total of 84 full-genomic sequences of PLRV isolates have been reported from 22 countries with most genomes documented from Kenya. Among all PLRV-encoded major proteins, RTD and P0 displayed the highest level of nucleotide variability. The highest percentage of mutations were associated with RTD (38.81%) and P1 (31.66%) in the coding sequences. We detected a total of 10 significantly supported recombination events while the most frequently detected ones were associated with PLRV genome sequences reported from Kenya. Notably, the distribution patterns of recombination breakpoints across different genomic regions of PLRV isolates remained variable. Further analysis revealed that with exception of a few positively selected codons, a major part of the PLRV genome is evolving under strong purifying selection. Protein disorder prediction analysis revealed that CP-RTD had the highest percentage (48%) of disordered amino acids and the majority (27%) of disordered residues were positioned at the C-terminus. These findings will extend our current knowledge of the PLRV geographical prevalence, genetic diversity, and evolutionary factors that are presumably shaping the global spread and successful adaptation of PLRV as a destructive potato pathogen to geographically isolated regions of the world.


Introduction
The genetic diversity facilitates virus evolution and adaptation to new environments and regulates acute viral infections to species from all of life's domains being parasitized. Factually, massive diseases around the globe suspected to be caused by viruses have been documented for millennia (Wasik and Turner, 2013;Kumar et al., 2017;Gelbart et al., 2020;Rubio et al., 2020;Pascall et al., 2021;Harvey and Holmes, 2022). Potato leafroll Polerovirus (PLRV), from the genus Polerovirus and the family Solemoviridae, consists of a 5.3-5.7 kb long positive sense (+) monopartite singlestranded RNA [(+)ssRNA] genome with virus protein genomelinked (VPg) cap bounded at the 5′ end and an OH group at the 3′ end without poly(A) tail or tRNA-like organization Walker et al., 2021). Typically, the PLRV genome contains 6-7 overlapping open reading frames (ORFs), which are organized into genomic and sub-genomic RNAs (Figure 1; Krueger et al., 2013;Sõmera et al., 2015;Barrios Barón et al., 2017). The RNA-dependent RNA polymerase (RdRp), expressed via ribosomal frameshifting, is a translational fusion of ORF1 that encodes P1 and ORF2, which encodes P2. Additionally, a Rap1 translation initiates through a peculiar internal ribosome entry site (IRES) around 1,500 nt downstream of the 5′ end of the gRNA. From sub-genomic RNA 1 (sgRNA1), a capsid protein (CP), involved in virion formation, vector transmission, and virus movement, is encoded from ORF3 (Kaplan et al., 2007;Smirnova et al., 2015). Subsequently, ORF3 extends and ribosomes incorporate one amino acid and continue to translate ORF5 into a CP-read through domain (CP-RTD) as a fusion protein of the translational fusion of ORF3 and ORF5, which is involved in vector transmission and virus movement (Peter et al., 2008;Boissinot et al., 2014;Xu et al., 2018). Furthermore, leaky scanning of sgRNA1 results in the expression of P3a from ORF3a, reported for virus long-distance movement, and P4 from ORF4, a phloem restricted or cell-to-cell movement protein. In addition, PLRV encodes P6 (ORF6) and P7 (ORF7) from the sub-genomic RNA 2 (sgRNA2; Figure 1). The P7 protein plays the most important role in the enhancement of aphid fecundity (Patton et al., 2020). Interestingly, the characteristic feature of poleroviruses is the presence of ORF0 that encodes P0 protein, significantly involved in the viral suppressing of RNA silencing (VSR). The P0 with CP and VPg also contributes to vector specificity (Pazhouhandeh et al., 2006;Baumberger et al., 2007;Csorba et al., 2010;Patton et al., 2020). There are more than 50 viruses that infect potatoes (Kreuze et al., 2020); among them, PLRV is a major pathogen of potato (Solanum tuberosum) and transmitted by aphids (Myzus persicae) being dreadfully responsible for annual production losses of more than 20 million tones globally (Kumari et al., 2020;Patton et al., 2020). It is ranked the second most prevalent pathogen in Asia, Africa, Europe, North America, and South America, threatening the sustainable food production system (CABI, 2010). In most cases, viral interaction may occur in the host as the result of mixed viral infection leading to disease severe epidemics. PLRV in association with potato potyvirus Y (PVY) causes additional losses to the marketable potato production industry (Palukaitis, 2012;Valkonen, 2015;Okeyo, 2017;Byarugaba et al., 2020).
The frequent emergence of new viral diseases is primarily due to the ability of the viruses to rapidly evolve. Mostly, viruses retain genomic flexibility in order to adapt to various hosts and vectors (Rantalainen et al., 2011;Garcia-Ruiz, 2018;Nigam and Garcia-Ruiz, 2020;Rubio et al., 2020). Virus evolution and host adaptation are merely determined through genetic diversity among viral populations (Obenauer et al., 2006;Moury and Simon, 2011;Gelbart et al., 2020;Nigam and Garcia-Ruiz, 2020;Rubio et al., 2020;Farooq et al., 2021;. This viral evolutionary process, which is mediated by RNA recombination and preferential accumulation of mutations in certain portions of the genome, leads to the emergence of new viral strains with exceptional characteristics (Obenauer et al., 2006;Moury and Simon, 2011;Dombrovsky et al., 2013;Ibaba et al., 2017;Nigam and Garcia-Ruiz, 2020;. In general, poleroviruses evolve due to the most common genetic variations, which occur in an area encoding VPg, RdRp, and CP, and in a non-coding intergenic region between ORF2 and ORF3 at 5' UTR of the sub-genomic RNA 1 (Pagán and Holmes, 2010;Dombrovsky et al., 2013;Ndikumana et al., 2017;Kwak et al., 2018;. Previous studies have revealed that poleroviruses exhibit higher single-nucleotide polymorphisms (SNPs) on ORFs encoding P0, P1, and CP-RTD and lower SNPs between ORFs, encoding P2 through P4, with exception of a conserved region of P2 within the P1-P2 (Huang et al., 2005;Delfosse et al., 2021;.
Since the PLRV genome comprises overlapping ORFs, mutations may affect various proteins to mediate virus evolution. However, comprehensive knowledge regarding the geographical distribution, standing genetic diversity, and evolutionary patterns existing among global PLRV populations is currently not available, which is crucial for designing sustainable disease management strategies. To fulfill the knowledge gap, we vigorously analyzed the recent global biodiversity, genomic variations, evolutionary endpoints, and the patterns of disordered proteins to gain further insights into the genetic complexity and molecular variability among PLRV populations. To date, with the advances in next-generation sequencing technology, a total of 84 full-length genome sequences of PLRV isolates have been reported across 22 different countries.
Frontiers in Microbiology 03 frontiersin.org PLRV isolates revealed a high level of genetic variability while the distribution patterns of recombination breakpoints across various genomic regions of PLRV isolates remained variable and most of its populations are evolving under strong purifying selection. These findings will expand our knowledge of PLRV geographical prevalence, genetic diversity, and evolutionary forces that are presumably governing the continual PLRV global spread and successful adaptation to different ecosystems.

Materials and methods
Acquisition of full-length PLRV genomic sequences A total of 84 PLRV full-length genomic RNA sequences were retrieved from the GenBank database 1 on 20 July 2022. Detailed information about the attributes (accession, isolate, country of origin, host, and sequence length) of these sequences is given in Supplementary Table S1. These 84 PLRV sequences were selected 1 www.ncbi.nlm.nih.gov for the subsequent analyses based on the criteria; partial sequences (<95% coverage of the reference genome), sequences with >2.5% of unknown characters, and sequences with <90% similarity with reference genome have been excluded.

Multiple sequence alignments (MSAs) and phylogenetic analysis
The MSAs were prepared by aligning 84 full-length genomic RNA sequences of globally reported PLRV isolates derived from the GenBank database (Supplementary Table S1), using the MUSCLE tool in the software Geneious Prime version 9.0.2. Likewise, alignments of the individual PLRV genes (P0, P1, RdRp, CP, CP-RTD, MP, and RTD) among corresponding genes of the globally reported PLRV isolates were performed. All alignments were manually analyzed and adjusted (when necessary) before proceeding to the subsequent analysis. The phylogenetic tree was constructed in the molecular evolutionary genetics analysis (MEGA-X) software by the maximum likelihood (ML) method with 1,000 bootstrap replicates (Kumar et al., 2018). The tree was visualized and annotated using iToL (Letunic and Bork, 2021). Finally, the distribution and matrix of pairwise identities among Schematic representation of PLRV genomic organization and strategies for gene expression. The ribosomal frameshifts and read-through strategies are indicated. The types and positions of different ORFs along with corresponding translated proteins are represented by colored boxes while solid lines denote the non-coding regions. The abbreviations denote VPg, viral genome-linked protein; IRES, internal ribosomal entry site; Rap1, replication-associated protein 1; CP, coat protein; MP, movement protein; RdRp, RNA-dependent RNA polymerase. The genome annotation is based on the full genome sequence of PLRV (GenBank accession: D13954.1).

Estimation of nucleotide diversity and haplotype variability indices
The nucleotide diversity or π (represented by the average pairwise number of nucleotide differences per site) was calculated using DnaSP V.5 (Librado and Rozas, 2009). The significant differences in the average nucleotide diversity among all PLRV sequences were estimated by calculation of their 95% bootstrap confidence intervals. A 100-nt sliding window with a step size of 10 nts across the full-length sequences of PLRV was implicated to calculate π. Additional population genetics-related parameters including the number of haplotypes (H), haplotype diversity (H d ), the number of polymorphic sites (S), Watterson's theta (θ), the total number of mutations (Eta), and Tajima's D were also estimated for PLRV genomes and individual coding sequences (P0, P1, RdRp, CP, CP-RTD, MP, and RTD) using DnaSP V.5 (Librado and Rozas, 2009).

Recombination analysis
The occurrence of recombination events across 84 full-length PLRV sequences was investigated by using several methods including Rdp, SisterScan, Bootscan, Chimera, Geneconv, maximum Chi-square, and 3Seq. The recombination analysis was implemented in the recombination detection program (RDP) V.4 (Martin et al., 2015). For all methods, alignments were performed with default settings. The p-values less than the Bonferronicorrected cutoff (0.05) were used to infer the statistically significant results. The signals detected by at least four methods were regarded as reliable recombination events.

Analysis of positive and negative selection
The identification of potential positively and negatively selected sites in the coding sequences of P0, P1, RdRp, CP, CP-RTD, MP, and RTD was performed by using four distinct methods including single-likelihood ancestor counting (SLAC), partitioning for robust inference of selection, fixed-effects likelihood and random-effects likelihood (Scheffler et al., 2006). All these methods were employed in the adaptive evolutionary tool "Datamonkey" available online at www.datamonkey.org (Pond and Frost, 2005). To exclude the possibility of misleading results, the recombination breakpoints among all PLRV sequences (P0, P1, RdRp, CP, CP-RTD, MP, and RTD) were searched by the implementation of the Genetic Algorithm Recombination Detection (GARD) method (Kosakovsky Pond et al., 2006). Further, to analyze the mode and strength of natural selection pressure acting on the coding sequences of P0, P1, RdRp, CP, CP-RTD, MP, and RTD, the ratio of non-synonymous to synonymous substitutions (dN/dS) was estimated by SLAC method based on the GARD-corrected phylogenetic trees.

Prediction of intrinsically disordered proteins
Several studies have been performed to predict and validate the disordered protein regions in the proteins of plant-infecting RNA viruses (Charon et al., 2016(Charon et al., , 2018Walter et al., 2019;Byrne et al., 2021;. In our study, the probability of protein disorder was predicted for P0, P1, RdRp, CP, CP-RTD, MP, and RTD proteins using the Protein DisOrder prediction System (PrDOS; Ishida and Kinoshita, 2007). PrDOS combines the disorder of homologous proteins and template and predicts residue disorder by a sliding window analysis of the target protein sequence. The designated PLRV reference genome sequence (NC_001747.1) was used for this purpose. The ordered and disordered protein regions were mapped and distinguished by different colors along with the prediction of the disorder probabilities. A default (0.5) threshold value indicating a false positive (FP) rate of 5% was used.

Analysis of genetic diversity among PLRV populations
To comprehend the exact pattern of virus evolution, the comparison of molecular genetic variability among viral populations is mandatory. Therefore, we conducted an in-depth analysis of genetic variability based on all datasets containing 84 isolates of PLRV genes (P0, P1, RdRp, CP, CP-RTD, MP, and RTD). The genetic diversity, determined for the complete genome analysis of PLRV populations, revealed that PLRV had a high number of mutations (Eta = 29.23%) with a high haplotype diversity (H d = 0.999), an average nucleotide diversity (π = 0.02307), significantly high negative Tajima's D value (−2.23905**), and an average number of segregating/polymorphic sites (θw = 0.05416; Figure 3; Table 1).
Furthermore, we performed neutrality tests, such as Tajima's D and Fu and Li's D or Fu and Li's F. Fu and Li's D and Fu and Li's F values were significantly negative for the whole genome of PLRV and RdRP than other genes, such as P0, P1, CP-RTD, MP, and RTD (Table 1). Tajima's D value was also highly negative for RdRP (−2.34169), indicating the presence of polymorphic sites among this gene. Likewise, Tajima's D value was highly negative in P1 (−2.29075) as compared to CP-RTD (−2.10956) and RTD (−2.08312). Finally, Tajima's D values remained highly negative for CP (−2.13716) followed by a closer value in MP (−2.08740) trailed by P0 (−1.51524), whereas the PLRV population had highly negative Tajima's D  value (−2.23905), showing the existence of extensive polymorphic sites in the PLRV population. These observations statistically remained significant, except for the observation of P0 ( Figure 3C; Table 1).
In addition, the average number of segregating/polymorphic sites (θw) was higher in RTD gene (θw = 0.0668) and P1 (θw = 0.0570) as compared to CP-RTD (θw = 0.0549), RdRP (θw = 0.0529), and P0 (θw = 0.0412). However, the lowest values were found in the CP (θw = 0.0288) and MP (θw = 0.0241), while the value of Watterson's theta for the full-length PLRV sequences was 0.0542 ( Figure 3D; Table 1). These findings indicate that each gene has high genetic diversity displaying a high proportion of haplotypes, although P0 and CP-RTD appeared to be the most genetically diverse genes in the genome of PLRV. Recombination is predominantly driving the genomic diversity of PLRV To investigate the existence of recombination events among geographically isolated PLRV populations, we employed all seven methods, including RDP, GeneConv, Bootscan, MaxChi, Chimera, SisScan, and 3SEQ ( Figure 4; Table 2). For all analyzed datasets, only the exceptional recombination events detected by at least four methods supported by a p-value of <0.001 were considered highly significant. The results showed that 63/84 isolates had detectable recombination events. Among them, the PLRV population that originated from Kenya was more likely to be prone to recombinational changes. For instance, a frequent recombination event was detected among 31 sequences of PLRV (p-value = 3.326 × 10 −17 ), and the predicted beginning and ending breakpoints were located at 1125-5015 nt of the genomic RNA, covering full sequences of RdRP, CP, and MP while partial sequences of P1 and RTD ORFs ( Figure 4; Table 2). The major recombinant isolate (MN689367.1) associated with this event was reported from Kenya with KY856831.1 (Argentina) and NC001747.1 (United Kingdom) being major and minor parents, respectively. Furthermore, the second most frequently detected recombination event was found in 12 isolates with a high significance (p-value = 4.688 × 10 −14 ). The recombination breakpoints were distributed between 42 and 1,125 nt that covered a full portion of ORF0, encoding partial P0 and partial sequence of P1 ORF (Figure 4; Table 2). For this event, the major recombinant sequence was MN689365.1 (Kenya) with major (MN689381.1) and minor (KX712226.1) parents from Kenya and Colombia, respectively. Likewise, a third recombination event was found in 7 isolates (p-value = 2.057 × 10 −04 ) at 5745-2414 nt positions and covered a partial sequence of P7, 5' UTR, P0 (full), P1 (full), and RdRp (partial). The major recombinant isolate (MN689380.1) originated from Kenya while the major (MN689369.1) and minor (MN689384.1) parental sequences were reported from Kenya ( Figure 4; Table 2). Notably, of ten significantly supported recombination events, five were associated with isolated reported from Kenya, while other recombinant isolates were reported from Australia (1), Germany (1), Argentina (1), Colombia (1), and China (1).
Colombian populations of PLRV were also detected to have recombinational changes. One out of five sequences was found to have recombination events (p-value = 1.014 × 10 −10 ) at 5856-751 nt positions, which partially mapped the dynamically functional P0 and P1 ORFs (Figure 4; Table 2). Moreover, one PLRV isolate (JQ346189.1) from Germany showed recombinational changes (p-value = 8.873 × 10 −32 ), and the predicted recombination breakpoints were located at 5793-2927 nt positions; mapping approximately half of the PLRV genome and including sub-genomic RNAs (Figure 4; Table 2). The analyses of recombination events and mapping of recombination breakpoints among the PLRV analyzed sequences signify that great diversity exists among viral populations, particularly at genomic and sub-genomic RNAs encoding various functional proteins. Our results demonstrate that some of the genes have completely recombinant sequences, such as P0, P1, RdRp, CP, MP, and CP-RTD, while RTD and P6, and P7 have partial sequences under the effect of recombination.
Purifying selection pressure governs The evolution of PLRV In order to attain a better understanding of the possible role of selection pressure in the evolution of PLRV, we compared the non-synonymous to synonymous substitutions (dN/dS) between the analyzed datasets, which included major functional proteins of PLRV, such as P0, P1, RdRp, CP, CP-RTD, MP, and RTD. Interestingly, the results indicated that the observed diverse variations on the genome of PLRV were being driven both by the positive and negative selection pressure as in addition to negative selection, several positively selected sites were also found among all PLRV genes with variable frequency ( Figure 5). Particularly, both positively and negatively selected sites in the coding sequences of these genes were detected. Although most of the genes had their major part of tested sites under negative or purifying selection pressure (dN/dS < 1), the impact of positive selection (dN/dS > 1) and neutral selection (dN/dS = 1) could not be negated. However, the probability of positively and neutrally selected sites remained much lower than the negatively selected sites. Specifically, the proportion of negatively selected sites in P0 (13/239), P1 (24/592), RdRp  Figures 5A,F). The considerably higher percentages of positively selected residues in RTD (4.36%) followed by RdRp (3.44%) indicate their functional importance in the relevant proteins. In conclusion, all tested proteins were observed to evolve under purifying and positive selection pressures with the former having a significant impact because the percentage of negatively selected sites was higher ( Figure 5), compared with a significantly lower percentage of positive selection pressure found to variably affect all PLRV proteins ( Figure 5).

Comparison of disordered residues among PLRV proteins
The intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are well-known to participate in protein-protein interactions (PPIs) involving multiple or diverse binding partners. Additionally, these unstructured proteins Frontiers in Microbiology 08 frontiersin.org regulate several vital processes including the assembly of protein complexes, transcription, and translation (Uversky, 2002;Szilágyi et al., 2008). The results of protein disorder prediction showed that CP-RTD contained the highest percentage (48%) of disordered amino acids followed by CP (44%), MP (37%), P1 (35%), RdRp (29%), and RTD (19%). Notably, P0 contained the lowest percentage (5.4%) of disordered residues (Supplementary Figure S1). A visual summary of the disorder probability and position in the target amino acid sequences is shown in Figure 6. We further compared the N-and C-terminals of all proteins to investigate the The most frequent recombination events in the PLRV genome (indicated by pink outline) were detected by using RDP V.4. Panels (A,C,E) represent the recombination events detected among 14, 8, and 2 isolates, respectively. The x-axis indicates the position in alignment while the y-axis denotes the percentage value for bootstrap support. Panels (B,D,F) represent the probabilities of recombination breakpoints best supported by p-values (displayed by the color key). Dark red peaks marked with white arrows indicate the statistically optimal position of recombination breakpoint pairs. Pannel G is the schematic presentation of all significantly supported recombination events found with their corresponding position on the PLRV genome.
Frontiers in Microbiology 09 frontiersin.org patterns of disordered residues. Results demonstrated that of 5.4% disordered residues of P0, approximately half (2.5%) were confined to the N-terminus, while the rest (2.9%) were detected at the C-terminus of the protein ( Figure 7A). Likewise, in P1, 31.3% of disordered residues were found at the 3′ end, while only a small proportion (3.2%) of disordered amino acids was located at the 5′ half of the protein ( Figure 7B). Moreover, RdRp displayed a similar pattern where 16.1% disordered residues at the 3′ end while 13% at the 5′ half were located ( Figure 7C). However, CP exhibited a contrasting pattern of distribution for the disordered residues. We found that a higher percentage (32.2) of disordered residues was located at the 5′ region as compared to a lower percentage (12.2) at the 3′ end ( Figure 7D). Additionally, for CP-RTD and MP proteins, the disordered residues were 20.7 and 30.9% at the 5′ half, while 27 and 6% were positioned at the 3′ end, respectively ( Figures 7E,F). Finally, RTD showed a pattern similar to P0 where 9.8 and 8.8% of disordered residues were located at the 5′ and 3′ halves, respectively ( Figure 7G).

Discussion
In the present study, we performed a comprehensive analysis of the current geographical prevalence, genomic variations, recombination, and evolutionary endpoints associated with global PLRV populations. We also investigated and compared the disordered amino acids among all PLRV-encoded proteins (P0, P1, RdRp, CP, CP-RTD, MP, and RTD). We found that among full-length PLRV genomes reported from 22 countries, the PLRV isolates originating from Kenya displayed the highest genetic variability. The most significantly supported recombination event was also associated with PLRV isolates reported from Kenya. Notably, in the case of the observed type of infected host and disease prevalence, the influence of sampling biasness together with other factors cannot be ruled out (Lacroix et al., 2016). Further analysis at the individual gene level demonstrated that RTD and P1 contain the highest number of mutations. We found a considerable yet variable number of positively selected sites among all genes (P0, P1, RdRp, CP, CP-RTD, MP, and RTD). While a significant influence of purifying (negative) selection pressure seems to govern the evolution of the majority of PLRV proteins, a considerably higher percentage of positively selected amino acids in RdRp and RTD demonstrate the functional importance of these proteins. Additional results revealed that CP-RTD protein contains approximately half (48%) of amino acids identified as disordered. Given that previous studies on PLRV are either focused on one region and a single ORF (Zarghani et al., 2012) or include a few isolates , our findings will provide detailed information on the aforementioned aspects of PLRV genetic diversity.
As mentioned earlier, PLRV belongs to poleroviruses that are exclusively vectored by aphid species (Kaplan et al., 2007). During the insect-mediated transmission and infection processes, several hosts (insects and/or plants) and  Frontiers in Microbiology 10 frontiersin.org environmental factors and their interactions impose certain evolutionary constraints on these viruses (Wan et al., 2015;Li et al., 2016;Nigam and Garcia-Ruiz, 2020). In the course of virus-host co-evolution, the viral and host-related factors mainly regulate the compatible or incompatible interactions (Garcia-Ruiz, 2018). The viral factors are well-known to modulate the host physiology to facilitate the dissemination of virions through insect vector feeding (Mauck et al., 2014).

Frontiers in Microbiology 11 frontiersin.org
Additionally, climatic changes support the emergence of frequent viral epidemics by facilitating the vector populations into new and geographically isolated regions which ultimately expose new hosts to these viral pathogens (Trebicki, 2020). Thus, to reduce the chances of exclusion from the population during selection by the host (vector/plant) or the environment, viruses must attain higher levels of fitness, and maintain balanced genomic flexibility and functionality. The genome-wide profiling of variability among P0, P1, RdRP, CP-RTD, and RTD showed that RTD has the highest percentage of mutations followed by P1, CP-RTD, and RdRP. On the contrary, the accumulation of mutations remained lowest in the ORF Distribution of ordered and disordered residues across different proteins (P0, P1, RdRp, CP, CP-RTD, MP, and RTD) of PLRV. Color-based coding was used to differentiate between disordered (red) and ordered (black) residues. The horizontal red line denotes a default (0.5) threshold value representing a 5% false positive (FP) rate. Percentage of disordered protein residues at N-and C-terminus of PLRV-encoded proteins (P0, P1, RdRp, CP, CP-RTD, MP, and RTD). Mapping of the disordered amino acids was performed using PrDOS tool. Color-based coding was used to differentiate between disordered (red) and ordered (black) residues. A default (0.5) threshold value indicating a false positive (FP) rate of 5% was used.

Frontiers in
encoding MP. A similar trend was observed in the nucleotide diversity results where RTD displayed a higher value of Pi followed by P0 (Figure 3; Table 1). A recent study involving the combinatory analysis of PLRV and five other most variable poleroviruses demonstrates that P0 and CP-RT regions exhibited a high accumulation of nucleotide substitutions . Further results showed that a highly negative value of Tajima's D was associated with all genes under study, indicating the presence of excessive low-frequency polymorphism and a possible expansion in the population size after deviation from neutrality. Notably, different forms of mutations in the genomic RNA of viruses are introduced by RdRp during replication of the viral RNA (García-Arenal et al., 2001). Additionally, recombination events during the replication of viral RNA rapidly generate genetic diversity . Both interspecific and intraspecific forms of recombination are frequently found in poleroviruses (Pagán and Holmes, 2010). The mutated, new viral genomes derived from RNA recombination Frontiers in Microbiology 13 frontiersin.org might exert positive, deleterious, or neutral effects on the fitness of viruses, which consequently lead to the removal or fixation of these genomes in the viral populations Nigam and Garcia-Ruiz, 2020). These new genomes lead to the appearance of new strains/species of poleroviruses enabling them to cause infections in the new hosts (Ibaba et al., 2017). According to this model, it is reasonable to assume that the accumulation of mutations in the PLRV genomes is not random. Rather, preferential accumulation of mutations occurs in the proteins that are key determinants of vector transmission or host adaptation. Recombination is the most commonly found phenomenon among +ssRNA viruses and facilitates genetic diversity via switching RNA genetic segments between viral isolates. This results in the introduction of new, resistance-breaking/virulent strains and host expansion (Nagy, 2008;Traoré et al., 2010;Garcia-Ruiz, 2018). In this study, we found a total of 10 recombination events spanning different regions of the PLRV genome ( Table 2). The most frequently detected recombination event was associated with P0 while other recombination hotspots were also detected in other coding sequences including P1, RdRp, and CP-RTD, with variable levels of significance ( Figure 4). The differential occurrence of recombination sites at the N-and/or C-terminus of these ORFs and overlapping regions suggest that possibly, different recombination-driven evolutionary histories are associated with these sites. Recent studies on poleroviruses have demonstrated that recombination breakpoints are located across the RdRp and RTD regions of Brassica yellows virus (BrYV;  while recombination events in P0 and CP are hypothesized to possibly drive the evolution of Turnip yellows virus (TuYV; . Likewise, studies have demonstrated that recombination events are associated with specific hotspots in the P2 and CP regions (Dombrovsky et al., 2013;Kwak et al., 2018). In the future, it will be worth studying how recombination-driven changes modulate the biological functions of these proteins.
RTD had the highest percentage of sites under negative selection followed by CP-RTD and RdRp, as compared to other genes. The highest number of positively selected sites were associated with RTD, followed by RdRp and P1. Noticeably, P0 and MP did not contain sites evolving under neutral selection pressure. Although, the ratios of negative selection associated with each ORF were variable, and there was a considerably high number of positively selected codons as well; it is evident that overall, purifying selection is the major evolutionary pressure acting on the PLRV genome ( Figure 5). Likewise, findings from polerovirus-based studies  suggest that the negative selection pressure acting on different ORFs of the viral genome might be essential to maintain the protein functionality, ultimately affecting the overall viral fitness.
The disordered protein regions, often referred to as IDPRs or IDPs lack a stable tertiary or three-dimensional structure and proper folding. Owing to their greater plasticity and flexibility, the IDPRs and IDPs are known to play vital roles in various biological functions (Wright and Dyson, 1999;Uversky, 2019). Some of the important biological functions associated with IDPRs and IDPs are signaling, cell regulation, survival, differentiation, proliferation, and apoptosis (Kozlowski and Bujnicki, 2012;Katuwawala et al., 2019). While some of these proteins are assumed to participate in disease etiology and possibly represent novel drug targets (Hu et al., 2016). In viruses, IDPs are recognized to govern numerous functions including adaptation to the dynamic host-related environment, counteracting host-mediated defense mechanisms, and regulation of gene expression to facilitate viral replication in the host (Gitlin et al., 2014;Xue et al., 2014;Mishra et al., 2020). In the present study, we found that CP-RTD of the PLRV contains a high percentage (48%) of disordered amino acids (Figure 7). The identified disordered regions in CP-RTD encompass domains that essentially contribute to the virion formation, systemic movement of the virus, and aphid-mediated viral transmission (Peter et al., 2008). The presence of disordered regions in the CP-RTD of PLRV elaborates the effect of host-dependent mutations in this region (Peter et al., 2008) and it might also explain that why the viral genome is capable of tolerating high rates of mutation (Tokuriki et al., 2009;Sanjuán et al., 2010). Additional research is imperative to deeply understand the correlation of high mutation rates, accumulation of disordered regions, and rapid evolution of the viral genome.
The dynamic involvement of IDPs in the protein-protein interactions could lead to the expansion of the virus host range (Charon et al., 2018). Researchers have demonstrated that in the PVY genome, IDPs might be associated with virus adaptation either by reducing the fitness cost caused by resistance-breaking mutations or by larger exploration of the evolutionary pathways. Eventually, IDPs positively affect the adaptive capacity of RNA viruses (Charon et al., 2018). From an evolutionary point of view, intrinsically disordered regions (IDRs) are thought to have a higher mutational permissiveness than highly ordered regions (Lafforgue et al., 2022). Thus, IDRs-associated amino acid polymorphism could result in the emergence of adaptive solutions during the selection process (Lafforgue et al., 2022). It has been documented that during the evolution of potyviruses, the IDPRs tend to evolve faster than the ordered regions (Charon et al., 2016). Notably, another study involving an insect-infecting RNA virus (Nodamura virus, NoV) demonstrates that rapidly evolving IDRs might act as the reservoir for evolutionary innovation and play vital roles in virus adaptation to new environments (Gitlin et al., 2014). In this context, the IDPs/IDPRs are hypothesized to regulate the adaptive ability of PLRV either by introducing distinct evolutionary pathways or by minimizing the mutation-induced fitness penalty; however, further experiments are imperative to validate this hypothesis.

Conclusion
Our findings provide compelling evidence that global PLRV populations have high genetic diversity and the PLRV-encoded proteins are evolving both under positive and purifying selection with later having a more significant effect. The genome-wide profiling of Frontiers in Microbiology 14 frontiersin.org variability shows that high mutation and recombination are the main factors governing the rapid evolution of PLRV genomes. The presence of a significantly high number of disordered sites in the CP-RTD region might enable PLRV to attain efficient virion formation, systemic movement, and transmission by aphid vector. Previously less-known, these mechanisms presumably are the major determinants of PLRV adaptation to new environments by broadening the host range and pathogenicity levels. These results lay solid foundations for the planning and implementation of strategies aimed at the timely diagnosis and sustainable management of PLRV.

Data availability statement
Publicly available datasets were analyzed in this study. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary material.

Author contributions
TF and MH conceived the idea, acquired and analyzed data, and prepared the original draft. MS, HR, UW, MS, and IS analyzed data. MA, XS, and YT contributed to review and editing of the manuscript. ZH edited the manuscript, acquired funding, and supervised the study. All authors contributed to the article, and read and approved the submitted version. Charon, J., Barra, A., Walter, J., Millot, P., Hébrard, E., Moury, B., et al. (2018).