Geographic Distribution and Genetic Diversity of Rice Stripe Mosaic Virus in Southern China

Rice stripe mosaic virus (RSMV), transmitted by the leafhopper Recilia dorsalis, is a tentative new species in the genus Cytorhabdovirus that was recently identified in South China. To explore its geographic distribution and genetic diversity, we conducted field surveys and viral whole-genome sequencing. RSMV was detected in rice samples collected across southern China. Twelve representative samples from different geographical regions were selected for viral whole-genome sequencing, and genome variation was analyzed together with that of a previously reported RSMV isolate. Identity analysis showed that the genome sequences of the 13 RSMV isolates were highly conserved, with nucleotide identities of at least 99.4%. Strong negative selection acted on RSMV during its evolution, and transitions (72.08%) were more frequent than transversions (27.92%) among the isolates. Among the seven genes encoded by RSMV, the P gene was the most variable, followed by N, M, L, and G; no amino acid changes were found in P3 or P6, and no mutations were found in the non-coding region. A phylogenetic tree based on whole-genome nucleotide sequences divided the RSMV isolates into two groups according to geographical origin. Notably, the L proteins of the Guangxi and Hainan isolates carried five and one specific amino acid sites, respectively, suggesting that the L gene has undergone environmental adaptive variation during the dispersal of RSMV.


INTRODUCTION
Rice (Oryza sativa) is a major staple crop worldwide, with more than 90% of rice production coming from Asia (Bheemanahalli et al., 2016). Several viruses have been reported to infect rice and cause yield losses (Uehara-Ichiki et al., 2013), and rice stripe mosaic virus (RSMV) has been an emerging pathogen since it was first detected in southern China in 2015 (Yang et al., 2017a). Infected rice plants show slight dwarfing, yellow stripes, mosaic and twisted tips on leaves, increased tillering, and unfilled grains, resulting in yield losses (Yang et al., 2017a). RSMV is a tentative new species of the genus Cytorhabdovirus in the family Rhabdoviridae (Yang et al., 2017a) and is transmitted by the leafhopper Recilia dorsalis (Yang et al., 2017b). It replicates in the cytoplasm of infected cells and has a negative-sense, single-stranded RNA genome of about 12.7 kb that encodes seven proteins: N, P, P3, M, G, P6, and L (Yang et al., 2017a). Although its host range, genome organization, vector, and transmission characteristics have been studied, the geographic distribution and genetic diversity of RSMV remain unclear.
In this study, we determined the distribution of RSMV in southern China by analyzing 459 rice samples collected in the region during 2017 and 2018. We then explored the genetic diversity of RSMV by determining the whole-genome sequences of 12 isolates from different geographical areas. These results provide basic information for further analyses of RSMV genetic variability, which will be useful for developing strategies to manage this virus.

Disease Investigation and RSMV Detection
Between May 2017 and May 2018, field surveys were performed in three provinces of southern China (Guangdong, Guangxi, and Hainan). Rice plants showing leaf mosaic or stripe symptoms were sampled (Table 1). RSMV was detected by RT-PCR as previously described (Yang et al., 2017a), and some positive PCR products were verified by direct sequencing.

RSMV Genomic Sequencing
Ten pairs of specific primers (Table 2) were designed from the reported RSMV Guangdong isolate (GenBank Accession No. KX525586.2) using the PrimerSelect program in DNASTAR 7.1 (Lasergene, United States) to amplify the complete genome, including the 3′- and 5′-terminal sequences. Total RNA from RSMV-infected samples was used as the template with a one-step RNA PCR kit (TaKaRa, Dalian, China) according to the manufacturer's instructions. The RT-PCR program was 50 °C for 30 min and 94 °C for 2 min; 35 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 1 min; and a final extension at 72 °C for 10 min. PCR products were analyzed by electrophoresis in 1.2% agarose gels (stained with 5 µl/100 ml GoldView), purified using the AxyPrep™ DNA Gel Extraction Kit (Axygen), and sequenced directly in both directions with three replicates (Shanghai Biotech, Shanghai, China). Sequence assembly and analysis were performed using the SeqMan program in DNASTAR 7.1 (Lasergene), and the whole-genome sequence of each isolate was obtained (Supplementary Table S7) and submitted to GenBank. The accession numbers of the sequenced RSMV isolates are listed in Table 3.
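As a quick check on bench time, the thermal profile above can be written as data and summed. The following is a minimal Python sketch; the step labels and variable names are illustrative, and ramp times between temperatures (which depend on the thermocycler) are deliberately excluded, so the real run will take longer.

```python
# Hold steps of the RT-PCR program as (temperature_C, seconds).
# Ramp times between temperatures are instrument-dependent and NOT included.
PRE_STEPS = [
    (50, 30 * 60),  # reverse transcription
    (94, 2 * 60),   # initial denaturation
]
CYCLE = [
    (94, 30),  # denaturation
    (55, 30),  # primer annealing
    (72, 60),  # extension
]
N_CYCLES = 35
POST_STEPS = [(72, 10 * 60)]  # final extension


def total_hold_seconds(pre, cycle, n_cycles, post):
    """Sum the hold durations of a one-step RT-PCR program (ramps excluded)."""
    return (
        sum(seconds for _, seconds in pre)
        + n_cycles * sum(seconds for _, seconds in cycle)
        + sum(seconds for _, seconds in post)
    )


total = total_hold_seconds(PRE_STEPS, CYCLE, N_CYCLES, POST_STEPS)
print(f"{total // 60} min of hold time")  # 112 min of hold time
```

The cycling block dominates: 35 cycles of 2 min each account for 70 of the 112 min.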

Genomic Sequence Analysis
Multiple sequence alignment of the 12 RSMV isolates sequenced in this study and the previously reported RSMV Guangdong Luoding isolate was performed with MAFFT 7.149 (Katoh and Standley, 2013). Molecular diversity among the isolates was determined using MEGA X (Kumar et al., 2018), and a phylogenetic tree was constructed in MEGA X using the maximum likelihood (ML) method. Genetic distances were calculated with the Maximum Composite Likelihood model, and base substitution types were estimated with the ML statistical method, both in MEGA X. The extent and distribution of genetic variation among RSMV isolates were estimated as the average number of nucleotide differences per site (π) using DnaSP 5.0 (Librado and Rozas, 2009). Recombination was analyzed using the Recombination Detection Program (RDP) 4.97 (Martin et al., 2015). Selection pressure was estimated from the ratio of non-synonymous to synonymous substitution rates (dN/dS) on the Datamonkey server (Weaver et al., 2018) using the single likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), and Fast, Unconstrained Bayesian AppRoximation (FUBAR) methods.
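In its simplest form, the π statistic named above is the mean number of differences per site over all pairs of sequences in an alignment. The following is a minimal Python sketch, assuming the aligned genomes (for example, MAFFT output) are already loaded as equal-length strings. The function name and the gap handling (dropping any column with a gap or ambiguous base in any sequence) are illustrative choices, not DnaSP's code, and no substitution-model correction is applied.

```python
from itertools import combinations

VALID_BASES = set("ACGTU")


def nucleotide_diversity(aligned):
    """Uncorrected nucleotide diversity (pi): the mean number of differences
    per site over all pairs of sequences in an alignment.

    Columns containing a gap or ambiguous base in ANY sequence are removed
    first (complete deletion); this is an illustrative choice.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    seqs = [seq.upper() for seq in aligned]
    sites = [i for i in range(length) if all(seq[i] in VALID_BASES for seq in seqs)]
    if not sites:
        raise ValueError("no comparable sites")
    pairs = list(combinations(seqs, 2))
    differences = sum(a[i] != b[i] for a, b in pairs for i in sites)
    return differences / (len(pairs) * len(sites))


# Toy alignment (not RSMV data): 4 differences over 3 pairs x 10 sites.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(round(nucleotide_diversity(toy), 4))  # 0.1333
```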

RSMV Distribution in Southern China
From May 2017 to May 2018, 459 rice samples suspected of RSMV infection were collected in three provinces of southern China (Guangdong, Guangxi, and Hainan) and tested (Table 1). Of these, 264 (57.5%) were infected with RSMV. RSMV currently occurs mainly in southwestern Guangdong, where disease incidence is typically 5-10% but exceeds 40% in some fields. In certain fields of Wuzhou and Hezhou in Guangxi, which borders Guangdong, incidence was 1-5%; RSMV-infected rice plants were also occasionally observed in central parts of Guangxi and Hainan (Figure 1).

Genomic Sequence Homology of RSMV Isolates
Twelve samples from different geographical origins were sequenced and compared with the sequence of the reference isolate. All 13 genomes were 12,774 nt long and contained seven ORFs. Nucleotide (nt) sequence identities among the 13 genomes ranged from 99.4 to 100% (Table 3), indicating high sequence conservation between isolates. Furthermore, nt identities among the Guangdong isolates (99.4-99.7%) were lower than those among the Guangxi and Hainan isolates (Supplementary Table S1).
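The identity range reported above summarizes all pairwise comparisons among the aligned genomes (13 × 12 / 2 = 78 pairs for 13 isolates). The following is a minimal Python sketch, assuming the aligned genomes are available as strings. Skipping gapped columns pairwise is an assumption and may differ from the software used in the study, and the isolate names and sequences are hypothetical toy data.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two aligned sequences, skipping columns where
    either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def identity_matrix(named_seqs):
    """Pairwise percent identities for a {name: aligned_sequence} mapping."""
    return {
        (n1, n2): percent_identity(s1, s2)
        for (n1, s1), (n2, s2) in combinations(named_seqs.items(), 2)
    }


# Hypothetical isolate names and toy sequences, for illustration only.
toy = {"isolate_A": "ACGTACGTAC", "isolate_B": "ACGTACGTAT", "isolate_C": "ACGAACGTAC"}
matrix = identity_matrix(toy)
print(min(matrix.values()), max(matrix.values()))  # 80.0 90.0
```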
Although sequence variability was observed in the untranslated regions between the ORFs, the conserved intergenic motif (3′-AUUCUUUUUGCUCUGG-5′) was present unchanged in all isolates tested.

Nucleotide and Amino Acid Variation Among RSMV Isolates
Among the individual genes, the P gene showed the highest variability (nt and amino acid (aa) identities both ranged from 98.9 to 100%), followed by the N, M, L, and G genes (Supplementary Tables S2-S6). No variability was observed in the P3 and P6 genes (data not shown). Analysis of nucleotide variation among the 13 RSMV genome sequences revealed that A↔G (37.22%) and C↔U (34.86%) transitions occurred more frequently than A↔U (4.34%), A↔C (9.93%), U↔G (10.29%), and C↔G (3.35%) transversions (Table 4). These results indicate that the RSMV genome has a mutational bias toward A↔G and C↔U transitions.
FIGURE 2 | Diagrammatic representation of amino acid mutations among rice stripe mosaic virus (RSMV) isolates. The black and red lines indicate amino acid mutations at random sites among Guangdong RSMV isolates and specific sites among three RSMV isolates with different geographic origins, respectively.
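The classification behind these percentages is simple: A↔G and C↔U changes (purine to purine, or pyrimidine to pyrimidine) are transitions, and all other changes are transversions. Table 4 comes from MEGA X's model-based ML estimate. The following Python sketch only counts raw pairwise differences to illustrate the classification. It is not the study's estimation method, and the sequences are toy data.

```python
from collections import Counter
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def to_rna(base):
    """Uppercase a base and write T as U, matching the RNA notation used above."""
    return base.upper().replace("T", "U")


def classify(x, y):
    """Return 'transition', 'transversion', or None (identical or non-ACGU)."""
    x, y = to_rna(x), to_rna(y)
    valid = PURINES | PYRIMIDINES
    if x == y or x not in valid or y not in valid:
        return None
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def tally_substitutions(aligned):
    """Count observed pairwise differences by class and base pair (e.g. 'C↔U')."""
    counts = Counter()
    for a, b in combinations(aligned, 2):
        for x, y in zip(a, b):
            kind = classify(x, y)
            if kind:
                pair = "↔".join(sorted((to_rna(x), to_rna(y))))
                counts[(kind, pair)] += 1
    return counts


toy = ["ACGUACGUAC", "ACGUACGUAU", "ACGAACGUAC"]
counts = tally_substitutions(toy)
total = sum(counts.values())
transitions = sum(n for (kind, _), n in counts.items() if kind == "transition")
print(f"transitions: {100 * transitions / total:.2f}%")  # transitions: 50.00%
```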
To characterize non-synonymous and synonymous mutations in all RSMV isolates, each protein-coding sequence was compared with that of the reported isolate, which was used as the reference. Most mutations were synonymous (Table 4). Multiple alignment of the amino acid sequences revealed seven amino acid mutations in the N protein, mainly in the Guangdong isolates: the SG2 isolate had R and V at positions 40 and 294, respectively, whereas the other isolates had K and A; the SG1 isolate had R and T at positions 203 and 293, respectively, whereas the other isolates had K and A; the LJ1 and LJ2 isolates had G at position 365, whereas the other isolates had D; and the TP2 isolate had L at position 449, whereas the other isolates had V. There were six amino acid mutations in the P protein, mostly in the Guangdong isolates: the LJ1 isolate had D at position 98 (N in the other isolates); the SG1 isolate had T and G at positions 101 and 246, respectively (A and S in the other isolates); the SG2 isolate had G at position 246 (S in the other isolates); and the TP2 isolate had P and D at positions 110 and 155, respectively (L and E in the other isolates). The M protein had one amino acid mutation: the LD isolate had C at position 52, whereas the other isolates had S. There were two amino acid mutations in the G protein, mainly in the Guangdong isolates: the TP1 isolate had I at position 101 (M in the other isolates), and the SG1 isolate had M at position 493 (I in the other isolates).
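The changes listed above follow the usual convention of reference residue, then position, then variant residue. For example, the SG2 change at N position 40 would be written K40R. The following is a minimal Python sketch for generating such a list from two aligned protein sequences. The function name is hypothetical, and the toy sequences are not RSMV data.

```python
def list_aa_mutations(reference, query):
    """List substitutions of `query` relative to `reference` as e.g. 'K40R'
    (reference residue, 1-based position, query residue).

    Assumes the two protein sequences are aligned and gap-free.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to equal length")
    return [
        f"{ref}{pos}{qry}"
        for pos, (ref, qry) in enumerate(zip(reference, query), start=1)
        if ref != qry
    ]


# Toy protein fragments (not RSMV sequences).
print(list_aa_mutations("MKAVD", "MRAVG"))  # ['K2R', 'D5G']
```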
There were 32 amino acid mutations in the L protein. Six of them occurred in the Guangxi isolates, namely D, R, V, R, L, and N at positions 479, 966, 1759, 1805, 1876, and 1984, respectively; the latter five were specific sites found in all Guangxi isolates and in no others. The Hainan isolate had one specific amino acid mutation site, A at position 1491. The remaining 25 amino acid mutations occurred among the Guangdong isolates. No amino acid mutations were found in the P3 and P6 proteins (Figure 2).
FIGURE 3 | Distribution of genetic variation along the RSMV whole-genome sequence estimated by nucleotide diversity (π). A 100-nt sliding window was used with a 25-nt step size.
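The distinction drawn above between mutations at random sites and group-specific sites can be stated as a rule: a site is specific to a group if every member shares one residue that no isolate outside the group carries. The following Python sketch encodes that rule as we read it from the text. The isolate names and sequences are hypothetical, and a single isolate (such as the Hainan isolate) can be passed as a one-member group.

```python
def group_specific_sites(proteins, group):
    """1-based positions where all isolates in `group` share a single residue
    that no isolate outside the group carries.

    `proteins` maps isolate name -> aligned, equal-length protein sequence.
    """
    members = [proteins[name] for name in group]
    others = [seq for name, seq in proteins.items() if name not in group]
    sites = []
    for i in range(len(members[0])):
        residues = {seq[i] for seq in members}
        if len(residues) == 1 and all(seq[i] not in residues for seq in others):
            sites.append(i + 1)
    return sites


# Hypothetical isolates and toy fragments, for illustration only.
toy = {"gx_1": "MDRV", "gx_2": "MDRV", "gd_1": "MDKV", "gd_2": "MEKV"}
print(group_specific_sites(toy, ["gx_1", "gx_2"]))  # [3]
```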

Analysis of Genetic Variation, Recombination and Selection Pressure
Nucleotide diversity (π) was analyzed across the whole RSMV genome. Variation rates were mostly below 1.0%, with the highest peak near the end of the L gene (Figure 3). There was no evidence of recombination in any of the seven RSMV genes (data not shown).
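The profile in Figure 3 comes from computing π in overlapping windows, 100 nt wide and advanced 25 nt at a time. The following is a minimal Python sketch, assuming gap-free, equal-length sequences, which holds here because all RSMV genomes are 12,774 nt. Reporting 1-based window boundaries and dropping incomplete windows at the end are our choices and may not match DnaSP's output.

```python
from itertools import combinations


def window_pi(aligned, start, end):
    """Mean pairwise differences per site in columns start..end-1 (0-based)."""
    pairs = list(combinations(aligned, 2))
    diffs = sum(a[i] != b[i] for a, b in pairs for i in range(start, end))
    return diffs / (len(pairs) * (end - start))


def sliding_window_pi(aligned, window=100, step=25):
    """Return (first_pos, last_pos, pi) per window, positions 1-based.

    Assumes gap-free, equal-length sequences; windows that would run past
    the end of the alignment are not emitted.
    """
    length = len(aligned[0])
    return [
        (start + 1, start + window, window_pi(aligned, start, start + window))
        for start in range(0, length - window + 1, step)
    ]


toy = ["AAAAAAAA", "AAAAAAAT"]
print(sliding_window_pi(toy, window=4, step=2))
# [(1, 4, 0.0), (3, 6, 0.0), (5, 8, 0.25)]
```

With the default settings on a 12,774-nt alignment, this sketch yields 507 windows, the last covering positions 12,651-12,750. The final 24 nt therefore fall outside any full window.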
Mean dN/dS ratios were calculated for the seven genes with the SLAC method (Table 5). The ratio was significantly < 1 for all genes, implying that all RSMV genes are under negative (purifying) selection.
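The dN/dS ratios above depend on classifying codon differences as synonymous or non-synonymous. SLAC additionally reconstructs ancestral codons on the phylogeny and divides the two counts by the numbers of synonymous and non-synonymous sites. That normalization is omitted here, so the following Python sketch shows only the classification step, using the standard genetic code. It is not the Datamonkey implementation and does not yield dN/dS by itself.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_codon_changes(ref_cds, qry_cds):
    """Count differing codons as (synonymous, non_synonymous).

    Both inputs must be in-frame coding sequences of equal length
    (U is read as T). A codon with several base differences is counted once.
    """
    ref = ref_cds.upper().replace("U", "T")
    qry = qry_cds.upper().replace("U", "T")
    if len(ref) != len(qry) or len(ref) % 3:
        raise ValueError("need equal-length, in-frame coding sequences")
    synonymous = non_synonymous = 0
    for i in range(0, len(ref), 3):
        c_ref, c_qry = ref[i:i + 3], qry[i:i + 3]
        if c_ref == c_qry:
            continue
        if CODON_TABLE[c_ref] == CODON_TABLE[c_qry]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Toy example: AAA->AGA is K->R (non-synonymous); GCT->GCC stays Ala (synonymous).
print(count_codon_changes("AAAGCTCTG", "AGAGCCCTG"))  # (1, 1)
```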

Phylogenetic Analysis
To determine the relationships among the sequenced RSMV isolates, a phylogenetic tree was constructed from the complete-genome nt sequences. The RSMV isolates were divided into two groups according to geographical origin: group 1 included the Guangdong and Guangxi isolates, and group 2 comprised the Hainan isolates. Interestingly, genetic distances among the Guangxi isolates and among the Hainan isolates were small, whereas the Guangdong isolates were more divergent, suggesting greater genetic diversity among Guangdong isolates than among Guangxi or Hainan isolates (Figure 4).

DISCUSSION
A newly discovered rice virus, RSMV, commonly occurs in the southwestern region of Guangdong (Yang et al., 2017a). Our results showed that RSMV is present in three provinces of southern China and commonly occurs in Wuzhou and Hezhou, adjacent to Guangdong Province. This indicates that RSMV is widely distributed in southern China and may spread to other rice-growing regions of China, or even to rice-growing areas of Vietnam adjacent to China. In our investigation, all sampled rice plants showed mosaic or stripe symptoms; most of them (264) were infected with RSMV, whereas the remaining 195 were RSMV-negative. These RSMV-negative plants may have been infected with other viruses that cause mosaic or stripe symptoms, such as rice stripe virus, or environmental factors may have caused them to produce virus-like symptoms. Analysis of the genetic diversity of RSMV populations from different geographical origins can provide information on their genetic relationships, epidemiology, and dispersal. In our study, all RSMV isolates showed low genetic variability (less than 1% at the nt level, Table 3; variation rates mostly below 1.0%, Figure 3). In contrast, isolates of lettuce necrotic yellows virus (LNYV) form two subgroups that differ substantially in the N gene (about 20% at the nt and 4% at the aa level) (Higgins et al., 2016), and N gene sequences of taro vein chlorosis virus isolates differ by 19.3% at the nt and 6.3% at the aa level (Revill et al., 2005). This contrast may indicate that RSMV is a "new" pathogen that emerged and dispersed recently. Additionally, the dN/dS ratio was significantly < 1 in all genes (Table 5), indicating that all genes were under negative selection pressure, as reported for coffee ringspot virus (CoRSV; Ramalho et al., 2016), orchid fleck virus (Kondo et al., 2017), and alfalfa dwarf virus (ADV; Samarfard et al., 2018). These results suggest that the insect vector may be involved in limiting RSMV genetic diversity.
For example, aphid-transmitted cucumber mosaic virus (CMV) populations show significantly lower mutation rates than mechanically inoculated CMV populations, indicating that vectors can impose strong genetic bottlenecks on viral populations; this, in turn, favors viruses that are better adapted to plant-vector systems (Ali et al., 2006).
Most studies of the genetic diversity of plant rhabdoviruses have been confined to one or a few genes. For example, LNYV was found to form two subgroups based on analysis of the N gene (Callaghan and Dietzgen, 2005), and CoRSV showed a strong geospatial relationship among isolates based on analysis of the N gene (Ramalho et al., 2016). Klerks et al. (2004) described the genetic diversity of strawberry crinkle virus based on the putative polymerase coding region. Pappi et al. (2016) analyzed the genetic diversity of eggplant mottled dwarf virus between seed-propagated and asexually propagated plants by sequencing 51.5% of the genome. More recently, Samarfard et al. (2018) analyzed the population diversity of ADV based on the N gene. In our study, the phylogenetic tree divided the RSMV isolates into two groups (Figure 4), with isolates from different geographical sources located on different branches. However, the Guangdong Luoding isolate was most closely related to the Guangxi isolates, and Luoding is geographically adjacent to Guangxi. Therefore, sequences from more RSMV isolates are needed to determine whether RSMV genetic structure is related to geographical origin.
The base substitutions described here for RSMV are biased toward transitions over transversions (Table 4), consistent with other plant viruses (Ge et al., 2007; Duffy and Holmes, 2009; Rao et al., 2017). Studies have shown that viral population genetic diversity resulting from base substitution is shaped by host-virus interactions (Schneider and Roossinck, 2001), so this bias might be caused both by host preferences for particular viral genomic bases and by base substitutions produced during replication of the virus itself. Additionally, specific deaminating enzymes in virus-infected plant cells may deaminate bases while the viral genome is single-stranded, resulting in high transition rates (van der Walt et al., 2008; Duffy and Holmes, 2009).
In plant rhabdoviruses, the P3 gene, located between the P and M genes, encodes a viral movement protein that facilitates cell-to-cell transport (Jackson et al., 2005; Ammar et al., 2009; Mann and Dietzgen, 2014; Mann et al., 2016). The P6 gene product, encoded between the G and L genes, shows RNA silencing suppressor activity (Guo et al., 2013), is associated with virions, and may have a structural role (Huang et al., 2003). No amino acid mutations were observed in the P3 and P6 proteins of any RSMV isolate, most likely because these two genes are critical to the viral life cycle. Guangdong isolates were more divergent than the Guangxi and Hainan isolates (Table 3), which may reflect a higher mutation rate or a longer period of diversification in Guangdong. This may indicate that RSMV originated in Guangdong, although further studies are needed to confirm this hypothesis. Notably, the L proteins of the Guangxi and Hainan isolates carry multiple specific amino acid sites (Figure 2). The L protein of negative-sense RNA viruses has six conserved regions (CRs): CR I, II, and IV are required to form the RdRp-containing ring domain; CR III is involved in RNA polymerization; CR V is required for cap addition; and CR VI is involved in cap methylation (Rahmeh et al., 2010; Ogino and Banerjee, 2011). The mutations in the Guangxi isolates were concentrated mainly in CR VI, and their biological functions should be examined in further studies. A previous report showed that a point mutation in CR II of the vesicular stomatitis virus L protein conferred sensitivity to high temperatures (Galloway and Wertz, 2009). Another study found that exchanging the L polymerase protein between two strains of viral hemorrhagic septicemia virus altered their temperature sensitivity in vitro (Kim et al., 2015).
Therefore, we speculate that RSMV has adapted to environmental changes during its dispersal through amino acid changes in the L protein; this hypothesis requires further study.
Overall, this is the first study of the geographical distribution and occurrence of RSMV and of the genetic variation among isolates from different geographical origins. Our results provide molecular diversity and spatial distribution data for RSMV, as well as basic data for studying the evolutionary origin of negative-strand RNA viruses.

AUTHOR CONTRIBUTIONS
GZ conceived and designed the experiments. XY and BC performed the experiments and wrote the draft. TZ analyzed the data and revised the manuscript. ZL and CX conducted field investigation and sample collection. All authors read and approved the final manuscript.