Discovery and Evolution of a Divergent Coronavirus in the Plateau Pika From China That Extends the Host Range of Alphacoronaviruses

Although plateau pikas are the keystone species in the plateau ecosystem of the Qinghai Province of China, little is known about their role in the evolution and transmission of viral pathogens, especially coronaviruses. Here, we describe the characterization and evolution of a novel alphacoronavirus, termed plateau pika coronavirus (PPCoV) P83, which has a prevalence of 4.5% in plateau pika fecal samples. In addition to classical gene order, the complete viral genome contains a unique nonstructural protein (NS2), several variable transcription regulatory sequences and a highly divergent spike protein. Phylogenetic analysis indicates that the newly discovered PPCoV falls into the genus Alphacoronavirus and is most closely related to rodent alphacoronaviruses. The co-speciation analysis shows that the phylogenetic trees of the alphacoronaviruses and their hosts are not always matched, suggesting inter-species transmission is common in alphacoronaviruses. And, PPCoV origin was estimated by molecular clock based on membrane and RNA-dependent RNA polymerase encoding genes, respectively, which revealed an apparent discrepancy with that of co-speciation analysis. PPCoV was detected mainly in intestinal samples, indicating a potential enteric tropism for the virus. Overall, this study extends the host range of alphacoronaviruses to a new order (Lagomorpha), indicating that plateau pikas may be the natural reservoir of PPCoV and play an important and long-term role in alphacoronavirus evolution.

The plateau pika (Ochotona curzoniae), also known as the black-lipped pika, belongs to family Ochotonidae within order Lagomorpha and is a keystone species of the Tibetan Plateau of China (Yang et al., 2011;Wu et al., 2019). They are small, non-hibernating, rabbit-like mammals with short limbs, rounded ears, no external tail, and inhabiting areas around 3,100 to 5,000 meters above sea level throughout the Tibetan Plateau of China (Yang et al., 2011;Yu et al., 2014;Su et al., 2016). Recently, plateau pikas have been considered as hosts for influenza viruses (H7N2, H9N2 and H5N1; Zhou et al., 2009;Yu et al., 2014;Su et al., 2016). Additional viruses from plateau pikas are rarely reported. Although betacoronaviruses and deltacoronaviruses have been identified in other wild animals in the Qinghai-Tibetan Plateau (Wu et al., 2018;Zhu et al., 2021b), little is known about the role of plateau pikas in the evolution and transmission of coronaviruses. In this study, we identified a novel coronavirus in the plateau pika from the Tibetan Plateau of China and explored the evolution and virus-host co-divergence within the genus Alphacoronavirus.

Samples
In July 2019, a total of 157 plateau pikas were captured from four sites (3,890-3,970 meters above sea level) in mountainous regions of Yushu Tibetan Autonomous Prefecture in Qinghai Province of China. The distance between sample collection sites is about 50 kilometers. All captured animals were clinically healthy and identified through morphological examination. Pikas were trapped and euthanized, and their respiratory tracts and intestinal contents were removed and preserved in 5-ml tubes directly. All sampling work was approved by the ethics committee of the National Institute for Communicable Disease Control and Prevention of China CDC (ICDC-2019012), and conducted by Yushu Prefecture Center for Disease Control and Prevention as part of plague surveillance.

RNA Extraction and Coronavirus Screening
Viral RNA of each fecal sample was extracted using QIAamp Viral RNA Mini kit (QIAGEN) and resuspended in 50 μl of DNase-free, RNase-free water. Conserved primers were used to amplify the 440-bp fragment of RNA-dependent RNA polymerase (RdRp) of coronaviruses (Woo et al., 2005). Reverse transcription-polymerase chain reaction (RT-PCR) was performed by PrimeScript™ One Step RT-PCR Kit Ver.2 (Takara) with 50°C for 30 min and 33 cycles of 94°C for 30s, 48°C for 30 s, 72°C for 30s, and a final extension at 72°C for 10 min. The PCR products were gel-purified before conducting Sanger sequencing.

Complete Genome Sequencing
The total RNA of a coronavirus positive sample was firstly rRNA-removed using the Ribo-Zero Gold rRNA Removal Kit. The TruSeq stranded total RNA library prep gold kit was used for library construction according to instructions. Paired reads (2 × 150 bp) were obtained using the Illumina HiSeq2000 platform, which were assembled into contigs using Trinity v2.8 (Grabherr et al., 2011). The assembled contigs were annotated in NCBI non-redundant protein database using Diamond (Buchfink et al., 2015). To confirm the assembled genome, clean data were mapped back to the genome using Bowtie 2 (Langmead and Salzberg, 2012) and assembled using SPAdes (Bankevich et al., 2012). Moreover, several primers were designed to amplify the NS2, S, and N genes followed by Sanger sequencing (Supplementary Table S2).

Genome and Phylogenetic Analyses
The deduced amino acid and nucleotide sequences of the open reading frames (ORFs) were found using ORF finder with default parameters and compared to those of other coronaviruses (Gao et al., 2003). The proteinase cleavage sites of CoVs were predicted using an online tool (Gao et al., 2003). The amino acid and nucleotide identities were calculated using BioAider version 1.334 (Zhou et al., 2020). Protein families and homology searches were performed by InterPro (Blum et al., 2021) and HMMER (Potter et al., 2018), respectively. Transmembrane helices in proteins were predicted by TMHMM with a hidden Markov model (Krogh et al., 2001). N-linked glycosylation and O-GalNAc (mucin type) glycosylation sites were predicated using web server. 1 The amino acid sequences were aligned using the MAFFT program (Katoh and Standley, 2013). Aligned regions were polished using Gblocks (Talavera and Castresana, 2007). Phylogenetic analyses were constructed using the maximum-likelihood method by PhyML 3.0 with IBV as the outgroup and bootstrap values of 1,000 (Guindon et al., 2010). Based on AIC criteria, the substitution models were selected using ModelFinder (Kalyaanamoorthy et al., 2017).

Estimation of Divergence Dates
The complete RdRp and M genes of PPCoV and related coronaviruses were acquired and aligned using the MAFFT program (Katoh and Standley, 2013). Sampling times were collected, and the divergence was estimated using a Bayesian Markov chain Monte Carlo (MCMC) method implemented in BEAST v 1.10 (Lau et al., 2012a). Analyses were performed using the SRD06 substitution model with uncorrelated relaxed clock. The MCMC run was 1 × 10 8 steps long, with sampling every 1,000 steps. The mean time of the most recent common ancestor (tMRCA) and the highest posterior density regions at 95% (HPDs) were calculated. All the ESS values of statistic parameters were greater than 200 and visualized using Tracer v1.7.1 (Suchard et al., 2001). The trees were summarized in a target tree (a 10% burn-in) by the Tree Annotator program included in the BEAST package.

Virus-Host Co-divergence and Recombination Detection
The host species were collected and used to infer phylogenetic trees of alphacoronavirus mammal hosts using TimeTree (Kumar et al., 2017). A phylogenetic virus tree was built based on polyproteins as described above. A virus-host co-divergence tree was constructed using Jane software package (Conow et al., 2010) with generations set at 100 and population size set at 100. Event costs were set at 0 for co-divergence, 1 for duplications, 1 for host-switching, and 1 for loss. Recombination analysis was conducted using both Simplot version 3.5.1 with setting parameters of the F84 model, window size 1,000 bp and step 200 bp (Lau et al., 2012b) and Recombination Detection Program (RDP) version 4.97 with default parameters (Martin and Rybicki, 2000).

Novel Alphacoronavirus in Plateau Pikas
A total of 157 plateau pikas were obtained from four sites in Qinghai-Tibet plateau (Qinghai Province) in the northwest of China. RT-PCR targeting a 440 bp fragment of the RdRp gene was conducted to screen potentially novel coronaviruses (CoVs). Viral RNA was positive in seven (4.5%) of 157 intestinal samples, with only positive in two respiratory samples of these seven pikas. All seven positive intestinal samples were collected from the same site, with overall detection rate of 13.7% at this site. Sequences of the screening PCR products revealed that they had <85.8% nucleotide identities to the corresponding sequences of known alphacoronaviruses, with 100% nucleotide identity to each other. Primer for multiple genome sites including NS2, S, and N genes were used to confirm the genome of all the positive samples, showing 100% nucleotide similarity among them. These results indicated that a new CoV (sharing <85.8% nucleotide identities to known alphacoronaviruses based on the 440 bp fragment) circulates in plateau pikas of this site. No obvious diseases were observed in plateau pikas infected with the new CoV. To further confirm the pathogenesis, sections of respiratory tracts of plateau pikas that were positive for CoV were checked by staining with hematoxylin-eosin (HE). No inflammation was observed in respiratory samples (Supplementary Figure S1).

Complete Genome Characterization
Since partial RdRp sequences indicated that the CoV may represent a new member of the genus Alphacoronavirus, we selected one (designated P83) of the CoV positive samples for high-throughput sequencing. A total of 144,123,796 reads were obtained. To verify the assembled viral genome, a total of 154,419 reads were mapped back to the coronavirus genome. The genome was temporarily named plateau pika coronavirus (PPCoV) P83 (GenBank accession number MZ577265) with a size of 28,312 nucleotides (without 5' and 3' rapid amplification) and 35.8% G + C content. The genome organization of PPCoV was similar to that of other related alphacoronaviruses, with classical gene order of 5'-replicase ORF1abspike (S)-envelope (E)-membrane (M)-nucleocapsid (N)-3' (Figure 1; Table 1). A putative transcription regulatory sequence (TRS) motif of 5'-AACUAA-3' was located upstream in majority of the genes (Table 1), which was similar to that in other alphacoronaviruses. Some variations of TRS motif were observed in the PPCoV genome for NS2a (5'-TACUUUAA-3'), E (5'-UACUAA-3'), and NS7a (5'-AAGUAA-3') genes ( Table 1), but no TRS motif was observed for NS9 gene. The putative nonstructural proteins (nsp) from ORF1ab with their corresponding cleavage sites are listed in Supplementary Table S1. The lengths of nsp2 and nsp4 in PPCoV differed from those of related alphacoronaviruses (Supplementary Table S1).

Phylogenetic Relationships
In order to evaluate the evolutionary status of PPCoV, we constructed phylogenetic trees based on aa sequences of ORF1ab, S, M, and N from genera Alphacoronavirus and Betacoronavirus (Figures 2,  3 and Supplementary Figure S3). The phylogenetic tree of ORF1ab (Figure 2) showed that PPCoV formed a distinct lineage within the genus Alphacoronavirus, and was closely related to the cluster of rodent alphacoronaviruses (viruses obtained from rat species; Tsoleridis et al., 2019). A slightly different clustering pattern was observed in phylogenetic trees based on M and N protein sequences, showing that PPCoV was clustered within the rodent coronavirus clade (Supplementary Figure S4). Evolutionary tree based on S protein sequences (Figure 3) revealed significantly different clustering patterns to phylogenetic trees based on ORF1ab, M, and N proteins. The tree of S indicated that PPCoV formed a divergent cluster with Shrew-CoV/Tibet2014, Wencheng Sm shrew CoV, SADS-CoV, bat coronavirus HKU2, and rodent alphacoronaviruses. However, the clade of the PPCoV as well as other phylogenetically related sub-genera all clustered together with the betacoronaviruses with high bootstrap values.

Recombination Analysis and Estimation of Divergence Times
All alpha CoVs clustered together in tree of ORF1ab, but some of alpha CoVs clustered together with the beta CoVs in tree of S. This discrepancy in clades between trees suggested a potential recombination for S gene. However, recombination analysis based on S genes of whole set of alpha-and betacoronaviruses indicated that no recombination event was detected for PPCoV (Supplementary Figure S5).
For tip-dating analysis, the RdRp and M genes of PPCoV and related alphacoronaviruses were selected and checked for potential recombination regions. Then, the entire sequence alignment (recombination region free) was used to evaluate the divergence time with relaxed clock model as previously described (Lau et al., 2012a,b). The most recent common ancestor (tMRCA) of PPCoV and rodent alphacoronaviruses was estimated to be in the year of 1935 (HPDs 1782 to 1987; approximately 84 years ago; Figure 4). The molecular clock analysis based on the M genes also showed the tMRCA of PPCoV and rodent alphacoronaviruses was at 1935 (HPDs 1,420 to 2,008; Supplementary Figure S6). The mean substitution rate of RdRp and M genes was calculated to be 2.1 × 10 −4 and 4.8 × 10 −4 per site per year, respectively, which is similar to previous report (Woo et al., 2012).

Virus-Host Co-divergence Analysis
The evolutionary history of a virus can mirror that of its host over long-term evolutionary timescales (Shi et al., 2016). A tanglegram was obtained by comparing the tree topologies of viruses and their hosts, which revealed a significant codivergence for alphacoronaviruses (p < 0.01). The viruses clearly clustered with their hosts at the order level ( Figure 5). However, the topology of hosts did not fully match those of viruses with 14 host switching events, two loss events, and three duplication events. Until now, the host species of alphacoronaviruses were clustered into six orders, including orders Artiodactyla, Carnivora, Chiroptera, Eulipotyphla, Rodentia, and Primates ( Figure 5). PPCoV was identified in Ochotona curzoniae belonging to order Lagomorpha, which is different from host taxon of other alphacoronaviruses. This extends the host range of alphacoronaviruses. Meanwhile, virushost co-speciation analysis showed that plateau pikas appear to be infected with the ancestor of PPCoV through crossspecies transmission (Figure 5), and indicated that the common ancestor for PPCoV and rodent alphacoronaviruses can now trace back to 11.6 (range 0.2-44.6) million years ago.

DISCUSSION
In this study, a novel alphacoronavirus (PPCoV) was detected and characterized from a new host (plateau pikas) in the Qinghai-Tibet Plateau of China. PPCoV was divergent from known alphacoronaviruses, showing 90% aa identities in three conserved replicase domains (3CLpro, ExoU, and NendoU), and <73.0% aa identities in S protein to other members of genus Alphacoronavirus. Moreover, phylogenetic analyses showed that PPCoV clusters together within the clade of rodent CoVs, but separates from the other viruses.
The genome of PPCoV contains an NS2a gene encoding a protein homologous to the cyclic phosphodiesterase superfamily. This gene seems to be nonessential for viral replication, but plays a role in pathogenicity of mouse hepatitis virus (MHV;de Haan et al., 2002;Lau et al., 2012b). Further studies are needed to understand the potential function of the NS2a protein.
Recently, it was shown that blocking N-glycan biosynthesis will enhance S protein proteolysis, which could decrease SARS-CoV-2 binding to host ACE2 and reduce viral entry (Yang et al., 2020). Thus, the N-glycosylated modifications in S protein FIGURE 5 | Co-phylogenetic analyses of alphacoronaviruses and their hosts. The host tree is labeled in black with coronavirus tree in blue. Co-divergence events, duplication events, host-switching events, and loss events were labeled with filled circles, empty circles, arrows, and dotted lines, respectively. The virus from this study is labeled with a red font.
of PPCoV may have an important role in viral virulence. By comparing all the trees, and observing discrepancies in clades between trees and values in Table 2, there may be recombination because all alpha CoVs cluster together in tree of ORF1ab, but some of alpha CoVs cluster together with the beta CoVs in tree of S. The identical PPCoV in seven samples indicated that PPCoV is circulating in this site, with a relatively high detection rate of 13.7% in the positive sampling site (Lau et al., 2012b;Wang et al., 2017;Mendenhall et al., 2019). The fact that the respiratory tract of plateau pikas infected with PPCoV showed no obvious pathological change, and that PPCoV was mainly detected in fecal samples with only two positive respiratory samples, suggests a benign enteric tropism for PPCoV, which is similar to that of bat coronavirus HKU10 (Lau et al., 2012a). Molecular-clock analysis estimates that the tMRCA of PPCoV and rodent alphacoronaviruses emerged around the year of 1935. The virus-host co-speciation analysis shows that alphacoronaviruses co-diverged with their hosts over long-time scales and had a complex evolutionary history with host-switching. Moreover, the study reveals that (1) plateau pikas may have been infected with the common ancestor of PPCoV through cross-species transmission from other unknown mammalian hosts; and (2) the genome of PPCoV represents the first genome of alphacoronaviruses from hosts belonging to the order Lagomorpha, thus broadening the host range of alphacoronaviruses into the seventh order. However, we must acknowledge that the time scales estimated by molecular clock and virus-host co-divergence analyses had a huge discrepancy. There are two possibilities for this discrepancy: (1) the molecular clock is not constant (Holmes, 2003;Sharp and Simmonds, 2011). Because the relative rates of substitution are different between sites along the sequence, and could change dramatically both between viruses and lineages, which lead to a substantial underestimation of divergence times (Holmes, 2003). Also, the genome of PPCoV may have changed by combination of multiple substitution and mutation to adapt to host and/ or environmental factors in the long evolutionary history, which influenced the result of tip-dating analysis. Thus, the co-speciation over millions of years between pikas and PPCoV is true. (2) The PPCoV is young as depicted by molecular clocks. It is well-known that cross-species transmissions between closely related host species occurs more easier than between distantly related hosts (Charleston and Robertson, 2002). Thus, PPCoV may in fact infect the pikas through occasional cross-species jump from a closely related animal species recently, and the match between PPCoV and host phylogenies gives a false impression of co-speciation. In addition, a previous report showed that the genus Alphacoronavirus may have originated early in Asian house shrews (Suncus murinus; Wang et al., 2017). Recently, another phylogenetically distinct genome of alphacoronavirus (Shrew-CoV/Tibet2014) was also identified in shrews that belong to order Eulipotyphla (Wu et al., 2018). Results of tMRCA and co-divergence analyses support shrews as emerging hosts of alphacoronaviruses and suggest that alphacoronaviruses may have emerged from Sorex araneus. The role of shrews in alphacoronavirus evolution needs continuous studies.
Although no evidence for recombination was detected for S gene of PPCoV, the recombination of coronaviruses at this site deserves future attention. Previously, we have identified a new deltacoronavirus in birds and marmots at the same sampling site (Zhu et al., 2021b). Interestingly, birds and plateau pikas may inhabit the same caves in the Qinghai-Tibet Plateau of China. Coinfection of different CoVs in the same sites may potentially create opportunities for new recombination and emergence of new viruses.
In conclusion, based on its host, phylogenetic status, and sampling location, PPCoV represents a novel member of the genus Alphacoronavirus. This study indicates that plateau pikas may play a significant and previously unrecognized role in the ecology of alphacoronaviruses, and that their role in coronavirus evolution merits further study.

DATA AVAILABILITY STATEMENT
The viral genome sequences obtained in this study have been deposited in GenBank under accession number MZ577265.

ETHICS STATEMENT
The animal study was reviewed and approved by the Ethics Committee of the National Institute for Communicable Disease Control and Prevention of China CDC.

AUTHOR CONTRIBUTIONS
JX contributed to conception and design of the study. WZ, SL, and SW collected the samples. WZ performed the experiments. WZ, JP, ZL, DJ, and X-lL performed the statistical analysis. JX, WZ, and JY wrote the first draft of the manuscript. JX, JY, and LL acquired the funding. All authors contributed to the article and approved the submitted version.