Genome Analysis of Coxsackievirus A4 Isolates From Hand, Foot, and Mouth Disease Cases in Shandong, China.

Coxsackievirus A4 (CVA4) is one of the most prevalent pathogens associated with hand, foot and mouth disease (HFMD), an acute febrile illness in children, and is also associated with acute localized exanthema, myocarditis, hepatitis and pancreatitis. Despite this, limited CVA4 genome sequences are currently available. Herein, complete genome sequences from CVA4 strains (n = 21), isolated from patients with HFMD in Shandong province, China between 2014 and 2016, were determined and phylogenetically characterized. Phylogenetic analysis of the VP1 gene from a larger CVA4 collection (n = 175) showed that CVA4 has evolved into four separable genotypes: A, B, C, and D; and genotype D could be further classified in to two sub-genotypes: D1 and D2. Each of the 21 newly described genomes derived from isolates that segregated with sub-genotype D2. The CVA4 genomes displayed significant intra-genotypic genetic diversity with frequent synonymous substitutions occurring at the third codon positions, particularly within the P2 region. However, VP1 was relatively stable and therefore represents a potential target for molecular diagnostics assays and also for the rational design of vaccine epitopes. The substitution rate of VP1 was estimated to be 5.12 × 10-3 substitutions/site/year, indicative of ongoing CVA4 evolution. Mutations at amino acid residue 169 in VP1 gene may be responsible for differing virulence of CVA4 strains. Bayesian skyline plot analysis showed that the population size of CVA4 has experienced several dynamic fluctuations since 1948. In summary, we describe the phylogenetic and molecular characterization of 21 complete genomes from CVA4 isolates which greatly enriches the known genomic diversity of CVA4 and underscores the need for further surveillance of CVA4 in China.


INTRODUCTION
Human enteroviruses (EVs), non-enveloped, single-stranded RNA viruses, are taxonomically classified in the genus Enterovirus (Dalldorf, 1953), family Picornaviridae and consist of four species: EV-A, EV-B, EV-C, and EV-D (Knowles et al., 2012). Coxsackievirus A4 (CVA4), a member of the EV-A species, is currently composed of 25 types including the common pathogens enterovirus A71 (EVA71) and Coxsackievirus A16 (CVA16) which are the most prevalent agents isolated from pediatric cases of hand-foot-mouth disease (HFMD), a common contagious disease among children which sporadically occur worldwide .
The CVA4 prototype strain, High Point (GenBank accession number: AY421762), was first isolated from urban sewage during a poliomyelitis outbreak in North Carolina, United States in 1948 (Melnick, 1950;Oberste et al., 2004). In Carey and Myers (1969), another CVA4 strain was isolated from the serum of a child with fatal illness marked by severe central nervous system manifestations. In Estrada Gonzalez and Mas (1977), found that CVA4 was associated with the occurrence of acute polyradiculoneuritis. Notably, CVA4 was also reported as pathogens in HFMD outbreaks (Hu et al., 2011). Associations between CVA4 infection and herpangina (Cai et al., 2015), mucocutaneous lymph node syndrome (Ueda et al., 2015), and bilateral idiopathic retinal vasculitis (Mine et al., 2017) have also been reported. These clinical findings highlight that CVA4 infections exert a disease burden to global public health, especially for neonates and young children which necessitates a greater understanding of the genetic diversity of this pathogen.
Few complete genomes from CVA4 strains were sequenced prior to 2004 and were predominantly from Asia and the United States (Oberste et al., 2004). However, in 2004 and 2006, CVA4 caused two HFMD epidemics in Taiwan (Chu et al., 2011). After 2006, a greater number of CVA4 strains were isolated in Europe and Asia, with the majority from China. For example, seven samples from acute flaccid paralysis (AFP) patients in Shaanxi province collected between 2006 and 2010 were diagnosed with CVA4 infection . Since 2008, predominantly partial VP1 and a far smaller number of complete genome sequences of multiple CVA4 strains have been reported from clinical specimens from pediatric HFMD cases in mainland China, such as Gansu (Liu et al., 2009), Shenzhen (Hu et al., 2011;Cai et al., 2015), Guangdong (Zhen et al., 2014), and Beijing .
The CVA4 RNA genome (∼7434 nt) can be sub-divided into the 5 -untranslated region (UTR), 3 -UTR, and the open reading frame (ORF) (∼6606 nt) encoding one polyprotein comprising four structural proteins (VP4, VP2, VP3, and VP1 in region P1) and seven non-structural proteins (2A, 2B, and 2C in region P2, and 3A, 3B, 3C, and 3D in region P3). The VP1 gene is typically employed for taxonomic classification of EV types within the genus Enterovirus (Palacios et al., 2002). Currently, few fulllength genome sequences of CVA4 (n = 10) are available in the GenBank database, which limits the understanding of CVA4 genome evolution and necessitates a greater understanding of the genetic diversity in this pathogen to provide an evidence base for the rational development of molecular diagnostics, drug development and vaccine design.
In the present study, we describe the full-length genome sequences (n = 21) from CVA4 isolates from pediatric HFMD patients from Shandong province, China identified between 2014 and 2016 and the phylogenetic and molecular characterization of CVA4 VP1 sequences (n = 175). These findings deepen our understanding of CVA4 diversity and evolution and have important implications to mitigate global disease burden attributable to this pathogen.

Clinical Samples
In this study, stool specimens from children <6 years of age presenting with HFMD were collected between 2014 and 2016, and EV-positive clinical samples which were laboratory-confirmed as non-EVA71 and non-CVA16 infections were kindly provided by the Shandong Center for Disease Control and Prevention.

CVA4 Isolation and RNA Extraction From Clinical Samples
Human rhabdomyosarcoma (RD) cells were grown in MEM (minimal essential medium, Gibco) supplemented with 10% FBS (fetal bovine serum, Gibco), 50 IU/mL of penicillin and 50 µg/mL of streptomycin. The samples were propagated three times in human RD cells at 37 • C in a humid atmosphere under 5% CO 2 .
Total RNA was extracted from the RD supernatant after development of cytopathic effect (CPE) using the RNAiso Plus kit (TaKaRa, 9109). RNA was reverse transcribed into cDNA using random hexamers with the PrimeScript RT Reagent kit (TaKaRa, RR037A). TaqMan-based real-time PCR assays were performed to detect the presence of EV using an ABI 7500 Real-Time PCR System, as previously described (Zhang et al., 2017a,b). The EV positive samples (n = 21) were sequenced using the MiSeq high-throughput sequencing platform, and the sequencing data were analyzed using the CLC program to obtain the complete viral genome sequences. Non-EVA71 and non-CVA16 samples from HFMD cases were further tested by a CVA4-specific real-time PCR assay employing oligonucleotide primers and probe designed based on the VP1 gene of the prototype CVA4 strain (GenBank accession number: AY421762): CVA4-F: TATGGGCTTTGTCCAACTCC, CVA4-R: GTCTAGGGACCCATGCCCTCACT, CVA4-probe: FAM-TGGGGACATTTTCAGCTAGAGTTGTGAGCAAG-BHQ1.

Phylogenetic Analysis of CVA4
For phylogenetic analysis, two datasets including all available complete genomic sequences (n = 10) and the full-length sequences of VP1 gene (n = 154) of CVA4 strains were downloaded from GenBank. Multiple sequence alignments for the two datasets were performed using Muscle (Edgar, 2004), together with the Shandong CVA4 sequenced genomes (n = 21). The nucleotide substitution model was chosen using jModeltest (Darriba et al., 2012), and the general time reversible (GTR) model was always the best model for analysis of both datasets. Phylogenetic analysis of the two datasets was conducted using RAxML v8.1.6 (Stamatakis et al., 2005), with the GTRGAMMA nucleotide substitution model with 1000 bootstrap replicates. The nucleotide and amino acid distances were estimated using MEGA 5 (Tamura et al., 2011).

Genetic Diversity Along the CVA4 Polyprotein Genes
The average pairwise genetic diversity along the CVA4 genomes was calculated using Phylip with a sliding window of 300 nucleotides and a step size of 50 nucleotides (Faria et al., 2017). To identify at which site of the codons that nucleotide variations mostly occur, the Shannon entropy (Sn) at positions 1, 2, and 3 of each codon of the aligned polyprotein genes and the VP4, VP2, VP3, VP1, 2A, 2B, 2C, 3A, 3B, 3C, and 3D genes, respectively, were calculated as reported previously (Ramirez et al., 2013). The data processing was performed on the open source R environment (R Development Core Team, 2012), using the ggplot2 package.

Evolutionary Dynamics of Global CVA4 Strains
To determine the molecular clock-like structure in the CVA4 VP1 data, the maximum likelihood tree of the VP1 gene sequences estimated in the previous step was used to study the association between sequence divergence and sampling times using TempEst (Rambaut et al., 2016). To better understand the evolutionary dynamics of CVA4, the VP1 sequence dataset (n = 175) was used to estimate the nucleotide substitution rate using Bayesian Markov chain Monte Carlo (MCMC) sampling implemented in the BEAST v1.8.4 package (Drummond and Rambaut, 2007). The SRD09 model was employed as the nucleotide substitution model. The Bayesian MCMC process was run for 100 million steps, with a sampling frequency of every 5,000 steps and the first 10% of the steps were discarded as burn-in. To select the best combination of the molecular clock and tree prior, both path sampling and stepping-stone sampling (Baele et al., 2012) were employed. The best model combination was a Bayesian skyline tree prior and an uncorrelated relaxed molecular clock model, with log-normally distributed variation in rates among branches (Supplementary Table S1) (Drummond et al., 2005). The posterior distributions were evaluated using Tracer v1.6 1 where the estimated sample size (ESS) was no less than 200. The Bayesian maximum clade credibility (MCC) tree was generated using TreeAnnotator v1.8.4, and then visualized using FigTree v1.4.3 2 . In addition, we performed a Bayesian skyline plot analysis using Tracer v1.6 program to reconstruct the evolutionary history based on the 175 full-length VP1 gene sequences of CVA4 strains.

Dataset Summary
In this study, a total of 21 CVA4 strains were isolated from stool specimens of HFMD patients in Shandong Province, including five from Liaocheng, three from Heze, two each from Dezhou, Linyi, Weihai, Yantai and single samples from Binzhou, Laiwu, Taian, Qingdao and Rizhao (Figure 1 and Supplementary  Table S2). The whole genome sequences of the 21 isolates have been deposited into the GenBank database: MH086030-MH086050 (Supplementary Table S2).
In the maximum likelihood tree of VP1 genes (Supplementary Figure S2), the CVA4 isolates were divided into four highly supported genotypes: A (n = 1), B (n = 1), C (n = 28), and D (n = 145). The prototype High Point strain isolated from the United States in 1948 was classified as genotype A, and one isolate from Kenya in 1999 was designated as genotype B. The CVA4 strains segregating with genotype C were collected from Russia during 2006-2014 (n = 14), India in 2010 (n = 8), the United States in 1999 (n = 1) and 2015 (n = 1), Sichuan province, China in 2006 (n = 1) and 2007 (n = 1), Azerbaijan (n = 1) and Turkmenistan (n = 1), respectively. Strikingly, genotype D was the predominant genotype in China. Genotype FIGURE 1 | The maximum likelihood phylogenetic tree of the CVA4 whole genome sequences (n = 31). The 21 CVA4 isolates described in this study are colored in rose and the nodes of the reference stains downloaded from GenBank are colored in lavender. The three clades (I, II, and III) are colored in blue, purple and light green, and the prototype strain CVA4 High Point (AY421762) is colored in red.
D could be further divided into two highly supported subgenotypes: D1 (n = 5) and D2 (n = 140). In detail, five Chinese strains isolated before 2006 belonged to sub-genotype D1, and the remaining Chinese strains clustered within sub-genotype D2, including the 21 isolates described in the present study. However, our sequenced strains did not cluster together, and were interspersed within sub-genotype D2. Notably, there was one CVA4 strain from Russia, KC879539/40238/Russia/2011, falling within sub-genotype D2, and sharing high homology (97.5%) with KY978552/11142/SD/China/2011.

Pairwise Genetic Diversity Along the CVA4 Genome
Employing a sliding windows analysis across CVA4 genomes (n = 31), we found that the pairwise genetic diversity at regions P2 (including the 2A, 2B, and 2C genes) and P3 (including 3A, 3B, 3C, and 3D genes) was higher than that at region P1 (including VP4, VP2, VP3, and VP1 genes) (Figure 2A). In addition, 3A, 3B and the 3 terminus of 3D showed higher pairwise genetic diversity than other gene regions. The polyprotein gene of the 21 CVA4 genomes from the present study had 82.6-99.2% nucleotide sequence homology and a corresponding 97.1-99.9% amino acid sequence homology, suggesting the occurrence of numerous synonymous mutations. Consistent with this observation, the mean Sn at the third position of the codons of the polyprotein gene of the CVA4 strains was 0.224, significantly higher than those at positions 1 and 2 of the codons (Figure 2B). Furthermore, the mean Sn values at the third position of the codons of 2B, 2C, 3A, 3C, and 3D genes were significant higher than those of the VP1, VP2, VP3, and VP4 genes in the region P1 ( Figure 2C).

Evolutionary Dynamics of Worldwide CVA4
The Bayesian phylogenetic tree of all CVA4 VP1 genes (n = 175) reconstructed employing the best-fit parameter model revealed the same topology as the maximum likelihood phylogenetic tree (Figure 3). The Tempest analysis of molecular clock structure revealed a strong temporal correlation in the VP1 gene (R 2 = 0.86). The mean nucleotide substitution rate for the CVA4 VP1 genes was estimated to be 5.12 × 10 −3 substitutions/site/year (95% highest posterior density (HPD): 4.45 × 10 −3 -5.81 × 10 −3 substitutions/site/year). Around the year 1941 (95% HPD: 1931HPD: -1948, the CVA4 viruses were estimated to diverge into genotype A and the common ancestry of genotypes B, C, and D. Genotype B was suggested to have diverged from the common ancestry around 1951 (95% HPD: 1933HPD: -1966, and finally genotypes C and the genotype D diverged around 1970 (95% HPD: 1961HPD: -1977. Furthermore, genotype D was further divided into sub-genotypes D1 and D2 which occurred approximately in 1979 (95% HPD: 1974HPD: -1985, and the time to the most recent common ancestor (tMRCA) of the sub-genotype D2 could be traced back to 2001(95% HPD: 1999-2003. Additionally, we performed a Bayesian skyline plot analysis to reconstruct the demographic history of CVA4 based on the VP1 gene sequences (Figure 3). Our results showed that the effective population size (analog to the number of infected individuals) of CVA4 virus experienced five major stages since the prototype strain High Point was described in 1948 (Figure 3). The first stage covered 1948 to 2000 where the population size was relatively constant, with evidence of a slight increase. During this stage, genotypes B, C, and D were already present before 1975, as well as the emergence of genotype D1 in the early 1980s and the subsequent co-circulation of genotypes B, C, and sub-genotype D1. In the second stage, the population size then experienced a sharp and rapid decrease from 2000 to approximately mid-2003 possibly due to a reduction in the prevalence of genotype B and sub-genotype D1 (or potentially a lack of surveillance data), as genotype C and sub-genotype D2 diversified during this period. Soon after mid-2003, the population size of CVA4 experienced a remarkable and rapid growth until 2010, with the widespread circulation of genotype C in Russia and in India, and sub-genotype D2 in China. After 2010, there was another sharp and abrupt decrease in the effective population size which may have been caused by the decreased prevalence of genotype C. Since 2011, CVA4 appears to have entered a rebound stage with the population size becoming stable again with sub-genotype D2 dominating in China.

DISCUSSION
Although EVA71 and CVA16 are known as the major causative agents of HFMD in China, a number of other EV-A species including CVA6, CVA10, CVA2, and CVA4 have been reported to be associated with HFMD 3 . In addition, CVA4 was also one of the common pathogens associated with cases of aseptic meningitis, herpetic angina, and viral myocarditis (Carey and Myers, 1969;Estrada Gonzalez and Mas, 1977;Lee et al., 2014;Wang et al., 2016), which has raised serious public health concern and speculation as to whether there is an altered tropism or pathogenicity associated with this CV type. However, detailed genome analysis has been problematic because there were only ten complete CVA4 genomes available in the GenBank database to date (Hu et al., 2011). Therefore, the molecular epidemiological characterization of CVA4 remains far well less understood due to a paucity of available genomic data.
In this study, we have described 21 novel CVA4 genomes isolated from children with HFMD and revealed that our isolates clustered into three highly supported clades in a maximum likelihood tree estimated using the whole genomes. This suggested that there was a certain genetic diversity in the Chinese CVA4 viruses, which would have been underestimated due to the limited surveillance data available. Phylogenetic analysis of the VP1 gene sequences showed that the CVA4 strains could be classified into four separable genotypes, A-D,   Contributes to viral infectivity in vitro and mouse lethality in vivo (Huang et al., 2012) K149I Alters the tropism in a receptor-dependent or -independent manner (Miyamura et al., 2011) Efficient virus replication in Chinese hamster ovary cells (Chua et al., 2008) VP1 Important residue for mouse adaptation  Increased virulence and neuro-tropism in adult interferon-deficient mice (Caine et al., 2016) Resulting in a strong temperature-sensitive phenotype (Arita et al., 2005) C363I C C C (n = 20), Y (n = 1) a Numbering according to EVA71 (U22521).
with a mean within genotype genetic distance of 0.00-13.2% and the mean between genotype genetic distance of 20.4-28.9%. Therefore, consistent with previous reports (Rico-Hesse et al., 1987;Hu et al., 2011), different CVA4 genotypes could be defined with >15% of between genotype nucleotide distance in the VP1 gene. However, in contrast with a previous report (Hu et al., 2011), the prototype strain High Point formed an independent branch as genotype A in our classification. The vast majority of the Chinese strains including the 21 isolates in this study belonged to genotype D, with just two strains identified from Sichuan province in 2006 and 2007 belonging to genotype C. In addition, based on the available evidence sub-genotype D2 appears to have supplanted D1 and become predominant in China over the previous decade. Further analysis showed that mutations were more likely to occur in the P2 and P3 regions encoding non-structural proteins and at the third position of the codons, consistent with previous reports that there was a general synonymous codon usage pattern for many types of enteroviruses (Liu et al., 2011;Ma et al., 2014;Zhang et al., 2014;Su et al., 2017). Moreover, the VP1 gene was the most conserved region in our analysis, suggesting that it may be a potential target to design a sensitive and CVA4-specific RT-qPCR diagnostic assay and may also be a candidate epitope for development of vaccines.
Since the mid-1950s, the genetic diversity of CVA4 has gradually increased from 1948 to 1999 and experienced a sharp increase between mid-2003 and 2009, paralleling the increased prevalence of CVA4 in China. However, the population size was estimated to have experienced more substantial declines during 1999 and mid-2003 and from 2009 to 2011 than those in a previous report (Chen et al., 2018). Considering that the precision of demographic reconstruction analysis is sensitive to sampling, pinpointing the abrupt decrease of population size of CVA4 between 1999 and mid-2003 would require further sampling efforts. Therefore, national surveillance of various human enteroviruses causing HFMD is urgently needed to elucidate the complex evolutionary dynamics and co-circulation of the enteroviruses to provide an evidence base for rational diagnostic and prophylactic approaches to mitigate disease burden.
The mean evolutionary rate of the CVA4 VP1 gene was estimated to be 5.12 × 10 −3 substitutions/site/year, slightly lower but of a similar range to 6.4 × 10 −3 substitutions/site/year estimated by Chen et al. (2018). We estimated the common ancestor of known CVA4 to have existed approximately 75 years ago, also similar to a previous estimate (71.8 years ago) (Tee et al., 2010). Interestingly, the evolutionary rate of EVA71 VP1 was estimated to be around 4.6 × 10 −3 substitutions/site/year for both EVA71 genogroups B and C (Tee et al., 2010;Chen et al., 2018). The evolutionary rate of CVA6 was estimated at 8.1 × 10 −3 substitutions/site/year (Puenpa et al., 2016), and that of CVA16 was 6.7 × 10 −3 substitutions/site/year (Zhao et al., 2016). Therefore, CVA4 appears to have been evolving at a similar rate to the other major HFMD pathogens, which deserves further attention.
Coxsackievirus A4 shared phenotype-associated amino acid substitutions when compared to the EVA71 prototype strain, BrCr. Of particular note, the VP1 169F identified in EVA71 may have conferred viral binding ability to the mammalian SCARB2 receptor, also found in the CVA4 isolates (Yamayoshi et al., 2012). In addition, there were several sites with different amino acid motifs between BrCr and the CVA4 viruses. However, it should be noted that although the biological relevance of these sites with specific amino acid substitutions have been confirmed in EVA71, they were identified through sequence comparison between CVA4 and EVA71, Therefore, the effects of these critical amino acids in CVA4 on receptor binding, viral infectivity, replication mammalian adaptation and pathogenesis warrant further studies.
Additionally, we also detected potential genetic recombination events within the CVA4 genomes (n = 31) using the Recombination Detection Program (RDP) v4.16 (Martin et al., 2015) and Simplot v3.5.1 (Hu et al., 2011). However, we failed to detect convincing recombination events in the 21 newly sequenced CVA4 viruses (data not shown). Overall it appears that genetic substitution, rather than recombination, has played a more important role in the evolution and diversification of CVA4, despite recombination events being frequently identified in other human enteroviruses (Hu et al., 2011). However, limited surveillance data may also have led to an underestimation of the number of genetic recombination events on CVA4 evolution.
In summary, we have described 21 novel CVA4 genome sequences, which greatly enriches the available genomic data for CVA4. Phylogenetic analysis revealed that genotypes C, D1 and D2 were co-circulating in the early 2000s and D2 has now supplanted D1 to become the predominant sub-genotype in China. The genetic diversity among Chinese CVA4, occurred at regions encoding the non-structural proteins and at the third wobble position of the codons. In contrast, fewer mutations occurred at the VP1 gene region, which makes it an optimal candidate region for design of molecular assays and potentially as a conserved epitope for vaccination. Notably, CVA4 may possess higher binding affinity to SCARB2 because of the substitution L169F in the VP1 protein and how this substitution affects the viral infectivity and clinical outcome should be investigated in the future. In addition to monitoring EVA71 and CVA16, enhanced surveillance of increasingly prevalent and also virulent agents, including CVA4, in China is urgently needed to better understand the dynamics of circulation and evolution of human enteroviruses, which will provide invaluable information for diagnostic and prophylactic approaches to contain disease.

AUTHOR CONTRIBUTIONS
MW, JL, and W-FS designed the study, participated in all tests and drafted the manuscript. M-XY, Y-WZ, S-JD, and X-JW participated in collecting and testing samples. TH, MC, SD, X-CZ, Z-JZ, HZ, and Y-GT conceived the study, contributed to the analysis of the results and preparation of revised manuscript versions. All authors read and approved the final manuscript.