Sign of APOBEC editing, purifying selection, frameshift, and in-frame nonsense mutations in the microevolution of lumpy skin disease virus

The lumpy skin disease virus (LSDV), which mostly affects ruminants and causes huge economic losses, was endemic in Africa, caused outbreaks in the Middle East, and was recently detected in Russia, Serbia, Greece, Bulgaria, Kazakhstan, China, Taiwan, Vietnam, Thailand, and India. However, the role of evolutionary drivers such as codon selection, negative/purifying selection, and APOBEC editing, and of genetic variations such as frameshift and in-frame nonsense mutations, in the LSDVs that cause outbreaks in cattle in various countries is still largely unknown. In the present study, frameshift mutations in the LSDV035, LSDV019, LSDV134, and LSDV144 genes and in-frame nonsense mutations in the LSDV026, LSDV086, LSDV087, LSDV114, LSDV130, LSDV131, LSDV145, LSDV154, LSDV155, LSDV057, and LSDV081 genes were revealed among different clusters. Based on the available complete genome sequences, the prototype wild-type cluster-1.2.1 virus has been found outside Africa only in India, the wild-type cluster-1.2.2 viruses found in Africa have spread outside Africa, and the recombinant viruses are spreading only in Asia and Russia. Although LSD viruses circulating in different countries form specific clusters, the viruses detected in each country are distinguished by frameshift and in-frame nonsense mutations. Furthermore, the present study shows that the selection pressure for codon usage bias is mostly exerted by purifying selection, and that this process is possibly caused by APOBEC editing. Overall, the present study sheds light on microevolution in LSDV and is expected to help future studies of disturbed ORFs, epidemiological diagnostics, attenuation/vaccine reversion, and prediction of the evolutionary direction of LSDVs.


Introduction
Lumpy skin disease is caused by the lumpy skin disease virus (LSDV), a double-stranded DNA virus, and leads to significant economic losses in many countries through decreased milk production, abortions, infertility, reduced sperm quality, and damaged hides in cattle (Abutarbush et al., 2015; Lu et al., 2021; Kumar and Tripathi, 2022; Shumilova et al., 2022). [...] pathogenicity, and virulence (Biswas et al., 2020; Senkevich et al., 2020, 2021; Brennan et al., 2022). In a few viruses recently detected in LSDV outbreaks, the LSDV019 and LSDV144 genes have each been reported as two fragments (LSDV019a and LSDV019b; LSDV144a and LSDV144b) resulting from frameshift mutations (Kumar et al., 2023). Nevertheless, frameshift mutations, in-frame nonsense mutations, codon selection pressure, positive selection pressure, negative/purifying selection pressure, and APOBEC editing among the different clusters of LSD viruses remain largely unknown. Understanding such evolutionary developments will help to determine which types of viruses, and which genetic variations, are spreading in different geographical areas.
In the present study, frameshift mutations and in-frame nonsense mutations in LSD viruses were identified by systematically analyzing almost all complete LSDV genome sequences in the NCBI public database. The analysis also revealed signs of selection pressure, purifying selection, and APOBEC editing in the microevolution of these viruses.

Data collection and data curation
In the present study, almost all complete genome sequences of LSD viruses were retrieved from the NCBI public database. The complete genome sequences were aligned using the MAFFT 7.407_1 alignment program with a gap extension penalty of 0.123 and a gap opening penalty of 1.53 (Katoh and Standley, 2013; Mareuil et al., 2017; Lemoine et al., 2019). Further, the quality of the alignments was checked and curated using LAST hit plots in the MAFFT version 7 online server (https://mafft.cbrc.jp/alignment/server/) with a score of 39 (E = 8.4e-11) (Katoh and Frith, 2012; Katoh et al., 2019). When these nucleotide sequences were aligned, three sequences submitted to the NCBI public database from India, OK422492.1/India/2019/Ranchi-1/P10, OK422493.1/India/2019/Ranchi-1/P3, and ON400507.1/208/PVNRTVU/202, showed the highest nucleotide diversity. Subsequent LAST hit plot analysis revealed that these three sequences had been submitted as reverse complements relative to the NCBI reference sequence NC_003027.1_LSDV_NI-2490 (Supplementary Figures S1A-D). Therefore, throughout the present study, these three sequences were converted to their reverse complements before analysis. In addition, LSDV complete genome sequences were recovered from the SRA run files SRR21590382, SRR21590384, SRR21590385, SRR21590386, and SRR21590383, submitted to the NCBI public database from India. Briefly, these SRA run files were subjected to quality control using Trimmomatic (Bolger et al., 2014) to remove low-quality reads and potential adapter sequences (Vilsker et al., 2019). From the reads that passed quality control, virus-specific reads were filtered using the protein-based alignment method DIAMOND (Buchfink et al., 2015) and de novo assembled using metaSPAdes (Bankevich et al., 2012), and the assembled virus sequences were identified using Blastx and Blastn against the NCBI RefSeq virus database. Further, individual contigs were aligned in the advanced genome aligner (AGA) (Deforche, 2017), and the consensus variant callers GATK/BcfTools were used in the analyses. The confirmed annotated variants, SNPs, and mismatches with the raw reads are presented in Supplementary Data 10.
Frontiers in Microbiology, doi: 10.3389/fmicb.2023.1214414, frontiersin.org
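The orientation correction described in this section can be illustrated with a minimal Python sketch. It is not the authors' procedure: the study identified the reversed submissions from LAST hit plots, whereas this sketch substitutes a simple shared k-mer count to decide orientation. The function names and the k-mer size are hypothetical choices made for illustration only.

```python
# Complement table for DNA; N is kept as N so ambiguous bases survive.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_like_reference(seq: str, ref: str, k: int = 8) -> str:
    """Return seq in the orientation that shares more k-mers with ref.

    Hypothetical helper: a k-mer heuristic standing in for the LAST hit
    plot inspection used in the study.
    """
    ref_kmers = {ref[i:i + k].upper() for i in range(len(ref) - k + 1)}

    def shared(s: str) -> int:
        s = s.upper()
        return sum(1 for i in range(len(s) - k + 1) if s[i:i + k] in ref_kmers)

    rc = reverse_complement(seq)
    # Keep the submitted orientation unless the reverse complement fits better.
    return rc if shared(rc) > shared(seq) else seq
```

For real genomes a larger k (for example 15 to 21) would be a more sensible assumption, since short k-mers recur by chance in a genome of roughly 150 kb.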

Phylogenetic analysis
In the present study, the complete genome nucleotide sequences of LSD viruses were aligned in MAFFT 7.407_1, and phylogenetic analysis was then performed in PhyML 3.3_1 (Galaxy Version 3.3_1) (Mareuil et al., 2017). The analysis used the GTR evolutionary model, a discrete gamma model with four categories (n = 4), empirical equilibrium frequencies, and a subtree pruning and regrafting (SPR) tree topology search with optimization of tree topology, model parameters, and branch lengths; branch support was tested with the approximate Bayes method. Subsequently, the phylogenetic tree generated with the above parameters was visualized in the Interactive Tree of Life (iTOL) v5 (Letunic and Bork, 2021).

Net between group mean distance analysis
In the present study, the whole genome nucleotide sequences of the LSD viruses were aligned in MAFFT 7.407_1, and the aligned nucleotide sequences were then used for the net between group mean distance (NBGMD) analysis in MEGA7 (Kumar et al., 2016). The analysis used the Kimura two-parameter model with transition + transversion substitutions and a gamma distribution (shape parameter = 5); gaps/missing data were handled by pairwise deletion, and the standard errors were calculated by bootstrap with 1,000 replicates. The calculated standard errors are presented above the diagonal of the result table.
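As a rough illustration of the distance measure named here, the following Python sketch computes a Kimura two-parameter distance with pairwise deletion and an optional gamma shape parameter, and then a net between-group mean distance as the between-group mean minus the average of the two within-group means. It is a minimal sketch under the textbook formulas and is not a reimplementation of MEGA7; the function names are hypothetical and the bootstrap standard errors are omitted.

```python
import math
from itertools import combinations, product

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2, gamma_shape=None):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps and ambiguous sites
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared      # proportion of transitions
    q = transversions / compared    # proportion of transversions
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    if gamma_shape is None:
        return abs(-0.5 * math.log(w1) - 0.25 * math.log(w2))
    a = gamma_shape
    return (a / 2) * (w1 ** (-1 / a) + 0.5 * w2 ** (-1 / a) - 1.5)


def mean_within(group, **kw):
    """Mean pairwise distance inside one group (0.0 for a single sequence)."""
    pairs = list(combinations(group, 2))
    return sum(k2p_distance(x, y, **kw) for x, y in pairs) / len(pairs) if pairs else 0.0


def net_between_group_distance(group_x, group_y, **kw):
    """Between-group mean distance minus the mean of the within-group means."""
    between = [k2p_distance(x, y, **kw) for x, y in product(group_x, group_y)]
    d_xy = sum(between) / len(between)
    return d_xy - (mean_within(group_x, **kw) + mean_within(group_y, **kw)) / 2
```

Calling `net_between_group_distance(cluster_a, cluster_b, gamma_shape=5)` would mirror the shape parameter stated in the text, with `cluster_a` and `cluster_b` being lists of aligned sequences.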

SimPlot analysis
The SimPlot 3.5.1 (Desingu et al., 2022a,b) tool was used to determine the percent identity/similarity of the different clusters of LSD viruses against reference sequences. In this study, the whole genome nucleotide sequences of the LSD viruses were aligned in MAFFT 7.407_1. The aligned nucleotide sequences were then exported to SimPlot 3.5.1 for the subsequent analysis using the Kimura (two-parameter) method with a window of 500 base pairs and a step of 50 base pairs.
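The sliding-window idea behind a similarity plot can be sketched in a few lines of Python. This is a simplified illustration and not SimPlot itself: it reports uncorrected percent identity per window rather than a Kimura two-parameter corrected similarity, and the function name is hypothetical. The window and step defaults follow the values given in the text.

```python
def window_similarity(query, reference, window=500, step=50):
    """Percent identity of query vs reference in sliding windows.

    Returns a list of (window midpoint, percent identity) tuples computed
    over alignment columns; columns with a gap in either sequence are skipped.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window].upper()
        r = reference[start:start + window].upper()
        compared = matches = 0
        for a, b in zip(q, r):
            if a == "-" or b == "-":
                continue
            compared += 1
            matches += a == b
        identity = 100.0 * matches / compared if compared else float("nan")
        results.append((start + window // 2, identity))
    return results
```

Plotting the returned midpoints against the identities for each cluster, all measured against the same reference, gives the kind of curve in which a recombinant genome switches between tracking one parent and the other.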

Recombination detection program analysis
The complete genome sequences of LSDVs were aligned in MAFFT 7.407_1 and then exported to RDP4 (Martin et al., 2015) for the recombination analysis. The recombination analysis was performed using default parameter values for the RDP, GENECONV, BOOTSCAN, Chimaera, 3seq, SISCAN, and MaxChi methods, and possible recombination was assessed with a minimum of four methods using a Bonferroni-corrected p-value cut-off of 0.05.

Measurement of nucleotide/amino acid mismatch, transition/transversion, and silent/non-silent mutation
Nucleotide/amino acid mismatches, transitions/transversions, and silent/non-silent mutations were measured in the different clusters of LSD viruses against the reference sequence at the complete genome level, at the gene level, and separately for genes transcribed in the forward and reverse directions, using the Highlighter tool (Keele et al., 2008) with or without similarity sorting of the sequences and with or without treating gaps as a character. Additionally, the nucleotide/amino acid mismatches were visualized with the online Variant Visualizer. The reference sequences used in these analyses are displayed in the respective figures.
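The mismatch categories measured here can be made concrete with a small Python sketch that tallies substitutions between an aligned reference and query, splits them into transitions and transversions, and reports the two directional transition classes discussed later in the paper. It is an illustrative sketch and not the Highlighter tool; the function names and the dictionary keys are hypothetical.

```python
from collections import Counter

# The four transition substitutions (purine<->purine, pyrimidine<->pyrimidine).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def substitution_spectrum(reference, query):
    """Count (reference base, query base) substitutions in an alignment."""
    spectrum = Counter()
    for r, q in zip(reference.upper(), query.upper()):
        # Gaps and ambiguous bases are ignored.
        if r in "ACGT" and q in "ACGT" and r != q:
            spectrum[(r, q)] += 1
    return spectrum


def summarize(spectrum):
    """Summarize a substitution spectrum into the classes used in the text."""
    total = sum(spectrum.values())
    ts = sum(n for pair, n in spectrum.items() if pair in TRANSITIONS)
    return {
        "transitions": ts,
        "transversions": total - ts,
        "G>A+C>T": spectrum[("G", "A")] + spectrum[("C", "T")],
        "A>G+T>C": spectrum[("A", "G")] + spectrum[("T", "C")],
    }
```

Dividing `"G>A+C>T"` by `"A>G+T>C"` gives the kind of directional ratio that the results section reports as roughly threefold.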

Measurement of APOBEC motif mutations and dN/dS ratio
The APOBEC motif mutations in the complete genome sequences of the LSD viruses and in genes transcribed in the forward/reverse directions were determined against the reference sequences in the Hypermut 2.0 tool with customized options (Rose and Korber, 2000). Next, the dN/dS ratio in the LSD virus genes transcribed in the forward/reverse directions was measured in SNAP v2.1.1 (Ota and Nei, 1994; Ganeshan et al., 1997; Korber, 2000). The reference sequences used in these analyses are displayed in the respective figures.
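A motif analysis of this kind can be approximated by recording the dinucleotide context of each G → A and C → T substitution. The Python sketch below is a hedged illustration and not Hypermut 2.0: it assumes that the context base is read from the reference sequence (downstream of a mutated G, upstream of a mutated C), which may differ from the customized options the authors used, and the function name is hypothetical.

```python
from collections import Counter


def apobec_context_counts(reference, query):
    """Count G>A and C>T substitutions by dinucleotide context.

    Keys look like 'GA>AA' (G>A followed by A) or 'TC>TT' (C>T preceded
    by T). Assumption: the neighbouring base is taken from the reference.
    """
    ref = reference.upper()
    qry = query.upper()
    counts = Counter()
    for i, (r, q) in enumerate(zip(ref, qry)):
        if r == "G" and q == "A" and i + 1 < len(ref):
            nxt = ref[i + 1]
            if nxt in "ACGT":
                counts[f"G{nxt}>A{nxt}"] += 1
        elif r == "C" and q == "T" and i > 0:
            prev = ref[i - 1]
            if prev in "ACGT":
                counts[f"{prev}C>{prev}T"] += 1
    return counts
```

Because a C → T change on one strand is a G → A change on the other, pairs such as `TC>TT` and `GA>AA` describe the same motif seen from opposite strands, which is why the text reports them together.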

Nucleotide sequence composition analysis
The nucleotide composition (A%, T%, G%, and C%) of the complete genome sequences of the LSD viruses and of genes transcribed in the forward/reverse directions was determined in MEGA7 (Kumar et al., 2016) and the Automated Codon Usage Analysis (ACUA) software (Vetrivel et al., 2007).
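The composition measure itself is simple enough to sketch directly. The Python below is an illustration only and stands in for neither MEGA7 nor ACUA; the function names are hypothetical, and gaps or ambiguous bases are assumed to be excluded from the total.

```python
from collections import Counter


def nucleotide_composition(seq):
    """Return the percentage of A, T, G, and C among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: 100.0 * counts[base] / total for base in "ATGC"}


def at_content(seq):
    """Return the combined A + T percentage."""
    comp = nucleotide_composition(seq)
    return comp["A"] + comp["T"]
```

Applied separately to the concatenated forward-strand genes, the reverse-strand genes, and the whole genome, this yields the three composition profiles compared in the results.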

Effective number of codons
The effective number of codons (ENc) used, out of the 61 codons for the 20 amino acids, was determined for the LSD virus genes transcribed in the forward/reverse directions in version 6 of the DNA Sequence Polymorphism software (DnaSP 6) (Rozas et al., 2017).
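For readers unfamiliar with ENc, the sketch below implements the widely used formulation by Wright, in which codon homozygosity is averaged within each degeneracy class and combined as 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. This is an assumption about the definition: the text does not state which variant DnaSP 6 applies, and the function name and error handling are hypothetical choices.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effective_number_of_codons(cds):
    """Effective number of codons (20 = extreme bias, 61 = no bias)."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    by_aa = defaultdict(dict)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            by_aa[aa][codon] = counts.get(codon, 0)
    f_by_class = defaultdict(list)
    for codon_counts in by_aa.values():
        degeneracy = len(codon_counts)
        n = sum(codon_counts.values())
        if degeneracy == 1 or n <= 1:
            continue  # Met/Trp carry no information; n <= 1 is undefined
        sum_p2 = sum((c / n) ** 2 for c in codon_counts.values())
        f_by_class[degeneracy].append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # If isoleucine is absent, approximate F3 by the mean of F2 and F4.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    enc = 2.0
    for degeneracy, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        f = mean_f.get(degeneracy)
        if f is None or f <= 0:
            raise ValueError("sequence too short to estimate ENc")
        enc += n_amino_acids / f
    return min(enc, 61.0)
```

Short genes often lack enough codons per amino acid for a stable estimate, which is one reason the measure is usually computed on concatenated or reasonably long coding sequences.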

Genetic diversity at the complete genome levels
To understand the phylogenetic relationships of LSD viruses at the whole genome level, we retrieved almost all LSDV complete genome sequences from the NCBI public database and performed phylogenetic analysis. In the phylogenetic analysis, viruses related to the wild-type clustered into cluster-1.2, viruses related to the vaccine into cluster-1.1, and recombinant viruses into the recombinant cluster (Figure 1A), as reported by previous studies (van Schalkwyk et al., 2020; Krotova et al., 2022a,b; Ma et al., 2022). Consistent with this, we also observed recombination events in the recombinant viruses by recombination detection program (RDP) analysis (Supplementary Figures S1-S3). Further, based on the topology of the phylogenetic tree, we divided the cluster-1.1 viruses into five sub-clusters, namely cluster-1.1.1 to 1.1.5 (Figure 1A), and similarly the cluster-1.2 viruses into three sub-clusters, namely cluster-1.2.1 to 1.2.3 (Figure 1A) (Ma et al., 2022; van Schalkwyk et al., 2022; Bhatt et al., 2023). Among these, sub-clusters-1.1.1 and 1.1.3 contain vaccine viruses, sub-cluster-1.1.2 contains vaccine-associated virulent field strains detected in South Africa, and sub-clusters-1.1.4 and 1.1.5 contain vaccine-associated recombinant virulent field strains detected in Russia (Figure 1A). Similarly, the Udmurtiya strain in cluster-1.2.3 is also a recombinant virus (Figure 1A). Furthermore, the recombinant viruses in clusters-1.1.4, 1.1.5, and 1.2.3 differ from the viruses of the recombinant cluster that were detected in Russia, China, Thailand, Hong Kong, Taiwan, and Vietnam in 2019-2021 (Figure 1A). For better readability, the recombinant cluster viruses are hereafter referred to as recombinant, and other recombinant viruses such as Udmurtiya, Saratov, and Tyumen are referred to as 1.2.3, 1.1.4, and 1.1.5, respectively. In particular, wild-type viruses in sub-cluster-1.2.1 have been detected only in African countries and India. Finally, wild-type viruses in sub-cluster-1.2.2 have been detected in African countries, Russia, Kazakhstan, Turkey, Israel, Greece, Bulgaria, and India (Figure 1A); however, Greece and Bulgaria have eradicated LSDV through mass vaccination.
Furthermore, Net Between Group Mean Distance (NBGMD) analysis revealed less than 1.25% nucleotide diversity between wild-type viruses (sub-clusters-1.2.1 and 1.2.2) and vaccine and vaccine-derived viruses (sub-clusters-1.1.1, 1.1.2, and 1.1.3) (Figure 1B). Viruses in the recombinant cluster exhibited 0.52%-0.72% nucleotide diversity relative to viruses in other clusters, including wild-type and vaccine strains (Figure 1B). Since nucleotide diversity between the different clusters is very low, this indicates that microevolution has occurred between these clusters. Next, we conducted a similarity plot analysis to determine which genomic regions have the highest genetic diversity among these clusters. This analysis revealed less than 0.25% genetic diversity in almost all genomic regions between distinct clusters, and viruses in the recombinant cluster exhibited recombination with wild-type and vaccine viruses in multiple genomic regions (Figure 1C) (Krotova et al., 2022a,b; Ma et al., 2022). Similarly, nucleotide mismatch analysis revealed that viruses in the recombinant cluster alternately exhibited nucleotide similarity with wild-type and vaccine viruses in multiple genomic regions, and that the nucleotide differences were primarily due to SNPs (Figure 1D), mostly transition mutations (Supplementary Figure S4). Since viruses in the recombinant cluster alternately share SNPs with wild-type and vaccine viruses in multiple genomic regions, [...] LSDV vaccine strains are attenuated (likely through disruption of virulence genes) mainly by passage in an unnatural host or in unnatural host cells (Wallace and Viljoen, 2005; Tuppurainen et al., 2021), and this virus infects cattle, buffaloes, springbok, impala, and giraffe (Young et al., 1970; Le Goff et al., 2009; Namazi and Khodakaram Tafti, 2021). A recent study using a short-read next-generation sequencing method suggests that recombinant LSDVs may have originated through homologous recombination between the Neethling-like LSDV vaccine strain and KSGP-like LSDV vaccine strains within the vaccine (Vandenbussche et al., 2022).

Mutations altering the open reading frames
In some viruses recently detected in LSDV outbreaks, the LSDV019 and LSDV144 genes have each been reported as two fragments (LSDV019a and LSDV019b; LSDV144a and LSDV144b) resulting from frameshift mutations (Kumar et al., 2023), so we sought to find out which other genes carry frameshift mutations and in-frame nonsense mutations among the different clusters. In the present study, frameshift mutations in the LSDV035, LSDV019, LSDV134, and LSDV144 genes and in-frame nonsense mutations in the LSDV026, LSDV086, LSDV087, LSDV114, LSDV130, LSDV131, LSDV145, LSDV154, LSDV155, LSDV057, and LSDV081 genes were revealed among different clusters (Figures 2-4; Supplementary Figure S5).
Remarkably, if translation of the LSDV035 gene (putative RNA polymerase subunit, 402 amino acids in length) starts at the start codon frame and position of the wild-type viruses, this gene is truncated in the vaccine, vaccine-derived, and recombinant viruses (Figure 2A). Conversely, if translation of the LSDV035 gene starts at the start codon frame and position of the vaccine, vaccine-derived, and recombinant viruses, this gene is truncated in the wild-type viruses (Figure 2B). It is also notable that the wild-type viruses carry 32 extra amino acids, "MFVLKLFNFNIYKNEFLVLLYLDFSINAKMENN", at the N-terminus (Figures 2A,B). In addition, the OK422493.1/India/2019/Ranchi-1/P30, OK422494.1/India/2019/Ranchi-1/P50, and MT007951.1/Namibia/2016/10F viruses have [...] Interestingly, if translation of the LSDV019 gene (putative remodeling and stabilization of the host cytoskeleton and host immune evasion) is initiated at the start codon frame and [...] E).
Further, our analysis revealed that the LSDV026 gene was truncated by in-frame nonsense mutations in viruses belonging to wild-type cluster-1.2.2, except for viruses such as AF409137.[...] 4A). The LSDV026 gene was also truncated by in-frame nonsense mutations in MW631933.1_LSDV_LSD viruses belonging to wild-type cluster-1.2.1 and MW435866.1_LSDV_SA-Neethling viruses belonging to cluster-1.1.3 (Figure 4A). Similarly, the LSDV086 gene (similar to vaccinia virus strain Copenhagen D9R) was truncated by in-frame nonsense mutations in viruses belonging to vaccine cluster-1.1.1, except the MW656252.1_LSDV/Haden/RSA/195 virus (Figure 4B). The LSDV086 gene was also truncated by in-frame nonsense mutations in the MN636839.1_LSD-103-GP-RSA-1991 virus belonging to the vaccine-derived cluster (Figure 4B). Likewise, the LSDV087 gene (similar to vaccinia virus strain Copenhagen D10R) was truncated by in-frame nonsense mutations in viruses belonging to vaccine cluster-1.1.1 and in SRR21590382, SRR21590384, SRR21590385, and SRR2159038, which belong to wild-type cluster-1.2.2 and were found in India (Figure 4C). Interestingly, the LSDV114 gene is truncated by in-frame nonsense mutations in viruses other than those of vaccine cluster-1.1.1, cluster-1.1.5, and the recombinant cluster (except MW732649.1_LSDV/HongKong) (Figure 4D). Finally, the LSDV145 (ankyrin repeat protein) gene is truncated by in-frame nonsense mutations in viruses other than cluster-1.1 viruses (Figure 4E), and the LSDV131 (superoxide dismutase precursor) gene is truncated in the majority of the vaccine strains in cluster-1.1.1 (Supplementary Figure S5A).
None of the detected frameshift and in-frame nonsense mutations (except those in LSDV114) appear to be common to all viruses in wild-type cluster-1.2.1, whereas these mutations are shared by some of the viruses in the different clusters of vaccine, vaccine-derived, and recombinant viruses (Figure 4F). A similar trend was observed in wild-type cluster-1.2.2, except for the LSDV114, LSDV035, and LSDV019 genes (Figure 4F). It is also noteworthy that no frameshift or in-frame nonsense mutation was present exclusively in all the viruses of vaccine cluster-1.1.1 (Figure 4F). Since these frameshift and in-frame nonsense mutations are shared among viruses of different clusters detected at various geographical locations and at different times, they are likely caused by common factors such as host adaptation, immune evasion, and recombination.
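The two kinds of ORF disruption discussed in this section can be distinguished with a simple heuristic, sketched below in Python. This is an illustration under stated assumptions and not the authors' workflow: a length difference from the reference ORF that is not a multiple of three is called a frameshift, and a stop codon before the final codon is called an in-frame nonsense mutation. Compensating indels and alternative start codons are not handled, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(orf):
    """Translate an ORF in frame 1; unknown codons become 'X'."""
    orf = orf.upper().replace("-", "")
    return "".join(
        CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3)
    )


def classify_orf(ref_orf, query_orf):
    """Label a query ORF as 'intact', 'in-frame nonsense', or 'frameshift'."""
    ref_len = len(ref_orf.replace("-", ""))
    qry = query_orf.upper().replace("-", "")
    # A net insertion or deletion that is not a multiple of 3 shifts the frame.
    if (len(qry) - ref_len) % 3 != 0:
        return "frameshift"
    protein = translate(qry)
    first_stop = protein.find("*")
    # A stop codon anywhere before the last codon truncates the product.
    if first_stop != -1 and first_stop < len(protein) - 1:
        return "in-frame nonsense"
    return "intact"
```

Running such a check per gene and per genome against a chosen reference produces the kind of presence/absence matrix that can be compared across clusters.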

Purifying selection is the dominant driver of LSDV evolution
In the earlier sections, we analyzed differences between clusters of LSD viruses in genes carrying frameshift and in-frame nonsense mutations, so here we aimed to find out what differences exist in the other genes. For this, we included genes that were not analyzed in the earlier section and that have no open reading frame (ORF) overlaps; further, genes transcribed in the forward and reverse directions were analyzed separately. First, we analyzed the nucleotide composition of the coding regions, and this analysis revealed that "AT" accounts for around 75% of the nucleotides in genes transcribed in both forward and reverse directions, similar to the whole genome level (Figures 5A-C). This reveals a compositional bias between "AT" and "GC" in LSD viruses at both the complete genome level and in the coding regions. Therefore, we asked whether this "AT" bias in coding regions could mediate codon usage bias among the different clusters of LSD viruses. For this, we first performed an effective number of codons (ENc) analysis (Zhao et al., 2016; Wang et al., 2018; Desingu and Nagarajan, 2022; Desingu et al., 2022a,b), which revealed an ENc of around 39 for the genes transcribed in the forward and reverse directions in all LSD viruses (Figures 5D,E). From these results, LSD viruses effectively use only 39 codons out of 61 [...] mutate to adapt to the host's synonymous codon usage and achieve evolutionary development if they undergo host-jump and/or pass through unnatural hosts for virus attenuation.
After this, we sought to find out whether LSDV vaccine strains attenuated by passage in an unnatural host or in unnatural host cell culture have evolved to adapt to the synonymous codon usage of the unnatural host. For this purpose, we performed a dN/dS analysis comparing the LSDV wild-type NCBI reference strain NC_003027.1_LSDV_NI-2490 with viruses from other clusters. This dN/dS analysis revealed negative/purifying selection in genes transcribed in the forward and reverse directions in the clusters of vaccine, vaccine-derived, and recombinant viruses compared to wild-type (NC_003027.1) (Figures 6A,B). Because dN/dS is around 0.1 when vaccine, vaccine-derived, and recombinant viruses are compared with wild-type (NC_003027.1) (Figures 6A,B), most of the mutations in these viruses are synonymous codon mutations. Further, synonymous and nonsynonymous mutations in each cluster were subjected to in-depth analysis. This analysis revealed that the numbers of synonymous and nonsynonymous mutations were almost equal between genes transcribed in the forward and reverse directions, and that synonymous mutations were more abundant than nonsynonymous mutations in all clusters compared to wild-type (NC_003027.1) (Figures 6C-T). It also revealed around 500 synonymous and 175 nonsynonymous mutations in vaccine and vaccine-derived viruses, whereas recombinant viruses carry around 300 synonymous and 100 nonsynonymous mutations (Figures 6C-T). In addition, these synonymous and nonsynonymous mutations increase from central to terminal genes in genes transcribed in the forward and reverse directions in LSD viruses (Figures 6C-T). Notably, genes in the terminal part of poxvirus genomes generally have high genetic diversity, and these genes play an essential role in host range, host adaptation, evasion of host immunity, pathogenicity, and virulence (Biswas et al., 2020; Senkevich et al., 2020, 2021; Brennan et al., 2022).
Figure 6. Purifying selection in the coding regions of LSD viruses. (A,B) The graphs depict the dN/dS ratio in the coding regions of different clusters of LSD viruses: (A) genes transcribed in the forward direction and (B) genes transcribed in the reverse direction. The NC_003027.1 sequence was used as a reference in this analysis. (C-T) The graphs depict the cumulative dN/dS ratio in the coding regions of proteins from the center to the ITR regions of different clusters of LSD viruses. The NC_003027.1 sequence was used as a reference in this analysis. Details of the sequences used in this analysis are provided in Supplementary Data 5 and 6. The names and order of the genes transcribed in the forward and reverse directions are presented in Supplementary Figures S3, S4.
Next, we sought to identify nonsynonymous mutations in the genes transcribed in the forward and reverse directions of viruses in the vaccine, vaccine-derived, and recombinant clusters compared to the wild-type virus. In this analysis, nonsynonymous mutations increased from central to terminal genes in genes transcribed in the forward and reverse directions of viruses in the vaccine, vaccine-derived, and recombinant clusters compared to wild-type viruses (Supplementary Figures S6, S7). Further, the nonsynonymous mutations unique to the viruses in the recombinant cluster were not present at a significant level, and the nonsynonymous mutations in these viruses were a mixture of those of wild-type cluster-1.2.1 and vaccine cluster-1.1.1 (Supplementary Figures S6, S7). Also, since the nonsynonymous mutations found in the viruses of wild-type cluster-1.2.2 are mostly absent from the viruses in the vaccine-derived and recombinant clusters (Supplementary Figures S6, S7), the viruses in wild-type cluster-1.2.2 appear to be evolving in a different direction from the viruses in the vaccine, vaccine-derived, and recombinant clusters.
Overall, the attenuated vaccine strains appear to have evolved, possibly through purifying selection for host adaptation, by acquiring the majority of their mutations in synonymous codons as an adaptation to the codon usage of unnatural hosts. Also, viruses in wild-type cluster-1.2.2 show purifying selection compared to wild-type cluster-1.2.1. Since LSD viruses affect animals such as cattle, buffaloes, springbok, impala, and giraffe (Young et al., 1970; Le Goff et al., 2009; Namazi and Khodakaram Tafti, 2021), this purifying selection suggests that host adaptation has possibly resulted in the majority of mutations occurring in synonymous codons adapted to the codon usage of these or other animal hosts.
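The distinction between synonymous and nonsynonymous changes that underlies this section can be sketched as a codon-by-codon comparison. The Python below is a deliberately simplified illustration: it returns raw counts of changed codons and does not compute dN/dS, because it omits the per-site normalization and multiple-hit correction performed by tools such as SNAP. The function name is hypothetical, and codons that differ at more than one position are classified only by their translated amino acids.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_syn_nonsyn(ref_cds, query_cds):
    """Return (synonymous, nonsynonymous) counts of changed codons.

    Both inputs are assumed to be codon-aligned coding sequences in
    frame 1; codons containing gaps or ambiguous bases are skipped.
    """
    ref = ref_cds.upper()
    qry = query_cds.upper()
    syn = nonsyn = 0
    for i in range(0, min(len(ref), len(qry)) - 2, 3):
        rc, qc = ref[i:i + 3], qry[i:i + 3]
        if rc == qc or rc not in CODON_TABLE or qc not in CODON_TABLE:
            continue
        if CODON_TABLE[rc] == CODON_TABLE[qc]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn
```

Accumulating these counts gene by gene, ordered from the genome center outward, would give cumulative curves comparable in spirit to those described for the clusters.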

APOBEC editing is the dominant driver of LSDV evolution
The previous sections showed that, in the attenuation and evolution of LSD viruses, mutations are mostly synonymous, and that CT-bias and GA-bias are present at the third nucleotide position of codons, so here we sought to find out the mechanism by which such mutations are acquired. We investigated the nucleotide mismatches in genes transcribed in the forward and reverse directions of LSD viruses in different clusters compared to the wild-type NCBI reference strain NC_003027.1_LSDV_NI-2490. This analysis showed more nucleotide mismatches in the vaccine and vaccine-derived clusters, with the mismatches increasing from the central toward the terminal genes among the genes transcribed in the forward and reverse directions of LSD viruses (Figures 7A,B). Further, our analysis revealed that most of these nucleotide mismatches are silent mutations (Supplementary Figures S8A,B) and are transition mutations (Figures 7C,D). It is also noteworthy that around 80% of the mutations in genes transcribed in the forward and reverse directions of LSD viruses of all clusters compared to wild-type (NC_003027.1) are transition mutations (Figures 8A,B). Furthermore, our results show that the fraction of G → A & C → T transition mutations is around three times higher than the fraction of A → G & T → C transition mutations in genes transcribed in the forward and reverse directions of LSD viruses of all clusters compared to wild-type (NC_003027.1) (Figures 8C,D). Interestingly, G → A (or) C → T mutations can be generated by the host's APOBEC enzymes. These APOBEC enzymes play an essential role in the restriction of retroviruses (HIV) and of DNA viruses such as monkeypox virus (Mpoxv), hepatitis B virus (HBV), and human papillomavirus (HPV) (Bonvin et al., 2006; Bulliard et al., 2011; Warren et al., 2015; Gigante et al., 2022; Isidro et al., 2022; Pecori et al., 2022). Since DNA editing by APOBEC enzymes is based on TC > TT (or) GA > AA, GG > AG (or) CC > CT, and AC > AA (or) GT > AT motifs (Gigante et al., 2022; Isidro et al., 2022; Pecori et al., 2022), we sought to find out which motif mutations are present in genes transcribed in the forward and reverse directions of viruses in all clusters compared to wild-type (NC_003027.1). Genes transcribed in the forward direction of viruses in the vaccine, vaccine-derived, and recombinant clusters revealed a higher abundance of AC > AA & GT > AT motif mutations compared to wild-type (NC_003027.1) (Figure 8E; Supplementary Figure S9A). On the other hand, AC > AA & GT > AT and TC > TT & GA > AA motif mutations were both found in higher abundance in genes transcribed in the reverse direction of viruses in the vaccine, vaccine-derived, and recombinant clusters compared to wild-type (NC_003027.1) (Figure 8F; Supplementary Figure S9B). After this, we examined the mutations attributable to these APOBEC enzymes at the complete genome level in the viruses of the vaccine, vaccine-derived, and recombinant clusters compared to the wild-type (NC_003027.1). This analysis revealed that almost 80% of the mutations in the viruses of the vaccine, vaccine-derived, and recombinant clusters compared to the wild-type (NC_003027.1) were transition mutations at the whole genome level (Figure 8G), as in the coding regions (Figures 8A,B). Also, as in the coding regions, the fraction of G → A & C → T transition mutations was almost three times higher than the fraction of A → G & T → C transition mutations in the viruses of the vaccine, vaccine-derived, and recombinant clusters compared to the wild-type (NC_003027.1) at the complete genome level (Figure 8H), and AC > AA & GT > AT and TC > TT & GA > AA motif mutations were also found to be more prevalent (Figure 8I; Supplementary Figure S9C). Interestingly, AC > AA & GT > AT motif mutations are edited by the APOBEC1 enzyme, present from tetrapods to humans, whereas TC > TT & GA > AA and GG > AG & CC > CT motif mutations are edited by the APOBEC3 enzymes present in placental mammals (Gigante et al., 2022; Pecori et al., 2022). Overall, the viruses in the vaccine clusters have a large number of transition mutations at the complete genome and coding region levels compared to the wild-type virus, the G → A & C → T fraction is the higher one among these transition mutations, and the G → A & C → T mutations occur in motifs that are edited by host APOBEC enzymes. Also, since transition mutations in coding regions are mostly silent mutations, this indicates that APOBEC enzymes are the dominant driver in the evolution of host codon usage adaptation of these vaccine viruses.

Discussion
Just as the monkeypox virus has recently caused outbreaks in humans in non-endemic countries, the lumpy skin disease virus (LSDV), a poxvirus of ruminants, has caused outbreaks in non-endemic countries. However, the role of evolutionary drivers and genetic variations (frameshift and in-frame nonsense mutations) of LSDVs, which cause outbreaks in cattle in various countries and cause [...] Desingu et al., 2022a,b), LSDV started spreading in the non-endemic countries (Krotova et al., 2022a,b). On the other hand, APOBEC3 mutations were found to be enriched only in clade-IIb viruses when compared to the most recent common ancestors of viruses within the clade and are considered the major driver of human adaptation-mediated microevolution (Gigante et al., 2022; Isidro et al., 2022). The present study sheds light on the role of APOBEC editing, purifying selection, frameshift, and in-frame nonsense mutations in the microevolution of LSD viruses. Homologous vaccines containing the Neethling strain have been attenuated by a very high number of passages in cell cultures and chicken eggs (Kitching, 2003; Vandenbussche et al., 2022), and studies have shown that this vaccine generally induces good protection and mild to negligible adverse reactions in cattle (Klement et al., 2020; Morgenstern and Klement, 2020; Bamouh et al., 2021; Haegeman et al., 2021). Studies have reported that the Kenyan sheep and goat pox (KSGP) strain vaccine contains lumpy skin disease viruses (Tuppurainen et al., 2014; Vandenbussche et al., 2016, 2022) and that this vaccine causes clinical signs in vaccinated cattle (Bamouh et al., 2021; Tuppurainen et al., 2021), which may be due to the lower number of passages used to attenuate the virus (Vandenbussche et al., 2022). Recombinant LSDVs with genetic signatures of the Neethling vaccine strain and KSGP vaccine strains have recently been reported in several studies (Sprygin et al., 2018a,b; Mathijs et al., 2021; Flannery et al., 2022; Huang et al., 2022; Ma et al., 2022). Further, a study using short-read next-generation sequencing methods reported that the KSGP vaccine contains lumpy skin disease viruses such as the Neethling-like LSDV vaccine strain, the KSGP-like LSDV vaccine strain, and recombinant LSDV strains almost identical to those detected in field outbreaks, suggesting that recombinant LSDVs may have originated by recombination of the Neethling-like and KSGP-like LSDV vaccine strains in the vaccine (Vandenbussche et al., 2022).
Generally, LSD viruses are attenuated by serial passages in an unnatural host or in unnatural host cells such as chicken eggs, rabbit kidney cells, and lamb kidney cells (Wallace and Viljoen, 2005; Tuppurainen et al., 2021). Viruses in general are transformed into vaccine strains by passages in an unnatural host or unnatural host cells. During attenuation, these viruses mostly undergo host-adaptation evolution and become attenuated. In such host adaptation, synonymous codon mutations are acquired to match the host codon usage bias, whereas non-synonymous mutations are acquired for virus entry into host cells, replication, and host immune evasion (Desingu and Nagarajan, 2022; Desingu et al., 2022a,b). In general, ENc values <35 are considered to indicate high codon bias, and values >50 to indicate largely random codon usage (Sheikh et al., 2020; Desingu and Nagarajan, 2022; Li et al., 2022). In the present study, we observed an ENc value of around 39 in LSD viruses, suggesting moderate codon usage bias, and this ENc value is possibly associated with the limited host tropism of LSDVs. Interestingly, the present study revealed that, compared to wild-type viruses, attenuated vaccine strains have more transition mutations, that the fraction of G → A & C → T transitions is greater than that of A → G & T → C transitions, and that G → A & C → T are APOBEC editing motif mutations. Further, in the dN/dS analysis, we observed negative/purifying selection in genes transcribed in the forward and reverse directions in the clusters of vaccine, vaccine-derived, and recombinant viruses. Consistent with this, we also noticed an abundance of synonymous codon mutations in attenuated vaccine strains compared to wild-type viruses, which reveals moderate selection pressure for host codon usage bias. In addition, the frameshift and in-frame nonsense mutations identified in the present study, which disturb distinct ORFs in different clusters, will help future experimental studies to understand the pathogenic importance of these ORFs in LSD viruses and to monitor the epidemiological spread of viruses. Further, we observed that even though the viruses causing the outbreaks are grouped in a specific cluster at the whole-genome level, they have acquired ORF disturbances similar to those of other clusters. From these observations, it appears that viruses could have been transformed into attenuated vaccine strains, and field strains could have evolved into mutant strains, through gene deletion, host selection pressure, purifying selection, and APOBEC editing.
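The mutation-spectrum comparison discussed above (transitions versus transversions, and the G → A & C → T fraction against the A → G & T → C fraction) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. The function name `mutation_spectrum`, the choice of all substitutions as the denominator for the fractions, and the TC → TT / GA → AA dinucleotide context used here as a stand-in for an "APOBEC motif" are all assumptions made for illustration, and the input is assumed to be a pair of already aligned sequences of equal length.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_spectrum(reference: str, query: str) -> dict:
    """Tally substitutions between two aligned, equal-length sequences.

    Hypothetical helper for illustration; not the authors' method.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    ref, qry = reference.upper(), query.upper()
    counts = Counter()
    for i, (r, q) in enumerate(zip(ref, qry)):
        # Skip matches, gaps, and ambiguous bases.
        if r == q or r not in "ACGT" or q not in "ACGT":
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        same_class = ({r, q} <= PURINES) or ({r, q} <= PYRIMIDINES)
        counts["transition" if same_class else "transversion"] += 1
        counts[f"{r}>{q}"] += 1
        # Assumed APOBEC3-like context: TC>TT on the given strand, or
        # GA>AA (the same change read from the opposite strand).
        if r == "C" and q == "T" and i > 0 and ref[i - 1] == "T":
            counts["apobec_context"] += 1
        elif r == "G" and q == "A" and i + 1 < len(ref) and ref[i + 1] == "A":
            counts["apobec_context"] += 1
    total = counts["transition"] + counts["transversion"]
    ga_ct = counts["G>A"] + counts["C>T"]
    ag_tc = counts["A>G"] + counts["T>C"]
    return {
        "transitions": counts["transition"],
        "transversions": counts["transversion"],
        # Assumption: fractions are taken over all substitutions.
        "fraction_GA_CT": ga_ct / total if total else 0.0,
        "fraction_AG_TC": ag_tc / total if total else 0.0,
        "apobec_context": counts["apobec_context"],
    }


if __name__ == "__main__":
    # Toy aligned pair (not real LSDV sequence data).
    print(mutation_spectrum("ATCGAACGT", "ATTAACCGC"))
```

In practice each cluster sequence would be compared with the reference (NC_003027.1 in this study) after alignment, separately for forward- and reverse-transcribed genes. A G → A & C → T fraction larger than the A → G & T → C fraction would then be in line with the pattern described in the text.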
In conclusion, the present study indicates that LSD viruses have achieved microevolution through host selection pressure, purifying selection, and APOBEC editing. There are unique frameshift and in-frame nonsense mutations in specific genes of most viruses in each cluster. Also, despite being grouped in a specific cluster at the complete genome level, some viruses carry frameshift and in-frame nonsense mutations in certain genes similar to those in other clusters and have mutated into viruses causing outbreaks in different geographical regions. The findings of the present study are expected to help in virus pathogenesis studies of disturbed ORFs in LSD viruses and in monitoring the epidemiological spread of viruses and their genetic variants.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.
transcribing in the reverse direction. The NC_003027.1 sequence was used as a reference in this analysis. (C,D) The graphs illustrate the fraction A > G & T > C and fraction G > A & C > T mutations in the coding regions of different clusters of LSD viruses, (C) genes that are transcribing in the forward direction; and (D) genes that are transcribing in the reverse direction. The NC_003027.1 sequence was used as a reference in this analysis. (E,F) The graphs elucidate the APOBEC motif mutations in the coding regions of different clusters of LSD viruses, (E) genes that are transcribing in the forward direction; and (F) genes that are transcribing in the reverse direction. The NC_003027.1 sequence was used as a reference in this analysis. (G) The graphs depict the transition and transversion mutations at the complete genome level of different clusters of LSD viruses. The NC_003027.1 sequence was used as a reference in this analysis. (H) The graphs illustrate the fraction A > G & T > C and fraction G > A & C > T mutations at the complete genome level of different clusters of LSD viruses. The NC_003027.1 sequence was used as a reference in this analysis. (I) The graphs elucidate the APOBEC motif mutations at the complete genome level of different clusters of LSD viruses. The NC_003027.1 sequence was used as a reference in this analysis. Details of sequences used in this analysis are provided in Supplementary Data 7-9.

FIGURE 1
Complete genome nucleotide sequence level genetic diversity in LSD viruses. (A) Phylogenetic analysis based on whole genome nucleotide sequences separated the LSDV sequences into different clusters. (B) The NBGMD analysis based on whole genome nucleotide sequences revealed less than 1.25% nucleotide diversity among clusters of the LSD virus. The details of the virus sequences in different clusters are presented in (A). The estimated standard error is displayed above the diagonal in the table. (C) The SimPlot analysis depicts multiple regions of recombination, with mostly less than 0.25% genetic diversity among clusters of the LSD virus at the whole genome nucleotide sequence level. The NC_003027.1 sequence is used as a query sequence in this analysis. (D) Nucleotide mismatches at the level of whole genome nucleotide sequences are depicted in different clusters of LSD viruses. The NC_003027.1 sequence was used as a reference in this analysis.
codons in hosts to produce 20 amino acids, so it is clear that LSD viruses have a bias in using host codons. Further, in the analysis of ENc-GC3s plot (Wang et al., 2018; Tian et al., 2020; Desingu and Nagarajan, 2022; Desingu et al., 2022a,b), the genes transcribed in forward and reverse directions of LSD viruses fall slightly below the expected curve (Figures 5F,G), so it could be realized that selection

FIGURE 3
Frameshift mutations in LSDV134 and LSDV144 genes. (A-C) Frameshift mutations in the LSDV134 gene: (A,B) cluster-1.2 frame, and (C) cluster-1.1.1 frame. (D,E) Frameshift mutations in the LSDV144 gene: (D) cluster-1.2 frame, and (E) cluster-1.1.1 frame. Details of sequences used in this analysis are provided in Supplementary Data 1.

FIGURE 4
In-frame nonsense mutations in LSDV026, LSDV086, LSDV087, LSDV114, and LSDV145 genes. In-frame nonsense mutations in the gene (A) LSDV026; (B) LSDV086; (C) LSDV087; (D) LSDV114; and (E) LSDV145. Details of sequences used in this analysis are provided in Supplementary Data 1. (F) The table depicts the frequency of frameshift and in-frame nonsense mutations in different genes detected in the present study in each cluster.

FIGURE 5
Host codon usage bias, selection pressure, codon third position CT-bias, and GA-bias in the coding regions of LSD viruses. (A-C) The graphs depict the nucleotide composition of LSD viruses, (A) at the complete genome level; (B) genes that are transcribing in the forward direction; and (C) genes that are transcribing in the reverse direction. (D,E) The graphs display the ENc values in the coding regions of LSD viruses, (D) genes that are transcribing in the forward direction; and (E) genes that are transcribing in the reverse direction. (F,G) The graphs illustrate the ENc-plot in the coding regions of LSD viruses, (F) genes that are transcribing in the forward direction; and (G) genes that are transcribing in the reverse direction. (H,I) The graphs show the parity-plot in the coding regions of LSD viruses, (H) genes that are transcribing in the forward direction; and (I) genes that are transcribing in the reverse direction. (J,K) The graphs depict the G3/G3 + A3-plot in the coding regions of LSD viruses, (J) genes that are transcribing in the forward direction; and (K) genes that are transcribing in the reverse direction. Details of sequences used in this analysis are provided in Supplementary Data 2-4.

FIGURE 7
Nucleotide diversity in the coding regions of LSD viruses. (A,B) Nucleotide mismatches in the coding regions of LSD viruses were visualized, (A) genes that are transcribing in the forward direction; and (B) genes that are transcribing in the reverse direction. The NC_003027.1 sequence was used as a reference in this analysis. (C,D) Transition and transversion mutations in the coding regions of LSD viruses were visualized, (C) genes that are transcribing in the forward direction; and (D) genes that are transcribing in the reverse direction. The NC_003027.1 sequence was used as a reference in this analysis.

FIGURE 8
APOBEC editing at the coding regions and complete genome sequence level of LSD viruses. (A,B) The graphs depict the transition and transversion mutations in the coding regions of different clusters of LSD viruses, (A) genes that are transcribing in the forward direction; and (B) genes that are (Continued)