Revisiting the genotypes of Theileria equi based on the V4 hypervariable region of the 18S rRNA gene

Introduction Equine theileriosis, an economically important disease that affects horses and other equids worldwide, is caused by the tick-borne intracellular apicomplexan protozoan Theileria equi. Genotyping of T. equi based on the 18S rRNA gene has variously revealed the presence of two, three, four or five genotypes. In previously published reports, these genotypes have been labelled either alphabetically or numerically, and there is no uniformity in their naming. The present study aimed to revisit the phylogeny, genetic diversity and geographical distribution of T. equi based on the nucleotide sequences of the V4 hypervariable region of the 18S rRNA gene available in the nucleotide databases. Methods Out of 14,792 nucleotide sequences of T. equi available in the GenBank™, only 736 sequences of T. equi containing the complete V4 hypervariable region of the 18S rRNA gene (>207 bp) were used in multiple sequence alignment. Subsequently, a maximum likelihood phylogenetic tree was constructed based on the Kimura 2-parameter model (K2+I). Results The phylogenetic tree placed all the sequences into four distinct clades with high bootstrap values, which were designated as T. equi clades/genotypes A, B, C and D. Our results indicated that the genotype B of Nagore et al. and genotype E of Qablan et al. together formed the clade B with a high bootstrap value (95%). Furthermore, all the genotypes probably originated from clade B, which was the most dominant genotype (52.85%), followed by clades A (27.58%), C (9.78%) and D (9.78%). Genotype C manifested a comparatively higher genetic diversity (91.0-100% identity), followed by genotypes A (93.2-99.5%), and B and D (95.7-100%). The alignment report of the consensus nucleotide sequences of the V4 hypervariable region of the 18S rRNA gene of four T. equi genotypes (A-D) revealed significant variations in one region, between nucleotide positions 113-183, and 41 molecular signatures were recognized.
As far as geographical distribution is concerned, genotypes A and C exhibited far-extending geographical distribution involving 31 and 13 countries of the Asian, African, European, North American and South American continents, respectively. On the contrary, genotypes B and D exemplified limited distribution with confinement to 21 and 12 countries of the Asian, African and European continents, respectively. Interestingly, only genotypes A and C have been reported from North and South America. It was observed that genotypes A and C, and genotypes B and D, exhibit similar geographical distributions. Discussion The present study indicated the presence of only four previously described T. equi genotypes (A, B, C and D) after performing the molecular analyses of all available sequences of the complete V4 hypervariable region of the 18S rRNA gene of T. equi isolates in the GenBank™.


Introduction
Equine piroplasmosis (EP) consists of two tick-borne diseases, equine theileriosis and babesiosis, which are, respectively, caused by hemoprotozoa, Theileria equi and Babesia caballi (1, 2). Besides worldwide geographical distribution, EP is an economically important tick-borne disease with high morbidity and mortality rates (1, 3). Theileria equi and B. caballi are transmitted by ixodid ticks under natural conditions which act as biological vectors for them (4). Additionally, transplacental (5, 6), and iatrogenic transmission by the use of contaminated needles and syringes, surgical instruments, and blood transfusions have been reported (7). Both T. equi and B. caballi infections cause subclinical to acute diseases, and the clinical signs are usually similar and non-specific in nature (4, 8). In general, T. equi causes a more severe clinical disease compared to B. caballi (9).

Retrieval of nucleotide sequences
Out of 14,792 nucleotide sequences of T. equi available in the GenBank™, the 18S rRNA sequences of T. equi (n = 927) were downloaded in the FASTA format from the nucleotide database accessed in March, 2023. A dataset was created with 736 sequences of T. equi containing the complete V4 hypervariable region of the 18S rRNA gene (>207 bp) and the remaining sequences were discarded. The sequences derived from blood, spleen of the aborted foetus and ticks of various vertebrate and invertebrate hosts, viz., horse, African wild donkey (Equus africanus), Asiatic wild ass (Equus hemionus), zebra (Equus quagga), Equus ferus caballus, Rhipicephalus sanguineus ticks of dogs, Rhipicephalus sanguineus s.l., naturally infected dogs, German shepherd dog, Rhipicephalus bursa, Canis lupus familiaris, Hyalomma excavatum, Hyalomma anatolicum, Black rhinoceros (Diceros bicornis), Rhipicephalus annulatus ticks of cattle, domestic donkey (Equus asinus), Haemaphysalis sp., Dermacentor nuttalli isolated from horse, Ixodes ricinus, Rhipicephalus appendiculatus and Rhipicephalus evertsi evertsi, were included in the analysis. In addition, Theileria haneyi sequences containing the complete V4 hypervariable region of the 18S rRNA gene (n = 06) were also included in the analysis. The details of accession numbers used in the current study are provided in Supplementary Table S1.
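As an illustration only, the length-based part of this selection could be sketched as below. The function names, the toy records and the use of length as the sole criterion are assumptions made for the sketch; the text indicates that coverage of the complete V4 region was the actual criterion, so a length filter would at best be a first pass.

```python
from typing import Dict, Iterable

MIN_LENGTH = 207  # threshold quoted in the text (>207 bp)


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def filter_by_length(records: Dict[str, str], min_length: int = MIN_LENGTH) -> Dict[str, str]:
    """Keep only records whose sequence is strictly longer than min_length."""
    return {name: seq for name, seq in records.items() if len(seq) > min_length}


# Toy records (hypothetical names, not real accessions)
example = [">toy_long", "ACGT" * 60, ">toy_short", "ACGT" * 10]
kept = filter_by_length(parse_fasta(example))
```

Here `kept` retains only the 240-nt toy record; a real workflow would additionally confirm that each retained sequence spans the V4 region, for example by aligning it against a reference.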

Multiple sequence alignment and phylogenetic analyses based on 18S rRNA gene
The dataset containing 18S rRNA sequences was uploaded to MEGA-X software version 10.1.7 for sequence alignment (33). Sequences of varying lengths were aligned using ClustalW and their unequal lengths were trimmed at one or both ends for equalization using a reference sequence of the V4 hypervariable region of 18S rRNA gene of T. equi (Accession number MN818862). The nucleotide identities were computed using MegAlign (DNASTAR) software (34).
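For readers who want to see what a percent nucleotide identity amounts to, a minimal sketch is given below. It is not MegAlign's algorithm, whose exact formula is not described here; it assumes two sequences already aligned to the same length, skips columns where both have a gap, and counts a gap opposite a base as a mismatch.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption of this sketch: columns where both sequences carry a gap
    are ignored; a gap opposite a base counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy example: one mismatch in ten aligned positions
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
```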
For phylogenetic analysis, the V4 hypervariable region of the 18S rRNA gene sequences was subjected to Multiple Alignment using Fast Fourier Transform (MAFFT) online software (35). The aligned sequences were analyzed using MEGA-X software version 10.1.7 to predict the most suitable model based on Akaike and Bayesian information criterion. Subsequently, a maximum likelihood phylogenetic tree was constructed based on the Kimura 2-parameter model (K2 + I; 36) as previously described in detail by Nehra et al. (32). The rate variation model allowed some sites to be evolutionarily invariable ([+I], 26.67% sites). The final alignment involved 743 nucleotide sequences (736 T. equi, 06 T. haneyi and 01 outgroup) with a total of 213 positions. Theileria parva (MK792993, South Africa) was used as an outgroup species for rooting (33). The reliability of the tree was assessed by 1,000 bootstrap replications (Figures 1, 2).
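The Kimura 2-parameter model treats transitions and transversions as occurring at different rates. As a rough illustration of that idea, and not of the maximum likelihood inference with invariable sites performed in MEGA-X, the sketch below computes the standard pairwise K2P distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The function name and the handling of gaps are assumptions of the sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Pairwise Kimura 2-parameter distance between two aligned sequences.

    Columns containing a gap or an ambiguous base are skipped
    (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in BASES or b not in BASES:
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: a single transition (A->G) in ten aligned positions
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
```

With P = 0.1 and Q = 0, the distance is -0.5 ln(0.8), slightly larger than the raw proportion of differences because the model corrects for multiple substitutions at the same site.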
For ease of display of the results of the phylogenetic analysis, identical sequences of T. equi were removed as redundant; consequently, only 154 T. equi and six T. haneyi sequences were retained. The phylogenetic analysis was then repeated as described above to generate Figures 1, 2.
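Collapsing identical sequences is simple to express in code. The sketch below keeps the first record for each distinct sequence; how the representative of each group of identical sequences was actually chosen is not stated in the text, so "first seen" is an assumption, as are the function name and the toy records.

```python
from typing import Dict


def collapse_identical(records: Dict[str, str]) -> Dict[str, str]:
    """Keep the first record for each distinct sequence, preserving input order."""
    seen = set()
    unique: Dict[str, str] = {}
    for name, seq in records.items():
        key = seq.upper()  # compare case-insensitively
        if key not in seen:
            seen.add(key)
            unique[name] = seq
    return unique


# Toy records (hypothetical names): "b" duplicates "a" and is dropped
toy = {"a": "ACGT", "b": "acgt", "c": "ACGA"}
unique_records = collapse_identical(toy)
```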

Phylogenetic analyses
The maximum likelihood tree placed all the sequences into four distinct clades/genotypes with high bootstrap values which were designated as T. equi clades A, B, C, and D (Figures 1, 2A,B).

Genetic diversity
The percent nucleotide identity of all the T. equi sequences was 91.0-100%. Nucleotide variations were ascertained in the V4 hypervariable region of the 18S rRNA gene at isolated places upon multiple sequence alignment. The careful examination unraveled four disparate genotypes, viz., T. equi clades A, B, C, and D.
Sequence analysis of the consensus V4 hypervariable region sequences of T. equi genotypes A, B, C, and D revealed significant nucleotide variations between positions 113-183 and identified 41 molecular signatures. Moreover, no nucleotide variations were observed at other places among clades (Figure 3).
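A comparison of aligned consensus sequences of this kind can be sketched as follows. The code simply lists the 1-based alignment columns at which the sequences differ, optionally restricted to a window such as 113-183; the criterion used in the study to call a "molecular signature" may be stricter, and the toy sequences below are invented for illustration.

```python
from typing import Dict, List, Optional


def variable_positions(aligned: Dict[str, str], start: int = 1,
                       end: Optional[int] = None) -> List[int]:
    """Return 1-based alignment columns where the sequences are not all identical."""
    seqs = [s.upper() for s in aligned.values()]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    last = length if end is None else min(end, length)
    return [pos for pos in range(start, last + 1)
            if len({s[pos - 1] for s in seqs}) > 1]


# Toy consensus sequences (invented, six columns only)
toy_consensus = {"A": "ACGTAC", "B": "ACGAAC", "C": "ACGTAC", "D": "ATGTAC"}
positions = variable_positions(toy_consensus)
```

In this toy alignment the sequences differ at columns 2 and 4; applied to the real consensus sequences with `start=113, end=183`, the same function would list the variable columns within the region described above.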

Theileria equi genotype A
It consisted of 203 sequences of T. equi which displayed 93.2-99.5% similarity amongst each other (Tables 1, 2). Additionally, it exhibited 85.5-89.9%, 85.0-94.7%, and 83.6-89.9% sequence similarity with T. equi genotypes B, C, and D, respectively (Table 3). The high similarity (85.0-94.7%) with genotype C suggested its close association with this genotype, which is also displayed in the phylogenetic analysis (Figures 1, 2). Single nucleotide substitutions and deletions at 33 and two positions (145 and 156), respectively, were documented in the V4 hypervariable region of this genotype upon sequence analysis (Figure 4). Likewise, single nucleotide variations were observed at 116, 62, and 64 places when compared with T. equi genotypes B, C, and D, respectively.
FIGURE 1 A circular maximum likelihood tree based on the V4 hypervariable region of the 18S rRNA gene clearly depicts the four genotypes/clades (A, B, C and D) of T. equi due to extensive nucleotide heterogeneity in this region. The taxon name of each sequence is depicted by its accession number followed by the country of origin. The color coding of different clades is as below: Genotype A-Green font color with green filled square as taxon marker; Genotype B-Red font color with red filled circles as taxon markers; Genotype C-Pink font color with pink filled triangles as taxon markers; Genotype D-Blue font color with blue filled rhombi as taxon markers; Outgroup-Purple font color with purple filled inverted triangle as taxon marker.
10.3389/fvets.2024.1303090 Frontiers in Veterinary Science 04 frontiersin.org

Theileria equi genotype B
It contained 389 sequences of T. equi which manifested 95.7-100% nucleotide homology (Table 2). It displayed 85.5-89.9%, 79.5-86.5%, and 84.6-88.9% nucleotide identity with T. equi genotypes A, C, and D, respectively (Table 3); consequently, it showed its closest association with genotype A (85.5-89.9%). The multiple sequence alignment report of the V4 hypervariable region of the 18S rRNA gene revealed nucleotide variations at 96 places within this genotype. Similarly, it showed sequence variations at 116, 122 and 118 places when compared with the 18S rRNA sequences of T. equi genotypes A, C and D, respectively.
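The share of each genotype in the dataset follows directly from the sequence counts given in the text (203 sequences for genotype A and 389 for genotype B, out of 736 in total). The short sketch below reproduces that arithmetic; the variable names are illustrative, and the per-genotype counts for C and D are left as a combined remainder rather than assumed.

```python
TOTAL_SEQUENCES = 736                      # sequences in the final T. equi dataset
genotype_counts = {"A": 203, "B": 389}     # counts stated in the text


def share(count: int, total: int = TOTAL_SEQUENCES) -> float:
    """Percentage of the dataset represented by `count`, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {genotype: share(n) for genotype, n in genotype_counts.items()}
# Sequences not assigned to A or B, i.e. genotypes C and D combined
remaining = TOTAL_SEQUENCES - sum(genotype_counts.values())
```

This gives 27.58% for genotype A and 52.85% for genotype B, leaving 144 sequences for genotypes C and D together.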

Geographical distribution
Table 4 lists the country-wise distribution of the various genotypes of T. equi. Theileria equi genotype A exhibited the most widespread geographical distribution, involving 31 countries of the Asian, African, European, North American and South American continents. Similar to this genotype, genotype C extended its distribution to 13 countries of the Asian, African, European, North American and South American continents. On the contrary, genotypes B and D showed limited distribution, with confinement to 21 and 12 countries of the Asian, African, and European continents, respectively (Table 1). Genotypes A and C are the only genotypes recorded from five continents, viz., Asia, Africa, Europe, North America, and South America. Furthermore, all four genotypes (A-D) have been reported from three continents, namely Asia, Africa, and Europe. Interestingly, only genotypes A and C have been reported from North and South America. It was observed that genotypes A and C, and genotypes B and D, exhibit similar geographical distributions. Amongst all the T. equi genotypes, genotype D revealed the most restricted distribution, with confinement to 12 countries only (Table 1). Genotype C showed a relatively wider distribution than genotype D, with reports from 13 countries (Figure 7). One (A/B/C/D), two (A + B/A + C/A + D), three (A + B + C/A + B + D/A + C + D) and four (A + B + C + D) genotypes have been reported from 23, 11, seven, and two countries, respectively. China and South Africa are thus far the only countries harboring all four genotypes of T. equi (Table 4).
FIGURE 4 Sequence variations detected in the V4 hypervariable region of the 18S rRNA gene of T. equi genotype A upon multiple sequence alignment. The identical sequences were removed from the alignment. Single nucleotide substitutions were documented at 33 places within this genotype (marked * and shaded yellow in red box). Similarly, single nucleotide deletions (marked # and shaded green in blue box) were observed at two positions (145 and 156).
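Tallies such as the number of countries reporting a given number of genotypes can be derived mechanically from a country-to-genotype table like Table 4. The sketch below shows the idea on three entries taken from the text (the United States with genotypes A and C; China and South Africa with all four); the function names are illustrative and the full table is not reproduced.

```python
from collections import Counter
from typing import Dict, List, Set

# Three entries stated in the text; the complete Table 4 is not reproduced here
country_genotypes: Dict[str, Set[str]] = {
    "United States of America": {"A", "C"},
    "China": {"A", "B", "C", "D"},
    "South Africa": {"A", "B", "C", "D"},
}


def countries_per_genotype_count(table: Dict[str, Set[str]]) -> Counter:
    """Map 'number of genotypes reported' to 'number of countries'."""
    return Counter(len(genotypes) for genotypes in table.values())


def countries_with(table: Dict[str, Set[str]], genotype: str) -> List[str]:
    """Alphabetical list of countries reporting the given genotype."""
    return sorted(country for country, genotypes in table.items()
                  if genotype in genotypes)


tally = countries_per_genotype_count(country_genotypes)
countries_with_b = countries_with(country_genotypes, "B")
```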

Discussion
The current study aimed to investigate the genetic diversity of T. equi based on the V4 hypervariable region of the 18S rRNA gene, as it influences both the transmission of the disease and the sensitivity of the diagnostic tests. Even though the equine merozoite antigen (EMA)-1 (20, 37-40) and β-tubulin (41) genes have been targeted, the hypervariable regions of the 18S rRNA gene are considered the most suitable target for identification, phylogenetic, and genetic variation analysis of Apicomplexa and Piroplasmids (32, 42-44). This is due to the presence of multiple copies of the gene within the genome (45), a level of sequence conservation, and the existence of hypervariable regions, which result in meaningful phylogenetic comparisons (46, 47). Preliminary studies based on this gene first detected only two clades of T. equi in Spain (10) in the year 2004. Soon after, third (genotype C; 2009) and fourth (genotype D; 2010) clades were reported from South Africa (12) and Sudan (15), respectively. Two years later, Qablan et al. (19) reported an additional clade (genotype E) from horses in Jordan (Suwaymah), South Korea and Spain in 2012; thus, giving rise to the contemporary five clades (A-E) of T. equi (20-27). The present study exhibited marked nucleotide variations in the V4 hypervariable region of the 18S rRNA gene, with the presence of four significantly different T. equi genotypes (A, B, C and D), in agreement with the results of some of the previous researchers (15-18). Although the majority of sequences (52.85%) of T. equi in our dataset belonged to genotype B, our analysis did not firmly support the separate existence of the clades B and E of the five-genotype classification, in concordance with the key results of Hall et al. (16), and contrary to the findings of Qablan et al. (19). Instead, it indicated integration of both of the former genotypes to generate a distinct clade B of the four-clade system, in consonance with the findings of Veronesi et al. (17), Alanazi et al. (48), and Coultous et al. (49). Our results indicated that the genotype B of Nagore et al. (10) and genotype E of Qablan et al. (19) together form the clade B with a high bootstrap value (95%). In addition, Theileria haneyi occupied clade C of the current T. equi umbrella, as in previous studies (50). A change in the number of T. equi genotypes can affect the diagnostic results, clinical outcome of infection, and therapeutic efficacy. For example, genotype A is reported to be more commonly associated with clinical piroplasmosis than the other genotypes (14). Similarly, repeated treatment with imidocarb dipropionate cleared single infections with T. equi genotype A, but did not clear infection from horses co-infected with T. haneyi (genotype C) and genotype A (51).
Nowadays, it is well established that T. equi exhibits greater genetic diversity in the 18S rRNA gene as compared to B. caballi (32), and T. equi isolates diversify even within the same geographical regions (12, 15). It also exhibits a broad host range involving horses (10, 26), domestic donkeys (Equus asinus; 13, 17, 23, 27), camels (19), dogs (29, 52, 53), Asiatic wild ass (Equus hemionus), African wild donkeys (Equus africanus), zebras (Equus quagga; 27), and Black rhinoceros (Diceros bicornis; 54), which contribute to the maintenance and circulation of the parasite. It also suggests a reduced host specificity of the parasite. Theileria equi genotype A has been documented to infect a wider spectrum of hosts, viz., horses, camels, and dogs (19). For T. equi, natural recovery is not possible and life-long asymptomatic carriers are seen, as opposed to B. caballi infection (55). Therefore, a change in the number of genotypes can be expected with an increase in new submissions of T. equi sequences in the nucleotide databases from countries/geographical locations where it has not been reported thus far (21). Furthermore, it is also documented that the endemicity of T. equi infection is implicated by only one genotype in combination with (or without) other introduced genotypes in a geographical location (15).
Genotype C demonstrated a comparatively higher genetic diversity (91.0-100% identity) than the remaining genotypes. The genotypic variations in the 18S rRNA gene of T. equi seem to be due to an increased genetic divergence over a protracted period of time (12, 15, 17). In addition, the appearance of single nucleotide polymorphisms (SNPs) in the genome, genetic recombination in the tick vectors, and co-infection of ticks and various hosts with two or more genotypes of T. equi within the same population may also contribute (12). A parallel genetic divergence resulting in generation of similar sequences and/or variations at places other than the site of origin has also been postulated (15). The increased dispersal and gene flow due to migration of various hosts and tick vectors from one geographical area to another cannot be neglected, as it results in progressive spread of genetic variations to different locations. The genetic diversity observed in the V4 hypervariable region of this gene of T. equi is in agreement with the earlier reports from Africa (12, 15), Asia (19-21, 25), Europe (10, 14, 17, 26), North America (16), and South America (13, 28). It is pertinent to note that the complete V4 region was not significantly variable; instead, a portion containing 41 molecular signatures between nucleotide positions 113-183 was highly variable between genotypes. This portion can be targeted for designing primers/probes for the development of genotype-specific conventional and real-time polymerase chain reaction (PCR) assays. The various implications, viz., taxonomy, virulence, immunological cross-reactivity, infection persistence, diagnosis, and transmission dynamics of the genotypes described here are yet to be studied, particularly in experimental infections, and warrant future research.

The clustering of T. equi clades and the length of branches possibly indicate the existence of one or even more new/cryptic species. However, to confirm this proposition, additional data based on other potential molecular markers need to be generated in addition to a better knowledge of the vectors involved in the transmission (21). Recently, T. equi genotype C has been reported to represent a novel species, T. haneyi, on the basis of whole genome sequence, and several cryptic Theileria species have been collectively classified as T. equi (50).
As the 18S rRNA gene sequences originating from different geographical locations were found to group together in the phylogenetic analysis, it can be envisaged that the various genotypes of T. equi are not geographically delimited. However, it is reported that genotypes A and B are more prevalent in symptomatic and asymptomatic animals, respectively (14, 16).
Widespread geographical distribution of T. equi genotypes was demonstrated worldwide, which was in line with the results of other authors (14, 17, 20, 26, 28). It could be due to movement of tick vectors, and of equines for trade and equestrian competitions. Besides, the unrestricted cross-border movement of wild animals cannot be neglected. In spite of that, the underlying reasons for the most restricted geographical distribution of T. equi genotype D amongst all the genotypes are difficult to assert at this moment. An adequate comparison of the genetic variations between sequences from different countries was carried out in the current study as it involved all the sequences of the V4 hypervariable region available in the GenBank™. Considering the sequence heterogeneity within the V4 hypervariable region of T. equi genotypes, the probe-based hybridization/diagnostic techniques should cover the local as well as international strains (including all the four genotypes).
It is important to identify any association between parasite genotypes and tick species, if present. The relationship between genotypes, symptomatology and serological cross-reactivity, as established for canine babesiosis (36), needs to be explored for equine piroplasmosis. For formulating the preventive and control strategies for T. equi infection in different countries, the presence and distribution of various genotypes need to be kept in mind. Moreover, studies focusing on the possible clinical impact, and on concomitant detection of the different T. equi genotypes in a single animal and/or tick vector, whose existence cannot be discounted, are necessary. Such investigations would require cloning and sequencing of the PCR amplicons, or additional molecular testing using genotype-specific primers and/or probes.

Conclusion
The present study indicated the presence of only four previously described T. equi genotypes (A, B, C and D) based on the V4 hypervariable region of the 18S rRNA gene. It did not support the independent existence of previously identified genotypes B (10) and E (19). Instead, both these genotypes collectively represent T. equi genotype B. The presently identified genetic diversity provides novel insights to the clinicians, researchers, government officials (policy makers) and animal owners for the control of equine piroplasmosis caused by T. equi in domestic and wild animals.
Animal Sciences for providing the necessary facilities to carry out the research.

FIGURE 2
FIGURE 2 (A) Cladogram depicting clear distinction between four genotypes (A-D) of T. equi based on the V4 hypervariable region of the nuclear 18S rRNA gene. The taxon name of each sequence is depicted by its accession number followed by the country of origin. (B) The compressed tree depicting the phylogenetic relationship between four genotypes (A-D) of T. equi. It is evident that all genotypes have probably originated from clade B. The color coding of different clades is as below: Genotype A-Green font color with green filled square as taxon marker; Genotype B-Red font color with red filled circles as taxon markers; Genotype C-Pink font color with pink filled triangles as taxon markers; Genotype D-Blue font color with blue filled rhombi as taxon markers; Outgroup-Purple font color with purple filled inverted triangle as taxon marker.

FIGURE 3
FIGURE 3 Multiple sequence alignment of the consensus sequences of the V4 hypervariable region of the 18S rRNA gene of T. equi genotypes A, B, C, and D exhibited significant variations between nucleotide positions 113-183 (shaded yellow in red box). A total of 41 molecular signature residues (marked *) were identified in the V4 hypervariable region.

FIGURE 5
FIGURE 5 Sequence variations detected in the V4 hypervariable region of the 18S rRNA gene of T. equi genotype C upon multiple sequence alignment. The identical sequences were removed from the analysis. The alignment report of this genotype exhibited single nucleotide substitution (marked * and shaded yellow in red box) and deletion (marked # and shaded green in blue box) at 24 and nine places (125, 156, 157, 167, 168, 169, 170, 171, and 172), respectively.
(A/B/C/D), two (A + B/A + C/A + D), three (A + B + C/A + B + D/A + C + D) and four (A + B + C + D) genotypes have been reported from 23, 11, seven, and two countries, respectively. China and South Africa are thus far the only countries harboring all the four genotypes of T. equi (Table
thus, giving rise to the contemporary five clades (A-E) of T. equi (20-27). The present study exhibited marked nucleotide variations in the V4 hypervariable region of the 18S rRNA gene, with the presence of four significantly different T. equi genotypes (A, B, C and D), in agreement with the results of some of the previous researchers (15-18). Although the majority of sequences (52.85%) of T. equi in our dataset accorded to genotype B, our analysis did not firmly support the separate existence of the clades B and E of five genotype classification, in concordance with the key results of Hall et al. (16), and contrary to the findings of Qablan et al. (19). Instead, it indicated integration of both of the former genotypes to generate a distinct clade B of the four clade system, in consonance with the findings of Veronesi et al. (17), Alanazi et al. (48), and Coultous et al. (49). Our results indicated that the genotype B of Nagore et al. (10) and genotype E of Qablan et al. (

FIGURE 6
FIGURE 6 Sequence variations detected in the V4 hypervariable region of the 18S rRNA gene of T. equi genotype D upon multiple sequence alignment. The identical sequences were removed from the analysis. The alignment report of this genotype exhibited single nucleotide substitution (marked * and shaded yellow in red box) and deletion (marked # and shaded green in blue box) at 19 and two places (131 and 132), respectively.
United States of America (USA): A and C

TABLE 1
Geographical distribution of T. equi genotypes, along with the clade-wise and country-wise distributions of the partial 18S rRNA gene sequences (n = 736) involved in the analysis.

TABLE 2
A breakdown of the percent nucleotide identity of T. equi genotypes based on the V4 hypervariable region of the 18S rRNA gene.

TABLE 4
Country-wise breakdown of the T. equi genotypes based on the V4 hypervariable region of the 18S rRNA gene.