Comparative Mitochondrial Genomic Analysis Robustly Supported That Cat Tapeworm Hydatigera taeniaeformis (Platyhelminthes: Cestoda) Represents a Species Complex

Hydatigera taeniaeformis is one of the most common intestinal tapeworms that has a worldwide distribution. In this study, we sequenced the complete mitochondrial (mt) genome of H. taeniaeformis from the leopard cat (designated HTLC) and compared it with those of H. taeniaeformis from the cat in China (designated HTCC) and Germany (designated HTCG). The complete mt genome sequence of HTLC is 13,814 bp in size, which is 167 bp longer than that of HTCC and is 74 bp longer than that of HTCG. Across the entire mt genome (except for the two non-coding regions), the sequence difference was 3.3% between HTLC and HTCC, 12.0% between HTLC and HTCG, and 12.1% between HTCC and HTCG. The difference across both nucleotide and amino acid sequences of the 12 protein-coding genes was 4.1 and 2.3% between the HTLC and HTCC, 13.3 and 10.0% between the HTLC and HTCG, and 13.8 and 10.6% between the HTCC and HTCG, respectively. Phylogenetic analysis based on concatenated amino acid sequences of 12 protein-coding genes showed the separation of H. taeniaeformis from different hosts and geographical regions into two distinct clades. Our analysis showed that the cat tapeworm H. taeniaeformis represents a species complex. The novel mt genomic datasets provide useful markers for further studies of the taxonomy and systematics of cat tapeworm H. taeniaeformis.


HIGHLIGHTS
-The complete mitochondrial (mt) genome of Hydatigera taeniaeformis from the leopard cat in China is 13,814 bp in size. -Phylogenetic analysis showed the separation of H. taeniaeformis from different hosts and geographical regions into two distinct clades. -The molecular evidence presented here supports the hypothesis that H. taeniaeformis represents a species complex.

INTRODUCTION
Tapeworms (Platyhelminthes: Cestoda) are one of the major groups of parasitic flatworms that are passively transmitted between hosts and parasitize virtually every vertebrate species (1). The adults of tapeworms live in the digestive tract of all groups of vertebrates, whereas larvae occur in different organs and body cavities of invertebrates and vertebrates (2). Tapeworms cause neglected diseases that can be fatal and are a major impediment to socioeconomic development (3,4). Cestoda is a large animal class that is currently classified into 19 orders with ∼5,000 valid species (5). The family Taeniidae is a medically important group of tapeworms that generally consists of four valid genera, Taenia, Echinococcus, Hydatigera, and Versteria. Based on morphological features, some authors have recognized that the genus Hydatigera is valid, whereas others have treated this genus as a junior synonym of the genus Taenia. In a recent revision, the resurrection of the genus Hydatigera was proposed based on molecular evidence (6). The cat tapeworm H. taeniaeformis (formerly referred to as T. taeniaeformis), the type species of the genus Hydatigera, is the most common tapeworm of cats and rats (7). Although the zoonotic significance of this tapeworm remains unclear, it has also been recorded in humans in unusual circumstances (8). The correct identification and differentiation of cat tapeworm H. taeniaeformis have important implications for studying its epidemiology, systematics, and population genetics. Many studies have shown that several criteria (e.g., infectivity, development, morphology, and biochemistry) differ between H. taeniaeformis from different geographical origins (9)(10)(11)(12). Based on morphological analysis, H. taeniaeformis was considered a complex of three cryptic species (13). Phylogenetic analyses of the H. taeniaeformis based on sequences of nuclear 18S rRNA and mitochondrial (mt) cox1 DNA also strongly support the presence of three distinct clades (13).
The mitochondrial genome (containing 36-37 genes) has been considered a useful marker for the species identification and differentiation of many tapeworms (14)(15)(16). The complete mt genomes of H. taeniaeformis from cats in China and Germany have been sequenced and supported that they represent different species (17), but it is yet unknown whether H. taeniaeformis from leopard cats represents third cryptic species. Therefore, in this study, we (i) sequenced the full mt genome sequence of H. taeniaeformis from the leopard cat (HTLC), (ii) compared it with that of H. taeniaeformis from the cat in China (HTCC) and Germany (HTCG), and (iii) tested the hypothesis that three isolates of H. taeniaeformis are genetically distinct via phylogenetic analyses of amino acid sequence datasets.

Parasites and DNA Isolation
An adult tapeworm representing H. taeniaeformis (designated HTLC) was collected from the small intestine of a leopard cat (Prionailurus bengalensis) in Beijing Zoo, China. The tapeworm was washed in physiological saline, identified primarily based on morphological characters using existing keys to species (18), fixed in 70% (v/v) ethanol, and stored at −20 • C until use. Total genomic DNA was isolated from this tapeworm using sodium dodecyl sulfate/proteinase K treatment, followed by spin-column purification (Wizard R SV Genomic DNA Purification System, Promega). Furthermore, the specimen was also identified as H. taeniaeformis by PCR-based sequencing of 28S rDNA gene and mt cox1 gene and had 99.2 and 100% nucleotide homology with those of H. taeniaeformis from Norway rat in India and brown rat in China deposited in GenBank (GenBank accession nos. JN020350 and MF380378), respectively.

Sequencing, Assembling, and Verification
A genomic DNA library (350 bp inserts) was prepared and sequenced by Novogene Bioinformatics Technology Co. Ltd. (Tianjin, China) using the Illumina HiSeq 2500 (250 bp pairedend reads). The clean reads were obtained from raw reads by removing adaptor sequences, highly redundant sequences, and "N"-rich reads, then were assembled into contigs with Geneious v 11.1.5 (19). Preliminary cox1 sequences of H. taeniaeformis were used as initial references for assembly with the suitable parameters (minimum overlap identity = 99%, minimum overlap = 150 bp, and maximal gap size = 5 bp) (19). The assembly generated a large contig ending with overlapping fragments. As this structure allowed a single circular organization of the mt genome, we assumed that the complete mt genome had been assembled completely. The completeness of the mt genome assembly was further verified by a long PCR experiment using four pairs of primers (Supplementary Figure S1 and Supplementary Table S1), which were designed in the conserved regions. When the additive sizes of amplicons are identical to the length of the assembled contig, it proves that the assembly is correct. The long PCR reaction system consisted of 25 µl: 10.5 µl ddH 2 O, 0.5 µl upstream primer, 0.5 µl downstream primer, 12.5 µl Premix Taq TM (Takara, LA Taq TM V 2.0 plus dye), and 1 µl genomic DNA. Samples were tested in the C1000 Touch TM Thermal Cycler (BioRad, USA) under the following conditions: 94 • C for 1 min (initial denaturation), then 98 • C for 10 s (denaturation), 54-59 • C for 40 s (annealing), and 68 • C for 4 min for 35 cycles, with a final extension at 72 • C for 8 min.

Annotation, Visualization, and Sequence Analysis
Protein-coding and rRNA genes were identified by alignment to homologous genes of the previously sequenced mt genome of H. taeniaeformis (17) using the MAFFT v7.122 software (20). tRNA genes were identified using ARWEN (21) and tRNAscan-SE (22). MEGA v6.0 (23) was utilized to analyze the amino acid sequences of each protein-coding gene. The map of the mt genome of the H. taeniaeformis was figured on the proksee online server (https:// proksee.ca/). Pairwise comparisons of the complete mt genomes were made among China isolates (designated HTCC, GenBank accession number: FJ597547), Germany isolates (designated HTCG, GenBank accession number: JQ663994), and China isolates in the present study, such as lengths, identities, A + T content, and codons. The A + T content was computed using DNAStar (v. 5.0). Sequence identities were calculated by ClustalW in Geneious v 11.1.5 (19): cost matrix IUB, gap open cost of 15, and gap extend cost of 6.66.

Phylogenetic Analyses
A total of 33 mt genome sequences of tapeworms within the family Taeniidae were used for phylogenetic analysis (Table S2), using Paruterina candelabraria (GenBank accession number NC039533) as an outgroup (24). Amino acid sequences were single aligned using MAFFT 7.122. The aligned sequences were then concatenated to form a single contig. The poor blocks were excluded from the contig using Gblocks 0.91b (http://phylogeny. lirmm.fr/phylo_cgi/one_task.cgi?task_type=gblocks) using default parameters (25). Phylogenetic analyses were conducted using two methods: Bayesian inference (BI) and maximum likelihood (ML). BI analysis was performed in MrBayes 3.1.1 as described previously (26,27). ML analysis was performed with PhyML 3.1 as described previously (27, 28

Analysis of Cox1 Gene Sequence
The newly generated sequences in the present study and previously published cox1 sequences of H. taeniaeformis (n = 84) from different hosts and regions (13), were aligned using the software MAFFT 7.122. The nucleotide sequence of these sequences was included in the present analysis, using H. krepkogorski (GenBank accession number NC021142) as an outgroup (6). The phylogenetic tree was constructed with Mega v 6.0 (23) using the Neighbor-Joining (NJ) method. The Kimura 2parameter (K2P) model was the most suitable model, with 1,000 bootstrap replicates.

Genome Content and Organization
We sequenced the H. taeniaeformis genome and produced over 3 Gb of Illumina short-read sequence datasets. A total of 20,634,954 × 2 raw reads were generated and 19,826,948 × 2 clean reads were obtained for assembly of the mt genome. The complete mt genome sequence of HTLC (GenBank accession no. ON055368) was 13,814 bp in size (Figure 1) and have a high A + T content (73.2%). It contains 12 protein-coding genes (cox1-3, nad1-6, nad4L, atp6, and cytb), 22 tRNA genes, two rRNA genes (rrnS and rrnL), and two non-coding (control or AT-rich) regions (NCR), but lacks an atp8 gene (Figure 1). The mt genome arrangement of this tapeworm is the same as that of Spirometra erinaceieuropaei and Raillietina tetragona (27,30).

Annotation
All of the 12 protein-coding genes begin with ATG (cytb, atp6, nad4L, nad1-3, cox1, cox2, and nad5), ATT (cox3), or GTG (nad4 and nad6) as their start codons. Six of the 12 genes appear to use TAA (nad1, nad2, nad5, nad6, cox3, and cytb) or TAG (atp6, nad4, and nad4L) as the stop codons. Three genes (cox1, cox2, and nad3) end with incomplete stop codon T ( Table 1). The 22 tRNA genes varied from 59 to 68 bp in size ( Table 1). Secondary structures predicted for the 22 tRNA genes of HTLC (not shown) are similar to those of other tapeworms T. solium and T. asiatica (31). The rrnL gene of this mt genome is located between tRNA-Thr and tRNA-Cys, and rrnS gene is located between tRNA-Cys and cox2. The length of rrnL and rrnS genes is 953 and 724 bp, respectively. The A + T content of rrnL and rrnS is 72.8 and 71.9%, respectively. Two NCRs in the mt genome were inferred. The NCR1 (172 bp) is located between the tRNA-Leu CUN (L1) and tRNA-Ser UCN (S2), and has an A+T content of 79.7%. The NCR2 (418 bp) is located between the nad5 and tRNA-Gly and has an A+T content of 79.5%.

Comparative Mt Genomic Analyses of HTLC With HTCC and HTCG
The mt genome sequence of HTLC was 13,814 bp in length, 167 bp longer than that of HTCC, and 74 bp longer than that of HTCG. The arrangement of the mt genes (i.e., 12 protein-coding genes, 22 tRNA genes, and two rRNA genes) and two NCRs was the same. Across the entire mt genome (except for the NCR), the sequence difference was 3.3% between HTLC and HTCC, 12.0% between HTLC and HTCG, and 12.1% between HTCC and HTCG. The greatest nucleotide variation among the three isolates of H. taeniaeformis was in the nad3 gene, whereas the least difference was detected in the cox3 ( Table 2). Amino acid sequences inferred from individual mt protein genes of HTLC were compared with those of HTCC and HTCG. The difference across amino acid sequences of the 12 protein genes between the HTLC and HTCC was 2.3%, 10.0% between HTLC and HTCG, and 10.6% between HTCC and HTCG. The amino acid sequence differences among three isolates of H. taeniaeformis ranged from 0.4 to 18.3%, with COX2 being the most conserved and NAD3 the least conserved protein. For the 22 tRNAs, the sequence difference was 3.6% between HTLC and HTCC, was 5.7% between HTLC and HTCG, and was 7.5% between HTCC and HTCG. For the rrnL and rrnS genes, the sequence difference was 3.1 and 6.1% between HTLC and HTCC, 11.5 and 13.9% between HTLC and HTCG, and 10.0 and 9.4% between HTCC and HTCG, respectively ( Table 2).

Phylogenetic Analyses
The topologies of the trees using BI and ML were identical (Figure 2). In the tree, four major clades were recovered within the family Taeniidae: [(Hydatigera + Taenia) + Echinococcus] and Versteria from the monophyletic groups. Within the Hydatigera, all Hydatigera isolates clustered together with high statistical support (Bpp = 1; Bf = 100), supporting that the genus Hydatigera is valid (32). In the present study, the separation of H. taeniaeformis isolates from different hosts and countries was shown to be into two distinct clades with strong support FIGURE 1 | The organization of mitochondrial genome of Hydatigera taeniaeformis China isolate. Gene scaling is only approximate. "NCR1" refers to a short non-coding region and "NCR2" refers to a long non-coding region.
(Bpp = 1; Bf = 100; Figure 2). The H. taeniaeformis isolates from cats in Japan, and China and leopard cat in China were clustered in the same clade, while H. taeniaeformis isolates from cats in Finland and Germany were clustered in the other clade, which was sister taxa with the former (Figure 2). The differences between the two distinct clades are slightly longer (looking at branch lengths) than between T. asiatica and T. saginata (6) and similar to Echinococcus spp. (33)(34)(35) (Figure 2). In addition, phylogenetic analyses based on the mt cox1 sequences among H. taeniaeformis (n = 84) from different hosts and geographical regions revealed three distinct clades (Supplementary Figure S2), which are consistent with a previous study (13).

DISCUSSION
The broad consensus on the presence of cryptic species within cat tapeworm H. taeniaeformis (6,13,17,36), but their taxonomic status remains unknown. As described previously, in the morphological features, the variations were embodied in the number of proglottids, the mean number of rostellar hooks, the small hooks of adult worms and metacestodes, the direction of the small hooks, the length of the cirrus sac, and the mean number and location of testes (10). Moreover, in terms of molecular evidence, a previous study reported that the level of nucleotide variation in the cox1 gene within the genus Hydatigera (formerly referred to as Taenia) is about 6.3-15.6% (37). Another study showed the cox1 gene sequence of a new cryptic species H. taeniaeformis was distinct from those of other isolates (9.0-9.5%) (12). In the present study, the sequence variations (12.0 and 12.1%) detected in the whole mt genome were consistent with previous results mentioned above in part of the mt cox1 gene among H. taeniaeformis from different regions and hosts (12,13). These findings indicate that in terms of classification, our results support our proposal that the H. taeniaeformis species complex consists of at least two cryptic species. In addition, prior studies have indicated that mtDNA sequence variation between closelyrelated nematode species was typically in the range of 10-20% (12,38). Phylogenetic analyses of H. taeniaeformis provided additional evidence that H. taeniaeformis represents closely but distinct taxa. Taken together, the molecular evidence presented here supports the hypothesis that H. taeniaeformis represents a species complex (13,17). The taxonomy of H. taeniaeformis is often insufficient based on morphological features and their hosts or geographical origins; therefore, accurate identification and differentiation of H. taeniaeformis can sometimes be challenging (7,13,36). In the present study, the characterization of the mt genome of HTLC in China also stimulates a reassessment of the phylogenetic relationships of H. taeniaeformis using mt genomic/proteomic datasets. Mt genomic sequences are useful molecular markers to study the species identification, genetic structure, and phylogenetic analyses of many worms, especially for searching potential cryptic species (17,38). Therefore, in the present study, the analyses of mt genomic sequences provided insight into the phylogenetic relationships of H. taeniaeformis;

CONCLUSIONS
The present study sequenced the complete mt genome sequence of the cat tapeworm HTLC in China. The comparative mt genomic analysis provided robust genetic evidence that H. taeniaeformis represents a species complex. The novel mt genomic datasets provide useful markers for further studies of the taxonomy and systematics of cat tapeworm H. taeniaeformis.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://www.ncbi.nlm.nih. gov/genbank/, ON055368.