Comparative Genomics Analysis of Ciliates Provides Insights on the Evolutionary History Within “Nassophorea–Synhymenia–Phyllopharyngea” Assemblage

Ciliated protists (ciliates) are widely used for investigating evolution, mostly due to their successful radiation after their early evolutionary branching. In this study, we employed high-throughput sequencing technology to reveal the phylogenetic position of Synhymenia, as well as two classes Nassophorea and Phyllopharyngea, which have been a long-standing puzzle in the field of ciliate systematics and evolution. We obtained genomic and transcriptomic data from single cells of one synhymenian (Chilodontopsis depressa) and six other species of phyllopharyngeans (Chilodochona sp., Dysteria derouxi, Hartmannula sinica, Trithigmostoma cucullulus, Trochilia petrani, and Trochilia sp.). Phylogenomic analysis based on 157 orthologous genes comprising 173,835 amino acid residues revealed the affiliation of C. depressa within the class Phyllopharyngea, and the monophyly of Nassophorea, which strongly support the assignment of Synhymenia as a subclass within the class Phyllopharyngea. Comparative genomic analyses further revealed that C. depressa shares more orthologous genes with the class Nassophorea than with Phyllopharyngea, and the stop codon usage in C. depressa resembles that of Phyllopharyngea. Functional enrichment analysis demonstrated that biological pathways in C. depressa are more similar to Phyllopharyngea than Nassophorea. These results suggest that genomic and transcriptomic data can be used to provide insights into the evolutionary relationships within the “Nassophorea–Synhymenia–Phyllopharyngea” assemblage.


INTRODUCTION
Members of the order Synhymeniida are characterized by a cytopharyngeal basket of well-developed nematodesmata bound proximally by the presence of a band of somatic dikinetid cilia (Kivimaki et al., 2009; Figure 1). Corliss assigned the Synhymeniida to the subclass Hypostomata and placed the nassulids and microthoracids in a separate order (Nassulida) within the same subclass (Corliss, 1979; Table 1). Subsequently, Synhymeniida was assigned to the class Nassophorea because it possesses a cytopharyngeal basket (nasse), a character shared by the typical nassophorean groups (nassulids and microthoracids) (Small and Lynn, 1985;Lynn and Small, 2002;Lynn, 2008; Table 1). However, phylogenetic studies based mainly on the small subunit ribosomal DNA (SSU rDNA) gene and other related genes (e.g., LSU rDNA, ITS, alpha-tubulin), as well as a comparison of ultrastructural synapomorphies and autapomorphies, suggested that the subclass Synhymenia should be moved from the class Nassophorea to Phyllopharyngea (Gong et al., 2009;Zhang et al., 2014;Gao et al., 2016; Table 1). Based on a study of the ultrastructure of the typical synhymenian species Zosterodasys agamalievi Kivimaki et al. (1997) concluded that the synhymenians cannot be assigned to either Phyllopharyngea or Nassophorea, because they have a combination of characters that precludes their inclusion in either. A recent study used a phylogenomic approach to test evolutionary relationships within the class Nassophorea, however, the dataset did not represent a broad sampling of all major ciliate clades (Lynn et al., 2018). Genomes of typical synhymeniids, for example, have yet to be reported (Lynn and Small, 2002).
The subclasses Cyrtophoria, Rhynchodia, Chonotrichia, and Suctoria each constitutes a clade within the monophyletic class Phyllopharyngea (de Puytorac, 1994;Lynn and Small, 1997;Lynn, 2008; Table 1). The chonotrichians, which are sessile symbionts typically found on the mouthparts of crustaceans, have long been recognized as phyllopharyngeans (Corliss, 1979;Lynn, 2016). Previous studies have revealed strong similarities between the cortical microtubular components of chonotrichians (represented by Chilodochona) with the cortical components of free-living cyrtophorians (Dobrzaprnska-Kaczanowska, 1963;Grain and Batisse, 1974;Fahrni, 1982). However, molecular phylogenies for chonotrichians were lacking (Gao et al., 2012(Gao et al., , 2014. Cyrtophorians are characterized by a glabrous dorsal side and prominent cytopharyngeal basket composed of a large number of microtubules (Lynn, 2008). Although several cyrtophorians have been studied recently (Chen et al., 2016;Pan et al., 2016;Zhang et al., 2018), the systematics of this group remains ambiguous. Therefore, the class FIGURE 1 | Hypothetical evolution of ciliated protozoa based on both morphological and molecular data to show the relationships and the positions of the taxa at class level. Synhymenia, Phyllopharyngea, and Nassophorea are highlighted. This figure was modified from Gao et al. (2016). Phyllopharyngea needs to be re-evaluated based on more data. In the present study, genomic and transcriptomic data were used to analyze the evolutionary relationships among the synhymenians, nassophoreans, and phyllopharyngeans.
For the three nassophorean species, Nassula variabilis, Furgasonia blochmanni, and Pseudomicrothorax dubius, RNA sequences were obtained from the published dataset (NCBI accession number PRJNA434361). The adapters of reads were trimmed by Trimmomatic v0.36 (parameters as described above), and the clean reads were used for transcriptome assembly by Trinity v2.6.6 (default parameters) (Grabherr et al., 2011). Any redundancy and contamination of transcriptome assemblies were removed by BLAST and CD-HIT, as described above.
RNA-seq transcriptome data for another 21 ciliate species were downloaded from the Marine Microbial Eukaryote Transcriptome Sequencing Project 1 plus six ciliate species whose high-quality data were acquired from public databases (data sources are summarized in Supplementary Table S1).

Intron Length Distribution and Stop Codon Preference Detection
RNA-seq data of C. depressa were mapped to the corresponding reference genome assembly by HISAT2 v2.1.0 (Kim et al., 2015) and the transcriptome was assembled by StringTie v1.3.4d (Pertea et al., 2016). The size distribution of introns was analyzed by custom Perl scripts and the intronic boundary motif was visualized by WebLogo3 (Crooks et al., 2004). The frequency of stop codon usage was estimated by a custom Perl script which recognized the three nucleotides following the codon of the last amino acid in each matched complete protein sequences. This is identified as a potential stop codon by BLAST (Evalue ≤ 1e−10, identity ≥ 30%, alignment length ≥ 50 amino acids) transcript sequences of investigated species against ciliate protein database in Uniprot 2 (detail of method was uploaded to the GitHub repository: Bryan0425/Stop_codon_usage). Among different groups of ciliates, Student's t-test was performed in inner-class and outer-class.

Datasets and Alignments
Newly characterized sequences that combined relevant sequences obtained from the GenBank database were assembled into two datasets: (1) phylogenomics data (59 taxa in total), i.e., the 2 newly sequenced trascriptomes; 5 newly sequenced genomes; and 39 other ciliates, plus 9 apicomplexans and 4 dinoflagellates as outgroups; (2) SSU-rDNA database (56 taxa in total), i.e., 5 newly sequenced genes plus 3 heterotrichs and karyorelictids as outgroups. Sequences information and GenBank accession numbers are shown in Supplementary Table S2.

Phylogenetic and Phylogenomic Analyses
Small subunit ribosomal DNA sequences were multiple-aligned and trimmed to be blunt using BioEdit 7.1.3.0 (ClustalW algorithm) with the default parameters. Maximum-likelihood (ML) and Bayesian inference (BI) analyses were carried out on the online server CIPRES Science Gateway (Miller et al., 2010;Penn et al., 2010), using RAxML-HPC2 on XSEDE v8.2.10 (Stamatakis, 2014) with GTRGAMMA model and MrBayes on XSEDE v3.2.6 (Ronquist et al., 2012) with GAMMA model calculated by MrModeltest v2.3 (Nylander, 2004), respectively. In the ML analyses, 1,000 bootstraps were performed to assess the reliability of internal branches. For Bayesian analyses, Markov chain Monte Carlo (MCMC) simulations were run with two sets of four chains for 4,000,000 generations, with sampling every 100 generations and a burn-in of 10,000 (Chen et al., 2015). All the remaining trees were used to calculate posterior probabilities (PP) using a majority rule consensus. MEGA v7.0.20 (Kumar et al., 2016) was used to visualize the tree topologies.

Morphology of Synhymenians, Nassophoreans, and Phyllopharyngeans
The main morphological characteristics of the synhymenians, nassophoreans, and phyllopharyngeans are summarized in Figure 2A. Their divergence/convergence has been described in detail and discussed previously (Hausmann and Peck, 1978;Eisler, 1988;Chen et al., 2018a). Briefly, synhymenians and nassophoreans have a synhymenium, which is a kinetic structure below the oral apparatus, mainly on the ventral surface. In synhymenians the synhymenium extends from left to right on the ventral surface, but in Nassophorea it is restricted to the left ventral surface. This serves as the main characteristic to distinguish these two groups. With the exception of the synhymenians, this structure is absent in the Phyllopharyngea.

Genomic Features of Classes Phyllopharyngea and Nassophorea
Although the newly sequenced ciliates have differing genome sizes and GC contents, most genome assemblies of phyllopharyngeans are about 40 M ( Table 2).
Chilodontopsis depressa has small, AT-rich introns, most of which are 36 bp ( Figure 2B) with a GT-AG motif and a conservative third A site ( Figure 2C). Also, of the non-coding RNA types predicted by RepeatMasker, 55% are tRNAs (171), 21% are snRNAs (65), 13% are rRNAs (40), and 11% are other RNAs (34) (Figure 2D). The predicted genes were annotated according to cellular component, molecular function, and biological process ( Figure 2E and Supplementary Figure S2). Compared to Tetrahymena thermophila, C. depressa has a high gene content in its binding and protein-binding pathways, which might be associated with the molecular function of its well-developed cytopharyngeal basket of microtubules and its algivorous mode of nutrition (Tucker, 1968;Weisenberg et al., 1968;Chaaban and Brouhard, 2017). Orthologous genes were identified among the seven species sequenced in the current work and other previously sequenced ciliates (Figure 3 and Supplementary Figure S1). 703 orthologous genes shared by Chilodontopsis depressa and four model ciliates were identified ( Figure 3A). C. depressa shared fewer orthologous genes with phyllopharyngeans (550) than with nassophoreans (1141), which is consistent with its closer morphological resemblance to the latter, e.g., the presence (vs. absence) of a synhymenium (Figures 2A, 3B,C). Moreover, the GO term enrichment results suggested that C. depressa shared more metabolic features with phyllopharyngeans than with nassophoreans ( Figure 4A and Supplementary Figure S2), such as regulation of cellular processes, regulation of biological processes, cellular response to stimulus, and biological regulation pathways. Such features tend to be enriched in nassophoreans not phyllopharyngeans.
Based on the homolog search between transcripts of C. depressa and protein sequences of other ciliates, the most frequently used stop codon in C. depressa was identified as TAA (91.3%, Figure 4B). Compared with model ciliates such as T. thermophila and Oxytricha trifallax, stop codon usages of other newly sequenced phyllopharyngeans and nassophoreans were also predicted. The results showed that the stop codon usage of C. depressa was closer to Chilodonella uncinata and D. derouxi (both of which belong to Phyllopharyngea) (deviation = 1.9) than to Nassophorea species (deviation = 25.6). The preference of stop codon usage was well-conserved within each class (pvalue = 0.01538 < 0.05 by t-test).

Phylogenetic and Phylogenomic Analyses
The SSU rDNA dataset provided the most abundant taxon sampling. We carried out phylogenetic analyses based on 56 SSU-rDNA sequences (Figure 5). In the resulting trees, C. depressa grouped with the typical synhymenians, albeit in the basal position. The subclass Synhymenia grouped with the class Phyllopharyngea (or "Subkinetalia, " a name coined for the class Phyllopharyngea with synhymenians excluded), with high support values (ML/BI, 93/1.00). The monophyly of Subkinetalia is fully supported. The class Nassophorea is paraphyletic in the SSU rDNA tree, with the three orders (Microthoracida, Nassulida, and Discotrichida) separating to various positions within the CONthreeP assemblage (a robust clade comprising the classes Colpodea, Oligohymenophorea, Nassophorea, Prostomatea, Plagiopylea, and Phyllopharyngea) (Lynn, 2008;Adl et al., 2019).
Seven new genomes and two new transcriptomes were generated in the present study. The "omics" data source of orthologous genes for phylogenomic analyses has thus been increased to 46 ciliates, although C. depressa remains the only representative synhymenian (Figure 6). By using GPSit with relaxed mode (Chen et al., 2018b), we established a database with 157 genes comprising 173,835 amino acid residues. In the resulting phylogenomic tree using the ML algorithm, C. depressa clustered with the Subkinetalia with moderate support (ML, 78). The monophyly of the Subkinetalia was strongly supported by both ML and BI analyses. The branching pattern within Subkinetalia is in accordance to the SSU rDNA phylogeny inferences (Figure 5), although present taxon sampling is restricted to the subclass Cyrtophoria. For the class Nassophorea, transcriptome data are available for three species representing two orders, i.e., Nassulida and Microthoracida. The three species grouped together with a  Jankowski (1973). 1, Synhymenium restricted on left ventral sometimes dorsal surface; 1 , synhymenium extends from left to right ventral surface; 1 , synhymenium absent; 2, synhymenium usually comprises three adoral polykinetids; 2 , synhymenium comprises more than three adoral polykinetids; 3, free-living, merotelokinetal division, oral kinetofragments present; 3 , sessile, division by budding, oral kinetofragments absent. moderate support (ML/BI, 78/0.50). The classes Nassophorea and Colpodea formed a moderately supported clade in the ML tree, which subsequently grouped with the class Oligohymenophorea. The class Phyllopharyngea was sister to the Colpodea + Nassophorea + Oligohymenophorea clade according to the ML algorithm.

DISCUSSION
The Systematic Assignment of Synhymenians-Nassophoreans-Phyllopharyngeans Assemblage The systematic position of synhymenians has long puzzled taxonomists, and their assignment to the class Nassophorea (nassulids and microthoracids) or to the Subkinetalia have been controversial. Synhymenians share ultrastructural and morphological similarities with both Nassophorea and Phyllopharyngea, suggesting that they are likely a transition group between these two classes (Eisler and Bardele, 1983;Small and Lynn, 1985;Kivimaki et al., 1997;Lynn and Small, 2002). A comprehensive comparison of morphology and ultrastructure among synhymenians, Nassophorea, and the Subkinetalia has previously addressed this issue (Gong et al., 2009). The SSU rDNA phylogeny ( Figure 5) and previous studies support a close relationship between synhymenians and Subkinetalia, and thus synhymenians were suggested as a subclass (Synhymenia) within the class Phyllopharyngea (Zhang et al., 2014;Gao et al., 2016Gao et al., , 2017Liao et al., 2018). In this study we provide genomic data for a synhymenian, i.e., Chilodontopsis depressa, for the first time. C. depressa occupied the basal position within the subclass Synhymenia (Figure 5), indicating that this species might be an ideal representative to demonstrate the ancestral candidate of the synhymenians. The phylogenomic analysis moderately supported (ML, 78) the affiliation of C. depressa to Subkinetalia, while the nassophoreans (represented by nassulids and microthoracids) formed another moderately supported monophyletic group (Figure 6). This is consistent with findings of previous studies based on SSU rDNA and multi-gene data which inferred that synhymenians are more closely related to Phyllopharyngea than to Nassophorea (Gong et al., 2009;Gao et al., 2016).
It is noteworthy that bootstrap support values for grouping C. depressa with Subkinetalia are lower in the phylogenomic analysis than those based on SSU rDNA data (ML, 78 vs. ML/BI, 93/1.00), whereas support for the monophyly of Subkinetalia are high in both analyses. These findings imply that, on a genomic scale, synhymenians and Subkinetalia are more divergent than indicated by the SSU rDNA gene (Figures 5, 6). This is consistent with the obvious ultrastructural differences between these two groups: in synhymenians, the postciliary microtubules of each kinetosome form a double row (vs. triads in cyrtophorians), and the synhymenian monokinetids lack both the subkinetal ribbons of microtubules and the dense transverse fibrils that characterize the phyllopharyngeans (Kivimaki et al., 1997). Moreover, in Subkinetalia, there are some extremely specialized Subkinetalian groups, such as the suctorians and chonotrichians (Figure 2A), the adults of which are sessile and divide by budding. Collectively, these data indicate that there was a long history, and that a number of evolutionary events occurred, after the synhymenians and Subkinetalia diverged from their common ancestor.
The phylogenomic tree recovered only low to moderate support for the monophyly of Nassophorea (i.e., orders Nassulida and Microthoracida, with Discotrichida excluded). This is in accordance to a recent phylogenomic study of the nassulids (Lynn et al., 2018) and multi-gene analyses of the major ciliate lineages . The cytopharyngeal basket of nassulids is composed of one or more out of three sets of microtubular lamellae, namely cytostomal (Z), subcytostomal (Y), and nematodesmal (X) lamellae (Hausmann and Peck, 1978;de Puytorac and Njiné, 1980;Eisler, 1988). Microthoracids (e.g., Pseudomicrothorax and Leptopharynx) have only X lamella (Hausmann and Peck, 1978;de Puytorac and Njiné, 1980;Eisler, 1988), synhymenians have only Z lamella (Kivimaki et al., 1997), and cyrtophorians have only Z and Y lamellae (Kurth and Bardele, 2001). Controversial scenarios have been proposed to deduce the evolution of the cytopharyngeal basket apparatus based on phylogenetic (Gong et al., 2009) and phylogenomic analyses (Lynn et al., 2018). Inclusion of the new genomic data for the synhymenians in the analyses results in partial support for the previous hypothesis: synhymenians and Subkinetalia are members of the class Phyllopharyngea not Nassophorea (Gong et al., 2009;Gao et al., 2016). According to a scenario of morphological evolution deduced by Lynn et al. (2018), if the present phylogenomic tree is a probable representation of the order of divergence of these groups, i.e., a cytopharyngeal basket with microtubular nematodesmata and with only Z microtubular ribbons would probably be an ancestral feature of the CONthreeP main clade. A loss of Z lamella might have happened in the microthoracids, while the X and Y lamellae in nassophoreans and phyllopharyngeans could be apomorphies acquired later to enable the rapid ingestion of different food resources (Tucker, 1968;Hausmann and Peck, 1978;Lynn et al., 2018).
Nowadays, SSU rDNA is a powerful phylogenetic marker to resolve many evolutionary relationships, although it has its limitations (Hasegawa and Hashimoto, 1993). For example, SSU rDNA sequences that are similar in nucleotide composition FIGURE 5 | Phylogenetic tree based on the SSU rDNA sequence data. The five newly sequenced species are shown in bold and with red arrows. Numbers at nodes represent bootstrap values of ML from 1000 replicates and the posterior probability of BI, respectively. Clades with a different topology between the ML and BI tree are indicated by " * ". Scale bar corresponds to 10 substitutions per 100 nucleotide positions. might have been placed incorrectly close together in phylogenetic trees (Woese et al., 1991). In addition, because of random variation of single genes, it is risky to infer the phylogeny from any single gene. Thus, it is suggested that single gene phylogenetic analyses should be corroborated by use of other phylogenetic markers (Ludwig and Klenk, 2001). Following the development of illumina sequencing, phylogenomic analysis is used as the gold standard to estimate evolutionary relationships on the "genome level" (Brown et al., 2001;Ciccarelli et al., 2006;Wu and Eisen, 2008). Phylogenomic data are usually expected to be well resolved without many limitations of small data sets, e.g., relatively less informative characters from independent loci and random noise (Delsuc et al., 2005;Jeffroy et al., 2006). Among the present analysis, 157 proteins sequences of 59 species were used to constructed the phylogenomic tree. Therefore, a more robust estimation of evolutionary relationships could be obtained. On the other hand, our results show that on a genomic scale, there are more variations indicated than those on single gene phylogenetic analysis, like SSU rDNA (Figures 5, 6). Phylogenomic analysis applies much more data from whole genome/transcriptome compared to a specific one gene. Furthermore, proteins sequences exhibit more differences and variations among species when used to as a material of analysis. Some statistical support values in the tree were relatively high, but some were low or not as high as that in single-gene phylogenetic analysis, e.g., 58/0.79 versus full value at the branch of Dysteria and Trochilia.
Higher support values might be achieved if the number of informative sites is inadequate or the key species are not included in the analysis. Furthermore, gene selection of phylogenomic analysis is an important process. The orthologs must to be considered in analysis, especially phylogenetic markers such as HSP70 and tubulin (Baldauf et al., 2000;Ludwig and Klenk, 2001).

Tiny Introns Characterize Ciliate Genomes as Class Level
All eukaryotic genomes that have been sequenced harbor introns, but so far none have been identified in prokaryotes. Intron size varies in genomes of different species, e.g., from 169 bp in the model flowering plant Arabidopsis thaliana to 7,386 kb on average in humans (Jo and Choi, 2015;Piovesan et al., 2015). Previous studies of human genomes have suggested that intron size may play an important role in governing alternative splicing and sensing self/foreign circular RNAs (Fox-Walsh et al., 2005;Dewey et al., 2006;Kim et al., 2006;Chen et al., 2017). Furthermore, the minimum length of introns required for the splicing reaction and other advanced biological functions is thought to be 30 bp (Roy et al., 2008;Piovesan et al., 2015). However, smaller introns (15 bp) have been reported in heterotrich ciliates (e.g., Stentor coeruleus and Condylostoma magnum) (Slabodnick et al., 2017). Tiny introns have also been identified in the classes Armophorea (Nyctotherus ovalis: 27 bp) (McGrath et al., 2007;Ricard et al., 2008), Spirotrichea (O. trifallax: 38 bp) (Swart et al., 2013), and Oligohymenophorea (T. thermophila: 74 bp and Paramecium species: 26 bp) (Swart et al., 2013;Arnaiz et al., 2017). The present study reports for the first time tiny introns (36 bp) in the genome of species of the class Phyllopharyngea (Figures 2A,B).
Thus, based on the present and previous studies, tiny introns are another prevalent important feature of ciliate genomes just like the variation of rDNA copy numbers (Wang et al., 2019). Whether the small size of these introns impair their biological functions, governing alternative splicing for instance, requires further exploration.

Stop Codon Usage in Different Groups of Ciliates
Stop codon preference in the terminal region has been intensively investigated by evolutionary biologists (Alff-Steinberger and Epstein, 1994;Smith and Smith, 1996;Jungreis et al., 2011). Previous studies have reported the flexibility of the nuclear genetic code in ciliates by demonstrating that standard stop codons are reassigned to amino acids (Lozupone et al., 2001;Heaphy et al., 2016;Swart et al., 2016). One possible reason is that ambiguous genetic codes enabled the ancestors of ciliates to thrive during a certain periods in their evolutionary history (Swart et al., 2016). To determine how the stop codon rearrangement is conserved in different lineages, we systematically assessed the preference of stop codon usage in 33 ciliates based on their transcriptome data ( Figure 4B). Our results show that stop codon usage varies among different classes and, in some cases, within a class. For example, the stop codon usage of C. depressa (formerly classified in the class Nassophorea, order Synhymeniida) is more similar to that of C. uncinata (class Phyllopharyngea, order Chlamydodontida) and D. derouxi (class Phyllopharyngea, order Dysteriida) than to nassophoreans (e.g., N. variabilis, P. dubius, and F. blochmanni), suggesting Synhymeniida and other phyllopharyngeans may have a common recent ancestor. The close relationship between Synhymeniida and Phyllopharyngea is also supported by the results of phylogenomic analysis (Figure 6). We therefore posit that the preference of stop codon usage can be adopted as an extra parameter to resolve phylogenetic relationships. However, how the preference of stop codon usage is inherited remains poorly understood and should be addressed in future studies.

DATA AVAILABILITY STATEMENT
All Illumina sequencing datasets are deposited in the NCBI under the Accession Number PRJNA546036.

AUTHOR CONTRIBUTIONS
MM, BP, and XC conceived the study. BP and XC analyzed the data and interpreted the results using bioinformatics methods. LH, QZ, and ZQ performed the phylogenetics analysis and morphology identification and description. AW revised the writing of the manuscript. All authors contributed to the manuscript and agreed on the manuscript before review.