Sequencing and Analysis of the Complete Organellar Genomes of Prototheca wickerhamii

Of the Prototheca genus, Prototheca wickerhamii has the highest clinical significance in humans. However, neither nuclear nor organellar genomes of this species were sequenced until now. The hitherto determined and analyzed mitochondrial and plastid genomes of the alleged P. wickerhamii species belong in fact to another species, recently named Prototheca xanthoriae. This study provides a first insight into the organellar genomes of a true P. wickerhamii (type strain ATCC 16529). The P. wickerhamii mitochondrion had a 53.8-kb genome, which was considerably larger than that of Prototheca ciferrii (formerly Prototheca zopfii gen. 1) and Prototheca bovis (formerly Prototheca zopfii gen. 2), yet similarly functional, with the differences in size attributable to a higher number of introns and the presence of extra unique putative genes. The 48-kb plastid genome of P. wickerhamii, compared to autotrophic Trebouxiophyceae, was highly reduced due to the elimination of the photosynthesis-related genes. The gene content of the plastid genome of P. wickerhamii was, however, very similar to other colorless Prototheca species. Plastid genome-based phylogeny reinforced the polyphyly of the genus Prototheca, with Helicosporidium and Auxenochlorella branching within clades of Prototheca species. Phylogenetic reconstruction also confirmed the close relationship of P. wickerhamii and P. xanthoriae, which is reflected in the synteny of their organellar genomes. Interestingly, the entire set of atp genes was lost in P. wickerhamii plastid genome while being preserved in P. xanthoriae.


INTRODUCTION
The genus Prototheca comprises unicellular, nonphotosynthetic, saprophytic microalgae, usually associated with humid and organic-rich environments. These organisms are unique in the Plantae kingdom in that they have consistently been implicated in human and animal infections, collectively referred to as protothecosis (Jagielski and Lagneau, 2007). The taxonomy of the Prototheca genus has recently been revised based on phylogenetic analysis of partial cytb gene sequences. In the light of this new classification, 14 Prototheca species are proposed, split into two major lineages, comprising either human-or cattle-associated species (Jagielski et al., 2019).
The genus Prototheca along with another nonphotosynthetic genus Helicosporidium belongs to the predominantly photosynthetic clade of Trebouxiophyceae green algae. All Prototheca species evolved from photosynthetic algae that had lost their ability to photosynthesize yet retained vestigial plastids with substantially reduced genome (Suzuki et al., 2018). It has been suggested that the loss of photosynthesis happened independently three times in this lineage (Suzuki et al., 2018), and that might be reflected in the gene order and gene complement of the vestigial plastid genomes.
Prototheca wickerhamii represents the predominant etiological agent of human protothecosis, affecting both immunocompetent and immunocompromised patients (Todd et al., 2018). Clinically, the disease most frequently involves the skin and subcutaneous tissue followed by articular and disseminated manifestations. Treatment of protothecal infections is often difficult due to resistance of the algae to multiple antimicrobial agents (Lass-Flörl and Mayr, 2007;Todd et al., 2018).
In this work, we report, for the first time, the complete organellar genomes of the true P. wickerhamii species.

MATERIALS AND METHODS
The type strain of P. wickerhamii (ATCC 16529) was used in the study. Genomic DNA was extracted with a previously described protocol (Jagielski et al., 2017).
Whole genome sequencing (WGS) was performed with a combination of the Illumina MiSeq (Illumina, USA; paired-end, 2 × 300 bp) and PacBio (Pacific Biosciences, USA) platforms using the manufacturer's standard protocols.
Sequencing of the organellar P. wickerhamii DNA yielded a total of 134,966 and 9,601 reads for Illumina and PacBio respectively. This accounted for 1.1% (Illumina, USA) and 3.3% (PacBio) of the total number of reads for the entire P. wickerhamii genome.
Sequence reads were quality filtered and trimmed using FASTX toolkit (Pearson et al., 1997) and Cutadapt (Martin, 2011), respectively. The PacBio read sets were assembled de novo using wtdbg2 software (Ruan and Heng, 2019) and then corrected using Illumina data with Pilon software (Walker et al., 2014). All bioinformatics manipulations were done using the SeqMan software (DNAStar, USA) and CLCBio Genomic Workbench NGS pipeline (CLCBio, Denmark).
RNA was isolated using Total RNA kit (A&A Biotechnology, Poland) with RNase-free DNase (A&A Biotechnology, Poland) treatment step. Libraries were generated and sequenced according to the producer's protocol on a MiSeq instrument (Illumina, USA).
Plastid-based maximum likelihood phylogenomic analysis was performed using IQ-TREE v1.6.12 (Nguyen et al., 2015;Chernomor et al., 2016) with 1,000 bootstrap replicates on all 79 nonhypothetical, protein-coding genes present in the plastid genomes of the 24 investigated taxa. All genes were translated into amino acid sequences, aligned with MAFFT v7.271 (Katoh and Standley, 2013), and concatenated in Geneious 10.2.6 (Kearse et al., 2012) to produce a single alignment with total length of 37,109 amino acids, which was subsequently used as the input dataset for tree reconstruction. The sequence evolution model was selected automatically by IQ-TREE (-m TEST parameter; Kalyaanamoorthy et al., 2017) for every partition (gene), and the selected models are shown in Supplementary  Table 1.
The mitochondrial and plastid genomes of P. wickerhamii were deposited in the GenBank under MN794237 and MN794236 accession numbers, respectively.
All raw sequence data produced in this study were deposited in the NCBI Short Reads Archive (SRA) under project numbers PRJNA646401 (mitochondrion genome) and PRJNA646400 (plastid genome).

RESULTS AND DISCUSSION
The P. wickerhamii mitochondrial DNA (mtDNA), comprising 53,878 and 3,861 reads for Illumina and PacBio, respectively and a 475-fold coverage, was AT-rich, circular mapping molecule ( Figure 1A, Table 1). It was characterized with 42.4% of noncoding DNA including introns (36.1% excluding introns) and an average intergenic space of 340.6 bp. Overall, the P. wickerhamii mtDNA architecture mirrored P. xanthoriae mtDNA (Wolff et al., 1994;Severgnini et al., 2018;Jagielski et al., 2019). The P. wickerhamii mitochondrial genome was significantly larger than that of P. ciferrii and P. bovis ( Table 1). These differences might be explained by putative rearrangements and/or reduction events. Exemplarily, in P. wickerhamii and P. xanthoriae, the cox1 gene contains three (4,975 bp) and four exons (5,376 bp), respectively, whereas in P. bovis and P. ciferrii, it is a single exon gene of 1,574 bp. Noteworthy, in the former two species, the cox1 gene contains two additional genes (in P. wickerhamii designated as DBVPGmt_008 and DBVPGmt_009), which are missing in the mitochondrial genomes of P. bovis and P. ciferrii ( Figure 1A).
The mitochondrion of P. wickerhamii encoded 38 proteins, a number that almost equaled that of A. protothecoides (39), yet being higher than in other analyzed species, where it ranged from 32 (C. variabilis) to 37 (Helicosporidium sp.) ( Table 1).
Mitochondria and plastids originated from a primary endosymbiotic event, yet the subsequent evolution of the two organelles differ. Whereas mitochondria have evolved in a vertical inheritance, plastid evolution has involved both vertical and horizontal spread (Archibald, 2015;Martin et al., 2015;de Vries and Archibald, 2018).
A standard set of 32 mitochondrial protein-coding genes was present in P. wickerhamii, namely ribosomal proteins, apocytochrome b, subunits of the ATPase, cytochrome oxidase, NADH dehydrogenase complexes, and Twin-arginine translocation protein. Almost all of them were found among the analyzed species (Supplementary Table 2). One exception was atp8 coding for ATP synthase F0 subunit 8, which was demonstrated in all algal species but P. xanthoriae (Supplementary Table 2). Furthermore, the rpl10 gene coding for a ribosomal protein was found only in P. ciferrii and P. bovis. Transfer of the atp8 gene from the mitochondrial genome to the nuclear genome had already been reported in various eukaryotic lineages, including ciliates (Burger et al., 2000), apicomplexans, dinoflagellates (Slamovits et al., 2007), and Chlorophyceae (Martıńez-Alberola et al., 2019). The rpl10 gene was lost several times in Chlorophytes, including Chlorophyceae, Ulvophyceae, perhaps Prasinophyceae, and some Trebouxiophyceae (Martıńez-Alberola et al., 2019). Whether the atp8 and rpl10 were lost entirely or transferred to the protothecal nuclear genomes remains to be answered. Overall, the mitochondrial gene content appears to be highly conserved among the analyzed species.
A total of five introns in two genes (2/67; 3%) were characterized in P. wickerhamii mtDNA, with a total length of 7,016 bp ( Table 1). Those introns split the cox1 and rrn23, into three and four exons, respectively. A more complex intron structure, with up to seven introns and the total length reaching 8,200 bp was observed in P. xanthoriae, C. variabilis, A. protothecoides, and Helicosporidium sp. Interestingly, P. bovis showed only a single intron in the rrn23 gene (Table 1). In P.

A B
FIGURE 1 | P. wickerhamii mitochondrion (A) and plastid (B) circular plot. Genes (filled boxes) located outside/inside the map are transcribed clockwise/ counterclockwise. tRNA genes are indicated by the "trn" followed by one-letter amino acid code and anticodon given behind the dash. Innermost circle represents GC content. wickerhamii, four introns were annotated either as LAGLIDADG endonuclease (DBVPGmt_004) or had LAGLIDADG motifs (DBVPGmt_001, DBVPGmt_002, DBVPGmt_009), which indicates a putative endonuclease function for the protein (Pombert and Keeling, 2010). LAGLIDADG motifs, commonly found in group I introns (Haugen and Bhattacharya, 2004), had previously been described in P. bovis and P. xanthoriae, but not in P. ciferrii (Supplementary Table 2). The P. wickerhamii plastid DNA (ptDNA), comprised 81,088 and 5,740 reads for Illumina and PacBio, respectively, had 730-fold coverage ( Figure 1B, Table 1). It was similar in size to the plastid of P. stagnora and almost double in size compared to plastids of P. ciferrii and P. bovis. Structurally, the P. wickerhamii plastid was compact, with about 25% of noncoding DNA including introns (23.9% excluding introns) and an average intergenic space of 171.2 bp.
The P. wickerhamii ptDNA was predicted to contain 35 protein-coding genes, which was somewhat lower compared with P. xanthoriae and P. cutis (40), yet clearly higher as compared with P. ciferrii and P. bovis (19). More than twice as much proteins as in P. wickerhamii were encoded by the plastid genomes of the two photosynthetic species ( Table 1).
The so far sequenced plastid genomes of colorless Prototheca spp. were shown to be highly reduced due to the elimination of photosynthesis-related genes. Moreover, comparative analyses of the ptDNAs revealed that the gene content for plastid functions was highly conserved among these nonphotosynthetic lineages (Severgnini et al., 2018;Suzuki et al., 2018;Maciszewski and Karnkowska, 2019;Zeng et al., 2019;). The plastid genomes of Prototheca spp. lacked cytochrome complex, photosystem I and II proteins, and genes involved in chlorophyll biosynthesis when compared with their photosynthetic relatives, i.e. A. protothecoides and C. variabilis. The gene content differed also between the Table 3). Only P. wickerhamii and P. xanthoriae retained all large ribosomal subunits. Only P. wickerhamii, P. xanthoriae, and P. cutis preserved all small ribosomal subunits, two protein quality control genes-ftsH and clpP, two translation mediating genes-tufA and infA, and cell division gene-minD. Algae that contain minD in their plastid genome are exclusively monoplastidic (de Vries and Gould, 2018). However, the number of plastids in P. wickerhamii remains unknown (Nadakavukaren and McCracken, 1977;Kiyohara et al., 2006). P. wickerhamii, P. xanthoriae, P. cutis, and P. stagnora retained also all rpo subunits, which are absent from the plastid genomes of P. ciferrii and P. bovis (Supplementary Table 3).

Prototheca species (Supplementary
As it was previously hypothesized, changes in ycf1 occur concomitantly with changes in FtsH (de Vries et al., 2017). Unexpectedly, P. wickerhamii did encode FtsH but not ycf1 (Supplementary Table 3). Although FtsH is a plastid-encoded component of the photosystem II maintenance machinery, ycf1 function is still debatable.
Interestingly, P. wickerhamii, P. stagnora, P. bovis, and P. ciferrii, in contrast to P. xanthoriae and P. cutis, lacked six genes for the ATP synthase subunits, typically involved in the photosynthesis (Supplementary Table 3). Genes of the ATP synthase/hydrolysis complex were also detected in nonphotosynthetic plastids of a diatom Nitzschia despite the lack of genes for photosynthesis, carbon fixation, and chlorophyll production. It has been hypothesized that ATP synthase subunits present in Nitzschia may produce a proton gradient between the thylakoids and stroma, which is involved in protein translocation (Kamikawa et al., 2015). Moreover, reconstruction of plastid metabolism of this diatom suggested that the ATP synthase complex might function to regulate activities of plastid proteins involved in amino acid biosynthesis, reductive pentose phosphate pathway, and glycolysis/gluconeogenesis by pumping protons between the stroma and thylakoid lumen (Kamikawa et al., 2017). The function of the ATP synthase in Prototheca spp. might be similar. However, it appeared not indispensable in this lineage since most of the known Prototheca ptDNAs completely lacked all genes required for the plastid ATP synthase. In the absence of ATP synthase, some ATP might be imported from the cytosol by plastid ATP transporters as it was shown in diatoms (Ast et al., 2009). Still, whether or not that is the case in Prototheca cannot be answered without plastid proteome reconstruction. At this place, it is worth to note that P. wickerhamii was demonstrated, upon electron microscopy studies, to contain double-membraned plastids with starch grains and rudimentary lamellar-like structures (Nadakavukaren and McCracken, 1977;Kiyohara et al., 2006). The observed differences in the gene content among Prototheca spp. may reflect an independent loss of photosynthesis in several protothecal lineages. Therefore, various lineages and species might be at a different stage of the reductive evolution of plastid functions after the loss of photosynthesis.
Plastid genome-based phylogeny resolved Prototheca as a polyphyletic genus, composed of two major clades ( Figure 2). The first clade contains two pairs of sister species: P. cutis and P. wickerhamii, and P. xanthoriae and Auxenochlorella protothecoides. The second clade contains P. stagnora, P. bovis, and P. ciferrii, with Helicosporidium sp. situated on a long branch at its base. All nodes within the aforementioned clades have absolute bootstrap support. Despite limited taxon sampling, our results are fully concordant with cytb-based single-gene phylogeny of Prototheca and their relatives by Jagielski et al., 2019. Although no plastid genome sequence is available for the Prototheca lectotype strain, which is P. zopfii ATCC 16533, the recent work of Jagielski et al., 2019 resolves the phylogenetic position of this species as sister to P. ciferrii. Therefore, the lectotype would certainly be within the second of the aforementioned clades in our phylogeny, suggesting that only this clade should be recognized as the genus Prototheca. Except A. protothecoides, the rest of the analyzed species, including Prototheca spp. and Helicosporidium sp., are secondary nonphotosynthetic Trebouxiophyceae. The tree topology suggests that photosynthesis has been lost at least three times in this lineage, first in P. xanthoriae, the second time in the common ancestor of P. wickerhamii and P. cutis, and the third time in the second clade encompassing Helicosporidium, P. stagnora, P. bovis, and P. ciferrii. Three independent losses of photosynthesis in this group are in agreement with earlier reconstructions (Suzuki et al., 2018), and are an excellent example of convergent reductive evolution, reflected in the nearly identical plastid-genome complement in all the Prototheca species.
The mtDNA gene order analysis revealed two lineages among the Prototheca spp. investigated: the first allocated P. wickerhamii and P. xanthoriae, the second contained P. ciferrii and P. bovis ( Figure 3A). Not surprisingly, highly syntenic pairs of genomes represented closely related taxa (Jagielski et al., 2019).
The ptDNA protein-coding gene order was exactly identical in the entire clade containing A. protothecoides, P. wickerhamii, P. cutis, and P. xanthoriae-a sole rearrangement was observed in P. cutis, where a small block of three tRNA-coding genes (trnG, trnH, and trnL) was inverted. An overall similar, yet definitely separate ptDNA synteny type was observed in the other clade of Prototheca, composed of P. stagnora, P. bovis, and P. ciferrii ( Figure 3B). In this group, four locally collinear ptDNA blocks are translocated in comparison to the first Prototheca clade, with a fifth (tilS-rps4) block being additionally inverted, which is a unique case of protein-coding gene inversion in protothecan ptDNA since their diversification from the last common ancestor of the entire genus.
The presence of two Prototheca lineages, evidenced by pt-and mtDNA structure raises a question, if the human-and cattleassociated clades, represented by P. wickerhamii and P. bovis,  respectively, have acquired pathogenic features independently and in parallel rather than from a common ancestor. The WGS data of Prototheca spp. will give a better understanding of the pathobiology and evolution of this genus.

CONCLUSIONS
In conclusion, this study provides a first, brief insight into the organellar genomes of P. wickerhamii. The mtDNA of P. wickerhamii preserved its functionality similar to other related organisms, with its size extension, mostly due to a higher number of introns (five in both P. wickerhamii and P. xanthoriae), as well as some unique putative genes unseen in other species (P. bovis and P. ciferrii). Compact and simplified structure was observed in the P. wickerhamii plastid genome, driven by the lack of photosynthesis-related genes. The architecture of the P. wickerhamii mitochondrial and plastid genomes resembles more that of closely related saprophytic P. xanthoriae than of pathogenic P. ciferrii and P. bovis.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.

AUTHOR CONTRIBUTIONS
ZB performed culturing, analyzed the data, and wrote the article. JG performed DNA isolation and genome sequencing (Illumina). RG provided genome annotation. PS performed synteny analysis. JP performed genome sequencing (PacBio). KM performed phylogenetic analysis, reviewed synteny analysis and genome annotation. AG  performed RNA-seq analysis. AK performed phylogenetic analysis and reviewed the manuscript. TJ conceptualized and supervised the study, provided the funding, critical revision. All authors contributed to the article and approved the submitted version.