Definition of a High-Resolution Molecular Marker for Tracking the Genetic Diversity of the Harmful Algal Species Eucampia zodiacus Through Comparative Analysis of Mitochondrial Genomes

The cosmopolitan phytoplankton species Eucampia zodiacus is a common harmful algal bloom (HAB) species that have been found to cause HABs in essentially all coastal regions except the Polar regions. However, molecular information for this HAB species is limited with only a few molecular markers. In this project, we constructed the mitochondrial genome (mtDNA) of E. zodiacus, which was also the first mtDNA constructed for any species in the order Hemiaulales that includes 145 reported species (including two additional HAB species Cerataulina bicornis and Cerataulina pelagica). Comparative analysis of eight E. zodiacus strains revealed that they could not be distinguished using common molecular markers, suggesting that common molecular markers do not have adequate resolution for distinguishing E. zodiacus strains. However, these E. zodiacus strains could be distinguished using whole mtDNAs, suggesting the presence of different genotypes due to evolutionary divergence. Through comparative analysis of the mtDNAs of multiple E. zodiacus strains, we identified a new molecular marker ezmt1 that could adequately distinguish different E. zodiacus strains isolated in various coastal regions in China. This molecular marker ezmt1, which was ∼400 bp in size, could be applied to identify causative genotypes during E. zodiacus HABs through tracking the dynamic changes of genetic diversity of E. zodiacus in HABs.


INTRODUCTION
Harmful algal blooms (HABs) are results of rapid algal proliferation and/or aggregation of algae that can cause massive fish deaths, contamination of seafood with toxins, and/or ecological damages through the development of anoxia or habitat alteration (Gentien et al., 2003). HABs have become a global epidemic with significant economic, social, and human health consequences (Gentien et al., 2003). In recent decades, HABs have been increasing their frequency, persistence, regional coverage/spatial extent and economic impact worldwide as a result of enhanced coastal eutrophication, climate change and invasion of alien species (Sarkar, 2018). HAB species are multitudinous but hard to be identified accurately only using traditional morphological examination-based methods (Chen, 2020).
The Eucampia zodiacus Ehrenberg is a common HAB species of the genus Eucampia, family Hemiaulaceae, order Hemiaulales, class Mediophyceae, and phylum Bacillariophyta. It is 36-72 µm in width and 6-32 µm in height (Jin, 1965;Guo, 2004;Nishikawa and Imai, 2011). Under the light microscope, the alga has an "H" shape in its curved girdle view and it is elliptic in the valve view. The cells are connected by two short, blunt elevations, forming a spiral colony. The plastids are small and numerous, with a smallcake-shape (Hendey, 1964;Yang and Dong, 2006). E. zodiacus has a worldwide distribution except for the Polar regions and can be detected almost all-year round in the water column, providing considerable primary production (Horner, 2002;Ito et al., 2013;Nishikawa et al., 2013).
E. zodiacus can form dense blooms in coastal waters, which have been observed in the Tokyo Bay , Harima-Nada (Nishikawa et al., 2007), and Ariake sea (Matsubara, 2012) in Japan, Bay of Fundy (Martin et al., 2008) in Canada, Jiaozhou Bay, Haizhou Bay, Xiangshan Harbour and many other sea areas in China (Huo et al., 2001;Zhang et al., 2002;Liang, 2012). E. zodiacus blooms develop and last for a longer time because it is able to grow until the complete exhaustion of the available nutrients in the water column, and can take up as much nitrogen as other species such as Skeletonema species at low temperatures (Nishikawa et al., 2009;Ito et al., 2013). Notably, E. zodiacus blooms have been reported to cause bleaching of aquacultured nori, fisheries damage and economic losses through algal aggregations, competitive utilizing of nutrients (especially nitrogen) and resultant nutrient depletion in water columns (Martin et al., 2008;. Notably that E. zodiacus blooms displayed both spatial and temporal attributes based on previous studies. For example, E. zodiacus blooms often occur in winter and early spring in Japan (Nishikawa et al., 2007), while E. zodiacus blooms have been reported to occur most in summer in China (Huo et al., 2001;Zhang et al., 2002;Liang, 2012). Such differential spatial and temporal dynamics of E. zodiacus blooms suggest that E. zodiacus has genetic diversity and different strains are different in their ability to produce HABs.
Many common molecular markers of E. zodiacus including 18S rDNA, 28S rDNA, ITS, rbcL, and CO1 have been sequenced and applied to characterize E. zodiacus (Sorhannus, 2007;Rampen et al., 2009;Sorhannus and Fox, 2011;Ashworth et al., 2013;Hamsher et al., 2013;Guo et al., 2015). However, these molecular markers have not been evaluated for their ability to study intra-species genetic diversity of E. zodiacus. Some common molecular markers including 18S rDNA, 28S rDNA, and rbcL have been used to study intra-species variation (Riisberg and Edvardsen, 2008). However, common molecular markers are usually inadequate for distinguishing intra-species genetic diversity. For example, molecular markers including 18S rDNA, 28S rDNA, ITS, rbcL, and COI were demonstrated to be ineffective in resolving intra-species genetic diversity in the HAB species Phaeocystis globosa (Song et al., 2020). High-resolution molecular markers can be identified through comparative analysis of genomics sequences of the organelle genomes and the nuclear genomes (Song et al., 2020).
Mutation rates differ among mtDNAs, plastid genomes, and nuclear genomes and mutation rates for mtDNAs are usually higher than that for plastid and nuclear genomes. For example, comparative analysis of Phaeocystis antarctica and P. globosa mtDNAs suggested that the mutation rates for mtDNAs is 10 and 3 times that of the plastid and nucleus, respectively (Smith et al., 2014). Furthermore, mutation rates for intergenic regions are usually much higher than that for genic regions (Guo et al., 2015). As a result, many molecular markers have been developed based on mtDNAs. For example, the molecular marker MSS designed for distinguishing different mitotypes in Brassica napus help successfully identify 570 different inbred lines collected from various scientific research institutes in China (Heng et al., 2015). However, until now, mtDNAs of only 33 diatoms have been constructed and published, and by now there has been no published mtDNAs in the entire order Hemiaulales, to which E. zodiacus belongs. The order Hemiaulales has 145 annotated species including two additional HAB species Cerataulina bicornis and Cerataulina pelagica according to National Marine Data and Information Service (NMDIS).
We hypothesize that high-resolution molecular markers for analyzing genetic diversity can be developed through comparative analysis of E. zodiacus mtDNAs, especially the noncoding sequences that display higher variations. In this study, we constructed the mtDNA of E. zodiacus for the first time, demonstrated that common molecular markers including 18S rDNA, 28S rDNA, ITS, rbcL, and CO1 were inadequate for distinguishing E. zodiacus strains, and designed a new molecular marker ezmt1 with high resolution and specificity.

Strain Isolation, Culturing, and Characterization
Eight E. zodiacus strains (CNS00060, CNS00061, CNS00310, CNS00311, CNS00312, CNS00313, CNS00314, and CNS00315) were individually isolated from seawater samples collected during expeditions in multiple coastal regions in China, including the Jiaozhou Bay (August, 2019 and January, 2020) on the research vehicle "Chuangxin" operated by the Jiaozhou Bay Marine Ecosystem Research Station, the Changjiang Estuary (July, 2019) on the research vehicle "Zheyu 2" supported by the Natural Science Foundation of China (NSFC), and the Bohai Sea (October, 2019) on the research vehicle "Beidou" supported by the National Natural Science Foundation of China, Bohai and Yellow Sea Oceanography Expedition (NORC2019-01) (Figure 1). Briefly, phytoplankton cells were individually selected with a micropipette, followed by repeated washes before being transferred to 24-well culture dishes. They were then transferred to cell culture flask (60-750 ml) to accumulate enough biomass for further molecular assays. Phytoplankton cells were grown in L1 seawater culture medium and maintained with temperature of 18-20 • C, irradiance of 30 µM photons m −2 s −1 and photoperiod of 12/12-h light/dark.
For morphological identification, cells were mounted on the glass-slide and observed with a ZEISS IMAGER A2 microscope equipped with differential interference contrast optics (Hadziavdic et al., 2014). For molecular identification, sequences of five common molecular markers, including fulllength 18S rDNA, 28S rDNA D1-D2, ITS, COI, and rbcL were sequenced using Sanger sequencing technology after PCR amplification using primers listed in Table 1. PCR conditions for amplifying 18S rDNA began with a denaturation at 94 • C for 4 min, followed by 32 cycles of (denaturation at 94 • C for 1 min, annealing at 57 • C for 1:50, extension at 72 • C for 2 min), and a final extension at 72 • C for 10 min (Saldarriaga et al., 2003). PCR conditions for amplifying 28S rDNA D1-D2 began with a denaturation at 94 • C for 5 min, followed by 35 cycles of (denaturation at 94 • C for 30 s, annealing at 60 • C for 30 s, extension at 72 • C for 50 s), and a final extension at 72 • C for 10 min (Lundholm et al., 2002). PCR conditions for amplifying ITS began with a denaturation at 94 • C for 5 min, followed by 35 cycles of (denaturation at 94 • C for 40 s, annealing at 58 • C for 40 s, extension at 72 • C for 1 min), and a final extension at 72 • C for 10 min (Utama et al., 2017). PCR conditions for amplifying rbcL began with a denaturation at 94 • C for 5 min, followed by 35 cycles of (denaturation at 94 • C for 50 s, annealing at 53 • C for 50 s, extension at 72 • C for 1:10), and a final extension at 72 • C for 10 min (Alverson et al., 2007). PCR conditions for amplifying COI began with a denaturation at 94 • C for 5 min, followed by 35 cycles of (denaturation at 94 • C for 30 s, annealing at 50 • C for 1 min, extension at 72 • C for 1:10), and a final extension at 72 • C for 10 min.

DNA Library Preparation and Whole Genome Sequencing
Cultures at the exponential growth phase were harvested and concentrated via centrifugation, followed by total nucleic acids extraction with TIANGEN DNAsecure Plant Kit (TIANGEN, DP121221). Genomic DNA sample was fragmented by sonication via set program to a size of 350 bp. Then a single adenosine "A" was added to the 3 end of the double-stranded DNA after end modification to prevent the self-connection of the flat ends between DNA fragments, and it can also highlight the complementary pairing with the single "T" at the 5 end of the next sequencing connector for accurate connection, effectively reducing the self-connection between library fragments. DNA fragments were then ligated with the full-length adapter for Illumina sequencing, followed by further PCR amplification. After PCR products were purified by AMPure XP system (Beckman Coulter, Beverly, United States), DNA concentration was measured by Qubit R 3.0 Flurometer (Invitrogen, United States), libraries were analyzed for size distribution by NGS3K/Caliper and quantified by real-time PCR (3 nM). After cluster generation, the DNA libraries were sequenced on Illumina Novaseq 6000 platform and 150 bp paired-end reads were generated. The whole genome sequencing was finished at Novogene (Beijing, China).

Construction of mtDNA
Raw data were filtered into clean data with FASTQ following the rules (1) identifying and removing reads with tail pollution; (2) removing reads with low quality (>50% bases having Phred quality < 5) and (3) (Jackman et al., 2017) with the option k = 96 and SPAdes (v3.14.0) (Bankevich et al., 2012) with default parameters. With the mtDNA of Skeletonema marinoi (NC_028615)  and Thalassiosira pseudonana (NC_007405) (Armbrust et al., 2004) serving as references, scaffolds corresponding to mtDNA of E. zodiacus were identified using BLAST with the option e-value = 0.00001, max_target_seqs = 100. When achieving one scaffold only, we then used MEGA (v7.0) (Matus et al., 2014) and DOTTER (v4.44.1) to estimate whether sequences at the ends achieved overlap. Draft mtDNA sequence was constructed by merging the ends by taking advantage of the overlapping segments at the ends. If no overlapping sequences were identified, draft mtDNA sequence was formed by substituting gaps with a stretch of N. Reads were then aligned to the draft mtDNA sequence using BWA (v0.7.17-r1188) (Li and Durbin, 2009) with default parameters, results of which were extracted with SAMtools (v1.10) ) and viewed with IGV (v2.7.2) (Robinson et al., 2011). According to alignments, assembly errors were corrected and N regions were replaced. The final version of the mtDNA was validated through an additional round of alignment with BWA and visualization with IGV. Of all filtered clean sequence data, 1.24% represented mtDNA, while contamination accounted for 0.33%.

mtDNA Annotation
Protein-coding genes (PCGs) and open reading frames (orfs) were annotated using NCBI ORF Finder and BLAST similarity searches of the non-redundant databases at NCBI (Altschul et al., 1997). tRNAs were determined by reconstructing their cloverleaf structures using the tRNAscan-SE (v1.3.1) (Lowe and Chan, 2016) with default parameters. rRNAs were identified using RNAmmer (v1.2) (Lagesen et al., 2007), Barrnap (v0.9) and MEGA (v7.0) for homologous comparison. The gene map of the circular mtDNA of E. zodiacus was generated with Organellar Genome DRAW (OGDraw) (Lohse et al., 2007). The mtDNA sequence of E. zodiacus strain CNS00060 has been deposited in GenBank with the accession number of MW026607. For accurate comparative analysis of genes of mtDNAs of 33 diatom species in Bacillariophyta, we re-annotated all of these 33 published mtDNAs (Table 2) by searching for missing genes and correcting annotation errors. Nucleotide composition was calculated using DNA Sequence Polymorphism (DnaSP) software (v6.0) (Rozas et al., 2017).

Single Nucleotide Variants (SNVs) Detection in mtDNAs of E. zodiacus Strains
Phylogenetic tree based on the whole mtDNAs showed genomic diversity. To search for genomic variations (GVs), we aligned Illumina sequencing clean reads of the seven E. zodiacus strains against the mtDNA of the reference strain CNS00315 using BWA with default parameters. Alignment results were screened using SAMtools with default parameters, and SNVs with homozygous support > 85% were called using VarScan (v2.4.4) (Koboldt et al., 2012) with the option -min-freq-for-hom = 0.85.

Searching for High-Density SNVs Regions for Designing High-Resolution Molecular Markers
SNV positions of seven strains were integrated relative to mtDNA of strain CNS00315 using in-house Python scripts, which were also developed to scan for variations in 400 bpsliding (the length was appropriate for metabarcoding projects using Illumina DNA sequencing technology) successive windows (spaced at 1 bp) along the mtDNA of CNS00315. Each window was evaluated for SNV density and its ability to resolve eight E. zodiacus strains. The results were displayed using CIRCOS. To amplify the identified molecular marker, we use the forward primer: MCCCTATGGTATTAGAGA, and the reverse primer: RTTAAGTGACCCAAGTTCTAAG. PCR amplifications in reaction mixtures (final volume, 50 µl) began with a 5 min denaturation step at 94 • C, which was followed by 35 cycles of denaturation at 94 • C for 30 s, annealing at 45 • C for 1 min, and extension at 72 • C for 1 min and then by a final extension at 72 • C for 10 min.

Morphological and Molecular Identification of E. zodiacus Strains
Eight E. zodiacus strains collected in the Jiaozhou Bay, the Changjiang Estuary and the Bohai Sea were first identified based on their morphological features observed using light microscopy. These cells were generally "H" shaped with small and numerous plastids, with the middle part of each girdle being concave (Figure 2A). Both ends of the apical axis had elevations, with single-cells connected and forming a spiral population with small intercellular space ( Figure 2B). The morphological features were consistent with published observations of E. zodiacus (Guo, 2004;Yang and Dong, 2006). The strains were further examined and compared molecularly using five common molecular markers including full-length 18S rDNA, 28S rDNA D1-D2 region, ITS, COI, and rbcL. All eight strains shared the same percent identity (PID, which was used to quantify the similarity between the biomolecular sequence) (99.94%) when compared to the reference sequence of E. zodiacus on full-length 18S rDNA (Sorhannus, 2007). Similar high PIDs were found for other molecular markers including 28S rDNA D1-D2 (100%) (Hamsher et al., 2013), ITS (99.28%) (Guo et al., 2015), CO1 (99.25%) (Guo et al., 2015), and rbcL (100%) (Guo et al., 2015), respectively. Phylogenetic analysis of molecular marker sequences obtained for all eight strains indicated that they all clustered well with corresponding E. zodiacus sequence records at GenBank (Supplementary Figure S1), further confirming that these strains were all indeed E. zodiacus. However, none of these common molecular markers could distinguish these 8 E. zodiacus strains, suggesting that their resolution was limited in distinguishing intra-species genetic diversity.

General Characteristics of the E. zodiacus mtDNA
The complete mtDNA of E. zodiacus (strain CNS00060) was a circular molecule that was 36,162 bp in size (Figure 3), which was similar to but smaller than the mtDNAs of most diatoms ( Table 2). The compact genome size of E. zodiacus was primarily due to small intergenic regions (Ravi et al., 2018). Total intergenic regions in E. zodiacus mtDNA had a total size of 2,495 bp (only accounting for 6.9%). Three pairs of genes overlapped with each other, including rps4-rps2 (20 bp), nad1-tatC (20 bp), and orf158-trnP (9 bp). The first two were synthetic and the last one was reversed. Additionally, three pairs of genes were connected directly without space, including rps19-rps3, rps7-rps12, and atp6-cob. No introns were identified in the E. zodiacus mtDNA.
We found a pair of small inverted repeat (IR) region (129 bp) on either side of the orf98 in the intergenic regions.

Phylogenetic Analysis of Evolutionary Relationships
The amino acid sequence alignment of 32 concatenated PCGs (5,836 bp combined size) which were shared by mtDNAs of Bacillariophyta and Oomycota was constructed for phylogenetic analysis. Phylogenetic analysis indicated that the 34 species in Bacillariophyta formed three groups, corresponding to three classes of Bacillariophyta, including Bacillariophyceae, Mediophyceae and Coscinodiscophyceae (Figure 4), which was consistent to the current classification in AlgaeBase. E. zodiacus belongs to class Mediophyceae that also includes T. pseudonana, S. marinoi, and Toxarium undulatum. E. zodiacus formed an independent clade, so did the T. undulatum, which was consistent with previous report that T. pseudonana was more closely related to S. marinoi . Syntenic analysis between E. zodiacus and each of T. pseudonana and S. marinoi revealed a series of translocation and inversion events (Figure 5). High similarity was observed between T. pseudonana and S. marinoi mtDNAs, with only 5 small translocation events, involving cox2, cox3, trnW, trnV, and trnM, and several free-standing orf s (each being at least 100 codons in size). In contrast, E. zodiacus mtDNA exhibited a high level of genome rearrangement when compared to T. pseudonana or S. marinoi. The three diatom mtDNAs shared a relatively conservative gene block with about 41 genes (from nad11 to nad2), within which gene orders of T. pseudonana and S. marinoi were almost identical (except for orf272). In contrast, E. zodiacus had a translocation of trnC, a specific orf238, and two missing genes (atp6 and cob). What is more, we found that genes in two smaller gene blocks,  FIGURE 7 | Genomic variations density in E. zodiacus strains. The green band represented the reference genome CNS00315. From inside to outside, circles represent three E. zodiacus strains isolated from the Bohai Sea (orange), one strain from the Changjiang Estuary (blue), and three strains from the Jiaozhou Bay (green). nad11-trnD-nad4L-trnK-atp9-rpl16-rps3-rps19 and atp8-trnA were inverted in E. zodiacus mtDNA.

Defining a High-Resolution Molecular Marker for Distinguishing E. zodiacus Strains
While common molecular marker sequences were indistinguishable among these eight E. zodiacus strains (Supplementary Figure S1), phylogenetic tree constructed using the whole mtDNAs showed clear between-strain differences ( Figure 6A). Comparative analysis of the E. zodiacus mtDNAs identified a 400 bp-window with dense variations (Figure 7). We identified 26 SNVs (position: 32,131-32,530 bp in mtDNA of strain CNS00315). This 400 bp region partially overlapped with orf98 (261 bp). Phylogenetic analysis using this small region of all eight E. zodiacus strains based on the sequence alignment suggested that it could be used to effectively distinguish these strains as molecular marker ( Figure 6B). The resolution of this molecular marker, which we named E. zodiacus mitochondrial 1 and abbreviated as ezmt1, was high.

Specificity Evaluation of ezmt1
The specificity of a molecular marker is high if it can be used only for distinguishing a small set of closely related species. In contrast, the specificity of a molecular marker is low if it can be used for distinguishing a large set of broadly related species. In this study, we would like to identify a molecular marker with high specificity that specifically recognizes intraspecies variations in the species E. zodiacus. To test the specificity of newly developed molecular marker ezmt1, we first carried out BLAST searches against the NCBI nt database, which showed low similarity and low coverage to sequences of other species. Second, we carried out PCR amplification assays on DNAs extracted from 13 representative eukaryotic algae species, including seven species in Bacillariophyta including S. marinoi, Thalassiosira weissflogii, Chaetoceros curvisetus, Pseudo-nitzschia pungens, Planktoniella sol, Psammodictyon constrictum, and Rhizosolenia sp., three species in Dinoflagellata including Alexandrium tamarense, Karenia mikimotoi, and Prorocentrum donghaiense, three species in Ochrophyta including Aureococcus anophagefferens, Chattonella marina, and P. globosa. Results of all 13 PCR reactions showed that ezmt1 sequences could only be amplified in E. zodiacus (Supplementary Figure S2), further confirming the high specificity of ezmt1.

DISCUSSION
The E. zodiacus is a common HAB species that has been identified in many ocean regions including the Tokyo Bay , Harima-Nada (Nishikawa et al., 2007), and Ariake sea (Matsubara, 2012) in Japan, Bay of Fundy (Martin et al., 2008) in Canada, Jiaozhou Bay, Haizhou Bay, Xiangshan Harbour and many other sea areas in China (Huo et al., 2001;Zhang et al., 2002;Liang, 2012). Indeed, it is the only HAB species that has been identified in all recorded expeditions in the Jiaozhou Bay (Liu and Chen, 2021). E. zodiacus HABs have been found to have caused negative impacts on bleaching of aquacultured nori, fisheries damage and economic losses (Martin et al., 2008;. The differential special and temporal features of E. zodiacus HABs reported in Japan (Nishikawa et al., 2007) and China (Huo et al., 2001;Zhang et al., 2002;Liang, 2012) suggest that it has important genetic diversity. Nevertheless, the genomic information of E. zodiacus is limited and the genetic diversity of E. zodiacus has not been studied.
In this project, we constructed the mtDNA of E. zodiacus for the first time, which was also the first mitochondria genome for all species in the order Hemiaulales. The mtDNA of E. zodiacus was 36,162 bp in size, which is shorter than most diatom mtDNAs that are generally compact with few repeats and a small number of introns (Secq and Green, 2011). The small size of mtDNA of E. zodiacus is due to its small intergenic regions, the low repeat content and the absence of introns. First, the variations in mtDNA sizes could be due to variations of intergenic regions (Pogoda et al., 2019), and the average intergenic regions for T. pseudonana and Phaeodactylum tricornutum (Secq and Green, 2011) are 157 and 841 bp, respectively. The average length of intergenic regions of E. zodiacus mtDNA was only 39 bp. Second, repeats in diatom mtDNAs are either small or concentrated in only a small number of sites, without interrupting the genes in the mtDNAs or gene densities of the mtDNAs. For example, only a single 35 kb-long repeat was found in the mtDNA of P. tricornutum (Secq and Green, 2011). No such repeats were found in the mtDNA of E. zodiacus. Third, the introns in the diatom mtDNAs are generally found in a few genes including cox1 (Guillory et al., 2018), rnl, rns, cob, cox2, cox3, and nad7 (Pogoda et al., 2019). No introns were found in E. zodiacus mtDNA.
There is very little difference in gene content of diatom mtDNAs, except for orf s, some of which are found within introns (Pogoda et al., 2019). The only gene that was not found in the E. zodiacus mtDNA was rrn5, which is found only in a few diatom species (Secq and Green, 2011;Valach et al., 2014). The rrn5 may exist in the common ancestor of organelle genomes but have lost afterward (Valach et al., 2014). A full nad11 gene was found in the E. zodiacus mtDNA. This gene is present in many diatom mtDNAs including the mtDNAs of T. undulatum (Guillory et al., 2018) and Asterionella formosa (Villain et al., 2017), while it is split into two parts in the mtDNAs of many species in Bacillariophyceae including Cylindrotheca closterium (Guillory et al., 2018) and Nitzschia palea (Crowell et al., 2019). Three ribosomal protein coding genes rps2, rps7, and rps12 are lost in some diatom mtDNAs (Pogoda et al., 2019). However, all of these three genes are found in the E. zodiacus mtDNA.
The advantage of compact E. zodiacus mtDNA is not known (Secq and Green, 2011;Liu et al., 2014). However, as intergenic regions may facilitate intragenomic recombination, as observed in mtDNAs of mosses (Liu et al., 2014), the small intergenic regions in E. zodiacus mtDNA may be associated with low intragenomic recombination activities, which may be critical for maintaining the stability of the mtDNA. Furthermore, the organization of genes is important to the transcription of polycistronic operons (Liu et al., 2014), thus the small genome size of E. zodiacus mtDNA may be important in insuring proper transcription of genes in the mtDNA.
While the number of genes in diatom mtDNAs show high similarity, their syntenic relationships vary greatly in a lineagespecific manner. Numerous genome rearrangement events were observed between E. zodiacus and mtDNAs of other diatom species, which may be explained by the large evolutionary distances between E. zodiacus and other diatom species. However, the mtDNA of E. sodiacus shared relatively high syntenic similarity with mtDNAs of representative diatoms including T. pseudonana and S. marinoi (Figures 5A,B) of another order in class Mediophyceae, supporting the current taxonomic status of E. zodiacus, which is also supported by phylogenetic analysis based on core genes. mtDNAs of more closely related species are needed to understand the origin and the evolutionary relationship of such genome rearrangements.
An ideal molecular marker usually requires many criteria. First, low intra-genome variation among multiple copies of a molecular marker is critical for ensuring enough representativeness and reduce ambiguity (Xiao-Kun et al., 2019). Second, a molecular marker should have conserved flanking sequences to facilitate the design of universal primers and obtain an appropriate sequence length in a single amplification (Guo et al., 2015). For example, the short variable region V4 region of the 18S rDNA sequence, which is frequently used for metabarcoding analysis of microbial eukaryotes (Decelle et al., 2014;Liu et al., 2020). Third, a molecular marker should have appropriate specificity, dependent on its applications (Fechner et al., 2010). To be specific, when surveying large number of species in large areas, low specificity is preferred. When focusing on specific species, like in this project, for tracking E. zodiacus strains, high specificity is more desirable.
For this project, we isolated and characterized eight E. zodiacus strains from three different sea areas in China, spanning about eight latitudes (30.3625 • N-38.3658 • N) and covering three seasons (summer, autumn, and winter). Despite such large geographical span and seasonal differences, phylogenetic analysis based on common molecular markers could not distinguish these strains, suggesting that they shared high genetic similarity. We found clear distinction among different E. zodiacus strains based on whole mtDNAs, suggesting unambiguous genetic differences among different E. zodiacus strains. Through sequence alignment and comparative analysis, we identified a molecular marker ezmt1 that could adequately distinguish different E. zodiacus strains. Common molecular markers of E. zodiacus may fit part of the criteria listed above, while ezmt1 satisfies all criteria. The ezmt1 could be an effective molecular marker for studying E. zodiacus all over the world. On the one hand, we can distinguish and track different strains of E. zodiacus, especially during blooms, to evaluate strain-specific differential contribution to blooms. For example, E. zodiacus blooms occurred in Japan in winter (Nishikawa and Yamaguchi, 2006;Nishikawa et al., 2009) revealed different features with that in China usually occurred in summer (Huo et al., 2001;Zhang et al., 2002;Liang, 2012), which suggested that different E. zodiacus strains involved. The newly designed molecular marker ezmt1 may help study the genetic evolutionary relationship between them. On the other hand, by further collecting large number of samples, we can study the geographical distribution pattern of E. zodiacus strains. CONCLUSION E. zodiacus is the first species having its complete mitogenome sequenced in the order Hemiaulales. The availability of the E. zodiacus mtDNA will facilitate evolutionary study of mtDNAs in Bacillariophyta, especially in the order Hemiaulales. Through comparative analysis of mtDNAs among different E. zodiacus strains, we identified a molecular marker ezmt1 that can effectively distinguish different E. zodiacus strains. The ezmt1 holds great potential in research on genetic diversity in E. zodiacus, and, more importantly, on tracking causative strain in E. zodiacus HABs.