Evolutionary Analysis of Unicellular Species in Chlamydomonadales Through Chloroplast Genome Comparison With the Colonial Volvocine Algae

This study is the first determination of six chloroplast genomes of colonial volvocine algae, Colemanosphaera charkowiensis, Volvulina compacta, Pandorina colemaniae, Pandorina morum, Colemanosphaera angeleri, and Yamagishiella unicocca. Based on 55 chloroplast protein-coding genes, we compared the nonsynonymous (dN) and synonymous (dS) substitution rates between colonial volvocine algae and the other unicellular Chlamydomonadales species. When refer to the dN, we found 27 genes were significantly different, among them, 19 genes were significant higher in unicellular species (FDR-adjusted P < 0.05). When refer to the dS, we found 10 genes were significantly different, among them, 6 genes were significant higher in unicellular species (FDR-adjusted P < 0.05). Then we identified 14 putative fast-evolving genes and 11 putative positively selected genes of unicellular species, we analyzed the function of positively selected sites of the overlap genes of putative fast-evolving and positively selected genes, and found some sites were close to the important functional region of the proteins. Photosynthesis is the process to transform and store solar energy by chloroplast, it plays a vital role in the survival of algae, this study is the first to use the chloroplast genomes to analysis the evolutionary relationship between colonial and unicellular species in Chlamydomonadales. We found more genes have higher substitution rates in unicellular species and proposed that the fast-evolving and positively selected two genes, psbA and psbC, may help to improve the photosynthetic efficiency of unicellular species in Chlamydomonadales.


INTRODUCTION
The volvocine algae belong to Chlamydomonadales (Chlorophyta, Chlorophyceae). This group of algae span the full range of organizational complexity, from unicellular species to colonial species, thus these algae are ideal model organisms to study the fundamental issues related to the transition to multicellularity. In recent years, many chloroplast genomes of Chlamydomonadales species have been sequenced using the application of next generation sequencing technology, this provided massive data for us to study the nucleotide substitution rates based on the chloroplast protein-coding genes data sets (Smith and Lee, 2009;Smith and Lee, 2010;Hamaji et al., 2013;Smith et al., 2013;Del et al., 2015;Lemieux et al., 2015;Featherston et al., 2016;Hu et al., 2019). However, due to the limited number of chloroplast genomes of colonial volvocine algae, little is known about the evolutionary relationship between colonial and unicellular species in Chlamydomonadales based on chloroplast genomes.
The nucleotide substitution rates are often used as the criterion to reflect the selection pressure. The nonsynonymous substitution rates (dN) can cause an amino acid change, the synonymous substitution rates (dS) do not cause an amino acid change. The dN/dS is the ratio of nonsynonymous substitution and synonymous substitution, the ratio of dN/dS is the measure of natural selection acting on the protein. According to Yang (Yang, 2007), dN/dS < 1 means negative purifying selection, dN/dS = 1 means neutral evolution, dN/dS > 1 means positive selection. Generally, the chloroplast protein-coding genes need to maintain the photosynthetic function, so these genes are conserved and have a low dN/dS ratio (Smith, 2015). Meanwhile, a relatively high dN/dS ratio could be interpreted as the positive or relaxed selection (Guisinger et al., 2008), the evolutionary analysis can reveal how species adaptive under selection pressure. For example, Guisinger et al. (2008) calculated the nucleotide substitution rates of chloroplast protein-coding genes of angiosperm, and found unprecedented accumulation of nucleotide substitutions in Geraniaceae, then a model was proposed to illustrate this phenomenon. Zhang et al. (2018) studied the molecular evolution of chloroplast protein-coding genes of an Antarctic sea ice alga Chlamydomonas sp. and revealed the adaptive mechanism of sed-ice environment.
In this study, we determined six chloroplast genomes of colonial volvocine algae, these genomes provide valuable opportunity for us to conduct the evolutionary study in Chlamydomonadales. Based on the chloroplast protein-coding genes, we examined the nucleotide substitution rates of Chlamydomonadales species and found that more genes have higher substitution rates in unicellular species when compared with colonial species. Then we identified the putative fast-evolving genes and positively selected genes of unicellular species, based on our analysis, we proposed that the positively selected sites of specific chloroplast protein-coding genes may improve the photosystem efficiency, and our photosynthetic experiment further support our conclusion. The strains were grown in a conical flask containing artificial freshwater-6 (AF-6) (Kato, 1982) at 20-25 • C under a 14 h light: 10 h dark schedule under cool-white fluorescent lamps at an intensity of 1000-2000 lux. Total genomic DNA was extracted using a Universal DNA Isolation Kit (AxyPrep, Suzhou, China) following the manufacturer's instructions. Species identification was based on morphological observation and phylogenetic analysis based on five chloroplast genes (rbcL, atpB, psaA, psaB, and psbC), and the sequence data of related species were chosen according to Nozaki et al. (2014). The sequence matrix was aligned by MAFFT v7.394 (Katoh and Standley, 2013), and the ambiguously aligned regions were further manually edited and adjusted by eye using MEGA7 (Kumar et al., 2016). jModelTest v.2.1.7 (Darriba et al., 2012) was used to determine the evolutionary model, which was then analyzed using Bayesian inference (BI) with MrBayes v3.2.6 (Ronquist et al., 2012) and maximum likelihood (ML) with RAxML v8.2.10 (Stamatakis, 2014). Microphotographs were taken by using an Olympus BX53 (Tokyo, Japan) light microscope with an Olympus DP80 digital camera and cellSens standard image analysis software (Tokyo, Japan).

Library Preparation, Sequencing, Genome Assembly, and Annotation
A sequencing library was prepared using an NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, United States) and sequenced with an Illumina NovaSeq6000 at Novogene (Beijing, China). The data were trimmed using SOAPnuke v1.3.0 (Chen et al., 2017) and then assembled with SPAdes v3.10.1 (Bankevich et al., 2012). The resulting assembly contigs were determined to be form the chloroplast genome based on the following criteria: (1) blast searches of publicly available chloroplast genomes of Chlorophyta algae species with significant e-values (1e-5); (2) the GC content of the contig is less than 45% (the GC content of green algae chloroplast genomes that have been sequenced to date is normally less than 45%); and (3) the sequencing depth is higher than 100×. Then, the trimmed reads were aligned to the resulting assembly contigs by BWA-MEM v0.7.12 (Li, 2013). If reads mapped two contigs at the same time, we determined the order of contigs, and after confirming the orders of the contigs, the sequence we produced was then rechecked by Sanger dideoxy sequencing technology and synteny analysis with related species. The chloroplast genomes were initially annotated using CpGAVAS (Liu et al., 2012). Protein-coding genes were further polished using Blast with genes from the available colonial volvocine chloroplast genes. All chloroplast genome sequences have been submitted to GenBank, the accession number was listed in Table 1.

Phylogenomic Analysis
Our study aims to reveal the evolutionary relationship between colonial and unicellular species in Chlamydomonadales based on the protein-coding genes of chloroplast genomes, we tried to ensure the gene we analyzed are exist in all species, however, we found some species may have limited number of protein-coding genes and some genes of specific species have poor alignment with other species, to keep a balance between the number of genes and the number of species, 12 colonial species and 16 unicellular species were chosen and species information was listed in Table 1. The data set was assembled from the following 55 protein-coding genes : atpA, atpB, atpE, atpF, atpH, atpI, ccsA, cemA, chlB, chlL, chlN, clpP, petA, petB, petD, petG, petL, psaB, psaC, psaJ, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbK, psbL, psbM, psbN, psbT, psbZ, rbcL, rpl14, rpl16, rpl2, rpl20, rpl23, rpl36, rpl5, rps11, rps12, rps14, rps18, rps19, rps2, rps3, rps4, rps7, rps8, rps9, tufa, and ycf4. The method of phylogenomic analysis was mainly refer to Lemieux et al. (2015). All genes were aligned using MUSCLE v3.8 (Edgar, 2004), and the alignments of all genes were converted into a codon alignment by TranslatorX (Abascal et al., 2010). The ambiguously aligned regions in alignment were excluded using Gblocks0.91b (Castresana, 2000) with the options −t = c, −b3 = 5, −b4 = 5, and −b5 = half. All alignments were concatenated using Phyutility v2.2.6 (Smith and Dunn, 2008), and then the Degen1.pl 1.2 script (Regier et al., 2010) was applied to the concatenated alignment. jModeltest v.2.1.7 (Darriba et al., 2012) was used to determine the evolutionary model. The data was partitioned by gene, with the model applied to each partition. Phylogenies were inferred using ML and BI methods. ML analyses were carried out using RAxML v8.2.10 (Stamatakis, 2014) and the GTRGAMMA model of sequence evolution. Bootstrap analysis with 1,000 replicates of the dataset for ML was performed to estimate statistical reliability. Bayesian analyses were performed with MrBayes v3.2.6 (Ronquist et al., 2012) with the GTR + I + G model. Markov chain Monte Carlo (MCMC) analyses were run with four Markov chains (three heated, one cold) for 1,000,000 generations, with trees sampled every 500 generations. Each time the diagnostics were calculated, a fixed proportion of samples (burninfrac = 0.25) were discarded from the beginning of the chain. A stationary distribution was assumed when the average standard deviation of the split frequencies was lower than 0.01.

Evolutionary Analysis
The CODEML program of PAML v4.9 (Yang, 2007) with the ML model (runmode = −2, CodonFreq = 2) was used to measure the values of dS and dN, the analysis was based on 55 chloroplast protein-coding genes. As Chloromonas radiata belongs, with Carteria cerasiformis and Carteria sp., to a clade that is sister to all the other species, so C. radiata was used as reference.
Comparisons of the evolutionary rates were conducted using the two-tailed Wilcoxon rank sum test. The multiple testing was corrected by applying the false discovery rate method (FDR) (Benjamini and Hochberg, 1995) as implemented in R. 1 The phylogenetic tree was used as a constraint tree, but branch lengths were inferred by using PAML.
The ML method is a pairwise approach to estimate the dN/dS ratio, a dN/dS ratio may indicate in one or both species, and some specific sites under positive selection may remain undetected (Dussert et al., 2018). So, two precise assessments were used to detected the difference of dN/dS and positive selection.
We use the branch model to test whether unicellular species have a different dN/dS ratio relative to the colonial species. The 1 http://www.R-project.org Genes were considered putative fast-evolving genes if they had an FDR-adjusted P < 0.05 and a higher dN/dS ratio in the foreground branch than in the background branches. We use the branch-site model to find genes that potentially experienced positive selection. The improved branch-site model (model = 2, Nsites = 2) was used to detect signatures of positive selection on individual codons in a specific branch (Zhang et al., 2005). The unicellular species were set as the foreground branch. The null model assumed no positive selection occurred on the foreground branch (fix_omega = 1, omega = 1), and the alternative model assumed that sites on the foreground branch were under positive selection (fix_omega = 0, omega = 2). LRT were used to test model fit and the Chi-square test was applied for testing P values. We performed a correction for multiple testing using an FDR criterion, and Bayes empirical Bayes (BEB) method was used to statistically identify sites under positive selection.
Genes were considered putative selected genes if they had an FDR-adjusted P < 0.05.
The overlap gene represent a group of genes that are not only putative fast-evolving genes but also putative positive selected genes, we analyzed the function of positive selected sites of overlap genes, the functional information were derived from the Uniprot. 2 The three-dimensional (3D) structures were predicted using Phyre2 (Kelley et al., 2015). The 3D structures were visualized by ePlant Web server (Fucile et al., 2011).

Species Identification
Vegetative colonies of strain FACHB-2326 were ellipsoidal in shape, composed by 16 cells of approximately identical sizes embedded by gelatinous matrix forming a hollow colonial structure. Cells spherical, the chloroplast contained more than two pyrenoids of almost identical size ( Figure 1A) and prominent longitudinal striations ( Figure 1B). Only two contractile vacuoles distributed near the base of the flagella ( Figure 1B). Based Frontiers in Microbiology | www.frontiersin.org on the morphology of colony, the number and size of pyrenoids, the number of contractile vacuoles, we identified strain FACHB-2326 as C. charkowiensis. Vegetative colonies of strain FACHB-2337 were ellipsoidal in shape, cells were embedded in a gelatinous matrix forming a hollow sphere. Cells were more or less contiguous and appeared nearly tetragonal by mutual compression. There were no prominent spaces between adjoining cells. Based on the spaces between adjoining cells and the shape of cells, we identified strain FACHB-2337 as V. compacta (Figures 1C,D). Vegetative colonies of strain FACHB-2361 were ellipsoidal in shape and contained eight cells compactly arranged in a gelatinous matrix. The chloroplast had more than two pyrenoids. Based on the shape of cells and the number of pyrenoids, we identified strain FACHB-2361 as P. colemaniae (Figures 1E,F). Vegetative colonies of strain FACHB-2362 resembled strain FACHB-2361, but the chloroplast contained a single, basal pyrenoid, so we can identify strain FACHB-2362 as P. morum (Figures 1G,H). Vegetative colonies of strain FACHB-2363 resembled strain FACHB-2326, but the chloroplast striations were not prominent and contained a large basal pyrenoid and small pyrenoids, so we identified strain FACHB-2363 as C. angeleri (Figures 1I,J). Vegetative colonies of strain FACHB-2364 resembled strain FACHB-2363, but the chloroplast only contained a single, basal pyrenoid, so we identified strain FACHB-2364 as Y. unicocca (Figures 1K,L).
The phylogenetic tree based on five genes (Figure 2) supported our morphological identification with high bootstrap value (both 100) and Bayesian posterior probability (both 1.00), except the phylogenetic position of strain FACHB-2337. We noticed that strain FACHB-2337 clustered with Volvulina pringsheimii form a lineage. The cells of V. pringsheimii are multiangular or circular and not contiguous in surface view, and the colony of V. pringsheimii is a hollow sphere more resembles with Eudorina. But the colony of strain FACHB-2337 was more compact and resembled with Pandorina, and the cells were contiguous in surface view, the morphological observation strongly supported strain FACHB-2337 as V. compacta rather than V. pringsheimii (Starr, 1962;Nozaki and Kuroiwa, 1990). So, we still considered strain FACHB-2337 as V. compacta. In our phylogenetic tree, the phylogenetic position of most species was consisting with the study of Nozaki et al. (2014), except Volvulina steinii. The bootstrap value and posterior probability of V. steinii clade were relatively low in the study of Nozaki et al. (2014), we found the phylogenetic position of V. steinii in our study was consisting with Nakada et al. (2010), and both studies showed high bootstrap value and posterior probability of V. steinii clade. Meanwhile, recent study both show polyphyletic of genera Pandorina and Volvulina (Coleman, 2001;Nakada et al., 2010;Nozaki et al., 2014), so the phylogenetic position of these species may need further study.

Phylogenomic Analysis and Evolutionary Rate Estimation
We conducted our phylogenomic analysis based on the nucleotide sequence of 55 chloroplast protein-coding genes, ML was carried out using RAxML, Bayesian analyses was performed with MrBayes, the phylogenetic position of most species inferred from both methods are the same except Dunaliella salina. The phylogenetic position of D. salina inferred from both method have high bootstrap value (87) and posterior probability (1.00), but the result of ML method was in accordance with previous study (Lemieux et al., 2015), so the ML tree was used to represent the result (Figure 3). The phylogenetic position of other unicellular species were consistent with previous study with high bootstrap values and posterior probability values (Yumoto et al., 2013;Lemieux et al., 2015). The phylogenetic position of most colonial species were consistent with previous study (Nozaki et al., 2014) except P. morum, we noticed that the P. morum together with V. compacta formed a lineage instead of P. colemaniae, this situation have been reported before (Coleman, 2001), this may mainly due to the polyphyletic of Pandorina (Herron, 2016). In our study, this tree was used as the constraint tree for our evolutionary analysis.
Based on the ML method of 55 chloroplast protein-coding genes, the value of dN and dS were compared between colonial and unicellular species in Chlamydomonadales ( Table 2 and  Supplementary Table S1). When refer to dN, 27 genes were significantly different between the two group of algae, among these genes, we found 19 genes significantly higher in unicellular species. When refer to dS, 10 genes were significantly different between the two group of algae, among these genes, we found six genes significantly higher in unicellular species. Among genes with statistical significance, both comparisons show more genes have higher substitution rates in the unicellular species.
The branch model was used to compare the dN/dS ratio between colonial and unicellular species based on the chloroplast protein-coding genes, the LRT was used to compare the fit of two models. The null model (H0) assumed that all tree branches evolved at the same rate (the same dN/dS ratio), the alternative model assumed that the foreground branch (the unicellular species) could evolved at a different rate (different dN/dS ratio). We found the FDR-adjusted P value of 16 genes were less than 0.05, this indicates the dN/dS ratio of these genes were significantly different among the unicellular and colonial species (Figure 4 and Supplementary Table S2). Among these genes, the dN/dS ratio of 14 genes (atpA, rpl16, psaB, psbC, atpB, psbE, psaJ, psbA, rps8, rpl2, rps12, rps14, psbN, atpI) were higher in the unicellular species compared with the colonial species, so these genes were considered putative fast-evolving genes.
The positive selection analysis was performed based on the branch-site model, and we also conducted the comparison between the null and alternative models. The null model considered the foreground branch only have dN/dS = 1, and the alternative model considered sites on the foreground branch have dN/dS > 1 (positive selection). We used the Chi-square test to testing P values, after the FDR correction, we found 11 genes (psaB, psbB, psbC, rbcL, tufA, psbA, rps4, rpl5, rpl16, rps12, atpF) have the FDR-adjusted P value lower than 0.05 (Supplementary Table S3), and we considered these genes as the putative positively selected genes. The overlapped genes between the putative fast-evolving genes and positively selected genes were psaB, psbC, psbA, rpl16, and rps12. Based on the BEB method, the positively selected sites for each gene were  shown in Table 3. We found the psaB, psbC and psbA have sites may likely under positive selection. For psaB, 134GLN have posterior probability higher than 95%, 253GLN have posterior probability higher than 90%, but there is no related functional sites information of Chlamydomonas reinhardtii in Uniprot, so the positively selected sites of psaB reminds further study. For psbA, site 237ARG (posterior probability higher than 90%) was close to the 215HIS and 272HIS of C. reinhardtii, the 215HIS is the metal binding site of iron and binding site of Quinone (B), the 272HIS is also the metal binding site of iron. For psbC, site 409SER (posterior probability higher than 95%) was close to the 355GLU of C. reinhardtii which was the metal binding site of calcium-manganese-oxide ( Figure 5).

DISCUSSION
In this study, we determined six chloroplast genomes of colonial volvocine algae, this provided opportunity for us to reveal the different evolutionary rate between unicellular and colonial species in Chlamydomonadales. Our analysis was based on the protein-coding genes of 12 colonial volvocine algae and 16 unicellular species, we used the ML method to calculate the value of dN and dS of each gene and each species. The rate of synonymous substitutions and nonsynonymous substitutions of more genes were higher in unicellular species; the nonsynonymous substitution can modify the produced amino acid sequence, among the 27 significantly different genes, we found the nonsynonymous substitution of 19 genes were significantly higher (FDR-adjusted P < 0.05) in unicellular species than colonial species. More genes also have higher synonymous substitutions in the unicellular species (6 gene higher in unicellular species among 10 significantly different). All this analysis indicated more genes have higher substitution rates in unicellular species. Guisinger et al. (2008) have found the increased substitution rates in Geraniaceae, and they proposed that the mutations in chloroplast-targeted genes could leading to increased substitution rates in chloroplast genes. Such explanation would expect rate increased for all chloroplast genes (Wang et al., 2015), since we observed the increased of dN and dS in limited number of genes in this study, the plastid DNA repair mechanism could only partly be one of the reasons responsible for the higher substitution rates in unicellular species. Shen et al. (2009) found higher substitution rates in weakly locomotive species when compared with strongly locomotive species, they associated such phenomenon with the different demand for energy. Likewise, based on our photosynthetic experiment (Supplementary Tables S4, S5), we found that the unicellular species may have lower demand for light when compared with colonial species, here, we speculate that the lower demand for light could indicate a relaxation of constraint on unicellular chloroplast genes compared with colonial species (Björnerfeldt et al., 2006), and the relaxation of constraint allow for more substitutions in the chloroplast genes of unicellular species.
To explore the substitution happens in the unicellular species whether harboring an advantageous that increased individual adaptability, we used the branch model to test whether genes were under fast-evolving in unicellular species, and we used the branch-site model to test whether sites on the unicellular  Genes of substitution rates significantly higher in unicellular species and FDR less than 0.05 were showed in bold.
FIGURE 4 | Plot showing ranked FDR-adjusted P values for 55 chloroplast protein-coding genes. P value were obtained from the branch model likelihood ratio tests, were applied by the false discovery rate method. branch were under positive selection. Based on our analysis, 14 genes were considered as the putative fast-evolving genes and 11 genes were considered as the putative positively selected genes, five genes were overlapped among these two group of genes. The overlap genes have higher dN/dS ratio in unicellular species (fast-evolving), meanwhile, they were undergone adaptive molecular changes (positively selection), so the positively selected sites of overlap genes may closely relate to the adaptive of unicellular species. Among the five overlap genes, we analyzed the positively selected sites for each gene FIGURE 5 | The three-dimensional structures of psbA and psbC. The psbA encodes photosystem II reaction center protein D1, the psbC encodes photosystem II CP43 chlorophyll apoprotein. The positively selected sites were showed in green, and the functional sites of Chlamydomonas reinhardtii were showed in red. The schematic model of photosystem II was drawn with reference from Yamamoto (2001).
by refer to the functional sites of C. reinhardtii in Uniprot, and we found two genes (psbA, psbC) may play an import role in adaption. The psbA encodes photosystem II reaction center protein D1, it is one of the two reaction center proteins of photosystem II, the function of this protein is associated with the electron transfer. The psbC encodes photosystem II CP43 chlorophyll apoprotein, it is one of the components of the core complex of photosystem II, it binds chlorophyll and helps catalyze the primary light-induced photochemical processes of photosystem II. According to the sites information in Uniprot, the positive selected sites of psbA and psbC gene were close to the functional sites of the homologous protein in C. reinhardtii. One positively selected site of psbA was close to the iron binding site and Quinone (B) binding site, this is associated with the formation of the iron-quinone complex (Wydrzynski and Satoh, 2005;Umena et al., 2011). One positively selected site of psbC was close to the binding site of calcium-manganese-oxide possibly contributed to the oxidation of water (Govindjee et al., 2010;Najafpour, 2011).
In general, these two genes were all act an important role in photosynthesis, the molecular evidence show that their substitutions may contributed to the efficiency of photosystem.
Our photosynthetic experiment showed the limitation of iron or calcium have lower impact on unicellular species compared with colonial species (Supplementary Tables S4, S5). We speculate that the lower impact could due to the positive selection sites in the psbA and psbC gene, the positive selection sites help these two gene have a better binding efficiency with iron or calcium, then allow unicellular species could better utilize the trace amount of iron or calcium left in the culture medium than colonial species. Our experiment further supported our conclusion. Our study is the first determination of the chloroplast genomes of six colonial volvocine algae, by compared with the chloroplast genomes of colonial volvocine algae, we reveled more genes have higher substitution rates in unicellular species of Chlamydomonadales. We identified the fast-evolving and positively selected genes in unicellular species and found the psbA and psbC might improve the photosynthetic efficiency of unicellular species. This study not only increased the chloroplast genome information of volvocine algae but also provided useful information to understand the evolutionary relationship between unicellular and colonial species in Chlamydomonadales.