Six Newly Sequenced Chloroplast Genomes From Trentepohliales: The Inflated Genomes, Alternative Genetic Code and Dynamic Evolution

Cephaleuros is best known as a genus of algal pathogens comprising 19 taxonomically valid species, some of which cause red rust and algal spot diseases in vascular plants. No chloroplast genomes have yet been reported in this genus, and the limited genetic information available is an obstacle to understanding its evolution. In this study, we sequenced six new Trentepohliales chloroplast genomes, four from Cephaleuros and two from Trentepohlia. The chloroplast genomes of Trentepohliales are large compared to those of most green algae, ranging from 216 to 408 kbp. They encode between 93 and 98 genes and have a GC content of 26–36%. All new chloroplast genomes were circular-mapping and lacked a quadripartite structure, in contrast to the previously sequenced Trentepohlia odorata, which has an inverted repeat (IR). The duplicated trnD-GTC, petD, and atpA genes in C. karstenii may be remnants of the IR region and shed light on its reduction. Trentepohliales chloroplast genomes show elevated rates of evolution and extensive rearrangement, and several genes display an alternative genetic code in which UGA/UAG codons are presumably reassigned to arginine. Our results present the first whole chloroplast genomes of the genus Cephaleuros and enrich the chloroplast genome resources of Trentepohliales.


INTRODUCTION
The order Trentepohliales (Ulvophyceae, Chlorophyta) consists of approximately 80 species of terrestrial green algae that are widely distributed in tropical, subtropical and temperate regions with humid climates (Thompson and Wujek, 1997; Suto and Ohtani, 2009; Zhu et al., 2017). The species are found on a wide variety of subaerial substrates, including rocks, tree bark, leaves, tree trunks, building walls, wood and metal (Zhu et al., 2019). Algae in the Trentepohliales are characterized by uniseriate branched filaments, net-like chloroplasts without pyrenoids, zoosporangia borne on a distinctive curved supporting cell, cell walls with plasmodesmata, and cytokinesis by means of a phragmoplast similar to that of vascular plants (Brooks et al., 2015). Many Trentepohliales have a striking color due to abundant carotenoids in their cells, which help them withstand the strong ultraviolet radiation of subaerial habitats (Liu et al., 2012) and offer prospects for use in health care products, cosmetics and feed (Li et al., 2014). Some species of the order Trentepohliales live in association with fungi, forming lichens (Nelsen et al., 2011; Kosecka et al., 2020). Several species of the genus Cephaleuros Kunze ex Fries are well-known plant pathogens, including C. virescens and C. parasiticus. The infections they cause are often referred to as 'red rust' or 'algal spot disease,' and can damage economically important crops (e.g., tea, coffee, citrus) (Wolf, 1930; Nelson, 2008; Brooks et al., 2015). Treating algal spot disease has long been a challenge for phytopathologists (Browne et al., 2019; Lee et al., 2020).
The relationships among the major lineages of core Chlorophyta have been evaluated based on nuclear and plastid datasets as well as genome-scale data derived from chloroplast genomes or transcriptomes (Cocquyt et al., 2010; Fučíková et al., 2014; Fang et al., 2018; Li et al., 2021). However, critical data are still missing for the Ulvophyceae; the scarcity of complete chloroplast genomes in the Dasycladales and Trentepohliales is particularly problematic. Previous studies of Trentepohliales have focused mainly on morphological traits, and phylogenies based on traditional molecular markers such as 18S rDNA, ITS and rbcL often lacked the resolution that chloroplast genomes could provide (López-Bautista et al., 2006; Rindi et al., 2009; Zhu et al., 2017).
Chloroplast genomes have become a workhorse for phylogenetic and evolutionary studies of algae and higher plants, owing to their matrilineal inheritance (Sun et al., 2016; Xu et al., 2021) and the variable rates of molecular evolution across genomic regions (Costa et al., 2016). High-throughput sequencing has led to a dramatic increase in the number of sequenced green algal chloroplast genomes, but so far only a single complete chloroplast genome has been published for the order Trentepohliales (Zhu et al., 2019), and none for the genus Cephaleuros.
The goal of this study is to address this shortfall in chloroplast genome information in the Trentepohliales by sequencing and analyzing six chloroplast genomes including four in the genus Cephaleuros. We provide a detailed comparative account of genome features within the order and with other lineages of the Ulvophyceae.

Sample Collection and Culture Conditions
Strains BN17, YN1242, and YN1317 were collected in Xishuangbanna Tropical Botanical Garden (Yunnan Province, China), strains GD1942 and GD1927 in South China Botanical Garden (Guangdong Province, China), and strain SAG 42.85 was purchased from the Culture Collection of Algae at the University of Göttingen, Germany (SAG; http://sagdb.uni-goettingen.de/). Detailed specimen information is presented in Supplementary Table S1. All strains were grown at the Freshwater Algal Herbarium (HBI), Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China. They were cultivated in BBM medium (Bischoff and Bold, 1963) and maintained in a culture chamber at 20–25 °C under 35–50 µmol photons m⁻² s⁻¹ illumination with a 12 h:12 h light:dark cycle.

Genome Sequencing, Assembly, and Annotation
The sequencing library was prepared using the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, United States) and sequenced on the Illumina NovaSeq 6000 platform by Benagen (Beijing, China). Raw reads were trimmed with SOAPnuke v 1.3.0 (Chen et al., 2018) and assembled with SPAdes v 3.13.0 (Bankevich et al., 2012). Chloroplast genes were annotated using PGA (Qu et al., 2019) and MFannot (https://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl). The annotations of protein-coding genes were further polished by BLAST searches (Altschul et al., 1997) against homologous genes from closely related chloroplast sequences. tRNA and rRNA genes were identified using tRNAscan-SE v 1.23 and RNAmmer, respectively (Lowe and Eddy, 1997; Karin et al., 2007). All open reading frames (ORFs) longer than 300 bp were extracted with ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/). Intron boundaries were determined by comparing intron-containing genes with intron-less homologs. To identify the intron class (group I or group II), RNA secondary structures of the introns were predicted using RNAweasel (Michel et al., 1989; Michel and Westhof, 1990). Circular chloroplast genome maps were drawn using OrganellarGenomeDRAW v 1.3.1 (Stephan et al., 2019). Annotated sequences were deposited in the NCBI GenBank database under the accession numbers listed in Table 1. Relative amino acid frequencies at the positions of in-frame (premature) UGA or UAG codons were visualized with WebLogo v 2.8.2 (Crooks et al., 2004). The frequencies of the six canonical and two non-canonical arginine codons were calculated for each species using MEGA v 4.0 (Tamura et al., 2007), and the distribution of codon frequencies was plotted with the ggplot2 package in R v 3.6.0 (R Core Team, 2016).
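ORFfinder itself was used for ORF extraction; purely as an illustration of what the "longer than 300 bp" criterion means, the hypothetical Python helper below scans both strands in all three frames for ATG-initiated reading frames that end at a stop codon. It is a simplified sketch (linear sequence, ATG starts only, no wrap-around across the origin of the circular map), not ORFfinder's algorithm. The set of stop codons is a parameter because UGA and UAG may not act as stops in some Trentepohliales genes.

```python
# Hypothetical sketch of a simple ORF scan; not ORFfinder's implementation.
# Assumptions: linear sequence, ATG-only starts, 0-based half-open coordinates
# reported on the forward strand.

STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 300, stops=STANDARD_STOPS):
    """Find ATG-initiated ORFs longer than min_len nt (stop codon included).

    Returns a list of (strand, start, end) tuples in forward-strand coordinates.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the currently open ATG in this frame
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in stops:
                    end = i + 3
                    if end - start > min_len:
                        if strand == "+":
                            orfs.append((strand, start, end))
                        else:
                            # map reverse-complement coordinates back to the forward strand
                            orfs.append((strand, n - end, n - start))
                    start = None
    return orfs
```

Calling `find_orfs(seq)` applies the standard code, while `find_orfs(seq, stops={"TAA"})` treats UGA and UAG as sense codons, which lengthens ORFs that would otherwise be truncated at those codons.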

Comparative Genomics
Synteny comparison was visualized using the progressiveMauve program under Mauve v 2.3.1 (Darling et al., 2010) with default settings after adjusting all genomes to start at the 16S rRNA gene. Only one copy of the IR region in T. odorata was included in the analysis.
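Re-anchoring each circular genome at the 16S rRNA gene before running progressiveMauve can be sketched as a simple rotation. The helper below is hypothetical and assumes 0-based, half-open gene coordinates on the forward strand; reverse-complementing genomes whose 16S gene lies on the minus strand is an assumption of this sketch, since the text does not state whether orientation was also standardized.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_to_gene(seq: str, gene_start: int, gene_end: int, strand: str) -> str:
    """Rotate a circular genome so that a gene begins at position 0.

    gene_start/gene_end: 0-based, half-open forward-strand coordinates.
    strand: "+" or "-"; for "-" the genome is reverse-complemented first so that
    the gene reads 5'->3' from the new origin (an assumption of this sketch).
    """
    if strand == "+":
        return seq[gene_start:] + seq[:gene_start]
    if strand == "-":
        rc = reverse_complement(seq)
        new_start = len(seq) - gene_end  # gene start in reverse-complement coordinates
        return rc[new_start:] + rc[:new_start]
    raise ValueError("strand must be '+' or '-'")
```

Because the genomes are circular-mapping, rotation changes only the coordinate origin, not the sequence content, so the LCBs reported by the aligner are unaffected apart from their coordinates.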

Phylogenetic Analyses
The first phylogenetic analysis was based on 18S rDNA, to verify the identity of our strains and place them within known Trentepohliales biodiversity. The second analysis used concatenated data of 31 chloroplast genes from chloroplast genomes sampled across the Chlorophyta, in order to study the molecular evolution of the chloroplast genomes in a broader taxonomic context. The 18S rDNA sequences of strains BN17, GD1942, and GD1927 were obtained as described in Fang et al. (2021), and sequences for strains SAG 42.85, YN1242, YN1317 and a range of other Trentepohliales were downloaded from NCBI. Sequences were aligned with MAFFT 7.0 (Katoh and Standley, 2014), and ambiguous regions were removed using trimAl 1.2 (Capella-Gutierrez et al., 2009). We performed Maximum Likelihood (ML) analysis using the IQ-TREE web server (Trifinopoulos et al., 2016) and Bayesian Inference (BI) in MrBayes v 3.1.2 (Huelsenbeck and Ronquist, 2001). For ML, the K3P + I + G4 model was selected by IQ-TREE as the best-fit model according to the Bayesian information criterion (BIC), and 1,000 bootstrap replicates were used to estimate statistical support. For BI, the ModelFinder program (Kalyaanamoorthy et al., 2017) in PhyloSuite v 1.2.2 was used to select the best-fitting model among those implemented in MrBayes, in this case GTR + I + G. Markov chain Monte Carlo analyses were run with four Markov chains for at least 1,000,000 generations, with trees sampled every 1,000 generations. The first 25% of the sampled trees were discarded as burn-in, and the remaining samples were used to construct a Bayesian consensus tree with posterior probabilities. Stationarity was assumed when the average standard deviation of split frequencies was lower than 0.01.
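The convergence criterion can be illustrated with a small, hypothetical Python sketch of the average standard deviation of split frequencies (ASDSF) between independent runs, after discarding the first 25% of sampled trees as burn-in. Trees are represented simply as sets of split identifiers; the 0.10 minimum split frequency and the use of the sample standard deviation are assumptions of this sketch, not a description of MrBayes internals.

```python
from collections import Counter
from statistics import mean, stdev


def split_frequencies(sampled_trees, burnin_fraction=0.25):
    """Frequency of each split among post-burn-in trees.

    sampled_trees: list of trees, each given as a set of hashable split identifiers
    (a simplified, hypothetical representation).
    """
    kept = sampled_trees[int(len(sampled_trees) * burnin_fraction):]
    counts = Counter(split for tree in kept for split in tree)
    return {split: c / len(kept) for split, c in counts.items()}


def asdsf(run_frequencies, min_freq=0.10):
    """Average standard deviation of split frequencies across two or more runs.

    Splits whose frequency is below min_freq in every run are ignored.
    """
    all_splits = set().union(*run_frequencies)
    sds = []
    for split in all_splits:
        freqs = [run.get(split, 0.0) for run in run_frequencies]
        if max(freqs) >= min_freq:
            sds.append(stdev(freqs))
    return mean(sds) if sds else 0.0
```

Under this sketch, a pair of runs would be treated as stationary when `asdsf([freqs_run1, freqs_run2]) < 0.01`.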

Chloroplast Genome Features
We sequenced the chloroplast genomes of two Trentepohlia and four Cephaleuros strains (Figure 1 and Supplementary Figure S1). The annotated chloroplast genomes were submitted to GenBank under the accession numbers given in Table 1. An 18S rDNA analysis placed the sequenced strains within the broader Trentepohliales biodiversity and showed well-resolved positions for Cephaleuros tumidae-setae BN17 and Cephaleuros karstenii GD1942. Cephaleuros virescens was paraphyletic, with Cephaleuros virescens SAG 42.85 being closely related to Cephaleuros parasiticus GD1927 (Supplementary Figure S2). The positions of Trentepohlia sp. YN1317 and Trentepohlia sp. YN1242 received weak support in both ML and BI analyses. The results also indicated that Trentepohlia odorata was most closely related to Trentepohlia annulata and Trentepohlia cf. umbrina (Supplementary Figure S2), consistent with previous work (Zhu et al., 2019).
Trentepohliales chloroplast genomes ranged in size from 216,308 to 408,697 bp. All the newly sequenced Trentepohliales chloroplast genomes were circular-mapping and lacked the quadripartite structure (Figure 1 and Supplementary Figure S1) seen in many other Chlorophyta, including Trentepohlia odorata. GC content was low and varied among species, ranging from 25.9% in Trentepohlia sp. YN1242 to 36.1% in Cephaleuros virescens SAG 42.85. The distribution of protein-coding genes between the two strands was skewed and varied among species; it was most uneven in Trentepohlia odorata (+/−, 19/44). The number of free-standing ORFs longer than 100 aa varied between 16 and 95, with Trentepohlia odorata having the most (95), followed by Cephaleuros virescens (82). The six completely sequenced cpDNAs contain 93–98 genes and share nearly identical gene repertoires (Supplementary Table S4), except for a handful of genes. For example, rpl12, rpl32 and petL were not found in T. odorata but were present in the remaining Trentepohliales taxa. The ycf12 gene was lost in Trentepohlia odorata, Cephaleuros parasiticus and Cephaleuros karstenii, and ycf1 was lost in Cephaleuros tumidae-setae and Cephaleuros virescens.
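GC content and the strand distribution of protein-coding genes are simple summaries; the hypothetical Python helpers below show one way to compute them, assuming gene annotations are available as (name, strand) pairs. Excluding ambiguous bases from the GC denominator is a choice of this sketch.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def strand_counts(genes):
    """Count protein-coding genes on each strand; genes is an iterable of (name, strand)."""
    genes = list(genes)
    plus = sum(1 for _, strand in genes if strand == "+")
    minus = sum(1 for _, strand in genes if strand == "-")
    return plus, minus
```

A return value of `(19, 44)` from `strand_counts` corresponds to the +/− notation used above for T. odorata.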
There were also some duplicated genes, including the clpP gene with two copies in T. odorata. The trnE-TTC, trnD-GTC, petD, and atpA genes have two copies in Cephaleuros karstenii and trnG-TCC has two copies in Cephaleuros virescens and Cephaleuros parasiticus.
The gene repertoire of the chloroplast genome was quite homogeneous in the Trentepohliales and similar to that of other Ulvophyceae. A total of 77 genes, including two ribosomal RNAs and 21 transfer RNAs, are shared by all analyzed members of the Ulvophyceae (Supplementary Table S4). Compared with other lineages of Ulvophyceae, however, some chloroplast protein-coding genes that are present in other Ulvophyceae chloroplast genomes were lost in the Trentepohliales, such as ycf20, ycf4, rpl19, rpl36, psaM, cemA, ftsH and accD. The genes lost in Trentepohliales have diverse functions, including inorganic carbon uptake into chloroplasts (cemA), photosynthesis (ycf4), translation (rpl12 and rpl32, lost in T. odorata) and unknown functions (ycf20). Two tRNAs (trnI-AAT and trnR-TCG) were found in the Trentepohliales but were missing in all other analyzed Ulvophyceae, and the trnI-GAT gene was found in all Ulvophyceae except the Trentepohliales (Supplementary Table S4).

The Alternative Genetic Code
In the newly sequenced Trentepohliales chloroplast genomes, all three stop codons (UAA, UAG and UGA) serve as genuine termination codons, with UAA being the most common. However, apparent UGA and UAG codons were also found within the otherwise well-conserved CDSs of several protein-coding genes, suggestive of an alternative genetic code (Figure 2 and Supplementary Table S5). The genes with in-frame UGA/UAG codons included ribosomal protein genes (rps and rpl) and photosynthesis-related genes (chlB and ycf3). At 20 of the 40 in-frame UGA/UAG positions, the amino acid residue in orthologous genes was highly conserved (i.e., present in almost all taxa in the alignment matrix), and most in-frame UAG/UGA codons were found at highly conserved positions where other green algae encode arginine (Figure 3). These results suggest that most in-frame UGA/UAG codons may be reassigned to arginine. Additionally, the UAG codon at the third position of the rpl14 gene occurred at a highly conserved position where other green algae encode isoleucine (Figure 3). The occurrence of UAG/UGA codons at conserved arginine and isoleucine positions, together with the conservation of the sequences flanking the in-frame UAG/UGA codons, further supports the presence of an alternative genetic code. At some positions, namely positions 52 and 67 in rps2 and position 46 in rpl20 (Figure 2), several different amino acids were observed in the orthologs; because these positions were not conserved, the amino acid encoded by the UGA/UAG codon could not be determined with certainty.
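The screening logic described above (locating in-frame UGA/UAG codons and asking whether the aligned residue in orthologs is conserved) can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the 90% threshold standing in for "present in almost all taxa" is an assumption, sequences use the DNA alphabet (TGA/TAG), and each alignment column is supplied as a plain list of residues.

```python
from collections import Counter


def internal_stop_positions(cds: str, codons=("TGA", "TAG")):
    """Return (codon number, codon) for in-frame TGA/TAG codons, excluding the terminal codon.

    Codon numbers are 1-based, so they match amino acid positions in the translated CDS.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    hits = []
    for k in range(n_codons - 1):  # the last codon is the genuine stop and is skipped
        codon = cds[3 * k:3 * k + 3]
        if codon in codons:
            hits.append((k + 1, codon))
    return hits


def column_consensus(residues, threshold=0.9):
    """Most common residue in an alignment column, its frequency, and whether it counts as conserved.

    Gaps ('-') are ignored; the threshold is a hypothetical cut-off.
    """
    counts = Counter(r for r in residues if r != "-")
    if not counts:
        return None, 0.0, False
    residue, count = counts.most_common(1)[0]
    freq = count / sum(counts.values())
    return residue, freq, freq >= threshold
```

A column such as `["R"] * 9 + ["K"]` would be classified as a conserved arginine position, whereas columns like those at rps2 positions 52 and 67 would fall below the threshold and leave the encoded amino acid undetermined.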
The estimated frequencies of arginine codon usage are shown in Figure 4 (only Ulvophyceae taxa are shown). In general, the canonical codon AGA is the most commonly used, followed by the canonical codon CGU. Among the non-canonical arginine codons, UGA is more common than UAG. The bias toward AGA and UGA is likely a product of the overall bias of these genes for GC residues, leading to increased use of G in the second codon position.
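The arginine codon frequencies in Figure 4 were computed with MEGA; as an illustration only, the hypothetical Python function below tallies the same eight codons (the six canonical ones plus the putatively reassigned TGA and TAG, in the DNA alphabet) across a set of CDSs. Counting every internal TGA/TAG as arginine is an assumption of this sketch, since some positions could not be assigned with certainty.

```python
from collections import Counter

CANONICAL_ARG = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")
PUTATIVE_ARG = ("TGA", "TAG")  # in-frame codons assumed to be read as arginine
ARG_CODONS = CANONICAL_ARG + PUTATIVE_ARG


def arginine_codon_frequencies(cds_list):
    """Relative frequency of each arginine codon across CDSs, excluding terminal stops."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
        codons = [cds[i:i + 3] for i in range(0, usable, 3)]
        for codon in codons[:-1]:  # drop the terminal (genuine) stop codon
            if codon in ARG_CODONS:
                counts[codon] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in ARG_CODONS}
```

Dropping only the final codon of each CDS is what separates the genuine terminator from the internal TGA/TAG codons that are counted as arginine.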

Introns
All analyzed Trentepohliales chloroplast genomes contained introns; their distribution and types are listed in Supplementary Table S5. The number of introns ranged from 16 (Trentepohlia sp. YN1242) to 62 (C. tumidae-setae BN17). Among the newly sequenced Trentepohliales chloroplast genomes, only the psbC, psbA and rrl genes always contained introns. Intron number was not necessarily correlated with genome size; for example, although the Cephaleuros tumidae-setae chloroplast genome was not the largest in the Trentepohliales, it had the largest number of introns (Supplementary Table S6).
The atpA gene in Cephaleuros parasiticus contains one intron, identified as a group I intron, which is commonly found in this gene in other green algae (Pombert et al., 2005). The psbA gene in Trentepohlia sp. YN1317 has six introns, of which five were identified as group II introns and one as a group I intron. The third and fourth introns contain one and three ORFs, respectively. Two ORFs have also been found in the psbA gene of Tydemania expeditionis. Introns are common in the psbC genes of Trentepohliales. The psbC genes of C. parasiticus and Trentepohlia sp. YN1242 each contain one intron, both of which are group I introns, whereas the psbC introns of the remaining species include both group I and group II introns. An intron in the psbD gene was present in all newly sequenced Trentepohliales chloroplast genomes and has also been found in Caulerpa okamurae and Tupiella akineta (Zheng et al., 2020).

Synteny Analysis
ProgressiveMauve alignment of seven Trentepohliales chloroplast genomes showed fragmentation of the genomes into many small locally collinear blocks (LCBs) and suggested high levels of rearrangements across the Trentepohliales green algae (Figure 5). A separate alignment within Cephaleuros identified > 20 LCBs and considerable rearrangements and inversions (Supplementary Figure S3).
Across the seven Trentepohliales chloroplast genomes, 17 conserved gene clusters encoding 64 genes were found (Figure 6). Nine of these gene clusters were only partially conserved, and the rest were intact across all genomes. Several of the incomplete clusters (trnR-TCT-trnW-CCA, trnP-TGG-trnH-GTG-trnM-CAT-trnR-CCT, psbM-ccsA-psbZ) differed between the genera Cephaleuros and Trentepohlia. Despite the reported close relationship between C. virescens and C. karstenii (Fang et al., 2021), 7 of the 17 gene clusters differed between them. The breakage of two gene clusters (trnQ-TTG-trnC-GCA, clpP-petB) in C. karstenii set it apart from all other investigated species of the order Trentepohliales (Figure 6).
FIGURE 4 | The estimated frequencies of arginine codon usage. The canonical codon AGA is the most commonly used, followed by the canonical codon CGU. Among the non-canonical codons, UGA is used more commonly than UAG. (1) A single origin of the non-canonical code along the branch leading to C. parasiticus, C. virescens, C. karstenii, C. tumidae-setae, and Trentepohlia sp. YN1317, and a subsequent reversal to the standard code in Cephaleuros sp., Trentepohlia sp. YN1242, and Trentepohlia odorata (indicated with oblique arrow and cross). (2) A stepwise process of evolution of the non-canonical code with a single initiation of the process along the branch leading to C. parasiticus, C. virescens, C. karstenii, C. tumidae-setae, and Trentepohlia sp. YN1317, followed by a completion of the process in all species except Cephaleuros sp., Trentepohlia sp. YN1242 and Trentepohlia odorata (oblique arrow combined with vertical arrow). (3) Two independent gains of the non-canonical code in the Trentepohliales (vertical arrow).

Comparison of Chloroplast Genomes in Ulvophyceae
Among the 35 chloroplast genomes of Ulvophyceae included in the phylogenetic tree, only eight possessed an inverted repeat region (Figure 7). Among these IR-containing chloroplast genomes, Trentepohlia odorata has the largest, owing to its very long single-copy (SC) and inverted repeat regions. IR-less chloroplast genomes are mainly found in the Bryopsidales, Trentepohliales and Ulvales. In the seven Trentepohliales chloroplast genomes, the total length of the CDS regions was almost equal, and differences in chloroplast genome size were largely caused by the length of intergenic regions and by intron content (Figure 7), consistent with findings in other taxa. Likewise, across the Ulvophyceae more broadly, the length of intergenic regions had the greatest influence on chloroplast genome size, followed by intron content and the length of coding regions. Remarkably, the Trentepohliales chloroplast genomes were the largest among the 35 surveyed Ulvophyceae, with six of the seven Trentepohliales chloroplast genomes exceeding 250 kb (Figure 7). Trentepohliales were recovered as the sister lineage to Dasycladales in ML trees inferred from nucleotide (nt) data under two partitioning schemes (by gene and by codon position) (Figure 8 and Supplementary Figure S5), but support for this relationship was weak (69 in Figure 8 and 62 in Supplementary Figure S5). Because nucleotide composition bias mainly affects third codon positions, we also inferred a tree from the nucleotide alignment with third codon positions removed (Figure 9A). This tree recovered Trentepohliales as more closely related to Bryopsidales (bootstrap support 77), and support for this relationship was higher in the tree based on the amino acid (aa) data set (Figure 9B).
Regardless of the type of phylogenetic tree, the internal relationships of the Trentepohliales were constant, and the non-monophyly of the genus Trentepohlia was strongly supported by plastid phylogenomics. Notably, the Trentepohliales have substantially longer branches than other Chlorophyta lineages, indicating an elevated rate of substitution in the group. In addition, the Ulvophyceae were polyphyletic in all the above phylogenetic analyses, consistent with previous work (Sun et al., 2016; Fang et al., 2018; Zhu et al., 2019).
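The decomposition of genome size into coding, intronic and intergenic portions used in the comparison of Ulvophyceae chloroplast genomes can be sketched as interval arithmetic. The hypothetical Python helper below assumes 0-based, half-open feature coordinates that do not span the origin of the circular map, and treats everything outside annotated coding and intron intervals as intergenic; real annotations would need extra care for overlapping genes and wrap-around features.

```python
def merged_length(intervals):
    """Total length covered by a set of (start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def partition_genome(genome_len, coding, introns):
    """Split genome length into coding, intron and intergenic base pairs."""
    coding_bp = merged_length(coding)
    intron_bp = merged_length(introns)
    intergenic_bp = genome_len - merged_length(list(coding) + list(introns))
    return {"coding": coding_bp, "intron": intron_bp, "intergenic": intergenic_bp}
```

Merging intervals before summing avoids double-counting overlapping annotations, which would otherwise make the intergenic remainder too small.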

Inflated Chloroplast Genomes in Terrestrial Algae
The newly presented chloroplast genomes clearly establish the Trentepohliales as having very large chloroplast genomes (216-408 kbp), much larger than most published Chlorophyta chloroplast genomes. This difference in chloroplast genome size results primarily from variation in non-coding regions including the length of intergenic space and introns (Figure 7). We note that other terrestrial algae also have large chloroplast genomes, including Floydiella terrestris (521,168 bp) in the Chaetopeltidales (Brouard et al., 2010) and Gloeotilopsis sarcinoidea (262,888 bp) in the Ulotrichales (Turmel et al., 2015), and it is interesting to contemplate the possibility that there may be a common cause behind this, which we will expand upon below.

Vanishing Inverted Repeats
Although a quadripartite architecture is believed to be ancestral in the green algae (Civán et al., 2014), the six new Trentepohliales chloroplast genomes have no inverted repeats, similar to those of the Bryopsidales (Cremen et al., 2018). Beyond the Ulvophyceae, the IR has reportedly been lost many times in other lineages of Chlorophyta, including at least twice in the Chlorophyceae (Brouard et al., 2010) and at least seven times in the Trebouxiophyceae (Turmel et al., 2015). Three models have been proposed to explain how the IR is lost, each of which fits some groups of organisms well but not others, suggesting that there are many paths toward IR loss.
Chloroplast genomes of Cephaleuros species lack the IR region, but C. karstenii features duplicated copies of some genes that we speculate may be remnants of the IR. The IR-less C. karstenii genome is best compared with that of the IR-containing T. odorata, as the two are similar in many ways. Our reasoning for a remnant IR rests on the trnD-GTC, petD and atpA genes, which are duplicated in C. karstenii and located near the IR/LSC boundary in T. odorata. Further evidence is that the two copies of petD and of atpA in C. karstenii have the same length but are oriented in opposite directions, a characteristic shared with IR regions. Interestingly, Oltmannsiellopsis viridis has the petD gene in its IR regions. Although this species is not a close relative, this suggests that petD might have been part of an ancestral IR and that the duplication has been maintained in the lineage leading to C. karstenii. The two trnD-GTC sequences in the IR-less genome of C. karstenii could be the product of a duplication event or an IR remnant. The trnD-GTC gene is also duplicated in the IR-containing plastid genomes of some Trebouxiophyceae, for example Dicloster acuatus (Lemieux et al., 2014).
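The criteria used above to flag putative IR remnants (two copies of equal length on opposite strands) can be expressed as a simple check over gene annotations. The hypothetical Python sketch below assumes annotations given as (name, start, end, strand) tuples; it only flags candidates and cannot by itself distinguish an IR remnant from an independent duplication.

```python
from collections import defaultdict


def duplicated_genes(annotations):
    """Summarize genes present in more than one copy.

    annotations: iterable of (name, start, end, strand) with strand '+' or '-'.
    Returns {name: {"copies", "same_length", "opposite_strands"}}.
    """
    copies = defaultdict(list)
    for name, start, end, strand in annotations:
        copies[name].append((end - start, strand))
    report = {}
    for name, entries in copies.items():
        if len(entries) < 2:
            continue  # single-copy genes are not reported
        lengths = {length for length, _ in entries}
        strands = {strand for _, strand in entries}
        report[name] = {
            "copies": len(entries),
            "same_length": len(lengths) == 1,
            "opposite_strands": strands == {"+", "-"},
        }
    return report
```

In this scheme, a gene pair like petD in C. karstenii would be reported with both `same_length` and `opposite_strands` set to True, while a tandem duplicate on one strand would not satisfy the opposite-orientation criterion.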
Remarkably, the two newly sequenced Trentepohlia chloroplast genomes lack IR regions even though Trentepohlia odorata has a quadripartite structure. This can be explained by the phylogenetic positions of the two newly sequenced Trentepohlia strains and Trentepohlia odorata in the 18S rDNA phylogeny (Zhu et al., 2017). The two newly sequenced Trentepohlia strains both fall in the core Trentepohlia clade, sister to the Stomatochroon and Cephaleuros clades (Supplementary Figure S2; Zhu et al., 2017), whereas Trentepohlia odorata is located elsewhere in the phylogeny, as sister to the core Phycopeltis group. Both newly sequenced Trentepohlia strains are thus more closely related to Cephaleuros than to Trentepohlia odorata. At present, only Trentepohlia odorata is known to have a quadripartite structure, but it seems likely that further sampling will recover additional Trentepohliales strains with this structure.

The Uncertain Phylogenetic Position of the Trentepohliales
Our phylogenetic analysis based on chloroplast genomes indicated that the Trentepohliales are most closely related to the Bryopsidales. Although this is inconsistent with some previous studies based on chloroplast genomes (Fučíková et al., 2014; Melton et al., 2015; Fang et al., 2018; Zhu et al., 2019), support for a sister relationship between Trentepohliales and Dasycladales is low in our trees inferred from the concatenated nucleotide (nt) data set under two partitioning schemes (by gene and by codon position) (Figure 8 and Supplementary Figure S5). The close relationship between Trentepohliales and Bryopsidales is better supported by amino acid data than by nucleotide data (Figure 9). In-depth phylogenetic work based on chloroplast data has shown that the relationships in the core Chlorophyta, particularly among the Bryopsidales, Dasycladales and Trentepohliales, depend on analysis settings and on whether data are analyzed at the nucleotide or amino acid level (Fang et al., 2018; Jackson et al., 2018).
More gene-rich analyses based on nuclear genes show that the phylogenetic configuration of these orders is different from that obtained with chloroplast genome data, with Bryopsidales more closely allied with Chlorophyceae than with Dasycladales and Trentepohliales (Del Cortona et al., 2020;Gulbrandsen et al., 2021;Li et al., 2021). In these analyses of nuclear genes, Trentepohliales are recovered as sister to Cladophorales, an order that was not included in the present analysis because of their highly deviant chloroplast genomes divided into dozens of small hairpin chromosomes (Del Cortona et al., 2017).

Highly Dynamic Trentepohliales Chloroplast Genomes
Part of the difficulty in resolving the relationships of the Trentepohliales with other Ulvophyceae based on chloroplast genomes may lie in their elevated rates of substitution. This is clearly reflected in the long Trentepohliales branches in all our phylogenies, which were inferred from a set of highly conserved genes with critical roles in chloroplast function. Such elevated rates of molecular evolution can have a range of causes, including elevated mutation rates, lower effective population sizes leading to stronger drift and thereby the fixation of more mutations, or poorer DNA repair mechanisms (Lynch, 2007; Smith, 2016). It is impossible to determine from the data at hand what drives the elevated rates in Trentepohliales, but we can make conjectures based on the biology of these organisms. First, because of their subaerial-terrestrial habitat, Trentepohliales may experience higher levels of UV irradiation, which could lead to higher mutation rates. Second, compared with their aquatic counterparts, subaerial algae may have poorer dispersal, leading to more spatial isolation and hence smaller effective population sizes. Both factors, along with a range of other influences, could contribute to the observed higher rates of molecular evolution.
The cause of chloroplast genome expansion in Trentepohliales is similarly difficult to pin down. The mutational hazard hypothesis (Lynch et al., 2006) suggests that genomes tend to inflate when mutation rates and effective population sizes are low. This seems at least partially incompatible with the high rates of molecular evolution in Trentepohliales, which may point toward a higher mutation rate; however, as indicated above, potentially small effective population sizes may also contribute to elevated rates of evolution. Inaccurate DNA repair is another potential mechanism that can lead to expansion of the non-coding parts of organelle genomes (Christensen, 2014) and could help explain genome growth even when mutation rates are high (Smith, 2016). Intron proliferation also appears to be among the factors contributing to the large size of Trentepohliales chloroplast genomes, not unlike the situation in the recently characterized, heavily expanded mitochondrial genomes of the Bryopsidales (Repetti et al., 2020).
The dynamic nature of Trentepohliales chloroplast genomes also extends to patterns of rearrangement, with low levels of synteny across the order (Figure 5) and even within single genera (Supplementary Figure S3). Trentepohlia and Cephaleuros differ in the conservation of gene clusters, for example trnR-TCT-trnW-CCA (Figure 6), and some gene clusters that are conserved across other Ulvophyceae are broken up in the Trentepohliales (Supplementary Figure S4). High variability in chloroplast genome architecture has also been observed in the Bryopsidales, which also lack the quadripartite structure (Cremen et al., 2018), but this is by no means a universal association: some higher-level groups of algae without a quadripartite structure have exceptionally well-preserved synteny, for example the Ostreobineae (Verbruggen et al., 2017; Pasella et al., 2021) and Nemaliales (Costa et al., 2016).
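Whether a gene cluster such as trnR-TCT-trnW-CCA is intact in a given genome can be checked against the circular gene order. The hypothetical Python sketch below treats a cluster as intact if its genes occur as a contiguous run in either the same or the reverse order, including runs that wrap around the origin; it ignores strand, which a stricter test for inversions would also compare.

```python
def cluster_is_intact(gene_order, cluster):
    """True if `cluster` appears as a contiguous run (forward or reversed) in a circular gene order."""
    n, k = len(gene_order), len(cluster)
    if k == 0 or k > n:
        return False
    # Append the first k-1 genes so that runs spanning the origin are also checked.
    extended = list(gene_order) + list(gene_order[:k - 1])
    targets = (list(cluster), list(reversed(cluster)))
    return any(extended[i:i + k] in targets for i in range(n))
```

Applied to each genome's gene order in turn, a cluster that returns True everywhere corresponds to an intact cluster, and one that returns False in some genomes to a partially conserved cluster in the sense used for Figure 6.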

Alternative Genetic Code
Deviations from the standard genetic code are found in mitochondrial genomes, chloroplast genomes and nuclear genomes, and include reassignment of stop codons and loss of start and stop codons in some groups, including apicomplexans, dinoflagellates, green algae and ciliates (Hanyu et al., 1986; Lang-Unnasch and Aiello, 1999; Cocquyt et al., 2010; Matsumoto et al., 2011). In the current study, plastid genes in Trentepohliales end with canonical stop codons (UAA/UGA/UAG) but also contain in-frame UGA/UAG codons. These putatively reassigned UGA/UAG codons occurred in 11 chloroplast genes of Trentepohliales (Supplementary Table S5). Based on the alignments, the most conserved positions where these codons occur encode arginine (R) in other green algae, suggesting that arginine is the most likely amino acid encoded by the reassigned codons (Figure 3). Most codon reassignments can be attributed to changes in tRNAs, either through base modification or RNA editing. In Trentepohliales, several in-frame UAG/UGA codons were found at highly conserved positions where other green algae encode arginine. In addition, the aligned sequences flanking the in-frame UAG/UGA codons are also conserved, so base insertion or deletion through RNA editing is unlikely. Our results reveal the distribution of a non-canonical genetic code in the Trentepohliales, in which arginine is encoded by the canonical codons as well as by the non-canonical UAG and UGA codons. Surprisingly, the non-canonical code was not observed in Trentepohlia sp. YN1242 or Trentepohlia odorata. More specifically, the Trentepohliales lineages with the non-canonical code form a paraphyletic group. Several models have been proposed to explain the origin of non-canonical genetic codes, for example, a stepwise acquisition model, the ambiguous intermediate model, and multiple independent gains of the non-canonical code (Cocquyt et al., 2010; Li et al., 2021).
Assuming that the internal relationships of the Trentepohliales in our phylogenetic tree are correct, we propose alternative evolutionary scenarios to explain the distribution of the non-canonical code on that tree: (1) A single origin of the non-canonical code along the branch leading to C. parasiticus, C. virescens, C. karstenii, C. tumidae-setae and Trentepohlia sp. YN1317, followed by a reversal to the standard code in Cephaleuros sp., Trentepohlia sp. YN1242 and Trentepohlia odorata (Figure 4: oblique arrow and cross).
(2) Under the ambiguous intermediate model, a stepwise evolution of the non-canonical code, with a single initiation of the process along the branch leading to C. parasiticus, C. virescens, C. karstenii, C. tumidae-setae, and Trentepohlia sp. YN1317, followed by completion of the process in all species except Cephaleuros sp., Trentepohlia sp. YN1242 and Trentepohlia odorata (Figure 4: oblique arrow combined with vertical arrow).
(3) Two independent gains of the non-canonical code in the Trentepohliales (Figure 4: vertical arrow).
Hypothesis 1 is highly unlikely, because a reversal from the non-canonical to the standard genetic code is improbable given the profound genetic changes that accompany codon reassignment (Cocquyt et al., 2010). Several independent acquisitions of non-canonical codes have been reported in ciliates (Tourancheau et al., 1995), and in the present study the possibility that codon reassignment occurred twice independently in Trentepohliales cannot be excluded. Previous studies have shown that stop codon reassignment is a gradual process requiring changes in the tRNA and eukaryotic release factor 1 (eRF1) genes (Beier and Grimm, 2001). Two conditions are required to complete the reassignment of UGA or UAG to arginine codons in Trentepohliales: first, the release factor must no longer efficiently terminate translation at UAG/UGA; second, a mutant arginine tRNA must be able to pair with UAG or UGA, so that these codons are translated as arginine. A comparable situation has been described in ciliates, where, in addition to the canonical glutamine tRNAs, two supplementary tRNAs derived from the normal glutamine tRNA recognize the non-canonical UAR codons (Hanyu et al., 1986). Interestingly, all chloroplast genomes of Trentepohliales except Trentepohlia sp. YN1317 and Trentepohlia sp. YN1242 contain four tRNA genes for arginine (trnR-TCT, trnR-ACG, trnR-TCG and trnR-CCT), all of which decode standard arginine codons. Surprisingly, trnR-TCG and trnR-CCT are lost in Trentepohlia sp. YN1242, which uses the canonical code, and trnR-TCG is also lost in Trentepohlia sp. YN1317, which uses the non-canonical code (Supplementary Table S4). Conversely, Trentepohlia odorata, which uses the canonical code, has four arginine tRNA genes, like the species with the non-canonical code. Because the chloroplast genome of Cephaleuros sp. is incomplete, the number of arginine tRNA genes in this species is unclear, which hinders our understanding of the non-canonical genetic code in Trentepohliales. More samples and deeper analysis of arginine tRNAs are needed to clarify the evolutionary scenarios in Trentepohliales.
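The requirement for a mutant arginine tRNA can be illustrated by checking codon-anticodon pairing. This minimal Python sketch assumes that each trnR gene name encodes its anticodon 5' to 3' in DNA letters (the usual naming convention, e.g. trnR-TCT carries anticodon UCU) and requires strict Watson-Crick pairing only at the first two codon positions, ignoring the third (wobble) position, which makes the test permissive. Under these assumptions none of the four arginine tRNAs named above can read UGA or UAG, whereas the hypothetical anticodons TCA and CTA, which are not reported in this study, would.

```python
# Watson-Crick complements in DNA letters.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Anticodons (5'->3') of the four plastid arginine tRNA genes named in the text.
ARG_ANTICODONS = ["TCT", "ACG", "TCG", "CCT"]


def paired_codon_prefix(anticodon: str) -> str:
    """Codon positions 1 and 2 pair with anticodon positions 3 and 2 (antiparallel)."""
    ac = anticodon.upper().replace("U", "T")
    return COMPLEMENT[ac[2]] + COMPLEMENT[ac[1]]


def can_read(anticodon: str, codon: str) -> bool:
    """Permissive test: require pairing at codon positions 1-2, ignore the wobble base."""
    return codon.upper().replace("U", "T")[:2] == paired_codon_prefix(anticodon)


for ac in ARG_ANTICODONS:
    reads = [c for c in ("UGA", "UAG") if can_read(ac, c)]
    print(f"trnR-{ac}: codon prefix {paired_codon_prefix(ac)}, reads UGA/UAG: {reads or 'none'}")

# Hypothetical mutant anticodons that would be required (not found in this study).
print(can_read("TCA", "UGA"), can_read("CTA", "UAG"))  # True True
```

Because the wobble position is ignored, a negative result here is conservative: if even this permissive rule rules out pairing, strict pairing would too.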
Interestingly, a non-canonical genetic code has also been described for Trentepohliales nuclear genes, in which UAG and UAA codons are reassigned to glutamine (Cocquyt et al., 2010); this result has been verified with transcriptome-based data. A non-canonical genetic code has also been detected in chloroplast and nuclear genes of Cladophorales (Del Cortona et al., 2017). Considering that Cladophorales is likely the sister group of Trentepohliales (Cocquyt et al., 2010;Del Cortona et al., 2020), it seems reasonable to speculate that this codon reassignment occurred in the common ancestor of the two orders, before the drastic divergence in their chloroplast genome dynamics that led to inflated genomes in the Trentepohliales and hairpin chromosomes in the Cladophorales.
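The speculation above is a parsimony argument: if two sister orders share a trait absent from other lineages, a single gain in their common ancestor is the shortest explanation. A minimal Python sketch of Fitch's small-parsimony algorithm makes this explicit for a binary character (0 = standard code, 1 = reassigned code). Both trees are hypothetical simplifications in which each order is collapsed to one tip; "OtherUlvophyceae" and "Outgroup" are placeholder taxa, not results of this study.

```python
def fitch(tree, states):
    """Fitch small parsimony on a rooted binary tree written as nested tuples.

    Returns (candidate root states, minimum number of state changes).
    """
    if isinstance(tree, str):  # leaf: its observed state
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:  # children can agree: no additional change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


# Illustrative tip states: 1 = non-canonical code present, 0 = standard code.
states = {"Cladophorales": 1, "Trentepohliales": 1, "OtherUlvophyceae": 0, "Outgroup": 0}

# Hypothetical topologies: the two orders as sisters, or not.
sisters = ((("Cladophorales", "Trentepohliales"), "OtherUlvophyceae"), "Outgroup")
not_sisters = ((("Cladophorales", "OtherUlvophyceae"), "Trentepohliales"), "Outgroup")

print(fitch(sisters, states))      # ({0}, 1): one gain on the shared stem
print(fitch(not_sisters, states))  # ({0, 1}, 2): two changes needed
```

Fitch weights gains and losses equally. The argument made earlier that reversals to the standard code are improbable would require unequal costs (for example a Sankoff-style weighted scheme), which this sketch does not implement.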

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
HZ and GL designed research. JF, BL, and HZ performed research. JF and HV analyzed the data and wrote the manuscript. HZ and HV revised the draft manuscript. All authors contributed to the article and approved the submitted version.

Supplementary Figure S4 | Comparison of gene clusters conserved in at least two of the six representative ulvophycean species with the gene order found in Trentepohliales. Black connected boxes indicate gene clusters made up of contiguous genes. Black boxes that are contiguous but unlinked indicate that the corresponding genes are not adjacent in the genome. Gray boxes indicate genes located elsewhere in the genome. White boxes indicate genes missing from the chloroplast genome.
Supplementary Figure S5 | Maximum-likelihood (ML) phylogenetic tree of the Chlorophyta constructed using nucleotide data partitioned by codon position. Node support is given as ML bootstrap values.