Metagenomes of Red Sea Subpopulations Challenge the Use of Marker Genes and Morphology to Assess Trichodesmium Diversity

Trichodesmium are filamentous cyanobacteria of key interest due to their ability to fix carbon and nitrogen within an oligotrophic marine environment. Their blooms consist of a dynamic assemblage of subpopulations and colony morphologies that are hypothesized to occupy unique niches. Here, we assessed the poorly studied diversity of Trichodesmium in the Red Sea, based on metagenome-assembled genomes (MAGs) and hetR gene-based phylotyping. We assembled four non-redundant MAGs from morphologically distinct Trichodesmium colonies (tufts, dense and thin puffs). Trichodesmium thiebautii (puffs) and Trichodesmium erythraeum (tufts) were the dominant species within these morphotypes. While subspecies diversity is present for both T. thiebautii and T. erythraeum, a single T. thiebautii genotype comprised both thin and dense puff morphotypes, and we hypothesize that this phenotypic variation is likely attributed to gene regulation. Additionally, we found the rare non-diazotrophic clade IV and V genotypes, related to Trichodesmium nobis and Trichodesmium miru, respectively that likely occurred as single filaments. The hetR gene phylogeny further indicated that the genotype in clade IV could represent the species Trichodesmium contortum. Importantly, we show the presence of hetR paralogs in Trichodesmium, where two copies of the hetR gene were present within T. thiebautii genomes. This may lead to the overestimation of Trichodesmium diversity as one of the copies misidentified T. thiebautii as Trichodesmium aureum. Taken together, our results highlight the importance of re-assessing Trichodesmium taxonomy while showing the ability of genomics to capture the complex diversity and distribution of Trichodesmium populations.

Trichodesmium are filamentous cyanobacteria of key interest due to their ability to fix carbon and nitrogen within an oligotrophic marine environment. Their blooms consist of a dynamic assemblage of subpopulations and colony morphologies that are hypothesized to occupy unique niches. Here, we assessed the poorly studied diversity of Trichodesmium in the Red Sea, based on metagenome-assembled genomes (MAGs) and hetR genebased phylotyping. We assembled four non-redundant MAGs from morphologically distinct Trichodesmium colonies (tufts, dense and thin puffs). Trichodesmium thiebautii (puffs) and Trichodesmium erythraeum (tufts) were the dominant species within these morphotypes. While subspecies diversity is present for both T. thiebautii and T. erythraeum, a single T. thiebautii genotype comprised both thin and dense puff morphotypes, and we hypothesize that this phenotypic variation is likely attributed to gene regulation. Additionally, we found the rare non-diazotrophic clade IV and V genotypes, related to Trichodesmium nobis and Trichodesmium miru, respectively that likely occurred as single filaments. The hetR gene phylogeny further indicated that the genotype in clade IV could represent the species Trichodesmium contortum. Importantly, we show the presence of hetR paralogs in Trichodesmium, where two copies of the hetR gene were present within T. thiebautii genomes. This may lead to the overestimation of Trichodesmium diversity as one of the copies misidentified T. thiebautii as Trichodesmium aureum. Taken together, our results highlight the importance of re-assessing Trichodesmium taxonomy while showing the ability of genomics to capture the complex diversity and distribution of Trichodesmium populations. Keywords: Trichodesmium, hetR, cyanobacteria, morphotype, Red Sea, taxonomy, metagenome, biodiversity

INTRODUCTION
Trichodesmium is a genus of filamentous cyanobacteria known for its ability to form large visible surface blooms in tropical and subtropical regions of the ocean. Having first been described in 1770 by James Cook (Transcription of National Library of Australia, 2004), Trichodesmium have been extensively studied due to their ability to form blooms of large biomass supporting marine food webs through their nitrogen and carbon fixing capabilities (Dugdale et al., 1961) and subsequent relevance to biogeochemical cycles within oligotrophic marine environments (Capone et al., 1997;McKinna, 2015;Capone, 2021).
Trichodesmium blooms are dynamic over space and time, typically consisting of a complex assemblage of several different subpopulations and colony morphologies that are predicted to exhibit unique ecological lifestyles (Hansel et al., 2016;Gradoville et al., 2017;Eichner et al., 2019;Delmont, 2021;Held et al., 2021;Wang et al., 2021). For example, the comparison of two distinct puff-shaped morphotypes termed "dense" and "thin" colonies, isolated from the Red Sea displayed a remarkable heterogeneity in their preference to capture and center dust (Wang et al., 2021). While colony morphotypes cannot always be linked to different genotypes or species (Rouco et al., 2016;Gradoville et al., 2017) genomic information regarding Trichodesmium colonies in this region is lacking in comparison to studies conducted in the Atlantic and Pacific Oceans.
The diversity of Trichodesmium bloom-forming populations was first assessed via colony morphology (Janson et al., 1995). With the advent of molecular techniques, the use of singlemarker genes represented a more accurate and consistent phylogenetic technique (McManus and Katz, 2009). Marker genes that have been used to assess Trichodesmium diversity include the 16S rRNA, hetR, a regulatory gene for the development of heterocysts, and nifH, which encodes the nitrogenase iron protein. The hetR gene is considered a good marker to assess Trichodesmium diversity as it is more variable (10%) in comparison to genetic markers such as 16S ribosomal RNA (2-3%) or nifH (2%; Orcutt et al., 2002;Lundgren et al., 2005).
To better understand the taxonomic diversity of Trichodesmium in the Red Sea, we isolated colonies from the Gulf of Aqaba and compared the resulting metagenomeassembled genomes (MAGs) to those of other Trichodesmium populations from the Indian, Pacific and Atlantic Oceans. We confirmed the presence of these genomes in the Red Sea based on amplicon sequencing of the hetR gene, and, in light of our findings, evaluated the ability of this marker gene to capture Trichodesmium diversity.

Trichodesmium Sampling and Extraction
To assess the Trichodesmium population of the Red Sea, Trichodesmium colonies were handpicked and separated into three distinct morphotypes (~100-200 colonies each) throughout the winter bloom (November 2020) using a 100 μm phytoplankton net at 20 m depth in the Gulf of Aqaba (Eilat, Israel; 29.56°N, 34.95°E; Figure 1). Colonies were washed three times in 0.2 μm filtered seawater before being filtered on a 0.2 μm PCC filter using a vacuum pump. Filters were frozen in liquid N 2 and kept at −80°C until analysis.
DNA was extracted from three Trichodesmium samples using the DNAeasy Plant Mini Kit (Qiagen) following the manufacturer's instructions. Metagenomic libraries were prepared at HyLabs (Rehovot, Israel) and sequenced with 30-60 million 2 × 150 bp reads on Illumina HiSeq 3000 at GENEWIZ (Leipzig, Germany). The metagenomic samples can be found under the accession numbers SRR17940154-6.

RESULTS AND DISCUSSION
Four non-redundant Trichodesmium MAGs were assembled from the Red Sea (Figure 1; Supplementary Table 1). Briefly, MAG R01 (T. erythraeum) and MAG R04 (T. miru-like) were assembled from the tuft sample. MAG R02 (T. thiebautii) was curated from dense-puff samples, while MAG R03 (T. nobislike) was from long puffs. The MAGs were of high quality, being 90% complete and with less than 5% redundancy, apart from MAG R04 which was 86.29% complete. To provide a wider framework of Trichodesmium diversity, we assembled three non-redundant Trichodesmium MAGs from the Pacific and the Atlantic Ocean, based on previously published raw data (Frischkorn et al., 2017(Frischkorn et al., , 2018bGradoville et al., 2017). These included two high-quality MAGs T01-2 and T02, as well as the lower quality MAG T01-1 (Supplementary  Table 3). The dominance of these lineages in the Red Sea could be confirmed from hetR gene phylotyping of Trichodesmium populations across several seasons (Supplementary Figure 2). Tuft colonies were primarily composed of T. erythraeum (~83% read abundance; Figure 1C; Orcutt et al., 2002).
Both thin and dense puff-shaped colonies were dominated by T. thiebautii (~78% read abundance; Supplementary Table 1; Figures 1A,B). This finding is similar to another genomic study where radial (dense-like) and non-radial puff morphologies were linked to the same genotype of T. thiebautii (Gradoville et al., 2017). Puff-shaped Trichodesmium colonies have the unique capability to interact, capture, and concentrate dust at the colony's core (Rubin et al., 2011) which can allow colonies to obtain limiting nutrients from the marine environment, such as iron or phosphorous (Basu and Shaked, 2018;Basu et al., 2019). In the field, thin puffs isolated from the Red Sea were shown to be more interactive in comparison to dense puffs in the capturing and concentrating of dust at the colony's core, while exhibiting pronounced gliding motility (Wang et al., 2021). Our findings highlight that these morphological variations do not appear to be explained genomically and that other yet uncharacterized factors are at play. We hypothesize that the observed variations in puff morphologies concerning the centering of dust (Kessler et al., 2019;Wang et al., 2021) are due to differences in gene regulation in response to yet unknown environmental conditions which may include nutrient limitation or grazing. In a laboratory setting, single filaments of T. erythraeum IMS101 clustered into the puff and tuft-shaped colonies when subjected to iron or phosphorous limitation (Tzubari et al., 2018). Nonetheless, Trichodesmium displays functional variability and heterogeneity at a single-colony level that remains largely unexplained (Eichner et al., 2019;Held et al., 2021) and subsequently the ability to predict whether a colony will center dust still requires ongoing exploration. Our data revealed subspecies diversity for both T. thiebautii and T. erythraeum (Figure 2). Trichodesmium thiebautii MAGs appeared to cluster according to broad geographical patterns as MAG R02 was more similar to that of T. thiebautii populations isolated from the Indian Ocean (98.3% ANI), than to the Atlantic T. thiebautii MAG (97.7% ANI). This concurs with the previous findings (Delmont, 2021). We also found genomic diversity within T. erythraeum as revealed from two distinct T. erythraeum MAGs from the Pacific Ocean (96.6% ANI). MAG T01-1 was highly similar to that of T. erythraeum isolated from the Red Sea (97.6% ANI; MAG R01), whereas MAG T01-2 was closely related to T. erythraeum IMS101 (99.4% ANI), which was isolated from the Atlantic Ocean (Prufert-Bebout et al., 1993). The hetR gene phylotyping of Trichodesmium populations in the Red Sea indicates the occurrence of an additional T. erythraeum subspecies similar to that of MAG T01-2 (Figure 3). We were, however, unable to assemble its genome or link it to a specific morphotype. It is still unclear if these two T. erythraeum subspecies within the Red Sea and the Pacific Ocean occupy distinct ecological niches.
We detected two potentially non-diazotrophic species within the Red Sea with minor read abundance in all three samples (Supplementary Table 1). Both genotypes could not be linked to a specific morphotype, and it is plausible that they either interleaved with T. thiebautii puffs or T. erythraeum tufts, or FIGURE 3 | Trichodesmium hetR phylogeny. The tree is based on the alignment of 312 nucleotide sequences. Bold text indicates the hetR sequences from metagenome-assembled genomes (MAGs; Red Sea sequences are marked in blue). Amplicon hetR sequences are shown in green. The five proposed Trichodemium clades are shown with colored boxes. Note that clades IV and V are polyphyletic and are therefore not completely resolved. MAGs R02 and T03 contain two hetR sequences, clustering within clades I and II, respectively. The average percentage identity (ANI) matrix between the hetR gene sequences can be found in Supplementary Table 4.
Frontiers in Microbiology | www.frontiersin.org 6 May 2022 | Volume 13 | Article 879970 FIGURE 4 | The hetR gene clusters in Trichodesmium sp. The synteny of the hetR genes (green) and the neighboring genes of the cluster is conserved between the Trichodesmium MAGs and Trichodesmium erythraeum IMS101. MAG R02 (Trichodesmium thiebautii) contains two hetR genes. Raw read counts, listed within the brackets, were mapped to the hetR gene cluster of MAG R02 from both the dense and long puff samples.
existed as single filaments. The low coverage reflected a previous estimate of the non-diazotrophic Trichodesmium of the Red Sea (Delmont, 2021). Whereas MAG R03 clustered within clade IV, which includes the non-diazotrophic T. nobis (95.6% ANI), and MAG R04 clustered within clade V containing the non-diazotrophic T. miru (94.6% ANI), both MAGs are on the threshold of being a distinct species within clade IV and V, respectively. We, therefore, turned to the single-marker gene hetR to further differentiate these MAGs, as hetR phylogeny captures a larger diversity of Trichodesmium than is possible through the limited number of Trichodesmium genomes currently available. The hetR sequences, but not the genomes, are available for T. pelagicum (AF490696.1), T. hildebrandtii (AF490679.1) in clade I, T. aureum (AF490680.1) in clade II, and T. contortum (AF013031.1), T. tenue (AF013033.1) in clade IV and T. miru in clade V. We show that T. nobis and T. contortum are closely related, potentially representing the same non-diazotrophic Trichodesmium species. Given that T. nobis has not been described morphologically, and the genome of T. contortum has not been sequenced, both can be linked through the phylogeny of marker genes. Whereas the previously published genome of T. nobis did not contain the hetR gene (Delmont, 2021), the clade IV MAG R03 did, and its hetR gene sequence was 99.78% similar to that of T. contortum (Figure 3; Supplementary Table 4). Complementing our findings, morphological descriptions of T. contortum depict the species as single spiral-shaped trichomes, rather than colonies, that are sporadically present in low abundance (<1% of total Trichodesmium biomass) within samples (Janson et al., 1995;Letelier and Karl, 1996;Orcutt et al., 2002). MAG R03 lacked the nitrogenase gene cluster, confirming that this clade is non-diazotrophic.
Our results indicate that hetR phylotyping can overestimate the diversity within Trichodesmium populations, in particular regarding clade II. Intriguingly, both T. thiebautii MAGs contained two distinct hetR copies (906 bp), which had 28 (MAG R02) and 29 bp (MAG T02) single nucleotide polymorphisms within a sequence. Further inspection of the two copies showed that they were encoded in close vicinity of each other and that the synteny is highly conserved across all the Trichodesmium genomes (Figure 4). Metagenomic reads of MAG T02 mapped uniformly to both hetR copies in both the dense (50.53 and 49.47% hetR reads) and long-puff samples (49.21 and 50.21% hetR reads), indicating that they likely originate from a single population DNA. The putative pseudogenes between the two hetR genes did not match any known viral sequence, isolate spacer, or metagenomic spacer within the Integrated Microbial Genome/Virus Repository (IMG/VR v3; 10 −5 cutoff value; Roux et al., 2021). Therefore, we suspect that these hetR genes are likely paralogs, although it is still unclear how T. thiebautii benefits from retaining both hetR sequences. Each hetR sequence was placed into a distinct phylogenetic clade, where one copy grouped with the hetR sequence of T. aureum (clade II), and the other one was placed within clade I, together with T. thiebautii, T. pelagicum, and T. hildebrandtii (Figure 3).
Amplicon sequencing of the hetR gene showed highly similar abundances of T. thiebautii and T. aureum during most sampling days. Some fluctuations in the relative read abundance were observed (Supplementary Figure 2). These findings can hint at the occurrence of T. aureum in the Red Sea. Yet, the presence of T. aureum-like hetR sequence as a paralog in a single population of T. thiebautii can lead to similar observations. Trichodesmium aureum has previously been described based on cloning and Sanger sequencing of the hetR gene whereas its morphology has been described poorly (Lundgren et al., 2005;Hynes et al., 2012). Therefore, past distinctions made between T. thiebautii and T. aureum may reflect the morphological diversity within the same genotype or species. We note that Frontiers in Microbiology | www.frontiersin.org the Atlantic T. thiebautii contained one hetR paralog that clustered with the hetR sequence of clade IV (T. erythraeum), yet we cannot exclude a binning artifact in this case, as the sequences did not belong to a single scaffold.

CONCLUSION AND FUTURE PERSPECTIVES
Morphotype metagenomics enabled us to better capture the diversity that is present within Trichodesmium populations, while revisiting past attempts to differentiate between colonies (Scornavacca et al., 2020;Smith and Hahn, 2021). We were able to reconstruct the genomes of four different Trichodesmium species isolated from the Red Sea, that were subsequently assigned to four different Trichodesmium clades including the more elusive non-diazotrophic clades IV and V. Single-colony, rather than bulk population analysis, may help address whether these non-diazotrophic lineages occur as single free-floating filaments, from colonies of yet unknown morphology or are included within the single T. erythraeum puff or T. thiebautii puff colonies. We further show that hetR gene-based phylotyping could overestimate Trichodesmium diversity, highlighting the importance of re-assessing past attempts at phylogeny. Other marker genes such as full-length 16S rRNA as suggested by Delmont (2021) or the single-copy cyanobacterial marker rbcL (Singh et al., 2015) may assess the phylogenetic diversity of Trichodesmium (Supplementary Figure 1), although further verification will be required. Overall, the phylogenetic and functional diversity of Trichodesmium in the world's oceans is still poorly understood to date. Thus, genomics in combination with gene expression and physiology, especially as single-colony approaches, will remain an important avenue for future exploration of these key species.

DATA AVAILABILITY STATEMENT
The four Trichodesmium genomes, the raw metagenomic sequences and the hetR amplicon sequencing data presented in this study are deposited under the BioProject ID number PRJNA804487.

AUTHOR CONTRIBUTIONS
CK, YS, and MR-B conceived the study. CK, FZ, and SW collected and extracted the metagenomic samples. Under guidance of MR-B, CK analyzed the metagenomic data. EL and SB collected samples for hetR amplicon sequencing. Under the guidance of IB-F, EL extracted and analyzed the amplicon sequencing data. The manuscript was compiled and written by CK with the help of MR-B. YS, MR-B, IB-F, and EL provided feedback and guidance during the writing process. All authors contributed to the article and approved the submitted version.