Ecogenomics and Taxonomy of Cyanobacteria Phylum

Cyanobacteria are major contributors to global biogeochemical cycles. The genetic diversity among Cyanobacteria enables them to thrive across many habitats, although only a few studies have analyzed the association of phylogenomic clades to specific environmental niches. In this study, we adopted an ecogenomics strategy with the aim to delineate ecological niche preferences of Cyanobacteria and integrate them to the genomic taxonomy of these bacteria. First, an appropriate phylogenomic framework was established using a set of genomic taxonomy signatures (including a tree based on conserved gene sequences, genome-to-genome distance, and average amino acid identity) to analyse ninety-nine publicly available cyanobacterial genomes. Next, the relative abundances of these genomes were determined throughout diverse global marine and freshwater ecosystems, using metagenomic data sets. The whole-genome-based taxonomy of the ninety-nine genomes allowed us to identify 57 (of which 28 are new genera) and 87 (of which 32 are new species) different cyanobacterial genera and species, respectively. The ecogenomic analysis allowed the distinction of three major ecological groups of Cyanobacteria (named as i. Low Temperature; ii. Low Temperature Copiotroph; and iii. High Temperature Oligotroph) that were coherently linked to the genomic taxonomy. This work establishes a new taxonomic framework for Cyanobacteria in the light of genomic taxonomy and ecogenomic approaches.


INTRODUCTION
Earth is home to nearly one trillion (10¹²) microbial species that have evolved over ∼4 billion years (Locey and Lennon, 2016). Cyanobacteria emerged ∼3 billion years ago, ushering in Earth's transition from anoxygenic to oxygenic conditions through photosynthesis (Schirrmeister et al., 2011a). Throughout their evolution, Cyanobacteria became one of the most diverse and widely distributed groups of prokaryotes, occupying many niches within terrestrial, planktonic, and benthic habitats. Their long history gave rise to broad heterogeneity, comprising unicellular and multicellular, photosynthetic and non-photosynthetic (i.e., Melainabacteria) (Schirrmeister et al., 2011a; Di Rienzi et al., 2013; Soo et al., 2014), free-living, symbiotic, toxic and predatory organisms (Soo et al., 2015), with genome sizes ranging from 1 to 10 Mb (Shih et al., 2013). Here we consider the phylum Cyanobacteria as consisting only of oxygenic phototrophs. Cyanobacteria (also known as the Cyanophyceae, Cyanophyta, cyanoprokaryota, blue-green algae or blue-green bacteria) share similar metabolic features with eukaryotic algae and have been named according to the Botanical Code (Kauff and Büdel, 2010). The inclusion of Cyanobacteria in taxonomic schemes of Bacteria was only proposed in 1978 by Stanier et al. (1978), and over time the bacterial taxonomic names have come into conflict with the botanical nomenclature (Oren, 2004; Oren and Garrity, 2014). More than two decades passed before a Note to General Consideration 5 (1999) was published for Cyanobacteria to be included under the rules of the International Committee on Systematic Bacteriology (ICSB)/International Committee on Systematics of Prokaryotes (ICSP) (Tindall, 1999; De Vos and Trüper, 2000; Labeda, 2000). Nomenclature of taxa within this group has long been a topic of discussion, but there is currently no consensus (Oren and Tindall, 2005; Oren et al., 2009; Oren and Ventura, 2017).
As a result, more than 50 genera of Cyanobacteria have been described since 2000, and many of them remain unrecognized in the List of Prokaryotic Names with Standing in Nomenclature, LPSN, http://www.bacterio.net (Parte, 2014) or in databases (e.g., NCBI).
Cyanobacteria are a challenging group for microbiologists. Their traditional taxonomy, based on morphological traits, does not reflect the results of phylogenetic analyses (Rippka et al., 1979; Boone and Castenholz, 2001; Gugger and Hoffmann, 2004; Schirrmeister et al., 2011b; Hugenholtz et al., 2016). The predominance of morphology-based classification grouped unrelated Cyanobacteria into polyphyletic species, genera, and higher taxonomic categories, which will require revision (Komárek et al., 2014). This polyphyly is indicative of the taxonomic mislabeling of many taxa. The 16S rRNA gene sequences have been useful in charting and characterizing microbial communities (Kozlov et al., 2016), but this molecule lacks sensitivity for the evolutionary changes that occur in ecological dynamics, where microbial diversity is organized by physicochemical parameters (Choudoir et al., 2012; Becraft et al., 2015). Hence, the processes that shape cyanobacterial communities over space and time are less well known. A recent study proposed that there should be 170 genera of Cyanobacteria based on 16S rRNA sequences only (Kozlov et al., 2016). Farrant et al. (2016) delineated 121 Prochlorococcus and 15 Synechococcus ecologically significant taxonomic units (ESTUs) in the global ocean using single-copy petB sequences (encoding cytochrome b6) and environmental cues.
High Throughput Sequencing (HTS) has revolutionized the practice of microbial systematics, providing an informative, reproducible, and portable tool to delineate species, reconstruct their evolutionary history, and infer ecogenomic features (Konstantinidis and Tiedje, 2005a,b; Garrity and Oren, 2012; Gribaldo and Brochier-Armanet, 2012; Shih et al., 2013; Sutcliffe et al., 2013; Hugenholtz et al., 2016). This approach allows both cultured (Al-saari et al., 2015; Appolinario et al., 2016) and uncultured microorganisms (Iverson et al., 2012; Brown et al., 2015; Hugerth et al., 2015) to be studied. The latter is especially important because cultivating Cyanobacteria in the laboratory is another hurdle in the study of this group of bacteria.
Recommendations that nomenclature should agree with and reflect genomic information were stated during the pregenomic era (Wayne et al., 1987), because nothing describes an organism better than its genome. Sequence-based methods to delimit prokaryotic species have emerged to define and improve cut-off criteria during the genomic era (Konstantinidis and Tiedje, 2005a,b; Konstantinidis et al., 2006; Goris et al., 2007; Richter and Rossello-Mora, 2009; Auch et al., 2010a; Thompson et al., 2013a,b; Varghese et al., 2015), demonstrating greater discriminatory power. Inexorable advances in methodologies will incorporate genomics into the taxonomy and systematics of the prokaryotes, boosting the credibility of taxonomy in the current post-genomic era (Chun and Rainey, 2014). To date, while several groups have been analyzed through a genome-wide view (Gupta et al., 2015; Adeolu et al., 2016; Hahnke et al., 2016; Ahn et al., 2017; Amin et al., 2017; Waite et al., 2017), many others, such as Cyanobacteria, have faced hurdles. However, a genomic taxonomy approach has successfully been applied to elucidate the taxonomic structure of two cyanobacterial genera, Prochlorococcus and Synechococcus (Thompson et al., 2013a; Coutinho et al., 2016a,b). As genomic taxonomy postulates numeric, non-subjective cut-offs for taxa delimitation, strains were considered to belong to the same species when they share at least 98.8% 16S rRNA gene sequence similarity, 95% AAI, and 70% GGD (Konstantinidis and Tiedje, 2005a; Thompson et al., 2013a,b), while species from the same genus form monophyletic branches (Yarza et al., 2008; Qin et al., 2014). This is in agreement with the concept of species as a discrete, monophyletic and genomically homogeneous population of organisms that can be discriminated from other related populations by means of diagnostic properties (Rossello-Mora and Amann, 2001; Stackenbrandt et al., 2002).
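The numeric species cut-offs stated above can be expressed as a simple decision rule. The following is a minimal sketch in Python, not the authors' implementation; the `PairwiseSignature` container and the `same_species` function are hypothetical names introduced here for illustration, and only the three thresholds come from the text.

```python
from dataclasses import dataclass

# Species-level cut-offs quoted in the text (all in percent).
SPECIES_16S = 98.8
SPECIES_AAI = 95.0
SPECIES_GGD = 70.0


@dataclass
class PairwiseSignature:
    """Genomic signatures for one pair of genomes (hypothetical container)."""
    rrna_16s_similarity: float  # % 16S rRNA gene sequence similarity
    aai: float                  # % average amino acid identity
    ggd: float                  # % genome-to-genome distance score


def same_species(pair: PairwiseSignature) -> bool:
    """Return True only if all three species-level cut-offs are met."""
    return (
        pair.rrna_16s_similarity >= SPECIES_16S
        and pair.aai >= SPECIES_AAI
        and pair.ggd >= SPECIES_GGD
    )
```

A pair with 99.5% 16S similarity, 97% AAI and 80% GGD would be placed in the same species under this rule, whereas lowering AAI to 93% would separate the two genomes. The monophyly requirement for genera is a property of the tree and is not captured by this pairwise check.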
The availability of whole-genomes opened the doors for an in-depth knowledge in microbial diversity and ecology, where the entire genomic pool may be applied to understanding the forces that govern community structure. The use of ecogenomic analysis postulates a reliable and scalable approach to delineate species and genera in order to reconstruct their evolution and to draw a global picture of possible ecological determinants (Di Rienzi et al., 2013;Soo et al., 2014;Spang et al., 2015;Thompson et al., 2015;Anantharaman et al., 2016;Garrity, 2016;Hug et al., 2016;Hugenholtz et al., 2016). Our hypothesis is that a phylogenomic framework will reflect ecologic groups found in nature.
To test this hypothesis, we first established a phylogenomic framework, using genomic signatures (i.e., a tree based on conserved gene sequences, average amino acid identity, and genome-to-genome distance), with the circumscription of species and genera. We then classified the genomes in three major groups according to their ecological traits as inferred through metagenomics and environmental metadata. Finally, we correlated the three disclosed ecogenomic groups (i. Low Temperature; ii. Low Temperature Copiotroph; and iii. High Temperature Oligotroph) with the circumscribed species and genera. We observed that the taxonomic delineation of species and genera is coherent with the ecogenomic groups.

Genome Selection
Cyanobacterial genomes publicly available in January 2016 were retrieved from the RefSeq (NCBI Reference Sequence Database), GenBank and GEBA (Genomic Encyclopedia of Bacteria and Archaea) databases. Genome completeness was assessed with CheckM, and the genomes that were at least 90% complete and assembled in <500 contigs were used for further analyses. Ninety-nine genomes were selected based on these criteria, and they are listed in Table 1 (additional information in Table S2).
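The two inclusion criteria can be sketched as a filter over assembly statistics. This is an illustrative Python sketch only; the `Assembly` record and `passes_quality_filter` function are hypothetical, and the completeness values are assumed to have been computed beforehand (e.g., by CheckM).

```python
from typing import NamedTuple


class Assembly(NamedTuple):
    """Summary statistics for one genome assembly (hypothetical record)."""
    name: str
    completeness: float  # percent completeness, computed beforehand
    contigs: int         # number of contigs in the assembly


def passes_quality_filter(assembly: Assembly,
                          min_completeness: float = 90.0,
                          max_contigs: int = 500) -> bool:
    """Keep genomes that are >= 90% complete and assembled in < 500 contigs."""
    return (assembly.completeness >= min_completeness
            and assembly.contigs < max_contigs)
```

For instance, `[a for a in assemblies if passes_quality_filter(a)]` would retain only the assemblies meeting both criteria.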
Manhattan distances were calculated from the percentage AAI values of every genome (genome-genome matrix) and were used as the input for hierarchical clustering with the hclust() function in R (R Development Core Team, 2011). This distance indicates how far from or close to each other the genomes are located. The heatmap was produced with the heatmap.2() function of the gplots package in R, with the background color of each cell mapped to the percentage AAI value.
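To make the clustering step concrete, the sketch below computes Manhattan distances between AAI profiles (rows of the genome-genome matrix) and merges genomes agglomeratively. It is a pure-Python illustration rather than the R workflow used in the study. It assumes complete linkage, which is the default of hclust(), and it stops at a requested number of clusters instead of building the full dendrogram. The function names and the toy AAI values are hypothetical.

```python
from itertools import combinations


def manhattan(u, v):
    """Manhattan (city-block) distance between two equal-length profiles."""
    return sum(abs(a - b) for a, b in zip(u, v))


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering with complete linkage.

    profiles: dict mapping genome name -> list of AAI values (one matrix row).
    Returns a list of frozensets of genome names.
    """
    clusters = [frozenset([name]) for name in profiles]

    def cluster_distance(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(manhattan(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest linkage distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Toy AAI matrix (percent); values are invented for illustration.
toy_aai = {
    "A": [100, 96, 72, 60],
    "B": [96, 100, 71, 61],
    "C": [72, 71, 100, 65],
    "D": [60, 61, 65, 100],
}
groups = complete_linkage(toy_aai, n_clusters=2)
```

With these toy values, genomes A and B merge first (distance 10), C joins them next, and D remains on its own.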

Phylogenetic Analysis
To establish the phylogenetic structure of the phylum Cyanobacteria, phylogenetic trees were constructed using the 16S rRNA gene sequences and the concatenated alignments of a set of conserved genes, most of which encode ribosomal proteins.

Ribosomal RNA Sequences
The small subunit ribosomal RNA (16S rRNA) sequences from all cyanobacterial strains for which whole-genome sequence data are publicly available (with the exceptions noted below; thus N = 97), as well as 16S rRNA gene sequences from additional available type strains (N = 14), were analyzed. The sequences were retrieved from the ARB SILVA database (Pruesse et al., 2007; Quast et al., 2013). Whenever sequences were not available, they were retrieved directly from the genomes using the RNAmmer 1.2 Server (Lagesen et al., 2007). Sequences were aligned with MUSCLE v. 3.8 (Edgar, 2004), using default settings, and Gblocks 0.91b (Castresana, 2000; Talavera and Castresana, 2007) was used for alignment curation. Using MEGA 6 (Tamura et al., 2013), best-fitting nucleotide substitution models were calculated through the MLModelTest feature. Models were ranked based on their Bayesian Information Criterion (BIC) scores as described by Tamura et al. (2013). The model with the lowest BIC score was selected and used for further phylogenetic analysis. The phylogenetic inference was obtained using the Maximum Likelihood method based on the Kimura 2-parameter model with gamma-distributed rate variation (K2+G) as the nucleotide substitution model, which was estimated from the data. Branch support for the tree topology was assessed with 1,000 bootstrap replicates. The 16S rRNA gene alignments were used to estimate the degree of genetic distance between strains through the Tajima-Nei method (Tajima and Nei, 1984).
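The model-ranking step can be illustrated with the standard definition of the Bayesian Information Criterion, BIC = −2 ln L + k ln n. The Python sketch below is not MEGA's code. It assumes that k is the number of free parameters and n the number of alignment sites, and the log-likelihoods in the example are invented placeholders rather than results from this study.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def rank_models(fits, n_sites):
    """Rank substitution models by ascending BIC.

    fits: dict mapping model name -> (log_likelihood, n_params).
    Returns a list of (model name, BIC) tuples, best model first.
    """
    scored = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
    return sorted(scored.items(), key=lambda item: item[1])


# Hypothetical fits for three models on a 1,200-site alignment.
example_fits = {
    "JC": (-5200.0, 10),
    "K2": (-5100.0, 11),
    "K2+G": (-5050.0, 12),
}
ranking = rank_models(example_fits, n_sites=1200)
best_model = ranking[0][0]
```

In this invented example the extra parameter of K2+G is outweighed by its gain in likelihood, so it receives the lowest BIC and would be the model selected.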
Gloeobacter violaceus PCC 7421 was set as the outgroup in both trees. Trees were visualized with FigTree, version 1.4.2 (Rambaut, 2015). Due to incomplete or partial sequences, Synechococcus sp. CB0101 was omitted from these analyses. Planktothrix mougeotii NIVA-CYA 405 as well as Planktothrix prolifica NIVA-CYA 540 were not included in the phylogenetic analyses because 16S rRNA sequences are not currently available for these strains (and not retrievable from their genomes).
The type strains or the type species of each taxon were included in the 16S phylogenetic tree to confirm the phylogenetic relatedness of the cyanobacterial genomes. Designations of type strain or type species were not available for Chamaesiphon minutus PCC6605, Pleurocapsa sp. PCC7319, Rivularia sp. PCC7116, Synechocystis sp. PCC7509, Trichodesmium erythraeum IMS101, Xenococcus sp. PCC7305, cyanobacterium ESFC-1, and cyanobacterium JSC-12. Geitlerinema sp. PCC7105 is the reference strain for marine species of Geitlerinema, and PCC73106 is the reference strain for Gloeocapsa (Sarma, 2012).

Conserved Marker Genes
A tree was generated using 31 conserved gene sequences previously validated as phylogenetic markers for (cyano)bacteria (Wu and Eisen, 2008; recently used by Shih et al., 2013 and Komárek et al., 2014). The sequences of these proteins were mined using the AutoMated Phylogenomic infeRence Application (AMPHORA2) tool (Wu and Scott, 2012), with default settings for the Bacteria option and a cut-off value of 1e−10. Individual alignments were performed for each of the 31 gene sets with MUSCLE v. 3.8 using default settings (Edgar, 2004). All alignments were then concatenated. Only genomes containing the complete set of conserved genes were used in the phylogenetic analysis. A Maximum Likelihood tree was constructed using RAxML v. 7 (Stamatakis, 2006) and the Dayhoff+G likelihood model. One thousand bootstrap replicates were calculated to evaluate the relative support of the branches. Trees were visualized with FigTree, version 1.4.2 (Rambaut, 2015).

TABLE 1 | Ecological and molecular features are indicated, such as sampling environment, number of contigs, genome size, GC % content, completeness score, and carboxysome type. The following classifications are detailed for comparison: NCBI (order and family), numeral identification and genera according to Kozlov et al. (2016), and identification based on the database CyanoType v.1 (Ramos et al., 2017). Type strains or type species are indicated with a superscript T at the end of the name. a, Cyanobacterial genomes used in Komárek et al. (2014).

Marine metagenomes were retrieved from the Tara Oceans project along with their associated metadata (Sunagawa et al., 2015). Sample-associated environmental data were inferred across multiple depths at the global scale of the Tara Oceans metagenomic sampling: (i) surface water layer (5 m, s.d. = 0); and (ii) subsurface layer, including deep chlorophyll maximum zone (71 m, s.d. = 41 m) and mesopelagic zone (600 m, s.d. = 220 m) (Sunagawa et al., 2015).
Eight freshwater metagenomes were retrieved for analysis from the Caatinga biome microbial community project along with their associated metadata (Lopes et al., 2016). Metagenome reads were mapped to a database containing the ninety-nine analyzed cyanobacterial genomes with Bowtie2 (Langmead and Salzberg, 2012) using the --very-sensitive-local and -a options. Abundance of genomes across samples was calculated based on the number of mapped reads as described by Iverson et al. (2012). Metagenomes were compared based on the relative abundances of the ninety-nine analyzed genomes within them using non-metric multidimensional scaling (NMDS).
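As an illustration of how mapped-read counts can be turned into relative abundances, the Python sketch below uses one common normalization: reads per genome are divided by genome length and then rescaled so that the values sum to 1 within a sample. This is an assumption made for illustration and is not necessarily the exact procedure of Iverson et al. (2012); the function name and the example counts are hypothetical.

```python
def relative_abundance(mapped_reads, genome_lengths):
    """Length-normalized relative abundance of each genome in one sample.

    mapped_reads: dict genome -> number of reads mapped in the sample.
    genome_lengths: dict genome -> genome length in base pairs.
    Returns a dict genome -> fraction; fractions sum to 1 when reads mapped.
    """
    # Normalize by genome length so larger genomes are not over-counted.
    per_base = {g: mapped_reads[g] / genome_lengths[g] for g in mapped_reads}
    total = sum(per_base.values())
    if total == 0:
        # No reads mapped to any genome in this sample.
        return {g: 0.0 for g in per_base}
    return {g: value / total for g, value in per_base.items()}


# Invented example: twice the reads on a genome twice as long.
example = relative_abundance(
    {"genome_1": 2000, "genome_2": 1000},
    {"genome_1": 2_000_000, "genome_2": 1_000_000},
)
```

In the invented example both genomes receive the same relative abundance, because the larger read count of the first genome is explained entirely by its larger size.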
Spearman correlation coefficients (R, or Spearman's rho) were calculated between the abundance of each genome and the levels of measured environmental parameters across samples. Next, a dissimilarity matrix of Manhattan distances was calculated based on the Spearman correlation values of every genome. All correlations were used in this analysis regardless of the corrected p-value, because non-significant correlations are still ecologically informative: they indicate weak associations between microorganisms and environmental parameters. Finally, this dissimilarity matrix was used as input for hierarchical clustering using the complete linkage method within the hclust() function in R. The resulting dendrogram was visually inspected to define groups (i.e., ecogenomic groups) of organisms with similar correlation patterns, which were named based on the main correlated feature.
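The correlation step can be sketched as follows. Spearman's rho is the Pearson correlation computed on ranks, with tied values receiving their average rank. This pure-Python version is an illustration rather than the code used in the study, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman_rho(abundance, parameter):
    """Spearman's rho between genome abundance and one environmental parameter."""
    if len(abundance) != len(parameter):
        raise ValueError("inputs must have one value per sample")
    return pearson(average_ranks(abundance), average_ranks(parameter))
```

Computing `spearman_rho` for one genome against every environmental parameter yields that genome's correlation profile, and the profiles of all genomes are what the Manhattan distances and complete linkage clustering described above operate on.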
The classification was reassessed by integrating the results of genomic taxonomy, phylogenomic analysis and ecogenomic signals through careful comparison.

Phylogenomic Framework Reconstruction
The tree based on conserved marker genes (Figure 1) revealed a topology with generally well-defined nodes, with bootstrap support values greater than 50% over 1,000 replicates. The phylogenomic tree (Figure 1) gave a higher resolution than the 16S rRNA phylogenetic analysis (Figure S1 and Table S1), in that strains were better discriminated in the conserved marker genes tree (e.g., Parasynechococcus group, Figure 1 and Figure S1). The species assignments were considered correct when organisms located on the same phylogenetic branch as the corresponding type strains or type species showed 16S rRNA sequence similarity higher than 98.8%, such as Crinalium epipsammum SAG22.89 T (Figure S1) and Crinalium epipsammum PCC9333 (Figure S1 and Figure 1).

Genomic Diversity of Cyanobacteria
In total, we found 57 branches corresponding to genera based on the AAI and GGD analyses (Figure 2). The genus and species cut-offs were ≥70% and ≥95% AAI similarity, respectively. Thirty-three new genera and 87 species (of which 28 are new species) were circumscribed. From a total of ninety-nine genomes used in this study, 69 were previously classified to the species level, whereas the remaining 30 had incomplete taxonomic classification (i.e., only sp. or unclassified). In total, 13 genera (from a total of 33) and 38 species (from a total of 69) were taxonomically reclassified and/or re-named. Thus, we found that 71 of all analyzed genomes required reassignment at one or more ranks to reconcile existing taxonomic classifications with our new genomic taxonomy (Figure 2 and Figure S1).

FIGURE 1 | Phylogenomic tree of the Cyanobacteria phylum with the proposed new names. Tree construction was performed using 100 genomes (ninety-nine used in this study plus the outgroup), based on a set of conserved marker genes. The numbers at the nodes indicate bootstrap values as percentages greater than 50%. Bootstrap tests were conducted with 1,000 replicates. The unit of measure for the scale bars is the number of nucleotide substitutions per site. The Gloeobacter violaceus PCC 7421 sequence was designated as outgroup. Capital letters indicate environmental source: F, freshwater; M, marine; P, peat bog (sphagnum); S, soil; T, thermal; and §, other habitat. New names are highlighted in red. Superscript T indicates type strain or type species. Ecogenomic groups are depicted in different colors as indicated in the legend: Low Temperature group; Low Temperature Copiotroph group; and High Temperature Oligotroph group. Cases depicted in the Results section are in bold.

FIGURE 2 | Heatmap displaying the AAI levels between cyanobacterial genomes. The intraspecies limit is assumed as ≥95%, whereas genera delimitation is assumed as ≥70% (dashed lines) AAI. Clustering the genomes by AAI similarity was done using a hierarchical clustering method in R (hclust), based on Manhattan distances. The AAI values are associated with the respective thermal color scale located at the bottom left corner of the figure. The proposed new genera and species names were adopted in this figure.

Frontiers in Microbiology | www.frontiersin.org
In the next section, we highlight four specific cases to exemplify cyanobacterial taxonomic issues that were resolved through our genome-driven approach (see Figure S2). These cases illustrate how the use of genomic taxonomy in Cyanobacteria provides relevant information (Data Sheet 1, Formal description of new genera and species).
Case I. Oscillatoria group. Analysis of the five genomes of Oscillatoria distinguished four genera, based on the genomic signatures (i.e., GGD, AAI, 16S, and conserved marker genes tree): (i) Oscillatoria acuminata PCC 6304

Charting Ecological Groups of Cyanobacteria
Our phylogenomic analysis was complemented by an ecological characterization of the analyzed strains, providing essential insights into relations between taxonomy, phylogeny, and ecological role (Beiko, 2015). Correlating the relative genome abundances with environmental parameters measured at Tara Oceans samples (Sunagawa et al., 2015) revealed associations between Cyanobacteria and physical, chemical and biological variables of their habitats (Figure 3). The ecogenomic analysis clustered genomes based on their profiles of correlations to environmental parameters. Three major ecogenomic groups were found: (a) Low Temperature; (b) Low Temperature Copiotroph; and (c) High Temperature Oligotroph (Figure 4 and Figure S3). Closely related species of the same genus showed tight associations with environmental parameters and were assigned to the same ecogenomic group, such as Arthrospira sesilensis C1 T and A. nitrilium PCC 8005 T , Eurycolium pastoris CCMP1986 T and E. tetisii MIT9515 T , and Pseudosynechococcus subtropicalis WH7805 T and P. pacificus WH7803 T (Figure 3). In a few cases, closely related species were assigned to different ecogenomic groups (P. agardhii NIVA-CYA-407 and P. agardhii NIVA-CYA-540 compared to other Planktothrix strains, and Lyngbya aestuarii BL-J compared to L. limosa PCC 8106 T ) (Figure 3).
Members of the Low Temperature group were characterized by positive correlations with the concentration of nitrogen and phosphorus sources; weak positive correlations with minimum generation time, silicate and depth; and by negative correlations with temperature, microbial cell abundance, oxygen availability, and salinity (Figures 3, 4). Meanwhile, members of the Low Temperature Copiotroph group were characterized by strong positive correlations with the concentration of nitrogen and phosphorus; positive correlations (stronger than those presented by Low Temperature group) with minimum generation time, silicate and depth; and by negative correlations (also stronger than those presented by Low Temperature group) with temperature, microbial cell abundance (in particular with autotroph cell density), oxygen availability, and salinity (Figures 3, 4). Finally, members of High Temperature Oligotroph group were characterized by negative correlations with the concentration of nitrogen and phosphorus and positive correlations with temperature and autotroph cell abundance (Figures 3, 4).
As suggested by correlation analyses (Figures 4C,D), NMDS revealed the Low Temperature Copiotroph group to be more abundant in cold and eutrophic waters, while the High Temperature Oligotroph group exhibited the opposite pattern and was more abundant in warm and oligotrophic environments (Figures 4A,B). In turn, Low Temperature was more abundant at intermediate conditions between these polar opposites and was shown to be more abundant in samples with higher cell densities and NO 2 concentrations.
We also investigated the abundance of the ecogenomic groups in freshwater environments. Unfortunately, there is currently no large-scale dataset of freshwater metagenomes with associated metadata comparable to the Tara Oceans dataset. To define freshwater ecogenomic groups, we chose to extrapolate the classification obtained from the analyses of the marine dataset. In freshwater metagenomes, the Low Temperature Copiotroph group was dominant in all the analyzed samples (Figure S4A). NMDS of freshwater samples suggested that the Low Temperature group displayed a preference for higher pH and higher DOC, nitrite and total nitrogen concentrations, whereas the High Temperature Oligotroph group showed a preference for habitats with higher concentrations of POC, phosphorus, ammonia and nitrate (Figures S4B,C).
FIGURE 3 | Correlations between Cyanobacteria and environmental variables. Heatmap displays Spearman correlation scores between the abundance of cyanobacterial genomes and measured environmental parameters at Tara Oceans sampling sites. Correlations with a corrected p-value (q) < 0.05 are marked with stars. Variables were grouped through the complete linkage clustering method using Manhattan distances as input. The proposed new genera and species names were adopted in this figure.

DISCUSSION
The use of HTS technologies and environmental surveys has allowed studies that link phylogenomics and ecogenomics of Cyanobacteria. High-throughput genome sequencing technologies are causing a revolution in microbial diversity studies. Recent studies have obtained dozens of new metagenome-assembled genomes from complex environmental samples (Brown et al., 2015; Hugerth et al., 2015; Almstrand et al., 2016; Haroon et al., 2016; Pinto et al., 2016). The abundance of these genomes across different environments can now be inferred from metagenomics, including their metabolic and ecological potential. It is clear that a new system is required to allow for precise taxonomic identification of these new genomes.

WGS as the Basic Unit for Cyanobacteria Genomic Taxonomy (CGT)
Comparative genomic studies allow for identification of sequence groups with high genotypic similarity based on variation in protein-coding genes distributed across the genomes. Analyses of environmental metagenomes and microbiomes have shown that microbial communities consist of genotypic clusters of closely related organisms (Farrant et al., 2016). These groups display cohesive environmental associations and dynamics that differentiate them from other groups co-existing in the same environment. In light of new concepts, restlessness is mounting over the inability to define the microbial species itself. Evolutionary studies on closely related bacteria show rapid and highly variable gene fluxes in evolving microbial genomes, suggesting that extensive gene loss and horizontal gene transfer leading to innovation are the dominant evolutionary processes (Batut et al., 2014; Puigbò et al., 2014). CGT will address the often-observed issue that even closely related genomes show high gene content variation, which gives rise to phenotypic variation. CGT is fully adapted to the genomics era, addressing the needs of its users in microbial ecology and clinical microbiology, in a new paradigm of open access (Beiko, 2015). CGT will provide a predictive operational framework for reliable, automated and openly available identification and classification (Thompson et al., 2015).

Proposals for Cyanobacterial Taxonomy
A main gap exists and is growing each day between the formal taxonomy of Cyanobacteria and the forest of acronyms and numbers in the different databases. Indeed, the nameless operational taxonomic units (OTUs), strains, isolates and WGS sequences (Beiko, 2015;Kozlov et al., 2016) form the great majority of data in private and public databases. There is a need to re-examine the Cyanobacteria prokaryote species, taking into account all recently developed concepts, e.g., the gene flow unit, OTU, ESTU and Candidate taxonomic unit (CTU) in the context of a pragmatic genome-based taxonomic scheme. The type species or strain can be a culture, DNA or a WGS. The CGT system should maintain all of the existing information, integrating it with new data on DNA, genomes, isolates/strains, cultured and uncultured, "Candidatus" cases and reconstructed genomes from metagenomes (Brown et al., 2015;Hugerth et al., 2015). The international initiatives of GEBA are currently working on determining the WGS of all type strains of known microbial species to shorten this gap (more than eleven thousand genomes).
We strongly recommend that modern taxonomy be based on WGS. The enormous databases of single gene sequences (e.g., the 16S rRNA gene) should always be compared to the available genome-based phylogeny. Studies focusing on one specific taxon or group cannot disregard the phylogenetic analysis of the whole major taxon; taking it into account avoids carrying previously erroneous taxa into the analysis. Furthermore, the eagerness to give a new name should be reconsidered. Proposals of new taxa whose phylogenetic relationships were not firmly established are frequently found (e.g., Rajaniemi et al., 2005).

Ecogenomics and the Delineation of the Ecological Niches of Cyanobacteria
Correlation analysis allowed us to characterize how the abundance of the analyzed genomes is associated with environmental parameters at both marine and freshwater habitats. These associations shed light on ecological interactions taking place within aquatic habitats that are responsible for delineating the ecological niches of Cyanobacteria. Our results showed that taxonomic affiliation and niche occupancy are coherently linked, i.e., closely related species of the same genus often shared correlation patterns, and consequently were assigned to the same ecogenomic group.
The identification of specific features responsible for defining niche occupancy among these organisms depends on extensive experimental data focusing on both physiological and morphological features, which is outside of our scope. Nevertheless, we speculate that some features are likely playing a role in this process:
(1) Transcriptional patterns: The way in which Cyanobacteria regulate gene expression in response to changing environmental conditions is likely to play a role in defining which habitats are best suited for the growth of different species.
(2) Nutrient uptake and utilization: Throughout the aquatic environment a myriad of gradients of nutrient abundance are formed (Stocker and Seymour, 2012). The cyanobacterial capacity for uptake and utilization of limiting nutrients (e.g., P, N and Fe) is associated with their ecological niche occupancy (Thompson et al., 2013a; Coutinho et al., 2016b; Farrant et al., 2016). Considering that significant associations were detected between the abundance of the analyzed genomes and the nutrient sources (phosphorus and nitrogen), we assume that the diversity and efficiency of their nutrient transporters play a major role in defining the cyanobacterial affiliation to the proposed ecogenomic groups.
(3) Photosynthetic machinery and efficiency: Cyanobacteria are remarkably diverse in their photosynthetic physiology. Species differ with regard to their preferred light intensities and wavelengths, which affects their photosynthetic efficiency (Moore et al., 1998; Ting et al., 2002). They can also be differentiated by their carboxysomes, sub-cellular structures where carbon fixation takes place (Yeates et al., 2008). To our knowledge, no study has consistently compared the photosynthetic yields of all the strains analyzed here; therefore, we cannot determine whether the proposed ecogenomic groups differ regarding this parameter. Nevertheless, distinctions regarding their requirements for efficient photosynthesis are likely linked to their patterns of niche occupancy.

Ecogenomics, Global Changes, and Cyanobacterial Communities
Over the past two centuries, human development has affected aquatic ecosystems through nutrient over-enrichment (eutrophication), hydrologic alterations, global warming and ocean acidification. Temperature is one of the most important factors determining the taxonomic composition of marine microbial communities (Sunagawa et al., 2015). Our data show that temperature is central for regulating the composition and functioning of cyanobacterial communities. Global warming can affect growth rates and bloom potentials of many taxa within this phylum (Paerl and Huisman, 2008; Flombaum et al., 2013; Pittera et al., 2014). Niche-based models predict an increase in the absolute levels of organisms formerly classified as Prochlorococcus and Synechococcus due to global warming (Flombaum et al., 2013). Consequently, the functioning of the biogeochemical cycles in which these organisms are involved will also be affected. Nevertheless, much less is known regarding how global warming could affect communities of Cyanobacteria aside from these two groups of organisms. The ecogenomic groups identified and their associations with environmental parameters shed light on the potential changes that communities of Cyanobacteria will undergo following global climate changes. Our results indicate that an increase in temperature will lead to decreases in the relative abundances of the Low Temperature and Low Temperature Copiotroph groups, while that of the High Temperature Oligotroph group increases, especially for the species Eurycolium neptunis, E. ponticus, E. chisholmi, and E. nereus. One major impact of this alteration is a possible effect on the degree of nitrogen fixation mediated by Cyanobacteria, as none of the species assigned to the High Temperature Oligotroph group are known to fix nitrogen (Latysheva et al., 2012).
In fact, our data show that higher temperatures are associated with lower relative abundances of nitrogen-fixing Cyanobacteria of the genera Trichodesmium and Anabaena (Zehr, 2011). Both beneficial and deleterious effects of ocean warming and associated phenomena (e.g., acidification) on the rates of growth and N₂ fixation have been reported (Shi et al., 2012; Fu et al., 2014), and recent laboratory and field experiments (Hong et al., 2017) showed that acidification inhibits growth and N₂ fixation in T. erythraeum IMS101 T due to a decrease in cytosolic pH and the resulting biochemical cost of proton pumping across membranes. Rising temperatures might shift cyanobacterial community composition toward a state where diazotrophs are relatively less abundant. Because nitrogen is often a limiting nutrient for marine primary productivity (Tyrrell, 1999; Moore et al., 2013), alterations in the oceanic levels of nitrogen fixation could affect not only non-diazotrophic Cyanobacteria but also heterotrophic microbes and the higher trophic levels that are sustained by microorganisms.
Furthermore, our findings suggest that changes in temperature can affect the contributions of Cyanobacteria to the global carbon pump (Flombaum et al., 2013; Biller et al., 2015). For example, the five strongest positive correlations with temperature within the High Temperature Oligotroph group involve the high-light adapted members of the Eurycolium genus (i.e., strains MIT9312 T, MIT9301 T, MIT9215, MIT9202 T, and AS9601 T). These strains display lower photosynthetic efficiency than their low-light adapted counterparts (Moore et al., 1998; Moore and Chisholm, 1999). Our results suggest that the relative abundance of high-light adapted strains would increase with rising temperatures. In turn, these changes could affect the efficiency of carbon fixation in the ocean, a change that could also be influenced by the alterations in nitrogen fixation mentioned above.
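As an illustration of how genomes can be ranked by the strength of their association with temperature, the following minimal sketch computes a rank correlation between temperature and relative abundance for each genome and reports the strongest positive associations. It is not the procedure used in this study: the choice of Spearman correlation, the function names, the genome labels, and all numerical values are assumptions made for the example.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def strongest_positive(temperature, abundance_by_genome, top_n=5):
    """Return the top_n (genome, rho) pairs, sorted by decreasing rho."""
    scored = [(genome, spearman(temperature, abundance))
              for genome, abundance in abundance_by_genome.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical data: one temperature (degrees C) per metagenomic sample and
# made-up relative abundances for three placeholder genomes.
temperature = [4.0, 10.0, 16.0, 22.0, 28.0]
abundance = {
    "genome_A": [0.1, 0.2, 0.4, 0.7, 0.9],
    "genome_B": [0.8, 0.6, 0.5, 0.2, 0.1],
    "genome_C": [0.3, 0.1, 0.4, 0.2, 0.5],
}

top = strongest_positive(temperature, abundance, top_n=2)
for genome, rho in top:
    print(f"{genome}\t{rho:.2f}")
```

With these made-up values, the genome whose abundance rises monotonically with temperature ranks first and the one that declines monotonically is excluded. A rank-based statistic is used in the sketch because relative abundances are rarely normally distributed, but any other measure of association could be substituted.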

CONCLUSIONS
The present study is a first attempt at integrating taxonomy and ecogenomics, offering a new perspective for the development of Cyanobacteria studies.
Our results show that closely related genomes often share a niche and can be assigned to the same ecogenomic group. End-users of Cyanobacteria taxonomy may benefit from a more reproducible and portable taxonomic scheme. Future studies are needed to expand the evolutionary and physiological basis of cyanobacterial niche occupancy, integrating other important ecological variables such as phage susceptibility, light utilization strategies, horizontal gene transfer, and inter-species interactions.

AUTHOR CONTRIBUTIONS
All authors contributed to the writing of the manuscript. JW, FC, BD, JS, FT, and CT designed and planned the study. JW and FC performed the bioinformatics analyses, analyzed the results, and compiled the data. All authors approved the final version of the manuscript.

FUNDING
This work was supported by the National Council for Scientific and Technological Development (CNPq), Coordination for the Improvement of Higher Education Personnel (CAPES), and Rio de Janeiro Research Foundation (FAPERJ).

ACKNOWLEDGMENTS
This paper is part of the D. Sc. requirements of JW at the Biodiversity and Evolutionary Biology Graduate Program of the Federal University of Rio de Janeiro (UFRJ), and was developed within the Science Without Borders Program (Oceanography and Environmental Impacts Coordination Program/CNPq, process no. 207751/2014-5). Sequence Data: All publicly available sequence data used in this paper were retrieved from RefSeq (https://www.ncbi.nlm.nih.gov/refseq/) and GenBank, as part of the International Nucleotide Sequence Database Collaboration, and also from the GEBA database, produced by the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/) in collaboration with the user community.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2017.02132/full#supplementary-material
Figure S1 | Ribosomal phylogenetic reconstruction of the Cyanobacteria phylum. The tree was constructed by maximum likelihood (ML) using the Kimura 2-parameter method and the GTR+G substitution model. The tree was inferred from 110 16S rRNA gene sequences (∼1,400 bp). The species cut-off was 98.8% similarity (Thompson et al., 2015). The percentages of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) are shown next to the branches. Nodes supported with a bootstrap of ≥ 50% are indicated. Superscript T indicates the type strain or type species of validly published species, included to assess their correct phylogenetic assignments. Bold names indicate the additional type strains or type species (only for the 16S tree). The unit of measure for the scale bars is the number of nucleotide substitutions per site. Coleofasciculus chthonoplastes PCC 7420 is also called Microcoleus chthonoplastes PCC 7420. The Gloeobacter violaceus PCC 7421 sequence was designated as the outgroup.
Table S1 | Estimates of genome relatedness of cyanobacterium strains. Values in the matrix indicate the intergenomic distances (i.e., evolutionary divergence between sequences). The numbers of base substitutions per site between sequences are shown. Analyses were conducted according to the Tamura et al. method. The analysis involved 110 nucleotide sequences. All positions containing gaps and missing data were eliminated. There were a total of 759 positions in the final dataset. Evolutionary analyses were conducted in MEGA6.
Table S2 | Details of all cyanobacterial genomes included in this study. Other classifications are given for comparison: NCBI (order and family), numerical identification and genera according to Kozlov et al. (2016), and identification based on the curated database CyanoType v.1 (Ramos et al., 2017). Superscript T indicates type strain or type species.
Data Sheet 1 | Formal description of new genera and species.