Genomic Features for Desiccation Tolerance and Sugar Biosynthesis in the Extremophile Gloeocapsopsis sp. UTEX B3054

To tolerate extreme desiccation, cyanobacteria are known to produce both intracellular compatible solutes and copious amounts of exopolysaccharides that act as a protective coat. However, these molecules make cyanobacterial cells refractory to a broad spectrum of cell disruption methods, hindering genome sequencing and molecular studies. As a result, few genomes are available from cyanobacteria from extremely desiccated environments such as deserts. In this work, we report the 5.4 Mbp draft genome (100% completeness in 105 contigs) of Gloeocapsopsis sp. UTEX B3054 (subsection I; Order Chroococcales), a cultivable, sugar-rich, and hard-to-break hypolithic cyanobacterium from the Atacama Desert. Our in silico analyses focused on genomic features related to sugar biosynthesis and adaptation to dryness. Among other findings, screening of the Gloeocapsopsis genome revealed a unique genetic potential related to the biosynthesis and regulation of compatible solutes and polysaccharides. For instance, our findings showed for the first time a novel genomic arrangement, exclusive to Chroococcaceae cyanobacteria, associated with the recycling of trehalose, a compatible solute involved in desiccation tolerance. Additionally, we performed a comparative genome survey to predict the entire, highly diverse pool of glycosyltransferase enzymes, key players in polysaccharide biosynthesis and in the formation of a protective coat against dryness. We expect that this work will set the fundamental genomic framework for further research on microbial tolerance to desiccation and to a wide range of other extreme environmental conditions. The study of microorganisms like Gloeocapsopsis sp. UTEX B3054 will help expand our limited understanding of water optimization and of the molecular mechanisms that allow extremophiles to thrive in xeric environments such as the Atacama Desert.

INTRODUCTION
The Atacama Desert is the driest warm desert on Earth (Houston and Hartley, 2003; Hartley et al., 2005). Located in Northern Chile, for many years it was thought to be a sterile territory, unable to give shelter to any kind of living organism (McKay et al., 2003; Navarro-González et al., 2003). We have recently learned that in the Atacama Desert the occasional water inputs coming from the coastal fog and dew sustain the scarce microbial life thriving under its characteristic extreme environmental conditions (Houston and Hartley, 2003; McKay et al., 2003).
Microbial life has developed a variety of physical and molecular strategies to overcome the high solar radiation and temperatures, as well as to maximize the efficient use of the small amount of water available. Indeed, most microbial life in deserts is associated with rocks, developing either within or underneath them (Chan et al., 2012; Pointing and Belnap, 2012; Cowan et al., 2014; Davila et al., 2015; Wierzchos et al., 2015). These microbial communities are dominated by primary producers, represented mainly by morphologically and metabolically diverse cyanobacteria (Pointing et al., 2009; Wong et al., 2010; Wierzchos et al., 2015; Wei et al., 2016). Most of these desert cyanobacteria produce copious extracellular structures, a feature that is thought to constitute both the architectural and metabolic basis for the microbial community and its tolerance to extreme environmental conditions (Knowles and Castenholz, 2008; Colica et al., 2014; Rossi and De Philippis, 2015).
Specifically, cyanobacteria organized in packet-like structures, such as Chroococcidiopsis and Gloeocapsa, dominate lithic communities found in warm deserts (Warren-Rhodes et al., 2006; Bahl et al., 2011; Chan et al., 2012; Wierzchos et al., 2015; Crits-Christoph et al., 2016). Although these microorganisms have been identified by microscopy and by 16S-rDNA surveys, significant barriers to progress in the study of cyanobacteria in general have been the difficulty of obtaining axenic cultures and the presence of copious exopolysaccharide (EPS), which hinders sequencing of their genomes (Tillett and Neilan, 2000; Chrismas et al., 2016). To illustrate the latter, only 1,110 of a total of 76,299 genomes available in the Integrated Microbial Genomes and Microbiomes (IMG/JGI) database correspond to cyanobacteria. The marine genus Prochlorococcus alone accounts for 55.85% of the cyanobacterial genomes already available. Moreover, merely three genomes correspond to cyanobacteria isolated from desert environments, all of which are filamentous.
An increasing body of knowledge supports the fundamental role that compatible solutes (Hershkovitz et al., 1991; Hill et al., 1994, 1997; Sakamoto et al., 2009; Yoshida and Sakamoto, 2009; Klähn and Hagemann, 2011) and EPS (Grilli Caiola et al., 1993; Hill et al., 1997; Tamaru et al., 2005; Knowles and Castenholz, 2008; Mager and Thomas, 2011) might play in desiccation tolerance in cyanobacteria. However, comprehensive genomic analyses of the mechanisms for tolerating extreme desiccation in unicellular cyanobacteria are still missing. We therefore decided to sequence and study the genome of Gloeocapsopsis sp. UTEX B3054, a unicellular cyanobacterium that we demonstrate belongs to the Chroococcaceae family, which has few known cultivable and sequenced representatives. This strain was obtained by cell sorting from the Gloeocapsopsis sp. AAB1 culture, an enrichment initially collected from a quartz rock in the Atacama Desert and described as extremely tolerant to desiccation (Azúa-Bustos et al., 2014). Besides improving the genome coverage of this family of cyanobacteria, our study aimed to predict and analyze the genomic mechanisms likely associated with the desiccation tolerance of Gloeocapsopsis sp. UTEX B3054. In particular, we focused on identifying the genetic potential and genomic mechanisms likely involved in the biosynthesis of compatible solutes and EPS, molecules that play a key role in microbial tolerance to dryness.

MATERIALS AND METHODS
Strain Isolation and DNA Extraction
The strain used in this study, Gloeocapsopsis sp. UTEX B3054, was obtained from the non-axenic Gloeocapsopsis sp. AAB1 enrichment culture initially isolated from the Atacama Desert (Azúa-Bustos et al., 2014). To eliminate the bulk of the contaminant heterotrophic bacteria, a single cyanobacterial cell was sorted into BG11 medium using an Influx Mariner Cell Sorter (Cytopeia, Seattle, WA, United States). Chlorophyll-containing cells were detected based on red fluorescence (692-40 nm; fluorescence filters are specified here by the wavelength of maximum transmission and spectral width of bandpass) excited with a 488 nm laser, while triggering was based on side light scatter (SSC) to allow the exclusion of non-fluorescent cells. The clonal Gloeocapsopsis sp. UTEX B3054 culture was deposited in the UTEX Culture Collection of Algae under the accession code UTEX B3054 and is publicly available.
Several further treatments were implemented to mechanically, chemically, and enzymatically destabilize cyanobacterial aggregates, to selectively eliminate sugar molecules, and to eliminate persistent accompanying heterotrophic bacteria in culture. The first treatment was applied to healthy cultures in mid-active growth and was based on a previously reported protocol for sugar-rich cells (Sharma et al., 2002), with several modifications detailed in Supplementary Figure S1. To ensure degradation of remaining non-cyanobacterial DNA, the pellet of cyanobacterial cells obtained after the first treatment was resuspended in 100 µl of sterile water containing four units of DNase I (Invitrogen) and incubated at 37 °C for 1 h. The enzymatic reaction was stopped at 65 °C using the DNase Stop solution (Invitrogen), and the pellet of cyanobacterial cells was washed three times with sterile water. The last treatment aimed at breaking up the cyanobacterial cells and finally extracting cyanobacterial DNA. It was based on the protocol described by Tillett and Neilan (2000), slightly modified by adding a mechanical cell disruption step (glass beads and bead beater) prior to DNA extraction.

Genome Sequencing, Assembly, and Annotation
Gloeocapsopsis sp. UTEX B3054 DNA was sequenced on an Illumina MiSeq. The data were analyzed using the Illumina pipeline 1.4.0 to generate fastq files. Barcodes were removed from the raw sequences, quality was checked with FastQC (Andrews, 2010), and reads were processed with Trimmomatic (Bolger et al., 2014). The trimmed reads were de novo assembled into 25,153 contigs with a total length of 18,225,500 bp using SPAdes v.3.9.0 (Bankevich et al., 2012). Contigs below 3,000 bp were discarded from the final pool of sequences because their closest nucleotide identity matches were non-cyanobacterial. A second de novo assembly was then performed using only the trimmed reads corresponding to contigs >3,000 bp. In this assembly, all contigs with low sequencing depth matched DNA sequences from heterotrophic bacteria and were therefore eliminated. CheckM v1.0.5 was used to calculate the completeness and quality (contamination grade) of the resulting genome (Parks et al., 2015). The closest available genomes were analyzed by Tetra Correlation Search (TCS) and by average nucleotide identity (ANI) using the JSpecies tool (Richter et al., 2015).
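To illustrate the contig length filter described above, together with the N50 statistic used later to summarize the assembly, a minimal Python sketch follows. The function names and the toy contigs are hypothetical and are not part of the published pipeline. The text does not state whether contigs of exactly 3,000 bp were kept; the sketch assumes they were.

```python
def filter_contigs(contigs, min_length=3000):
    """Keep contigs whose length is at least min_length (assumed inclusive)."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}


def n50(lengths):
    """Return the N50: the contig length at which the cumulative sum of
    lengths, taken from longest to shortest, first reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Toy example with made-up contigs (not data from this study).
toy_contigs = {
    "contig_1": "A" * 9000,
    "contig_2": "C" * 5000,
    "contig_3": "G" * 3000,
    "contig_4": "T" * 1200,  # below the cutoff, will be discarded
}
kept = filter_contigs(toy_contigs)
print(sorted(kept))
print(n50([len(seq) for seq in kept.values()]))
```

In practice the sequences would be read from the assembler's FASTA output rather than built in memory.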
Automatic genome annotation was carried out using PROKKA v.1.11 (Seemann, 2014), and the genome was also submitted to JGI IMG/ER for annotation (GOLD Analysis Project ID: Ga0181813). Genome sequence and annotation data are available at the JGI IMG/ER database (GOLD Taxon ID: 2756170284; Sequencing Project: Gp0208497 and Analysis Project ID: Ga0181813). Further manual annotation was performed for a limited set of genes of interest, namely those involved in the biosynthesis of the compatible solutes sucrose and trehalose, and envelope-related genes. The search for these genes used the automatic annotation results as queries, complemented by BLASTP searches (Altschul et al., 1997) for sequence homology using amino acid sequences of characterized bacterial orthologs of the genes of interest (thresholds: e-value < 1e-10 and bit score > 30). In addition, all putative orthologous genes found in the Gloeocapsopsis genome were analyzed with InterProScan to corroborate the presence of specific functional domains. Glycosyltransferase-encoding genes were classified into sequence-based families following the classification available at the CAZy (Carbohydrate-Active enZymes) website (Coutinho et al., 2003). Finally, all putative genes of interest found in Gloeocapsopsis sp. UTEX B3054 were used as queries for a bidirectional BLASTP search against related cyanobacterial genomes (thresholds: e-value < 1e-15 and bit score > 100). For genes putatively associated with trehalose biosynthesis and transport, automatic annotation identified several putative orthologs, but with low amino acid sequence identity (<35%). Thus, to confirm protein homology, an additional three-dimensional protein structure prediction was performed on the Phyre2 web server (Kelley et al., 2015), aiming to infer structural homology of the predicted proteins where certain functional domains were absent and/or sequence identity was below 35%.

Genome Analysis
The phylogenomic species tree was generated from a concatenation of the amino acid sequences of 43 cyanobacterial marker genes (Parks et al., 2015). These sequences were extracted from the Gloeocapsopsis sp. UTEX B3054 genome and 74 other cyanobacterial genomes available in the NCBI database by using CheckM v1.0.5 (Parks et al., 2015). For the sugABC tree, sequences were also retrieved from the NCBI database and corresponded to 33 different genomes, including Gloeocapsopsis sp. UTEX B3054. Sequences were aligned using MUSCLE v6.0.98 (Edgar, 2004). Maximum-likelihood trees were generated with IQ-TREE v.1.5.5, with non-parametric ultrafast bootstrap support of 10,000 replicates (Hoang et al., 2017). For the multilocus tree as well as for the TreH and SugABC trees, the best evolutionary substitution models were selected from the sequence alignments with the ModelFinder option included in IQ-TREE.
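The concatenation step behind the multilocus tree can be sketched as follows. This is an illustration under stated assumptions, not the CheckM implementation: each marker is assumed to be already aligned, and a genome lacking a marker is assumed to be padded with gap characters. The function name and the toy alignments are hypothetical.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-marker alignments into one supermatrix.

    alignments: list of dicts mapping taxon name -> aligned sequence,
                all sequences within one dict having the same length.
    taxa:       taxon names to include, in output order.
    A taxon missing from a marker is filled with '-' for that marker.
    """
    parts = {taxon: [] for taxon in taxa}
    for marker in alignments:
        widths = {len(seq) for seq in marker.values()}
        if len(widths) != 1:
            raise ValueError("sequences within a marker must have equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(marker.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with two made-up markers and two made-up taxa.
marker_1 = {"taxon_A": "MK-L", "taxon_B": "MKAL"}
marker_2 = {"taxon_A": "GG"}  # taxon_B lacks this marker
supermatrix = concatenate_alignments([marker_1, marker_2], ["taxon_A", "taxon_B"])
print(supermatrix)
```

The resulting supermatrix is what a maximum-likelihood program would take as input, with every row having the same total length.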
Analyses of potential horizontal gene transfer were performed using the HGTector software (Zhu et al., 2014), considering 50% amino acid identity and 80% sequence coverage for BLASTP results. General secondary metabolite biosynthetic pathways were identified using the antiSMASH 4.2.0 web server (Blin et al., 2017). The search included ClusterFinder and whole-genome PFAM analysis.

RESULTS
Obtaining a Purified and Non-degraded Cyanobacterial DNA Suitable for Genome Sequencing
To retrieve high-quality DNA (260/280 ratio > 1.8) amenable to genome sequencing from the sugar-rich Gloeocapsopsis, it was first necessary to eliminate the bulk of the heterotrophic bacteria and obtain a clonal culture, which was achieved by single-cell sorting coupled to flow cytometry. However, even after subsequent streaking and plating on agar, the culture remained non-axenic (not shown). Thereafter, a three-step procedure was developed, which aimed at (1) eliminating heterotrophic bacteria and destabilizing cyanobacterial aggregates; (2) eliminating remaining heterotrophic bacterial sequences; and (3) finally breaking up the cyanobacterial cells to extract pure cyanobacterial DNA (Supplementary Figure S2). The final DNA extraction step effectively broke the cyanobacterial cells and selectively eliminated remnant sugar molecules. This protocol yielded a sample of non-degraded DNA with an evident dominance of cyanobacterial DNA, suitable for sequencing (Supplementary Figures S2B,C).

Genome Features and Phylogeny
After DNA sequencing, bioinformatic filtering steps discarded the remaining non-cyanobacterial sequences (reflected in two discrete peaks of G+C content, at 43 and 68%), allowing us to differentially assemble the cyanobacterial genome with 100% completeness and 1.4% sequence contamination, as detected by CheckM. The resulting genome comprises 5,478,916 bp in 105 contigs ranging from 3,290 to 356,189 bp. The N50 is 90,437 bp, with a final genome coverage of 138.33X. The G+C content is 42.41%, represented in a single peak. Genome size and G+C content are similar to those of Gloeocapsa sp. PCC7428 (5.9 Mb and 42.45%, respectively), the closest sequenced cyanobacterium to Gloeocapsopsis sp. UTEX B3054. Automatic genome annotation with PROKKA resulted in a total of 5,165 protein-coding sequences (CDS), 40 tRNAs, 3 rRNAs, and 3 CRISPR arrays. Among the CDS, 1,767 sequences correspond to hypothetical proteins. General features of the Gloeocapsopsis genome in comparison to other fully sequenced cyanobacterial genomes are shown in Table 1.
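The two G+C peaks reported above show how base composition can separate contigs of different origin. The following Python sketch computes G+C content per contig and splits contigs around a cutoff. It is only an illustration: the 55% cutoff is a hypothetical value chosen because it lies between the two reported peaks (43 and 68%), and the filtering in this study relied on sequence identity and sequencing depth rather than on a function like this one. Names and toy sequences are made up.

```python
def gc_percent(sequence):
    """G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


def split_by_gc(contigs, cutoff=55.0):
    """Split contigs into (below cutoff, at or above cutoff) by G+C content.
    The default cutoff is an assumption for illustration only."""
    low, high = {}, {}
    for name, seq in contigs.items():
        target = low if gc_percent(seq) < cutoff else high
        target[name] = seq
    return low, high


# Toy example with made-up sequences.
toy = {"low_gc_contig": "ATATGC", "high_gc_contig": "GCGCGA"}
low_gc, high_gc = split_by_gc(toy)
print(sorted(low_gc), sorted(high_gc))
```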
Phylogenomic analysis indicated that Gloeocapsopsis sp. UTEX B3054 formed a distinct clade with the unicellular cyanobacteria Gloeocapsa sp. PCC7428 and Chroogloeocystis siderophila NIES-1031 (100% node support and ∼82% ANI) (Figure 1, in the gray box). The Chroococcaceae family (Order Chroococcales), represented here by Gloeocapsopsis sp. UTEX B3054, is well separated from, but basally related to, the also unicellular Chroococcidiopsis thermalis PCC7203 (97% node support and 70.40% ANI) (Figure 1). Our phylogenetic tree confirms the multicellular origin of these unicellular cyanobacteria.
Analysis with the HGTector software indicated that about 200 CDS were potentially horizontally transferred, with the main donor bacterial orders being Bacillales (15%), Ktedonobacterales (7%), Myxococcales (7%), and Rhizobiales (6%). Of these genes, 42% have unknown functions, 35% are related to known metabolic pathways, and 16.5% are involved in information storage and processing. Genes related to the metabolism and transport of secondary metabolites, lipids, and saccharides are the most recurrent (Supplementary Table S1). Furthermore, 38 clusters encoding putative pathways of secondary metabolites were identified by the ClusterFinder method. Of these, 13 gene clusters might be associated with saccharide biosynthesis. Of particular interest are eight gene clusters found with the antiSMASH software, which encode presumed biosynthetic pathways for bacteriocin (1), terpenes (1), aminoglycoside-aminocyclitol (1), polyketide synthase (PKS) (1), non-ribosomal peptide synthesis (NRPS) (1), and NRPS-PKS hybrids (3). Notably, one of the latter is particularly extended (72.8 kbp), comprising almost 50 genes (Contig00007; location 61,046-133,879 nt). No complete homolog of this gene cluster was found in the IMG/JGI database, supporting its uniqueness. Overall, these eight gene clusters represent 4.92% (269,747 bp) of the whole genome of Gloeocapsopsis sp. UTEX B3054.
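The cluster length and the genome fraction quoted above can be reproduced from the reported coordinates and sizes. The short Python check below uses only numbers given in the text; the variable names are arbitrary, and treating the coordinates as inclusive is an assumption.

```python
# Values reported in the text.
GENOME_SIZE_BP = 5_478_916
CLUSTER_START_NT = 61_046      # longest NRPS-PKS hybrid cluster, Contig00007
CLUSTER_END_NT = 133_879
EIGHT_CLUSTERS_BP = 269_747    # combined length of the eight antiSMASH clusters

# Span of the longest cluster, assuming inclusive coordinates.
cluster_span_bp = CLUSTER_END_NT - CLUSTER_START_NT + 1
cluster_span_kbp = cluster_span_bp / 1000

# Share of the genome occupied by the eight clusters.
genome_fraction_percent = 100 * EIGHT_CLUSTERS_BP / GENOME_SIZE_BP

print(f"{cluster_span_kbp:.1f} kbp")
print(f"{genome_fraction_percent:.2f}%")
```

Both results agree with the values stated in the text (72.8 kbp and 4.92%).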

Compatible Solutes: Sucrose and Trehalose Biosynthesis
Two genes involved in the biosynthesis of sucrose were found in the Gloeocapsopsis sp. UTEX B3054 genome. The 2,420-bp gene of the sucrose-6-phosphate synthase enzyme (gene locus ID: BWI75_00738) was located in a genomic region devoid of other genes related to sugar metabolism. Its predicted amino acid sequence possesses the two characteristic enzymatic domains of sucrose synthase enzymes (InterProScan entries: IPR000368 and IPR001296), indicating that this protein belongs to family 1 of the glycosyltransferases (GTs). In contrast, the 806-bp gene of the sucrose-6-phosphate phosphatase enzyme (gene locus ID: BWI75_01845) was located upstream of a gene cluster likely associated with sugar metabolism (gene locus IDs: BWI75_01850 to BWI75_01856) containing three glycosyltransferase genes in tandem. Its predicted amino acid sequence contains a sucrose phosphatase domain (InterProScan entry: IPR006380) and a haloacid dehalogenase (HAD)-like domain (InterProScan entry: IPR023214). In turn, two putative genes encoding the trehalose synthase protein TreS were identified (gene locus IDs: BWI75_02890 and BWI75_05160), sharing more than 50% amino acid identity with a TreS of the desiccation-tolerant cyanobacterium Leptolyngbya ohadii (Murik et al., 2017). With 100% confidence, three-dimensional predictions by the structure modeler Phyre2 indicated that BWI75_02890 and BWI75_05160 share 36 and 40% sequence identity, respectively, with the trehalose synthase protein from Mycobacterium smegmatis, one of the best-characterized bacterial trehalose synthase enzymes (Ruhal et al., 2013). However, no homologs of treZ (maltooligosyl-trehalose trehalohydrolase) or treY (maltooligosyl-trehalose synthase) were identified in the Gloeocapsopsis sp. UTEX B3054 genome.
An ortholog of the Anabaena sp. PCC7120 trehalase (treH) gene was also identified in the Gloeocapsopsis genome (gene locus ID: BWI75_03170), sharing 63.87% amino acid identity (alignment parameters: e-value 0.0; score 584). With 100% confidence, three-dimensional predictions by the structure modeler Phyre2 indicated that BWI75_03170 shares 37% sequence identity with Nth1, the enzyme characterized as a neutral trehalase from the yeast Saccharomyces cerevisiae. The phylogenetic reconstruction showed an early divergence of trehalase proteins from the Chroococcaceae family in comparison with related cyanobacteria like Nostocales (Figure 2). Interestingly, only in the Chroococcaceae family is the treH gene inserted in a conserved gene cluster comprising an ABC transporter system, sugABC (gene locus IDs: BWI75_03173, BWI75_03172, and BWI75_03171, respectively), and a sugar-binding protein located upstream of treH (gene locus ID: BWI75_03174), as well as a homoserine kinase (gene locus ID: BWI75_03175) (Figure 3).

Envelope Related Genes: EPS Biosynthesis and Export
A genomic screening was conducted to identify all the putative glycosyltransferase (GT) proteins encoded by Gloeocapsopsis sp. UTEX B3054 and related cyanobacteria. A thorough search revealed that Gloeocapsopsis sp. UTEX B3054 possesses at least 129 genes encoding GTs (Table 1), a number similar to that of its closest evolutionary relative, Gloeocapsa sp. PCC7428. C. thermalis PCC7203, Gloeocapsa sp. PCC7428 and Gloeocapsopsis sp. UTEX B3054 possess the highest GTs/CDS ratio [...] GT2 families (Supplementary Tables S2-S6). Many GT genes were found inserted within genomic clusters associated with sugar metabolism, such as in contig BWI75_000047 (Figure 4), which contains 10 GTs and additional genes associated with sugar metabolism, whose orthologs in Anabaena sp. PCC7120 are up-regulated under desiccation (Yoshimura et al., 2007).
FIGURE 3 | Organization of the sugA-sugB-sugC locus in cyanobacteria. The maximum-likelihood tree was generated with IQ-TREE based on the evolutionary substitution model LG+I+G4. Only cyanobacteria from the Chroococcaceae family have treH flanking the 3′ end of the sugA-sugB-sugC genes. Note also that in this family a homoserine kinase and an ABC transport binding protein are located at the 5′ end of this gene cluster.
The final step in extracellular polysaccharide biosynthesis comprises the assembly and translocation of the polysaccharide chains to the extracellular space. Three main pathways may carry out this step: the Wzy-dependent, the ABC transporter-dependent, and the synthase-dependent pathways (Pereira et al., 2009, 2015; Kehr and Dittmann, 2015). In Gloeocapsopsis sp. UTEX B3054, orthologs of genes associated with all three pathways for polysaccharide export were found (Table 2). In total, 223 genes representing 6.89% of the Gloeocapsopsis sp. UTEX B3054 genome are included in the cluster of orthologous genes (COG) category "Cell wall/membrane/envelope biogenesis" (not shown).
FIGURE 4 | Gene cluster likely associated with desiccation tolerance mechanisms. This gene cluster contains 10 genes encoding glycosyltransferases whose orthologs in Anabaena were shown to be overexpressed under desiccation conditions (Yoshimura et al., 2007). Gene locus: BWI75_04273 to BWI75_04293.

DISCUSSION
As with many cultivable cyanobacteria, obtaining an axenic Gloeocapsopsis culture turned out to be one of the major challenges. Initial attempts in our laboratory to sequence the Gloeocapsopsis genome confirmed the persistence of heterotrophic bacteria in the culture, which prevented effective cyanobacterial genome sequencing. In a first effort to sequence the AAB1 enrichment using 454 pyrosequencing technology, of a total of 19.8 Mbp obtained, a striking 95.5% corresponded to heterotrophic bacteria belonging to 12 different genera. In the present study, single-cell sorting allowed us to overcome this situation, recovering a clonal culture that was less contaminated, at least to a workable degree.
All the technical efforts required in this study to break down the extracellular polysaccharide structure to which heterotrophic bacteria are attached reinforced the notion that the Gloeocapsopsis sp. UTEX B3054 envelope is sugar-rich and impressively hard to break: cells were refractory to mechanical, physical, chemical, and biological disruption methods. Therefore, in molecular studies aiming to unveil the microbial diversity and metabolic properties of extremophile microbes, the DNA extraction procedure should be a major concern (Lever et al., 2015).
The implementation of a three-step procedure allowed us to differentially extract cyanobacterial DNA and obtain high-quality material for sequencing. This final protocol avoided the co-precipitation of sugars and contaminant DNA with the desired cyanobacterial DNA, which were likely interfering with DNA amplification by PCR, cloning efficiency, and further genome sequencing and assembly (Angeloni and Potts, 1987; Billi et al., 1998; Fiore et al., 2000; Tillett and Neilan, 2000; Chrismas et al., 2016). We anticipate that this protocol might be adapted to other hard-to-lyse, sugar-rich, non-axenic cyanobacteria.
Notably, concomitant with the submission of this report, another group working with the original contaminated culture of Gloeocapsopsis released its genome (Puente-Sanchez et al., 2018). Although time-consuming, our cell isolation and DNA extraction protocols helped us obtain better genomic data: we obtained fewer contigs (105 vs. 137), our shortest contig has 3,290 bp (vs. only 137 bp), and our longest contains 356,189 bp (vs. 250,842 bp). The N50 of our assembly was also higher (90,347 vs. 73,596). Moreover, because cyanobacterial genomes are characterized by extended repetitive regions (Mazel et al., 1990; Asayama et al., 1996; Elhai et al., 2008), a customized DNA extraction protocol able to ensure high-quality DNA molecules is critical for facilitating bioinformatic processing.
It is already known that cyanobacteria accumulate sucrose and trehalose not only in the cytoplasm but also in the extracellular matrix as a mechanism for tolerating desiccation (Sakamoto et al., 2011; Yoshida and Sakamoto, 2009; Azúa-Bustos et al., 2014; Murik et al., 2017). The genomic analysis of Gloeocapsopsis sp. UTEX B3054 revealed the presence of genes for sucrose and trehalose biosynthesis. Two forms of the enzyme sucrose phosphate synthase (SPS) have been found in cyanobacteria (Blank, 2013), and Gloeocapsopsis encodes only an ortholog of the filamentous-related SPS, which possesses its characteristic single functional glycosyltransferase domain.
Our genomic analysis revealed the presence of only one set of genes related to the biosynthesis of trehalose (the trehalose synthase genes) and the absence of treY/treZ genes. In silico analysis predicted that trehalose synthase likely mediates the biosynthesis of trehalose in this cyanobacterium; treS homologs were transcriptionally up-regulated in the desiccation-tolerant L. ohadii subjected to simulated desiccation conditions (Murik et al., 2017). In both Anabaena sp. PCC7120 and N. punctiforme, trehalose concentration is modulated by TreH, a trehalase enzyme encoded by a gene located within a gene cluster with the maltooligosyl trehalose synthase (treY) and maltooligosyl trehalose trehalohydrolase (treZ) genes (Higo et al., 2006; Yoshida and Sakamoto, 2009). However, only in Chroococcaceae genomes was the treH gene found near a putative sugar ABC-transporter gene cluster, suggesting the existence of a conserved mechanism for trehalose regulation specific to this family.
The ABC transporter associated with the treH gene of Gloeocapsopsis sp. UTEX B3054 shares a conserved genomic arrangement and amino acid identity with the functionally described transporter LpqY-SugABC of Mycobacterium tuberculosis (27.00, 37.13, 38.70, and 42.93%, respectively). In M. tuberculosis, this transporter is highly specific for trehalose and, in conjunction with its periplasmic sugar-binding protein, works as an efficient retrograde recycling system for the disaccharide, participating actively in trehalose uptake and extracellular coat biosynthesis (Kalscheuer et al., 2010). Therefore, we hypothesize that cyanobacteria from the Chroococcaceae family possess a novel retrograde recycling system for regulating trehalose concentration that might play a crucial role in keeping cells alive during extreme desiccation.
The most recent phylum-wide genomic study concerning EPS biosynthesis (Pereira et al., 2015) deliberately disregarded the study of GTs, even though they are thought to play a central role in bacterial polysaccharide biosynthesis (Schmid et al., 2016). These enzymes have been associated with envelope biosynthesis, including the EPS, the lipopolysaccharide, and peptidoglycan, as well as with the glycosylation of membrane lipids and the biosynthesis of secondary metabolites, among other processes (Campbell et al., 1997; Coutinho et al., 2003).
The genome-wide analysis of GTs carried out in this work represents the first effort to predict the entire and diverse pool of GTs in cyanobacterial genomes. The overall classification of GTs is so far incomplete and might be improved as more GT families become characterized. The information already available regarding envelope-related GTs in cyanobacteria is limited and fragmentary, and possibly overlooks an enormous diversity of putative GTs due to their large number and complexity in nature (Yang et al., 2007). The percentage of total encoded proteins that might be exclusively devoted to the synthesis of glycosidic bonds in Gloeocapsopsis was higher than the 1 to 2% of total encoded proteins estimated for other genomes, whether archaeal, bacterial, or eukaryotic (Table 1; Lairson et al., 2008). Furthermore, Gloeocapsopsis sp. UTEX B3054 encodes orthologs of genes from the three pathways already described in cyanobacteria for EPS transport and export (Pereira et al., 2015), suggesting that this cyanobacterium might possess the whole molecular machinery for translocation and export of polysaccharides to the extracellular space.
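The comparison with the 1 to 2% range can be checked from the counts reported in the Results (at least 129 GT genes among 5,165 CDS). The Python lines below are a simple worked calculation; the variable names are arbitrary, and because the GT count is a lower bound, the resulting percentage is one as well.

```python
# Counts reported in the Results section.
GT_GENES = 129        # "at least 129 genes encoding GTs"
TOTAL_CDS = 5_165     # protein-coding sequences predicted by PROKKA

# Lower bound on the share of CDS that encode glycosyltransferases.
gt_percent = 100 * GT_GENES / TOTAL_CDS

# Upper end of the 1 to 2% range cited from Lairson et al. (2008).
TYPICAL_UPPER_PERCENT = 2.0

print(f"GTs/CDS = {gt_percent:.2f}%")
print("above typical range:", gt_percent > TYPICAL_UPPER_PERCENT)
```

With these inputs the ratio is about 2.5%, which is above the upper end of the cited range.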
Several of the identified GTs were inserted within gene clusters comprising other genes related to sugar biosynthesis, as well as within several gene clusters associated with saccharide biosynthesis related to secondary metabolites. For instance, the glycosyltransferase-rich gene cluster found in contig BWI75_000047 is an attractive target for functional studies of desiccation tolerance mechanisms. This cluster contains genes whose orthologs are up-regulated in Anabaena sp. PCC7120 under desiccation (Yoshimura et al., 2007). Moreover, Gloeocapsopsis sp. UTEX B3054 possesses an ortholog of the RNA polymerase sigma factor SigJ (gene locus ID: BWI75_01223), described as a key regulator of desiccation tolerance in Anabaena sp. PCC7120 (68.58% sequence identity) (Yoshimura et al., 2007). The longest gene cluster found by antiSMASH (Contig00007; location 61,046-133,879 nt) is also a very interesting target for further functional studies: a PKS-saccharide-NRPS hybrid cluster comprising 72.8 kbp of unique genetic information that might represent a new mechanism associated with the high tolerance of this cyanobacterium to extreme environmental conditions.

CONCLUSION
The combination of single-cell sorting and a customized multistep DNA extraction protocol made it possible to sequence the Gloeocapsopsis sp. UTEX B3054 genome. The technical difficulties encountered in extracting nucleic acids confirmed the complexity of the extracellular matrix and the enormous sugar content of this strain. Cells proved highly resistant to a wide spectrum of disruptive methods, highlighting the outstanding physicochemical properties of their protective coat.
The in silico analysis revealed a genetic potential to deal with water scarcity that is specific to Gloeocapsopsis and its relatives. Major efforts should be focused on deciphering the role of sugars during desiccation and, specifically, the functional role that trehalase coupled to the ABC transporter might play in controlling trehalose concentrations at both intra- and extracellular levels. Moreover, the diversity of GTs found in this work suggests that the molecular complexity of the polysaccharide matrix might be enormous.
We hope that the genomic framework provided in this work will help to untangle the sugar composition and structural arrangement of the cyanobacterial extracellular matrix, whose rheological properties seem to be critical for the retention of the meager water in the Atacama.

AUTHOR CONTRIBUTIONS
CU, RV, and BD conceived the study. CU and LS conducted the experimental work and DNA extraction protocol. CU extracted the genomic DNA for sequencing under the guidance of MV, RV, and BD. BD planned the genome sequencing. MP conducted the genome sequencing. CU and JA conducted the genomic study under the supervision of BD. CU, JA, MV, RV, and BD analyzed the data. CU, RV, and BD wrote the manuscript. All authors read and commented on the drafted manuscript.

FUNDING
This work was partially supported by grants from FONDECYT (1110597, 1150171, and 1161232) and the Millennium Institute for Fundamental and Applied Biology (MIFAB). CU was funded by the doctoral fellowship from CONICYT and also by Beca Gastos Operacionales CONICYT 21110394. The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.