Toxic/Bioactive Peptide Synthesis Genes Rearranged by Insertion Sequence Elements Among the Bloom-Forming Cyanobacteria Planktothrix

It has been generally hypothesized that mobile elements can induce genomic rearrangements and influence the distribution and functionality of toxic/bioactive peptide synthesis pathways in microbes. In this study, we performed in depth genomic analysis by completing the genomes of 13 phylogenetically diverse strains of the bloom-forming freshwater cyanobacteria Planktothrix spp. to investigate the role of insertion sequence (IS) elements in seven pathways. Chromosome size varied from 4.7–4.8 Mbp (phylogenetic Lineage 1 of P. agardhii/P. rubescens thriving in shallow waterbodies) to 5.4–5.6 Mbp (Lineage 2 of P. agardhii/P. rubescens thriving in deeper physically stratified lakes and reservoirs) and 6.3–6.6 Mbp (Lineage 3, P. pseudagardhii/P. tepida including planktic and benthic ecotypes). Although the variation in chromosome size was positively related to the proportion of IS elements (1.1–3.7% on chromosome), quantitatively, IS elements and other paralogs only had a minor share in chromosome size variation. Thus, the major part of genomic variation must have resulted from gene loss processes (ancestor of Lineages 1 and 2) and horizontal gene transfer (HGT). Six of seven peptide synthesis gene clusters were found located on the chromosome and occurred already in the ancestor of P. agardhii/P. rubescens, and became partly lost during evolution of Lineage 1. In general, no increased IS element frequency in the vicinity of peptide synthesis gene clusters was observed. We found a higher proportion of IS elements in ten breaking regions related to chromosomal rearrangements and a tendency for colocalization of toxic/bioactive peptide synthesis gene clusters on the chromosome.

It has been generally hypothesized that mobile elements can induce genomic rearrangements and influence the distribution and functionality of toxic/bioactive peptide synthesis pathways in microbes. In this study, we performed in depth genomic analysis by completing the genomes of 13 phylogenetically diverse strains of the bloomforming freshwater cyanobacteria Planktothrix spp. to investigate the role of insertion sequence (IS) elements in seven pathways. Chromosome size varied from 4.7-4.8 Mbp (phylogenetic Lineage 1 of P. agardhii/P. rubescens thriving in shallow waterbodies) to 5.4-5.6 Mbp (Lineage 2 of P. agardhii/P. rubescens thriving in deeper physically stratified lakes and reservoirs) and 6.3-6.6 Mbp (Lineage 3, P. pseudagardhii/P. tepida including planktic and benthic ecotypes). Although the variation in chromosome size was positively related to the proportion of IS elements (1.1-3.7% on chromosome), quantitatively, IS elements and other paralogs only had a minor share in chromosome size variation. Thus, the major part of genomic variation must have resulted from gene loss processes (ancestor of Lineages 1 and 2) and horizontal gene transfer (HGT). Six of seven peptide synthesis gene clusters were found located on the chromosome and occurred already in the ancestor of P. agardhii/P. rubescens, and became partly lost during evolution of Lineage 1. In general, no increased IS element frequency in the vicinity of peptide synthesis gene clusters was observed. We found a higher proportion of IS elements in ten breaking regions related to chromosomal rearrangements and a tendency for colocalization of toxic/bioactive peptide synthesis gene clusters on the chromosome.

INTRODUCTION
Cyanobacteria are an old form of life on earth, which occur in a wide range of aquatic and terrestrial habitats. They made a tremendous impact on the evolution of life on our planet because of oxygenic photosynthesis, and they are considered the ancestors of chloroplasts in plants. To date, cyanobacteria still contribute to global primary production, that is fixing a substantial amount of carbon (Garcia-Pichel et al., 2003) and nitrogen (Zehr et al., 2008). However, under favorable conditions, benthic and planktonic cyanobacteria multiply rapidly, and they may form dense harmful algal blooms (HABs) in aquatic ecosystems causing toxic effects in plants, invertebrates, and vertebrates, including humans and livestock. An increasing number of bloom-forming cyanobacterial species can produce over 1,100 distinct secondary metabolites (SM) (Dittmann et al., 2015) primarily via three types of biosynthetic machinery, namely, non-ribosomal peptide synthetases (NRPS), polyketide synthases (PKS), and ribosomally synthesized and post-translationally modified peptide (RiPPs) synthesis pathways (Arnison et al., 2013;Calteau et al., 2014;Kurmayer et al., 2016). Freshwater filamentous cyanobacteria Planktothrix are one of the major cyanotoxin and bioactive peptide producers, and they may serve as a niche constructor at ecosystem scale . The species assigned to the genus Planktothrix, for example, P. agardhii and P. rubescens, are recorded from toxic cyanobacterial blooms frequently (Fastner et al., 1999;Willame et al., 2005). Planktothrix produce various bioactive peptide families, including hepatotoxic microcystins (MCs) (Christiansen et al., 2003;Briand et al., 2008;Metcalf and Codd, 2012), aeruginosins (Ishida et al., 2007;Kohler et al., 2014), anabaenopeptins (Itou et al., 1999;Christiansen et al., 2011), cyanopeptolins (Grach-Pogrebinsky et al., 2003;Rounge et al., 2007), microviridins (Shin et al., 1996;Philmus et al., 2008), prenylagaramides (Donia and Schmidt, 2011), and microginins (Rounge et al., 2009;Pancrace et al., 2017). The gene clusters for aeruginosins, anabaenopeptins, cyanopeptolins, microginins, and MCs consist of genes encoding NRPS and/or PKS that follow a stepwise synthesis pathway using either amino acids (NRPS) or acetyl-coenzyme A (PKS) as substrate. RiPPs are formed ribosomally through a precursor peptide (consisting of a leader peptide and a core peptide), which is post-translationally modified. Microviridins and prenylagaramides constitute two cyanobactin peptide families produced by Planktothrix via the RiPP pathway. Structural diversity of the cyanobactins is additionally achieved by post-translational modification, such as heterocyclization, oxidation, prenylation, and epimerization (Sivonen et al., 2010). Although they are quite diverse in size, composition, and arrangement, all cyanobactin biosynthesis gene clusters share genes encoding two proteases for precursor peptide cleavage and cyclization and accessory proteins for post-translational modification (Sivonen et al., 2010;Arnison et al., 2013).
Previous research has revealed that the genetic basis of MC synthesis is frequently influenced by certain insertion sequence (IS) elements (Christiansen et al., 2006Chen et al., 2016;Pancrace et al., 2017). IS elements constitute the most number of mobile elements in prokaryotes, exhibiting a typical genetic structure with terminal inverted repeats, direct repeat target sequences, and an encoded transposase. IS elements may play a key role in genomic plasticity, that is by genomic rearrangement through homologous recombination in prokaryotes (Schneider et al., 2000;Zhou et al., 2008;Koonin and Wolf, 2010;Larsson et al., 2011). IS elements mutate the DNA sequence by four mechanisms during cell cycle amplification or other selective stress-induced processes: cut/paste, copy/paste, peel/paste, or co-integrate. IS elements can influence cyanobacterial SM biosynthesis gene clusters through insertion and subsequent deletion (Christiansen et al., 2006;Entfellner et al., 2017). Christiansen et al. (2008) and Chen et al. (2016) reported that non-toxic strains of Planktothrix frequently resulted from the (partial) deletion or inactivation of the MC synthesis gene cluster (mcy) through IS elements. It has been reported earlier that IS elements tend to cluster in the host genome (Zhou et al., 2008). Thus, IS elements located in the vicinity of a specific SM synthesis gene cluster might cause its inactivation, resulting in the loss or modification of bioactive peptide production. The comparison of complete genomes can reveal the potential physical relationship between IS elements and SM synthesis gene clusters. However, to date, for Planktothrix spp. only three strains of complete genomes, namely, P. agardhii NIVA-CYA126/8, PCC7805, and NIES-204, have been elucidated (Christiansen et al., 2014;Pancrace et al., 2017;Shimura et al., 2021). In addition, investigating the impact of IS elements on the population level is important to understand the influence of IS elements on a genomic scale. Therefore, in this study, we combined genomics and population genetics to characterize the distribution of IS elements and SM synthesis gene clusters within the taxonomically well-defined HAB-forming genus Planktothrix. Hence, we completed the genomes of 13 Planktothrix spp. strains assigned to three major phylogenetic lineages by using the reference genome (Christiansen et al., 2014). The phylogenetic lineages have been established using 125 Planktothrix strains originating from 40 water bodies located in 17 countries on three different continents (Europe, North America, and Africa) (Kurmayer et al., 2015;Entfellner et al., 2017). We focused on (i) the relationship between SM biosynthesis gene clusters and IS element distribution, (ii) the IS element-induced variation and innovation in cyanobacterial SM production, and (iii) the evolution of SM synthesis gene clusters as a result of recombination events. In particular, we analyzed seven SM biosynthesis gene clusters among 13 Planktothrix strains and detected 1,622 IS element copies in all the genomes. Notably, our results indicate the occurrence of IS elements with higher proportion within breaking regions related to chromosomal rearrangements resulting in colocalization of toxic/bioactive peptide gene clusters on the chromosome.

Planktothrix Genome Sequencing and Assembly
Genomes were sequenced using PacBio RS II platforms (GATC Biotech, Constance, Germany) in 2014 and/or PacBio Sequel Systems (Novogene, Beijing, China) in 2019. For PacBio RS II platforms, three SMRT cells per strain were used, whereas for PacBio Sequel Systems, only one SMRT cell per strain was used. Libraries were prepared using the standard PacBio 10-kb protocol and sequenced on a PacBio RS II system with P6-C4 chemistry (in 2014) and on the Sequel System with Sequel Sequencing Kit 3.0 (in 2019). Raw reads were filtered and assembled using the hierarchical genome assembly process (HGAP) v3 (Chin et al., 2013) for PacBio RS II platform and using HGAP v4 and Canu v2.0 (Koren et al., 2017) for PacBio Sequel System. In brief, consensus sequences were obtained using circular consensus sequencing command 1 and converted to FASTA format using bamtools (Barnett et al., 2011;Wenger et al., 2019). Consensus sequences were assembled into contigs using Canu under "-pacbio-hifi" mode. Coverages ranged between 17 and 455-fold for the Canu and HGAP assembly reads (Supplementary Additional File 1: Table  S1).

Chromosome Completion and Plasmid Confirmation
For a number of strains (NIVA-CYA126/8, No66, No82, No108, No365, No758, PCC7805, PCC7821, and PCC9214), chromosomes and plasmids were identified on the basis of PacBio RS II assembly. Contig alignments or missing sequences between contigs were obtained by PCR using specific primers and Sanger sequencing of forward and reverse sequences (Supplementary Additional File 1: Table S2) or by resequencing via PacBio Sequel Systems in 2019 (No66, No108, and No365). For the other strains (No2A, No713, No976, and PCC7811), complete chromosomes and plasmids were directly obtained through the PacBio Sequel Systems. The majority of plasmids were physically confirmed by overlapping longdistance PCR, showing that a specific DNA molecule was circled as all of the primer sites could be shown to reveal PCR products (Supplementary Additional File 1: Table S2 and Supplementary Additional File 2). PCRs were performed using Phusion High-Fidelity DNA Polymerase (Thermo Scientific) according to the manufacturer's protocol (HF Buffer, 500 nM of each primer, 200 µM of each deoxynucleotide triphosphate, 0.1 U polymerase, and 10 ng of HMW DNA). The size of PCR products was determined by gel electrophoresis (0.8% agarose gels in 0.5 × Tris-borate-EDTA buffer) and visualized using Midori Green. PCR products were either extracted from agarose gels or purified using a commercial PCR purification kit (QIAquick, Gel Extraction Kit, Qiagen) and directly sequenced (Eurofins Genomics, Ebersberg, Germany). Thirteen complete genome sequences have been submitted to EMBL-EBI under STUDY_ID PRJEB40445 (ERP124090).

Genome Annotation and Phylogenetic Analysis
Automated genome annotations and gene predictions were performed using the genome annotation tool GenDB 2.4 (Meyer et al., 2003) as compared with the reference genome of P. agardhii NIVA-CYA126/8 (Christiansen et al., 2014) and reference genomes from Arthrospira platensis C16 (NZ_CM001632.1), Lyngbya aestuarii BL-J (NZ_AUZM00000000.1), and Trichodesmium erythraeum (JAGGDU00000000.1) or if no homolog was found via the SwissProt database. Genomic comparison was performed using the online platform EDGAR 3.0 (Blom et al., 2009) regarding the core and pan genome, average nucleotide identity (ANI), and average amino acid identity (AAI) as well as a phylogenetic tree calculated from the core genes using Kamptonema sp. PCC6506 as an outgroup. The phylogenetic tree was calculated using the FastTree software 2 to generate approximately-maximum-likelihood phylogenetic trees and Shimodaira-Hasegawa local support values (Shimodaira and Hasegawa, 1999). All genes were compared pairwise within chromosomes or plasmids using 97% identity cutoff and >90% query coverage to quantify paralogs. In order to quantify gene functions the Clusters of Orthologous Groups (COG) database was used as transferred from the NIVA-CYA126/8 reference genome (Christiansen et al., 2014).

Insertion Sequence Element Assignment
Insertion sequence element transposases were assigned or predicted using ISsaga 2.0 (Varani et al., 2011), available through the ISfinder database (Siguier et al., 2006; date August 4th 2020). The IS elements annotated by GenDB were matched with those identified by ISfinder and BlastX with an e value of <10 −20 . Considering that terminal repeats of ISs are often small and poorly characterized and in many cases do not allow a precise definition, the predicted ORF of a specific IS element was used for further analysis. Copy numbers of different IS elements were calculated from all 13 sequenced genomes by BlastN using a threshold e value of 1e −20 . We used ORF length, non-interrupted translation, and copy number as criteria, that is shorter sequences occurring in higher copy number only were considered as fragments, to differentiate full-length transposases from fragments.

Secondary Metabolite Peptides
In order to investigate peptide synthesis all strains were incubated in liquid BG11 medium (Rippka, 1988) at 15 • C (P. agardhii, P. rubescens) or 23 • C (P. pseudagardhii, P. tepida) under continuous low light conditions (10 µmol m −2 s −1 , Osram Type L30W/77 Fluora) and harvested during early logarithmic growth phase (usually after 14 days). Harvested cells on glass fiber filters were dried, and 2-14 mg (median 4.8) of dry weight were extracted in 50% (v/v) aqueous methanol as described previously by shaking on ice (Kosol et al., 2009). Peptides were separated by HPLC (HP 1100, Agilent, Vienna, Austria) using a water/acetonitrile (0.05% trifluoroacetic acid) gradient from 80:20 to 50:50 in 45 min at a flow rate of 1 mL min −1 and at 30 • C oven temperature via a LiChrospher 100 RP18e octadecylsilica (5 µm particle size) sorbent packed in a LiChroCART 250-4 cartridge system (Merck, Darmstadt, Germany) (Kosol et al., 2009). An electrospray ionization mass spectrometer ion trap (amaZonSL, Bruker, Vienna, Austria) was coupled to the HPLC as described (Entfellner et al., 2017). Peptides were detected in positive-ion mode using nitrogen as sheath gas (43 psi, 8 L min −1 , 300 • C) and helium as auxiliary gas. The capillary voltage was set to 5 kV. Mass screening and automated fragmentation were performed within one run using the two precursor masses with high intensity for MS 2 fragmentation and one for MS 3 fragmentation. Peptides were determined on the basis of the retention time, mass, and fragmentation pattern and assigned to the peptide families based on the fragmentation pattern or the predicted mass of a precursor peptide.

Graphical Tools
SnapGene was used to visualize primer positions on plasmids. Synteny plots between strains were calculated using R2cat (Husemann and Stoye, 2010). BioCircos package in RStudio (Cui et al., 2016) was used to visualize the position of SM synthesis gene clusters, breaking regions of chromosomal rearrangements, and individual IS element groups.
identified (Table 1). For strain NIVA-CYA126/8 the previously published genome (Christiansen et al., 2014) was re-sequenced and closed. For strain PCC7805 the genome was published earlier by Pancrace et al. (2017), however, has been resequenced during this study. For both strains the resequenced total genome sizes were found 50 kbp increased (1.1% of genome size). For NIVA-CYA126/8 the larger plasmids (119, 90, and 52 kbp) were confirmed while the two smaller plasmids (5 kbp) were found integrated into the 90 kbp plasmid. For PCC7805 the megaplasmid was confirmed while the 5 kbp plasmid was not found. Within Planktothrix spp. the chromosome size varied between 4.72 Mbp (i.e., strain No2A) and 6.8 Mbp (PCC 9214), resulting in 4,305-5,796 predicted protein-coding genes (ORFs). Planktothrix agardhii strains of Lineage 1 (1A) had the smallest chromosomes; P. rubescens or P. agardhii strains of Lineage 2 had larger chromosomes, whereas those of Lineage 3 had the largest chromosomes. All strains had the same number of rRNA copies (three 5S, four 16S, and four 23S). However, tRNAs varied from 42 among strains of Lineages 1 and 2 up to 48 (PCC9214) or 56 (No713) in Lineage 3. In 13 chromosomes, we obtained 2,881 core genes and 9,284 pan genes (Supplementary Additional File 3: Figure S1A). The core genome development curve reached a stable minimum, indicating that the number of core genes will not decrease when Planktothrix spp. strains were further considered. For the pan genome, we obtained a saturation curve with a slight increase; therefore, we could obtain additional gene information when further strains assigned to Planktothrix spp. were considered. Considering closely related P. agardhii/P. rubescens strains, not only reduced variability but still no saturation was reached for the pan genome (Supplementary Additional File 3: Figure  S1B). When calculated for each lineage, again no saturation for pan genes was observed (Supplementary Additional File 3: Figure S1C). Instead pan genes increased distinctly between Lineages 1 and 2 and Lineages 2 and 3 by 34-41 and 52-54%, respectively (Supplementary Additional File 3: Figures S1C,D).
The phylogenomic tree calculated from core genes (Figure 1) revealed that Lineages 1 and 2 are sister lineages, to the exclusion of basal Lineage 3. In agreement the AAI and ANI matrices revealed that strains of Lineages 1 and 2 were found most closely related, whereas strains of Lineage 3 (No713 and PCC9214) were found more distinct (Supplementary Additional File 3: Figure  S2). Lineage 1 differed from Lineage 2 between 97.8 and 98.3% (AAI) or 95.3-96.3% (ANI) which is close to the cutoff frequently used for species demarcation (Jain et al., 2018). We used synteny plots to investigate chromosomal gene arrangement and explore differences in genomic structure among strains. We found a similar chromosomal structure among strains within a lineage and only a few chromosomal rearrangements among Lineages 1, 1A, 2, and 2A. By contrast, the chromosomal arrangement of P. agardhii/P. rubescens strains (Lineages 1 and 2) compared with strains of Lineage 3 (No713 and PCC9214) was completely different but rather indifferent between the two strains No713 and PCC9214 (Supplementary Additional File 3: Figure S3).
Paralogous genes were calculated to identify the factors contributing to chromosome size variation, which contributed only 1.0% (strain No976) to 4.2% (strain No713) of all genes ( Table 2). Paralogs comprised IS elements ranging from 15.7% (strain No66) to 82.3% (strain No713). The other duplicated genes were assigned to photosystem II, gas vesicle proteins, reverse transcriptases, and unknown proteins (Supplementary Additional File 1: Table S3). Thus, on a nucleotide basis, the percentage of duplicated genes (without transposases) was <1%. Therefore, apart from a relatively minor influence of IS elements, duplication of genes was also of relative minor importance for chromosome size increase among Planktothrix spp. Thus, the major part of chromosome size variation must have resulted from gene loss processes (ancestor of Lineages 1 and 2) and horizontal gene transfer (HGT).
Based on synteny analysis, many deviating genes appeared singular. We also found some larger regions encoding putatively No713 n/a n/a n/a n/a n/a n/a n/a PCC9214 n/a n/a n/a n/a n/a n/a n/a The raw data of HPLC-MS analysis as well as a list of protonated masses and assigned peptides for all 13 strains have been included in Supplementary Additional File 1: Tables S5, S6. n/a, not applicable; n.d., not detected; put., putative; Aer, Aeruginosin; AP, Anabaenopeptin; Cpt, Cyanopeptolin; MC, Microcystin; Mic, Microginin; Mvd, Microviridin; Pag, Prenylagaramide; Pla, Planktocyclin.
Frontiers in Microbiology | www.frontiersin.org functional gene clusters. Two of these operons were described earlier as the phycoerythrin gene cluster among red-pigmented P. rubescens/P. agardhii strains (Tooming-Klunderud et al., 2013) or the nitrogen-fixation gene cluster described for strain PCC9214 in Lineage 3 (Pancrace et al., 2017). When compared with the NIVA-CYA126/8 reference genome, strains among Lineage 1 slightly differed in deviating genes by 6.9-7.6% only. Strains among Lineage 2 differed by 17.0-20.4%, whereas strains among Lineage 3 differed by 37.5-40.7%. Based on the phylogenomic tree described above (Figure 1), the strains of Lineage 2 gained 14-20% of their genomic information possibly through HGT, whereas ancestors of Lineages 1 and 2 possibly lost up to 40.7% from more basal genotypes of Lineage 3. Correspondingly Venn diagrams revealed a higher overlap for both core and pan genes between lineages 1 and 2, while the overlap with Lineage 3 was smaller (Supplementary Additional File 3: Figure S4). In order to find out whether for disjunct core and pan genes certain gene functions occurred more frequently the COGs were assigned. It should be noted that only 20-43% and 20-34% for core and pan genes, respectively were assigned to COG categories. Overall for core genes the COG categories showed a rather similar proportion when compared between lineages. Lineage 3 showed higher percentage of COGs related to primary metabolism (i.e., C, G, E, H, and I). For pan genes, notably COG X (Mobilome: prophages, transposons including transposases) occurred most frequently among Lineage 2.

Insertion Sequence Elements
Using the completed genomes, we performed an exhaustive comparison of the composition and localization of IS elements. In total, we found 1,622 IS element copies among the 13 genomes (Supplementary Additional File 1: Table S4). Thereof 865 (53.7%) full-length transposases of IS elements were found and classified as putatively active. The IS element copy number per strain varied between 68 (No976) and 273 (No713). In the chromosomes, IS element transposases made up 0.8-3.7% (Table 1) on nucleotide basis, which increased with the increase of chromosome size (Figure 2). Similarly the copy number of full-length transposases and fragments thereof increased with chromosome size. The percentage of IS elements on plasmids was more variable as some plasmids were found without any IS elements, and others were found carrying up to 16.4% (Table 1). We identified 15 known IS element families by using the IS finder platform. Nevertheless, 7.8% of IS element copies were not classified to IS element families yet. The most abundant IS element family was IS200/IS605 (30%). For a precise insight, we further subdivided the main part of IS elements (58.7%) into 27 groups because of blastN similarity ( Figure 3A and Table 3). The majority of IS element groups occurred chromosomally only, whereas some IS element groups occurred on chromosomes and plasmids (e.g., groups 2, 4, 5, and 16). The frequency of these IS element groups and its phylogenetic distribution differed enormously: For IS element group 10, we found 13 copies in total, i.e., each strain had exactly one copy. Group 7 was also present in all strains; however, the copy number varied remarkable, as some strains had only one copy, and strain PCC9214 of Lineage 3 hosted 20 full-length copies. In addition, IS element group 5 was present in all lineages in a high copy number. IS element group 1, flanking the mcy gene cluster (Christiansen et al., 2003), occurred in full length in Lineages 1 and 2 in high copy number, whereas only few copies were found in Lineage 3. Opposite results were found for IS element group 15 and 23, which showed high copy numbers and occurred only in strains No713 and PCC9214 and thus appeared to be phylogenetically restricted to Lineage 3. Furthermore, with one exception IS element group 16 occurred in Lineage 2 only. IS element group 2 (ISPlag1), which is known to cause inactivation and deletion of the mcy gene cluster , showed the most copies (43 full-length copies and 56 fragments) and occurred mostly among strains of Lineage 1. Only fragments of ISPlag1 were found in strains of Lineage 2. ISPlr1, which is also known to inactivate the mcy gene cluster by insertion , was found frequently in Lineage 2, and few copies were found in Lineage 3. Furthermore, IS element group 4 was found adjacent to the anatoxin biosynthesis gene cluster in Kamptonema (Oscillatoria) strain PCC6506 (Mejean et al., 2009), and it occurred in full-length in Lineages 1 and 2, whereas Lineage 3 carried fragments only.
In general, the ratio between full-length IS elements and fragments of IS elements was balanced and strain specific but not lineage specific. Strain No976 and strain No758 had the lowest (36%) and the highest (68%) content of full-length transposases, respectively ( Figure 3B). We found no general difference between chromosomes and plasmids with regard to full-length vs. fragments. With regard to the 27 IS element groups, the ratio of full-length IS elements and fragments differed remarkably. Numerous IS element groups (groups 1, 16, 17, 757, and ISpagF) contained a high proportion of full-length copies (>90%). Other IS element groups had less full-length copies; in particular, groups 10, 11, 12 and 13 occurred only as fragments ( Figure 3C).

Secondary Metabolite Synthesis Gene Cluster Distribution
We localized seven previously elucidated SM synthesis gene clusters, including two NRPS gene clusters (apnA-E and ociD-C), two NRPS/PKS hybrid gene clusters (aerA-N and mcyT-J), two RiPP gene clusters (mvdA-F and pagC-G) on the chromosomes, and another NRPS/PKS gene cluster (micA-E) occurring on plasmids (Table 4 and Figure 4). Strains of Lineage 1 carried 4-6 of these SM synthesis gene clusters, whereas strains of Lineage 2 had 6-7 clusters. Four biosynthesis gene clusters (aer, mvd, oci, and pag) were present among all strains of Lineages 1 and 2. By contrast, the apn and mcy gene clusters were distributed irregularly, that is apn and mcy genes occurred irregularly among strains of Lineage 1, but they were always present among strains of Lineage 2. The mic gene cluster, located on plasmids, was randomly distributed, as it was only found in two strains, namely, No66 (Lineage 1) and PCC7821 (Lineage 2). By contrast, the two strains of Lineage 3 had none of these abovementioned SM synthesis gene clusters. However, we found several putative NRPS/PKS genes among strains of Lineage 3. Similarly, among strains of Lineage 1 and 2, unknown NRPS/PKS genes were found.  Figure 4 shows the distribution of SM synthesis gene clusters on the chromosomes. Notably, individual gene cluster positions changed in congruence with the phylogeny of the strains. Strains of Lineage 1 (e.g., No66) were found to have the apn and oci gene clusters distant to the mvd gene cluster, whereas among all strains of Lineage 2 and some strains of Lineage 1 (e.g., NIVA-CYA126/8) the apn and oci genes were found rather close to the mvd genes, i.e., forming a so-called meta gene cluster of mvdapn-oci. Similar evolution can be observed for the pag gene cluster, which was located far away from all other SM synthesis gene clusters among strains of Lineages 1, 1A, and 2A, but pagC-G genes became localized closer to the meta gene cluster among strains of Lineage 2.

Secondary Metabolite Synthesis Gene Cluster Functionality
In general, whenever we found one of the abovementioned NRPS or NRPS/PKS gene cluster in the genome, we also detected corresponding putative peptides, confirming a high share of SM synthesis gene cluster functionality in Planktothrix ( Table 5, Supplementary Additional File 1: Tables S5, S6 and Supplementary Additional File 3: Figure S5). In addition, for the mic gene cluster carried on plasmids (No66, PCC7821), the produced microginins were detected. However, an inactive aer gene cluster of strain No758 occurred because of partial deletion as well as a putatively inactive oci gene cluster of strain PCC7811. In contrast to NRPS, the two RiPP gene clusters were found frequently inactive. Within the mvd gene cluster, strains were found to carry one or two potential precursor peptide genes (mvdE and mvdF). For strains carrying mvdE (No66, No2A, No365, and NIVA-CYA126/8), the predicted peptides were detected. Strain No365 had two slightly different mvdE precursor peptide genes resulting in two different microviridin structural variants. On the contrary, for strains carrying only the mvdF precursor peptide gene, the corresponding peptide could not be detected. The pag gene clusters in Planktothrix showed a variable number of precursor peptide genes (pagE) ranging from one pagE copy in strain No108 to 12 pagE copies in strain No66 (Supplementary Additional File 3: Figure S5), indicating gene duplication and potential structural diversification. Some strains seemed to carry several active pagE genes (e.g., three in No976), whereas other strains had no active pagE (e.g., NIVA-CYA126/8). Consistent with Donia and Schmidt (2011), both genes encoding prenylagaramide B 1 (pagE6) and C 2 (pagE7) were detected in P. agardhii. It is interesting to note that active pagE genes often were found to contain a core motif GLTPH/L (Supplementary Additional File 1: Table S7) which has been suggested as a restriction site for the N-terminal protease A previously (Gu et al., 2018). Moreover, some strains of P. rubescens were reported to produce one or two variants of planktocyclin (Baumann et al., 2007;Kurmayer et al., 2015). Using backtranslation of the amino acid sequence constituting planktocyclin Pro-Gly-Leu-Val-Met-Phe-Gly-Val (resulting in 12,288 possible nucleotide sequences of 24 bp in length), only one exact BLASTn hit was found in the genome of strain No66. This exact match formed part of the precursor genes pagE described for the pag gene cluster previously (Donia and Schmidt, 2011). Therefore, we concluded that the pag gene cluster corresponded to the synthesis of prenylagaramide and planktocylin (lacking prenylation). Furthermore, we found a correlation between the presence of prenylated peptides and prenyltransferases (pagF). The strains PCC7805, PCC7811, No365, and No758 carried a highly similar pagF gene, and these strains also produced O-prenylated tyrosine-containing peptides (e.g., Pag B 1 and C 2). For strains No2A and No82 (and also in the planktocyclin-inactive strain NIVA-CYA126/8), we detected a lower similarity of the pagF gene (Supplementary Additional File 3: Figure S6). Recently, Shimura et al. (2021) reported dissimilarity of the pagF gene using strain PCC7821, resulting in C-prenylation of tryptophan, which has been previously known Frontiers in Microbiology | www.frontiersin.org as oscillatorin (Sano and Kaya, 1996). Other strains (No66 and No976) lacking prenyltransferase produced planktocyclins but no prenylagaramides.

Relationship Between Insertion Sequence Elements and Secondary Metabolites Synthesis Gene Clusters
We calculated net distances between the location of SM synthesis gene clusters and the location of full-length IS elements to systematically compare the distribution of IS element groups in relation to individual SM synthesis gene clusters (or fragments). When comparing IS element net distance among the SM synthesis gene clusters, the majority of IS elements showed a non-normal distribution (Kolmogorov-Smirnov test p < 0.01).
Notably, for nearly all IS element groups, the range in variation of net distance from a specific SM synthesis gene cluster exceeded 1 Mbp. Only for IS element group 3, a more narrow range in relation to the pag gene cluster was recorded, that is 0.12 Mbp. Therefore, any IS element group showed a clustering of its copies in the vicinity of a specific SM synthesis gene cluster. Nevertheless, a few copies of IS elements were found most closely located to a specific SM synthesis gene cluster (<11 kbp in distance), that is IS element group 2 (ISPlag1) for mvd genes, group 4 for oci and apn genes, group 3678 for pag genes, and group 1 and ISPlr1 for mcy genes.
We calculated the frequency of IS elements (including the remnants) in the adjacent regions (10-30 kbp) of the seven SM synthesis gene clusters ( Figure 5A). In general, the strains did not show an increased IS element content in the vicinity of SM synthesis gene clusters when compared with the entire chromosome. Only PCC7821, No108, and No66 indicated an IS element frequency that increased in the vicinity of SM synthesis gene clusters.
However, the pag and mic gene clusters included single IS element copies. The pag gene cluster of seven strains included a transposase (Supplementary Additional File 3: Figure S5) as reported for P. agardhii NIES-596 and ISpagF previously (Donia and Schmidt, 2011). The two mic gene clusters included a specific transposase, i.e., for strain No66 an IS element of group 4 was found, and for strain PCC7821 ISPlr1 was located within the mic gene cluster. Only a few transposases were found adjacent to SM synthesis gene clusters, for example, ISPlag1 (group 2) was found close to the apn gene cluster (or apn remnants) in many strains of Lineages 1 and 1A. Therefore, relatively few IS elements were found in proximity to one of the seven SM synthesis gene clusters among P. agardhii/P. rubescens strains.

Chromosomal Rearrangements Influencing (Co)Localization of Secondary Metabolites Synthesis Gene Clusters
As mentioned previously, P. agardhii and P. rubescens strains showed a similar chromosomal structure. In total, ten major chromosomal rearrangements were observed (Figure 6 and Supplementary Additional File 3: Figure S7). We investigated the breaking regions of the ten chromosomal rearrangement events (approximately 30 kbp up and downstream of the putative breaking point) with regard to general gene composition, gene duplications, IS elements, and repetitive sequences (Supplementary Additional File 4: Tables S8-S16). In general, within putative breaking regions, we observed not only IS elements but also genes of the primary metabolism or uncharacterized genes.
The chromosomal rearrangements A and H influenced the chromosomal positions of some SM synthesis gene clusters and contained duplicate genes. As described previously, some strains of Lineage 1 had two copies of apnE (ABC transporter encoded by the apn gene cluster) possibly because of apn gene loss and regaining of the apn gene cluster (Entfellner et al., 2017). The presence of the entire apn gene cluster and the additional apnE was related to chromosomal inversion (approximately 270 kbp), moving the apn and oci synthesis gene clusters close to the mvd gene cluster and thus forming a meta gene cluster, namely, mvd-apn-oci (Figure 7), which is ubiquitously present among strains of Lineage 2. Alternatively, some strains in Lineage 1 (e.g., strain No66) lost the meta gene cluster formation by the same event. Notably, within Lineage 2, another chromosomal rearrangement H (approximately 2.3 Mbp) resulted in the movement of pag synthesis genes closer to the meta gene cluster (distance approximately 280 kbp).
In general, among the putative breaking regions, repetitive sequences were observed (Supplementary Additional File 4: Tables S9, S10, S14-S16): WD repeat-containing protein genes were found at the breaking regions of chromosomal rearrangements B, C, G, H, I, and J. In particular, for chromosomal rearrangement J, we found repetitive sequences at both and very close to the putative breaking points in all strains of Lineage 2. Repetitive sequences that are part of an unknown ORF encoding a tetratricopeptide protein were found at breaking regions of chromosomal rearrangements H and I. Furthermore, we observed increased IS element frequency among the breaking regions. Notably, a correlation between IS element frequency (proportion of nucleotides) and the distance to the assumed breaking points of chromosomal rearrangements was observed ( Figure 5B). The percentage of IS elements (transposases on nucleotide basis) among breaking regions was highest within 10 kbp up and downstream (1.9-4.5%) and lowest for the entire chromosome (0.9-2.6%), and it decreased with distance from breaking point. For all ten chromosomal rearrangements, at least one IS element was found, whereas most breaking regions showed several copies of IS elements ( Figure 5C). Therefore, an increased proportion of IS elements occurred within breaking regions, which was related to chromosomal rearrangements predominantly within Lineage 2 (rearrangements C, E, and G-J) and to a less extent within Lineage 1 (rearrangements A, B, and D). Zhou et al. (2008) compared 1,356 IS elements from the IS finder database for 19 complete genome sequences of cyanobacteria and concluded that the genome size tends to  Figure S7). FIGURE 7 | Chromosomal rearrangement A related to colocalization of mvd and apn gene clusters. Homologous gene loci are indicated by gray shading, which enabled the inversion of a 300 kbp fragment leading to a meta peptide gene cluster comprising mvd, apn, and oci genes.

Relationship of Genome Size and Insertion Sequence Element Proportion
increase with the number of recently active IS elements in a genome. In this study, not only full-length copies but also fragments were found to correlate, which might be due to the presence of repetitive sequences. Similarly, Larsson et al. (2011) reported a highly significant correlation between genome size and the number of duplicated genes. When comparing 58 cyanobacterial genomes, the authors concluded that numerous paralogs (resulting from gene duplication) assigned to COG L (replication) were mostly linked to transposases. In general, transposases (IS elements) could be used to create/maintain genomic plasticity, thereby potentially allowing rapid adaptation to environmental changes (Frangeul et al., 2008;Kusumoto et al., 2011;Humbert et al., 2013;Nzabarushimana and Tang, 2018). Details on IS-based molecular adaptation mechanisms are less clear, but partial IS elements rather facilitate homologous recombination processes because of sequence similarity than actively inducing it by cut and paste mechanism.
By contrast, in this study, the chromosome size slightly increased with the frequency of IS elements, and the content of other paralogs was smaller than that of IS elements. Thus, likely orthologous genes inserted by HGT led to genome increase during evolution of Lineage 2. A large share of these orthologous genes was composed of uncharacterized hypothetical proteins ( Table 2). However, Tooming-Klunderud et al. (2013) demonstrated earlier that the strains assigned to P. rubescens gained a gene cluster encoding phycoerythrin synthesis through HGT, which is considered as a selective advantage in deep stratified lakes (Kurmayer et al., 2015). Understanding the physiological function of the acquired total genetic basis ranging from 0.8 to 0.995 Mbp is considered important to understand the ecophysiological adaptation of Lineage 2 on a genomic level.
According to the botanical definition (Anagnostidis and Komárek, 1988) the green vs red pigmentation is the major criterion to differ between P. agardhii and P. rubescens (Suda et al., 2002). The two major phylogenetic lineages identified through multi-locus sequence analysis (MLSA) previously were confirmed via phylogenomic analysis in this study (Figure 1 and Supplementary Additional File 3: Figure S2). However, as found earlier both lineages comprise green and red pigmented strains not reflecting the current species definition according to Suda et al. (2002). HGT of phycoerythrin synthesis only is unlikely to be the sole factor for evolutionary diversification. On the other hand the observed high genetic dissimilarity between phylogenetic Lineages 1 and 2 points to a more stable phylogenetic barrier which would imply genetic differentiation even if genotypes from both lineages occur in the same habitat (Jain et al., 2018). In the future for Lineage 2 the functional consequences of the acquired additional genetic information need to be explored concerning its role in the observed phylogenomic differentiation.

Secondary Metabolites Synthesis Gene Clusters Influenced by Mobile Elements
In general, the observed patchy SM synthesis gene cluster distribution among genera/strains has been considered the result of mobile elements, that is plasmids and transposases (i.e., Bolch et al., 1997;Nakasugi et al., 2007). Based on this study, most SM synthesis gene clusters have been stably integrated into the chromosome, which show a distribution congruent to phylogeny. Moreover, within Lineage 2, all four strains contained at least six SM synthesis gene clusters (aer, mvd, apn, oci, pag, and mcy) probably because an ancestor genotype already carried all of these clusters. By contrast, the number of SM synthesis gene clusters among Lineage 1 was reduced because of gene loss processes possibly influenced by IS elements (loss of mcy and apn genes related to insertion of ISPlag1). Only the mic gene cluster has been found on a plasmid, which may be the first example of a plasmid encoding a functional SM synthesis gene cluster for bloom-forming cyanobacteria (strains No66, PCC7821). However, for the genus Planktothrix, the genomic mobility of the six most abundant SM synthesis gene clusters is considered low. Similar to plasmids, various IS elements located within or in the vicinity of SM synthesis gene clusters have been suggested to facilitate HGT (e.g., Tillett et al., 2000;Christiansen et al., 2003). Correspondingly, in this study, for all SM synthesis gene clusters, a few IS elements were found closely located (<10 kbp), for example, strain No66 carried ISPlag1 and IS element group 4 flanking the apn gene cluster in a distance of 4 kbp. ISPlag1 also was found in the vicinity of apn gene clusters in strains No2A and No365. Considering that apn genes might have been re-introduced through HGT recently (Entfellner et al., 2017), the IS elements, namely, ISPlag1 and IS group 4, might have been involved. Two IS elements were part of the mic gene cluster, that is IS group 4 (No66) and ISPlr1 (PCC7821). Considering that the percentage of IS elements on plasmids ranged from 0 to 16.4%, the IS element share on plasmids is generally more flexible when compared with the chromosome. If the incoming DNA contains parts similar to the recipient plasmid/chromosome such as IS elements (Herrero and Flores, 2008), then IS elements present in plasmids can serve as the recognition site for homologous recombination.
Based on another hypothesis, active IS elements might cluster in host genomes (Zhou et al., 2008). Such clustering may affect the mutation rate in certain regions of the host genome, including functional operons such as SM synthesis gene clusters. Indeed, in this study, increased IS element frequency was observed in certain chromosomal regions related to chromosomal rearrangements ( Figure 5B). It is not known whether this increased frequency in certain regions is because of (unknown) factors directing IS element insertion or actually occurs accidentally. Nevertheless, we argue that the physical vicinity of IS elements to SM synthesis gene clusters is not of decisive importance for modification, rather it is the activity of certain IS elements (ISPlr1 and ISPlag1). For example, IS element group 1 is found closely located to the mcy gene cluster (except No758), but it shows no influence. Therefore, the (active) IS elements affect certain SM synthesis genes (mcy) from the distance, for example, directed by the presence of short repetitive (RR 1-7, 41-46 bp) sequences . Such RR sequences resemble the repetitive extragenic palindromic DNA sequences (REPs) described from many bacteria (Tobes and Ramos, 2005). REP sequences can form stem-loop hybridization during DNA replication, and they might direct transposases such as ISPlr1 (Tobes and Pareja, 2006). Notably, in contrast to mcy genes containing REPs (RR2,3,5,6,7, 43-46 bp), other SM synthesis gene clusters located in the chromosome of Planktothrix did not contain these reoccurring sequence motifs, indicating a potential protection from ISPlr1 disruption (Rainer Kurmayer, unpublished data). For the majority of ISPlr1 copies, a REP sequence is found in the flanking region of the IS element (< 10 kbp distance, i.e. for 35 copies of the 40 copies in strains PCC7821, No82, No108, and No758).

Evolution of Secondary Metabolites Synthesis Gene Cluster Localization in Chromosome
The colocalization of SM synthesis gene clusters (mvd-apnoci) resulting in the meta peptide synthesis gene cluster has been reported from Planktothrix and has been considered as a genomic island (Rounge et al., 2009;Pancrace et al., 2017). Genomic islands, including several SM synthesis gene clusters, have been described from other bacteria, that is actinobacteria of the genus Salinispora (Penn et al., 2009;Ziemert et al., 2014). In general, such genomic islands have been considered as a result of transmittance, that is pathogenicity islands containing biosynthesis gene clusters for yersiniabactin (Fischbach et al., 2008), and a high frequency of mobile (IS) elements has been reported (Penn et al., 2009). In this study, the usual criteria for the differentiation of genomic islands were not met, that is neither a high share of mobile elements, nor GC percentage deviation, or codon usage bias could be found (data not shown). However, based on this criteria, all of the six chromosomally encoded SM synthesis gene clusters occurred already in the ancestor of P. agardhii/P. rubescens, and they were partly lost during the evolution of Lineage 1 (e.g., MC or anabaenopeptin). Recently, Pancrace et al. (2017) described the draft genome of a benthic Planktothrix strain PCC11201, which contained SM synthesis gene clusters for MC, aeruginosin, cyanopeptolin, microviridin, and prenylagaramide. In this study, P. agardhii strains, namely, PCC7805 and No976, represented genotypes that lost the apn gene cluster previously but kept apnE as a remnant in the flanking region of the mvd gene cluster (Entfellner et al., 2017). Similarly, strains such as No2A, No66, No976, No365, PCC7805, and PCC7811 still contained remnants of mcyT indicative of the former mcy gene cluster . Notably, other genotypes such as NIVA-CYA126/8 maintained the six SM biosynthesis gene clusters, which can be considered representative of the ancestor of Lineages 1 and 2.
Aside from SM synthesis gene clusters, the presence or absence of chromosomal rearrangement processes led to a localization in the chromosome that was congruent with phylogeny (Figure 4), that is three strains of Lineage 2 showed a colocalization of six SM synthesis gene clusters within 1 Mbp. By contrast, strain NIVA-CYA126/8 and strain No758 still carried the pag gene cluster more distant from other SM synthesis gene clusters. At present, we can only speculate, but in general, the clustering of genes has been related to coordinate regulation, that is to activate a specific pathway on demand (Fischbach and Walsh, 2006). In addition, the colocalization of several SM synthesis pathways might aid in regulation, for example, in P. agardhii, NRPS and RiPP peptides are coproduced (Welker et al., 2004). Rohrlack and Utkilen (2007) reported that anabaenopeptin versus microviridin are produced constitutively despite marked changes in culture conditions. In general, those peptides are produced in high intracellular amounts approaching 1% of dry weight (Jüttner and Lüthi, 2008;Kosol et al., 2009). Such high concentrations may have intracellular effects, such as the covalent binding of MC variants to the free cysteine groups of abundant proteins (phycobilins) through the methyldehydroalanine group in pos. 7 of the MC molecule (Zilliges et al., 2011). Thus, the colocalization of NRPS and RiPP gene clusters might facilitate the regulation of individual peptide synthesis pathways, resulting in a more balanced intracellular peptide concentration.

CONCLUSION
The microevolution of the bloom-forming cyanobacteria Planktothrix spp. is related to a considerable variation in chromosome size. This variation in chromosome size, spanning almost 2,000 kbp, has been caused by gene duplication of IS elements or other genes to a minor extent only, while gene deletion and HGT events contributed more than eighty percent. A few functional adaptations and poorly characterized proteins including transposases were acquired by HGT. Six of seven peptide synthesis gene clusters occurred already in the ancestor of P. agardhii/P. rubescens, and became partly lost during the evolution of Lineage 1. Overall, IS elements have been involved in SM gene cluster loss processes, however, did not occur more frequently in the vicinity of SM synthesis gene clusters. By contrast, IS elements have been observed more frequent within breaking regions causing chromosomal rearrangements, thereby influencing the colocalization of SM synthesis gene clusters on the chromosome.

AUTHOR CONTRIBUTIONS
EE, RL, and RK performed the data analysis and drafted the manuscript. YJ and LD confirmed the plasmids by PCR analysis. JR and LD aligned the contigs from raw sequences. JB and RK assisted in exploitation of sequence information such as annotation, COG analysis and paralog identification and sequence submission to GenBank. RK supervised the work and reviewed the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This study was financed by the Austrian Science Fund (FWF) P24070 and P32193 to RK. RL was supported by the Office of China Postdoc Council and Helmholtz Zentrum München (HMGU). EE was supported by the Austrian Academy of Sciences (ÖAW) Ph.D. fellowship program. YJ was supported by the China Scholarship Council.

ACKNOWLEDGMENTS
Rosemarie Rippka assisted in the purification of genomesequenced strains. We are grateful to Anneliese Wiedlroither and Katharina Moosbrugger for maintaining strain cultures in axenic conditions. Guntram Christiansen assisted in DNA preparation and sequencing. Hanna Pritsch assisted in the preparation of circular genomic plots. The authors would like to thank Enago (www.enago.com) for the English language review. Two reviewers provided helpful suggestions to an earlier draft of this manuscript.