Integrated Metabolomic, Molecular Networking, and Genome Mining Analyses Uncover Novel Angucyclines From Streptomyces sp. RO-S4 Strain Isolated From Bejaia Bay, Algeria

Multi-omic approaches have recently made big strides toward the effective exploration of microorganisms, accelerating the discovery of new bioactive compounds. We combined metabolomic, molecular networking, and genomic-based approaches to investigate the metabolic potential of the Streptomyces sp. RO-S4 strain isolated from the polluted waters of Bejaia Bay in Algeria. Antagonistic assays against methicillin-resistant Staphylococcus aureus with RO-S4 organic extracts showed an inhibition zone of 20 mm by using the agar diffusion method, and its minimum inhibitory concentration was 16 μg/ml. A molecular network was created using GNPS and annotated through the comparison of MS/MS spectra against several databases. The predominant compounds in the RO-S4 extract belonged to the angucycline family. Three compounds were annotated as known metabolites, while all the others were putatively new to Science. Notably, all compounds had fridamycin-like aglycones, and several of them had a lactonized D ring analogous to that of urdamycin L. The whole genome of Streptomyces RO-S4 was sequenced to identify the biosynthetic gene cluster (BGC) linked to these angucyclines, which yielded a draft genome of 7,497,846 bp with 72.4% G+C content. Subsequently, a genome mining analysis revealed 19 putative biosynthetic gene clusters, including a grincamycin-like BGC with high similarity to that of Streptomyces sp. CZN-748, that was previously reported to also produce mostly open fridamycin-like aglycones. As the ring-opening process leading to these compounds is still not defined, we performed a comparative analysis with other angucycline BGCs and advanced some hypotheses to explain the ring-opening and lactonization, possibly linked to the uncoupling between the activity of GcnE and GcnM homologs in the RO-S4 strain. The combination of metabolomic and genomic approaches greatly improved the interpretation of the metabolic potential of the RO-S4 strain.


INTRODUCTION
The emergence of novel mechanisms of antimicrobial resistance is increasing and spreading worldwide, posing a challenge to mankind. The World Health Organization has stated that antibiotic resistance will be one of the biggest threats to human health in the future (Prestinaci et al., 2015). Multidrug-resistant organisms have become common not only in hospital settings but also in the wide community settings, suggesting that reservoirs of antibiotic-resistant bacteria are present outside hospitals, as reviewed by Munita and Arias (2016). This antibiotic resistance crisis has been attributed to the overuse and inappropriate use of these drugs and the lack of antimicrobial drug development by the pharmaceutical industry due to reduced economic incentives and difficult regulatory requirements (Ventola, 2015;Chokshi et al., 2019). Methicillin-resistant Staphylococcus aureus (MRSA) is the most common cause of nosocomial infections as it is very capable of developing antibiotic resistance (Chambers and DeLeo, 2009;Barbosa et al., 2020). Many challenges are faced by laboratories and clinicians in the diagnosis and treatment of MRSA infections, some of which were highlighted by Edwards et al. (2014). It is thus clear that the search for new bioactive compounds to combat antimicrobial resistance is a research priority.
Marine environments represent a largely unexplored source for the isolation of new microorganisms (Bhatnagar and Kim, 2010). They display a unique combination of environmental conditions and organisms with distinct metabolic capabilities to adapt and thrive (Bose et al., 2015;Stien, 2020). A large number of bioactive compounds have been isolated from marine organisms (Lane and Moore, 2011;Blunt et al., 2012;Jose and Jha, 2016), particularly Actinobacteria, which have been the main source of natural products in the past (Hassan and Shaikh, 2017;Lewis, 2020;Jose et al., 2021). Among the latter, the Streptomyces genus is well-known for its ability to produce a wide range of bioactive metabolites as well as antibacterial, anticancer, antifungal, antiparasitic, and immunosuppressive agents (Baltz, 2008;Olano et al., 2009), representing the most prolific source of bioactive metabolites that have been approved for clinical use, notably as antibiotics (Bérdy, 2005).
Traditionally, activity-guided fractionation of metabolite extracts, followed by purification and characterization of metabolites, has commonly been used for natural product research, but this approach often leads to the isolation of already known molecules. More recently, significant developments in genetics, genomics, and data analysis have greatly changed natural product research, leading to a new era in the emerging field of systems biology. Consequently, new avenues were opened for the discovery of novel compounds from actinomycetes (e.g., Dettmer et al., 2007;Gomez-Escribano et al., 2016;Jose and Jha, 2016;Cornell et al., 2018;Hu et al., 2020;Jose et al., 2021). Interestingly, metabolomics and genomics approaches have proven to be efficient and promising tools for defining phenotypes in a dynamic context, with the potential to reduce rediscovery rates (Khoomrung et al., 2017;Pye et al., 2017;van Der Hooft et al., 2020), and several tools have been designed for this purpose, as reported by Caesar et al. (2021). These approaches have been successfully applied to study the chemical diversity of marine bacteria and to uncover novel bioactive molecules (Macintyre et al., 2014;Schneider et al., 2018;Ye et al., 2020), despite the challenges encountered due to the complexity of biological matrices (Liu and Locasale, 2017). More recently, molecular networking, a tandem mass spectrometry (MS/MS) data organizational approach, has been introduced in the field of drug discovery , and the combination of system analyses involving multi-omics data and genome-scale, metabolic network models has greatly contributed to exploring bioactive Actinobacteria (Palazzotto et al., 2019) and has great potential to accelerate natural product discovery.
Here, we investigated the secreted metabolome of Streptomyces sp. RO-S4 in the quest for novel antimicrobial compounds against MRSA. For this purpose, we used a metabolomic approach based on ultra-performance highresolution tandem mass spectrometry (UPLC-HRMS/MS), followed by the creation of a molecular network using the Global Natural Product Social Molecular Networking (GNPS) analysis. These analyses were combined with a genomic analysis to refine and further annotate the structural hypothesis generated, and conversely, to understand the biosynthesis of the major angucyclines produced by this strain.

Isolation of the RO-S4 Strain
The RO-S4 strain was isolated from polluted seawater that was collected from the coastline of Bejaia City at S3 point (36 • 43'55.2 "N5 • 04'37.9"E) by Ouchene et al. (2022). It was isolated after filtration onto a 0.45 µm pore size membrane filter (Hirsch and Christensen, 1983) and laid onto solid starch casein agar medium (SCA, also named M2 medium by Jensen et al., 1991).

Molecular Identification of the RO-S4 Isolate
The initial molecular identification of the RO-S4 strain was based on the 16S rRNA gene sequence. Genomic DNA was extracted from the grown strain using the Wizard R Genomic DNA Purification Kit (Promega, USA) according to the manufacturer's instructions. PCR and sequencing were realized as previously described (Fagervold et al., 2013), utilizing universal primers recommended for bacteria 27Fmod: 5'AGRGTTTGATCMTGGCTCAG 3' and 1492Rmod: 5'TACGGYTACCTTGTTAYGACTT 3'. The PCR product was purified with a purification kit (Promega, USA), and then sequenced by the dideoxy termination reaction using an AB3130 DNAxl sequencer. The obtained 16S rRNA sequence was identified by comparison to the EZBiocloud database (https:// www.ezbiocloud.net/) recommended by Yoon and co-workers (Yoon et al., 2017). The 16S rRNA gene sequence of strain RO-S4 was deposited in the GenBank database under the accession number (MW448345).

Whole Genome Sequencing and Assembly of Streptomyces sp. RO-S4 Strain
Genomic DNA was isolated from 50 ml of RO-S4 grown in starch casein broth for 12 days at 28 • C with shaking (150 rpm/min). The DNA was extracted using the Bacteria Genomic DNA Extraction Kit (Promega, United States) according to the manufacturer's instructions. Illumina whole genome sequencing was performed by the Genotoul facility in Toulouse, France. Briefly, libraries (Truseq nano HT, Ilumina) were constructed using 200 ng of purified DNA and sequenced on a Novaseq 6000 sequencer (Illumina), generating 93 million paired 150 base pair (bp) reads. The entire dataset was assembled using SPAdes (v3.14.0) (Prjibelski et al., 2020) with the option "careful". The assembly was manually curated to remove contigs with low (<500) coverage and low (<55%) G+C content. Since a gene cluster of interest was truncated in a contig, we manually extended it using blast searches against raw reads and re-assembly of reads and of a downstream contig using the gap4 tool of the Staden package (http://staden.sourceforge. net). This final assembly was auto-annotated using PROKKA v. 1.14.6 (Seemann, 2014) using Streptomyces sp. Vc74B-19 protein descriptions and the annotation was manually curated prior to final submission to 1) add the ORF corresponding to GcnM in Streptomyces sp. LR32, 2) remove partial rRNA genes and possible adapters in contig ends. The genomic sequence (JAJQKZ000000000) was compared to that of close strains based on the 16S rRNA gene analysis described above, relatives (type strains) and genomes in the NCBI GenBank, based on top blastp hits using the five housekeeping genes described by Antony-Babu and coworkers (Antony-Babu et al., 2017). ANI values were calculated using the OrthoANI tool available through the EZbiocloud server (https://www.ezbiocloud.net/tools/orthoani). Further genome-based taxonomic identification using estimated DNA-DNA hybridization was performed using the TYGS server (Meier-Kolthoff and Goker, 2019).

Antimicrobial Assays
The antibacterial potential of the RO-S4 isolate was evaluated against methicillin-resistant S. aureus (MRSA) ATCC 43300 by the agar diffusion method (Shomura et al., 1979). The agar cylinders (8 mm in diameter) of the RO-S4 strain (starch casein agar medium, incubation for 14 days at 28 • C) were inserted into Mueller Hinton plates previously seeded with the targeted bacterium at 10 7 CFUs/ml. The plates were placed for 2 h at 4 • C to allow metabolites to diffuse prior to MRSA growth, and antibacterial activity was estimated by measuring the inhibition zone around the agar disc after incubation of the plates for 24 h at 37 • C.

Culture Strain and the Production of raw Extract
The production of bioactive compounds by the selected strain was carried out by agar surface fermentation, according to Nkanga and Hagedorn (1978). Briefly, the RO-S4 isolate was initially grown on an SCA medium. After 14 days, the mycelium layers were peeled off and extracted overnight in ethyl acetate (EtOAc), covering the entire surface, and the ethyl acetate extract was drawn. Subsequently, the organic extract was concentrated under a vacuum with a rotary evaporator at 40 • C and then stored at −80 • C until further analysis. A control, uninoculated medium, was extracted with the same protocol.

The Minimum Inhibitory Concentration of the RO-S4 Crude Extract
The minimal inhibitory concentration (MIC) of the RO-S4 ethyl acetate extract was evaluated against the methicillin-resistant S. aureus strain ATCC 43300 by the broth microdilution method according to the Clinical and Laboratory Standards Institute (CLSI) guidelines and as described by Wiegand et al. (2008). The assays were performed in serial dilutions (in triplicate) in 96well plates. Briefly, the EtOAc extract was diluted in DMSO and tested at different concentrations ranging from 256 to 0.5 µg/ml. The targeted bacterium culture was prepared in Mueller Hinton broth at 2•10 5 CFUs/ml. Afterward, 10 µl of the test bacterial culture was pipetted into each well. The last column (column 12) with no inoculum served as a sterility control, while wells that were not treated with the crude extract served as a negative control (column 11). The final volume of each well was adjusted to 100 µl. The microplate was shaken gently, then incubated for 24 h at 37 • C. Inhibition was evaluated as well, where the growth medium appeared clear, indicating that the test extract prevented the growth or killed the bacteria.

UHPLC-HRMS Profiling
The protocol for high-resolution Full MS data dependent MS 2 analyses was adapted from previous reports (Stien et al., 2019. Briefly, the UHPLC system was an Ultimate 3000 (Thermo Fisher Scientific) coupled to a Q-Exactive Focus Orbitrap detector and equipped with a Phenomenex Luna Omega polar C-18 column 150 mm × 2.1 mm, 1.6 µm. Here, crude bacterial and culture medium (starch casein broth) extracts were dissolved in MeOH at a concentration of 1.5 mg/ml. Pure methanol injections were used as blanks for metabolomics. In HPLC, the solvent system was a mixture of water (solution A) with increasing proportions of acetonitrile (solution B), both solvents modified with 0.1% formic acid. Here, the gradient was as follows: 5% of solution B for 5 min before injection, then from 1 to 12 min, a linear increase of B up to 100%, followed by 100% of solution B for 8 min.

MS/MS Molecular Networking Analysis and Spectra Annotation
The molecular network was constructed using the Global Natural Product Social Molecular Networking (GNPS) platform available at (https://gnps.ucsd.edu) as recommended by Wang and collaborators , using the molecular networking (MN) tool. The MS 2 data of the crude extract, solvent (blank), and culture media were converted from the RAW into mzXML file format using the Proteowizard MSConvert tool version (3.0.20104), then uploaded to GNPS. For MN construction, the precursor ion mass tolerance was set at 0.005 Da and the MS 2 fragment ion tolerance was set at 0.01 Da. A network was created where edges were filtered to have a cosine score above 0.77 and 11 or more matched peaks. The maximum size of a molecular family was set at 85. The MS 2 spectra in the network were searched against the 'GNPS spectra library'. All matches between network and library spectra were required to have a score above 0.7 and at least 6 matched peaks. The visualization of the molecular network was performed in Cytoscape (3.8.0), which allowed its visualization as a network of nodes and edges (Otasek et al., 2019). Redundancies and adducts were cleared manually. In Figure 2, node numbers are consensus parent masses, node size is linked to the relative molecular ion intensity based on the peak area measured from the extracted ion chromatogram. Peak areas were measured automatically with the FreeStyle Genesis algorithm, sometimes modified manually if found unfitting, and then pasted manually into the Cytoscape table. This information was also used to create pie charts in which each portion represents the relative peak area of different isomers included in the same node (each GNPS node is a cluster of MS 2 spectra that may come from different isomeric protonated molecular ions). For nodes gathering protonated molecular ions and in-source fragments of higher molecular weight compounds, only protonated molecular ion integrations were included for peak area information. Any ion present in the network solely due to in-source fragmentation was given the arbitrary extracted ion intensity of 0 and was white in the network. Furthermore, spectra of interest were manually annotated using different databases and tools, including Sirius (Dührkop et al., 2019), Metfrag (Ruttkies et al., 2016), available at (https://msbi. ipb-halle.de/MetFrag/), Pubchem, Sci-Finder, and Mass Bank of North America (MoNa, https://mona.fiehnlab.ucdavis.edu/). Detailed spectral data are provided in supporting information (Supplementary Figures S1-S75), and the raw spectral files are available (see Data Availability).

Prediction of Secondary Metabolite Biosynthetic Gene Clusters in the RO-S4 Genome
The sequenced genome (DNA sequence as input) of the RO-S4 strain was analyzed for the prediction of secondary metabolites and biosynthetic gene clusters (BGCs) using the genome mining tool antiSMASH version 6.0.1 (Blin et al., 2021) available through (https://antismash.secondarymetabolites.org) using the "relaxed" option. Synteny plots of the RO-S4 angucycline BGC were performed using Clinker v 0.0.21 (Gilchrist and Chooi, 2021). We also submitted the PROKKA annotated GenBank formatted files using different annotation parameters to antiSMASH. All results and descriptions of the annotation pipelines are available (see Data Availability below).

Phylogenetic Analysis of UrdE and UrdM Homologs
Phylogenetic analysis of UrdE and UrdM homologs, enzymes involved in the oxidation of angucyclines (Rix et al., 2002) in RO-S4, and in the BGCs of other angucyclines and the ring-opening Baeyer-Villiger monooxygenase MtmOIV in the mithramycin BGC was performed using SeaView Version 4.6 (Gouy et al., 2010) and MEGA v.11 (Tamura et al., 2021). Briefly, amino-acid sequences were aligned using muscle and a mask was created using the gblocks options "allow smaller blocks positions" and "allow gap positions" followed by manual curation of the mask, and a tree was generated by maximum likelihood with the LG+F model and gamma distribution using MEGA v.11 based on 359 homologous positions. A neighbor-joining tree was constructed using the JTT substitution model. The robustness of both trees was evaluated by bootstrap analysis using 100 replicates. The raw alignments, the regions used for reconstruction, and the results of model testing are available (see Data Availability).

Initial Identification, and Antimicrobial Assays
The RO-S4 strain was isolated from the Bay of Bejaia City in Algeria. It grew well on the SCA medium, showing substrate growth typical for Streptomyces strains, with a brown powdery aspect and producing a dark-brown pigment ( Figure 1A). Antimicrobial activity was first evaluated against the methicillinresistant S. aureus ATCC 43300 strain by the agar diffusion method. It exhibited antagonistic activity against this bacterium with an inhibition zone estimated at 20 mm ( Figure 1B). A MIC value of 16 µg/ml was observed against the methicillinresistant Staphylococcus aureus ATCC 43300 strain. The 16S rRNA gene sequence indicated that the strain belongs to the Streptomyces genus, with 99.79% identity to Streptomyces albogriseolus NRRL B-1305 T .

Whole Genome Sequencing
The whole genome of Streptomyces sp. RO-S4 was sequenced using the Illumina Novaseq technology. The complete genome consisted of 7,497,846 bp in 129 contigs with an NC50 of 117,347 bp and a 72.4% G+C content. We were unable to obtain a full-length rRNA in our assembly. The closest genome to Streptomyces sp. RO-S4 sequenced from a type strain was that of S. albogriseolus JCM 4616 T [assembly GCA_014650475; 90.94% ANI; Estimated % DNA-DNA hybridization d 0 59.2 (55.6-62.8), d 4 41.7 (39.2-44.2) and d 6 55.7 (52.6-58.9); 0.06 difference in G+C content]. The genome-based tree places Streptomyces sp. RO-S4 clearly exterior to the clade containing S. albogriseolus JCM 4616 T . Overall, these values indicate strain RO-S4 might represent a new Streptomyces species.

Untargeted Metabolomic Analysis and Molecular Networking
The metabolomic profile of the active ethyl acetate (EtOAc) crude extract was investigated using UPLC-HRMS. An examination of the MS and collision-induced MS/MS (MS 2 hereafter) spectra of the main constituents of the mixture indicated that most metabolite molecular ions fragmented to yield a product at m/z 487.1600 corresponding to the formula C 25 H 27 O + 10 (calcd. 487.1599) (Figure 2). The same ion was also detected as a protonated molecular ion corresponding to compound 1 ( Table 1). Compound 1 was annotated either as aquayamycin or as fridamycin A or B (Figure 4) by various dereplication tools. A comparison with experimental spectra from the MoNA database confirmed that compound 1 was fridamycin A or its diastereomer, fridamycin B. The MS 2 spectra of fridamycin A and aquayamycin are very similar (Supplementary Figure S3). Nevertheless, two fragment ions are diagnostic. These are the ions at m/z 347.09 and 427.14, the relative intensities of which are very low in fridamycin A when compared to those of aquayamycin. Fridamycin A has been previously shown to have anti-MRSA as well as antidiabetic activities (Yoon et al., 2019;Sabido et al., 2020).
As mentioned above, the [1+H] + ion was also produced as a fragment resulting from the in-source fragmentation of many metabolites in the profile. The MS 2 spectra of all ions lead to a 487.1599 fragment clustered in the same node in the molecular network (MN). The MS 2 spectrum obtained for the protonated molecular ion of compound 1 was compared to all other MS 2 spectra of equimassic ions found whenever other more complex metabolites were fragmented in the source. The fragmentation patterns were all similar, confirming that many metabolites in the MN were fridamycin A or B analogs. It was deduced that the strain biosynthesizes the central fridamycin core and then adds various substituents to generate its diverse products. Notably, the pentacyclic aquayamycin subunit was not detected in any of the annotated metabolites. This observation was supported by the presence of a biosynthetic gene cluster very close to that of Streptomyces sp. CNZ-748 (Shang et al., 2021), which also produces a majority of fridamycin-like compounds.
The parameters for MN were set to construct the best representative network containing all fridamycin analogs (Figure 2). The MN was constituted of three groups of ions. In the first group, the annotation was propagated from 1 as follows. Compound 2 is a dehydrofridamycin based on its molecular formula and MS 2 spectrum. The molecular formula of compound 3 was C 27 H 28 O 11 , which might be annotated as an acetyl-fridamycin A or B, while the position of the acetyl group could not be inferred from MS 2 . Compound 4, whose molecular formula was C 30 H 36 O 10 (m/z for [M+H] + 669.2903, calcd. 669.2905) was a fridamycin bearing a C 5 H 10 substituent. A fridamycin isopentyl ester was thought to be a reasonable putative structure based on the biosynthetic considerations below. Compound 5 could not be annotated more precisely than just with its molecular formula, but its MS 2 spectrum is also one of a fridamycin analog. These considerations indicated that 5 was new to Science.
In the second part of the fridamycin MN, compound 6a protonated molecular ion was at m/z 597.1973, corresponding to the formula C 31 H 33 O + 12 (calcd. m/z 597.1967). The formula and fragmentation pattern were consistent with the annotation of 6a as fridamycin D, previously reported to have antibacterial activity (Maskey et al., 2003). Annotation as fridamycin D was also supported by Sirius. An isomer of 6a (6b) was also detected at a different retention time in smaller relative proportions in the strain's metabolomic profile. Another major constituent of the strain metabolome was compound 7a, whose protonated molecular ion at m/z 711.2645 corresponded to the formula C 37 H 43 O + 14 (calcd. 711.2647). The molecular formula indicated that angucycline 7a may be vineomycin C, a potent inhibitor of inducible nitric oxide synthase (Alvi et al., 2000). It was annotated as vineomycin C by Sirius as well, although with a 57% CSI finger ID score. In the collision-induced MS 2 spectrum of 7a protonated molecular ion, the rhodinose (or its distereoisomer amicetose) and the aculose oxonium ions were present at m/z 115.0756 and 111.0443 (Supplementary Table S1), FIGURE 2 | Molecular network generated with the GNPS Molecular Networking tool. The diameter of the nodes represents the total extracted ion chromatogram integration of the corresponding ion peak(s) and the blue/red pie chart represents the proportions of each isomer in the cluster. The colorless nodes are clusters of MS 2 spectra of ions produced by in-source fragmentation of diverse compounds and were thus neither annotated nor integrated (integration was set to 0). Only protonated molecular ions were considered for integration measurement.
respectively, while the presence of the fridamycin A aglycone was ascertained based on the common fridamycin A fragment ions (Supplementary Figure S17). Nonetheless, the first fragmentation steps in the MS 2 spectrum suggested that the sugar sequence might be different from that of vineomycin C. The protonated molecular ion lost both water (m/z for C 37 H 41 O + 13 693.2258) and a C 6 H 12 O 3 fragment (m/z for C 31 H 31 O + 11 597.1938), which could only be assigned to the rhodinose/amicetose moiety. Hence, the rhodinose could not be placed between the aglycone and the aculose, as in vineomycin C, and compound 7a must be considered new to Science unless the published structure of vineomycin C requires revision. Possible annotations are reported in Figure 3. These alternative proposals could account for the preferential fragmentation observed in the MS 2 spectrum. The proximity of 7a with fridamycin D (6a) in the MN suggested that the most probable annotation might be the one in which aculose underwent a Michael addition, as in 6a (Figure 3).
A minor isomer of 7b also appeared in the MN under the same cluster. Based on the same considerations as for 7a, 7b could not be annotated as vineomycin C either. It was therefore new to Science. Compound 8, a protonated molecular ion with an m/z of 545.1808 was found next to compounds 6 and 7, corresponding to the formula C 31 H 29 O + 9 (calcd. 545.1806). The formula suggested that 8 might be marangucycline B (Song et al., 2015). Sirius also annotated this compound as marangucycline B, although with 58% confidence only. Marangucycline B is an analog of fridamycin D with a modified aglycone presenting weak antibacterial activity but strong in vitro cytotoxicity activity against cancer cell lines (Song et al., 2015). The presence of an aculose subunit has been confirmed by MS 2 (m/z 111.0442), but the fragmentation pattern was not attributable to marangucycline B. Hence, while this compound is probably not marangucycline B it could not be annotated further. Angucycline 9 was also new to Science. Its protonated molecular ion at m/z 667.2753 corresponded to the formula C 36 H 43 O + 12 (calcd. 667.2749). In the MS 2 spectrum, the fragmentation from the protonated molecular ion to the product at m/z 579.1851 corresponded to the loss of a neutral fragment the formula of which was C 5 H 12 O (Supplementary Figure S23). This group has been annotated above as an isopentanol, probably linked to the carboxylic acid moiety. A carbon monoxide loss was also detected from m/z 561.1750 to 533.1812, indicating that the carboxylic acid side chain of the aglycone should be present. The MS 2 spectrum also showed a hydrated aculose oxonium ion at m/z 129.0549 and its dehydrated form at m/z 111.0443. For all these reasons, compound 9 was annotated as shown in Table 1. The protonated molecular ion of compound 10 was found at m/z 781.3438, a mass that corresponds to the formula C 42 H 53 O + 14 (calcd. 781.3430). An isopentanol, a rhodinose/amicetose, and an aculose were pointed out in the fragmentation spectrum of the parent ion. Rhodinose and aculose oxonium ions were also detected at m/z 115.0754 and 111.0442, respectively, showing that 10 should be annotated as the isopentanol ester of 7a/b (Figure 4).
The formula of compound 11a protonated molecular ion was found to be C 42 H 53 O + 14 (exp. m/z 675.2438, calcd. 675.2436). This corresponded to the grincamycin H molecular formula (Zhu et al., 2017). Nonetheless, the MS 2 spectrum indicated that 11a successively lost rhodinose/amicetose and aculose, and therefore could not be annotated as grincamycin H. The mass of the protonated aglycone after oses fragmentation was m/z 451.1388, corresponding to a doubly dehydrated fridamycin A. We hypothesized that one hydroxyl group of the olivose side chain might be dehydrated and that the second H 2 O loss might be explained by ring closure of the lactone as in urdamycin L (Rix et al., 2003), a possibility supported by the analysis of the biosynthetic gene cluster discussed below. Compound 11b was presumably a minor diasteroisomer of 11a due to the high similarity of their respective MS 2 spectra. Compound 12 protonated molecular ion at m/z 677.2597 indicated the formula C 37 H 41 O + 12 (calcd. 677.2593). The MS 2 fragmentation pattern showed the successive losses of rhodinose/amicetose and aculose. The masses of the fragments generated by cleavage of the aglycone were all shifted by 2 Da relative to those of compound 11a, indicating that the two additional hydrogens were in the center of the aglycone. Thus, it could be deduced that 12 was likely the hydroquinone form of 11a. Compound 13 is a minor metabolite and could not be annotated with reasonable confidence. The molecular formula of compound 14 was C 49 H 56 O 17 (exp. m/z 917.3601 ([M+H] + ), calcd. 917.3590). The MS 2 spectrum indicated a loss of an aculosyl-rhodinose (or aculosyl-amicetose) moiety. Then the product at m/z 675.2443 lost the neutral group C 6 H 8 O (a dehydro-rhodinose), indicating that the aculosyl-rhodinose moiety was linked to another rhodinose. Then the aglycone ion at m/z 451.1398 was produced by the loss of another aculose. The molecular weight of this aglycone ion (fridamycin A−2 H 2 O) along with the proximity of other lactonic aglycones in the MN spoke in favor of 14 also being a lactone derivative, as reported in Table 1. The fragmentation spectrum of compound 15 was not very clear and 15 could not be annotated with enough confidence. At m/z 901.3653, compound 16 protonated molecular ion indicated the formula C 49 H 57 O + 16 (calcd. 901.3641). In MS 2 , the ion 16+H + lost an aculosylrhodinose group to give the product at m/z 659.2499, which then lost dehydro-rhodinose. Further dehydration produced an ion at m/z 545.1790 which again lost aculose to yield the aglycone ion at m/z 435.1471. This aglycone was doubly dehydrated compared to fridamycin A, indicating that the C-olivosyl group in 16 may be dehydrated. The structure proposed in Table 1 appeared to be a reasonable hypothesis for 16. The molecular formula of compound 17 is C 49 H 58 O 18 (exp. m/z 952.3960 for [M+NH 4 ] + , calcd. 952.3961). This molecular formula and sodium adduct fragmentation pattern, in which two successive aculosyl-rhodinose losses were recorded, were compatible with the annotation of 17 as vineomycin B2, previously reported to have antibacterial activity (Omura et al., 1977;Imamura et al., 1982).
In the third part of the fridamycin MN, compounds 18a and 18b were annotated as isomers with the molecular formula C 31 H 34 O 12 (exp. m/z 599.2124 for [M+H] + , calcd. 599.2123). Both isomers fragmented extensively in the ESI source to produce the fridamycin A protonated molecular ion [1+H] + losing C 6 H 8 O 2 , i.e., a dehydro-cinerulose A moiety. The cinerulose A oxonium ion was also detected in MS 2 , while the aglycone fragmentation was very similar to what was recorded for [1+H] + . Angucyclins 18a and 18b were therefore annotated as cinerulosyl-fridamycin A or B. The position of the cinerulosyl side chain was not determined and may not be identical for both isomers. Compounds 19a and 19b were annotated as isomers with the molecular formula  Both isomers also fragmented extensively in the ESI source to produce the fridamycin A protonated molecular ion [1+H] + losing C 6 H 10 O 2 , i.e., dehydro-rhodinose/amicetose subunit, the corresponding oxonium of which was also present in MS 2 . As mentioned above, the MS 2 spectrum of the aglycone protonated ion formed by in-source fragmentation was identical to the one of [1+H] + , therefore confirming that 19a or b should not be annotated as grincamycin L (Bao et al., 2018). Instead, 19a and 19b were annotated as rhodinosyl-and/or amicetosylfridamycin A/B and should be considered new in the field. Angucycline 19a was one of the major constituents in the profile of the strain. spectrum showed the loss of 1 rhodinose/amicetose subunit, yielding a product at m/z 558.1983 (C 28 H 32 NO + 11 ), which then successively lost water, CH 2 O 2 (formic acid or H 2 O+CO), and C 2 H 5 N to generate a protonated didehydro-fridamycin A at m/z 451.1393. This fragmentation pattern was compatible with 24 being an alanine amide of 19, as shown in Table 1. The presence of an alanine subunit was also supported by the fragment ion at m/z 90.0550, corresponding to an alaninium ion (Tan et al., 2010). Compound 25 molecular formula was C 37 H 46 N 2 O 14 (exp. m/z 743.3026 for [M+H] + , calcd. 743.3022). In-source fragmentation indicated the successive loss of a dehydro-rhodinose/amicetose group and a C 6 H 10 N 2 O 2 subunit. The MS 2 spectrum highlighted the loss of one dehydro-rhodinose/amicetose and two alanine subunits to generate the protonated didehydro-fridamycin A at m/z 451.1393 (see Supplementary Figure S65). Overall, compound 25 could be annotated with high confidence as shown in Table 1. Neighboring minor compound 26 in the MN could not be annotated. Compound 27 was an analog of 25 with one rhodinose/amicetose subunit more, the presence of which could be ascertained by examination of both in-source fragmentation scheme and the collision-induced MS 2 spectrum. Compound 28 molecular formula was C 37 H 40 O 14 (exp. m/z 709.2490 for [M+H] + , calcd. 709.2491). The MS 2 spectrum clearly indicated that the protonated molecular ion lost both dehydrorhodinose/amicetose and a C 6 H 8 O 4 neutral fragment, therefore confirming that this compound was new and should not be annotated as saprolmycin B (Nakagawa et al., 2012). However, the annotation remained ambiguous and 28 were not annotated further. Compounds 29 and 30 were analogs of 28; 29 did not have the rhodinose/amicetose subunit, while 30 had an aculose in place of the rhodinose/amicetose. All the annotated compounds' MS 2 spectra were provided in the supporting information (Supplementary Figures S1-S75), and a list of sugars potentially linked to the angucyclins is provided in Supplementary Table S1.

Secondary Metabolite Biosynthetic Gene Clusters of Streptomyces sp. RO-S4
The genome of the RO-S4 strain was analyzed by the antibiotics and Secondary Metabolite shell (antiSMASH) to determine its putative biosynthetic capabilities. A total of 19 putative biosynthetic gene clusters (BGCs) were annotated (Supplementary Table S2), including three types of polyketide synthases [Type 1 (T1PKS), Type 2 (T2PKS), and Type 3 (T3PKS)] polyketide synthases, class I Lanthipeptides, a Lassopeptide, a Ribosomally Synthesized and Post-Translationally Modified Peptide (RiPP), an Ectoine, a Terpene, Phenazines, and a Butyrolactone BGC. In addition, four hybrid clusters were recovered that were composed of 1). T2PKS, Oligosaccharide Phenazine, Siderophore, 2). RiPP-like, betalactam, Terpene, 3). Two hybrid Non-ribosomal Peptide Synthase (NRPS), and T1PKS. The analysis showed that 17 out of the 19 identified BGCs showed high content similarity with known BGCs, five of which (3, 5, 7, 13, and 17) showed 100% content similarity with known BGCs. Two clusters (Cluster 10 and 16) were annotated as orphan BGCs for which no homologous gene clusters could be identified, suggesting that they could be responsible for the biosynthesis of novel natural products or natural products with no characterized BGCs. Many of these clusters are known to encode genes linked to the production of biologically active natural compounds, such as antibiotics. Notably, we have recovered a T2PKS BGC very similar to those linked to the biosynthesis of angucycline compounds, consistent with the metabolomic analysis.

Description of the Angucycline Biosynthetic Gene Cluster
AntiSMASH analysis using an unannotated genomic DNA sequence revealed that Cluster 2 in Contig RO-S4_2 of the assembly displayed a high synteny (> 65% common genes) to that of BGCs linked to several angucycline compounds, such as grincamycin (97%), saprolmycin E (83%), saquayamycin A (75%), landomycin A (71%), and saquayamycin Z (67%). All these compounds share a common tetracyclic angular benz[a]anthraquinone aglycone. Due to the predominance of tricyclic aglycones ("open" aglycone(s) hereafter) among the major metabolites of the RO-S4 strain, we performed an in-depth analysis of Cluster 2. Cluster 2 contains genes putatively involved in the biosynthesis and modification of the aglycone core. A typical set of genes responsible for an angucycline core assembly named "minimal PKS" has been identified, supporting the synthesis of angucycline-like molecules by its cluster. This included three genes: a ketoacyl synthase α (LRR80_00487), a ketoacyl synthase β/chain length factor CLF (LRR80_00488), and an acyl carrier protein (ACP) (LRR80_004989). Two possible cyclase genes (LRR80_00486 and LRR80_00491) were also annotated, which are likely responsible for the polyketide chain cyclization into the benz[a]anthracene structure. In addition, this cluster harbors two genes encoding oxygenase enzymes (LRR80_00485 and LRR80_00492) probably involved in the modification of the aglycone and possibly in the lactonization and opening of the angular aglycone cycle (see discussion below). (Keto)reductasecoding genes, including LRR80_00490 and LRR80_00470, were annotated to exhibit a high degree of sequence similarity to known enzymes involved in the modification of aromatic polyketides. Three genes (LRR80_00495, LRR80_00496, and LRR80_00498) likely associated with the glycosylation steps showed high similarity to genes coding for glycosyltransferases (GTs) in other angucyclines. All the annotated genes involved in the BGC of Cluster 2 and their homologs are listed in Supplementary Table S3.
The closest BGCs to RO-S4 Cluster-2 are those of the grincamycin-producing Streptomyces lusitanus SCSIO LR32 (Gcn LR32) (Zhang et al., 2013) and Streptomyces sp. CZN-748 (Gcn CZN-748 (Shang et al., 2021) graciously provided by the authors). We performed a synteny analysis comparing the three BGCs and notably, when PROKKA (Seemann, 2014)-annotations were used for RO-S4 and CZN-748, the GcnM ORF was absent in these strains, as previously described by Shang and co-workers (Shang et al., 2021). In contrast, the BGC annotated by AntiSMASH from genomic sequences identified these ORFs. This difference could be explained by a possible tRNA ala in the region coding the GcnM orthologs (Supplementary Figure S76). In addition, the comparison between the BGC of LR32 identified two missing genes (GcnU and GcnT) in the BGC of RO_S4_2, whereas the region downstream of GcnS8 was not present in the available BGC of strain CZN-748 ( Figure 5). Cluster 2 of RO-S4 showed near complete synteny and a higher average amino acid identity (99.8% for 28 common ORFs) to the grincamycin BGC of CZN-748, which fits the observation that both strains produce a majority of "open" aglycone angucyclines, whereas LR32 (93.2% average amino acid identity for 28 common ORFs) produces primarily tetracyclic angucyclines, as previously noted by Shang et al. (2021).

DISCUSSION
Here we report the use of a combined genomic-metabolomic approach to investigate the antagonistic potential of the Streptomyces sp. RO-S4 strain isolated from a polluted marine environment. Based on 16S rRNA gene sequencing and genomic analysis, the strain belongs to the genus Streptomyces, likely meriting to be considered the type strain of a new species. RO-S4 extracts show inhibitory activity against the methicillinresistant Staphylococcus aureus ATCC 43300 strain with a MIC of 16 µg/ml.
Metabolomic analyses of the crude extract produced by the RO-S4 strain using mass-spectrometry-based molecular networking revealed diverse angucycline derivatives as dominant products, which have mostly (but not exclusively) been linked to the Streptomyces genus (Hu et al., 2016;Peng et al., 2018). Angucyclines represent the largest group of Type 2 PKS natural products produced by actinobacteria, and they show diverse pharmacological activities including cytotoxicity, antitumor, antibacterial, and antiviral properties (Rohr and Thiericke, 1992;Kharel et al., 2012). Two of the predominant ions (fridamycin A and D) have previously shown anti-MRSA activity (Sabido et al., 2020), albeit with a higher MIC than raw extracts of Streptomyces sp. RO-S4, but also cytotoxic activity (e.g., Bao et al., 2018), like many other angucyclines. Combined with the lower ion integrated peak sizes of the novel molecules, these factors guided our choice of not further purifying and testing novel molecules in this study. In silico analysis or synthesis will probably be better strategies to evaluate the biotechnological potential of these molecules in the future.
We have demonstrated that many of the compounds identified by our untargeted metabolomic analysis are novel to Science, and this high diversity of novel molecules can be explained by the ability of HRMS to highlight minor compounds even though some of the major metabolites are also new to Science. Our annotation of the RO-S4 angucycline-like compounds was further supported by genomic analysis.
The combination of metabolomic and genome analysis provides some interesting insights into the biosynthesis of angucyclines. Cluster 2 shows high content, synteny, and sequence similarity to previously described BGCs of grincamycin-like products and is certainly responsible for producing the structures predicted by the MN analysis. Since the RO-S4 strain and Streptomyces CNZ-748 produce primarily tricyclic glycosylated structures, while LR32 and many other strains produce tetracyclic angucyclines, we were interested in possible enzymatic processes that could lead to the biosynthesis of these tricyclic aglycons (Shang et al., 2021). Several studies have indicated that oxygenase complexes are required for cyclic C-C bond cleavage, in particular Baeyer-Villiger type oxygenases (Fürst et al., 2019). In the case of RO-S4 products, we hypothesized that this reaction would take place by oxidation of the C12b-C1 single bond of the aglycone (an UWM6-like molecule), prior to or after glycosylation, and subsequent hydrolysis of the lactone (Figure 6). We have therefore focused the analysis on putative Baeyer-Villiger mono-oxidases (BVMOs) in Cluster 2 of the RO-S4 strain.
The observation that-as in the case of the grincamycin BGC of strain CZN-748-annotation with PROKKA (Seemann, 2014) failed to identify an ORF downstream of the T2PKS synthases and the cyclase putatively involved in the generation of the angular cycle [the region coding for GcnM in S. lusitanus LR32 (Zhang et al., 2013)], suggested that this ORF could be involved in the ring opening process. Since AntiSMASH identified an ORF both in RO-S4 (LRR80_00492) and CNZ-748 (CTG1-52) when a genomic sequence was used as the query, we determined that PROKKA failed to annotate the ORF since its Aragorn (Laslett and Canback, 2004) step identified a putative tRNA ala (Supplementary Figure S76) in its complementary strand. However, the facts that the putative tRNA is in the reverse strand, that it contains mismatches in the side hairpins, and that four more canonical tRNA ala are coded in the genome, suggest that this tRNA ala could have been misidentified. We performed an RNA fold analysis (Gruber et al., 2008) that identified a highly probable and low-entropy hairpin loop that could potentially affect the translation of this ORF (Supplementary Figure S77) and possibly decreases the production of the coded protein.
The LRR80_00492 ORF codes a hybrid FAD-dependent oxidase-reductase (GcnM in the grincamycin BGC) homologous to UrdM, that has been linked to C12b hydroxylation in the biosynthesis of urdamycins by S. fradiae TÜ 2717 (Faust et al., 2000;Rix et al., 2003). Furthermore, a mutant with an inframe deletion of the reductase domain of UrdM produced small amounts of urdamycin-L, a product containing oxygen FIGURE 5 | Comparison between Cluster 2 of Streptomyces sp. RO-S4, the Grincamycin Gene Cluster of Streptomyces lusitanus SCSIO LR32, and the BGC of Grincamycin-producing Streptomyces sp. CNZ-748. Gene neighborhoods representative of the compared BGCs are shown aligned with an arbitrary color scheme using clinker to highlight the conserved genes. The arrow's colors represent the gene clusters identified by Clinker, which encode similar proteins. The lines linking the arrows show gene-encoding proteins that share more than 80% sequence identity. Missing genes in RO-S4 compared to LR-32 were circled in red.
between C12b and C1, leading to the hypothesis that UrdM is involved in the C12b-C1 bond oxidation and subsequent lactone Baker-Venkataraman rearrangement leading to the tetracyclic skeleton of aquayamycin-like angucyclines (Figure 6) (Rix et al., 2003). However, in this model, low levels or absence of LRR80_00492 due to the secondary structure described above would not lead to fridamycin-like aglycones as those in RO-S4 (Table 1), and another BVMO should be responsible for the oxidation of the C12b-C1 bond in UWM 6 (or other intermediates) leading to compounds 11, 12 and 16 and fridamycin-like aglycones.
The BGC of RO-S4 and of all grincamycin-producing strains contain a second FAD-dependent putative BVMO product of LRR80_00485 (GcnE in the grincamycin BGC) that is homologous to FAD-dependent monooxygenases involved in angucycline modifications (e.g., UrdE, PgaE, BexE, CabE; Supplementary Figures S78, S79). In earlier studies, UrdE was hypothesized to directly hydroxylate different positions of the aglycon (C6, C12, C12b) (Decker and Haag, 1995;Faust et al., 2000;Rix et al., 2002) in urdamycin biosynthesis, but more recent evidence have suggested that its homolog PgaE might also oxidize the C12b-C1 bond of prejadomycin leading to the tricyclic aglycons of gaudimycins D and E (Guo et al., 2015). In vitro assays using enzymes heterologous expressed in E. coli also showed that PgaE/CabE oxidizes UWM6 and is dependent on the PgaM red homolog CabV to complete the hydroxylation of UWM6 at C12b (Kallio et al., 2011).
Since in the RO-S4 BGC there are two possible FADdependent mono-oxidases that could be involved in the oxidation of the C12b-C1 bond and subsequent ring opening of RO-S4 and CNZ-748, we attempted to compare the sequences of both ORFs to different enzymes oxidizing analogous cyclic compounds, including MtmOIV, shown to perform Baeyer-Villiger oxidation and ring opening of premithramycin B to mithramycin. Blastp analyses indicated that the LRR80_00492 (UrdM like)/MtmOIV alignment was shorter and had a lower overall score, but a higher number of identical positions, whereas the LRR80_00485 (UrdE-like)/MtmOIV alignment was longer and had a higher total score, but with fewer identical positions. Phylogenetic analyses including UrdE homologs and the oxidase portion of UrdM homologs separated these oxidases into two groups, with maximum likelihood and distance methods showing that LRR80_00485 was slightly closer to MtmOIV than LRR80_00492, and in a subclade including PgaE (Supplementary Figures S78,  S79).
In aggregate, these results lead to different possibilities that could be tested in the future using genetic modifications of the different ORFs in the angucycline BGC of the RO-S4 strain or CNZ-748. Hypothesis (1): the product of LRR80_00492 would be solely responsible for the oxidation of the C12b-C1 bond. This hypothesis is supported by the prediction that this enzyme has several AAs unique to RO-S4 and CNZ-748. On the other hand, since the reductase portion of the enzyme is present, one would expect that tetracyclic angucyclines with a hydroxylated C12b would be produced. Hypothesis (2): the most likely hypothesis based on our work on the RO-S4 strain is that the LRR80_00492 ORF is inactive due to the presence of a tRNA ala in the coding region or that its translation is affected by secondary structure, in which case LRR80_00485 would generate the lactone via Baeyer-Villiger oxidation, possibly allowing for a later opening of the ring. Other alternative hypotheses that could be related to the ring opening are Hypothesis (3): that the LRR80_00492 ORF would be partly transcribed due to the mRNA secondary structure, which would allow its oxidase portion to be transcribed but not the reductase, much like the case with the UrdM partial knockout mutant that produces urdamycin L (Rix et al., 2003) and is responsible for C12b-C1 single bond oxidation and Hypothesis (4): another BVMO enzyme not in the BGC could be responsible for ring opening. The genome of RO-S4 codes for another enzyme with a slightly higher blastp score when queried with MtmOIV. That ORF is present in a hybrid NRPS-T1PKS hybrid BGC similar to that of polyoxypeptin A BGC (Du et al., 2014). However, the function of that homolog (ORF4) in the polyoxypeptin A BGC has not yet been described.
Hence, based on similarities between known GTs and the sugars annotated by metabolomic analysis, we hypothesize that a D-olivose is added to C9 by LRR80_00498 and further O-glycosylated with rhodinose/amicetose by one of the three glycosylases, and in some cases (ex. compound 9), experiencing a Michael addition, or further glycosylation as in compounds 19 and 27 (and in previously described angucyclines). Interestingly, all compounds glycosylated at position C3 in RO-S4 appear to have a rhodinose/amicetose as the sugar moiety, as do all grincamycins. It is worth noting that the composition and varying lengths of the oligosaccharide chains in angucyclines have a considerable impact on their biological potential, as previously reported by Elshahawi et al. (2015).
In addition to the rare lactonized D cycle, we also detected other modifications that, to the best of our knowledge, have so far not been shown for fridamycin-like molecules, including alanyl amidation(s) (compounds 25-28). Amide modifications have been previously shown in fridamycin G (Myronovskyi et al., 2016), and fridamycin I produced by Actinokineospora spheciospongiae (Tawfike et al., 2019). Fridamycin G contains an ethanolamine moiety and was produced heterologously, and the amidation was attributed to a process linked to the host since the BGC source (S. cyanogenus S136) does not produce it. Fridamycin I contains a benzylamine moiety, the biosynthetic origin of which was not discussed. As compounds 25-28 contain multiple alanine moieties, we speculate that fridamycin-like molecules could undergo peptide-like elongation with aminoacids (alanine or L-p-hydroxyphenylglycine in the case of fridamycin I), perhaps via an NRPS in a manner analogous to what has been described in the biosynthesis of actinomycin-D (Keller et al., 2010). The presence of an hpgT homologthe enzyme linked to L-p-hydroxyphenylglycine production in actinobacteria (Hubbard et al., 2000)-in the Actinokineospora spheciospongiae, supports this hypothesis.

CONCLUSION
Our combination of untargeted UPLC-HRMS/MS metabolomic, molecular networking, and genomic analyses generated many structural and biosynthetic hypotheses for targeted structural determination and genetic manipulation, possibly using recently developed gene editing approaches (e.g., Cobb et al., 2015;Tong et al., 2020). This approach does not substitute traditional isolation-NMR structural analyses and genetic manipulations, which ultimately will be needed to confirm these hypotheses. However, as it yields structural information of many minor compounds linked to BGCs in a relatively fast manner, it can streamline and accelerate pipelines of discovery of new drugs, biosynthetic pathways, and enzymes, and hopefully inspire the discovery of novel antibiotics effective against multiresistant microorganisms.