Metabolomics and Marine Biotechnology: Coupling Metabolite Profiling and Organism Biology for the Discovery of New Compounds

The high diversity of marine natural products represents a promising opportunity for drug discovery, an important area in marine biotechnology. Within this context, high-throughput techniques such as metabolomics are extremely useful in unveiling unexplored chemical diversity at much faster rates than classical bioassay-guided approaches. Metabolomics approaches enable the study of large sets of metabolites, even if they are produced at low concentrations. Although metabolite identification remains the main bottleneck in metabolomics, bioinformatic tools such as molecular networks can lead to the annotation of unknown metabolites and the discovery of new compounds. A metabolomic approach in drug discovery has two major advantages: it enables analyses of multiple samples, allowing fast dereplication of already known compounds, and it provides a unique opportunity to relate metabolite profiles to organisms’ biology. Understanding the ecological and biological factors behind the production of a given metabolite can be extremely useful in enhancing compound yields, optimizing compound extraction, or selecting bioactive compounds. Metazoan-associated microbiota are often responsible for metabolite synthesis; however, classical approaches only allow the study of metabolites produced by cultivable microbiota, which often differ from the compounds produced within the host. Therefore, coupling holobiome metabolomics with microbiome analysis can bring new insights into the role of microbiota in compound production. The ultimate potential of metabolomics lies in its coupling with other “omics” (i.e., transcriptomics and metagenomics). Although such approaches are still challenging, especially in non-model species whose genomes have not been annotated, they are extremely valuable in elucidating gene clusters associated with biosynthetic pathways and will certainly become increasingly important in marine drug discovery.


INTRODUCTION
Natural products (i.e., compounds produced by living organisms) are key to drug development, and have proved especially useful in the development of anticancer and anti-infective agents (Rodrigues et al., 2016; Liu et al., 2019). The marine environment, which harbors a large and still largely unexplored biodiversity, is an extremely rich source of novel and structurally unique compounds with antibacterial, antifungal, antiviral, antiparasitic, antitumor, anti-inflammatory, antioxidant and immunomodulatory activities (Abdelmohsen et al., 2017; Carroll et al., 2020; Mayer et al., 2020). Some marine organisms such as sponges are amongst the most prolific sources of natural products, suggesting that the marine environment is a reservoir of natural products with high relevance for marine biotechnology and drug discovery (including against resistant pathogens) (Thakur and Müller, 2004; Abdelmohsen et al., 2017; Liu et al., 2019). The development of high-throughput techniques such as genomics and metabolomics (i.e., the study of all small molecules, <2,000 Da, in an organism) promises to revolutionize natural product discovery, which, according to some estimates, could outpace antibiotic discovery at its peak in the 1950s (Fortman and Mukhopadhyay, 2016). The use of these techniques not only increases the speed of natural product discovery and decreases the rediscovery rate (Wolfender et al., 2019; van der Hooft et al., 2020), but also provides a wide array of unprecedented opportunities to discover unexplored chemical diversity and elucidate the biological and molecular mechanisms involved in metabolite production (e.g., Cantrell et al., 2019; Mohanty et al., 2020). Here, we provide a synthetic overview of the advantages of using metabolomic approaches in marine natural product discovery and marine biotechnology.
Metabolomics is a scientific field at the interface of different disciplines (chemistry, bioinformatics, ecology, microbiology, and systems biology); therefore, this mini-review aims to illustrate how the multidisciplinarity of metabolomics is a key asset for the advancement of marine natural product discovery.

SPEEDING UP MOLECULE IDENTIFICATION
Bioassay-guided fractionation (i.e., extract fractionation and purification based on targeted activities) has been classically used for the discovery of new natural products (Kildgaard et al., 2017; El Menif et al., 2019). However, this approach is not only time-consuming, but often results in the isolation of previously known compounds, since molecule dereplication is performed at the end of the workflow (Herath et al., 2019; Pereira et al., 2019). In recent years, metabolomics has arisen as a new tool to circumvent some of the limitations of bioassay-guided fractionation and has shown its effectiveness in the discovery of new active compounds from marine sources (Einarsdottir et al., 2017; Naman et al., 2017; Stuart et al., 2020). Metabolomics approaches are based on the simultaneous screening (using different analytical techniques such as LC-MS, GC-MS, and NMR spectroscopy) of a large number of samples and the use of numerical analyses and databases to detect patterns and relevant metabolites, according to a previously defined biological question (Gertsman and Barshop, 2018; Wolfender et al., 2019). For example, the study of shallow water hydrothermal vent sponges by mass spectrometry-based metabolomics combined with cytotoxicity bioassays led to the selection of three sponges harboring a unique chemical diversity and the putative identification of the minor original compound cyclostellettamine P (Einarsdottir et al., 2017). By combining direct analysis of complex extracts with machine learning (i.e., the use of computer algorithms that improve with experience, for example to find patterns), metabolomics allows the study of complex chemical mixtures, including minor compounds, whilst accelerating the dereplication process and decreasing the re-discovery rate (Wolfender et al., 2019; Liebal et al., 2020; van der Hooft et al., 2020).
Combining large-scale biological screening with untargeted metabolomics of multiple organisms can lead to the fast identification of specimens producing unique compounds and their prioritization in further investigations (Luzzatto-Knaan et al., 2017; Réveillon et al., 2019).
Metabolomic approaches require, however, method standardization (same extraction and analytical conditions) between the different samples analyzed, which can lead to metabolite losses (Gertsman and Barshop, 2018). Data processing and curation, which can be done using several software packages (e.g., R, MetGem, OpenMS, MZmine 2) and online platforms (e.g., Workflow4Metabolomics, XCMS Online at xcmsonline.scripps.edu, MetaboAnalyst), may be laborious, and the identification of marine compounds remains challenging since marine databases are still scarce compared to terrestrial ones (Barbosa and Roque, 2019). Bioinformatics tools such as mass spectrometry-based molecular networking are emerging as approaches that could fill some of these gaps and facilitate dereplication (Ramos et al., 2019; Aron et al., 2020; Figure 1). In molecular networking (such as implemented by GNPS, the Global Natural Products Social Molecular Networking platform), algorithms detect and group metabolites based on the similarity of their fragments (fragment ion mass spectra from tandem mass spectrometry), allowing the identification of candidate molecule clusters (i.e., groups of molecules with similar fragmentation spectra) (Wang et al., 2016). Molecular networks can then be annotated by identifying matching cluster members in existing databases or by using tools for unsupervised substructure discovery such as MS2LDA. For example, using this approach, two smenamide-related clusters containing unknown analogs with anti-proliferative activity against cancer cells were identified in Caribbean sponges (Caso et al., 2019). Then, a targeted isolation can be performed to confirm the activity and the structure of the bioactive compounds using NMR experiments (Li et al., 2018).
Using this approach, new pyrrole-derived alkaloids were isolated from a marine bacterium and were identified by analysis of their HRMS, MS/MS, and NMR data. Molecular networking is also a powerful tool in minor natural product dereplication (Cantrell et al., 2019). For example, sarasinosides and melophlins were identified using molecular networking tools, despite only small amounts of sponges being available (Mohanty et al., 2020). Finally, a recent tool called "bioactivity-based molecular networking" accelerates the identification of bioactive candidate molecules by integrating data on the activity of extracts/fractions directly into the network via a bioactivity score (Nothias et al., 2018). The use of such a tool led to the isolation of new decalinoylspirotetramic acid derivatives (pyrenosetins A-C) from a seaweed-derived fungus (Fan et al., 2019).
FIGURE 1 | Metabolomics pipeline illustrating the different steps: experimental design according to the biological research question (1), sample extraction (2), MSn analyses (3), pre-processing of the raw MSn data using software such as XCMS, MZmine 2 or OpenMS (4), numerical data analyses to identify patterns and select interesting metabolites (5), molecular networking analysis using tools such as GNPS or MetGem (6), network and metabolite annotation using tools such as MS2LDA or databases (Metlin, MarinLit) (7), and purification and elucidation of new natural products (8).
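The grouping step that underlies molecular networking, as described in this section, can be illustrated with a minimal sketch. It assumes a plain cosine similarity on binned fragment spectra and treats connected components of the resulting graph as candidate clusters; platforms such as GNPS use more elaborate scoring, so this is not their implementation. The spectra, feature names, bin width and the 0.7 threshold are all hypothetical values chosen for illustration.

```python
import math
from collections import defaultdict

def binned(spectrum, bin_width=0.01):
    """Sum fragment intensities into fixed-width m/z bins."""
    bins = defaultdict(float)
    for mz, intensity in spectrum:
        bins[round(mz / bin_width)] += intensity
    return bins

def cosine(spec_a, spec_b, bin_width=0.01):
    """Plain cosine similarity between two binned MS/MS spectra."""
    a, b = binned(spec_a, bin_width), binned(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def molecular_families(spectra, threshold=0.7):
    """Link spectra whose cosine score reaches the threshold and
    return the connected components (candidate clusters)."""
    names = list(spectra)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if cosine(spectra[x], spectra[y]) >= threshold:
                parent[find(x)] = find(y)

    groups = defaultdict(list)
    for n in names:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical fragment spectra as (m/z, intensity) pairs.
spectra = {
    "analog_1": [(100.05, 100.0), (150.10, 50.0), (200.20, 20.0)],
    "analog_2": [(100.05, 90.0), (150.10, 60.0), (210.30, 10.0)],
    "unrelated": [(80.02, 100.0), (95.00, 40.0)],
}
families = molecular_families(spectra)
print(families)
```

In this toy input, the two analogs share their main fragments and fall into one cluster, while the third spectrum stays on its own. Annotating one member of a cluster against a database then suggests a chemical class for its unknown neighbors.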

LINKING CHEMICAL DIVERSITY AND ECOLOGY
Secondary metabolites are responsible for a diverse array of ecological functions. While most ecological studies focusing on the function of known natural products investigate defense against predators, allelopathy, or interactions with microbes (Rohde et al., 2015; Puglisi and Becerro, 2018), metabolites are often involved in multiple additional life history traits such as reproduction, recruitment (Hay, 2009; Tebben et al., 2015), or resistance against abiotic stressors (Pavia and Brock, 2000). Because of this diverse array of ecological functions, the biosynthesis of metabolites is highly variable and strongly affected by environmental and biological conditions. Thus, biotechnological exploitation of natural products faces the challenge that metabolic variations occur at inter- and intraspecific scales and vary with time and space, which makes multiparametric responses difficult to interpret. Before the emergence of metabolomics approaches, chemical ecologists relied on targeting a few well-characterized compounds to identify metabolites' ecological functions. For example, a study that investigated the spatial variation of six major metabolites of the sponge Stylissa massa showed high variation in metabolite concentrations, but most of the variability could not be assigned to specific factors (Rohde et al., 2012). The development of metabolomics provides significant opportunities to investigate patterns in metabolite variation, expanding the focus from a few to hundreds of metabolites, which could assist in unraveling the multiple pathways affected by environmental factors (Figure 1). A study of the metabolome of the red alga Gracilaria vermiculophylla in response to herbivory revealed 19 upregulated metabolites, with some compounds increasing more than 100-fold in concentration, illustrating the broad spectrum of metabolites that are relevant in single interactions (Nylund et al., 2011).
The study of the metabolome of the brown alga Lobophora rosacea showed that, out of 262 features identified by LC-MS (i.e., peaks with a specific retention time and mass-to-charge ratio, m/z), the production of 53 features changed under different pH conditions (Gaubert et al., 2020).
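To make the notion of a feature concrete, the following minimal sketch represents each feature as a (retention time, m/z) pair with peak intensities per replicate and flags features whose mean intensity changes between two conditions. The feature values, the condition names and the two-fold cut-off are hypothetical, and the cited studies rely on proper statistical testing rather than a simple fold-change filter.

```python
from statistics import mean

# Hypothetical feature tables: (retention time in min, m/z) -> intensities per replicate.
control = {
    (3.21, 301.14): [1000.0, 1100.0, 900.0],
    (5.80, 455.29): [200.0, 220.0, 180.0],
}
treated = {
    (3.21, 301.14): [1050.0, 950.0, 1000.0],
    (5.80, 455.29): [2100.0, 1900.0, 2000.0],
}

def fold_changes(reference, condition):
    """Ratio of mean intensities (condition / reference) for shared features."""
    return {
        feature: mean(condition[feature]) / mean(reference[feature])
        for feature in reference
        if feature in condition and mean(reference[feature]) > 0
    }

def changed_features(fc, min_fold=2.0):
    """Features that increase or decrease by at least min_fold."""
    return sorted(f for f, v in fc.items() if v >= min_fold or v <= 1 / min_fold)

fc = fold_changes(control, treated)
flagged = changed_features(fc)
print(fc)
print(flagged)
```

Here only the second feature passes the filter, with a ten-fold increase. In a real dataset the same table would hold hundreds of features, and the flagged subset is what would be carried forward to annotation.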
Understanding the variability of metabolite production is not only relevant for ecological purposes, but is also important for natural product research and potential drug development. On the one hand, understanding the biological and ecological factors that affect metabolite production is essential for optimal harvesting of the producing organisms. For example, production of peloruside A (a metabolite with potent anticancer activity; Altmann and Gertsch, 2007) from large-scale aquaculture of the sponge Mycale hentscheli has been challenging, since its production is altered by parameters such as light and fouling intensity in the farm setting (Page M.J. et al., 2005). Similarly, the production of specialized metabolites such as the anticancer compound panicein A hydroquinone (Fiorini et al., 2015) decreased over 10-fold in the putative reproductive months (Reverter et al., 2018), highlighting the importance of timely sponge collection. On the other hand, different biotic and abiotic conditions can result in the production of distinct metabolites (for example through the expression of silent genes), suggesting that a deep understanding of the factors underlying metabolite production can also contribute toward the discovery of novel compounds (Elias et al., 2006; Romano et al., 2018). This has triggered the development of new approaches, mostly in the microbiology field, such as the "OSMAC" (One Strain Many Compounds) and co-culture approaches that aim to produce differential compounds by modifying the abiotic or biotic culture conditions (Fan et al., 2019; Sproule et al., 2019).
Metabolomics has proven a powerful, although mostly descriptive, tool to identify metabolic patterns and responses (e.g., Reverter et al., 2018; Gaubert et al., 2020). However, bioassays are still needed to verify the functions of the identified chemomarkers. The development of ecologically realistic bioassays to identify the function of metabolites could also lead to the identification of suitable conditions promoting metabolite production, with direct implications for bioprospecting (Ledoux and Antunes, 2018). For example, several transplant and nutrient enrichment experiments showed that the production of the allelopathic compound sarcophytoxide by the soft coral Sarcophyton increased with higher nutrients and when the soft corals were in contact with other species (Fleury et al., 2004). While these in situ experiments provide an important tool to identify environmental factors that affect metabolite production, their reproducibility is often constrained by disregarded parameters that affect the metabolome, masking single-factor effects.

HOST-MICROBE INTERACTIONS
Many animals and plants harbor exceptionally diverse communities of microorganisms (i.e., bacteria, fungi, viruses) that perform vital functions in the holobiont (Bewley et al., 1996; Thomas et al., 2016; Wilkins et al., 2019). In marine organisms, associated microorganisms are sometimes identified as the producers of the natural products found in their hosts (e.g., Wilson et al., 2014; Morita and Schmidt, 2018; Rust et al., 2020). For example, recent studies have demonstrated that many sponge-isolated compounds, such as different polyketides, peptides and bromotyrosine-derived alkaloids, are in fact produced by their associated microorganisms (Wilson et al., 2014; Nicacio et al., 2017). Another example is the γ-pyrones, which are often found in molluscs such as cone snails, and which were shown to be produced by the associated cultivable bacterium Nocardiopsis alba (Lin et al., 2013). Classical approaches used to investigate the contribution of host-associated microorganisms to natural product synthesis include isolation and host-independent culture of the microorganisms (Lin et al., 2013; Nicacio et al., 2017). Host-associated culture with miniature incubation chambers inside the host has also been attempted with varying success (Steinert et al., 2014). However, these approaches are unsuitable for uncultivable microorganisms, which have been proposed to represent at least 70% of the known taxa in the prokaryotic phyla alone (Achtman and Wagner, 2008). Furthermore, the holobiont comprises a complex network of microorganisms that are in constant communication and signaling, which regulates gene expression and metabolite synthesis (Hughes and Sperandio, 2008; Pande and Kost, 2017). Therefore, isolation and repeated culture of host-associated microorganisms can result in a loss of compound production (Morita and Schmidt, 2018).
Within this context, the combined study of the holobiont microbiome and metabolome provides exciting new opportunities to explore host-symbiont relationships. Identification of consistent co-associations between compounds and microorganisms using correlation tools can assist in metagenomics mining to search for natural product biosynthetic gene clusters (BGCs) in the holobionts (Paix et al., 2019; Reverter et al., 2020; Figure 2). For example, cyclic dipeptides found in the alga Taonia atomaria were found to be highly correlated with a BD1-7 clade bacterial taxon from the Alteromonadaceae family (Paix et al., 2019). Similarly, several hemoglobin-derived peptides found in the butterflyfish Chaetodon lunulatus were highly correlated with a Fusobacteriaceae strain (Reverter et al., 2020), suggesting a possible direct or indirect involvement of these bacteria in the production of the aforementioned compounds.
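The co-association analysis described above can be sketched as a rank correlation between the intensity of a metabolite and the relative abundance of each microbial taxon across the same samples. The sketch below uses Spearman's coefficient as one possible choice; the cited studies may use other correlation tools, and the sample values, taxon names and the 0.8 cut-off are hypothetical. As in the text, a strong correlation only suggests a direct or indirect involvement and does not by itself identify the producer.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: one metabolite and two taxa measured in five samples.
metabolite = [10.0, 20.0, 30.0, 40.0, 50.0]
taxa = {
    "taxon_a": [0.01, 0.03, 0.02, 0.08, 0.10],
    "taxon_b": [0.50, 0.40, 0.30, 0.20, 0.10],
}
rho = {name: spearman(metabolite, abundance) for name, abundance in taxa.items()}
candidates = sorted(name for name, r in rho.items() if r >= 0.8)
print(rho)
print(candidates)
```

Only positively correlated taxa are kept as candidates here, on the assumption that a producer should co-vary with its product. Candidate taxa would then guide where to look for BGCs in the metagenomic data.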

MULTI-OMICS INTEGRATION
Although several natural product BGCs have been elucidated using metagenomics (e.g., Wakimoto et al., 2014; Storey et al., 2020), most of them have yet to be linked to the metabolites they encode (Amos et al., 2017). Metabolomics is considered the last link in the systems biology chain; therefore, its coupling with other high-throughput technologies, such as transcriptomics and genomics, provides a bridge to link molecular mechanisms with metabolite production (Amos et al., 2017; van der Hooft et al., 2020). Such integration techniques provide a deeper understanding of the molecular mechanisms involved in the production of biologically active compounds through the identification of BGCs, gene expression patterns and enzymes related to the produced metabolites (van der Hooft et al., 2020). For example, an integrated metabolomic-transcriptomic analysis of the sea urchin Strongylocentrotus intermedius allowed identification of critical genes related to eicosanoid acid biosynthesis (Wang et al., 2020). A study of Dysideidae sponges using mass spectrometry, molecular fragmentation and NMR spectroscopy identified a large array of new polybrominated diphenyl ethers (PBDEs) in these sponges (Agarwal et al., 2015). Application of metagenomics then led to the identification of the genes responsible for PBDE production within the sponge cyanobacterial endosymbionts (Agarwal et al., 2017; Schorn et al., 2019). Similarly, integration of molecular networking and genome mining of several marine Salinispora bacteria led to several molecular family-BGC pairings, including the characterization of a new cytotoxic depsipeptide (retimycin) and its link to gene cluster NRPS40 (Duncan et al., 2015; Amos et al., 2017).
Metabologenomics, a term that defines these integrated approaches, can therefore help accelerate the linking of unknown BGCs to metabolites as well as assist in extract prioritization for structure elucidation (Goering et al., 2016; van der Hooft et al., 2020; Figure 2). The complexity of natural products is often a barrier to their synthesis and scaled-up production, while harvest of the producing organisms (e.g., plants, marine macroorganisms or uncultivable microorganisms) might not be a reliable or sustainable option (Paddon and Keasling, 2014). For example, 13 tonnes of the marine bryozoan Bugula neritina were harvested for the isolation of 18 g of the potent anti-cancer compound bryostatin 1 for clinical phase 1 studies (Schaufelberger et al., 1991). In such cases, pairing interesting metabolites with their encoding genes provides new opportunities for the synthesis of natural products using synthetic biology approaches such as insertion of biosynthetic genes in heterologous hosts (Amos et al., 2017; Ahmed et al., 2020). However, despite the huge potential of these methods and the global effort to increase the number of annotated and well-curated genome data from marine bacteria and symbionts (e.g., Udwary et al., 2007; Machado et al., 2015), these tools are still in their infancy and need to be further developed for their widespread use in marine biotechnology.
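The scale of the supply problem in the bryostatin 1 example can be made explicit with a short calculation. The two input figures come from the text above; the sketch assumes that "tonnes" means metric tonnes (1,000 kg each), which the text does not state.

```python
# Figures quoted in the text (Schaufelberger et al., 1991).
harvest_tonnes = 13   # Bugula neritina biomass harvested
isolated_g = 18       # bryostatin 1 isolated

# Assumption: metric tonnes, i.e., 1,000 kg per tonne.
KG_PER_TONNE = 1_000

harvest_kg = harvest_tonnes * KG_PER_TONNE
mg_per_kg = isolated_g * 1_000 / harvest_kg        # mg of compound per kg of biomass
mass_percent = isolated_g / (harvest_kg * 1_000) * 100

print(f"{mg_per_kg:.2f} mg per kg of biomass")
print(f"{mass_percent:.5f} % of harvested mass")
```

Under this assumption the yield is roughly 1.4 mg of compound per kg of harvested organism, about 0.00014% by mass, which illustrates why harvesting is rarely a sustainable supply route.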

FUTURE DIRECTIONS AND CONCLUDING REMARKS
Amidst the increasing global concern over antimicrobial resistance, the SARS-CoV-2 virus pandemic, and the continuous need for anticancer and antiviral drugs, the natural product research field has attracted renewed attention as new tools such as metabolomics accelerate metabolite discovery and decrease re-discovery rates (Rodrigues et al., 2016; Liu et al., 2019). In particular, marine organisms arise as a prolific source of novel natural products, with an economic value estimated at US$ 563 billion to 5.7 trillion for anticancer marine drugs alone (Erwinn et al., 2010; Abdelmohsen et al., 2017). Marine organisms have been much less studied than their terrestrial counterparts; however, they produce an incredibly diverse array of molecules with new chemical features and modes of action (Abdelmohsen et al., 2017; Carroll et al., 2020). For example, the production of highly bioactive organohalogens seems to be widespread amongst marine organisms, but is less common amongst terrestrial organisms (Gribble, 2010). Both the discovery of new species and improved taxonomic descriptions offer new natural product bioprospecting possibilities.
FIGURE 2 | Pipeline illustrating the integration of metabolomics, microbiomics and metagenomics tools. The parallel metabolome and microbiome analysis allows the identification of metabolites of interest that are highly correlated with specific bacterial taxa. Then, by performing long-read sequencing and genome mining (using tools such as antiSMASH or similar), biosynthetic gene clusters (BGCs) can be identified and annotated manually or by using databases such as MIBiG. 16S contigs (i.e., continuous fragments of DNA sequence from an incomplete genome) can then be used to assign the BGCs to specific bacterial taxon genomes. The predicted compounds encoded by these BGCs can then be compared with the metabolome data in order to link produced metabolites with specific bacterial BGCs.
However, taxonomy is not the only determinant of metabolite production: biotic (e.g., life cycle, organism interactions) and abiotic (e.g., environmental conditions) factors also regulate it (e.g., Nylund et al., 2011; Gaubert et al., 2020). Metabolomics therefore arises not only as a fast alternative for elucidating new biologically active metabolites, but also as a tool to study the variation of an organism's chemical diversity in response to different factors. Linking the production of biologically active metabolites to environmental or biological conditions could have direct implications for bioprospecting and scaling up of metabolite production. Where scaling up of metabolite production from whole organism culture is not possible, the linking of metabolomics with metagenomics approaches, which may allow identification of the responsible BGCs (either in the host or the associated microorganisms), also provides new opportunities for metabolite production through synthetic biology and bioengineering (Ahmed et al., 2020).
Despite the many advantages and huge potential of metabolomics approaches for the discovery of new marine biologically active compounds, the major drawback continues to be the poor coverage of marine natural products in public databases (Pereira and Aires-de-Sousa, 2018; Barbosa and Roque, 2019). Such lack of public reference spectra (including fragmentation patterns) of marine-derived natural products prevents automated dereplication processes, and often results in the development of in-house libraries by the different research groups after labor-intensive efforts (e.g., purification and structural elucidation of each metabolite). Therefore, development of a free-access database with well-curated spectrometric data (e.g., MS1, MS2, UV, NMR) on marine natural products is urgently needed, along with the constant development and improvement of metabolomics tools (e.g., hardware and workflows) to increase sensitivity and repeatability and so enable comparison and integration of a larger number of datasets (Guitton et al., 2017; Forsberg et al., 2018). This could be achieved, for example, with the integration of known and new marine natural products (from macro- and microorganisms, elucidated by both classical approaches and metabolomics) into currently operational databases highly used in metabolomics such as GNPS.

AUTHOR CONTRIBUTIONS
MR conceived the first idea of the article. MR, SR, and CP wrote the first draft of the article. All authors revised the first draft and worked on the final version of the article. All authors took part in discussions throughout the conception of the article.

FUNDING
This research was funded by an Alexander von Humboldt postdoctoral fellowship to MR. The Ph.D. of CP is supported through a scholarship from the French region "Occitanie".