Systems Biology Approaches to Understand Natural Products Biosynthesis

Actinomycetes populate soils and aquatic sediments that impose biotic and abiotic challenges for their survival. As a result, actinomycete metabolism and genomes have evolved to produce an overwhelming diversity of specialized molecules. Polyketides, non-ribosomal peptides, post-translationally modified peptides, lactams, and terpenes are well-known bioactive natural products with enormous industrial potential. Accessing such biological diversity has proven difficult due to the complex regulation of cellular metabolism in actinomycetes and to the sparse knowledge of their physiology. The past decade, however, has seen the development of omics technologies that have significantly improved our understanding of their biology. Key observations have contributed toward a shift in the exploitation of actinomycete biology, such as using their full genomic potential, activating entire pathways through key metabolic elicitors, and engineering pathways to improve biosynthesis. Here, we review recent efforts devoted to achieving enhanced discovery, activation, and manipulation of natural product biosynthetic pathways in model actinomycetes using genome-scale biological datasets.

Keywords: actinomycetes, genome mining, genomics, transcriptomics, proteomics, metabolomics, genome-scale metabolic reconstructions
Systems Biology in Actinomycetes Frontiers in Bioengineering and Biotechnology | www.frontiersin.org

INTRODUCTION
Actinomycetes represent one of the largest bacterial phyla and are primary contributors to carbon cycling and a major source of bioactive natural products (BNP) including, most prominently, antibiotics. Despite their prime importance, our understanding of actinomycete biology remains limited, owing to their characteristically large, complex, high G + C content genomes (Demain, 2014). The complexity of actinomycete genomes was only fully revealed in the last decade as part of the genomic revolution. Sequencing of the first actinomycete genomes revealed a plethora of bioactive secondary metabolites yet to be discovered, in addition to those from the well-characterized biosynthetic gene clusters (BGC) (Doroghazi et al., 2014). According to the NCBI database, around 1,000 actinomycete genomes have been fully sequenced and annotated to date. Homology-based bioinformatic tools have confirmed their great potential as BNP producers; for example, species of the genera Streptomyces, Salinispora, and Saccharopolyspora contain an average of 30 secondary metabolite gene clusters (Nett et al., 2009).
The physiological changes leading to BNP biosynthesis in actinomycetes have been thoroughly studied over the past 10 years. Considerable work has advanced our understanding of the transitional stage triggering BNP biosynthesis (also known as the "metabolic switch"; Alam et al., 2010) and, with it, our understanding of the physiological changes leading to the activation of secondary pathways. However, our incomplete understanding of this physiological transition has prevented us from fully manipulating this cellular process using metabolic engineering. Here, we review landmark studies contributing to the discovery, activation, and manipulation of metabolic pathways for BNP through the development of genome-wide biological datasets and systems biology in actinomycetes (Figure 1).

PATHWAY DISCOVERY: FROM IMPROVED GENOME ANNOTATION TO THE DISCOVERY OF NEW BNP BIOSYNTHETIC PATHWAYS
Genome annotation is the basis for pathway discovery and manipulation. Pathway discovery typically follows a defined pipeline: genome sequencing, annotation, gene discovery, and pathway manipulation. The exponential increase in sequencing efficiency is yielding an ever-increasing number of sequenced genomes, and the often limited understanding of these sequences has become a bottleneck. In silico annotation relies mainly on homology to experimentally characterized sequences. Historically, however, functional microbiology has focused on a handful of microorganisms. Therefore, the reference space available to in silico genome annotation pipelines is biased toward certain G + C contents, gene lengths, and gene organizations. For instance, approximately 60% of the bacterial genomic space is misannotated in terms of gene boundaries (start/stop codons), owing to minimal cross-checking between computationally assigned open-reading frames (ORFs) and real genes (Nielsen and Krogh, 2005).
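The gene-boundary problem can be illustrated with a minimal sketch. It assumes a toy sequence, a single forward reading frame, and the common bacterial start codons ATG, GTG, and TTG; the function names are hypothetical and this is not the procedure of any pipeline cited above. Several in-frame start codons can precede the same stop codon, so sequence alone does not decide where the gene begins. Because stop codons are A + T rich, they occur less often by chance in high G + C genomes, which tends to lengthen candidate ORFs and multiply the possible starts.

```python
# Illustrative sketch only: toy data, one strand, one reading frame.
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def candidate_starts(seq, frame=0):
    """Group in-frame start codons by the stop codon that terminates them.

    Returns {stop_index: [start_index, ...]} for one forward frame.
    More than one start per stop means the 5' gene boundary is ambiguous.
    """
    seq = seq.upper()
    orfs = {}
    pending = []                        # starts seen since the last stop
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in START_CODONS:
            pending.append(i)
        elif codon in STOP_CODONS:
            if pending:
                orfs[i] = pending
            pending = []
    return orfs

toy = "ATGGCCGTGGCGCTGGCCTGA"           # hypothetical high G + C fragment
print(candidate_starts(toy))            # {18: [0, 6]}: two possible starts
```

Resolving which of the candidate starts is real is where experimental evidence, such as the proteomic and transcriptomic data discussed next, becomes necessary.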
Bioinformatics-based pipelines failed to accurately annotate short proteins and high G + C content sequences in an annotation effort covering 46 bacterial and archaeal genomes (Venter et al., 2011). By contrast, functional annotations supported by "omics technologies" dramatically improve gene function assignment, particularly in less characterized microorganisms such as Geobacter sulfurreducens (Qiu et al., 2010) or the erythromycin-producing actinomycete Saccharopolyspora erythraea (Marcellin et al., 2013a). Integration of proteomics and transcriptomics approaches has led to the re-annotation of these genomes, allowing for the correction of hundreds of gene boundaries, the confirmation of hypothetical proteins, and the discovery of dozens of new genes. A combination of proteomics and genomics, also known as proteogenomics, has also been used to deliver unbiased correlations between genome sequence and protein expression (Gupta et al., 2007; Gallien et al., 2009; Armengaud, 2010; Castellana and Bafna, 2010; Marcellin et al., 2013a).
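The core idea of proteogenomics can be sketched as follows: a peptide identified by mass spectrometry is searched against all six reading frames of the genome, independently of any existing annotation. This is a simplified illustration that assumes exact string matching, the standard genetic code, and a made-up sequence; the function names are hypothetical, and real workflows search spectra against six-frame databases with dedicated search engines.

```python
# Illustrative sketch only: exact-match peptide mapping on toy data.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in the conventional TCAG codon order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def map_peptide(genome, peptide):
    """Return (strand, frame, nucleotide offset on that strand) per hit."""
    hits = []
    for strand, dna in (("+", genome.upper()), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(dna[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, frame + 3 * pos))
                pos = protein.find(peptide, pos + 1)
    return hits

genome = "CCATGGCCAAAGTGCTGTAAG"        # hypothetical fragment encoding MAKVL
print(map_peptide(genome, "AKVL"))      # [('+', 2, 5)]
```

A peptide that maps upstream of an annotated start codon, or to a region with no annotated gene, is the kind of evidence that supports correcting a gene boundary or adding a new gene.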
Initial approaches to the discovery and identification of BNP were based on the search for cryptic BGCs. The most common method involves gene mapping of enzymatic assembly complexes such as polyketide synthases (PKSs), non-ribosomal peptide synthetases (NRPSs), and other enzymes typically related to BNPs (e.g., lanthipeptide synthases, terpene synthases, etc.). Although this approach is simplistic, the accumulation of structural, mechanistic, genetic, and chemical information on PKs and NRPs has allowed for the prediction of structures and chemical properties of dozens of BGCs from DNA sequences (Walsh et al., 2006; Hertweck, 2009; Jenke-Kodama and Dittmann, 2009; Koglin and Walsh, 2009; Walsh and Fischbach, 2010). Incorporating these mining strategies into specialized bioinformatic pipelines has greatly increased the efficiency of genome mining. Genome-scale prediction of putative BGCs is nowadays possible within a few hours.
Continuous progress has enabled the emergence of bioinformatics platforms, such as CLUSEAN (Weber et al., 2009), ClustScan (Starcevic et al., 2008), np.searcher, SMURF (Khaldi et al., 2010), and antiSMASH (Medema et al., 2011b; Blin et al., 2013; Weber et al., 2015). The latter is the most popular system for automated BNP genome mining since it analyzes BGC domains to propose loci, chemical scaffolds, and putative chemical structures. However, one of the biggest disadvantages of these genome mining approaches is that they are intrinsically limited to BGCs for known classes of chemical structures. Complementary approaches have emerged to enable the discovery of novel BGCs, such as ClusterFinder (Cimermancic et al., 2014) or EvoMining (Medema and Fischbach, 2015). ClusterFinder uses hidden Markov model-based algorithms, with Pfam as the search database, to annotate BGCs as clusters of protein domains with a biosynthetic logic. The use of ClusterFinder has allowed the detection of previously unknown classes of BGCs (Cimermancic et al., 2014). EvoMining, on the other hand, is a functional phylogenomic pipeline that identifies expanded, repurposed enzyme families with the potential to catalyze new conversions within BGCs (Medema and Fischbach, 2015). This innovative method embraces the predictive power of evolutionary theory, leading to model-independent predictions that include gene clusters that do not follow traditional biosynthetic rules. The method has been used for the discovery of the genes directing synthesis of small peptide aldehydes and the first biosynthetic system for arseno-organic metabolites.

FIGURE 1 | Sequencing actinomycete genomes has revealed an unexpected complexity. Systems biology has opened access to the untapped chemical diversity encoded within the global microbial genome, including the vast majority (>99%) of taxa that are currently deemed unculturable, and a wealth of bioactive genes that are currently silent (untranslated) under standard cultivation conditions.
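The hidden Markov model idea behind domain-based cluster detection can be sketched with a toy two-state model decoded by the Viterbi algorithm. Everything here is an assumption made for illustration: the state names, the domain labels (which are not real Pfam identifiers), and all probabilities are invented, and the sketch does not reproduce the parameters or implementation of ClusterFinder.

```python
# Illustrative sketch only: toy two-state HMM with made-up probabilities.
import math

STATES = ("background", "bgc")
START = {"background": 0.9, "bgc": 0.1}
TRANS = {
    "background": {"background": 0.9, "bgc": 0.1},
    "bgc": {"background": 0.2, "bgc": 0.8},
}
# Hypothetical labels standing in for biosynthesis-associated domains.
BIOSYNTHETIC = {"ketosynthase", "acyltransferase", "acyl_carrier",
                "condensation", "adenylation"}

def emission(state, domain):
    """Probability of seeing a biosynthetic or other domain in each state."""
    is_bio = domain in BIOSYNTHETIC
    if state == "bgc":
        return 0.8 if is_bio else 0.2
    return 0.1 if is_bio else 0.9

def viterbi(domains):
    """Most likely state path for an ordered list of domain labels."""
    scores = {s: math.log(START[s]) + math.log(emission(s, domains[0]))
              for s in STATES}
    back = []
    for d in domains[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[prev] + math.log(TRANS[prev][s])
                             + math.log(emission(s, d)))
            pointers[s] = prev
        scores = new_scores
        back.append(pointers)
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):     # trace back from the last position
        state = pointers[state]
        path.append(state)
    return path[::-1]

genes = ["ribosomal", "transporter", "ketosynthase", "acyltransferase",
         "acyl_carrier", "kinase", "ribosomal"]
print(viterbi(genes))
```

With these invented parameters, the three consecutive biosynthetic domains are labeled as one cluster and the flanking genes as background. The design point is that the model scores runs of co-located domains rather than matching a known cluster type, which is why such approaches can flag clusters outside established classes.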
Overall, genomic approaches have significantly improved the prediction of BNPs from unannotated sequences and provided deep insights into the identification of novel chemical species. However, the genomic approach is limited to the known repertoire of BGCs and ignores the regulatory information needed for pathway activation.

PATHWAY ACTIVATION: SYSTEMS BIOLOGY ANALYSIS OF ACTINOMYCETE PHYSIOLOGY AND DEVELOPMENT
Systems biology protocols have been successfully used to describe germination (Piette et al., 2005; Yagüe et al., 2013a; Bobek et al., 2014), programmed cell death (Manteca et al., 2005), diauxic lag phase (Novotna et al., 2003), mutant phenotypes (bldA mutant) (Kim et al., 2005; Hesketh et al., 2007), and phosphate limitation (Rodríguez-García et al., 2007). Given that biosynthesis of natural products in actinomycetes is conceived as a physiological response to environmental changes (e.g., changes in temperature, nutritional conditions, etc.), it is assumed that understanding their physiological behavior would provide leads for natural product pathway activation and manipulation. Here, we focus on reviewing efforts devoted to understanding the physiological transitions prior to the activation of known natural product biosynthesis and the approaches used for the activation of unknown natural product biosynthetic pathways in model actinomycetes.

Physiological Transitions and Development
Actinomycetes undergo drastic physiological changes during their developmental cycle (i.e., programmed cell death and sporulation). In contrast to previous assumptions that sporulation events occurred exclusively in solid cultures (Flardh and Buttner, 2009), differentiation during pre-sporulation stages has been described in both solid and liquid Streptomyces cultures (Manteca et al., 2010). The existence of two different mycelia (MI and MII) across the developmental cycle has been characterized using iTRAQ LC-MS/MS proteomics, phosphoproteomics, and microarray-based transcriptomics (Manteca et al., 2010, 2011; Yagüe et al., 2013b). Specifically, proteins involved in antibiotic biosynthesis were upregulated in MII, and primary metabolism proteins from glycolysis, protein biosynthesis, and the tricarboxylic acid cycle were upregulated in MI. The second multinucleated mycelium with (aerial) and without (substrate) hydrophobic covers constituted a unique reproductive structure (Manteca et al., 2010). The most remarkable differences between MII from solid and liquid cultures involved proteins regulating the final stages of hyphal compartmentalization and spore formation (Manteca et al., 2010).
Similarly, the S. erythraea developmental cycle in bioreactors has been characterized at the level of the transcriptome at base resolution (RNA-seq), the proteome (iTRAQ), and the phosphoproteome (sMRM) (Marcellin et al., 2013a,b; Licona-Cassani et al., 2014) (Figure 2). The studies focused on the metabolic switch, a distinct transformational event that separates two growth phases in actinomycetes and is characterized by rapid molecular and morphological changes. The authors found that the S. erythraea transcriptome undergoes extensive targeted mRNA degradation together with transcription of mRNAs for adaptive metabolic functions, thereby resetting cells for the induction of a replacement transcriptional program. A suite of RNases and proteases mediates a targeted destruction of the transcriptome and proteome (suicidal patterns) in concert with the shifting of broad transcription macro-domains, delineated by core/non-core genomic regions. In addition, the temporal-dynamic, semiquantitative phosphoproteomic study revealed that proteins from central metabolism (putative acetyl-CoA carboxylase, isocitrate lyase, and 2-oxoglutarate dehydrogenase) and key developmental pathways (trypsin-like serine protease, ribonuclease Rne/Rng, and ribosomal proteins) in S. erythraea dramatically change their degree of phosphorylation across the developmental cycle in liquid cultures (Figure 2) (Licona-Cassani et al., 2014).
One of the most significant observations linking actinomycete physiological behavior and pathway activation was made by Nieselt and collaborators (Nieselt et al., 2010). Using a temporal-dynamic transcriptomic analysis, they identified several transitional stages along the fermentation that coincide with the activation of natural product metabolic pathways in S. coelicolor. Under their bioreactor settings, early coordinated expression changes in genes related to nitrogen metabolism, including glutamine synthetases I and II and the signaling protein GlnK, were observed in the same time window as changes in genes from the CPK antibiotic biosynthetic pathway. Interestingly, such transcriptional changes were observed under nitrogen sufficiency conditions. In addition, an unexpected transcriptional switch for developmental genes, such as chaplins, bldN, and whiH, was registered, showing for the first time that developmental genes are transcribed in S. coelicolor liquid cultures. Finally, the traditional metabolic switch was observed as a strong upregulation of the pho regulon together with the upregulation of the pigmented antibiotics undecylprodigiosin and actinorhodin (Nieselt et al., 2010).
While we are still far from overcoming the physiological barriers to pathway activation and exploiting the full genomic potential of actinomycetes, systems biology approaches have significantly contributed to shifting key paradigms. First, we know that pathway activation does not follow the same regulatory rules as in model actinomycetes (e.g., S. coelicolor or Mycobacterium tuberculosis); in fact, it is now possible to understand such differences in a single experiment. More importantly, systems biology has exposed a subset of strain- and metabolite-specific regulatory mechanisms such as non-coding RNAs (Marcellin et al., 2013b), dynamic phosphorylation of ribosomal proteins (Licona-Cassani et al., 2014), and acetylation of RNA degradation-related proteins (Huang et al., 2015), among others. The last part of this mini review focuses on the efforts to manipulate metabolic pathways in actinomycetes using genome-scale metabolic reconstructions and metabolic engineering strategies.

PATHWAY MANIPULATION: FROM RATIONAL DESIGN TO GENOME-SCALE MODEL-GUIDED METABOLIC ENGINEERING STRATEGIES IN ACTINOMYCETES
Production processes with product titers below hundreds of mg/L are considered unsustainable for industrial-scale production. Actinomycete cultures are also slow growing and unpredictable even under controlled conditions (i.e., bioreactor fermentations), and as such are difficult to ferment. For over 50 years, entire teams of metabolic and bioprocess engineers have used classical approaches such as rounds of random mutagenesis (Tanaka et al., 2009; Jung et al., 2011), media design and process optimization (Hamedi et al., 2004; El-Enshasy et al., 2008; Zou et al., 2009), and rationally designed metabolic engineering strategies (Reeves et al., 2006, 2007; Olano et al., 2008) to engineer actinomycete bioprocesses and strains to achieve acceptable titers. Modern strain engineering uses genome-scale models (GEMs) in combination with omics data to integrate genome-scale biological datasets toward the manipulation of metabolism.
Since the first GEM release, more than a decade ago (Edwards and Palsson, 1999), applications have expanded from in silico metabolic predictions (Edwards and Palsson, 2000; Schilling et al., 2002; Park et al., 2007) to the discovery of antibacterial

FIGURE 2 | Systems biology aims at understanding the larger picture of actinomycete biology, at the level of the organism, by putting its pieces together rather than apart. It is in stark contrast to decades of reductionist biology in this area. For example, this figure illustrates the comprehensive multi-omics characterization of S. erythraea metabolism across a fermentation time-course. As in most actinomycetes, there is a characteristic metabolic transition, here at the 50th hour, which dictates the production of erythromycin. This figure illustrates how this transition is characterized by a massive loss of proteins and ribosomal RNA before a new expression pattern emerges. Around 30% of all transcripts arose from previously unannotated DNA, and detailed analysis revealed approximately 350 new coding genes and 300 non-coding genes. Systems biology can unravel the complex nature of actinomycete biology at the transcriptomic, metabolomic, and proteomic levels, revealing various novel non-coding RNAs and uncharacterized phosphorylation patterns.
While draft GEMs are now routinely generated using high-throughput automated pipelines (Aziz et al., 2008; Henry et al., 2010), successful applications of GEMs to microbial metabolism have only occurred with exhaustively (manually) curated and experimentally validated metabolic reconstructions. In this regard, although actinomycetes are our primary microbial source of antibiotics, only a handful of manually curated GEMs for actinomycetes have been reported (Borodina et al., 2005a; Beste et al., 2007; Jamshidi and Palsson, 2007; Alam et al., 2010; Medema et al., 2010, 2011a; Chindelevitch et al., 2012; Licona-Cassani et al., 2012). Even more surprising is the fact that pathway optimization of actinomycete metabolism has only been achieved at the level of precursor supply (Borodina et al., 2005b, 2008; Licona-Cassani et al., 2012). In such approaches, optimal solutions are found because there is congruence between cellular (maximize growth rate) and engineering objectives (maximize productivity). In order to properly optimize production of non-growth-associated metabolites (i.e., BNP), novel algorithms and new objective functions need to be incorporated into current protocols.
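The role of the objective function can be made explicit with the standard flux balance analysis formulation commonly applied to GEMs. The notation is assumed here for illustration: S is the stoichiometric matrix, v the vector of reaction fluxes with lower and upper bounds, and the second problem is one frequently used two-step variant, not a method proposed in the studies cited above.

```latex
% Step 1: growth-maximizing flux balance analysis at steady state
\begin{aligned}
\mu^{*} \;=\; \max_{v}\quad & v_{\text{biomass}} \\
\text{s.t.}\quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}

% Step 2 (illustrative variant): maximize product flux while
% retaining a chosen fraction \alpha of the optimal growth rate
\begin{aligned}
\max_{v}\quad & v_{\text{product}} \\
\text{s.t.}\quad & S\,v = 0, \qquad v^{\min} \le v \le v^{\max}, \\
& v_{\text{biomass}} \ge \alpha\,\mu^{*}, \qquad 0 \le \alpha \le 1.
\end{aligned}
```

In the first problem, any flux that diverts carbon away from biomass without contributing to growth can be driven to zero at the optimum, which is why a growth objective alone tends not to predict the production of non-growth-associated metabolites.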
The last few years have seen the emergence of network reconstruction beyond metabolism. These next-generation network reconstructions account for expression coupled to metabolism and even transcriptional regulation. The first such model, known as the ME-model, was developed for Thermotoga maritima (Lerman et al., 2012), rapidly followed by the Escherichia coli ME-model (O'Brien et al., 2013). Incorporating gene expression into the mathematical framework allows these models to expand their predictive capabilities, which may be what is needed to model non-growth-associated metabolites such as BNPs. It is expected that as ME-models for actinomycetes become available, just as GEMs became available 10 years ago, multi-omics integration may become possible and, with it, models will become more predictive.

FUTURE DIRECTIONS
As in model organisms, such as yeast and E. coli, systems biology in actinomycetes has immensely advanced our understanding of this complex and fascinating bacterial group, offering insightful information regarding gene discovery, gene regulation, and pathway manipulation. The experimental tools are highly developed and readily available, yet data integration and analysis remain the main challenge. Like finding a needle in a haystack, finding a key gene in a sea of data is extremely difficult, largely because tools for the integration and visualization of large datasets are lacking. As such, systems biology in actinomycetes has yet to deliver real advances for the production of BNPs and the discovery of novel bioactive compounds.

FUNDING
We would like to thank The Queensland Government for the Accelerate fellowship to E.M. and The University of Queensland, the Australian Research Council, and the Mexican Council for Science and Technology CONACYT for financial support.