Cloning and Heterologous Expression of a Large-sized Natural Product Biosynthetic Gene Cluster in Streptomyces Species

Actinomycetes, including Streptomyces species, have been a major source of novel natural products (NPs) over the last several decades thanks to the structural novelty, diversity, and complexity of their metabolites. Moreover, recent genome mining approaches have provided an attractive tool for screening potentially valuable NP biosynthetic gene clusters (BGCs) present in actinomycete genomes. Since many of these NP BGCs are silent or cryptic in the original actinomycetes, various techniques have been employed to activate them. Heterologous expression of BGCs has become a useful strategy to produce, reactivate, improve, and modify the pathways of NPs present in minute quantities in the original actinomycete isolates. However, cloning and efficient overexpression of an entire NP BGC, often over 100 kb in size, remain challenging because current genetic systems are ineffective at manipulating large NP BGCs. This mini review describes examples of actinomycetes NP production through BGC heterologous expression systems as well as recent strategies specialized for large NP BGCs in Streptomyces heterologous hosts.


INTRODUCTION
Natural products (NPs) and their derivatives account for a large share of the pharmaceutical market, comprising 61% of anticancer drugs and 49% of anti-infective drugs of the past 30 years (Newman and Cragg, 2012). In particular, actinomycetes NPs are a major resource for drug discovery and development, mainly due to their structural novelty, diversity, and complexity (Donadio et al., 2007). Isolation and characterization of NP biosynthetic gene clusters (BGCs) have further accelerated our understanding of their molecular biosynthetic mechanisms, leading to the rational redesign of novel NPs through BGC manipulation (Fischer et al., 2003; Castro et al., 2015).
Some of these potentially valuable BGCs are, however, derived from non-culturable metagenomes or genetically recalcitrant microorganisms. Moreover, many of these BGCs are expressed poorly or not at all under laboratory culture conditions, which makes it challenging to characterize the target NPs (Galm and Shen, 2006). Since efficient expression of actinomycetes NP BGCs presents a major bottleneck for novel NP discovery, various strategies for awakening cryptic BGCs, such as regulatory gene control, ribosome engineering, co-culture fermentation, and heterologous expression, have been pursued for NP development (Flinspach et al., 2014; Martinez-Burgo et al., 2014; Miyamoto et al., 2014).
A traditional method for BGC cloning involves cosmid library construction by partial digestion or random shearing of chromosomal DNA. An NP BGC is typically larger than 20 kb (sometimes over 100 kb), whereas a cosmid vector system can accept only a relatively small BGC (up to 40 kb) or a part of a large BGC. Therefore, cloning and efficient overexpression of an entire BGC remain challenging, because current vector systems and host cells, with their particular genetic and metabolic characteristics, are ineffective at manipulating large BGCs for heterologous expression. This mini review summarizes the actinomycetes NP BGCs that have been successfully cloned and expressed in Streptomyces heterologous hosts (Table 1). In addition, three cloning and heterologous expression systems that are well suited to large NP BGCs are introduced: the transformation-associated recombination (TAR) system, the integrase-mediated recombination (IR) system, and the plasmid Streptomyces bacterial artificial chromosome (pSBAC) system (Figure 1).
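The size constraint described above can be illustrated with a minimal sketch. It assumes the 40 kb cosmid limit quoted in the text and uses the cluster sizes quoted elsewhere in this review; the function and variable names are hypothetical and serve only as illustration.

```python
# Upper bound on cosmid insert size quoted in the text (kb).
COSMID_MAX_INSERT_KB = 40

# BGC sizes (kb) as quoted elsewhere in this review.
bgc_sizes_kb = {
    "marinopyrrole": 30,
    "taromycin A": 67,
    "jadomycin": 36,
    "chlortetracycline": 32,
}


def fits_in_cosmid(size_kb, limit_kb=COSMID_MAX_INSERT_KB):
    """Return True if a BGC of the given size fits in a single cosmid insert."""
    return size_kb <= limit_kb


# Clusters that exceed the cosmid limit and need a large-insert system.
needs_large_insert_system = sorted(
    name for name, size in bgc_sizes_kb.items() if not fits_in_cosmid(size)
)
print(needs_large_insert_system)
```

Under these assumptions only the 67 kb taromycin A cluster exceeds the cosmid limit, which is consistent with its capture by a dedicated large-insert method.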

TRADITIONAL METHOD FOR HETEROLOGOUS EXPRESSION OF NP BGCS
We summarized about 90 actinomycetes NP BGCs that have been successfully expressed in Streptomyces heterologous hosts over the last several decades (Table 1). Relatively small BGCs encoding type II polyketides were the first to be isolated at the beginning of heterologous expression research. Most of the listed BGCs (about 83%) were isolated by cosmid/fosmid library construction, and some of these BGCs were cloned into a replicative or integrative vector by linear-plus-linear (recombination between two linear DNAs) or linear-plus-circular (recombination between a linear and a replicating circular DNA) homologous recombination. Approximately 60% of BGCs were integrated into the heterologous host chromosome, and only 37% of BGCs were maintained in the heterologous host on a replicative plasmid. Cosmid vectors such as pOJ446 and SuperCos1 were used in either replicative or integrative form in the heterologous host, so the production level of the heterologously expressed NP BGC varied significantly. Some BGCs were isolated with two different vector systems, followed by heterologous expression via both integrative and replicative systems. For example, the epothilone BGC was expressed using both a pSET152-based integration vector and SCP2*-based replication vectors, and its expression level increased from 0.1 mg/L in the original Sorangium cellulosum system to 20 mg/L in the epothilone BGC-expressing Streptomyces host. S. coelicolor and S. lividans were the two major strains for heterologous expression, thanks to their well-characterized genetic and biochemical properties. About 12% of BGCs were expressed in another popular heterologous host, S. albus, which grows fast and has an efficient genetic system (Zaburannyi et al., 2014). Compared with the original NP-producing strains, approximately 14% of NPs had a higher expression level and 12% had a lower one when expressed in the heterologous hosts. When the berninamycin BGC was heterologously expressed in both S. lividans and S. venezuelae, its production yield increased 2.4-fold in S. lividans, with no production in S. venezuelae (Malcolmson et al., 2013).
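The yield improvements quoted in this section are fold changes relative to the original producer. As a minimal sketch of that arithmetic, the snippet below uses the epothilone titers given above; the helper name is hypothetical.

```python
def fold_change(original_mg_per_l, heterologous_mg_per_l):
    """Ratio of the heterologous-host titer to the original-producer titer."""
    if original_mg_per_l <= 0:
        raise ValueError("original titer must be positive")
    return heterologous_mg_per_l / original_mg_per_l


# Epothilone: 0.1 mg/L in Sorangium cellulosum, 20 mg/L in the Streptomyces host.
epothilone_fold = fold_change(0.1, 20.0)
print(f"epothilone: {epothilone_fold:.0f}-fold")
```

For the epothilone case this corresponds to roughly a 200-fold increase, far larger than the 2.4-fold change reported for the other example in this section.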

TAR System
The TAR system takes advantage of the natural in vivo homologous recombination of Saccharomyces cerevisiae (Larionov et al., 1994). It has also been applied to capture and express large biosynthetic gene clusters from environmental DNA samples (Kim et al., 2010). Yamanaka and colleagues designed a TAR cloning vector, pCAP01, which consists of three elements, one each from yeast, E. coli, and actinobacteria (Yamanaka et al., 2014). The target BGC can be directly captured and manipulated in the yeast background, and the captured BGC can be shuttled between E. coli and actinobacterial species. The vector also has a pUC ori that can stably carry an insert of over 50 kb in E. coli hosts. In addition, pCAP01 contains oriT and attP-int, which allow the target BGC to be transferred by conjugation and stably maintained by insertion into the heterologous host chromosome. To generate a capturing vector, both flanking homologous arms of the target BGC were PCR-amplified and cloned into pCAP01. The linearized capturing vector and the restriction enzyme-digested genomic DNA were co-transformed into yeast, and the target BGC was then captured by yeast recombination activities (Figure 1A). The marinopyrrole BGC (30 kb) and the taromycin A BGC (67 kb) were captured by this TAR system and functionally expressed in Streptomyces coelicolor (Yamanaka et al., 2014).
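The first design step of a capturing vector, choosing the two homology arms that flank the target BGC, can be sketched as follows. This is an illustration under stated assumptions rather than the published protocol: the arms are assumed to lie immediately outside the cluster boundaries, the arm length is an arbitrary parameter, and the sequence and coordinates are toy values.

```python
def flanking_arms(genome, bgc_start, bgc_end, arm_len):
    """Return (upstream_arm, downstream_arm) bordering genome[bgc_start:bgc_end].

    Coordinates are 0-based and end-exclusive. The arms are taken from the
    sequence immediately outside the cluster (an assumption of this sketch).
    """
    if not 0 <= bgc_start < bgc_end <= len(genome):
        raise ValueError("BGC coordinates fall outside the genome sequence")
    if bgc_start - arm_len < 0 or bgc_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the requested arms")
    upstream = genome[bgc_start - arm_len:bgc_start]
    downstream = genome[bgc_end:bgc_end + arm_len]
    return upstream, downstream


# Toy sequence: 10 nt filler, 4 nt left arm, 20 nt "BGC", 4 nt right arm, 10 nt filler.
toy_genome = "A" * 10 + "CCCC" + "G" * 20 + "TTTT" + "A" * 10
left_arm, right_arm = flanking_arms(toy_genome, bgc_start=14, bgc_end=34, arm_len=4)
print(left_arm, right_arm)
```

In practice the arms would be much longer than in this toy case and would then be amplified by PCR and cloned into the vector, as described above.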

IR System
Most systems for cloning a large DNA fragment directly from a bacterial genome are based on site-specific recombination systems that consist of a specialized recombinase and its target sites. The IR system is based on BT1 integrase-mediated site-specific recombination and simultaneous Streptomyces genome engineering (Du et al., 2015). The actinorhodin BGC, the napsamycin BGC, and the daptomycin BGC were successfully isolated by the IR system (Du et al., 2015). A pUC119-based suicide vector and pKC1139, carrying mutated attB and attP, respectively, and an integrative plasmid containing the BT1 integrase gene were used for the system (Figure 1B). The pUC119-based plasmid carrying mutated attB and a region homologous to the 5′ end of the target BGC was introduced into the chromosome by single crossover. The pKC1139 carrying mutated attP and a region homologous to the 3′ end of the BGC was transferred by conjugation and integrated into the chromosome by single crossover through cultivation at a temperature above 34°C. Expression of BT1 integrase leads to excision of the pKC1139 containing the target BGC. The pKC1139 containing the BGC was then extracted from the original producing Streptomyces and transferred into E. coli for recovery. The IR system has been demonstrated only in the parental strain and not in a heterologous host, but the BGC-containing plasmid is presumed to be transferable to, and maintained by replication in, a heterologous host (Du et al., 2015).

pSBAC Vector System
In the early 1990s, bacterial artificial chromosomes (BACs) capable of carrying inserts approaching 200 kb in length were reported (Shizuya et al., 1992). Various BAC vectors have been used extensively for construction of DNA libraries to facilitate physical genomic mapping and DNA sequencing efforts (Sosio et al., 2000; Martinez et al., 2004; Fuji et al., 2014; Varshney et al., 2014). Several E. coli-Streptomyces shuttle BAC vectors, such as pStreptoBAC V and pSBAC, have been developed to carry large NP BGCs (Miao et al., 2005; Liu et al., 2009). The utility of pSBAC was demonstrated through the precise cloning and heterologous expression of the tautomycetin BGC and the pikromycin BGC of the type I PKS biosynthetic pathway, as well as the meridamycin BGC of the PKS-NRPS hybrid biosynthetic pathway (Liu et al., 2009; Nah et al., 2015). Unique restriction enzyme recognition sites, either naturally existing or artificially inserted into both flanking regions of the entire BGC, were used for capturing the BGCs. The pSBAC vector was also inserted within the unique restriction enzyme site by homologous recombination. The entire target BGC was then captured in a single pSBAC through straightforward single restriction enzyme digestion and self-ligation (Figure 1C). The pSBAC contains two replication origins, ori2 and oriV, for DNA stability in E. coli, as well as oriT and φC31 attP-int for BGC integration into the surrogate host chromosome through intergeneric conjugation. Recombinant pSBACs containing large BGCs ranging from 40 kb to over 100 kb have been successfully cloned and conjugated from E. coli to S. coelicolor and S. lividans (Liu et al., 2009; Nah et al., 2015), suggesting that the pSBAC system is the most suitable for large BGC cloning compared with the TAR and IR systems.
Recently, a new cloning method named CATCH (Cas9-Assisted Targeting of Chromosome), based on the in vitro application of RNA-guided Cas9 nuclease, was developed (Jiang and Zhu, 2016). The Cas9 nuclease cleaves target DNA in vitro from intact bacterial chromosomes embedded in agarose plugs, and the excised fragment can subsequently be joined to a cloning vector through Gibson assembly. Jiang and colleagues cloned the 36-kb jadomycin BGC from S. venezuelae and the 32-kb chlortetracycline BGC from S. aureofaciens by CATCH (Jiang et al., 2015).
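The single-digestion capture step works only if the chosen enzyme cuts once in each flanking region and nowhere inside the cluster. A minimal sketch of that check is given below. The recognition sequences are the standard ones for the named enzymes, but the toy sequences, the function names, and the choice of candidate enzymes are illustrative assumptions, not details taken from the cited studies.

```python
# Standard recognition sequences (palindromic, so scanning one strand suffices).
SITES = {"XbaI": "TCTAGA", "EcoRI": "GAATTC", "HindIII": "AAGCTT"}


def site_positions(seq, site):
    """Return 0-based start positions of every occurrence of site in seq."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def usable_for_capture(left_flank, bgc, right_flank, site):
    """True if the site occurs exactly twice: once wholly inside each flank."""
    full = left_flank + bgc + right_flank
    hits = site_positions(full, site)
    if len(hits) != 2:
        return False
    in_left = hits[0] + len(site) <= len(left_flank)
    in_right = hits[1] >= len(left_flank) + len(bgc)
    return in_left and in_right


# Toy sequences: XbaI sites in both flanks, one EcoRI site inside the "BGC".
left = "GGCC" + "TCTAGA" + "GGCC"
cluster = "ATGCCGGAATTCCGGTAA"
right = "CCGG" + "TCTAGA" + "CCGG"

usable = {name: usable_for_capture(left, cluster, right, s) for name, s in SITES.items()}
print(usable)
```

In this toy case only XbaI qualifies: EcoRI cuts inside the cluster, and HindIII has no site in either flank, which is the situation where a site would have to be inserted artificially.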
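Choosing where Cas9 cuts on each side of a cluster amounts to finding guide target sites in the flanking sequence. The sketch below assumes the widely used Streptococcus pyogenes Cas9 rules (a 20-nt protospacer followed by an NGG PAM, with cleavage 3 bp upstream of the PAM) and scans only the forward strand; whether these settings match the cited CATCH experiments is an assumption, and the sequence is a toy example.

```python
def candidate_guides(seq, guide_len=20):
    """List forward-strand Cas9 target sites in seq.

    Each hit is (start, protospacer, pam, cut_index), where cut_index is the
    0-based position of the first base after the blunt cut, taken here as
    3 bp upstream of the PAM.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - guide_len - 2):
        pam = seq[i + guide_len:i + guide_len + 3]
        if pam[1:] == "GG":  # NGG PAM: any base followed by two guanines
            protospacer = seq[i:i + guide_len]
            hits.append((i, protospacer, pam, i + guide_len - 3))
    return hits


# Toy flank: a 20-nt protospacer, a TGG PAM, then filler.
toy_flank = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
guides = candidate_guides(toy_flank)
print(guides)
```

A real design would also scan the reverse strand and check each guide for off-target matches elsewhere in the genome, which this sketch leaves out.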

STREPTOMYCES HETEROLOGOUS EXPRESSION OF NP BGCS
The Streptomyces genus is suitable for heterologous expression of large NP BGCs due to its intrinsic ability to produce various valuable secondary metabolites. Well-studied Streptomyces strains such as S. coelicolor, S. lividans, and S. albus have mainly been used as surrogate hosts for heterologous expression (Table 1). The regulatory networks of secondary metabolite production have been well characterized in these strains, and thus several high-level NP-producing strains have been constructed (Baltz, 2010; Gomez-Escribano and Bibb, 2011). In addition, some of these Streptomyces host genomes have been further engineered to eliminate precursor-competing BGCs, so that extra precursors such as malonyl-CoA and acetyl-CoA can be funneled into the biosynthesis of the target polyketide NP (Gomez-Escribano and Bibb, 2011).
As shown in Table 1, most of the heterologously expressed NPs were detected as a final product, but some were detected as an intermediate due to partial BGC expression. The NP production yield was similar to or slightly lower than that in the wild type (WT). To increase the production level in heterologous hosts, strategies such as substituting native promoters with strong promoters or increasing the copy number of BGCs were devised (Montiel et al., 2015; Nah et al., 2015). In the case of the pSBAC system, the tautomycetin production yield in the heterologous hosts was similar to that in the original producing strain. The selection marker on the tautomycetin BGC construct was changed, and the construct was re-introduced into the heterologous host as a tandem repeat, which increased the yield from 3.05 to 13.31 mg/L compared with the heterologous host harboring only a single copy of the tautomycetin BGC. The heterologous host harboring tandem copies of the tautomycetin BGC was shown to stably maintain two BGCs in the presence of appropriate antibiotic selection (Nah et al., 2015).
Meanwhile, the TAR system has been used for yeast homologous recombination-based promoter engineering to activate silent natural product BGCs (Montiel et al., 2015). Bi-directional promoter cassettes containing promoter-insulator-RBS combinations were generated by PCR amplification of various yeast selectable markers, and they were co-transformed into yeast with the cosmid or BAC clone harboring the target BGC. The rebeccamycin BGC was used as a model BGC. The promoter-replaced rebeccamycin BGC was transferred into S. albus by conjugation, and the production of rebeccamycin was examined in the heterologous host (Montiel et al., 2015). Using the TAR-based promoter engineering strategy, multiple promoter cassettes could be inserted simultaneously into the target BGC, thereby expediting the re-engineering process. The TAR-based promoter engineering strategy was also used to capture the silent tetarimycin BGC and the silent, cryptic, pseudogene-containing, environmental DNA-derived lazarimide BGC (Montiel et al., 2015).
In conclusion, Streptomyces heterologous expression systems have proven to be a very attractive strategy for awakening cryptic NP BGCs, and they could also be applied to the overexpression of a variety of large NP BGCs in actinomycetes.

AUTHOR CONTRIBUTIONS
HN, SK, SC, and EK planned, outlined, and revised the manuscript. HN, HP, and EK wrote and revised the manuscript.