Original Research ARTICLE
Talaropeptides A-D: Structure and Biosynthesis of Extensively N-methylated Linear Peptides From an Australian Marine Tunicate-Derived Talaromyces sp.
- 1Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, Australia
- 2Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD, Australia
- 3Joint BioEnergy Institute, Emeryville, CA, United States
An Australian marine tunicate-derived fungus, Talaromyces sp. CMB-TU011, was subjected to a program of analytical microbioreactor (MATRIX) cultivations, supported by UHPLC-QTOF profiling, to reveal conditions for producing a new class of extensively N-methylated 11-12 residue linear peptides, talaropeptides A-D (2-5). The structures for 2-5, inclusive of absolute configurations, were determined by a combination of detailed spectroscopic and chemical (e.g., C3 and C18 Marfey's) analyses. We report on the biological properties of 2-5, including plasma stability, as well as antibacterial, antifungal and cytotoxic activity. The talaropeptide mega non-ribosomal peptide synthetase (NRPS) is also described, and is second only in size to that for the fungus-derived immunosuppressant cyclosporine (an 11-residue extensively N-methylated cyclic peptide).
In an earlier study, we described the structure elucidation of a first-in-class cyclic hexapeptide containing a rare hydroxamate residue, talarolide A (1) (Figure 1), isolated from an Australian marine tunicate-derived fungus, Talaromyces sp. CMB-TU011 (Dewapriya et al., 2017). In an effort to optimize the production of 1, we now report on a 24-well microbioreactor cultivation analysis (known in-lab as the MATRIX), using a combination of 11 different media and 3 phases (i.e., solid agar, as well as static and shaken broth). In situ solvent extraction on individual MATRIX culture wells yielded 33 extracts, which were subjected to UHPLC-DAD and UHPLC-QTOF-MS/MS analysis. While this study successfully revealed optimal conditions for the production of 1, including new analogs (work-in-progress), it also revealed conditions where talarolide production was fully suppressed in favor of a new class of extensively N-methylated linear peptides. This report provides an account of the production, isolation and characterisation of these new peptides, talaropeptides A-D (2-5), where structure elucidation inclusive of absolute configurations was achieved by a combination of detailed spectroscopic and chemical analyses. We also take the opportunity to report on the biological properties of 2-5, and document the mega non-ribosomal peptide synthetase (NRPS) responsible for the biosynthesis of talaropeptides.
Materials and Methods
General Experimental Details
Specific rotations ([α]D) were measured on a JASCO P-1010 polarimeter in a 100 × 2 mm cell at room temperature. NMR spectra were acquired on a Bruker Avance 600 MHz spectrometer with either a 5 mm PASEL 1H/D–13C Z-Gradient probe or a 5 mm CPTCI 1H/19F-13C/15N/D Z-Gradient cryoprobe, controlled by TopSpin 2.1 software. In all cases, NMR spectra were acquired at 25°C (unless otherwise specified) in hexadeuterated dimethylsulfoxide (DMSO-d6) or tetradeuterated methanol (methanol-d4) with referencing to residual 1H (δH 2.50 and δH 3.31, respectively) or 13C (δC 39.51 and δC 49.15, respectively) NMR resonances. HPLC-DAD-ESIMS data were acquired on an Agilent 1100 series separation module equipped with an Agilent 1100 series HPLC/MSD mass detector and Agilent diode array detector. Semi-preparative and preparative HPLCs were performed using Agilent 1100 series HPLC instruments with corresponding detectors, fraction collectors, and software. HRESI(+)MS spectra were obtained on a Bruker micrOTOF mass spectrometer by direct injection in MeCN at 3 μL/min using sodium formate clusters as an internal calibrant. UHPLC-QTOF analysis was performed on an instrument comprising an Agilent 1290 Infinity II UHPLC equipped with a Zorbax C8 column (2.1 mm × 50 mm, 1.8 μm particles), running with H2O/MeCN inclusive of 0.1% formic acid, coupled to an Agilent 6545 Q-TOF. MS/MS analysis was performed on the same instrument for ions detected in the full scan at an intensity above 1,000 counts at 10 scans/s, with an isolation width of ~4 m/z, using a fixed collision energy and a maximum of 3 selected precursors per cycle. Nα-(2,4-dinitro-5-fluorophenyl)-L-alaninamide (L-FDAA, synonym 1-fluoro-2,4-dinitrophenyl-5-L-alanine amide) and Nα-(2,4-dinitro-5-fluorophenyl)-D-alaninamide (D-FDAA, synonym 1-fluoro-2,4-dinitrophenyl-5-D-alanine amide) were purchased from NovaBiochem. Amino acids and standards were purchased from NovaBiochem, Bachem Biosciences, Sigma, Fluka, or Merck.
Fungus Isolation and Taxonomy
Talaromyces sp. CMB-TU011 was isolated from an unidentified tunicate collected near Tweed Heads, NSW, Australia, and taxonomically identified as previously reported (Dewapriya et al., 2017).
Analytical (MATRIX) Cultivation of Talaromyces sp. CMB-TU011
Talaromyces sp. CMB-TU011 was cultured in a 24-well microbioreactor plate (Khalil et al., 2014) using a combination of 11 culture media and 3 phases (i.e., solid agar, liquid static and liquid shaken), known in-lab as the MATRIX (Table S1). Briefly, a sterile loop was used to transfer mycelia/spores from an agar plate cultivation to a 24-well microbioreactor plate (2.5 mL agar for solid cultures, 1.5 mL of broth for liquid cultures). The microbioreactor plates were sealed with air permeable membranes and incubated at 26.5°C for 10 days (190 rpm for shaken broth). The resulting 33 cultures were extracted in situ with EtOAc (2.0 mL/well), with the decanted solvent filtered and dried under N2. Secondary metabolite production was then analyzed by HPLC-DAD-ESIMS and UHPLC-QTOF.
Scale-Up Cultivation of Talaromyces sp. CMB-TU011 and Isolation of 2-5
Agar cubes (~1 cm2) recovered from 7 day CMB-TU011 agar plate cultures (3.3% artificial sea salt containing M1 medium) were used to inoculate 10 flasks (500 mL) charged with YES broth (160 mL). Individual flasks were covered with air permeable sterile cotton plugs, and incubated under static conditions for 10 days at 26.5°C, after which the combined broths/mycelia were extracted with EtOAc (4 × 500 mL) to yield a crude extract (1.65 g) which was partitioned between hexane (200 mL) and 1% aqueous MeOH (50 mL) and dried in vacuo to yield hexane (495 mg) and MeOH (1.15 g) solubles. The MeOH solubles were further fractionated by gel chromatography (Sephadex® LH-20, MeOH) into 20 fractions, which were selectively combined on the basis of HPLC-DAD-ESIMS analysis (Zorbax SB-C8 column, analytical gradient 90% H2O/MeCN – 100% MeCN inclusive of an isocratic 0.05% formic acid) to yield a fraction of interest (49 mg) that was resolved by optimized semi-preparative HPLC (Zorbax SB C3 column (9.4 mm × 25 cm), 40% MeCN/H2O elution at 3.0 mL/min inclusive of an isocratic 0.01% TFA modifier) to yield talaropeptide A (2) (tR = 7.11 min, 1.3 mg), talaropeptide B (3) (tR = 8.78 min, 1.3 mg), talaropeptide C (4) (tR = 16.89 min, 1.8 mg), and talaropeptide D (5) (tR = 22.96 min, 2.8 mg) (Supplementary Scheme S1).
Talaropeptide A (2): white powder; [α]D −135.8 (c 0.04, MeOH); 1D and 2D NMR (600 MHz, DMSO-d6) see Table 1 and Supplementary Material; HRESI(+)MS m/z 1254.8435 [M+H]+ (calcd for C65H112N11O13, 1254.8436).
Talaropeptide B (3): white powder; [α]D −110.0 (c 0.05, MeOH); 1D and 2D NMR (600 MHz, methanol-d4) see Table 2 and Supplementary Material; HRESI(+)MS m/z 1375.8920 [M+Na]+ (calcd for C70H120N12O14Na, 1375.8939).
Talaropeptide C (4): white powder; [α]D −151.6 (c 0.05, MeOH); 1D and 2D NMR (600 MHz, methanol-d4) see Table 3 and Supplementary Material; HRESI(+)MS m/z 1318.8376 [M+Na]+ (calcd for C67H113N11O14Na, 1318.8361).
Talaropeptide D (5): white powder; [α]D22 −182.9 (c 0.05, MeOH); 1D and 2D NMR (600 MHz, methanol-d4) see Table 4 and Supplementary Material; HRESI(+)MS m/z 1417.9052 [M+Na]+ (calcd for C72H122N12O15Na, 1417.9045).
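The calculated m/z values quoted above can be reproduced from the ion formulae. The following is a minimal sketch, assuming standard monoisotopic atomic masses and singly charged cations; the function names (`ion_mz`, `delta_mmu`) are illustrative and not part of the original analysis.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass.
MONO = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "Na": 22.9897692820,
}
ELECTRON = 0.000548579909


def ion_mz(ion_formula, charge=1):
    """m/z of a cation; ion_formula already includes the adduct (H or Na)."""
    mass = sum(MONO[element] * count for element, count in ion_formula.items())
    return (mass - charge * ELECTRON) / charge


def delta_mmu(observed_mz, ion_formula):
    """Observed minus calculated m/z, in milli-mass units."""
    return (observed_mz - ion_mz(ion_formula)) * 1000.0


# Ion formulae and observed m/z values as reported for 2-5.
IONS = {
    "2 [M+H]+": ({"C": 65, "H": 112, "N": 11, "O": 13}, 1254.8435),
    "3 [M+Na]+": ({"C": 70, "H": 120, "N": 12, "O": 14, "Na": 1}, 1375.8920),
    "4 [M+Na]+": ({"C": 67, "H": 113, "N": 11, "O": 14, "Na": 1}, 1318.8376),
    "5 [M+Na]+": ({"C": 72, "H": 122, "N": 12, "O": 15, "Na": 1}, 1417.9052),
}

for label, (formula, observed) in IONS.items():
    print(f"{label}: calcd {ion_mz(formula):.4f}, "
          f"delta {delta_mmu(observed, formula):+.1f} mmu")
```

The differences between observed and calculated values obtained this way can be compared with the Δmmu values quoted in the structure elucidation sections.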
Marfey's analyses were carried out following the published method (Vijayasarathy et al., 2016). Briefly, an aliquot (50 μg) of each talaropeptide in 6 M HCl (100 μL) was heated to 100°C in a sealed vial for 12 h, after which the hydrolysate was concentrated to dryness at 40°C under a stream of dry N2. The hydrolysate was then treated with 1 M NaHCO3 (20 μL) and L-FDAA (1-fluoro-2,4-dinitrophenyl-5-L-alanine amide) as a 1% solution in acetone (40 μL) at 40°C for 1 h, after which the reaction was neutralized with 1 M HCl (20 μL) and filtered (0.45 μm PTFE) to generate an analyte.
C3 Marfey's analysis. An aliquot (10 μL) of each analyte was subjected to HPLC-DAD-MS analysis (Agilent Zorbax SB-C3 column, 5 μm, 4.6 × 150 mm, 50°C, with a 1 mL/min, 55 min linear gradient elution from 15–60% MeOH/H2O with a 5% isocratic modifier of 1% formic acid in MeCN) with amino acid content assessed by DAD (340 nm) and ESI(±)MS monitoring, supported by SIE (single ion extraction) methodology, with comparison to authentic standards.
C18 Marfey's analysis. An aliquot (10 μL) of each analyte was subjected to HPLC-DAD analysis (Agilent Zorbax SB-C18 HPLC column, 5 μm, 4.6 × 150 mm, 50°C, with a 1 mL/min, 50 min isocratic elution of 21% MeOH/H2O for N-Me-Ala and 34% MeOH/H2O for N-Me-Phe, with a 5% isocratic modifier of 1% formic acid in MeCN) with amino acid content assessed by DAD (340 nm), with comparison to authentic standards.
Genome Mining of Talaromyces sp. CMB-TU011
Genomic DNA from Talaromyces sp. CMB-TU011 was extracted using a standard chloroform protocol (Nikodinovic et al., 2003). The extracted DNA was fragmented using a Covaris focused ultrasonicator and the resulting fragments (~1 kb) were used for library construction using a ThruPLEX DNA-seq kit (Rubicon Genomics). The library was sequenced using a NextSeq platform in the paired-end (2 × 150) format to yield a total of 6,674,290 reads (1 Gb). The raw reads were filtered and trimmed using Trimmomatic v0.36 (Bolger et al., 2014) to yield a total of 5,821,558 high quality reads (0.873 Gb), which were assembled using the Velvet 1.2.10 (Zerbino and Birney, 2008), ABySS v.2.0.3 (Simpson et al., 2009) and SPAdes v3.11.1 (Bankevich et al., 2012) assemblers with k-mer sizes between 41 and 121, in increments of 10. The best assembly (Velvet with k-mer = 51) was annotated for natural product biosynthetic gene clusters using the fungal implementation of antiSMASH 4.0 (Blin et al., 2017). The output was manually curated and domain annotation was improved using Pfam (Finn et al., 2016) and the NCBI Conserved Domain Search tool (Marchler-Bauer et al., 2017). Adenylation domain specificity was predicted using the LSI based A-domain functional predictor (Baranašić et al., 2014). Manual sequence curation was done using the Artemis Genome Browser (Rutherford et al., 2000).
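The sequencing yields and the k-mer sweep described above can be checked with simple arithmetic. This is a rough sketch only: it assumes the quoted read counts refer to individual 150-base reads and treats every read as full length (an upper bound after trimming); the names `total_bases` and `KMER_SIZES` are illustrative.

```python
READ_LENGTH = 150  # bases per read in the 2 x 150 paired-end format


def total_bases(n_reads, read_length=READ_LENGTH):
    """Upper bound on sequenced bases, treating every read as full length."""
    return n_reads * read_length


raw_gb = total_bases(6_674_290) / 1e9      # raw reads, in gigabases
trimmed_gb = total_bases(5_821_558) / 1e9  # reads retained after trimming

# k-mer sizes from 41 to 121 inclusive, in increments of 10.
KMER_SIZES = list(range(41, 122, 10))

print(f"raw: {raw_gb:.3f} Gb, trimmed: {trimmed_gb:.3f} Gb")
print("k-mer sizes trialled:", KMER_SIZES)
```

Under these assumptions the sweep comprises nine k-mer sizes per assembler, and includes the k-mer of 51 used for the selected Velvet assembly.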
The bacterium to be tested was streaked onto a tryptic soy agar plate and was incubated at 37°C for 24 h. One colony was then transferred to fresh tryptic soy broth (15 mL) and the cell density was adjusted to 10^4-10^5 CFU/mL. The compounds to be tested were dissolved in DMSO and diluted with H2O to give a 600 μM stock solution (20% DMSO), which was serially diluted with 20% DMSO to give concentrations from 600 μM to 0.2 μM in 20% DMSO. An aliquot (10 μL) of each dilution was transferred to a 96-well microtiter plate and freshly prepared microbial broth (190 μL) was added to each well to give final concentrations of 30–0.01 μM in 1% DMSO. The plates were incubated at 37°C for 24 h and the optical density of each well was measured spectrophotometrically at 600 nm using a POLARstar Omega plate reader (BMG LABTECH, Offenburg, Germany). Each test compound was screened against the Gram-negative bacteria Escherichia coli ATCC 11775 and Pseudomonas aeruginosa ATCC 10145 and the Gram-positive bacteria Staphylococcus aureus ATCC 25923 and Bacillus subtilis ATCC 6051. Rifampicin was used as a positive control (40 μg/mL in 10% DMSO). The IC50 value was calculated as the concentration of the compound or antibiotic required for 50% inhibition of the bacterial cells using Prism 7.0 (GraphPad Software Inc., La Jolla, CA).
The fungus Candida albicans ATCC 10231 was streaked onto a Sabouraud agar plate and was incubated at 37°C for 48 h. One colony was then transferred to fresh Sabouraud broth (15 mL) and the cell density adjusted to 10^4-10^5 CFU/mL. Test compounds were dissolved in DMSO and diluted with H2O to give a 600 μM stock solution (20% DMSO), which was serially diluted with 20% DMSO to give concentrations from 600 to 0.2 μM in 20% DMSO. An aliquot (10 μL) of each dilution was transferred to a 96-well microtiter plate and freshly prepared fungal broth (190 μL) was added to each well to give final concentrations of 30–0.01 μM in 1% DMSO. The plates were incubated at 37°C for 24 h and the optical density of each well was measured spectrophotometrically at 600 nm using a POLARstar Omega plate reader (BMG LABTECH, Offenburg, Germany). Amphotericin B was used as a positive control (30 μg/mL in 10% DMSO). Where relevant, IC50 values were calculated as the concentration of the compound or antifungal drug required for 50% inhibition of the fungal cells using Prism 7.0 (GraphPad Software Inc., La Jolla, CA).
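The final assay concentrations follow directly from the 10 μL into 190 μL dilution step. A minimal sketch of that arithmetic is given below; the function names are illustrative, and ideal volume additivity is assumed.

```python
def final_concentration(stock_um, aliquot_ul=10.0, broth_ul=190.0):
    """Concentration (uM) in the well after adding broth to a stock aliquot."""
    return stock_um * aliquot_ul / (aliquot_ul + broth_ul)


def final_dmso_percent(stock_dmso_percent=20.0, aliquot_ul=10.0, broth_ul=190.0):
    """DMSO content (% v/v) in the well, assuming the broth contains no DMSO."""
    return stock_dmso_percent * aliquot_ul / (aliquot_ul + broth_ul)


highest = final_concentration(600.0)  # top of the dilution series
lowest = final_concentration(0.2)     # bottom of the dilution series
dmso = final_dmso_percent()

print(f"final range: {highest:g} to {lowest:g} uM in {dmso:g}% DMSO")
```

The same 20-fold dilution applies to every point in the series, so the well concentrations are simply the stock concentrations divided by 20.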
Adherent SW620 (human colorectal carcinoma) and NCI-H460 (human lung carcinoma) cells were cultured in Roswell Park Memorial Institute (RPMI) 1640 medium. All cells were cultured as adherent mono-layers in flasks supplemented with 10% fetal bovine serum, L-glutamine (2 mM), penicillin (100 units/mL), and streptomycin (100 μg/mL), in a humidified 37°C incubator supplied with 5% CO2. Briefly, cells were harvested with trypsin and dispensed into 96-well microtiter assay plates at 3,000 cells/well, after which they were incubated for 18 h at 37°C with 5% CO2 (to allow cells to attach as adherent mono-layers). Test compounds were dissolved in 20% DMSO in PBS (v/v) and aliquots (10 μL) applied to cells over a series of final concentrations ranging from 10 nM to 30 μM. After 48 h incubation at 37°C with 5% CO2, an aliquot (20 μL) of 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) in phosphate buffered saline (PBS, 5 mg/mL) was added to each well (final concentration 0.5 mg/mL), and microtiter plates were incubated for a further 4 h at 37°C with 5% CO2. After final incubation, the medium was aspirated and precipitated formazan crystals dissolved in DMSO (100 μL/well). The absorbance of each well was measured at 580 nm with a PowerWave XS Microplate Reader from Bio-Tek Instruments Inc. Where relevant, IC50 values were calculated using Prism 7.0, as the concentration of analyte required for 50% inhibition of cancer cell growth (compared to negative controls). The negative control was 1% aqueous DMSO, while the positive control was doxorubicin (30 μM). All experiments were performed in duplicate.
Plasma Stability Assay
An aliquot of talaropeptide D (5) (10 μL, 1 mM in DMSO) was added to rat plasma (190 μL) pooled from >3 different rats, and heated to 37°C in a circulating water bath. Aliquots (20 μL) were taken at time points 0, 60, 120, and 180 min and added to MeCN (80 μL). Samples were centrifuged at 13,000 g for 3 min, and the supernatants concentrated to dryness under N2. After resuspending in MeOH (30 μL), an aliquot (1 μL) was analyzed by UHPLC-QTOF (MS), to detect and quantify residual talaropeptide D (5).
Production and Isolation
UHPLC-QTOF analysis of MATRIX cultivations (i.e., microbioreactor well, Figure S1) revealed that YES static broth cultivation was optimum for the production of talaropeptides (Figures S2–S4). Reversed phase HPLC-DAD-ESIMS analysis of a 10 day YES static broth cultivation of CMB-TU011 revealed complete suppression of talarolide A (1) biosynthesis, in favor of four new higher molecular weight putative peptides, eluting in the order m/z 1254.8 (2), 1353.8 (3), 1318.8 (4), and 1417.9 (5) (Figure S6). Subsequent studies confirmed the production of 2–5 in YES static broth flask cultivations (80 mL broths in 250 mL flasks), with a >10-fold increase in production in flasks sealed with an air permeable cotton plug, as opposed to an air impermeable screw cap (Figure S5). Based on these results, scaled up production (160 mL broth in 10 × 500 mL flasks) successfully yielded a crude EtOAc extract (1.65 g), which was subjected to solvent trituration to yield hexane (495 mg) and MeOH (1.15 g) solubles. With analytical HPLC-DAD-ESIMS localizing 2-5 in the MeOH solubles, this material was subjected to gel chromatography (Sephadex LH-20, MeOH) followed by semi-preparative reversed phase HPLC chromatography, to yield talaropeptides A (2, 1.3 mg), B (3, 1.3 mg), C (4, 1.8 mg), and D (5, 2.8 mg) (Figure 1).
Talaropeptide A (2)
HRESI(+)MS analysis of 2 returned a protonated molecular ion attributed to a molecular formula (C65H111N11O13, Δmmu −0.1) requiring 16 double bond equivalents (DBEs). Consistent with its putative peptide status, C3 and C18 Marfey's analyses (Figure 2), together with careful consideration of 1D and 2D NMR (DMSO-d6) data (Table 1, Figures S8–S13), confirmed the presence of 11 amino acid residues [L-Thr, L-Pro, L-Val, L-Leu, N-Me-L-Ala, N-Me-L-Val (×4), N-Me-L-Phe and N-Me-L-Ile]. Whereas the C3 Marfey's method proved very effective at discriminating most amino acids, and in particular L vs D N-Me-Ile and N-Me-allo-Ile (Figure 2H), the C18 Marfey's method was needed to discriminate L vs D N-Me-Ala (Figure 2D, inset). The presence of multiple (×4) N-Me-L-Val residues was apparent from the complex array of isopropyl methyl resonances in the 1H NMR data for 2 (Table 1). This assessment of the amino acid content in 2 accounted for all DBEs and was indicative of a linear undecapeptide. Although overlapping 1D NMR resonances prevented assignment of the complete amino acid sequence, diagnostic 2D NMR HMBC and ROESY correlations did identify a number of partial sequences: (i) N-Me-L-Ala1-N-Me-L-Val2-N-H (e.g., an HMBC correlation from N-Me-L-Val2 to C-1 in N-Me-L-Ala1); (ii) L-Thr4-N-Me-L-Val5 (e.g., a ROESY correlation between H-2 in L-Thr4 and N-Me-L-Val5); (iii) L-Pro6-N-Me-L-Val7-N-Me-L-Val8-N-Me-L-Phe9 (e.g., HMBC correlations from H-2 in N-Me-L-Val7 to C-1 in L-Pro6, from N-Me-L-Val8 to C-1 in N-Me-L-Val7, and from H-2 in N-Me-L-Phe9 to C-1 in N-Me-L-Val8); (iv) N-Me-L-Ile10-L-Leu11 (e.g., an HMBC correlation from the N-H in L-Leu11 to C-1 in N-Me-L-Ile10) (see Figure 3).
While the HMBC and ROESY data failed to link fragments (i–iv), or locate L-Val3, these issues were ultimately resolved by diagnostic UHPLC-QTOF (MS/MS) fragmentation patterns, which identified two consolidated partial sequences; (v) N-Me-L-Ala1-N-Me-L-Val2-L-Val3-L-Thr4-N-Me-L-Val5 and (vi) N-Me-L-Val8-N-Me-L-Phe9-N-Me-L-Ile10-L-Leu11 (Figure 3, Figure S14). Assembly of the partial sequences i-vi returned the complete structure for talaropeptide A (2) as shown (Figure 1).
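The statement that the amino acid content accounts for all DBEs and points to a linear peptide can be illustrated by summing residue compositions. The sketch below assumes standard residue formulae (amino acid minus H2O) and the usual DBE expression for CHNO compounds; the helper names are illustrative only.

```python
from collections import Counter

# Elemental compositions of the residues (free amino acid minus H2O).
RESIDUES = {
    "Thr": {"C": 4, "H": 7, "N": 1, "O": 2},
    "Pro": {"C": 5, "H": 7, "N": 1, "O": 1},
    "Val": {"C": 5, "H": 9, "N": 1, "O": 1},
    "Leu": {"C": 6, "H": 11, "N": 1, "O": 1},
    "N-Me-Ala": {"C": 4, "H": 7, "N": 1, "O": 1},
    "N-Me-Val": {"C": 6, "H": 11, "N": 1, "O": 1},
    "N-Me-Phe": {"C": 10, "H": 11, "N": 1, "O": 1},
    "N-Me-Ile": {"C": 7, "H": 13, "N": 1, "O": 1},
}

# Residue order 1-11 as assigned for talaropeptide A (2).
TALAROPEPTIDE_A = [
    "N-Me-Ala", "N-Me-Val", "Val", "Thr", "N-Me-Val", "Pro",
    "N-Me-Val", "N-Me-Val", "N-Me-Phe", "N-Me-Ile", "Leu",
]


def linear_peptide_formula(sequence):
    """Sum the residue compositions and add H2O for the free termini."""
    total = Counter({"H": 2, "O": 1})
    for name in sequence:
        total.update(RESIDUES[name])
    return dict(total)


def double_bond_equivalents(formula):
    """DBE = C - H/2 + N/2 + 1 (oxygen does not contribute)."""
    return formula["C"] - formula["H"] / 2 + formula["N"] / 2 + 1


formula = linear_peptide_formula(TALAROPEPTIDE_A)
print(formula, double_bond_equivalents(formula))
```

In this accounting the DBEs correspond to eleven carbonyls, the Pro ring, and the four unsaturations of the Phe phenyl ring; a cyclic peptide built from the same residues would lack the added H2O and require one further DBE.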
Figure 2. Marfey's analysis for talaropeptide A (2). (A) C3 HPLC-DAD (340 nm) chromatogram of L-FDAA derivatised hydrolysate of 2. (B–H) C3 HPLC-MS-SIE chromatograms for selected L-FDAA derivatives of amino acid standards (broken lines) and the acid hydrolysate of 2 (red highlighted peaks). The insets in (D) and (G) show the C18 HPLC-DAD chromatograms. Traces reveal (B) L-Thr (SIE m/z 372), (C) L-Pro (SIE m/z 368), (D) N-Me-L-Ala (SIE m/z 356), (E) L-Val (SIE m/z 370), (F) N-Me-L-Val and L-Leu (SIE m/z 384), (G) N-Me-L-Phe (SIE m/z 432) and (H) N-Me-L-Ile (SIE m/z 398). *Residual Marfey's reagent.
Figure 3. Selected 2D NMR ROESY and HMBC correlations, and MS/MS fragmentations for talaropeptide A (2)–partial sequences highlighted in blue.
Talaropeptide B (3)
HRESI(+)MS analysis of 3 returned a sodiated molecular ion attributed to a molecular formula (C70H120N12O14, Δmmu −1.9) requiring 17 DBEs, suggestive of a Val homolog of 2. The 1H NMR (DMSO-d6) spectrum of 3 (Figure S16) revealed resonances closely resembling those of 2; however, the extra Val residue was not observed in the HSQC (DMSO-d6) spectrum. Therefore, we re-acquired the NMR data in methanol-d4, which revealed resonances attributed to the additional Val residue (δH 3.50, δC 60.3) (Figure S19). This hypothesis was confirmed by C3 and C18 Marfey's analyses (Figure S23) and 1D and 2D NMR (methanol-d4) data (Table 2, Figures S17–S21), which confirmed the presence of 12 amino acid residues [L-Thr, L-Pro, L-Val (×2), L-Leu, N-Me-L-Ala, N-Me-L-Val (×4), N-Me-L-Phe and N-Me-L-Ile], accounting for all DBEs and requiring that 3 be a linear dodecapeptide. Diagnostic 2D NMR HMBC and ROESY correlations identified key partial sequences: (i) N-Me-L-Ala1-N-Me-L-Val2 (e.g., an HMBC correlation from H-2 in N-Me-L-Val2 to C-1 in N-Me-L-Ala1); (ii) L-Pro6-N-Me-L-Val7-N-Me-L-Val8-N-Me-L-Phe9-N-Me-L-Ile10 (e.g., HMBC correlations from H-2 in N-Me-L-Val7 to C-1 in L-Pro6, from N-Me-L-Val8 to C-1 in N-Me-L-Val7, from N-Me-L-Phe9 to C-1 in N-Me-L-Val8, and from N-Me-L-Ile10 to C-1 in N-Me-L-Phe9) (Figure 4). Similarly, diagnostic UHPLC-QTOF (MS/MS) fragmentation patterns identified the consolidated partial sequences (iii) N-Me-L-Ala1-N-Me-L-Val2-L-Val3-L-Thr4-L-Val*-N-Me-Val5, and (iv) N-Me-L-Val8-N-Me-L-Phe9-N-Me-L-Ile10-L-Leu11 (Figure 4, Figure S22). Assembly of the partial sequences i-iv returned the complete structure for talaropeptide B (3) as shown (Figure 1), with the following caveat. As N-Me-Val5 and L-Leu11 are isomeric, their relative positions cannot be determined by MS/MS fragmentation (or by NMR, owing to overlapping 1D resonances).
To establish the regiochemistry of these amino acid residues, we draw on biosynthetic comparisons to the co-metabolite 2, as well as knowledge of the talaropeptide biosynthetic gene cluster (see below).
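Why MS/MS alone cannot place N-Me-Val5 relative to L-Leu11 can be seen by comparing residue masses and fragment ladders. The following sketch assumes standard monoisotopic masses, singly charged b-ions, and the residue order proposed for 3; the swapped sequence is purely hypothetical and is included only for comparison.

```python
MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443, "O": 15.99491461957}
PROTON = 1.007276467

# Residue compositions (amino acid minus H2O) as (C, H, N, O) counts.
RESIDUES = {
    "N-Me-Ala": (4, 7, 1, 1), "N-Me-Val": (6, 11, 1, 1), "Val": (5, 9, 1, 1),
    "Thr": (4, 7, 1, 2), "Pro": (5, 7, 1, 1), "N-Me-Phe": (10, 11, 1, 1),
    "N-Me-Ile": (7, 13, 1, 1), "Leu": (6, 11, 1, 1),
}


def residue_mass(name):
    """Monoisotopic mass of a residue from its elemental composition."""
    c, h, n, o = RESIDUES[name]
    return c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]


def b_ions(sequence):
    """Singly charged b-ion m/z ladder for a peptide with a free N-terminus."""
    ladder, running = [], PROTON
    for name in sequence[:-1]:
        running += residue_mass(name)
        ladder.append(round(running, 4))
    return ladder


# Residue order proposed for talaropeptide B (3); index 4 is L-Val*.
PROPOSED = ["N-Me-Ala", "N-Me-Val", "Val", "Thr", "Val", "N-Me-Val", "Pro",
            "N-Me-Val", "N-Me-Val", "N-Me-Phe", "N-Me-Ile", "Leu"]

# Hypothetical alternative with N-Me-Val5 and Leu11 exchanged.
SWAPPED = list(PROPOSED)
SWAPPED[5], SWAPPED[11] = SWAPPED[11], SWAPPED[5]

print(residue_mass("N-Me-Val"), residue_mass("Leu"))
print(b_ions(PROPOSED) == b_ions(SWAPPED))
```

Because both residues share the composition C6H11NO, exchanging them leaves every fragment mass unchanged, which is why the assignment relies on biosynthetic arguments.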
Figure 4. Selected 2D NMR correlations, and MS/MS fragmentations for talaropeptide B (3)–partial sequences highlighted in blue.
Talaropeptide C (4)
HRESI(+)MS analysis of 4 returned a sodiated molecular ion attributed to a molecular formula (C67H113N11O14, Δmmu +1.5) requiring 17 DBEs, suggestive of an acetylated homolog of 2 (i.e., +42 Da). Comparison of the 1H NMR (methanol-d4) data for 4 with 2 supported the latter hypothesis, with the only significant difference being the appearance of resonances attributed to an acetyl moiety (δH 1.99, δC 22.3), with an HMBC correlation from N-Me-L-Ala1 to N-COCH3 being diagnostic for an N-terminal acetamide moiety. C3 and C18 Marfey's analyses (Figure S31) together with 1D and 2D NMR (methanol-d4) data (Table 3, Figures S24–S29), confirmed the presence of 11 amino acid residues [L-Thr, L-Pro, L-Val, L-Leu, N-Me-L-Ala, N-Me-L-Val (×4), N-Me-L-Phe and N-Me-L-Ile], accounting for all DBEs and requiring that 4 be a linear undecapeptide. Diagnostic 2D NMR HMBC and ROESY correlations identified key partial sequences: (i) N-Me-N-Ac-L-Ala1-N-Me-L-Val2 (e.g., an HMBC correlation from H-2 in N-Me-L-Val2 to C-1 in N-Me-N-Ac-L-Ala1); (ii) L-Thr4-N-Me-L-Val5 (e.g., an HMBC correlation from N-Me-L-Val5 to C-1 in L-Thr4); and (iii) L-Pro6-N-Me-L-Val7-N-Me-L-Val8-N-Me-L-Phe9-N-Me-L-Ile10 (e.g., HMBC correlations from H-2 in N-Me-L-Val7 to C-1 in L-Pro6, from N-Me-L-Val8 to C-1 in N-Me-L-Val7, from N-Me-L-Phe9 to C-1 in N-Me-L-Val8, and from N-Me-L-Ile10 to C-1 in N-Me-L-Phe9) (Figure 5). Similarly, diagnostic UHPLC-QTOF (MS/MS) fragmentation patterns identified the consolidated partial sequence (iv) N-Me-L-Val8-N-Me-L-Phe9-N-Me-L-Ile10-L-Leu11 (Figure 5, Figure S30). Assembly of the partial sequences i–iv returned the complete structure for talaropeptide C (4) as shown (Figure 1).
Figure 5. Selected 2D NMR correlations, and MS/MS fragmentations for talaropeptide C (4)–partial sequences highlighted in blue.
Talaropeptide D (5)
HRESI(+)MS analysis of 5 returned a sodiated molecular ion attributed to a molecular formula (C72H122N12O15, Δmmu +0.7) requiring 18 DBEs, suggestive of an acetylated homolog of 3 (i.e., +42 Da). Comparison of the 1H NMR (methanol-d4) data for 5 with 3 supported the latter hypothesis, with the only significant difference being the appearance of resonances attributed to an acetyl moiety (δH 1.99, δC 22.5), with HMBC correlations to N-Me-L-Ala1 positioning it on the N-terminus. C3 and C18 Marfey's analyses (Figure S38) and 1D and 2D NMR (methanol-d4) data (Table 4, Figures S32–S36), confirmed the presence of 12 amino acid residues [L-Thr, L-Pro, L-Val (×2), L-Leu, N-Me-L-Ala, N-Me-L-Val (×4), N-Me-L-Phe and N-Me-L-Ile], accounting for all DBEs and requiring that 5 be a linear dodecapeptide. Diagnostic 2D NMR HMBC and ROESY correlations identified key partial sequences: (i) N-Me-N-Ac-L-Ala1-N-Me-L-Val2 (e.g., HMBC correlations from N-Me-L-Ala1 to N-COCH3, and from H-2 in N-Me-L-Val2 to C-1 in N-Me-L-Ala1); (ii) L-Val*-N-Me-L-Val5 (e.g., an HMBC correlation from N-Me-L-Val5 to C-1 in L-Val*); and (iii) L-Pro6-N-Me-L-Val7-N-Me-L-Val8-N-Me-L-Phe9-N-Me-L-Ile10 (e.g., HMBC correlations from H-2 in N-Me-L-Val7 to C-1 in L-Pro6, from N-Me-L-Val8 to C-1 in N-Me-L-Val7, from N-Me-L-Phe9 to C-1 in N-Me-L-Val8, and from an N-Me to C-1 in N-Me-L-Phe9) (see Figure 6). Similarly, diagnostic UHPLC-QTOF (MS/MS) fragmentation patterns identified the partial sequence (iv) N-Me-L-Val8-N-Me-L-Phe9-N-Me-L-Ile10-L-Leu11 (Figure 6, Figure S37). Assembly of the partial sequences i–iv returned the complete structure for talaropeptide D (5) as shown (Figure 1), with the following caveat.
As the NMR and MS/MS data for 5 could not provide an experimental assignment of relative regiochemistry for the dipeptide fragment comprised of L-Val3 and L-Thr4, we draw on biosynthetic comparisons to the co-metabolite 3, as well as knowledge of the talaropeptide biosynthetic gene cluster (see below).
Figure 6. Selected 2D NMR correlations, and MS/MS fragmentations for talaropeptide D (5)–partial sequences highlighted in blue.
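The homolog relationships inferred from the molecular formulae of 2-5 (an additional Val residue, and a +42 Da acetyl unit) can be verified by subtracting formulae. This is a minimal sketch, assuming simple formula strings without brackets or charges; the function names are illustrative.

```python
import re
from collections import Counter

MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443, "O": 15.99491461957}

FORMULAE = {
    2: "C65H111N11O13",
    3: "C70H120N12O14",
    4: "C67H113N11O14",
    5: "C72H122N12O15",
}


def parse_formula(text):
    """Count atoms in a simple formula string such as 'C65H111N11O13'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", text):
        counts[element] += int(number) if number else 1
    return counts


def formula_difference(larger, smaller):
    """Element-wise difference between two formulae, omitting zero entries."""
    a, b = parse_formula(larger), parse_formula(smaller)
    return {el: a[el] - b[el] for el in sorted(set(a) | set(b)) if a[el] != b[el]}


def monoisotopic_mass(composition):
    """Monoisotopic mass of an elemental composition given as a dict."""
    return sum(MONO[el] * n for el, n in composition.items())


val_unit = formula_difference(FORMULAE[3], FORMULAE[2])     # 3 vs 2
acetyl_4 = formula_difference(FORMULAE[4], FORMULAE[2])     # 4 vs 2
acetyl_5 = formula_difference(FORMULAE[5], FORMULAE[3])     # 5 vs 3

print(val_unit, acetyl_4, acetyl_5, round(monoisotopic_mass(acetyl_4), 4))
```

The 3 minus 2 difference corresponds to a Val residue (C5H9NO), while both acetylated compounds differ from their parents by C2H2O, consistent with replacement of an N-H by N-COCH3.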
A genome sequence of Talaromyces sp. CMB-TU011 was obtained, with a coverage of 31×, a length of 27.5 Mb, and a GC content of 47%, consistent with related species (Table 5). Natural product genome mining of this sequence identified 17 biosynthetic gene clusters (BGCs, Table S7) including three non-ribosomal peptide synthetases (NRPS). A very large intron-less mega synthase that includes 12 modules and 44 domains encoded in a single gene (45,892 bases, 15,297 amino acids) was identified as a plausible talaropeptide NRPS. Of note, this NRPS is only 3.2 kb smaller than the largest NRPS ever reported (49,104 bases, plu2670, 16,367 amino acids), being that documented for the extensively N-methylated and commercially important fungal cyclic undecapeptide cyclosporine from Tolypocladium inflatum (GenBank accession: CAA82227, 15,281 amino acids) (Weber et al., 1994).
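The coverage and size comparison quoted above can be sanity-checked with back-of-envelope arithmetic. This sketch assumes coverage is simply the trimmed sequence yield (0.873 Gb, from the Methods) divided by the assembly length; the exact value would depend on post-trimming read lengths and the final assembly, so only approximate agreement with the reported figure is expected.

```python
def fold_coverage(sequenced_bases, genome_length):
    """Average depth of coverage as sequenced bases per genome base."""
    return sequenced_bases / genome_length


TRIMMED_BASES = 0.873e9   # high quality bases after trimming
GENOME_LENGTH = 27.5e6    # assembled genome length in bases

coverage = fold_coverage(TRIMMED_BASES, GENOME_LENGTH)

# Gene lengths in bases: talaropeptide NRPS versus the largest NRPS cited.
TALAROPEPTIDE_NRPS = 45_892
LARGEST_REPORTED = 49_104
size_gap_kb = (LARGEST_REPORTED - TALAROPEPTIDE_NRPS) / 1000

print(f"approximate coverage: {coverage:.1f}x; size gap: {size_gap_kb:.1f} kb")
```

On these assumptions the estimate falls between 31× and 32×, in line with the reported coverage, and the gene length difference matches the quoted 3.2 kb.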
The putative talaropeptide NRPS (Figure 7) exhibits an N-terminus condensation domain with a similar configuration to that of previously reported C domains associated with peptides incorporating N-terminal acyl moieties, consistent with the N-terminal N-acylation observed in talaropeptides C (4) and D (5). This domain might have been skipped during the biosynthesis of talaropeptides A (2) and B (3), or alternatively the N-Ac moiety may have been removed after the biosynthesis (i.e., hydrolysed). A total of 12 adenylation domains were detected, in agreement with the number of amino acid residues found in talaropeptides B (3) and D (5). Predicted amino acid specificities for these domains are largely in agreement with those observed for 3 and 5, except for modules 1 and 4 (Table 6). Seven methyl transferase domains were consistent with N-methylation sites in 2-5, with exceptions for N-methylation of residues 1 and 2, which may be installed post NRPS assembly. Alternatively, the methyl transferase in module 3 appears to be inactive on its corresponding extension step (i.e., L-Val3), and may be responsible for methylation of the first two residues (i.e., N-Me-N-Ac-L-Ala1 and N-Me-L-Val2). The methylation domain at module 5 (L-Val*) appears to be inactive during the biosynthesis of talaropeptides B (3) and D (5), while the entire module 5 is inactive in the biosynthesis of talaropeptides A (2) and C (4). Finally, a thioesterase domain was detected at the C-terminus of the talaropeptide NRPS, accounting for the release of the peptide product with a C-terminus carboxylic acid.
Figure 7. Domain organization of the talaropeptide synthase and biosynthetic logic of the talaropeptides. Biosynthesis of talaropeptide D (5) is depicted. Methylation domains marked with an asterisk are skipped during biosynthesis, while module 5 (highlighted in light green) is skipped during talaropeptides A (2) and C (4) biosynthesis.
Table 6. Comparison of a predicted product for ORFX (talaropeptide synthase) with structure for talaropeptide D (5). Adenylation (A) domain specificity was calculated using the LSI based A-domain functional predictor.
Talaropeptide Biological Activity
Talaropeptides A-D (2-5) exhibited no growth inhibitory activity when tested (up to 30 μM) against human lung (NCI-H460) and colon (SW620) carcinoma cells, or when tested against the Gram-negative bacteria Escherichia coli ATCC 11775 and Pseudomonas aeruginosa ATCC 10145, the Gram-positive bacteria Staphylococcus aureus ATCC 25923 and S. aureus ATCC 9144, or the fungus Candida albicans ATCC 10231 (Supplementary Material). By contrast, talaropeptides A (2) and B (3) alone exhibited promising growth inhibitory activity (IC50 1.5 and 3.7 μM) against the Gram-positive bacterium Bacillus subtilis ATCC 6633 (Figure S41). As might be predicted for an extensively N-methylated linear peptide, talaropeptide A (2) proved stable to rat plasma (i.e., proteases) (Figure S39).
Although fungi are well-known to produce cyclic peptides, linear peptides of > 7 amino acid residues are comparatively rare (Komatsu et al., 2001; Boot et al., 2006). For example, excluding peptaibols such as the recently described trichodermides (Jiao et al., 2018), which are dominated by non-proteinogenic amino acids [e.g., α-aminoisobutyric acid (Aib) and D-isovaline (D-Iva)], only a handful of linear peptides of > 7 amino acid residues have been reported from fungi. Interestingly, these reports feature peptides from marine-derived fungi, including the dodecapeptide dictyonamides A and B from a marine red alga-derived fungus (Komatsu et al., 2001), and N-methylated octapeptides RHM 1 and RHM 2 from a marine sponge-derived Acremonium (Boot et al., 2006). Also of note, no linear peptides have been reported from the genus Talaromyces.
The talaropeptides A-D (2-5) represent a new class of extensively N-methylated linear peptide natural product, and at the same time feature peptide amino acid sequences that are unprecedented in the scientific literature. That the talaropeptide pharmacophore lacks mammalian cell cytotoxicity, and exhibits highly selective antibacterial properties (albeit with modest potency), with a clear structure activity relationship requirement built around N-terminal acetylation, is intriguing.
From an ecological perspective, the link between antibacterial activity and acetylation suggests that control of N-acetylation, perhaps as a post-NRPS modification by hydrolysis of the acetyl group or by an unknown biosynthetic mechanism that leads to domain skipping, may bias production in favor of 2 and 3 as an antibacterial defense, or 4 and 5 as putative antibacterial prodrugs. In an ecological setting rich in microbial competitors, this putative biosynthetic mechanism of control may be mediated by inter-species or even inter-kingdom chemical communication.
The discovery that talaropeptide production was highly culture media and phase dependent (i.e., YES broth, static flask with an air permeable seal) raises the possibility that the paucity of published fungal linear peptides may be due to a bias for cultivation conditions that disfavor linear peptides. Our application of a systematic miniaturized microbioreactor approach to trialing cultivation conditions (i.e., MATRIX) provides a low cost, practical means to access this silent potential.
Data Availability Statement
The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.
GenBank accession: Bankit212474, talarolide_synthase – MH479449.
RC initiated and oversaw all research. PD performed all fungal cultivations, and isolated and characterized talaropeptides. PD, PP, AS, ZK, and RC performed data analysis and talaropeptide structure elucidations. ZK isolated fungal DNA, and together with PD carried out bioassays. PC-M and EM performed all genomic analyses, and identified the talaropeptide NRPS. RC and PD co-drafted the manuscript.
This work was supported in part by The University of Queensland, the Institute for Molecular Bioscience and the Australian Institute for Bioengineering and Nanotechnology.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank R Damodar for the original isolation of Talaromyces sp. CMB-TU011.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2018.00394/full#supplementary-material
The Supplementary Material includes fungal cultivation trial details, as well as tabulated and annotated 1D and 2D NMR data and spectra, C3 and C18 Marfey's and MS/MS analyses, and biological assay data for 2–5.
UHPLC, ultra-high-performance liquid chromatography; HPLC, high-performance liquid chromatography; QTOF, quadrupole time of flight; ESI, electrospray ionization; SIE, single ion extraction; DAD, diode array detector; NRPS, non-ribosomal peptide synthetase; LSI, latent semantic indexing; ORFX, orphan protein/enzyme.
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., et al. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477. doi: 10.1089/cmb.2012.0021
Baranašić, D., Zucko, J., Diminic, J., Gacesa, R., Long, P. F., Cullum, J., et al. (2014). Predicting substrate specificity of adenylation domains of nonribosomal peptide synthetases and other protein properties by latent semantic indexing. J. Ind. Microbiol. Biotechnol. 41, 461–467. doi: 10.1007/s10295-013-1322-2
Blin, K., Wolf, T., Chevrette, M. G., Lu, X., Schwalen, C. J., Kautsar, S. A., et al. (2017). antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification. Nucleic Acids Res. 45, W36–W41. doi: 10.1093/nar/gkx319
Boot, C. M., Tenney, K., Valeriote, F. A., and Crews, P. (2006). Highly N-methylated linear peptides produced by an atypical sponge-derived Acremonium sp. J. Nat. Prod. 69, 83–92. doi: 10.1021/np0503653
Dewapriya, P., Prasad, P., Damodar, R., Salim, A. A., and Capon, R. J. (2017). Talarolide A, a cyclic heptapeptide hydroxamate from an Australian marine tunicate-associated fungus, Talaromyces sp. (CMB-TU011). Org. Lett. 19, 2046–2049. doi: 10.1021/acs.orglett.7b00638
Finn, R. D., Coggill, P., Eberhardt, R. Y., Eddy, S. R., Mistry, J., Mitchell, A. L., et al. (2016). The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285. doi: 10.1093/nar/gkv1344
Jiao, W. H., Khalil, Z., Dewapriya, P., Salim, A. A., Lin, H. W., and Capon, R. J. (2018). Trichodermides A–E: new peptaibols isolated from the Australian termite nest-derived fungus Trichoderma virens CMB-TN16. J. Nat. Prod. 81, 976–984. doi: 10.1021/acs.jnatprod.7b01072
Marchler-Bauer, A., Bo, Y., Han, L., He, J., Lanczycki, C. J., Lu, S., et al. (2017). CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 45, D200–D203. doi: 10.1093/nar/gkw1129
Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M. A., et al. (2000). Artemis: sequence visualization and annotation. Bioinformatics 16, 944–945. doi: 10.1093/bioinformatics/16.10.944
Simpson, J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones, S. J. M., and Birol, I. (2009). ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123. doi: 10.1101/gr.089532.108
Vijayasarathy, S., Prasad, P., Fremlin, L. J., Ratnayake, R., Salim, A. A., Khalil, Z., et al. (2016). C3 and 2D C3 Marfey's methods for amino acid analysis in natural products. J. Nat. Prod. 79, 421–427. doi: 10.1021/acs.jnatprod.5b01125
Weber, G., Schörgendorfer, K., Schneider-Scherzer, E., and Leitner, E. (1994). The peptide synthetase catalyzing cyclosporine production in Tolypocladium niveum is encoded by a giant 45.8-kilobase open reading frame. Curr. Genet. 26, 120–125. doi: 10.1007/BF00313798
Keywords: marine-derived, fungus, Talaromyces, talaropeptide, NRPS, linear peptide, secondary metabolite, natural product
Citation: Dewapriya P, Khalil ZG, Prasad P, Salim AA, Cruz-Morales P, Marcellin E and Capon RJ (2018) Talaropeptides A-D: Structure and Biosynthesis of Extensively N-methylated Linear Peptides From an Australian Marine Tunicate-Derived Talaromyces sp. Front. Chem. 6:394. doi: 10.3389/fchem.2018.00394
Received: 02 July 2018; Accepted: 14 August 2018;
Published: 04 September 2018.
Edited by: Xian-Wen Yang, Third Institute of Oceanography, State Oceanic Administration, China
Reviewed by: Yonghui Zhang, Huazhong University of Science and Technology, China
Fernando Reyes, Fundación Centro de Excelencia en Investigación de Medicamentos Innovadores en Andalucía, Spain
Kirk R. Gustafson, National Cancer Institute (NCI), United States
Copyright © 2018 Dewapriya, Khalil, Prasad, Salim, Cruz-Morales, Marcellin and Capon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Robert J. Capon, firstname.lastname@example.org