Talaropeptides A-D: Structure and Biosynthesis of Extensively N-methylated Linear Peptides From an Australian Marine Tunicate-Derived Talaromyces sp.

An Australian marine tunicate-derived fungus, Talaromyces sp. CMB-TU011, was subjected to a program of analytical microbioreactor (MATRIX) cultivations, supported by UHPLC-QTOF profiling, to reveal conditions for producing a new class of extensively N-methylated 11-12 residue linear peptides, talaropeptides A-D (2-5). The structures for 2-5, inclusive of absolute configurations, were determined by a combination of detailed spectroscopic and chemical (e.g., C3 and C18 Marfey's) analyses. We report on the biological properties of 2-5, including plasma stability, as well as antibacterial, antifungal and cytotoxic activities. The talaropeptide mega non-ribosomal peptide synthetase (NRPS) is described as second only in size to that for the fungus-derived immunosuppressant cyclosporine (an 11-residue extensively N-methylated cyclic peptide).


INTRODUCTION
In an earlier study, we described the structure elucidation of a first-in-class cyclic hexapeptide containing a rare hydroxamate residue, talarolide A (1) (Figure 1), isolated from an Australian marine tunicate-derived fungus, Talaromyces sp. CMB-TU011 (Dewapriya et al., 2017). In an effort to optimize the production of 1, we now report on a 24-well microbioreactor cultivation analysis (known in-lab as the MATRIX), using a combination of 11 different media and 3 phases (i.e., solid agar, as well as static and shaken broth). In situ solvent extraction on individual MATRIX culture wells yielded 33 extracts, which were subjected to UHPLC-DAD and UHPLC-QTOF-MS/MS analysis. While this study successfully revealed optimal conditions for the production of 1, including new analogs (work-in-progress), it also revealed conditions where talarolide production was fully suppressed in favor of a new class of extensively N-methylated linear peptides. This report provides an account of the production, isolation and characterisation of these new peptides, talaropeptides A-D (2-5), where structure elucidation inclusive of absolute configurations was achieved by a combination of detailed spectroscopic and chemical analyses. We also take the opportunity to report on the biological properties of 2-5, and document the mega non-ribosomal peptide synthetase (NRPS) responsible for the biosynthesis of talaropeptides.

Fungus Isolation and Taxonomy
Talaromyces sp. CMB-TU011 was isolated from an unidentified tunicate collected near Tweed Heads, NSW, Australia, and taxonomically identified as previously reported (Dewapriya et al., 2017).

Analytical (MATRIX) Cultivation of Talaromyces sp. CMB-TU011
Talaromyces sp. CMB-TU011 was cultured in a 24-well microbioreactor plate (Khalil et al., 2014) using a combination of 11 culture media and 3 phases (i.e., solid agar, and liquid static and liquid shaken) known in-lab as the MATRIX (Table S1). Briefly, a sterile loop was used to transfer mycelia/spores from an agar plate cultivation to 24-well microbioreactor plates (2.5 mL agar for solid cultures, 1.5 mL of broth for liquid cultures). The microbioreactor plates were sealed with air permeable membranes and incubated at 26.5 °C for 10 days (190 rpm for shaken broth). The resulting 33 cultures were extracted in situ with EtOAc (2.0 mL/well), with the decanted solvent filtered and dried under N2. Secondary metabolite production was then analyzed by HPLC-DAD-ESIMS and UHPLC-QTOF.

Abbreviations: UHPLC, ultra-high-performance liquid chromatography; HPLC, high-performance liquid chromatography; QTOF, quadrupole time of flight; ESI, electrospray ionization; SIE, single ion extraction; DAD, diode array detector; NRPS, non-ribosomal peptide synthetase; LSI, latent semantic indexing; ORFX, orphan protein/enzyme.

Genome Mining of Talaromyces sp. CMB-TU011
Genomic DNA from Talaromyces sp. CMB-TU011 was extracted using a standard chloroform protocol (Nikodinovic et al., 2003). The extracted DNA was fragmented using a Covaris focused ultrasonicator and the resulting fragments (~1 kb) were used for library construction using a ThruPLEX DNA-seq kit (Rubicon Genomics). The library was sequenced on a NextSeq platform in the paired-end (2 × 150) format to yield a total of 6,674,290 reads (1 Gb). The raw reads were filtered and trimmed using Trimmomatic v0.36 (Bolger et al., 2014) to yield a total of 5,821,558 high quality reads (0.873 Gb), which were assembled using the Velvet 1.2.10 (Zerbino and Birney, 2008), ABySS v.2.0.3 (Simpson et al., 2009) and SPAdes v3.11.1 (Bankevich et al., 2012) assemblers, with k-mer sizes between 41 and 121 in increments of 10. The best assembly (Velvet with k-mer = 51) was annotated for natural product biosynthetic gene clusters using the fungal implementation of antiSMASH 4.0 (Blin et al., 2017). The output was manually curated and domain annotation was improved using Pfam (Finn et al., 2016) and the NCBI Conserved Domain Search tool (Marchler-Bauer et al., 2017). Adenylation domain specificity was predicted using the LSI-based A-domain functional predictor (Baranašić et al., 2014). Manual sequence curation was done using the Artemis Genome Browser (Rutherford et al., 2000).
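As a rough consistency check on the sequencing figures above, the following sketch enumerates the k-mer sizes implied by a sweep from 41 to 121 in steps of 10, and estimates sequencing yield from the reported read counts. It is not part of the authors' pipeline: the function names are illustrative, and the yield estimate assumes that every read is a full 150 bases long.

```python
READ_LENGTH = 150  # bases per read, taken from the 2 x 150 paired-end format


def kmer_sweep(start=41, stop=121, step=10):
    """Return the k-mer sizes in the sweep, with the upper bound included."""
    return list(range(start, stop + 1, step))


def yield_gb(n_reads, read_length=READ_LENGTH):
    """Approximate yield in gigabases, assuming all reads are full length."""
    return n_reads * read_length / 1e9


if __name__ == "__main__":
    print("k-mer sizes:", kmer_sweep())
    print("raw yield (Gb):", round(yield_gb(6_674_290), 3))
    print("trimmed yield (Gb):", round(yield_gb(5_821_558), 3))
```

Under the full-length assumption, the two estimates round to the 1 Gb and 0.873 Gb values reported above, and the sweep contains nine k-mer sizes per assembler.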

Antibacterial Assay
The bacterium to be tested was streaked onto a tryptic soy agar plate and incubated at 37 °C for 24 h. One colony was then transferred to fresh tryptic soy broth (15 mL) and the cell density was adjusted to 10^4-10^5 CFU/mL. The compounds to be tested [...] as the concentration of the compound or antibiotic required for 50% inhibition of the bacterial cells using Prism 7.0 (GraphPad Software Inc., La Jolla, CA).

Antifungal Assay
The fungus Candida albicans ATCC 10231 was streaked onto a Sabouraud agar plate and incubated at 37 °C for 48 h. One colony was then transferred to fresh Sabouraud broth (15 mL) and the cell density adjusted to 10^4-10^5 CFU/mL. [...] drug required for 50% inhibition of the fungal cells using Prism 7.0 (GraphPad Software Inc., La Jolla, CA).

Cytotoxicity Assay
SW620 (human colorectal carcinoma) and NCI-H460 (human lung carcinoma) cells were cultured in Roswell Park Memorial Institute (RPMI) 1640 medium. All cells were cultured as adherent monolayers in flasks, in medium supplemented with 10% fetal bovine serum, L-glutamine (2 mM), penicillin (100 units/mL), and streptomycin (100 µg/mL), in a humidified 37 °C incubator supplied with 5% CO2. Briefly, cells were harvested with trypsin and dispensed into 96-well microtiter assay plates at 3,000 cells/well, after which they were incubated for 18 h at 37 °C with 5% CO2 (to allow cells to attach as adherent monolayers). Test compounds were dissolved in 20% DMSO in PBS (v/v) and aliquots (10 µL) applied to cells over a series of final concentrations ranging from 10 nM to 30 µM. After 48 h incubation at 37 °C with 5% CO2, an aliquot (20 µL) of 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) in phosphate buffered saline (PBS, 5 mg/mL) was added to each well (final concentration 0.5 mg/mL), and microtiter plates were incubated for a further 4 h at 37 °C with 5% CO2. After final incubation, the medium was aspirated and precipitated formazan crystals dissolved in DMSO (100 µL/well). The absorbance of each well was measured at 580 nm with a PowerWave XS Microplate Reader from Bio-Tek Instruments Inc. Where relevant, IC50 values were calculated using Prism 7.0, as the concentration of analyte required for 50% inhibition of cancer cell growth (compared to negative controls). The negative control was 1% aqueous DMSO, while the positive control was doxorubicin (30 µM). All experiments were performed in duplicate.
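The IC50 definition used here (the concentration giving 50% inhibition relative to negative controls) can be illustrated with a minimal sketch. Note that this is a deliberately simplified stand-in: Prism fits a nonlinear dose-response curve, whereas the hypothetical helper below only interpolates linearly on a log10 concentration scale between the two tested concentrations that bracket 50% inhibition. All function names are assumptions introduced for illustration.

```python
import math


def percent_inhibition(absorbance, negative_control):
    """Growth inhibition (%) relative to the negative-control absorbance."""
    return 100.0 * (1.0 - absorbance / negative_control)


def ic50_log_interpolation(concentrations, inhibitions):
    """Estimate IC50 by linear interpolation on log10(concentration).

    Concentrations must be positive and in ascending order. Returns None
    when inhibition never crosses 50% within the tested range.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo < 50.0 <= i_hi:
            fraction = (50.0 - i_lo) / (i_hi - i_lo)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + fraction * (log_hi - log_lo))
    return None
```

Returning None when 50% inhibition is never reached mirrors the "where relevant" qualifier above: an IC50 is only meaningful when the response crosses 50% within the tested concentration range.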

Talaropeptide A (2)
HRESI(+)MS analysis of 2 returned a protonated molecular ion attributed to a molecular formula (C65H111N11O13, mmu −0.1) requiring 16 double bond equivalents (DBEs). Consistent with its putative peptide status, C3 and C18 Marfey's analyses (Figure 2), together with careful consideration of 1D and 2D NMR (DMSO-d6) data (Table 1, Figures S8-S13), confirmed the presence of 11 amino acid residues [L-Thr, L-Pro, L-Val, L-Leu, N-Me-L-Ala, N-Me-L-Val (×4), N-Me-L-Phe and N-Me-L-Ile]. Whereas the C3 Marfey's method proved very effective at discriminating most amino acids, and in particular L vs D N-Me-Ile and N-Me-allo-Ile (Figure 2H), the C18 Marfey's method was needed to discriminate L vs D N-Me-Ala (Figure 2D, inset). The presence of multiple (×4) N-Me-L-Val residues was apparent from the complex array of isopropyl methyl resonances in the 1H NMR data for 2 (Table 1) [...] (Figure 3, Figure S14). Assembly of the partial sequences i-vi returned the complete structure for talaropeptide A (2) as shown (Figure 1).
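The DBE count quoted with each molecular formula follows from the standard relation for CHNO compounds, DBE = C − H/2 + N/2 + 1 (oxygen does not contribute). The sketch below is an illustrative check, with hypothetical helper names, applied to the formulae reported in this section for 2-5.

```python
import re


def parse_formula(formula):
    """Parse a simple formula such as 'C65H111N11O13' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def dbe(formula):
    """Double bond equivalents for a CHNO formula: C - H/2 + N/2 + 1."""
    c = parse_formula(formula)
    return c.get("C", 0) - c.get("H", 0) / 2 + c.get("N", 0) / 2 + 1


# Molecular formulae as reported for talaropeptides A-D (2-5)
FORMULAE = {
    "2": "C65H111N11O13",
    "3": "C70H120N12O14",
    "4": "C67H113N11O14",
    "5": "C72H122N12O15",
}

if __name__ == "__main__":
    for compound, formula in FORMULAE.items():
        print(compound, formula, dbe(formula))
```

Applied to the four formulae, the relation reproduces the reported values of 16, 17, 17 and 18 DBEs for 2, 3, 4 and 5, respectively.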

Talaropeptide B (3)
HRESI(+)MS analysis of 3 returned a sodiated molecular ion attributed to a molecular formula (C70H120N12O14, mmu −1.9) requiring 17 DBEs, suggestive of a Val homolog of 2. The 1H NMR (DMSO-d6) spectrum of 3 (Figure S16) revealed resonances closely resembling those of 2; however, the extra Val residue was not observed in the HSQC (DMSO-d6) spectrum. Therefore, we re-acquired the NMR data in methanol-d4, which revealed resonances attributed to the additional Val residue (δH 3.50, δC 60.3) (Figure S19). This hypothesis was confirmed by C3 and C18 Marfey's analyses (Figure S23) and 1D and 2D NMR (methanol-d4) data (Table 2, [...] Figure 4, Figure S22). Assembly of the partial sequences i-iv returned the complete structure for talaropeptide B (3) as shown (Figure 1), with the following caveat. As N-Me-Val5 and L-Leu11 are isomeric, their relative position cannot be determined by MS/MS fragmentation (or by way of overlapping 1D NMR resonances). To establish the regiochemistry of these amino acid residues, we draw on biosynthetic comparisons to the co-metabolite 2, as well as knowledge of the talaropeptide biosynthetic gene cluster (see below).

Frontiers in Chemistry | www.frontiersin.org

Talaropeptide C (4)
HRESI(+)MS analysis of 4 returned a sodiated molecular ion attributed to a molecular formula (C67H113N11O14, mmu +1.5) requiring 17 DBEs, suggestive of an acetylated homolog of 2 (i.e., +42 Da). Comparison of the 1H NMR (methanol-d4) data for 4 with 2 supported this hypothesis, with the only significant difference being the appearance of resonances attributed to an acetyl moiety (δH 1.99, δC 22.3), with an HMBC correlation from N-Me-L-Ala1 to N-COCH3 being diagnostic for an N-terminal acetamide moiety. C3 and C18 Marfey's analyses (Figure S31) together with 1D and 2D NMR (methanol-d4) data (Table 3, Figures S24-S29 [...] Figure S30). Assembly of the partial sequences i-iv returned the complete structure for talaropeptide C (4) as shown (Figure 1).
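The mass relationships invoked in these assignments (a Val homolog, an acetylated homolog at +42 Da) can be checked directly from the molecular formulae. The sketch below computes monoisotopic masses from standard isotope masses and compares the formulae reported in this section for 2, 3, 4 and 5; the helper names are illustrative and not taken from the original study.

```python
import re

# Monoisotopic masses of the most abundant isotopes (Da)
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}


def monoisotopic_mass(formula):
    """Monoisotopic mass of a neutral CHNO formula such as 'C65H111N11O13'."""
    total = 0.0
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(number) if number else 1)
    return total


def mass_difference(formula_a, formula_b):
    """Mass of formula_a minus mass of formula_b (Da)."""
    return monoisotopic_mass(formula_a) - monoisotopic_mass(formula_b)


if __name__ == "__main__":
    # 4 vs 2 and 5 vs 3: expected to match an acetyl increment (C2H2O)
    print(mass_difference("C67H113N11O14", "C65H111N11O13"))
    print(mass_difference("C72H122N12O15", "C70H120N12O14"))
    # 3 vs 2: expected to match one Val residue (C5H9NO)
    print(mass_difference("C70H120N12O14", "C65H111N11O13"))
```

The acetylated pairs differ by C2H2O (about 42.011 Da), consistent with the +42 Da noted in the text, while 3 and 2 differ by C5H9NO (about 99.068 Da), the mass of a single Val residue.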

Talaropeptide D (5)
HRESI(+)MS analysis of 5 returned a sodiated molecular ion attributed to a molecular formula (C72H122N12O15, mmu +0.7) requiring 18 DBEs, suggestive of an acetylated homolog of 3 (i.e., +42 Da). Comparison of the 1H NMR (methanol-d4) data for 5 with 3 supported this hypothesis, with the only significant difference being the appearance of resonances attributed to an acetyl moiety (δH 1.99, δC 22.5), with HMBC correlations to N-Me-L-Ala1 positioning it at the N-terminus. C3 and C18 Marfey's analyses (Figure S38) and 1D and 2D NMR (methanol-d4) data (Table 4, [...] Figure 6, Figure S37). Assembly of the partial sequences i-iv returned the complete structure for talaropeptide D (5) as shown (Figure 1), with the following caveat. As the NMR and MS/MS data for 5 could not provide an experimental assignment of relative regiochemistry for the dipeptide fragment comprised of L-Val3 and L-Thr4, we draw on biosynthetic comparisons to the co-metabolite 3, as well as [...]

Talaropeptide Biosynthesis
A genome sequence of Talaromyces sp. CMB-TU011 was obtained, with 31× coverage, a length of 27.5 Mb, and a GC content of 47%, consistent with related species (Table 5). Natural product genome mining of this sequence identified 17 biosynthetic gene clusters (BGCs) [...] (Weber et al., 1994). The putative talaropeptide NRPS (Figure 7) exhibits an N-terminal condensation domain with a similar configuration to that of previously reported C domains associated with peptides incorporating N-terminal acyl moieties, consistent with the N-terminal N-acylation observed in talaropeptides C (4) and D (5). This domain might have been skipped during the biosynthesis of talaropeptides A (2) and B (3), or alternatively the N-Ac moiety may have been removed after biosynthesis (i.e., hydrolysed). A total of 12 adenylation domains were detected, in agreement with the number of amino acid residues found in talaropeptides B (3) and D (5). Predicted amino acid specificities for these domains are largely in agreement with those observed for 3 and 5, except for modules 1 and 4 (Table 6). Seven methyl transferase domains were consistent with N-methylation sites in 2-5, with exceptions for N-methylation of residues 1 and 2, which may be installed post NRPS assembly. Alternatively, the methyl transferase in module 3 appears to be inactive on its corresponding extension step (i.e., L-Val3), and may be responsible for methylation of the first two residues (i.e., N-Me-N-Ac-L-Ala1 and N-Me-L-Val2). The methylation domain at module 5 (L-Val*) appears to be inactive during the biosynthesis of talaropeptides B (3) and D (5), while the entire module 5 is inactive in the biosynthesis of talaropeptides A (2) and C (4). Finally, a thioesterase domain was detected at the C-terminus of the talaropeptide NRPS, accounting for the release of the peptide product with a C-terminal carboxylic acid.
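The reported coverage can be sanity-checked from the read statistics given in the genome mining methods, using the usual estimate of mean coverage as total sequenced bases divided by genome length. This is only an approximation under stated assumptions (all trimmed reads are 150 bases and contribute to the assembly); the authors' exact calculation is not described, and the function name is illustrative.

```python
def mean_coverage(n_reads, read_length, genome_length):
    """Mean sequencing depth: total sequenced bases / genome length."""
    return n_reads * read_length / genome_length


if __name__ == "__main__":
    # 5,821,558 trimmed reads of 150 bases against a 27.5 Mb genome
    depth = mean_coverage(5_821_558, 150, 27.5e6)
    print(f"estimated mean coverage: {depth:.1f}x")
```

Under these assumptions the estimate is just under 32×, in line with the reported coverage of about 31×; a slightly lower real value would be expected if trimming shortened some reads.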
FIGURE 7 | Domain organization of the talaropeptide synthase and biosynthetic logic of the talaropeptides. Biosynthesis of talaropeptide D (5) is depicted. Methylation domains marked with an asterisk are skipped during biosynthesis, while module 5 (highlighted in light green) is skipped during the biosynthesis of talaropeptides A (2) and C (4).
TABLE 6 | Comparison of a predicted product for ORFX (talaropeptide synthase) with structure for talaropeptide D (5). Adenylation (A) domain specificity was calculated using the LSI based A-domain functional predictor.

Module
Residue Prediction Domain LSI score Residues in 5 Comment

DISCUSSION
Although fungi are well-known to produce cyclic peptides, linear peptides of more than 7 amino acid residues are comparatively rare (Komatsu et al., 2001; Boot et al., 2006). For example, excluding peptaibols such as the recently described trichodermides (Jiao et al., 2018), which are dominated by non-proteinogenic amino acids [e.g., α-aminoisobutyric acid (Aib) and D-isovaline (D-Iva)], only a handful of linear peptides of more than 7 amino acid residues have been reported from fungi. Interestingly, these reports feature peptides from marine-derived fungi, including the dodecapeptide dictyonamides A and B from a marine red alga-derived fungus (Komatsu et al., 2001), and N-methylated octapeptides RHM 1 and RHM 2 from a marine sponge-derived Acremonium (Boot et al., 2006). Also of note, no linear peptides have been reported from the genus Talaromyces.
The talaropeptides A-D (2-5) represent a new class of extensively N-methylated linear peptide natural products, and at the same time feature amino acid sequences that are unprecedented in the scientific literature. That the talaropeptide pharmacophore lacks mammalian cell cytotoxicity, and exhibits highly selective antibacterial properties (albeit with modest potency), with a clear structure-activity relationship requirement built around N-terminal acetylation, is intriguing.
From an ecological perspective, the link between antibacterial activity and acetylation suggests that control of N-acetylation, perhaps as a post-NRPS modification by hydrolysis of the acetyl group or by an unknown biosynthetic mechanism that leads to domain skipping, may bias production in favor of 2 and 3 as an antibacterial defense, or 4 and 5 as putative antibacterial prodrugs. In an ecological setting rich in microbial competitors, this putative biosynthetic mechanism of control may be mediated by inter-species or even inter-kingdom chemical communication.
The discovery that talaropeptide production was highly culture media and phase dependent (i.e., YES broth, static flask with an air permeable seal) raises the possibility that the paucity of published fungal linear peptides may be due to a bias for cultivation conditions that disfavor linear peptides. Our application of a systematic miniaturized microbioreactor approach to trialing cultivation conditions (i.e., MATRIX) provides a low cost, practical means to access this silent potential.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

AUTHOR CONTRIBUTIONS
RC initiated and oversaw all research. PD performed all fungal cultivations, and isolated and characterized talaropeptides. PD, PP, AS, ZK, and RC performed data analysis and talaropeptide structure elucidations. ZK isolated fungal DNA, and together with PD carried out bioassays. PC-M and EM performed all genomic analyses, and identified the talaropeptide NRPS. RC and PD co-drafted the manuscript.

FUNDING
This work was supported in part by The University of Queensland, the Institute for Molecular Bioscience and the Australian Institute for Bioengineering and Nanotechnology.