First insights into the gut microbiome of Diatraea saccharalis: From a sugarcane pest to a reservoir of new bacteria with biotechnological potential

A country’s biodiversity is a key resource for the development of a sustainable bioeconomy. However, the most biodiverse countries on the planet often hardly profit from their biological diversity. On the contrary, components of that biodiversity occasionally become a threat to society and its food sustainability. That is the case of the sugarcane borer Diatraea saccharalis. Here, the analysis of the bacteria associated with the digestive tract of D. saccharalis reveals a rich and diverse microbiota. Larvae reared on two types of diet were analyzed under laboratory conditions. The metataxonomic analysis revealed a number of taxa common to most of the larval pools analyzed, with relative abundances exceeding 5%, and five families of bacteria that have also been reported in the gut of other Lepidoptera. A large fraction of the microorganisms detected by amplicon sequencing were considered to be rare and difficult to cultivate. However, among the cultivable microorganisms, 12 strains with relevant biotechnological features were identified. The strain that showed the highest cellulolytic activity (GCEP-101) was genome sequenced. Analysis of the complete GCEP-101 genome revealed that the 16S rRNA identity, the Average Nucleotide Identity, and the digital DNA–DNA hybridization values place the strain as a candidate for a new species within the genus Pseudomonas. Moreover, the genome annotation of the putative new species revealed genes associated with cellulose degradation, exposing the hidden potential of the pest as a reservoir of biotechnologically relevant microorganisms.


Introduction
The class Insecta comprises a vast group of organisms with remarkable adaptive capabilities that have favored their colonization of the biosphere. The evolutionary success of insects largely relies on the degree of specialization of their gut microbiota (Poveda, 2019). The number of microorganisms present in an insect's intestinal tract is estimated to be up to 10 times larger than the number of insect cells. This translates into an estimate of 100 times more microbial genes than animal genes within one organism (Rajagopal, 2009; Engel and Moran, 2013; Gurung et al., 2019; Gupta and Nair, 2020). Although insects are major pollinators, several insect species, together with their microbiomes, are considered major threats to crops that are crucial to humans. Sugarcane, one of the most cultivated crops globally, is vulnerable to pests such as Diatraea saccharalis, commonly known as the sugarcane borer (Beuzelin et al., 2014; Joyce et al., 2014; Daquila et al., 2019). In its larval stage, D. saccharalis penetrates deep into the stalks and establishes galleries in which it continues its life cycle, thereby causing large economic losses (Vargas and Gómez, 2005; Lopes et al., 2014; Simões et al., 2015).
Notwithstanding the impact of D. saccharalis, to our knowledge there are no metagenomic or metataxonomic reports describing its gut microbiome. Most of the existing research is based on the characterization of cultivable microbes carried out while scouting for organisms capable of exerting biological control over the pest itself (Tan et al., 2011; Mahmoud et al., 2012; Davolos et al., 2015). Still, some authors have been interested in D. saccharalis as a source of microorganisms with biotechnological relevance (Barbosa et al., 2020; da Silva et al., 2022). Among the potentially relevant microbes are those involved in the copious degradation of cellulose within the gut of D. saccharalis. These microbes are of interest because there is a global need for biomass processing, both to reduce the environmental impact of some industrial and agricultural processes and to diversify the sources for energy generation (Dantur et al., 2015). In this work, we describe the first insights into the gut microbiome of D. saccharalis by using 16S rRNA gene analysis and culturing techniques. This approach represents an opportunity for the discovery of previously undescribed microorganisms bearing catalytic activities of industrial interest.

Acquisition and processing of Diatraea saccharalis
Eggs of D. saccharalis were purchased from the Colombian Agricultural Research Corporation (AGROSAVIA -Bogotá, Colombia). These were separated into two groups and incubated until hatching in hermetically closed containers at room temperature. Upon hatching one group was fed with an artificial diet provided by AGROSAVIA containing per liter: 8 g of wheat germ, 11.25 g of milled corn, 8 g of casein, 40 g of beer yeast, 20 g of pulverized sugar, 40 g of milled carrot, 80 g of pulverized sugarcane sheath, and 3.75 g of agar. The second group of larvae was fed with natural non-processed sugarcane pieces obtained from a local farm in the municipality of Piedecuesta, Santander, Colombia.
After 10 to 15 days at room temperature, the complete digestive tracts of, on average, 80 larvae from each incubation were dissected under sterile conditions. In each case, approximately 60 tracts were placed into 1.5 mL centrifuge tubes and preserved at −80 °C for DNA extraction, while the remaining 20 were placed in sterile PBS for microbial cultivation.

Microbial enrichment and isolation
Approximately 20 digestive tracts from each incubation were macerated with a sterile swab until a homogeneous suspension was obtained. The macerated material was then inoculated in liquid minimal media suitable for recovering uncommon slow-growing microorganisms (Kato et al., 2018; Chaudhary et al., 2019; Kurm et al., 2019). The base medium used was M9 as described by Reasoner and Geldreich (1985) and modified as follows: 6.05 mM K2HPO4, 3.95 mM KH2PO4, 0.855 mM NaCl, 10 mM NH4Cl, 1 mM MgSO4, 0.3 mM CaCl2, 0.134 mM EDTA, 0.311 mM FeCl3, 6.2 μM ZnSO4, 0.76 μM CuSO4, 1.62 μM H3BO3, 1.65 μM Na2MoO4, 0.6 μM KI, and 0.1 g/L of yeast extract. All chemicals were purchased from Merck, Germany. The pH was adjusted to 7.2 ± 0.2 before autoclaving at 121 °C for 20 min. To favor the enrichment of microorganisms with cellulolytic activities, the medium was supplemented with a separately autoclaved carboxymethylcellulose (CMC) solution to a final concentration of 2 g/L as the sole carbon source. The 50 mL liquid enrichments were incubated for 15 days at room temperature and then transferred to solid modified M9 medium. Cellulose-degrading colonies were identified and isolated when they showed a surrounding halo after staining with a Congo red solution (McDonald et al., 2012; Dantur et al., 2015). In addition, isolated colonies were tested for growth on solid M9 medium with 2 g/L of sawdust as the sole carbon source.
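The molar concentrations in the recipe above can be converted into weighing amounts. The following is a minimal sketch for four of the salts only; it assumes anhydrous salts (the hydration states are not stated in the text), and the function names are hypothetical helpers introduced for illustration.

```python
# Molar masses in g/mol. Assumption: anhydrous salts, since the text does not
# give the hydration state of the reagents that were used.
MOLAR_MASS = {"K2HPO4": 174.18, "KH2PO4": 136.09, "NaCl": 58.44, "NH4Cl": 53.49}

# Concentrations taken from the modified M9 recipe, in mM.
RECIPE_MM = {"K2HPO4": 6.05, "KH2PO4": 3.95, "NaCl": 0.855, "NH4Cl": 10.0}


def grams_per_litre(conc_mm, molar_mass):
    """Convert a millimolar concentration into grams per litre."""
    return conc_mm / 1000.0 * molar_mass


def weigh_out(volume_l):
    """Return the grams of each salt needed for the given volume in litres."""
    return {
        salt: grams_per_litre(conc_mm, MOLAR_MASS[salt]) * volume_l
        for salt, conc_mm in RECIPE_MM.items()
    }


if __name__ == "__main__":
    # 50 mL is the enrichment volume mentioned in the text.
    for salt, grams in weigh_out(0.05).items():
        print(f"{salt}: {grams * 1000:.2f} mg per 50 mL")
```

In practice such small amounts would normally be dispensed from concentrated stock solutions rather than weighed directly; the sketch only illustrates the unit conversion.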

DNA extraction and sequencing
Metagenomic DNA was extracted from the dissected digestive tracts with the DNeasy® PowerSoil® kit (Qiagen) following the manufacturer's instructions. Subsequently, the DNA samples were sent to Novogene Corporation Inc. (Stockton Blvd, California, United States) for partial sequencing of the 16S rRNA gene with 250 bp paired-end reads using the primers 341F (5'-CCTAYGGGRBGCASCAG) and 806R (5'-GGACTACNNGGGTATCTAAT) on the Illumina NovaSeq 6000 platform (Illumina, United States).
Genomic DNA of pure isolates was extracted as described by Martín-Platero et al. (2007). Sequencing of the 16S rRNA gene of all isolates was carried out with a MinION Mk1C device (Oxford Nanopore Technologies Inc., United Kingdom), using the 16S Barcoding Sequencing Kit 1-24 (SQK-16S024) and FLO-MIN106 (R9.4.1) flow cells. The obtained sequences were compared with the GenBank DNA database release 251.0 (Benson et al., 2013) and the Ribosomal Database Project release Taxonomy 18 (Cole et al., 2014). In addition, a combination of Illumina NovaSeq 6000 (PE150) and Nanopore MinION platforms was used to sequence the genome of the cellulolytic strain GCEP-101. The latter was performed by using the Ligation Sequencing Kit (SQK-LSK109). MinION data was gathered and basecalled with MinKNOW 21.10.8.
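The amplicon primers 341F and 806R given above contain IUPAC degenerate bases (Y, R, B, S, N), so each primer stands for a pool of concrete oligonucleotides. The sketch below, which is illustrative and not part of the sequencing workflow described here, counts the sequences in each pool and builds a regular expression for matching a primer against a read. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct non-degenerate sequences encoded by the primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def primer_regex(primer):
    """Compile a regex that matches every sequence the primer encodes."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


if __name__ == "__main__":
    for name, seq in [("341F", "CCTAYGGGRBGCASCAG"),
                      ("806R", "GGACTACNNGGGTATCTAAT")]:
        print(name, degeneracy(seq), primer_regex(seq).pattern)
```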

Bioinformatic analysis

Metataxonomic composition of the intestines based on 16S rRNA amplicons
Demultiplexed Illumina FASTQ sequences were analyzed using QIIME2 v.2020.8 (Bolyen et al., 2019). First, paired-end read sequences were filtered by quality, denoised, truncated at 220 bases and clustered using DADA2 (Callahan et al., 2016). The Amplicon Sequence Variants (ASVs) were taxonomically classified using a Naive Bayes classifier built with the SILVA 138_2 database (Robeson et al., 2020). ASVs identified as chloroplast or mitochondria were removed from the representative sequences and excluded from further analyses. The output tables for ASV abundance and taxonomy were processed with R  , 2013). Alpha-diversity was measured with the Shannon (H′) and Chao1 indexes. Statistical differences between diets were assessed with the Mann-Whitney U test. Beta-diversity was measured using the Bray-Curtis distance. Statistical differences in beta-diversity between diets were assessed with the permutational multivariate analysis of variance (PERMANOVA) and the analysis of similarities (ANOSIM).
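For readers unfamiliar with the diversity measures named above, the following minimal sketch shows how the Shannon index, the Chao1 richness estimate, and the Bray-Curtis dissimilarity can be computed from ASV count vectors. It is a plain-Python illustration and not the R workflow used in this study; the example counts are invented, and the Chao1 function uses the classic formula with the bias-corrected form as a fallback when no doubletons are present, which is an assumption about the variant.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Chao1 richness: observed taxa plus a correction from singletons (f1)
    and doubletons (f2). Falls back to the bias-corrected form when f2 == 0."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + (f1 * f1) / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with the same taxa order."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical ASV counts for two larval pools (illustrative values only).
    pool_a = [120, 30, 5, 2, 1, 1, 0]
    pool_b = [60, 80, 0, 2, 0, 1, 4]
    print("Shannon:", shannon(pool_a), shannon(pool_b))
    print("Chao1:", chao1(pool_a), chao1(pool_b))
    print("Bray-Curtis:", bray_curtis(pool_a, pool_b))
```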

Whole genome assembly and annotation
The raw reads from both sequencing platforms were quality filtered with FastQC v. 0.11.9 (Andrews, 2010). High-quality reads were hybrid-assembled de novo with Unicycler v. 0.4.8 (Wick et al., 2017). Subsequently, the functional annotation of the genome was performed initially with Prokka v. 1.14.6 (Seemann, 2014), RAST v. 2.0 (Aziz et al., 2008) and PATRIC v. 3.6.12 (Davis et al., 2020). Finally, the BlastKOALA annotation tool v. 2.2 was used to predict genes involved in cellulose degradation. The KEGG Orthology assignments were made against the KEGG prokaryotes gene database at the genus level (Kanehisa et al., 2016).

Genome-based phylogenetic analysis and in-silico species delineation
The genomic similarity between strain GCEP-101 and the closest strains from the NCBI database was calculated using the Orthologous Average Nucleotide Identity (OrthoANI) Tool (OAT v. 0.93.1; Lee et al., 2016). To explore the existence of a putative new species, the Genome-to-Genome Distance Calculator (GGDC) and the Type (Strain) Genome Server (TYGS) provided by the German Collection of Microorganisms and Cell Cultures (DSMZ) were used (Meier-Kolthoff and Göker, 2019). The determination of the closest type strain genomes was carried out in two complementary ways. First, the GCEP-101 genome was compared against all type strains available in the TYGS database with the MASH algorithm (Ondov et al., 2016), from which the 10 type strains with the smallest MASH distances were chosen. Second, the 16S rDNA sequences were extracted from the GCEP-101 genome using RNAmmer (Lagesen et al., 2007) and BLASTed (Camacho et al., 2009) against the 16S rRNA gene sequences of the 16,976 type strains in the TYGS database (as of July 22, 2022). The 10 closest type strains were identified using the Genome BLAST Distance Phylogeny approach (GBDP) (Meier-Kolthoff et al., 2013). Digital DNA-DNA hybridization (dDDH) values and confidence intervals were calculated using the recommended settings of the GGDC 3.0 (Meier-Kolthoff et al., 2013, 2022). The genome and its closest relatives were visualized using the BLAST Ring Image Generator (BRIG v. 0.95; Alikhan et al., 2011). In addition, intergenomic distances were used to infer a balanced minimum evolution tree with branch support via FASTME 2.1.6.1 including SPR postprocessing (Lefort et al., 2015). Branch support was inferred from 100 pseudobootstrap replicates. The tree was rooted (Farris, 1972) and visualized with PhyD3 (Kreft et al., 2017).

Diversity and composition of the gut microbiota of Diatraea saccharalis
We profiled the gut microbiota of 388 D. saccharalis individuals fed with two distinct diets. In total 231,943 reads from the V3-V4 region of the 16S rRNA gene were analyzed. Nine hundred and twenty-nine amplicon sequence variants (ASVs) with an average length of 395 bp were identified.
After removing sequences considered artifacts (ASVs present in only 1 sample), 361 ASVs were selected for taxonomic distribution and diversity analysis. The saturation of the rarefaction curves indicated that most of the bacterial biodiversity was sampled during sequencing (Supplementary Figure S1). In seven of the larval pools Proteobacteria was the most abundant phylum (Figure 1A). The exception was one sample fed with an artificial diet, in which the most abundant phylum was Firmicutes. At the order level, representatives of Rhizobiales and Micrococcales were present in all analyzed samples. Two of the sugarcane diet microbiomes showed dominance of organisms within the order Rickettsiales (Figure 1B). At family and genus level, representatives of Rhizobiaceae and Wolbachia dominated in several samples, accompanied by a large number of taxa present in less than 1% of the sequences (Figures 1C,D). Alpha diversity of the D. saccharalis gut microbiota was similar when insects were reared on artificial or natural diets. Neither species richness (Chao1, Mann-Whitney p = 0.0714) nor species diversity (Shannon, Mann-Whitney p = 0.3929) differed significantly between diets. The community structure of the gut microbiome was also stable across diet treatments and did not differ significantly as measured by PERMANOVA (Bray-Curtis distance; F-value = 1.105; R-square = 0.16324; p = 0.215) and ANOSIM (Bray-Curtis distance; R = 0.10769; p < 0.231).
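With few larval pools per diet, the Mann-Whitney p-value can only take a handful of discrete values. The sketch below enumerates the exact two-sided test for untied data. It assumes group sizes of three artificial-diet pools and five sugarcane pools, which is inferred from the pool counts mentioned in the text rather than stated for this test, and the diversity values are invented purely for illustration.

```python
from itertools import combinations


def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney U test for samples without ties.

    Returns (U statistic of x, two-sided p-value), obtained by enumerating
    every possible assignment of ranks to the first group.
    """
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    n1, n2 = len(x), len(y)
    rank = {value: i + 1 for i, value in enumerate(pooled)}

    def u_stat(ranks):
        return sum(ranks) - n1 * (n1 + 1) / 2

    u_obs = u_stat([rank[v] for v in x])
    mean_u = n1 * n2 / 2
    deviation = abs(u_obs - mean_u)

    total = 0
    as_extreme = 0
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        total += 1
        if abs(u_stat(ranks) - mean_u) >= deviation:
            as_extreme += 1
    return u_obs, as_extreme / total


if __name__ == "__main__":
    # Hypothetical richness values: 3 artificial-diet pools, 5 sugarcane pools.
    artificial = [41.0, 45.0, 58.0]
    sugarcane = [52.0, 63.0, 70.0, 74.0, 81.0]
    u, p = mann_whitney_exact(artificial, sugarcane)
    print(f"U = {u}, exact two-sided p = {p:.4f}")
```

Under these assumed group sizes there are only 56 possible rank assignments, so the smallest attainable two-sided p-value is 2/56 (about 0.036), and the second most extreme ordering gives 4/56.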

Bioprospecting of a putative new bacterial species with cellulolytic activity
In order to target microorganisms with cellulolytic activity, approximately 160 digestive tracts were aseptically dissected from D. saccharalis fifth instar larvae grown on the sugarcane and artificial diet set-ups. Once homogenized in PBS, the biological material was inoculated into modified M9 medium. In total, 12 bacterial colonies showing cellulose-degrading activity on CMC-supplemented plates were isolated (Supplementary Table S1). Upon macroscopic and microscopic inspection, four morphologically distinct isolates were selected for full-length 16S rRNA gene sequencing. Isolates GCEP-92 and GCEP-94 were closely related to Klebsiella variicola, while isolate GCEP-95 was affiliated with Pseudomonas nitroreducens (Supplementary Table S2). The isolate that showed the highest apparent cellulolytic activity on CMC plates (GCEP-101) had as its closest hit a bacterium within the genus Pseudomonas with no species-level assignment (Pseudomonas sp. HS-18). Consequently, a hybrid assembly using Illumina and Oxford Nanopore reads was prepared, reaching 304-fold coverage and producing a single circular contig of 6,229,841 bp (Figure 2; GenBank Acc. No. CP104011). The genome has an overall GC content of 66.34%. In total, 5,510 genes were predicted, including 70 tRNAs, 16 rRNAs, 1 tmRNA, 3,400 genes encoding proteins with predicted functions, and 2,023 genes encoding hypothetical proteins. The BlastKOALA pipeline assigned at least one GO term to 3,247 of the predicted proteins (59%), while 2,392 were assigned to at least one KEGG pathway. A total of 269 pathways were predicted, including 8 proteins related to cellulose degradation. These were identified within pathways for carbohydrate metabolism (Supplementary Table S3; Supplementary Figure S2). Regarding the Average Nucleotide Identity (ANI), strain GCEP-101 showed the highest nucleotide-level similarity with Pseudomonas nicosulfuronedens LAM1902T (ANI: 90.27%; Figure 3).
In addition, the in-silico DNA-DNA hybridization showed maximum d4 and d6 values of 41.5 and 68.2% with members of P. nicosulfuronedens, respectively (Supplementary Table S4). In all cases such values are lower than the 70% hybridization threshold commonly used to assign closely related strains to the same species (Wayne et al., 1987). A phylogenetic tree (Supplementary Figure S3) was calculated using the genomes of 14 microorganisms. Pseudomonas sp. GCEP-101 was located in an independent branch within the same cluster as the recently described P. nicosulfuronedens and two species known for their metabolism of nitrogenated compounds: P. nitroreducens and P. nitritireducens. Finally, the analysis of the genes within the GCEP-101 genome revealed the presence of genes encoding enzymes related to the degradation of plant biomass, as well as chitobiases and β-hexosaminidases.
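The species-delineation reasoning in this section can be written as a simple threshold check. The sketch below applies the 70% dDDH threshold cited in the text and, as an assumption not stated here, the commonly used 95% ANI species boundary. The function and constant names are hypothetical, and such a check only flags a candidate; it does not replace a formal species description.

```python
# dDDH threshold cited in the text (Wayne et al., 1987).
DDDH_SPECIES_CUTOFF = 70.0
# Assumption: widely used ANI species boundary (about 95-96%), not given in the text.
ANI_SPECIES_CUTOFF = 95.0


def is_candidate_new_species(best_ani, dddh_values):
    """Return True when the best ANI hit and all dDDH estimates against the
    closest type strain fall below the species-level cutoffs."""
    below_ani = best_ani < ANI_SPECIES_CUTOFF
    below_dddh = all(v < DDDH_SPECIES_CUTOFF for v in dddh_values)
    return below_ani and below_dddh


if __name__ == "__main__":
    # Values reported for GCEP-101 against P. nicosulfuronedens:
    # OrthoANI 90.27%, dDDH d4 = 41.5% and d6 = 68.2%.
    print(is_candidate_new_species(90.27, [41.5, 68.2]))
```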

Discussion
Lepidoptera is one of the most widely distributed insect orders in nature (Van Nieukerken et al., 2011). The larval stages of numerous Lepidoptera, including D. saccharalis, have a detrimental impact on society since they feed mainly on living plants. During the arms race over the course of evolution, both plants and insects have evolved mechanisms to thrive in their respective ecosystems (Mello and Silva-Filho, 2002; Zunjarrao et al., 2020). Insects in particular rely on the adaptability of their gut microbiome for the utilization of plants and plant material for the completion of their life cycle (Gupta and Nair, 2020). Here, the analysis of the composition and structure of the bacteriome associated with the digestive tract of D. saccharalis reveals a rich and diverse microbiota. A number of taxa are common to most of the larval pools analyzed in both diet types and were detected with relative abundances exceeding 5%. In this study, a high diversity of microorganisms with low relative abundances was observed (Figure 1). Other reports on Lepidoptera and additional insect orders that characterized gut microbiomes from wild specimens have documented significantly lower relative abundances of rare taxa (Pinto-Tomás et al., 2011; Mejía-Alvarado et al., 2021). Interestingly, the microbiome of D. saccharalis exhibited five families of bacteria (Actinomycetaceae, Microbacteriaceae, Rhizobiaceae, Propionibacteriaceae, and Lachnospiraceae; Figure 1) that have also been reported in the gut of at least 30 other species of Lepidoptera (Voirol et al., 2018). Normally the lepidopteran microbiome is affected by diverse factors, including diet, environment, and the physiology of the host (Colman et al., 2012). This study sought to determine community changes influenced by the type of diet. However, no statistical difference was detected between the treatments. Our findings will help subsequent research on the topic to decide the real value of using natural or artificial diets in this type of study (Hammer et al., 2017).
Among the Proteobacteria, the families Xanthobacteraceae and Rhizobiaceae were common in all analyzed samples (Figure 2). Both families are widely known to be capable of fixing N2 into ammonia, which is then assimilated by gut endosymbionts that biosynthesize vitamins and amino acids needed for insect development (Indiragandhi et al., 2008; González-Cortés et al., 2022). At genus level, similar to the other taxonomic levels examined, a large number of taxa with relative abundances of less than 1% were observed. This category represents the largest number of genera in most of the groups fed with artificial diet and in three out of five of the larval pools nourished with sugarcane. A prominent genus present in samples from both the sugarcane and the artificial diet was Wolbachia, an α-proteobacterial endosymbiont that is widespread among nematodes and arthropods (Werren et al., 2008; Lefoulon et al., 2020). The ecological interactions of Wolbachia with its hosts include mutualism and, in certain cases, parasitic manipulation. A meta-analysis conducted to determine the frequency and structure of Wolbachia infection in butterflies and moths demonstrated that this bacterial genus is present in approximately 80% of the lepidopteran species analyzed (Ahmed et al., 2015). This wide distribution has attracted interest because the bacterium could bear the potential for the development of environmentally friendly strategies for the control of pests. Culturing bacteria from the digestive system of insects is challenging since these often harbor highly specialized microorganisms. Some of them require the specific physicochemical conditions of their host's gut for their development. In the case of D. saccharalis, most of the taxa found by amplicon sequencing were considered rare and complex to culture. Still, considering the remarkable capacity of D. saccharalis for degrading plant biomass, in this work we focused on the isolation of microorganisms with cellulose degradation capacity. We isolated representatives of low abundance genera found by amplicon sequencing. Other studies have reported cellulolytic activity in the gut microbiota of insects. For instance, among the bacteria capable of degrading CMC in Coleoptera are Bacillus, Enterobacter, Klebsiella, and Paenibacillus (Rinke et al., 2011; Show et al., 2022).
Different microorganisms with cellulolytic capacity have been reported for D. saccharalis, including Klebsiella, Bacillus, Stenotrophomonas, Microbacterium, Pseudomonas and Enterococcus (Dantur et al., 2015; Barbosa et al., 2020). In this study, 12 strains capable of degrading cellulose on solid culture media were isolated. Based on phenotypic features and the taxonomic classification after 16S rRNA sequencing, the strain GCEP-101 was genome sequenced using two complementary sequencing technologies (Figure 2). Sound prokaryotic systematics is based on the application of the so-called polyphasic approach (Schleifer, 2009), which comprises phenotypic characteristics, chemotaxonomy, and genotypic and phylogenetic data. Among these features, genomic profiling plays a fundamental role during the first stages of the systematic determination of prokaryotes. This has been influenced by the major developments in genome sequencing in the last decades (Stackebrandt et al., 2002). In addition, alternatives to the classical DNA-DNA hybridization method have been introduced, since it demands significant laboratory work and is prone to experimental biases (Richter and Rosselló-Móra, 2009). These approaches include dDDH and ANI and its derivatives such as OrthoANI, JSpecies (ANIb and ANIm) and gANI (Chun and Rainey, 2014; Chun et al., 2018).
The genus Pseudomonas is one of the most complex bacterial genera in terms of systematics. The number of species in the genus increases yearly (Aidan, 2014). In this specific case, the cellulase activity and the 16S rRNA gene results, which showed a close association with an undefined strain of Pseudomonas, motivated further steps toward a better systematic determination. According to the criteria proposed by the ad hoc committee for the evaluation of species definition in bacteriology, both the adoption of ANI and the application of a technique with high correlation with conventional DDH, i.e., dDDH, are necessary for a species definition in prokaryotes (Wayne et al., 1987; Stackebrandt et al., 2002). Consequently, in this study both the OrthoANI value (90.27%) and the dDDH values (<70% for d4 and d6) strongly indicate the existence of a new species within the genus Pseudomonas derived from the gut microbiome of D. saccharalis. Nonetheless, the description of the potential new species requires further biological evidence. The results of this work demonstrate the importance of studying in depth the cultivable and non-cultivable components of the sugarcane borer microbiome. This is especially true considering that the genome of GCEP-101 revealed the presence of enzymes involved in the hydrolysis of both labile carbohydrates (Supplementary Figure S2) and recalcitrant biomass, such as the beta-xylosidase (EC 3.2.1.52) and chitobiase (EC 3.2.1; Naraian and Gautam, 2017). On the one hand, this study expands the understanding of the microbiomes associated with an important lepidopteran pest affecting a major source of food and income at a global scale. On the other hand, it contributes to the identification of potential biocatalysts to be used in industrial processes. In this regard, the microbiomes of D. saccharalis and similar pests remain unexplored potential sources of relevant biomolecules.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm.nih.gov/, PRJNA870473.

Funding
This project (SIGP 75043) was funded through the FFJC by the Colombian Ministry of Science, Technology and Innovation -MINCIENCIAS and its program Colombia-BIO with contract number 80740-532-2020. The sampling and processing of biological material were granted by the contract ARG-167-2017, addendum N°1.
Frontiers in Ecology and Evolution 07 frontiersin.org