Structure, culture, and predicted function of the gut microbiome of the Mormon cricket Anabrus simplex (Orthoptera: Tettigoniidae)

The gut microbiome of insects plays an important role in their ecology and evolution, participating in nutrient acquisition, immunity, and behavior. Microbial community structure within the gut is heavily influenced by differences among gut regions in morphology and physiology, which determine the niches available for microbes to colonize. We present a high-resolution analysis of the structure of the gut microbiome in the Mormon cricket Anabrus simplex, an insect known for its periodic outbreaks in the western United States and its nutrition-dependent mating system. The Mormon cricket microbiome was dominated by eleven taxa from the Lactobacillaceae, Enterobacteriaceae, and Streptococcaceae. While most of these were represented in all gut regions, there were marked differences in their relative abundance, with lactic-acid bacteria (Lactobacillaceae) more common in the foregut and midgut and enteric bacteria (Enterobacteriaceae) more common in the hindgut. Differences in community structure were driven by variation in the relative prevalence of three groups: a Lactobacillus in the foregut, Pediococcus lactic-acid bacteria in the midgut, and Pantoea agglomerans, an enteric bacterium, in the hindgut. These taxa have been shown to have beneficial effects on their hosts in insects and other animals by improving nutrition, increasing resistance to pathogens, and modulating social behavior. Using PICRUSt to predict gene content from our 16S rRNA sequences, we found enzymes that participate in carbohydrate metabolism and pathogen defense in other orthopterans. These were predominantly represented in the hindgut and midgut, the most important sites for nutrition and pathogen defense. Phylogenetic analysis of 16S rRNA sequences from cultured isolates indicated low levels of divergence from sequences derived from plants and other insects, suggesting that these bacteria are likely to be exchanged between Mormon crickets and the environment.
Our study shows strong spatial variation in microbiome community structure, which influences predicted gene content and thus the potential of the microbiome to influence host function.


INTRODUCTION
influence the representation of taxa in 16S rRNA metagenomic studies (Yuan et al. 2012); however, our aim here is not to make inferences about differences between field and laboratory-raised animals but about differences among tissue types. We include the source of the animal (field or laboratory) as a covariate in our statistical analyses to account for variation due to source/DNA extraction method (see Statistics).

Sequencing and Bioinformatics
The variable V4 region of the 16S rRNA gene was amplified with universal primers (Hyb515F: 5'-GTGYCAGCMGCCGCGGTA-3', Hyb806R: 5'-GGACTACHVGGGTWTCTAAT-3') and sequenced on the Illumina MiSeq V3 platform. DADA2 1.1.5 (Callahan et al. 2016) was used to process the raw sequencing data, and taxonomy was assigned with the Greengenes 13.8 database at 97% identity (see supplementary material). Sequence variants that comprised an average of less than 1% of the reads recovered within a given Mormon cricket were removed prior to analysis using phyloseq 1. 16 IDs used by PICRUSt to construct the phylogenetic tree were assigned to sequence variants using QIIME 1.9 (Caporaso et al. 2010), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database was used for functional classification.
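The abundance filter described above can be sketched in a few lines. This is only an illustrative Python sketch, not the phyloseq workflow used in the study: the function name `filter_rare_variants`, the nested-dictionary input, and the choice to average relative abundance over whichever samples are passed in (for example, the tissue samples of one individual) are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical input layout: sample name -> {sequence variant -> read count}.
Counts = Dict[str, Dict[str, int]]


def filter_rare_variants(counts: Counts, threshold: float = 0.01) -> Counts:
    """Drop variants whose mean relative abundance across samples is below threshold."""
    variants = sorted({v for sample in counts.values() for v in sample})
    totals = {s: sum(sc.values()) for s, sc in counts.items()}
    # Samples with no reads cannot contribute a relative abundance.
    nonempty = [s for s, total in totals.items() if total > 0]

    keep = set()
    for v in variants:
        fractions = [counts[s].get(v, 0) / totals[s] for s in nonempty]
        if fractions and sum(fractions) / len(fractions) >= threshold:
            keep.add(v)

    # Return the same layout with the rare variants removed.
    return {s: {v: c for v, c in sc.items() if v in keep}
            for s, sc in counts.items()}


# Made-up counts for two tissue samples from one individual.
example = {
    "foregut": {"sv1": 900, "sv2": 95, "sv3": 5},
    "hindgut": {"sv1": 500, "sv2": 500},
}
filtered = filter_rare_variants(example)
```

In this made-up example, sv3 averages 0.25% of reads across the two samples and is removed, while sv1 and sv2 are retained.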

Phenotypic assays
Fresh overnight cultures of all isolates were used for microscopic analysis. Lactobacillaceae isolates were cultured in Man-Rogosa-Sharpe medium and Enterobacteriaceae were cultured in nutrient broth or LB medium. Biochemical tests were done following Bridson (1998
Beta diversity among gut tissue types and between animal sources (field vs. lab) was assessed with a distance-based redundancy analysis (db-RDA) in vegan 2.3 (Oksanen et al. 2015), specifying a principal components ordination of Bray-Curtis distances. Statistical significance of the terms in the db-RDA model was determined by 999 permutations of the distance matrix in vegan, restricting the permutations to within each individual to retain the nested structure of the data. The same procedure was also used to examine variation among tissue types in the abundance of KEGG pathways, except that nonmetric multidimensional scaling was used for ordination.
We assessed the difference in taxon abundance among tissue types in univariate analyses by fitting the data to a negative binomial generalized linear mixed model (Bates et al. 2013), specifying the individual ID as the random effect and the tissue type and animal source (field vs. lab) as fixed effects. A similar procedure was used to assess differences in 16S rRNA gene copy number between tissue types and animal source, except that a normal distribution was specified. Likelihood ratio tests were used to determine the statistical significance of each factor (Venables & Ripley 2002). Goodness-of-fit was assessed with a chi-square test (Faraway 2006) and homoscedasticity was assessed by examination of residual plots. Nonparametric methods were used in univariate analyses of the metagenomic predictions because no distribution provided a reasonable fit to the data. P-values were adjusted for multiple tests using the false discovery rate (Benjamini & Hochberg 1995).
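For readers unfamiliar with the false discovery rate adjustment mentioned above, the Benjamini-Hochberg step-up procedure can be sketched as follows. This is a minimal plain-Python illustration, not the code used in the study; the function name `bh_adjust` and the example p-values are invented for the sketch.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values from four hypothetical univariate tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.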

Spatial structure of the gut microbiome
We recovered 11 dominant sequence variants from field and lab-raised individuals (Fig. 2)
Field-caught Mormon crickets had three taxa that were not shared with laboratory-raised individuals, while lab-raised individuals had two taxa that were not shared with field individuals (Fig. 2). Guts from two laboratory individuals were composed almost entirely of the enteric bacterium Pantoea agglomerans (99.3% and 80.8% of reads, respectively), so we conducted our analysis with and without these individuals.
Species richness and diversity differed among gut regions and were higher in field than in lab-raised animals (Table 1, Fig. 3). There was no significant interaction between collection source and tissue type (Table 1), indicating that differences in alpha diversity among tissue types were shared between lab and field-caught animals. We found that the midgut was the most diverse part of the gut with two of the three measures of alpha diversity (species richness and the Chao1 diversity estimator), while the hindgut and foregut had similar levels of richness and diversity. The third metric (Shannon-Wiener) also found the foregut to be the least diverse region, but differed in that the midgut and hindgut had similar levels of species diversity (Table 1, Fig. 3).
The db-RDA revealed that the structure of the gut microbiome also varied among gut regions and between field and laboratory animals (Table 2, Fig. 4a). The non-significant interaction in this analysis, however, indicates that the differences in community structure among tissue types were consistent between field and laboratory-raised individuals (Table 2). To determine which members of the gut microbiome varied among gut regions, we plotted the taxa scores from db-RDA analyses of field and laboratory Mormon crickets (Figure S1a).
Three groups of bacteria appeared to separate along the gut axis: a Lactobacillus sp. lactic-acid bacterium was associated with the foregut, Pediococcus lactic-acid bacteria were associated with the midgut, and Pantoea agglomerans, an enteric bacterium, was associated with the hindgut. Inspection of the plots from laboratory animals, where the ileum and rectum of the hindgut were dissected separately, indicates that P. agglomerans is more abundant in the rectum, while the composition of the ileum, which is separated from the rectum by the colon, closely resembles that of the midgut (Figure S1b).
Univariate analyses of these three groups largely confirmed the pattern in the ordination (Table 3, Fig. 2, Fig. S2). The interaction between tissue type and source was not significant in any of the analyses and was dropped to estimate the differences in abundance between tissue types. Lactobacillus sp. was three times more common in the foregut than in the midgut (β=1.4 ± 0.50, p=0.02) and seven times more abundant in the foregut than in the hindgut (β=2.0 ± 0.51, p<0.001). Pediococcus were similar in abundance in the midgut and hindgut but 4.7 times more common in these areas than in the foregut (β=1.1 ± 0.36, p=0.006). P. agglomerans was 209 times more abundant in the hindgut than in the foregut (β=3.8 ± 0.87, p<0.001) and twelve times more abundant in the hindgut than in the midgut (β=2.5 ± 0.82, p=0.007).
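The three alpha diversity measures compared in this section can be illustrated with a short Python sketch. It is not the code used in the study: the function names and the example counts are invented, and the Chao1 function uses the bias-corrected form of the estimator, which is an assumption because the text does not say which form was applied.

```python
import math
from typing import Sequence


def richness(counts: Sequence[int]) -> int:
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts: Sequence[int]) -> float:
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Made-up read counts for six taxa in one gut sample.
sample = [10, 5, 2, 1, 1, 0]
s_obs = richness(sample)
s_chao = chao1(sample)
h = shannon(sample)
```

Richness counts only presence, Chao1 adds an estimate of unseen taxa based on the rarest observations, and the Shannon-Wiener index also weights evenness, which is one reason the three measures can rank gut regions differently.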

Abundance of bacteria from 16S rRNA qPCR
The number of copies of bacterial 16S rRNA genes differed significantly among tissue types, and the pattern depended on the source of the Mormon crickets, as indicated by the significant interaction between tissue type and source (analysis of deviance: source, F(1,14) = 25.9, p<0.001; tissue type, F(3,161) = 7.8, p<0.001; interaction, F(3,161) = 2.8, p=0.04; Fig. 7). We decomposed the interaction to determine how the total number of 16S rRNA copies differed among tissue types within field and laboratory-raised animals. The major difference between the two sources was that in wild Mormon crickets, the midgut had the lowest abundance of all gut regions, while in laboratory-raised individuals, both the midgut and the ileum had the lowest abundance of bacterial 16S rRNA genes (Table S1, Fig. 5).

PICRUSt metagenomic predictions
PICRUSt analysis of 16S rRNA sequence variants recovered 5,891 KEGG orthologs associated with 328 metabolic pathways. The representation of the predicted KEGG pathways differed significantly among gut regions in both the full and reduced datasets, while the source of the animals had a significant influence in the full dataset but not the reduced dataset (Table 2, Fig. 4b). Neither analysis, however, showed an interaction between tissue type and whether an animal was wild or lab-reared, indicating that metagenomic predictions differed among tissue types in similar ways (Table 2). Univariate analyses found significant differences among tissue types in most KEGG pathways (Table S2), including those that could affect host-microbe interactions via their role in nutrition, immunity, degradation of xenobiotics, and production of secondary metabolites (Fig. 6). In these functional groups, the hindgut exhibited the most abundant representation of each KEGG category, followed by the midgut and then the foregut (Fig. 6).

Nutrition
We searched our metagenomic predictions for specific bacterial genes known to contribute to host nutrition in orthopterans. We queried our database for enzymes capable of metabolizing the complex plant carbohydrates xylan, pectin, raffinose, and galactomannan,
The abundance of KEGG orthologs for carbohydrate metabolism in our samples was most pronounced in the hindgut (Table S3, Fig. 7a) and dominated by the enteric bacteria, particularly Klebsiella sp. and Enterobacteriaceae 1 (Fig. 7b). Lactic-acid bacteria, however, were also represented in predictions for raffinose metabolism and for enzymes capable of participating in the degradation of cellobiose to glucose via cellobiose glucohydrolase, but not in degrading cellulose to cellobiose (Fig. 7b).
Gut bacteria might also play a role in the production of the essential amino acid phenylalanine via the shikimate pathway, which is found in microbes and plants but not in animals (Herrmann & Weaver 1999). Phenylalanine is required for stabilization of the cuticle following molting (Bernays & Woodhead 1984) and is converted to tyrosine, the precursor of
In the locust Schistocerca gregaria (Orthoptera), four phenols have been shown to increase resistance to microbial pathogens (Dillon & Charnley 1988: hydroquinone, 3,4-dihydroxybenzoic acid, p-hydroxybenzoic acid, and 4,5-dihydroxybenzoic acid. We found enzymes associated with the production of all these compounds except for 4,5-dihydroxybenzoic acid, which was not annotated in the KEGG database. Hydroquinone production was represented by the enzyme arbutin 6-phosphate glucohydrolase, which metabolizes arbutin, a phenolic glycoside present in leaf and fruit tissue of many plants (Xu et al. 2015).
Two enzymes were found capable of producing 3,4-dihydroxybenzoic acid. The first, vanillate monooxygenase, demethylates vanillic acid, a compound derived from lignin (Bugg et al. 2011).
This is also the pathway proposed for 3,4-dihydroxybenzoic acid production in locusts based on the abundance of vanillic acid in their feces (Dillon & Charnley 1988. The second, p-hydroxybenzoate 3-monooxygenase, oxidizes p-hydroxybenzoic acid, one of the other antimicrobial phenols in locusts (Dillon & Charnley 1995). The most likely source of p-hydroxybenzoic acid in the diet of Mormon crickets is benzoic acid, which is a precursor to salicylic acid in plants (Raskin 1992). The enzyme responsible for catalyzing the conversion of benzoic acid to p-hydroxybenzoic acid (benzoate 4-monooxygenase), however, was not found among the 11 dominant taxa in our samples, although it was present in the minority members of the Mormon cricket gut microbiome. Production of p-hydroxybenzoic acid in appreciable concentrations is thus less likely than for hydroquinone or 3,4-dihydroxybenzoic acid.
As with carbohydrate metabolism, the hindgut (Fig. 7c) and enteric bacteria (Fig. 7d) dominated the abundance of KEGG orthologs implicated in the production of antimicrobial phenols in our samples, with the exception of hydroquinone, which was represented to varying degrees among the lactic-acid bacteria. Notably, P. agglomerans, which has been reported to participate in the production of 3,4-dihydroxybenzoic acid in locusts (Dillon & Charnley 1995), was not among the taxa responsible for the occurrence of vanillate monooxygenase in our samples (Fig. 7d).
Finally, we searched for three other known contributors to pathogen defense: bacteriocins, antibiotics, and lactate dehydrogenase, which provides protection from pathogens in the gut by reducing pH (Servin 2004). We found lactate dehydrogenase to be equally represented among gut regions (Fig. 7c), and lactic-acid bacteria were the main contributors to our samples (Fig. 7d).
We found three bacteriocins in the KEGG database: nisin, mutacin, and blp-derived bacteriocins. None of these were found in our metagenomic predictions, which is perhaps not surprising considering their association with Streptococcus, which was not among the top 11 taxa in our samples (Fig. 2). The bacteriocins we would expect to find based on taxonomy (e.g. pediocin for Pediococcus) were not annotated in the KEGG database.
Turning to the antibiotics, we found enzymes involved in the production of streptomycin, penicillin, and novobiocin, but not all enzymes required for their synthesis were present (data not shown). We did find β-lactamase, which confers resistance to β-lactam antibiotics (e.g.

Phylogenetic analysis of cultured isolates
Thirteen strains, delineated at 99% sequence similarity of their near full-length 16S rRNA genes (mean ± sd: 1406 ± 30 bp), were cultured from the Mormon cricket gut. Six were lactic-acid bacteria (Lactobacillaceae) and seven were enteric bacteria (Enterobacteriaceae).
The lactic-acid bacteria fell into two clades in our phylogenetic analysis (Fig. 8). The first clade was composed of Pediococcus acidilactici isolates derived from environmental sources, such as plants and various human foodstuffs, as well as strains from the human gut. Similarity to sequences from the BLAST search was high (>99.5%) and branch lengths were short, indicating that Pediococcus from the Mormon cricket gut are not highly derived from their relatives, as has been found for Lactobacillus species isolated from bees (Fig. 8; McFrederick et al. 2013).
Our search for Pediococcus sequences from insect guts in GenBank recovered sequences from the termites Macrotermes bellicosus and M. subhyalinus, which formed their own well-supported clade (Fig. 8). Cultured Pediococcus acidilactici shared 100% sequence identity in the V4 region with the P. acidilactici 1 phylotype sequenced using the Illumina platform in this study and with the P. acidilactici (102222) phylotype associated with variation in mating status in Mormon crickets (Smith et al. 2016). Morphologically, Pediococcus acidilactici were nonmotile and spherical (0.8-1.0 μm), often dividing to form pairs as described for other Pediococcus. Like other members of the genus, the P. acidilactici isolates were gram-positive, non-motile, and facultatively anaerobic, grew at low pH, and produced lactate from lactose (Table S2).
The second clade of lactic-acid bacteria was composed primarily of plant-associated Lactobacillus. Unlike P.
acidilactici, these Lactobacillus formed a distinct clade with good branch support (Fig. 8), indicating that they are genetically distinct enough at the 16S rRNA locus to be distinguished from other clades in the phylogeny. Similar to P. acidilactici, these Lactobacillus had high sequence similarity (>99.5%) to other members of the clade and a short branch length, indicating that while the group is distinct enough to form its own clade, it is not highly derived from its relatives at the 16S rRNA locus.
Our GenBank search for Lactobacillus isolated from insect guts found sequences from ants, bees, termites, and fruit flies, all of which fell into a different clade than Lactobacillus isolated from Mormon crickets. Lactobacillus from these taxa thus appear to have a different evolutionary history. Lactobacillus isolates shared 100% sequence identity in the V4 region with the Lactobacillaceae 2 phylotype sequenced using the Illumina platform in this study. Morphologically, these Lactobacillus appear as non-motile straight rods, approximately 1.3-2 μm in length and 0.8-1.0 μm wide, and are gram-positive, facultatively anaerobic, grow at low pH, and produce lactate from lactose (Table S2).
The seven Enterobacteriaceae strains were most similar to Enterobacter strains in our BLAST search, which recovered sequences from a variety of plant and animal sources (sequence similarity=98.7-99.8%). Our survey of GenBank found Enterobacter from the alimentary tracts of a diverse group of insects, including termites, cockroaches, flies, beetles, stink bugs, bees, ants, and moths. As in other studies (Brenner et al. 2005), however, the 16S rRNA gene did not have enough signal to resolve relationships among Enterobacter and its relatives (data not shown), so we present a simpler phylogeny with the Mormon cricket isolates and type strains from the family (Fig. 9).
We found that our Mormon cricket isolates were interspersed with Enterobacter, Klebsiella, and Escherichia type strains. A multilocus sequencing approach is thus needed to improve the inference (Brenner et al. 2005). All seven strains isolated from Mormon crickets had 100% identity at the V4 region with the Klebsiella phylotype sequenced on the Illumina platform; however, the phylogenetic (Fig. 9) and phenotypic (Table S2) data suggest that
Klebsiella is unlikely to be a correct taxonomic assignment. Unlike most Klebsiella, cultured strains were motile, which is more typical of Enterobacter and other Enterobacteriaceae
not observe analogous structures in Mormon crickets (Fig. 1).
The midgut is particularly vulnerable to pathogens because the lack of an endocuticle leaves the epithelium exposed once the peritrophic membrane is penetrated (Lehane & Billingsley 1996). The Mormon cricket midgut was populated by lactic-acid bacteria, with Pediococcus specifically exhibiting greater abundance in the midgut (and hindgut) than in the foregut. Lactic-acid bacteria are known for their beneficial effects in insects, increasing resistance to parasites in bees (Forsgren et al. 2010)
where in the gut Pediococcus is located has been unavailable until now. Pediococcus in the midgut could provide immunological or nutritional benefits to Mormon crickets, as has been shown for P. acidilactici in other animals (Castex et al. 2008, 2009). We found that the capacity for lactate production in our samples was dominated by Pediococcus and other lactic-acid bacteria, although the abundance of the enzyme mediating lactate production was not higher in the midgut relative to other regions based on our metagenomic predictions. The cultured isolates of P. acidilactici obtained from Mormon crickets in this study will enable future experimental and comparative genomic approaches to evaluate these hypotheses.
Lactic-acid bacteria were also common in the foregut, which was dominated by a Lactobacillus that averaged 73.9% of the sequences recovered from this region. Bignell (1984) noted that the foregut of insects tends to be the most acidic compartment; however, studies that measure the physicochemical environment and characterize microbiome composition of the foregut are rare (but see Köhler et al. 2012).
This is because the endocuticle, lack of differentiated cells for absorption of nutrients, and frequent purging of consumed material into the midgut provide little opportunity for foregut microbes to contribute to host nutrition. The large differences in community structure between the foregut and the rest of the alimentary tract in our study do illustrate the dramatic transition in microbial communities between what is ingested and what can colonize the more distal regions of the gut. Our metagenomic predictions also suggest that the foregut is not the site of extensive carbohydrate metabolism or pathogen defense for most of the pathways we examined.
In contrast to the foregut and midgut, the hindgut was characterized by a dramatic increase in enteric bacteria (Enterobacteriaceae). Ordination of the laboratory Mormon cricket samples indicated that the rectum, not the ileum, was primarily responsible for the difference in community structure in the hindgut. In laboratory-raised animals, Enterobacteriaceae comprised 83.5% of the sequences from the rectum compared to 57.5% from the ileum, which was more similar to the midgut in community structure (Fig. 4a). This distinction is potentially important because higher digestive efficiency in conventional compared to germ-free crickets has been attributed to microbial colonization of the ileum in the orthopteran A. domesticus (Kaufman & Klug 1991).
Metabolism of the specific complex carbohydrates attributed to bacteria in that study was also identified in our metagenomic predictions and localized to the hindgut, as were enzymes involved in the production of the essential amino acid phenylalanine via the shikimate pathway.
Phenylalanine is a precursor for tyrosine, which is required to stabilize the cuticle during molting (Bernays & Woodhead 1984)
pheromone (Dillon, Vennard & Charnley 2000 and reduce susceptibility to microbial pathogens through the production of phenols (Dillon & Charnley 1986.
Our metagenomic predictions suggest that enteric bacteria in Mormon crickets might be capable of producing at least two of the antimicrobial phenols identified in S. gregaria, although P. agglomerans was not identified as an important contributor in our study. This illustrates a limitation of PICRUSt, as genomes in the IMG database used to make inferences about gene content may miss important among-strain variation in metabolic capabilities. P. agglomerans derived from S. gregaria are likely to have acquired this capability independently, unless the metabolic pathway is different from the one analyzed here or the taxonomic designation reported by Charnley (1986, 1995) is incorrect.
Metagenomic analyses are also dependent upon annotation of the relevant pathways in the KEGG database. We were unable, for example, to assess the potential for the Mormon cricket microbiome to produce bacteriocins or the aggregation pheromone guaiacol, a bacterial metabolite produced by P. agglomerans in S. gregaria (Dillon et al. 2000,