<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article article-type="methods-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Bioinform.</journal-id>
<journal-title>Frontiers in Bioinformatics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Bioinform.</abbrev-journal-title>
<issn pub-type="epub">2673-7647</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">918853</article-id>
<article-id pub-id-type="doi">10.3389/fbinf.2022.918853</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Bioinformatics</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>
<italic>microTrait</italic>: A Toolset for a Trait-Based Representation of Microbial Genomes</article-title>
<alt-title alt-title-type="left-running-head">Karaoz and Brodie</alt-title>
<alt-title alt-title-type="right-running-head">Microbial Genomes to Traits</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Karaoz</surname>
<given-names>Ulas</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/31793/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Brodie</surname>
<given-names>Eoin L.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/23045/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Earth and Environmental Sciences</institution>, <institution>Lawrence Berkeley National Laboratory</institution>, <addr-line>Berkeley</addr-line>, <addr-line>CA</addr-line>, <country>United States</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Department of Environmental Science, Policy and Management</institution>, <institution>University of California</institution>, <addr-line>Berkeley</addr-line>, <addr-line>CA</addr-line>, <country>United States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/38684/overview">Joao Carlos Setubal</ext-link>, University of S&#xe3;o Paulo, Brazil</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1694617/overview">Bruno Koshin V&#xe1;zquez Iha</ext-link>, University of S&#xe3;o Paulo, Brazil</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/435639/overview">Phil B. Pope</ext-link>, Norwegian University of Life Sciences, Norway</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Ulas Karaoz, <email>ukaraoz@lbl.gov</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Genomic Analysis, a section of the journal Frontiers in Bioinformatics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>22</day>
<month>07</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>2</volume>
<elocation-id>918853</elocation-id>
<history>
<date date-type="received">
<day>12</day>
<month>04</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>06</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Karaoz and Brodie.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Karaoz and Brodie</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Remote sensing approaches have revolutionized the study of macroorganisms, allowing theories of population and community ecology to be tested across increasingly large scales without much compromise in resolution of biological complexity. In microbial ecology, our remote window into the ecology of microorganisms is through the lens of genome sequencing. For microbial organisms, recent evidence from genomes recovered from metagenomic samples corroborates a highly complex view of their metabolic diversity and other associated traits which map into high physiological complexity. Regardless, during the first decades of this <italic>omics</italic> era, microbial ecological research has primarily focused on taxa and functional genes as ecological units, favoring breadth of coverage over resolution of biological complexity manifested as physiological diversity. Recently, the rate at which provisional draft genomes are generated has increased substantially, giving new insights into ecological processes and interactions. From a genotype perspective, the wide availability of genome-centric data requires new data synthesis approaches that place organismal genomes center stage in the study of environmental roles and functional performance. Extraction of ecologically relevant traits from microbial genomes will be essential to the future of microbial ecological research. Here, we present <italic>microTrait</italic>, a computational pipeline that infers and distills ecologically relevant traits from microbial genome sequences. <italic>microTrait</italic> maps a genome sequence into a trait space, including discrete and continuous traits, as well as simple and composite traits. Traits are inferred from genes and pathways representing energetic, resource acquisition, and stress tolerance mechanisms, while genome-wide signatures are used to infer composite, or life history, traits of microorganisms.
This approach is extensible to any microbial habitat, although we provide initial examples of this approach with reference to soil microbiomes.</p>
</abstract>
<kwd-group>
<kwd>functional traits</kwd>
<kwd>functional guilds</kwd>
<kwd>ecological strategy</kwd>
<kwd>trait-based model</kwd>
<kwd>profile hidden markov model</kwd>
<kwd>microbial genome</kwd>
<kwd>fitness traits</kwd>
<kwd>trait inference workflow</kwd>
</kwd-group>
<contract-num rid="cn001">DE-AC02-05CH11231</contract-num>
<contract-sponsor id="cn001">U.S. Department of Energy<named-content content-type="fundref-id">10.13039/100000015</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Importance</title>
<p>The rapid adoption of high-throughput microbial sequencing is leading to accumulation of microbial genomes at an ever-increasing rate. These genomes represent instances from not only isolated microbes but also microbial populations in their native environmental context as metagenome-assembled genomes (MAGs) or single-cell amplified genomes (SAGs). We believe that an ability to efficiently predict ecological traits directly from primary sequence data is a necessary interface between microbial <italic>omics</italic> information and trait-based microbial ecology, and success here will significantly advance our ability to uncover generalizable features of microbiomes and their environmental context. To streamline the process of going from genome sequences to putative ecological traits, we developed <italic>microTrait,</italic> a set of tools to efficiently discover and distill the trait-based representation of a microbial genome.</p>
</sec>
<sec id="s2">
<title>Introduction</title>
<p>Linking microbiome structure and dynamics to ecosystem functioning globally in a predictive way and in the face of global change has been a long-standing goal of microbial ecology (<xref ref-type="bibr" rid="B32">Finlay et al., 1997</xref>; <xref ref-type="bibr" rid="B63">Prosser et al., 2007</xref>; <xref ref-type="bibr" rid="B78">Van Der Heijden et al., 2008</xref>; <xref ref-type="bibr" rid="B76">Todd-Brown et al., 2012</xref>; <xref ref-type="bibr" rid="B13">Bier, Bernhardt et al., 2015</xref>). Efforts towards this goal traditionally included taxon-centric measurement approaches (<xref ref-type="bibr" rid="B75">Thompson et al., 2017</xref>; <xref ref-type="bibr" rid="B65">Ramirez et al., 2018</xref>; <xref ref-type="bibr" rid="B56">Madin et al., 2020</xref>). Genetic, physiological, and ecological characterization of cultured isolates provided links between specific taxa and ecosystem processes like contributions to elemental and nutrient cycles, and biomass production. With the commoditization of high-throughput sequencing of taxonomic marker sequences, much effort in taxon-centric approaches shifted to extrapolating what is learned from representative isolates in the lab to their phylogenetic nearest neighbors detected with environmental community sequencing (<xref ref-type="bibr" rid="B51">Langille et al., 2013</xref>; <xref ref-type="bibr" rid="B8">Asshauer et al., 2015</xref>). Such approaches to infer functional groups via phylogenetic markers inherently assume strong phylogenetic conservation of microbial traits. Furthermore, without any whole-genome data, they are limited to taxa with cultured isolates.</p>
<p>Microbial-biogeochemical models are crucial tools in linking microbiome dynamics, environmental responses, and ecosystem processes across scales. Widespread availability of taxon-centric microbial measurements has naturally popularized taxon-centric models that include a few species or functional groups dominant at the local scale of interest. The upward scalability of such models is limited: no single taxon dominates at larger scales, and with a limited number of parameter sets, a model has poor adaptive capability across both scales and environmental conditions. Moreover, trying to approach the complexity of real systems at larger scales by adding more taxa or functional groups leads to increasingly complex models with a continuous demand for more parameters. Given these limitations of taxon-centric approaches in modeling the diversity and activity of microbes globally and with changing environmental conditions, trait-based representation of microbes is becoming increasingly popular.</p>
<p>Trait-based approaches represent an intermediate approach to modeling complex populations while also preserving key mechanistic properties that determine fitness in dynamic systems. The trait-based framework represents microbes with traits that can be summarized by few parameters and that are constrained by environmentally-dependent trade-offs. These approaches were developed in the field of plant ecology (<xref ref-type="bibr" rid="B86">Westoby and Wright 2006</xref>; <xref ref-type="bibr" rid="B1">Ackerly and Cornwell 2007</xref>), and have more recently been applied within microbial ecology at various scales, including global oceans and terrestrial environments (<xref ref-type="bibr" rid="B34">Follows et al., 2007</xref>; <xref ref-type="bibr" rid="B2">Allison 2012</xref>; <xref ref-type="bibr" rid="B15">Bouskill et al., 2012</xref>). The main underlying assumption is that the combination of traits determines physiological performance, which influences individual fitness and life history evolution. By abandoning the taxon concept, the trait-based framework strives to achieve a succinct description of the microbial communities with few essential communities, avoiding the complexity trap of taxon-centric modeling approaches. The challenge with this approach is to identify the key properties or traits of members of microbial communities and how these traits are regulated or trade off against other traits, and to use this information to parameterize or constrain the functional potential of the modeled communities.</p>
<p>Traits may be identified through <italic>&#x2018;omic</italic> approaches (e.g. potential to produce or the detected activity of an extracellular enzyme, the genes for a specific metabolic pathway, the genomic capacity to replicate rapidly etc.) or through physiological studies (e.g. enzyme, substrate uptake or growth kinetics, cell surface area, biomass stoichiometry, composition of storage pools etc.) or they may be inferred by manipulation experiments such as stable-isotope tracing with substrates at various concentrations to determine relative affinities. The paradigm shift from a taxon-centric to a trait-centric representation of microbiomes is partly stimulated by the wide use of <italic>omic</italic> technologies to illuminate the functional potential of environmental microbial communities and their interactions with each other, higher organisms, and their environment (<xref ref-type="bibr" rid="B73">Sharon and Banfield 2013</xref>; <xref ref-type="bibr" rid="B5">Anantharaman et al., 2016</xref>; <xref ref-type="bibr" rid="B39">Gupta et al., 2016</xref>; <xref ref-type="bibr" rid="B70">Sangwan et al., 2016</xref>; <xref ref-type="bibr" rid="B92">Woodcroft et al., 2018</xref>). In particular, focusing on genomes rather than genes as ecological units makes it possible to incorporate many concepts from ecological and evolutionary theory into models and therefore increases the value of the <italic>omic</italic> data for trait-based modeling (<xref ref-type="bibr" rid="B64">Prosser 2015</xref>). The rate at which isolate genomes, single-cell amplified genomes (SAGs) and metagenome-assembled genomes (MAGs) are being generated provides an unprecedented resource to study patterns in fitness trait conservation, trait linkage (i.e. co-occurrence patterns of traits within ecological units), trait trade-offs, and trait-environment relationships across scales.
This continuous stream of microbial genomes necessitates development of computational tools that can efficiently and robustly extract potential traits from genome sequences.</p>
<p>Currently, the methods used to infer functional traits from genome sequences include 1) pairwise sequence alignments and database search (<xref ref-type="bibr" rid="B72">Shaffer et al., 2020</xref>), 2) statistical learning methods (<xref ref-type="bibr" rid="B31">Feldbauer et al., 2015</xref>; <xref ref-type="bibr" rid="B84">Weimann et al., 2016</xref>), and 3) phylogenetic inference (<xref ref-type="bibr" rid="B36">Goberna and Verdu 2016</xref>). Homology inference from sequence alignments with tools like BLAST (<xref ref-type="bibr" rid="B4">Altschul et al., 1990</xref>), USearch (<xref ref-type="bibr" rid="B28">Edgar 2010</xref>), or DIAMOND (<xref ref-type="bibr" rid="B18">Buchfink et al., 2015</xref>) has large memory requirements and long run times, which makes these methods challenging for a typical user to scale to thousands of genome sequences. In addition, for the detection of remote homologs, the sensitivity of alignment-based methods is lower than that of profile methods (<xref ref-type="bibr" rid="B17">Brenner et al., 1998</xref>). Statistical learning methods to predict microbial traits depend on the availability of extensive training sets to establish genotype-phenotype relationships. Such data exist only for a very limited set of core phenotypes and therefore the resulting models, while they can be highly accurate, offer a narrow view of the microbial trait space (<xref ref-type="bibr" rid="B93">Yabuuchi 2001</xref>; <xref ref-type="bibr" rid="B67">Ruan 2013</xref>). Phylogeny-based methods predict missing trait values of new genomes based on the traits of their evolutionary relatives. While phylogenetic conservatism of certain traits has been documented for bacteria and archaea, prokaryotic traits of ecological relevance have overall weak phylogenetic signal (<xref ref-type="bibr" rid="B59">Martiny et al., 2013</xref>).
In addition, as the bulk of the current information on phenotypes is centered around organisms of biotechnological and medical interest, the accuracy of the phylogenetic trait prediction remains low (<xref ref-type="bibr" rid="B36">Goberna and Verdu 2016</xref>).</p>
<p>To fill this need, we developed an R package, <italic>microTrait,</italic> that provides a conceptual framework and associated pipelines to translate a microbial genome into a suite of potential fitness traits. <italic>microTrait</italic> maps a genome sequence into a hierarchical trait space that covers energetic, resource acquisition, stress tolerance, and life history traits that underlie microbial strategies describing environmental microbes (<xref ref-type="bibr" rid="B57">Malik et al., 2020</xref>). Our pipeline makes use of literature-supported <italic>omics</italic> markers defining trait-based microbial strategies to quantify trait profiles for microbial genomes. Given a genome sequence, individual gene markers are detected with a model-based approach using a new HMM database of protein families. The models have been trained with protein sequences that represent sequence diversity from genomes and metagenomes, and their accuracy has been measured independently against the KEGG Orthology database. The traits are inferred from gene markers based on their presence/absence patterns and presented in a hierarchical manner.</p>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec id="s3-1">
<title>Microbial Traits With Genomic Basis</title>
<p>The overarching goal of our approach is to reduce the dimensionality and complexity of the genomic information such that a genome is represented as a feature vector where individual features represent one or more aspects of an ecological strategy (<xref ref-type="bibr" rid="B50">Lajoie and Kembel 2019</xref>). Microbial traits span a wide range of phenotypic, ecological, and metabolic characteristics. The choice of specific traits and their representational granularity depends on the research question of interest. We first review the genome-based traits inferred by <italic>microTrait</italic> and rationalize their choice, primarily following the frameworks proposed by (<xref ref-type="bibr" rid="B38">Green et al., 2008</xref>) and more recently (<xref ref-type="bibr" rid="B57">Malik et al., 2020</xref>) (<xref ref-type="fig" rid="F1">Figure 1</xref>).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Conceptual overview of genome-derivable traits (gray boxes) underlying ecological strategies (blue boxes) represented in <italic>microTrait</italic> based on literature surveys. For each trait, genomic features are indicated. <xref ref-type="sec" rid="s11">Supplementary Table S1</xref> provides full details for the <italic>microTrait</italic> hierarchy. <xref ref-type="sec" rid="s11">Supplementary Table S8</xref> lists references for genomic features underlying ecological traits.</p>
</caption>
<graphic xlink:href="fbinf-02-918853-g001.tif"/>
</fig>
<p>At the most fundamental level, our approach takes as input a genome sequence and maps it to a trait space in a computationally scalable way. Here we adopt a microbial counterpart of the widely used definition of &#x201c;functional traits&#x201d; for macroorganisms as measurable characteristics that &#x201c;impact fitness of an organism via its effect on growth, reproduction, or survival&#x201d; at the individual level (<xref ref-type="bibr" rid="B81">Violle et al., 2007</xref>; <xref ref-type="bibr" rid="B80">Violle et al., 2014</xref>). Unlike for macroorganisms, measuring traits at the individual microbe level in complex communities is currently not feasible, although single-cell imaging and &#x2018;<italic>omic</italic> technologies are beginning to expand our understanding of population heterogeneity at these native scales (<xref ref-type="bibr" rid="B82">Wang and Bodovitz 2010</xref>; <xref ref-type="bibr" rid="B14">Bock et al., 2016</xref>). Genomes have recently been proposed as the ecological units (<xref ref-type="bibr" rid="B64">Prosser 2015</xref>; <xref ref-type="bibr" rid="B77">Turaev and Rattei 2016</xref>) at which genome-inferred traits should be measured. Advances in DNA sequencing and computational protocols have led to a more or less continuous stream of provisional genomes not only from cultured isolates but also from single-cells (SAGs) and metagenomes (MAGs) (<xref ref-type="bibr" rid="B73">Sharon and Banfield 2013</xref>). Though as an ecological unit, the resolution represented by MAGs may not currently match its counterpart for macroorganisms, possibly representing mosaics and distorting or masking intra-population differences, they nevertheless provide an unprecedented window into complex microbiomes and provide especially valuable insights into the physiology and metabolism of uncultivated organisms in their natural environments.
As such, a genome-centric lens on traits allows organism-level traits to be scaled to communities (through incorporation of genome abundances), and therefore to larger scales, and allows trait linkage to be studied across ecologically relevant units.</p>
<p>We identified genomic features that can be mapped to microbial ecological strategies, conceptualized under four dimensions (<xref ref-type="fig" rid="F1">Figure 1</xref>) organized as a hierarchy (&#x201c;<italic>microTrait</italic> hierarchy&#x201d;: <xref ref-type="sec" rid="s11">Supplementary Table S1</xref>). Within each strategy, the trait information is organized as a hierarchy whose leaf nodes map to specific genome-derived features. <xref ref-type="sec" rid="s11">Supplementary Table S8</xref> provides the full list of references that establish the links between each genome-derived feature and the ecological strategy at the most granular level. Here we give an overview of the traits for each ecological strategy:</p>
</sec>
<sec id="s3-2">
<title>Resource Acquisition Traits</title>
<p>A tremendous variety of substrates ranging from simple inorganic ions to complex organic molecules serve as resources for microbes. Microbes have adopted a suite of concrete strategies with genomic basis to be competitive in a wide range of environments with spatiotemporally variable resource profiles. Many microorganisms have the potential to produce exoenzymes that can disassemble complex resources (substrate degradation), which can then be acquired through uptake (substrate uptake) via membrane transporters (<xref ref-type="bibr" rid="B12">Berntsson et al., 2010</xref>; <xref ref-type="bibr" rid="B6">Arnosti 2011</xref>; <xref ref-type="bibr" rid="B98">Zimmerman et al., 2013</xref>; <xref ref-type="bibr" rid="B7">Arnostil et al., 2014</xref>; <xref ref-type="bibr" rid="B23">Courty and Wipf 2016</xref>; <xref ref-type="bibr" rid="B11">Bergauer et al., 2018</xref>). Thus, one aspect of resource acquisition strategy concerns the investment in both the number and diversity of exoenzymes and membrane transporters that a microbe maintains in its genome. Substrate uptake is linked to substrate assimilation traits that determine the capacity for assimilation of inorganic compounds.</p>
</sec>
<sec id="s3-3">
<title>Resource Use (Energy Generating) Traits</title>
<p>Redox reactions underlie all biological energy metabolism and redox chemistry provides an organizing principle to connect microscale to global scale processes (<xref ref-type="bibr" rid="B29">Falkowski et al., 2008</xref>; <xref ref-type="bibr" rid="B66">Ramirez-Flandes et al., 2019</xref>). Genes whose protein products catalyze redox reactions, their coupling to energy conservation, and their genomic organization determine the basis for microbial metabolic strategies. Historically, in the pre-genomic era, single metabolic traits were evaluated in isolation to define &#x201c;metabolic functional groups&#x201d; but genomic data has underlined the tremendous metabolic flexibility of microbes (<xref ref-type="bibr" rid="B5">Anantharaman et al., 2016</xref>). As a result, classical enumerations of microbial metabolism are not sufficient to represent the linkage of metabolic traits. Representation of microbes as a suite of energy metabolism traits provides a more complete picture and a data driven definition of metabolic guilds.</p>
</sec>
<sec id="s3-4">
<title>Stress Tolerance Traits</title>
<p>Stress may be induced by physical, chemical, or biological conditions that adversely affect microbial growth and survival. Microbes that use stress tolerance strategies respond to a variety of stressors using several physiological and evolutionary mechanisms. Though the specific stress response depends on the particular suboptimal conditions, common traits with genomic underpinnings have been broadly identified (General Stress Tolerance Traits). These include increasing the concentration of some molecular chaperones (stress proteins/heat-shock proteins) to combat biomolecular damage in response to stress. This is a universal feature across all domains of life but the relative importance of genetic (i.e., diversity and gene copy number) or regulatory (transcriptional, translational, and post-translational) processes under different stressors is less clear (<xref ref-type="bibr" rid="B30">Feder and Hofmann 1999</xref>; <xref ref-type="bibr" rid="B41">Hecker and Volker 2001</xref>; <xref ref-type="bibr" rid="B95">Yu et al., 2015</xref>).</p>
<p>Genomic bases of microbial traits that underlie stress tolerance to specific physicochemical factors have also been identified: 1) Temperature stress: a suite of heat shock genes serving as chaperones and proteases are involved in the protection, repair, and degradation of denatured/misfolded proteins. Response to cold shock involves adaptation of the membrane via an increase in the proportion of unsaturated fatty acids and activation of chaperone cold shock proteins to restore mRNA functionality. 2) Desiccation, osmotic, salt stress: Known molecular strategies to tolerate drought and freezing include production or uptake of osmolytes like trehalose and glycine betaine to reduce water potential and maintain hydration or synthesis of extracellular polymeric substances (<xref ref-type="bibr" rid="B24">Csonka 1989</xref>; <xref ref-type="bibr" rid="B49">Ko et al., 1994</xref>; <xref ref-type="bibr" rid="B60">Mindock et al., 2001</xref>; <xref ref-type="bibr" rid="B22">Costa et al., 2018</xref>). 3) Oxidative stress: The response to oxidative stress is a complex one that involves the coordinated regulation of many genes, most critically those encoding enzymes that scavenge reactive oxygen species. The activation of such regulons requires redox sensing (two-component redox sensors and redox-sensitive transcription factors, TFs). 4) pH stress: Similar to general, oxidative, and temperature stress, molecular mechanisms for protection from acid stress include investment in chaperones, proteases and the ability to sense and respond to redox conditions through two-component systems and TFs. Unique mechanisms for maintenance of intracellular pH include the consumption and extrusion of intracellular protons by acid-inducible amino acid decarboxylase-antiporter and urease systems, and the enzymatic conversion of unsaturated fatty acids into cyclopropane fatty acids.</p>
</sec>
<sec id="s3-5">
<title>Life History Traits</title>
<p>Ecological and evolutionary processes leave their signatures in overall microbial genome content and organization. A key dimension of any ecological strategy is growth. Optimal growth characteristics of microbes are key to understanding how the traits regarding resource acquisition, resource use, and stress tolerance are realized to adapt to a particular environmental niche. Traits that concern these characteristics are classified as life history traits. Codon usage bias and ribosomal RNA (rRNA) operon copy number are linked to maximum growth rate, a life history trait constraining all other functional traits (<xref ref-type="bibr" rid="B83">Weider et al., 2005</xref>; <xref ref-type="bibr" rid="B79">Vieira-Silva and Rocha 2010</xref>; <xref ref-type="bibr" rid="B85">Weissman et al., 2021</xref>). Another key life history trait closely linked to the overall genomic adaptation is optimal growth temperature (OGT). Temperature is a master regulator of enzyme activity and overall cell machinery. A combination of quantifiable proteome-wide features predictable from genome sequences allows OGT to be hypothesized solely from genomic sequence (<xref ref-type="bibr" rid="B96">Zeldovich et al., 2007</xref>; <xref ref-type="bibr" rid="B71">Sauer and Wang 2019</xref>).</p>
</sec>
<sec id="s3-6">
<title>
<italic>microTrait</italic> Pipeline</title>
<p>The computational pipeline to infer traits from primary genome sequences has two major components (<xref ref-type="fig" rid="F2">Figure 2A</xref>): 1) a database of gene HMMs (<italic>microTrait-HMM</italic>) to model the diversity of protein families based on sequences from genomes and metagenomes with independently established accuracy to detect genetic loci (<xref ref-type="fig" rid="F2">Figure 2B</xref> and <xref ref-type="sec" rid="s11">Supplementary Table S2</xref>), and 2) a set of rules (<italic>microTrait</italic> rules) encoded in predicate logic to infer traits from presence and absence of the set of loci modeled in <italic>microTrait-HMM</italic> (<xref ref-type="sec" rid="s11">Supplementary Table S4</xref>). The model-based detection of genetic loci ensures decreased run times and interoperability across datasets (given model and scoring cutoff). The rule-based framework to infer traits from primary features gives the user the flexibility for redefinition and refinement.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Overview of <italic>microTrait</italic>. <bold>(A)</bold> <italic>microTrait</italic> pipeline consists of a library of gene-level Hidden Markov Models (<italic>microTrait</italic>-HMMs) for detection of genome features and logical rules (<italic>microTrait</italic>-rules) that map these features to traits. The outputs from the pipeline are trait matrices (genomes &#xd7; traits) at different granularities corresponding to the levels of the <italic>microTrait</italic> hierarchy. <bold>(B)</bold> Workflow for construction of <italic>microTrait</italic>-HMMs. Each HMM models the diversity of sequences from IMG/M at gene-level. <bold>(C)</bold> Benchmarking of <italic>microTrait</italic>-HMMs. The trusted cutoffs for <italic>microTrait</italic>-HMMs were determined through cross-references to KEGG orthologs (whenever available).</p>
</caption>
<graphic xlink:href="fbinf-02-918853-g002.tif"/>
</fig>
</sec>
<sec id="s3-7">
<title>Cross References to External Databases From <italic>microTrait</italic>-HMM</title>
<p>The statistical models in <italic>microTrait-HMM</italic> reflect the most recent sequence diversity from both cultured and uncultured microbes and therefore should have improved accuracy over existing methods to detect genes underlying traits covered in <italic>microTrait</italic>. To ensure interoperability of the <italic>microTrait</italic> pipeline with the existing HMM databases and relevant sequence database resources, for each gene model we provide database cross references to KEGG (<xref ref-type="bibr" rid="B47">Kanehisa and Goto 2000</xref>), Transporter Classification Database (<xref ref-type="bibr" rid="B68">Saier et al., 2016</xref>), and Enzyme nomenclature database (through EC numbers) (1999).</p>
</sec>
<sec id="s3-8">
<title>Performance of Gene HMMs and Assignment of Trusted-Cutoffs</title>
<p>We assessed the performance of each <italic>microTrait-HMM</italic> by first determining the corresponding orthologous group (KO number) in the KEGG orthologs database (when the locus was represented in KEGG) (<xref ref-type="fig" rid="F2">Figure 2C</xref>). A test dataset for the gene model in question was built by using IMG/M sequences labeled with the determined KO number (&#x201c;true positives&#x201d;) and the remaining KO numbers (&#x201c;true negatives&#x201d;). The IMG/M database was scanned with the profile HMM using HMMER/hmmsearch. F-scores (harmonic mean of precision and recall) were calculated as a function of &#x201c;hmmsearch scores&#x201d; based on the test dataset with R using the ROCR package (<xref ref-type="bibr" rid="B74">Sing et al., 2005</xref>). The smallest score that maximizes the F-score was assigned as the trusted cutoff. <xref ref-type="sec" rid="s11">Supplementary Table S3</xref> summarizes the performance of each model in <italic>microTrait-</italic>HMM. Overall, at the determined trusted cutoffs, the overwhelming majority of <italic>microTrait-</italic>HMMs (94.2%; 1,686 out of 1,790 HMMs) had high sensitivity (&#x2265;75%) and low FPR (false positive rate), with 92% of HMMs having an F-score &#x2265;0.8 (<xref ref-type="sec" rid="s11">Supplementary Figure S1</xref>).</p>
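<p>The cutoff rule described above (the smallest score that maximizes the F-score) can be illustrated with a minimal Python sketch. This is not the authors&#x2019; implementation, which uses R and ROCR; the function name <monospace>trusted_cutoff</monospace>, the toy scores, and the labels are hypothetical and chosen only for illustration.</p>
<preformat preformat-type="code">
```python
def trusted_cutoff(scores, labels):
    """Return (cutoff, f_score): the smallest score that maximizes the F-score.

    scores: bit scores of test sequences (hypothetical values here).
    labels: True if the sequence carries the target KO number, else False.
    """
    total_pos = sum(1 for y in labels if y)
    best_f, best_cut = -1.0, None
    # Candidate cutoffs are the distinct observed scores, in ascending order.
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and not y)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / total_pos if total_pos else 0.0
        if precision + recall:
            f = 2 * precision * recall / (precision + recall)
        else:
            f = 0.0
        # Strict comparison keeps the smallest cutoff among tied maxima.
        if f > best_f:
            best_f, best_cut = f, cut
    return best_cut, best_f


# Toy test set: six sequences, three of which carry the target KO.
toy_scores = [12.0, 35.5, 48.0, 60.2, 75.1, 90.3]
toy_labels = [False, False, True, False, True, True]
print(trusted_cutoff(toy_scores, toy_labels))
```
</preformat>
<p>Scanning cutoffs in ascending order and updating only on a strict improvement is one simple way to return the smallest of several equally good cutoffs; a production implementation would more likely sort the scores once and accumulate counts.</p>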
</sec>
<sec id="s3-9">
<title>
<italic>microTrait</italic> Pipeline: Derivation of Traits From Genome Sequences</title>
<p>The input to <italic>microTrait</italic> is a genome sequence (.fa) or the corresponding protein coding genes (.faa) in FASTA format. When genomic rather than protein coding gene sequences are supplied, Prodigal is used to predict open reading frames (<xref ref-type="bibr" rid="B42">Hyatt et al., 2010</xref>). For each genome, protein sequences are scanned against <italic>microTrait-HMM</italic> with HMMER/hmmsearch to generate a count table for the detected gene models. Binary and continuous traits are assigned using the count table and predefined logical rules mapping the presence/absence of gene(s) or other rules to specific traits (<xref ref-type="fig" rid="F3">Figure 3</xref>). The rules can be edited by users within the R package. Their role is twofold. First, they let the user modify how some binary traits are defined (for instance, based on one or more proteins in a large complex, or one or more steps in a pathway). Second, they can be used to increase detection sensitivity for provisional or lower quality genomes (i.e., SAGs and MAGs).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Trait inference with <italic>microTrait</italic> rules. <italic>microTrait</italic> rules use simple Boolean logic to map presence/absence of <italic>microTrait</italic>-HMMs (italicized) to traits. The reconfigurability of the rules makes it possible to explore the effect of different trait definitions on the microbial guilds and therefore enables a flexible microbial trait extraction pipeline. Examples of trait definitions from rules for <bold>(A)</bold> denitrification traits. Rule-based inference allows flexible definition of traits, for example by end products of denitrification. <bold>(B)</bold> Substrate uptake. <italic>microTrait</italic> represents substrate uptake traits using the range of substrates documented in TCDB (Transporter Classification Database) (shown as a word cloud colored by substrate class). Traits relevant to the uptake of substrates (example for monosaccharides) can be defined in a hierarchical manner with rules defined from other rules and <italic>microTrait</italic>-HMMs.</p>
</caption>
<graphic xlink:href="fbinf-02-918853-g003.tif"/>
</fig>
</sec>
<sec id="s3-10">
<title>Modular Trait Definitions With Predicate Logic</title>
<p>
<italic>microTrait</italic> uses Boolean algebra to map protein family content into traits through <italic>microTrait</italic> rules (<xref ref-type="sec" rid="s11">Supplementary Table S5</xref>). In this framework, each protein family is a Boolean variable (i.e. equals 1 if detected, 0 otherwise) whose value is determined by the output of the corresponding <italic>microTrait-</italic>HMM. The traits are represented by rules whose arguments are one or more protein families, other rules, or a combination of these. Conceptually, the rules map to representations of protein complexes with multiple subunits or a series of enzyme catalyzed reactions that transform one molecular species into another. While the standard package comes with a predefined set of rules, the rules themselves and the mapping of rules to traits are modular and can be modified by the user. As an example, consider denitrification traits (<xref ref-type="fig" rid="F3">Figure 3A</xref>). The canonical denitrification pathways, excluding accessory and regulatory proteins, involve 4 protein complexes (NarGHI: the inner membrane-bound nitrate reductase; NapAB: the periplasmic nitrate reductase; NorBC, NorVW: nitric oxide reductases) and 3 proteins (NirS, NirK: nitrite reductases; NosZ: nitrous oxide reductase). Together, these are represented by 12 protein families (italicized gene names in <xref ref-type="fig" rid="F3">Figure 3A</xref>) and the four individual enzymatic steps are represented by 4 rules. From these rules, several denitrification traits corresponding to individual functional guilds can be defined.</p>
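<p>As a minimal sketch of how such rules compose, the Python function below evaluates four step-level rules from a set of detected protein families and then defines a trait from those rules. The subunit requirements, rule names, and gene identifiers are illustrative assumptions and are not the rule definitions distributed with <italic>microTrait</italic> (Supplementary Table S5).</p>
<preformat preformat-type="code"><![CDATA[
```python
def evaluate_rules(detected):
    """Map a set of detected protein families to denitrification rules.

    The subunit requirements below are illustrative assumptions, not the
    rule definitions shipped with microTrait.
    """
    def has(*genes):
        # True only if every listed protein family was detected.
        return all(g in detected for g in genes)

    rules = {
        # NO3- to NO2-: membrane-bound (NarGHI) or periplasmic (NapAB)
        "nitrate_reduction": has("narG", "narH", "narI") or has("napA", "napB"),
        # NO2- to NO
        "nitrite_reduction": has("nirS") or has("nirK"),
        # NO to N2O
        "nitric_oxide_reduction": has("norB", "norC") or has("norV", "norW"),
        # N2O to N2
        "nitrous_oxide_reduction": has("nosZ"),
    }
    # Rules can take other rules as arguments: a trait built from four rules.
    rules["complete_denitrification"] = all(rules.values())
    return rules


# Hypothetical genomes described by their detected protein families.
genome_a = {"napA", "napB", "nirK", "norB", "norC", "nosZ"}
genome_b = {"narG", "narH", "nirS"}  # narI missing, so no nitrate reduction

result_a = evaluate_rules(genome_a)
result_b = evaluate_rules(genome_b)
```
]]></preformat>
<p>Relaxing a rule, for example by requiring only the catalytic subunit of a complex, amounts to editing one Boolean expression, which is the kind of reconfiguration the modular rule design is intended to support.</p>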
<p>For transporters and polymer specific extracellular enzymes, we compiled a list of the experimentally reported substrates of each enzyme using the Transporter Classification Database (TCDB) (<xref ref-type="bibr" rid="B68">Saier et al., 2016</xref>) and the Database of carbohydrate-active enzymes (dbCAN) (<xref ref-type="bibr" rid="B94">Yin et al., 2012</xref>). We then classified each reported substrate into broad substrate classes (<xref ref-type="fig" rid="F3">Figure 3B</xref> and <xref ref-type="sec" rid="s11">Supplementary Table S6</xref>). The relevant rules for transporters and extracellular enzymes let the user quantify the number of protein complexes with a given substrate or substrate class.</p>
<p>A challenge in assigning traits to genomes based on protein family signatures is the modularity of the underlying pathways. This modularity might truly reflect the genomic variation within a set of isolates, MAGs, or SAGs, but it might also be an artifact of incomplete and noisy genomic information. Starting with genomic sequences, <italic>microTrait</italic> allows the investigation of this modularity across a set of genomes. The user can then use the resulting information to define custom logical rules that assign traits based on protein family content.</p>
</sec>
<sec id="s3-11">
<title>Comparing <italic>microTrait</italic> With a Taxonomy-Based Inference of Microbial Functional Groups</title>
<p>Linking taxonomic classification with function is a commonly used method to infer microbial traits. Faprotax is a manually curated database that maps taxa to functional groups based on the physiological studies for the cultured representatives of these taxa (<xref ref-type="bibr" rid="B54">Louca et al., 2016</xref>). The taxonomic resolution is typically at species or genus level but can also be less specific (i.e. family or higher). Using a large collection of isolate genomes from environmental ecosystems (refer to Materials and Methods for construction of the genome collection) and literature references for functional affiliations based on taxonomic names in Faprotax (<xref ref-type="sec" rid="s11">Supplementary Table S11</xref>), we have quantified the extent to which <italic>microTrait-</italic>rules recovered the validated culturable taxa for different microbial functional groups. For each functional group, we first matched the taxonomic names from literature, primarily genus/species names but also extending to higher ranks for certain functional groups, to canonical NCBI taxonomic names. All available genomes from environmental ecosystems with the respective taxonomic affiliation were considered as a &#x201c;positive&#x201d; for that functional group according to the Faprotax approach (<xref ref-type="sec" rid="s11">Supplementary Table S12</xref>). We have then tested how many of these assumed Faprotax positives the <italic>microTrait</italic> pipeline was able to recall solely based on the functional trait predictions from genomes. In addition, for each functional group, we have also evaluated the specificity of genome-based calls based on the assumption that all negatives via the Faprotax taxonomic affiliation were &#x201c;true negatives&#x201d; (<xref ref-type="sec" rid="s11">Supplementary Table S13</xref>).</p>
<p>Among 41 functional groups, 29 had a recall rate over 70%. Functional groups for which <italic>microTrait</italic> had low recall rates included anammox (0 <italic>microTrait</italic>&#x2b; genomes out of 7 Faprotax&#x2b; genomes; 0/7), dark iron oxidation (10/16), iron respiration (19/86), aerobic nitrite oxidation (6/13), chlorate reducers (3/6), dark sulfide oxidation (49/93), anoxygenic photoautotrophy Fe oxidizing (9/16), dark sulfur oxidation (71/124), sulfur respiration (82/139), and thiosulfate respiration (88/145). A close examination of the taxonomic identity of the genomes &#x201c;missed&#x201d; by <italic>microTrait</italic> suggested a variety of explanations for the functional groups with poor recall.</p>
<p>A primary advantage of inferring microbial traits directly from genomic sequences rather than from taxonomic names is the ability to resolve diversity (species or strain level), which increases the prediction accuracy. We observed that for many functional groups defined in Faprotax, the genomes that were assigned to the taxonomic clades lacked the required genetic repertoire for the metabolic function in question. Prominent examples are the &#x201c;anammox&#x201d; and &#x201c;dark iron oxidation&#x201d; groups. For anammox, among the diversity of taxa (genus and species), only <italic>P. mendocina</italic> had corresponding genomes in the isolate set (n &#x3d; 7), and none of those had the genomic features for anammox, suggesting that this is a strain-specific trait for <italic>P. mendocina</italic>. Similarly, for dark iron oxidation, genome features suggested that the trait can be strain specific. Among 15 <italic>R. palustris</italic> and 2 <italic>M. ferrooxydans</italic> genomes, only a limited number (9 and 1 genomes, respectively) had genomic support for the trait. There were also cases where the genomic evidence suggested that trait conservation was limited to deep taxonomic levels, so a taxonomic inference at genus or family level would have reduced the accuracy of the Faprotax method. For instance, methanotrophy is associated with Methylocystaceae (family) and Methylocapsa (genus), yet the trait was specific to subfamily/subgenus. Among 7 Methylocystaceae genera with genome representatives, 2 genera (Methylocystis and Methylosinus) had genome support for the trait. Similarly, 2 out of 3 Methylocapsa species with genomes had evidence for the trait.</p>
<p>It should be noted that there were also cases in which the absence of a genomic signal reflected limited knowledge of the genetic underpinnings of the trait. A typical example was iron respiration, a trait for which current evidence suggests that electron transport for iron reduction proceeds by a different, as yet unknown mechanism in acidophiles compared with <italic>Ferrimonas</italic> and <italic>Shewanella</italic> (<xref ref-type="bibr" rid="B58">Malik et al. 2018</xref>). Another example was chlorate reduction, a process whose genes sit in a region prone to horizontal transfer (<xref ref-type="bibr" rid="B21">Clark et al., 2013</xref>), which impacts the accuracy of a gene-level profile HMM approach. Overall, these disagreements between taxonomic and genome-based approaches suggest that a genomic feature-based approach such as <italic>microTrait</italic> increases prediction accuracy and precision, even when one considers single traits (such as functional groups).</p>
</sec>
<sec id="s3-12">
<title>High-Throughput Extraction of Microbial Traits from Genomes with <italic>microTrait</italic>
</title>
<p>As an example of scalable extraction of traits from genomes, we applied <italic>microTrait</italic> to publicly available isolate genomes and MAGs. The datasets we used included 1) isolate genomes from environmental ecosystems from IMG/M (n &#x3d; 6,157), 2) MAGs from an aquifer system (n &#x3d; 2,545) (<xref ref-type="bibr" rid="B5">Anantharaman et al., 2016</xref>), 3) MAGs from a thawing permafrost (n &#x3d; 1,530) (<xref ref-type="bibr" rid="B92">Woodcroft et al., 2018</xref>), 4) MAGs from hydrothermal sediments (n &#x3d; 666) (<xref ref-type="bibr" rid="B25">Dombrowski et al., 2018</xref>), and 5) MAGs from publicly available metagenome samples, referred to as Uncultivated Bacteria and Archaea Dataset (UBA) (n &#x3d; 7,902) (<xref ref-type="bibr" rid="B62">Parks et al., 2017</xref>). This compendium of datasets (genome compendium) resulted in a total number of 20,062 genomes.</p>
<p>We tested <italic>microTrait</italic> on a machine with a 2.3&#xa0;GHz 16-core Intel Xeon Processor E5-2698. When run on a single core, <italic>microTrait</italic> processed a genome in 3.94 &#xb1; 2.59&#xa0;min, with an average of 1.11&#xa0;min/Mb of genome sequence (<xref ref-type="sec" rid="s11">Supplementary Figure S2</xref>). From these measurements, we predict that <italic>microTrait</italic> can process an average microbial genome of size 4&#xa0;Mb in approximately 4.5&#xa0;min. In all runs, the memory footprint of <italic>microTrait</italic> did not exceed 60&#xa0;MB. In a multiprocessor compute environment, <italic>microTrait</italic> is easily parallelizable using a typical data-level parallelization scheme that maps genomes to separate logical processors (for instance, using R&#x2019;s <italic>parallel</italic> package, distributed as part of R-core). In our tests, when run on a 64-processor compute node, the processing of the compendium of 20,062 genomes (total size &#x3d; 47.9&#xa0;Gb) took 12.47&#xa0;h.</p>
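<p>The runtime figures above can be cross-checked with a short back-of-the-envelope calculation, sketched here in Python. The function names are hypothetical, and the sketch assumes that runtime scales linearly with genome size, that work is divided perfectly across processors, and that 47.9&#xa0;Gb corresponds to 47,900&#xa0;Mb.</p>
<preformat preformat-type="code"><![CDATA[
```python
MIN_PER_MB = 1.11  # average single-core rate reported in the text


def estimated_minutes(genome_mb, rate=MIN_PER_MB):
    """Single-core runtime estimate, assuming time scales linearly with size."""
    return genome_mb * rate


def estimated_wall_hours(total_mb, workers, rate=MIN_PER_MB):
    """Wall-clock estimate assuming perfect scaling across workers."""
    return total_mb * rate / workers / 60.0


single = estimated_minutes(4.0)                # one 4 Mb genome
compendium = estimated_wall_hours(47_900, 64)  # 47.9 Gb taken as 47,900 Mb
```
]]></preformat>
<p>Under these assumptions, the single-genome estimate is about 4.4&#xa0;min and the compendium estimate is about 13.8&#xa0;h on 64 processors, which is of the same order as the measured 12.47&#xa0;h.</p>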
</sec>
<sec id="s3-13">
<title>
<italic>microTrait</italic> Trait Matrix</title>
<p>When applied to multiple genomes, <italic>microTrait</italic> outputs a trait matrix of &#x201c;genomes x traits&#x201d; with three types of variables. Binary trait variables are calculated as presence/absence of a specific functional capacity and span 1) energy generation via specific electron acceptors/donors, 2) capacity to degrade, assimilate, or acquire specific substrates. Continuous trait variables fall into two groups. The first group is calculated starting from counts of specific functional capacities in the genome and spans 1) acquisition of chemical classes of substrates with transporters or via extracellular breakdown, 2) investment in extracellular polysaccharides and osmolytes. For each genome, the counts are normalized by genome size. The second group represents life history traits and includes 1) minimum generation time (unit: h<sup>&#x2212;1</sup>) predicted based on indices of codon-usage bias in ribosomal protein genes (a proxy for highly expressed genes) (<xref ref-type="bibr" rid="B79">Vieira-Silva and Rocha 2010</xref>; <xref ref-type="bibr" rid="B85">Weissman et al., 2021</xref>), 2) optimal growth temperature (unit: &#xb0;C) predicted from a suite of features derived from the nucleotide and protein sequences of the genome (<xref ref-type="bibr" rid="B71">Sauer and Wang 2019</xref>).</p>
</sec>
<sec id="s3-14">
<title>Refinement of Functional Guilds Using <italic>microTrait</italic>
</title>
<p>To exemplify the use of <italic>microTrait</italic> in refining functional guilds, we explored how denitrifier guilds can be defined based on the genomic distribution of denitrification traits in the isolate genomes from our compendium of genomes. Denitrification is a key biologically catalyzed process by which nitrogen available to plants is returned to the atmospheric nitrogen pool in gaseous form, either as molecular N<sub>2</sub> or as an oxide of N. Denitrification occurs as a step-wise reduction of nitrogen oxides with gaseous products. Four reductases are involved in denitrification, NAR, NIR, NOR and N<sub>2</sub>OR, sequentially catalyzing the reductions NO<sub>3</sub><sup>&#x2212;</sup> &#x2192; NO<sub>2</sub><sup>&#x2212;</sup> &#x2192; NO &#x2192; N<sub>2</sub>O &#x2192; N<sub>2</sub>. Several previous studies reported both genomic and phenotypic evidence for truncated versions of the denitrification pathway, but a global genomic analysis is not currently available (<xref ref-type="bibr" rid="B69">Sanford et al., 2012</xref>; <xref ref-type="bibr" rid="B46">Jones et al., 2014</xref>; <xref ref-type="bibr" rid="B55">Lycus et al., 2017</xref>; <xref ref-type="bibr" rid="B53">Liu et al., 2018</xref>; <xref ref-type="bibr" rid="B35">Gao et al., 2019</xref>).</p>
<p>We used the <italic>microTrait</italic> pipeline to explore all of the publicly available environmental genomes from the IMG/M database (<xref ref-type="sec" rid="s11">Supplementary Table S9</xref>). This resulted in a &#x201c;genomes X rules&#x201d; matrix specifying for each genome whether each of the rules was asserted as TRUE or FALSE. The matrix was subset to rules underlying denitrification traits and the genomes were clustered based on their denitrification trait profiles. The clustering gave 13 denitrification-associated functional guilds, with 58.3% of the screened genomes involved in at least one denitrification-related process (<xref ref-type="sec" rid="s11">Supplementary Figure S3</xref>). Only a small proportion of these had the genomic capacity to perform complete denitrification to N<sub>2</sub>. Overall, the guilds correspond to generation of the same end products from different starting nitrogen compounds (e.g. guilds 1&#x2013;4, 5&#x2013;7, and 8&#x2013;9 generating N<sub>2</sub>, N<sub>2</sub>O, and NO respectively), or multiple end products with missing steps (e.g. guilds 11&#x2013;13). The default trait matrix in <italic>microTrait</italic> defines denitrification traits by the end products of denitrification (<xref ref-type="sec" rid="s11">Supplementary Table S7</xref>), yet the workflow of going from genomic features to traits via <italic>microTrait</italic> rules makes redefinition of traits possible.</p>
</sec>
<sec id="s3-15">
<title>Testing Trait Dimensionality of Microbial Genomes from a Given Ecosystem</title>
<p>
<italic>microTrait</italic> hierarchy maps a microbial genome to a high-dimensional space of putative functional traits of ecological relevance. In trait-based ecological modeling, trait selection is of central importance not only for biological but also for computational, statistical, and practical reasons (<xref ref-type="bibr" rid="B50">Lajoie and Kembel 2019</xref>). In our conceptualization of the relevant traits for terrestrial ecosystems, the set of selected traits are assumed to approximate the intrinsic (i.e. true underlying but unobserved) dimensionality of microbial traits. Unlike for plants for which accumulated evidence suggests that the intrinsic dimensionality of functional trait space is low (<xref ref-type="bibr" rid="B52">Laughlin 2014</xref>), the intrinsic dimensionality of the trait space of microbes in specific ecosystems remains largely unknown. However, we can assume that if the selected trait proxies are largely independent of each other then, taken jointly, they should represent the underlying functional differences, and improve our ability to explain and predict microbial distributions.</p>
<p>To investigate whether the selected traits in <italic>microTrait</italic> are largely independent, we used an extensive dataset of genomes of microbes isolated from terrestrial ecosystems to study the correlation structure of their <italic>microTrait</italic> profiles. The trait matrix (at granularity 3) for a total of 4,116 genomes of organisms isolated from terrestrial environments (Supplementary Table S9) was computed using <italic>microTrait</italic>. A non-parametric rank-order correlation metric was used to estimate the degree of relatedness between all trait pairs. The result was visualized as a correlation matrix and reordered to reveal potential hidden structure in the matrix (<xref ref-type="fig" rid="F4">Figure 4A</xref>).</p>
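<p>A minimal Python sketch of the rank-order correlation step is given below. It computes Spearman&#x2019;s rho for every pair of columns in a small genomes x traits matrix using only the standard library. The function names, trait names, and toy values are hypothetical, and the sketch assumes that no trait is constant across genomes (a constant column would give a zero denominator).</p>
<preformat preformat-type="code"><![CDATA[
```python
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def trait_correlations(matrix, trait_names):
    """Pairwise rho for the columns of a genomes x traits matrix."""
    cols = list(zip(*matrix))
    return {
        (trait_names[i], trait_names[j]): spearman(cols[i], cols[j])
        for i in range(len(cols)) for j in range(i + 1, len(cols))
    }


# Toy matrix: 5 genomes x 3 traits (one binary, two continuous).
matrix = [
    [1, 0.2, 5],
    [0, 0.5, 4],
    [1, 0.9, 3],
    [0, 1.4, 2],
    [1, 2.0, 1],
]
rho = trait_correlations(matrix, ["t_a", "t_b", "t_c"])
```
]]></preformat>
<p>Because the measure depends only on ranks, binary and continuous traits can be compared on the same footing, which is one reason a rank-order metric is a natural choice for a mixed trait matrix.</p>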
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Correlation matrix for <italic>microTrait</italic> defined traits. The strength of the correlation (Spearman&#x2019;s rho) is represented by the color intensity (positive: blue, negative: red). Left upper panel: the distribution of trait-to-trait correlation values, left lower panel: comparison of the distribution of trait-to-trait correlations within and between ecological strategies.</p>
</caption>
<graphic xlink:href="fbinf-02-918853-g004.tif"/>
</fig>
<p>Overall, the bulk of the correlations were weak (&#x7c;&#x3c1;&#x7c; &#x3c; 0.3), suggesting that <italic>microTrait</italic> trait dimensions map to largely independent traits (<xref ref-type="fig" rid="F4">Figure 4B</xref>). At the extremes, strong positive correlations would indicate redundancy of trait dimensions, while strong negative correlations would indicate underlying tradeoffs for the ecosystem in question. The few strongly positively correlated blocks corresponded to phototrophic resource use traits linking the variety of phototrophic pigments and photosystems.</p>
</sec>
<sec id="s3-16">
<title>Dimensionality Reduction with Guild-Centric Analysis of Microbial Genomes With <italic>microTrait</italic>
</title>
<p>Metagenomics allows the recovery of the genomes of all detectable members of an ecosystem along extensive spatiotemporal gradients. The genomes then provide support for co-occurrence of ecologically relevant traits of the members that together underlie the ecosystem function. A typical genome-centric microbiome study involves the analysis of hundreds to thousands of genomes, leading to trait matrices of high dimensionality. This high dimensionality poses a particular problem for statistical analyses (<xref ref-type="bibr" rid="B45">Johnstone and Titterington 2009</xref>). Further, when attempting to leverage the information from these genomes for downstream modeling applications, there are both a practical need and discovery opportunities in quantifying and reducing this dimensionality in a tractable manner. Organizing microbial members of an ecosystem community into &#x201c;putative guilds&#x201d; makes it possible to reduce the dimensionality of a metagenomic dataset, hypothesize the functional niche of community members, and computationally explore their interactions independently of their taxonomic origin. Here, using the soil ecosystem as an example, we show how to define microbial guilds in a data-driven manner using <italic>microTrait.</italic>
</p>
<p>Given a set of genomes representing a habitat, <italic>microTrait</italic> can be used to discover and define functional guilds, parameterize the defined guilds with life history traits (minimum doubling time and optimal growth temperature), and reduce the dimensionality of the trait space in a quantifiable way. <xref ref-type="fig" rid="F5">Figure 5</xref> outlines the guild-centric pipeline, which starts with a trait matrix and leads to the definition and characterization of the microbial guilds. Since <italic>microTrait</italic> encompasses both continuous and binary traits, the similarity between genomes is measured using a distance metric suitable for mixed data types (<xref ref-type="bibr" rid="B91">Wishart 2003</xref>) (see Methods). The resulting distance matrix (genomes x genomes) is clustered with unsupervised hierarchical clustering, visualized with trait presence/absence (i.e., treating continuous traits as binary variables), and annotated with the distribution of life history traits and trait prevalence across the dataset (<xref ref-type="fig" rid="F5">Figure 5A</xref>). Quantifying relationships between genomes based on their trait profiles gives the opportunity to dynamically define guilds in a data-driven way for any dataset. The proportion of inter-guild variance explained can then be quantified as a function of the number of guilds (<xref ref-type="fig" rid="F5">Figure 5B</xref>). A larger number of guilds corresponds to a smaller information loss at the expense of greater complexity for downstream applications. The user decides where to operate along the curve depending on its shape (rate of change in steepness with increasing number of guilds) and the application of interest. Once the number of guilds is chosen, the guilds can be defined, which results in a list of guilds, each representing a number of genomes and the joint distribution of traits captured by them. 
It is often useful to examine the distribution of the number of genomes that underlies each guild, as on average the within-guild trait variance would be higher for guilds supported by a smaller number of genomes. The user can filter the guilds by number of genomes to generate a dataset that represents guild profiles, that is, a fingerprint of the co-occurrence of traits for each guild and the within-guild distribution of life history traits (<xref ref-type="fig" rid="F5">Figure 5C</xref> and Supplementary Table S16).</p>
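<p>To make the mixed-data distance step concrete, the following Python sketch computes a Gower-style dissimilarity between trait profiles. This is an assumed stand-in chosen for illustration and is not necessarily the metric used in the pipeline (see Methods); the function names and toy profiles are hypothetical. Binary traits are compared by simple matching, and continuous traits by absolute difference scaled by the range of the trait across the dataset.</p>
<preformat preformat-type="code"><![CDATA[
```python
def gower_distance(a, b, is_binary, ranges):
    """Gower-style dissimilarity between two trait profiles.

    Binary traits contribute 0 (match) or 1 (mismatch); continuous traits
    contribute |a - b| scaled by the trait's range across the dataset.
    """
    parts = []
    for x, y, binary, rng in zip(a, b, is_binary, ranges):
        if binary:
            parts.append(0.0 if x == y else 1.0)
        elif rng > 0:
            parts.append(abs(x - y) / rng)
        else:
            parts.append(0.0)  # a constant trait carries no information
    return sum(parts) / len(parts)


def distance_matrix(profiles, is_binary):
    """Symmetric genomes x genomes matrix of pairwise dissimilarities."""
    cols = list(zip(*profiles))
    ranges = [max(c) - min(c) for c in cols]
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = gower_distance(profiles[i], profiles[j], is_binary, ranges)
            dist[i][j] = dist[j][i] = d
    return dist


# Toy profiles: two binary traits and one continuous trait per genome.
profiles = [
    [1, 0, 2.0],
    [1, 1, 4.0],
    [0, 1, 6.0],
]
dist = distance_matrix(profiles, is_binary=[True, True, False])
```
]]></preformat>
<p>A matrix of this form can be passed to any hierarchical clustering routine that accepts precomputed distances; cutting the resulting tree at different heights then yields the different numbers of guilds discussed above.</p>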
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Primary use cases and graphical outputs of the <italic>microTrait</italic> workflow. <bold>(A)</bold> Trait matrix provides clustering of a set of input genomes using trait profiles from <italic>microTrait</italic> outputs based on a distance metric taking into account mixed data types (i.e. for binary and count traits). Heatmap visualization uses presence (red)/absence (white) of traits, with trait prevalence (% genomes positive) shown in the top panel. Life history traits (minimal doubling time and optimum growth temperature) are overlaid on the right panel in continuous scale. <bold>(B)</bold> Trait variance across genomes based on the genome clustering is quantified as a function of the number of guilds using analysis of variance using distance matrices. Guilds can be defined either at a fixed number of guilds or based on percent explained within-guild variance, which results in a size (number of supporting genomes) distribution of guilds. <bold>(C)</bold> Visualization of trait profiles for the defined guilds (guilds &#xd7; traits), with mean trait values visualized across a color scale. Traits are ordered by ecological strategies (red: resource acquisition, green: resource use, blue: stress tolerance). For each trait, the top panel shows the statistical significance of the comparison of mean trait values across guilds. The distributions of life history traits are shown in the right side panels.</p>
</caption>
<graphic xlink:href="fbinf-02-918853-g005.tif"/>
</fig>
<p>We applied the <italic>microTrait</italic> data-driven guild-definition pipeline to soil isolate genomes from IMG (3,430 genomes with GOLD Ecosystem Type &#x3d; &#x201c;Soil OR Rhizoplane OR Rhizosphere OR Root&#x201d;). All traits except &#x201c;anaerobic ammonia oxidation (anammox)&#x201d; were detected at least once in the dataset resulting in a trait matrix of dimensionality 3,430 genomes X 190 traits. To date no pure culture isolates of anammox organisms have been obtained (<xref ref-type="bibr" rid="B44">Jetten et al., 2005</xref>). Clustering analysis indicated that a total of 196 guilds captured 70% of the inter-guild variance, with 16 guilds supported by at least 50 genomes. Comparison of the trait profiles across guilds elucidates the differentiating trait features of a set of guilds with respect to other guilds.</p>
<p>For example, the three guilds supported by the highest numbers of genomes (guild 3, guild 23, and guild 4; 383, 375, and 340 genomes respectively) were each enriched in specific traits under resource acquisition and resource use strategies (Supplementary Table S16). Compared to guilds 3 and 4, guild 23 was marked by enrichment of the ability to assimilate simple C compounds, use two-carbon compounds in the absence of glucose via the glyoxylate cycle, take up a variety of N compounds (elemental N and urea) as well as aromatic acids and biopolymers, and fix elemental nitrogen for biomass. On the other hand, compared to guild 23, guilds 3 and 4 represent a different strategy for incorporation of N compounds into biomass through assimilatory nitrate reduction and a unique ability to assimilate P compounds. Notably, although all three guilds were enriched in the capacity to utilize glucose, guild 23 and guilds 3 and 4 differed in their preferred glycolytic pathways (the canonical Embden&#x2013;Meyerhof&#x2013;Parnas (EMP) pathway in guilds 3 and 4 vs. the less common Entner&#x2013;Doudoroff (ED) pathway in guild 23), reflecting differing preferences in balancing production of ATP (energy yield) and cost of protein synthesis to achieve maximum fitness (<xref ref-type="bibr" rid="B33">Flamholz et al., 2013</xref>). Across these three guilds (3, 23, and 4), differences in enrichment for stress tolerance mechanisms were not apparent; however, other guilds did display enrichment in specific stress tolerance strategies. For instance, among all the guilds supported by at least 50 genomes, guilds 7 and 14 were uniquely enriched in traits for desiccation and pH stress tolerance respectively.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>Genome sequencing, from a data perspective, now provides a primary window into the traits that regulate fitness and function across Earth&#x2019;s microbiomes. Genomes are increasingly recognized as a fundamental unit in the study of microorganisms, however, the integration of this information is required to understand how such genome units relate to ecologically coherent behavior. Exploration of feedbacks between microorganisms and their environments requires numerical modeling approaches, and the assimilation of genomic information has substantially lagged its generation. This assimilation of microbiome information into numerical models in an automated fashion remains a significant challenge as microbial communities are ultra-diverse, physiologically plastic, and dynamically adaptive. Trait-based approaches to microbial ecology provide a framework to represent microbial diversity in a way that facilitates prediction, integration and generalization (<xref ref-type="bibr" rid="B50">Lajoie and Kembel 2019</xref>) and the rate at which isolate and metagenome-assembled genomes are being generated provide an unprecedented resource to explore patterns in microbial trait conservation and linkage. The resulting information can be used to initialize and parameterize mechanistic trait-based models spanning a scale of complexities to explore the drivers of patterns in the distribution and co-occurrence of microbial traits. With <italic>microTrait</italic>, our goal was to provide an extendable toolset and computational pipeline to infer microbial traits from genomic data and show how the resulting information can be used to define microbial guilds with varying parameters.</p>
<p>Our approach to infer ecological traits from genomic data couples profile search methods with reconfigurable simple predicate logic. This coupling provides important advantages for deriving microbial traits from large numbers of phylogenetically diverse microbial genomes. Profile methods represent information across a family of evolutionarily related sequences from a multiple sequence alignment and increase sensitivity by incorporating position-specific information into a model. Moreover, the set of sequences from which gene-level <italic>microTrait-HMMs</italic> have been trained were selected from an extensive sequence database (IMG/M (<xref ref-type="bibr" rid="B20">Chen et al., 2019</xref>)) that not only includes genomes of cultured isolates but also MAGs and SAGs, the majority of which had been derived from environmental samples. Given that the bulk of the stream of incoming genomes from new studies is expected from MAGs with higher phylogenetic diversity compared to isolate genomes, the ability to detect remote homologs underlying microbial traits and explore sequence diversity from environmental samples is critical to increase the accuracy of trait prediction. With future releases of IMG, new sequences can be incorporated into multiple sequence alignments and consecutively <italic>microTrait-HMM</italic>s can be updated.</p>
<p>To benchmark and determine the score thresholds for each gene-level <italic>microTrait-HMM</italic>, we used the corresponding genes from the corresponding KO (KEGG Orthology) group. While this approach makes a systematic assessment of model accuracy possible by balancing model precision and recall, it should be noted that the computed thresholds may be overly strict for certain applications. Sequences in the KO database correspond to a highly curated set of sequences with a limited phylogenetic scope, this may lead to high precision and low recall with respect to the true labels especially for phylogenetically divergent or novel genomes not well represented in KEGG (<xref ref-type="bibr" rid="B43">Jaffe et al., 2020</xref>). Since the true orthologs for the underlying protein families are not known but can only be inferred, the accuracy of the model can only be estimated using independent labels such as those from KEGG. For applications where a higher recall at the expense of a lower precision is desired, it would be desirable to lower the HMM cutoff thresholds depending on the user input. We leave the implementation of such modifications for future work.</p>
<p>In this work, we focused on mechanistically well-studied traits whose genetic underpinnings have previously been documented and which can be conceptualized as Boolean rules. In addition to the extraction of microbial traits with a rule-based system, further opportunities exist for unsupervised discovery of traits. For example, genomes with metadata labels that indicate the ecological niches of the organisms, determined experimentally or through text-mining (<xref ref-type="bibr" rid="B3">Alneberg et al., 2020</xref>; <xref ref-type="bibr" rid="B16">Brbic et al., 2016</xref>), can be leveraged to explore the genetic basis of organismal adaptation. Statistical modeling of the organismal niche and inference based on domain or gene content would be the classical approach to this (<xref ref-type="bibr" rid="B97">Zhalnina et al., 2018</xref>; <xref ref-type="bibr" rid="B19">Ceja-Navarro et al., 2019</xref>). In addition, the exponential increase in the availability of high-quality MAGs with rich metadata will make feasible machine learning approaches that use a much larger number of features and focus on prediction rather than explainability (<xref ref-type="bibr" rid="B26">Drouin et al., 2019</xref>).</p>
<p>Despite the increasing availability of genomic and physiological data of microbes, the adoption of trait-based approaches in microbial ecology is relatively recent. Unlike for plants and animals, working definitions of microbial traits and conceptual frameworks to define functional guilds from these are lacking. The large diversity of microbial lifestyles manifests as a large number of potential traits, some of which might be unobserved. Even with thousands of diverse genomes, the high dimensionality of the potential trait space makes it challenging to define functional guilds for microbes. Here we adopted an operational definition of microbial guild as &#x201c;groups consisting of diverse microorganisms with similar traits&#x201d; based on a synthesis of a relatively small number of master traits that define microbial lifestyles. Depending on the specific analysis goals, a user might want to fine-tune the granularity at which traits are defined (e.g., selection of different pathway endpoints as in denitrification or transporter/enzyme substrate classification). In <italic>microTrait</italic>, the reconfigurability of the rules makes it possible to explore the effect of different trait definitions on the microbial guilds and therefore enables a flexible microbial trait extraction pipeline.</p>
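<p>To make the idea of reconfigurable Boolean rules concrete, the following Python sketch evaluates rules that combine presence/absence of gene-level models with other rules. It only illustrates the general technique and does not reproduce the <italic>microTrait</italic> rule syntax: the tuple-based rule encoding, the rule names, and the gene sets in the example are hypothetical, and the sketch does not guard against cyclic rule definitions.</p>
<preformat><![CDATA[
```python
from typing import Dict, Set, Union

# A rule is either a name (of a gene-level model or of another rule)
# or a tuple ("and" | "or" | "not", operand, ...). This encoding is
# an assumption made for illustration.
Rule = Union[str, tuple]


def evaluate(rule: Rule, detected: Set[str], rules: Dict[str, Rule]) -> bool:
    """Evaluate a Boolean rule against the set of models detected in a genome."""
    if isinstance(rule, str):
        if rule in rules:
            # The name refers to another rule: evaluate it recursively.
            return evaluate(rules[rule], detected, rules)
        # Otherwise the name is a gene-level model: test for presence.
        return rule in detected
    operator, *operands = rule
    if operator == "and":
        return all(evaluate(o, detected, rules) for o in operands)
    if operator == "or":
        return any(evaluate(o, detected, rules) for o in operands)
    if operator == "not":
        if len(operands) != 1:
            raise ValueError("'not' takes exactly one operand")
        return not evaluate(operands[0], detected, rules)
    raise ValueError(f"unknown operator: {operator!r}")


# Hypothetical rule set with two alternative pathway endpoints.
EXAMPLE_RULES: Dict[str, Rule] = {
    "nitrite_reduction": ("or", "nirK", "nirS"),
    "denitrification_to_N2O": ("and", "nitrite_reduction", "norB"),
    "denitrification_to_N2": ("and", "denitrification_to_N2O", "nosZ"),
}
```
]]></preformat>
<p>With these hypothetical rules, a genome in which only nirS and norB are detected satisfies "denitrification_to_N2O" but not "denitrification_to_N2"; editing the rule dictionary changes the trait definition without touching the evaluator.</p>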
<p>Finally, a trait-based microbial ecology framework has the potential to integrate ecological and genomic data. For this promise to be realized, however, metadata on the provenance and the biogeochemical/ecological identification of the underlying biological samples must be available. Environmental metadata give essential context for genome data, but the current isolation of metadata resources (GOLD (<xref ref-type="bibr" rid="B61">Mukherjee et al., 2019</xref>) and NCBI&#x2019;s BioSample (<xref ref-type="bibr" rid="B10">Barrett et al., 2012</xref>)) and the lack of rich ontological and data standards hinder interoperability and reusability. Reusability of metadata is further hampered by the inability to download metadata in bulk. Even within a single resource with a relatively consistent data schema, fill rates for the existing fields are very low, leaving a large number of genomes without any usable metadata. For example, among 162,711 bacterial and archaeal GOLD genomes (accessed on 04/2021), only 17% had the Ecosystem field (GOLD: Study Fields: Ecosystem) completed with one of the three categories (Environmental, Engineered, or Host). Among the Environmental genomes, only &#x223c;41% (7,868 genomes) had even the broadest ecosystem classification completed (GOLD: Study Fields: Ecosystem Category), leaving an overwhelming majority of genomes unusable. For a trait-based framework to fulfill its potential in elucidating microbial trait-environment relationships, significant community efforts towards higher quality metadata standards and metadata enrichment, such as those led by the National Microbiome Data Collaborative (NMDC, <ext-link ext-link-type="uri" xlink:href="https://microbiomedata.org/">https://microbiomedata.org/</ext-link>), will be much needed.</p>
</sec>
<sec sec-type="methods" id="s5">
<title>Methods</title>
<sec id="s5-1">
<title>Implementation</title>
<p>
<italic>microTrait</italic> is implemented in R. Besides base R functions, it depends on the R packages dplyr, tidyr, tidyverse, and readr (<xref ref-type="bibr" rid="B90">Wickham, 2019</xref>; <xref ref-type="bibr" rid="B40">Hadley et al., 2018</xref>; <xref ref-type="bibr" rid="B89">Wickham et al., 2019</xref>; <xref ref-type="bibr" rid="B88">Wickham and Henry, 2019</xref>) for efficient data access, manipulation, and storage, and on doMC (<xref ref-type="bibr" rid="B87">Weston and Calaway 2015</xref>) for multicore functionality. <italic>microTrait</italic> is available from <ext-link ext-link-type="uri" xlink:href="https://github.com/ukaraoz/microtrait">https://github.com/ukaraoz/microtrait</ext-link>.</p>
</sec>
<sec id="s5-2">
<title>Construction of a Gene HMM Database of Protein Families (<italic>microTrait-HMM</italic>)</title>
<p>We constructed an HMM database that models gene loci underlying functional traits (called <italic>microTrait-HMM</italic>) based on archaeal and bacterial sequence diversity from 1) genomes of cultured organisms, 2) single cell genomes, 3) metagenome-assembled genomes, and 4) metagenomes from environmental, host-associated, and engineered microbiome samples. For each gene locus, a profile HMM was trained as follows. Seed protein sequences were collected from the non-redundant IMG/M database (img_core_v400) based on &#x201c;EC Number&#x201d;, &#x201c;Gene Symbol&#x201d;, and &#x201c;IMG Term and Synonym&#x201d; (<xref ref-type="bibr" rid="B20">Chen et al., 2019</xref>). Multiple sequence alignments (MSAs) were generated from the seed sequences using MAFFT with an accuracy-oriented parameter set (--maxiterate 1000 --localpair --anysymbol) (<xref ref-type="bibr" rid="B48">Katoh et al., 2005</xref>). Profile HMMs were built with HMMER/hmmbuild (<xref ref-type="bibr" rid="B27">Eddy 2008</xref>). We call the set of HMMs <italic>microTrait-HMM</italic> (<xref ref-type="sec" rid="s11">Supplementary Table S2</xref>). All seed sequences, MSAs, and profile HMMs are available at <ext-link ext-link-type="uri" xlink:href="https://github.com/ukaraoz/microtrait-hmm">https://github.com/ukaraoz/microtrait-hmm</ext-link>.</p>
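<p>The two-step procedure described above (alignment of seed sequences, then profile construction) could be scripted roughly as in the Python sketch below. The MAFFT options are those stated in the text; the file naming, the helper function names, and the use of default hmmbuild settings are assumptions, and the authors' actual scripts may differ. Running <monospace>build_profile</monospace> requires the mafft and hmmbuild executables to be installed and on the search path.</p>
<preformat><![CDATA[
```python
import subprocess
from pathlib import Path
from typing import List


def mafft_command(seed_fasta: str) -> List[str]:
    """MAFFT call with the accuracy-oriented options given in the text.

    MAFFT writes the alignment to standard output.
    """
    return ["mafft", "--maxiterate", "1000", "--localpair", "--anysymbol", seed_fasta]


def hmmbuild_command(hmm_out: str, msa_file: str) -> List[str]:
    """hmmbuild call with default options (an assumption): output HMM, then MSA."""
    return ["hmmbuild", hmm_out, msa_file]


def build_profile(seed_fasta: str, workdir: str) -> Path:
    """Align seed sequences and build a profile HMM for one gene locus.

    Output file names are derived from the seed file name; this naming
    scheme is hypothetical.
    """
    stem = Path(seed_fasta).stem
    msa_path = Path(workdir) / f"{stem}.aln.fasta"
    hmm_path = Path(workdir) / f"{stem}.hmm"
    # Capture the alignment that MAFFT prints to standard output.
    with open(msa_path, "w") as handle:
        subprocess.run(mafft_command(seed_fasta), stdout=handle, check=True)
    subprocess.run(hmmbuild_command(str(hmm_path), str(msa_path)), check=True)
    return hmm_path
```
]]></preformat>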
</sec>
<sec id="s5-3">
<title>Estimation of Life History Traits (Minimal Doubling Time and Optimum Growth Temperature)</title>
<p>To estimate minimal doubling time from genome-wide codon usage bias, <italic>microTrait</italic> uses the gRodon R package (<xref ref-type="bibr" rid="B85">Weissman et al., 2021</xref>), which applies multiple linear regression models trained on the dataset of maximal growth rates compiled by Vieira-Silva and Rocha (<xref ref-type="bibr" rid="B79">Vieira-Silva and Rocha 2010</xref>). Optimum growth temperature is estimated with multiple linear regression models based on the same features of tRNA and 16S rRNA genes, ORFs, and translated ORFs as determined by Sauer and Wang (<xref ref-type="bibr" rid="B71">Sauer and Wang 2019</xref>), with their Python pipeline reimplemented in R as part of the <italic>microTrait</italic> package to increase computational efficiency.</p>
</sec>
<sec id="s5-4">
<title>Inference of Guilds</title>
<p>Ecological guilds were inferred from the <italic>microTrait</italic> trait matrix with variance partitioning and clustering analysis. Trait values for &#x201c;count traits&#x201d; were normalized by genome size to express them as &#x201c;per base-pair genomic investments&#x201d;. The normalized trait matrix was used to calculate genome-to-genome distances using the Wishart distance metric for mixed variable data (<xref ref-type="bibr" rid="B91">Wishart 2003</xref>) as implemented in the R package kmed. The Wishart distance is similar to the Gower distance (<xref ref-type="bibr" rid="B37">Gower 1971</xref>) for mixed variable data but weights the numerical variables by their variance rather than their range and uses a squared distance component. The resulting distance matrix was used to cluster genomes by hierarchical clustering with complete linkage. Next, we quantified the variance in the genome-to-genome distances as a function of the number of defined guilds. We first cut the tree from hierarchical clustering into clusters, with the number of clusters ranging from 2 to the total number of genomes in the dataset. Then, for each cut, we quantified the variance in the distance matrix explained by cluster identity (using adonis2 in the R package vegan) and plotted the resulting coefficient of determination (R<sup>2</sup>) as a function of the number of clusters. This lets the user pick the number of guilds that captures a given level of trait variance across the dataset, or vice versa. Given a threshold for trait variance or a number of guilds, we then assign each genome to a guild based on the corresponding tree cut. Finally, we visualize the trait profiles for the defined guilds using trait positivity as a metric.</p>
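<p>The clustering and variance-partitioning steps can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the code used in the study: it takes a precomputed genome-to-genome distance matrix as input (the Wishart distance itself is not implemented), uses a naive complete-linkage procedure in place of an optimized library routine, and computes only the distance-based R<sup>2</sup> for a single grouping factor (between-group sum of squares over total sum of squares), without the permutation test that adonis2 also provides. All function names are hypothetical.</p>
<preformat><![CDATA[
```python
from itertools import combinations
from typing import Dict, Iterable, List, Sequence

Matrix = Sequence[Sequence[float]]


def complete_linkage_partitions(dist: Matrix) -> Dict[int, List[List[int]]]:
    """Map each number of clusters k to the partition of genome indices
    obtained by agglomerative clustering with complete linkage."""
    clusters = [[i] for i in range(len(dist))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # Complete linkage: the distance between two clusters is the
        # largest pairwise distance between their members.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions


def scaled_sum_of_squares(dist: Matrix, members: Iterable[int]) -> float:
    """Sum of squared pairwise distances among members, divided by group size."""
    members = list(members)
    return sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)


def r_squared(dist: Matrix, partition: List[List[int]]) -> float:
    """Fraction of distance-based variance explained by cluster identity."""
    total = scaled_sum_of_squares(dist, range(len(dist)))
    if total == 0:
        raise ValueError("all distances are zero; R^2 is undefined")
    within = sum(scaled_sum_of_squares(dist, c) for c in partition)
    return 1.0 - within / total


def r2_curve(dist: Matrix) -> Dict[int, float]:
    """R^2 as a function of the number of clusters, from 2 to the number of genomes."""
    partitions = complete_linkage_partitions(dist)
    return {k: r_squared(dist, partitions[k]) for k in sorted(partitions) if k >= 2}


def guilds_for_variance(dist: Matrix, min_r2: float) -> List[List[int]]:
    """Smallest set of guilds whose R^2 reaches min_r2."""
    partitions = complete_linkage_partitions(dist)
    for k in sorted(partitions):
        if k >= 2 and r_squared(dist, partitions[k]) >= min_r2:
            return partitions[k]
    return partitions[len(dist)]
```
]]></preformat>
<p>In this sketch, <monospace>r2_curve</monospace> gives the values that would be plotted against the number of clusters, and <monospace>guilds_for_variance</monospace> returns the guild assignment for a chosen variance threshold. The naive clustering loop is intended for small examples only.</p>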
</sec>
</sec>
</body>
<back>
<sec id="s6" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="sec" rid="s11">Supplementary Material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>
<italic>microTrait</italic> was conceived by UK and EB. UK developed the code, performed the computational analyses and wrote the original draft of the manuscript. EB contributed to the writing, review, and editing of the manuscript.</p>
</sec>
<sec id="s8">
<title>Funding</title>
<p>This work was supported by the Watershed Function Science Focus Area and the Belowground Biogeochemistry Science Focus Area at Lawrence Berkeley National Laboratory, funded by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, Environmental System Science program under Award No. DE-AC02-05CH11231.</p>
</sec>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ack>
<p>This manuscript benefited from discussions with organizers and participants at the US National Institute for Mathematical and Biological Synthesis (NIMBioS) Pan-microbial Trait Ecology Workshop (June 14&#x2013;16, 2017) at the University of Tennessee, Knoxville, for which we are grateful.</p>
</ack>
<sec id="s11">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fbinf.2022.918853/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fbinf.2022.918853/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material>
<label>Supplementary Figure S1</label>
<caption>
<p>Performance of <italic>microTrait</italic>-HMMs with respect to cross-reference to KEGG orthologous families (KO). Each point corresponds to a gene-level HMM with the estimated sensitivity (true positive rate) and specificity (as false positive rate or 1-specificity) corresponding to the scoring threshold that maximizes F-score. The inset shows the cumulative distribution for the maximum F-scores.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Figure S2</label>
<caption>
<p>
<italic>microTrait</italic> runtimes. Distribution of running times for isolate and metagenome-assembled genome sets normalized for genome size (measured as time (minutes) per Mb of sequence). Each point in the distribution corresponds to a genome. The normalized running times depend on the genome content, with more HMM hits requiring longer processing.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Figure S3</label>
<caption>
<p>Refinement of functional guilds using <italic>microTrait</italic>.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Figure S4</label>
<caption>
<p>Example <italic>microTrait</italic> trait matrix for soil isolate genomes as in <xref ref-type="fig" rid="F5">Figure 5A</xref>, in high resolution.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S1</label>
<caption>
<p>
<italic>microTrait</italic> hierarchy. Hierarchical mapping of genome-derived features into ecological functions of increasing granularity in <italic>microTrait</italic>. The <italic>microTrait</italic> hierarchy is an unbalanced hierarchy with 3 levels, with certain leaves spanning all 3 levels. References supporting the inference of traits from genome-derived features are given in <xref ref-type="sec" rid="s11">Supplementary Table S8</xref>.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S2</label>
<caption>
<p>
<italic>microTrait</italic> HMMs. List of gene-level HMMs underlying the <italic>microTrait</italic> pipeline (&#x201c;<italic>microTrait</italic>-HMMs&#x201d;), with cross-references (&#x201c;dbxref&#x201d;) to KEGG, EC, and the Transporter Classification Database.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S3</label>
<caption>
<p>Evaluation of <italic>microTrait</italic> HMMs. Performance of <italic>microTrait</italic>-HMMs with respect to cross-reference to KEGG orthologous families (KO). For each model, the model score maximizing F-score for the corresponding KO is used as a trusted cutoff.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S4</label>
<caption>
<p>
<italic>microTrait</italic> rules. Each <italic>microTrait</italic> rule is a Boolean expression over the presence/absence of <italic>microTrait</italic> HMMs or other <italic>microTrait</italic> rules.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S5</label>
<caption>
<p>Mapping of <italic>microTrait</italic> rules to the <italic>microTrait</italic> hierarchy. <italic>microTrait</italic> traits are of type either binary or count. Count traits can be counted by themselves or, in the case of transporters, by their substrate (microtrait_rule-type &#x3d; &#x201c;count_by_substrate&#x201d;). Refer to Supplementary Table S6 for the mapping between substrates and the <italic>microTrait</italic> hierarchy.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S6</label>
<caption>
<p>Classification of substrates for substrate uptake and degradation by chemical class.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S7</label>
<caption>
<p>
<italic>microTrait</italic> traits by strategy, type (i.e. binary, count), and granularity.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S8</label>
<caption>
<p>References for genome-derived features underlying ecological traits.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S9</label>
<caption>
<p>Selected GOLD genomes of organisms isolated from aquatic or terrestrial environments. Environmental isolate genomes (GOLD_organisms:Cultured &#x3d;&#x3d; &#x201c;Yes&#x201d; AND GOLD_organisms:Ecosystem &#x3d;&#x3d; &#x201c;Environmental&#x201d;) from GOLD database (<ext-link ext-link-type="uri" xlink:href="https://gold.jgi.doe.gov/">https://gold.jgi.doe.gov/</ext-link>) were selected and filtered using ecosystem category and sample collection site (GOLD_organisms:Ecosystem Category &#x3d;&#x3d; &#x201c;Aquatic OR Terrestrial&#x201d; OR GOLD_organisms:Sample Collection Site (MIGS-13) &#x3d;&#x3d; &#x201c;soil OR sediment OR rhizosphere&#x201d;).</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S10</label>
<caption>
<p>Taxonomic breakdown of selected GOLD genomes.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S11</label>
<caption>
<p>Mapping between taxa and functional groups based on Faprotax database. Faprotax (Functional Annotation of Prokaryotic Taxa) (<ext-link ext-link-type="uri" xlink:href="http://www.loucalab.com/archive/FAPROTAX/lib/php/index.php?section=Download">http://www.loucalab.com/archive/FAPROTAX/lib/php/index.php?section&#x003D;Download</ext-link>) is a database that maps prokaryotic clades (e.g. class, order, family, genus, species) to metabolic functions. For comparison with <italic>microTrait</italic> rules for the same metabolic functions, we resolved the listed taxa names to standard names, which are listed in this table (column: taxa).</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S12</label>
<caption>
<p>Mapping of Faprotax taxon names to NCBI taxon names.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S13</label>
<caption>
<p>Functional group assignments with Faprotax and <italic>microTrait</italic>. Each GOLD genome was assigned to a Faprotax functional group by taxonomy (i.e., based on the Faprotax database as in Supplementary Table S11) and by <italic>microTrait</italic> (i.e., based on genome sequence).</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S14</label>
<caption>
<p>Evaluation of <italic>microTrait</italic> traits (genome-based) with respect to Faprotax functional groups (taxonomic name based). For each functional group, validity of <italic>microTrait</italic> predictions is evaluated based on Faprotax classifications (T: number of <italic>microTrait</italic> predicted positive genomes, N: number of <italic>microTrait</italic> predicted negative genomes, TP: number of true positive genomes, TN: number of true negative genomes, FP: number of false positive genomes, FN: number of false negative genomes, TPR: true positive rate, TNR: true negative rate).</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S15</label>
<caption>
<p>Correlations between traits. Spearman&#x2019;s rank correlation coefficient between pairs of traits.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S16</label>
<caption>
<p>Guild trait profile matrix. Trait profiles (<italic>microTrait</italic> granularity 3) for defined guilds as mean trait values.</p>
</caption>
</supplementary-material>
<supplementary-material>
<label>Supplementary Table S17</label>
<caption>
<p>Guild taxonomic profiles. Taxonomic profiles for defined guilds as relative abundance of genome taxonomy (phylum, class, order, family, genus).</p>
</caption>
</supplementary-material>
<supplementary-material xlink:href="Image4.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image2.pdf" id="SM2" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image3.pdf" id="SM3" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table1.XLSX" id="SM4" mimetype="application/XLSX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image1.pdf" id="SM5" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ackerly</surname>
<given-names>D. D.</given-names>
</name>
<name>
<surname>Cornwell</surname>
<given-names>W. K.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>A Trait-Based Approach to Community Assembly: Partitioning of Species Trait Values into within- and Among-Community Components</article-title>. <source>Ecol. Lett.</source> <volume>10</volume> (<issue>2</issue>), <fpage>135</fpage>&#x2013;<lpage>145</lpage>. <pub-id pub-id-type="doi">10.1111/j.1461-0248.2006.01006.x</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Allison</surname>
<given-names>S. D.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>A Trait-Based Approach for Modelling Microbial Litter Decomposition</article-title>. <source>Ecol. Lett.</source> <volume>15</volume> (<issue>9</issue>), <fpage>1058</fpage>&#x2013;<lpage>1070</lpage>. <pub-id pub-id-type="doi">10.1111/j.1461-0248.2012.01807.x</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alneberg</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bennke</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Beier</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Bunse</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Quince</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ininbergs</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Ecosystem-wide Metagenomic Binning Enables Prediction of Ecological Niches from Genomes</article-title>. <source>Commun. Biol.</source> <volume>3</volume> (<issue>1</issue>), <fpage>119</fpage>. <pub-id pub-id-type="doi">10.1038/s42003-020-0856-x</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>S. F.</given-names>
</name>
<name>
<surname>Gish</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>E. W.</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>D. J.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>Basic Local Alignment Search Tool</article-title>. <source>J. Mol. Biol.</source> <volume>215</volume> (<issue>3</issue>), <fpage>403</fpage>&#x2013;<lpage>410</lpage>. <pub-id pub-id-type="doi">10.1016/S0022-2836(05)80360-2</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Anantharaman</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>C. T.</given-names>
</name>
<name>
<surname>Hug</surname>
<given-names>L. A.</given-names>
</name>
<name>
<surname>Sharon</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Castelle</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Probst</surname>
<given-names>A. J.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>Thousands of Microbial Genomes Shed Light on Interconnected Biogeochemical Processes in an Aquifer System</article-title>. <source>Nat. Commun.</source> <volume>7</volume>, <fpage>13219</fpage>. <pub-id pub-id-type="doi">10.1038/ncomms13219</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arnosti</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Microbial Extracellular Enzymes and the Marine Carbon Cycle</article-title>. <source>Ann. Rev. Mar. Sci.</source> <volume>3</volume>, <fpage>401</fpage>&#x2013;<lpage>425</lpage>. <pub-id pub-id-type="doi">10.1146/annurev-marine-120709-142731</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arnosti</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bell</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Moorhead</surname>
<given-names>D. L.</given-names>
</name>
<name>
<surname>Sinsabaugh</surname>
<given-names>R. L.</given-names>
</name>
<name>
<surname>Steen</surname>
<given-names>A. D.</given-names>
</name>
<name>
<surname>Stromberger</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>Extracellular Enzymes in Terrestrial, Freshwater, and Marine Environments: Perspectives on System Variability and Common Research Needs</article-title>. <source>Biogeochemistry</source> <volume>117</volume> (<issue>1</issue>), <fpage>5</fpage>&#x2013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1007/s10533-013-9906-5</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Asshauer</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Wemheuer</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Daniel</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Meinicke</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Tax4Fun: Predicting Functional Profiles from Metagenomic 16S rRNA Data</article-title>. <source>Bioinformatics</source> <volume>31</volume> (<issue>17</issue>), <fpage>2882</fpage>&#x2013;<lpage>2884</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv287</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Author Anonymous</surname>
</name>
</person-group> (<year>1999</year>). <article-title>IUPAC-IUBMB Joint Commission on Biochemical Nomenclature (JCBN) and Nomenclature Committee of IUBMB (NC-IUBMB), Newsletter 1999</article-title>. <source>Eur. J. Biochem.</source> <volume>264</volume> (<issue>2</issue>), <fpage>607</fpage>&#x2013;<lpage>609</lpage>. <pub-id pub-id-type="doi">10.1046/j.1432-1327.1999.news99.x</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barrett</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Gevorgyan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Gorelenkov</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Gribov</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Karsch-Mizrachi</surname>
<given-names>I.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>BioProject and BioSample Databases at NCBI: Facilitating Capture and Organization of Metadata</article-title>. <source>Nucleic Acids Res.</source> <volume>40</volume> (<comment>Database issue</comment>), <fpage>D57</fpage>&#x2013;<lpage>D63</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkr1163</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bergauer</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Fernandez-Guerra</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Garcia</surname>
<given-names>J. A. L.</given-names>
</name>
<name>
<surname>Sprenger</surname>
<given-names>R. R.</given-names>
</name>
<name>
<surname>Stepanauskas</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Pachiadaki</surname>
<given-names>M. G.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Organic Matter Processing by Microbial Communities throughout the Atlantic Water Column as Revealed by Metaproteomics</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>115</volume> (<issue>3</issue>), <fpage>E400</fpage>&#x2013;<lpage>E408</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1708779115</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berntsson</surname>
<given-names>R. P.</given-names>
</name>
<name>
<surname>Smits</surname>
<given-names>S. H.</given-names>
</name>
<name>
<surname>Schmitt</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Slotboom</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Poolman</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>A Structural Classification of Substrate-Binding Proteins</article-title>. <source>FEBS Lett.</source> <volume>584</volume> (<issue>12</issue>), <fpage>2606</fpage>&#x2013;<lpage>2617</lpage>. <pub-id pub-id-type="doi">10.1016/j.febslet.2010.04.043</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bier</surname>
<given-names>R. L.</given-names>
</name>
<name>
<surname>Bernhardt</surname>
<given-names>E. S.</given-names>
</name>
<name>
<surname>Boot</surname>
<given-names>C. M.</given-names>
</name>
<name>
<surname>Graham</surname>
<given-names>E. B.</given-names>
</name>
<name>
<surname>Hall</surname>
<given-names>E. K.</given-names>
</name>
<name>
<surname>Lennon</surname>
<given-names>J. T.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <article-title>Linking Microbial Community Structure and Microbial Processes: an Empirical and Conceptual Overview</article-title>. <source>FEMS Microbiol. Ecol.</source> <volume>91</volume> (<issue>10</issue>). <pub-id pub-id-type="doi">10.1093/femsec/fiv113</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bock</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Farlik</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sheffield</surname>
<given-names>N. C.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Multi-Omics of Single Cells: Strategies and Applications</article-title>. <source>Trends Biotechnol.</source> <volume>34</volume> (<issue>8</issue>), <fpage>605</fpage>&#x2013;<lpage>608</lpage>. <pub-id pub-id-type="doi">10.1016/j.tibtech.2016.04.004</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bouskill</surname>
<given-names>N. J.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Riley</surname>
<given-names>W. J.</given-names>
</name>
<name>
<surname>Brodie</surname>
<given-names>E. L.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Trait-based Representation of Biological Nitrification: Model Development, Testing, and Predicted Community Composition</article-title>. <source>Front. Microbiol.</source> <volume>3</volume>, <fpage>364</fpage>. <pub-id pub-id-type="doi">10.3389/fmicb.2012.00364</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brbic</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Piskorec</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Vidulin</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Krisko</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Smuc</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Supek</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>The Landscape of Microbial Phenotypic Traits and Associated Genes</article-title>. <source>Nucleic Acids Res.</source> <volume>44</volume> (<issue>21</issue>), <fpage>10074</fpage>&#x2013;<lpage>10090</lpage>. </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brenner</surname>
<given-names>S. E.</given-names>
</name>
<name>
<surname>Chothia</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Hubbard</surname>
<given-names>T. J.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Assessing Sequence Comparison Methods with Reliable Structurally Identified Distant Evolutionary Relationships</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>95</volume> (<issue>11</issue>), <fpage>6073</fpage>&#x2013;<lpage>6078</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.95.11.6073</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buchfink</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Huson</surname>
<given-names>D. H.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Fast and Sensitive Protein Alignment Using DIAMOND</article-title>. <source>Nat. Methods</source> <volume>12</volume> (<issue>1</issue>), <fpage>59</fpage>&#x2013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.3176</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ceja-Navarro</surname>
<given-names>J. A.</given-names>
</name>
<name>
<surname>Karaoz</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Bill</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>White</surname>
<given-names>R. A.</given-names>
<suffix>3rd</suffix>
</name>
<name>
<surname>Arellano</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Gut Anatomical Properties and Microbial Functional Assembly Promote Lignocellulose Deconstruction and Colony Subsistence of a Wood-Feeding Beetle</article-title>. <source>Nat. Microbiol.</source> <volume>4</volume> (<issue>5</issue>), <fpage>864</fpage>&#x2013;<lpage>875</lpage>. <pub-id pub-id-type="doi">10.1038/s41564-019-0384-y</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>I. A.</given-names>
</name>
<name>
<surname>Chu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Palaniappan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Pillay</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ratner</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>IMG/M v.5.0: an Integrated Data Management and Comparative Analysis System for Microbial Genomes and Microbiomes</article-title>. <source>Nucleic Acids Res.</source> <volume>47</volume> (<issue>D1</issue>), <fpage>D666</fpage>&#x2013;<lpage>D677</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gky901</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Clark</surname>
<given-names>I. C.</given-names>
</name>
<name>
<surname>Melnyk</surname>
<given-names>R. A.</given-names>
</name>
<name>
<surname>Engelbrektson</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Coates</surname>
<given-names>J. D.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Structure and Evolution of Chlorate Reduction Composite Transposons</article-title>. <source>mBio</source> <volume>4</volume> (<issue>4</issue>). <pub-id pub-id-type="doi">10.1128/mBio.00379-13</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Costa</surname>
<given-names>O. Y. A.</given-names>
</name>
<name>
<surname>Raaijmakers</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Kuramae</surname>
<given-names>E. E.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Microbial Extracellular Polymeric Substances: Ecological Function and Impact on Soil Aggregation</article-title>. <source>Front. Microbiol.</source> <volume>9</volume>, <fpage>1636</fpage>. <pub-id pub-id-type="doi">10.3389/fmicb.2018.01636</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Courty</surname>
<given-names>P. E.</given-names>
</name>
<name>
<surname>Wipf</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Editorial: Transport in Plant Microbe Interactions</article-title>. <source>Front. Plant Sci.</source> <volume>7</volume>, <fpage>809</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2016.00809</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Csonka</surname>
<given-names>L. N.</given-names>
</name>
</person-group> (<year>1989</year>). <article-title>Physiological and Genetic Responses of Bacteria to Osmotic Stress</article-title>. <source>Microbiol. Rev.</source> <volume>53</volume> (<issue>1</issue>), <fpage>121</fpage>&#x2013;<lpage>147</lpage>. <pub-id pub-id-type="doi">10.1128/mr.53.1.121-147.1989</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dombrowski</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Teske</surname>
<given-names>A. P.</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>B. J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Expansive Microbial Metabolic Versatility and Biodiversity in Dynamic Guaymas Basin Hydrothermal Sediments</article-title>. <source>Nat. Commun.</source> <volume>9</volume> (<issue>1</issue>), <fpage>4999</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-018-07418-0</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Drouin</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Letarte</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Raymond</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Marchand</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Corbeil</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Laviolette</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Interpretable Genotype-To-Phenotype Classifiers with Performance Guarantees</article-title>. <source>Sci. Rep.</source> <volume>9</volume> (<issue>1</issue>), <fpage>4071</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-019-40561-2</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eddy</surname>
<given-names>S. R.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>A Probabilistic Model of Local Sequence Alignment that Simplifies Statistical Significance Estimation</article-title>. <source>PLoS Comput. Biol.</source> <volume>4</volume> (<issue>5</issue>), <fpage>e1000069</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1000069</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edgar</surname>
<given-names>R. C.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Search and Clustering Orders of Magnitude Faster Than BLAST</article-title>. <source>Bioinformatics</source> <volume>26</volume> (<issue>19</issue>), <fpage>2460</fpage>&#x2013;<lpage>2461</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq461</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Falkowski</surname>
<given-names>P. G.</given-names>
</name>
<name>
<surname>Fenchel</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>DeLong</surname>
<given-names>E. F.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>The Microbial Engines that Drive Earth&#x27;s Biogeochemical Cycles</article-title>. <source>Science</source> <volume>320</volume> (<issue>5879</issue>), <fpage>1034</fpage>&#x2013;<lpage>1039</lpage>. <pub-id pub-id-type="doi">10.1126/science.1153213</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feder</surname>
<given-names>M. E.</given-names>
</name>
<name>
<surname>Hofmann</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>Heat-shock Proteins, Molecular Chaperones, and the Stress Response: Evolutionary and Ecological Physiology</article-title>. <source>Annu. Rev. Physiol.</source> <volume>61</volume>, <fpage>243</fpage>&#x2013;<lpage>282</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.physiol.61.1.243</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feldbauer</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Schulz</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Horn</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Rattei</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Prediction of Microbial Phenotypes Based on Comparative Genomics</article-title>. <source>BMC Bioinforma.</source> <volume>16</volume> (<issue>Suppl. 14</issue>), <fpage>S1</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-16-S14-S1</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Finlay</surname>
<given-names>B. J.</given-names>
</name>
<name>
<surname>Maberly</surname>
<given-names>S. C.</given-names>
</name>
<name>
<surname>Cooper</surname>
<given-names>J. I.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Microbial Diversity and Ecosystem Function</article-title>. <source>Oikos</source> <volume>80</volume> (<issue>2</issue>), <fpage>209</fpage>&#x2013;<lpage>213</lpage>. <pub-id pub-id-type="doi">10.2307/3546587</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Flamholz</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Noor</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Bar-Even</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Liebermeister</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Milo</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Glycolytic Strategy as a Tradeoff between Energy Yield and Protein Cost</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>110</volume> (<issue>24</issue>), <fpage>10039</fpage>&#x2013;<lpage>10044</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1215283110</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Follows</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Dutkiewicz</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Grant</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chisholm</surname>
<given-names>S. W.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Emergent Biogeography of Microbial Communities in a Model Ocean</article-title>. <source>Science</source> <volume>315</volume> (<issue>5820</issue>), <fpage>1843</fpage>&#x2013;<lpage>1846</lpage>. <pub-id pub-id-type="doi">10.1126/science.1138544</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Mao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W. T.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Wells</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Genome-centric Metagenomics Resolves Microbial Diversity and Prevalent Truncated Denitrification Pathways in a Denitrifying PAO-Enriched Bioprocess</article-title>. <source>Water Res.</source> <volume>155</volume>, <fpage>275</fpage>&#x2013;<lpage>287</lpage>. <pub-id pub-id-type="doi">10.1016/j.watres.2019.02.020</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goberna</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Verd&#xfa;</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Predicting Microbial Traits with Phylogenies</article-title>. <source>ISME J.</source> <volume>10</volume> (<issue>4</issue>), <fpage>959</fpage>&#x2013;<lpage>967</lpage>. <pub-id pub-id-type="doi">10.1038/ismej.2015.171</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gower</surname>
<given-names>J. C.</given-names>
</name>
</person-group> (<year>1971</year>). <article-title>A General Coefficient of Similarity and Some of its Properties</article-title>. <source>Biometrics</source> <volume>27</volume> (<issue>4</issue>), <fpage>857</fpage>&#x2013;<lpage>871</lpage>. <pub-id pub-id-type="doi">10.2307/2528823</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Green</surname>
<given-names>J. L.</given-names>
</name>
<name>
<surname>Bohannan</surname>
<given-names>B. J.</given-names>
</name>
<name>
<surname>Whitaker</surname>
<given-names>R. J.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Microbial Biogeography: from Taxonomy to Traits</article-title>. <source>Science</source> <volume>320</volume> (<issue>5879</issue>), <fpage>1039</fpage>&#x2013;<lpage>1043</lpage>. <pub-id pub-id-type="doi">10.1126/science.1153475</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gupta</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Prasoodanan</surname>
<given-names>V. P.</given-names>
</name>
<name>
<surname>Harish</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>A. K.</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>V. K.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Reconstruction of Bacterial and Viral Genomes from Multiple Metagenomes</article-title>. <source>Front. Microbiol.</source> <volume>7</volume>, <fpage>469</fpage>. <pub-id pub-id-type="doi">10.3389/fmicb.2016.00469</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wickham</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Hester</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Francois</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2018</year>). <source>Readr: Read Rectangular Text Data</source>. </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hecker</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>V&#xf6;lker</surname>
<given-names>U.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>General Stress Response of Bacillus subtilis and Other Bacteria</article-title>. <source>Adv. Microb. Physiol.</source> <volume>44</volume>, <fpage>35</fpage>&#x2013;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.1016/s0065-2911(01)44011-2</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hyatt</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>G. L.</given-names>
</name>
<name>
<surname>Locascio</surname>
<given-names>P. F.</given-names>
</name>
<name>
<surname>Land</surname>
<given-names>M. L.</given-names>
</name>
<name>
<surname>Larimer</surname>
<given-names>F. W.</given-names>
</name>
<name>
<surname>Hauser</surname>
<given-names>L. J.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Prodigal: Prokaryotic Gene Recognition and Translation Initiation Site Identification</article-title>. <source>BMC Bioinforma.</source> <volume>11</volume>, <fpage>119</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-11-119</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jaffe</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Castelle</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Matheus Carnevali</surname>
<given-names>P. B.</given-names>
</name>
<name>
<surname>Gribaldo</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Banfield</surname>
<given-names>J. F.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>The Rise of Diversity in Metabolic Platforms across the Candidate Phyla Radiation</article-title>. <source>BMC Biol.</source> <volume>18</volume> (<issue>1</issue>), <fpage>69</fpage>. <pub-id pub-id-type="doi">10.1186/s12915-020-00804-5</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jetten</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schmid</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>van de Pas-Schoonen</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Sinninghe Damst&#xe9;</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Strous</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Anammox Organisms: Enrichment, Cultivation, and Environmental Analysis</article-title>. <source>Methods Enzymol.</source> <volume>397</volume>, <fpage>34</fpage>&#x2013;<lpage>57</lpage>. <pub-id pub-id-type="doi">10.1016/S0076-6879(05)97003-1</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnstone</surname>
<given-names>I. M.</given-names>
</name>
<name>
<surname>Titterington</surname>
<given-names>D. M.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Statistical Challenges of High-Dimensional Data</article-title>. <source>Philos. Trans. A Math. Phys. Eng. Sci.</source> <volume>367</volume>, <fpage>4237</fpage>&#x2013;<lpage>4253</lpage>. <pub-id pub-id-type="doi">10.1098/rsta.2009.0159</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jones</surname>
<given-names>C. M.</given-names>
</name>
<name>
<surname>Spor</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Brennan</surname>
<given-names>F. P.</given-names>
</name>
<name>
<surname>Breuil</surname>
<given-names>M.-C.</given-names>
</name>
<name>
<surname>Bru</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Lemanceau</surname>
<given-names>P.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>Recently Identified Microbial Guild Mediates Soil N<sub>2</sub>O Sink Capacity</article-title>. <source>Nat. Clim. Change</source> <volume>4</volume> (<issue>9</issue>), <fpage>801</fpage>&#x2013;<lpage>805</lpage>. <pub-id pub-id-type="doi">10.1038/nclimate2301</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Goto</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>KEGG: Kyoto Encyclopedia of Genes and Genomes</article-title>. <source>Nucleic Acids Res.</source> <volume>28</volume> (<issue>1</issue>), <fpage>27</fpage>&#x2013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1093/nar/28.1.27</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Katoh</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Kuma</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Miyata</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Toh</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Improvement in the Accuracy of Multiple Sequence Alignment Program MAFFT</article-title>. <source>Genome Inf.</source> <volume>16</volume> (<issue>1</issue>), <fpage>22</fpage>&#x2013;<lpage>33</lpage>. </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ko</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>L. T.</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>G. M.</given-names>
</name>
</person-group> (<year>1994</year>). <article-title>Glycine Betaine Confers Enhanced Osmotolerance and Cryotolerance on Listeria monocytogenes</article-title>. <source>J. Bacteriol.</source> <volume>176</volume> (<issue>2</issue>), <fpage>426</fpage>&#x2013;<lpage>431</lpage>. <pub-id pub-id-type="doi">10.1128/jb.176.2.426-431.1994</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lajoie</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Kembel</surname>
<given-names>S. W.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Making the Most of Trait-Based Approaches for Microbial Ecology</article-title>. <source>Trends Microbiol.</source> <volume>27</volume> (<issue>10</issue>), <fpage>814</fpage>&#x2013;<lpage>823</lpage>. <pub-id pub-id-type="doi">10.1016/j.tim.2019.06.003</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Langille</surname>
<given-names>M. G.</given-names>
</name>
<name>
<surname>Zaneveld</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Caporaso</surname>
<given-names>J. G.</given-names>
</name>
<name>
<surname>McDonald</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Knights</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Reyes</surname>
<given-names>J. A.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Predictive Functional Profiling of Microbial Communities Using 16S rRNA Marker Gene Sequences</article-title>. <source>Nat. Biotechnol.</source> <volume>31</volume> (<issue>9</issue>), <fpage>814</fpage>&#x2013;<lpage>821</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.2676</pub-id> </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Laughlin</surname>
<given-names>D. C.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>The Intrinsic Dimensionality of Plant Traits and its Relevance to Community Assembly</article-title>. <source>J. Ecol.</source> <volume>102</volume> (<issue>1</issue>), <fpage>186</fpage>&#x2013;<lpage>193</lpage>. <pub-id pub-id-type="doi">10.1111/1365-2745.12187</pub-id> </citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ni</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Genomic Insights into Metabolic Potentials of Two Simultaneous Aerobic Denitrification and Phosphorus Removal Bacteria, Achromobacter sp. GAD3 and Agrobacterium sp. LAD9</article-title>. <source>FEMS Microbiol. Ecol.</source> <volume>94</volume> (<issue>4</issue>). <pub-id pub-id-type="doi">10.1093/femsec/fiy020</pub-id> </citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Louca</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Parfrey</surname>
<given-names>L. W.</given-names>
</name>
<name>
<surname>Doebeli</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Decoupling Function and Taxonomy in the Global Ocean Microbiome</article-title>. <source>Science</source> <volume>353</volume> (<issue>6305</issue>), <fpage>1272</fpage>&#x2013;<lpage>1277</lpage>. <pub-id pub-id-type="doi">10.1126/science.aaf4507</pub-id> </citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lycus</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>B&#xf8;thun</surname>
<given-names>K. L.</given-names>
</name>
<name>
<surname>Bergaust</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Shapleigh</surname>
<given-names>J. P.</given-names>
</name>
<name>
<surname>Bakken</surname>
<given-names>L. R.</given-names>
</name>
<name>
<surname>Frosteg&#xe5;rd</surname>
<given-names>&#xc5;.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Phenotypic and Genotypic Richness of Denitrifiers Revealed by a Novel Isolation Strategy</article-title>. <source>ISME J.</source> <volume>11</volume> (<issue>10</issue>), <fpage>2219</fpage>&#x2013;<lpage>2232</lpage>. <pub-id pub-id-type="doi">10.1038/ismej.2017.82</pub-id> </citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Madin</surname>
<given-names>J. S.</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Brbic</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Corkrey</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Danko</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>A Synthesis of Bacterial and Archaeal Phenotypic Trait Data</article-title>. <source>Sci. Data</source> <volume>7</volume> (<issue>1</issue>), <fpage>170</fpage>. <pub-id pub-id-type="doi">10.1038/s41597-020-0497-4</pub-id> </citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Malik</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Martiny</surname>
<given-names>J. B. H.</given-names>
</name>
<name>
<surname>Brodie</surname>
<given-names>E. L.</given-names>
</name>
<name>
<surname>Martiny</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Treseder</surname>
<given-names>K. K.</given-names>
</name>
<name>
<surname>Allison</surname>
<given-names>S. D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Defining Trait-Based Microbial Strategies with Consequences for Soil Carbon Cycling under Climate Change</article-title>. <source>ISME J.</source> <volume>14</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1038/s41396-019-0510-0</pub-id> </citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Malik</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Martiny</surname>
<given-names>J. B. H.</given-names>
</name>
<name>
<surname>Brodie</surname>
<given-names>E. L.</given-names>
</name>
<name>
<surname>Martiny</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Treseder</surname>
<given-names>K. K.</given-names>
</name>
<name>
<surname>Allison</surname>
<given-names>S. D.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Defining Trait-Based Microbial Strategies with Consequences for Soil Carbon Cycling under Climate Change</article-title>. <source>bioRxiv</source>, <fpage>445866</fpage>. </citation>
</ref>
<ref id="B59">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martiny</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Treseder</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Pusch</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Phylogenetic Conservatism of Functional Traits in Microorganisms</article-title>. <source>ISME J.</source> <volume>7</volume> (<issue>4</issue>), <fpage>830</fpage>&#x2013;<lpage>838</lpage>. <pub-id pub-id-type="doi">10.1038/ismej.2012.160</pub-id> </citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mindock</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Petrova</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Hollingsworth</surname>
<given-names>R. I.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Re-evaluation of Osmotic Effects as a General Adaptative Strategy for Bacteria in Sub-freezing Conditions</article-title>. <source>Biophys. Chem.</source> <volume>89</volume> (<issue>1</issue>), <fpage>13</fpage>&#x2013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1016/s0301-4622(00)00214-3</pub-id> </citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mukherjee</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Stamatis</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bertsch</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ovchinnikova</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Katta</surname>
<given-names>H. Y.</given-names>
</name>
<name>
<surname>Mojica</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Genomes OnLine Database (GOLD) v.7: Updates and New Features</article-title>. <source>Nucleic Acids Res.</source> <volume>47</volume> (<issue>D1</issue>), <fpage>D649</fpage>&#x2013;<lpage>D659</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gky977</pub-id> </citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parks</surname>
<given-names>D. H.</given-names>
</name>
<name>
<surname>Rinke</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Chuvochina</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Chaumeil</surname>
<given-names>P. A.</given-names>
</name>
<name>
<surname>Woodcroft</surname>
<given-names>B. J.</given-names>
</name>
<name>
<surname>Evans</surname>
<given-names>P. N.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Recovery of Nearly 8,000 Metagenome-Assembled Genomes Substantially Expands the Tree of Life</article-title>. <source>Nat. Microbiol.</source> <volume>2</volume> (<issue>11</issue>), <fpage>1533</fpage>&#x2013;<lpage>1542</lpage>. <pub-id pub-id-type="doi">10.1038/s41564-017-0012-7</pub-id> </citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Prosser</surname>
<given-names>J. I.</given-names>
</name>
<name>
<surname>Bohannan</surname>
<given-names>B. J.</given-names>
</name>
<name>
<surname>Curtis</surname>
<given-names>T. P.</given-names>
</name>
<name>
<surname>Ellis</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Firestone</surname>
<given-names>M. K.</given-names>
</name>
<name>
<surname>Freckleton</surname>
<given-names>R. P.</given-names>
</name>
<etal/>
</person-group> (<year>2007</year>). <article-title>The Role of Ecological Theory in Microbial Ecology</article-title>. <source>Nat. Rev. Microbiol.</source> <volume>5</volume> (<issue>5</issue>), <fpage>384</fpage>&#x2013;<lpage>392</lpage>. <pub-id pub-id-type="doi">10.1038/nrmicro1643</pub-id> </citation>
</ref>
<ref id="B64">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Prosser</surname>
<given-names>J. I.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Dispersing Misconceptions and Identifying Opportunities for the Use of &#x27;omics&#x27; in Soil Microbial Ecology</article-title>. <source>Nat. Rev. Microbiol.</source> <volume>13</volume> (<issue>7</issue>), <fpage>439</fpage>&#x2013;<lpage>446</lpage>. <pub-id pub-id-type="doi">10.1038/nrmicro3468</pub-id> </citation>
</ref>
<ref id="B65">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ramirez</surname>
<given-names>K. S.</given-names>
</name>
<name>
<surname>Knight</surname>
<given-names>C. G.</given-names>
</name>
<name>
<surname>de Hollander</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Brearley</surname>
<given-names>F. Q.</given-names>
</name>
<name>
<surname>Constantinides</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Cotton</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Detecting Macroecological Patterns in Bacterial Communities across Independent Studies of Global Soils</article-title>. <source>Nat. Microbiol.</source> <volume>3</volume> (<issue>2</issue>), <fpage>189</fpage>&#x2013;<lpage>196</lpage>. <pub-id pub-id-type="doi">10.1038/s41564-017-0062-x</pub-id> </citation>
</ref>
<ref id="B66">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ram&#xed;rez-Flandes</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Gonz&#xe1;lez</surname>
<given-names>B. O.</given-names>
</name>
<name>
<surname>Ulloa</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Redox Traits Characterize the Organization of Global Microbial Communities</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>116</volume> (<issue>9</issue>), <fpage>3630</fpage>&#x2013;<lpage>3635</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1817554116</pub-id> </citation>
</ref>
<ref id="B67">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ruan</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Bergey&#x27;s Manual of Systematic Bacteriology (Second Edition) Volume 5 and the Study of Actinomycetes Systematic in China</article-title>. <source>Wei Sheng Wu Xue Bao</source> <volume>53</volume> (<issue>6</issue>), <fpage>521</fpage>&#x2013;<lpage>530</lpage>. </citation>
</ref>
<ref id="B68">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saier</surname>
<given-names>M. H.</given-names>
<suffix>Jr.</suffix>
</name>
<name>
<surname>Reddy</surname>
<given-names>V. S.</given-names>
</name>
<name>
<surname>Tsu</surname>
<given-names>B. V.</given-names>
</name>
<name>
<surname>Ahmed</surname>
<given-names>M. S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Moreno-Hagelsieb</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>The Transporter Classification Database (TCDB): Recent Advances</article-title>. <source>Nucleic Acids Res.</source> <volume>44</volume> (<issue>D1</issue>), <fpage>D372</fpage>&#x2013;<lpage>D379</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv1103</pub-id> </citation>
</ref>
<ref id="B69">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sanford</surname>
<given-names>R. A.</given-names>
</name>
<name>
<surname>Wagner</surname>
<given-names>D. D.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Chee-Sanford</surname>
<given-names>J. C.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>S. H.</given-names>
</name>
<name>
<surname>Cruz-Garc&#xed;a</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Unexpected Nondenitrifier Nitrous Oxide Reductase Gene Diversity and Abundance in Soils</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>109</volume> (<issue>48</issue>), <fpage>19709</fpage>&#x2013;<lpage>19714</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1211238109</pub-id> </citation>
</ref>
<ref id="B70">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sangwan</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>J. A.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Recovering Complete and Draft Population Genomes from Metagenome Datasets</article-title>. <source>Microbiome</source> <volume>4</volume>, <fpage>8</fpage>. <pub-id pub-id-type="doi">10.1186/s40168-016-0154-5</pub-id> </citation>
</ref>
<ref id="B71">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sauer</surname>
<given-names>D. B.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>D. N.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Predicting the Optimal Growth Temperatures of Prokaryotes Using Only Genome Derived Features</article-title>. <source>Bioinformatics</source> <volume>35</volume> (<issue>18</issue>), <fpage>3224</fpage>&#x2013;<lpage>3231</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz059</pub-id> </citation>
</ref>
<ref id="B72">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shaffer</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Borton</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>McGivern</surname>
<given-names>B. B.</given-names>
</name>
<name>
<surname>Zayed</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>La Rosa</surname>
<given-names>S. L.</given-names>
</name>
<name>
<surname>Solden</surname>
<given-names>L. M.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>DRAM for Distilling Microbial Metabolism to Automate the Curation of Microbiome Function</article-title>. <source>Nucleic Acids Res.</source> <volume>48</volume> (<issue>16</issue>), <fpage>8883</fpage>&#x2013;<lpage>8900</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkaa621</pub-id> </citation>
</ref>
<ref id="B73">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharon</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Banfield</surname>
<given-names>J. F.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Microbiology. Genomes from Metagenomics</article-title>. <source>Science</source> <volume>342</volume> (<issue>6162</issue>), <fpage>1057</fpage>&#x2013;<lpage>1058</lpage>. <pub-id pub-id-type="doi">10.1126/science.1247023</pub-id> </citation>
</ref>
<ref id="B74">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sing</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sander</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Beerenwinkel</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Lengauer</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>ROCR: Visualizing Classifier Performance in R</article-title>. <source>Bioinformatics</source> <volume>21</volume> (<issue>20</issue>), <fpage>3940</fpage>&#x2013;<lpage>3941</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bti623</pub-id> </citation>
</ref>
<ref id="B75">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thompson</surname>
<given-names>L. R.</given-names>
</name>
<name>
<surname>Sanders</surname>
<given-names>J. G.</given-names>
</name>
<name>
<surname>McDonald</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Amir</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ladau</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Locey</surname>
<given-names>K. J.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>A Communal Catalogue Reveals Earth&#x27;s Multiscale Microbial Diversity</article-title>. <source>Nature</source> <volume>551</volume> (<issue>7681</issue>), <fpage>457</fpage>&#x2013;<lpage>463</lpage>. <pub-id pub-id-type="doi">10.1038/nature24621</pub-id> </citation>
</ref>
<ref id="B76">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Todd-Brown</surname>
<given-names>K. E. O.</given-names>
</name>
<name>
<surname>Hopkins</surname>
<given-names>F. M.</given-names>
</name>
<name>
<surname>Kivlin</surname>
<given-names>S. N.</given-names>
</name>
<name>
<surname>Talbot</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Allison</surname>
<given-names>S. D.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>A Framework for Representing Microbial Decomposition in Coupled Climate Models</article-title>. <source>Biogeochemistry</source> <volume>109</volume> (<issue>1</issue>), <fpage>19</fpage>&#x2013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1007/s10533-011-9635-6</pub-id> </citation>
</ref>
<ref id="B77">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turaev</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Rattei</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>High Definition for Systems Biology of Microbial Communities: Metagenomics Gets Genome-Centric and Strain-Resolved</article-title>. <source>Curr. Opin. Biotechnol.</source> <volume>39</volume>, <fpage>174</fpage>&#x2013;<lpage>181</lpage>. <pub-id pub-id-type="doi">10.1016/j.copbio.2016.04.011</pub-id> </citation>
</ref>
<ref id="B78">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Van Der Heijden</surname>
<given-names>M. G.</given-names>
</name>
<name>
<surname>Bardgett</surname>
<given-names>R. D.</given-names>
</name>
<name>
<surname>Van Straalen</surname>
<given-names>N. M.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>The Unseen Majority: Soil Microbes as Drivers of Plant Diversity and Productivity in Terrestrial Ecosystems</article-title>. <source>Ecol. Lett.</source> <volume>11</volume> (<issue>3</issue>), <fpage>296</fpage>&#x2013;<lpage>310</lpage>. <pub-id pub-id-type="doi">10.1111/j.1461-0248.2007.01139.x</pub-id> </citation>
</ref>
<ref id="B79">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vieira-Silva</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Rocha</surname>
<given-names>E. P.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>The Systemic Imprint of Growth and its Uses in Ecological (Meta)genomics</article-title>. <source>PLoS Genet.</source> <volume>6</volume> (<issue>1</issue>), <fpage>e1000808</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pgen.1000808</pub-id> </citation>
</ref>
<ref id="B80">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Violle</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Reich</surname>
<given-names>P. B.</given-names>
</name>
<name>
<surname>Pacala</surname>
<given-names>S. W.</given-names>
</name>
<name>
<surname>Enquist</surname>
<given-names>B. J.</given-names>
</name>
<name>
<surname>Kattge</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>The Emergence and Promise of Functional Biogeography</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>111</volume> (<issue>38</issue>), <fpage>13690</fpage>&#x2013;<lpage>13696</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1415442111</pub-id> </citation>
</ref>
<ref id="B81">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Violle</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Navas</surname>
<given-names>M.-L.</given-names>
</name>
<name>
<surname>Vile</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Kazakou</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Fortunel</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Hummel</surname>
<given-names>I.</given-names>
</name>
<etal/>
</person-group> (<year>2007</year>). <article-title>Let the Concept of Trait Be Functional!</article-title>. <source>Oikos</source> <volume>116</volume>, <fpage>882</fpage>&#x2013;<lpage>892</lpage>. <pub-id pub-id-type="doi">10.1111/j.0030-1299.2007.15559.x</pub-id> </citation>
</ref>
<ref id="B82">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bodovitz</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Single Cell Analysis: the New Frontier in &#x27;omics&#x27;</article-title>. <source>Trends Biotechnol.</source> <volume>28</volume> (<issue>6</issue>), <fpage>281</fpage>&#x2013;<lpage>290</lpage>. <pub-id pub-id-type="doi">10.1016/j.tibtech.2010.03.002</pub-id> </citation>
</ref>
<ref id="B83">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weider</surname>
<given-names>L. J.</given-names>
</name>
<name>
<surname>Elser</surname>
<given-names>J. J.</given-names>
</name>
<name>
<surname>Crease</surname>
<given-names>T. J.</given-names>
</name>
<name>
<surname>Mateos</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cotner</surname>
<given-names>J. B.</given-names>
</name>
<name>
<surname>Markow</surname>
<given-names>T. A.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>The Functional Significance of Ribosomal (R)DNA Variation: Impacts on the Evolutionary Ecology of Organisms</article-title>. <source>Annu. Rev. Ecol. Evol. Syst.</source> <volume>36</volume> (<issue>1</issue>), <fpage>219</fpage>&#x2013;<lpage>242</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.ecolsys.36.102003.152620</pub-id> </citation>
</ref>
<ref id="B84">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weimann</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Mooren</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Frank</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Pope</surname>
<given-names>P. B.</given-names>
</name>
<name>
<surname>Bremges</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>McHardy</surname>
<given-names>A. C.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>From Genomes to Phenotypes: Traitar, the Microbial Trait Analyzer</article-title>. <source>mSystems</source> <volume>1</volume> (<issue>6</issue>). <pub-id pub-id-type="doi">10.1128/mSystems.00101-16</pub-id> </citation>
</ref>
<ref id="B85">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weissman</surname>
<given-names>J. L.</given-names>
</name>
<name>
<surname>Hou</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Fuhrman</surname>
<given-names>J. A.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Estimating Maximal Microbial Growth Rates from Cultures, Metagenomes, and Single Cells via Codon Usage Patterns</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>118</volume> (<issue>12</issue>). <pub-id pub-id-type="doi">10.1073/pnas.2016810118</pub-id> </citation>
</ref>
<ref id="B86">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Westoby</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wright</surname>
<given-names>I. J.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Land-plant Ecology on the Basis of Functional Traits</article-title>. <source>Trends Ecol. Evol.</source> <volume>21</volume> (<issue>5</issue>), <fpage>261</fpage>&#x2013;<lpage>268</lpage>. <pub-id pub-id-type="doi">10.1016/j.tree.2006.02.004</pub-id> </citation>
</ref>
<ref id="B87">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Weston</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Calaway</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2015</year>). <source>doMC: Foreach Parallel Adaptor for &#x2018;parallel&#x2019;</source>. </citation>
</ref>
<ref id="B90">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wickham</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Welcome to the Tidyverse</article-title>. <source>J. Open Source Softw.</source> <volume>4</volume> (<issue>43</issue>), <fpage>1686</fpage>. <pub-id pub-id-type="doi">10.21105/joss.01686</pub-id> </citation>
</ref>
<ref id="B88">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wickham</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Henry</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). <source>tidyr: Easily Tidy Data with &#x27;spread()&#x27; and &#x27;gather()&#x27; Functions</source>. </citation>
</ref>
<ref id="B89">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wickham</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Francois</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Henry</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>M&#xfc;ller</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2019</year>). <source>dplyr: A Grammar of Data Manipulation</source>. </citation>
</ref>
<ref id="B91">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wishart</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>K-Means Clustering with Outlier Detection, Mixed Variables and Missing Values</article-title>, in <source>Exploratory Data Analysis in Empirical Research. Studies in Classification, Data Analysis, and Knowledge Organization</source>. <publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-3-642-55721-7_23</pub-id> </citation>
</ref>
<ref id="B92">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Woodcroft</surname>
<given-names>B. J.</given-names>
</name>
<name>
<surname>Singleton</surname>
<given-names>C. M.</given-names>
</name>
<name>
<surname>Boyd</surname>
<given-names>J. A.</given-names>
</name>
<name>
<surname>Evans</surname>
<given-names>P. N.</given-names>
</name>
<name>
<surname>Emerson</surname>
<given-names>J. B.</given-names>
</name>
<name>
<surname>Zayed</surname>
<given-names>A. A. F.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Genome-centric View of Carbon Processing in Thawing Permafrost</article-title>. <source>Nature</source> <volume>560</volume> (<issue>7716</issue>), <fpage>49</fpage>&#x2013;<lpage>54</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-018-0338-1</pub-id> </citation>
</ref>
<ref id="B93">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yabuuchi</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Current Topics on Classification and Nomenclature of Bacteria. 7. Taxonomic Outline of Archeae and Bacteria in the Second Edition of Bergey&#x27;s Manual of Systematic Bacteriology</article-title>. <source>Kansenshogaku Zasshi</source> <volume>75</volume> (<issue>8</issue>), <fpage>653</fpage>&#x2013;<lpage>655</lpage>. <pub-id pub-id-type="doi">10.11150/kansenshogakuzasshi1970.75.653</pub-id> </citation>
</ref>
<ref id="B94">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Mao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Mao</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>dbCAN: a Web Resource for Automated Carbohydrate-Active Enzyme Annotation</article-title>. <source>Nucleic Acids Res.</source> <volume>40</volume>, <fpage>W445</fpage>&#x2013;<lpage>W451</lpage>. <comment>Web Server issue</comment>. <pub-id pub-id-type="doi">10.1093/nar/gks479</pub-id> </citation>
</ref>
<ref id="B95">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Roles of Hsp70s in Stress Responses of Microorganisms, Plants, and Animals</article-title>. <source>Biomed. Res. Int.</source> <volume>2015</volume>, <fpage>510319</fpage>. <pub-id pub-id-type="doi">10.1155/2015/510319</pub-id> </citation>
</ref>
<ref id="B96">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zeldovich</surname>
<given-names>K. B.</given-names>
</name>
<name>
<surname>Berezovsky</surname>
<given-names>I. N.</given-names>
</name>
<name>
<surname>Shakhnovich</surname>
<given-names>E. I.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Protein and DNA Sequence Determinants of Thermophilic Adaptation</article-title>. <source>PLoS Comput. Biol.</source> <volume>3</volume> (<issue>1</issue>), <fpage>e5</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.0030005</pub-id> </citation>
</ref>
<ref id="B97">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhalnina</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Louie</surname>
<given-names>K. B.</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Mansoori</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>da Rocha</surname>
<given-names>U. N.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Dynamic Root Exudate Chemistry and Microbial Substrate Preferences Drive Patterns in Rhizosphere Microbial Community Assembly</article-title>. <source>Nat. Microbiol.</source> <volume>3</volume> (<issue>4</issue>), <fpage>470</fpage>&#x2013;<lpage>480</lpage>. <pub-id pub-id-type="doi">10.1038/s41564-018-0129-3</pub-id> </citation>
</ref>
<ref id="B98">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zimmerman</surname>
<given-names>A. E.</given-names>
</name>
<name>
<surname>Martiny</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Allison</surname>
<given-names>S. D.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Microdiversity of Extracellular Enzyme Genes Among Sequenced Prokaryotic Genomes</article-title>. <source>ISME J.</source> <volume>7</volume> (<issue>6</issue>), <fpage>1187</fpage>&#x2013;<lpage>1199</lpage>. <pub-id pub-id-type="doi">10.1038/ismej.2012.176</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>