Computational Biology and Machine Learning Approaches to Understand Mechanistic Microbiome-Host Interactions

Sudhakar, Padhmanand; Machiels, Kathleen; Verstockt, Bram; Korcsmaros, Tamas; Vermeire, Séverine

doi:10.3389/fmicb.2021.618856

REVIEW article

Front. Microbiol., 11 May 2021

Sec. Systems Microbiology

Volume 12 - 2021 | https://doi.org/10.3389/fmicb.2021.618856

1. Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
2. Earlham Institute, Norwich, United Kingdom
3. Quadram Institute Bioscience, Norwich, United Kingdom
4. Department of Gastroenterology and Hepatology, University Hospitals Leuven, KU Leuven, Leuven, Belgium

Abstract

The microbiome, by virtue of its interactions with the host, is implicated in various host functions, including nutrition and homeostasis. Many chronic diseases, such as diabetes, cancer, and inflammatory bowel diseases, are characterized by a disruption of microbial communities in at least one biological niche or organ system. Various molecular mechanisms between microbial and host components, such as proteins, RNAs, and metabolites, have recently been identified, filling many gaps in our understanding of how the microbiome modulates host processes. Concurrently, high-throughput technologies have enabled the profiling of heterogeneous datasets capturing community-level changes in the microbiome as well as the host responses. However, owing to limitations in parallel sampling and analytical procedures, large gaps remain in our understanding of how the microbiome mechanistically influences host functions at a system and community level. In the past decade, computational biology and machine learning methodologies have been developed with the aim of filling these gaps. Because the tools are largely agnostic to the biological context, they have been applied in diverse disease contexts to analyze and infer the interactions between the microbiome and host molecular components. Some of these approaches also allow the identification and analysis of affected downstream host processes. Most of the tools statistically or mechanistically integrate different types of -omic and meta-omic datasets, followed by functional/biological interpretation. In this review, we provide an overview of the landscape of computational approaches for investigating mechanistic interactions between individual microbes/the microbiome and the host, and of the opportunities they offer for basic and clinical research. These opportunities include, but are not limited to, the development of activity- and mechanism-based biomarkers, the uncovering of mechanisms for therapeutic interventions, and the generation of integrated signatures to stratify patients.

Introduction: Microbiome-Host Interactions

Across different niches and ecosystems, micro-organisms including bacteria, viruses, and archaea inhabit a wide range of hosts (Braga et al., 2016). This community of microbes performs various functions, such as making nutrients accessible to the host (Martin et al., 2019), modulating the host immune system (Mendes et al., 2019), warding off pathogens (Pickard et al., 2017), and maintaining homeostasis (Ohland and Jobin, 2015; Penny et al., 2018). These functions are in turn driven primarily by molecular interactions between microbial and host molecules such as proteins, RNA, and metabolites (Hughes and Sperandio, 2008; Braga et al., 2016). Deciphering these interactions could not only reveal the microbe-host cross-talk but also provide insights for formulating therapeutic strategies aimed at maintaining health and/or ameliorating disease states. The past decades have witnessed a surge in research interest in microbial communities (and their interactions) inhabiting various niches, from the gut to the soil ecosystem. This was made possible by technological advances, including plummeting costs of 16S rRNA gene and metagenomic sequencing, higher sequencing depth and resolution (Levy and Myers, 2016; Jacob et al., 2019; Valli et al., 2020), novel in vitro systems (Shah et al., 2016; Eain et al., 2017; May et al., 2017), and new methodologies for high-throughput profiling of multiple -omic data types such as metaproteomics, metabolomics, and lipidomics (Muller et al., 2013; Roume et al., 2015). However, owing to limitations related to scale, scope, feasibility, and sample availability for parallel -omic read-outs, experimentally determining inter-species microbe-host interactions remains challenging (Fritz et al., 2013). Computational methods can overcome some of these limitations and thereby enhance our understanding of microbe-host interactions (Dix et al., 2016).
In this review, we outline some key concepts, tools, and methods involved in computationally inferring the molecular mechanisms mediating microbe-host interactions.

Biological Networks: Concepts and Applications

Biological networks represent relationships (termed edges) between pairs of biological entities (species, organisms, molecules, etc.), which are usually called nodes. At the molecular level (genes, proteins, metabolites, RNAs, small molecules, etc.), biological networks can denote either physical interactions between molecules (e.g., protein–protein, protein–DNA, and RNA–protein) or any measure of association between them (e.g., co-expression and co-occurrence) (Gosak et al., 2018). In this paper, we refer only to physical interactions. Physical interactions can be classified by various criteria such as molecular type (protein–protein, protein–DNA, RNA–protein, etc.), experimental scale (high-throughput or low-throughput), source (experimentally determined or computationally predicted), directionality (directed or undirected), relational sign (positive or negative relationships), and coverage (genome-wide or targeted). Because biological networks provide the larger context in which genes or proteins exert their action, they allow researchers to refine their hypotheses. In the biological sciences, networks have largely been used (a) as a scaffold to integrate single or multiple contextual -omic datasets, such as gene expression and proteomics, measured in response to intrinsic or extrinsic stimuli (Charitou et al., 2016), (b) as a graph to trace potential signaling and regulatory pathways connecting any two nodes (Azeloglu and Iyengar, 2015), (c) to perform functional analysis at a local or global level (Emmert-Streib and Glazko, 2011), (d) to reconstruct the networks of non-model organisms from those of model organisms (Thompson et al., 2015), (e) to discover drug and disease targets (Huang et al., 2018), and (f) to infer globally or locally conserved signatures such as modules and motifs (Wong et al., 2012).
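Use (b) can be illustrated with a minimal sketch, assuming the network is available as a list of directed (upstream, downstream) edges that combine microbe-host interactions with host signaling and regulatory interactions. The node names are hypothetical placeholders, and breadth-first search is only one simple way to find a shortest connecting path; dedicated network tools offer weighted and context-aware alternatives.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search for one shortest directed path from source to target.

    edges: iterable of (upstream, downstream) pairs.
    Returns the list of nodes on the path, or None if target is unreachable.
    """
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical toy network: one microbial effector and a few host proteins.
edges = [
    ("microbe_effector", "host_kinase"),
    ("host_kinase", "host_tf"),
    ("host_tf", "host_target_gene"),
    ("microbe_effector", "host_scaffold"),
]
print(shortest_path(edges, "microbe_effector", "host_target_gene"))
```

Because edges are directed, the search only follows upstream-to-downstream links, so a host node cannot "reach back" to the microbial effector unless such an edge exists.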
Various resources of molecular interactions and tools for integrative network analysis have been compiled and developed by the network biology community. Since a detailed description of these resources and tools is beyond the scope of the current review, readers are referred to Pedamallu and Ozdamar (2014), Miryala et al. (2018), and Romano et al. (2019).

Owing to their utility in capturing contextual backgrounds and communication between molecular entities, biological networks have been used to study not only intra-species interactions but also inter-species cross-talk. Molecular ecological networks (Deng et al., 2012; Heleno et al., 2014) are a case in point: the network concept is used to study the interactions between molecules (derived from different species or even kingdoms) in a larger ecological context (Yang et al., 2017; Meyer et al., 2020; Yu et al., 2020; Zheng et al., 2020). At its core, a typical molecular ecological network inference workflow (Zhou et al., 2010; Deng et al., 2012; Chen et al., 2017) starts with the generation of meta-omic datasets (such as metagenomics, metatranscriptomics, and metaproteomics) followed by differential abundance testing between samples from contrasting conditions. Various correlation and association measures can then be applied to determine the distance between samples based on similarities and differences in the molecular features measured in the -omic datasets across the sample classes. Such correlations or associations can serve as a starting point for investigating possible mechanistic interactions that could in turn be driving the associative relationships. Furthermore, a network-based representation of the feature space can be used to compare samples with each other or to relate network properties, such as the presence of motifs and modules, to higher-level ecological traits/phenotypes. However, since molecular ecological networks do not directly infer molecular mechanisms, which are the topic of this review, we do not discuss them in detail.
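As a rough illustration of the correlation step, the sketch below builds an undirected co-occurrence network over the feature space by computing Spearman's rank correlation between every pair of features and keeping pairs whose absolute correlation reaches a cut-off. The feature names, abundances, and the 0.8 threshold are hypothetical. The sketch deliberately ignores compositionality and multiple-testing correction, which dedicated tools (for example SparCC for compositional data) address.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0  # constant profile -> no edge


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))


def cooccurrence_edges(abundance, threshold=0.8):
    """abundance: feature name -> abundances across the same ordered samples.
    Returns (feature_a, feature_b, rho) for every pair with |rho| >= threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            edges.append((a, b, round(rho, 3)))
    return edges


# Hypothetical toy abundance table (5 samples per feature).
table = {
    "taxon_A": [1, 2, 3, 4, 5],
    "taxon_B": [2, 4, 6, 8, 10],
    "taxon_C": [5, 4, 3, 2, 1],
    "taxon_D": [3, 1, 4, 1, 5],
}
print(cooccurrence_edges(table))
```

Negative correlations are kept as signed edges here, since co-exclusion can be as informative as co-occurrence; whether to keep them is a design choice of each workflow.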

Computational Methods in Microbiome-Host Interactions: Filling the Gaps

Computational methods bring various advantages to the analysis of interactions between the host and individual microbes and/or the microbial community. These include (a) enhancing scalability, i.e., performing computational inferences for a large number of variables and samples, (b) improving reproducibility (if complemented by interoperability, automation, proper version control, and sufficient documentation), (c) assessing performance using a series of metrics, (d) shortlisting and prioritizing interactions, and thereby (e) enabling the refinement of hypotheses for experimental and/or epidemiological studies. Although most methods to date have focused on inferring the interactions between individual microbial species (mostly well-studied pathogens) and the host, a few methods have been developed to predict interactions at the community level. In principle, many of the methods used to infer interactions of single species can be scaled up (with appropriate modifications) to infer community-level interactions.

Classification of Computational Methods in Microbiome-Host Interactions

From a mechanistic viewpoint, the most widely studied interaction types in inter-species cross-talk include (a) microbial metabolite-mediated networks, (b) protein–protein interactions (PPIs), and (c) RNA-mediated interactions. Accordingly, many of the computational methods developed to investigate microbe-host interactions have focused on these three interaction types (Figure 1). As a fourth approach, integrated pipelines combine multiple microbial and host -omic data types and networks to infer the cumulative functional effects of inter-species interactions/communication on the host.

FIGURE 1

Approaches Inferring Mechanistic Metabolic Interactions

The metabolic layer (which comprises the enzymes, metabolites, and the reactions connecting them) has a prominent influence on both health and disease states associated with alterations in microbiota composition (Wong et al., 2016; Martinez et al., 2017). Metabolic networks can thus represent and capture the underlying mechanisms driving various phenotypes (Pey et al., 2013; Samal et al., 2017; Zampieri et al., 2019). Computational approaches aimed at inferring microbe-host co-metabolic networks can be classified into three prominent categories. (a) Community-wide metabolic network modeling using metagenomic datasets: this approach assumes that metagenomic read-outs represent the gene-distribution structure of the entire microbial community. The taxonomic origin of each gene (i.e., which species it is derived from) is disregarded. Thus, the metabolic network reconstructed using this approach consists of relationships (reactions) between molecular entities (metabolites), catalyzed by enzymes encoded by the measured genes, at the community level. (b) High-throughput data-driven approaches using metabolomic datasets: this methodology uses targeted or untargeted profiling of metabolites from different groups of samples. Subsequently, multivariate modeling and various statistical methods, including simple principal component analysis (PCA), are applied to identify biomarkers that distinguish sample groups from each other. (c) Genome-scale reconstruction applying constraint-based modeling approaches, which are described below. The first two approaches do not provide direct mechanistic insights and hence are not covered further in this review.
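The community-level assumption of approach (a) can be sketched as follows: enzyme annotations from a metagenome are pooled irrespective of the taxon they came from, and each enzyme contributes substrate-to-product edges. The small EC-to-reaction table is written by hand for illustration (it is not taken from a curated database), and the gene and taxon identifiers are hypothetical.

```python
# Illustrative, hand-written reaction table: EC number -> (substrates, products).
# Cofactors such as ATP/ADP are kept for simplicity; real pipelines often
# filter such "currency" metabolites.
REACTIONS = {
    "3.2.1.23": ({"lactose"}, {"glucose", "galactose"}),              # beta-galactosidase
    "2.7.1.1": ({"glucose", "ATP"}, {"glucose-6-phosphate", "ADP"}),  # hexokinase
    "5.3.1.9": ({"glucose-6-phosphate"}, {"fructose-6-phosphate"}),   # glucose-6-phosphate isomerase
}


def community_network(annotations, reactions):
    """annotations: iterable of (gene_id, taxon, ec_number) from a metagenome.
    Returns sorted (substrate, product, ec_number) edges for the pooled community."""
    # The taxon column is deliberately ignored: enzymes are pooled community-wide.
    community_ecs = {ec for _gene, _taxon, ec in annotations}
    edges = set()
    for ec in community_ecs:
        if ec not in reactions:
            continue  # enzyme with no reaction in the lookup table
        substrates, products = reactions[ec]
        for s in substrates:
            for p in products:
                edges.add((s, p, ec))
    return sorted(edges)


# Hypothetical metagenomic annotations; "9.9.9.9" is a placeholder for an
# annotation that has no entry in the reaction table.
genes = [
    ("g1", "taxon_X", "3.2.1.23"),
    ("g2", "taxon_Y", "2.7.1.1"),
    ("g3", "taxon_Y", "2.7.1.1"),
    ("g4", "taxon_Z", "9.9.9.9"),
]
for edge in community_network(genes, REACTIONS):
    print(edge)
```

The lactose to glucose-6-phosphate route in the output only appears because enzymes from two different taxa are pooled, which is exactly the species-level information this approach gives up.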

Genome-scale reconstruction models provide mechanistic information by integrating multiple inputs. These include curated genome-scale metabolic models of both the host and the microbial species, and high-throughput meta-omic datasets covering metabolites, reaction fluxes, biochemical traits, and accessory phenotypic data. However, owing to the laborious steps involved in constructing the models and in scaling them up to multiple species or hosts, only a handful of studies have applied this concept to infer microbe-host co-metabolic interactions (Table 1). The AGORA (Assembly of Gut Organisms through Reconstruction and Analysis) collection is a resource of genome-scale metabolic models for 773 human gut bacterial strains, built using a combination of metagenomic data and experimental data from the literature. Furthermore, the framework employed by AGORA is amenable to scale-up, given its easy adaptability to novel species of interest. Because AGORA models are reconstructed in a standardized manner, various studies have used them to construct context-specific models (Bauer et al., 2017; Bunesova et al., 2018; Tramontano et al., 2018; Pryor et al., 2019; Yilmaz et al., 2019). Recently, the authors of AGORA and their collaborators extended the framework to 7,206 strains and incorporated information on the drug-metabolizing potential of the bacterial strains (Heinken et al., 2020).
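Constraint-based analyses of genome-scale models are commonly formulated as flux balance analysis (FBA). The standard statement below is given as a general reference point, not as the exact formulation used by any study in Table 1:

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & \mathbf{c}^{\top}\mathbf{v} \\
\text{subject to} \quad & S\,\mathbf{v} = \mathbf{0}, \\
& \mathbf{v}^{\mathrm{lb}} \le \mathbf{v} \le \mathbf{v}^{\mathrm{ub}}
\end{aligned}
```

Here S is the m × n stoichiometric matrix (m metabolites, n reactions), v is the vector of reaction fluxes, c weights the objective (often a biomass reaction), and the lower and upper bounds encode, for example, reaction irreversibility or nutrient uptake limits. The equality constraint expresses the steady-state assumption that no metabolite accumulates. In a joint microbe-host model, S would contain the reactions of both organisms plus exchange reactions through a shared compartment, and the objective may combine both biomass reactions; how the objective is defined varies between studies.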

TABLE 1

Study	Context
Rodenburg et al. (2019)	Integrated metabolic model of P. infestans infecting tomato (S. lycopersicum)
Islam et al. (2019)	Genome-scale metabolic model between key members in the rumen microbiome and the viral phages
Hertel et al. (2019)	Integrated constraint-based model revealing microbe-host interactions in Parkinson’s Disease
Aller et al. (2018)	Genome-scale model integrating biochemical demands arising from virus production and human macrophage cell metabolism
Ding et al. (2016)	Simulation of co-metabolic model of different enteropathogens in response to various host environments
Heinken and Thiele (2015)	In silico microbe-host gut co-metabolic model to predict effects of different host dietary schemes
Heinken et al. (2013)	Experimentally validated gut co-metabolic model between commensal bacterium B. thetaiotaomicron and mouse
Bordbar et al. (2010)	Mycobacterium tuberculosis infecting human alveolar macrophage supported by high-throughput data from infected conditions

Studies using genome-scale metabolic models and constraint based approaches to infer mechanistic co-metabolic interactions between microbial and host species.

The reported studies on genome-scale reconstruction models span many different ecological contexts, such as the human and rumen gut ecosystems (Islam et al., 2019), microbe-plant interactions, human alveolar macrophages, the effect of viral demands on the metabolism of human macrophages, and microbe-host interactions in Parkinson’s Disease, to name a few. Owing to their mechanistic nature, such models can be used as a template for further integrating other -omic datasets. This not only refines the models, thereby increasing their predictive power, but also makes them context-specific.

By integrating the individual reconstructed metabolic models of tomato (Solanum lycopersicum) and the tomato late blight pathogen Phytophthora infestans, Rodenburg et al. (2019) identified specific pathways that mediate the dependencies of the pathogen on the metabolism of S. lycopersicum. The individual metabolic models for S. lycopersicum and P. infestans were derived by manually adding reactions and the sub-cellular localization of metabolites and reactions (based on literature curation) to the corresponding genome-scale models. Furthermore, by overlaying dual RNA-seq transcriptomic datasets from the host-pathogen pair onto the co-metabolic network, the authors revealed various metabolic changes characterizing the scavenging nature of P. infestans. A similar study was performed in a mammalian setting, wherein co-metabolic interactions and metabolic exchanges were inferred between the respiratory pathogen Mycobacterium tuberculosis and human alveolar macrophages (Bordbar et al., 2010). The metabolic model for the alveolar macrophages was derived from Recon1, the global human metabolic model (Thiele et al., 2013b). Briefly, a curated version of Recon1 was overlaid with gene expression data for healthy, inactivated alveolar macrophages and combined with information on flux limits for major pathways of central metabolism and a host of heterogeneous datasets such as immunohistological staining and transporter proteins (Bordbar et al., 2010). The macrophage model was then combined with that of M. tuberculosis and corrected for compartment-specific reactions and metabolites. Unsurprisingly, given the advances in data generation and the metabolic models made available, most of the genome-scale metabolic reconstruction studies (Table 1) were carried out for the gut ecosystem (Heinken et al., 2013; Heinken and Thiele, 2015; Ding et al., 2016; Islam et al., 2019).
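The model-joining step described above can be sketched in simplified form. This is a hypothetical illustration, not the procedure of Rodenburg et al. or Bordbar et al.: each model is assumed to be a dictionary of reactions with metabolite identifiers written as `name[compartment]`, metabolites in an assumed shared compartment (`e`) are mapped to one joint pool so the organisms can exchange them, and everything else is prefixed with the organism tag so it stays private. Real tools also handle exchange reactions, biomass coupling, and duplicate reactions.

```python
def merge_models(models, shared_compartment="e", joint_tag="shared"):
    """Join per-organism models into one compartmentalized model.

    models: organism tag -> {reaction_id: {metabolite_id: stoichiometric coefficient}},
    with metabolite ids written as 'name[compartment]' (an assumed convention).
    """
    joint = {}
    for organism, reactions in models.items():
        for rxn_id, stoich in reactions.items():
            new_stoich = {}
            for met, coeff in stoich.items():
                name, comp = met[:-1].rsplit("[", 1)  # 'ac[e]' -> ('ac', 'e')
                if comp == shared_compartment:
                    new_met = f"{name}[{joint_tag}]"  # exchanged via the shared pool
                else:
                    new_met = f"{organism}_{name}[{comp}]"  # private to this organism
                new_stoich[new_met] = new_stoich.get(new_met, 0) + coeff
            joint[f"{organism}_{rxn_id}"] = new_stoich
    return joint


# Hypothetical toy models: the microbe secretes acetate, the host takes it up.
microbe = {"SEC_ac": {"ac[c]": -1, "ac[e]": 1}}
host = {"UPT_ac": {"ac[e]": -1, "ac[c]": 1}}
for rxn, stoich in merge_models({"microbe": microbe, "host": host}).items():
    print(rxn, stoich)
```

After merging, the microbial secretion and the host uptake reaction share the metabolite `ac[shared]`, which is what links the two metabolic networks.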

Other microbe-host co-metabolic studies have been performed using publicly available tools based on constraint-based modeling approaches. The Constraint-based reconstruction and analysis (COBRA) toolbox (Heirendt et al., 2019) is one such compendium of methods, containing various user-guided steps to reconstruct genome-scale metabolic models. It supports interoperability, customized reconstruction, modeling, simulation, visualization, and the integration of -omic datasets in various contexts (compartments, cell types, etc.). Harnessing these properties, researchers have used the COBRA toolbox to model and investigate microbe-host metabolic interactions (Heinken et al., 2013; Thiele et al., 2013a) in the context of mammalian health, with implications for human health. A representative study of the gut ecosystem using the COBRA toolbox integrated two previously published constraint-based models of the mouse and the gut commensal Bacteroides thetaiotaomicron (Heinken et al., 2013). The B. thetaiotaomicron model was generated by manual curation of a draft model produced by Model SEED (Henry et al., 2010) from the genome sequence annotated using RAST (Aziz et al., 2008), a prokaryotic genome annotation tool. The mouse metabolic model was compiled by integrating a previously annotated and reconstructed model with experimental gene essentiality data, followed by corrections for duplicate reactions. The two models were then joined by setting rules based on the subcellular localization of metabolites and reactions. The integrated metabolic model captured many of the phenotypes observed in vivo, namely the dependence of B. thetaiotaomicron on glycans derived from host metabolism as well as on the host diet itself (Heinken et al., 2013). Notably, the authors also introduced novel methodologies, such as Pareto analysis, to complement the COBRA toolbox.
Pareto analysis is a bi-objective linear programming-based methodology which enables the analysis and identification of growth dependencies and trade-offs between the microbe and the host as captured by their metabolic networks.
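One common way to trace such a trade-off in a constraint-based model is to fix one organism's growth at a series of values and maximize the other's with a linear program. The sketch below does not perform that optimization; it only shows the generic final step of keeping the non-dominated (Pareto-optimal) pairs from a set of candidate growth rates. The candidate values are hypothetical.

```python
def pareto_front(points):
    """Non-dominated (microbe_growth, host_growth) pairs, maximizing both.

    q dominates p if q is at least as good in both objectives and differs from p.
    """
    front = {
        p for p in points
        if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)
    }
    return sorted(front)


# Hypothetical feasible growth-rate pairs (e.g. collected from repeated optimizations).
candidates = [(0.0, 0.9), (0.2, 0.8), (0.1, 0.5), (0.4, 0.6),
              (0.3, 0.3), (0.5, 0.2), (0.5, 0.1)]
print(pareto_front(candidates))
```

Along the resulting front, any gain in microbial growth comes at a cost to host growth; a flat segment would instead indicate that one partner can grow without penalizing the other.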

A similar study (Hertel et al., 2019) used the COBRA toolbox together with supplementary tools such as the Microbiome Modeling Toolbox (Baldini et al., 2019). The latter extends the constraint-based modeling framework: it can join individual reconstructed models into a single model and offers further functionality, such as inferring interactions by taxa, reconstructing pairwise/community co-metabolic networks, compartment-based modeling, Pareto analysis, and various downstream operations. The study integrated microbiome and longitudinal metabolomic datasets from patients with Parkinson’s disease (Hertel et al., 2019). This microbiome-host -omic integration provided clues as to how alterations in particular pathways co-metabolized by the host and the microbiome, such as sulfur metabolism, could contribute to the varying severity of the disease. In particular, the authors identified that changes in these co-metabolized pathways could be driven by particular members of the gut microbiota. This opens up possibilities for designing gut microbiome-based therapies to treat or even prevent Parkinson’s disease.

Approaches Inferring Protein–Protein Interactions (PPIs)

Protein–protein interactions are among the most well-studied interaction types mediating inter-species communication (Schweppe et al., 2015). Accordingly, a large number of computational microbe-host interaction studies have focused on PPIs. PPI-based approaches have also been propelled by the adoption of concepts from other domains of computational biology and the computational sciences in general. Depending on the concepts used, PPI-based approaches can be sub-classified into four predominant methods (Table 2): (1) machine learning based PPI methods, (2) structural feature based PPI methods, (3) data/literature mining based PPI methods, and (4) "interolog" based PPI methods. In this section, we provide a brief overview of the concepts involved in each of these methods and a few representative examples.

TABLE 2

Method and corresponding studies	Reported use-case (host-microbe)
Machine learning based methods
Leite et al. (2018)	Bacteria–phage
Tastan et al. (2009); Qi et al. (2010), Dyer et al. (2011); Nouretdinov et al. (2012), Shoombuatong et al. (2012); Mei (2013), Hongjaisee et al. (2019)	Human–HIV
Kshirsagar et al. (2013)	Human–F. tularensis, Human–Y. pestis, Human–B. anthracis, Human-S. typhi
Wuchty (2011)	Human–Plasmodium falciparum
Kösesoy et al. (2019)	Human–Y. pestis, Human–B. anthracis
Cui et al. (2012); Emamjomeh et al. (2014), Kim et al. (2017)	Human–Hepatitis C virus
HOPITOR (Basit et al., 2018)	Generic (Human–virus PPIs)
Liao et al. (2011)	Human–Schistosoma japonicum
Mei et al. (2018); Sun et al. (2018)	Human–Francisella tularensis
Kargarfard et al. (2016)	3 hosts and 674 influenza strains
Cui et al. (2012); Dong et al. (2015), Kim et al. (2017)	Human–Human papillomavirus
Lai et al. (2012)	Human–Influenza A virus
Mei and Zhu (2014a)	Human–HTLV retroviruses
Mei and Zhu (2014b)	Human–Salmonella
Lian et al. (2019)	Human–Y. pestis
Structural feature based methods (features used)
Dyer et al. (2007) (DDI)	Human–Plasmodium falciparum
Nourani et al. (2016) (DDI)	Human–multiple viruses
Sudhakar et al. (2019) (DDI and DMI)	Human–multiple bacterial pathogens
Doolittle and Gomez (2011) (PSS)	Human–Dengue virus, Aedes aegypti–Dengue virus
Cui et al. (2016) (PSS)	Human–HIV, Human–Francisella tularensis
P-HIPSTer (Lasso et al., 2019) (PSS)	Human–multiple viruses
Chen et al. (2019) (PSS)	Human–Dengue virus 2, Human–West Nile virus
Guven-Maiorov et al. (2017) (Mimicry)	Human–Helicobacter pylori
Mahajan and Mande (2017) (DDI)	Human–Francisella tularensis
Zhang et al. (2017a) (DMI)	Grass carp–Grass carp reovirus
Mehrotra et al. (2017) (PSS, DDI, and localization)	Human–Leptospira interrogans, Human–Leptospira biflexa
Halehalli and Nagarajaram (2015) (DDI, DMI)	Human–multiple viruses
SugarBindDB (Mariethoz et al., 2016) (glycan mediated PPIs)	Generic
Rajasekharan et al. (2013) (PSS)	Human–Chandipura virus
Carducci et al. (2010) (DDI)	Human–papillomavirus type 16
Franzosa and Xia (2011) (PSS and sequence identity)	Human–multiple viruses
Sahu et al. (2014) (DDI)	Arabidopsis-Pseudomonas syringae
Zhou et al. (2018) (DDI)	Human–Dengue virus, Aedes aegypti–Dengue virus
Kim et al. (2017) (DDI)	Human–multiple viruses
Kerr et al. (2015) (Computational docking)	Human–Dengue virus 2, Human–West Nile virus
Evans et al. (2009) (DMI)	Human–HIV
Doxey and McConkey (2013) (Mimicry)	Human–Francisella tularensis
Mei and Zhang (2020) (Mimicry)	Human–S. typhimurium and Human–human respiratory syncytial virus
Data/Literature mining based methods
Thieu et al. (2012)	Generic
Viruses.STRING (Cook et al., 2018)	319 hosts and 239 viruses
Li et al. (2018)	Human–Epstein-Barr virus
Saik et al. (2016)	Human–Hepatitis C virus
García-Pérez et al. (2018)	Human–Influenza A virus
“Interolog” based methods
Krishnadev and Srinivasan (2008); Lee et al. (2008)	Human–Plasmodium falciparum
Krishnadev and Srinivasan (2011)	Human–E. coli, Human– S. typhimurium, Human–Y. pestis
Tyagi et al. (2009)	Human–Helicobacter pylori
Cui et al. (2016)	Human–HIV, Human–Francisella tularensis
Schleker et al. (2012)	Human–Salmonella, Salmonella–A. thaliana
Li et al. (2012)	A. thaliana–Ralstonia solanacearum
Wallqvist et al. (2017)	Human–Coxiella burnetii
Cuesta-Astroz et al. (2019)	Human and 15 eukaryotic parasites
Zhou et al. (2014); Cui et al. (2016)	Human–Francisella tularensis
Barh et al. (2013)	Human–Corynebacterium pseudotuberculosis, Human–Corynebacterium diphtheriae, Human–Francisella tularensis, Human–Corynebacterium ulcerans, Human–Y. pestis, and Human–E. coli

Computational approaches and methods inferring protein–protein interactions mediating inter-kingdom cross-talk between microbial and host organisms.

DDI, domain–domain interaction; DMI, domain-motif interaction; PSS, pairwise structural similarity. Supplementary Table 1 provides further details on the novelty of the methods and results.

Structural Feature Based PPI Methods

Interactions between proteins usually result from physical interactions between structural features of the proteins and/or can be characterized indirectly by co-occurring functional features of the proteins (Ding and Kihara, 2018). Structural features of proteins include their domain and motif architectures/compositions, amino acid composition and frequencies, post-translational modification signatures, amino acid k-mers, mimicry motifs, and 3D structural properties (Ding and Kihara, 2018). Structural feature-based PPI prediction, initially applied to intra-species PPIs, was subsequently extended to inter-species studies. The fundamental principle of these methods is to use mechanistic evidence of interactions between structural features to identify potentially interacting proteins. Such evidence includes, for example, interactions between domains, between domains and motifs, post-translational modifications, and pairwise structural similarity (Ding and Kihara, 2018). These structural studies have been confined to relatively well-studied species pairs involving H. sapiens and prominent viral and bacterial pathogens (Table 2). Along with pairwise structural similarity-based methods using 3D protein complexes, domain–domain interaction (DDI) and domain-motif interaction (DMI) based methods are among the most commonly used structural feature-based approaches for predicting inter-species PPIs. Owing to the ease of annotating domains and motifs, DDI- and DMI-based methods have been harnessed widely (Table 2).
While DDI based methods have been applied to infer PPIs for a large number of species-pairs including Human–Plasmodium falciparum (Dyer et al., 2007), Human–Francisella tularensis (Zhou et al., 2013; Mahajan and Mande, 2017), Human–Leptospira interrogans (Mehrotra et al., 2017), Human–Leptospira biflexa (Mehrotra et al., 2017), Human–papillomavirus type 16 (Carducci et al., 2010), Arabidopsis–Pseudomonas syringae (Sahu et al., 2014), Rice–Xanthomonas oryzae (Kim et al., 2008), they have the inherent disadvantage of not being able to explicitly discern directionality.
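The core inference of a DDI-based method can be sketched as follows, assuming domain annotations are already available for every protein and that a template set of interacting domain pairs (for example, derived from resolved 3D complexes) has been compiled. All protein and domain identifiers are hypothetical. Predictions are stored under unordered keys, which mirrors the lack of directionality noted above.

```python
def predict_ddi_ppis(microbe_domains, host_domains, interacting_domains):
    """Infer microbe-host PPIs from known domain-domain interactions (DDIs).

    microbe_domains, host_domains: protein id -> set of domain ids.
    interacting_domains: set of frozensets, each a domain pair known to interact.
    Returns {frozenset({microbe_protein, host_protein}): [supporting domain pairs]}.
    """
    predictions = {}
    for m_prot, m_doms in microbe_domains.items():
        for h_prot, h_doms in host_domains.items():
            # Collect every domain pair that supports this protein pair.
            evidence = sorted(
                (dm, dh) for dm in m_doms for dh in h_doms
                if frozenset((dm, dh)) in interacting_domains
            )
            if evidence:
                # frozenset key: a DDI says nothing about which protein acts on which.
                predictions[frozenset((m_prot, h_prot))] = evidence
    return predictions


# Hypothetical domain annotations and DDI template set.
microbe = {"M1": {"DOM_A"}, "M2": {"DOM_B"}}
host = {"H1": {"DOM_C", "DOM_D"}, "H2": {"DOM_E"}}
known_ddis = {frozenset(("DOM_A", "DOM_C"))}
print(predict_ddi_ppis(microbe, host, known_ddis))
```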

On the other hand, DMIs provide directionality for PPIs, thus indicating the flow of signal transduction (Akiva et al., 2012; Gibson et al., 2015). For example, if a microbial protein A contains a domain known to interact with a motif on the host protein B, the interaction is represented as A > B, translating into “microbial protein A modulates host protein B.” Owing to their specificity, DMI-based methods are preferred over DDI-based methods for research questions concerning post-translational modifications elicited on host proteins by microbial proteins or vice versa. However, because protein sequence motifs are short, even the most stringent search strategies tend to yield thousands of false-positive hits in proteome-wide motif searches (Perkins et al., 2010; Idrees et al., 2018). Therefore, proper quality controls need to be applied to filter out false positives based on structural properties, such as the tendency of truly interacting motifs to occur within disordered regions and outside globular domains (Perkins et al., 2010; Idrees et al., 2018; Figure 2).
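A minimal sketch of a DMI scan with one such filter is shown below. It uses a simplified LC3-interacting region (LIR) core, [W/F/Y]-x-x-[L/I/V], as the motif; curated motif definitions (for example in the ELM resource) are more elaborate, and this sketch only discards hits overlapping annotated globular domains rather than predicting disorder. Following the convention above, edges point from the domain-bearing protein to the motif-bearing protein. Sequences, domain boundaries, and protein names are hypothetical.

```python
import re

# Simplified LIR core, [W/F/Y]-x-x-[L/I/V]. The lookahead lets overlapping
# candidate matches be reported.
LIR_CORE = re.compile(r"(?=([WFY]..[LIV]))")


def motif_hits(sequence, pattern, globular_domains):
    """Motif matches (start, peptide) lying entirely outside the given domains.

    globular_domains: list of (start, end) 0-based, end-exclusive intervals.
    """
    hits = []
    for m in pattern.finditer(sequence):
        start, peptide = m.start(1), m.group(1)
        end = start + len(peptide)
        overlaps = any(start < d_end and end > d_start
                       for d_start, d_end in globular_domains)
        if not overlaps:
            hits.append((start, peptide))
    return hits


def dmi_edges(domain_protein, motif_proteins, pattern, domains_by_protein):
    """Directed edges: domain-bearing protein -> motif-bearing protein."""
    return [
        (domain_protein, prot)
        for prot, seq in motif_proteins.items()
        if motif_hits(seq, pattern, domains_by_protein.get(prot, []))
    ]


# Hypothetical bacterial sequences and globular-domain boundaries.
bacterial = {"bact_P1": "MSTWEELKPGAAAAFKKVDDD", "bact_P2": "MAAAAKKK"}
domains = {"bact_P1": [(10, 20)]}
print(motif_hits(bacterial["bact_P1"], LIR_CORE, domains["bact_P1"]))
print(dmi_edges("host_LC3", bacterial, LIR_CORE, domains))
```

The second candidate in `bact_P1` ("FKKV") falls inside the annotated domain and is discarded, illustrating how such filters shrink the raw hit list.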

FIGURE 2

Several studies (Table 2) have applied the principles of DMIs to predict PPIs for multiple microbe-host species combinations, including grass carp-grass carp reovirus (Zhang et al., 2017a), human-multiple bacterial pathogens (Sudhakar et al., 2019), and human-multiple viruses (Evans et al., 2009; Halehalli and Nagarajaram, 2015). By integrating DMI predictions between grass carp and grass carp reovirus (GCRV) proteins with differential and tissue-specific gene expression, followed by functional enrichment, Zhang et al. (2017a) pinpointed several signaling pathways modulated by GCRV. The authors also highlighted an enrichment of host genes expressed in the intestinal niche, suggesting that GCRV might have a greater influence on the gut. Recently, we conducted a study (Sudhakar et al., 2019) using DDI- and DMI-based methods to identify cross-talk between autophagy, a prominent biological process involved in host cellular homeostasis, and several bacterial pathogens including Salmonella. First, to identify microbial proteins targeted by selective autophagy, we scanned the bacterial proteins for the recognition motifs corresponding to the selective autophagy receptors p62 and NDP52 and the autophagy adapter protein LC3. Conversely, to infer the modulation of host autophagy by the bacterial pathogens, DMI- and DDI-based methods were used to identify the bacterial proteins able to bind to/modulate the 37 core autophagy host proteins. By overlapping these two sets of predictions, we identified bacterial proteins involved in interplays, i.e., proteins predicted both to modulate the host autophagy machinery and to be targeted by it for clearance and degradation. This was followed by experimental verification of the effect on autophagy of a Salmonella protease involved in the human-Salmonella interplay.
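The overlap step can be sketched as a simple intersection of two prediction sets, keeping the supporting evidence from each side. This is a hypothetical illustration, not the code used in the cited study: the bacterial protein identifiers and the predictions themselves are invented, although the receptor and host protein names refer to real autophagy components.

```python
def interplay_candidates(recognized_by, modulates):
    """Bacterial proteins present in both prediction sets.

    recognized_by: bacterial protein -> autophagy receptors/adaptors whose
        recognition motifs it carries (motif-scan predictions).
    modulates: bacterial protein -> core autophagy host proteins it is
        predicted to bind (DDI/DMI predictions).
    """
    shared = set(recognized_by) & set(modulates)
    return {
        prot: {"recognized_by": sorted(recognized_by[prot]),
               "modulates": sorted(modulates[prot])}
        for prot in sorted(shared)
        if recognized_by[prot] and modulates[prot]  # skip empty prediction sets
    }


# Hypothetical predictions for three bacterial proteins.
motif_predictions = {"bact_1": {"LC3"}, "bact_2": {"p62", "NDP52"}}
binding_predictions = {"bact_2": {"BECN1"}, "bact_3": {"ATG5"}}
print(interplay_candidates(motif_predictions, binding_predictions))
```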

A variation of the motif-based methodologies is the use of motifs to characterize pathogen mimicry. This essentially involves identifying eukaryotic linear motifs on microbial proteins, which can in turn hijack host proteins and thereby promote antagonistic binding (Hurford and Day, 2013; Via et al., 2015). Motif-mediated molecular mimicry therefore rewires host signaling and regulatory networks by titrating essential host proteins, enabling the microbe to create favorable micro-environments in the host cell, for example by altering immune responses (Cusick et al., 2012). In addition to motifs, molecular mimicry can also be mediated at the protein, structural, and interface levels. At the protein level, studies have investigated the role of molecular mimicry in the pathogenesis of prominent bacterial pathogens (Doxey and McConkey, 2013), as well as of Salmonella typhimurium and human respiratory syncytial virus (Mei and Zhang, 2020) (Table 2). At the interface level, Guven-Maiorov et al. (2017) devised a computational method to infer mimicry by Helicobacter pylori, a prominent gastric cancer-causing pathogen. Besides DDI- and DMI-based methods, researchers have also used other structure-based methodologies, such as pairwise structural similarity (PSS), to predict inter-species PPIs. PSS methods are based on the premise that proteins possessing similar structures are more likely to interact with the same set of protein partners (Ding and Kihara, 2018). This approach has been applied to infer the host interactions of various pathogens such as Dengue virus (Doolittle and Gomez, 2011), HIV (Cui et al., 2016), Francisella tularensis (Cui et al., 2016), West Nile virus (Chen et al., 2019), Chandipura virus (Rajasekharan et al., 2013), and other viral pathogens (Franzosa and Xia, 2011; Lasso et al., 2019).
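The PSS premise can be sketched as partner transfer: if a microbial protein is structurally similar to a host protein, it inherits that host protein's known interaction partners as candidate targets. The sketch assumes similarity scores were precomputed by an external structure-alignment tool; the 0.5 cut-off is an assumption chosen by analogy to the TM-score value often used to indicate a shared fold, and all identifiers and scores are hypothetical.

```python
def predict_pss_ppis(similarity, host_ppis, threshold=0.5):
    """Transfer host interaction partners to structurally similar microbial proteins.

    similarity: (microbial_protein, host_protein) -> structural similarity score,
        assumed to be precomputed elsewhere.
    host_ppis: iterable of (host_a, host_b) intra-host interactions, treated as undirected.
    """
    partners = {}
    for a, b in host_ppis:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    predicted = set()
    for (microbe_prot, host_prot), score in similarity.items():
        if score >= threshold:  # similar enough to inherit the host protein's partners
            for partner in partners.get(host_prot, ()):
                predicted.add((microbe_prot, partner))
    return sorted(predicted)


# Hypothetical similarity scores and host interactions.
scores = {("viral_P1", "host_A"): 0.72, ("viral_P2", "host_B"): 0.31}
host_interactions = [("host_A", "host_C"), ("host_A", "host_D"), ("host_B", "host_E")]
print(predict_pss_ppis(scores, host_interactions))
```

Because `viral_P2` falls below the cut-off, it inherits no partners; lowering the threshold trades precision for coverage.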

As a means of ensuring proper quantitative evaluation of de novo PPI predictions, emerging computational methods such as machine learning have been used in conjunction with structural-feature based PPI prediction methods. To avoid repetition, methods using ML to evaluate the performance of structural-feature dependent PPI predictions are discussed in the next subsection.

Machine Learning Based PPI Methods

Owing to their ability to discern complex patterns among a large number of features in big datasets, machine learning (ML) methods have found favor in various applications of computational biology and bioinformatics (Shastry and Sanjay, 2020), including the prediction of microbe-host molecular interactions. A variety of supervised and unsupervised methods have been used to predict interactions between microbial and host proteins (Table 2). In general, supervised ML methods use features from “gold-standard” interaction datasets to identify potential protein–protein interaction pairs among user-provided lists of microbial and host proteins (Zhang et al., 2017b). These “gold-standard” datasets are compiled either from high-throughput experiments or from curated lists of interactions in the literature (Zhang et al., 2017b). When ML is used in combination with “interolog” based methods (explained in section 5.2.4), “gold-standard” PPI datasets can also be retrieved from other related or unrelated microbe-host species pairs, depending on the scope of the study. Features used to infer de novo PPIs include protein properties such as post-translational modifications, chemical composition, tissue distribution, molecular weight, domain/motif composition, ontologies, gene expression, amino-acid frequencies, homology to human binding partners, and the relevance of proteins in the host network. Using these features, supervised methods are able to discern truly interacting protein pairs from all possible pairs of microbial and host proteins (Zhang et al., 2017b).
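As an illustration of one of the simplest features listed above, amino-acid frequencies, the sketch below encodes a microbe-host protein pair as the concatenation of the two proteins' amino-acid compositions, giving a fixed-length vector that a supervised classifier can consume. The function names and toy sequences are hypothetical, and published methods typically combine many more feature types.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Relative frequency of each standard amino acid; other symbols are ignored."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(microbial_seq, host_seq):
    """40-dimensional vector: microbial composition followed by host composition."""
    return aa_composition(microbial_seq) + aa_composition(host_seq)


# Toy sequences; "X" (unknown residue) is skipped.
features = pair_features("MKKLX", "AAAC")
print(len(features))  # 40
```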

Supervised methods can also be differentiated by the ML model used to classify truly interacting protein pairs. Several supervised studies employing individual ML models, such as L2-regularized logistic regression (Mei et al., 2018), random forests (RF) (Kösesoy et al., 2019) and support vector machines (SVM) (Cui et al., 2012; Shoombuatong et al., 2012; Kim et al., 2017), have been carried out to infer PPIs between microbial and host species. SVMs search for the hyperplane (a decision boundary represented by a mathematical equation) that best separates samples with different class labels, namely the one with the largest margin between the classes. Several variations of the SVM, for instance those using different kernel functions, exist to handle data with underlying linear or non-linear relationships (Byvatov and Schneider, 2003).
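The maximum-margin idea behind SVMs can be sketched with the Pegasos stochastic sub-gradient algorithm for a linear SVM. This is a minimal, assumption-laden illustration: it covers only the linear case (kernel variants are needed for non-linear relationships), the toy two-dimensional vectors stand in for protein-pair features, and the function names and hyperparameters (`lam`, `epochs`) are hypothetical choices. Practical studies would use established libraries such as LIBSVM or scikit-learn.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear SVM.

    Approximately minimizes
        lam / 2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>)
    where each x_i is augmented with a constant 1.0 so that the last
    weight acts as an intercept. Labels y_i must be -1 or +1.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # append the bias feature
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization step
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Side of the hyperplane <w, (x, 1)> = 0 on which x falls."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy, linearly separable data (hypothetical "interacting" = +1).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Appending a constant feature lets the last weight act as an intercept; it is regularized together with the other weights, which is a common simplification rather than a requirement of the SVM formulation.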

Using four different ML models, namely RF, SVM, artificial neural networks (ANN) and k-nearest neighbors (k-NN), and multiple lines of -omic evidence including experimental PPIs as predictive features, Leite et al. (2018) devised a supervised model to accurately predict bacterium-phage interactions. Because of its generic nature, the model, a type of ensemble learning, can also be used to predict interactions between any two species, provided informative feature sets are available. Ensemble learning (Che et al., 2011) combines multiple individual classifiers to reach a final classification and has been used to predict HIV-human and hepatitis C virus-human PPI networks (Mei, 2013; Emamjomeh et al., 2014). Ensemble classification methods have been reported to outperform individual classifiers in several use cases (Krawczyk, 2015; Haque et al., 2016; Yijing et al., 2016; Lin et al., 2019) and can be grouped into three distinct categories: bagging, boosting and stacked generalization. The last of these, stacked generalization, was used by Emamjomeh et al. (2014) to predict PPIs between human and the hepatitis C virus. Bagging trains each individual classifier on a bootstrap sample, i.e., a random selection from the initial training dataset drawn with replacement. Boosting, in contrast, creates and evaluates classifiers sequentially, with each succeeding classifier assigning more weight to the instances misclassified by the preceding classifier. The “boosted” weights are then normalized over all instances of the dataset, which then serves as the training dataset for the next classifier; the final classification is based on the weighted individual classifiers. Stacked generalization is designed to overcome some of the errors committed by individual classifiers even when they are used within an ensemble framework. It achieves this by using a stack of base learners whose outputs serve as inputs for a meta-learner, which learns how best to combine the base learners’ outputs. The training data used for the base learners and for the meta-learner may or may not overlap and can be specified accordingly.
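Of the three ensemble strategies, bagging is the simplest to sketch: each base learner is trained on a bootstrap sample drawn with replacement, and the ensemble predicts by majority vote. The sketch below assumes a nearest-centroid base learner purely for brevity; the base learner, function names, parameters and toy data are hypothetical and are not those of the cited studies.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Base learner: per-class mean feature vector (nearest-centroid)."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroids(model, x):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], x)))


def bagging_fit(X, y, n_models=25, seed=0):
    """Train each base learner on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data: label 0 = non-interacting pair, 1 = interacting pair (hypothetical).
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
models = bagging_fit(X, y)
print(bagging_predict(models, [0.5, 0.5]), bagging_predict(models, [5.5, 5.5]))
```

A bootstrap sample can occasionally contain a single class; the corresponding base learner then always predicts that class, and the majority vote absorbs such weak members.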

Various auxiliary algorithms have been used in conjunction with machine learning methods to predict inter-species PPIs. One example is a novel protein sequence-based feature extraction method called Location Based Encoding (LBE), combined with different classifier models including RFs. Such integrated methodologies have been used to predict the interactions of two important pathogens, Bacillus anthracis and Yersinia pestis, with the human host (Kösesoy et al., 2019). LBE complements ML approaches for PPI prediction by differentiating proteins based solely on the locations of the amino acids in their sequences (Li et al., 2009).

Supervised methods are sometimes constrained by the small size of “gold-standard” datasets, which restricts the inference and prediction of proteome-wide PPIs between the full sets of proteins of two given species. Mei and Zhu (2014a) used multi-instance AdaBoost, a boosting-based ensemble learning method adapted to multi-instance learning, to reconstruct proteome-wide human T-cell leukemia virus-human PPI networks from protein features derived from homology knowledge. AdaBoost improves classification performance by combining multiple weak classifiers into one strong classifier. It works in part by assigning more weight to instances that are difficult to classify than to instances that are easily classified (Kim et al., 2012). The dearth of truly interacting protein pairs has also prompted researchers to use unsupervised or semi-supervised approaches to infer microbe-host PPIs. Qi et al. (2010) complement the list of true interactions with a list of protein pairs for which there is evidence of association but no evidence of interaction. Supervised learning is then performed with a multilayer perceptron trained on the list of true interactions. The semi-supervised step subsequently reuses the network layers of the supervised classifier but trains on the protein pairs with association evidence only. Using this hybrid approach, the authors report improved performance in predicting interactions between HIV and human proteins (Qi et al., 2010).
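The AdaBoost weighting scheme described above can be sketched as discrete two-class AdaBoost with decision stumps as the weak classifiers. This is the standard single-instance algorithm, not the multi-instance variant used by Mei and Zhu (2014a), and the function names and toy data are hypothetical. After each round, misclassified instances are up-weighted, the weights are renormalized, and each stump votes with weight alpha = 0.5 * ln((1 - err) / err).

```python
import math


def fit_stump(X, y, w):
    """Weighted decision stump: (weighted error, feature, threshold, polarity)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[j] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(j, thr, polarity, x):
    return polarity if x[j] >= thr else -polarity


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform instance weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        if err >= 0.5:
            break  # weak learner no better than chance: stop
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Up-weight misclassified instances, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(j, thr, pol, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]  # renormalize to a distribution
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(a * stump_predict(j, t, p, x) for a, j, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single threshold can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ens = adaboost_fit(X, y, n_rounds=3)
print([adaboost_predict(ens, x) for x in X])  # [1, 1, -1, -1, 1, 1]
```

In this toy example the best single stump misclassifies a third of the points, whereas the weighted vote of three stumps classifies all six correctly, which is the weak-to-strong combination the paragraph describes.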

Data/Literature Mining Based PPI Methods

Even though many databases have been compiled to collect, curate and store microbe-host PPIs (Kumar and Nanduri, 2010; Durmus Tekir et al., 2013; Cook et al., 2018; Gao et al., 2018; Singh et al., 2019), these are mostly confined to well-studied pathogens and predominantly comprise interactions from high-throughput experiments. In contrast, the literature contains inter-species PPIs from low-throughput experiments, some involving non-model organisms and commensal microbes, but these are scattered across many individual studies. Inter-species PPI databases and repositories very often do not capture these sparse interactions. Hence, researchers have adapted and modified data- and text-mining tools to search for and extract microbe-host PPIs from the existing literature. Retrieving such PPIs not only increases the number of true positive and true negative interactions (which improves the predictive performance of algorithms) but also extends our knowledge of existing microbe-host interactions. Motivated by this need, Thieu et al. (2012) compared a language-based method built on a link grammar parser with a supervised ML method (SVM), and reported that combining the two yields a higher classification accuracy than existing literature mining methods. As part of a larger analytical framework aimed at uncovering the cellular mechanisms involved in human B lymphocytes during Epstein-Barr virus infection, Li et al. (2018) used a big-data mining methodology to identify a diverse range of inter-species molecular interactions, including PPIs. Similar text/data mining approaches have also been applied to extract PPI-mediated interactions of the human host with multiple viruses such as Hepatitis C virus (Saik et al., 2016) and Influenza A virus (García-Pérez et al., 2018; Table 2).
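For contrast with the grammar- and SVM-based approaches above, the simplest form of literature mining is a rule-based co-occurrence filter that keeps sentences mentioning a microbial protein, a host protein and an interaction verb. The sketch below is a hypothetical baseline, not the method of Thieu et al. (2012); the protein names, keyword list and text are invented placeholders, and such a filter would be expected to produce many false positives.

```python
import re
from itertools import product

# Hypothetical list of interaction keywords.
INTERACTION_TERMS = ("binds", "interacts", "associates", "phosphorylates", "cleaves")


def mine_ppis(text, microbial_names, host_names, terms=INTERACTION_TERMS):
    """Return (microbial, host, sentence) triples from sentences that mention
    a microbial protein, a host protein and an interaction keyword."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):  # naive sentence split
        words = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        lowered = {word.lower() for word in words}
        if not any(term in lowered for term in terms):
            continue
        for m, h in product(microbial_names, host_names):
            if m in words and h in words:
                hits.append((m, h, sentence))
    return hits


text = ("Bacterial ProtX binds HOST1 in vitro. ProtY was detected in serum. "
        "ProtY interacts with HOST2 and HOST3.")
hits = mine_ppis(text, ["ProtX", "ProtY"], ["HOST1", "HOST2", "HOST3"])
print([(m, h) for m, h, _ in hits])
# [('ProtX', 'HOST1'), ('ProtY', 'HOST2'), ('ProtY', 'HOST3')]
```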

Interolog Based PPI Methods

For most species pairs of interest, especially those involving non-model organisms, experimentally verified PPIs are scarce. This has necessitated the development of novel bioinformatic methods, one of which is the inference of interactions from existing experimentally determined inter-species PPIs (Kshirsagar et al., 2015). These methodologies are usually based on the principle of homology (hence the term “interolog”, meaning interacting orthologs), at the level of proteins, of protein structural features, or both. Protein features used for homology-based extrapolation include, but are not limited to, domains, motifs, amino-acid k-mers and 3D structural properties (Kshirsagar et al., 2015). Interolog based approaches have been applied to harness the large volume of experimentally verified PPIs for model organisms, including prominent bacterial and viral pathogens. Despite the potentially large coverage they can achieve, interolog approaches have several disadvantages that prevent them from being a silver bullet for inferring inter-species PPIs, especially for novel species pairs. These disadvantages stem from differences in the pathogenic mechanisms that microbes deploy when infecting different host species, and from differences in the cellular localization and activity levels (expression, post-translational modifications, etc.) of the orthologous microbial proteins. Such differences lead to accessibility bottlenecks, i.e., limits on the ability of microbial proteins to physically access host proteins and thereby interact with them. Hence, interolog based approaches need to be complemented with additional filtering and quality control steps, such as selecting proteins from infection-relevant cellular compartments and using expression or activity measurements.

Interolog based methods have been used to infer inter-species PPIs for many prominent pathogens and parasites (Table 2). Different versions of the interolog approach have been used to extrapolate interactions between the human host and various pathogens such as Plasmodium falciparum (Krishnadev and Srinivasan, 2008; Lee et al., 2008), Escherichia coli (Krishnadev and Srinivasan, 2011), S. typhimurium (Krishnadev and Srinivasan, 2011; Schleker et al., 2012), Y. pestis (Krishnadev and Srinivasan, 2011), Helicobacter pylori (Tyagi et al., 2009), HIV (Cui et al., 2016), Francisella tularensis (Zhou et al., 2014; Cui et al., 2016), Coxiella burnetii (Wallqvist et al., 2017), Corynebacterium pseudotuberculosis (Barh et al., 2013), Corynebacterium diphtheriae (Barh et al., 2013), and Corynebacterium ulcerans (Barh et al., 2013). Using PPIs from the STRING database as the starting interaction set, Cuesta-Astroz et al. (2019) used the interolog methodology to predict PPIs between 15 different eukaryotic pathogens and the human host. To add species-specific and life-cycle-specific context, the authors restricted the analysis to proteins from cellular compartments relevant to the infection process. Analysis of the resulting PPI networks revealed various invasion and evasion mechanisms, some shared among parasites and some specific to particular parasites (Cuesta-Astroz et al., 2019). Schleker et al. (2012) present another version of the interolog approach to predict human-Salmonella and A. thaliana-Salmonella PPI networks. As a source of template PPIs, they use publicly available interaction databases along with databases of 3D structures of interacting Pfam domains. In addition to sequence-based orthology of proteins, domain-based orthology is used to reduce the false positive rate. Several additional filters, such as restriction to predicted transmembrane proteins, relevance in the host network and functional attributes such as gene ontology annotations, are applied to make the predicted PPIs more specific.
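The core of the interolog approach, together with the compartment-based filtering described above, can be sketched as mapping template interactions through ortholog tables. All identifiers, ortholog mappings and compartment annotations below are hypothetical; in practice orthologs would come from orthology inference tools or databases, and localizations from resources such as those listed in Table 5.

```python
def transfer_interologs(template_ppis, microbe_orthologs, host_orthologs,
                        host_location=None, allowed_compartments=None):
    """Map template PPIs onto the species pair of interest.

    template_ppis:     iterable of (template_microbial_protein, template_host_protein)
    microbe_orthologs: {template_microbial_protein: set of orthologs in the query microbe}
    host_orthologs:    {template_host_protein: set of orthologs in the query host}
    host_location:     optional {query_host_protein: compartment}; used together with
                       allowed_compartments to drop inaccessible host proteins.
    """
    predicted = set()
    for a, b in template_ppis:
        for a2 in microbe_orthologs.get(a, ()):
            for b2 in host_orthologs.get(b, ()):
                if host_location is not None and allowed_compartments is not None:
                    if host_location.get(b2) not in allowed_compartments:
                        continue  # host protein outside infection-relevant compartments
                predicted.add((a2, b2))
    return predicted


template_ppis = [("tA", "hA"), ("tB", "hB"), ("tC", "hC")]
microbe_orthologs = {"tA": {"qA"}, "tB": {"qB1", "qB2"}}  # tC has no ortholog
host_orthologs = {"hA": {"HA"}, "hB": {"HB"}, "hC": {"HC"}}
host_location = {"HA": "cytoplasm", "HB": "nucleus", "HC": "cytoplasm"}

all_pairs = transfer_interologs(template_ppis, microbe_orthologs, host_orthologs)
filtered = transfer_interologs(template_ppis, microbe_orthologs, host_orthologs,
                               host_location, {"cytoplasm", "plasma membrane"})
print(sorted(filtered))  # [('qA', 'HA')]
```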

Approaches Inferring RNA Mediated Interactions

The role of RNAs, especially non-coding RNAs such as long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), in mediating molecular microbe-host interactions has been reported in the literature (Li et al., 2015b; Agliano et al., 2019). RNA molecules are either secreted by the microbial cell into the host cell or packaged into vesicles along with other molecules, which are then taken up by the host cell by endocytosis (Weiberg et al., 2014; Huang et al., 2019; Ahmadi Badi et al., 2020). Such microbial RNAs then modulate host cell activity by binding to DNA, messenger RNAs or proteins. Thus, by sequestering and titrating host components, microbial RNAs modulate regulatory and signaling networks and subsequently host cell activity (Duval et al., 2017; Agliano et al., 2019; Shirahama et al., 2020). However, although RNA-mediated microbe-host interactions are well studied experimentally, in contrast to PPIs very few methods or studies have applied computational analysis to them systematically and at a systems level (Table 3). Accordingly, the existing resources for RNA-mediated microbe-host interactions consist mainly of databases such as ViRBase (Li et al., 2015b), which is predominantly a source of experimentally verified virus–host non-coding RNA-associated interactions and also contains predicted binding sites of viral non-coding RNAs on host proteins and RNAs. A prominent study that comprehensively examines the role of RNAs in microbe-host interactions is that of Saçar Demirci and Adan (2020), who investigated the roles in infection of miRNA-like sequences encoded within the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) genome. They used a modified version of izMiR (Allmer et al., 2016), an SVM-based ML method, to predict pre-miRNAs homologous to the human precursor miRNAs in miRBase. The SVM-based method identified several viral hairpin sequences that were shorter than the human miRNA precursors, whereas many of the human and viral miRNA precursors were similar in length and had identical minimum free energy, a feature used by the izMiR workflow (Allmer et al., 2016). Based on this observation, a revised classifier trained using only the known human miRNAs was applied to the entire SARS-CoV-2 hairpin dataset, identifying potential hairpins from which mature miRNA candidates were extracted. Next, the psRNATarget tool (Dai et al., 2018) was used to predict de novo the human genes targeted by the inferred viral miRNAs. Functional analysis of these target genes revealed that the SARS-CoV-2 miRNAs can affect various host processes including transcription, defense systems, and Wnt and EGFR signaling pathways.
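The target-prediction step can be illustrated with the canonical animal miRNA seed-match rule: a candidate site is a stretch of the target that is the reverse complement of miRNA nucleotides 2-8. This is a simplified stand-in chosen for illustration; psRNATarget applies its own complementarity scoring, and the sequences and function names here are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[nt] for nt in reversed(rna))


def seed_sites(mirna, target, start=1, end=8):
    """0-based positions in target matching the reverse complement of the seed.

    mirna, target: sequences written 5'->3'; T is converted to U.
    The seed defaults to miRNA nucleotides 2-8 (Python slice [1:8]).
    """
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    site = reverse_complement(mirna[start:end])
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]


# Toy sequences (not real viral or human RNAs).
mirna = "UCCAGUCAAGGAUGUCCAUAGC"
target = "AAAUGACUGGAAAUGACUGGCC"
print(seed_sites(mirna, target))  # [3, 13]
```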

TABLE 3

Study	Context
Saçar Demirci and Adan (2020)	Analysis revealing the potential interactions between mature micro-RNA like viral RNA sequences and host genes
ViRBase (Li et al., 2015b)	Source of experimentally verified virus–host non-coding RNA-associated interactions; also contains predicted binding sites of virus non-coding RNAs on host proteins and RNAs

Examples of studies utilizing computational approaches to infer RNA-mediated interactions between microbes and hosts.

Approaches Utilizing Pipelines Integrating Multiple-Omic Datasets

Besides the computational methods based on particular types of molecular interactions, some integrated pipelines (Table 4) have been developed to infer mechanistic microbe-host interactions. In general, such pipelines (Figure 2) incorporate the prediction of at least one type of molecular interaction between microbial and host molecular components, followed by other functionalities such as the integration of host responses. Table 5 provides a non-exhaustive overview of the tools, databases and resources available in the public domain for compiling integrated workflows, for example PPI-based ones.

TABLE 4

Methodology	Functionalities
MicrobioLink (Andrighetti et al., 2020)	Integrating microbe-host protein interaction networks with host responses and host regulatory/signaling networks using network diffusion principles
KBase (Arkin et al., 2018)	Integrated platform enabling data sharing, integration, and analysis of -omic datasets from microbes, plants, and their communities by creating computational workflows
Li et al. (2015a)	Identifying critical effectors involved in host-pathogen interactions by integrating multiple lines of -omic evidence

Integrated pipelines used to infer microbe-host interactions by combining heterogeneous -omic datasets.

TABLE 5

Step in workflow	Resource/Tool/Database
Source of proteomes (sequence information)	UniProt (The UniProt Consortium, 2018), HumanPSD (Hodges et al., 2002), YPD (Payne and Garrels, 1997), PombePD (Costanzo et al., 2001), WormPD (Costanzo et al., 2001), and SWISS-PROT (Bairoch and Apweiler, 1996)
Source of proteomic datasets (expression information)	ProteomicsDB (Schmidt et al., 2018), Human Protein Atlas (HPA) (Thul and Lindskog, 2018), PRIDE (Perez-Riverol et al., 2019), PeptideAtlas (Desiere et al., 2006), MassIVE.quant (Choi et al., 2020), jPOSTrepo (Okuda et al., 2017), iProX (Ma et al., 2019), and Panorama Public (Sharma et al., 2018)
Proteomic annotations (structural features)	InterPro (Mitchell et al., 2019), Pfam (El-Gebali et al., 2019), ELM (Gouw et al., 2018), and PDB (Burley et al., 2017)
Protein sub-cellular localization (databases and prediction tools)	ComPPI (Veres et al., 2015), HPA (Thul and Lindskog, 2018), LocDB (Rastogi and Rost, 2011), LocSigDB (Negi et al., 2015), COMPARTMENTS (Binder et al., 2014), eSLDB (Pierleoni et al., 2007), SCLpred-EMS (Kaleel et al., 2020), DeepLoc (Almagro Armenteros et al., 2017), PSORTdb (Peabody et al., 2016), SecretomeP (Bendtsen et al., 2004), and SignalP (Armenteros et al., 2019)
Base information for prediction of PPIs	Domain-domain predictions – DOMINE (Raghavachari et al., 2008) and Domain-motif predictions – ELM (Gouw et al., 2018)
Quality control of inferred PPIs (using disordered region prediction)	IUPred (Mészáros et al., 2018), PrDOS (Ishida and Kinoshita, 2007), D2P2 (Oates et al., 2013), PONDR-FIT (Xue et al., 2010), DISOPRED (Ward et al., 2004), MFDp2 (Mizianty et al., 2013), and Meta-Disorder (Kozlowski and Bujnicki, 2012)
Network resources	OmniPath (Türei et al., 2016), IntAct (Orchard et al., 2014), Reactome (Fabregat et al., 2018), STRING (Szklarczyk et al., 2017), HTRI (Bovolenta et al., 2012), and DoRothEA (Garcia-Alonso et al., 2018)
Network diffusion approaches	NBS (Hofree et al., 2013), HotNet (Vandin et al., 2011), TieDIE (Basha et al., 2013; Paull et al., 2013), RegMod (Qiu et al., 2010), and stSVM (Cun and Fröhlich, 2013)
Databases for host gene expression	GEO (Clough and Barrett, 2016) and ArrayExpress (Parkinson et al., 2007)

A non-exhaustive catalog of resources, tools and databases to compile protein–protein interaction based workflows for inferring microbe (microbiome)-host interactions.

KBase (Arkin et al., 2018) is an integrated bioinformatics platform that enables users to share datasets with the research community and facilitates the integration and analysis of -omic datasets from microbes and plants through computational workflows. Recently, we developed MicrobioLink (Andrighetti et al., 2020), an integrated pipeline that carries out de novo DDI and DMI based microbe-host PPI prediction, followed by quality control using disordered region predictions from built-in tools such as IUPred (Mészáros et al., 2018). The pipeline then uses network diffusion principles and tools (Paull et al., 2013) to infer the molecular mechanisms and signaling pathways that mediate the effect of microbial proteins on host responses, as measured by transcriptomic or proteomic read-outs. Users can feed in their own datasets at any step of the pipeline. Given the advent of new computational tools for inter-species interactions and of pipeline management platforms, an increasing number of dedicated bioinformatic workflows for microbe-host interactions is expected to be developed in the near future.
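The network diffusion step can be sketched with a random walk with restart, one common diffusion formulation: signal starts at host proteins predicted to be targeted by microbial proteins and spreads over a host network, so that proteins close to the targets receive higher scores. This is a swapped-in technique for illustration only; TieDIE (Paull et al., 2013) uses its own heat-diffusion procedure that links upstream and downstream node sets. The network, node names and restart probability below are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    adjacency: {node: set of neighbours} for an undirected host network
               (every neighbour must also be a key).
    seeds:     host proteins predicted to be targeted by microbial proteins.
    restart:   probability of jumping back to the seed set at each step.
    """
    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if neighbours:  # mass at isolated nodes simply leaks out in this sketch
                share = (1.0 - restart) * p[n] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical host network: T1 is the predicted microbial target.
network = {"T1": {"A"}, "A": {"T1", "B"}, "B": {"A", "C"}, "C": {"B"}}
scores = random_walk_with_restart(network, {"T1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['T1', 'A', 'B', 'C']
```

The restart probability controls how local the diffusion is: higher values keep the signal close to the microbial targets, lower values let it reach more distant pathway members.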

Discussion: Opportunities and Challenges

Opportunities

Clinical and Translational Research

Since the aforementioned computational tools help researchers narrow down the microbial and host components involved in mechanistic cross-talk, they may uncover molecules that delineate different clinical phenotypes and that could also serve as targets for therapeutic interventions. In other words, mechanistic predictions combined with clinical metadata serve a dual purpose: they provide information on molecular components that could both represent and drive clinical phenotypes (Younesi, 2015), and could thereby reduce our reliance on association-based biomarkers alone, which do not necessarily explain causality (Levenson and Mori, 2014). The discovery of such mechanistic knowledge calls for the combined use of different methodologies, including machine learning and molecular interaction analysis. While many community-level studies have been conducted on meta -omic datasets for the clinical classification of patients and the discovery of associative biomarkers (Wen et al., 2017; Yu et al., 2020; Clos-Garcia et al., 2019; Conteville et al., 2019), they have not incorporated mechanistic inferences. On the other hand, most mechanistic studies (Tables 2, 3) have been carried out on particular pathogens or microbial species without including clinical metadata and/or clinical classifications.

Multi-omic approaches integrating heterogeneous -omic datasets from patients have been implemented for several diseases associated with microbial dysbiosis, including IBD (Lloyd-Price et al., 2019). However, these studies do not provide the mechanistic insights required for formulating therapeutic interventions. Beltran and Brito (2019) devised an integrated methodology to unravel the molecular mechanisms underlying the microbe-host interactions associated with various diseases such as colorectal cancer, IBD, obesity and type 2 diabetes. This study represents one of the first and few initiatives to use community-wide microbe-host interaction predictions based on meta -omic datasets from patients to discover mechanistic interactions driving clinical phenotypes. By combining orthology-based extrapolation of interactions from experimental PPIs, machine learning and patient-derived -omic datasets, the authors identified a subset of inter-species PPIs associated with disease phenotypes (Beltran and Brito, 2019). Thiele et al. (2020) integrated different levels of information (dietary information, physiological parameters, organ weights, organ connectivities, etc.) and molecular -omic datasets (proteomics, metabolomics, metabolites produced by the gut microbiota) in an organ-specific manner to build a whole-body model of human metabolism. Although not fully mechanistic, this model allowed the authors to predict biomarkers of inherited metabolic diseases and host-microbiome co-metabolism. Such integrated studies and workflows, combining statistical and mechanistic inference from multi-omic datasets, await broader adoption and application in research on the various diseases associated with microbial dysbiosis.

Research on Comparative Ecological Networks

The tools and resources listed in this review can be used to infer and predict molecular interactions between species in several contexts [microbe/microbiota in a host, microbe/microbiota in several hosts, microbe (vs) microbe, microbiota (vs) microbe, etc]. In almost all of these cases, molecular interactions between the autonomous entities (be they species or communities) could be driving the emergent phenotypes. Since several of the tools discussed here extrapolate interactions based on homology between species pairs, they could be well suited to predicting interactions for species with very little experimental interaction information.

For example, Crohn’s disease (CD), a sub-type of IBD, is characterized by dysbiosis of the gut microbiome (Joossens et al., 2011; Schaubeck et al., 2016; Shaw et al., 2016). Unbalanced host responses (co-influenced by host genetic factors) to the dysbiotic microbiome and its various components, such as proteins and metabolites, result in persistent inflammation of the gut mucosal barrier (Li et al., 2014; Lavelle and Sokol, 2020). Some CD patients also develop skin lesions during or after therapeutic regimens (Huang et al., 2012; Gravina et al., 2016). The skin also houses a complex microbial community that plays a role in maintaining homeostasis (Schommer and Gallo, 2013; Chen et al., 2018). Understanding the mechanisms by which CD medications affect microbe-host interactions in both the gut and the skin could help avoid unintended side effects of CD therapy.

Yet another relevant context for applying the tools discussed herein is the inference of the molecular mechanisms that mediate the evasion of immune responses by bacterial pathogens in various hosts, and of their importance in transmission between hosts. We recently showed that bacterial pathogens and autophagy, a primary intracellular line of defense in the host, are engaged in an evolutionary tug of war, as evidenced by the presence of various interplays and cross-talks (Sudhakar et al., 2019). Given the exposure of host animals such as poultry and cattle to xenobiotic compounds such as antibiotics, many zoonotic pathogens are under constant selection pressure to evolve strategies to modulate, evade or survive within the host animal (Harada and Asai, 2010). This creates a risk that zoonotic species selected for survival over many generations of persistence in the host are transmitted via the food chain, from animal hosts to human hosts or between animal hosts (Farrell and Davies, 2019; Mollentze and Streicker, 2020). Microbe-host interaction mechanisms lie at the evolutionary crossroads of such transmission events between hosts. Studying these interactions is therefore expected to provide deeper insights for designing strategies to prevent or minimize spill-over transmission events.

Challenges

Over the past decade, various advances have been made in the computational analysis of microbe-host interactions. Despite this progress, many challenges remain, as described below. These challenges also present opportunities and call for innovative approaches and solutions.

Catching Up With Complex Infection Processes

Infection biology has advanced considerably in recent years, with new classes of molecules (Katiyar-Agarwal and Jin, 2010; Rana et al., 2015; Duval et al., 2017; Long et al., 2017; Peters et al., 2019; Acuña et al., 2020) and cell types (Chattopadhyay et al., 2018) found to play a role in the infection process. Along with these, novel types of interactions between molecular classes have been uncovered (Silmon de Monerri and Kim, 2014). In some cases, computational methods have not caught up with the molecular mechanisms. For example, hepadnaviruses use host DNA ligases to generate covalently closed circular DNAs, which play a major role in mediating viral infection and persistence (Long et al., 2017). Similarly, long non-coding RNAs are known to be involved in host-pathogen interactions (Duval et al., 2017; Agliano et al., 2019). However, to date, no computational methods exist to predict or infer the mechanisms by which viruses recruit host DNA ligases or directly modulate the biogenesis, conformation and activity of long non-coding RNAs. Hence, computational method development remains a step behind the complexity of infection biology. This gap is even wider for commensal organisms than for pathogens, owing to a persistent, historical study bias towards pathogens.

Lack of Experimental Datasets

Non-model organisms and non-pathogenic organisms such as probiotics and commensals also suffer from a considerable knowledge gap in terms of known or experimentally verified molecular interactions. This considerably affects the performance of computational methods, which need large sets of true positives for satisfactory performance and assessment of predictive algorithms (Jiao and Du, 2016). It also limits the coverage and accuracy of interolog approaches, since they extrapolate existing true positive datasets to the species pairs of interest based on orthology.

False-Positives

As with any computational algorithm, microbe-host interaction prediction methods also face the curse of false positives. This issue can be exacerbated by the relatively small size of the available true positive (truly interacting) and true negative (non-interacting) datasets (Jiao and Du, 2016). Furthermore, the evolutionary distance and the differences in infection process between the template species pairs and the species pair of interest, as well as the absence of orthologs of the molecular components involved in the interactions, could inflate false positive rates and reduce performance and coverage.

Community-Wide Interaction Prediction

Most computational tools for microbe-host interactions have been directed at uncovering interactions for individual microbe-host pairs. This is a major drawback of existing methodologies, especially given that phenotypes related to health and disease are associated with community-wide alterations (Clemente et al., 2012; Koboziev et al., 2014; Wang et al., 2017; Bailey and Holscher, 2018; Dominguez-Bello et al., 2019).

Modeling Dynamics of Microbe-Host Interactions

Last but not least, current methods for microbe-host interaction analysis are not equipped to handle the dynamic nature of the natural ecosystems and ecological niches in which the interactions are embedded. Although this is a generic drawback of many bioinformatic approaches, addressing it will require coordinated efforts between modelers, experimental biologists and bioinformaticians.

Conclusion

Since the advent and expansion of high-throughput sequencing technologies, many observational studies have been carried out on microbial communities inhabiting various ecological niches (inside host organisms, for example). These have mostly yielded associations with health- or disease-associated phenotypes. However, there is a large gap in our understanding of the mechanisms mediated by these microbial communities and of how these mechanisms contribute to the observed phenotypes. Although experimental datasets capturing some of these mechanisms, such as PPIs, are available, they are largely confined to model organisms or well-studied pathogens. Computational approaches provide researchers with the tools to upscale microbe-host interaction research by enabling de novo predictions of inter-species molecular interactions and the extrapolation of existing microbe-host interaction datasets to the species pairs of interest. Computational methods may aid the study of microbe-host interactions by reducing the variable space, prioritizing interactions, and eventually generating hypotheses for further experimental verification.

Statements

Author contributions

PS performed the literature review and wrote the manuscript. KM provided critical feedback and contributed to the text. BV contributed to relevant discussion about the clinical implications. TK and SV supervised the work and provided valuable discussion, feedback, and comments. All authors contributed to the article and approved the submitted version.

Funding

PS was supported by the ERC Advanced Grant (ERC-2015-AdG, 694679, CrUCCial). TK was supported by a fellowship in computational biology at the Earlham Institute (Norwich, United Kingdom) in partnership with the Quadram Institute (Norwich, United Kingdom) and strategically supported by the BBSRC (BB/J004529/1, BB/P016774/1, and BB/CSP17270/1). SV is a senior clinical investigator of the Research Foundation Flanders (FWO), Belgium.

Conflict of interest

BV received lecture fees from AbbVie, Ferring Pharmaceuticals, Janssen, R-Biopharm, and Takeda; consultancy fees from Janssen and Sandoz. SV: research grant: MSD, AbbVie, Takeda, Pfizer, and J&J; lecture fee: MSD, AbbVie, Takeda, Ferring, Centocor, Hospira, Pfizer, J&J, and Genentech/Roche; consultancy: MSD, AbbVie, Takeda, Ferring, Centocor, Hospira, Pfizer, J&J, Genentech/Roche, Celgene, Mundipharma, Celltrion, SecondGenome, Prometheus, Shire, ProDigest, Gilead, and Galapagos. SV is a senior clinical investigator of the Research Foundation–Flanders (FWO). The work of TK was supported by BenevolentAI and Unilever. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2021.618856/full#supplementary-material

Supplementary Table 1

Studies using genome-scale metabolic models and constraint based approaches to infer mechanistic co-metabolic interactions between microbial and host species.

References

Acuña, S. M., Floeter-Winter, L. M., and Muxel, S. M. (2020). MicroRNAs: biological regulators in pathogen-host interactions. Cells 9:113. doi: 10.3390/cells9010113
Agliano, F., Rathinam, V. A., Medvedev, A. E., Vanaja, S. K., and Vella, A. T. (2019). Long noncoding RNAs in host-pathogen interactions. Trends Immunol. 40, 492–510. doi: 10.1016/j.it.2019.04.001
Ahmadi Badi, S., Bruno, S. P., Moshiri, A., Tarashi, S., Siadat, S. D., and Masotti, A. (2020). Small RNAs in outer membrane vesicles and their function in host-microbe interactions. Front. Microbiol. 11:1209. doi: 10.3389/fmicb.2020.01209
Akiva, E., Friedlander, G., Itzhaki, Z., and Margalit, H. (2012). A dynamic view of domain-motif interactions. PLoS Comput. Biol. 8:e1002341. doi: 10.1371/journal.pcbi.1002341
Aller, S., Scott, A., Sarkar-Tyson, M., and Soyer, O. S. (2018). Integrated human-virus metabolic stoichiometric modelling predicts host-based antiviral targets against Chikungunya, Dengue and Zika viruses. J. R. Soc. Interface 15:20180125. doi: 10.1098/rsif.2018.0125
Allmer, J., and Saçar Demirci, M. D. (2016). izMiR: computational ab initio microRNA detection. Protoc. Exch. [Preprint]. doi: 10.1038/protex.2016.047
Almagro Armenteros, J. J., Sønderby, C. K., Sønderby, S. K., Nielsen, H., and Winther, O. (2017). DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics 33, 3387–3395. doi: 10.1093/bioinformatics/btx431
Andrighetti, T., Bohar, B., Lemke, N., Sudhakar, P., and Korcsmaros, T. (2020). MicrobioLink: an integrated computational pipeline to infer functional effects of microbiome-host interactions. Cells 9:1278. doi: 10.3390/cells9051278
Arkin, A. P., Cottingham, R. W., Henry, C. S., Harris, N. L., Stevens, R. L., Maslov, S., et al. (2018). KBase: the United States Department of Energy systems biology knowledgebase. Nat. Biotechnol. 36, 566–569. doi: 10.1038/nbt.4163
Armenteros, J. J. A., Tsirigos, K. D., Sønderby, C. K., Petersen, T. N., Winther, O., Brunak, S., et al. (2019). SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 37, 420–423. doi: 10.1038/s41587-019-0036-z
Azeloglu, E. U., and Iyengar, R. (2015). Signaling networks: information flow, computation, and decision making. Cold Spring Harb. Perspect. Biol. 7:a005934. doi: 10.1101/cshperspect.a005934
Aziz, R. K., Bartels, D., Best, A. A., DeJongh, M., Disz, T., Edwards, R. A., et al. (2008). The RAST server: rapid annotations using subsystems technology. BMC Genomics 9:75. doi: 10.1186/1471-2164-9-75
Bailey, M. A., and Holscher, H. D. (2018). Microbiome-mediated effects of the Mediterranean diet on inflammation. Adv. Nutr. 9, 193–206. doi: 10.1093/advances/nmy013
Bairoch, A., and Apweiler, R. (1996). The SWISS-PROT protein sequence data bank and its new supplement TREMBL. Nucleic Acids Res. 24, 21–25. doi: 10.1093/nar/24.1.21
Baldini, F., Heinken, A., Heirendt, L., Magnusdottir, S., Fleming, R. M. T., and Thiele, I. (2019). The Microbiome Modeling Toolbox: from microbial interactions to personalized microbial communities. Bioinformatics 35, 2332–2334. doi: 10.1093/bioinformatics/bty941
Barh, D., Gupta, K., Jain, N., Khatri, G., León-Sicairos, N., Canizalez-Roman, A., et al. (2013). Conserved host-pathogen PPIs. Globally conserved inter-species bacterial PPIs based conserved host-pathogen interactome derived novel target in Corynebacterium pseudotuberculosis, Corynebacterium diphtheriae, Francisella tularensis, Corynebacterium ulcerans, Y. pestis, and E. coli targeted by Piper betel compounds. Integr. Biol. (Camb) 5, 495–509. doi: 10.1039/c2ib20206a
Basha, O., Tirman, S., Eluk, A., and Yeger-Lotem, E. (2013). ResponseNet2.0: revealing signaling and regulatory pathways connecting your proteins and genes–now with human data. Nucleic Acids Res. 41, W198–W203. doi: 10.1093/nar/gkt532
Basit, A. H., Abbasi, W. A., Asif, A., Gull, S., and Minhas, F. U. A. A. (2018). Training host-pathogen protein-protein interaction predictors. J. Bioinform. Comput. Biol. 16:1850014. doi: 10.1142/S0219720018500142
Bauer, E., Zimmermann, J., Baldini, F., Thiele, I., and Kaleta, C. (2017). BacArena: individual-based metabolic modeling of heterogeneous microbes in complex communities. PLoS Comput. Biol. 13:e1005544. doi: 10.1371/journal.pcbi.1005544
Bendtsen, J. D., Jensen, L. J., Blom, N., Von Heijne, G., and Brunak, S. (2004). Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel. 17, 349–356. doi: 10.1093/protein/gzh037
Beltran, J. F., and Brito, I. (2019). Host-microbiome protein-protein interactions capture mechanisms in human disease. bioRxiv [Preprint]. doi: 10.1101/821926
Binder, J. X., Pletscher-Frankild, S., Tsafou, K., Stolte, C., O’Donoghue, S. I., Schneider, R., et al. (2014). COMPARTMENTS: unification and visualization of protein subcellular localization evidence. Database (Oxford) 2014:bau012. doi: 10.1093/database/bau012
Bordbar, A., Lewis, N. E., Schellenberger, J., Palsson, B. Ø., and Jamshidi, N. (2010). Insight into human alveolar macrophage and Francisella tularensis interactions via metabolic reconstructions. Mol. Syst. Biol. 6:422. doi: 10.1038/msb.2010.68
Bovolenta, L. A., Acencio, M. L., and Lemke, N. (2012). HTRIdb: an open-access database for experimentally verified human transcriptional regulation interactions. BMC Genomics 13:405. doi: 10.1186/1471-2164-13-405
Braga, R. M., Dourado, M. N., and Araújo, W. L. (2016). Microbial interactions: ecology in a molecular perspective. Braz. J. Microbiol. 47(Suppl. 1), 86–98. doi: 10.1016/j.bjm.2016.10.005
Bunesova, V., Lacroix, C., and Schwab, C. (2018). Mucin cross-feeding of infant bifidobacteria and Eubacterium hallii. Microb. Ecol. 75, 228–238. doi: 10.1007/s00248-017-1037-4
Burley, S. K., Berman, H. M., Kleywegt, G. J., Markley, J. L., Nakamura, H., and Velankar, S. (2017). Protein Data Bank (PDB): the single global macromolecular structure archive. Methods Mol. Biol. 1607, 627–641. doi: 10.1007/978-1-4939-7000-1_26
Byvatov, E., and Schneider, G. (2003). Support vector machine applications in bioinformatics. Appl. Bioinform. 2, 67–77.
Carducci, M., Licata, L., Peluso, D., Castagnoli, L., and Cesareni, G. (2010). Enriching the viral-host interactomes with interactions mediated by SH3 domains. Amino Acids 38, 1541–1547. doi: 10.1007/s00726-009-0375-z
Charitou, T., Bryan, K., and Lynn, D. J. (2016). Using biological networks to integrate, visualize and analyze genomics data. Genet. Sel. Evol. 48:27. doi: 10.1186/s12711-016-0205-1
Chattopadhyay, P. K., Roederer, M., and Bolton, D. L. (2018). A deadly dance: the choreography of host-pathogen interactions, as revealed by single-cell technologies. Nat. Commun. 9:4638. doi: 10.1038/s41467-018-06214-0
Che, D., Liu, Q., Rasheed, K., and Tao, X. (2011). Decision tree and ensemble learning algorithms with their applications in bioinformatics. Adv. Exp. Med. Biol. 696, 191–199. doi: 10.1007/978-1-4419-7046-6_19
Chen, J., Sun, J., Liu, X., Liu, F., Liu, R., and Wang, J. (2019). Structure-based prediction of West Nile virus-human protein-protein interactions. J. Biomol. Struct. Dyn. 37, 2310–2321. doi: 10.1080/07391102.2018.1479659
Chen, Y. E., Fischbach, M. A., and Belkaid, Y. (2018). Skin microbiota-host interactions. Nature 553, 427–436. doi: 10.1038/nature25177
Chen, Z., Zheng, Y., Ding, C., Ren, X., Yuan, J., Sun, F., et al. (2017). Integrated metagenomics and molecular ecological network analysis of bacterial community composition during the phytoremediation of cadmium-contaminated soils by bioenergy crops. Ecotoxicol. Environ. Saf. 145, 111–118. doi: 10.1016/j.ecoenv.2017.07.019
Choi, M., Carver, J., Chiva, C., Tzouros, M., Huang, T., Tsai, T.-H., et al. (2020). MassIVE.quant: a community resource of quantitative mass spectrometry-based proteomics datasets. Nat. Methods 17, 981–984. doi: 10.1038/s41592-020-0955-0
Clemente, J. C., Ursell, L. K., Parfrey, L. W., and Knight, R. (2012). The impact of the gut microbiota on human health: an integrative view. Cell 148, 1258–1270. doi: 10.1016/j.cell.2012.01.035
Clos-Garcia, M., Andrés-Marin, N., Fernández-Eulate, G., Abecia, L., Lavín, J. L., van Liempd, S., et al. (2019). Gut microbiome and serum metabolome analyses identify molecular biomarkers and altered glutamate metabolism in fibromyalgia. EBioMedicine 46, 499–511. doi: 10.1016/j.ebiom.2019.07.031
Clough, E., and Barrett, T. (2016). The Gene Expression Omnibus database. Methods Mol. Biol. 1418, 93–110. doi: 10.1007/978-1-4939-3578-9_5
Conteville, L. C., Oliveira-Ferreira, J., and Vicente, A. C. P. (2019). Gut microbiome biomarkers and functional diversity within an Amazonian semi-nomadic hunter-gatherer group. Front. Microbiol. 10:1743. doi: 10.3389/fmicb.2019.01743
Cook, H. V., Doncheva, N. T., Szklarczyk, D., von Mering, C., and Jensen, L. J. (2018). Viruses.STRING: a virus-host protein-protein interaction database. Viruses 10:519. doi: 10.3390/v10100519
Costanzo, M. C., Crawford, M. E., Hirschman, J. E., Kranz, J. E., Robertson, L. S., et al. (2001). YPD, PombePD and WormPD: model organism volumes of the BioKnowledge library, an integrated resource for protein information. Nucleic Acids Res. 29, 75–79. doi: 10.1093/nar/29.1.75
Cuesta-Astroz, Y., Santos, A., Oliveira, G., and Jensen, L. J. (2019). Analysis of predicted host-parasite interactomes reveals commonalities and specificities related to parasitic lifestyle and tissues tropism. Front. Immunol. 10:212. doi: 10.3389/fimmu.2019.00212
Cui, G., Fang, C., and Han, K. (2012). Prediction of protein-protein interactions between viruses and human by an SVM model. BMC Bioinformatics 13(Suppl. 7):S5. doi: 10.1186/1471-2105-13-S7-S5
Cui, T., Li, W., Liu, L., Huang, Q., and He, Z.-G. (2016). Uncovering new pathogen-host protein-protein interactions by pairwise structure similarity. PLoS One 11:e0147612. doi: 10.1371/journal.pone.0147612
Cun, Y., and Fröhlich, H. (2013). Network and data integration for biomarker signature discovery via network smoothed T-statistics. PLoS One 8:e73074. doi: 10.1371/journal.pone.0073074
Cusick, M. F., Libbey, J. E., and Fujinami, R. S. (2012). Molecular mimicry as a mechanism of autoimmune disease. Clin. Rev. Allergy Immunol. 42, 102–111. doi: 10.1007/s12016-011-8294-7
Dai, X., Zhuang, Z., and Zhao, P. X. (2018). psRNATarget: a plant small RNA target analysis server (2017 release). Nucleic Acids Res. 46, W49–W54. doi: 10.1093/nar/gky316
Deng, Y., Jiang, Y.-H., Yang, Y., He, Z., Luo, F., and Zhou, J. (2012). Molecular ecological network analyses. BMC Bioinformatics 13:113. doi: 10.1186/1471-2105-13-113
Desiere, F., Deutsch, E. W., King, N. L., Nesvizhskii, A. I., Mallick, P., Eng, J., et al. (2006). The PeptideAtlas project. Nucleic Acids Res. 34, D655–D658. doi: 10.1093/nar/gkj040
Ding, T., Case, K. A., Omolo, M. A., Reiland, H. A., Metz, Z. P., Diao, X., et al. (2016). Predicting essential metabolic genome content of niche-specific enterobacterial human pathogens during simulation of host environments. PLoS One 11:e0149423. doi: 10.1371/journal.pone.0149423
Ding, Z., and Kihara, D. (2018). Computational methods for predicting protein-protein interactions using various protein features. Curr. Protoc. Protein Sci. 93:e62. doi: 10.1002/cpps.62
Dix, A., Vlaic, S., Guthke, R., and Linde, J. (2016). Use of systems biology to decipher host-pathogen interaction networks and predict biomarkers. Clin. Microbiol. Infect. 22, 600–606. doi: 10.1016/j.cmi.2016.04.014
Dominguez-Bello, M. G., Godoy-Vitorino, F., Knight, R., and Blaser, M. J. (2019). Role of the microbiome in human development. Gut 68, 1108–1114. doi: 10.1136/gutjnl-2018-317503
Dong, Y., Kuang, Q., Dai, X., Li, R., Wu, Y., Leng, W., et al. (2015). Improving the understanding of pathogenesis of human papillomavirus 16 via mapping protein-protein interaction network. Biomed Res. Int. 2015:890381. doi: 10.1155/2015/890381
Doolittle, J. M., and Gomez, S. M. (2011). Mapping protein interactions between dengue virus and its human and insect hosts. PLoS Negl. Trop. Dis. 5:e954. doi: 10.1371/journal.pntd.0000954
Doxey, A. C., and McConkey, B. J. (2013). Prediction of molecular mimicry candidates in human pathogenic bacteria. Virulence 4, 453–466. doi: 10.4161/viru.25180
Durmus Tekir, S., Çakir, T., Ardiç, E., Sayilirbas, A. S., Konuk, G., Konuk, M., et al. (2013). PHISTO: pathogen-host interaction search tool. Bioinformatics 29, 1357–1358. doi: 10.1093/bioinformatics/btt137
Duval, M., Cossart, P., and Lebreton, A. (2017). Mammalian microRNAs and long noncoding RNAs in the host-bacterial pathogen crosstalk. Semin. Cell Dev. Biol. 65, 11–19. doi: 10.1016/j.semcdb.2016.06.016
Dyer, M. D., Murali, T. M., and Sobral, B. W. (2007). Computational prediction of host-pathogen protein-protein interactions. Bioinformatics 23, i159–i166. doi: 10.1093/bioinformatics/btm208
Dyer, M. D., Murali, T. M., and Sobral, B. W. (2011). Supervised learning and prediction of physical interactions between human and HIV proteins. Infect. Genet. Evol. 11, 917–923. doi: 10.1016/j.meegid.2011.02.022
Eain, M. M. G., Baginska, J., Greenhalgh, K., Fritz, J. V., Zenhausern, F., and Wilmes, P. (2017). Engineering solutions for representative models of the gastrointestinal human-microbe interface. Engineering 3, 60–65. doi: 10.1016/J.ENG.2017.01.011
El-Gebali, S., Mistry, J., Bateman, A., Eddy, S. R., Luciani, A., Potter, S. C., et al. (2019). The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432. doi: 10.1093/nar/gky995
Emamjomeh, A., Goliaei, B., Zahiri, J., and Ebrahimpour, R. (2014). Predicting protein-protein interactions between human and hepatitis C virus via an ensemble learning method. Mol. Biosyst. 10, 3147–3154. doi: 10.1039/c4mb00410h
Emmert-Streib, F., and Glazko, G. V. (2011). Network biology: a direct approach to study biological function. Wiley Interdiscip. Rev. Syst. Biol. Med. 3, 379–391. doi: 10.1002/wsbm.134
Evans, P., Dampier, W., Ungar, L., and Tozeren, A. (2009). Prediction of HIV-1 virus-host protein interactions using virus and host sequence motifs. BMC Med. Genomics 2:27. doi: 10.1186/1755-8794-2-27
Fabregat, A., Jupe, S., Matthews, L., Sidiropoulos, K., Gillespie, M., Garapati, P., et al. (2018). The Reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655. doi: 10.1093/nar/gkx1132
Farrell, M. J., and Davies, T. J. (2019). Disease mortality in domesticated animals is predicted by host evolutionary relationships. Proc. Natl. Acad. Sci. U.S.A. 116, 7911–7915. doi: 10.1073/pnas.1817323116
Franzosa, E. A., and Xia, Y. (2011). Structural principles within the human-virus protein-protein interaction network. Proc. Natl. Acad. Sci. U.S.A. 108, 10538–10543. doi: 10.1073/pnas.1101440108
Fritz, J. V., Desai, M. S., Shah, P., Schneider, J. G., and Wilmes, P. (2013). From meta-omics to causality: experimental models for human microbiome research. Microbiome 1:14. doi: 10.1186/2049-2618-1-14
Gao, N. L., Zhang, C., Zhang, Z., Hu, S., Lercher, M. J., Zhao, X.-M., et al. (2018). MVP: a microbe-phage interaction database. Nucleic Acids Res. 46, D700–D707. doi: 10.1093/nar/gkx1124
Garcia-Alonso, L., Iorio, F., Matchan, A., Fonseca, N., Jaaks, P., Peat, G., et al. (2018). Transcription factor activities enhance markers of drug sensitivity in cancer. Cancer Res. 78, 769–780. doi: 10.1158/0008-5472.CAN-17-1679
García-Pérez, C. A., Guo, X., Navarro, J. G., Aguilar, D. A. G., and Lara-Ramírez, E. E. (2018). Proteome-wide analysis of human motif-domain interactions mapped on influenza A virus. BMC Bioinformatics 19:238. doi: 10.1186/s12859-018-2237-8
Gibson, T. J., Dinkel, H., Van Roey, K., and Diella, F. (2015). Experimental detection of short regulatory motifs in eukaryotic proteins: tips for good practice as well as for bad. Cell Commun. Signal. 13:42. doi: 10.1186/s12964-015-0121-y
Gosak, M., Markovič, R., Dolenšek, J., Slak Rupnik, M., Marhl, M., Stožer, A., et al. (2018). Network science of biological systems at different scales: a review. Phys. Life Rev. 24, 118–135. doi: 10.1016/j.plrev.2017.11.003
Gouw, M., Michael, S., Sámano-Sánchez, H., Kumar, M., Zeke, A., Lang, B., et al. (2018). The eukaryotic linear motif resource – 2018 update. Nucleic Acids Res. 46, D428–D434. doi: 10.1093/nar/gkx1077
Gravina, A. G., Federico, A., Ruocco, E., Lo Schiavo, A., Romano, F., Miranda, A., et al. (2016). Crohn’s disease and skin. United Eur. Gastroenterol. J. 4, 165–171. doi: 10.1177/2050640615597835
Guven-Maiorov, E., Tsai, C.-J., Ma, B., and Nussinov, R. (2017). Prediction of host-pathogen interactions for Helicobacter pylori by interface mimicry and implications to gastric cancer. J. Mol. Biol. 429, 3925–3941. doi: 10.1016/j.jmb.2017.10.023
Halehalli, R. R., and Nagarajaram, H. A. (2015). Molecular principles of human virus protein-protein interactions. Bioinformatics 31, 1025–1033. doi: 10.1093/bioinformatics/btu763
Haque, M. N., Noman, N., Berretta, R., and Moscato, P. (2016). Heterogeneous ensemble combination search using genetic algorithm for class imbalanced data classification. PLoS One 11:e0146116. doi: 10.1371/journal.pone.0146116
Harada, K., and Asai, T. (2010). Role of antimicrobial selective pressure and secondary factors on antimicrobial resistance prevalence in Escherichia coli from food-producing animals in Japan. J. Biomed. Biotechnol. 2010:180682. doi: 10.1155/2010/180682
Heinken, A., Acharya, G., Ravcheev, D. A., Hertel, J., Nyga, M., Okpala, O. E., et al. (2020). AGORA2: large scale reconstruction of the microbiome highlights wide-spread drug-metabolising capacities. bioRxiv [Preprint]. doi: 10.1101/2020.11.09.375451
Heinken, A., Sahoo, S., Fleming, R. M. T., and Thiele, I. (2013). Systems-level characterization of a host-microbe metabolic symbiosis in the mammalian gut. Gut Microbes 4, 28–40. doi: 10.4161/gmic.22370
Heinken, A., and Thiele, I. (2015). Systematic prediction of health-relevant human-microbial co-metabolism through a computational framework. Gut Microbes 6, 120–130. doi: 10.1080/19490976.2015.1023494
Heirendt, L., Arreckx, S., Pfau, T., Mendoza, S. N., Richelle, A., Heinken, A., et al. (2019). Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat. Protoc. 14, 639–702. doi: 10.1038/s41596-018-0098-2
Heleno, R., Garcia, C., Jordano, P., Traveset, A., Gómez, J. M., Blüthgen, N., et al. (2014). Ecological networks: delving into the architecture of biodiversity. Biol. Lett. 10:20131000. doi: 10.1098/rsbl.2013.1000
Henry, C. S., DeJongh, M., Best, A. A., Frybarger, P. M., Linsay, B., and Stevens, R. L. (2010). High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotechnol. 28, 977–982. doi: 10.1038/nbt.1672
Hertel, J., Harms, A. C., Heinken, A., Baldini, F., Thinnes, C. C., Glaab, E., et al. (2019). Integrated analyses of microbiome and longitudinal metabolome data reveal microbial-host interactions on sulfur metabolism in Parkinson’s disease. Cell Rep. 29, 1767–1777.e8. doi: 10.1016/j.celrep.2019.10.035
Hodges, P. E., Carrico, P. M., Hogan, J. D., O’Neill, K. E., Owen, J. J., Mangan, M., et al. (2002). Annotating the human proteome: the Human Proteome Survey Database (HumanPSD) and an in-depth target database for G protein-coupled receptors (GPCR-PD) from Incyte Genomics. Nucleic Acids Res. 30, 137–141. doi: 10.1093/nar/30.1.137
Hofree, M., Shen, J. P., Carter, H., Gross, A., and Ideker, T. (2013). Network-based stratification of tumor mutations. Nat. Methods 10, 1108–1115. doi: 10.1038/nmeth.2651
Hongjaisee, S., Nantasenamat, C., Carraway, T. S., and Shoombuatong, W. (2019). HIVCoR: a sequence-based tool for predicting HIV-1 CRF01_AE coreceptor usage. Comput. Biol. Chem. 80, 419–432. doi: 10.1016/j.compbiolchem.2019.05.006
Huang, B. L., Chandra, S., and Shih, D. Q. (2012). Skin manifestations of inflammatory bowel disease. Front. Physiol. 3:13. doi: 10.3389/fphys.2012.00013
Huang, C.-Y., Wang, H., Hu, P., Hamby, R., and Jin, H. (2019). Small RNAs – big players in plant-microbe interactions. Cell Host Microbe 26, 173–182. doi: 10.1016/j.chom.2019.07.021
Huang, J. K., Carlin, D. E., Yu, M. K., Zhang, W., Kreisberg, J. F., Tamayo, P., et al. (2018). Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6, 484–495.e5. doi: 10.1016/j.cels.2018.03.001
Hughes, D. T., and Sperandio, V. (2008). Inter-kingdom signalling: communication between bacteria and their hosts. Nat. Rev. Microbiol. 6, 111–120. doi: 10.1038/nrmicro1836
Hurford, A., and Day, T. (2013). Immune evasion and the evolution of molecular mimicry in parasites. Evolution 67, 2889–2904. doi: 10.1111/evo.12171
Idrees, S., Pérez-Bercoff, Å., and Edwards, R. J. (2018). SLiM-Enrich: computational assessment of protein-protein interaction data as a source of domain-motif interactions. PeerJ 6:e5858. doi: 10.7717/peerj.5858
Ishida, T., and Kinoshita, K. (2007). PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res. 35, W460–W464. doi: 10.1093/nar/gkm363
Islam, M. M., Fernando, S. C., and Saha, R. (2019). Metabolic modeling elucidates the transactions in the rumen microbiome and the shifts upon virome interactions. Front. Microbiol. 10:2412. doi: 10.3389/fmicb.2019.02412
Jacob, J. J., Veeraraghavan, B., and Vasudevan, K. (2019). Metagenomic next-generation sequencing in clinical microbiology. Ind. J. Med. Microbiol. 37, 133–140. doi: 10.4103/ijmm.IJMM_19_401
Jiao, Y., and Du, P. (2016). Performance measures in evaluating machine learning based bioinformatics predictors for classifications. Quant. Biol. 4, 320–330. doi: 10.1007/s40484-016-0081-2
Joossens, M., Huys, G., Cnockaert, M., De Preter, V., Verbeke, K., Rutgeerts, P., et al. (2011). Dysbiosis of the faecal microbiota in patients with Crohn’s disease and their unaffected relatives. Gut 60, 631–637. doi: 10.1136/gut.2010.223263
Kaleel, M., Zheng, Y., Chen, J., Feng, X., Simpson, J. C., Pollastri, G., et al. (2020). SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 convolutional neural networks. Bioinformatics 36, 3343–3349. doi: 10.1093/bioinformatics/btaa156
Kargarfard, F., Sami, A., Mohammadi-Dehcheshmeh, M., and Ebrahimie, E. (2016). Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments. BMC Genomics 17:925. doi: 10.1186/s12864-016-3250-9
Katiyar-Agarwal, S., and Jin, H. (2010). Role of small RNAs in host-microbe interactions. Annu. Rev. Phytopathol. 48, 225–246. doi: 10.1146/annurev-phyto-073009-114457
Kerr, S. A., Jackson, E. L., Lungu, O. I., Meyer, A. G., Demogines, A., Ellington, A. D., et al. (2015). Computational and functional analysis of the virus-receptor interface reveals host range trade-offs in New World arenaviruses. J. Virol. 89, 11643–11653. doi: 10.1128/JVI.01408-15
Kim, B., Alguwaizani, S., Zhou, X., Huang, D.-S., Park, B., and Han, K. (2017). An improved method for predicting interactions between virus and human proteins. J. Bioinform. Comput. Biol. 15:1650024. doi: 10.1142/S0219720016500244
Kim, J.-G., Park, D., Kim, B.-C., Cho, S.-W., Kim, Y. T., Park, Y.-J., et al. (2008). Predicting the interactome of Xanthomonas oryzae pathovar oryzae for target selection and DB service. BMC Bioinformatics 9:41. doi: 10.1186/1471-2105-9-41
Kim, T.-H., Park, D.-C., Woo, D.-M., Jeong, T., and Min, S.-Y. (2012). “Multi-class classifier-based AdaBoost algorithm,” in Intelligent Science and Intelligent Data Engineering. Lecture Notes in Computer Science, eds Y. Zhang, Z.-H. Zhou, C. Zhang, and Y. Li (Berlin: Springer), 122–127. doi: 10.1007/978-3-642-31919-8_16
Koboziev, I., Reinoso Webb, C., Furr, K. L., and Grisham, M. B. (2014). Role of the enteric microbiota in intestinal homeostasis and inflammation. Free Radic. Biol. Med. 68, 122–133. doi: 10.1016/j.freeradbiomed.2013.11.008
Kösesoy, İ., Gök, M., and Öz, C. (2019). A new sequence based encoding for prediction of host-pathogen protein interactions. Comput. Biol. Chem. 78, 170–177. doi: 10.1016/j.compbiolchem.2018.12.001
Kozlowski, L. P., and Bujnicki, J. M. (2012). MetaDisorder: a meta-server for the prediction of intrinsic disorder in proteins. BMC Bioinformatics 13:111. doi: 10.1186/1471-2105-13-111
Krawczyk, B. (2015). Forming ensembles of soft one-class classifiers with weighted bagging. New Gener. Comput. 33, 449–466. doi: 10.1007/s00354-015-0406-0
Krishnadev, O., and Srinivasan, N. (2008). A data integration approach to predict host-pathogen protein-protein interactions: application to recognize protein interactions between human and a malarial parasite. In Silico Biol. (Gedrukt) 8, 235–250.
Krishnadev, O., and Srinivasan, N. (2011). Prediction of protein-protein interactions between human host and a pathogen and its application to three pathogenic bacteria. Int. J. Biol. Macromol. 48, 613–619. doi: 10.1016/j.ijbiomac.2011.01.030
Kshirsagar, M., Carbonell, J., and Klein-Seetharaman, J. (2013). Multitask learning for host-pathogen protein interactions. Bioinformatics 29, i217–i226. doi: 10.1093/bioinformatics/btt245
Kshirsagar, M., Schleker, S., Carbonell, J., and Klein-Seetharaman, J. (2015). Techniques for transferring host-pathogen protein interactions knowledge to new tasks. Front. Microbiol. 6:36. doi: 10.3389/fmicb.2015.00036
Kumar, R., and Nanduri, B. (2010). HPIDB–a unified resource for host-pathogen interactions. BMC Bioinformatics 11(Suppl. 6):S16. doi: 10.1186/1471-2105-11-S6-S16
Lai, Y.-H., Li, Z.-C., Chen, L.-L., Dai, Z., and Zou, X.-Y. (2012). Identification of potential host proteins for influenza A virus based on topological and biological characteristics by proteome-wide network approach. J. Proteomics 75, 2500–2513. doi: 10.1016/j.jprot.2012.02.034
Lasso, G., Mayer, S. V., Winkelmann, E. R., Chu, T., Elliot, O., Patino-Galindo, J. A., et al. (2019). A structure-informed atlas of human-virus interactions. Cell 178, 1526–1541.e16. doi: 10.1016/j.cell.2019.08.005
Lavelle, A., and Sokol, H. (2020). Gut microbiota-derived metabolites as key actors in inflammatory bowel disease. Nat. Rev. Gastroenterol. Hepatol. 17, 223–237. doi: 10.1038/s41575-019-0258-z
Levenson, V., and Mori, Y. (2014). The era of personalized medicine: mechanistic or correlative biomarkers? Per. Med. 11, 361–364. doi: 10.2217/pme.14.10
Lee, S.-A., Chan, C., Tsai, C.-H., Lai, J.-M., Wang, F.-S., Kao, C.-Y., et al. (2008). Ortholog-based protein-protein interaction prediction and its application to inter-species interactions. BMC Bioinformatics 9(Suppl. 12):S11. doi: 10.1186/1471-2105-9-S12-S11
Leite, D. M. C., Brochet, X., Resch, G., Que, Y.-A., Neves, A., and Peña-Reyes, C. (2018). Computational prediction of inter-species relationships through omics data analysis and machine learning. BMC Bioinformatics 19:420. doi: 10.1186/s12859-018-2388-7
Levy, S. E., and Myers, R. M. (2016). Advancements in next-generation sequencing. Annu. Rev. Genomics Hum. Genet. 17, 95–115. doi: 10.1146/annurev-genom-083115-022413
Li, C.-W., Jheng, B.-R., and Chen, B.-S. (2018). Investigating genetic-and-epigenetic networks, and the cellular mechanisms occurring in Epstein-Barr virus-infected human B lymphocytes via big data mining and genome-wide two-sided NGS data identification. PLoS One 13:e0202537. doi: 10.1371/journal.pone.0202537
Li, Q., Wang, C., Tang, C., He, Q., Li, N., and Li, J. (2014). Dysbiosis of gut fungal microbiota is associated with mucosal inflammation in Crohn’s disease. J. Clin. Gastroenterol. 48, 513–523. doi: 10.1097/MCG.0000000000000035
Li, W., Fan, X., Long, Q., Xie, L., and Xie, J. (2015a). Mycobacterium tuberculosis effectors involved in host-pathogen interaction revealed by a multiple scales integrative pipeline. Infect. Genet. Evol. 32, 1–11. doi: 10.1016/j.meegid.2015.02.014
Li, Y., Wang, C., Miao, Z., Bi, X., Wu, D., Jin, N., et al. (2015b). ViRBase: a resource for virus-host ncRNA-associated interactions. Nucleic Acids Res. 43, D578–D582. doi: 10.1093/nar/gku903
Li, X., Liao, B., Shu, Y., Zeng, Q., and Luo, J. (2009). Protein functional class prediction using global encoding of amino acid sequence. J. Theor. Biol. 261, 290–293. doi: 10.1016/j.jtbi.2009.07.017
Li, Z.-G., He, F., Zhang, Z., and Peng, Y.-L. (2012). Prediction of protein-protein interactions between Ralstonia solanacearum and Arabidopsis thaliana. Amino Acids 42, 2363–2371. doi: 10.1007/s00726-011-0978-z
Lian, X., Yang, S., Li, H., Fu, C., and Zhang, Z. (2019). Machine-learning-based predictor of human-bacteria protein-protein interactions by incorporating comprehensive host-network properties. J. Proteome Res. 18, 2195–2205. doi: 10.1021/acs.jproteome.9b00074
Liao, Q., Yuan, X., Xiao, H., Liu, C., Lv, Z., Zhao, Y., et al. (2011). Identifying Schistosoma japonicum excretory/secretory proteins and their interactions with host immune system. PLoS One 6:e23786. doi: 10.1371/journal.pone.0023786
Lin, W.-C., Lu, Y.-H., and Tsai, C.-F. (2019). Feature selection in single and ensemble learning-based bankruptcy prediction models. Expert Systems 36:e12335. doi: 10.1111/exsy.12335
Lloyd-Price, J., Arze, C., Ananthakrishnan, A. N., Schirmer, M., Avila-Pacheco, J., Poon, T. W., et al. (2019). Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662. doi: 10.1038/s41586-019-1237-9
Long, Q., Yan, R., Hu, J., Cai, D., Mitra, B., Kim, E. S., et al. (2017). The role of host DNA ligases in hepadnavirus covalently closed circular DNA formation. PLoS Pathog. 13:e1006784. doi: 10.1371/journal.ppat.1006784
Ma, J., Chen, T., Wu, S., Yang, C., Bai, M., Shu, K., et al. (2019). iProX: an integrated proteome resource. Nucleic Acids Res. 47, D1211–D1217. doi: 10.1093/nar/gky869
Mahajan, G., and Mande, S. C. (2017). Using structural knowledge in the protein data bank to inform the search for potential host-microbe protein interactions in sequence space: application to Mycobacterium tuberculosis. BMC Bioinformatics 18:201. doi: 10.1186/s12859-017-1550-y
Mariethoz, J., Khatib, K., Alocci, D., Campbell, M. P., Karlsson, N. G., Packer, N. H., et al. (2016). SugarBindDB, a resource of glycan-mediated host-pathogen interactions. Nucleic Acids Res. 44, D1243–D1250. doi: 10.1093/nar/gkv1247
Martin, A. M., Yabut, J. M., Choo, J. M., Page, A. J., Sun, E. W., Jessup, C. F., et al. (2019). The gut microbiome regulates host glucose homeostasis via peripheral serotonin. Proc. Natl. Acad. Sci. U.S.A. 116, 19802–19804. doi: 10.1073/pnas.1909311116
Martinez, K. B., Leone, V., and Chang, E. B. (2017). Microbial metabolites in health and disease: navigating the unknown in search of function. J. Biol. Chem. 292, 8553–8559. doi: 10.1074/jbc.R116.752899
May, S., Evans, S., and Parry, L. (2017). Organoids, organs-on-chips and other systems, and microbiota. Emerg. Top. Life Sci. 1, 385–400. doi: 10.1042/ETLS20170047
Mehrotra, P., Ramakrishnan, G., Dhandapani, G., Srinivasan, N., and Madanan, M. G. (2017). Comparison of Leptospira interrogans and Leptospira biflexa genomes: analysis of potential leptospiral-host interactions. Mol. Biosyst. 13, 883–891. doi: 10.1039/c6mb00856a
Mei, S. (2013). Probability weighted ensemble transfer learning for predicting interactions between HIV-1 and human proteins. PLoS One 8:e79606. doi: 10.1371/journal.pone.0079606
Mei, S., Flemington, E. K., and Zhang, K. (2018). Transferring knowledge of bacterial protein interaction networks to predict pathogen targeted human genes and immune signaling pathways: a case study on Francisella tularensis. BMC Genomics 19:505. doi: 10.1186/s12864-018-4873-9
Mei, S., and Zhang, K. (2020). In silico unravelling pathogen-host signaling cross-talks via pathogen mimicry and human protein-protein interaction networks. Comput. Struct. Biotechnol. J. 18, 100–113. doi: 10.1016/j.csbj.2019.12.008
Mei, S., and Zhu, H. (2014a). AdaBoost based multi-instance transfer learning for predicting proteome-wide interactions between Salmonella and human proteins. PLoS One 9:e110488. doi: 10.1371/journal.pone.0110488
Mei, S., and Zhu, H. (2014b). Computational reconstruction of proteome-wide protein interaction networks between HTLV retroviruses and Homo sapiens. BMC Bioinformatics 15:245. doi: 10.1186/1471-2105-15-245
Mendes, V., Galvão, I., and Vieira, A. T. (2019). Mechanisms by which the gut microbiota influences cytokine production and modulates host inflammatory responses. J. Interferon Cytokine Res. 39, 393–409. doi: 10.1089/jir.2019.0011
Mészáros, B., Erdős, G., and Dosztányi, Z. (2018). IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 46, W329–W337. doi: 10.1093/nar/gky384
151
Meyer, J. M., Leempoel, K., Losapio, G., and Hadly, E. A. (2020). Molecular ecological network analyses: an effective conservation tool for the assessment of biodiversity, trophic interactions, and community structure. Front. Ecol. Evol. 8:588430. doi: 10.3389/fevo.2020.588430
Miryala, S. K., Anbarasu, A., and Ramaiah, S. (2018). Discerning molecular interactions: a comprehensive review on biomolecular interaction databases and network analysis tools. Gene 642, 84–94. doi: 10.1016/j.gene.2017.11.028
Mitchell, A. L., Attwood, T. K., Babbitt, P. C., Blum, M., Bork, P., Bridge, A., et al. (2019). InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res. 47, D351–D360. doi: 10.1093/nar/gky1100
Mizianty, M. J., Peng, Z., and Kurgan, L. (2013). MFDp2: accurate predictor of disorder in proteins by fusion of disorder probabilities, content and profiles. Intrinsically Disord. Proteins 1:e24428. doi: 10.4161/idp.24428
Mollentze, N., and Streicker, D. G. (2020). Viral zoonotic risk is homogenous among taxonomic orders of mammalian and avian reservoir hosts. Proc. Natl. Acad. Sci. U S A. 117, 9423–9430. doi: 10.1073/pnas.1919176117
Muller, E. E. L., Glaab, E., May, P., Vlassis, N., and Wilmes, P. (2013). Condensing the omics fog of microbial communities. Trends Microbiol. 21, 325–333. doi: 10.1016/j.tim.2013.04.009
Negi, S., Pandey, S., Srinivasan, S. M., Mohammed, A., and Guda, C. (2015). LocSigDB: a database of protein localization signals. Database (Oxford) 2015:bav003. doi: 10.1093/database/bav003
Nourani, E., Khunjush, F., and Durmuş, S. (2016). Computational prediction of virus-human protein-protein interactions using embedding kernelized heterogeneous data. Mol. Biosyst. 12, 1976–1986. doi: 10.1039/c6mb00065g
Nouretdinov, I., Gammerman, A., Qi, Y., and Klein-Seetharaman, J. (2012). Determining confidence of predicted interactions between HIV-1 and human proteins using conformal method. Pac. Symp. Biocomput. 2012, 311–322.
Oates, M. E., Romero, P., Ishida, T., Ghalwash, M., Mizianty, M. J., Xue, B., et al. (2013). D2P2: database of disordered protein predictions. Nucleic Acids Res. 41, D508–D516. doi: 10.1093/nar/gks1226
Ohland, C. L., and Jobin, C. (2015). Microbial activities and intestinal homeostasis: a delicate balance between health and disease. Cell. Mol. Gastroenterol. Hepatol. 1, 28–40. doi: 10.1016/j.jcmgh.2014.11.004
Okuda, S., Watanabe, Y., Moriya, Y., Kawano, S., Yamamoto, T., Matsumoto, M., et al. (2017). jPOSTrepo: an international standard data repository for proteomes. Nucleic Acids Res. 45, D1107–D1111. doi: 10.1093/nar/gkw1080
Orchard, S., Ammari, M., Aranda, B., Breuza, L., Briganti, L., Broackes-Carter, F., et al. (2014). The MIntAct project – IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 42, D358–D363. doi: 10.1093/nar/gkt1115
Parkinson, H., Kapushesky, M., Shojatalab, M., Abeygunawardena, N., Coulson, R., Farne, A., et al. (2007). ArrayExpress – a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. 35, D747–D750. doi: 10.1093/nar/gkl995
Paull, E. O., Carlin, D. E., Niepel, M., Sorger, P. K., Haussler, D., and Stuart, J. M. (2013). Discovering causal pathways linking genomic events to transcriptional states using Tied Diffusion Through Interacting Events (TieDIE). Bioinformatics 29, 2757–2764. doi: 10.1093/bioinformatics/btt471
Payne, W. E., and Garrels, J. I. (1997). Yeast Protein Database (YPD): a database for the complete proteome of Saccharomyces cerevisiae. Nucleic Acids Res. 25, 57–62. doi: 10.1093/nar/25.1.57
Peabody, M. A., Laird, M. R., Vlasschaert, C., Lo, R., and Brinkman, F. S. L. (2016). PSORTdb: expanding the bacteria and archaea protein subcellular localization database to better reflect diversity in cell envelope structures. Nucleic Acids Res. 44, D663–D668. doi: 10.1093/nar/gkv1271
Pedamallu, C. S., and Ozdamar, L. (2014). “A review on protein-protein interaction network databases,” in Modeling, Dynamics, Optimization and Bioeconomics I, Springer Proceedings in Mathematics & Statistics, eds A. A. Pinto and D. Zilberman (Cham: Springer International Publishing), 511–519. doi: 10.1007/978-3-319-04849-9_30
Penny, H. A., Hodge, S. H., and Hepworth, M. R. (2018). Orchestration of intestinal homeostasis and tolerance by group 3 innate lymphoid cells. Semin. Immunopathol. 40, 357–370. doi: 10.1007/s00281-018-0687-8
Perez-Riverol, Y., Csordas, A., Bai, J., Bernal-Llinares, M., Hewapathirana, S., Kundu, D. J., et al. (2019). The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450. doi: 10.1093/nar/gky1106
Perkins, J. R., Diboun, I., Dessailly, B. H., Lees, J. G., and Orengo, C. (2010). Transient protein-protein interactions: structural, functional, and network properties. Structure 18, 1233–1243. doi: 10.1016/j.str.2010.08.007
Peters, J. M., Solomon, S. L., Itoh, C. Y., and Bryson, B. D. (2019). Uncovering complex molecular networks in host pathogen interactions using systems biology. Emerg. Top. Life Sci. 3, 371–378. doi: 10.1042/ETLS20180174
Pey, J., Tobalina, L., de Cisneros, J. P. J., and Planes, F. J. (2013). A network-based approach for predicting key enzymes explaining metabolite abundance alterations in a disease phenotype. BMC Syst. Biol. 7:62. doi: 10.1186/1752-0509-7-62
Pickard, J. M., Zeng, M. Y., Caruso, R., and Núñez, G. (2017). Gut microbiota: role in pathogen colonization, immune responses, and inflammatory disease. Immunol. Rev. 279, 70–89. doi: 10.1111/imr.12567
Pierleoni, A., Martelli, P. L., Fariselli, P., and Casadio, R. (2007). eSLDB: eukaryotic subcellular localization database. Nucleic Acids Res. 35, D208–D212. doi: 10.1093/nar/gkl775
Pryor, R., Norvaisas, P., Marinos, G., Best, L., Thingholm, L. B., Quintaneiro, L. M., et al. (2019). Host-Microbe-Drug-Nutrient screen identifies bacterial effectors of metformin therapy. Cell 178, 1299–1312.e29. doi: 10.1016/j.cell.2019.08.003
Qi, Y., Tastan, O., Carbonell, J. G., Klein-Seetharaman, J., and Weston, J. (2010). Semi-supervised multi-task learning for predicting interactions between HIV-1 and human proteins. Bioinformatics 26, i645–i652. doi: 10.1093/bioinformatics/btq394
Qiu, Y.-Q., Zhang, S., Zhang, X.-S., and Chen, L. (2010). Detecting disease associated modules and prioritizing active genes based on high throughput data. BMC Bioinformatics 11:26. doi: 10.1186/1471-2105-11-26
Raghavachari, B., Tasneem, A., Przytycka, T. M., and Jothi, R. (2008). DOMINE: a database of protein domain interactions. Nucleic Acids Res. 36, D656–D661. doi: 10.1093/nar/gkm761
Rajasekharan, S., Rana, J., Gulati, S., Sharma, S. K., Gupta, V., and Gupta, S. (2013). Predicting the host protein interactors of Chandipura virus using a structural similarity-based approach. Pathog. Dis. 69, 29–35. doi: 10.1111/2049-632X.12064
Rana, A., Ahmed, M., Rub, A., and Akhter, Y. (2015). A tug-of-war between the host and the pathogen generates strategic hotspots for the development of novel therapeutic interventions against infectious diseases. Virulence 6, 566–580. doi: 10.1080/21505594.2015.1062211
Rastogi, S., and Rost, B. (2011). LocDB: experimental annotations of localization for Homo sapiens and Arabidopsis thaliana. Nucleic Acids Res. 39, D230–D234. doi: 10.1093/nar/gkq927
Rodenburg, S. Y. A., Seidl, M. F., Judelson, H. S., Vu, A. L., Govers, F., and de Ridder, D. (2019). Metabolic model of the Phytophthora infestans-tomato interaction reveals metabolic switches during host colonization. mBio 10:e00454-19. doi: 10.1128/mBio.00454-19
Romano, P., Dräger, A., Fiannaca, A., Giugno, R., La Rosa, M., et al. (2019). The 2017 Network Tools and Applications in Biology (NETTAB) workshop: aims, topics and outcomes. BMC Bioinformatics 20:125. doi: 10.1186/s12859-019-2681-0
Roume, H., Heintz-Buschart, A., Muller, E. E. L., May, P., Satagopam, V. P., Laczny, C. C., et al. (2015). Comparative integrated omics: identification of key functionalities in microbial community-wide metabolic networks. NPJ Biofilms Microb. 1:15007. doi: 10.1038/npjbiofilms.2015.7
Saçar Demirci, M. D., and Adan, A. (2020). Computational analysis of microRNA-mediated interactions in SARS-CoV-2 infection. PeerJ 8:e9369. doi: 10.7717/peerj.9369
Sahu, S. S., Weirick, T., and Kaundal, R. (2014). Predicting genome-scale Arabidopsis-Pseudomonas syringae interactome using domain and interolog-based approaches. BMC Bioinformatics 15(Suppl. 11):S13. doi: 10.1186/1471-2105-15-S11-S13
Saik, O. V., Ivanisenko, T. V., Demenkov, P. S., and Ivanisenko, V. A. (2016). Interactome of the hepatitis C virus: literature mining with ANDSystem. Virus Res. 218, 40–48. doi: 10.1016/j.virusres.2015.12.003
Samal, S. S., Radulescu, O., Weber, A., and Fröhlich, H. (2017). Linking metabolic network features to phenotypes using sparse group lasso. Bioinformatics 33, 3445–3453. doi: 10.1093/bioinformatics/btx427
Schaubeck, M., Clavel, T., Calasan, J., Lagkouvardos, I., Haange, S. B., Jehmlich, N., et al. (2016). Dysbiotic gut microbiota causes transmissible Crohn’s disease-like ileitis independent of failure in antimicrobial defence. Gut 65, 225–237. doi: 10.1136/gutjnl-2015-309333
Schleker, S., Garcia-Garcia, J., Klein-Seetharaman, J., and Oliva, B. (2012). Prediction and comparison of Salmonella-human and Salmonella-Arabidopsis interactomes. Chem. Biodivers. 9, 991–1018. doi: 10.1002/cbdv.201100392
Schmidt, T., Samaras, P., Frejno, M., Gessulat, S., Barnert, M., Kienegger, H., et al. (2018). ProteomicsDB. Nucleic Acids Res. 46, D1271–D1281. doi: 10.1093/nar/gkx1029
Schommer, N. N., and Gallo, R. L. (2013). Structure and function of the human skin microbiome. Trends Microbiol. 21, 660–668. doi: 10.1016/j.tim.2013.10.001
Schweppe, D. K., Harding, C., Chavez, J. D., Wu, X., Ramage, E., Singh, P. K., et al. (2015). Host-Microbe Protein Interactions during Bacterial Infection. Chem. Biol. 22, 1521–1530. doi: 10.1016/j.chembiol.2015.09.015
Shah, P., Fritz, J. V., Glaab, E., Desai, M. S., Greenhalgh, K., Frachet, A., et al. (2016). A microfluidics-based in vitro model of the gastrointestinal human-microbe interface. Nat. Commun. 7:11535. doi: 10.1038/ncomms11535
Sharma, V., Eckels, J., Schilling, B., Ludwig, C., Jaffe, J. D., MacCoss, M. J., et al. (2018). Panorama public: a public repository for quantitative data sets processed in skyline. Mol. Cell Proteomics 17, 1239–1244. doi: 10.1074/mcp.RA117.000543
Shastry, K. A., and Sanjay, H. A. (2020). “Machine learning for bioinformatics,” in Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications, Algorithms for Intelligent Systems, eds K. G. Srinivasa, G. M. Siddesh, and S. R. Manisekhar (Singapore: Springer Singapore), 25–39. doi: 10.1007/978-981-15-2445-5_3
Shaw, K. A., Bertha, M., Hofmekler, T., Chopra, P., Vatanen, T., Srivatsa, A., et al. (2016). Dysbiosis, inflammation, and response to treatment: a longitudinal study of pediatric subjects with newly diagnosed inflammatory bowel disease. Genome Med. 8:75. doi: 10.1186/s13073-016-0331-y
Shirahama, S., Miki, A., Kaburaki, T., and Akimitsu, N. (2020). Long non-coding RNAs involved in pathogenic infection. Front. Genet. 11:454. doi: 10.3389/fgene.2020.00454
Shoombuatong, W., Hongjaisee, S., Barin, F., Chaijaruwanich, J., and Samleerat, T. (2012). HIV-1 CRF01_AE coreceptor usage prediction using kernel methods based logistic model trees. Comput. Biol. Med. 42, 885–889. doi: 10.1016/j.compbiomed.2012.06.011
Silmon de Monerri, N. C., and Kim, K. (2014). Pathogens hijack the epigenome: a new twist on host-pathogen interactions. Am. J. Pathol. 184, 897–911. doi: 10.1016/j.ajpath.2013.12.022
Singh, N., Bhatia, V., Singh, S., and Bhatnagar, S. (2019). MorCVD: a unified database for host-pathogen protein-protein interactions of cardiovascular diseases related to microbes. Sci. Rep. 9:4039. doi: 10.1038/s41598-019-40704-5
Sudhakar, P., Jacomin, A.-C., Hautefort, I., Samavedam, S., Fatemian, K., Ari, E., et al. (2019). Targeted interplay between bacterial pathogens and host autophagy. Autophagy 15, 1620–1633. doi: 10.1080/15548627.2019.1590519
Sun, J., Yang, L.-L., Chen, X., Kong, D.-X., and Liu, R. (2018). Integrating multifaceted information to predict Mycobacterium tuberculosis-human protein-protein interactions. J. Proteome Res. 17, 3810–3823. doi: 10.1021/acs.jproteome.8b00497
Szklarczyk, D., Morris, J. H., Cook, H., Kuhn, M., Wyder, S., Simonovic, M., et al. (2017). The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 45, D362–D368. doi: 10.1093/nar/gkw937
Tastan, O., Qi, Y., Carbonell, J. G., and Klein-Seetharaman, J. (2009). Prediction of interactions between HIV-1 and human proteins by information integration. Pac. Symp. Biocomput. 2009, 516–527. doi: 10.1142/9789812836939_0049
The UniProt Consortium (2018). UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46, 2699. doi: 10.1093/nar/gky092
Thiele, I., Heinken, A., and Fleming, R. M. T. (2013a). A systems biology approach to studying the role of microbes in human health. Curr. Opin. Biotechnol. 24, 4–12. doi: 10.1016/j.copbio.2012.10.001
Thiele, I., Swainston, N., Fleming, R. M. T., Hoppe, A., Sahoo, S., Aurich, M. K., et al. (2013b). A community-driven global reconstruction of human metabolism. Nat. Biotechnol. 31, 419–425. doi: 10.1038/nbt.2488
Thiele, I., Sahoo, S., Heinken, A., Hertel, J., Heirendt, L., Aurich, M. K., et al. (2020). Personalized whole-body models integrate metabolism, physiology, and the gut microbiome. Mol. Syst. Biol. 16:e8982. doi: 10.15252/msb.20198982
Thieu, T., Joshi, S., Warren, S., and Korkin, D. (2012). Literature mining of host-pathogen interactions: comparing feature-based supervised learning and language-based approaches. Bioinformatics 28, 867–875. doi: 10.1093/bioinformatics/bts042
Thompson, D., Regev, A., and Roy, S. (2015). Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu. Rev. Cell Dev. Biol. 31, 399–428. doi: 10.1146/annurev-cellbio-100913-112908
Thul, P. J., and Lindskog, C. (2018). The human protein atlas: a spatial map of the human proteome. Protein Sci. 27, 233–244. doi: 10.1002/pro.3307
Tramontano, M., Andrejev, S., Pruteanu, M., Klünemann, M., Kuhn, M., Galardini, M., et al. (2018). Nutritional preferences of human gut bacteria reveal their metabolic idiosyncrasies. Nat. Microbiol. 3, 514–522. doi: 10.1038/s41564-018-0123-9
Türei, D., Korcsmáros, T., and Saez-Rodriguez, J. (2016). OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods 13, 966–967. doi: 10.1038/nmeth.4077
Tyagi, N., Krishnadev, O., and Srinivasan, N. (2009). Prediction of protein-protein interactions between Helicobacter pylori and a human host. Mol. Biosyst. 5, 1630–1635. doi: 10.1039/b906543c
Valli, R. X. E., Lyng, M., and Kirkpatrick, C. L. (2020). There is no hiding if you Seq: recent breakthroughs in Pseudomonas aeruginosa research revealed by genomic and transcriptomic next-generation sequencing. J. Med. Microbiol. 69, 162–175. doi: 10.1099/jmm.0.001135
Vandin, F., Upfal, E., and Raphael, B. J. (2011). Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18, 507–522. doi: 10.1089/cmb.2010.0265
Veres, D. V., Gyurkó, D. M., Thaler, B., Szalay, K. Z., Fazekas, D., Korcsmáros, T., et al. (2015). ComPPI: a cellular compartment-specific database for protein-protein interaction network analysis. Nucleic Acids Res. 43, D485–D493. doi: 10.1093/nar/gku1007
Via, A., Uyar, B., Brun, C., and Zanzoni, A. (2015). How pathogens use linear motifs to perturb host cell networks. Trends Biochem. Sci. 40, 36–48. doi: 10.1016/j.tibs.2014.11.001
Wallqvist, A., Wang, H., Zavaljevski, N., Memišević, V., Kwon, K., Pieper, R., et al. (2017). Mechanisms of action of Coxiella burnetii effectors inferred from host-pathogen protein interactions. PLoS One 12:e0188071. doi: 10.1371/journal.pone.0188071
Wang, B., Yao, M., Lv, L., Ling, Z., and Li, L. (2017). The human microbiota in health and disease. Engineering 3, 71–82. doi: 10.1016/J.ENG.2017.01.008
Ward, J. J., McGuffin, L. J., Bryson, K., Buxton, B. F., and Jones, D. T. (2004). The DISOPRED server for the prediction of protein disorder. Bioinformatics 20, 2138–2139. doi: 10.1093/bioinformatics/bth195
Weiberg, A., Wang, M., Bellinger, M., and Jin, H. (2014). Small RNAs: a new paradigm in plant-microbe interactions. Annu. Rev. Phytopathol. 52, 495–516. doi: 10.1146/annurev-phyto-102313-045933
Wen, C., Zheng, Z., Shao, T., Liu, L., Xie, Z., Le Chatelier, E., et al. (2017). Quantitative metagenomics reveals unique gut microbiome biomarkers in ankylosing spondylitis. Genome Biol. 18:42. doi: 10.1186/s13059-017-1271-6
Wong, A. C. N., Vanhove, A. S., and Watnick, P. I. (2016). The interplay between intestinal bacteria and host metabolism in health and disease: lessons from Drosophila melanogaster. Dis. Model. Mech. 9, 271–281. doi: 10.1242/dmm.023408
Wong, E., Baur, B., Quader, S., and Huang, C.-H. (2012). Biological network motif detection: principles and practice. Brief. Bioinformatics 13, 202–215. doi: 10.1093/bib/bbr033
Wuchty, S. (2011). Computational prediction of host-parasite protein interactions between Plasmodium falciparum and H. sapiens. PLoS One 6:e26960. doi: 10.1371/journal.pone.0026960
Xue, B., Dunbrack, R. L., Williams, R. W., Dunker, A. K., and Uversky, V. N. (2010). PONDR-FIT: a meta-predictor of intrinsically disordered amino acids. Biochim. Biophys. Acta 1804, 996–1010. doi: 10.1016/j.bbapap.2010.01.011
Yang, G., Peng, M., Tian, X., and Dong, S. (2017). Molecular ecological network analysis reveals the effects of probiotics and florfenicol on intestinal microbiota homeostasis: an example of sea cucumber. Sci. Rep. 7:4778. doi: 10.1038/s41598-017-05312-1
Yijing, L., Haixiang, G., Xiao, L., Yanan, L., and Jinling, L. (2016). Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data. Knowledge-Based Systems 94, 88–104. doi: 10.1016/j.knosys.2015.11.013
Yilmaz, B., Juillerat, P., Øyås, O., Ramon, C., Bravo, F. D., Franc, Y., et al. (2019). Microbial network disturbances in relapsing refractory Crohn’s disease. Nat. Med. 25, 323–336. doi: 10.1038/s41591-018-0308-z
Younesi, E. (2015). Disease systems modeling for discovery of mechanistic biomarkers. Eur. J. Mol. Clin. Med. 2:61. doi: 10.1016/j.nhtm.2014.11.023
Yu, H., Xue, D., Wang, Y., Zheng, W., Zhang, G., and Wang, Z.-L. (2020). Molecular ecological network analysis of the response of soil microbial communities to depth gradients in farmland soils. Microbiologyopen 9:e983. doi: 10.1002/mbo3.983
Zampieri, G., Vijayakumar, S., Yaneske, E., and Angione, C. (2019). Machine and deep learning meet genome-scale metabolic modeling. PLoS Comput. Biol. 15:e1007084. doi: 10.1371/journal.pcbi.1007084
Zhang, A., He, L., and Wang, Y. (2017a). Prediction of GCRV virus-host protein interactome based on structural motif-domain interactions. BMC Bioinformatics 18:145. doi: 10.1186/s12859-017-1500-8
Zhang, M., Su, Q., Lu, Y., Zhao, M., and Niu, B. (2017b). Application of machine learning approaches for protein-protein interactions prediction. Med. Chem. 13, 506–514. doi: 10.2174/1573406413666170522150940
Zheng, Q., Zhang, M., Zhang, T., Li, X., Zhu, M., and Wang, X. (2020). Insights from metagenomic, metatranscriptomic, and molecular ecological network analyses into the effects of chromium nanoparticles on activated sludge system. Front. Environ. Sci. Eng. 14:60. doi: 10.1007/s11783-020-1239-8
Zhou, H., Gao, S., Nguyen, N. N., Fan, M., Jin, J., Liu, B., et al. (2014). Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions. Biol. Direct 9:5. doi: 10.1186/1745-6150-9-5
Zhou, H., Rezaei, J., Hugo, W., Gao, S., Jin, J., Fan, M., et al. (2013). Stringent DDI-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions. BMC Syst. Biol. 7:S6. doi: 10.1186/1752-0509-7-S6-S6
Zhou, J., Deng, Y., Luo, F., He, Z., Tu, Q., and Zhi, X. (2010). Functional molecular ecological networks. mBio 1:e00169-10. doi: 10.1128/mBio.00169-10
Zhou, X., Park, B., Choi, D., and Han, K. (2018). A generalized approach to predicting protein-protein interactions between virus and host. BMC Genomics 19:568. doi: 10.1186/s12864-018-4924-2

Keywords

health, disease, microbiome-host interactions, molecular mechanisms, computational approaches, machine learning, basic and clinical research

Citation

Sudhakar P, Machiels K, Verstockt B, Korcsmaros T and Vermeire S (2021) Computational Biology and Machine Learning Approaches to Understand Mechanistic Microbiome-Host Interactions. Front. Microbiol. 12:618856. doi: 10.3389/fmicb.2021.618856

Received

18 October 2020

Accepted

19 March 2021

Published

11 May 2021

Volume

12 - 2021

Edited by

Isabel Moreno Indias, University of Málaga, Spain

Reviewed by

Zhili He, University of Oklahoma, United States; Christopher L. Hemme, University of Rhode Island, United States; Swagatika Sahoo, Indian Institute of Technology Madras, India

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Padhmanand Sudhakar, padhmanand.sudhakar@kuleuven.be; orcid.org/0000-0003-1907-4491

This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
