Application of Machine Learning in Microbiology

Qu, Kaiyang; Guo, Fei; Liu, Xiangrong; Lin, Yuan; Zou, Quan

doi:10.3389/fmicb.2019.00827

REVIEW article

Front. Microbiol., 18 April 2019

Sec. Systems Microbiology

Volume 10 - 2019 | https://doi.org/10.3389/fmicb.2019.00827

1. College of Intelligence and Computing, Tianjin University, Tianjin, China
2. School of Information Science and Technology, Xiamen University, Xiamen, China
3. Department of System Integration, Sparebanken Vest, Bergen, Norway
4. Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
5. Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China

Abstract

Microorganisms are ubiquitous and closely related to people’s daily lives. Since they were first discovered in the 19th century, researchers have shown great interest in microorganisms. Microorganisms have traditionally been studied through cultivation, but this method is expensive and time consuming, and it cannot keep pace with the development of high-throughput sequencing technology. To deal with this problem, machine learning (ML) methods have been widely applied in the field of microbiology. Literature reviews have shown that ML can be used in many aspects of microbiology research, especially for classification problems and for exploring the interaction between microorganisms and the surrounding environment. In this study, we summarize the application of ML in microbiology.

Introduction

Microorganisms first appeared approximately 3.5 billion years ago, making them one of the earliest living things on Earth (Nannipieri et al., 2010). Microorganisms include bacteria, viruses, fungi, some small protozoa, and microscopic algae. These organisms, which are closely related to human beings (Ley et al., 2006a), have a wide range of beneficial and harmful effects in fields including food (Cotter et al., 2005), medicine (Petrof et al., 2012; Yu et al., 2018), agriculture (Morris et al., 1986), industry (Souza, 2010), environmental protection, and other fields (Reiff and Kelly, 2010).

Microbiology is a discipline that studies the structure and function of microbial groups, the interrelationships and mechanisms of internal communities, and the relationships between microorganisms and their environments or hosts (Alexander, 1962; Niel, 1966). The microbiome is the collection of all microbial species and their genetic information and functions in a given environment. Studies of the microbiome also include the interaction between different microorganisms (DiMucci et al., 2018), the interaction between microorganisms and other species (Xie et al., 2018), and the interaction between microorganisms and the environment (Moitinho-Silva et al., 2017). Because microorganisms are so small, the microscope is an important tool for studying them. However, microscopy only allows observation and must therefore be complemented by culture techniques to study the biological, physiological, genetic, metabolic, pathogenic and other characteristics of microorganisms (Waldron, 2018). During cultivation, researchers can also explore the interactions between microorganisms and the environment, which reflect the breadth and diversity of microbial distribution. A variety of microorganisms living in different environments or in different hosts form microbial communities, which have extensive and complex interactions with the environment and the host and form various types of ecosystems (Srinivasan et al., 2012; Xie et al., 2018).

With the development of microbial sequencing in recent years, the microbiome has become an increasingly popular subject of study. High-throughput sequencing technology has generated an increasing amount of microbial data. Traditional methods using microscopes and biological cultures are expensive and labor intensive; therefore, machine-learning methods have been gradually applied to microbial studies (Huang Y. A. et al., 2017; Huang Z. A. et al., 2017; Wang et al., 2017; Wei et al., 2017a,b; Peng et al., 2018; Yang et al., 2018b; Zou et al., 2018a). Here, we introduce the application of machine learning (ML) in microbial analyses. Since ML is mainly applied to classification and interaction problems, we focus on these two areas. Figure 1 shows the framework of this paper.

Figure 1

Machine Learning Methods

Machine learning is a multi-disciplinary subject that draws on probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory (Qu et al., 2017; Zou et al., 2018b). ML methods can be divided into two types (Zitnik et al., 2019), supervised learning and unsupervised learning. Supervised learning (Stoter et al., 2019) requires that the model be trained using a training set that includes both features and results (labels). Common supervised learning tasks include regression analysis and statistical classification. Unsupervised learning works without labels; clustering is a typical example, in which an algorithm such as k-means establishes centroids and reduces the within-cluster error through iteration to group the samples. With the development of ML, more and more fields have begun to use this technique for research (Chen W. et al., 2016; Chen et al., 2017a,d, 2018a,b,e,f,g; Li et al., 2016; Zou et al., 2016, 2017; Ding et al., 2017a,b; Feng et al., 2017a; Yu et al., 2017a; Zeng et al., 2017a, 2018; Liu et al., 2018; Pan et al., 2018; Wei et al., 2018a,b; Yang et al., 2018a; Zhao et al., 2018b; He et al., 2019; Zhang et al., 2019), for example, drug repositioning (Yu et al., 2016b, 2017b), disease-related microRNA identification (Chen and Huang, 2017; Chen et al., 2017d, 2018b,e,g; Zhao et al., 2018a,c), and disease-related long non-coding RNA identification (Chen and Yan, 2013; Chen et al., 2017e, 2018c; Hu et al., 2017, 2018). There are four main steps in developing ML algorithms (Oudah and Henschel, 2018). The first step is extraction of the features, which is critical to the ML method (Liu et al., 2015). Then, the operational taxonomic unit (OTU) table can be obtained by clustering. Next, important features that can improve the accuracy and efficiency are selected. Finally, a training dataset is used to train the model, after which a test set is used to evaluate the model. The process is summarized in Figure 2.

Figure 2

In microbial studies, obtaining the relevant OTUs from the collected samples is an important step in the analysis of microbial data. An OTU is a group of similar microorganisms that are clustered according to the similarity of their DNA sequences (Blaxter et al., 2005). In recent years, OTUs have been widely used in analyses of microbial diversity, especially when analyzing small subunit 16S or 18S rRNA datasets (Schmidt et al., 2014). Sequences are clustered according to their similarity to one another, and the researcher sets the similarity threshold. After OTU clustering and taxonomic annotation of the OTUs, the OTU table can be obtained, which contains the OTU types and quantities for each sample, as well as species annotation information for each OTU.
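The threshold-based clustering described above can be illustrated with a minimal sketch. It assumes pre-aligned reads of equal length and a simple greedy, seed-based strategy; the function names, the toy reads, and the 0.9 threshold are illustrative assumptions, not the procedure of any particular OTU-picking tool.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example: sequences must be pre-aligned and equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(sequences, threshold=0.97):
    """Assign each sequence to the first OTU seed it matches at >= threshold.

    A sequence that matches no existing seed becomes the seed of a new OTU.
    Returns (seeds, assignments), where assignments[i] is the OTU index of
    sequences[i].
    """
    seeds = []
    assignments = []
    for seq in sequences:
        for otu_id, seed in enumerate(seeds):
            if identity(seq, seed) >= threshold:
                assignments.append(otu_id)
                break
        else:
            # No seed was similar enough: open a new OTU.
            seeds.append(seq)
            assignments.append(len(seeds) - 1)
    return seeds, assignments


# Hypothetical toy reads; the threshold is chosen by the researcher.
reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC"]
seeds, assignments = greedy_otu_clustering(reads, threshold=0.9)
```

Counting how many reads of each sample fall into each OTU index would then give the rows of an OTU table.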

Microbial datasets are often high dimensional, so dimensionality reduction is also an important part of data processing, and many studies have addressed it. Principal component analysis (PCA) is a common dimensionality reduction method that decomposes the covariance matrix to obtain the principal components and their weights (Jolliffe, 2002). PCA is often used to reduce the dimensionality of a dataset while retaining the components that contribute most to the variance in the data. Principal co-ordinates analysis (PCoA) is another common method. After sorting the eigenvalues and eigenvectors of a distance matrix, PCoA selects the leading ones to find the most significant coordinates (Podani and Miklós, 2002). The result is a rotation of the data matrix: it does not change the mutual positional relationship between the sample points, but only changes the coordinate system.
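As a rough illustration of the covariance decomposition behind PCA, the sketch below finds only the first principal component by power iteration, using the standard library alone. The function names and the toy data are assumptions made for the example; a real analysis would normally rely on an established numerical library.

```python
import math


def covariance_matrix(rows):
    """Return (sample covariance matrix, mean-centered rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def first_principal_component(rows, iterations=200):
    """Approximate the leading eigenvector of the covariance matrix.

    Assumes the data have non-zero variance and that the all-ones start
    vector is not orthogonal to the leading eigenvector.
    """
    cov, centered = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance explained along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    # Projection of each centered sample onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, eigenvalue, scores


# Toy data lying exactly on the line y = 2x, so one component explains everything.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
component, variance, scores = first_principal_component(data)
```

Keeping only the leading scores reduces each sample from two features to one while preserving the direction of greatest variance.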

In microbial studies, supervised learning is commonly used, especially the support vector machine (SVM) (Feng et al., 2013a, 2017b; Chen X. X. et al., 2016; Yang et al., 2016), Naïve Bayes (NB) (Feng et al., 2013b,c), random forest (RF) (Chen et al., 2018d), and k-nearest neighbor (KNN) methods (Chen et al., 2017c).

The SVM is a generalized linear classifier that performs binary classification using the maximum-margin hyperplane learned from the training samples as its decision boundary. The SVM can classify non-linear data by means of kernel methods (Drucker et al., 2002). SVM is widely used in bioinformatics, for example in protein prediction (Xu et al., 2018a,b,c). The NB method (Meena and Chandran, 2009) is a classification method based on Bayes’ theorem and the assumption that features are independent; it originates from classical mathematical theory (Rodríguez and Kuncheva, 2007) and has a solid mathematical foundation and stable classification efficiency. The NB classifier, which requires only a few parameters, is less sensitive to missing data and simpler than other methods (Jordan, 2008). The RF is a classifier that contains multiple decision trees, and its output is determined by a vote among the trees (Svetnik et al., 2003). KNN (Cui et al., 2001) is a theoretically mature method that infers the category of a sample from its neighbors. The main steps of the algorithm are as follows (Liao and Vemuri, 2002). First, the distance between the test sample and each training sample is calculated. Then, the k nearest training samples are taken as the nearest neighbors of the test sample. Finally, the test sample is classified according to the categories of the k nearest neighbors.
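The three KNN steps listed above can be sketched as follows. Euclidean distance and majority voting are assumptions chosen for the illustration (other distance measures and weighting schemes are possible), and the toy samples and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Step 1: distance between the test sample and each training sample.
    distances = [(math.dist(x, query), label)
                 for x, label in zip(train_x, train_y)]
    # Step 2: keep the k nearest training samples.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Step 3: assign the most common category among the neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature samples from two classes.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
prediction = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
```

In this sketch a tied vote is resolved in favor of the label encountered first among the nearest neighbors; choosing an odd k for two-class problems avoids ties.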

Classification and Prediction in Microbiology

Prediction of Microbial Species

There are two main types of microorganisms (Maiden et al., 1998): those with non-cellular morphology (Yeom and Javidi, 2006), such as viruses, and those with cellular morphology. The latter can be divided into prokaryotes (Weinbauer, 2010), such as archaea and eubacteria, and eukaryotes (Nowrousian, 2010), such as fungi and unicellular algae. Different microorganisms have different characteristics, so it is important to identify microorganisms properly. There are two main approaches to the identification of microorganisms. In one, the goal is to classify an unknown microorganism by its domain, kingdom, phylum, class, order, family, genus and species. In the other, the goal is to determine whether an unknown microorganism belongs to a specific group or not. For example, we can determine whether an unknown microorganism is a virus or, more specifically, whether it is a certain virus. In this section, we introduce recent studies that have used machine-learning methods to predict microbial species.

In the study by Murali et al. (2018), the authors classified microorganisms taxonomically using IDTAXA, which employs the LearnTaxa and IdTaxa functions. Both of these functions are part of the R package DECIPHER, which is released under the GPLv3 license as part of Bioconductor, a project that provides tools for the analysis and comprehension of high-throughput genomic data. The LearnTaxa function attempts to reclassify each training sequence into its labeled taxon using a method known as tree descent, which is similar to the decision tree, a common ML algorithm. IdTaxa takes the objects returned by LearnTaxa and the query sequences as input. It returns the classification result for each sequence in taxonomic form and provides a confidence for each level. If the confidence does not reach the required value, the classification cannot be accurately performed at that level. The classifications produced by IdTaxa may lead to different conclusions in microbiological studies. Although the misclassification error is small, many of the remaining misclassifications may be caused by errors in the reference taxonomy. Fiannaca et al. (2018) presented a method for classifying 16S short-read sequences based on k-mers and deep learning. According to their results, the method can classify both 16S shotgun (SG) and amplicon (AMP) data very well.
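Several of the methods in this section represent a sequence by its k-mer content before passing it to a classifier. The sketch below shows one common way of building such a feature vector; the function name, the choice of k, and the handling of ambiguous bases are assumptions for illustration and are not taken from any of the cited tools.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGT"):
    """Return the relative frequency of every possible k-mer in `sequence`.

    Windows containing characters outside `alphabet` (for example N) are
    ignored, so the returned frequencies sum to 1 whenever at least one
    valid k-mer is present.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}


# A fixed-length feature vector (4**k entries) for a hypothetical short read.
features = kmer_frequencies("ACGTACGT", k=2)
```

Because every sequence maps to a vector of the same length regardless of read length, these frequencies can be fed directly to classifiers such as RF, SVM, or a neural network.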

It is important to identify specific microbial sequences in mixed metagenomic samples. At present, gene-based similarity methods are widely used to classify prokaryotic and host organisms in mixed samples; however, these techniques have major weaknesses. Therefore, many studies have sought better methods for identifying specific microorganisms. Amgarten et al. (2018) proposed a tool known as MARVEL for predicting double-stranded DNA bacteriophage sequences in metagenomic data. MARVEL uses the RF method, with a training dataset composed of 1,247 phage and 1,029 bacterial genomes and a test dataset composed of 335 bacterial and 177 phage genomes. The authors proposed six features to identify the phages, then used random forests to select features and found that three of them provided more information (Grazziotin et al., 2017). Ren et al. (2017) developed VirFinder, a k-mer-based ML method for identifying viral contigs that avoids gene-based similarity searches. VirFinder trains the ML model on known viral and non-viral (prokaryotic host) sequences to detect viral-specific k-mer frequency signatures. The model was trained with host and viral genomes sequenced before January 1, 2014, and the test set consisted of sequences obtained after January 1, 2014. VirSorter (Roux et al., 2015) identifies viral signals in different kinds of microbial sequence data using both reference-dependent and reference-independent approaches. Experimental results have shown that VirSorter performs well, especially for predicting viral sequences outside the host genome.

The above methods classify microorganisms according to different needs. When the taxonomic information of microorganisms is required, the method proposed by Murali et al. (2018) can be used. MARVEL, VirSorter, and VirFinder, in turn, can identify specific types of microorganisms. According to Amgarten et al. (2018), these three methods have comparable specificity, but MARVEL has better recall (sensitivity). We have compiled materials for implementation of the above methods, which are shown in Table 1.

Table 1

Studies	Availability of data and materials	Reference
IDTAXA	http://DECIPHER.codes	Murali et al., 2018
Fiannaca et al.	https://github.com/IcarPA-TBlab/MetagenomicDC	Fiannaca et al., 2018
MARVEL	https://github.com/LaboratorioBioinformatica/MARVEL	Amgarten et al., 2018
VirFinder	https://github.com/jessieren/VirFinder	Ren et al., 2017
VirSorter	https://github.com/simroux/VirSorter	Roux et al., 2015

The available data and materials for prediction of microbial species.
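The comparison of MARVEL, VirSorter, and VirFinder above is expressed in terms of specificity and recall (sensitivity). As a reminder of how these two quantities are defined, the following sketch computes them from true and predicted labels; the label names and the toy predictions are hypothetical and do not reproduce any reported result.

```python
def recall_and_specificity(true_labels, predicted_labels, positive="phage"):
    """Compute recall (sensitivity) and specificity for a binary task.

    recall      = TP / (TP + FN): fraction of true positives that are found.
    specificity = TN / (TN + FP): fraction of true negatives that are kept.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return recall, specificity


# Hypothetical labels for five sequences.
truth = ["phage", "phage", "phage", "bacteria", "bacteria"]
predicted = ["phage", "phage", "bacteria", "bacteria", "phage"]
recall, specificity = recall_and_specificity(truth, predicted)
```

Two tools can therefore share a similar specificity (few bacterial sequences mislabeled as phage) while differing in recall (how many true phage sequences they recover).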

Prediction of Environmental and Host Phenotypes

With the development of next-generation and high-throughput DNA sequencing, a new area of microbiology has emerged. The main research in this field links microbial populations to phenotypes and ecological environments, which can support the study of disease outbreaks and precision medicine (Atlas and Bartha, 1981). It is well known that some microorganisms are parasitic and that the surrounding environment and host cells have an important impact on the microbial population. Differences in nutrient availability and environmental conditions lead to differences in microbial communities (Moran, 2015). Because microorganisms exchange information with the surrounding environment and host cells, we can predict environmental and host phenotypes based on the microorganisms that are present (Xie et al., 2018). This provides a more comprehensive understanding of the environment and the host, so that we can better use the environment and protect the host. Many studies have recently been conducted to predict environmental and host phenotypes using microorganisms. In this section, we introduce these studies.

Asgari et al. (2018) developed MicroPheno, which combines a k-mer-based representation of shallow sub-samples with deep learning, random forests, and SVMs to predict environmental and host phenotypes from 16S rRNA gene sequencing data. They found that the k-mer-based representation of shallow sub-samples is superior to OTUs for body-site identification and Crohn’s disease prediction. In addition, the deep learning method outperforms RF and SVM on large datasets. This approach not only improves performance but also avoids overfitting, and it reduces preprocessing time.

Statnikov et al. (2013) used OTUs as input features and processed the data as follows. First, the authors sequenced the original DNA, after which they removed the human DNA sequences and defined the OTUs based on the microbial sequences. Next, they quantified the relative abundance of all sequences belonging to each OTU. The authors used SVM, kernel ridge regression, regularized logistic regression, Bayesian logistic regression, the KNN method, the RF method and probabilistic neural networks with different parameters and kernel functions. Overall, they investigated 18 ML methods. In addition, they used five feature extraction methods. The experimental results revealed that RF, SVM, kernel ridge regression and Bayesian logistic regression with Laplace priors provided better performance. Based on their research, human skin microorganisms collected from objects that have been touched can be used to identify the individual from which they originated. Because the authors used a variety of classification and dimensionality reduction methods to explore the effects of each method, the work provides a comprehensive comparison that is useful for subsequent studies.

Schmedes et al. (2018) used the microbial community for forensic identification. In their study, they developed hidSkinPlex, a novel targeted sequencing method that uses skin microbiome markers for human identification. In forensic science, it is also important to estimate the time of death. Johnson et al. (2016) used KNN regression to predict the time interval after death using datasets from nose and ear samples. This indicates that skin microbiota can be an important tool in forensic death investigation. Traditionally, marine biological monitoring involves the classification and morphological identification of large benthic invertebrates, which requires a great deal of time and money. Cordier et al. (2017) used eDNA metabarcoding and supervised ML to build a powerful prediction model for benthic monitoring. Moitinho-Silva et al. (2017) studied the microbial flora of sponges and their HMA-LMA status, and demonstrated the applicability of ML to exploring host-related microbial community patterns.
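The relative-abundance step mentioned above for OTU-based features can be written in a few lines. This is a generic sketch of the normalization (each count divided by the sample total), with hypothetical sample and OTU names; it is not the exact preprocessing pipeline of the cited study.

```python
def relative_abundance(otu_counts):
    """Convert raw OTU counts per sample into relative abundances.

    `otu_counts` maps sample name -> {OTU name -> read count}. Each count is
    divided by the total number of reads in its sample, so the values for a
    sample sum to 1 (or are all 0.0 if the sample has no reads).
    """
    table = {}
    for sample, counts in otu_counts.items():
        total = sum(counts.values())
        table[sample] = {
            otu: (count / total if total else 0.0)
            for otu, count in counts.items()
        }
    return table


# Hypothetical counts for two samples.
counts = {
    "skin_1": {"OTU_1": 30, "OTU_2": 10, "OTU_3": 60},
    "skin_2": {"OTU_1": 5, "OTU_2": 15, "OTU_3": 0},
}
abundances = relative_abundance(counts)
```

Normalizing in this way makes samples with different sequencing depths comparable before they are passed to a classifier.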

Because microbial communities are specific to their environments and hosts, they can be used to identify the environment and the host. Moreover, the existing environmental conditions and the survival status of the host can be judged from the microbial community that is present. We summarize the available datasets and methods, which are shown in Table 2.

Table 2

Studies	Availability of data and materials	Reference
Asgari et al.	https://llp.berkeley.edu/micropheno	Asgari et al., 2018
Statnikov et al.	https://link.springer.com/article/10.1186/2049-2618-1-11	Statnikov et al., 2013

The available data and materials for prediction of environmental and host phenotypes.

Using Microbial Communities to Predict Disease

Microbiomes are important to human health and disease (Bourne et al., 2009). Indeed, there are many microbial communities in the human body. Once a microbial community is out of balance or foreign microorganisms invade, the human body is likely to become sick. For example, intestinal microbial communities are associated with obesity (Ley et al., 2006b) and pulmonary communities with pulmonary infection (Sibley et al., 2008). Because of the complexity of these communities, it is difficult to determine which microbial communities cause a given disease. Recently, many studies have investigated the use of microbial communities to predict diseases, especially bacterial vaginosis (Srinivasan et al., 2012; Deng et al., 2018) and inflammatory bowel disease (Gillevet et al., 2010). By analyzing microbial communities, we can better understand the disease and then make effective decisions regarding treatment. Therefore, in this section, we discuss current studies investigating the use of microbial communities to predict diseases.

Bacterial vaginosis (BV) is a disease associated with the vaginal microbiome. Beck and Foster (2014) used genetic programming (GP), RF, and logistic regression (LR) to classify BV according to microbial communities. There are two criteria for diagnosing BV: the Amsel criteria, which are based on discharge, whiff test, clue cells, and pH (Amsel et al., 1983), and the Nugent score, which depends on counting gram-positive cells (Nugent et al., 1991). The datasets used by Beck and Foster were from Ravel et al. (2011) and Sujatha et al. (2012). Their method (Beck and Foster, 2014) first classifies BV according to the vaginal microbiota and related environmental factors, then identifies the microbial community members that are most important for predicting BV.

Hierarchical feature extraction is based on the taxonomic classification of microbes from kingdom to species. Existing hierarchical feature selection algorithms lead to information loss, and the hierarchical information of some 16S rRNA sequences is usually incomplete, which affects classification. Therefore, Oudah and Henschel (2018) proposed a method known as hierarchical feature engineering (HFE) to identify colorectal cancer (CRC). To accomplish this, they used RF, decision trees and the NB method to classify a dataset of next-generation sequencing based 16S rRNA sequences provided by metagenomic studies. This method is well suited to datasets with high-dimensional features. The dataset and method are available at https://github.com/HenschelLab/HierarchicalFeatureEngineering.

In another study (Wisittipanit, 2012), the author focused on predicting inflammatory bowel disease. In that study, patients with Crohn’s disease and ulcerative colitis were compared with healthy controls to identify differences between the mucosa and lumen in different intestinal locations. The author used the Relief algorithm (Kira and Rendell, 1992) to select features, and Metastats (White et al., 2009) to detect differential features. Finally, the author used KNN and SVM as classifiers to perform disease specificity and site specificity analysis.

In this section, we discussed the use of microorganisms to predict different diseases. Beck and Foster (2014) predicted BV according to the microorganisms present and the diagnostic criteria for BV. HFE identified CRC according to OTU IDs and taxonomic information. Wisittipanit (2012) proposed a method to predict Crohn’s disease based on OTUs and feature selection. These methods took different approaches to predicting diseases from microorganisms and obtained good results. This indicates that some diseases affect human microbial communities. Based on changes in these communities, we may not only predict a disease but also treat it according to the state of the community, which is a direction for future research.

Interaction and Association in Microbiology

Interaction Between Microorganisms

The collective behavior of microbial ecosystems in biomes is the result of many interactions between community members. These interactions include metabolite exchange, signaling and quorum sensing processes, as well as growth inhibition and killing (Langille et al., 2013; DiMucci et al., 2018). Understanding the interspecific interactions within microbial communities is critical to understanding the functions of natural ecosystems and the design of synthetic consortia (Mainali et al., 2017). Therefore, in this section, we introduce the application of ML to investigation of interactions between microorganisms.

DiMucci et al. (2018) showed how a microbial interaction network can be combined with trait-level descriptions of individual microbes to accurately infer the missing edges in the network and to suggest mechanisms underlying the interactions. The same authors proposed the notion of a composite vector that combines the trait vectors of two microbes with their pairwise interaction. The training set for the model consists of all observed interactions, and the trained model is then used to predict the unobserved interactions. If a random forest classifier is used, feature contributions can also be calculated. Microbial interactions in the soil can affect crop yields; therefore, Chang et al. (2017) used the random forest method to predict productivity based on the microorganisms present. In that study, differences in crop productivity were linked to the soil microbial composition.
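To make the idea of a composite vector more concrete, the sketch below builds a training set from observed interactions and lists the pairs that remain to be predicted. It assumes that a composite vector is simply the concatenation of the two trait vectors; the trait values, organism names, and interaction labels are hypothetical, and the cited study may construct its features differently.

```python
from itertools import permutations


def composite_vector(traits_a, traits_b):
    """Concatenate the trait vectors of two microbes (assumed construction)."""
    return list(traits_a) + list(traits_b)


def build_training_set(traits, observed):
    """Turn observed interactions into feature rows X and labels y."""
    X, y = [], []
    for (a, b), label in observed.items():
        X.append(composite_vector(traits[a], traits[b]))
        y.append(label)
    return X, y


def unobserved_pairs(traits, observed):
    """Ordered pairs of distinct microbes with no observed interaction."""
    return [pair for pair in permutations(sorted(traits), 2)
            if pair not in observed]


# Hypothetical binary traits (e.g. presence or absence of metabolic capabilities).
traits = {"m1": [1, 0, 1], "m2": [0, 1, 1], "m3": [1, 1, 0]}
# Hypothetical observed effect of the first microbe on the second.
observed = {("m1", "m2"): "positive", ("m2", "m3"): "negative"}

X, y = build_training_set(traits, observed)
to_predict = unobserved_pairs(traits, observed)
```

Any supervised classifier can then be trained on `X` and `y` and applied to the composite vectors of the pairs in `to_predict`.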

There are cooperative and competitive relationships within the same microbial population. Moreover, there are eight relationships between the different microbial populations, which are neutralism, commensalism, synergism, mutualism, competition, amensalism, parasitism and predation. Understanding the interactions between microorganisms is important for the study of microbial species and for microbial applications. However, there are not many studies on ML in this area, which will be an important research direction.

Microbiome-Disease Association

There are many kinds of microorganisms in the human body, and they are inseparable from human health. For example, intestinal microbial disorders are associated with diseases such as ulcerative colitis, CRC, atherosclerosis, diabetes and obesity (Chen et al., 2017b). Accordingly, it is necessary to predict microbe-disease associations, because such predictions can not only improve the diagnosis and prognosis of human diseases but also support the development of new drugs (Yu et al., 2015, 2016a; Shi et al., 2016; Su et al., 2018; Fan et al., 2019). However, few studies have investigated predictive analysis of microbe-disease associations. Therefore, in this section, we introduce the application of ML to the study of microbe-disease associations.

Fan et al. (2019) proposed a new approach to analyze the microbial-disease association by integrating multiple data sources from the human microbe-disease consortium (MDPH_HMDA) and path-based HeteSim scores. First, heterogeneous networks were constructed. Microbe-disease pairs were weighted according to the normalized HeteSim measure, after which the HeteSim scores of the microbe-disease-disease and microbe-microbe-disease paths were integrated. Finally, the association scores of potential microbe-disease pairs were calculated. Xuezhong et al. (2014) proposed a method based on the Human Symptoms-Disease Network (HSDN), in which the co-occurrence of disease/symptom terms in PubMed bibliographic records was used to calculate disease similarity.

KATZ (Katz, 1953) is a network-based measure that calculates the similarity of nodes in a heterogeneous network and can be used to solve the link prediction problem. The KATZ method has been applied in many fields, including disease-gene association prediction (Xiaofei et al., 2014) and lncRNA-disease association prediction (Chen et al., 2015). Chen et al. (2017b) proposed a novel KATZ-based method, named KATZHMDA, to predict associations of human microbiota with non-infectious diseases. KATZHMDA first constructs the adjacency matrix A based on known microbial-disease associations. The kernel similarity matrices KD and KM are calculated based on the disease Gaussian interaction profile and the microbial Gaussian interaction profile, respectively. The integrated matrix A^∗ is then constructed from KM, KD and the known microbial-disease associations. Next, all walks of different lengths are integrated to obtain a single microbe-disease association measure, so that the microbe-disease association probability can be calculated in matrix form.

Shi et al. (2018) proposed a prediction method based on binary matrix completion, named BMCMDA. BMCMDA assumes that the incomplete microbiome-disease association (MDA) matrix is the sum of a potential parameterization matrix and a noise matrix. Additionally, BMCMDA assumes that the independent subscripts of the items observed in the MDA matrix follow the binomial model. To compare the methods, Shi et al. (2018) used the same dataset, which was collected from the Human Microbe-Disease Association Database (HMDAD) and included 292 microbes and 39 human diseases. According to the study, BMCMDA achieves a better AUC than KATZHMDA. BMCMDA can be integrated with other independent microbial/disease similarities or characteristics to enhance MDA prediction. Moreover, this method can be applied to other prediction tasks. We summarize the available datasets and methods, which are shown in Table 3.

Table 3

Studies	Availability of data and materials	Reference
Zhou et al.	https://www.nature.com/articles/ncomms5212#supplementary-information	Xiaofei et al., 2014
KATZHMDA	http://dwz.cn/4oX5mS.	Chen et al., 2017b
BMCMDA	https://github.com/JustinShi2016/ISBRA2017	Shi et al., 2018

The available data and materials for microbiome-disease association.
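The walk-integration idea behind the KATZ measure discussed above can be sketched with a truncated sum over walk lengths, S = β·A + β²·A² + ... + β^k·A^k, where shorter walks receive larger weights. The plain adjacency matrix, the damping factor `beta`, and the maximum walk length used here are illustrative assumptions; KATZHMDA itself operates on an integrated matrix that also contains the similarity kernels.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def katz_scores(adjacency, beta=0.1, max_length=3):
    """Truncated Katz measure: sum of beta**l * A**l for l = 1..max_length.

    Entry (i, j) of A**l counts the walks of length l between nodes i and j,
    so the score accumulates all short walks with exponentially decaying weight.
    """
    n = len(adjacency)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adjacency]  # A**1
    weight = beta
    for _ in range(max_length):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = mat_mul(power, adjacency)  # next power of A
        weight *= beta
    return scores


# Toy network: node 0 (microbe) - node 1 (disease) - node 2 (microbe).
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = katz_scores(adjacency, beta=0.5, max_length=3)
```

In this toy network the two microbes are not directly linked, yet they receive a non-zero score because they share a disease neighbor, which is the intuition used for predicting unobserved associations.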

Conclusion

Microorganisms are involved in many life activities and affect their surrounding environment and other organisms. Microorganisms play important roles in human health, crop growth, livestock farming, environmental management, industrial chemical production and food production. In the 19th century, people first observed microbes using microscopes and began to study them. More recently, the development of high-throughput sequencing technology has led to the generation of large amounts of microbe-related data. As a result, machine-learning methods are now being applied to microbiological research. Here, we discussed the current application of ML to the microbiome. The review shows that ML is widely used in microbiological research and that it has focused on classification problems and the analysis of interactions. However, many problems remain unresolved and will require the cooperation of researchers from different fields, such as biology, informatics and medicine, to jointly promote the development and progress of microbiological research. In addition, recently developed link prediction (Liu et al., 2016; Zeng et al., 2017b) and computational intelligence methods (Cabarle et al., 2017; Song et al., 2018) are promising for discovering relationships between diseases and microbes.

Statements

Author contributions

KQ drafted the manuscript. FG and XL conducted research. YL modified the manuscript. QZ conceived the idea.

Funding

The work was supported by the National Key R&D Program of China (2018YFC0910405), and the National Natural Science Foundation of China (No. 61771331).

Acknowledgments

We thank Jeremy Kamen, MSc., from Liwen Bianji, Edanz Group China (www.liwenbianji.cn/ac), for editing the English text of a draft of this manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Alexander, M. (1962). Introduction of soil microbiology. Soil Sci. 93:74. doi: 10.1097/00010694-196201000-00034
2. Amgarten, D., Braga, L. P. P., da Silva, A. M., and Setubal, J. C. (2018). MARVEL, a tool for prediction of bacteriophage sequences in metagenomic bins. Front. Genet. 9:304. doi: 10.3389/fgene.2018.00304
3. Amsel, R., Totten, P. A., Spiegel, C. A., Chen, K. C., Eschenbach, D., and Holmes, K. K. (1983). Nonspecific vaginitis. Diagnostic criteria and microbial and epidemiologic associations. Am. J. Med. 74, 14–22. doi: 10.1016/0002-9343(83)91112-9
4. Asgari, E., Garakani, K., McHardy, A. C., and Mofrad, M. R. K. (2018). MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples. Bioinformatics 34, i32–i42. doi: 10.1093/bioinformatics/bty296
5. Atlas, R. M., and Bartha, R. (1981). Microbial ecology: fundamentals and applications. Acta Ecol. Sin. 70:977. doi: 10.1016/j.biortech.2015.07.074
6. Beck, D., and Foster, J. A. (2014). Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PLoS One 9:e87830. doi: 10.1371/journal.pone.0087830
7. Blaxter, M., Mann, J., Chapman, T., Thomas, F., Whitton, C., Floyd, R., et al. (2005). Defining operational taxonomic units using DNA barcode data. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 360, 1935–1943. doi: 10.1098/rstb.2005.1725
8. Bourne, D. G., Garren, M., Work, T. M., Rosenberg, E., Smith, G. W., and Harvell, C. D. (2009). Microbial disease and the coral holobiont. Trends Microbiol. 17, 554–562. doi: 10.1016/j.tim.2009.09.004
9. Cabarle, F. G. C., Adorna, H. N., Jiang, M., and Zeng, X. X. (2017). Spiking neural P systems with scheduled synapses. IEEE Trans. Nanobioscience 16, 792–801. doi: 10.1109/tnb.2017.2762580
10. Chang, H. X., Haudenshield, J. S., Bowen, C. R., and Hartman, G. L. (2017). Metagenome-wide association study and machine learning prediction of bulk soil microbiome and crop productivity. Front. Microbiol. 8:519. doi: 10.3389/fmicb.2017.00519
11. Chen, J., Guo, M. Y., Li, S. M., and Liu, B. (2017a). ProtDec-LTR2.0: an improved method for protein remote homology detection by combining pseudo protein and supervised Learning to Rank. Bioinformatics 33, 3473–3476. doi: 10.1093/bioinformatics/btx429
12. Chen, X., Huang, Y.-A., You, Z.-H., Yan, G.-Y., and Wang, X.-S. (2017b). A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics 33, 733–739. doi: 10.1093/bioinformatics/btw715
13. Chen, X., Wu, Q. F., and Yan, G. Y. (2017c). RKNNMDA: ranking-based KNN for MiRNA-disease association prediction. RNA Biol. 14, 952–962. doi: 10.1080/15476286.2017.1312226
14. Chen, X., Xie, D., Zhao, Q., and You, Z.-H. (2017d). MicroRNAs and complex diseases: from experimental results to computational models. Brief. Bioinform. 20, 515–539. doi: 10.1093/bib/bbx130
15. Chen, X., Yan, C. C., Zhang, X., and You, Z. H. (2017e). Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief. Bioinform. 18, 558–576. doi: 10.1093/bib/bbw060
16. Chen, J., Guo, M. Y., Wang, X. L., and Liu, B. (2018a). A comprehensive review and comparison of different computational methods for protein remote homology detection. Brief. Bioinform. 19, 231–244. doi: 10.1093/bib/bbw108
17. Chen, X., Huang, L., Xie, D., and Zhao, Q. (2018b). EGBMMDA: extreme gradient boosting machine for MiRNA-disease association prediction. Cell Death Dis. 9:3. doi: 10.1038/s41419-017-0003-x
18. Chen, X., Sun, Y.-Z., Guan, N.-N., Qu, J., Huang, Z.-A., Zhu, Z.-X., et al. (2018c). Computational models for lncRNA function prediction and functional similarity calculation. Brief. Funct. Genomics 18, 58–82. doi: 10.1093/bfgp/ely031
19. Chen, X., Wang, C. C., Yin, J., and You, Z. H. (2018d). Novel human miRNA-disease association inference based on random forest. Mol. Ther. Nucleic Acids 13, 568–579. doi: 10.1016/j.omtn.2018.10.005
20. Chen, X., Wang, L., Qu, J., Guan, N. N., and Li, J. Q. (2018e). Predicting miRNA-disease association based on inductive matrix completion. Bioinformatics 34, 4256–4265. doi: 10.1093/bioinformatics/bty503
21. Chen, X., Xie, D., Wang, L., Zhao, Q., You, Z. H., and Liu, H. S. (2018f). BNPMDA: bipartite network projection for MiRNA-disease association prediction. Bioinformatics 34, 3178–3186. doi: 10.1093/bioinformatics/bty333
22. Chen, X., Yin, J., Qu, J., and Huang, L. (2018g). MDHGI: matrix decomposition and heterogeneous graph inference for miRNA-disease association prediction. PLoS Comput. Biol. 14:e1006418. doi: 10.1371/journal.pcbi.1006418
23. Chen, W., Ding, H., Feng, P. M., Lin, H., and Chou, K. C. (2016). IACP: a sequence-based tool for identifying anticancer peptides. Oncotarget 7, 16895–16909. doi: 10.18632/oncotarget.7815
24. Chen, X. X., Tang, H., Li, W. C., Wu, H., Chen, W., Ding, H., et al. (2016). Identification of bacterial cell wall lyases via pseudo amino acid composition. Biomed Res. Int. 2016:1654623. doi: 10.1155/2016/1654623
25. Chen, X., and Huang, L. (2017). LRSSLMDA: Laplacian regularized sparse subspace learning for MiRNA-disease association prediction. PLoS Comput. Biol. 13:e1005912. doi: 10.1371/journal.pcbi.1005912
26. Chen, X., Yan, C. C., Luo, C., Ji, W., Zhang, Y., and Dai, Q. (2015). Constructing lncRNA functional similarity network based on lncRNA-disease associations and disease semantic similarity. Sci. Rep. 5:11338. doi: 10.1038/srep11338
27. Chen, X., and Yan, G. Y. (2013). Novel human lncRNA-disease association inference based on lncRNA expression profiles. Bioinformatics 29, 2617–2624. doi: 10.1093/bioinformatics/btt426
28. Cordier, T., Esling, P., Lejzerowicz, F., Visco, J., Ouadahi, A., Martins, C., et al. (2017). Predicting the ecological quality status of marine environments from eDNA metabarcoding data using supervised machine learning. Environ. Sci. Technol. 51, 9118–9126. doi: 10.1021/acs.est.7b01518
29. Cotter, P. D., Hill, C., and Ross, R. P. (2005). Food microbiology: bacteriocins: developing innate immunity for food. Nat. Rev. Microbiol. 3, 777–788. doi: 10.1038/nrmicro1273
30. Cui, Y., Ooi, B. C., Tan, K. L., and Jagadish, H. V. (2001). “Indexing the distance: an efficient method to KNN processing,” in Proceedings of the 27th VLDB Conference, Rome, 421–430.
31. Deng, Z. L., Gottschick, C., Bhuju, S., Masur, C., Abels, C., and Wagner-Dobler, I. (2018). Metatranscriptome analysis of the vaginal microbiota reveals potential mechanisms for protection against metronidazole in bacterial vaginosis. mSphere 3:e00262-18. doi: 10.1128/mSphereDirect.00262-18
32. DiMucci, D., Kon, M., and Segre, D. (2018). Machine learning reveals missing edges and putative interaction mechanisms in microbial ecosystem networks. mSystems 3:e00181-18. doi: 10.1128/mSystems.00181-18
33. Ding, Y. J., Tang, J. J., and Guo, F. (2017a). Identification of drug-target interactions via multiple information integration. Inf. Sci. 418, 546–560. doi: 10.1016/j.ins.2017.08.045
34. Ding, Y. J., Tang, J. J., and Guo, F. (2017b). Identification of protein-ligand binding sites by sequence information and ensemble classifier. J. Chem. Inf. Model. 57, 3149–3161. doi: 10.1021/acs.jcim.7b00307
35. Drucker, H., Wu, D., and Vapnik, V. N. (2002). Support vector machines for spam categorization. IEEE Trans. Neural Netw. 10, 1048–1054. doi: 10.1109/72.788645
36. Fan, C. Y., Lei, X. J., Guo, L., and Zhang, A. D. (2019). Predicting the associations between microbes and diseases by integrating multiple data sources and path-based HeteSim scores. Neurocomputing 323, 76–85. doi: 10.1016/j.neucom.2018.09.054
37. Feng, P. M., Chen, W., Lin, H., and Chou, K. C. (2013a). iHSP-PseRAAAC: identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal. Biochem. 442, 118–125. doi: 10.1016/j.ab.2013.05.024
38. Feng, P. M., Ding, H., Chen, W., and Lin, H. (2013b). Naive Bayes classifier with feature selection to identify phage virion proteins. Comput. Math. Methods Med. 2013:530696. doi: 10.1155/2013/530696
39. Feng, P. M., Lin, H., and Chen, W. (2013c). Identification of antioxidants from sequence information using naive Bayes. Comput. Math. Methods Med. 2013:567529. doi: 10.1155/2013/567529
40. Feng, P. M., Ding, H., Lin, H., and Chen, W. (2017a). AOD: the antioxidant protein database. Sci. Rep. 7:7449. doi: 10.1038/s41598-017-08115-6
41. Feng, P. M., Zhang, J. D., Tang, H., Chen, W., and Lin, H. (2017b). Predicting the organelle location of noncoding RNAs using pseudo nucleotide compositions. Interdiscip. Sci. Comput. Life Sci. 9, 540–544. doi: 10.1007/s12539-016-0193-4
42. Fiannaca, A., Paglia, L. L., Rosa, M. L., Bosco, G. L., Renda, G., Rizzo, R., et al. (2018). Deep learning models for bacteria taxonomic classification of metagenomic data. BMC Bioinformatics 19:198. doi: 10.1186/s12859-018-2182-6
43. Gillevet, P., Sikaroodi, M., Keshavarzian, A., and Mutlu, E. A. (2010). Quantitative assessment of the human gut microbiome using multitag pyrosequencing. Chem. Biodivers. 7, 1065–1075. doi: 10.1002/cbdv.200900322
44. Grazziotin, A. L., Koonin, E. V., and Kristensen, D. M. (2017). Prokaryotic Virus Orthologous Groups (pVOGs): a resource for comparative genomics and protein family annotation. Nucleic Acids Res. 45, D491–D498. doi: 10.1093/nar/gkw975
45. He, W. Y., Jia, C. Z., and Zou, Q. (2019). 4mCPred: machine learning methods for DNA N-4-methylcytosine sites prediction. Bioinformatics 35, 593–601. doi: 10.1093/bioinformatics/bty668
46. Hu, H., Zhang, L., Ai, H. X., Zhang, H., Fan, Y. T., Zhao, Q., et al. (2018). HLPI-Ensemble: prediction of human lncRNA-protein interactions based on ensemble strategy. RNA Biol. 15, 797–806. doi: 10.1080/15476286.2018.1457935
47. Hu, H., Zhu, C. Y., Ai, H. X., Zhang, L., Zhao, J., Zhao, Q., et al. (2017). LPI-ETSLP: lncRNA-protein interaction prediction using eigenvalue transformation-based semi-supervised link prediction. Mol. Biosyst. 13, 1781–1787. doi: 10.1039/c7mb00290d
48. Huang, Y. A., You, Z. H., Chen, X., Huang, Z. A., Zhang, S. W., and Yan, G. Y. (2017). Prediction of microbe-disease association from the integration of neighbor and graph with collaborative recommendation model. J. Transl. Med. 15:209. doi: 10.1186/s12967-017-1304-7
49. Huang, Z. A., Chen, X., Zhu, Z. X., Liu, H. S., Yan, G. Y., You, Z. H., et al. (2017). PBHMDA: path-based human microbe-disease association prediction. Front. Microbiol. 8:233. doi: 10.3389/fmicb.2017.00233
50. Johnson, H. R., Trinidad, D. D., Guzman, S., Khan, Z., Parziale, J. V., DeBruyn, J. M., et al. (2016). A machine learning approach for using the postmortem skin microbiome to estimate the postmortem interval. PLoS One 11:e0167370. doi: 10.1371/journal.pone.0167370
51. Jolliffe, I. T. (2002). Principal component analysis. J. Mark. Res. 87:513.
52. Jordan, A. (2008). On Discriminative vs. Generative classifiers: a comparison of logistic regression and naive Bayes. Neural Process. Lett. 28:169. doi: 10.1007/s11063-008-9088-7
53. Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika 18, 39–43. doi: 10.1007/BF02289026
54. Kira, K., and Rendell, L. A. (1992). “A practical approach to feature selection,” in Proceedings of the Ninth International Workshop on Machine Learning, Aberdeen. doi: 10.1016/B978-1-55860-247-2.50037-1
55. Langille, M. G. I., Zaneveld, J., Caporaso, J. G., McDonald, D., Knights, D., Reyes, J. A., et al. (2013). Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat. Biotechnol. 31, 814–821. doi: 10.1038/nbt.2676
56. Ley, R. E., Turnbaugh, P. J., Klein, S., and Gordon, J. I. (2006a). Microbial ecology: human gut microbes associated with obesity. Nature 444, 1022–1023.
57. Ley, R. E., Turnbaugh, P. J., Samuel, K., and Gordon, J. I. (2006b). Microbial ecology: human gut microbes associated with obesity. Nature 444, 1022–1023.
58. Li, Z., Tang, J. J., and Guo, F. (2016). Learning from real imbalanced data of 14-3-3 proteins binding specificity. Neurocomputing 217, 83–91. doi: 10.1016/j.neucom.2016.03.093
59. Liao, Y., and Vemuri, V. R. (2002). Use of K-Nearest Neighbor classifier for intrusion detection. Comput. Secur. 21, 439–448. doi: 10.1016/S0167-4048(02)00514-X
60. Liu, B., Jiang, S., and Zou, Q. (2018). HITS-PR-HHblits: protein remote homology detection by combining PageRank and Hyperlink-Induced Topic Search. Brief. Bioinform. 2018:bby104. doi: 10.1093/bib/bby104
61. Liu, B., Liu, F., Wang, X., Chen, J., Fang, L., and Chou, K.-C. (2015). Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Res. 43, W65–W71. doi: 10.1093/nar/gkv458
62. Liu, Y., Zeng, X., He, Z., and Zou, Q. (2016). Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans. Comput. Biol. Bioinform. 14, 905–915. doi: 10.1109/TCBB.2016.2550432
63. Maiden, M. C. J., Bygraves, J. A., Feil, E., Morelli, G., Russell, J. E., Urwin, R., et al. (1998). Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. U.S.A. 95, 3140–3145. doi: 10.1073/pnas.95.6.3140
64. Mainali, K. P., Bewick, S., Thielen, P., Mehoke, T., Breitwieser, F. P., Paudel, S., et al. (2017). Statistical analysis of co-occurrence patterns in microbial presence-absence datasets. PLoS One 12:e0187132. doi: 10.1371/journal.pone.0187132
65. Meena, M. J., and Chandran, K. R. (2009). “Naïve Bayes text classification with positive features selected by statistical method,” in Proceedings of the First International Conference on Advanced Computing, ICAC 2009, Los Alamitos: IEEE, 28–33. doi: 10.1109/ICADVC.2009.5378273
66. Moitinho-Silva, L., Steinert, G., Nielsen, S., Hardoim, C. C. P., Wu, Y. C., McCormack, G. P., et al. (2017). Predicting the HMA-LMA status in marine sponges by machine learning. Front. Microbiol. 8:752. doi: 10.3389/fmicb.2017.00752
67. Moran, M. A. (2015). The global ocean microbiome. Science 350:aac8455. doi: 10.1126/science.aac8455
68. Morris, O. N., Cunningham, J. C., Finneycrawley, J. R., Jaques, R. P., and Kinoshita, G. (1986). Microbial insecticides in Canada: their registration and use in agriculture, forestry and public and animal health. Bull. Entomol. Soc. Canada 18, 1–43.
69. Murali, A., Bhargava, A., and Wright, E. S. (2018). IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences. Microbiome 6:140. doi: 10.1186/s40168-018-0521-5
70. Nannipieri, P., Ascher, J., Ceccherini, M. T., Landi, L., Pietramellara, G., and Renella, G. (2010). Microbial diversity and soil functions. Eur. J. Soil Sci. 54, 655–670. doi: 10.1046/j.1351-0754.2003.0556.x
71. Niel, C. B. V. (1966). Microbiology and molecular biology. Q. Rev. Biol. 41, 105–112. doi: 10.1086/404937
72. Nowrousian, M. (2010). Next-generation sequencing techniques for eukaryotic microorganisms: sequencing-based solutions to biological problems. Eukaryot. Cell 9, 1300–1310. doi: 10.1128/EC.00123-10
73. Nugent, R. P., Krohn, M. A., and Hillier, S. L. (1991). Reliability of diagnosing bacterial vaginosis is improved by a standardized method of gram stain interpretation. J. Clin. Microbiol. 29, 297–301.
74. Oudah, M., and Henschel, A. (2018). Taxonomy-aware feature engineering for microbiome classification. BMC Bioinformatics 19:227. doi: 10.1186/s12859-018-2205-3
75. Pan, G. F., Jiang, L. M., Tang, J. J., and Guo, F. (2018). A novel computational method for detecting DNA methylation sites with DNA sequence information and physicochemical properties. Int. J. Mol. Sci. 19:E511. doi: 10.3390/ijms19020511
76. Peng, L. H., Yin, J., Zhou, L. Q., Liu, M. X., and Zhao, Y. (2018). Human microbe-disease association prediction based on adaptive boosting. Front. Microbiol. 9:2440. doi: 10.3389/fmicb.2018.02440
77. Petrof, E. O., Claud, E. C., Gloor, G. B., and Allenvercoe, E. (2012). Microbial ecosystems therapeutics: a new paradigm in medicine? Benef. Microbes 4, 53–65. doi: 10.3920/BM2012.0039
78. Podani, J., and Miklós, I. (2002). Resemblance coefficients and the horseshoe effect in principal coordinates analysis. Ecology 83, 3331–3343. doi: 10.1890/0012-9658(2002)083[3331:RCATHE]2.0.CO;2
79. Qu, K. Y., Han, K., Wu, S., Wang, G. H., and Wei, L. Y. (2017). Identification of DNA-binding proteins using mixed feature representation methods. Molecules 22:E1602. doi: 10.3390/molecules22101602
80. Ravel, J., Gajer, P., Abdo, Z., Schneider, G. M., Koenig, S. S., McCulle, S. L., et al. (2011). Vaginal microbiome of reproductive-age women. Proc. Natl. Acad. Sci. U.S.A. 108(Suppl. 1), 4680–4687. doi: 10.1073/pnas.1002611107
81. Reiff, C., and Kelly, D. (2010). Inflammatory bowel disease, gut bacteria and probiotic therapy. Int. J. Med. Microbiol. 300, 25–33. doi: 10.1016/j.ijmm.2009.08.004
82. Ren, J., Ahlgren, N. A., Lu, Y. Y., Fuhrman, J. A., and Sun, F. Z. (2017). VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome 5:69. doi: 10.1186/s40168-017-0283-5
83. Rodríguez, J. J., and Kuncheva, L. I. (2007). “Naïve Bayes ensembles with a random oracle,” in Lecture Notes in Computer Science, Vol. 4472, eds M. Haindl, J. Kittler, and F. Roli (Berlin: Springer), 450–458.
84. Roux, S., Enault, F., Hurwitz, B. L., and Sullivan, M. B. (2015). VirSorter: mining viral signal from microbial genomic data. PeerJ 3:e985. doi: 10.7717/peerj.985
85. Schmedes, S. E., Woerner, A. E., Novroski, N. M. M., Wendt, F. R., King, J. L., Stephens, K. M., et al. (2018). Targeted sequencing of clade-specific markers from skin microbiomes for forensic human identification. Forensic Sci. Int. Genet. 32, 50–61. doi: 10.1016/j.fsigen.2017.10.004
86. Schmidt, T. S. B., Matias Rodrigues, J. F., and von Mering, C. (2014). Ecological consistency of SSU rRNA-based operational taxonomic units at a global scale. PLoS Comput. Biol. 10:e1003594. doi: 10.1371/journal.pcbi.1003594
87. Shi, J. Y., Huang, H., Zhang, Y. N., Cao, J. B., and Yiu, S. M. (2018). BMCMDA: a novel model for predicting human microbe-disease associations via binary matrix completion. BMC Bioinformatics 19, 169–176. doi: 10.1186/s12859-018-2274-3
88. Shi, J. Y., Li, J. X., and Lu, H. M. (2016). Predicting existing targets for new drugs base on strategies for missing interactions. BMC Bioinformatics 17(Suppl. 8):282. doi: 10.1186/s12859-016-1118-2
89. Sibley, C. D., Parkins, M. D., Rabin, H. R., Kangmin, D., Norgaard, J. C., and Surette, M. G. (2008). A polymicrobial perspective of pulmonary infections exposes an enigmatic pathogen in cystic fibrosis patients. Proc. Natl. Acad. Sci. U.S.A. 105, 15070–15075. doi: 10.1073/pnas.0804326105
90. Song, T., Rodriguez-Paton, A., Zheng, P., and Zeng, X. X. (2018). Spiking neural P systems with colored spikes. IEEE Trans. Cogn. Dev. Syst. 10, 1106–1115. doi: 10.1109/tcds.2017.2785332
91. Souza, P. M. D. (2010). Application of microbial α-amylase in industry – A review. Braz. J. Microbiol. 41, 850–861. doi: 10.1590/S1517-83822010000400004
92. Srinivasan, S., Hoffman, N. G., Morgan, M. T., Matsen, F. A., Fiedler, T. L., Hall, R. W., et al. (2012). Bacterial communities in women with bacterial vaginosis: high resolution phylogenetic analyses reveal relationships of microbiota to clinical criteria. PLoS One 7:e37818. doi: 10.1371/journal.pone.0037818
93. Statnikov, A., Henaff, M., Narendra, V., Konganti, K., Li, Z. G., Yang, L. Y., et al. (2013). A comprehensive evaluation of multicategory classification methods for microbiomic data. Microbiome 1:11. doi: 10.1186/2049-2618-1-11
94. Stoter, F. R., Chakrabarty, S., Edler, B., and Habets, E. A. P. (2019). CountNet: estimating the number of concurrent speakers using supervised learning. IEEE/ACM Trans. Audio Speech Lang. Process. 27, 268–282. doi: 10.1109/taslp.2018.2877892
95. Su, R., Wu, H., Xu, B., Liu, X., and Wei, L. (2018). Developing a multi-dose computational model for drug-induced hepatotoxicity prediction based on toxicogenomics data. IEEE/ACM Trans. Comput. Biol. Bioinform. doi: 10.1109/tcbb.2018.2858756 [Epub ahead of print].
96. Sujatha, S., Hoffman, N. G., Morgan, M. T., Matsen, F. A., Fiedler, T. L., Hall, R. W., et al. (2012). Bacterial communities in women with bacterial vaginosis: high resolution phylogenetic analyses reveal relationships of microbiota to clinical criteria. PLoS One 7:e37818. doi: 10.1371/journal.pone.0037818
97. Svetnik, V., Liaw, A., Tong, C., Culberson, J. C., Sheridan, R. P., and Feuston, B. P. (2003). Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 43, 1947–1958. doi: 10.1021/ci034160g
98. Waldron, L. (2018). Data and statistical methods to analyze the human microbiome. mSystems 3:e00194-17. doi: 10.1128/mSystems.00194-17
99. Wang, F., Huang, Z. A., Chen, X., Zhu, Z. X., Wen, Z. K., Zhao, J. Y., et al. (2017). LRLSHMDA: Laplacian regularized least squares for human microbe-disease association prediction. Sci. Rep. 7:7601. doi: 10.1038/s41598-017-08127-2
100. Wei, L. Y., Chen, H. R., and Su, R. (2018a). M6APred-EL: a sequence-based predictor for identifying N6-methyladenosine sites using ensemble learning. Mol. Ther. Nucleic Acids 12, 635–644. doi: 10.1016/j.omtn.2018.07.004
101. Wei, L. Y., Zhou, C., Chen, H. R., Song, J. N., and Su, R. (2018b). ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides. Bioinformatics 34, 4007–4016. doi: 10.1093/bioinformatics/bty451
102. Wei, L. Y., Wan, S. X., Guo, J. S., and Wong, K. K. L. (2017a). A novel hierarchical selective ensemble classifier with bioinformatics application. Artif. Intell. Med. 83, 82–90. doi: 10.1016/j.artmed.2017.02.005
103. Wei, L. Y., Xing, P. W., Zeng, J. C., Chen, J. X., Su, R., and Guo, F. (2017b). Improved prediction of protein-protein interactions using novel negative samples, features, and an ensemble classifier. Artif. Intell. Med. 83, 67–74. doi: 10.1016/j.artmed.2017.03.001
104. Weinbauer, M. G. (2010). Ecology of prokaryotic viruses. FEMS Microbiol. Rev. 28, 127–181. doi: 10.1016/j.femsre.2003.08.001
105. White, J. R., Nagarajan, N., and Pop, M. (2009). Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput. Biol. 5:e1000352. doi: 10.1371/journal.pcbi.1000352
106. Wisittipanit, N. (2012). Machine Learning Approach for Profiling Human Microbiome. Ph.D. dissertation, George Mason University, Fairfax, VA. Available at: https://search.proquest.com/docview/1009703926?accountid=45721 (accessed April 8, 2019).
107. Xiaofei, Y., Lin, G., Xingli, G., Xinghua, S., Hao, W., Fei, S., et al. (2014). A network based method for analysis of lncRNA-disease associations and prediction of lncRNAs implicated in diseases. PLoS One 9:e87797. doi: 10.1371/journal.pone.0087797
108. Xie, K., Guo, L., Bai, Y., Liu, W., Yan, J., and Bucher, M. (2018). Microbiomics and plant health: an interdisciplinary and international workshop on the plant microbiome. Mol. Plant 12, 1–3. doi: 10.1016/j.molp.2018.11.004
109. Xu, L., Liang, G. M., Liao, C. R., Chen, G. D., and Chang, C. C. (2018a). An efficient classifier for Alzheimer’s disease genes identification. Molecules 23:E3140. doi: 10.3390/molecules23123140
110. Xu, L., Liang, G. M., Shi, S. H., and Liao, C. R. (2018b). SeqSVM: a sequence-based support vector machine method for identifying antioxidant proteins. Int. J. Mol. Sci. 19:E1773. doi: 10.3390/ijms19061773
111. Xu, L., Liang, G. M., Wang, L. J., and Liao, C. R. (2018c). A novel hybrid sequence-based model for identifying anticancer peptides. Genes 9:E158. doi: 10.3390/genes9030158
112. Xuezhong, Z., Jörg, M., Albert-László, B., and Amitabh, S. (2014). Human symptoms-disease network. Nat. Commun. 5:4212. doi: 10.1038/ncomms5212
113. Yang, H., Lv, H., Ding, H., Chen, W., and Lin, H. (2018a). iRNA-2OM: a sequence-based predictor for identifying 2′-O-methylation sites in Homo sapiens. J. Comput. Biol. 25, 1266–1277. doi: 10.1089/cmb.2018.0004
114. Yang, H., Qiu, W. R., Liu, G. Q., Guo, F. B., Chen, W., Chou, K. C., et al. (2018b). iRSpot-Pse6NC: identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC. Int. J. Biol. Sci. 14, 883–891. doi: 10.7150/ijbs.24616
115. Yang, H., Tang, H., Chen, X. X., Zhang, C. J., Zhu, P. P., Ding, H., et al. (2016). Identification of secretory proteins in Mycobacterium tuberculosis using pseudo amino acid composition. Biomed Res. Int. 2016:5413903. doi: 10.1155/2016/5413903
116. Yeom, S., and Javidi, B. (2006). Automatic identification of biological microorganisms using three-dimensional complex morphology. J. Biomed. Opt. 11:024017.
117. Yu, L., Huang, J. B., Ma, Z. X., Zhang, J., Zou, Y. P., and Gao, L. (2015). Inferring drug-disease associations based on known protein complexes. BMC Med. Genomics 8(Suppl. 2):S2. doi: 10.1186/1755-8794-8-s2-s2
118. Yu, L., Ma, X. K., Zhang, L., Zhang, J., and Gao, L. (2016a). Prediction of new drug indications based on clinical data and network modularity. Sci. Rep. 6:32530. doi: 10.1038/srep32530
119. Yu, L., Wang, B. B., Ma, X. K., and Gao, L. (2016b). The extraction of drug-disease correlations based on module distance in incomplete human interactome. BMC Syst. Biol. 10(Suppl. 4):111. doi: 10.1186/s12918-016-0364-2
120. Yu, L., Su, R. D., Wang, B. B., Zhang, L., Zou, Y. P., Zhang, J., et al. (2017a). Prediction of novel drugs for hepatocellular carcinoma based on multi-source random walk. IEEE/ACM Trans. Comput. Biol. Bioinform. 14, 966–977. doi: 10.1109/tcbb.2016.2550453
121. Yu, L., Zhao, J., and Gao, L. (2017b). Drug repositioning based on triangularly balanced structure for tissue-specific diseases in incomplete interactome. Artif. Intell. Med. 77, 53–63. doi: 10.1016/j.artmed.2017.03.009
122. Yu, L., Zhao, J., and Gao, L. (2018). Predicting potential drugs for breast cancer based on miRNA and tissue specificity. Int. J. Biol. Sci. 14, 971–980. doi: 10.7150/ijbs.23350
123. Zeng, X. X., Ding, N. X., Rodriguez-Paton, A., and Zou, Q. (2017a). Probability-based collaborative filtering model for predicting gene-disease associations. BMC Med. Genomics 10(Suppl. 5):76. doi: 10.1186/s12920-017-0313-y
124. Zeng, X. X., Lin, W., Guo, M. Z., and Zou, Q. (2017b). A comprehensive overview and evaluation of circular RNA detection tools. PLoS Comput. Biol. 13:e1005420. doi: 10.1371/journal.pcbi.1005420
125. Zeng, X. X., Liu, L., Lu, L. Y., and Zou, Q. (2018). Prediction of potential disease-associated microRNAs using structural perturbation method. Bioinformatics 34, 2425–2432. doi: 10.1093/bioinformatics/bty112
126. Zhang, X., Zou, Q., Rodriguez-Paton, A., and Zeng, X. X. (2019). Meta-path methods for prioritizing candidate disease miRNAs. IEEE/ACM Trans. Comput. Biol. Bioinform. 16, 283–291. doi: 10.1109/tcbb.2017.2776280
127. Zhao, Q., Liang, D., Hu, H., Ren, G. F., and Liu, H. S. (2018a). RWLPAP: random walk for lncRNA-protein associations prediction. Protein Pept. Lett. 25, 830–837. doi: 10.2174/0929866525666180905104904
128. Zhao, Q., Yu, H., Ming, Z., Hu, H., Ren, G., and Liu, H. (2018b). The bipartite network projection-recommended algorithm for predicting long non-coding RNA-protein interactions. Mol. Ther. Nucleic Acids 13, 464–471. doi: 10.1016/j.omtn.2018.09.020
129. Zhao, Q., Zhang, Y., Hu, H., Ren, G. F., Zhang, W., and Liu, H. S. (2018c). IRWNRLPI: integrating random walk and neighborhood regularized logistic matrix factorization for lncRNA-protein interaction prediction. Front. Genet. 9:239. doi: 10.3389/fgene.2018.00239
130. Zitnik, M., Nguyen, F., Wang, B., Leskovec, J., Goldenberg, A., and Hoffman, M. M. (2019). Machine learning for integrating data in biology and medicine: principles, practice, and opportunities. Int. J. Inf. Fusion 50, 71–91. doi: 10.1016/j.inffus.2018.09.012
131. Zou, Q., Chen, L., Huang, T., Zhang, Z. G., and Xu, Y. G. (2017). Machine learning and graph analytics in computational biomedicine. Artif. Intell. Med. 83:1. doi: 10.1016/j.artmed.2017.09.003
132. Zou, Q., Li, J. J., Song, L., Zeng, X. X., and Wang, G. H. (2016). Similarity computation strategies in the microRNA-disease network: a survey. Brief. Funct. Genomics 15, 55–64. doi: 10.1093/bfgp/elv024
133. Zou, Q., Lin, G., Jiang, X., Liu, X., and Zeng, X. (2018a). Sequence clustering in bioinformatics: an empirical study. Brief. Bioinform. bby090. doi: 10.1093/bib/bby090
134. Zou, Q., Qu, K. Y., Luo, Y. M., Yin, D. H., Ju, Y., and Tang, H. (2018b). Predicting diabetes mellitus with machine learning techniques. Front. Genet. 9:515. doi: 10.3389/fgene.2018.00515

Summary

Keywords

microorganisms, classification, environment, species, association, diseases

Citation

Qu K, Guo F, Liu X, Lin Y and Zou Q (2019) Application of Machine Learning in Microbiology. Front. Microbiol. 10:827. doi: 10.3389/fmicb.2019.00827

Received

31 January 2019

Accepted

01 April 2019

Published

18 April 2019

Volume

10 - 2019

Edited by

Hongsheng Liu, Liaoning University, China

Reviewed by

Yen-Wei Chu, National Chung Hsing University, Taiwan; Mohamed Elhoseny, Mansoura University, Egypt

Copyright

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yuan Lin, linyuan1979@gmail.com; Quan Zou, zouquan@nclab.net

This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
