The application of machine learning in clinical microbiology and infectious diseases

Xu, Cheng; Zhao, Ling-Yun; Ye, Cun-Si; Xu, Ke-Chen; Xu, Ke-Yang

doi:10.3389/fcimb.2025.1545646

REVIEW article

Front. Cell. Infect. Microbiol., 01 May 2025

Sec. Clinical Infectious Diseases

Volume 15 - 2025 | https://doi.org/10.3389/fcimb.2025.1545646

This article is part of the Research Topic: Unravelling Host-Pathogen Interactions in Bacterial Infection: Insights from Omics and Machine Learning


Cheng Xu¹†

Ling-Yun Zhao²†

Cun-Si Ye³

Ke-Chen Xu⁴,⁵*

Ke-Yang Xu⁶*

¹Clinical Laboratory of Chun’an First People’s Hospital, Zhejiang Provincial People’s Hospital Chun’an Branch, Hangzhou Medical College Affiliated Chun’an Hospital, Hangzhou, Zhejiang, China
²Department of Medicine & Therapeutics, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR, China
³Department of Clinical Laboratory Medicine, Institution of Microbiology and Infectious Diseases, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang, Hunan, China
⁴School of Psychology, Zhejiang Normal University, Jinhua, China
⁵Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Zhejiang Normal University, Jinhua, China
⁶Faculty of Chinese Medicine, and State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Macao SAR, China

With the development of artificial intelligence (AI) in computer science and statistics, AI methods have increasingly been applied to medicine. These applications include the management of infectious diseases, in which machine learning has made inroads into clinical microbiology, radiology, genomics, and the analysis of electronic health record data. In microbiology in particular, machine learning has become increasingly prominent: it is used in etiological diagnosis, prediction of antibiotic resistance, studies of the association between human microbiome characteristics and complex host diseases, prognosis, and the prevention and control of infectious diseases. Machine learning in microbiology mainly relies on supervised and unsupervised learning, with algorithms ranging from classification and regression to clustering and dimensionality reduction. This Review explains crucial concepts in machine learning for unfamiliar readers, describes current applications of machine learning in clinical microbiology and infectious diseases, and summarizes important considerations clinicians must be aware of when evaluating research that uses machine learning.

Introduction

In 1956, John McCarthy and colleagues founded the field of artificial intelligence (AI) at a conference at Dartmouth College, spawning a new interdisciplinary field of study (Kaul et al., 2020). AI is a technical science that studies and develops theories, methods, technologies, and application systems used to simulate, extend, and expand human intelligence. It involves robotics, language recognition, image recognition, natural language processing, expert systems, machine learning, computer vision, and related areas. The application of AI in medicine has two main branches: virtual and physical. The virtual branch is represented by machine learning, which uses mathematical algorithms that improve through experience (Kaul et al., 2020). The physical branch includes physical objects, medical devices, and increasingly sophisticated robots that take part in the delivery of care (Cornet, 2013).

Machine learning emerged at the intersection of statistics and computer science, a convergence driven by the unique computational challenge of building statistical models from massive data sets (Deo, 2015). There are three main types of machine learning algorithms: unsupervised learning, supervised learning, and reinforcement learning. From the perspective of medicine, machine learning’s substantial progress carries potential implications across the scope of practice, including drug research, disease diagnosis, risk stratification and prognosis, treatment planning, and advances in precision medicine approaches (Deo, 2015; Radakovich et al., 2020). Data from various omics sources such as genetics, proteomics, and metabolomics can be integrated using predictive algorithms to unravel the intricate workings of systems biology, for example through the discovery of new biomarkers (Reel et al., 2021). These biomarkers have the potential to help in accurate disease prediction, patient stratification, and delivery of precision medicine (Reel et al., 2021). The process of building a machine learning model, including data processing, feature encoding, model training, and model evaluation, is shown in Figure 1.

Figure 1. Development and application of an example machine learning model. Building a machine learning model proceeds from data processing and feature encoding through model training and model evaluation and selection to prediction on test data. Applications in clinical microbiology and infectious diseases are also shown.

Applications of machine learning in clinical microbiology and infectious diseases are expanding quickly and now include etiological diagnosis, prediction of antibiotic resistance, studies of the association between human microbiome characteristics and complex host diseases, prognosis, and the prevention and control of infectious diseases (Asnicar et al., 2024). In this Review, we aim to help clinical staff grasp the important concepts and basic applications of machine learning, from designing their own experiments to critically assessing published work. First, we introduce supervised and unsupervised machine learning techniques, focusing especially on multi-omics data analysis. We also examine common algorithmic approaches: for example, dimensionality reduction is frequently used for exploratory microbiological investigations, and feature selection is key to identifying the most relevant aspects of a microbiological phenomenon. Second, we summarize the applications of machine learning in clinical microbiology and infectious diseases, from diagnosis, risk stratification, prognosis, treatment selection, and response prediction to infectious disease prevention and control and multi-omics applications. Third, we highlight the key elements of machine learning for clinical staff, including how to evaluate machine learning models and how to apply them in real-world scenarios while minimizing potential bias.

Supervised machine learning

Supervised machine learning uses training sets of input/output pairs to build models designed to make accurate predictions about new, previously unseen data. Building the training set usually requires a certain amount of manual effort, mainly to label data features such as the gene or genome sequence of a strain, or phenotypic information obtained from in vitro experiments on the strain. The labels usually define the outcome of interest: for example, to train an algorithm for sepsis prediction, we use a dataset in which patients are already defined as having sepsis or not (Peiffer-Smadja et al., 2020). Supervised learning has also been used to improve outbreak detection of Salmonella and Campylobacter infections using routine surveillance data (Zacher and Czogiel, 2022), to diagnose childhood febrile illness using a multi-class blood RNA molecular signature (Habgood-Coote et al., 2023) and inflammatory bowel diseases using the gut microbiome (Manandhar et al., 2021), and to screen for and type diabetes using gut microbiome metagenomic hypervariable features (Chavarria et al., 2025). Labeled data are usually divided into training, validation, and held-out test sets, and the final model is then applied to new unlabeled data. Common algorithms for supervised machine learning include Decision Trees (DT), Ordinary Least Squares Regression (OLS), Naive Bayes classification (NB), Logistic Regression (LR), Support Vector Machines (SVM), Ensemble Methods (EM), Random Forests (RF), Linear Discriminant Analysis (LDA), the k-nearest neighbor algorithm (k-NN), Multi-layer Perceptrons (MLP), and Convolutional Neural Networks (CNN).
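As a minimal sketch of how labeled records might be partitioned before model training, the following Python example shuffles a toy dataset and splits it into training, validation, and test subsets. It assumes the common convention of holding out part of the labeled data for testing. The function name `split_labeled_data`, the 60/20/20 proportions, and the toy sepsis labels are illustrative assumptions and are not taken from any of the cited studies.

```python
import random


def split_labeled_data(records, train=0.6, validation=0.2, seed=0):
    """Shuffle labeled records and split them into train/validation/test lists."""
    if not (0 < train < 1 and 0 < validation < 1 and train + validation < 1):
        raise ValueError("train and validation must be fractions summing to < 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains is the held-out test set
    )


# Hypothetical toy dataset: (patient_id, has_sepsis) pairs.
patients = [(i, i % 3 == 0) for i in range(50)]
train_set, validation_set, test_set = split_labeled_data(patients)
print(len(train_set), len(validation_set), len(test_set))
```

When the outcome of interest is rare, as sepsis often is, a stratified split that preserves the outcome proportion in each subset may be preferable to this simple random split.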

There are two main types of supervised machine learning problems: classification and regression. When the outcome variable is a categorical label (for example, a taxonomic label), the task is referred to as classification, whereas regression refers to the case in which the outcome variable is continuous and numerical (for example, the optimal pH for a bacterium to grow) (Asnicar et al., 2024). Put differently, supervised learning focuses on classification, which involves choosing among subgroups to best describe a new instance of data, and prediction, which involves estimating an unknown parameter (Deo, 2015). For example, RF and gradient boosting (GB) models have been successfully applied to predict malaria using patient information (Lee et al., 2021), and graph-based MLP and RF models effectively diagnosed influenza and hepatitis, respectively (Alqaissi et al., 2023). In addition, machine learning algorithms have been developed for the taxonomic classification of 16S rRNA genes from isolate sequences or of 16S rRNA gene fragments from microbiome experiments, including k-mer profiling and support vector machines (SVMs) (McHardy et al., 2007; Diaz et al., 2009; Gregor et al., 2016; Vervier et al., 2016). Each algorithm has its own advantages and disadvantages. The most important advantage of RF is that training can be highly parallelized, which speeds up training on large samples in the era of big data (Hu and Szymczak, 2023); however, RF models tend to overfit on noisy sample sets and on features with many distinct values (Hu and Szymczak, 2023). The SVM algorithm is very effective for high-dimensional classification and regression problems and still performs well when the number of features exceeds the number of samples (Valkenborg et al., 2023); however, it is not well suited to very large data sets and is sensitive to missing data (Valkenborg et al., 2023). The main advantages of the DT algorithm are that data preprocessing is relatively simple and missing data can be handled (Kingsford and Salzberg, 2008); however, DTs overfit very easily, which results in poor generalization, and they are not well suited to samples with complex relationships among features (Kingsford and Salzberg, 2008). The k-NN regression method, known for its nonparametric nature, is highly valued for its simplicity and its effectiveness in handling complex structured data, particularly in big data contexts (Srisuradetchai and Suksrikran, 2024); however, it is susceptible to overfitting and to discontinuities in the fitted function, which present significant challenges (Srisuradetchai and Suksrikran, 2024). The naive Bayes algorithm offers stable classification performance, tolerates missing data, and scales to large data sets (Awaysheh et al., 2019); however, its results are often influenced by the assumed prior probability model (Awaysheh et al., 2019). CNNs achieve high classification accuracy and are robust and tolerant to noise (Shan et al., 2021); however, they require a large number of parameters to be set, such as the network topology, weights, and initial threshold values (Shan et al., 2021).
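To make the idea of k-mer profiling for sequence classification more concrete, the following Python sketch converts DNA sequences into normalized k-mer frequency vectors and classifies a query with a simple k-NN majority vote. It is a toy illustration under stated assumptions: the sequences, the taxon labels, and the function names `kmer_profile` and `knn_predict` are hypothetical, and k-NN is used here only because it is short to write, not because the cited 16S rRNA classifiers use it.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=2):
    """Return normalized k-mer frequencies as a fixed-length vector (4**k entries)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def knn_predict(query, training, n_neighbors=3):
    """Majority vote among the n_neighbors closest profiles (Euclidean distance)."""
    ranked = sorted(training, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical toy sequences; real 16S rRNA fragments are far longer.
labeled = [
    ("GCGCGGCCGCGCGGCC", "GC-rich taxon"),
    ("GGCCGCGCCGGCGCGC", "GC-rich taxon"),
    ("CCGGCGCGGCGCCGCG", "GC-rich taxon"),
    ("ATATTAATATTATAAT", "AT-rich taxon"),
    ("TTAATATAATTATATA", "AT-rich taxon"),
    ("AATTATATTATAATAT", "AT-rich taxon"),
]
training = [(kmer_profile(seq), label) for seq, label in labeled]
print(knn_predict(kmer_profile("GCGGCCGCGCGCCGGC"), training))
```

The same profiles could be passed to any of the classifiers discussed above; the choice of k trades off resolution against the number of features, since the vector length grows as 4 to the power k.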

Unsupervised machine learning

Unsupervised learning solves pattern recognition problems using training samples whose class is unknown (unlabeled), such as finding subsets of patients with similar expression levels in a gene expression study (Altman and Krzywinski, 2017) or predicting mutation effects from gene sequence co-variation (Hopf et al., 2017). As another example, when gene expression is measured in each cell of an isogenic bacterial population at different time points during liquid batch culture, an unsupervised learning algorithm can group cells with similar gene expression, identifying growth stages and giving an overview of the growth pattern (Asnicar et al., 2024). There are two types of unsupervised learning: unsupervised transformations and clustering algorithms. An unsupervised transformation creates a new representation of a data set that may be easier for humans or other machine learning algorithms to understand than the original representation. A clustering algorithm divides the data into distinct groups of similar items.

Dimensionality reduction, a common application of unsupervised transformation, takes high-dimensional data containing many features and summarizes its important properties with fewer features. For example, dimensionality reduction has been applied to visualize single-cell data (Amouzgar et al., 2022; Becht et al., 2018; Jiang et al., 2023), investigate the diversity of tuberculosis spoligotypes (Senelle et al., 2022), deconvolute gut microbial community dynamics (Martino et al., 2021), distill complex evolutionary relationships in seasonal influenza and SARS-CoV-2 (Nanduri et al., 2024), characterize circulating innate lymphoid cells in inflammatory bowel disease (Mazzurana et al., 2021), and study treatment of human immunodeficiency virus infection (Choi et al., 2023). These techniques comprise both linear and non-linear transformations of the data, including principal component analysis (PCA) (Tsuyuzaki et al., 2020), Uniform Manifold Approximation and Projection (UMAP) (Becht et al., 2018), and t-distributed stochastic neighbor embedding (t-SNE) (Kobak and Berens, 2019). PCA reduces the dimensionality of large datasets, increasing interpretability while minimizing information loss (Jolliffe and Cadima, 2016). UMAP is an alternative method that can reduce the dimensionality of beta diversity distance matrices; for example, it can reveal composite patterns and resolve visualization artifacts in microbiome data (Armstrong et al., 2021). Common data analysis pipelines, for example in single-cell transcriptomics, include a dimensionality reduction step for visualizing the data in two dimensions, most frequently performed using t-SNE (Kobak and Berens, 2019). Dimensionality reduction can improve computing efficiency, reduce storage requirements, remove redundant features, speed up downstream processing, and aid visualization. Its main disadvantages include loss of information, decreased interpretability of the transformed features, and higher technical barriers.
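As an illustration of linear dimensionality reduction, the sketch below estimates the first principal component of a small feature table by power iteration on the sample covariance matrix and projects each sample onto it. This is a simplified stand-in for a full PCA implementation, which would typically use a singular value decomposition from a numerical library. The function name `first_principal_component` and the three-feature toy table are assumptions made for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the leading principal axis by power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    axis = [1.0] * d  # arbitrary non-zero starting vector
    for _ in range(iterations):
        axis = [sum(cov[a][b] * axis[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in axis))
        axis = [v / norm for v in axis]  # renormalize to unit length
    # One score per sample: its coordinate along the leading axis.
    scores = [sum(c[j] * axis[j] for j in range(d)) for c in centered]
    return axis, scores


# Hypothetical table: feature 2 is roughly twice feature 1, feature 3 is nearly constant.
profiles = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
    [5.0, 10.1, 0.5],
]
axis, scores = first_principal_component(profiles)
print([round(v, 3) for v in axis])
print([round(s, 2) for s in scores])
```

Here three correlated features collapse onto a single score per sample, which illustrates both the benefit (a compact summary) and the cost (the score is a weighted mixture that is harder to interpret than any original feature).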

Clustering methods predict groupings of similar data points in a dataset and are usually based on some measure of similarity between data points (Greener et al., 2022). For example, clustering has been applied to analyze gene sequences (Ali et al., 2022; He et al., 2023), visualize the sequence-structure-function relationships of protein networks (Mai et al., 2016; Mirdita et al., 2017; Yeung et al., 2023), analyze densely sampled human gut microbiome time series (Benincà et al., 2023), categorize countries into homogeneous subgroups based on the joint patterns of HIV/AIDS and TB mortality rates (Mobaderi et al., 2025), and detect infectious disease transmission outbreaks from sequence variation (McCloskey and Poon, 2017). Common clustering algorithms include K-means clustering, agglomerative clustering, and density-based spatial clustering of applications with noise (DBSCAN). K-means clustering can identify diverse clinical phenotypes in COVID-19 patients (Garcia-Vidal et al., 2024) and estimate bacterial community composition (Koslicki et al., 2015). Agglomerative clustering can reveal distinct community types of bacterial populations (Bezek et al., 2020; Ghosh et al., 2022). The advantages and limitations of clustering depend on the algorithm. Clustering is generally flexible and simple, and some algorithms, such as DBSCAN, can handle noise and do not require the number of clusters to be specified in advance. Others, such as K-means, require the number of clusters to be fixed in advance, and clustering in general can be computationally expensive and sensitive to parameter choices.
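The following Python sketch shows the basic loop behind K-means clustering (Lloyd's algorithm): assign each point to its nearest centroid, recompute centroids, and repeat until assignments stop changing. It is a minimal illustration under stated assumptions: the function name `k_means`, the deterministic initialization from the first k points, and the two-feature toy measurements are hypothetical, and production implementations typically use more careful initialization.

```python
import math


def k_means(points, k, iterations=100):
    """Lloyd's algorithm, initialized with the first k points as centroids."""
    centroids = [list(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Hypothetical two-feature measurements forming two well-separated groups.
samples = [(1.0, 1.2), (8.0, 8.1), (1.1, 0.9), (0.9, 1.0), (8.2, 7.9), (7.9, 8.0)]
centroids, assignments = k_means(samples, k=2)
print(assignments)
```

Note that `k` must be chosen before the algorithm runs, which is the limitation described above for K-means; density-based methods such as DBSCAN instead infer the number of clusters from density parameters.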

Other machine learning paradigms

Semi-supervised learning trains a model using not only labeled data, which are generally available in small amounts, but also unlabeled data, which are often available in large amounts (Mourad, 2023). For example, semi-supervised learning has been applied to predict virus-receptor interactions (Yan et al., 2019), facilitate antibiotic stewardship for urinary tract infections (de Vries et al., 2022), predict Lactobacillus delbrueckii subsp. bulgaricus-Streptococcus thermophilus interactions (Yang et al., 2025), and segment medical images (Tang et al., 2023). Key technical approaches include: 1) consistency regularization (e.g., Temporal Ensembling), which enforces stable predictions under input perturbations or dropout variations through loss terms such as the mean squared error between multiple predictions (Laine and Aila, 2017); and 2) noise-aware frameworks such as DivideMix, which leverage Gaussian Mixture Models (GMM) to separate clean and noisy labels by analyzing loss distributions, achieving robustness in high-noise scenarios (>50% noise) but struggling with low-noise cases because the distributions overlap (Li et al., 2020b). Advantages include reduced annotation costs and improved generalization through pseudo-labeling of unlabeled data, while limitations include sensitivity to noise thresholds (e.g., GMM failures at <20% noise) and the computational cost of iterative co-training.
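The pseudo-labeling idea mentioned above can be sketched with a deliberately simple self-training loop: a nearest-centroid classifier is fitted on the few labeled points, confidently classified unlabeled points are added with their predicted labels, and the process repeats. This is not Temporal Ensembling or DivideMix, which rely on neural networks; it is a simplified illustration. The function names, the distance-gap confidence rule (`margin`), and the toy "susceptible"/"resistant" labels are all assumptions, and the sketch expects at least two classes.

```python
import math


def centroids_of(labeled):
    """Mean feature vector per class from (features, label) pairs."""
    groups = {}
    for features, label in labeled:
        groups.setdefault(label, []).append(features)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def self_train(labeled, unlabeled, margin=1.0, rounds=5):
    """Pseudo-label points whose nearest centroid beats the runner-up by `margin`."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        centroids = centroids_of(labeled)
        confident, remaining = [], []
        for x in pool:
            ranked = sorted(centroids, key=lambda lab: math.dist(x, centroids[lab]))
            best, runner_up = ranked[0], ranked[1]
            gap = math.dist(x, centroids[runner_up]) - math.dist(x, centroids[best])
            (confident if gap >= margin else remaining).append((x, best))
        if not confident:
            break  # nothing left that the current model is confident about
        labeled.extend(confident)
        pool = [x for x, _ in remaining]
    return labeled, pool


# Hypothetical two-feature isolates: two labeled examples and four unlabeled ones.
seed_labeled = [((0.0, 0.0), "susceptible"), ((10.0, 10.0), "resistant")]
unlabeled = [(1.0, 0.5), (9.0, 9.5), (5.0, 5.0), (0.5, 1.5)]
expanded, still_unlabeled = self_train(seed_labeled, unlabeled)
print(len(expanded), still_unlabeled)
```

The ambiguous point midway between the classes is left unlabeled, which reflects the role of a confidence threshold: set too loosely, it lets label noise accumulate; set too strictly, little unlabeled data is used.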

Reinforcement learning (RL) is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective (Mnih et al., 2015). For example, RL has been applied to evaluate treatment policies for patients with hepatitis C virus (Oselio et al., 2022), adjust dynamic treatment regimes in HIV (Yu et al., 2019), push the boundaries of coarse-grained vaccine models (Faris et al., 2022), and support outbreak management (Kao et al., 2024). Key technical approaches include value-function methods (such as Q-learning), policy gradient methods, and deep reinforcement learning models such as the deep Q-network (DQN). For example, DQN combines experience replay and target networks to address stability problems in high-dimensional state spaces (Mnih et al., 2015). Policy gradient methods such as proximal policy optimization (PPO) directly optimize policy parameters and are suitable for continuous action spaces (Schulman et al., 2017), and Lillicrap et al. proposed the deep deterministic policy gradient (DDPG) algorithm for continuous control tasks (Lillicrap et al., 2019). However, the low sample efficiency of RL and the exploration-exploitation trade-off remain challenges. The advantages of RL include not requiring prior knowledge of the environment and the ability to adapt to dynamic environments, but the disadvantages are significant: high computational cost, long training times, and the possibility of unpredictable behavior in complex scenarios, which raises safety and ethical risks (Kulkarni et al., 2016).
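To show what a value-function method looks like in its simplest form, the sketch below runs tabular Q-learning on a toy five-state chain in which moving right eventually reaches a rewarded goal state. The environment, the function name `q_learning`, and all hyperparameter values are illustrative assumptions; clinical applications such as dynamic treatment regimes involve far larger state spaces and usually rely on function approximation, as in DQN.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a chain; reaching the last state gives reward 1."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]; 0 = left, 1 = right
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore with probability epsilon, or when both actions look equal.
            explore = rng.random() < epsilon or q[state][0] == q[state][1]
            action = rng.randrange(2) if explore else int(q[state][1] > q[state][0])
            next_state = min(state + 1, goal) if action == 1 else max(state - 1, 0)
            reward = 1.0 if next_state == goal else 0.0
            future = 0.0 if next_state == goal else max(q[next_state])
            # Temporal-difference update toward reward plus discounted future value.
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [int(right > left) for left, right in q_table[:-1]]
print(greedy_policy)
```

The `epsilon` parameter governs the exploration-exploitation trade-off noted above: larger values spend more steps on random actions, which slows each episode but reduces the risk of settling on a poor policy.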

Machine learning in clinical microbiology and infectious diseases

The applications of machine learning in clinical microbiology and infectious diseases include predicting drug targets or vaccine candidates, diagnosing microorganisms causing infectious diseases, classifying drug resistance against antimicrobial medicines, predicting disease outbreaks, and exploring microbial interactions (Goodswen et al., 2021). From the perspective of experienced clinicians, machine learning’s substantial progress carries potential implications across the scope of practice, including diagnosis, risk stratification and prognosis, treatment selection, response prediction, and the prevention and control of infectious diseases. The common infectious diseases seen in clinics are mainly bacterial, viral, and fungal infections. Image analysis AI (IAAI) tools are beginning to penetrate routine clinical microbiology practice, and their scope and impact will continue to grow (Burns et al., 2023). Machine learning is also used to discriminate diseases from mass spectrometric data, with the aim of developing diagnostic and prognostic tools, treatment targets, and patient management systems (Liebal et al., 2020). Representative studies detailing machine learning’s applications in clinical microbiology and infectious diseases are summarized in Table 1. As shown in Figure 1, these applications include etiology diagnosis, risk stratification, prognosis, treatment selection, treatment innovation, and prevention and control.

Table 1. Representative publications on machine learning in clinical microbiology and infectious diseases.

Etiology diagnosis

In clinical microbiology and infectious diseases, machine learning shows promise and practicality, both in performing existing tasks and in making broader use of existing data than traditional diagnosis does. As detailed in a review by Goodswen and colleagues, previous approaches to processing microbiological data entailed identifying and sequencing pathogenic microorganisms, algorithmically extracting features from them, and using those features for classification (Goodswen et al., 2021).

Many studies have reported the practical application of machine learning in the diagnosis of bacterial infections. Rare event detection by machine learning can be used for screening or for final identification of a microbe, including microscopic detection of mycobacteria in a primary specimen, detection of bacterial colonies growing on nutrient agar, or detection of parasites in a stool preparation or blood smear (Burns et al., 2023). Score-based image analysis AI classifies images in toto according to a scoring system; examples include applying the Nugent score to diagnose bacterial vaginosis and interpreting urine cultures (Burns et al., 2023). Machine learning can distinguish between bacterial and viral pharyngitis using hematological markers (Jin et al., 2023). Profiling of the conjunctival bacterial microbiota has shown the feasibility of using a microbiome-based machine learning model to differentially diagnose microbial keratitis, and has revealed the core components of the conjunctival bacterial interaction network (Ren et al., 2022).

Many studies have reported the practical application of machine learning in the diagnosis of viral infections. Respiratory viruses can be detected rapidly and quantitatively using surface-enhanced Raman spectroscopy and machine learning (Yang et al., 2022). An image-based, machine learning-guided multiplexed serology test has been developed for SARS-CoV-2 diagnosis (Pietiäinen et al., 2023). Blood tests combined with machine learning can predict a SARS-CoV-2 diagnosis by calculating the weight of each feature, such as eosinophils, monocytes, leukocytes, and platelets (Chadaga et al., 2022). Hepatitis B virus (HBV) detection models based on a neural network algorithm and routine clinical data have been developed and validated to improve the detection of HBV (Dong et al., 2024). A gradient boosting machine (GBM) using clinical records can predict the diagnosis of HIV and sexually transmitted infections among men who have sex with men (Bao et al., 2021). A PCA-SVM (poly-5) model is effective and robust for clinical prediction of dengue virus (DENV) infection in human blood sera (Saleem et al., 2022).

Many studies have reported the practical application of machine learning in the diagnosis of fungal infections. A binary logistic regression model has been developed for early detection of fungal keratitis using twelve clinical signs from slit-lamp images and collinear variables (Wei et al., 2023). Metabolomics and machine learning approaches have been combined in pursuit of more accurate paracoccidioidomycosis diagnoses (Lima et al., 2020). Interpretable deep learning can diagnose fungal and Acanthamoeba keratitis using in vivo confocal microscopy images (Essalat et al., 2023).

Risk stratification and prognosis

Accurate prediction of risk stratification and prognosis is crucial for balancing the upsides of therapy and the risk of side effects. Although there are still many challenges in specific clinical practice, machine learning provides a reliable way to create efficient models for estimating risk and prognosis.

Machine learning-based clinical decision support is effective for infection risk prediction (Feng et al., 2023). A machine learning-driven prediction model has been developed and validated for serious bacterial infections, such as bacterial meningitis or sepsis, among febrile children in emergency departments (Lee et al., 2022). Machine learning can be used to predict prognostic risk factors in patients with invasive candidiasis infection and bacterial bloodstream infection (Li et al., 2022). Virulence factors (VFs), which are crucial for pathogens to successfully infect host tissue and evade the immune system, can be predicted using sequence alignment percentage and ensemble learning models (Singh et al., 2024). Machine learning can stratify methicillin-resistant Staphylococcus aureus risk among hospitalized patients with community-acquired pneumonia (Rhodes et al., 2023). Automated COVID-19 risk assessment has been performed with an ADA tree boosting algorithm applied to metabolomics data from mass spectrometry (Delafiori et al., 2021). A risk assessment framework for multidrug-resistant Staphylococcus aureus that combines machine learning and mass spectrometry has shown high accuracy (Wang et al., 2023).

The prognosis of infectious diseases is an important basis for adjusting treatment in the clinic, and machine learning provides important and efficient tools for predicting it. A machine learning model for predicting prognosis in HIV-negative CM patients was built and validated, and the model might provide a reference for personalized treatment of these patients (Liu et al., 2023). Prognostic models for critically ill patients with sepsis-associated acute kidney injury can be constructed and validated through an interpretable machine learning approach (Fan et al., 2023). Prognosis in elderly patients with sepsis can be predicted with a random survival forest model (Zhang et al., 2022). A simplified machine learning model uses platelet-related genes to predict poor prognosis in sepsis (Diao et al., 2023). Prognosis in COVID-19 patients can be predicted using machine learning and readily available clinical data (de Fátima Cobre et al., 2022; Campbell et al., 2021).

Treatment selection and innovation

Infectious diseases require drug treatment matched to the specific cause, such as viral, bacterial, or fungal infection. Accurate diagnosis is therefore a prerequisite for the rational selection of treatment plans; examples of accurate diagnosis using machine learning have been described in detail above. This section focuses on machine learning in treatment selection and innovation.

The choice of treatment for infectious diseases depends first on accurate diagnosis and then critically on the strategy for drug selection. Antibiotic resistance varies with time, environment, and region, and the results of in vitro drug susceptibility tests can differ from in vivo outcomes. Machine learning is an effective tool for addressing these problems. It can be used for microbial identification and antimicrobial susceptibility testing on MALDI-TOF mass spectra (Weis et al., 2020), and machine learning-based virtual screening can repurpose drugs for the treatment of Candida albicans infection (Gao et al., 2022). In addition, AI is gaining more and more attention for drug combination discovery and optimization against a variety of infectious agents, including bacteria, viruses, parasites, and fungi. In general, the input variables used by AI systems for drug combination design can be divided into three groups: drug-based, pathogen-based, and host-based (He et al., 2021). For bacteria, three- and four-drug combinations highly efficacious for treating MDR and extensively drug-resistant TB have been identified with the aid of an output-driven feedback system (Silva et al., 2016). In addition, genetic algorithms have been used to optimize the dosing strategies of the meropenem/polymyxin B combination against carbapenem-resistant A. baumannii (Smith et al., 2020). For viruses, AI platforms have been developed to discover optimal combination therapies for HIV, HBV, hepatitis C virus (HCV), SARS-CoV-2, Ebola, vesicular stomatitis virus, and herpes simplex virus-1, using a series of machine learning models such as decision trees, SVMs, Bayesian networks, logistic regression, and random forests (He et al., 2021; Churkin et al., 2022; Bukic et al., 2023). For fungi, a novel computational algorithm termed Network-based Laplacian regularized Least Square Synergistic drug combination prediction has been developed to predict synergistic drug combinations for fungal diseases where drug resistance is common (Chen et al., 2016).

The antibiotic resistance crisis is a major challenge facing humanity today, and machine learning is one of the effective tools for addressing it. A machine learning-based approach has been used to predict active antimicrobial peptides (AMPs) within the global microbiome, leveraging a vast dataset of 63,410 metagenomes and 87,920 prokaryotic genomes from environmental and host-associated habitats to create AMPSphere, a comprehensive catalog comprising 863,498 non-redundant peptides (Santos-Júnior et al., 2024). Using explainable graph algorithms, substructure-based rationales have been identified for compounds with high predicted antibiotic activity and low predicted cytotoxicity; after testing of 283 compounds, one was reported to be selective against methicillin-resistant S. aureus (MRSA) and vancomycin-resistant enterococci, to evade substantial resistance, and to reduce bacterial titers in mouse models of MRSA skin and systemic thigh infection (Wong et al., 2024). Machine learning essentiality predictions and chemogenomic interactions have been leveraged to identify the glutaminyl-tRNA synthetase Gln4 as the antifungal target of N-pyrimidinyl-β-thiophenylacrylamide (NP-BTA) (Fu et al., 2021). In addition, the high variability of viruses complicates the development of antiviral drugs. With integrated efforts to improve data quality and availability, ML is a promising approach to developing next-generation antivirals and therapeutics for infectious diseases (Kumari et al., 2023). For instance, ML methods can design small molecules based on multiscale behavior and interactions to selectively inhibit multiple influenza targets while mitigating interaction with host proteins to minimize adverse effects (Overhoff et al., 2021).

Prevention and control

Preventing and controlling infectious diseases remains a global public health challenge, as these diseases sometimes cause unexpected pandemics responsible for high morbidity, mortality, and substantial economic impact. AI has played a pivotal role in the prevention and control of infectious diseases. It has shown great potential in developing effective HIV prevention intervention strategies (Xiang et al., 2022). Machine learning methods can predict epidemics of human-adaptive influenza A viruses based on viral nucleotide compositions (Li et al., 2020a). Because most infections are seeded from a patient’s own microbiota, recurrences that gain resistance can be predicted from the patient’s past infection history and minimized by machine learning-personalized antibiotic recommendations, offering a means to reduce the emergence and spread of resistant pathogens (Stracy et al., 2022). Furthermore, DT models with alternative sensitivity levels can be used at different stages of an emerging infectious disease (EID) disaster to optimize medical resource allocation, which is crucial in the response to a large-scale EID epidemic (Chiu et al., 2022).

Vaccines are biological preparations for active immunization, made from artificially attenuated, inactivated, or genetically modified pathogenic microorganisms (such as bacteria, rickettsiae, and viruses) or their metabolites, and are used to prevent infectious diseases. Few developments have done more to limit the spread of infectious disease and associated mortality than the advent of vaccination (Dubé et al., 2021). Vaxign2 is the second generation of the first web-based vaccine design program using reverse vaccinology and machine learning (Ong et al., 2021). Newly developed machine learning-based reverse vaccinology tools have been applied to COVID-19 vaccine design (Ong et al., 2020; Lv et al., 2021; Huffman et al., 2022). A random forest model has been used for active vaccine safety monitoring of adverse events such as anaphylaxis and agranulocytosis (Kim et al., 2021). A combinatorial artificial-neural-network design-of-experiment (ANN-DOE) model shows great advantages in the lipid nanoparticle-based mRNA vaccine bioprocess (Maharjan et al., 2023). Supervised and unsupervised machine learning approaches have been used for monitoring subvisible particles within an aluminum-salt adjuvanted vaccine formulation (Greenblott et al., 2024).

Machine learning in multi-omic data

With the development of technologies such as next-generation DNA and RNA sequencing, it has become more feasible to obtain personalized data about complex diseases. Data from various omics sources such as genomics, proteomics, metabolomics, transcriptomics, lipidomics, immunomics, glycomics, radiomics (Priya et al., 2022), and ultrasonics can be integrated to unravel the intricate workings of systems biology using machine learning-based predictive algorithms (Reel et al., 2021). Integrating multi-omics data with electronic health records (EHRs) by means of advanced artificial intelligence can support precision medicine (Tong et al., 2024). Integrating multi-omics data can also reveal the host-microbiota interactome in inflammatory bowel disease (Su et al., 2025), the interplay between the gut microbiome and the host following opioid use (Kolli et al., 2023), and host responses to lethal human virus infections (Eisfeld et al., 2024), among others. HONMF, an AI system for the integrative analysis of multi-modal microbiome data, including bacterial, fungal, and viral composition profiles, provides rich biological insights by implementing discriminative microbial feature selection and bacterium-fungus-virus association analysis (Ma et al., 2023). On the COVID-19 diagnosis task, omics-based models performed better than image or physiological feature-based models, underscoring the importance of omics-based datasets for future model development (Liu X. et al., 2023). A comprehensive multi-omic blood atlas has been presented for identifying immune signatures and correlates of host response across varying COVID-19 severity, in an integrated comparison with influenza and sepsis patients versus healthy volunteers (David et al., 2022).
A multi-omic machine learning model has implications for the development of diagnostic tests and treatments that could ultimately short-circuit the cycle of Clostridioides difficile infection (CDI) recurrence, by providing candidate metabolic biomarkers for diagnostics development, as well as offering insights into the complex microbial and metabolic alterations that are protective or permissive for recurrence (Dawkins et al., 2022). Deep learning of dual plasma fingerprints has been developed for high-performance infection classification (Cao et al., 2023). Using lasso and sparse CCA to detect specific associations between gut microbial taxa and host genes, one study found that Peptostreptococcaceae is associated with MAPK3 and VIPR1, which are part of G protein-coupled receptor pathways, in inflammatory bowel disease, and that Bacteroides massiliensis is associated with the host gene PLA2G4A, a member of the prostaglandin biosynthesis pathway, in irritable bowel syndrome (Priya et al., 2022).

The key to machine learning for clinicians

Professional training in statistical and research methods has long been a cornerstone of medical education for clinicians. However, it is unrealistic and unnecessary for clinicians to understand the complexity and depth of machine learning at the level of a computer scientist. Owing to the diversity and complexity of the data types encountered, microbiological data often require individualized solutions to be handled effectively. This makes it difficult to recommend common tools or guidelines for the application of machine learning in these specific domains, as the model selection, training procedure, and test data will rely heavily on the exact questions one wants to answer. Clinicians need to master the basic concepts, core steps, general limitations, and common applications of machine learning, such as data processing, feature selection and extraction, model selection and evaluation, generalization, overfitting, and underfitting.

Data processing

When data are available in large quantities, clinicians can consider more highly parameterized models such as deep neural networks. In supervised machine learning, the relative proportions of each ground truth label in the dataset should also be considered, as more data are required for machine learning to work if some labels are rare (Wei and Dunbrack, 2013). It is common to use 70-80% of the total dataset as the training set and 20-30% as the independent test set (Collins and Moons, 2019). To avoid data leakage, clinicians have to make sure that related samples do not appear in both the training and testing sets. What counts as ‘related’ depends on the nature of the study; for example, samples may have been taken from the same patient or the same organism. Data leakage becomes a problem when a model that appears accurate on some benchmark set performs poorly on new data that are different from the training set; in other words, the model does not generalize, likely because it has not modeled the true relationship between the variables but has instead memorized hidden associations present in the data (Greener et al., 2022). Strategies to prevent data leakage include proper data splitting (Kaufman et al., 2012), pipelines for preprocessing (Fabian et al., 2011), time-aware validation (Bergmeir and Benítez, 2012), causal feature analysis (Pearl, 2009), and identification of potentially leaky features (Lundberg and Lee, 2017).
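
The point about related samples can be illustrated with a minimal sketch of a group-aware split, in which all isolates from one patient are assigned to the same side of the split. The function name `group_train_test_split`, the isolate labels, and the patient IDs are hypothetical and chosen only for illustration; this is one possible approach under those assumptions, not a prescribed method.

```python
import random


def group_train_test_split(sample_ids, group_ids, test_fraction=0.25, seed=0):
    """Split samples so that no group (e.g. patient) appears in both sets.

    The fraction is applied to the number of groups, not the number of samples,
    so the resulting sample proportions are only approximate.
    """
    groups = sorted(set(group_ids))
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Keep at least one group in each set.
    n_test = min(len(groups) - 1, max(1, round(len(groups) * test_fraction)))
    test_groups = set(groups[:n_test])
    train = [s for s, g in zip(sample_ids, group_ids) if g not in test_groups]
    test = [s for s, g in zip(sample_ids, group_ids) if g in test_groups]
    return train, test


# Hypothetical data: eight isolates collected from five patients.
isolates = [f"isolate_{i}" for i in range(8)]
patients = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P5"]

train, test = group_train_test_split(isolates, patients, test_fraction=0.25, seed=0)

patient_of = dict(zip(isolates, patients))
train_patients = {patient_of[s] for s in train}
test_patients = {patient_of[s] for s in test}
print("train:", train)
print("test:", test)
```

A purely random split of the eight isolates could place two isolates from the same patient on opposite sides, which is the kind of hidden association that inflates benchmark performance.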

Feature selection and extraction

The main goal of feature selection is to reduce the original set of input features to a smaller subset that is used for training the machine learning model. It differs from feature extraction, which generates new features from a large number of input features. The ‘omics’ technologies used for microbiome analysis continuously evolve and, although much of the research is still at an early stage, large-scale datasets of ever-increasing size and complexity are being produced (Cammarota et al., 2020). Faced with high-dimensional data, both feature selection and feature extraction can simplify the input features of the machine learning model and help it generalize.

There are some tips for feature selection and extraction techniques. Firstly, any learning tool can be used to evaluate prediction performance while features are iteratively removed or added, in order to identify those that seem redundant or provide no new information. Secondly, some machine learning algorithms already contain or readily support feature selection steps; for example, SVMs can be combined with recursive feature elimination, RF provides a feature importance score, and the LASSO shrinks many regression coefficients to exactly zero. Thirdly, dimensionality reduction is fairly effective for feature extraction when an extreme reduction of high-dimensional data is needed and it is unnecessary to retain the original features within the model.
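
To make the LASSO behavior mentioned above concrete, the following is a minimal sketch of LASSO regression fitted by cyclic coordinate descent on a tiny toy dataset. The function names, the toy data, and the penalty value of 0.5 are assumptions made for illustration; the sketch also assumes centered data with no intercept and no all-zero feature column.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=50):
    """Minimize (1/2n) * ||y - X b||^2 + penalty * ||b||_1 by coordinate descent."""
    n_samples, n_features = len(X), len(X[0])
    beta = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n_samples):
                other = sum(X[i][k] * beta[k] for k in range(n_features) if k != j)
                rho += X[i][j] * (y[i] - other)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n_samples, penalty) / (z / n_samples)
    return beta


# Toy data: feature 0 drives the outcome strongly, feature 1 only weakly.
X = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y = [2.0 * x1 + 0.1 * x2 for x1, x2 in X]

beta = lasso_coordinate_descent(X, y, penalty=0.5)
print(beta)  # the weak feature's coefficient is set to exactly 0.0
```

In this toy setting an unpenalized least-squares fit would keep both features (coefficients 2 and 0.1), whereas the penalty removes the weak feature and shrinks the strong one, which is why the LASSO can double as a feature selector.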

Model selection and evaluation

Clinicians usually focus on accurate modeling and on discovering the mechanisms and factors responsible for the model output. The step of model selection exploits the training data to identify the best machine learning model based on the evaluation of different types of models, or across models of the same type but with different hyperparameter settings (Asnicar et al., 2024). In machine learning, the commonly used model evaluation metrics include accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve, AUC, mean squared error (MSE), mean absolute error (MAE), log loss, R-squared, and cross-validation score (Rainio et al., 2024). Accuracy is the fraction of correct predictions over all predictions. Precision is the fraction of true positives over all predicted positives. Recall, or sensitivity, is the fraction of true positives over all actual positives. Specificity is the fraction of true negatives over all actual negatives. The F1 score is the harmonic mean of precision and recall. The ROC curve plots sensitivity against 1 − specificity at all possible threshold scores. The area under the ROC curve (AUC-ROC) summarizes performance regardless of the threshold; it ranges from 0 to 1, where 0.5 corresponds to random classification and 1.0 to perfect classification. MSE is the average squared error between the predicted value and the true value. MAE is the average of all absolute errors, that is, the average absolute distance between the predicted value and the true value. Log loss is mainly used in classification problems to measure the difference between the predicted probabilities of the model and the true labels. R², also known as the coefficient of determination, represents how well the model fits the data. An R² close to 1.0 indicates that the model agrees well with the data, whereas an R² close to 0 indicates that it does not. The cross-validation score estimates the performance of the model on new data by repeatedly dividing the dataset into training and validation subsets, which helps to detect overfitting and to assess the generalization ability of the model.
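
As an illustration of the classification metrics described above, the following minimal sketch computes them from binary labels and predictions, and computes the AUC-ROC as the probability that a randomly chosen positive is scored above a randomly chosen negative. The function names and the toy labels and scores are hypothetical; returning 0.0 when a denominator is zero is a convention adopted here, not a universal rule.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,      # true positives over predicted positives
        "recall": recall,            # true positives over actual positives
        "specificity": specificity,  # true negatives over actual negatives
        "f1": f1,                    # harmonic mean of precision and recall
    }


def roc_auc(y_true, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 infected (1) and 5 uninfected (0) patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print("AUC-ROC:", auc)
```

In this example accuracy is 0.75 while recall is only about 0.67, which shows why accuracy alone can be misleading when positive cases are the minority.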

It is essential for clinicians to understand the evaluation metrics above, including their uses, advantages, and disadvantages, in order to choose and apply them appropriately. Classification is one of the most common tasks in machine learning, with various industrial applications such as face recognition, image classification, content review, and text classification. SVM, LR, DT, RF, and other models are among the most popular classification models. The most commonly used metrics for classification problems are accuracy, precision, recall, F1 score, ROC curve and AUC, and log loss. In addition, the commonly used metrics for multi-label classification problems are mainly precision at k (P@k), average precision at k (AP@k), and mean average precision at k (MAP@k). Regression models are used to predict continuous target values and also have a wide range of applications, such as house price forecasting, weather forecasting, and stock price forecasting. LR, RF, XGBoost, and RNN are among the most popular regression models. The most common metrics in regression are MAE, MSE, root mean squared error (RMSE), root mean squared logarithmic error (RMSLE), mean percentage error (MPE), mean absolute percentage error (MAPE), and R². Choosing appropriate strategies to evaluate machine learning models is important to provide robust and generalizable estimations and avoid biased models (Topçuoğlu et al., 2020).
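
The most common regression metrics named above can be sketched in a few lines. The function name `regression_metrics` and the toy values are hypothetical and serve only to show how MAE, MSE, RMSE, and the coefficient of determination relate to one another; the sketch assumes the true values are not all identical, since otherwise the coefficient of determination is undefined.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and the coefficient of determination (R-squared)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical continuous outcome (for example, a laboratory value) and model predictions.
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.5, 5.0, 8.0, 9.5]

reg = regression_metrics(y_true_reg, y_pred_reg)
print(reg)
```

Because MSE squares each error, the single prediction that is off by 1.0 contributes two thirds of the MSE here, whereas it contributes only half of the MAE; this is the usual reason to report both.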

Conclusion

With the popularization and development of machine learning technology, the medical field has undergone disruptive changes and faces new challenges. In the field of clinical microbiology and infectious diseases, machine learning has greatly promoted the diversification and accuracy of diagnostic methods, evidence-based decision-making on treatment plans, accurate judgment of disease prognosis, innovation in treatment, and effective prevention of diseases. While many of these applications are at the exploratory stage and require further validation and generalization, they hold substantial promise in furthering clinical practice. Different machine learning algorithms have their own advantages and disadvantages, which should be weighed comprehensively and in light of the actual situation.

Clinical microbiologists and infectiologists are increasingly immersed in the fields of data science and artificial intelligence, and they can engage with these fields by focusing on general principles and guidelines and on avoiding frequent pitfalls affecting machine learning, ranging from evaluation issues to study design problems. Choosing the right machine learning algorithm and a sound evaluation scheme is vitally important because it helps the model generalize and avoids the problems of underfitting and overfitting. In clinical microbiology and infectious diseases, the large volume of multi-omics data is a challenge we have to face, and it is also an important direction to guide future research and development.

Author contributions

CX: Writing – original draft, Conceptualization. L-YZ: Conceptualization, Writing – original draft. C-SY: Conceptualization, Writing – original draft. K-CX: Conceptualization, Methodology, Supervision, Writing – review & editing. K-YX: Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Acknowledgments

The authors would like to thank Adjei Peter Darko for his enlightenment and guidance in the field of machine learning.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Ali, O., Farooq, A., Yang, M., Jin, V. X., Bjørås, M., Wang, J. (2022). abc4pwm: affinity based clustering for position weight matrices in applications of DNA sequence analysis. BMC Bioinf. 23, 83. doi: 10.1186/s12859-022-04615-z

Altman, N., Krzywinski, M. (2017). Clustering. Nat. Methods. 14, 545–546.

Alqaissi, E., Alotaibi, F., Sher Ramzan, M., Algarni, A. (2023). Novel graph-based machine-learning technique for viral infectious diseases: application to influenza and hepatitis diseases. Ann. Med. 55, 2304108. doi: 10.1080/07853890.2024.2304108

Amouzgar, M., Glass, D. R., Baskar, R., Averbukh, I., Kimmey, S. C., Tsai, A. G., et al. (2022). Supervised dimensionality reduction for exploration of single-cell data by HSS-LDA. Patterns (N Y) 3, 100536.

Armstrong, G., Martino, C., Rahman, G., Gonzalez, A., Vázquez-Baeza, Y., Mishne, G., et al. (2021). Uniform manifold approximation and projection (UMAP) reveals composite patterns and resolves visualization artifacts in microbiome data. mSystems 6, e0069121. doi: 10.1128/msystems.00691-21

Asnicar, F., Thomas, A. M., Passerini, A., Waldron, L., Segata, N. (2024). Machine learning for microbiologists. Nat. Rev. Microbiol. 22, 191–205. doi: 10.1038/s41579-023-00984-1

Awaysheh, A., Wilcke, J., Elvinger, F., Rees, L., Fan, W., Zimmerman, K. L. (2019). Review of medical decision support and machine-learning methods. Vet. Pathol. 56, 512–525. doi: 10.1177/0300985819829524

Bao, Y., Medland, N. A., Fairley, C. K., Wu, J., Shang, X., Chow, E. P. F., et al. (2021). Predicting the diagnosis of HIV and sexually transmitted infections among men who have sex with men using machine learning approaches. J. Infect. 82, 48–59. doi: 10.1016/j.jinf.2020.11.007

Becht, E., McInnes, L., Healy, J., Dutertre, C. A., Kwok, I. W. H., Ng, L. G., et al. (2018). Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44. doi: 10.1038/nbt.4314

Benincà, E., Pinto, S., Cazelles, B., Fuentes, S., Shetty, S., Bogaards, J. A., et al. (2023). Wavelet clustering analysis as a tool for characterizing community structure in the human microbiome. Sci. Rep. 13, 8042. doi: 10.1038/s41598-023-34713-8

Bergmeir, C., Benítez, J. M. (2012). On the use of cross-validation for time series predictor evaluation. Inf. Sci. 191, 192–213. doi: 10.1016/j.ins.2011.12.028

Bezek, K., Petelin, A., Pražnikar, J., Nova, E., Redondo, N., Marcos, A., et al. (2020). Obesity measures and dietary parameters as predictors of gut microbiota phyla in healthy individuals. Nutrients 12, 2695. doi: 10.3390/nu12092695

Bukic, E., Milasin, J., Toljic, B., Jadzic, J., Jevtovic, D., Obradovic, B., et al. (2023). Association between combination antiretroviral therapy and telomere length in people living with human immunodeficiency virus. Biol. (Basel) 12, 1210. doi: 10.3390/biology12091210

Burns, B. L., Rhoads, D. D., Misra, A. (2023). The use of machine learning for image analysis artificial intelligence in clinical microbiology. J. Clin. Microbiol. 61, e0233621. doi: 10.1128/jcm.02336-21

Cammarota, G., Ianiro, G., Ahern, A., Carbone, C., Temko, A., Claesson, M. J., et al. (2020). Gut microbiome, big data and machine learning to promote precision medicine for cancer. Nat. Rev. Gastroenterol. Hepatol. 17, 635–648. doi: 10.1038/s41575-020-0327-3

Campbell, T. W., Wilson, M. P., Roder, H., MaWhinney, S., Georgantas, R. W., 3rd, Maguire, L. K., et al. (2021). Predicting prognosis in COVID-19 patients using machine learning and readily available clinical data. Int. J. Med. Inform. 155, 104594.

Cao, J., Xiao, Y., Zhang, M., Huang, L., Wang, Y., Liu, W., et al. (2023). Deep learning of dual plasma fingerprints for high-performance infection classification. Small 19, e2206349. doi: 10.1002/smll.202206349

Chadaga, K., Chakraborty, C., Prabhu, S., Umakanth, S., Bhat, V., Sampathila, N. (2022). Clinical and laboratory approach to diagnose COVID-19 using machine learning. Interdiscip Sci. 14, 452–470. doi: 10.1007/s12539-021-00499-4

Chavarria, X., Park, H. S., Oh, S., Kang, D., Choi, J. H., Kim, M., et al. (2025). Using gut microbiome metagenomic hypervariable features for diabetes screening and typing through supervised machine learning. Microb. Genom 11, 001365. doi: 10.1099/mgen.0.001365

Chen, X., Ren, B., Chen, M., Wang, Q., Zhang, L., Yan, G. (2016). NLLSS: predicting synergistic drug combinations based on semi-supervised learning. PloS Comput. Biol. 12, e1004975. doi: 10.1371/journal.pcbi.1004975

Chiu, H.-Y. R., Hwang, C. K., Chen, S. Y., Shih, F. Y., Han, H. C., King, C. C., et al. (2022). Machine learning for emerging infectious disease field responses. Sci. Rep. 12, 328. doi: 10.1038/s41598-021-03687-w

Choi, Y., Choi, B. Y., Kim, S. I., Choi, J., Kim, J., Park, B. Y., et al. (2023). Effect of characteristics on the clinical course at the initiation of treatment for human immunodeficiency virus infection using dimensionality reduction. Sci. Rep. 13, 5547. doi: 10.1038/s41598-023-31916-x

Churkin, A., Kriss, S., Uziel, A., Goyal, A., Zakh, R., Cotler, S. J., et al. (2022). Machine learning for mathematical models of HCV kinetics during antiviral therapy. Math Biosci. 343, 108756. doi: 10.1016/j.mbs.2021.108756

Collins, G. S., Moons, K. G. M. (2019). Reporting of artificial intelligence prediction models. Lancet 393, 1577–1579. doi: 10.1016/S0140-6736(19)30037-6

Cornet, G. (2013). Robot companions and ethics a pragmatic approach of ethical design. J. Int. Bioethique 24, 49–58, 179–180. doi: 10.3917/jib.243.0049

David, J. A., Zhichao, A., Mark, A., Chris, A., Allcock, A., Brian, A., et al. (2022). A blood atlas of COVID-19 defines hallmarks of disease severity and specificity. Cell 185, 916–938.e58.

Dawkins, J. J., Allegretti, J. R., Gibson, T. E., McClure, E., Delaney, M., Bry, L., et al. (2022). Gut metabolites predict Clostridioides difficile recurrence. Microbiome 10, 87. doi: 10.1186/s40168-022-01284-1

de Fátima Cobre, A., Surek, M., Stremel, D. P., Fachi, M. M., Lobo Borba, H. H., Tonin, F. S., et al. (2022). Diagnosis and prognosis of COVID-19 employing analysis of patients’ plasma and serum via LC-MS and machine learning. Comput. Biol. Med. 146, 105659.

Delafiori, J., Navarro, L. C., Siciliano, R. F., de Melo, G. C., Busanello, E. N. B., Nicolau, J. C., et al. (2021). Covid-19 automated diagnosis and risk assessment through metabolomics and machine learning. Anal. Chem. 93, 2471–2479. doi: 10.1021/acs.analchem.0c04497

Deo, R. C. (2015). Machine learning in medicine. Circulation 132, 1920–1930. doi: 10.1161/CIRCULATIONAHA.115.001593

de Vries, S., Ten Doesschate, T., Totté, J. E. E., Heutz, J. W., Loeffen, Y. G. T., Oosterheert, J. J., et al. (2022). A semi-supervised decision support system to facilitate antibiotic stewardship for urinary tract infections. Comput. Biol. Med. 146, 105621. doi: 10.1016/j.compbiomed.2022.105621

Diao, Y., Zhao, Y., Li, X., Li, B., Huo, R., Han, X., et al. (2023). A simplified machine learning model utilizing platelet-related genes for predicting poor prognosis in sepsis. Front. Immunol. 14.

Diaz, N. N., Krause, L., Goesmann, A., Niehaus, K., Nattkemper, T. W. (2009). TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinf. 10, 56. doi: 10.1186/1471-2105-10-56

Dong, W., Da Roza, C. C., Cheng, D., Zhang, D., Xiang, Y., Seto, W. K., et al. (2024). Development and validation of HBV surveillance models using big data and machine learning. Ann. Med. 56, 2314237. doi: 10.1080/07853890.2024.2314237

Dubé, È., Ward, J. K., Verger, P., MacDonald, N. E. (2021). Vaccine Hesitancy, Acceptance, and Anti-Vaccination: Trends and Future Prospects for Public Health. Annu. Rev. Public Health 42, 175–191.

Eisfeld, A. J., Anderson, L. N., Fan, S., Walters, K. B., Halfmann, P. J., Westhoff Smith, D., et al. (2024). A compendium of multi-omics data illuminating host responses to lethal human virus infections. Sci. Data 11, 328.

Essalat, M., Abolhosseini, M., Le, T. H., Moshtaghion, S. M., Kanavi, M. R. (2023). Interpretable deep learning for diagnosis of fungal and acanthamoeba keratitis using in vivo confocal microscopy images. Sci. Rep. 13, 8953.

Fabian, P., Gaël, V., Alexandre, G., Vincent, M., Bertrand, T., Olivier, G., et al. (2011). Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830.

Fan, Z., Jiang, J., Xiao, C., Chen, Y., Xia, Q., Wang, J., et al. (2023). Construction and validation of prognostic models in critically Ill patients with sepsis-associated acute kidney injury: interpretable machine learning approach. J. Trans. Med. 21.

Faris, J. G., Orbidan, D., Wells, C., Petersen, B. K., Sprenger, K. G. (2022). Moving the needle: Employing deep reinforcement learning to push the boundaries of coarse-grained vaccine models. Front. Immunol. 13, 1029167. doi: 10.3389/fimmu.2022.1029167

Feng, T., Noren, D. P., Kulkarni, C., Mariani, S., Zhao, C., Ghosh, E., et al. (2023). Machine learning-based clinical decision support for infection risk prediction. Front. Med. 10.

Fu, C., Zhang, X., Veri, A. O., Iyer, K. R., Lash, E., Xue, A., et al. (2021). Leveraging machine learning essentiality predictions and chemogenomic interactions to identify antifungal targets. Nat. Commun. 12 (1), 6497.

Gao, A., Kouznetsova, V. L., Tsigelny, I. F. (2022). Machine-learning-based virtual screening to repurpose drugs for treatment of Candida albicans infection. Mycoses 65, 794–805. doi: 10.1111/myc.13475

Garcia-Vidal, C., Teijón-Lumbreras, C., Aiello, T. F., Chumbita, M., Menendez, R., Mateu-Subirà, A., et al. (2024). K-means clustering identifies diverse clinical phenotypes in COVID-19 patients: implications for mortality risks and remdesivir impact. Infect. Dis. Ther. 13, 715–726. doi: 10.1007/s40121-024-00938-x

Ghosh, S., Sarkar Paria, D., Chatterjee, S. (2022). Comparative study on bacterial population dynamics of foregut, midgut, and hindgut content of perionyx excavatus (Perrier) isolated from eco-friendly, non-hazardous vermicompost. Appl. Biochem. Biotechnol. 194, 6126–6139. doi: 10.1007/s12010-022-03970-0

Goodswen, S. J., Barratt, J. L. N., Kennedy, P. J., Kaufer, A., Calarco, L., Ellis, J. T. (2021). Machine learning and applications in microbiology. FEMS Microbiol. Rev. 45, fuab015. doi: 10.1093/femsre/fuab015

Greenblott, D. N., Wood, C. V., Zhang, J., Viza, N., Chintala, R., Calderon, C. P., et al. (2024). Supervised and unsupervised machine learning approaches for monitoring subvisible particles within an aluminum-salt adjuvanted vaccine formulation. Biotechnol. Bioeng 121, 1626–1641. doi: 10.1002/bit.v121.5

Greener, J. G., Kandathil, S. M., Moffat, L., Jones, D. T. (2022). A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 23, 40–55.

Gregor, I., Dröge, J., Schirmer, M., Quince, C., McHardy, A. C. (2016). PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes. PeerJ 4, e1603. doi: 10.7717/peerj.1603

Habgood-Coote, D., Wilson, C., Shimizu, C., Barendregt, A. M., Philipsen, R., Galassini, R., et al. (2023). Diagnosis of childhood febrile illness using a multi-class blood RNA molecular signature. Med 4, 635–654.e5. doi: 10.1016/j.medj.2023.06.007

He, X., Qian, K., Wang, Z., Zeng, S., Li, H., Li, W. V. (2023). scAce: an adaptive embedding and clustering method for single-cell gene expression data. Bioinformatics 39, btad546. doi: 10.1093/bioinformatics/btad546

He, S., Leanse, L. G., Feng, Y. (2021). Artificial intelligence and machine learning assisted drug delivery for effective treatment of infectious diseases. Advanced Drug Delivery Rev. 178, 113922. doi: 10.1016/j.addr.2021.113922

Hopf, T. A., Ingraham, J. B., Poelwijk, F. J., Schärfe, C. P., Springer, M., Sander, C., et al. (2017). Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135. doi: 10.1038/nbt.3769

Hu, J., Szymczak, S. (2023). A review on longitudinal data analysis with random forest. Brief Bioinform. 24, bbad002. doi: 10.1093/bib/bbad002

Huffman, A., Ong, E., Hur, J., D'Mello, A., Tettelin, H., He, Y. (2022). COVID-19 vaccine design using reverse and structural vaccinology, ontology-based literature mining and machine learning. Brief Bioinform. 23, bbac190. doi: 10.1093/bib/bbac190

Jiang, J., Xu, J., Liu, Y., Song, B., Guo, X., Zeng, X., et al. (2023). Dimensionality reduction and visualization of single-cell RNA-seq data with an improved deep variational autoencoder. Brief Bioinform. 24, bbad152. doi: 10.1093/bib/bbad152

Jin, Z., Ma, F., Chen, H., Guo, S. (2023). Leveraging machine learning to distinguish between bacterial and viral induced pharyngitis using hematological markers: a retrospective cohort study. Sci. Rep. 13, 22899. doi: 10.1038/s41598-023-49925-1

Jolliffe, I. T., Cadima, J. (2016). Principal component analysis: a review and recent developments. Philos. Trans. A Math Phys. Eng. Sci. 374, 20150202. doi: 10.1098/rsta.2015.0202

Kao, Y., Chu, P.-J., Chou, P.-C., Chen, C.-C. (2024). A dynamic approach to support outbreak management using reinforcement learning and semi-connected SEIQR models. BMC Public Health 24, 751. doi: 10.1186/s12889-024-18251-0

Kaufman, S., Rosset, S., Perlich, C., Stitelman, O. (2012). Leakage in data mining: Formulation, detection, and avoidance. ACM Trans. Knowl. Discovery Data 6, 1–21. doi: 10.1145/2382577.2382579

Kaul, V., Enslin, S., Gross, S. A. (2020). History of artificial intelligence in medicine. Gastrointest Endosc 92, 807–812. doi: 10.1016/j.gie.2020.06.040

Kim, Y., Jang, J. H., Park, N., Jeong, N. Y., Lim, E., Kim, S., et al. (2021). Machine learning approach for active vaccine safety monitoring. J. Korean Med. Sci. 36, e198. doi: 10.3346/jkms.2021.36.e198

Kingsford, C., Salzberg, S. L. (2008). What are decision trees? Nat. Biotechnol. 26, 1011–1013.

Kobak, D., Berens, P. (2019). The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 10, 5416. doi: 10.1038/s41467-019-13056-x

Kolli, U., Jalodia, R., Moidunny, S., Singh, P. K., Ban, Y., Tao, J., et al. (2023). Multi-omics analysis revealing the interplay between gut microbiome and the host following opioid use. Gut Microbes 15, 2246184. doi: 10.1080/19490976.2023.2246184

Koslicki, D., Chatterjee, S., Shahrivar, D., Walker, A. W., Francis, S. C., Fraser, L. J., et al. (2015). ARK: aggregation of reads by K-means for estimation of bacterial community composition. PloS One 10, e0140644. doi: 10.1371/journal.pone.0140644

Kulkarni, T. D., Saeedi, A., Gautam, S., Gershman, S. J. (2016). Deep successor reinforcement learning. arXiv:1606.02396 [stat.ML]. doi: 10.48550/arXiv.1606.02396

Kumari, R., Sharma, S. D., Kumar, A., Ende, Z., Mishina, M., Wang, Y., et al. (2023). Antiviral approaches against influenza virus. Clin. Microbiol. Rev. 36, e00040-22.

Laine, S., Aila, T. (2017). Temporal ensembling for semi-supervised learning. doi: 10.48550/arXiv.1610.02242

Lee, Y. W., Choi, J. W., Shin, E.-H. (2021). Machine learning model for predicting malaria using clinical information. Comput. Biol. Med. 129, 104151. doi: 10.1016/j.compbiomed.2020.104151

Lee, B., Chung, H. J., Kang, H. M., Kim, D. K., Kwak, Y. H. (2022). Development and validation of machine learning-driven prediction model for serious bacterial infection among febrile children in emergency departments. PLoS One 17, e0265500. doi: 10.1371/journal.pone.0265500

Li, J., Zhang, S., Li, B., Hu, Y., Kang, X. P., Wu, X. Y., et al. (2020a). Machine learning methods for predicting human-adaptive influenza A viruses based on viral nucleotide compositions. Mol. Biol. Evol. 37, 1224–1236. doi: 10.1093/molbev/msz276

Li, Y., Wu, Y., Gao, Y., Niu, X., Li, J., Tang, M., et al. (2022). Machine-learning based prediction of prognostic risk factors in patients with invasive candidiasis infection and bacterial bloodstream infection: a singled centered retrospective study. BMC Infect. Dis. 22, 150. doi: 10.1186/s12879-022-07125-8

Li, J., Socher, R., Hoi, S. C. H. (2020b). DivideMix: learning with noisy labels as semi-supervised learning. doi: 10.48550/arXiv.2002.07394

Liebal, U. W., Phan, A. N. T., Sudhakar, M., Raman, K., Blank, L. M. (2020). Machine learning applications for mass spectrometry-based metabolomics. Metabolites 10, 243. doi: 10.3390/metabo10060243

Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., et al. (2019). Continuous control with deep reinforcement learning. arXiv:1509.02971 [cs.LG]. doi: 10.48550/arXiv.1509.02971

Lima, E. O., Navarro, L. C., Morishita, K. N., Kamikawa, C. M., Rodrigues, R. G. M., Dabaja, M. Z., et al. (2020). Metabolomics and machine learning approaches combined in pursuit for more accurate paracoccidioidomycosis diagnoses. mSystems 5, e00258-20. doi: 10.1128/msystems.00258-20

Liu, J., Lu, Y., Liu, J., Liang, J., Zhang, Q., Li, H., et al. (2023). Development and validation of a machine learning model to predict prognosis in HIV-negative cryptococcal meningitis patients: a multicenter study. Eur. J. Clin. Microbiol. Infect. Dis. 42, 1183–1194. doi: 10.1007/s10096-023-04653-2

Liu, X., Hasan, M. R., Ahmed, K. A., Hossain, M. Z. (2023). Machine learning to analyse omic-data for COVID-19 diagnosis and prognosis. BMC Bioinf. 24, 7. doi: 10.1186/s12859-022-05127-6

Lundberg, S. M., Lee, S.-I. (2017). “A unified approach to interpreting model predictions.” in NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768–4777.

Lv, H., Shi, L., Berkenpas, J. W., Dao, F. Y., Zulfiqar, H., Ding, H., et al. (2021). Application of artificial intelligence and machine learning for COVID-19 drug discovery and vaccine design. Brief Bioinform. 22, bbab320. doi: 10.1093/bib/bbab320

Ma, Y., Liu, L., Ma, Y., Zhang, S. (2023). HONMF: integration analysis of multi-omics microbiome data via matrix factorization and hypergraph. Bioinformatics 39, btad335. doi: 10.1093/bioinformatics/btad335

Maharjan, R., Hada, S., Lee, J. E., Han, H. K., Kim, K. H., Seo, H. J., et al. (2023). Comparative study of lipid nanoparticle-based mRNA vaccine bioprocess with machine learning and combinatorial artificial neural network-design of experiment approach. Int. J. Pharm. 640, 123012. doi: 10.1016/j.ijpharm.2023.123012

Mai, T.-L., Hu, G.-M., Chen, C.-M. (2016). Visualizing and clustering protein similarity networks: sequences, structures, and functions. J. Proteome Res. 15, 2123–2131. doi: 10.1021/acs.jproteome.5b01031

Manandhar, I., Alimadadi, A., Aryal, S., Munroe, P. B., Joe, B., Cheng, X., et al. (2021). Gut microbiome-based supervised machine learning for clinical diagnosis of inflammatory bowel diseases. Am. J. Physiol. Gastrointest Liver Physiol. 320, G328–G337. doi: 10.1152/ajpgi.00360.2020

Martino, C., Shenhav, L., Marotz, C. A., Armstrong, G., McDonald, D., Vázquez-Baeza, Y., et al. (2021). Context-aware dimensionality reduction deconvolutes gut microbial community dynamics. Nat. Biotechnol. 39, 165–168. doi: 10.1038/s41587-020-0660-7

Mazzurana, L., Bonfiglio, F., Forkel, M., D'Amato, M., Halfvarson, J., Mjösberg, J., et al. (2021). Crohn’s disease is associated with activation of circulating innate lymphoid cells. Inflamm. Bowel Dis. 27, 1128–1138. doi: 10.1093/ibd/izaa316

McCloskey, R. M., Poon, A. F. Y. (2017). A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation. PLoS Comput. Biol. 13, e1005868. doi: 10.1371/journal.pcbi.1005868

McHardy, A. C., Martín, H. G., Tsirigos, A., Hugenholtz, P., Rigoutsos, I. (2007). Accurate phylogenetic classification of variable-length DNA fragments. Nat. Methods 4, 63–72. doi: 10.1038/nmeth976

Mirdita, M., von den Driesch, L., Galiez, C., Martin, M. J., Söding, J., Steinegger, M. (2017). Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 45, D170–D176. doi: 10.1093/nar/gkw1081

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature 518, 529–533. doi: 10.1038/nature14236

Mobaderi, T., Kazemnejad, A., Salehi, M. (2025). Clustering and modeling joint-trajectories of HIV/AIDS and tuberculosis mortality rates using Bayesian multi-process latent growth model: A global study from 1990 to 2021. BMC Infect. Dis. 25, 330. doi: 10.1186/s12879-025-10715-x

Mourad, R. (2023). Semi-supervised learning improves regulatory sequence prediction with unlabeled sequences. BMC Bioinf. 24, 186. doi: 10.1186/s12859-023-05303-2

Nanduri, S., Black, A., Bedford, T., Huddleston, J. (2024). Dimensionality reduction distills complex evolutionary relationships in seasonal influenza and SARS-CoV-2. Virus Evol. 10, veae087. doi: 10.1093/ve/veae087

Ong, E., Cooke, M. F., Huffman, A., Xiang, Z., Wong, M. U., Wang, H., et al. (2021). Vaxign2: the second generation of the first Web-based vaccine design program using reverse vaccinology and machine learning. Nucleic Acids Res. 49, W671–W678. doi: 10.1093/nar/gkab279

Ong, E., Wong, M. U., Huffman, A., He, Y. (2020). COVID-19 coronavirus vaccine design using reverse vaccinology and machine learning. Front. Immunol. 11, 1581. doi: 10.3389/fimmu.2020.01581

Oselio, B., Singal, A. G., Zhang, X., Van, T., Liu, B., Zhu, J., et al. (2022). Reinforcement learning evaluation of treatment policies for patients with hepatitis C virus. BMC Med. Inform Decis Mak 22, 63. doi: 10.1186/s12911-022-01789-7

Overhoff, B., Falls, Z., Mangione, W., Samudrala, R. (2021). A deep-learning proteomic-scale approach for drug design. Pharm. (Basel) 14, 1277. doi: 10.3390/ph14121277

Pearl, J. (2009). Causal inference in statistics: An overview. Statist. Surv. 3. doi: 10.1214/09-SS057

Peiffer-Smadja, N., Rawson, T. M., Ahmad, R., Buchard, A., Georgiou, P., Lescure, F.-X., et al. (2020). Machine learning for clinical decision support in infectious diseases: a narrative review of current applications. Clin. Microbiol. Infect. 26, 584–595. doi: 10.1016/j.cmi.2019.09.009

Pietiäinen, V., Polso, M., Migh, E., Guckelsberger, C., Harmati, M., Diosdi, A., et al. (2023). Image-based and machine learning-guided multiplexed serology test for SARS-CoV-2. Cell Rep. Methods 3, 100565. doi: 10.1016/j.crmeth.2023.100565

Priya, S., Burns, M. B., Ward, T., Mars, R. A. T., Adamowicz, B., Lock, E. F., et al. (2022). Identification of shared and disease-specific host gene–microbiome associations across human diseases using multi-omic integration. Nat. Microbiol. 7, 780–795. doi: 10.1038/s41564-022-01121-z

Radakovich, N., Nagy, M., Nazha, A. (2020). Machine learning in haematological malignancies. Lancet Haematol 7, e541–e550. doi: 10.1016/S2352-3026(20)30121-6

Rainio, O., Teuho, J., Klén, R. (2024). Evaluation metrics and statistical tests for machine learning. Sci. Rep. 14, 6086. doi: 10.1038/s41598-024-56706-x

Reel, P. S., Reel, S., Pearson, E., Trucco, E., Jefferson, E. (2021). Using machine learning approaches for multi-omics data analysis: A review. Biotechnol. Adv. 49, 107739. doi: 10.1016/j.biotechadv.2021.107739

Ren, Z., Li, W., Liu, Q., Dong, Y., Huang, Y. (2022). Profiling of the conjunctival bacterial microbiota reveals the feasibility of utilizing a microbiome-based machine learning model to differentially diagnose microbial keratitis and the core components of the conjunctival bacterial interaction network. Front. Cell Infect. Microbiol. 12, 860370. doi: 10.3389/fcimb.2022.860370

Rhodes, N. J., Rohani, R., Yarnold, P. R., Pawlowski, A. E., Malczynski, M., Qi, C., et al. (2023). Machine learning to stratify methicillin-resistant Staphylococcus aureus risk among hospitalized patients with community-acquired pneumonia. Antimicrob. Agents Chemother. 67.

Saleem, M., Ali, S., Bilal, M., Safdar, K., Hassan, M. (2022). Development of multivariate classification models for the diagnosis of dengue virus infection. Photodiagnosis Photodyn. Ther. 40, 103136. doi: 10.1016/j.pdpdt.2022.103136

Santos-Júnior, C. D., Torres, M. D. T., Duan, Y., Rodríguez Del Río, Á., Schmidt, T. S. B., Chong, H., et al. (2024). Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell 187, 3761–3778.e16. doi: 10.1016/j.cell.2024.05.013

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347 [cs.LG]. doi: 10.48550/arXiv.1707.06347

Senelle, G., Guyeux, C., Refrégier, G., Sola, C. (2022). Investigating the diversity of tuberculosis spoligotypes with dimensionality reduction and graph theory. Genes (Basel) 13 (12), 2328.

Shan, W., Li, X., Yao, H., Lin, K. (2021). Convolutional neural network-based virtual screening. Curr. Med. Chem. 28, 2033–2047. doi: 10.2174/0929867327666200526142958

Silva, A., Lee, B. Y., Clemens, D. L., Kee, T., Ding, X., Ho, C. M., et al. (2016). Output-driven feedback system control platform optimizes combinatorial therapy of tuberculosis using a macrophage cell culture model. Proc. Natl. Acad. Sci. U.S.A. 113, E2172–E2179. doi: 10.1073/pnas.1600812113

Singh, S., Le, N. Q. K., Wang, C. (2024). VF-Pred: Predicting virulence factor using sequence alignment percentage and ensemble learning models. Comput. Biol. Med. 168, 107662. doi: 10.1016/j.compbiomed.2023.107662

Smith, N. M., Lenhard, J. R., Boissonneault, K. R., Landersdorfer, C. B., Bulitta, J. B., Holden, P. N., et al. (2020). Using machine learning to optimize antibiotic combinations: dosing strategies for meropenem and polymyxin B against carbapenem-resistant Acinetobacter baumannii. Clin. Microbiol. Infect. 26, 1207–1213. doi: 10.1016/j.cmi.2020.02.004

Srisuradetchai, P., Suksrikran, K. (2024). Random kernel k-nearest neighbors regression. Front. Big Data 7, 1402384. doi: 10.3389/fdata.2024.1402384

Stracy, M., Snitser, O., Yelin, I., Amer, Y., Parizade, M., Katz, R., et al. (2022). Minimizing treatment-induced emergence of antibiotic resistance in bacterial infections. Science 375, 889–894. doi: 10.1126/science.abg9868

Su, F., Su, M., Wei, W., Wu, J., Chen, L., Sun, X., et al. (2025). Integrating multi-omics data to reveal the host-microbiota interactome in inflammatory bowel disease. Gut Microbes. 17, 2476570.

Tang, Y., Wang, S., Qu, Y., Cui, Z., Zhang, W. (2023). Consistency and adversarial semi-supervised learning for medical image segmentation. Comput. Biol. Med. 161, 107018. doi: 10.1016/j.compbiomed.2023.107018

Tong, L., Shi, W., Isgut, M., Zhong, Y., Lais, P., Gloster, L., et al. (2024). Integrating multi-omics data with EHR for precision medicine using advanced artificial intelligence. IEEE Rev. BioMed. Eng. 17, 80–97. doi: 10.1109/RBME.2023.3324264

Topçuoğlu, B. D., Lesniak, N. A., Ruffin, M. T., Wiens, J., Schloss, P. D. (2020). A framework for effective application of machine learning to microbiome-based classification problems. mBio 11, e00434-20. doi: 10.1128/mBio.00434-20

Tsuyuzaki, K., Sato, H., Sato, K., Nikaido, I. (2020). Benchmarking principal component analysis for large-scale single-cell RNA-sequencing. Genome Biol. 21 (1), 9.

Valkenborg, D., Rousseau, A.-J., Geubbelmans, M., Burzykowski, T. (2023). Support vector machines. Am. J. Orthodontics Dentofacial Orthopedics 164, 754–757. doi: 10.1016/j.ajodo.2023.08.003

Vervier, K., Mahé, P., Tournoud, M., Veyrieras, J.-B., Vert, J.-P. (2016). Large-scale machine learning for metagenomics sequence classification. Bioinformatics 32, 1023–1032. doi: 10.1093/bioinformatics/btv683

Wang, Z., Pang, Y., Chung, C. R., Wang, H. Y., Cui, H., Chiang, Y. C., et al. (2023). A risk assessment framework for multidrug-resistant Staphylococcus aureus using machine learning and mass spectrometry technology. Briefings Bioinf. 24.

Wei, Q., Dunbrack, R. L. (2013). The role of balanced training and testing data sets for binary classifiers in bioinformatics. PLoS One 8, e67863. doi: 10.1371/journal.pone.0067863

Wei, Z., Wang, S., Wang, Z., Zhang, Y., Chen, K., Gong, L., et al. (2023). Development and multi-center validation of machine learning model for early detection of fungal keratitis. EBioMedicine 88, 104438. doi: 10.1016/j.ebiom.2023.104438

Weis, C. V., Jutzeler, C. R., Borgwardt, K. (2020). Machine learning for microbial identification and antimicrobial susceptibility testing on MALDI-TOF mass spectra: a systematic review. Clin. Microbiol. Infect. 26, 1310–1317. doi: 10.1016/j.cmi.2020.03.014

Wong, F., Zheng, E. J., Valeri, J. A., Donghia, N. M., Anahtar, M. N., Omori, S., et al. (2024). Discovery of a structural class of antibiotics with explainable deep learning. Nature 626, 177–185. doi: 10.1038/s41586-023-06887-8

Xiang, Y., Du, J., Fujimoto, K., Li, F., Schneider, J., Tao, C. (2022). Application of artificial intelligence and machine learning for HIV prevention interventions. Lancet HIV 9, e54–e62. doi: 10.1016/S2352-3018(21)00247-2

Yan, C., Duan, G., Wu, F.-X., Wang, J. (2019). IILLS: predicting virus-receptor interactions based on similarity and semi-supervised learning. BMC Bioinf. 20, 651. doi: 10.1186/s12859-019-3278-3

Yang, Y., Xu, B., Murray, J., Haverstick, J., Chen, X., Tripp, R. A., et al. (2022). Rapid and quantitative detection of respiratory viruses using surface-enhanced Raman spectroscopy and machine learning. Biosens Bioelectron 217, 114721. doi: 10.1016/j.bios.2022.114721

Yang, S., Bai, M., Liu, W., Li, W., Zhong, Z., Kwok, L. Y., et al. (2025). Predicting Lactobacillus delbrueckii subsp. bulgaricus-Streptococcus thermophilus interactions based on a highly accurate semi-supervised learning method. Sci. China Life Sci. 68, 558–574. doi: 10.1007/s11427-023-2569-7

Yeung, W., Zhou, Z., Mathew, L., Gravel, N., Taujale, R., O'Boyle, B., et al. (2023). Tree visualizations of protein sequence embedding space enable improved functional clustering of diverse protein superfamilies. Brief Bioinform. 24, bbac619. doi: 10.1093/bib/bbac619

Yu, C., Dong, Y., Liu, J., Ren, G. (2019). Incorporating causal factors into reinforcement learning for dynamic treatment regimes in HIV. BMC Med. Inform Decis Mak 19, 60. doi: 10.1186/s12911-019-0755-6

Zacher, B., Czogiel, I. (2022). Supervised learning using routine surveillance data improves outbreak detection of Salmonella and Campylobacter infections in Germany. PLoS One 17, e0267510. doi: 10.1371/journal.pone.0267510

Zhang, L., Huang, T., Xu, F., Li, S., Zheng, S., Lyu, J., et al. (2022). Prediction of prognosis in elderly patients with sepsis based on machine learning (random survival forest). BMC Emergency Med. 22.

Keywords: machine learning, artificial intelligence, clinical microbiology, infectious diseases, application

Citation: Xu C, Zhao L-Y, Ye C-S, Xu K-C and Xu K-Y (2025) The application of machine learning in clinical microbiology and infectious diseases. Front. Cell. Infect. Microbiol. 15:1545646. doi: 10.3389/fcimb.2025.1545646

Received: 15 December 2024; Accepted: 08 April 2025;
Published: 01 May 2025.

Edited by:

Hong Li, Hainan University, China

Reviewed by:

Yong-Zi Chen, Tianjin Medical University Cancer Institute and Hospital, China
Sushil Shakyawar, University of Nebraska Medical Center, United States

Copyright © 2025 Xu, Zhao, Ye, Xu and Xu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Ke-Chen Xu, 8015059@zju.edu.cn; Ke-Yang Xu, kyxu@must.edu.mo

^†These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.