AUTHOR=Bai Fang, Gong Zelong, Cui Dong, Zhang Xiaomei, Hong Wenteng, Gao Yi, Lin Kai, Chen Weijie, Li Lu, Huang Juan, Zheng Biying, Xu Junfa, Xiao Na
TITLE=Development of a host-signature-based machine learning model to diagnose bacterial and viral infections in febrile children
JOURNAL=Frontiers in Pediatrics
VOLUME=Volume 13 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/pediatrics/articles/10.3389/fped.2025.1608812
DOI=10.3389/fped.2025.1608812
ISSN=2296-2360
ABSTRACT=Background: Early aetiological diagnosis is critical for managing febrile children with infectious illness, as it strongly influences the choice of appropriate medication and can affect complications and outcomes. Diagnostic strategies based on host gene expression have recently been developed and have achieved high accuracy and clinical practicability. In this study, we aimed to use integrative bioinformatics analysis to construct artificial neural network (ANN, multilayer perceptron) and random forest (RF) models based on host gene signatures to diagnose bacterial or viral (B/V) infection in febrile children. Results: Whole-blood transcriptome data from children were obtained from a public database. Of these, 384 febrile young children (definite bacterial: n = 135; definite viral: n = 249) were used to construct the RF model. The generalized RF model included 1,042 patients with infections of various aetiologies, such as Staphylococcus aureus, pathogenic Escherichia coli, Salmonella, Shigella, adenovirus, HHV6, enterovirus, rhinovirus, human rotavirus, human norovirus, and influenza A pneumonia. Differential expression analysis and weighted gene co-expression network analysis (WGCNA) identified 117 differentially expressed genes (DEGs) and 264 module member genes, respectively; their overlap yielded 57 candidate genes. L1 regularization and variable significance analysis (multilayer perceptron) were then used to reduce and rank the predictive features, identifying LCN2 (100.0%), IFI27 (84.4%), SLPI (63.2%), IFIT2 (44.6%) and PI3 (44.5%) as the top predictors. Using the transformed values RefValue (i) of these five genes, the RF model achieved an AUC of 0.9917 in training and 0.9517 in testing for diagnosing B/V infection in children, and the ANN model achieved an AUC of 0.9540 in testing. A generalized RF model involving 1,042 patients was also developed to predict the aetiological type of each sample, achieving an AUC of 0.9421 in training and 0.8968 in testing. Conclusions: A five-gene host signature (IFIT2, SLPI, IFI27, LCN2, and PI3) was identified and used to construct an RF model that distinguishes B/V infection in febrile children with 85.3% accuracy, 95.1% sensitivity, and 80.0% specificity, and an ANN model with 92.4% accuracy, 86.8% sensitivity, and 95% specificity.