AUTHOR=Kirtsanis Georgios, Dolias Georgios, Kintzios Spyridon, Ioannidis Konstantinos, Vrochidis Stefanos, Kompatsiaris Ioannis

TITLE=DL-based organism-level microbial identification via VOCs fingerprints through gas chromatography – ion mobility spectrometry

JOURNAL=Frontiers in Bacteriology

VOLUME=Volume 4 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/bacteriology/articles/10.3389/fbrio.2025.1620906

DOI=10.3389/fbrio.2025.1620906

ISSN=2813-6144

ABSTRACT=Introduction: Organism-level microbial identification is a well-established topic in the literature. Because of biosafety concerns, identifying pathogenic bacteria specifically is of critical importance. This study positions Deep Learning (DL)-based chemometric analysis as a promising strategy for organism-level microbial identification, with potential translational value for rapid diagnostics. Various chemometric methods have been applied to analyze pure and mixed cultures of microorganisms and to generate data from Volatile Organic Compound (VOC) fingerprints for classification. Although Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) is a promising chemometric technique in this field, limited research has explored its potential for organism-level microbial identification. Materials and methods: In this study, GC-IMS prototypes were employed to generate two-dimensional spectral data, which were then used to train supervised classification models. Using a publicly available dataset of four microorganisms, we conduct a series of experiments on multi-class classification of pure and mixed cultures. Additionally, we introduce new experiments for distinguishing bacteria from fungi and Gram-positive from Gram-negative bacteria. We further investigate the presence and purity of two pathogenic bacteria, Escherichia coli and Pseudomonas fluorescens, within the cultures. To this end, we apply eight Machine Learning and DL baseline methods under a five-fold cross-validation protocol and report a wide set of evaluation metrics to support the reproducibility of the results and to assess how well the models generalize.
The DL models are further evaluated in terms of training time and number of parameters. Results: Our key findings highlight a Fully Connected Neural Network (FCNN) with four hidden layers as the most efficient model, consistently achieving the best performance across all tasks among the models tested in this study. The FCNN also trains quickly and has a relatively small number of parameters compared to the other DL approaches. Discussion: Although the dataset's limited size and class imbalance pose challenges such as potential overfitting and optimistic bias, the results achieved so far are encouraging and demonstrate the model's strong potential. Future work should expand the dataset across multiple sites and instruments and include clinical validation on real-world samples to further improve generalizability and ensure translational impact.
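
The abstract mentions a five-fold cross-validation protocol on a small, class-imbalanced dataset. As a rough illustration only, the sketch below shows one way such a split could be built so that every fold keeps roughly the same class proportions. Stratification is an assumption here (the abstract does not say whether the folds were stratified), and the function name `stratified_k_fold`, the seed, and the placeholder class labels are hypothetical, not taken from the study.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so per-class counts differ by at most one across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical, imbalanced label vector (placeholder class names, not the dataset's).
labels = ["A"] * 20 + ["B"] * 10 + ["C"] * 5
splits = list(stratified_k_fold(labels, k=5))
```

Each of the five `(train_idx, test_idx)` pairs can then be used to fit a model on the training indices and score it on the held-out indices, with the metrics averaged over the folds.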