ORIGINAL RESEARCH article

Front. Bacteriol.

Sec. Bacterial Genetics and AI-enhanced Microbial Engineering

Volume 4 - 2025 | doi: 10.3389/fbrio.2025.1620906

DL-Based Organism-Level Microbial Identification via VOCs Fingerprints through Gas Chromatography – Ion Mobility Spectrometry

Provisionally accepted
  • Information Technologies Institute (ITI), Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece

The final, formatted version of the article will be published soon.

Organism-level microbial identification is a well-established topic in the literature. Due to biosafety concerns, the specific identification of pathogenic bacteria is of critical importance. This study positions Deep Learning (DL)-based chemometric analysis as a promising strategy for organism-level microbial identification, with potential translational value for rapid diagnostics. Various chemometric methods have been applied to pure and mixed cultures of microorganisms to generate Volatile Organic Compound (VOC) fingerprints for classification. Although Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) is a promising chemometric technique in this field, little research has explored its potential for organism-level microbial identification. Using two-dimensional spectral data generated by GC-IMS prototypes, we train DL models for supervised classification. On a publicly available dataset of four microorganisms, we conduct a series of experiments on the multi-class classification of pure and mixed cultures. Additionally, we introduce novel experiments for distinguishing bacteria from fungi and Gram-positive from Gram-negative bacteria. We further investigate the presence and purity of two pathogenic bacteria, Escherichia coli and Pseudomonas fluorescens, within the cultures. To this end, we apply eight Machine Learning (ML) and DL baseline methods, follow a five-fold cross-validation evaluation protocol, and report a wide set of evaluation metrics to support reproducibility and assess generalization. We also evaluate the DL models in terms of training time and number of parameters. Our key findings highlight a Fully Connected Neural Network with four hidden layers as the most efficient model, consistently achieving the best performance across all tasks. 
This model trains quickly and has relatively few parameters compared with the other DL approaches. Despite these promising results, the limited size and class imbalance of the dataset pose risks of overfitting and optimistic bias. Future work will focus on building larger, multi-site datasets across diverse instruments and laboratories, alongside clinical validation on real-world samples, to assess model generalization and translational impact.
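The evaluation protocol summarized above (five-fold cross-validation, a set of evaluation metrics, and a comparison of model sizes) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say whether the folds were stratified, so stratification by class is assumed here because of the class imbalance noted above. Macro-averaged F1 is one example metric chosen for illustration. The input dimension (1000 features from a flattened GC-IMS spectrum), the hidden-layer widths (512, 256, 128, 64) and the class labels are hypothetical values, not the published architecture or data.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions.

    Stratification is an assumption of this sketch. Returns a list of
    (train_indices, test_indices) pairs, one per fold.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # rotates the starting fold so total fold sizes stay balanced
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)

    all_idx = set(range(len(labels)))
    return [(sorted(all_idx - set(f)), sorted(f)) for f in folds]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fcnn_param_count(n_inputs, hidden_sizes, n_classes):
    """Count weights plus biases of a fully connected network (dense layers only)."""
    sizes = [n_inputs, *hidden_sizes, n_classes]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    # Hypothetical labels: 10 samples of one culture and 5 of another.
    labels = ["E. coli"] * 10 + ["P. fluorescens"] * 5
    for fold, (train, test) in enumerate(stratified_kfold(labels, k=5)):
        print(fold, len(train), len(test))

    # Hypothetical FCNN: 1000 input features, four hidden layers, 4 classes.
    print(fcnn_param_count(1000, [512, 256, 128, 64], 4))  # 685252
```

Macro-averaging weights every class equally, so a minority class with poor recall pulls the score down; this is why it is often reported alongside accuracy on imbalanced data. The parameter count grows with the product of adjacent layer widths, which is why the first dense layer on a flattened 2D spectrum usually dominates an FCNN's size.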

Keywords: biosafety, Organism-Level Microbial Identification, volatile organic compounds (VOCs), chemometrics, GC-IMS, deep learning, machine learning

Received: 30 Apr 2025; Accepted: 31 Aug 2025.

Copyright: © 2025 Kirtsanis, Dolias, Kintzios, Ioannidis, Vrochidis and Kompatsiaris. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Georgios Kirtsanis, Information Technologies Institute (ITI), Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.