
ORIGINAL RESEARCH article

Front. Med.

Sec. Hepatobiliary Diseases

Volume 12 - 2025 | doi: 10.3389/fmed.2025.1596476

This article is part of the Research Topic: Digital Technologies in Hepatology: Diagnosis, Treatment, and Epidemiological Insights

Exploratory Integration of Near-Infrared Spectroscopy with Clinical Data: A Machine Learning Approach for HCV Detection in Serum Samples

Provisionally accepted
Eloy Pérez Gómez1, José Gómez1, Jennifer Gonzalo1, Sergio Salgüero2, Daniel Riado3, María Luisa Casas4, María Luisa Gutiérrez4, Elena Jaime4, Enrique Pérez-Martínez1, Rafael García-Carretero5, Javier Ramos1, Conrado Fernández-Rodríguez4, Myriam Catalá1, Luca Martino6, Óscar Barquero-Pérez1*
  • 1Rey Juan Carlos University, Móstoles, Madrid, Spain
  • 2El Escorial Hospital, San Lorenzo de El Escorial, Madrid, Spain
  • 3Hospital Universitario Rey Juan Carlos, Madrid, Madrid, Spain
  • 4Hospital Universitario Fundación Alcorcón, Alcorcón, Madrid, Spain
  • 5Hospital Universitario de Móstoles, Móstoles, Madrid, Spain
  • 6University of Catania, Catania, Sicily, Italy

The final, formatted version of the article will be published soon.

In this study, we propose a novel approach that combines Near-Infrared Spectroscopy (NIRS) and clinical data with machine learning (ML) to improve Hepatitis C Virus (HCV) detection in serum samples. NIRS offers a fast, non-destructive, and residue-free alternative to traditional diagnostic methods, while ML models enable feature selection and predictive analysis. We applied L1-regularized Logistic Regression (L1-LR) to identify the most informative wavelengths for HCV detection within the 1000-2500 nm range, and then integrated these spectral features with routine clinical markers using a Random Forest (RF) model. Our dataset comprised 137 serum samples from 38 patients, each represented by a NIRS spectrum and clinical data from blood tests. After preprocessing with Standard Normal Variate (SNV) correction and downsampling, the best-performing RF model, which combined NIRS features and clinical data, achieved an accuracy of 72.2% and an AUC-ROC of 0.850, outperforming models using only clinical or spectral data. Feature importance analysis highlighted specific wavelengths near 1150 nm, 1410 nm, and 1927 nm, associated with water molecular states and liver function biomarkers (GPT, GOT, GGT), reinforcing the biological relevance of this approach. These findings suggest that integrating NIRS and clinical data through machine learning enhances HCV diagnostic capabilities, offering a scalable and non-invasive alternative for early detection and risk assessment.
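
The preprocessing step named in the abstract (SNV correction followed by downsampling) can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not give the downsampling method or factor, so block averaging with a factor of 2 is an assumption, as are the use of the sample standard deviation, the function names `snv` and `downsample`, and the toy absorbance values. The L1-LR wavelength selection and RF classification steps are not shown.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: center and scale a single spectrum
    by its own mean and standard deviation."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # sample standard deviation (n - 1); an assumption
    if sigma == 0:
        raise ValueError("flat spectrum: SNV is undefined")
    return [(x - mu) / sigma for x in spectrum]


def downsample(spectrum, factor):
    """Average consecutive, non-overlapping blocks of `factor` points.
    Block averaging is an assumed method; trailing points that do not
    fill a complete block are dropped."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(spectrum) // factor
    return [mean(spectrum[i * factor:(i + 1) * factor]) for i in range(n_blocks)]


# Toy absorbance values, invented purely for illustration.
raw = [0.10, 0.12, 0.15, 0.11, 0.30, 0.34, 0.20, 0.18]

# Assumed order: SNV first, then downsampling, as listed in the abstract.
processed = downsample(snv(raw), factor=2)
```

SNV is applied per spectrum, so each sample is normalized independently of the others. That is why it can be run before any train/test split without leaking information between samples.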

Keywords: NIRS, HCV, Hepatitis C, machine learning, Permutation feature importance

Received: 19 Mar 2025; Accepted: 16 May 2025.

Copyright: © 2025 Pérez Gómez, Gómez, Gonzalo, Salgüero, Riado, Casas, Gutiérrez, Jaime, Pérez-Martínez, García-Carretero, Ramos, Fernández-Rodríguez, Catalá, Martino and Barquero-Pérez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Óscar Barquero-Pérez, Rey Juan Carlos University, Móstoles, 28933, Madrid, Spain

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.