ORIGINAL RESEARCH article

Front. Mol. Biosci.

Sec. Molecular Diagnostics and Therapeutics

Volume 12 - 2025 | doi: 10.3389/fmolb.2025.1631265

This article is part of the Research Topic: Transforming Chronic Disease Treatment with AI and Big Data

Using Machine Learning Methods to Investigate the Role of Volatile Organic Compounds in Non-Alcoholic Fatty Liver Disease

Provisionally accepted
Chih-Hao Shen1, Ruei-Hao Huang2*, Yaw-Kuen Li2*, Ta-Wei Chu1*, Dee Pei3*
  • 1Tri-Service General Hospital, Taipei, Taiwan
  • 2National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  • 3Fu Jen Catholic University Hospital, New Taipei City, Taiwan

The final, formatted version of the article will be published soon.

Approximately 25-30% of the global population is affected by non-alcoholic fatty liver disease (NAFLD). This study aimed to explore whether NAFLD could be effectively detected using 341 volatile organic compounds (VOCs) via 10 machine learning (Mach-L) algorithms in a cohort of 1,501 individuals.

Participants were selected from the Taiwan MJ cohort, which includes comprehensive demographic, biochemical, lifestyle, and VOC data. NAFLD was diagnosed by experienced gastroenterologists. Exhaled breath samples were collected using a 1.0-L aluminum bag (late expiratory fraction) and analyzed with selected-ion flow-tube mass spectrometry. Ten Mach-L techniques were employed to evaluate two predictive models: Model 1 (demographic, lifestyle, and biochemical data) and Model 2 (Model 1 + VOCs), assessed using the area under the receiver operating characteristic curve (AUC).

Subjects with NAFLD had significantly higher values for age, BMI, blood pressure, and other biomedical markers, except for eGFR and HDL-C. Key predictors of NAFLD included BMI, triglycerides (TG), uric acid (UA), fasting plasma glucose (FPG), γ-GT, gender, LDL-C, and sleep duration. The addition of VOCs to Model 1 improved the AUC from 0.722 ± 0.149 to 0.770 ± 0.264 (p < 0.001). Ten VOCs were identified as the most influential, in order of importance: 2-propanol, acetone, butyl 2-methylbutanoate, diethylethanolamine, urethane, β-caryophyllene, furfural, tridecane, 4-methyloctanoic acid, and (S)-2-methyl-1-butanol.

Incorporating VOCs into traditional demographic, biochemical, and lifestyle data significantly enhanced the model's predictive performance. This suggests that VOCs may be associated with the underlying pathophysiology of NAFLD.
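As an illustration of the evaluation metric named in the abstract, the following minimal Python sketch computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and compares two sets of predicted scores. It is not the authors' pipeline: the function name `roc_auc`, the labels, and both score lists are hypothetical toy values chosen only to show how an AUC comparison between a baseline model and an extended model might be set up.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC via pairwise comparison of positives and negatives.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 if the scores tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = NAFLD, 0 = no NAFLD.
labels = [1, 1, 1, 0, 0, 0]
# Made-up predicted probabilities from a baseline model (analogous to Model 1)
model1_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
# Made-up predicted probabilities from an extended model (analogous to Model 2)
model2_scores = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]

auc1 = roc_auc(labels, model1_scores)
auc2 = roc_auc(labels, model2_scores)
print(f"baseline AUC = {auc1:.3f}, extended AUC = {auc2:.3f}")
```

In practice, a comparison like the one reported would be repeated across resampled train/test splits and several algorithms, with the AUC values summarized as a mean and standard deviation; the pairwise definition above is quadratic in sample size and is meant only to make the metric concrete.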

Keywords: Volatile Organic Compounds, non-alcoholic fatty liver, machine learning, AI, cohort

Received: 19 May 2025; Accepted: 18 Jun 2025.

Copyright: © 2025 Shen, Huang, Li, Chu and Pei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Ruei-Hao Huang, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Yaw-Kuen Li, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Ta-Wei Chu, Tri-Service General Hospital, Taipei, Taiwan
Dee Pei, Fu Jen Catholic University Hospital, New Taipei City, Taiwan

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.