ORIGINAL RESEARCH article
Front. Mol. Biosci.
Sec. Molecular Diagnostics and Therapeutics
Volume 12 - 2025 | doi: 10.3389/fmolb.2025.1631265
This article is part of the Research Topic: Transforming Chronic Disease Treatment with AI and Big Data
Using Machine Learning Methods to Investigate the Role of Volatile Organic Compounds in Non-Alcoholic Fatty Liver Disease
Provisionally accepted
- 1 Tri-Service General Hospital, Taipei, Taiwan
- 2 National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- 3 Fu Jen Catholic University Hospital, New Taipei City, Taiwan
Approximately 25-30% of the global population is affected by non-alcoholic fatty liver disease (NAFLD). This study aimed to explore whether NAFLD could be effectively detected using 341 volatile organic compounds (VOCs) via 10 machine learning (Mach-L) algorithms in a cohort of 1,501 individuals.

Participants were selected from the Taiwan MJ cohort, which includes comprehensive demographic, biochemical, lifestyle, and VOCs data. NAFLD was diagnosed by experienced gastroenterologists. Exhaled breath samples were collected using a 1.0-L aluminum bag (late expiratory fraction) and analyzed with selected-ion flow-tube mass spectrometry. Ten Mach-L techniques were employed to evaluate two predictive models: Model 1 (demographic, lifestyle, and biochemical data), and Model 2 (Model 1 + VOCs), assessed using area under the receiver operating characteristic curve (AUC).

Subjects with NAFLD had significantly higher values for age, BMI, blood pressure, and other biomedical markers, except for eGFR and HDL-C. Key predictors of NAFLD included BMI, triglycerides (TG), uric acid (UA), fasting plasma glucose (FPG), γ-GT, gender, LDL-C, and sleep duration. The addition of VOCs to Model 1 improved the AUC from 0.722 ± 0.149 to 0.770 ± 0.264 (p < 0.001). Ten VOCs were identified as the most influential, in order of importance: 2-propanol, acetone, butyl 2-methylbutanoate, diethylethanolamine, urethane, β-caryophyllene, furfural, tridecane, 4-methyloctanoic acid, and (S)-2-methyl-1-butanol.

Incorporating VOCs into traditional demographic, biochemical, and lifestyle data significantly enhanced the model's predictive performance. This suggests that VOCs may be associated with the underlying pathophysiology of NAFLD.
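As an illustration of the evaluation metric named above, the following minimal sketch computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is not the authors' pipeline: the function name `roc_auc`, the six toy labels, and both score lists are hypothetical values chosen only to show how two models could be compared on the same subjects.

```python
def roc_auc(labels, scores):
    """Return the AUC for binary labels (1 = positive, 0 = negative).

    Uses the pairwise (Mann-Whitney) definition: the fraction of
    positive/negative pairs in which the positive case has the higher
    score, with ties counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 3 cases with NAFLD, 3 without.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical predicted probabilities from a model without VOC features.
model1_scores = [0.8, 0.4, 0.6, 0.5, 0.3, 0.7]
# Hypothetical predicted probabilities from a model with VOC features added.
model2_scores = [0.9, 0.6, 0.7, 0.5, 0.2, 0.65]

auc1 = roc_auc(labels, model1_scores)
auc2 = roc_auc(labels, model2_scores)
print(f"Model 1 AUC: {auc1:.3f}")
print(f"Model 2 AUC: {auc2:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a higher value for the second score list would indicate better discrimination on these toy data only.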
Keywords: Volatile Organic Compounds, non-alcoholic fatty liver, machine learning, AI, cohort
Received: 19 May 2025; Accepted: 18 Jun 2025.
Copyright: © 2025 Shen, Huang, Li, Chu and Pei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Ruei-Hao Huang, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Yaw-Kuen Li, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Ta-Wei Chu, Tri-Service General Hospital, Taipei, Taiwan
Dee Pei, Fu Jen Catholic University Hospital, New Taipei City, Taiwan
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.