AUTHOR=Shen Chih-Hao, Huang Ruei-Hao, Li Yaw-Kuen, Chu Ta-Wei, Pei Dee
TITLE=Using machine learning methods to investigate the role of volatile organic compounds in non-alcoholic fatty liver disease
JOURNAL=Frontiers in Molecular Biosciences
VOLUME=Volume 12 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2025.1631265
DOI=10.3389/fmolb.2025.1631265
ISSN=2296-889X
ABSTRACT=Aims: Approximately 25%–30% of the global population is affected by non-alcoholic fatty liver disease (NAFLD). This study aimed to explore whether NAFLD could be effectively detected using 341 volatile organic compounds (VOCs) via 10 machine learning (Mach-L) algorithms in a cohort of 1,501 individuals. Methods: Participants were selected from the Taiwan MJ cohort, which includes comprehensive demographic, biochemical, lifestyle, and VOC data. NAFLD was diagnosed by experienced gastroenterologists. Exhaled breath samples were collected using a 1.0-L aluminum bag (late expiratory fraction) and analyzed with selected-ion flow-tube mass spectrometry. Ten Mach-L techniques were employed to evaluate two predictive models: Model 1 (demographic, lifestyle, and biochemical data) and Model 2 (Model 1 + VOCs), assessed using area under the receiver operating characteristic curve (AUC). Results: Subjects with NAFLD had significantly higher values for age, BMI, blood pressure, and other biomedical markers, except for eGFR and HDL-C. Key predictors of NAFLD included BMI, triglycerides (TG), uric acid (UA), fasting plasma glucose (FPG), γ-GT, gender, LDL-C, and sleep duration. The addition of VOCs to Model 1 improved the AUC from 0.722 ± 0.149 to 0.770 ± 0.264 (p < 0.001).
Ten VOCs were identified as the most influential, in order of importance: 2-propanol, acetone, butyl 2-methylbutanoate, diethylethanolamine, urethane, β-caryophyllene, furfural, tridecane, 4-methyloctanoic acid, and (S)-2-methyl-1-butanol. Conclusion: Incorporating VOCs into traditional demographic, biochemical, and lifestyle data significantly enhanced the model’s predictive performance. This suggests that VOCs may be associated with the underlying pathophysiology of NAFLD.
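
The abstract compares two models by AUC and reports the result as a mean ± standard deviation. The sketch below shows one way such a comparison could be computed. It is an illustration only, not the authors' pipeline: the function names `roc_auc` and `summarize`, the toy labels, and the toy scores are all hypothetical, and the abstract does not state how the study aggregated AUC across its ten algorithms.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Mean and sample standard deviation of a list of AUC values,
    e.g. one AUC per algorithm (an assumed aggregation scheme)."""
    return mean(aucs), stdev(aucs)


# Hypothetical toy data: 1 = NAFLD, 0 = no NAFLD.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical predicted risks from a model without VOCs (like Model 1)
# and a model with VOCs added (like Model 2).
model1_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
model2_scores = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]

auc_model1 = roc_auc(labels, model1_scores)
auc_model2 = roc_auc(labels, model2_scores)
print(f"Model 1 AUC: {auc_model1:.3f}")
print(f"Model 2 AUC: {auc_model2:.3f}")
```

In practice, a library routine such as scikit-learn's `roc_auc_score` would replace the pairwise loop, and `summarize` would be applied to the per-algorithm or per-fold AUC values of each model.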