Composition-Centered Prediction of Kenaf Core Saccharification for Next-Generation Bioethanol via Machine Learning

Niu, Yitong; Tye, Ying Ying; Lee, Chee Keong; Ahmad, Mardiana Idayu; Leh, Cheu Peng

doi:10.3389/ffuel.2025.1722932

ORIGINAL RESEARCH article

Front. Fuels

Sec. Biofuels

This article is part of the Research Topic: Next-Generation Biofuels from Lignocellulosic Biomass: Catalytic Pathways, Microbial Engineering, and Process Integration

Provisionally accepted

Yitong Niu^*

Ying Ying Tye

Chee Keong Lee

Mardiana Idayu Ahmad

Cheu Peng Leh^*

Universiti Sains Malaysia, Minden Heights, Malaysia

The final, formatted version of the article will be published soon.

Biomass pretreatment outcomes are heterogeneous across routes and severities, and condition-centred empirical models often fail to generalize beyond the settings on which they were trained. This gap limits early-stage decisions about where to focus costly wet-lab effort. To address this, the study examines a composition-centred surrogate that treats the post-pretreatment solid composition (cellulose, hemicellulose, and lignin) as the input space and predicts enzymatic glucose yield (GY) as the response. Using n = 35 kenaf-core samples under a fixed hydrolysis protocol, random-forest models were tuned by six optimizers. Held-out performance clusters tightly (test R² ≈ 0.49-0.55; RMSE 4.42-4.69 GY%), indicating that attainable accuracy is governed more by model capacity and data coverage than by optimizer choice. Feature diagnostics converge on a cellulose-led mechanism (cellulose positive and monotonic; lignin negative; hemicellulose weaker and context-dependent). Iso-yield maps translate these patterns into feasible composition windows that prioritize high-cellulose/low-lignin targets. Given this accuracy band, the surrogate is best positioned for uncertainty-aware screening that prunes unproductive regions before targeted design-of-experiments, rather than as a replacement for those experiments.
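
For readers less familiar with the two held-out metrics quoted above, the following is a minimal sketch of how a test R² and RMSE are conventionally computed from measured and predicted glucose yields. It is not the authors' code: the function names and the five toy yield values are assumptions made purely for illustration and are not data from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the response (here GY%)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out glucose yields (GY%); NOT values from the paper.
measured = [40.0, 45.0, 50.0, 55.0, 60.0]
predicted = [44.0, 43.0, 52.0, 51.0, 58.0]

test_rmse = rmse(measured, predicted)
test_r2 = r_squared(measured, predicted)
print(f"RMSE = {test_rmse:.3f} GY%, R2 = {test_r2:.3f}")
```

Because R² is normalized by the spread of the measured yields while RMSE is not, the two numbers are best read together: RMSE gives the typical error in yield points, and R² indicates how much of the observed variation the surrogate captures.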

Keywords: bioethanol, kenaf core, pretreatment, machine learning, yield prediction, heuristic optimization

Received: 11 Oct 2025; Accepted: 07 Nov 2025.

Copyright: © 2025 Niu, Tye, Lee, Ahmad and Leh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Yitong Niu, itong_niu@163.com
Cheu Peng Leh, cpleh@usm.my
