ORIGINAL RESEARCH article
Front. Plant Sci.
Sec. Plant Physiology
Metabolomic Analysis of Yunnan Cigar Tobacco Leaves: Impact of Geography and Climate on Flavor Characteristics and Machine Learning-Based Origin Traceability
Provisionally accepted
- 1Yunnan Minzu University, Kunming, China
- 2Yunnan Academy of Tobacco Agricultural Science, Kunming, China
- 3China Tobacco Yunnan Industrial Corporation, Kunming, China
- 4Kunming University, Kunming, China
This study explores how the geographical and climatic characteristics of Yunnan shape the unique metabolic profile of its cigar tobacco leaves (CTLs) and develops a reliable method for tracing the origin of CTLs using machine learning algorithms. A total of 71 samples were collected at the country scale (Dominica, Indonesia, and Yunnan) and at the prefecture scale within Yunnan (Lincang, Pu'er, and Yuxi). A non-targeted metabolomics approach was applied to investigate the metabolic differences, and 778 highly reliable metabolites were identified. Yunnan CTLs exhibit distinct metabolic profiles associated with high altitude, large day-night temperature differences, intense ultraviolet radiation, and relative dryness (drought). Specifically, pathways such as flavone and flavonol biosynthesis and betalain biosynthesis are significantly enriched. Higher contents of polyphenols, indoles, jasmonates, carotenoids, and other compounds contribute to the distinctive woody, roasted, and astringent flavor profile of Yunnan CTLs. Twelve key biomarkers were selected using Multivariate methods with Unbiased Variable selection in R (MUVR). Machine learning algorithms, including LDA, LR, GMM, KNN, and SVM, used these biomarkers to achieve exceptional origin traceability at the national (Yunnan vs. Dominica/Indonesia) and regional (Lincang, Pu'er, Yuxi) scales. In validation, the median of 100 misclassification rates was about 0.1 and the AUC was close to 1, underscoring the models' high accuracy and robustness.
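The validation summary above (a median taken over 100 misclassification rates) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the three classes and 12 features merely mimic the shape of the problem (three origins, 12 biomarkers), and the functions `make_synthetic`, `knn_predict`, and `repeated_misclassification` are hypothetical names introduced here. It assumes that the 100 rates come from 100 repeated random train/test splits, which the abstract does not state, and it uses a plain k-nearest-neighbour classifier as a stand-in for the listed algorithms.

```python
import math
import random
import statistics


def make_synthetic(n_per_class=24, n_features=12, shift=1.5, seed=0):
    """Hypothetical stand-in data: 3 origins x 12 'biomarker' intensities."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            # Each class is shifted along its own subset of the features.
            X.append([rng.gauss(shift if j % 3 == label else 0.0, 1.0)
                      for j in range(n_features)])
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=5):
    """Classify x by majority vote among its k nearest training samples."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def repeated_misclassification(X, y, repeats=100, test_fraction=0.25,
                               k=5, seed=1):
    """Return one misclassification rate per random train/test split."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    rates = []
    for _ in range(repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        errors = sum(knn_predict(train_X, train_y, X[i], k) != y[i]
                     for i in test)
        rates.append(errors / n_test)
    return rates


X, y = make_synthetic()
rates = repeated_misclassification(X, y)
median_rate = statistics.median(rates)
print(f"median misclassification rate over {len(rates)} splits: "
      f"{median_rate:.3f}")
```

Reporting the median rather than a single split's error makes the summary less sensitive to one lucky or unlucky partition, which matters for a sample set of this size.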
Keywords: biomarkers, flavor profile, geographical origin, machine learning, metabolomics
Received: 12 Sep 2025; Accepted: 29 Dec 2025.
Copyright: © 2025 Zhao, Wu, Li, Li, Wang, Yang, Lin, Yao, Kong, Jiao, Zhao, Zhang, Zhao, Zhang and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jin Wang
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
