ORIGINAL RESEARCH article

Front. Genet.

Sec. Computational Genomics

Volume 16 - 2025 | doi: 10.3389/fgene.2025.1650244

This article is part of the Research Topic: Insights in Computational Genomics: 2025

HFTC: A Hierarchical Fungal Taxonomic Classification Model for ITS Sequences Using Low-Dimensional Embedding Features

Provisionally accepted
Jiawei Wang1, Shaojie Qiao1*, Chao Wang2*, Dongsheng Xiang1, Yangcheng Liao1
  • 1Chengdu University of Information Technology, Chengdu, China
  • 2Guangxi Medical University, Nanning, China

The final, formatted version of the article will be published soon.

Introduction: Fungal identification through ITS sequencing is pivotal for biodiversity and ecological studies, yet existing methods often struggle with high-dimensional features and inconsistent taxonomy predictions. Method: We propose HFTC, a hierarchical fungal taxonomic classifier built upon a multi-level Random Forest architecture. HFTC incorporates a bidirectional k-mer strategy to capture contextual information from both sequence orientations. By leveraging Word2Vec embedding, it reduces the feature dimensionality from 4^k to just 200, significantly improving computational efficiency while preserving rich sequence context. Result: Experimental results demonstrate that HFTC outperforms Mothur, RDP, Sintax, QIIME2 and CNN-Duong, achieving an MCC of 95.31% despite uneven class distributions and an overall ACC of 95.25%. It attains a hierarchical accuracy (HA) of 95.10% at the species level, surpassing the best-performing deep learning baseline, CNN-Duong, by 3.2%. Moreover, HFTC exhibits the smallest discrepancy between ACC and HA, just 1.60‰, whereas CNN-Duong shows the largest gap, 35.00‰, highlighting HFTC's superior hierarchical consistency. Discussion: HFTC offers a scalable and accurate approach for fungal taxonomic classification. Its compact feature representation and hierarchical architecture make it particularly suitable for microbial diversity research.
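To make the feature-extraction idea in the abstract concrete, the following is a minimal Python sketch of tokenizing a sequence into k-mers from both orientations. It is not the authors' implementation. The abstract does not say how "both sequence orientations" is defined, so the sketch assumes the forward strand plus its reverse complement; the function names (`kmers`, `reverse_complement`, `bidirectional_kmers`, `kmer_count_dimension`), the stride of 1, and the example sequence are all hypothetical. It also shows why a plain k-mer count vector has 4^k dimensions, which is the size the 200-dimensional embedding is said to replace.

```python
from typing import List

# Complement table for the four DNA bases (assumes an unambiguous ACGT alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq: str, k: int) -> List[str]:
    """Return the overlapping k-mers of seq, read left to right with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bidirectional_kmers(seq: str, k: int) -> List[str]:
    """Tokenize seq in both orientations.

    Assumption: the second orientation is the reverse complement.
    The source abstract does not specify this detail.
    """
    seq = seq.upper()
    return kmers(seq, k) + kmers(reverse_complement(seq), k)


def kmer_count_dimension(k: int) -> int:
    """Size of a k-mer count vector over {A, C, G, T}: one slot per possible k-mer."""
    return 4 ** k


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the article's data.
    print(bidirectional_kmers("ACGTTG", 3))
    print(kmer_count_dimension(6))
```

In a pipeline of the kind the abstract describes, token lists like these would be the "sentences" passed to a Word2Vec model configured for 200-dimensional vectors, and the resulting sequence-level features would feed the per-rank Random Forest classifiers. How the k-mer vectors are pooled into one vector per sequence is not stated in the abstract and is left out here.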

Keywords: Fungal identification, ITS sequencing, Hierarchical classification, Word2Vec embedding, Random forests

Received: 19 Jun 2025; Accepted: 11 Sep 2025.

Copyright: © 2025 Wang, Qiao, Wang, Xiang and Liao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Shaojie Qiao, Chengdu University of Information Technology, Chengdu, China
Chao Wang, Guangxi Medical University, Nanning, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.