ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | doi: 10.3389/frai.2025.1579998

Dynamic Taxonomy Generation for Future Skills Identification using a Named Entity Recognition and Relation Extraction Pipeline

Provisionally accepted
Luis Jose Gonzalez-Gomez1, Sofia Margarita Hernandez-Munoz2, Abiel Borja2, Fernando A. Arana-Salas2, Jose Daniel Azofeifa1,2, Julieta Noguez2, Patricia Caratozzolo1,2*
  • 1Institute for the Future of Education, Tecnologico de Monterrey, Monterrey, México, Mexico
  • 2School of Engineering and Sciences, Tecnologico de Monterrey, Mexico City, Mexico

The final, formatted version of the article will be published soon.

The labor market is rapidly evolving, leading to a mismatch between existing Knowledge, Skills, and Abilities (KSAs) and future occupational requirements. Reports from organizations like the World Economic Forum and the OECD emphasize the need for dynamic skill identification. This paper introduces a novel system for constructing a dynamic taxonomy using Natural Language Processing (NLP) techniques, specifically Named Entity Recognition (NER) and Relation Extraction (RE), to identify and predict future skills. By leveraging machine learning models, this taxonomy aims to bridge the gap between current skills and future demands, contributing to educational and professional development. To achieve this, an NLP-based architecture was developed using a combination of text preprocessing, NER, and RE models. The NER model identifies and categorizes KSAs and occupations from a corpus of labor market reports, while the RE model establishes the relationships between these entities. A custom pipeline was used for PDF text extraction, tokenization, and lemmatization to standardize the data. The models were trained and evaluated using over 1,700 annotated documents, and the training process was optimized for accuracy in both entity recognition and relationship prediction. The NER and RE models demonstrated promising performance. The NER model achieved a best micro-averaged F1-score of 65.38% in identifying occupations, skills, and knowledge entities. The RE model subsequently reached a best micro-averaged F1-score of 82.2%, at epoch 1009, in classifying semantic relationships between these entities. The taxonomy generated from these models effectively identified emerging skills and occupations, offering insights into future workforce requirements. Visualizations of the taxonomy were created using various graph structures, demonstrating its applicability across multiple sectors.
The results indicate that this system can dynamically update and adapt to changes in skill demand over time. The dynamic taxonomy model not only provides real-time updates on current competencies but also predicts emerging skill trends, offering a valuable tool for workforce planning. The high recall rates in NER suggest strong entity recognition capabilities, though precision improvements are needed to reduce false positives. Limitations include the need for a larger corpus and sector-specific models.
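For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F1-score can be computed for entity recognition, assuming exact-match scoring over (start, end, label) spans. The function name `micro_f1`, the label strings, and the toy annotations are hypothetical illustrations and are not taken from the article's code or data.

```python
def micro_f1(gold_docs, pred_docs):
    """Micro-averaged precision, recall and F1 over all documents.

    Each argument is a list with one set per document; every set holds
    (start, end, label) tuples. A prediction counts as correct only if
    the span boundaries AND the label match exactly (an assumption made
    here; the article may use a different matching rule).
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)   # entities found with the right label
        fp += len(pred - gold)   # predicted entities that are wrong
        fn += len(gold - pred)   # gold entities that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical): two documents, three gold entities in total.
gold = [
    {(0, 2, "OCCUPATION"), (5, 7, "SKILL")},
    {(1, 3, "KNOWLEDGE")},
]
pred = [
    {(0, 2, "OCCUPATION"), (5, 7, "KNOWLEDGE")},  # second span is mislabeled
    {(1, 3, "KNOWLEDGE"), (4, 5, "SKILL")},       # one spurious extra entity
]

precision, recall, f1 = micro_f1(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case recall exceeds precision because the predictions contain more false positives than missed entities, which is the same pattern the abstract describes for the NER model. Micro-averaging pools the counts across all entity types before computing the scores, so frequent types weigh more heavily than rare ones.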

Keywords: artificial intelligence, Dynamic Taxonomy, Educational innovation, Future skills, Natural Language Processing, named entity recognition, Professional Development, word embeddings

Received: 19 Feb 2025; Accepted: 11 Jun 2025.

Copyright: © 2025 Gonzalez-Gomez, Hernandez-Munoz, Borja, Arana-Salas, Azofeifa, Noguez and Caratozzolo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Patricia Caratozzolo, Institute for the Future of Education, Tecnologico de Monterrey, Monterrey, México, Mexico

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.