TECHNOLOGY AND CODE article
Front. Comput. Sci.
Sec. Human-Media Interaction
This article is part of the Research Topic: Artificial Intelligence for Technology Enhanced Learning
Research on an Intelligent Tutoring System Based on Automatic Construction of Multimodal Knowledge Graphs and Retrieval-Augmented Generation
Provisionally accepted
Guangdong University of Science and Technology, Dongguan, China
As a key application of technology-enhanced learning, Intelligent Tutoring Systems have long been constrained by two bottlenecks: knowledge bases that are costly to build manually and depend on domain experts, and difficulty adapting to unstructured teaching resources. At the same time, generative large language models face challenges in educational question answering, including factual inaccuracies and insufficient logical reasoning. To address these issues, this study proposes an Intelligent Tutoring System framework based on the automatic construction of multimodal knowledge graphs and Retrieval-Augmented Generation (RAG). The system integrates tools such as FFmpeg, Whisper, OCR, and layout analysis into a pipeline that extracts and constructs knowledge graphs fully automatically from both course videos and textbook PDFs. This process integrates auditory information from videos with visual and textual knowledge from textbooks. Building on this foundation, the framework combines graph retrieval and vector retrieval strategies and uses the RAG mechanism to drive large language models to generate accurate and explainable answers. Experimental results indicate that the proposed system performs favorably in knowledge graph construction, in the average accuracy and relevance of its Q&A responses, in overall user satisfaction, and in system performance. Beyond automation, the core innovation is a cross-modal fusion mechanism that aligns and integrates knowledge from auditory explanations and visual-textual textbook content, creating a unified, instructionally structured knowledge graph. The study thus provides a feasible and innovative path from multimodal resources to intelligent services for Intelligent Tutoring Systems, with significant practical implications for advancing personalized learning.
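The combination of graph retrieval and vector retrieval mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the toy triples, the text chunks, the function names, and the bag-of-words stand-in for an embedding model are all hypothetical. A real system would presumably query a graph database and a neural embedding index instead.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge graph: (head, relation, tail) triples.
TRIPLES = [
    ("binary search", "requires", "sorted array"),
    ("binary search", "has_complexity", "O(log n)"),
    ("bubble sort", "has_complexity", "O(n^2)"),
]

# Hypothetical text chunks (e.g. transcript segments or textbook paragraphs).
CHUNKS = [
    "Binary search halves the search interval at every step.",
    "Bubble sort repeatedly swaps adjacent elements that are out of order.",
    "A sorted array stores elements in non-decreasing order.",
]


def embed(text):
    """Stand-in embedding: bag-of-words counts (assumption, not a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def graph_retrieve(question):
    """Return triples whose head or tail entity is mentioned in the question."""
    q = question.lower()
    return [t for t in TRIPLES if t[0].lower() in q or t[2].lower() in q]


def vector_retrieve(question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question):
    """Merge graph facts and retrieved passages into one grounded prompt for an LLM."""
    facts = [f"{h} {r} {t}" for h, r, t in graph_retrieve(question)]
    passages = vector_retrieve(question)
    context = "\n".join(["Facts:"] + facts + ["Passages:"] + passages)
    return f"{context}\nQuestion: {question}\nAnswer using only the context above."


if __name__ == "__main__":
    print(build_prompt("What is the complexity of binary search?"))
```

In this sketch the graph side contributes precise, structured facts while the vector side contributes surrounding explanatory text; the prompt that reaches the language model contains both, which is one plausible way to ground and explain generated answers.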
Keywords: automatic knowledge graph construction, intelligent tutoring system, personalized learning, retrieval-augmented generation (RAG), technology-enhanced learning
Received: 30 Dec 2025; Accepted: 06 Feb 2026.
Copyright: © 2026 Deng and Yuan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Bo Yuan
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.