
TECHNOLOGY AND CODE article

Front. Comput. Sci.

Sec. Human-Media Interaction

This article is part of the Research Topic: Artificial Intelligence for Technology Enhanced Learning

Research on an Intelligent Tutoring System Based on Automatic Construction of Multimodal Knowledge Graphs and Retrieval-Augmented Generation

Provisionally accepted
  • Guangdong University of Science and Technology, Dongguan, China

The final, formatted version of the article will be published soon.

As a key application of technology-enhanced learning, Intelligent Tutoring Systems have long been constrained by two bottlenecks: knowledge bases that are costly to build by hand and depend on domain experts, and difficulty adapting to unstructured teaching resources. At the same time, generative large language models face challenges in educational question answering, including factual inaccuracies and insufficient logical reasoning. To address these issues, this study proposes a framework for an Intelligent Tutoring System based on the automatic construction of multimodal knowledge graphs and Retrieval-Augmented Generation (RAG). The system integrates FFmpeg, Whisper, OCR, and layout analysis into a pipeline that automatically extracts and constructs knowledge graphs from both course videos and textbook PDFs. This process integrates auditory information from videos with visual and textual knowledge from textbooks. Building on this foundation, the framework combines graph retrieval and vector retrieval strategies, using the RAG mechanism to drive large language models to generate accurate and explainable answers. Experimental results indicate favorable outcomes for knowledge graph construction, the average accuracy and relevance of intelligent Q&A responses, overall user satisfaction, and system performance. Beyond automation, the core innovation is a cross-modal fusion mechanism that aligns and integrates knowledge from auditory explanations and visual-textual textbook content, creating a unified, instructionally structured knowledge graph. This study thus provides a feasible path from multimodal resources to intelligent services for Intelligent Tutoring Systems, with practical implications for advancing personalized learning.
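To make the combination of graph retrieval and vector retrieval concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy in-memory list of triples in place of a real knowledge graph, and bag-of-words cosine similarity in place of learned embeddings. All names (`TRIPLES`, `PASSAGES`, `graph_retrieve`, `vector_retrieve`, `build_prompt`) and the sample data are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data standing in for an automatically built knowledge
# graph ((head, relation, tail) triples) and a passage store.
TRIPLES = [
    ("binary search", "requires", "sorted array"),
    ("binary search", "has_complexity", "O(log n)"),
    ("bubble sort", "has_complexity", "O(n^2)"),
]
PASSAGES = [
    "Binary search halves the search interval at every step.",
    "Bubble sort repeatedly swaps adjacent elements that are out of order.",
    "A hash table maps keys to values using a hash function.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def graph_retrieve(question, triples):
    """Return triples whose head or tail entity is mentioned in the question."""
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]


def vector_retrieve(question, passages, k=2):
    """Return the k passages most similar to the question (assumed stand-in
    for embedding-based vector search)."""
    qv = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(qv, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, triples=TRIPLES, passages=PASSAGES, k=1):
    """Merge graph facts and retrieved passages into one grounded prompt."""
    facts = [f"{h} {r} {t}" for h, r, t in graph_retrieve(question, triples)]
    texts = vector_retrieve(question, passages, k)
    context = "\n".join(["Facts:"] + facts + ["Passages:"] + texts)
    return (
        f"{context}\nQuestion: {question}\n"
        "Answer using only the context above."
    )


if __name__ == "__main__":
    print(build_prompt("What is the complexity of binary search?"))
```

In a sketch like this, the graph facts supply explicit, traceable relations while the passages supply surrounding explanation. The resulting prompt would then be passed to a large language model, which is outside the scope of this example.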

Keywords: Automatic knowledge graph construction, Intelligent Tutoring System, personalized learning, Retrieval-Augmented Generation (RAG), Technology-Enhanced Learning

Received: 30 Dec 2025; Accepted: 06 Feb 2026.

Copyright: © 2026 Deng and Yuan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Bo Yuan

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.