ORIGINAL RESEARCH article

Front. Comput. Sci.

Sec. Software

This article is part of the Research Topic: Artificial Intelligence for Software Engineering: Advances, Applications, and Implications

Enhancing RAPTOR with Semantic Chunking and Adaptive Graph Clustering

Provisionally accepted
  • 1 Huaqiao University School of Computer Science and Technology, Quanzhou, China
  • 2 School of Computer Science and Engineering, Changsha University, Changsha, China

The final, formatted version of the article will be published soon.

While Retrieval-Augmented Generation (RAG) significantly enhances language models, its application to long documents is often hampered by simplistic retrieval strategies that fail to capture hierarchical context. The RAPTOR framework addresses this through a recursive tree-structured approach, yet its effectiveness is constrained by semantic fragmentation from fixed-token chunking and a static clustering methodology that is suboptimal for organizing the hierarchy. In this paper, we propose a comprehensive two-stage enhancement framework to address these issues. We first employ Semantic Segmentation to generate coherent foundational leaf nodes and then introduce an Adaptive Graph Clustering (AGC) strategy, leveraging the Leiden algorithm with a novel layer-aware dual-adaptive parameter strategy to dynamically tailor clustering granularity. Extensive experiments on both the narrative QuALITY benchmark and the scientific Qasper dataset demonstrate the robustness and domain generalization of our framework. Our full model achieves a peak accuracy of 65.5% on QuALITY and demonstrates superior semantic validity on Qasper, significantly outperforming the baseline. Furthermore, comparative ablation studies reveal that our graph-topological approach outperforms traditional distance-based, density-based, and distribution-based clustering methods. In addition to performance gains, our approach constructs a dramatically more compact hierarchy, reducing the number of required summary nodes by up to 76%. This work underscores the critical importance of a holistic, semantic-first approach to building more effective and efficient retrieval trees for complex RAG tasks across diverse domains. To facilitate future research and reproducibility, we have made our source code and data publicly available at: https://github.com/Xin5643/Graph-raptor.
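
To make the first stage concrete, here is a minimal sketch of semantic segmentation in general, not the authors' implementation. It assumes a chunk boundary is placed wherever the similarity between adjacent sentences drops below a threshold. The names `embed`, `cosine`, and `semantic_chunks`, the toy bag-of-words embedding, and the threshold of 0.2 are all illustrative assumptions; a real system would use a learned sentence encoder, and the abstract does not specify the boundary criterion.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy bag-of-words vector; a stand-in for a real sentence encoder (assumption)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences, threshold=0.2):
    """Group consecutive sentences into chunks.

    A new chunk starts whenever the similarity between two adjacent
    sentences falls below `threshold` (a hypothetical value).
    """
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    prev = embed(sentences[0])
    for sentence in sentences[1:]:
        cur = embed(sentence)
        if cosine(prev, cur) < threshold:
            chunks.append([sentence])    # topic shift: open a new chunk
        else:
            chunks[-1].append(sentence)  # same topic: extend the current chunk
        prev = cur
    return chunks


sentences = [
    "The tree stores summaries of text chunks.",
    "Each summary node covers several text chunks.",
    "Bananas are rich in potassium.",
    "Potassium in bananas supports muscle function.",
]
chunks = semantic_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

Unlike fixed-token chunking, the boundaries here follow topic shifts rather than a token count, so each leaf node tends to hold one coherent idea. The second stage (Leiden-based clustering with layer-aware parameters) is not sketched because the abstract does not give its parameter schedule.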

Keywords: Adaptive Clustering, Graph Clustering, Hierarchical Retrieval, RAPTOR, Retrieval-Augmented Generation (RAG), Semantic Segmentation

Received: 21 Sep 2025; Accepted: 15 Dec 2025.

Copyright: © 2025 Liu, Xie, Wan, Pan and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Xiaodong Xie

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.