Your new experience awaits. Try the new design now and help us make it even better

ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | doi: 10.3389/frai.2025.1660912

This article is part of the Research TopicAdvancing AI-Driven Code Generation and Synthesis: Challenges, Metrics, and Ethical ImplicationsView all articles

Blueprint2Code: A multi-agent pipeline for reliable code generation via blueprint planning and repair

Provisionally accepted
Kehao  MaoKehao MaoBaokun  HuBaokun Hu*Ruixin  LinRuixin LinZewen  LiZewen LiGuanyu  LuGuanyu LuZhengyu  ZhangZhengyu Zhang
  • Hangzhou Normal University, Hangzhou, China

The final, formatted version of the article will be published soon.

Automated programming has become a powerful tool for solving real-world problems. Code generation, in particular, plays a key role in improving developer productivity and reducing the entry barrier to software development. Recent advances in large language models (LLMs) have significantly improved program synthesis, enabling high-quality code generation from natural language. However, LLMs still struggle with complex tasks, especially in understanding problem intent, conducting multi-step reasoning, and producing code that passes all test cases. As task difficulty increases, existing models often fail to devise complete and reliable generation strategies, leading to reduced accuracy and robustness. To address these limitations, we propose Blueprint2Code, an innovative multi-agent framework for code generation. It emulates the human programming workflow through the coordinated interaction of four agents—Previewing, Blueprint, Coding, and Debugging—forming a closed-loop system from task comprehension to planning, implementation, and iterative refinement. Compared to existing methods, Blueprint2Code shows superior performance on complex programming tasks. Extensive experiments on benchmark datasets—HumanEval, MBPP, their extended versions (HumanEval-ET, MBPP-ET), and the APPS competition dataset—demonstrated its effectiveness, achieving strong pass@1 results: HumanEval 96.3%, MBPP 88.4%, HumanEval-ET 86.5%, MBPP-ET 59.4%, and APPS 24.6%. The related code is available at https://github.com/MKH99918/Blueprint2Code.

Keywords: Code generation, Large language models, multi-agent systems, Program synthesis, automated debugging, Blueprint planning

Received: 07 Jul 2025; Accepted: 30 Sep 2025.

Copyright: © 2025 Mao, Hu, Lin, Li, Lu and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Baokun Hu, bourne@hznu.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.