ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | doi: 10.3389/frai.2025.1672273

This article is part of the Research Topic: Industrial Transformation through Blockchain: From Smart Manufacturing to Secure Healthcare

Adaptive Consensus Optimization in Blockchain Using Reinforcement Learning and Validation in Adversarial Environments

Provisionally accepted
  • University of the Americas, Quito, Ecuador

The final, formatted version of the article will be published soon.

The increasing complexity and decentralization of modern blockchain networks have exposed the limitations of traditional consensus protocols under adverse or dynamic conditions. Existing approaches often fail to adapt to real-time anomalies such as Sybil attacks, network congestion, or node failures, resulting in decreased throughput, increased latency, and reduced security. Furthermore, most systems lack autonomous mechanisms to adjust operational policies based on context, especially in edge computing environments where resource constraints and topological variability demand flexible and efficient solutions. This work proposes an adaptive consensus architecture that integrates a graph-based Proximal Policy Optimization (PPO) reinforcement learning agent capable of detecting malicious behavior, optimizing validation paths, and dynamically modifying consensus logic in response to adversarial scenarios. The model is trained on a hybrid dataset composed of real traffic traces and synthetically generated adversarial behaviors, and evaluated in stress-testing environments with multiple threat vectors. Experimental results demonstrate that the proposed system maintains stable throughput (transactions per second, TPS) while reducing average consensus latency by 34% relative to baseline protocols under adverse high-load conditions. Regarding security, it achieves high detection in Sybil and node-collapse scenarios (detection rate, DR, exceeding 0.90 with false positive rate, FPR, below 0.10), and moderate detection under congestion and erroneous transactions (DR between 0.58 and 0.70, FPR between 0.14 and 0.22). Additionally, average energy consumption is up to 16% lower in high-congestion settings and up to 17% lower in crash-prone scenarios. The architecture demonstrates stable convergence over 100 operating cycles and robust adaptation to topological changes, supporting its applicability in real-world deployments.
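
The abstract reports DR and FPR values without giving formulas. As a minimal sketch, assuming the conventional definitions (DR as recall on malicious nodes, FPR as the share of honest nodes wrongly flagged), the two metrics could be computed as follows. The `DetectionCounts` container, the function names, and the example counts are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Confusion counts for a binary malicious-node detector (hypothetical container)."""
    true_positives: int   # malicious nodes flagged as malicious
    false_negatives: int  # malicious nodes that were missed
    false_positives: int  # honest nodes flagged as malicious
    true_negatives: int   # honest nodes correctly left alone


def detection_rate(c: DetectionCounts) -> float:
    """DR = TP / (TP + FN); assumed definition, returns 0.0 if no malicious nodes."""
    malicious = c.true_positives + c.false_negatives
    return c.true_positives / malicious if malicious else 0.0


def false_positive_rate(c: DetectionCounts) -> float:
    """FPR = FP / (FP + TN); assumed definition, returns 0.0 if no honest nodes."""
    honest = c.false_positives + c.true_negatives
    return c.false_positives / honest if honest else 0.0


# Illustrative counts only; these are NOT results from the article.
example = DetectionCounts(true_positives=46, false_negatives=4,
                          false_positives=12, true_negatives=138)
print(f"DR={detection_rate(example):.2f}, FPR={false_positive_rate(example):.2f}")
```

Under these assumed definitions, a scenario counts as "high detection" in the abstract's sense when `detection_rate` exceeds 0.90 and `false_positive_rate` stays below 0.10.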

Keywords: Adaptive Consensus Mechanism, Reinforcement Learning in Blockchain, Malicious Node Detection, Energy-Efficient Edge Validation, Artificial Intelligence

Received: 25 Jul 2025; Accepted: 08 Sep 2025.

Copyright: © 2025 Villegas, Gutiérrez and Govea. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: William Villegas, University of the Americas, Quito, Ecuador

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.