- 1Department of Electrical Engineering, Shiraz University of Technology, Shiraz, Iran
- 2Electrical and Computer Engineering Department, University of Alberta, Edmonton, AB, Canada
Introduction: Energy efficiency is a critical challenge in Beyond-5G (B5G) cellular networks, where ground base stations (GBSs) are responsible for a substantial portion of network energy consumption. Reducing this consumption while maintaining minimum user data rate requirements remains a key research problem.
Methods: This paper proposes an Aerial Base Station (ABS)-assisted energy optimization framework that integrates ABS deployment with low-power sleep states of GBSs. Traffic is selectively offloaded from lightly loaded GBSs to ABSs, enabling energy savings without violating user quality-of-service constraints. A Deep Deterministic Policy Gradient (DDPG) algorithm is employed to jointly optimize ABS positioning, GBS sleep mode scheduling, and resource allocation under dynamic traffic conditions.
Results: Simulation results demonstrate that the proposed DDPG-based framework significantly reduces network energy consumption while improving achievable user data rates compared to baseline schemes without ABS assistance or learning-based optimization.
Discussion: The results highlight the effectiveness of integrating ABSs with GBS low-power sleep states using reinforcement learning. By enforcing minimum data rate constraints and dynamically adapting to traffic variations, the proposed approach provides a scalable and energy-efficient solution for sustainable operation.
1 Introduction
The evolution of Beyond 5G (B5G) technology builds upon the worldwide deployment of fifth-generation (5G) networks, aiming to meet growing connectivity demands through more advanced standards and innovative communication solutions. B5G is expected to deliver enhanced mobility, superior reliability, ultra-high data rates, intelligent network management, and improved energy efficiency. With increasing dependency on high-speed and ubiquitous communication, B5G networks are anticipated to progress toward greater resilience and maturity. To enable this transition, key technologies such as Artificial Intelligence (AI), Edge Computing, Reconfigurable Intelligent Surfaces (RIS), Terahertz (THz) communication, Quantum Computing, and Unmanned Aerial Vehicles (UAVs) are being actively explored (Sufyan et al., 2023; Puspitasari et al., 2023; Dogra et al., 2020; Alsharif et al., 2018).
The potential of UAVs in cellular and wireless networks has gained substantial attention in recent literature. UAV-assisted communication systems have shown promise in extending network coverage beyond the limitations of terrestrial infrastructure, enhancing link reliability, and offering flexible, resilient, and sustainable connectivity in diverse deployment scenarios (Gryech et al., 2024). A prominent outcome of this integration is the development of Aerial Access Networks (AANs), where UAVs are utilized to provide communication services from the air. Unlike Terrestrial Access Networks (TANs), AANs overcome geographic and infrastructural constraints, offering wide-area coverage, improved communication quality, and high mobility support—particularly in remote or hard-to-reach environments where TANs may be infeasible (Behjati et al., 2025). In such systems, UAVs often operate as Aerial Base Stations (ABSs) or airborne relays, enabling direct access to users and thereby improving service availability and network adaptability.
Significant research efforts have been directed toward the integration of UAVs into B5G networks, addressing challenges such as trajectory optimization, user association mechanisms, transmission power management, and the cooperative deployment of UAVs with Intelligent Reflecting Surfaces (IRS) for improved signal propagation (Banafaa et al., 2024; Qazzaz et al., 2024; Shahzadi et al., 2021; Geraci et al., 2022; Gu and Zhang, 2023; Sarkar and Gul, 2023; Jangsher et al., 2022). These advancements improve performance, reliability, and energy efficiency in future wireless communication systems (Amponis et al., 2022).
A diverse set of techniques has been proposed to effectively reduce energy consumption in drone networks (Abubakar et al., 2023). These approaches encompass resource management (Masroor et al., 2021; Basharat et al., 2022), flight and transmission scheduling (Wu et al., 2022), path planning (Azadur et al., 2024), and optimal placement and trajectory design (Elnabty et al., 2022; Won et al., 2023; Azarhava et al., 2024; Tung et al., 2022).
Beyond UAV-assisted solutions, a key energy challenge in B5G networks is the high power consumption of Ground Base Stations (GBSs). To mitigate this issue, researchers are actively developing optimization strategies to improve GBS energy efficiency. One widely adopted approach, applicable to both terrestrial networks and ABS-assisted frameworks, is the GBS sleep strategy, which dynamically deactivates underutilized GBSs with a small number of associated users while ensuring that user data rate requirements are satisfied.
GBS sleep strategies are designed to identify optimal opportunities for base stations to enter sleep mode without compromising network coverage or service quality (López-Pérez et al., 2022). These strategies can be broadly classified into binary on/off schemes and multi-level sleep modes, each offering distinct mechanisms for reducing energy consumption while maintaining network performance.
The binary scheme conserves energy by deactivating underutilized GBSs; however, this approach may negatively impact data transmission rates (Kim et al., 2015; Kooshki et al., 2023). To optimize energy efficiency in ultra-dense networks, researchers in (Amine et al., 2022) and (Ju et al., 2022) proposed reinforcement learning (RL)-based cell switching algorithms for managing small cells. Specifically, the work in (Ju et al., 2022) introduces a Decision Selection Network (DSN) to streamline the action space within a Deep Reinforcement Learning (DRL) framework, demonstrating effective management of active and sleep modes while maintaining essential data rate requirements.
In contrast, multi-level sleep modes leverage mobile traffic prediction to dynamically transition idle small cells into different sleep states, further optimizing energy efficiency while ensuring network performance (Kim et al., 2023).
An often-overlooked application of drones is their use as Aerial Base Stations (ABSs) to facilitate GBS sleep strategies. In Chowdary et al. (2021), the authors propose a resource allocation algorithm that leverages active GBSs to serve users in areas where some GBSs have entered sleep mode. While this approach ensures user connectivity in sleeping areas, it introduces challenges such as service instability for users in non-sleeping areas and increased algorithmic complexity due to resource reallocation after GBS deactivation. Moreover, this work does not explore the potential of drones operating explicitly as ABSs to enable GBS sleep modes.
Motivated by these limitations, we propose an ABS-assisted GBS sleep strategy that selectively allows lightly loaded GBSs to enter sleep mode during periods of reduced traffic demand. This approach minimizes overall network energy consumption while ensuring that each ABS satisfies the minimum data rate required to maintain Quality of Service (QoS) during GBS downtime. Our analysis demonstrates that effective ABS deployment not only enhances overall network transmission rates but also significantly reduces GBS power consumption.
To further amplify the energy-saving benefits, we introduce a joint optimization framework that integrates GBS sleep scheduling, resource allocation, and ABS position optimization to minimize network-wide energy consumption. The resulting decision-making process forms a complex binary integer programming problem, motivating the need for efficient learning-based optimization techniques.
To address this challenge, we propose a hybrid Deep Reinforcement Learning (DRL) framework that combines the Deep Deterministic Policy Gradient (DDPG) and Double Deep Q-Learning (DDQL) algorithms. The rationale for this integration is that the considered optimization includes both continuous variables (e.g., ABS horizontal positioning and transmit power allocation) and discrete decisions (e.g., GBS sleep mode control and association selection). DDPG is well-suited to continuous control, whereas DDQL is effective in discrete action spaces and reduces Q-value overestimation. By combining them, the proposed framework can efficiently handle the mixed discrete–continuous action space in a unified learning process.
Unless otherwise stated, users are assumed quasi-static during each optimization interval, i.e., user locations remain fixed while decisions on association, sleep scheduling, and resource allocation are optimized.
The remainder of this paper is organized as follows: Section 2 introduces the system model, including the sleep model, terrestrial and aerial channel models, and the power consumption model, followed by the optimization problem formulation. Section 3 presents the theoretical preliminaries of the DRL algorithms and details the proposed hybrid framework. Section 4 discusses the simulation results and provides an in-depth analysis. Finally, Section 5 concludes the paper with key findings.
2 System model
This section provides a structured overview of the key components considered in this study, including the network architecture, user/ABS location assumptions, channel models, energy consumption model, and problem formulation.
As illustrated in Figure 1, this study focuses on an ABS-assisted downlink wireless network. The analysis is conducted within a designated region of interest that is partitioned into cells, each served by one of $B$ GBSs indexed by $b \in \{1, \dots, B\}$.
Figure 1. A wireless network consisting of multiple Ground Base Stations (GBSs) and Aerial Base Stations (ABSs), where the GBSs are equipped with sleep mode capability to optimize energy efficiency.
Additionally, the network includes $U$ ABSs, indexed by $u \in \{1, \dots, U\}$, which hover above the region and can absorb the traffic of GBSs that enter sleep mode.
The network employs Orthogonal Frequency Division Multiplexing (OFDM) to serve users across $N$ orthogonal subcarriers, indexed by $n \in \{1, \dots, N\}$ and each of bandwidth $W$, so that users scheduled on different subcarriers of the same node do not interfere with one another. The set of users is indexed by $k \in \{1, \dots, K\}$.
To model user association, we define a binary variable $a_{k,i}^{n} \in \{0, 1\}$ that equals one if the $k$-th user is served by node $i$ (a GBS or an ABS) on subcarrier $n$, and zero otherwise, where the total number of users in the network satisfies (Equation 4): $K = \sum_{i} K_i$, with $K_i$ denoting the number of users associated with node $i$.
To represent the operational status of each GBS, we introduce a binary sleep indicator $s_b \in \{0, 1\}$, where $s_b = 1$ denotes that the $b$-th GBS is active and $s_b = 0$ denotes that it is in sleep mode.
The $u$-th ABS hovers at horizontal position $(x_u, y_u)$ and altitude $h_u$, while each GBS is positioned at the center of its respective cell. The distance between the $k$-th user, located at $(x_k, y_k, 0)$, and the $u$-th ABS is given by (Equation 5):
$d_{k,u} = \sqrt{(x_u - x_k)^2 + (y_u - y_k)^2 + h_u^2},$
where the GBS–user distances are computed analogously from the fixed GBS coordinates.
2.1 Air-to-ground (A2G) channel model
The Line-of-Sight (LoS) channel model is commonly employed in UAV-assisted networks to facilitate communication between Aerial Base Stations (ABSs) and Cellular Users (CUs) (Kim et al., 2015; Khawaja et al., 2019). The expected channel power gain from the $u$-th ABS to the $k$-th user is modeled as (Equation 6):
$h_{k,u} = \beta_0 \, d_{k,u}^{-\alpha},$
where $\beta_0$ denotes the channel power gain at a reference distance of 1 m and $\alpha$ is the path-loss exponent. Furthermore, the probability of establishing an LoS link, $P_{\mathrm{LoS}}$, follows the widely used sigmoid model (Equation 7):
$P_{\mathrm{LoS}}(\theta_{k,u}) = \frac{1}{1 + a \exp\bigl(-b(\theta_{k,u} - a)\bigr)},$
where $a$ and $b$ are environment-dependent constants and
$\theta_{k,u} = \frac{180}{\pi} \arcsin\!\left(\frac{h_u}{d_{k,u}}\right)$
denotes the elevation angle between the $u$-th ABS and the $k$-th user.
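To make the A2G model concrete, the following minimal Python sketch evaluates the elevation angle, the sigmoid LoS probability, and the resulting expected channel gain. The environment constants `a_env` and `b_env`, the reference gain `beta_0`, the exponent `alpha`, and the NLoS attenuation factor `kappa` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def a2g_expected_gain(user_xy, abs_xy, h_abs,
                      a_env=9.61, b_env=0.16,   # assumed urban environment constants
                      beta_0=1e-4, alpha=2.0,   # assumed 1 m reference gain and exponent
                      kappa=0.2):               # assumed extra NLoS attenuation factor
    """Expected A2G channel power gain under the standard sigmoid LoS model."""
    d_2d = np.linalg.norm(np.asarray(user_xy) - np.asarray(abs_xy))
    d_3d = np.hypot(d_2d, h_abs)                         # 3-D user-ABS distance
    theta = np.degrees(np.arcsin(h_abs / d_3d))          # elevation angle in degrees
    p_los = 1.0 / (1.0 + a_env * np.exp(-b_env * (theta - a_env)))
    gain_los = beta_0 * d_3d ** (-alpha)                 # LoS channel power gain
    return p_los * gain_los + (1.0 - p_los) * kappa * gain_los

# Example: user 100 m away horizontally, ABS hovering at 120 m altitude.
print(a2g_expected_gain((0.0, 0.0), (100.0, 0.0), 120.0))
```

Raising the ABS altitude increases the elevation angle and hence the LoS probability, at the cost of a longer link distance; this trade-off is what the ABS position optimization exploits.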
2.2 Ground-to-ground (G2G) channel model
For the terrestrial part of the network, which refers to the channel between Ground Base Stations (GBSs) and users, we adopt a fading channel model. The small-scale fading coefficient, $g_{k,b}^{n}$, between the $b$-th GBS and the $k$-th user on subcarrier $n$ is modeled as a zero-mean, unit-variance circularly symmetric complex Gaussian random variable, corresponding to Rayleigh fading, where the fading realizations are independent across users and subcarriers. The large-scale path loss (in dB) follows an empirical distance-dependent model of the Okumura–Hata/COST-231 family (Singh, 2012), where the model constants depend on the carrier frequency and antenna heights. Adding shadowing to the path loss, we have (Equation 13):
$\mathrm{PL}_{k,b} = \mathrm{PL}(d_{k,b}) + X_\sigma,$
where $X_\sigma$ is a zero-mean Gaussian random variable (in dB) whose standard deviation $\sigma$ captures log-normal shadowing. Hence, the channel gain is expressed as (Equation 14):
$h_{k,b}^{n} = 10^{-\mathrm{PL}_{k,b}/10} \, \bigl|g_{k,b}^{n}\bigr|^{2}.$
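A short sketch of how a composite G2G gain could be drawn under these assumptions: a generic log-distance path loss stands in for the empirical model, and the reference loss `pl0_db`, exponent `alpha`, and shadowing deviation `sigma_sh_db` are placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)

def g2g_channel_gain(d_m, pl0_db=30.0, alpha=3.5, sigma_sh_db=8.0):
    """Composite G2G power gain: path loss + log-normal shadowing + Rayleigh fading."""
    pl_db = pl0_db + 10.0 * alpha * np.log10(d_m)       # distance-based path loss (dB)
    pl_db += rng.normal(0.0, sigma_sh_db)               # zero-mean Gaussian shadowing (dB)
    g = (rng.normal() + 1j * rng.normal()) / np.sqrt(2)  # CN(0, 1) small-scale fading
    return 10.0 ** (-pl_db / 10.0) * np.abs(g) ** 2

print(g2g_channel_gain(250.0))
```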
The Signal-to-Interference-plus-Noise Ratio (SINR) of the $k$-th user served by the $b$-th GBS on subcarrier $n$ is given by (Equation 15):
$\gamma_{k,b}^{n} = \dfrac{p_{b}^{n} \, h_{k,b}^{n}}{\sum_{b' \neq b} s_{b'} \, p_{b'}^{n} \, h_{k,b'}^{n} + \sum_{u} p_{u}^{n} \, h_{k,u} + \sigma_{0}^{2}},$
where $p_i^{n}$ is the transmit power of node $i$ on subcarrier $n$ and $\sigma_0^2$ is the noise power. Similarly, for the $k$-th user served by the $u$-th ABS, the SINR $\gamma_{k,u}^{n}$ is obtained with the ABS link as the desired signal and all active co-channel GBSs as interferers (Equation 16). The achievable data rate for user $k$ is then the Shannon rate accumulated over its assigned subcarriers (Equation 17):
$R_k = \sum_{n=1}^{N} \sum_{i} a_{k,i}^{n} \, W \log_2\!\left(1 + \gamma_{k,i}^{n}\right).$
If no user is served on a given subcarrier of a node, no power is allocated to it, so it contributes neither rate nor interference.
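The per-subcarrier rate computation then reduces to a few lines; the subcarrier bandwidth `bw_hz` and noise power `noise_w` below are placeholder values, and interference is summed only over co-channel transmitters that are actually active.

```python
import numpy as np

def user_rate(p_serv, g_serv, p_interf, g_interf, bw_hz=180e3, noise_w=1e-13):
    """Per-subcarrier Shannon rate (bits/s) for one served user."""
    interference = np.sum(np.asarray(p_interf) * np.asarray(g_interf))
    sinr = p_serv * g_serv / (interference + noise_w)
    return bw_hz * np.log2(1.0 + sinr)

# Serving link plus two active co-channel interferers.
print(user_rate(1.0, 1e-9, [1.0, 0.5], [1e-11, 2e-11]))
```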
2.3 Power consumption model
The total power consumption in UAV-assisted networks comprises two primary components: the power consumption of the ground network and the power consumption of UAVs.
2.3.1 Power consumption of the ground network
The power consumption of the ground network includes the following components:
1. Transmission Power: the total radio-frequency power radiated by the $b$-th GBS, $\sum_{n} p_b^{n}$, scaled by the drain efficiency of its power amplifier.
2. Circuit Power: a fixed term accounting for baseband processing and the RF chains, drawn whenever the GBS is active.
3. Mode-dependent Power Consumption: The power consumed by a GBS due to its active or sleep mode operation, including power supply and air conditioning, given by (Equation 18):
$P_b^{\mathrm{mode}} = s_b \, P_{\mathrm{active}} + (1 - s_b) \, P_{\mathrm{sleep}},$
where $P_{\mathrm{active}}$ and $P_{\mathrm{sleep}}$ denote the fixed power drawn in the active and sleep states, respectively, with $P_{\mathrm{sleep}} \ll P_{\mathrm{active}}$.
4. Mode Transition Power: The power associated with transitioning the base station $b$ between the active and sleep states, given by (Equation 19), where a fixed switching cost is incurred whenever the sleep indicator changes between consecutive decision intervals.
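A compact sketch of the resulting GBS power model under the stated assumptions; the numeric defaults (`p_active`, `p_sleep`, the amplifier drain efficiency `drain_eff`, and the transition cost `p_trans`) are illustrative, not the paper's simulation values.

```python
def gbs_power(s_now, s_prev, p_tx_sum,
              p_active=130.0, p_sleep=75.0,   # assumed fixed powers (W)
              drain_eff=4.7, p_trans=20.0):   # assumed PA inefficiency and switch cost (W)
    """Mode-dependent GBS power: amplifier-scaled transmit power when active,
    a reduced fixed draw when asleep, plus a one-off mode-transition term."""
    p_mode = s_now * (p_active + drain_eff * p_tx_sum) + (1 - s_now) * p_sleep
    p_switch = p_trans if s_now != s_prev else 0.0   # charged only on a state change
    return p_mode + p_switch

print(gbs_power(s_now=0, s_prev=1, p_tx_sum=0.0))    # GBS that just entered sleep
```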
2.3.2 Power consumption of UAVs
The power consumption of a UAV consists of three main components (Equation 20):
$P_{\mathrm{UAV}} = P_{\mathrm{hov}} + P_{\mathrm{fly}} + P_{\mathrm{com}},$
where $P_{\mathrm{hov}}$ is the power required to hover at a fixed position, $P_{\mathrm{fly}}$ is the propulsion power consumed while travelling between positions, and $P_{\mathrm{com}} = \sum_{n} p_{u}^{n}$ is the communication transmit power. The hovering and propulsion terms are determined by the UAV's weight, rotor characteristics, and velocity (Ghorbel et al., 2019), and they typically exceed the communication power by several orders of magnitude.
2.3.3 Total weighted power consumption
In UAV-based networks, the energy consumed by UAVs for hovering and flying is typically much higher than the power required for communication. To address this imbalance, a weighting factor $\omega \in (0, 1]$ is applied to the UAV's mechanical power so that it does not dominate the communication-related terms in the objective.
The total weighted power consumption of the UAV-assisted network is given by (Equation 23):
$P_{\mathrm{total}} = \sum_{b=1}^{B} P_{b}^{\mathrm{GBS}} + \omega \left(P_{\mathrm{hov}} + P_{\mathrm{fly}}\right) + P_{\mathrm{com}},$
where $P_b^{\mathrm{GBS}}$ collects the ground-network terms of Section 2.3.1. In the simulations, the weighting factor is set to a small value so that variations in communication power remain visible in the objective.
2.4 Energy efficiency
The Energy Efficiency (EE) criterion serves as a critical framework for evaluating the effectiveness of resource allocation within the network, particularly when GBSs are in sleep mode. By computing this metric, we ensure that reducing the power consumption of GBSs does not compromise users' quality of service. The EE criterion is defined as (Equation 24):
$\mathrm{EE} = \frac{\sum_{k=1}^{K} R_k}{P_{\mathrm{total}}},$
where the numerator is the network sum rate from (Equation 17) and the denominator is the total weighted power consumption from (Equation 23), so that EE is measured in bits per joule.
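Putting the pieces together, the EE metric can be computed as below; the weighting factor `omega` and the sample numbers are illustrative assumptions.

```python
def energy_efficiency(rates_bps, p_gbs_w, p_uav_mech_w, p_uav_com_w, omega=1e-2):
    """EE in bits/joule: network sum rate over total weighted power.
    omega down-weights the (much larger) UAV hovering/flight power."""
    p_total = sum(p_gbs_w) + omega * p_uav_mech_w + p_uav_com_w
    return sum(rates_bps) / p_total

# Two users, one active and one sleeping GBS, a hovering ABS.
print(energy_efficiency([2e6, 5e6], [210.0, 75.0], 1500.0, 5.0))
```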
2.5 Optimization problem formulation
The optimization problem for maximizing the energy efficiency of the cellular network is formulated as (Equation 25):
$\max_{\{p_i^{n}\},\, \{s_b\},\, \{a_{k,i}^{n}\},\, \{(x_u, y_u)\}} \ \mathrm{EE},$
subject to the following constraints:
Constraint (C1) ensures that the transmission power per subcarrier remains within the maximum allowable thresholds for both GBSs and ABSs, as defined in (Equation 26a).
Constraint (C2) determines the operational status of each GBS, where the binary indicator $s_b \in \{0, 1\}$ switches the $b$-th GBS between the active and sleep states, as defined in (Equation 26b).
Constraint (C3) guarantees exclusive user association by ensuring that each user is connected to exactly one serving node—either a single GBS or a single ABS—on each subcarrier. The inclusion of the activity indicator $s_b$ in this constraint prevents users from being associated with a GBS that is in sleep mode, as expressed in (Equation 26c).
Constraint (C4) enforces that the total transmit power of each GBS does not exceed its maximum permissible value, accounting for both active and sleep states, as given in (Equation 26d). Constraint (C5) ensures that ABSs operate within their defined power limitations, as specified in (Equation 26e).
Constraints (C6) and (C7) confine ABSs within the designated region of interest, ensuring that they remain within operational limits, as enforced in (Equation 26f) and (Equation 26g), respectively. Finally, Constraint (C8) guarantees that each user achieves the minimum required data rate, thereby maintaining the network’s quality of service (QoS), as defined in (Equation 26h).
The optimization problem formulated in (Equation 25) is a mixed-integer non-convex problem involving both discrete and continuous variables, rendering it intractable for conventional mathematical optimization techniques. To address this complexity, a learning-based approach is introduced in the following section.
3 DRL-based framework for optimizing complex problems
This section presents the hybrid DDPG–DDQL framework developed to address the energy efficiency optimization problem in ABS-assisted B5G networks incorporating a sleep strategy, as formulated in (Equation 25).
3.1 Basics of deep reinforcement learning (DRL)
Reinforcement Learning (RL) has significantly advanced Artificial Intelligence (AI) by enabling agents to make decisions, observe outcomes, and iteratively refine their strategies to determine an optimal policy (Morocho-Cayamcela et al., 2019; Huang et al., 2019). However, due to its reliance on extensive exploration, traditional RL can be slow and computationally expensive, limiting its applicability in large-scale networks.
Deep Reinforcement Learning (DRL) integrates Deep Neural Networks (DNNs) into RL, significantly enhancing learning speed and efficiency. In applications such as IoT and UAV-assisted networks, devices often need to make independent decisions to optimize network performance. These scenarios are frequently modeled as Markov Decision Processes (MDPs), which are formally defined as a quintuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$:
• $\mathcal{S}$: the state space, i.e., the set of all environment states the agent can observe;
• $\mathcal{A}$: the action space available to the agent;
• $\mathcal{P}$: the state-transition probability function $\mathcal{P}(s' \mid s, a)$;
• $\mathcal{R}$: the immediate reward function $\mathcal{R}(s, a)$;
• $\gamma \in [0, 1)$: the discount factor weighting future rewards against immediate ones.
Although traditional methods such as dynamic programming and value iteration can solve MDPs, they become computationally impractical for large-scale and complex networks. DRL techniques, particularly Deep Q-Learning (DQL), provide scalable solutions by approximating value functions using deep neural networks.
3.2 Deep Q-learning (DQL) and its limitations
Deep Q-Learning (DQL) is a fundamental DRL algorithm that estimates Q-values for state-action pairs using neural networks (Braga et al., 2020). For an agent parameterized by $\theta$, the Q-network is trained to minimize the temporal-difference loss (Equation 27):
$L(\theta) = \mathbb{E}\!\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta)\right)^{2}\right].$
Here, $r$ denotes the immediate reward and $s'$ the next state. Because the same network both selects and evaluates the maximizing next action, DQL is prone to overestimating Q-values, which can slow or destabilize convergence.
3.3 Double deep Q-learning (DDQL)
To mitigate the overestimation bias in DQL, the authors in (Fährmann et al., 2022; Shokrnezhad et al., 2024) introduced Double Deep Q-Learning (DDQL), which decouples action selection from action evaluation by employing two distinct Q-networks:
• The primary Q-network, $Q(s, a; \theta)$, which selects the greedy action;
• The target network, $Q(s, a; \theta^{-})$, which evaluates the selected action.
The DDQL update target is (Equation 28):
$y = r + \gamma \, Q\!\left(s', \arg\max_{a'} Q(s', a'; \theta); \theta^{-}\right).$
The target network parameters $\theta^{-}$ are periodically synchronized with the primary network, either by a hard copy every fixed number of steps or by the soft update
$\theta^{-} \leftarrow \tau \theta + (1 - \tau)\theta^{-},$
where $\tau \ll 1$ is the soft-update coefficient.
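For reference, a minimal PyTorch sketch of the DDQL target computation, assuming `q_online` and `q_target` are Q-networks mapping a batch of states to per-action values; the decoupling of `argmax` (online network) from evaluation (target network) is exactly what suppresses the overestimation bias.

```python
import torch

def ddql_target(reward, next_state, q_online, q_target, gamma, done):
    """Double-DQN bootstrap target for a batch of transitions."""
    with torch.no_grad():
        a_star = q_online(next_state).argmax(dim=1, keepdim=True)   # action selection
        q_next = q_target(next_state).gather(1, a_star).squeeze(1)  # action evaluation
        return reward + gamma * (1.0 - done) * q_next
```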
Although DDQL effectively reduces overestimation and improves convergence in discrete action spaces (such as GBS sleep mode decisions), it struggles with continuous action spaces, such as power allocation and ABS positioning.
3.4 Deep deterministic policy gradient (DDPG)
For continuous action spaces, the Deep Deterministic Policy Gradient (DDPG) algorithm (Yu et al., 2021) is a more suitable approach. DDPG is an actor-critic algorithm that efficiently handles sequential decision making. It optimizes a deterministic policy function $\mu(s; \theta^{\mu})$ (the actor) jointly with an action-value function $Q(s, a; \theta^{Q})$ (the critic).
Unlike DQL, where policies output a probability distribution over discrete actions, DDPG directly maps states to actions through a policy network, with exploration obtained by adding a noise process to the actor's output:
$a = \mu(s; \theta^{\mu}) + \mathcal{N}_t,$
where $\mathcal{N}_t$ denotes the exploration noise at step $t$.
In large-scale environments with numerous actions, the actor-critic framework efficiently approximates Q-values, and the actor is updated along the deterministic policy gradient (Equation 32):
$\nabla_{\theta^{\mu}} J \approx \mathbb{E}\!\left[\nabla_{a} Q(s, a; \theta^{Q})\big|_{a = \mu(s)} \, \nabla_{\theta^{\mu}} \mu(s; \theta^{\mu})\right].$
Similar to DDQL, DDPG enhances stability by using:
• Experience replay to decorrelate the samples used to train the critic network;
• Target networks for both the actor and the critic, updated via Polyak averaging, as illustrated in the sketch after this list.
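A minimal PyTorch sketch of the two stabilizers in the continuous branch: the Polyak soft update and the deterministic-policy-gradient actor objective. It assumes the critic is a callable taking `(states, actions)`, and `tau` is an illustrative coefficient.

```python
import torch

def polyak_update(target, online, tau=0.005):
    """Soft (Polyak) update applied to both the actor and critic target networks."""
    for t_p, p in zip(target.parameters(), online.parameters()):
        t_p.data.mul_(1.0 - tau).add_(tau * p.data)

def actor_loss(critic, actor, states):
    """Deterministic policy gradient surrogate: raise the critic's score
    of the actions the actor currently outputs."""
    return -critic(states, actor(states)).mean()
```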
3.5 Hybrid DDPG-DDQL for UAV-assisted B5G networks
Given the nature of our optimization problem, which involves both discrete decisions (e.g., GBS sleep mode and discrete association choices) and continuous variables (e.g., power allocation and ABS positioning), we propose a Hybrid DDPG–DDQL framework. In this hybrid design, DDQL handles the discrete decision component and mitigates overestimation bias, while DDPG learns a deterministic continuous-control policy through actor–critic training. This explicit separation helps stabilize learning and reduces the overall search complexity compared to using a single algorithm to handle a mixed action space.
3.6 Proposed hybrid DDPG-DDQL framework with ABS-assisted sleep strategy
The objective of this research is to develop a DRL-based framework that optimizes the sleep scheduling of GBSs, the power allocation vector, and the ABS positioning vector, based on a given Channel State Information (CSI) matrix, defined as (Equation 33), whose entries are the channel gains between every user and every serving node (the $B$ GBSs and the ABS) on each of the $N$ subcarriers, where, without loss of generality, a single UAV in the network has been assumed and hence the UAV index of the channel matrix is discarded. Furthermore, the CSI is treated as constant within each decision interval, consistent with the quasi-static user assumption.
For a given sleep configuration, the actor–critic DDPG algorithm (Zhou et al., 2022) is used to optimize the power allocation and ABS horizontal positioning. DDPG continuously outputs the continuous control vector, which stacks the per-subcarrier transmit powers and the ABS horizontal coordinates.
To determine the optimal sleep configuration, the DDQL algorithm (Van Hasselt et al., 2016) is used, since the number of possible sleep configurations is finite and each configuration index is discrete.
3.7 Environment state representation
As illustrated in Figure 2, both algorithms interact with a simulated ABS-assisted network environment to address the optimization problem formulated in (Equation 25).
The network environment state is represented as (Equation 34), where each state corresponds to the CSI matrix observed by the agents at the current decision step.
3.8 Reward function and action space
The immediate reward function reflects the energy efficiency achieved in the current step while enforcing the minimum-rate requirement through the reward signal, steering the agents toward sleep, power, and placement decisions that save energy without sacrificing QoS.
In this framework, as shown in Figure 2, the action space is split between the two learners. Here, the joint action at each step consists of two parts, as illustrated in the sketch after this list:
• The continuous action vector, produced by the DDPG actor, containing the per-subcarrier transmit powers and the ABS horizontal position;
• The discrete action vector, produced by the DDQL network, selecting the sleep configuration of the GBSs.
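A minimal sketch of one hybrid decision step, assuming `ddqn` returns per-configuration Q-values and `ddpg_actor` returns the continuous control vector; the epsilon-greedy exploration rate `eps` is an illustrative choice.

```python
import numpy as np

def hybrid_step(env_state, ddqn, ddpg_actor, num_sleep_configs, eps=0.1):
    """One decision step of the hybrid scheme: DDQL picks a discrete sleep
    configuration (epsilon-greedy), DDPG emits the continuous powers/position."""
    if np.random.rand() < eps:                        # exploration over sleep configs
        sleep_cfg = np.random.randint(num_sleep_configs)
    else:
        sleep_cfg = int(np.argmax(ddqn(env_state)))   # greedy discrete choice
    cont_action = ddpg_actor(env_state)               # transmit powers + ABS (x, y)
    return sleep_cfg, cont_action
```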
3.9 Computational complexity analysis
The computational complexity of the proposed method mainly arises during the offline training stage due to iterative neural network updates in both DDPG and DDQL. For a fully connected network layer with $n_{l-1}$ inputs and $n_{l}$ outputs, a forward or backward pass costs on the order of $n_{l-1} n_{l}$ multiply-accumulate operations, so the per-sample training cost of an $L$-layer network scales as $\mathcal{O}\bigl(\sum_{l=1}^{L} n_{l-1} n_{l}\bigr)$, multiplied by the mini-batch size and the number of training steps.
After training, the online inference stage requires only forward passes through the trained networks. With fixed layer widths, each decision therefore costs a constant number of matrix-vector products, independent of the length of training.
In addition, the DDQL component evaluates a discrete action among a finite set of sleep configurations; thus, the per-step selection overhead scales with the number of discrete actions (sleep configurations) considered by the DDQL output layer. Overall, the proposed approach is computationally intensive during offline training but has low online computational overhead, making it suitable for real-time operation once trained.
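As a back-of-the-envelope check, the multiply-accumulate count of a forward pass can be tallied directly from the layer widths; the input and output dimensions below are assumed, while the hidden widths follow the architecture described in the next section.

```python
def dense_mac_count(layer_sizes):
    """Multiply-accumulate count for one forward pass of a fully connected
    network with the given layer widths (biases ignored)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(dense_mac_count([40, 256, 128, 10]))  # actor-like network: ~76k MACs
print(dense_mac_count([40, 64, 64, 8]))     # DDQN-like network:  ~7k MACs
```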
4 Results and discussions
In this section, we present the results of the proposed hybrid-DRL algorithm to optimize energy efficiency in ABS-assisted cellular networks. The simulation parameters are summarized in Table 1.
The actor and critic networks, along with their respective target networks, are designed with two hidden layers comprising 256 and 128 neurons, respectively. In contrast, the DDQN architecture includes two fully connected layers with 64 neurons each, followed by ReLU activation functions, and terminates with a linear output layer. To enhance convergence and ensure training stability, the Adam optimizer is employed for the critic network with its default hyperparameter settings.
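Under the stated architecture, the networks could be instantiated as follows; the state and action dimensions, the `Tanh` output squashing on the actor, and the number of sleep configurations are assumptions for illustration.

```python
import torch.nn as nn
import torch.optim as optim

state_dim, cont_act_dim, n_sleep_cfgs = 40, 6, 8   # assumed dimensions

# Actor and critic with two hidden layers of 256 and 128 neurons, as described.
actor = nn.Sequential(
    nn.Linear(state_dim, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, cont_act_dim), nn.Tanh(),       # assumed bounded-output activation
)
critic = nn.Sequential(
    nn.Linear(state_dim + cont_act_dim, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
# DDQN: two 64-neuron fully connected layers with ReLU and a linear output head.
ddqn = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, n_sleep_cfgs),
)
critic_opt = optim.Adam(critic.parameters())       # Adam with default hyperparameters
```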
Figure 3 illustrates the relationship between the number of sleeping GBSs and the rate requirement across three optimization scenarios: (i) optimizing achievable rate without considering EE, (ii) prioritizing EE while ignoring rate constraints, and (iii) jointly optimizing EE under the minimum-rate constraint. The results show that the number of GBSs that can be put to sleep decreases as the per-user rate requirement grows, since more GBSs must remain active to satisfy the constraint.
It can be observed from Figure 3 that when the minimum rate requirement exceeds a certain threshold, the constrained schemes must keep additional GBSs active, and the energy savings obtained from sleeping diminish accordingly.
As illustrated in Figure 4, the energy efficiency criterion is plotted against the training episodes. To evaluate the role of the ABS in the proposed system, three scenarios were considered: one with the ABS at a higher altitude, another at a lower altitude, and a scenario without an ABS. In this context, the ABS-assisted configurations converge to a higher energy efficiency than the baseline without an ABS, with the ABS altitude influencing the attainable gain through its effect on the A2G link quality.
5 Conclusion
This paper presents an ABS-assisted energy optimization framework for beyond-5G (B5G) cellular networks, utilizing selective ground base station (GBS) sleep modes and traffic offloading through aerial base stations (ABSs). To address the dynamic and non-convex nature of the problem, a hybrid reinforcement learning algorithm combining Deep Deterministic Policy Gradient (DDPG) and Double Deep Q-Learning (DDQL) is developed. This algorithm jointly optimizes ABS positioning, GBS sleep scheduling, and resource allocation. Simulation results demonstrate that the proposed framework significantly reduces overall network energy consumption while maintaining service quality.
These findings underscore the potential of hybrid deep reinforcement learning techniques in enabling intelligent and energy-efficient wireless communication systems. Future research directions include incorporating renewable-powered ABSs, implementing cooperative multi-ABS coordination, and developing real-time adaptive mechanisms to further enhance system scalability and performance.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.
Author contributions
VS: Writing – review and editing, Writing – original draft. ME: Writing – original draft, Writing – review and editing. KK: Writing – review and editing, Writing – original draft.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was used in the creation of this manuscript. Generative AI was used exclusively to edit and improve the clarity of the manuscript’s text.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Abubakar, A. I., Ahmad, I., Omeke, K., Ozturk, M., Ozturk, C., Abdel-Salam, A., et al. (2023). A survey on energy optimization techniques in UAV-based cellular networks: from conventional to machine learning approaches. Drones 7 (3), 214. doi:10.3390/drones7030214
Alsharif, M., Nordin, R., Abdullah, N. F., and Kelechi, A. H. (2018). How to make key 5G wireless technologies environmental friendly: a review. Trans. Emerg. Telecommun. Technol. 29 (1), e3254. doi:10.1002/ett.3254
Amine, A. E., Chaiban, J. P., Hassan, H. A. H., Dini, P., Nuaymi, L., and Achkar, R. (2022). Energy optimization with multi-sleeping control in 5G heterogeneous networks using reinforcement learning. IEEE Trans. Netw. Serv. Manag. 19 (4), 4310–4322. doi:10.1109/tnsm.2022.3157650
Amponis, G., Lagkas, T., Zevgara, M., Katsikas, G., Xirofotos, T., Moscholios, I., et al. (2022). Drones in B5G/6G networks as flying base stations. Drones 6 (2), 39. doi:10.3390/drones6020039
Azadur, R.Md., Pawase, C. J., and Chang, K. (2024). Multi-UAV path planning utilizing the PGA algorithm for terrestrial IoT sensor network under ISAC framework. Trans. Emerg. Telecommun. Technol. 35 (1), e4916. doi:10.1002/ett.4916
Azarhava, H., Abdollahi, M. P., and Musevi Niya, J. (2024). Placement and power assignment for hierarchical UAV networks under hovering fluctuations in mmWave communications. Trans. Emerg. Telecommun. Technol. 35 (11), e70002. doi:10.1002/ett.70002
Banafaa, M. K., Pepeoğlu, Ö., Shayea, I., Alhammadi, A., Shamsan, Z. A., Razaz, M. A., et al. (2024). A comprehensive survey on 5G-and-Beyond networks with UAVs: applications, emerging technologies, regulatory aspects, research trends and challenges. IEEE Access 12, 7786–7826. doi:10.1109/access.2023.3349208
Basharat, M., Naeem, M., Qadir, Z., and Anpalagan, A. (2022). Resource optimization in UAV-Assisted wireless networks—A comprehensive survey. Trans. Emerg. Telecommun. Technol. 33 (7), e4464. doi:10.1002/ett.4464
Behjati, M., Alobaidy, H. A. H., Nordin, R., and Abdullah, N. F. (2025). UAV-assisted federated learning with hybrid LoRa P2P/LoRaWAN for sustainable biosphere. Front. Commun. Netw. 6, 1529453. doi:10.3389/frcmn.2025.1529453
Braga, I. M., Cavalcante, E. d. O., Fodor, G., Silva, Y. C. B., e Silva, C. F. M., and Freitas, W. C. (2020). User scheduling based on multi-agent deep Q-learning for robust beamforming in multicell MISO systems. IEEE Commun. Lett. 24 (12), 2809–2813. doi:10.1109/lcomm.2020.3015462
Chowdary, A., Ramamoorthi, Y., Kumar, A., and Cenkeramaddi, L. R. (2021). Joint resource allocation and UAV scheduling with ground radio station sleeping. IEEE Access 9, 124505–124518. doi:10.1109/access.2021.3111087
Dogra, A., Jha, R. K., and Jain, S. (2020). A survey on beyond 5G network with the advent of 6G: architecture and emerging technologies. IEEE Access 9, 67512–67547. doi:10.1109/access.2020.3031234
Elnabty, I. A., Fahmy, Y., and Kafafy, M. (2022). A survey on UAV placement optimization for UAV-Assisted communication in 5G and beyond networks. Phys. Commun. 51, 101564. doi:10.1016/j.phycom.2021.101564
Fährmann, D., Jorek, N., Damer, N., Kirchbuchner, F., and Kuijper, A. (2022). Double deep Q-Learning with prioritized experience replay for anomaly detection in smart environments. IEEE Access 10, 60836–60848. doi:10.1109/access.2022.3179720
Geraci, G., Garcia-Rodriguez, A., Azari, M. M., Lozano, A., Mezzavilla, M., Chatzinotas, S., et al. (2022). What will the future of UAV cellular communications be? A flight from 5G to 6G. IEEE Commun. Surv. Tutor. 24 (3), 1304–1335. doi:10.1109/comst.2022.3171135
Ghorbel, M. B., Rodriguez-Duarte, D., Ghazzai, H., Hossain, M. J., and Menouar, H. (2019). Joint position and travel path optimization for energy efficient wireless data gathering using unmanned aerial vehicles. IEEE Trans. Veh. Technol. 68 (3), 2165–2175. doi:10.1109/tvt.2019.2893374
Gryech, I., Vinogradov, E., Saboor, A., Bithas, P. S., Mathiopoulos, P. T., and Pollin, S. (2024). A systematic literature review on the role of UAV-enabled communications in advancing the UN’s sustainable development goals. Front. Commun. Netw. 5, 1286073. doi:10.3389/frcmn.2024.1286073
Gu, X., and Zhang, G. (2023). A survey on UAV-assisted wireless communications: recent advances and future trends. Comput. Commun. 208, 44–78. doi:10.1016/j.comcom.2023.05.013
Huang, Y., Xu, C., Zhang, C., Hua, M., and Zhang, Z. (2019). An overview of intelligent wireless communications using deep reinforcement learning. J. Commun. Inf. Netw. 4 (2), 15–29. doi:10.23919/jcin.2019.8917869
Jangsher, S., Al-Jarrah, M., Al-Dweik, A., Alsusa, E., and Kong, P. Y. (2022). Energy constrained sum-rate maximization in IRS-assisted UAV networks with imperfect channel information. IEEE Trans. Aerosp. Electron. Syst. 59 (3), 2898–2908. doi:10.1109/taes.2022.3220493
Ju, H., Kim, S., Kim, Y., and Shim, B. (2022). Energy-efficient ultra-dense network with deep reinforcement learning. IEEE Trans. Wirel. Commun. 21 (8), 6539–6552. doi:10.1109/twc.2022.3150425
Khawaja, W., Guvenc, I., Matolak, D. W., Fiebig, U. C., and Schneckenburger, N. (2019). A survey of air-to-ground propagation channel modeling for unmanned aerial vehicles. IEEE Commun. Surv. Tutor. 21 (3), 2361–2391. doi:10.1109/comst.2019.2915069
Kim, J., Jeon, W. S., and Jeong, D. G. (2015). Base-station sleep management in open-access femtocell networks. IEEE Trans. Veh. Technol. 65 (5), 3786–3791. doi:10.1109/tvt.2015.2445922
Kim, T., Lee, S., Choi, H., Park, H. S., and Choi, J. (2023). An energy-efficient multi-level sleep strategy for periodic uplink transmission in industrial private 5G networks. Sensors 23, 9070. doi:10.3390/s23229070
Kooshki, F., Armada, A. G., Mowla, M. M., Flizikowski, A., and Pietrzyk, S. (2023). Energy-efficient sleep mode schemes for cell-less RAN in 5G and beyond 5G networks. IEEE Access 11, 1432–1444. doi:10.1109/access.2022.3233430
López-Pérez, D., De Domenico, A., Piovesan, N., Xinli, G., Bao, H., Qitao, S., et al. (2022). A survey on 5G radio access network energy efficiency: massive MIMO, lean carrier design, sleep modes, and machine learning. IEEE Commun. Surv. Tutor. 24 (1), 653–697. doi:10.1109/comst.2022.3142532
Masroor, R., Naeem, M., and Ejaz, W. (2021). Resource management in UAV-assisted wireless networks: an optimization perspective. Ad Hoc Netw. 121, 102596. doi:10.1016/j.adhoc.2021.102596
Morocho-Cayamcela, M. E., Lee, H., and Lim, W. (2019). Machine learning for 5G/B5G Mobile and wireless communications: potential, limitations, and future directions. IEEE Access 7, 137184–137206. doi:10.1109/access.2019.2942390
Puspitasari, A., An, T. T., Alsharif, M. H., and Lee, B. M. (2023). Emerging technologies for 6G communication networks: machine learning approaches. Sensors 23, 7709. doi:10.3390/s23187709
Qazzaz, M. M. H., Zaidi, S. A., McLernon, D. C., Hayajneh, A. M., Salama, A., and Aldalahmeh, S. A. (2024). Non-terrestrial UAV clients for beyond 5G networks: a comprehensive survey. Ad Hoc Netw. 157, 103440. doi:10.1016/j.adhoc.2024.103440
Sarkar, N. I., and Gul, S. (2023). Artificial intelligence-based autonomous UAV networks: a survey. Drones 7 (5), 322. doi:10.3390/drones7050322
Shahzadi, R., Ali, M., Khan, H. Z., and Naeem, M. (2021). UAV assisted 5G and beyond wireless networks: a survey. J. Netw. Comput. Appl. 189, 103114. doi:10.1016/j.jnca.2021.103114
Shokrnezhad, M., Taleb, T., and Dazzi, P. (2024). Double deep Q-Learning-Based path selection and service placement for latency-sensitive beyond 5G applications. IEEE Trans. Mob. Comput. 23 (5), 5097–5110. doi:10.1109/tmc.2023.3301506
Singh, Y. (2012). Comparison of okumura, hata and COST-231 models on the basis of path loss and signal strength. Int. J. Comput. Appl. 59 (11), 37–41. doi:10.5120/9594-4216
Sufyan, A., Khan, K. B., Khashan, O. A., Mir, T., and Mir, U. (2023). From 5G to beyond 5G: a comprehensive survey of wireless network evolution, challenges, and promising technologies. Electronics 12, 2200. doi:10.3390/electronics12102200
Tung, T. V., An, T. T., and Lee, B. M. (2022). Joint resource and trajectory optimization for energy efficiency maximization in UAV-based networks. Mathematics 10 (20), 3840. doi:10.3390/math10203840
Van Hasselt, H., Guez, A., and Silver, D. (2016). “Deep reinforcement learning with double Q-learning,” in Proc. AAAI Conf. Artif. Intell.
Won, J., Kim, D. Y., Park, Y. I., and Lee, J. W. (2023). A survey on UAV placement and trajectory optimization in communication networks: from the perspective of air-to-ground channel models. ICT Express 9 (3), 385–397. doi:10.1016/j.icte.2022.01.015
Wu, W., Sun, S., Shan, F., Yang, M., and Luo, J. (2022). Energy-constrained UAV flight scheduling for IoT data collection with 60 GHz communication. IEEE Trans. Veh. Technol. 71 (10), 10991–11005. doi:10.1109/tvt.2022.3184869
Yu, Y., Tang, J., Huang, J., Zhang, X., So, D. K. C., and Wong, K. K. (2021). Multi-objective optimization for UAV-assisted wireless powered IoT networks based on extended DDPG algorithm. IEEE Trans. Commun. 69 (9), 6361–6374. doi:10.1109/tcomm.2021.3089476
Zhou, Q., Guo, C., Wang, C., and Cui, L. (2022). Radio resource management for C-V2X using graph matching and actor–critic learning. IEEE Wirel. Commun. Lett. 11 (12), 2645–2649. doi:10.1109/lwc.2022.3213176
Keywords: double deep Q-learning (DDQL), deep deterministic policy gradient (DDPG), energy efficiency, sleeping ground BS, ABS-assisted beyond-5G network
Citation: Saleh V, Eslami M and Kazemi K (2026) DDPG-based energy efficiency optimization for ABS-assisted beyond-5G cellular networks with sleep mode management. Front. Commun. Netw. 6:1764320. doi: 10.3389/frcmn.2025.1764320
Received: 09 December 2025; Accepted: 29 December 2025;
Published: 26 January 2026.
Edited by: Mehran Behjati, Sunway University, Malaysia
Reviewed by: Mohammed Sani Adam, National University of Malaysia, Malaysia; Javad Haghighat, TED University, Türkiye
Copyright © 2026 Saleh, Eslami and Kazemi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Mohsen Eslami, meslami1@ualberta.ca