Pipeline monitoring data recovery using novel deep learning models: an engineering case study

Zhao, Yong; Zhang, Xinpeng; Liu, Yanli; Mao, Xuecheng; Chen, Xi; Maimaitituerxun, Yasheng; He, Weidong

doi:10.3389/frai.2025.1684018

ORIGINAL RESEARCH article

Front. Artif. Intell., 07 October 2025

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | https://doi.org/10.3389/frai.2025.1684018

This article is part of the Research TopicAI-Driven Architectures and Algorithms for Secure and Scalable Big Data SystemsView all 5 articles

Pipeline monitoring data recovery using novel deep learning models: an engineering case study

Yong Zhao¹

Xinpeng Zhang²

Yanli Liu²

Xuecheng Mao²

Xi Chen²

Yasheng Maimaitituerxun²

Weidong He³^*

¹Xinjiang Yaxin CBM Exploration and Development Co., Ltd., Urumqi, China
²Gas Storage Co., Ltd., PetroChina Xinjiang Oilfield, Hutubi, China
³School of Civil Engineering and Geomatics, Southwest Petroleum University, Chengdu, China

Pipeline monitoring frequently encounters missing data, leading to incomplete evaluation and hindering a comprehensive assessment of the pipeline’s structural health. To address this issue, this study proposes a novel PDO-BiGRU-GAN model for missing data recovery. The model integrates three components: the prairie dog optimization algorithm (PDO) for hyperparameter tuning, the bidirectional gated recurrent unit (BiGRU) for effective temporal feature extraction, and the generative adversarial network (GAN) for data generation and completion. A comprehensive monitoring database was established using field data from an open-source pipeline project. The contributions of individual modules to the overall performance were evaluated via hyperparameter sensitivity analysis and ablation studies. The impact of missing data ratio and the number of missing sensors on the model’s recovery performance was analyzed. In addition, the proposed model was compared with eight existing mainstream deep learning models. The results show that each component of the PDO-BiGRU-GAN significantly enhances overall performance. The model achieves strong recovery accuracy across various missing data scenarios, with the R² consistently exceeding 0.93. Moreover, the model performs optimally when the missing data ratio is below 20/24. Compared to other models, PDO-BiGRU-GAN achieves the highest R² and the lowest error metrics (MSE, RMSE, MAPE, MAE). In terms of computational efficiency, the model requires slightly more processing time than simpler models but is faster than more complex models. Overall, the proposed model provides a robust and scalable solution for pipeline monitoring data recovery, advancing intelligent pipeline health assessment and supporting the development of infrastructure safety management and smart monitoring technologies.

1 Introduction

Pipeline systems are essential infrastructure supporting multiple urban functions such as water supply, drainage, heating and gas distribution (Mazumder et al., 2018; Xiong et al., 2020). Their safe operation is vital for ensuring stable urban systems and sustainable daily life. However, pipelines are typically laid across regions, over long distances, and beneath the ground (Li et al., 2022; Quej-Ake et al., 2020; Shirazi et al., 2023). They often traverse diverse geological formations and complex subsurface terrains. During service, pipelines are prone to structural damage due to geostress variations, soil-induced degradation, and hydrochemical reactions, potentially leading to deformation and leakage (Sharma et al., 2024; Wong and McCann, 2021). To address these challenges, a comprehensive system spanning the entire lifecycle of pipeline infrastructure is necessary for monitoring and assessment.

Commonly used pipeline monitoring techniques include acoustic detection, fiber-optic sensing, electromagnetic induction, pressure monitoring, and image recognition (Adegboye et al., 2019; Ho et al., 2020; Sun et al., 2025a). Acoustic detection offers rapid response and is effective for leak localization; however, it is highly susceptible to environmental noise and interference (Xu et al., 2013). Electromagnetic induction is suitable for detecting corrosion in metallic pipelines and benefits from non-contact operation, but it is limited to conductive materials and suffers from rapid signal attenuation (Vasagar et al., 2024). Pressure monitoring is structurally simple and cost-effective, providing basic insights into operational conditions, though it lacks multidimensional data and cannot effectively identify structural damage (Ciang et al., 2008; Lopez and Sarigul-Klijn, 2010). Image recognition enables intuitive visualization but relies on favorable environmental conditions and has limited coverage over large areas (Cai et al., 2019; Zhou, 2023). In contrast, fiber-optic sensing provides high sensitivity, supports long-distance continuous monitoring, and is relatively cost-effective (Sun et al., 2025b). Therefore, fiber-optic sensing systems offer significant potential for widespread application in pipeline monitoring.

However, similar to traditional electrical sensors, fiber-optic sensing technology still faces data loss (Feng et al., 2019). First, signal interruptions may occur at sensor splicing points due to manufacturing defects or mechanical stress (Kuntoğlu et al., 2021). Second, prolonged use can cause sensor material aging, resulting in failure to acquire valid data. Additionally, external factors such as geological shifts, construction disturbances, and chemical corrosion can damage sensors, affecting data continuity and accuracy (Wright et al., 2019). Moreover, fiber-optic systems typically require long-distance signal transmission, during which signal attenuation at splicing points accumulates along the transmission path and may result in the loss or distortion of valid data (Sun et al., 2025c). Furthermore, environmental noise can obscure or disrupt the original measurement signals, causing further loss of valid information.

Data loss undermines the monitoring system’s ability to perceive critical operational states in real time, compromises data integrity, and reduces analytical accuracy (Lei et al., 2023; Tan et al., 2016). Moreover, it increases the response delay and uncertainty in fault detection. In severe cases, data loss may obscure early risk indicators, weakening the system’s warning capability (Avula, 2021; Jieyang et al., 2023). Furthermore, prolonged data gaps can lead to insufficient accumulation of historical data, impairing trend analysis of pipeline structural performance and lifespan prediction. Consequently, this affects the scientific basis for pipeline safety assessment and management strategy development. Therefore, it is imperative to develop effective solutions addressing data loss in fiber-optic sensing systems.

To address data loss issues, deep learning provides effective technical solutions (Bao et al., 2025; Li et al., 2025a; Ressi et al., 2022, 2024a; Sun et al., 2024a). Deep learning is a machine learning approach based on artificial neural networks, whose core concept is inspired by the structure of the human brain (Liu et al., 2025a; Sun et al., 2024b; Wang et al., 2025a, 2025b, 2025c). It employs multiple layers of nonlinear transformations to automatically extract complex features from data, enabling efficient modeling of large-scale information (Wei et al., 2025). Compared to traditional machine learning, deep learning automatically captures intricate nonlinear relationships within data, thereby reducing reliance on manual feature engineering (Sun et al., 2024c). In the field of data recovery, deep learning leverages spatiotemporal dependencies in incomplete datasets to reconstruct missing information with high accuracy (Lei et al., 2021). Common models for data loss recovery include the standard GAN (Sun et al., 2025c), LSTM-GAN (Pu et al., 2022; Kumari et al., 2024), GRU-GAN (Huang et al., 2022; Shen et al., 2022), Bi-LSTM-GAN (Jiang et al., 2023), and STOA-Bi-LSTM-GAN (Sun et al., 2025c), all of which have demonstrated promising effectiveness in preliminary applications.

Nevertheless, existing models still exhibit several limitations. When applied to time-series data, the standard GAN often suffers from unstable training of the generator and discriminator. It fails to effectively capture sequential dependencies, resulting in imputed values that deviate from actual dynamic patterns (Ren and Xu, 2019). LSTM-GAN and GRU-GAN alleviate this issue by employing recurrent neural networks (LSTM or GRU), thereby improving the ability to model long-term dependencies. However, these models remain vulnerable to vanishing or exploding gradients when processing multivariate or long-sequence data. Their performance is also highly sensitive to hyperparameter settings, such that minor deviations can lead to overfitting or unstable generation (Wan and Liu, 2023). BiLSTM-GAN enhances modeling by leveraging both past and future contextual information, yet it is still prone to mode collapse, unstable convergence, and challenging hyperparameter optimization (Wan, et al., 2023). STOA-BiLSTM-GAN demonstrates strong results across multiple benchmark datasets, but its high complexity and intensive training requirements hinder scalability in large-scale industrial applications. Therefore, there is an urgent need for a novel deep learning framework that maintains recovery accuracy while reducing model complexity and computational cost, thereby enhancing practicality and scalability.

Based on this, the study proposes a novel PDO-BiGRU-GAN model that integrates the prairie dog optimization algorithm (PDO), bidirectional gated recurrent units (BiGRU), and generative adversarial network (GAN). This model is designed to recover missing fiber-optic sensing data. Monitoring data from an open-source pipeline project were used to construct a pipeline monitoring dataset. Hyperparameter sensitivity analysis and ablation experiments were performed to evaluate the necessity and contribution of each module. The impact of ratios of missing data and the number of missing sensors on the recovery performance of the PDO-BiGRU-GAN model was analyzed. Furthermore, the model’s accuracy and computational efficiency were compared with eight existing deep learning models.

The main contributions and innovations of this study are as follows: First, a novel PDO-BiGRU-GAN deep learning framework was developed. This framework integrates the hyperparameter optimization capability of the PDO module, the temporal feature extraction capability of the BiGRU module, and the data generation and imputation ability of the GAN module. Second, fiber-optic monitoring data were obtained from an open-access pipeline project, and the model was systematically evaluated through hyperparameter sensitivity analysis and ablation studies. Additionally, the proposed model was compared with eight existing mainstream models in terms of accuracy and computational efficiency under varying missing-data scenarios. Overall, the proposed framework provides a new approach for imputing missing pipeline monitoring data and plays a crucial role in ensuring the completeness of pipeline monitoring information.

2 Motivation for developing the PDO-BiGRU-GAN network

To address the issue of missing monitoring data, numerous studies have focused on developing techniques for managing missing values in time-series data. Traditional approaches can be broadly categorized into two types. The first involves case deletion, which removes observations containing missing values to avoid their impact on analytical results (Schafer and Graham, 2002; Silva and Zárate, 2014). The second includes statistical methods, such as spline interpolation, matrix completion, and mean imputation, which replace missing data points based on statistical estimates (Gu et al., 2021). However, these methods have notable limitations. First, they exhibit limited capability in modeling temporal dependencies and often fail to capture complex dynamic relationships among variables. Second, their performance becomes unstable under conditions with a high missing ratio, leading to substantial errors (Zhang and Wang, 2024). Finally, they struggle to accurately reconstruct the underlying data distribution, particularly when the interactions among multiple features are intricate (Zhang and Wang, 2024). Consequently, traditional methods often fail to meet the accuracy and reliability requirements for reconstructing pipeline monitoring data.

In recent years, researchers have explored the use of GANs to recover missing data. Unlike traditional methods, GANs can model the underlying data distribution, thereby enabling high-quality data imputation. This capability enhances the completeness and reliability of the recovered data. The advantages of GANs are particularly pronounced in multivariate time-series analysis. They can not only impute individual missing values but also maintain complex dependencies among variables (Oh et al., 2021). Moreover, the generative framework of GANs enables adaptation to diverse missing data patterns, improving the robustness and flexibility of recovery outcomes (Gong et al., 2022). Studies have demonstrated that GANs achieve high accuracy in handling missing data, providing strong support for data analysis, prediction, and decision-making across various domains (Kachuee et al., 2020).

GANs have been widely applied to data recovery tasks. However, existing GAN-based approaches often neglect the interrelationships among variables and the bidirectional temporal dependencies inherent in time-series data (Wu et al., 2021). An ideal time-series imputation method should capture both characteristics while accurately modeling the underlying data distribution. Integrating temporal feature extraction modules into the GAN framework can therefore enhance model performance. Among temporal feature extraction networks, recurrent neural networks (RNNs) are particularly well-suited for modeling complex temporal dependencies. Nevertheless, they exhibit inherent limitations, including a restricted ability to extract fine-grained information and susceptibility to gradient explosion and vanishing. These issues reduce training efficiency and limit their applicability in temporal feature extraction tasks (Li et al., 2024). Recent advances in RNN architectures have addressed some of these challenges. In particular, the development of GRUs mitigates the vanishing gradient problem, accelerates convergence, and reduces computational uncertainty (Noh, 2021). Building on this, BiGRUs were introduced to process sequences in both forward and backward directions. Compared with BiLSTMs, BiGRUs retain the ability to model long-term dependencies. They also maintain a simpler structure, have fewer parameters, offer higher computational efficiency, and enable more stable training. They also demonstrate reduced sensitivity to overfitting in small datasets or noisy environments. BiGRUs excel at capturing both local temporal patterns and global trends, effectively lowering predictive uncertainty and alleviating forgetting effects. Therefore, embedding a BiGRU module within a GAN framework leverages the generative adversarial mechanism to approximate the original data distribution. It also fully exploits bidirectional temporal dependencies for feature modeling. This approach achieves higher accuracy and more robust performance in time-series missing value recovery tasks.

Furthermore, the performance of the BiGRU-GAN network is strongly influenced by hyperparameter configurations (Richter et al., 2024). Existing studies typically rely on manual trial-and-error to identify optimal parameter combinations, a process that is both inefficient and prone to producing suboptimal predictive results. Consequently, the use of advanced optimization algorithms for automated hyperparameter search is essential. Inspired by prairie dogs’ natural behaviors, Ezugwu et al. (2022) proposed the PDO algorithm. They compared PDO with several classical optimization methods, including the arithmetic optimization algorithm (Abualigah et al., 2021), grey wolf optimizer (Mirjalili et al., 2014; Sun et al., 2024d), differential evolution optimizer (Kosorukoff, 2001), salp swarm optimizer (Mirjalili et al., 2017), biogeography-based optimizer (Simon, 2008), sine cosine optimizer (Mirjalili, 2016), particle swarm optimizer (Kennedy and Eberhart, 1995; Sun et al., 2023a), and dwarf mongoose optimizer (Agushaka et al., 2022). Experimental results demonstrate that PDO excels in searching for the global optimum and exhibits a more stable convergence process than many of these algorithms. Statistical analyses further confirm its robustness in balancing exploration and exploitation. By leveraging these advantages, PDO can be integrated into the BiGRU-GAN framework to significantly improve hyperparameter search efficiency, thereby enhancing predictive accuracy and model stability. Based on this approach, the present study develops a PDO-BiGRU-GAN network to better capture the spatiotemporal correlations between missing and available data.

3 Basic principles of the PDO-BiGRU-GAN network

This study proposes a novel hybrid model, PDO-BiGRU-GAN, for missing data imputation. This model consists of three components: the PDO module, the BiGRU module, and the GAN module. The BiGRU module models the bidirectional temporal dependencies between available and missing data, thereby enhancing the representation of time series information (Du et al., 2019). The PDO module optimizes critical hyperparameters of the BiGRU model (learning rate, batch size, units per layer, and number of layers) to improve training efficiency and generalization performance. The GAN module introduces an adversarial mechanism to further enhance the realism and distribution consistency of the recovered data. The following sections provide a detailed explanation of the principles underlying each module of the PDO-BiGRU-GAN network.

3.1 BiGRU module

The GRU is an improved variant derived from the long short-term memory (LSTM) network. It merges the forget gate and the input gate of the LSTM into a single update gate (Richter et al., 2024). By integrating the cell state and hidden state of the LSTM, the GRU effectively mitigates the vanishing and exploding gradient problems encountered in modeling long-term dependencies. Furthermore, this architecture enhances both the convergence speed and computational efficiency of the model. The GRU primarily consists of two gates: the reset gate and the update gate, which, respectively, regulate the forgetting and retention of information before and after data transmission. This mechanism enables effective control and propagation of information within the neural units. The computational formulas of the GRU are presented in Equation 1.

{\begin{matrix} z_{t} = σ (W_{Z} \cdot [h_{t - 1}, x_{t}] + b_{z}) \\ r_{t} = σ (W_{r} \cdot [h_{t - 1}, x_{t}] + b_{r}) \\ {\tilde{h}}_{t} = tanh (W_{r} \cdot [r ⊙ h_{t - 1}, x_{t}] + b_{h}) \\ h_{t} = (1 - z_{t}) ⊙ h_{t - 1} + z_{t} ⊙ {\tilde{h}}_{t} \end{matrix} (1)

Where r_t denotes the reset gate; Z_t represents the update gate; h_t is the output; ${\tilde{h}}_{t}$ is the candidate activation; and σ denotes the sigmoid activation function.

BiGRU is an extension of the GRU model that integrates information flow from both forward and backward directions, thereby enhancing its ability to process time series data. By simultaneously capturing dependencies in both directions, the model achieves a more comprehensive understanding of the dynamic patterns within time series. Compared to the unidirectional GRU, BiGRU more effectively captures intricate temporal features, significantly improving prediction accuracy. This model maximizes the extraction of key features relevant to forecasting, resulting in enhanced precision and stability. The computational formulas of the model are shown in Equation 2. A schematic diagram of the BiGRU architecture is presented in Figure 1.

{\begin{array}{l} {\vec{h}}_{t} = GRU (x_{t}, {\vec{h}}_{t - 1}) \\ {\overset{\leftarrow}{h}}_{t} = GRU (x_{t}, {\overset{\leftarrow}{h}}_{t - 1}) \\ h_{t} = w_{t} {\vec{h}}_{t} + ν_{t} {\overset{\leftarrow}{h}}_{t} + b \end{array} (2)

Figure 1

Diagram depicting a Bidirectional Gated Recurrent Unit (GRU) network architecture. It consists of an input layer $x_1$ to $x_{t+1}$, a forward hidden layer, and a backward hidden layer both using GRU units. Outputs are generated at the output layer $h_1$ to $h_{t+1}$. Red arrows indicate forward direction and blue arrows indicate backward direction.

Figure 1. Structure diagram of the BiGRU model.

3.2 PDO module

The hyperparameters of the BiGRU network (learning rate, batch size, number of neurons per layer, and number of layers) significantly affect its performance in data recovery. Selecting appropriate hyperparameters is crucial for enhancing the model’s training effectiveness and overall predictive accuracy. In view of this, this study employs the PDO module to optimize the BiGRU’s hyperparameters. The PDO algorithm is inspired by two primary behaviors of prairie dogs: foraging and burrowing (Ezugwu et al., 2022; Sun et al., 2024e). During the foraging phase, prairie dogs search for new food sources within a certain area and communicate the location of food to other individuals. They also estimate the required burrowing effort based on the quality of the discovered food. In the burrowing phase, prairie dogs move according to the shared food location information and hide in burrows to evade predators. The algorithm divides the total number of iterations into four equal stages: the first two simulate foraging behavior, while the last two simulate burrowing behavior. This staged approach allows PDO to balance exploration and exploitation dynamically, thereby enhancing the effectiveness of hyperparameter optimization.

During the foraging phase, the algorithm further divides the total number of iterations evenly. When the iteration count satisfies t < Max_iter/4, individuals explore new food sources across the entire search space. The position update method for this phase is given by Equation 3.

\begin{array}{l} X_{i + 1, j + 1} = GBes t_{i, j} - CBes t_{i, j} \cdot ρ - C X_{i, j} \cdot Levy (n) & (3) \end{array}

Where GBest_i,j represents the global best position, ρ denotes the food source alert level, CX_i,j refers to the random cumulative effect of all individuals, and CBest_i,j indicates the current best position. The function Levy(n) follows the Levy distribution, which enhances the diversity of food source exploration and strengthens the algorithm’s global search capability.

The calculation formulas for GBest_i,j and CX_i,j are presented in Equations 4 and 5, respectively.

\begin{array}{l} CBes t_{i, j} = GBes t_{i, j} \cdot Δ + \frac{X_{i, j} \cdot mean (X_{n, m})}{GBes t_{i, j} \cdot ({ub}_{j} - {lb}_{j}) + Δ} & (4) \end{array}

\begin{array}{l} C X_{i, j} = (GBes t_{i, j} - r X_{i, j}) / (GBes t_{i, j} + Δ) & (5) \end{array}

Where Δ represents the difference between individuals; ub and lb denote the upper and lower bounds of the search space, respectively; rX refers to the position of a randomly selected individual.

When the iteration count satisfies Max_iter/4 ≤ t ≤ Max_iter/ 2, the algorithm enters the phase of evaluating food quality and determining the mining intensity, as detailed in Equation 6.

\begin{array}{l} X_{i + 1, j + 1} = GBes t_{i, j} \cdot rX \cdot DS \cdot Levy (n) & (6) \end{array}

Where DS represents the mining intensity.

During the first half of the burrowing activities, when the iteration count satisfies Max_iter / 2 ≤ t < 3 Max_iter / 4, the algorithm evaluates the quality of the food sources. The position update method is as follows:

\begin{array}{l} X_{i + 1, j + 1} = GBes t_{i, j} - CBes t_{i, j} \cdot ε - C X_{i, j} \cdot rand & (7) \end{array}

Where ε represents the quality of the food source, and rand is a random number between 0 and 1.

When the iteration count satisfies 3Max_iter /4 ≤ t < Max_iter, the prairie dogs retreat to their burrows to observe predators.

\begin{array}{l} X_{i + 1, j + 1} = GBes t_{i, j} \times PE \cdot rand & (8) \end{array}

Where PE represents the predator effect, as defined in Equation 9.

\begin{array}{l} PE = 1.5 \times {(1 - \frac{t}{{max}_{-} iter})}^{(\frac{2 t}{{max}_{-} iter})} & (9) \end{array}

Based on the above principle, this study integrates the PDO module with the BiGRU model. The PDO module systematically explores near-optimal combinations of hyperparameters (learning rate, batch size, the number of neurons per layer, and the number of network layers) by employing a predefined behavioral mechanism. Specifically, the learning rate is optimized within the range of [1e-4, 1e-2], ensuring a stable and efficient training process while preventing gradient explosion or slow convergence. The batch size is set between 16 and 256, enabling adaptive adjustments during mini-batch gradient descent; this facilitates efficient GPU memory utilization while preserving the model’s generalization capability. The number of layers is restricted to 1–4 to effectively mitigate gradient explosion during deep network training (Yang et al., 2016). The number of neurons per layer is optimized within the range of 32–256 to enhance the model’s capacity to capture temporal features while minimizing the risk of overfitting (Siami-Namini et al., 2019). In each iteration, the algorithm identifies a relatively optimal solution and updates the current configuration according to a specific replacement strategy. Through continuous iteration, the quality of the hyperparameter configuration progressively improves. Figure 2 illustrates the overall architecture of the PDO-BiGRU model.

Figure 2

Flowchart illustrating the process of integrating PDO and BiGRU models. The PDO process involves steps like setting parameters, executing activities, updating solutions, and returning the best solution. The BiGRU process starts with dataset preparation, followed by normalization, parameter setting, hyperparameter optimization, model training, and retraining the optimal PDO-BiGRU model. The chart uses boxes and arrows to depict the sequential steps, with specific equations and conditions noted in red text.

Figure 2. Flowchart of the PDO-BiGRU model.

3.3 GAN module

GANs are innovative deep learning architectures that demonstrate superior performance in data generation and complex distribution modeling tasks. The model consists of two main components: a generator and a discriminator. The generator learns the distribution characteristics of real data to produce highly similar synthetic samples, while the discriminator aims to distinguish whether the input data originates from the real dataset. These components are trained jointly through an adversarial process, where continuous competition drives ongoing improvements in model performance (Alqahtani et al., 2021). During training, the generator attempts to create samples that can “fool” the discriminator, whereas the discriminator strives to accurately differentiate between real and generated data. The loss functions for both components, which measure adversarial effectiveness and training convergence, are presented in Equations 10, 11.

\begin{array}{l} L_{G} = - E_{z ~ P_{z}} {log (D \cdot [G (z)])} & (10) \end{array}

\begin{array}{l} L_{D} = - E_{x ~ P_{r (x)}} {log [D (x)]} - E_{z ~ P_{z}} {log {1 - D [G (z)]}} & (11) \end{array}

Where L_D and L_G denote the discriminator and generator loss functions, respectively_. P_r is the true data distribution derived from the training set. P_z is the distribution of the data generated by the generator. The variable z is a latent variable drawn from a predefined prior distribution. D(x) indicates the discriminator’s output score for a real data sample x. G(z) refers to the synthetic data sample produced by the generator given the input z. D(G(z)) reflects the discriminator’s assessment of the generated data sample.

3.4 PDO-BiGRU-GAN network

This study proposes a time series imputation model—PDO-BiGRU-GAN—that integrates the PDO, BiGRU, and GAN. The model leverages BiGRU’s capability in capturing bidirectional temporal dependencies, PDO’s efficiency in hyperparameter optimization, and GAN’s potential in generating high-quality data. When addressing missing values in time series, the proposed approach demonstrates superior accuracy and robustness. In the modeling process, the generator receives time series data with missing values and utilizes a PDO-BiGRU architecture to extract bidirectional temporal features. Through adversarial training, it continuously generates data samples that increasingly resemble the real ones. Meanwhile, the discriminator distinguishes between generated and original samples, providing gradient feedback to the generator to enhance output quality. The training process is grounded in game-theoretic principles, where the generator and discriminator undergo iterative adversarial optimization, progressively enhancing the model’s imputation capability. To improve training stability, gradient penalty is applied to the discriminator. These strategies collectively reduce training oscillations and prevent overfitting, ensuring robust and reliable model convergence. The corresponding loss functions are defined in Equations 12–16.

\begin{array}{l} L_{R} = ∥ x ⊙ m - G (z) ⊙ m ∥_{2} & (12) \end{array}

\begin{array}{l} L_{G 2} = - D (\tilde{x}) & (13) \end{array}

\begin{array}{l} L_{C} = Deviation ({\tilde{c}}_{t}^{f}, {\tilde{c}}_{t}^{b}) & (14) \end{array}

\begin{array}{l} L_{G} = k \cdot L_{R} + L_{G 2} + L_{C} & (15) \end{array}

\begin{array}{l} L_{D} = - D (x) + D (\tilde{x}) & (16) \end{array}

Where L_R denotes the reconstruction loss, which measures the discrepancy in alignment of the model-generated sequence with the original incomplete input. L_C refers to the calibration loss, which quantifies the deviation between the forward- and backward-generated sequences in a time series. L_G2 represents the discriminator loss, which evaluates the authenticity of the generated data. k serves to modulate the relative impact of the discriminator loss compared to the reconstruction loss in the total loss formulation.

Based on this approach, the study employs the PDO-BiGRU-GAN network to learn spatiotemporal dependencies from available data to infer missing values, thereby enabling the recovery of pipeline monitoring data. Figure 3 presents the network architecture, illustrating the workflow of the proposed model. The pseudocode of the PDO-BiGRU-GAN network is provided in Appendix 1.

Figure 3

Diagram illustrating a process for handling incomplete time series data. The generator (G) uses the PDO-BiGRU model to impute missing data, producing complete time series data. The discriminator (D) evaluates the data using the same model. Loss functions include reconstruction loss $L_R$, calibration loss $L_C$, and discriminator loss $L_{G2}$. There is a generator loss function feeding back to the generator and a probability distribution $P_r$ with a loss $L_D$ affecting the discriminator.

Figure 3. Illustration of the PDO-BiGRU-GAN network architecture.

4 Introduction to the engineering case

The data used in this study were obtained from an open-access monitoring database released by the project owner. All data were standardized according to a unified format to facilitate subsequent data analysis and model development. The implementation of the open-access mechanism has significantly improved the accessibility and reusability of engineering monitoring data, providing a reliable and authentic foundation for this research. A brief overview of the project background is presented below.

This study is based on a typical open-access data project from a natural gas pipeline engineering initiative in Hebei Province, China. The project, organized and implemented by the owner, deployed an advanced fiber optic sensing monitoring system along an operational gas pipeline. This system covers five key monitoring areas to enable real-time and continuous surveillance of the pipeline’s operational status (Figure 4). Upon project completion, the collected monitoring data were made available to research institutions, providing a multidimensional platform for academic analysis and methodological validation.

Figure 4

Composite image with two panels: (a) Aerial view of a large plot of land labeled with areas 1 to 5, showcasing different sections and infrastructure layout. (b) Diagram of Area 2's sensor layout, including fiber optic cables connecting sensor units UL1 to UL7 and BL1 to BL7. A circular diagram indicates paths 1 and 2 with angles marked, suggesting directional layout.

Figure 4. Pipeline layout and monitoring areas: (a) Five key areas; (b) Schematic diagram of sensor deployment in area.

This study focuses on Area 2 as the primary research area, emphasizing the analysis of monitoring data collected by long-gauge fiber Bragg grating (FBG) sensors within this area. A dual-end data acquisition mode is employed, which enhances system stability and improves fault tolerance in the event of single-point sensor failure. Figure 4B illustrates the layout of the long-gauge FBG sensors in Area 2. All sensors were installed in strict accordance with national technical standards and industry regulations (DB32/T 2880-2016, 2016; TSG D7005-2018, 2018). Data acquisition was performed using the MOI Sm125-500 demodulator, ensuring high precision and reliability. The construction cost of the monitoring system approached one million yuan. The system was completed and put into operation on November 1, 2023, with data collection commencing on the same day.

5 Database construction and preprocessing

5.1 Database construction

Section 3 briefly introduces the basic overview of the pipeline monitoring project. Based on the project’s open-source data, this study conducted relevant analyses. Fiber-optic monitoring data exhibit high sensitivity to environmental factors. Under normal weather conditions, the data primarily correlate with variations in temperature and pipeline deformation. However, under rainy conditions, the changes in fiber-optic monitoring data become more complex, influenced by multiple factors such as rainfall, temperature, pipeline deformation, and groundwater levels. As an initial exploration into pipeline monitoring data recovery, this study focuses on analyzing the feasibility of recovering fiber-optic monitoring data under normal weather conditions. Specifically, the data were collected on January 15, 2024, with sensors sampling at 1 Hz. A total of 86,400 data sets were obtained that day, each containing monitoring information from 14 sensors. Subsequently, a database was constructed based on this data set. Two methods were considered for database construction: (1) Strain-based method: This method derives strain data by removing the temperature component from wavelength shifts. However, it requires additional temperature sensors, which may introduce measurement errors and reduce the spatiotemporal consistency of the dataset. (2) Wavelength difference method: This method calculates the difference between the wavelength measured on the collection day and the reference wavelength recorded on November 1, 2023. Because wavelength shifts inherently reflect both strain and temperature effects, directly using wavelength differences better preserves the spatiotemporal characteristics of the sensor data and improves the reliability of data recovery. Given these advantages, this study adopts the wavelength difference method for database construction. The constructed database is shown in Figure 5.

Figure 5

Graph with two panels showing wavelength difference over time for two paths. Panel (a) displays Path 1 with seven lines labeled UL1 to UL7, hovering around -0.15 nanometers over 24 hours. Panel (b) shows Path 2 with lines BL1 to BL7, around 0.02 to 0.08 nanometers. Both panels indicate fluctuating trends across the time period.

Figure 5. Database construction: (a) Path 1; (b) Path 2.

On this basis, it is necessary to further examine the causes and potential types of missing data to enable the construction of an incomplete dataset. Unlike conventional electrical sensors, the FBG sensors used in this study include multiple measurement points along a single optical fiber. This configuration results in distinctive data loss mechanisms. The main causes are as follows: (1) construction activities near the pipeline may render multiple sensors within localized regions unavailable. (2) Optical fibers are typically spliced, and splice points are prone to breakage, leading to partial data loss. (3) Individual FBG sensors may fail due to damage or aging. In this case, the optical path remains intact, but valid measurements cannot be obtained (Wu et al., 2020). (4) Signal attenuation occurs during long-distance optical transmission. Without proper amplification, excessive attenuation prevents correct data interpretation at the receiving end, resulting in data loss (Torres et al., 2011). Collectively, these factors contribute to the occurrence of missing data in pipeline monitoring.

FBG monitoring systems typically perform signal demodulation via single-end wiring. However, a break in the optical fiber may prevent downstream sensors from functioning. To mitigate this risk, a dual-end redundant wiring strategy was implemented in this study (Figure 4). This strategy ensures that a fault in one sensor does not compromise the monitoring of others, thereby minimizing the overall system impact. Based on this sensor architecture and relevant literature (Jiang et al., 2022; Tien et al., 2024), missing sensor data are classified into two types: single-sensor loss and multiple-sensor loss. To investigate these scenarios, several incomplete datasets were constructed to simulate different conditions: (1) Single-sensor loss with proportions of 1/24, 8/24, 16/24, 20/24, and 22/24; and (2) Multiple-sensor loss involving 2/14, 4/14, 8/14, and 14/14 sensors. Figure 6 illustrates the incomplete datasets, while Table 1 summarizes key information. This completes the construction of the incomplete database.

Table 1

Table 1. Statistics of the incomplete dataset.

5.2 Database preprocessing

Section 5.1 presents the construction of various types of incomplete datasets based on engineering data. Before inputting these multi-condition incomplete datasets into the PDO-BiGRU-GAN network, data normalization is a critical preprocessing step (Liu et al., 2025b; Sun et al., 2025d). Normalization eliminates dimensional inconsistencies among features, ensuring that variables vary within comparable numerical ranges (Lv et al., 2023; Sun et al., 2023b). This reduces the risk of gradient shift and enhances model stability during training. Moreover, mapping raw data to a unified scale improves training efficiency, accelerates convergence, and mitigates the likelihood of the model becoming trapped in local minima (Li et al., 2025b). The specific normalization formula is provided in Equation 17 (Xie et al., 2025a).

\begin{array}{l} G_{i} = \frac{Q_{i} - Q_{\min}}{Q_{\max} - Q_{\min}} & (17) \end{array}

Where Q_i and G_i are the original and normalized values of measured data, respectively. Q_max and Q_min are the maximum and minimum values of measured data, respectively.

6 Analysis of data recovery results based on the PDO-BiGRU-GAN network

Using incomplete datasets and the PDO-BiGRU-GAN network, this study investigates data recovery performance under various missing data scenarios. The experiments were conducted on the TensorFlow platform with hardware comprising 256 GB of memory, an NVIDIA TITAN X (Pascal) GPU, and two Intel Xeon(R) E5-2696 v4 processors, ensuring efficient handling of large-scale computations. Specifically, Section 6.1 analyzes the model’s hyperparameter sensitivity to demonstrate the necessity of integrating the PDO module. Section 6.2 employs ablation experiments to evaluate the contribution of each module to overall performance. Section 6.3 examines recovery performance across different missing data ratios (1/24, 8/24, 16/24, 20/24, and 22/24). Section 6.4 further assesses recovery under multiple sensor missing scenarios (2/14, 4/14, 8/14, and 14/14). Section 6.5 examines the model’s computational time. Additionally, to comprehensively evaluate the proposed model’s effectiveness, it is compared against eight existing deep learning methods.

6.1 Hyperparameter sensitivity analysis

This study employs the PDO module to optimize four key hyperparameters: learning rate, batch size, units per layer, and number of layers. The rationale for focusing on these hyperparameters is explained as follows. The learning rate controls the speed of parameter updates. An excessively high learning rate can cause oscillation or divergence, while a rate that is too low may result in slow convergence or entrapment in local optima (Dohare et al., 2024). Batch size directly affects both generalization and computational efficiency. Smaller batches increase the stochasticity of gradient estimates, thereby enhancing generalization, whereas larger batches enable faster computation and more stable convergence (Offiong et al., 2023). The number of neurons per layer determines the representational capacity of each layer. Too few neurons can lead to underfitting, while too many may cause overfitting and substantially increase computational cost. Network depth, indicated by the number of layers, reflects the model’s capacity for feature extraction. Shallow networks may fail to capture long-term dependencies, while excessively deep networks can suffer from vanishing gradients, overfitting, and training instability (Offiong et al., 2023). Given these considerations, the PDO module focuses on optimizing these four hyperparameters to enhance model performance while controlling computational costs. In contrast, secondary hyperparameters, such as dropout rate or regularization coefficients, exert only indirect effects. Including them would significantly expand the search space, potentially increasing computational costs and reducing optimization efficiency (Bian and Priyadarshi, 2024). Therefore, this study excludes them from the optimization process.

Figure 6

White background with no visible content.

Figure 6. Illustration of the incomplete dataset: (a) Missing data ratio 1/24; (b) Missing data ratio 8/24; (c) Missing data ratio 16/24; (d) Missing data ratio 20/24; (e) Missing data ratio 22/24; (f) Missing data from 2 sensors; (g) Missing data from 4 sensors; (h) Missing data from 8 sensors; (i) Missing data from 14 sensors.

To evaluate the effectiveness of the PDO module in hyperparameter optimization, this study conducts a hyperparameter sensitivity analysis. This approach assesses the model’s performance variations across different hyperparameter configurations, thereby demonstrating the necessity of integrating the PDO module. Table 2 presents the configurations of learning rate, batch size, units per layer, and number of layers automatically selected by the PDO module across various recovery tasks. These results indicate that the PDO module can select appropriate hyperparameter combinations based on task characteristics. Specifically, when the missing ratio is relatively low (e.g., 1/24), the model tends to adopt a higher learning rate (0.02), a smaller batch size (32), a shallow two-layer structure, and an asymmetric distribution of units (98). This “low-capacity–shallow” configuration reduces complexity and mitigates overfitting while preserving temporal features. In contrast, under a higher missing ratio (e.g., 20/24), the model prefers a lower learning rate (0.0011), a larger batch size (128), a larger number of units (196), and a deeper three-layer stacked structure. Such a configuration enhances the generative adversarial network’s ability to model sparse data and enables it to capture long-term dependencies through increased depth. Further analysis involves varying each hyperparameter sequentially to assess the tuning effect of the PDO module, as shown in Figure 7. It is evident that the hyperparameter configurations selected by the PDO module yield the lowest MSE, thereby achieving optimal tuning and enhanced model performance. Overall, the hyperparameter sensitivity analysis demonstrates that the PDO module exhibits strong adaptability to varying task complexities by dynamically adjusting hyperparameter settings, thereby enhancing the accuracy of data recovery.

Table 2

Table 2. Hyperparameter optimization results.

Figure 7

Two sets of bar charts show the effect of different hyperparameters on mean squared error (MSE) multiplied by 100, under a missing time ratio of 1/24. The charts are labeled (a) and (b). The first row (a) shows bar charts for learning rate, batch size, number of layers, and units per layer, indicating optimal hyperparameters and respective MSE values. The second row (b) presents the same hyperparameter categories with different values and MSE outcomes, highlighting variations.

Figure 7. Hyperparameter sensitivity analysis: (a) Missing data ratio 1/24; (b) 2 Missing sensors.

6.2 Ablation study analysis

Ablation experiments selectively remove specific components of a model to evaluate their impact on overall performance. This approach effectively confirms the necessity and contribution of each module in the model architecture. Based on this methodology, the present study analyzed three models—GAN, BiGRU-GAN, and PDO-BiGRU-GAN—via selective module removal (Table 3). Initially, loss curves were plotted for the three models under a missing data ratio of 1/24 (Figure 8). The basic GAN model exhibited a rapid decrease in loss; however, its loss curves displayed pronounced fluctuations, indicating instability in missing data recovery tasks. Incorporating the BiGRU module (BiGRU-GAN) produced smoother loss curves, demonstrating that the integration of temporal information enhances model stability and generalization. Further addition of the PDO module in PDO-BiGRU-GAN achieved more favorable convergence characteristics. Loss decreased rapidly and stabilized at a low level, indicating that PDO-based hyperparameter optimization significantly improves training efficiency and overall model performance. The contributions of each component were further assessed under varying data missing ratios of 1/24, 8/24, and 16/24 (Table 3). The results indicate that removing either the PDO module or the BiGRU structure degrades the model’s predictive capability. Specifically, when the missing data ratio ranges from 1/24 to 16/24, excluding the PDO module leads to increases in mean absolute error (MAE) by 9.80–21.88%, root mean square error (RMSE) by 18.30–23.05%, mean absolute percentage error (MAPE) by 16.72–27.23%, and MSE by 33.26–40.79%, while coefficient of determination (R²) decreases by 0.53–1.00%. Further removal of the BiGRU structure causes more pronounced performance deterioration: MAE increases by 53.67–60.03%, RMSE by 42.28–61.73%, MAPE by 42.35–57.07%, MSE by 66.68–85.35%, and R² declines by 3.27–5.08%. These findings indicate that the GAN module provides fundamental generative capability, the BiGRU structure captures temporal dependencies to improve reconstruction accuracy, and the PDO module optimizes key hyperparameters to enhance training efficiency and generalization (Ezugwu et al., 2022). The synergistic effect of these three components enables PDO-BiGRU-GAN to achieve optimal performance in data imputation tasks, confirming the necessity and contribution of each module in the model architecture.

Table 3

Table 3. Ablation study results.

Figure 8

Line graph showing loss versus epoch for three models: PDO-BiGRU-GAN (purple), BiGRU-GAN (teal), and GAN (green). The loss significantly decreases within the first 100 epochs, stabilizing below 0.5 after initial spikes above 3.0.

Figure 8. Loss curves of the three models (GAN, BiGRU-GAN, and PDO-BiGRU-GAN) under a missing data ratio of 1/24.

6.3 Data recovery results of the PDO-BiGRU-GAN model under different missing data ratios

The pipeline project employed a total of 14 sensors. This section focuses on analyzing three sensors: BL1, BL4, and BL7. The recovery performance of these sensors was evaluated using the PDO-BiGRU-GAN model under varying missing data ratios of 1/24, 8/24, 16/24, 20/24, and 22/24. Figure 9 illustrates five performance metrics of data recovery based on the PDO-BiGRU-GAN model. Notably, all data were normalized to eliminate the influence of differing measurement units (Liu et al., 2025c; Sun et al., 2025e). Overall, the PDO-BiGRU-GAN model demonstrated strong recovery capabilities across all missing data ratios, with the R² consistently above 0.95. Further analysis revealed that as the missing data ratio increased, error metrics (MSE, RMSE, MAPE, and MAE) showed an increasing trend, while R² values gradually decreased. Particularly, a sharp decline in model performance occurred when the missing ratio increased from 20/24 to 22/24. This performance drop is attributed to the significant reduction of available historical data, which impairs the model’s ability to capture the intrinsic temporal patterns of the time series, thus weakening the imputation effect. Under conditions of extreme data sparsity, the model struggles to accurately restore complex features, resulting in substantially higher error metrics and notably lower R² values. Therefore, when employing the PDO-BiGRU-GAN model for data recovery, the missing data ratio within incomplete data windows should be maintained below 20/24 to ensure optimal recovery performance.

Figure 9

Three 3D cone plot graphs labeled (a), (b), and (c) compare performance metrics. The axes represent MAPE, MSE multiplied by 100, MAE multiplied by 10, and RMSE multiplied by 10 against R-squared values. The cones vary in height and color intensity, indicating different values of the metrics. Each cone is labeled with numerical values for clarity.

Figure 9. Data recovery results based on the PDO-BiGRU-GAN network: (a) BL1; (b) BL4; (c) BL7.

This section presents a detailed comparison between the proposed PDO-BiGRU-GAN model and eight representative data recovery models, aiming to demonstrate its superior performance. These models span from basic generative frameworks to advanced architectures, providing a comprehensive overview of mainstream techniques in time-series recovery tasks. The selected models include the traditional GAN, GRU-GAN, LSTM-GAN, CNN-GRU-GAN, CNN-LSTM-GAN, Bi-GRU-GAN, Bi-LSTM-GAN, and STOA-Bi-LSTM-GAN. These models span from basic generative frameworks to advanced architectures, providing a comprehensive overview of commonly used techniques in time-series recovery tasks. Specifically, the traditional GAN serves as a baseline generative model. GRU-GAN and LSTM-GAN emphasize unidirectional temporal dependencies, making them suitable for capturing long-term trends. Bi-GRU-GAN and Bi-LSTM-GAN capture both past and future dependencies through bidirectional sequence modeling, thereby improving recovery accuracy for rare events or edge cases. CNN-GRU-GAN and CNN-LSTM-GAN extract local temporal features via convolutional layers and combine them with recurrent structures to model multivariate dependencies, thereby enhancing recovery of complex patterns. The STOA-Bi-LSTM-GAN represents the state-of-the-art approach, integrating bidirectional recurrent modeling with optimized training strategies to achieve greater adaptability and stability. To ensure a fair comparison between PDO-BiGRU-GAN and the eight baseline models, all models were trained under identical data preprocessing and normalization conditions (Sun et al., 2024f; Xie et al., 2025b), and with the same computational resources. Notably, PDO-BiGRU-GAN employs the PDO algorithm for automated optimization, whereas STOA-Bi-LSTM-GAN uses the STOA algorithm. The hyperparameters of the remaining seven baseline models were not taken directly from literature values, as optimal settings can vary across different data recovery tasks. Instead, a grid search was conducted to tune these models, ensuring they achieved their best possible performance for the current task and maintaining the objectivity of the comparison.

For clarity, this study selected the recovery results of sensor BL4 under three missing data ratios: 1/24, 8/24, and 16/24. Figure 10 presents radar charts of the performance metrics for all models under these conditions. The results indicate that PDO-BiGRU-GAN consistently achieves the lowest error metrics (MSE, RMSE, MAPE, MAE) and the highest R² across all three scenarios, demonstrating its superior data recovery capability. In contrast, the remaining eight models exhibit varying degrees of performance degradation across the five evaluated metrics. For example, at a missing ratio of 1/24, the PDO-BiGRU-GAN model attained an R² of 0.9902, MSE of 0.001128, RMSE of 0.03358, MAPE of 0.05898, and MAE of 0.02089. Compared to other models, PDO-BiGRU-GAN improved R² by 0.197–3.28%, reduced MSE by 21.77–66.68%, RMSE by 11.55–42.28%, MAPE by 5.25–42.36%, and MAE by 2.93–53.67%. To verify the statistical significance of these performance differences, Wilcoxon signed-rank tests were conducted on R², MAE, RMSE, MSE, and MAPE [see Taheri and Hesamian (2013) and Woolson (2007) for calculation details]. The results indicate that PDO-BiGRU-GAN significantly outperforms all eight baseline models across all metrics (p < 0.05), confirming that its performance improvements are statistically robust and not due to random variation. Overall, the PDO-BiGRU-GAN model exhibits optimal recovery performance across various missing data conditions.

Figure 10

Three radar charts labeled (a), (b), and (c) compare different GAN models: PDQ-BGRU-GAN, STOA-B-LSTM-GAN, BGRU-GAN, CNN-LSTM-GAN, BLSTM-GAN, CNN-GRU-GAN, LSTM-GAN, GRUL-GAN, and GAN. Metrics include R2, RMSE, MAPE, MAE, and MSE. Each chart shows model performance across these metrics with varying values on the axes.

Figure 10. Performance comparison between PDO-BiGRU-GAN and eight models: (a) 1/24; (b) 8/24; (c) 16/24.

6.4 Data recovery results of the PDO-BiGRU-GAN model under different numbers of missing sensors

To assess the model’s robustness in the face of multi-sensor data loss, this section investigates the recovery capabilities of the PDO-BiGRU-GAN model under varying degrees of sensor unavailability. Four missing data scenarios were designed: missing 2 sensors (BL3, UL3), missing 4 sensors (BL3, UL3, BL4, UL4), missing 8 sensors (BL2, UL2, BL3, UL3, BL4, UL4, BL5, UL5), and missing 14 sensors. In all scenarios, data from UL3 and BL3 were reconstructed. To simplify the analysis, only the recovery results of sensors UL3 and BL3 were evaluated. Figure 11 presents the performance metrics of the PDO-BiGRU-GAN model for recovering UL3 and BL3 under different levels of sensor loss. All performance metrics were computed using normalized values to ensure comparability and mitigate the impact of differences in data magnitude. Overall, the results indicate that the PDO-BiGRU-GAN model consistently demonstrates strong recovery performance across all scenarios. The R² values remain generally above 0.93, suggesting that the model effectively captures spatiotemporal features and reconstructs missing data by leveraging latent dynamic correlations among sensors. However, as the number of missing sensors increases, the R² value gradually decreases. Meanwhile, error metrics (MSE, RMSE, MAPE, MAE) increase accordingly, indicating a decline in model performance. This degradation can be attributed to two primary factors: the diminishing volume of available reference data and the growing complexity of underlying data patterns, both of which increase the difficulty of accurately estimating missing values. Despite the observed degradation as the number of missing sensors increases, the PDO-BiGRU-GAN model still exhibits remarkable recovery performance when dealing with multi-sensor data loss. Its advanced capacity for learning temporal–spatial dependencies and modeling nonlinear relationships makes it particularly suitable for recovering critical sensor information under complex operating conditions. These advantages highlight the model’s significant potential for application in intelligent monitoring systems and structural health diagnostics, offering both wide applicability and notable engineering value.

Figure 11

Two 3D bar charts labeled (a) and (b) show data on sensor errors with varying heights for metrics like R2, MAPE, RMSE, MAE, and MSE against different sensor missing conditions. Each bar is represented as a cone with corresponding error values annotated above.

Figure 11. Data recovery results for sensors UL3 and BL3 with multiple missing sensors using the PDO-BiGRU-GAN model: (a) UL3; (b) BL3.

Furthermore, this section provides a comprehensive evaluation of the PDO-BiGRU-GAN model’s recovery performance compared to eight other models under multi-sensor data loss conditions. For brevity, sensor UL3 is chosen as the representative for multi-model comparison. Figure 12 presents the performance metrics of PDO-BiGRU-GAN and the eight compared models. The results demonstrate that PDO-BiGRU-GAN consistently achieves the best performance across all evaluation metrics, exhibiting the lowest error levels and the highest R² values. For example, with eight sensors missing, PDO-BiGRU-GAN attains an R² of 0.9522, MSE of 0.001588, RMSE of 0.03985, MAPE of 0.7209, and MAE of 0.02960. Compared to other models, its R² improves by 0.197 to 3.28%; MSE decreases by 21.77 to 66.68%; RMSE reduces by 11.55 to 42.28%; MAPE declines by 5.25 to 42.36%; and MAE lowers by 2.93 to 53.67%. Consistent with Section 6.3, Wilcoxon signed-rank tests were conducted on the five performance metrics across different numbers of missing sensors. The results show that PDO-BiGRU-GAN outperforms all eight comparison models, with p-values below 0.05 for all metrics, indicating that the performance differences are statistically significant. Overall, the comparison among the nine models indicates that PDO-BiGRU-GAN maintains superior recovery accuracy for UL3 data across different sensor loss scenarios. Furthermore, the model shows similar stable advantages in recovering data from other sensors; however, these results are not detailed here due to space constraints. In summary, PDO-BiGRU-GAN demonstrates excellent imputation capability under multi-sensor data loss conditions and represents a promising approach for this problem.

Figure 12

Four radar charts labeled (a), (b), (c), and (d) compare the performance of various models: PDO-BiGRU-GAN, STOA-Bi-LSTM-GAN, BiGRU-GAN, CNN-LSTM-GAN, Bi-LSTM-GAN, CNN-GRU-GAN, LSTM-GAN, GRU-GAN, and GAN. The charts assess R2, MAPE, RMSE, MAE, and MSE across different scales. Each model is represented by different colors and lines.

Figure 12. Data recovery performance of UL3 under multiple missing sensors: (a) 2 Missing sensors; (b) 4 Missing sensors; (c) 8 Missing sensors; (d) 14 Missing sensors.

6.5 Computational efficiency analysis

The PDO-BiGRU-GAN model has a relatively complex architecture and many parameters, resulting in higher computational costs during training and inference. To evaluate its feasibility and deployment potential in practical applications, a systematic assessment of its computation time is necessary. This section compares the computation time of PDO-BiGRU-GAN with eight existing methods under identical task conditions, as shown in Figure 13. Compared with simpler models (standard GAN, GRU-GAN, and LSTM-GAN), PDO-BiGRU-GAN’s computation time increases by approximately 8.77 to 15.11%. Despite this increase, the additional cost is acceptable given the significant improvement in recovery accuracy. Further comparisons show that, relative to more complex architectures (CNN-GRU-GAN, CNN-LSTM-GAN, Bi-LSTM-GAN, and BiGRU-GAN), PDO-BiGRU-GAN’s computation time increases only by 1.25 to 7.32%, indicating that slight computational overhead yields substantial performance gains. Additionally, relative to the most complex STOA-Bi-LSTM-GAN model, PDO-BiGRU-GAN reduces computation time by approximately 3.17 to 7.27%. Overall, PDO-BiGRU-GAN incurs only a slight increase in computational cost compared to simpler models, while outperforming the most complex ones in efficiency. This advantage results from two main factors. First, the PDO algorithm employs a more efficient hyperparameter search strategy, significantly reducing ineffective computations during training (Biswas et al., 2024; Izci et al., 2024). Second, the model architecture preserves essential feature extraction capabilities while avoiding redundant layer stacking, effectively controlling resource consumption and ensuring improved recovery efficiency. In summary, PDO-BiGRU-GAN achieves substantial improvements in recovery accuracy while maintaining controlled computation time, making it a promising model for pipeline monitoring data recovery tasks.

Figure 13

Two bar charts compare computational efficiency across different GAN models. Chart (a) shows performance based on the training ratio, ranging from one twenty-fourth to twenty-four twenty-fourths. Chart (b) depicts efficiency with varying numbers of missing sensors, from two to fourteen. Each model is color-coded, including GAN, CNN-GRU-GAN, Bi-LSTM-GAN, GRU-GAN, BiGRU-GAN, PDO-BiGRU-GAN, LSTM-GAN, CNN-LSTM-GAN, and STOA-Bi-LSTM-GAN. Trends show increasing efficiency with higher training ratios and more missing sensors.

Figure 13. Computation time comparison of nine models across different data recovery tasks: (a) Varying missing data ratios; (b) Multiple missing sensors.

6.6 Summary of comparative analysis between PDO-BiGRU-GAN and eight existing models

Sections 6.3–6.5 systematically compare the performance of PDO-BiGRU-GAN with eight existing models—GAN, GRU-GAN, LSTM-GAN, CNN-GRU-GAN, CNN-LSTM-GAN, BiGRU-GAN, BiLSTM-GAN, and STOA-BiLSTM-GAN—on data reconstruction tasks. This section further summarizes the advantages and limitations of each model and the trade-offs between accuracy and computational cost, as shown in Table 4. In the pipeline monitoring project examined in this study, PDO-BiGRU-GAN emerged as the optimal model based on a trade-off between accuracy and computational efficiency. For future research, investigators can select an appropriate model based on the characteristics in Table 4 and the specific requirements of their projects for data reconstruction.

Table 4

Table 4. Comparative analysis of the proposed PDO-BiGRU-GAN model and eight existing models.

7 Discussion

This study tackles the prevalent issue of missing data in pipeline monitoring by proposing a novel PDO-BiGRU-GAN framework. The framework integrates three key components: the PDO module for hyperparameter optimization, the BiGRU module for temporal feature extraction, and the GAN module for data generation and distribution approximation. To validate the method, a pipeline monitoring dataset was established using field data collected from actual pipeline projects. The study first analyzes the model’s sensitivity to hyperparameters, demonstrating the necessity of the PDO module in the optimization process. Ablation experiments were then conducted to assess the independent contribution of each module. Furthermore, the proposed model is compared with eight mainstream deep learning models in terms of prediction accuracy and computational efficiency. Overall, the PDO-BiGRU-GAN framework effectively reconstructs missing information in pipeline monitoring from a data-driven perspective, thereby providing more complete and reliable support for pipeline performance evaluation.

Although the proposed PDO-BiGRU-GAN model demonstrates high accuracy in recovering missing data in pipeline monitoring, it has several key limitations. These limitations can be categorized into three areas: the model itself, engineering applicability, and the intelligence of the pipeline monitoring system. Regarding the limitations of the model itself, the PDO-BiGRU-GAN model can generate samples that closely match the statistical characteristics of real data. However, it may struggle to capture rare fault patterns or extreme anomalies, such as those occurring under heavy rain, snow, or pipeline malfunctions. This limitation could potentially pose risks in pipeline monitoring. Future research could focus on modeling such exceptional operating conditions to enhance the model’s learning capability and robustness. Due to the focus and length constraints of the current study, the model’s performance across datasets of varying scales was not examined. Broader investigations into diverse data loss scenarios are required to address this gap. The approach also heavily depends on training data. When historical data are biased or incomplete, the model may produce misleading patterns, compromising the reliability of decisions. To mitigate this, future studies could incorporate physical model constraints, expert knowledge, or multi-source data into the PDO-BiGRU-GAN framework. This would reduce reliance on single historical datasets and improve both anomaly detection and overall predictive performance. Additionally, due to space limitations, this study only explored the combination of GAN and BiGRU modules and validated its effectiveness. Future research could explore the integration of GANs with other recurrent architectures, such as Transformer-based models or temporal convolutional networks, to assess data recovery performance across different frameworks. Finally, the PDO-BiGRU-GAN model is inherently a black-box model, lacking transparency and interpretability in its generation process. In practical applications, false positives or false negatives could complicate responsibility assignment and regulatory compliance. Future work could incorporate interpretability-enhancing techniques, coupled with uncertainty quantification and human-in-the-loop verification, to ensure the safety and reliability of data recovery.

Regarding engineering applicability, several limitations should be noted. First, the model has only been validated on the pipeline project in Tangshan, Hebei. Its generalizability across different pipelines or regions remains unassessed. Future studies could investigate the model’s transferability and adaptability across various pipeline types, geographic regions, and operating conditions. Second, the model requires significant computational resources during training, limiting direct deployment on low-power devices. In this study, it required approximately 2 GB of GPU memory and 4.9 GFLOPs. Future work could explore model lightweighting, parameter compression, and efficient inference strategies to enable deployment on low-power or edge devices. Third, the model can function as a data recovery module within pipeline monitoring systems and can be seamlessly integrated with existing optical fiber or other sensor data acquisition systems. While training in this study was conducted on a high-performance workstation, practical deployment can leverage a single GPU or high-performance CPU depending on data scale and real-time requirements, meeting computational demands for data recovery. From a software perspective, the model was developed on the TensorFlow platform and can interface with existing industrial pipeline monitoring systems, supporting deployment and extension across mainstream deep learning frameworks. Finally, this study applied the model solely to optical fiber sensor data recovery. Future research could extend the model to other sensor types, such as pressure, flow, and temperature, to systematically evaluate its applicability, stability, and cross-sensor generalization. This would further validate the model’s versatility across multi-source heterogeneous monitoring data.

In the context of intelligent pipeline monitoring, blockchain technology can be leveraged to enhance system reliability and automation. It ensures the integrity and traceability of both raw sensor data and AI-reconstructed data. Moreover, smart contracts can automatically trigger alerts or initiate maintenance actions based on recovered signals. For example, when a recovered signal indicates a potential risk, a smart contract can immediately activate warning mechanisms or execute pre-defined maintenance tasks, thereby reducing manual intervention and improving response efficiency. Furthermore, blockchain-based decentralized learning frameworks can improve privacy and robustness in multi-site deployments (Ressi et al., 2024b). Overall, the integration of blockchain, artificial intelligence, and other advanced technologies holds significant promise for advancing intelligent pipeline monitoring and providing a more robust technical foundation for the long-term safety of pipelines.

8 Conclusion

This study developed a novel PDO-BiGRU-GAN network to efficiently recover missing pipeline data. The model’s performance was evaluated using real engineering monitoring data under various types of data loss. The main findings are summarized as follows:

1. This study developed a novel PDO-BiGRU-GAN network, which integrates the hyperparameter optimization capability of the PDO module, the temporal feature extraction strength of the BiGRU module, and the data generation and imputation functionality of the GAN module. The proposed network was subsequently applied to recover missing data in pipeline monitoring systems.

2. Using an open-access pipeline project, a pipeline monitoring dataset was obtained. This dataset was employed to evaluate the proposed PDO-BiGRU-GAN network. Hyperparameter sensitivity analysis and ablation experiments were conducted to assess the model. The sensitivity analysis demonstrated that the PDO module substantially improved model performance by guiding optimal hyperparameter selection. Ablation experiments further showed that removing either the PDO or BiGRU module led to significant performance degradation, underscoring their essential roles in enhancing data recovery accuracy.

3. The study evaluated the data recovery capability of the PDO-BiGRU-GAN model under various missing-data scenarios. Results demonstrated that the model accurately reconstructed missing values by effectively leveraging underlying spatiotemporal dependencies, achieving an R² greater than 0.93. Furthermore, to maintain optimal recovery performance, the missing data ratio within any given window should not exceed approximately 20/24.

4. The study compared the proposed PDO-BiGRU-GAN model with eight existing models in terms of accuracy and computational efficiency. Results indicated that PDO-BiGRU-GAN achieved the lowest values across all error metrics (MSE, RMSE, MAPE, MAE) and the highest R², demonstrating a clear advantage in accuracy. Moreover, the model’s computation time increased only marginally. Overall, PDO-BiGRU-GAN substantially improved data recovery accuracy while maintaining efficient computational performance, highlighting its promise for pipeline monitoring applications.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

YZ: Conceptualization, Writing – review & editing. XZ: Conceptualization, Writing – review & editing. YL: Methodology, Writing – review & editing. XM: Software, Writing – review & editing. XC: Validation, Writing – review & editing. YM: Validation, Writing – review & editing. WH: Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. The authors gratefully acknowledge the financial support provided by the National Natural Science Foundation of China (No. 52208290), the Natural Science Foundation of Sichuan Province (No. 2022NSFSC1046).

Conflict of interest

YZ was employed by the Xinjiang Yaxin CBM Exploration and Development Co., Ltd. and XZ, YL, XM, XC, and YM were employed by the Gas Storage Co., Ltd.

The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., and Gandomi, A. H. (2021). The arithmetic optimization algorithm. Comput. Methods Appl. Mech. Eng. 376:113609. doi: 10.1016/j.cma.2020.113609

Pipeline monitoring data recovery using novel deep learning models: an engineering case study

1 Introduction

2 Motivation for developing the PDO-BiGRU-GAN network

3 Basic principles of the PDO-BiGRU-GAN network

3.1 BiGRU module

3.2 PDO module

3.3 GAN module

3.4 PDO-BiGRU-GAN network

4 Introduction to the engineering case

5 Database construction and preprocessing

5.1 Database construction

5.2 Database preprocessing

6 Analysis of data recovery results based on the PDO-BiGRU-GAN network

6.1 Hyperparameter sensitivity analysis

6.2 Ablation study analysis

6.3 Data recovery results of the PDO-BiGRU-GAN model under different missing data ratios

6.4 Data recovery results of the PDO-BiGRU-GAN model under different numbers of missing sensors

6.5 Computational efficiency analysis

6.6 Summary of comparative analysis between PDO-BiGRU-GAN and eight existing models

7 Discussion

8 Conclusion

Data availability statement

Author contributions

Funding

Conflict of interest

Generative AI statement

Publisher’s note

References

Appendix 1: Pseudocode of the PDO model.