Intelligent leak monitoring of oil pipeline based on distributed temperature and vibration fiber signals

Liang, Xiaobin; Deng, Yonghong; Wang, Yibin; Li, Hongtao; Ma, Weifeng; Wang, Ke; Ren, Junjie; Ma, Ruijiao; Zhang, Shuai; Liu, Jiawei; Wu, Wei

doi:10.3389/fdata.2025.1667284

ORIGINAL RESEARCH article

Front. Big Data, 20 November 2025

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | https://doi.org/10.3389/fdata.2025.1667284


Xiaobin Liang ¹
Yonghong Deng ²
Yibin Wang ²
Hongtao Li ²
Weifeng Ma ¹
Ke Wang ¹
Junjie Ren ¹
Ruijiao Ma ³
Shuai Zhang ³
Jiawei Liu ³
Wei Wu ³*

1. Institute of Safety Assessment and Integrity, State Key Laboratory of Oil and Gas Equipment, CNPC Tubular Goods Research Institute, Xi'an, China
2. Engineering Technology Research Center, Hancheng Gas Production Management Area of PetroChina Coalbed Methane Co., Ltd., Hancheng, China
3. School of Chemical Engineering, Northwest University, Xi'an, China

Abstract

Long-term service, natural disasters, and human factors can cause pipelines to leak or rupture, with serious consequences, so real-time leak monitoring and detection are of great significance. Mainstream pipeline leak monitoring methods mostly rely on a single signal, which imposes significant limitations: a temperature signal alone is susceptible to ambient temperature interference and can lead to misjudgment, while a vibration signal alone is affected by pipeline operating noise. To address these limitations, this research built a distributed optical fiber system as an experimental platform for temperature and vibration monitoring and obtained 3,530 sets of real-time synchronized spatial-temporal temperature and vibration signals. A dual-parameter fusion residual neural network was constructed to extract features from the original spatial-temporal temperature and vibration signals acquired by this monitoring system, achieving a classification accuracy of 92.16% for pipeline leak status and a leakage location accuracy of 1 m. This addresses the insufficient feature extraction and weak anti-interference ability of single-signal monitoring, because fusing the original temperature and vibration signals allows more leakage features to be extracted. Compared with single-signal monitoring, this study therefore improves the accuracy of leakage identification and location, reduces the misjudgment caused by interference in a single signal, and provides a basis for pipeline leakage monitoring and real-time warning in the oil industry.

1 Introduction

With global energy demand increasing, pipelines are an important channel of energy transmission, and their safe and reliable operation is of great significance for ensuring energy supply and social stability. However, long-term use, natural disasters, and human factors cause various failures in pipelines. Pipeline leaks or ruptures pose significant hazards, including environmental pollution, substantial economic losses, and even human casualties. Consequently, pipeline monitoring and early warning systems are crucial for enhancing operational safety and ensuring energy supply.

At present, common pipeline leakage monitoring technologies can be categorized as hardware detection or software monitoring according to their main mode of implementation (Wu et al., 2023). Hardware detection includes acoustic emission detection, soil detection, ultrasonic detection, cable detection, and distributed optical fiber detection (Mostafapour and Davoudi, 2013; Leinov et al., 2016; Datta and Sarkar, 2016; Zuo et al., 2020). Software monitoring includes the mass (or volume) balance method, pressure point analysis, the pressure gradient method, statistical methods, and real-time transient modeling (Mujtaba et al., 2020; Fukushima et al., 2000; Lu et al., 2020; Abbaspour and Chapman, 2008; Wang et al., 2019). Most traditional hardware methods require manual participation, have low efficiency, high operating cost, and small coverage, are easily disturbed by human factors, and cannot monitor continuously. As a pipeline detection technology proposed in recent years, the distributed optical fiber detection method can acquire parameters without interruption along the optical fiber path. Optical fibers not only sense and transmit simultaneously but also offer an extended detection range, high measurement precision, strong corrosion resistance, and immunity to electromagnetic interference. Among software monitoring methods, statistical methods and real-time transient modeling are the most widely used and the most accurate. However, real-time transient modeling requires detailed parameters of the monitored pipeline to be collected before a model can be built from these data. The detail and accuracy of the parameters directly affect the accuracy of the model, so it is crucial to collect multi-dimensional parameters from the pipeline being inspected (Arifin et al., 2018; Oseni et al., 2023). The statistical method based on machine learning analyzes the collected pipeline sensor data statistically and trains a machine learning algorithm on the data to obtain the standard pattern of normal pipeline operation and to identify the abnormal pattern that appears when a leak occurs.

At present, various machine learning models are used to learn these standard patterns, which gives them better accuracy and adaptability. Ya et al. (2019) and Zhang et al. (2021) used a distributed optical fiber temperature system to conduct pipeline leakage detection and localization experiments. To determine the pipeline's leakage condition, the correlation coefficient method and the absolute distance method were used to cluster the temperature detection signal, together with a selective average threshold for valid leakage points. Liu et al. (2023) proposed a multi-dimensional spatial data fusion algorithm based on spatio-temporal pipeline leakage signals acquired by a DVS (distributed optical fiber vibration sensing) system. The average value of the fused non-leakage signals was used as the alarm threshold to improve the leakage alarm rate and realize multi-point leakage alarms.

However, in practical applications, the temperature and vibration signals fluctuate when the pipeline leaks, so identifying a leak with a fixed threshold can cause false positives and false negatives, especially for small leaks. With the rise of machine learning, neural networks can analyze signals statistically and learn their global features. They are increasingly used to identify and classify distributed fiber signals, with good results. Abufana et al. (2020) denoised and filtered distributed optical fiber acoustic signals through wavelet denoising and a time-domain difference method, used variational mode decomposition to extract signal features such as variance, skewness, and kurtosis, and finally classified the signals with a linear SVM (support vector machine). Bao et al. (2020) combined endpoint detection and variational mode decomposition to extract time-frequency characteristics from fiber vibration signals, and then identified three different intrusion signals with an SVM model at a recognition accuracy of 98%. Wang et al. (2021) integrated distributed vibration and temperature systems, extracted 6 temperature and 5 vibration characteristic values, and identified the operating state of the pipeline with a random forest model.

However, manual feature extraction usually increases the computing resources required by the recognition algorithm, resulting in low processing efficiency and poor real-time performance. Therefore, more and more studies use deep learning models that integrate feature extraction and classification to improve both accuracy and real-time performance. Li et al. (2023) adopted a CNN-LSTM structure to analyze the temporal and spatial features of the signal, used double cubic reduction to simplify the network structure, and realized vibration event recognition for a buried distributed fiber optic sensing system. Numerous recent studies indicate that deep learning models typically outperform conventional machine learning models in both recognition accuracy and recognition speed (Lyu et al., 2020; Xie et al., 2022; Zhu et al., 2023). Shi et al. (2020) acquired spatio-temporal signals with a DVS system, converted the signals into grayscale images, extracted signal features, and classified them with a 2DCNN-SVM model, reaching an accuracy of 94.17%. Peng et al. (2020) used a DAS (distributed optical fiber acoustic sensing) system to monitor acoustic signals from pipelines and compared a shallow non-convolutional neural network with a CNN model. The results showed that the CNN model recognized event types more accurately. Yang et al. (2022) proposed an integrated 1DCNN-VAPSO-SVM model that uses pipeline acoustic signals for leakage detection; the SVM parameter combination was optimized with an amplitude-based parameter adjustment strategy, effectively improving the accuracy of the model. Zhang et al. (2023) proposed an AM-LSTM model to identify time-series monitoring signals and realize real-time monitoring and location of pipeline leakage.

Comprehensively considering the research and application status of deep learning models in the field of distributed fiber signal recognition, it can be seen that simultaneous feature extraction and classification recognition of signals through deep learning models is a feasible method to improve the intelligence of signal recognition. In addition, considering the space-time dimension of fiber signals in the model also can effectively improve the signal characterization ability. However, a deep learning model that is too complex will also reduce real-time signal recognition. Therefore, how to optimize the accuracy-real-time performance trade-off in deep learning models is a problem that must be considered to realize intelligent monitoring of pipeline leakage.

In addition, most studies focus on the vibration, strain, temperature, or acoustic signals of distributed optical fibers individually, and few combine several signals to identify the pipeline leakage status. In fact, because the oil in a pipeline is at high temperature and pressure, a leak produces vibration accompanied by a local temperature rise. When the leak is small, however, the temperature or vibration variations it induces are minute, which makes timely detection with an individual signal challenging. To solve this problem, the temperature and vibration signals can be considered simultaneously, so that the impact of environmental interference is reduced and more information is extracted to improve the accuracy of leakage identification (Wang et al., 2021).

Taking the above reasons into consideration, this study proposes an intelligent oil pipeline leakage monitoring method based on spatio-temporal signals of the distributed optical fiber vibration and temperature detection systems. The deep residual network is used to characterize the features and states of the vibration and temperature signals of distributed fiber with spatio-temporal dimensions to obtain the pipeline leakage status. This model can quickly process the collected original data, realize intelligent real-time monitoring of pipeline leakage, and accurately locate the leakage point. The main research questions and objectives of this study are as follows.

Research Questions (RQs):

RQ1: How to build an experimental system that can stably obtain high-quality spatial-temporal synchronized temperature-vibration signals?
RQ2: How to enhance the feature extraction ability and anti-interference performance of pipeline leakage and solve the inherent defects of single-signal monitoring?
RQ3: What level of accuracy and localization precision can be achieved with the proposed dual-parameter fusion method?

Research Objectives (ROs):

RO1: A self-developed distributed optical fiber temperature and vibration monitoring experimental platform has been built. A total of 3,530 sets of real-time synchronized spatial and temporal temperature and vibration signals have been successfully collected, which ensures that the data cover the signal characteristics of the pipeline under both normal and different leakage conditions and meet the data quality and quantity requirements of the leakage monitoring model.
RO2: The model is used to extract features from the original vibration signal and temperature signal, respectively. Through decision-level fusion, the advantages of larger amplitude changes in DVS during leakage and a 1-meter accuracy of DTS are combined, thereby breaking the limitations of a single signal.
RO3: A dual-parameter fusion residual neural network structure was constructed, which solved the problems of gradient vanishing and slow convergence during the training process when traditional CNNs learned more complex deep feature models. Eventually, a classification accuracy of 92.16% for pipeline leakage and a leakage location accuracy of 1 m were achieved.

2 Preliminaries

In this section, the fundamental theories, concepts and advantages of distributed optical fiber system, convolutional neural network (CNN), and residual network (ResNet) are briefly outlined.

2.1 Distributed optical fiber system

Distributed optical fiber sensing is a sensing technology that directly uses the monitoring optical cables laid in the same trench as the pipeline as sensors. It fully exploits the continuous spatial distribution of the fiber, integrating “transmission” and “sensing” and enabling the acquisition of physical parameter information at any point along the fiber. It is applied in industries including petroleum, petrochemicals, electric power, transportation, and bridge construction. Compared with traditional detection technologies, it offers long measurement distance, continuous distributed measurement, accurate positioning, simple installation, high safety, and strong scalability.

There are three types of scattered light that occur when light is transmitted within the fiber, which are Rayleigh scattering, Brillouin scattering, and Raman scattering. Based on these scattering types, the optical fiber can realize the detection of vibration, temperature and other signals. Figure 1 shows the typical architecture of a common distributed optical fiber sensing system.

Figure 1

The DVS system is founded on the principles of Rayleigh scattering and ϕ-OTDR (phase-sensitive optical time domain reflectometry). The optical modulator modulates the continuous light emitted by the narrow linewidth laser with strong coherence into pulsed light in this system. The pulsed light continuously generates coherent back-Rayleigh scattering light modulated by external vibration signals when propagating in the fiber. When the fiber is in a stable state, the variation law of Rayleigh scattering intensity is basically unchanged. But when a certain section of the fiber vibrates, the refractive index and length of the corresponding position will change, leading to a change in the phase of the scattered light from the point of scattering. The change of the relative phase relation of the scattered light will lead to the fluctuation of the intensity of the back-Rayleigh scattered light under the action of light interference, and the change frequency is highly related to the external vibration frequency. Therefore, the vibration information can be obtained by calculating the intensity change of the scattered light, and the optical power can be obtained by the photodetector to detect sensing signals of DVS.

The DTS system is based on Raman scattering and the OTDR (optical time domain reflectometry) principle. The properties of Raman scattering are used to measure ambient temperature. Raman scattering of the incident light produces light at two different frequencies. Stokes light has a lower frequency than the incident light and its intensity is largely insensitive to temperature, while anti-Stokes light has a higher frequency than the incident light and its intensity depends on temperature. Therefore, the temperature of a region can be obtained by measuring the intensity ratio of anti-Stokes light to Stokes light. With OTDR technology, points of temperature change can be located from the time difference between the incident light and the backward Raman-scattered light and the propagation speed of light in the fiber.
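The two measurements described above, location from the round-trip time and temperature from the anti-Stokes/Stokes ratio, can be illustrated with a minimal sketch. It uses the textbook relation in which the ratio is proportional to exp(−hΔν / (k_B·T)) and calibrates against a reference fiber section at known temperature. The group index (1.468) and Raman shift (13.2 THz) are typical values for silica fiber assumed here for illustration, and the function names are hypothetical; this is not the calibration algorithm of the DTS unit used in the study.

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
K_B = 1.380649e-23   # Boltzmann constant, J/K
C = 299_792_458.0    # speed of light in vacuum, m/s


def event_position_m(round_trip_s, group_index=1.468):
    """OTDR location: the pulse travels to the scattering point and back,
    so the one-way distance is half the round-trip path.
    group_index is an assumed typical value for silica fiber."""
    return C * round_trip_s / (2.0 * group_index)


def temperature_from_ratio(ratio, ref_ratio, ref_temp_k, raman_shift_hz=13.2e12):
    """Invert ratio(T) ~ exp(-h*dv / (k_B*T)) using a reference section.

    ratio      : anti-Stokes / Stokes intensity at the point of interest
    ref_ratio  : the same ratio measured where the temperature is known
    ref_temp_k : known reference temperature in kelvin
    raman_shift_hz is an assumed typical Raman shift for silica.
    """
    inv_t = 1.0 / ref_temp_k - (K_B / (H * raman_shift_hz)) * math.log(ratio / ref_ratio)
    return 1.0 / inv_t


# Usage: a backscatter arriving 1 microsecond after launch
position = event_position_m(1e-6)
```

With these assumed constants, one metre of fiber corresponds to roughly 10 ns of round-trip time, which indicates the timing precision a 1 m spatial resolution requires.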

2.2 Convolutional neural networks

The Artificial Neural Network (ANN) serves as a crucial cornerstone within artificial intelligence and machine learning, mirroring the functionality of biological neural networks. To simulate the role of neurons and synapses in biological neural networks, ANN is composed of a large number of nodes, and these nodes transfer information through connections. Each connection has a weight that indicates the strength of the signal transmission. The main goal of ANN is to solve complex problems through learning and training, such as image recognition, object detection, pattern recognition, text classification and so on.

The Convolutional Neural Network (CNN), a prominent deep learning architecture, has emerged as the predominant artificial neural network for leakage monitoring applications due to its hierarchical feature extraction capability through successive convolutional, activation, pooling, and fully-connected layers (Le Cun et al., 1989; Gu et al., 2018).

The convolutional layer employs trainable kernels that convolve with input data for hierarchical feature extraction. The filters slide over the input, calculating the dot product between their weights and local areas of the input, resulting in a feature map containing different aspects of the data. The activation function typically follows each convolutional operation. By using the activation function, nonlinear operations can be introduced into the network model, so as to learn more complex features. Pooling layers perform dimensionality reduction via down-sampling while preserving salient features, thereby enhancing computational efficiency, preventing overfitting, and improving model robustness and generalization. The network architecture terminates with fully connected layers that leverage the extracted feature representations to execute final classification or regression operations.
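The convolution, activation, and pooling steps described above can be traced by hand on a toy input. The sketch below is plain Python written for illustration only: the 5 × 5 "time × position" map, the hand-picked edge kernel, and the function names are assumptions, and the sliding dot product is the cross-correlation form used by deep learning frameworks (no kernel flip). In a trained CNN the kernel weights are learned rather than chosen.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take the dot product at each offset."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Element-wise nonlinearity: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Down-sample by keeping the largest value in each 2 x 2 window."""
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


# Toy map: rows are time steps, columns are positions along the fiber.
# A persistent anomaly sits at position index 2 during time steps 1 to 3.
signal_map = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
rising_edge = [[-1, 1], [-1, 1]]  # responds to an increase along the position axis

features = max_pool2x2(relu(conv2d_valid(signal_map, rising_edge)))
# features == [[2, 0], [2, 0]]: the response survives pooling on the side of the anomaly
```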

Because a CNN can automatically extract and learn features from its inputs, it is well suited to applications that require high accuracy and robustness, including pipeline leak detection, where the presence or absence of a leak can be determined from features and patterns extracted from sensor data.

2.3 Residual network

While conventional deep convolutional neural networks theoretically achieve enhanced feature extraction through sequential stacking of convolutional and activation layers, practical implementations often encounter optimization challenges, including vanishing and exploding gradients, as network depth increases. As a network becomes deeper, training becomes more difficult. Gradient vanishing means that the gradient gradually shrinks during backpropagation, so the weights of the earlier layers update very slowly, which can slow convergence or even cause training to fail. To solve these problems, the residual network (ResNet) was proposed. It adds residual blocks and skip connections to the original CNN, with the residual block serving as the basic building block and core of the architecture. The core idea of the residual network is to pass information directly from an earlier layer to a later layer through a skip connection (He et al., 2016). The input is thus added directly to the output, so that the gradient propagates more easily through the network. The residual block consists of a mapping part and a residual part, and its mathematical expression is as follows:

$$x_{l+1} = h(x_l) + F(x_l, W_l) \tag{1}$$

where h(x_l) represents the mapping part, which raises or reduces the dimension, usually as a direct (identity) mapping or a 1 × 1 convolution, and F(x_l, W_l) represents the residual part.

The structure of the residual block is shown in Figure 2a. With the skip connection, the block output is f(x) = F(x) + x, so the residual function is F(x) = f(x) − x, and when F(x) = 0 the block forms the identity mapping f(x) = x. When the desired mapping of a layer is the identity, the residual to be learned is 0, which reduces the difficulty of learning, and the identity mapping also alleviates the problem of model degradation.

Figure 2

The residual architecture facilitates direct feature propagation from input to weight layer outputs through skip connections, effectively mitigating signal attenuation in deep networks. This structural design not only alleviates gradient vanishing during backpropagation—thereby preventing model degradation—but also reduces parametric complexity while enabling effective training of deep neural architectures. As illustrated in Figure 2b, conventional residual blocks employ this mechanism to achieve substantial performance improvements through optimized information flow and gradient propagation pathways.
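Why the skip connection eases gradient flow can be sketched with a short derivation. It assumes the mapping part is the identity, h(x_l) = x_l, and that no activation is applied after the addition; both are simplifying assumptions from the general analysis of residual networks, not details stated for the model in this study. Here 𝓔 denotes the training loss and L the index of any deeper block.

```latex
\begin{aligned}
x_{l+1} &= x_l + F(x_l, W_l) \\
x_L &= x_l + \sum_{i=l}^{L-1} F(x_i, W_i) \\
\frac{\partial \mathcal{E}}{\partial x_l}
  &= \frac{\partial \mathcal{E}}{\partial x_L}
     \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i) \right)
\end{aligned}
```

The constant term 1 passes the gradient from the deep block L straight back to block l without multiplication by any weight matrix, so the gradient reaching the early layers does not vanish even when the contribution of the residual branches is small.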

In the actual network construction, by stacking multiple residual blocks, the network can train the deep structure more easily, thus improving the performance of the model.

3 Proposed method

The proposed methodology's architectural overview is illustrated in Figure 3.

Figure 3

With the rapid development of deep learning, CNN, RNN, LSTM, Transformer, and other models have emerged one after another. Each model excels at specific tasks through its own structure and design principles; however, choosing the right neural network model for a given application scenario remains a challenge. The architecture should be selected based on task requirements, input-output data characteristics, and empirical training performance, since an appropriate model structure and parameters usually yield good accuracy and generalization. The input data of this study are the pipeline vibration and temperature signals, which have temporal and spatial characteristics. Integrated analysis of these two signals enables simultaneous pipeline leakage detection and localization, which constitutes a dual-task learning framework of classification and regression. This calls for a deeper and more expressive model, so a residual network is used to establish the pipeline leakage monitoring model.

3.1 Experimental platform construction (Findings for RQ1)

Machine learning algorithms need substantial training data to generalize well and avoid overfitting. However, few signals were available from the fiber optic system on the actual pipeline in this study. It was therefore necessary to build an experimental platform with DTS and DVS systems on a pipeline to simulate the operating states (normal operation and leakage) of the actual pipeline. The DTS and DVS systems collect sufficient temperature and vibration signals under normal operation and leakage; the two types of signals are then fused in the machine learning model, which performs feature extraction and completes the classification and regression tasks to obtain the pipeline operating state. The experimental test bench enables controlled leakage simulations that generate extensive datasets, ensuring sufficient training samples for neural network models while minimizing statistical uncertainties.

The DTS and DVS units used in this experimental platform are built on Advantech mainboards, and the finished products are manufactured by Herch Opto Electronic Technology Co., Ltd. This research employs dual distributed fiber optic sensing systems: a Raman scattering-based temperature monitoring system and a Rayleigh scattering-based vibration detection system, with the experimental setup illustrated in Figure 4.

Figure 4

The DTS system adopts the temperature calibration algorithm, the steps of which are shown in Figure 5. This algorithm can reduce the influence of light scattering and transmission loss, thereby improving the accuracy of this DTS system. The technical specifications of DVS and DTS are respectively shown in Tables 1, 2.

Figure 5

Table 1

Parameters	Specification
Model number	HQ-DVS-0010
Working wavelength	1,550 nm
Monitoring distance	1–10 km
Spatial resolution	±5 m
Frequency range	0.1–2 kHz
Sampling frequency	400 times/min
Response time	≤ 2 s

Technical specifications of DVS system.

Table 2

Parameters	Specification
Model number	HQ-DTS-0010
Monitoring distance	0–30 km
Temperature measurement accuracy	±1 °C
Temperature measurement resolution	0.1 °C
Spatial resolution	1 m
Sampling frequency	27 times/min
Response time	2 s

Technical specifications of DTS system.

The experimental setup consists of a 20-meter-long steel pipeline with a diameter of 60 mm (Figure 6), with multiple artificial leakage orifices of 2 mm and 3 mm diameter placed at varying intervals along the pipe wall and sealed by threaded fasteners. By varying the diameter and spacing of the leakage holes, the influence of pipeline leakage on the monitoring results can be reduced, and the leak location resolution of the model can be easily determined. The distributed temperature fiber and the vibration fiber are fixed to the pipeline with cable ties. A water pump is connected to the inlet of the pipeline, and a valve and a pressure gauge are installed at the outlet. The water pressure in the pipe can be adjusted through the valve up to a maximum of 1 MPa. During the experiment, the non-leakage state of the pipeline was simulated by tightening the nut at each leak hole, and leaks of different diameters were simulated by unscrewing the nut of the corresponding leak hole. In addition, to enable the model to monitor multiple leaks, a series of experiments with two sets of leak holes was also conducted. To allow for the temperature variation to develop, each experiment lasted 120 s.

Figure 6

Although the sampling frequency of the distributed fiber is 1 kHz, the optical fiber signals must be filtered and preprocessed, and the temperature signals must also be calibrated, which lowers the signal output rate of the distributed optical fiber system. As a result, each sample in the experiment contains a vibration signal of 800 time steps from the distributed fiber vibration system and a temperature signal of 54 time steps from the distributed fiber temperature system.
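These two lengths follow directly from the output rates listed in Tables 1 and 2 (400 times/min for DVS, 27 times/min for DTS) and the 120 s duration of each experiment. The short check below reproduces the arithmetic; the helper name is hypothetical.

```python
def samples_per_record(rate_per_min, duration_s):
    """Number of time steps produced at a given output rate over one record."""
    return round(rate_per_min * duration_s / 60)


dvs_length = samples_per_record(400, 120)  # vibration: 400 per minute for 2 minutes
dts_length = samples_per_record(27, 120)   # temperature: 27 per minute for 2 minutes
print(dvs_length, dts_length)              # 800 54
```

The vibration record is therefore about 15 times longer than the temperature record along the time axis, which is one reason the two signals cannot simply be stacked into a single input array.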

Figure 7 shows the temperature and vibration signals at one sampling time point read from the distributed fiber system. The results show a simultaneous temperature increase and vibration amplification at the leak location. However, because leakage is a sustained process, introducing the time dimension allows the model to extract more characteristic quantities of the leakage signal and judge the pipeline state more accurately. Interference events, in which human or environmental factors change the temperature and vibration signals, can also be identified better from full data samples. Figure 8 presents the distribution of vibration amplitude and temperature along the pipeline length for different leakage holes. Different color layers correspond to different combinations of leakage holes (such as single-hole leakage and simultaneous multi-hole leakage) and can be used to analyze how leakage influences the vibration and temperature fields of the pipeline. In the vibration amplitude distribution, a higher peak indicates stronger vibration caused by the leak at that position. The temperature change triggered by a leak appears as an abnormal local temperature increase. For multi-hole leakage, the range and intensity of the amplitude and temperature increases are larger.

Figure 7

Figure 8

As RO1 stated, this study collected a total of 3,530 sets of usable signal samples. Table 3 lists the number of signal samples and related parameters. This study collected 1,813 sets of normal operating data and 1,717 sets of leakage operation data. Under the leakage condition, two different diameters of leakage holes were set up, and each leakage hole was equipped with a mechanical nut. The mechanical nut could adjust the size of the leakage volume, and different leakage situations ranging from slight leakage to heavy leakage were collected. This enables a more comprehensive assessment of the model's identification and classification capabilities. Additionally, datasets of single-hole leakage and simultaneous multi-hole leakage were also collected. By locating the leakage holes, the positioning accuracy of the model can be further tested.

Table 3

Signal samples	Number
Normal	1,813
Leakage	1,717
Time	120 s
Length of vibration signal	800
Length of temperature signal	54

The relevant parameters of the collected signal samples.

3.2 Construction of neural network model

The experimental dataset was constructed by extracting all 3,530 synchronized temperature and vibration signal pairs from the distributed optical fiber system's TDMS output files, which were subsequently normalized to consistent dimensions and integrated into a unified JSON format suitable for deep neural network training. Following standard machine learning protocols, the compiled dataset underwent randomized partitioning into three distinct subsets (training, validation, and test sets) as detailed in Table 4, where the training set facilitates model parameter learning, the validation set enables hyperparameter optimization and interim performance assessment, while the held-out test set provides an unbiased evaluation of the model's generalization capability on previously unseen data. This systematic data preparation and partitioning approach ensures rigorous model development and reliable performance estimation.

Table 4

Data set	Number
Training set	1,994
Validation set	768
Test set	768
Total	3,530

Distribution of the dataset.
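A randomized partition with the subset sizes in Table 4 can be sketched as follows. The shuffling procedure, the fixed seed, and the function name are assumptions made for illustration; the text states only that the partition was randomized.

```python
import random


def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle sample indices once, then cut them into three disjoint subsets.

    The seed is an assumed value used only to make the split repeatable.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    held_out = indices[n_train + n_val:]
    return train, val, held_out


# Sizes from Table 4: 1,994 + 768 + 768 = 3,530
train_idx, val_idx, test_idx = split_indices(3530, 1994, 768)
print(len(train_idx), len(val_idx), len(test_idx))  # 1994 768 768
```

Splitting indices rather than the signals themselves keeps each temperature record paired with its synchronized vibration record in the same subset.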

In this study, the model is built on the PyTorch platform. Because fiber optic data have temporal and spatial characteristics, a ResNet architecture composed of two-dimensional convolution layers is used to extract features from the data and complete the tasks of pipeline status identification and leak location.

Because the signals collected in this study include a vibration signal and a temperature signal whose sampling frequencies and sampling point intervals differ, decision-level fusion is used when constructing the residual neural network, so that the two groups of signals can be analyzed and monitored in real time with as few extra computing resources as possible. The resulting model structure is shown in Figure 9.
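Decision-level fusion means each signal is processed by its own branch and only the branch outputs are combined. One generic form, a weighted average of the class probabilities from the two branches, is sketched below. The fusion rule, the equal weighting, and the function names are assumptions for illustration; the rule actually used in the model of Figure 9 is not specified in the text.

```python
import math


def softmax(logits):
    """Convert raw branch scores into class probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(temp_logits, vib_logits, w_temp=0.5):
    """Average the per-branch probabilities and pick the most likely class.

    w_temp is a hypothetical weight for the temperature branch.
    """
    p_temp = softmax(temp_logits)
    p_vib = softmax(vib_logits)
    fused = [w_temp * a + (1.0 - w_temp) * b for a, b in zip(p_temp, p_vib)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label


# Class 0 = normal, class 1 = leak. The temperature branch is undecided,
# while the vibration branch clearly indicates a leak.
fused_probs, fused_label = fuse_decisions([0.2, 0.1], [-1.0, 2.0])
print(fused_label)  # 1
```

Because the branches only exchange low-dimensional outputs, the two inputs can keep their different lengths (800 and 54 time steps) and no resampling to a common grid is needed.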

Figure 9

Numerous hyperparameters and algorithmic components are fine-tuned throughout model architecture design and training to enhance predictive performance.

Batch normalization layers not only accelerate the training process but also reduce overfitting by normalizing the inputs to each layer so that their mean stays close to 0 and their variance stays close to 1.
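The normalization step can be illustrated on a single batch of scalar activations. This is a minimal sketch, not the layer used in the study: the scale `gamma`, shift `beta`, and stabilizer `eps` are the usual batch normalization parameters, but their names and default values are assumptions introduced here.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean and unit variance, then scale and shift."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased (population) variance
    # eps (assumed value) guards against division by zero for constant batches.
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]  # illustrative activations, not study data
normed = batch_norm(batch)
```

With the default `gamma` and `beta`, the output has a mean of 0 and a variance just below 1 (the small gap comes from `eps`).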

In addition, regularization can also control model complexity to prevent overfitting. Both L1 and L2 regularization operate by incorporating parameter magnitude (either absolute values or squared terms) into the loss function as penalty terms, effectively constraining model weights to control complexity. In this study, L2 regularization, also known as weight decay, was chosen. Its penalty term is the sum of the squared parameters, which drives the model weights gradually toward 0, as shown in Equation 2.
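In its standard form, which is assumed here because it matches the symbols defined next, the L2-regularized objective reads:

```latex
J(\theta) = L(\theta) + \alpha \sum_{i} w_i^{2}
```

Some formulations place a factor of 1/2 on the penalty term; which convention the study used is not stated.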

Where J(θ) is the loss function with regularization, L(θ) is the original loss function, α is the regularization strength, and w_i is the i-th weight of the model.

The loss function plays a critical role in neural networks by measuring the discrepancy between model predictions and actual target values. Selecting an appropriate loss function significantly enhances both training effectiveness and model performance. For classification problems, cross-entropy loss is widely adopted due to its effectiveness in quantifying inter-class prediction errors and facilitating efficient parameter optimization. The cross-entropy loss is defined as follows:
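In the standard form assumed here, with y_i the one-hot indicator of the true category and C the number of categories (both symbols are introduced for illustration and are not defined in the text), the loss reads:

```latex
L(y, p) = -\sum_{i=1}^{C} y_i \log p_i
```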

Where L(y, p) represents the cross-entropy loss, and p_i is the predicted probability that the sample belongs to category i.

The evaluation metrics of a regression model differ from those of classification tasks. Regression deals with continuous numerical predictions, so traditional accuracy metrics are not applicable. The standard evaluation criteria for regression performance include mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination (R²). In this study, MAE is adopted as the main loss function for the regression task; it is the arithmetic mean of the absolute differences between the predicted values and the actual values. Its mathematical formula is as follows:
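In standard notation, MAE = (1/n) Σ |y_i − ŷ_i|, where n is the number of samples, y_i the actual value, and ŷ_i the predicted value (symbols introduced here for illustration). The sketch below computes the three regression metrics named above. The function name `regression_metrics` and the sample values are hypothetical and are not study data.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2

# Hypothetical leak distances in metres (illustrative values only).
true_pos = [10.0, 20.0, 30.0, 40.0]
pred_pos = [11.0, 19.0, 31.0, 39.0]
mae, mse, r2 = regression_metrics(true_pos, pred_pos)
```

Because MAE is expressed in the same unit as the target, it can be read directly as an average distance error, which MSE and R² cannot.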

The choice of optimization method significantly impacts parameter adjustment efficiency and convergence speed during model training. Among various optimizers like stochastic gradient descent (SGD), RMSprop, and AdaGrad, this research employs the Adam algorithm. Adam integrates momentum-based gradient descent with RMSprop's adaptive learning rate approach, dynamically modifying individual parameter learning rates through first-order (momentum term) and second-order (RMSprop term) gradient moment estimations.

The parameter update of the Adam algorithm proceeds as follows (Kingma and Ba, 2014; Zaheer et al., 2018):

(1) Calculate the first moment estimate m_t (momentum term) and the second moment estimate v_t (RMSprop term) from the gradient g_t.

(2) Apply bias correction to the first and second moment estimates:

(3) Update parameters

Where, θ is the model parameter, η is the learning rate, and ϵ is a constant that usually takes the value of 1e−8.
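The three steps above can be sketched as a single update function, following the form given by Kingma and Ba (2014). The decay rates `beta1 = 0.9` and `beta2 = 0.999` are the defaults from that paper and are an assumption here, since the text does not state them. The learning rate of 0.001 matches Table 7, and the function name `adam_step` is hypothetical.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; t is the 1-based step count."""
    # (1) Update the first and second moment estimates from the gradient g_t.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # (2) Bias-correct both estimates (they start at zero).
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # (3) Update the parameters with a per-parameter step size.
    theta = [th - lr * mh / (math.sqrt(vh) + eps)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v

# First step on a single parameter with gradient 2.0 (illustrative values).
theta, m, v = adam_step([1.0], [2.0], [0.0], [0.0], t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one learning rate regardless of the gradient's magnitude.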

4 Results and discussion

4.1 Model results

Model optimization is achieved through iterative training, which minimizes the objective function step by step. Training begins with parameter initialization, followed by alternating forward and backward passes. In the forward pass, the input data propagates through the network layers to generate predictions. The backward pass then computes the gradients of the loss with respect to all parameters, which are used to adjust the weights, gradually reducing the error and improving model performance. The complete training algorithm is as follows.

(1) Initialize weights and biases. The weights and biases of nodes in the model need to be initialized before the training begins.
(2) Forward propagation. Forward calculations are performed based on the model structure and parameters to obtain the output of the model.

Where a_l denotes the output of the l-th layer, ω_l and b_l are the weight and bias of the l-th layer, and g(·) is the activation function. Common activation functions are Tanh, Sigmoid, and ReLU.

(3) Calculate the loss. The loss function is calculated based on the forward calculation result of the model and the actual label.

Where m represents the number of samples, L(·) refers to the loss function, and y⁽ⁱ⁾ and ŷ⁽ⁱ⁾ represent the actual label and the predicted result of the i-th sample, respectively.

(4) Backward propagation. The gradient of the model parameters can be calculated from the loss function through the following formulas.

(5) Parameter update. The optimization algorithm is adopted to update the model parameters, so that the parameters are optimized along the direction of decreasing the loss function.

During model training, steps (2)–(5) are executed cyclically until either loss convergence or predefined termination conditions are satisfied. This iterative optimization process systematically improves the model's ability to learn from training data.
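Steps (1) to (5) can be sketched on a deliberately small problem. The example below is not the study's network: it assumes a single sigmoid unit, a binary cross-entropy loss, and plain gradient descent in place of Adam, so that each step stays visible. The function names, the toy data, the learning rate, and the stopping tolerance are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.5, max_epochs=200, tol=1e-3):
    # (1) Initialize the weight and bias.
    w, b = 0.0, 0.0
    m = len(xs)
    losses = []
    for _ in range(max_epochs):
        # (2) Forward propagation: a = g(w * x + b) with a sigmoid activation.
        preds = [sigmoid(w * x + b) for x in xs]
        # (3) Loss: mean binary cross-entropy over the m samples.
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for y, p in zip(ys, preds)) / m
        losses.append(loss)
        if loss < tol:  # predefined termination condition (assumed value)
            break
        # (4) Backward propagation: gradients of the loss w.r.t. w and b.
        dw = sum((p - y) * x for x, y, p in zip(xs, ys, preds)) / m
        db = sum(p - y for y, p in zip(ys, preds)) / m
        # (5) Parameter update along the negative gradient.
        w -= lr * dw
        b -= lr * db
    return w, b, losses

# Toy one-dimensional data: negative inputs are class 0, positive are class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b, losses = train(xs, ys)
```

The recorded `losses` list plays the role of the training curve: it starts at ln 2 for the zero-initialized model and decreases as steps (2) to (5) repeat.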

This study evaluated several model architectures, with their accuracies presented in Table 5. With the fused signal as input, ResNet achieves the highest classification accuracy among the tested models. For the same ResNet model, the fused signal also outperforms either single-source input: 92.16% against 78.24% for the temperature signal alone and 84.61% for the vibration signal alone. The ResNet model fed with the fused signal is therefore selected as the optimal configuration for this study, with a peak classification accuracy of 92.16%.

Table 5

Model	Signal	Accuracy
RBF-SVM	Fusion signal	82.44%
Random Forest	Fusion signal	85.04%
2DCNN	Fusion signal	89.36%
1DCNN+LSTM	Fusion signal	91.12%
ResNet	Temperature signal	78.24%
	Vibration signal	84.61%
	Fusion signal	92.16%

Training results of different model structures.

Model optimization effectiveness and computational efficiency are highly dependent on the selected training algorithm. To select the most appropriate optimizer, this study compared the performance of different optimizers during the training of the same model for 100 epochs, as shown in Figure 10.

Figure 10

Comparative results in Table 6 demonstrate that the Adam optimizer achieves superior performance in both accuracy and computational efficiency for the ResNet fusion model among various optimization algorithms tested.

Table 6

Optimization algorithm	Accuracy	Training time
Adam	92.16%	22,858.4 s
AdaGrad	90.27%	29,545.1 s
AdaDelta	91.45%	23,678.9 s
SGD	90.42%	32,541.8 s
RMSProp	91.22%	27,154.2 s

The accuracy and training time of each optimization algorithm.

Through validation set evaluation, the fusion model's optimal architecture and parameters were determined, with results presented in Table 7.

Table 7

Model		Value
Structure	Conv2d	1 × 800 × 86/1 × 54 × 43
	Residual block1	16 × 800 × 86/16 × 54 × 43
	Residual block2	32 × 400 × 43/32 × 27 × 21
	Residual block3	64 × 200 × 22/64 × 13 × 10
	AvgPool2d	64 × 200 × 22/64 × 13 × 10
	FC	64
	Classification FC	2
	Regression FC	5
Parameter	lr	0.001
	Epoch	100
	Batchsize	64
	Dropout	0.5

The fusion model structure and parameters.
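The feature-map sizes in Table 7 can be checked against the standard convolution output-size formula, floor((n + 2p − k)/s) + 1. The kernel size k, stride s, and padding p are not stated in the text. The values below are one choice that reproduces the listed sizes for each branch, and they are an assumption rather than the study's configuration.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a convolution along one axis (floor convention)."""
    return (n + 2 * padding - kernel) // stride + 1

def trace(shape, stages, kernel, stride, padding):
    """Apply the same downsampling convolution `stages` times to (height, width)."""
    shapes = [shape]
    for _ in range(stages):
        h, w = shapes[-1]
        shapes.append((conv_out(h, kernel, stride, padding),
                       conv_out(w, kernel, stride, padding)))
    return shapes

# Vibration branch: assumed kernel 3, stride 2, padding 1.
vibration = trace((800, 86), 2, kernel=3, stride=2, padding=1)
# Temperature branch: assumed kernel 2, stride 2, padding 0.
temperature = trace((54, 43), 2, kernel=2, stride=2, padding=0)
```

Under these assumed settings the vibration branch goes 800 × 86, 400 × 43, 200 × 22 and the temperature branch goes 54 × 43, 27 × 21, 13 × 10, matching residual blocks 1 to 3 in Table 7.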

The performance of the optimized model is illustrated in Figure 11a, which displays both classification accuracy and loss for leak type identification, while Figure 11b depicts the regression loss results. In the classification task, as the number of training rounds increases, the classification loss drops rapidly, then fluctuates slightly at a lower level, the prediction error keeps decreasing, and eventually stabilizes, indicating that the model gradually converges in the classification task. The classification accuracy rose rapidly in the early stage of training and then gradually stabilized. As the number of training rounds increases, the regression loss drops rapidly and then fluctuates within a lower range. This indicates that the model's prediction accuracy in the regression task keeps improving, gradually optimizing from the initial large deviation, and stabilizing later, suggesting that the model also converges in the regression task.

Figure 11

The regression loss stabilizes near 1 after sufficient training iterations. Because MAE measures the absolute difference between predicted and true values, this corresponds to a mean positioning error of about 1 m, demonstrating the model's capability for meter-level leak location. A helical fiber arrangement could improve this accuracy further.

We conducted a statistical test on the classification accuracy using the McNemar test. We sorted MAE results from 5-fold cross-validation and took the 2.5% and 97.5% percentiles as the 95% confidence interval. The specific results are as follows.

Based on the experimental data, the McNemar test gives a p-value below 0.05, indicating a significant difference in classification accuracy between the two models compared, ResNet and 1DCNN + LSTM. The MAE confidence interval of ResNet is [0.9107, 0.9526], while that of 1DCNN + LSTM is [1.0024, 1.1195]. Since the intervals do not overlap and the ResNet interval lies lower, ResNet also has the smaller location error.
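The two checks used here can be sketched as follows. This is a minimal illustration, assuming the continuity-corrected chi-square form of the McNemar test and linear interpolation for the percentiles; the text does not say which variants were used. The discordant counts (40 and 15) are hypothetical, and only the two confidence intervals are taken from the results above.

```python
import math

def mcnemar_p_value(b, c):
    """Continuity-corrected McNemar test on the discordant counts.

    b: samples the first model classifies correctly and the second does not.
    c: samples the second model classifies correctly and the first does not.
    """
    if b + c == 0:
        return 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))

def percentile(sorted_vals, q):
    """q-th percentile (0 to 100) with linear interpolation between ranks."""
    pos = q * (len(sorted_vals) - 1) / 100.0
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] + frac * (sorted_vals[upper] - sorted_vals[lower])

def percentile_interval(values, low=2.5, high=97.5):
    """Sort the values and return the (low, high) percentile pair."""
    s = sorted(values)
    return percentile(s, low), percentile(s, high)

def intervals_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

p = mcnemar_p_value(40, 15)  # hypothetical discordant counts
ci = percentile_interval([float(i) for i in range(101)])  # toy values 0..100
overlap = intervals_overlap((0.9107, 0.9526), (1.0024, 1.1195))
```

The McNemar test uses only the samples on which the two models disagree, which is why it suits a comparison of two classifiers evaluated on the same test set.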

4.2 Dual-signal feature extraction and fusion (Findings for RQ2)

A temperature signal alone has low sensitivity to minor leaks and is easily affected by environmental temperature variations, leading to incorrect judgments; a vibration signal alone is susceptible to pipeline operation noise, which makes leakage vibrations difficult to distinguish from normal operating vibrations. Moreover, neither single-signal method can fully extract the spatiotemporal correlation features of a leakage event, resulting in low leakage identification accuracy. As RO2 pointed out, the core of integrating temperature signals with vibration signals lies in making the two signals complement each other. Through this feature collaboration, leakage conditions can be identified, errors caused by an individual signal can be avoided, and the completeness of feature extraction is improved.

In this study, residual networks were used to extract features from the original vibration and temperature signals separately, learning the important features of each signal. The residual network outputs two types of features: classification features and regression features. The classification features are used to determine whether the input signal belongs to the leakage category, while the regression features are used to locate the leakage point in the input signal. Through the decision-level fusion of the model, the classification features and regression features are concatenated to form a fused feature vector. This vector is passed through the fully connected layer to determine the state category of the signal, and through the linear layer to predict the final regression value used for leak positioning. With this deep learning model, intelligent state classification and positioning of the input signals is achieved without manual feature selection.

The regression task does not use accuracy as an evaluation metric, because it predicts continuous numerical values and there is no notion of a “correct” or “wrong” prediction. The metric used to evaluate the regression model in this study is therefore the mean absolute error. The output of the regression task is the distance from the leakage location to the starting point of the pipeline.

Figure 12 shows the fitting graph for pipeline leakage location, which visualizes the data distribution and the deviation between the model predictions and the actual values. In this study, the R² value is 0.9857, indicating close agreement between the predicted and actual leak positions.
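The concatenate-then-predict step described in this section can be sketched with plain lists. This is a toy illustration under stated assumptions, not the trained model: the feature vectors, weights, and dimensions are hypothetical, the function names are invented for the example, and the regression head is reduced to a single scalar output (the distance along the pipeline).

```python
import math

def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(z):
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_predict(vib_feat, temp_feat, cls_w, cls_b, reg_w, reg_b):
    # Concatenate the features of the two branches into one fused vector.
    fused = list(vib_feat) + list(temp_feat)
    # Classification head: class probabilities (0 = normal, 1 = leak, assumed order).
    probs = softmax(linear(fused, cls_w, cls_b))
    state = max(range(len(probs)), key=probs.__getitem__)
    # Regression head: scalar leak distance from the pipeline start.
    location = linear(fused, reg_w, reg_b)[0]
    return state, probs, location

# Hypothetical two-dimensional branch features and hand-picked weights.
state, probs, location = fuse_and_predict(
    vib_feat=[1.0, 0.0], temp_feat=[0.0, 2.0],
    cls_w=[[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 1.0]], cls_b=[0.0, 0.0],
    reg_w=[[0.5, 0.0, 0.0, 2.0]], reg_b=[1.0],
)
```

Both heads read the same fused vector, so a leak signature that is weak in one signal can still influence the classification and the location through the other.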

Figure 12

4.3 Potential applications

The above model provides a foundation for the construction of the monitoring system, enabling real-time monitoring and early warning of pipeline leaks. Moreover, by further integrating blockchain technology (Ressi et al., 2024), it can achieve automated alerts and maintenance workflows, avoiding the aggravation of pipeline corrosion, medium loss and environmental pollution caused by leakage, extending the effective lifespan of the pipeline, and also predicting potential pipeline failures (Doshmanziari et al., 2020), achieving the upgrade from “repair after failure” to “prevention before failure.” Through the closed loop of “physical entity—digital model—data interaction—decision feedback,” the full state digital mapping of the physical pipeline is realized (Grieves, 2005; Zhou et al., 2024; Huang et al., 2023).

5 Conclusions and prospect (Findings for RQ3)

In this study, temperature and vibration signals of distributed fiber optic systems under normal and leakage conditions were collected by the pipeline leakage experiment platform. In order to obtain more characteristic information about the pipeline operation time, as RO3 stated, this study proposes a residual network structure that integrates two parameters. This structure can simultaneously input the corresponding original optical fiber temperature and vibration signals. The two-dimensional convolution layer in the network model can extract and identify the temporal and spatial features of the signal, reduce manual participation, and realize the intelligent leakage monitoring of the oil pipeline. The training result shows that the ResNet model based on Adam optimization algorithm can achieve 92.16% leak identification accuracy, and the leak location accuracy reaches 1 meter. Based on this model, the comprehensive and real-time leakage identification and early warning of oil pipelines can be realized timely and accurately.

During our data collection process, due to the limitations of the experimental platform, it was not possible to simulate more complex service environments. In the future, we will further enrich our dataset in terms of pipeline length, pipe diameter, environmental interference, and the comparison of signals in straight and curved pipes. The research on vibration wave feature recognition methods will further expand the application boundaries and improve the structure and processing logic of the model, thereby enhancing the accuracy and reliability of the leakage assessment mechanism. In addition, because the viscosity of crude oil is much higher than that of water, it may cause significant differences in the flow velocity distribution and leakage volume within the pipeline. The experiment collected data without considering soil and weather conditions. The high moisture content of clay would enhance the attenuation of electromagnetic signals, resulting in errors in pipeline leakage location. Heavy rain weather would cause a sudden increase in soil moisture content, leading to baseline drift of the sensors. Low temperatures would freeze the outer walls of the pipelines, affecting the accuracy of temperature sensor data. The existing model training data covers a relatively short pipeline length, and an increase in pipeline length may lead to signal transmission attenuation. Verifying the generalization performance of the model in different scenarios of fluids, soil, weather, and pipeline lengths is necessary. Determining the applicable boundaries of the model can provide a basis for subsequent optimization. The specific verification plan can be found in Appendix A.

Over the next 5 to 10 years, three elements (large-scale deployment, IoT/edge technologies, and environmental robustness) are expected to form a deep synergy. Large-scale deployment will rely on ubiquitous IoT/micro-edge connections to achieve full-scenario coverage. Environmental robustness will overcome scenario limitations through transfer learning and edge adaptive algorithms, ultimately building an intelligent monitoring system of “deployment as a service, perception as decision-making, and environment as adaptation,” and achieving a paradigm shift from “passive response” to “active warning” in fields such as energy and municipal services (see Tables B1–B3). Furthermore, to balance model complexity against real-time application, the model can be structurally pruned and quantized to make it more lightweight, reducing computational cost while maintaining accuracy. For embedded hardware commonly used in industry, the convolution operations in the residual network can be converted into pipelined calculations processed in parallel on an FPGA, or the CUDA cores of a GPU can be used to accelerate spatiotemporal feature fusion. This allows the model to improve inference speed on low-cost hardware and to respond faster in real time to different types of events in low-load scenarios.

Statements

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

XL: Writing – original draft, Conceptualization, Visualization, Resources, Validation, Methodology. YD: Data curation, Supervision, Validation, Writing – review & editing. YW: Resources, Validation, Writing – review & editing. HL: Data curation, Resources, Writing – review & editing. WM: Validation, Data curation, Writing – review & editing. KW: Writing – review & editing, Validation, Data curation. JR: Resources, Writing – review & editing, Data curation. RM: Validation, Data curation, Writing – review & editing. SZ: Data curation, Writing – review & editing, Resources. JL: Validation, Writing – review & editing, Data curation. WW: Methodology, Supervision, Writing – review & editing, Funding acquisition.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. The research was supported by the CNPC Science and Technology Project “Research and Development of Corrosion Resistant Materials for Extreme Environments (No. 2023ZZ11-02).”

Conflict of interest

YD, YW, and HL were employed by Hancheng Gas Production Management Area of PetroChina Coalbed Methane Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Abbaspour, M., Chapman, K. S. (2008). Nonisothermal transient flow in natural gas pipeline. J. Appl. Mech. 75:031018. doi: 10.1115/1.2840046
2. Abufana, S. A., Dalveren, Y., Aghnaiya, A., Kara, A. (2020). Variational mode decomposition-based threat classification for fiber optic distributed acoustic sensing. IEEE Access 8, 100152–100158. doi: 10.1109/ACCESS.2020.2997941
3. Arifin, B. M. S., Li, Z., Shah, S. L., Meyer, G. A., Colin, A. (2018). A novel data-driven leak detection and localization algorithm using the Kantorovich distance. Comput. Chem. Eng. 108, 300–313. doi: 10.1016/j.compchemeng.2017.09.022
4. Bao, J., Mo, J., Xu, L., Liu, Y., Lv, X. (2020). VMD-based vibrating fiber system intrusion signal recognition. Optik 205:163753. doi: 10.1016/j.ijleo.2019.163753
5. Datta, S., Sarkar, S. (2016). A review on different pipeline fault detection methods. J. Loss Prev. Process Indus. 41, 97–106. doi: 10.1016/j.jlp.2016.03.010
6. Doshmanziari, R., Khaloozadeh, H., Nikoofard, A. (2020). Gas pipeline leakage detection based on sensor fusion under model-based fault detection framework. J. Pet. Sci. Eng. 184:106581. doi: 10.1016/j.petrol.2019.106581
7. Fukushima, K., Maeshima, R., Kinoshita, A., Shiraishi, H., Koshijima, I. (2000). Gas pipeline leak detection system using the online simulation method. Comput. Chem. Eng. 24, 453–456. doi: 10.1016/S0098-1354(00)00442-7
8. Grieves, M. W. (2005). Product lifecycle management: the new paradigm for enterprises. Int. J. Prod. Dev. 2, 71–84. doi: 10.1504/IJPD.2005.006669
9. Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., et al. (2018). Recent advances in convolutional neural networks. Pattern Recognit. 77, 354–377. doi: 10.1016/j.patcog.2017.10.013
10. He, K., Zhang, X., Ren, S., Sun, J. (2016). “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV: IEEE), 770–778.
11. Huang, Y., Tao, J., Sun, G., Wu, T., Yu, L., Zhao, X. (2023). A novel digital twin approach based on deep multimodal information fusion for aero-engine fault diagnosis. Energy 270:126894. doi: 10.1016/j.energy.2023.126894
12. Kingma, D. P., Ba, J. (2014). Adam: a method for stochastic optimization. arXiv [Preprint]. arXiv:1412.6980. doi: 10.48550/arXiv.1412.6980
13. Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., et al. (1989). “Handwritten digit recognition with a back-propagation network,” in Proceedings of the 2nd International Conference on Neural Information Processing Systems (Cambridge, MA: MIT Press), 396–404.
14. Leinov, E., Lowe, M. J. S., Cawley, P. (2016). Ultrasonic isolation of buried pipes. J. Sound Vib. 363, 225–239. doi: 10.1016/j.jsv.2015.10.018
15. Li, Y., Zeng, X., Shi, Y. (2023). A spatial and temporal signal fusion based intelligent event recognition method for buried fiber distributed sensing system. Optics Laser Technol. 166:109658. doi: 10.1016/j.optlastec.2023.109658
16. Liu, Z., Shang, Y., Wang, C., Zhao, W., Li, C. (2023). Pipeline leakage monitoring technology of distributed optical fiber vibration based on multi-dimensional spatial data fusion algorithm. Laser Optoelectron. Prog. 60:0928002. doi: 10.3788/LOP220964
17. Lu, H., Iseley, T., Behbahani, S., Fu, L. (2020). Leakage detection techniques for oil and gas pipelines: State-of-the-art. Tunn. Undergr. Space Technol. 98:103249. doi: 10.1016/j.tust.2019.103249
18. Lyu, C., Huo, Z., Cheng, X., Jiang, J., Alimasi, A., Liu, H. (2020). Distributed optical fiber sensing intrusion pattern recognition based on GAF and CNN. J. Lightwave Technol. 38, 4174–4182. doi: 10.1109/JLT.2020.2985746
19. Mostafapour, A., Davoudi, S. (2013). Analysis of leakage in high pressure pipe using acoustic emission method. Appl. Acoust. 74, 335–342. doi: 10.1016/j.apacoust.2012.07.012
20. Mujtaba, S. M., Lemma, T. A., Taqvi, S. A. A., Ofei, T. N., Vandrangi, S. K. (2020). Leak detection in gas mixture pipelines under transient conditions using hammerstein model and adaptive thresholds. Processes 8:474. doi: 10.3390/pr8040474
21. Oseni, W. Y., Akangbe, O. A., Abhulimen, K. (2023). Mathematical modelling and simulation of leak detection system in crude oil pipeline. Heliyon 9:e15412. doi: 10.1016/j.heliyon.2023.e15412
22. Peng, Z., Peng, Z., Jian, J., Wen, H., Gribok, A., Wang, M., et al. (2020). Distributed fiber sensor and machine learning data analytics for pipeline protection against extrinsic intrusions and intrinsic corrosions. Opt. Express 28, 27277–27292. doi: 10.1364/OE.397509
23. Ressi, D., Romanello, R., Piazza, C., Rossi, S. (2024). AI-enhanced blockchain technology: a review of advancements and opportunities. J. Netw. Comput. Appl. 225:103858. doi: 10.1016/j.jnca.2024.103858
24. Shi, Y., Wang, Y., Wang, L., Zhao, L., Fan, Z. (2020). Multi-event classification for Φ-OTDR distributed optical fiber sensing system using deep learning and support vector machine. Optik 221:165373. doi: 10.1016/j.ijleo.2020.165373
25. Wang, F., Liu, Z., Zhou, X., Li, S., Yuan, X., Zhang, Y., et al. (2021). (Invited) Oil and gas pipeline leakage recognition based on distributed vibration and temperature information fusion. Results Optics 5:100131. doi: 10.1016/j.rio.2021.100131
26. Wang, X., Lin, J., Keramat, A., Ghidaoui, M. S., Meniconi, S., Brunone, B. (2019). Matched-field processing for leak localization in a viscoelastic pipe: an experimental study. Mech. Syst. Signal Process. 124, 459–478. doi: 10.1016/j.ymssp.2019.02.004
27. Wu, T., Deng, Z., Shen, L., Xie, Z., Chen, Y., Liu, C., et al. (2023). Research progress of long-distance oil pipeline leakage monitoring technology. Oil Gas Storage Transp. 42, 259–275. doi: 10.6047/j.issn.1000-8241.2023.03.003
28. Xie, Q., Tao, G., He, B., Wen, Z. (2022). Rail corrugation detection using one-dimensional convolution neural network and data-driven method. Measurement 200:111624. doi: 10.1016/j.measurement.2022.111624
29. Ya, Z., Qiang, W., Zhangwei, L. (2019). Experimental analysis and leakage location detection of tap water pipe based on distributed optical fiber with selective average threshold. Laser Optoelectron. Prog. 56:30602. doi: 10.3788/LOP56.030602
30. Yang, D., Hou, N., Lu, J., Ji, D. (2022). Novel leakage detection by ensemble 1DCNN-VAPSO-SVM in oil and gas pipeline systems. Appl. Soft Comput. 115:108212. doi: 10.1016/j.asoc.2021.108212
31. Zaheer, M., Reddi, S., Sachan, D., Kale, S., Kumar, S. (2018). “Adaptive methods for nonconvex optimization,” in Advances in Neural Information Processing Systems (Curran Associates, Inc.), 9815–9825. Available online at: https://papers.nips.cc/paper/2018/hash/90365351ccc7437a1309dc64e4db32a3-Abstract.html (Accessed September 19, 2024).
32. Zhang, X., Shi, J., Yang, M., Huang, X., Usmani, A. S., Chen, G., et al. (2023). Real-time pipeline leak detection and localization using an attention-based LSTM approach. Process Safety Environ. Prot. 174, 460–472. doi: 10.1016/j.psep.2023.04.020
33. Zhang, Z., Wang, Q., Gu, X., Zhao, Y., Wu, L., Zhu, K. (2021). Analysis on underground water pipes multi-point leakage location method based on distributed optical fiber. J. Appl. Optics 41, 228–234. doi: 10.5768/JAO202041.0108002
34. Zhou, T., Zhang, X., Kang, B., Chen, M. (2024). Multimodal fusion recognition for digital twin. Digit. Commun. Netw. 10, 337–346. doi: 10.1016/j.dcan.2022.10.009
35. Zhu, C., Yang, K., Yang, Q., Pu, Y., Chen, C. L. P. (2023). A comprehensive bibliometric analysis of signal processing and pattern recognition based on distributed optical fiber. Measurement 206:112340. doi: 10.1016/j.measurement.2022.112340
36. Zuo, J., Zhang, Y., Xu, H., Zhu, X., Zhao, Z., Wei, X., et al. (2020). Pipeline leak detection technology based on distributed optical fiber acoustic sensing system. IEEE Access 8, 30789–30796. doi: 10.1109/ACCESS.2020.2973229

Appendix A. On-site verification plan

Table A1

Verification dimension	Scene type	Quantity/Scale
Fluid type	Water, crude oil	Two verification pipelines each
Soil conditions	Clay (with moisture content of 25%) and sandy soil	Each consisting of 3 verification points
Weather conditions	Sunny weather, light rain (< 20 mm/h), heavy rain (> 50 mm/h)	Each lasts for 3 days of verification
Pipeline length	1 kilometer (short), 2 kilometers (medium), 3 kilometers (long)	One verification pipeline each

Verification object and scenario design.

Appendix B. Structured outlook (deployment, IoT/edge, environmental robustness)

Table B1

Stage	Objective	Key task	Time
Pilot verification	Verify the feasibility of the core scenario	Select 3 to 5 typical scenarios (oil pipeline + sandy soil), and complete the model and hardware compatibility testing	1 to 2 years
Regional promotion	Formulate a standardized deployment plan	Establish hardware installation guidelines and model parameter adjustment manuals, and replicate them in the same type of areas	2 to 3 years
Cross-domain scalability	Covering multiple scenarios and the entire chain link	Integrate the transfer learning module to achieve rapid deployment across scenarios for fluids, soils, and lengths, and form an industry solution.	3 to 5 years

Stage-based deployment path.

Table B2

Level	Current status	Mid-term goals (3–5 years)	Long-term goals (5–10 years)
Edge node	With a single function (such as data collection)	Integrated with lightweight AI chips, supporting local inference	Possessing self-repairing capability (automatically switching to the backup module in case of hardware failure)
Internet of things network	Wireless as a supplement to wired (LoRa)	5G slice dedicated network coverage, transmission delay ≤ 50ms	6G ubiquitous connectivity, supporting ultra-long-range edge collaboration up to 100 kilometers
Cloud-edge collaboration	Periodic model update (monthly)	Real-time Federated Learning (Hourly Parameter Synchronization)	Self-evolutionary collaboration (where edge nodes autonomously initiate model optimization requests)

Evolution path of Internet of things/edge technologies.

Table B3

Environmental dimension	Current challenges	Breakthrough technology	Expected outcome
Extreme climate	Heavy rain or intense heat exposure	For the pipe optical fibers, a double-layer sheath structure is adopted, and a real-time signal calibration algorithm is employed.	The optical fiber signal attenuation is controlled within 0.2 dB/km, and the failure rate is reduced to below 0.1% per year.
Complex geology	The signal attenuation in the clay zone leads to inaccurate positioning.	Adaptive signal enhancement algorithm (adjusts the sensor's transmission power dynamically according to soil resistivity)	Reduction of positioning error
Fluid diversity	Interference of crude oil impurities on model predictions	Fluid feature alignment module based on transfer learning (real-time correction of density / viscosity to mitigate the impact on the model)	The accuracy rate of cross-fluid type prediction remains above 90%.

Prospects for environmental resilience.

Summary

Keywords

deep learning, safety pre-warning, distributed fiber optic sensing system, leakage monitoring, oil pipeline

Citation

Liang X, Deng Y, Wang Y, Li H, Ma W, Wang K, Ren J, Ma R, Zhang S, Liu J and Wu W (2025) Intelligent leak monitoring of oil pipeline based on distributed temperature and vibration fiber signals. Front. Big Data 8:1667284. doi: 10.3389/fdata.2025.1667284

Received

16 July 2025

Accepted

31 October 2025

Published

20 November 2025

Volume

8 - 2025

Edited by

Jingjing Deng, Durham University, United Kingdom

Reviewed by

Sabina Rossi, Ca' Foscari University of Venice, Italy

Yuxing Duan, Hubei University of Technology, Wuchang University of Technology, China

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Wei Wu, wuwei@nwu.edu.cn

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

