An ensemble data-driven method for fault detection and diagnosis of digital control systems in nuclear power plants

Lei, Baimao; Tian, Bohao; Yao, Yingrong; Jiang, Chenyu; Yang, Jun

doi:10.3389/fnuen.2025.1714098

ORIGINAL RESEARCH article

Front. Nucl. Eng., 08 January 2026

Sec. Nuclear Safety

Volume 4 - 2025 | https://doi.org/10.3389/fnuen.2025.1714098

This article is part of the Research Topic Advancing Nuclear Engineering through Artificial Intelligence and Machine Learning.

Baimao Lei^1,2

Bohao Tian³

Yingrong Yao³

Chenyu Jiang^1,2

Jun Yang³*

¹China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, China
²Guangdong Provincial Key Laboratory of Electronic Information Products Reliability Technology, Guangzhou, China
³School of Electric Power Engineering, South China University of Technology, Guangzhou, China

Fault detection and diagnosis (FDD) is essential for maintaining safety and preventing hazardous situations in industrial process control. Effective fault diagnosis allows anomalies to be detected and corrected in time, preventing potential disruptions and maintaining optimal performance. In this paper, we present a unified framework for fault detection and diagnosis that combines the real-time sensitivity of moving window particle filtering (PF) with the diagnostic precision of the generalized likelihood ratio test (GLRT). Within the framework, particle filtering provides accurate real-time state monitoring and prediction for digital control systems with nonlinear dynamics and non-Gaussian noise. The moving window (MW) identifies anomalous patterns in a data stream by focusing on a fixed-size segment that moves across the data. The GLRT then isolates the specific type of fault that has occurred, based on the observed data and the different fault hypotheses and models. The method is demonstrated on a digital water level control system of a U-shaped tube steam generator in pressurized water reactor nuclear power plants. Comparative studies with a long short-term memory (LSTM) network were also conducted to demonstrate the effectiveness and superiority of the proposed PF-based MW-GLRT method. The results show that the proposed PF-based MW-GLRT framework provides a robust and efficient solution for identifying and characterizing faults in complex digital control systems.

1 Introduction

Industrial process control is the application of control theory, instrumentation, and automation technologies to manage, regulate, and optimize industrial processes. In recent years, the roles and capabilities of process control systems for industrial applications such as Nuclear Power Plants (NPPs) have been further enhanced by the rapid pace of digital technology updates (Lee et al., 2019). Digital upgrades have high potential to enhance operational safety and performance, but they also pose new challenges for digital control systems, including complex nonlinear dynamics, strongly coupled physical-process interactions, and reliance on fault-tolerant mechanisms and software (IAEA, 2009). Anomalies and failures in digital control systems can be devastating for the safe operation of nuclear power plants. For nonlinear digital instrumentation and control systems in NPPs, the residual response under different operating conditions is highly complex and condition-dependent because of uncertainty and variability in system behavior. In addition, the response of digital control systems is not simply proportional to input changes, which makes it difficult to isolate the source of an anomaly and to predict how a fault will propagate through the system. Fault detection and diagnosis (FDD) can act as a vital mechanism for enhancing the safety and efficiency of digital control systems. By identifying and addressing faults early, FDD helps prevent potential failures, minimizes downtime, and optimizes system performance.

Fault detection and diagnosis (Park et al., 2020) is the systematic process of identifying when a system's behavior deviates from its expected or nominal performance (fault detection) and pinpointing the specific causes or locations of those deviations (fault diagnosis). It identifies and locates faults in a system process by integrating information from diverse data streams, including sensor measurements, actuator signals, and mathematical or physics-based models, together with contextual knowledge about normal operating conditions (Dreven et al., 2023). Fault detection aims to determine whether a fault or abnormality has occurred, based on signal acquisition and limit checking. Fault diagnosis goes further by pinpointing the location and size of the fault, and sometimes by estimating the time at which it occurred. Early fault detection and diagnosis systems primarily employed model-based reasoning to identify, locate, and characterize faults in engineering applications (Roozbeh et al., 2009; Atoui and Cohen, 2021). Conventional FDD approaches rely on statistical analysis, signal processing, and pattern recognition to distinguish normal from abnormal behavior using historical or real-time sensor data. Common methods include statistical analysis, Principal Component Analysis (PCA), cluster analysis, and pattern recognition (Maican et al., 2025).

Meanwhile, recent advances in machine learning, artificial intelligence, and other emerging technologies have extended FDD from fault detection and diagnosis to prognosis (Li et al., 2020). For example, Yang et al. (2018) proposed a deductive method integrated with simulation-based fault injection and testing for the diagnostic analysis of digital instrumentation and control systems. Nadesalingam and Towill (1978) carried out a feasibility study of frequency-domain fault detection and diagnosis in hybrid control systems. Wang et al. (2025) presented an interpretability study of a typical graph neural network-based fault diagnosis model for the primary circuit of a nuclear power plant. Simani et al. (2003) introduced model-based fault diagnosis in dynamic systems using identification techniques. Alsaif et al. (2024) discussed multimodal large language model-based methods for fault detection and diagnosis in the context of Industry 4.0. Jonne et al. (2022) developed a fault diagnosis and prognosis system based on digital twins and blockchain. Enhanced fault detection and diagnosis methodologies based on digital twins and machine learning models were also proposed by Abdoune et al. (2022) and Zayed et al. (2023). Li et al. (2022) carried out fault diagnosis of control systems based on neural network data fusion. Xu (2023) presented an intelligent fault detection approach for digital integrated circuits using graph neural networks. Melani et al. (2021) proposed a framework based on moving window principal component analysis and Bayesian networks, in which expert knowledge is translated into Bayesian networks for automated fault detection and diagnosis in dynamic systems. Kopbayev et al. (2022) proposed a hybrid model augmenting deep neural networks with kernel principal component analysis to enhance the accuracy and generality of fault detection and diagnosis in digitalized process systems. Davoodi et al. (2018) discussed integrated fault diagnosis and control design for linear complex systems.
A systematic description of the many facets of fault diagnosis and failure prognosis for complex engineering systems was presented by Karimi (2021). Gao et al. (2015) provided a survey of fault diagnosis and fault-tolerant operation for a variety of engineering systems, in which fault diagnosis techniques and their applications were comprehensively reviewed from model-based and signal-based perspectives. Mouzakitis (2013) classified fault diagnosis methods for control systems into three types: model-based, hardware-based, and history-based. Dai and Gao (2013) presented a full picture of fault detection and diagnosis in complex systems from model-based, signal-based, and knowledge-based perspectives. Further reviews of fault detection and diagnosis techniques can be found in Bid et al. (2021) and Qiu et al. (2023).

Overall, the wide variety of methods and tools used for fault detection and diagnosis can be grouped into model-based, knowledge-based, signal-based, and data-driven approaches (Leite et al., 2025). Model-based FDD methods (Isermann, 2005) use explicit mathematical or physical models of the system to compare predicted behavior with actual data, identifying discrepancies that may indicate faults. They offer strong interpretability and can pinpoint fault causes through residual analysis for well-understood, physics-driven systems. However, their diagnostic results are sensitive to modeling errors and may be affected by unmodeled dynamics or complex nonlinearities. Knowledge-based FDD methods (Xu, 2019) encode domain expertise in rules, ontologies, or reasoners (e.g., if-then rules or model-based reasoning with expert knowledge) to diagnose faults from observed symptoms. The knowledge-based paradigm offers significant advantages when expert knowledge is readily available: it allows rapid deployment and provides a transparent reasoning process that is easy to audit. However, knowledge elicitation can be labor-intensive, especially for digital control systems, where tacit expertise must be formalized into high-fidelity rules or models that stay consistent with the physical dynamics and real-time constraints. Moreover, knowledge-based expert systems tend to be brittle when they encounter novel or rare faults not covered by their rules. Signal-based FDD methods (Gangsar and Tiwari, 2020) extract features and patterns directly from measured signals (time-frequency, spectral, and statistical features) and use simple thresholds or classifiers to detect anomalies. Signal-based methods are data-efficient for clear fault patterns and robust to noisy measurements, but they often rely on ad hoc feature and threshold design.
Data-driven approaches (Taqvi et al., 2021) rely on large datasets to learn predictive or diagnostic models (e.g., machine learning and deep learning models) without explicit physics-based models. Machine learning models are highly flexible and scale well to complex nonlinear digital control systems, and they can also uncover the intricate fault patterns involved in such systems. Today's big data revolution further promotes a data-driven culture for integrated fault detection, diagnosis, and prognosis applications. Data-driven methods use system operational data to gain valuable insights into abnormal situation management through early fault prediction and identification. Therefore, an ensemble data-driven method integrating a particle filter with the Moving Window Generalized Likelihood Ratio Test (MW-GLRT) is proposed for fault detection and diagnosis of digital control systems. Here, a moving window is a window of specified size that moves across the data so that calculations can be performed on successive subsets. The term moving window is usually used interchangeably with sliding window, particularly in data analysis and time series; both describe the technique of repeatedly applying a fixed-size operation to a sequence of data points while shifting the window one step at a time. The particle filter, also known as the sequential Monte Carlo method, uses a set of random samples with associated weights to approximate the posterior probability density function and to perform time-series forecasting in dynamic data-driven simulation (Fearnhead and Kunsch, 2018). The nonlinearities and uncertainties (e.g., non-Gaussian noise) in digital control systems can thus be effectively characterized by the particle-based representation.
Within the proposed framework, particle filtering is augmented with the statistical power of the GLRT and the ability of a moving window to track changes over time. The GLRT determines whether a fault is present by comparing the likelihood of the observed data under different hypotheses (e.g., normal vs. faulty), while the moving window enables the detection of time-varying faults. This work extends the framework established in our previous studies (Lei et al., 2024; Yang et al., 2025). The major contributions of this study are threefold: 1) a unified framework is proposed for fault detection and diagnosis in the digital control systems of nuclear power plants; 2) a fault injection-based model is developed for dynamic data-driven simulation in FDD; and 3) diagnosis and analysis are achieved for multiple fault types, including identification of the type, location, time of occurrence, and size of each fault.

The remainder of the paper is organized as follows. Section 2 introduces a unified framework that integrates particle filtering with the moving window generalized likelihood ratio test for fault detection and diagnosis. Section 3 demonstrates the method with a case example of the digital water level control system of a U-shaped Tube Steam Generator (UTSG) and presents the mathematical modeling and simulation-based FDD analysis. Finally, Section 4 presents the discussion and Section 5 draws the conclusions.

2 A unified framework for fault detection and diagnosis based on moving window generalized likelihood ratio test

As shown in Figure 1, a unified framework for fault detection and diagnosis of digital control systems is proposed based on the MW-GLRT method. The framework consists of three modules: 1) particle filter-based data assimilation for accurate real-time system state estimation; 2) moving window-based fault detection; and 3) generalized likelihood ratio test for fault diagnosis. By combining the real-time sensitivity of the moving window approach with the diagnostic precision of the GLRT, the Particle Filter (PF)-based MW-GLRT framework provides a powerful statistical hypothesis testing technique for fault detection and diagnosis in digital control systems.

Figure 1

Flowchart illustrating a system simulation process for fault diagnosis. It begins with system simulation and progresses through particle filter-based data assimilation, state estimation, data stream processing, and moving window partitioning. A Chi-square goodness of fit test follows. If the goodness of fit exceeds the limit, a likelihood ratio test is conducted, leading to fault model establishment and likelihood ratio computation. Candidate models are analyzed using a Generalized Likelihood Ratio Estimator. The process concludes by comparing the maximum likelihood ratio with a diagnostic threshold to obtain diagnostic results.

Figure 1. PF-based MW-GLRT for fault detection and diagnosis.

2.1 Particle filter-based data assimilation for system state estimation

The particle filter (Elfring et al., 2021) is a nonlinear filtering algorithm that uses Monte Carlo random sampling and Bayesian filtering to approximate the posterior probability density of the system state. The core idea of particle filtering is to represent the Probability Density Function (PDF) of the system state with a set of particles. These particles are propagated through time using the system dynamics and updated based on new measurements. The particle weights are adjusted according to how well each particle matches the observed data, and the particle set is resampled so that low-weight particles are discarded and computational effort is focused on more likely states. In the context of fault detection and diagnosis, particle filtering can be used to monitor the system's behavior and identify deviations from normal operation. By comparing the estimated state with expected values or thresholds, potential faults can be detected and isolated.

As illustrated by Figure 2, the particle filtering process consists of the following steps.

1. Initialization. At the initial time step k = 0, particles are drawn from the initial state distribution p (x₀). The particles represent possible initial states of the system.

2. Prediction. For each particle, the state is propagated forward in time using the system process model. This step simulates how the system might evolve based on its dynamics and any control inputs.

3. Update. When a new measurement is available, the likelihood of each particle is computed based on the measurement model. Particles that are more consistent with the measurement are assigned higher weights.

4. Normalization. The weights of all particles are normalized to sum to one so that they represent a valid probability distribution.

5. Resampling. To avoid degeneracy (where most particles have negligible weights), resampling is required to maintain a diverse set of particles that effectively represent the state distribution. Particles with higher weights are duplicated, while those with low weights are discarded.

6. Estimation. The system state estimate is computed as the weighted average of the particles. Particle filtering thus provides a probabilistic estimate of the system state, which can be further used for anomaly detection.
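The six steps above can be illustrated with a minimal bootstrap particle filter. This is a sketch under stated assumptions, not the authors' implementation: it assumes a scalar state, additive Gaussian process and measurement noise, and systematic resampling at every step; the function name `bootstrap_pf_step` and the toy constant-level model in the usage lines are hypothetical.

```python
import numpy as np

def bootstrap_pf_step(particles, y, f, h, q_std, r_std, rng):
    """One prediction/update/normalization/resampling/estimation cycle.

    Hypothetical sketch: scalar state, additive Gaussian noise with standard
    deviations q_std (process) and r_std (measurement).
    """
    n = particles.size
    # Step 2, prediction: propagate every particle through the process model.
    particles = f(particles) + rng.normal(0.0, q_std, size=n)
    # Step 3, update: Gaussian measurement log-likelihood of each particle.
    log_w = -0.5 * ((y - h(particles)) / r_std) ** 2
    # Step 4, normalization (subtracting the max avoids underflow).
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Step 6, estimation: weighted average of the particles.
    x_hat = float(np.sum(w * particles))
    # Step 5, resampling: systematic resampling duplicates high-weight
    # particles and discards low-weight ones.
    positions = (rng.random() + np.arange(n)) / n
    idx = np.minimum(np.searchsorted(np.cumsum(w), positions), n - 1)
    return particles[idx], x_hat


# Usage: track a constant level of 2.0 observed with noise (toy model).
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=500)  # Step 1, initialization from p(x0)
for _ in range(50):
    y = 2.0 + rng.normal(0.0, 0.1)
    particles, x_hat = bootstrap_pf_step(
        particles, y, f=lambda x: x, h=lambda x: x, q_std=0.05, r_std=0.1, rng=rng
    )
print(f"estimate after 50 steps: {x_hat:.2f}")
```

The estimate is computed from the weighted set before resampling; taking the plain mean of the resampled particles is an equally common choice and gives a slightly noisier value.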

Figure 2

Diagram illustrating a system state-space model and sequential importance sampling process. The left panel shows a system dynamics and control model with mathematical equations and a dynamic simulation diagram. The right panel outlines steps like prediction, update, normalization, resampling, and estimation with arrows and elements indicating shrinkage and degeneration.

Figure 2. Particle filtering for system state estimation.

2.2 Moving window-based fault detection

The moving window, also known as the sliding window, refers to a fixed-size window of recent data points for a specific process variable that slides over the incoming data stream (Zhang et al., 2017). The window captures a subset of consecutive time steps, allowing local variations and temporal patterns to be analyzed. Within each moving window, changes in the values of the process variable are monitored to identify unusual patterns or deviations from the expected behavior of the time series. An anomaly is detected when the deviation between the observational data and the predicted values exceeds the threshold $T_D$ preset for the hypothesis test (e.g., the chi-square goodness-of-fit test). The threshold is chosen through a combination of analytical considerations and empirical tuning: its determination is guided by the desired false-alarm rate and by the statistical distribution of the moving-window PF residuals under nominal operation.

Let $y_{1:t} = (y_1, y_2, \dots, y_t)^T \in \mathbb{R}^{t}$ be the column vector containing the time-series observational data points $y_t \in \mathbb{R}$. The Parzen window estimator of $p(y_t \mid y_{1:t-1})$ can then be written as Equation 1. The Parzen window is a non-parametric method that estimates the PDF of a sample from a given set of observations without making any assumption about its distribution: it places a window (or kernel) around each data point and sums the contributions of all windows.

$$p(y_t \mid y_{1:t-1}) \approx \frac{1}{N} \sum_{i=1}^{N} h(y_t \mid x_t^{i}) \tag{1}$$

where $p(y_t \mid y_{1:t-1})$ is the estimated predictive PDF of the measurement $y_t$ at time $t$, given all measurements up to $t-1$; $N$ is the sample size (number of particles); $x_t^{i}$ is the $i$th sample (particle) of the system state vector at time $t$; and $h(\cdot)$ is the kernel function, which is usually chosen as a symmetric function such as the Gaussian kernel $\phi$ in Equation 2:

$$\phi(z) = \frac{1}{\sqrt{2\pi\sigma_{\phi}^{2}}} \exp\left(-\frac{z^{2}}{2\sigma_{\phi}^{2}}\right) \tag{2}$$
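Equations 1 and 2 can be read as a kernel average over the particle swarm. The sketch below assumes, as an interpretation rather than a statement from the text, that the kernel is centred on each particle's predicted measurement, i.e. $h(y_t \mid x_t^i) = \phi(y_t - \hat{y}_t^i)$; the function names and the example values are hypothetical.

```python
import math
import numpy as np

def gaussian_kernel(z, sigma_phi):
    """Gaussian kernel phi(z) of Equation 2."""
    z = np.asarray(z, dtype=float)
    return np.exp(-z**2 / (2.0 * sigma_phi**2)) / math.sqrt(2.0 * math.pi * sigma_phi**2)

def predictive_density(y_t, y_pred, sigma_phi):
    """Parzen estimate of p(y_t | y_{1:t-1}) from N particle predictions (Eq. 1).

    Assumption: y_pred holds the measurement predicted by each particle, and the
    kernel h(y_t | x_t^i) is phi(y_t - y_pred[i]).
    """
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(gaussian_kernel(y_t - y_pred, sigma_phi)))

# Usage with three hypothetical particle predictions of a water level (m).
print(predictive_density(0.52, [0.50, 0.51, 0.55], sigma_phi=0.02))
```

The kernel width $\sigma_\phi$ plays the role of the Parzen bandwidth: a value that is too small makes the estimate spiky around the particles, while a value that is too large smooths away the difference between good and poor predictions.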

The degree of conformity ( $ζ_{t}$ ) between the observed data and predicted values is estimated by the following Equation 3.

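$$\zeta_t = F(y_t \mid y_{1:t-1}) \approx \frac{1}{N} \sum_{i=1}^{N} \int_{-\infty}^{y_t} \phi(l - y_t^{i})\, dl \tag{3}$$

where $y_t^{i}$ denotes the measurement predicted by the $i$th particle and $F$ is the predictive cumulative distribution function, so that $\zeta_t$ is the probability integral transform of the new observation. If the model describes the data correctly, $\zeta_t$ is uniformly distributed on $[0, 1]$. The values of $\zeta_t$ within the window are therefore evaluated with a chi-square ($\chi^2$) goodness-of-fit test, which shows how well the values expected from the model fit the observed values. The null hypothesis $H_0$ of the chi-square test is that $\zeta_t$ follows a uniform distribution.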
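Because the Gaussian kernel integrates to a normal CDF, each term of Equation 3, taken up to the new observation $y_t$, has the closed form $\int_{-\infty}^{y_t}\phi(l - y_t^i)\,dl = \tfrac{1}{2}\left[1 + \operatorname{erf}\left(\frac{y_t - y_t^i}{\sqrt{2}\,\sigma_\phi}\right)\right]$. The sketch below evaluates $\zeta_t$ this way with `math.erf`; reading $y_t^i$ as the measurement predicted by particle $i$ is an assumption, and the function name is hypothetical.

```python
import math
import numpy as np

def pit_value(y_t, y_pred, sigma_phi):
    """zeta_t of Equation 3: the predictive CDF evaluated at the observation y_t."""
    z = (y_t - np.asarray(y_pred, dtype=float)) / (math.sqrt(2.0) * sigma_phi)
    # Gaussian CDF of each particle's kernel, averaged over the N particles.
    cdf = np.array([0.5 * (1.0 + math.erf(v)) for v in z])
    return float(cdf.mean())

# An observation in the middle of the predictions gives zeta near 0.5;
# a persistent bias pushes zeta toward 0 or 1, which the chi-square test detects.
print(pit_value(0.50, [0.48, 0.50, 0.52], sigma_phi=0.02))
```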

For the chi-square goodness-of-fit test, the probability interval [0, 1] is first divided into $J$ equally spaced bins. Let $c_j$ be the observed count of $\zeta_t$ values that fall into the $j$th bin, and let $p_j$ be the probability of a data point falling into bin $j$; the expected count for bin $j$ is then $N p_j$. The observed counts must sum to the number $N$ of data points in the window, i.e., $\sum_{j=1}^{J} c_j = N$. When $\zeta_t$ is uniformly distributed, $p_j = 1/J$, so the expected count is the same value $N/J$ for every bin. The test has $J-1$ degrees of freedom (df).

The chi-square ( $χ^{2}$ ) test statistic can be calculated by the following Equation 4.

$$\chi^{2} = \sum \frac{(\text{Observed} - \text{Expected})^{2}}{\text{Expected}} = \sum_{j=1}^{J} \frac{(c_j - N/J)^{2}}{N/J} \tag{4}$$

where Observed and Expected denote the observed and expected counts in each bin. The statistic is compared with the chi-square critical value $\chi^{2}_{\alpha,df}$ at significance level $\alpha$. An anomaly occurs when the criterion in Equation 5 is satisfied.

$$\begin{cases} \chi^{2} \le T_D \;\Rightarrow\; \omega_0 \ (\text{normal operation}) \\ \chi^{2} > T_D \;\Rightarrow\; \omega_m \ (\text{anomaly, faulty mode}) \end{cases} \tag{5}$$

where $T_D$ is the threshold designed for fault detection.
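Equations 4 and 5 can be sketched together: a function for the chi-square statistic, with a worked example of its arithmetic, and a loop that slides a fixed-size window over a stream of $\zeta_t$ values and applies the threshold test. This is a hedged illustration: the function names, window width, bin count $J$, and threshold are hypothetical choices, and the $\zeta_t$ stream is synthetic rather than produced by the UTSG model.

```python
import numpy as np

def chi_square_uniformity(zeta, n_bins):
    """Chi-square statistic of Equation 4 for zeta values on [0, 1]."""
    zeta = np.asarray(zeta, dtype=float)
    # Observed counts c_j in J equally spaced bins on [0, 1].
    counts, _ = np.histogram(zeta, bins=n_bins, range=(0.0, 1.0))
    expected = zeta.size / n_bins  # N / J under the uniform null hypothesis H0
    return float(np.sum((counts - expected) ** 2 / expected))

def moving_window_detection(zeta_stream, window, n_bins, t_d):
    """Slide a fixed-size window over the stream and apply Equation 5.

    Returns one (chi2, alarm) pair per window position; alarm is True when
    chi2 > T_D (anomaly) and False when chi2 <= T_D (normal mode omega_0).
    """
    zeta_stream = np.asarray(zeta_stream, dtype=float)
    results = []
    for end in range(window, zeta_stream.size + 1):
        chi2 = chi_square_uniformity(zeta_stream[end - window:end], n_bins)
        results.append((chi2, chi2 > t_d))
    return results

# Worked example of Equation 4: N = 100 values, J = 10 bins, N / J = 10.
print(chi_square_uniformity((np.arange(100) + 0.5) / 100, 10))  # 0.0
print(chi_square_uniformity(np.full(100, 0.05), 10))  # 90**2/10 + 9 * 10**2/10 = 900.0

# T_D: e.g. scipy.stats.chi2.ppf(1 - alpha, n_bins - 1), about 21.67 for
# alpha = 0.01 and 9 degrees of freedom (hypothetical choice of alpha and J).
nominal = np.tile((np.arange(10) + 0.5) / 10, 10)  # 100 well-spread zeta values
faulty = np.full(50, 0.97)                         # biased zeta values after a fault
results = moving_window_detection(np.concatenate([nominal, faulty]),
                                  window=50, n_bins=10, t_d=21.67)
first_alarm = next(i for i, (_, alarm) in enumerate(results) if alarm)
print(f"first alarm at window start index {first_alarm}")
```

A wider window gives the chi-square test more samples per bin (the usual rule of thumb asks for roughly five or more expected counts per bin) but delays the alarm, so the window width and $J$ trade detection delay against false alarms.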

2.3 Generalized likelihood ratio test for fault diagnosis

Once an anomaly is detected, the GLRT is applied to the data in the current window to pinpoint the specific type of fault and estimate its characteristics (e.g., magnitude and onset time) by discriminating among the potential fault hypotheses. The GLRT (Harkat et al., 2019) is a statistical test that compares the likelihood of the observed data under different hypotheses (e.g., normal vs. faulty conditions). In statistics, the likelihood ratio test is commonly used to assess the goodness of fit of two competing models. The likelihood ratio is computed and compared for all hypothesized faults to locate and identify potential faults.

Suppose that the state-space model of the nonlinear dynamical system in normal operating mode ($\omega_0$) is given by Equation 6.

$$\begin{cases} x_t = f_0(x_{t-1}, \theta^{0}) + \mu_{t-1} \\ y_t = h_0(x_t, \theta^{0}) + \nu_t \end{cases} \tag{6}$$

where $x_t$ is the system state vector at time $t$; $f_0(\cdot)$ is the nonlinear function describing the evolution of the system state over time; $\mu_{t-1}$ is the process (state) noise; $y_t$ is the measurement at time $t$; $\nu_t$ is the measurement noise; $h_0(\cdot)$ is the measurement function; and $\theta^{0}$ denotes the model parameters in normal operating mode.

For the faulty modes $m = 1, 2, \dots, M$, each tracked by its own particle filter, the state-space models are defined in Equation 7.

$$\begin{cases} x_t^{m} = f_m(x_{t-1}^{m}, \theta^{m}) + \mu_{t-1} \\ y_t^{m} = h_0(x_t^{m}, \theta^{m}) + \nu_t \end{cases} \tag{7}$$

Let $\omega_m$ denote the hypothesis for faulty mode $m$, together with the particle-filter likelihood estimator defined specifically for that mode. The likelihood $L_k^{m}$ under $\omega_m$ at time step $k$ is given by Equation 8.

$$L_k^{m} = \int p(y_k^{m} \mid \omega_m, x_k^{m})\, p(x_k^{m} \mid \omega_m, y_{k-1}^{m})\, dx_k^{m} \approx \frac{1}{N} \sum_{i=1}^{N} p(y_k^{m} \mid \omega_m, x_k^{m,i}) \tag{8}$$

where $x_k^{m,i}$ ($i = 1, 2, \dots, N$) are the samples (particles) drawn from the prior distribution $p(x_k^{m} \mid \omega_m, y_{k-1}^{m})$. The likelihood under normal operation at time step $k$ can be obtained in the same way, as defined in Equation 9.

$$L_k^{0} = \frac{1}{N} \sum_{i=1}^{N} p(y_k^{m} \mid \omega_0, x_k^{0,i}) \tag{9}$$
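Equations 8 and 9 are Monte Carlo averages of the measurement likelihood over each mode's particle swarm. Because the log-likelihood ratio that follows needs their logarithms, a numerically safer way to evaluate them is the log-sum-exp trick. The sketch assumes a Gaussian measurement likelihood with standard deviation `r_std` and takes each particle's predicted measurement as input; both are illustrative assumptions, and the function name is hypothetical.

```python
import math
import numpy as np

def log_mode_likelihood(y_k, y_pred, r_std):
    """ln L_k for one mode: ln[(1/N) * sum_i p(y_k | omega, x_k^i)] (Eqs. 8-9).

    y_pred holds the measurement predicted by each of the N particles of the
    mode-specific filter; p is assumed Gaussian with standard deviation r_std.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    log_p = (-0.5 * ((y_k - y_pred) / r_std) ** 2
             - math.log(math.sqrt(2.0 * math.pi) * r_std))
    # log-sum-exp: factor out the largest term so exp() cannot underflow to 0.
    m = log_p.max()
    return float(m + math.log(np.exp(log_p - m).sum()) - math.log(y_pred.size))

# A far-off observation would underflow a direct average but stays finite here.
print(log_mode_likelihood(10.0, [0.0, 0.1, 0.2], r_std=0.1))
```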

The log-likelihood ratio $LLR_t^{m}$ of faulty mode $\omega_m$ to normal operating mode $\omega_0$ over the window ending at time $t$ is calculated by Equation 10.

$$LLR_t^{m} = \sum_{k=t-\tau+1}^{t} \ln \frac{L_k^{m}}{L_k^{0}} = \sum_{k=t-\tau+1}^{t} \left\{ \ln \sum_{i=1}^{N} p(y_k^{m} \mid \omega_m, x_k^{m,i}) - \ln \sum_{i=1}^{N} p(y_k^{m} \mid \omega_0, x_k^{0,i}) \right\} \tag{10}$$

where $\tau$ is the width of the detection window; the common factor $1/N$ of Equations 8 and 9 cancels in the ratio. A diagnosis function is constructed as in Equation 11.

$$R_t = \max_{1 \le m \le M} LLR_t^{m} \tag{11}$$

A fault is declared when the diagnosis function satisfies $R_t > \lambda$, where $\lambda$ is the diagnostic threshold. The diagnostic threshold $\lambda$ is set to balance sensitivity to true faults against robustness to spurious signals. The fault occurrence time is estimated by Equation 12.

$$T_{\text{fault}} = \min\{\, t \mid R_t > \lambda \,\} \tag{12}$$

The type of fault ($\omega_{\varsigma}$) can finally be identified by maximum likelihood estimation, as defined in Equation 13.

$$\varsigma = \arg\max_{1 \le m \le M} LLR_{T_{\text{fault}}}^{m} \tag{13}$$
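Equations 10 to 13 can be combined into a single pass over per-step log-likelihoods. The sketch below assumes that $\ln L_k^0$ and $\ln L_k^m$ have already been computed for every mode and stacked into one array, and that the fault type is the mode with the largest windowed LLR at the first threshold crossing; the array layout, function name, window width, and threshold are illustrative assumptions.

```python
import numpy as np

def glrt_diagnose(log_lik, window, lam):
    """Apply Equations 10-13 to per-step log-likelihoods.

    log_lik has shape (T, M + 1): column 0 holds ln L_k^0 (normal mode) and
    columns 1..M hold ln L_k^m for the M fault hypotheses.
    Returns (T_fault, fault_mode) for the first t with R_t > lam, else (None, None).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    step_llr = log_lik[:, 1:] - log_lik[:, [0]]             # ln(L_k^m / L_k^0)
    for t in range(window - 1, step_llr.shape[0]):
        llr_t = step_llr[t - window + 1:t + 1].sum(axis=0)  # Eq. 10, width tau
        r_t = llr_t.max()                                   # Eq. 11
        if r_t > lam:                                       # Eq. 12: first crossing
            return t, int(np.argmax(llr_t)) + 1             # Eq. 13, modes 1..M
    return None, None

# Synthetic example: 3 fault modes; from k = 10 on, mode 2 fits better by 1 nat/step.
log_lik = np.zeros((20, 4))
log_lik[10:, 2] = 1.0
print(glrt_diagnose(log_lik, window=5, lam=3.0))  # (13, 2)
```

In the synthetic example the windowed LLR of mode 2 grows by one per step after $k = 10$, so it first exceeds $\lambda = 3$ at $t = 13$; this illustrates how the estimated fault time lags the true onset by roughly $\lambda$ divided by the per-step evidence.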

3 Case example: fault detection and diagnosis of digital water level control systems in NPPs

3.1 System description and modeling

In this section, a digital UTSG water level control system in Pressurized Water Reactor Nuclear Power Plants (PWR-NPPs) is taken as an example to demonstrate the fault detection and diagnosis analysis. The digital water level control system is a crucial element of the steam generators in nuclear power plants. The primary objective of the digital feedwater control system is to maintain the steam generator water level at the desired setpoint during steady-state and transient operation. The major components of the digital water level control system in the UTSG are: i) sensors {a feedwater flow sensor, a steam flow sensor, a water level sensor}; ii) regulating valves {feedwater regulating valve, steam flow control valve}; iii) Proportional-Integral (PI) controllers {feed regulating valve controller, steam flow controller}; and iv) microprocessors {main CPU, backup CPU}. During normal reactor operation, the water level in the steam generator is controlled by a feedwater controller that receives inputs from the steam generator water level, steam mass flow rate, and feedwater mass flow rate transmitters, as shown in Figure 3. In three-element control mode, the level error derived from the sensed SG water level is corrected by the magnitude of any mismatch between steam flow and feedwater flow. The PI controllers and microprocessors work together to keep the water level within the normal range: the PI controller combines proportional and integral actions to provide a stable and accurate control response, while the microprocessors handle the logic and processing for the control system. The primary coolant, typically water under high pressure, absorbs heat from the reactor core. The heated primary coolant then flows through the U-tubes of the steam generator, transferring its heat to the secondary water, which boils and turns into steam to drive the turbines and generate electricity.
The entire process involves heat and mass transfer in the steam generator. By decomposing the heat and mass transfer processes in the steam generator, the system simulation model can be developed with the lumped-parameter method. As illustrated in Figure 4, the heat and mass transfer processes in the UTSG are described by the following regions.

1. Primary coolant circuit: where the flowing primary coolant is contained within the U-tubes. It comprises distinct rising and descending flow paths according to the coolant circulation direction.

2. Tube wall interface: the heat transfer surface separating the primary and secondary circuits. It consists of the tube walls of the rising and descending sections, corresponding to the primary flow paths.

3. Core heat transfer zone: Within this area, thermal energy moves from the primary coolant inside the U-tube walls to the surrounding secondary water.

4. Secondary water circuit: where secondary water circulates externally around the U-tubes throughout the steam generator body.

5. Annular mixing chamber: where saturated recirculated water and subcooled feedwater blend before entering the core heat transfer zone. The mixture is treated as incompressible during this process.

6. Steam-water separation region: where moisture is removed from the saturated steam by dryers.

Figure 3

Diagram of a steam generator system showing various components and flow paths. It includes a feedwater controller, feedwater valve, and sensors for steam flow, feedwater flow, and water level. The steam flows through a U-shaped tube, with zones labeled for primary coolant, tube wall interface, core heat transfer, secondary water circuit, annular mixing chamber, and steam-water separation region. The system features entrance and exit chambers for the steam generator.

Figure 4

Diagram illustrating a water level control system with various components, including a filter, water level controller, feed regulating valve controller, and feedwater valve. Below is a detailed schematic of a reactor coolant system. Sections labeled one to six represent different regions such as the primary coolant rising region, core heat transfer region, and steam-water separation region. Arrows indicate flow directions of water and signals between components, with symbols representing variables and equations for system control.

Figure 4. System dynamics control model in UTSG.

The control laws used for three-element water level control in UTSG are also modelled with transfer functions, as defined by the block diagrams in Figure 4. The physical meaning and initialization values of variables involved in lumped-parameter mathematical models are explained in Table 1.
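To make the three-element idea concrete, the sketch below implements a discrete PI law whose error combines the level deviation with the steam/feedwater flow mismatch described in Section 3.1. It is a hypothetical simplification, not the transfer-function model of Figure 4: the class name, the gains `kp`, `ki`, `k_flow`, the rectangular integration rule, and the sign conventions are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class ThreeElementPI:
    """Hypothetical discrete three-element level controller (not Figure 4's model)."""
    kp: float        # proportional gain
    ki: float        # integral gain
    k_flow: float    # weight on the steam/feedwater flow mismatch
    dt: float        # sampling period (s)
    integral: float = 0.0

    def step(self, level_setpoint, level, steam_flow, feed_flow):
        # Level error corrected by the steam-feed mismatch (three-element control).
        error = (level_setpoint - level) + self.k_flow * (steam_flow - feed_flow)
        self.integral += error * self.dt  # integral action (rectangular rule)
        # Feedwater valve demand: proportional plus integral action.
        return self.kp * error + self.ki * self.integral

ctrl = ThreeElementPI(kp=2.0, ki=0.5, k_flow=0.1, dt=1.0)
# Level 0.2 m below setpoint and steam flow 10 kg/s above feed flow.
demand = ctrl.step(level_setpoint=1.0, level=0.8, steam_flow=500.0, feed_flow=490.0)
print(round(demand, 6))  # 3.0
```

Feeding the flow mismatch into the error lets the controller react to a steam demand change before the level itself has moved, which is the usual motivation for three-element rather than single-element level control.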

Table 1

Table 1. Variables defined in mathematical modeling of system dynamics in UTSG.

Specifically, the system simulation model is developed in the MATLAB/Simulink environment based on a nine-variable mathematical model of the UTSG (Guimaraes et al., 2008), in which the nine major process variables involved in UTSG water level control are stored in the state vector $x = [T_{p1}, T_{p2}, T_{m1}, T_{m2}, P_s, X_e, L_{dw}, T_{dw}, T_d]^T$. Here, $T_{p1}$ and $T_{p2}$ are the average temperatures of the primary coolant in the rising and descending sections, respectively; $T_{m1}$ and $T_{m2}$ are the average temperatures of the metal tubes in the rising and descending sections, respectively; $P_s$ is the saturation pressure of the saturated steam; $X_e$ is the dryness fraction of the saturated steam; $L_{dw}$ is the water level in the steam generator; and $T_{dw}$ and $T_d$ are the average temperature of the recirculated water and the average temperature of the annular inlet flow, respectively. The mathematical equations and associated variables used to develop the UTSG process dynamics and control models are given in Appendix A of one of our previous studies (Jiang et al., 2023). To facilitate model validation and the extraction of fault characteristics, an in-depth investigation into the failure modes and mechanisms of the system components was also conducted. For instance, sensor failure modes include permanent failure, unstable signal output, fixed deviation, signal drift, and transient failure. Permanent failure is typically caused by structural damage or aging effects. Fixed deviation can result from bias current or voltage. Unstable signal output may result from wear and increased noise, or from changes in the measurement environment such as temperature and pressure. Signal drift is often caused by structural damage or aging effects. Transient failures are usually due to failures of the electronic equipment associated with the sensor, such as excessive current or voltage, or to interference such as lightning or radio interference.
Common failure modes for regulating valves include sticking or binding of the valve stem or actuator, which can prevent the valve from moving as intended and result in improper flow control. Additionally, leakage can occur in various forms, such as seat leakage, stem leakage, or body leakage, all of which can compromise the efficiency and safety of the process. Other notable failure modes are cavitation, erosion and corrosion due to abrasive or corrosive fluids, clogging from particles or debris, actuator failure due to electrical or mechanical issues, and inadequate response time to changes in the control signal. Control failures of PI controllers, including feed regulating valve controllers and steam flow controllers, can stem from a variety of factors. These include electronic equipment failure, loss of input signals from microprocessors, loss of output signals, or power failures. Abnormal output values, which may be excessively high or low, are often attributed to aging effects or structural damage. Additionally, microprocessors may lose all sensor signals due to electronic equipment failure. Data processing failures can occur when both the main and backup CPUs fail. Communication failures may result from the loss of output signals.

In this study, four component faults are considered: i) steam flow sensor offset with a constant gain of 0.4 kg/s (the drift fault); ii) failure of both microprocessors (main and backup CPUs); iii) abnormal output of the feed regulating valve controller; and iv) leakage of the feedwater regulating valve with a discrepancy of 30 kg/s. The fault modes, causes, and models of these digital components are listed in Table 2.

Table 2

Table 2. Fault modes, causes and models of system components being considered in the case study.

For each component failure mode, a fault model can be constructed from the underlying failure mechanism identified in the system analysis. By linking failure modes to their underlying mechanisms, the fault models characterize how a component behaves when it fails. For example, the observation model of a sensor can be written as observation value = true value + observation noise. When the sensor undergoes a drift fault, the observation model becomes observation value = true value + observation noise + a·(t − t_f), where a is a constant drift coefficient and t_f is the fault activation time (t_f = 50 s in Figure 5). Figure 5 shows the pseudo-code of the drift fault model of the steam flow sensor. The fault models of the other components can be constructed in a similar way. With these fault models, fault injection testing can be performed to evaluate the robustness and reliability of the system by deliberately introducing faults and observing the system's behavior. The models define the types, locations, and activation times of faults, allowing systematic testing of various failure scenarios.

Figure 5

Diagram describing the drift fault of a steam flow sensor. The fault function $ y = fcn(u,t) $ is explained with variables: $ u $ (input variable, true value), $ t $ (input time), $ y $ (output variable, faulty sensor observations), and $ a $ (constant). If $ t < 50 $, then $ y = u + randn(noise) $. Else if $ t \geq 50 $, then $ y = u + randn(noise) + a \times (t - 50) $.

Figure 5. Pseudo-code of the drift fault model of steam flow sensor.
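The Figure 5 pseudo-code can be made runnable as below. This is a minimal Python sketch, not the authors' Simulink block: the function name `drift_sensor`, the Gaussian noise standard deviation `noise_std`, and the optional `rng` argument are illustrative assumptions, while the drift term a·(t − 50) and the 50 s activation time follow Figure 5.

```python
import random
from typing import Optional


def drift_sensor(u: float, t: float, a: float, noise_std: float,
                 t_fault: float = 50.0,
                 rng: Optional[random.Random] = None) -> float:
    """Faulty sensor observation following the Figure 5 pseudo-code.

    u         -- true value of the measured variable
    t         -- current time (s)
    a         -- constant drift coefficient
    noise_std -- standard deviation of the Gaussian observation noise (assumed)
    t_fault   -- fault activation time (50 s in Figure 5)
    """
    if rng is None:
        rng = random.Random()
    # Healthy observation model: true value + observation noise
    y = u + rng.gauss(0.0, noise_std)
    if t >= t_fault:
        # Drift term grows linearly with the time elapsed since fault activation
        y += a * (t - t_fault)
    return y


# Example: noise-free trace sampled every 0.1 s with drift coefficient a = 0.4
dt = 0.1
trace = [drift_sensor(100.0, k * dt, a=0.4, noise_std=0.0) for k in range(1001)]
```

Other fault models (a fixed bias, a leakage discrepancy) can be written the same way by changing the term that is added after activation.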

3.2 Simulation-based fault detection and diagnosis analysis

Based on the system simulation model, the dynamic behavior of the system under normal and faulty conditions can be simulated. The simulation time step is set to $Δt = 0.1$ s. The system is assumed to be initially at steady state, and the faults are injected at t = 50 s. The covariances of the process noise and the measurement noise for the state vector x = [ T_p1, T_p2, T_m1, T_m2, P_s, X_e, L_dw, T_dw, T_d]^T are assumed to be Diag (0.001, 0.001, 0.001, 0.001, 0.0001, 0.00001, 0.0001, 0.001, 0.001) and Diag (0.0001, 0.0001, 0.0001, 0.0001, 0.00001, 0.000001, 0.00001, 0.0001, 0.0001), respectively. In the simulation trials, the number of particles used for particle filter tracking is N_s = 200, and the moving window length for fault detection is set to l = 5. The fault detection results for the four component faults are shown in Figure 6.
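To illustrate the particle filter tracking step, the following is a minimal bootstrap particle filter in Python for a hypothetical scalar, linear-Gaussian stand-in for a single process variable; it is not the nine-variable UTSG model. The coefficients `phi`, `q`, `r` and the normalized-innovation output are assumptions made for this example; only the particle count N_s = 200 comes from the text. The normalized innovations it returns are the kind of residual a moving-window detector can consume.

```python
import math
import random
from typing import List, Sequence, Tuple


def bootstrap_pf(ys: Sequence[float], phi: float, q: float, r: float,
                 n_particles: int = 200, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Bootstrap particle filter for the hypothetical scalar model
        x_k = phi * x_{k-1} + w_k,  w_k ~ N(0, q)
        y_k = x_k + v_k,            v_k ~ N(0, r)
    Returns the filtered state means and the normalized innovations."""
    rng = random.Random(seed)
    # Initialize particles around the first measurement
    particles = [ys[0] + rng.gauss(0.0, math.sqrt(r)) for _ in range(n_particles)]
    means, innovations = [], []
    for y in ys:
        # 1) Predict: propagate each particle through the process model
        particles = [phi * p + rng.gauss(0.0, math.sqrt(q)) for p in particles]
        # 2) Innovation: measurement minus predicted measurement, scaled by its std. dev.
        pred = sum(particles) / n_particles
        s2 = sum((p - pred) ** 2 for p in particles) / n_particles + r
        innovations.append((y - pred) / math.sqrt(s2))
        # 3) Update: weight particles by the Gaussian measurement likelihood
        w = [math.exp(-0.5 * (y - p) ** 2 / r) for p in particles]
        total = sum(w)
        if total == 0.0:  # all weights underflowed; fall back to uniform weights
            w, total = [1.0] * n_particles, float(n_particles)
        w = [wi / total for wi in w]
        means.append(sum(wi * p for wi, p in zip(w, particles)))
        # 4) Resample (multinomial) to counter weight degeneracy
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, innovations
```

Multinomial resampling at every step keeps the sketch short; practical filters often resample only when the effective sample size drops.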

Fault detection can be achieved quickly using the chi-square ( $χ^{2}$ ) statistic, a numerical measure that quantifies the difference between the observed behavior of a system and its expected (normal) behavior. As shown in Figure 6a, the effect of the fault is not evident immediately after its occurrence, because drift is a slow, systematic change in a signal over time. A similar delay is observed for the abnormal control behavior of the feed regulating valve controller, which becomes evident only at about 67 s. In contrast, the failure of the microprocessors and the leakage of the feedwater regulating valve cause a large disturbance to the operation of the steam generator within a very short time.
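A moving-window $χ^{2}$ test of this kind can be sketched as follows. The sketch assumes the PF innovations have already been whitened (zero mean, identity covariance when the system is healthy), so that the windowed sum of their squared norms is approximately $χ^{2}$-distributed with l·m degrees of freedom (45 for l = 5 if all nine state variables are observed). The significance level `alpha` and the Wilson-Hilferty approximation of the quantile (used in place of an exact routine such as SciPy's `chi2.ppf`) are assumptions of this sketch.

```python
import math
from collections import deque
from statistics import NormalDist
from typing import Deque, Iterable, List, Sequence


def chi2_quantile(p: float, dof: int) -> float:
    """Wilson-Hilferty approximation to the p-quantile of a chi-square distribution."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def mw_chi2_alarms(innovations: Iterable[Sequence[float]], window: int = 5,
                   alpha: float = 0.01) -> List[bool]:
    """Raise an alarm when the sum of squared whitened innovations over the last
    `window` steps exceeds the (1 - alpha) chi-square quantile."""
    buf: Deque[float] = deque(maxlen=window)
    alarms: List[bool] = []
    for e in innovations:
        buf.append(sum(v * v for v in e))       # squared norm of one whitened innovation
        if len(buf) < window:
            alarms.append(False)                 # window not yet full
            continue
        threshold = chi2_quantile(1.0 - alpha, window * len(e))
        alarms.append(sum(buf) > threshold)
    return alarms


# Example: small one-dimensional innovations, then a jump at step 20
stream = [[0.1]] * 20 + [[5.0]] * 10
alarms = mw_chi2_alarms(stream)
```

A longer window averages out noise but delays the response to slow faults such as drift, which is consistent with the delayed detection seen for the drift case.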

Figure 6

Four charts show system faults over time, each with a detection threshold line. Chart (a) depicts fault #1, steam flow sensor drift, with a rapid increase after 60 seconds. Chart (b) shows fault #2, failure of both microprocessors, fluctuating above the threshold after 50 seconds. Chart (c) presents fault #3, abnormal output of feed regulating valve controller, escalating steeply after 70 seconds. Chart (d) illustrates fault #4, leakage of feedwater regulating valve, with values exceeding the threshold after 50 seconds.

Figure 6. Moving window-based fault detection. (a) Fault #1: steam flow sensor drift. (b) Fault #2: failure of both microprocessors. (c) Fault #3: abnormal output of feed regulating valve controller. (d) Fault #4: leakage of feedwater regulating valve.

Once an anomaly is detected, the next step is fault diagnosis, which identifies the type, location, time of occurrence, and magnitude of the fault. Here, eight candidate faults are hypothesized for the simulated system: i) ω₁: failure of both microprocessors; ii) ω₂: fixed deviation in steam flow sensing; iii) ω₃: steam flow sensor drift; iv) ω₄: fixed deviation in water level sensing; v) ω₅: abnormal output value of the feed regulating valve controller; vi) ω₆: abrupt opening of the feedwater regulating valve; vii) ω₇: abrupt opening of the steam flow control valve; viii) ω₈: leakage of the feedwater regulating valve. The diagnosis results obtained using the PF-based likelihood ratio test are displayed in Figure 7.

Figure 7

Four graphs depict the Log-Likelihood Ratio (LLR) over time for different faults in a system. Graph a shows steam flow sensor drift with LLR1 rising sharply. Graph b illustrates failure of both microprocessors with a notable increase in LLR1 and LLR5. Graph c depicts abnormal output of the feed regulating valve controller with LLR5 peaking. Graph d shows leakage of the feedwater regulating valve with LLR7 increasing significantly. Each graph has distinct LLR lines marked with different colors and symbols representing various components.

Figure 7. Likelihood ratio test for fault diagnosis. (a) Fault #1: steam flow sensor drift. (b) Fault #2: failure of both microprocessors. (c) Fault #3: abnormal output of feed regulating valve controller. (d) Fault #4: leakage of feedwater regulating valve.

As revealed in Figure 7, the fault type can be identified with the multiple-model maximum likelihood estimator: the fault mode with the highest likelihood ratio is identified as the most probable fault. Accurate diagnoses are obtained for all four test cases, even though different fault modes can produce similar effects. For example, the similar effects of the failure of both microprocessors and the abnormal output of the feed regulating valve controller in Figure 7b can only be distinguished after a longer period of system evolution.
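The multiple-model decision rule can be sketched as below. As a simplifying assumption, each hypothesis ω_j is represented by a hypothetical function giving the expected residual at step k, and the residuals are Gaussian with a known variance; in the paper, each hypothesis instead has its own PF-based fault model. The score is the cumulative log-likelihood ratio of each hypothesis against the no-fault model, and the hypothesis with the largest score is reported.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical fault signature: expected residual at time step k under a hypothesis
Signature = Callable[[int], float]


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def cumulative_llr(residuals: Sequence[float],
                   hypotheses: Dict[str, Signature],
                   var: float) -> Dict[str, float]:
    """Log-likelihood ratio of each fault hypothesis versus the no-fault model (mean 0)."""
    scores = {}
    for name, signature in hypotheses.items():
        scores[name] = sum(
            gaussian_logpdf(r, signature(k), var) - gaussian_logpdf(r, 0.0, var)
            for k, r in enumerate(residuals)
        )
    return scores


def most_probable_fault(scores: Dict[str, float]) -> str:
    # Maximum-likelihood decision: pick the hypothesis with the highest ratio
    return max(scores, key=scores.get)


# Example: residuals generated by a drift, scored against a bias and a drift hypothesis
hyps = {"bias": lambda k: 1.0, "drift": lambda k: 0.04 * k}
res = [hyps["drift"](k) for k in range(50)]
scores = cumulative_llr(res, hyps, var=0.01)
```

Because the ratio is accumulated over time, two hypotheses whose signatures agree early on separate only as the residual history grows, which mirrors the longer evolution needed in Figure 7b.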

To demonstrate the advantages of the PF-based MW-GLRT method, the data obtained from the system simulations are also used to train an LSTM network for data-driven fault detection and diagnosis. Long Short-Term Memory (LSTM) is a recurrent neural network architecture widely used in deep learning for processing and making predictions on sequential data (Hochreiter and Schmidhuber, 1997). For LSTM training, the collected datasets are divided into a training set, a validation set, and a test set in the ratio 7.6:1.7:0.7 (i.e., 76%, 17%, and 7%) to ensure the prediction performance of the model. Both the number of neurons in the hidden layer and the number of training iterations are set to 100, with a sampling time window of l = 5 and a learning rate of η = 0.01. The LSTM hyperparameters were tuned to ensure stable training and a fair comparison with the proposed PF-based MW-GLRT framework. Each sample contains an input sequence for online model learning and output data points for model validation. The mean squared error (MSE) between the output data points and the predicted values is calculated during training and used in turn to guide the calibration of the model parameters. The diagnosis results obtained by the LSTM network are displayed as a confusion matrix in Figure 8.
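The data preparation described above can be sketched as follows. This is a hypothetical illustration only: the paper does not specify how samples are windowed or shuffled, so the per-series sliding windows of length l = 5, the seeded shuffle, and the rounding of the 7.6:1.7:0.7 split are assumptions. The LSTM itself is omitted.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]   # (input window, fault label)


def make_windows(series: Sequence[float], label: int, l: int = 5) -> List[Sample]:
    """Cut one labeled time series into overlapping windows of length l."""
    return [(list(series[i:i + l]), label) for i in range(len(series) - l + 1)]


def split_dataset(samples: List[Sample], ratios=(7.6, 1.7, 0.7), seed: int = 0):
    """Shuffle and split into training, validation, and test sets by the given ratios."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    n_train = round(n * ratios[0] / total)
    n_val = round(n * ratios[1] / total)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Shuffling overlapping windows lets near-duplicate windows land in different subsets; splitting by simulation run instead avoids that leakage.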

Figure 8

Confusion matrix showing predicted versus actual fault types ranging from one to eight. Diagonal values indicate high accuracy, with most faults correctly predicted, such as 0.965 for type one and 0.956 for type five. Off-diagonal values are low, indicating minimal misclassification. A color gradient from light to dark red visualizes accuracy levels, with a scale bar on the right.

Figure 8. Diagnosis results obtained by LSTM network.

As with the PF-based MW-GLRT diagnosis, the LSTM network achieves good accuracy, except for the fault-mode pairs {ω₁, ω₅} and {ω₂, ω₃}, whose propagation effects are very similar. This issue arises because different underlying causes can produce identical or very similar observable effects, so the observed symptoms do not provide sufficient differentiation to isolate the faults.

In addition to identifying the fault type, the PF-based MW-GLRT method can also quantify the fault magnitude. For the leakage of the feedwater regulating valve, the fault model can be further divided into more specific subcategories according to fault severity. The fault severity can then be evaluated by comparing the likelihood ratios of the different test scenarios. For a clear comparison, the leakage rate of the feedwater regulating valve in the fault model was increased incrementally from 10 kg/s to 20, 30, and 40 kg/s. The likelihood ratios of the four test scenarios are compared in Figure 9.

Figure 9

Line graph showing the likelihood ratio over time for four mass flow rates: 10 kg/s, 20 kg/s, 30 kg/s, and 40 kg/s. The likelihood ratio increases over time for all rates, with the highest increase observed at 40 kg/s. Each line represents a different flow rate with distinct markers and colors.

Figure 9. Likelihood ratio test for feedwater regulating valve with different leakage rates.

As Figure 9 shows, the likelihood ratio curve of the fault model with a leakage rate of 30 kg/s is the highest, followed by those of the 20 kg/s and 40 kg/s models. Interestingly, the likelihood ratios of Case #2 (20 kg/s) and Case #4 (40 kg/s) are very close to each other. The lowest curve corresponds to the fault model with a leakage rate of 10 kg/s. From these results, the actual leakage rate can be diagnosed as approximately 30 kg/s.
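The severity comparison can be sketched with a deliberately simple model: assume the observed feedwater flow discrepancy at each step is Gaussian around the hypothesized leakage rate θ with a known standard deviation σ. Both this observation model and the sample values below are hypothetical, not data behind Figure 9. The candidate with the highest log-likelihood is selected, mirroring the comparison of the 10, 20, 30, and 40 kg/s fault models.

```python
import math
from typing import Dict, Sequence


def severity_loglik(discrepancy: Sequence[float], candidates: Sequence[float],
                    sigma: float) -> Dict[float, float]:
    """Gaussian log-likelihood of the observed discrepancy under each candidate leakage rate."""
    n = len(discrepancy)
    scores = {}
    for theta in candidates:
        sse = sum((d - theta) ** 2 for d in discrepancy)
        scores[theta] = -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - sse / (2.0 * sigma ** 2)
    return scores


obs = [29.0, 31.0, 30.5, 29.5]           # hypothetical discrepancy samples (kg/s)
scores = severity_loglik(obs, [10.0, 20.0, 30.0, 40.0], sigma=2.0)
best = max(scores, key=scores.get)       # most likely leakage rate among the candidates
```

Under this symmetric Gaussian assumption, candidates equally far from the data mean (here 20 and 40 kg/s) receive equal scores; whether the same mechanism explains the closeness of Case #2 and Case #4 in Figure 9 is not established by this toy example.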

4 Discussion

Fault detection and diagnosis in digital control systems is crucial for maintaining system reliability, safety, and efficiency. Early detection and diagnosis of faults can prevent accidents, equipment damage, and environmental hazards, ensuring a safer operating environment in nuclear power plants. However, digital control systems involve strong coupling and interactions among numerous interconnected elements such as hardware, software, firmware, and human factors. A fault in one component can propagate through the system and affect others, making diagnosis difficult. In addition, the complex nonlinear dynamics and inherent uncertainties of digital control systems pose significant challenges for fault detection and diagnosis. These challenges stem from the difficulty of modeling and predicting the behavior of digital control systems, as well as from the potential for unexpected and abrupt changes caused by faults. The PF-based MW-GLRT framework proposed in this paper offers a robust solution to fault detection and diagnosis of digital control systems; however, the following issues still need to be addressed.

1. Particle filter-based data assimilation for dynamic system state estimation and prediction. Compared with the Extended or Unscented Kalman Filters (EKF/UKF) (Yang et al., 2025), the probabilistic nature of particle filtering allows higher sensitivity to subtle changes in system behavior that may indicate a fault. Its ability to provide probabilistic state estimates through recursive Bayesian estimation makes it well suited to real-time monitoring and anomaly detection. In fault detection, PFs generate residuals that serve as the basis for decision-making in FDD algorithms; these residuals can be analyzed with a moving window to detect faults and with the generalized likelihood ratio test to diagnose them. However, PFs also face challenges, such as the large number of particles needed to maintain accuracy and the computational cost of resampling. To address these challenges, ensemble methods that integrate PFs with other machine learning algorithms are suggested for handling complex fault scenarios.

2. Multi-model fault detection and diagnosis under ambiguity. Although the proposed method can effectively identify multiple faults, similar symptoms remain a significant challenge in fault diagnosis. They create ambiguity about the exact nature and location of a fault, making it difficult to distinguish between fault modes. This issue is particularly prevalent in complex digital control systems, where multiple faults may produce overlapping or indistinguishable symptoms. Possible remedies include integrating more variables into the diagnostic algorithms, enhancing feature extraction techniques, and incorporating domain-specific knowledge to improve the robustness and reliability of fault detection and diagnosis.

3. Extending the generalized likelihood ratio test with kernel principal component analysis for feature extraction from high-dimensional, nonlinear data. By combining the real-time sensitivity of the moving window with the diagnostic precision of the GLRT, the proposed MW-GLRT framework provides a robust and efficient solution for identifying and characterizing faults in digital control systems. However, digital instrumentation and control systems often rely on a large number of sensors and complex algorithms to monitor and control processes, which can produce datasets with many variables and intricate relationships. Handling such data effectively requires feature extraction techniques, such as kernel principal component analysis, that reduce dimensionality while preserving essential information.

4. Sensitivity analysis for optimal threshold determination in FDD. In this paper, we propose a unified fault detection and diagnosis framework that explicitly balances real-time monitoring with diagnostic precision. Particle filtering provides accurate state estimates and predictions for nonlinear, non-Gaussian dynamics within a fixed-size sliding window. The generalized likelihood ratio test then operates on the PF-derived residuals within the moving window to detect anomalies and to isolate the fault type by evaluating competing hypotheses (fault models). This combination leverages the strengths of PF for state propagation under nonlinear, non-Gaussian conditions and of the GLRT for principled, hypothesis-driven discrimination among fault types. However, there is an inevitable tension between frequent threshold-based anomaly detection and the practical burden of false alarms, as well as the risk of missing incipient faults when thresholds are overly conservative. The interplay between false positives (FPs) and false negatives (FNs) is central to any deployed anomaly detection system, especially when decisions are time-critical and operator workload is a factor. Threshold tuning is therefore critical for managing the trade-off between sensitivity and false alarms, and a practical balance must be achieved by determining the window size and threshold that reduce operator workload while meeting safety requirements.
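The FP/FN trade-off in item 4 can be explored empirically with a threshold sweep such as the one below. This is a minimal sketch assuming detection statistics (for example, moving-window $χ^{2}$ values) have been collected from healthy and faulty simulation runs; the cost weights `c_fp` and `c_fn` are hypothetical placeholders for plant-specific penalties on false alarms and missed detections.

```python
from typing import List, Sequence, Tuple


def threshold_sweep(healthy: Sequence[float], faulty: Sequence[float],
                    thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false-alarm rate, missed-detection rate) for each threshold."""
    rows = []
    for h in thresholds:
        fp_rate = sum(s > h for s in healthy) / len(healthy)   # alarms on healthy data
        fn_rate = sum(s <= h for s in faulty) / len(faulty)    # faults left undetected
        rows.append((h, fp_rate, fn_rate))
    return rows


def pick_threshold(rows: List[Tuple[float, float, float]],
                   c_fp: float = 1.0, c_fn: float = 1.0) -> float:
    """Threshold minimizing the weighted cost c_fp * FP rate + c_fn * FN rate."""
    return min(rows, key=lambda row: c_fp * row[1] + c_fn * row[2])[0]


rows = threshold_sweep([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 6.0, 7.0], [2.5, 4.5])
```

Raising `c_fn` relative to `c_fp` pushes the choice toward lower thresholds (fewer missed faults, more alarms); repeating the sweep for several window lengths l would also expose the effect of the window on detection delay.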

5 Conclusion

In this paper, a novel particle filtering-based moving window generalized likelihood ratio test approach is proposed for fault detection and diagnosis in digital control systems. The method is demonstrated with a digital U-shaped tube steam generator water level control system in pressurized water reactor nuclear power plants, where eight fault modes of digital components, including sensors, control valves, PI controllers, and the CPU module, are considered for system simulation and testing. The proposed PF-based MW-GLRT method is also compared with an LSTM network. The comparative results demonstrate that both methods can accurately diagnose various types of faults in digital control systems. The PF-based method is particularly effective when an accurate mathematical or physics-based model is available, whereas the LSTM method relies heavily on the availability of comprehensive fault datasets. Moreover, when a sophisticated multi-model generalized likelihood ratio test is conducted, the fault magnitude can be diagnosed in addition to the fault type, location, and time.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

BL: Project administration, Conceptualization, Writing – original draft. BT: Writing – review and editing, Investigation, Methodology. YY: Writing – review and editing, Formal Analysis, Data curation. CJ: Visualization, Validation, Writing – review and editing. JY: Supervision, Writing – review and editing, Funding acquisition, Validation, Writing – original draft.

Funding

The authors declare that financial support was received for the research and/or publication of this article. The work is supported by the Open Foundation of the Guangdong Provincial Key Laboratory of Electronic Information Products Reliability Technology under Grant No. GDDZXX202303, National Foreign Experts Program under Grant No. H20250123, 2025 Annual Special Innovation Projects for Institution of Higher Education in Guangdong Province (Grant No. 2025KTSCX007), the Special Funds of CEPREI (Research on dynamic reliability analysis and life prediction technology for nuclear power plant intelligent control system (Project No: 24Z11)), and the Soft Science Research Project of Guangdong Province under Grant No. 2022A0505050007.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abdoune, F., Nouiri, M., Cardin, O., and Castagna, P. (2022). An enhanced methodology of fault detection and diagnosis based on digital twin. IFAC-PapersOnLine 55 (19), 43–48. doi:10.1016/j.ifacol.2022.09.181

Alsaif, K. M., Albeshri, A. A., Khemakhem, M. A., and Eassa, F. E. (2024). Multimodal large language model-based fault detection and diagnosis in context of industry 4.0. Electronics 13 (24), 4912. doi:10.3390/electronics13244912

Atoui, M. A., and Cohen, A. (2021). Coupling data-driven and model-based methods to improve fault diagnosis. Comput. Industry 128, 103401. doi:10.1016/j.compind.2021.103401

Abid, A., Khan, M. T., and Iqbal, J. (2021). A review on fault detection and diagnosis techniques: basics and beyond. Artif. Intell. Rev. 54, 3639–3664. doi:10.1007/s10462-020-09934-2

Dai, X. W., and Gao, Z. W. (2013). From model, signal to knowledge: a data-driven perspective of fault detection and diagnosis. IEEE Trans. Industrial Inf. 9 (4), 2226–2238. doi:10.1109/tii.2013.2243743

Davoodi, M., Meskin, N., and Khorasani, K. (2018). Integrated fault diagnosis and control design of linear complex systems. HESSE, Germany: The Institution of Engineering and Technology.

Dreven, J. V., Boeva, V., Abghari, S., Grahn, H., Al Koussa, J., and Motoasca, E. (2023). Intelligent approaches to fault detection and diagnosis in district heating: current trends, challenges, and opportunities. Electronics 12 (6), 1448. doi:10.3390/electronics12061448

Elfring, J., Torta, E., and Molengraft, R. D. (2021). Particle filters: a hands-on tutorial. Sensors 21 (2), 438. doi:10.3390/s21020438

Fearnhead, P., and Kunsch, H. R. (2018). Particle filters and data assimilation. Annu. Rev. Statistics Its Appl. 5, 421–449. doi:10.1146/annurev-statistics-031017-100232

Gangsar, P., and Tiwari, R. (2020). Signal based condition monitoring techniques for fault detection and diagnosis of induction motors: a state-of-the-art review. Mech. Syst. Signal Process. 144, 106908. doi:10.1016/j.ymssp.2020.106908

Gao, Z. W., Cecati, C., and Ding, S. X. (2015). A survey of fault diagnosis and fault-tolerant techniques part I: fault diagnosis with model-based and signal-based approaches. IEEE Trans. Industrial Electron. 62 (6), 3757–3767. doi:10.1109/tie.2015.2417501

Guimaraes, L. N. F., Oliveira, N. D. S., and Borges, E. M. (2008). Derivation of a nine-variable model of a U-tube steam generator coupled with a three-element controller. Appl. Math. Model. 32 (6), 1027–1043. doi:10.1016/j.apm.2007.02.022

Harkat, M. F., Mansouri, M., Nounou, M. N., and Nounou, H. N. (2019). Fault detection of uncertain chemical processes using interval partial least squares-based generalized likelihood ratio test. Inf. Sci. 490, 265–284. doi:10.1016/j.ins.2019.03.068

Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural Comput. 9 (8), 1735–1780. doi:10.1162/neco.1997.9.8.1735

IAEA (2009). Implementing digital instrumentation and control systems in the modernization of nuclear power plants.

Isermann, R. (2005). Model-based fault-detection and diagnosis–status and applications. Annu. Rev. Control 29 (1), 71–85. doi:10.1016/j.arcontrol.2004.12.002

Jiang, C. Y., Yang, J., Xue, K., Zhanyu, H., and Ming, Y. (2023). Coupling of adjoint-based Markov/CCMT predictive analytics with data assimilation for real-time risk scenario forecasting of industrial digital process control systems. Process Saf. Environ. Prot. 171, 951–974. doi:10.1016/j.psep.2023.01.077

Jonne, G. B., Ibrahim, D., Rai, L., Chen, Q., and Liu, F. (2022). Development of fault diagnostics, and prognosis system based on digital twin and blockchain. ECS Trans. 107 (1), 13603–13611. doi:10.1149/10701.13603ecst

Karimi, H. (2021). Fault diagnosis and prognosis techniques for complex engineering systems. Academic Press.

Kopbayev, A., Khan, F., Yang, M., and Halim, S. Z. (2022). Fault detection and diagnosis to enhance safety in digitalized process system. Comput. and Chem. Eng. 158, 107609. doi:10.1016/j.compchemeng.2021.107609

Lee, J., Cameron, I., and Hassall, M. (2019). Improving process safety: what roles for digitalization and industry 4.0? Process Saf. Environ. Prot. 132, 325–339. doi:10.1016/j.psep.2019.10.021

Lei, B. M., Yang, J., Xue, K., and Tian, B. (2024). “A multi-modal particle filtering method for fault detection and diagnosis of digital instrumentation and control systems,” in IET conference proceedings of the 14th international conference on quality, reliability, risk, maintenance, and safety engineering (QR2MSE 2024) (Harbin, China).

Leite, D., Andrade, E., Rativa, D., and Maciel, A. M. A. (2025). Fault detection and diagnosis in industry 4.0: a review on challenges and opportunities. Sensors 25 (1), 60. doi:10.3390/s25010060

Li, W. J., Li, H., Gu, S., and Chen, T. (2020). Process fault diagnosis with model- and knowledge-based approaches: advances and opportunities. Control Eng. Pract. 105, 104637. doi:10.1016/j.conengprac.2020.104637

Li, C. C., Huang, Y., and Hou, C. R. (2022). Fault diagnosis of control system based on neural network data fusion.

Maican, C. A., Pana, C. F., Patrascu-Pana, D. M., and Radulescu, V. M. (2025). Review of fault detection and diagnosis methods in power plants: algorithms, architectures, and trends. Appl. Sci. 15 (11), 6334. doi:10.3390/app15116334

Melani, A. H., Michalski, M. A., Silva, R. F., and Martha, G. F. (2021). A framework to automate fault detection and diagnosis based on moving window principal component analysis and Bayesian network. Reliab. Eng. Syst. Saf. 215, 107837. doi:10.1016/j.ress.2021.107837

Mouzakitis, A. (2013). Classification of fault diagnosis methods for control systems. Meas. Control 46 (10), 303–308. doi:10.1177/0020294013510471

Nadesalingam, K., and Towill, D. R. (1978). Frequency domain fault detection and diagnosis in hybrid control systems: a feasibility study. IEEE Trans. Instrum. Meas. 27 (2), 193–199. doi:10.1109/tim.1978.4314656

Park, Y. J., Fan, S. S., and Hsu, C. Y. (2020). A review on fault detection and process diagnostics in industrial processes. Processes 8 (9), 1123. doi:10.3390/pr8091123

Qiu, S. H., Cui, X. P., Ping, Z. W., Shan, N., Li, Z., Bao, X., et al. (2023). Deep learning techniques in intelligent fault diagnosis and prognosis for industrial systems: a review. Sensors 23 (3), 1305. doi:10.3390/s23031305

Roozbeh, R. F., Hadi, D., Vasile, P., and Lucas, C. (2009). Model-based fault detection and isolation of a steam generator using neuro-fuzzy networks. Neurocomputing 72 (13-15), 2939–2951. doi:10.1016/j.neucom.2009.04.004

Simani, S., Patton, R., and Fantuzzi, C. (2003). Model-based fault diagnosis in dynamic systems using identification techniques. Berlin, Heidelberg: Springer-Verlag.

Taqvi, S. A., Zabiri, H., Tufa, L. D., Uddin, F., Fatima, S. A., and Maulud, A. S. (2021). A review on data-driven learning approaches for fault detection and diagnosis in chemical processes. ChemBioEng Rev. 8 (3), 239–259. doi:10.1002/cben.202000027

Wang, X., Wang, H., and Peng, M. J. (2025). Interpretability study of a typical fault diagnosis model for nuclear power plant primary circuit based on a graph neural network. Reliab. Eng. Syst. Saf. 261, 111151. doi:10.1016/j.ress.2025.111151

Xu, S. C. (2019). A survey of knowledge-based intelligent fault diagnosis techniques. J. Phys. Conf. Ser. 1187, 032006. doi:10.1088/1742-6596/1187/3/032006

Xu, Z. L. (2023). An intelligent fault detection approach for digital integrated circuits through graph neural networks. Math. Biosci. Eng. 20 (6), 9992–10006. doi:10.3934/mbe.2023438

Yang, J., Aldemir, T., and Smidts, C. (2018). A deductive method for diagnostic analysis of digital instrumentation and control systems. IEEE Trans. Reliab. 67 (4), 1442–1458. doi:10.1109/tr.2018.2864630

Yang, J., Yao, Y. R., Tian, B. H., Jiang, C. Y., and Xue, K. (2025). “A comparative study of unscented kalman filtering and particle filtering for state predictions of industrial process,” in Proceedings of the 20th IEEE conference on industrial electronics and applications, 3–6.

Zayed, S. M., Attiya, G., Sayed, A. E., Sayed, A., and Hemdan, E. D. (2023). An efficient fault diagnosis framework for digital twins using optimized machine learning models in smart industrial control systems. Int. J. Comput. Intell. Syst., 16–69. doi:10.1007/s44196-023-00241-6

Zhang, L. W., Lin, J., and Karim, R. (2017). Sliding window-based fault detection from high-dimensional data streams. IEEE Trans. Syst. Man, Cybern. Syst. 47 (2), 289–303. doi:10.1109/TSMC.2016.2585566

Keywords: fault detection and diagnosis, particle filter, long short-term memory network, moving window, generalized likelihood ratio test, digital control systems

Citation: Lei B, Tian B, Yao Y, Jiang C and Yang J (2026) An ensemble data-driven method for fault detection and diagnosis of digital control systems in nuclear power plants. Front. Nucl. Eng. 4:1714098. doi: 10.3389/fnuen.2025.1714098

Received: 27 September 2025; Accepted: 17 November 2025;
Published: 08 January 2026.

Edited by:

Nicola Pedroni, Polytechnic University of Turin, Italy

Reviewed by:

Shiming Yin, Purdue University, United States
Arvind Sundaram, Purdue University, United States

Copyright © 2026 Lei, Tian, Yao, Jiang and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jun Yang, eW91bmdqdW41MUBob3RtYWlsLmNvbQ==
