Multi-state Risk-Based Maintenance Analysis of Redundant Safety Systems Using the Markov Model and Fault Tree Method

The risk-based maintenance strategy has received special attention in the safe operation of nuclear power plants. Simultaneous quantification of the positive and negative effects of maintenance activities and components degradation effect makes it possible to accurately evaluate the risk criterion for safety systems of nuclear power plants. However, it is difficult to integrate the effects of maintenance and components degradation into the standard reliability approaches. A straightforward approach for considering components degradation and different maintenance policies is to make use of Markov maintenance models. In this article, the effectiveness of maintenance activities (including changes in the surveillance test intervals and alteration in the different maintenance policies) on the components unavailability with considering aging effects is quantified using Markov maintenance models and then by coupling these models and the fault tree method, the risk measure is upgraded from the component level to the system level. The proposed models are applied to evaluate the unavailability of two safety systems of VVER-1000/V446 nuclear power plants as case studies. The results show that the Markov method due to its multi-state nature is effective in the conservative evaluation of risk measures so that the unavailability computed by the coupling process is higher than the original unavailability (calculated by system fault tree using PSA data of nuclear power plants) for all maintenance policies. In addition, this study illustrates that the developed Markov maintenance models could be applied to the large-scale whole plant level and provides a proper transition from the classical PSA methods to new techniques. This approach integrates the effects of maintenance strategies and components degradation. Also, it provides a practical and a more accurate tool to determine the technical specification of a real nuclear power plant from the risk point of view.


INTRODUCTION
The unexpected failures, the downtime associated with such failures, the loss of production, and higher maintenance costs are among the major issues in the nuclear industry (Krishnasamy et al., 2005). Therefore, it is necessary to identify the failure as soon as possible to avoid inconvenience in the nuclear power plant (NPP) system (Maitloa et al., 2020). Fault diagnosis systems are widely applied to guarantee the safety of nuclear power plants (Ma and Jiang, 2011;Gong et al., 2018;Wu et al., 2018;Maitloa et al., 2020). The fault detection and diagnosis (FDD) methods are categorized into the fuzzy logic method (FLM), model-based methods (MBMs), data-driven methods (DDMs), and sensor fault detection and diagnosis method (SFDDM). While the practical applications of MBMs, DDMs, and SFDDM are extremely limited, the FLM is used for the operation key of the NPP (Maitloa et al., 2020). Indeed, practical applications of model-based FDD methods are very limited due to the requirement of an accurate model that is always hard to obtain in practice. Data-driven FDD methods also rely on relationships between correlated measurements within a system. In this regard, one needs to formulate the relationships using certain ways that require data obtained during normal operations of NPP (Ma and Jiang, 2011). Also, SFDDM does not give appropriate accuracy compared with other methods of FDD (Maitloa et al., 2020).
Along with the interest in using FDD methods to improve safety, reliability, and availability of NPPs (Ma and Jiang, 2011), over the recent decades, there has been a growing interest in NPPs to develop maintenance approaches to attain the highest level of availability and safety (Kancev and Cepin, 2011;Hellmich and Berg, 2014;Kim et al., 2015;Shin et al., 2015;Soares et al., 2015;Kolykhanov and Kozlov, 2018;Ngarayana et al., 2019;Zhang et al., 2019;Gohel et al., 2020;Mohammadhasani and Pirouzmand, 2020).
Nuclear power industries have increasing interest in using maintenance activities in the form of risk-based models. A risk-based maintenance (RBM) approach helps toward designing an alternative strategy for minimizing the risks emanating from breakdowns or failures (Krishnasamy et al., 2005). Over the recent decades, the deterministic test and maintenance (T and M) strategy models are increasingly supported, especially those based on risk measures (Kancev and Cepin, 2011). Adopting a risk-based T and M strategy is an essential step toward evaluating the effects of maintenance activities (MAs) on risk measures at both the component level and the system level. In this context, the question of quantifying the effects of maintenance strategies on risk measures has been discussed repeatedly in the literature (Vesely and Rezos, 1995;Baraldi et al., 2011;Kumar et al., 2012;Veeramany, 2012;Joel and Kumar, 2014;Zio and Compare, 2013;Kumar and Joel, 2018). In this regard, both the positive and the negative aspects of MAs should be quantified (Vesely and Rezos, 1995), thereby optimizing the MAs to reduce the risks or increase the availability of safety systems.
Among the standard probabilistic risk assessment (PRA) approaches, the fault tree (FT) method is by far the most popular (Bucci et al., 2008). In developing an FT analysis, all potential causes of a specified system failure are investigated. As a result, the construction of a fault tree provides the analyst with a better understanding of the potential causes of a system failure. Nevertheless, concerns have been raised in the literature with regard to the potential limitations of FTs (Andow, 1981;March-Leuba et al., 1984;Aldemir, 1989;Hassan and Aldemir, 1990;Aldemir and Siu, 1996;Bucci et al., 2008). Some of these limitations are related to modeling complex maintenance strategies. In fact, a static FT method, due to hidden failures, can either underestimate or overestimate the unavailability of a system, and when the system comprises many trains, the computational errors increase (Kancev and Cepin, 2011;Hellmich and Berg, 2014).
Furthermore, it is difficult to integrate the effects of maintenance strategy and components degradation into the FT model (Kancev and Cepin, 2011). In fact, the FT method assumes two states for each component, that is, a success state and a failure state. Owing to these assumptions, only the negative aspects of MAs can explicitly be quantified, that is, the effects of maintenance downtime and possible maintenance-related errors. The benefits of maintenance strategies cannot be explicitly quantified since a major yield of the maintenance is to prevent and correct degradations before the occurrence of a failure. Degraded component conditions are not taken into account in standard PRA modeling, and hence, the advantage of maintenance in correcting degraded conditions is not explicitly considered.
To assess multistate systems reliability, many approaches, including the Monte Carlo method, universal generating function approach, semi-Markov model, etc., have been proposed up to now (Jung and Cho, 1991;Tomasevicz and Asgarpoor, 2009;Veeramany, 2012;Wang et al., 2017). These approaches are introduced to describe the random behavior of systems and the degradation/repair of the components. The Markov method has been widely applied to analyze the system reliability as well (Dugan et al., 1993;Vesely and Rezos, 1995;Chan and Asgarpoor, 2006;Veeramany, 2012;Hellmich and Berg, 2014;Dawid et al., 2015;Kumar et al., 2020;Mohammadhasani and Pirouzmand, 2020).
In the Markov models, the transition rates between states are constant, which means that the failure and degradation processes are memoryless, while in some realistic situations, the transition rates are time-dependent. In this case, one needs to estimate the transition rates from field data. In practice, it can be difficult or even impossible to collect relevant data, especially for highly reliable devices (e.g., nuclear components and aerospace devices). To overcome this bottleneck, some approaches have been proposed in which the transition rates are described by physics functions rather than estimated from service data. However, applying these approaches to real complex systems can be very time consuming and increases the computational costs significantly. This article seeks a multistate model that has the capability of modeling MAs and aging effects and that can be applied to NPPs, where a huge number of components need to be analyzed. Despite the Markov model limitations, the literature shows that the Markov process is an affordable and straightforward approach that can be easily applied to a large number of components with complicated maintenance policies with reasonable accuracy and computational costs (Alam, 1982;Sim and Endrenyi, 1993;Somani et al., 1993;Vesely and Rezos, 1995;Papazoglu, 2000;Bukowski, 2001;Chan and Asgarpoor, 2006;Kumar et al., 2013;Matsuoka, 2014). Therefore, in this article the Markov process models are developed and applied to quantify the unavailability of basic events.
Indeed, in our present article, we have upgraded the Markov maintenance approach set forth in a previous work on evaluating the effect of maintenance policies on the components unavailability to the system level (Mohammadhasani and Pirouzmand, 2020). Three maintenance policies at the component level are first developed using the Markov approach. Then, by coupling Markov maintenance models (MMMs) with the FT model developed at the system level, the risk criterion (i.e., unavailability) is upgraded from the component level to the system level. It is worth mentioning that in this study, component degradation is also considered in the Markov model. Evaluation of variations in the unavailability calculated by coupling MMMs and FT at the system level due to variation in the components degradation rate and also a change in the surveillance test interval (STI) is another goal followed in the present article. This is carried out for three maintenance policies, and it is shown that using the Markov model due to its multistate nature and modeling the degradation state for components leads to the best estimate evaluation of unavailability computed at the system level. The developed model is applied to calculate the unavailability of two standby safety systems of a VVER-1000/V446 NPP.
The present article is structured as follows: Markov Maintenance Models provides a discussion of the MMMs developed in this research and introduces the three different maintenance policies implemented in the Markov model.

MARKOV MAINTENANCE MODELS

Component Markov Model
A standard PRA usually lumps the degraded state together with the functioning state and does not model them separately. To quantify the maintenance effectiveness, it is necessary to discriminate the degraded state from the functioning state. Therefore, the present research study also considers a degraded state for the component, assuming that in the degraded state, the component is still functional but in a degraded condition. Also, it is assumed that the transition from a degraded state to a failure state occurs when a severe degradation drops the component performance below the expected normal design level.

TABLE 3 | Definition of the nonzero transition rates.
[symbol missing in source]: The transition rate from a functioning state to a degraded state (degradation rate).
ξ: The transition rate from a functioning state to a test and inspection state, and also the transition rate from a failure state to a repair state (test interval).
η: The transition rate from a test and inspection state to a functioning state (test duration).
μ: The transition rate from a repair state to a functioning state.
α: The transition rate from a degraded state to a test and inspection state.
β: The transition rate from a degraded state to a failure state (failure rate when the component is degraded).

Given that the components are under periodic testing, the test state is also taken into account. Another assumption is that testing of a component for inspection does not bring about the component unavailability. Also, in the models, a repair state is considered when the component is down. This state reflects the negative aspect of the maintenance process due to the component unavailability. Finally, five states are considered for the components which are defined in Table 1. Given the five states (A, D, M, R, and F), it is required that a definition of the transition rates between the states is provided. The relevant transition rates are shown in the transition matrix of Table 2. The missing values are disallowed transitions and can be treated as having a transition rate value of zero. The nonzero transition rates are defined in Table 3.
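As a minimal sketch of how the five-state component model described above could be encoded, the following Python snippet builds a transition rate matrix from a dictionary of allowed transitions. The numerical rates, the name `lam_AD` for the degradation rate, and the inclusion of a direct A → F transition with rate `lam_1` are illustrative assumptions, not values or notation taken from the article's Tables 2 and 3.

```python
import numpy as np

# State order: A (functioning/standby), D (degraded), M (test and inspection),
# R (repair), F (failed).
STATES = ["A", "D", "M", "R", "F"]

# Illustrative rates per hour (assumed values, not the article's data).
HOURS_PER_STI = 28 * 24          # 28-day surveillance test interval
rates = {
    ("A", "D"): 2.0e-5,              # lam_AD: degradation rate (hypothetical name)
    ("A", "F"): 1.0e-5,              # lam_1: direct failure (assumed to be allowed)
    ("A", "M"): 1.0 / HOURS_PER_STI, # xi: functioning -> test
    ("F", "R"): 1.0 / HOURS_PER_STI, # xi: hidden failure found at the next test
    ("M", "A"): 1.0,                 # eta: one-hour test duration (assumed)
    ("R", "A"): 1.0 / 24.0,          # mu: 24-hour repair (assumed)
    ("D", "M"): 1.0 / HOURS_PER_STI, # alpha: degraded -> test
    ("D", "F"): 1.0e-4,              # beta: degraded -> failed
}


def build_generator(transition_rates):
    """Return the generator matrix Q; missing entries are disallowed (zero)."""
    n = len(STATES)
    Q = np.zeros((n, n))
    for (src, dst), rate in transition_rates.items():
        Q[STATES.index(src), STATES.index(dst)] = rate
    # Each diagonal entry is minus the sum of the off-diagonal rates in its row.
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q


Q = build_generator(rates)
print(np.round(Q, 6))
```

Keeping the allowed transitions in a dictionary makes the disallowed ones (for example D → A or F → M) zero by construction, which mirrors how missing entries of a transition matrix are treated.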
It is notable that for the 1oo4 redundant safety systems investigated in this study, simultaneous failures of two, three, and four components caused by common cause failures (CCFs) are assumed and modeled by the $\lambda_2$, $\lambda_3$, and $\lambda_4$ transition rates, respectively. In this regard, the CCF data given in the PSA are used to calculate the CCF rates (AEOI, 2003).

MMMs Assumptions
To develop the MMMs, the following assumptions are made:
1) The performed maintenance and repair are assumed to be perfect, restoring the component to an as-good-as-new condition.
2) The transition rates between states are constant and the components unavailability is calculated in the steady state.
3) The failures are assumed to be hidden until the components are tested.
4) The components testing is assumed to be staggered and scheduled.
5) It is supposed that a component testing does not lead to system unavailability (AEOI, 2003).
6) It is assumed that the testing time is negligible compared to the repair time.

Maintenance Policies
According to Hellmich and Berg (2014), there are three different maintenance policies at the component level applicable to MMMs of redundant safety systems. The policies are as follows:
Policy 1: If a failure is detected in one component during the surveillance test, it is repaired promptly after the detection. No additional test is performed on other components. When the repair job is finished, the normal surveillance test schedule is resumed.
Policy 2: If a failure is detected in one component during the surveillance test, it is repaired promptly after the detection. Other components are subjected to a test as soon as the repair of the first component is finished, and if found defective, they are repaired immediately as well.
Policy 3: If a failure is detected in one component during the surveillance test, it is repaired promptly after the detection. Other components are subjected to simultaneous additional tests. If they are found defective as well, all four components are repaired simultaneously.
Figures 1-3 illustrate the implemented Markov models for maintenance policies 1-3, respectively. As was mentioned earlier, the model developed in the present article is considered for 1oo4 redundant components. And so, as can be seen in those figures, in each state of the Markov process, the first, second, third, and fourth letters represent the states of components 1, 2, 3, and 4 of the redundant system, respectively.
For example, Figure 1A illustrates the transition cycle of 1oo4 components for policy 1, assuming that the system is initially in the MAAA state (the state in question is indicated in green). Figure 1B shows the transitions that also occur from the AAAA state shown in Figure 1A.
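The four-letter labeling convention can be sketched in a few lines of Python. This is only an illustration: it enumerates every combination of the five component states for four components, whereas the article's reduced state spaces (540 and 716 states) follow from its own grouping rules, which are not reproduced here. The helper names and the rule used in `system_down` (a 1oo4 system is down only when all four components are in F or R) are assumptions made for this example.

```python
from itertools import product

COMPONENT_STATES = "ADMRF"   # functioning, degraded, test, repair, failed
N_COMPONENTS = 4


def all_labels():
    """Every four-letter label, one letter per redundant component."""
    return ["".join(combo) for combo in product(COMPONENT_STATES, repeat=N_COMPONENTS)]


def describe(label):
    """Map a label such as 'MAAA' to the state of each component."""
    return {f"component_{i}": state for i, state in enumerate(label, start=1)}


def system_down(label, down_states=("F", "R")):
    """Assumed 1oo4 rule: the system is down only if every component is down."""
    return all(state in down_states for state in label)


labels = all_labels()
down_labels = [label for label in labels if system_down(label)]

print(len(labels), "unrestricted labels,", len(down_labels), "with all components down")
print(describe("MAAA"))
```

The unrestricted count is simply 5 to the power 4; any model-specific restriction, such as remembering which component is tested next, changes the number of states that are actually kept.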
All transitions of policy 1 are established for policies 2 and 3 as well. The transitions which distinguish policies 2 and 3 from policy 1 are represented in Figures 2, 3, respectively. Specifically, what discriminates policy 2 from policy 1 is the assumption that after the repair of a failed component (the failure detected by testing), the component of the next redundant train enters into a surveillance test state and so on (i.e., the states shown in green in Figure 2). Also, in policy 3, after identifying a failed component and its transition to the repair state, other redundant components are simultaneously subjected to surveillance testing. Therefore, it is possible to not only repair all components in parallel but also test and repair the redundant components simultaneously (see Figure 3). Figure 3 shows the transitions that occur in policy 3, assuming that the system is initially in the AFAA state. It is worth mentioning that policy 3 is applicable to NPP equipment as it has been considered to be a reliable analysis of different systems of NPP in the literature (Jung and Cho, 1991;Hellmich and Berg, 2014). In this case, it should be noted that for many standby safety systems in NPPs additional restrictions are applied. For example, in a plant with a 1-out-of-4 safety system, technical specifications require that if the simultaneous failure of two trains is revealed, the plant must be switched to the cold shutdown condition. Also, the duration of downtime (due to repair) of failed trains must not exceed a specified allowed outage time (AOT). If the AOT is exceeded, it is mandatory to shut the plant down. Hence, a simultaneous repair of all redundant components is possible within the permissible time, provided the plant is shut down.
To derive the various states and transition rates, the following procedure is pursued: The redundant 1oo4 components of VVER1000 NPP safety systems are considered to implement the Markov models developed in this study. The implementation of the Markov model taking into account five states (F, D, A, M, and R) for the components produces 540 states for policies 1 and 2 and 716 states for policy 3. For each policy, the states are first divided into eight groups. Eight groups of states must be recognized since the process has to remember which component is tested next, in spite of the memoryless property of the Markov process. Four groups cover the test and repair process for the component, and the other four do not include any testing or repair. After that, for each policy, the states are formed in the MATLAB software by applying the assumptions given in MMMs Assumptions and assuming that only a clockwise permutation between states in the eight groups is allowed (see Figure 4). At this time, the transition rate matrices (a 540 × 540 matrix for policies 1 and 2 and a 716 × 716 matrix for policy 3) are constructed by programming in the MATLAB software. Finally, Markov equations are formed and solved to give each state probability and other component performance characteristics such as MVF (maintenance visit frequency), RVF (repair visit frequency), FVF (failure visit frequency), and MTBF (mean time between failures) (Høyland and Rausand, 2004;Modarres et al., 2016).
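To make the visit-frequency quantities more concrete, here is a small Python sketch using the standard steady-state relation that the frequency of entering state j equals the sum over the other states i of p_i times the rate from i to j. It is shown on a two-state up/down component (an assumption made for brevity, with illustrative rates), for which the closed-form answers are known; the article's own formulas for MVF, RVF, FVF, and MTBF are not given in the text and may differ in detail.

```python
import numpy as np


def visit_frequencies(Q, p):
    """Steady-state frequency of entering each state: nu_j = sum_{i != j} p_i * q_ij."""
    off_diagonal = Q - np.diag(np.diag(Q))   # drop the diagonal entries
    return p @ off_diagonal


# Two-state example: state 0 = up, state 1 = failed (illustrative rates per hour).
lam, mu = 1.0e-3, 1.0e-1
Q = np.array([[-lam, lam],
              [mu, -mu]])

# Known steady-state probabilities of the two-state model.
p = np.array([mu, lam]) / (lam + mu)

nu = visit_frequencies(Q, p)
failure_visit_frequency = nu[1]                        # entries into the failed state per hour
mean_time_between_failures = 1.0 / failure_visit_frequency

print(failure_visit_frequency, mean_time_between_failures)
```

For this toy model the mean time between failures equals the mean up time plus the mean down time, 1/lam + 1/mu, which gives a quick sanity check before applying the same function to a larger rate matrix.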
To establish the transition rate matrices, the following assumptions are applied (see Table 2):
1) The transition from a degraded state directly to a standby state (D → A) is not considered since a maintenance state must first exist.
2) The transition from a failed state directly to a maintenance state (F → M) is not modeled, assuming that a repair has precedence over maintenance.
3) It is assumed that the degradation is not critical and does not need to be repaired; therefore, the transition from a degraded state to a repair state (D → R) is not considered.
4) The transition rates from a failed state to a standby state (F → A) and from a standby state to a repair state (A → R) are set to zero.
After establishing the transition rate matrix for each policy, the governing linear equations are formed. Equation 1 presents the simplified matrix equation for policy 1 as a sample. Here, $\omega_i$ is the sum of all elements in row $i$ taken with a negative sign, and $p_i$, $i = 1, 2, 3, \ldots, 540$, are the state probabilities that need to be calculated. Other parameters are introduced in Component Markov Model.
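As a hedged illustration of how such steady-state equations can be solved numerically, the sketch below replaces one balance equation with the normalization condition (the probabilities sum to one) and solves the resulting linear system with NumPy. The three-state cycle A → F → R → A and its rates are assumptions chosen so that the result can be checked by hand; the article's 540-state and 716-state systems were solved in MATLAB and are not reproduced here.

```python
import numpy as np


def steady_state(Q):
    """Solve p Q = 0 with sum(p) = 1 for an irreducible rate matrix Q."""
    n = Q.shape[0]
    A = Q.T.copy()
    A[-1, :] = 1.0            # replace one balance equation by the normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


# Illustrative three-state cycle (rates per hour, assumed values):
# A -> F with lam, F -> R with xi (found at the test), R -> A with mu.
lam = 1.0e-5
xi = 1.0 / (28 * 24)
mu = 1.0 / 24.0
Q = np.array([[-lam, lam, 0.0],
              [0.0, -xi, xi],
              [mu, 0.0, -mu]])

p = steady_state(Q)
unavailability = p[1] + p[2]   # time fraction spent failed or under repair
print(p, unavailability)
```

Each diagonal entry of `Q` is minus the sum of the other rates in its row, so every row sums to zero; this is the property that makes one balance equation redundant and lets the normalization take its place.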

DESCRIPTION OF CASE STUDIES
This section describes the functions of the two main safety systems of VVER-1000/V446 NPPs used as case studies: the emergency core cooling safety system (ECCS) and the emergency cooling safety system. These systems are 1oo4 redundant systems accommodating four identical trains normally in a standby mode.

Case I: Emergency Core Cooling System
ECCS is one of the most important NPP safety systems designed to remove the reactor core heat under accident conditions. The system is designed to mitigate the consequences of any break in the reactor coolant system (RCS) pressure boundary which might result in the loss of reactor coolant at a rate exceeding the capability of the reactor coolant makeup system. The system is also intended for the reactor core cooling after its shutdown in modes when heat removal via steam generators (SG) becomes ineffective and for removing heat from the fuel placed in the fuel pool as well (AEOI, 2008). The ECCS comprises four independent trains. All system trains are physically and structurally separated one from the other. Each train performs safety function measures in all states of the unit including design basis accidents (AEOI, 2003).
FIGURE 4 | Permutation between states in the eight groups is allowed in a clockwise direction.
As mentioned before, each safety system has a specific and different function; hence, the FT related to each function is developed separately and coupling of the MMMs with the FT method is performed for each separate function.

Case II: System for Emergency Cooling
The system is intended for emergency heat removal from the core through the secondary circuit under the following conditions: 1) maintaining pressure in the secondary circuit and 2) reactor cooling at the predetermined rate (AEOI, 2003).
Under the first mode, the system operates automatically. Specifying points for opening and closing fast-acting reducing stations for steam dump into the atmosphere (FASD-A) are performed according to the design pressure in steam generators and steam lines. In the case of FASD-A failure to open, pressure in the steam generator is maintained with the help of steam generator safety valves (AEOI, 2003).
Under the second mode, in order to ensure that the preset cooling rate is equal to 30°C/h (slow cooling) or 60°C/h (fast cooling), the operator switches over the FASD-A to the corresponding cooling mode (AEOI, 2003).
The system functions are as follows:

Functions HO, HO″, R1: Residual Heat Removal Through the Secondary Circuit Over Opened Cycle
The function of the residual heat removal through the secondary circuit over the opened cycle is executed for all initiating events when it is impossible to perform the function of long-term heat removal from the core via turbine condenser through the closed cycle (AEOI, 2003). Functions HO, HO″, and R1 are executed when SGs are connected to the main steam collector (MSC) (AEOI, 2003).

Functions R, RS: Reactor Plant Cooling Through the Secondary Circuit
The emergency cooling system performs the function of reactor plant cooling through the secondary circuit for all initiating events when reactor plant cooling is required. Function R is executed when the SGs are connected to the MSC, while function RS is executed when the SGs are isolated from the MSC (AEOI, 2003). Further explanations on these functions are presented in Table 5 (AEOI, 2014).

COUPLING MARKOV MODEL WITH FAULT TREE METHOD
Figure 5 presents the chart of the coupling process of MMMs with the FT method for calculating the system unavailability. To start the coupling process, for each safety function, the FT is developed in the SAPHIRE software. Then, the critical components are extracted using the FT analysis and cut set generation for each function, by the classical PSA approach (i.e., two states for each basic event and no degradation). Herein, the importance measures related to the maintenance phase, that is, the risk reduction ratio (RRR) and risk increase ratio (RIR), are applied to prioritize the basic events (Nøkland, 2013). After determining the critical components, multistate unavailability models, developed in MATLAB software, are assigned to the prioritized basic events and the risk measure is upgraded from the component level to the system level using the FT analysis in the SAPHIRE software. Figure 6, as a sample, shows a simplified FT for the GL function of the ECCS of the VVER1000/V446 NPP. To analyze the FTs, they are first implemented in the SAPHIRE code. Then, by applying Boolean algebra, the top event for each FT is expressed in terms of basic events and is simplified to give the minimal cut sets (MCSs).
After developing the MMMs at the component level and implementing them for the different maintenance policies, the component availability is defined as follows (Høyland and Rausand, 2004): Let $X = \{0, 1, \ldots, r\}$ stand for the set of all possible states of a component, and let $B$ and $F$ ($F = X - B$) stand for the subsets of states corresponding to the component functioning and failure states, respectively; then the average availability of the component is the mean proportion of time when the component is functioning.

The average component availability $A_c$ is thus calculated as follows:

$$A_c = \sum_{j \in B} P_j. \quad (2)$$

Here, $P_j$ is the probability of being in state $j$. Clearly, the component unavailability ($U_c$) is obtained as follows:

$$U_c = 1 - A_c = \sum_{j \in F} P_j. \quad (3)$$

In the next step, the components unavailability calculated by MMMs is assigned to the respective basic events in the FT and the top event probability is calculated as a risk criterion. This process evaluates the effects of MAs at the component level on the system unavailability. The above-mentioned procedure is iterated for all functions under investigation.
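The assignment of component unavailabilities to basic events and the evaluation of the top event can be sketched as follows. This is a simplified stand-in for what a tool such as SAPHIRE does: the top event probability is approximated with the min-cut upper bound, and the risk increase ratio and risk reduction ratio are computed by setting one basic event to 1 or 0. The cut sets, event names, and probabilities are hypothetical, chosen only to resemble a 1oo4 system with a common cause failure event.

```python
from math import prod


def top_event_probability(cut_sets, q):
    """Min-cut upper bound: 1 - product over cut sets of (1 - P(cut set))."""
    return 1.0 - prod(1.0 - prod(q[e] for e in cs) for cs in cut_sets)


def risk_increase_ratio(cut_sets, q, event):
    """Top event probability with the event certain, divided by the base value."""
    base = top_event_probability(cut_sets, q)
    return top_event_probability(cut_sets, {**q, event: 1.0}) / base


def risk_reduction_ratio(cut_sets, q, event):
    """Base value divided by the top event probability with the event impossible.

    Assumes other cut sets remain, so the denominator is not zero.
    """
    base = top_event_probability(cut_sets, q)
    return base / top_event_probability(cut_sets, {**q, event: 0.0})


# Hypothetical minimal cut sets for a 1oo4 system (names are illustrative).
TRAINS = ("TRAIN_1", "TRAIN_2", "TRAIN_3", "TRAIN_4")
CUT_SETS = [TRAINS, ("CCF_4_TRAINS",), ("COMMON_TANK",)]

# Two-state (classical PSA style) versus multistate (Markov-derived) inputs;
# both sets of numbers are invented for the example.
q_two_state = {**{t: 5.0e-3 for t in TRAINS}, "CCF_4_TRAINS": 1.0e-5, "COMMON_TANK": 1.0e-6}
q_markov = {**q_two_state, **{t: 8.0e-3 for t in TRAINS}}

top_two_state = top_event_probability(CUT_SETS, q_two_state)
top_markov = top_event_probability(CUT_SETS, q_markov)
rir_ccf = risk_increase_ratio(CUT_SETS, q_markov, "CCF_4_TRAINS")
rrr_ccf = risk_reduction_ratio(CUT_SETS, q_markov, "CCF_4_TRAINS")
print(top_two_state, top_markov, rir_ccf, rrr_ccf)
```

Only the dictionary of basic event probabilities changes between the two evaluations, which is the essence of the coupling: the fault tree logic stays fixed while the component-level inputs come from the multistate model.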
The developed model is eventually applied for evaluating the system unavailability variations with STI [as one of its technical specifications (TSs)] and the component's degradation rate. According to IAEA-TECDOC-503, the permitted tolerance for the deviation from a specified surveillance test interval is plus or minus 25% of the interval (IAEA-TECDOC-503, 1989). The test interval currently used for the components of the safety systems of VVER-1000 NPP is 28 days (AEOI, 2003). Therefore, the test intervals adopted to examine the changes in system unavailability are 21 and 35 days.
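The choice of 21 and 35 days follows directly from applying the 25% tolerance to the 28-day reference interval, as the short Python sketch below shows. To give a feel for the direction of the effect, it also evaluates the textbook approximation for a periodically tested two-state standby component, mean unavailability of about λT/2 for small λT. That approximation and the failure rate used are assumptions for illustration only; they are not the article's multistate model or data.

```python
REFERENCE_STI_DAYS = 28.0
TOLERANCE = 0.25
ILLUSTRATIVE_FAILURE_RATE_PER_HOUR = 1.0e-5   # assumed value


def sti_candidates(reference_days, tolerance):
    """Lower bound, reference value, and upper bound of the test interval."""
    return [reference_days * (1.0 - tolerance),
            reference_days,
            reference_days * (1.0 + tolerance)]


def mean_unavailability(failure_rate_per_hour, sti_days):
    """Two-state approximation lambda * T / 2, valid when lambda * T is small."""
    return failure_rate_per_hour * sti_days * 24.0 / 2.0


stis = sti_candidates(REFERENCE_STI_DAYS, TOLERANCE)
unavailabilities = [mean_unavailability(ILLUSTRATIVE_FAILURE_RATE_PER_HOUR, t) for t in stis]

for days, u in zip(stis, unavailabilities):
    print(f"STI = {days:.0f} days -> approximate mean unavailability {u:.2e}")
```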
The effects of degradation on the system unavailability are also investigated. The degradation rate is varied through the $\beta$ parameter introduced in Component Markov Model. This parameter is the transition rate from a degraded state to a failure state, and it is calculated based on NUREG/CR-6002, as given in the study by Vesely and Rezos (1995), from three quantities: $\lambda_1$, the independent constant failure rate; $r_{AD}$, the degradation ratio, a relative factor whose small values (e.g., $1 < r_{AD} \le 3$) and large values (e.g., $r_{AD} \ge 10$) represent slow and rapid degradation rates, respectively; and $f_{AF}$, the catastrophic failure fraction, which is the fraction of all failures regarded as catastrophic. According to NUREG/CR-6002, a small value of $f_{AF}$ (e.g., $f_{AF} = 0.1$) is selected to represent a small fraction of catastrophic failures not passing through a degraded state (Vesely and Rezos, 1995).

RESULTS
This section presents the results of coupling the MMMs with the FT method. Table 6 shows the system unavailability calculated through the coupling process for the most important functions of the safety systems and for the three maintenance policies. Also, in Table 6, the original unavailability (calculated by system FT using NPP PSA data) is compared with the unavailability from the coupling process for each function. As is shown, the unavailability values computed by coupling MMMs with FT for all maintenance policies and for all functions are higher than the original unavailability. It should be noted that the Markov method, due to its multistate nature, is effective in a realistic evaluation of the component unavailability and consequently the system unavailability. The results are also displayed in Figures 7, 8 to provide a better comparison.
Frontiers in Energy Research | www.frontiersin.org July 2021 | Volume 9 | Article 685634
As is expected, the system unavailability in policy 3 is lower than that of policy 2 and the system unavailability in policy 2 is similarly lower than that of policy 1. This is attributed to the characteristics of the maintenance policies implemented at the component level. In policy 3, after the hidden failure of a component is detected and the failed component enters the repair state, the other components also undergo a surveillance test. If they are found to be in a failure state, they are repaired simultaneously. Hence, failures of the other redundant components are detected sooner than under the other policies, as a result of which the component unavailability and consequently the system unavailability are reduced.
In policy 2, it is supposed that by detecting a failed component and accomplishing relevant repair, other components are subjected to surveillance testing. Therefore, the failure of redundant components is identified sooner than that of policy 1 and consequently its unavailability becomes lower than that of policy 1.
In policy 1 no additional testing procedure is performed on other redundant components. After detecting a failed component during the surveillance testing and subsequent prompt repair, the normal surveillance test schedule for other components is resumed. Hence, the failure of other components remains hidden. Accordingly, in policy 1, the system unavailability is higher than that of other policies.
The effect of STI on the system unavailability is also quantified for STI = 21 and 35 days and compared with the reference value of STI for the VVER-1000 NPPs (see Table 7). As is expected, decreasing the STI from 28 to 21 days reduces the unavailability in all maintenance policies. In fact, with a shorter STI, the component is inspected and tested sooner, so it spends less time in a failure (and unavailable) state. This result is reversed when the STI is increased from 28 to 35 days, in which case the unavailability increases in all maintenance policies. These results are represented graphically in Figures 9, 10, thus providing a better comparison.
Table 8 presents the systems unavailability for both fast and slow degradation rates for selected system functions. This table provides a comparison between STI = 28 days and STI = 10 years. The STI of 10 years is selected to evaluate the effect of degraded components on the systems unavailability. It is shown that at fast degradation, the systems unavailability for all functions increases sharply. Indeed, an increase in the degradation rate simultaneous with an increase in the STI at the component level (equivalent to component aging) leads to growing unavailability. Therefore, considering the degraded state for components under MAs results in a more realistic evaluation of the unavailability at the system level, as expected.
Finally, the FV importance measure, which represents the component contribution to the system failure (Høyland and Rausand, 2004), is calculated for basic events in the three maintenance policies and is compared to the FV value in the base case (i.e., two states for each basic event and no degradation). As a sample, the FV measure computed for the critical components of the "F" function (described in Case I: Emergency Core Cooling System) of the ECCS of the VVER1000 NPP is shown in Table 9. As presented, the FV measure of all MMMs is higher than the base case. Therefore, the importance of components in the coupling process increases compared to the base case. A similar result is obtained by comparing the data of policy 1 and policy 2 to policy 3, respectively. In addition, the coupling of the MMMs with the FT method changes the order of components importance, as can be seen in Table 10.
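For readers unfamiliar with the FV (Fussell-Vesely) measure, the following Python sketch shows one common way to compute it from minimal cut sets: the probability that at least one cut set containing the basic event occurs, divided by the top event probability. Both probabilities are approximated here with the min-cut upper bound. The cut sets, event names, and probabilities are hypothetical and are not taken from Tables 9 or 10.

```python
from math import prod


def union_probability(cut_sets, q):
    """Min-cut upper bound on the probability that at least one cut set occurs."""
    return 1.0 - prod(1.0 - prod(q[e] for e in cs) for cs in cut_sets)


def fussell_vesely(cut_sets, q, event):
    """Share of the top event probability carried by cut sets containing the event."""
    containing = [cs for cs in cut_sets if event in cs]
    return union_probability(containing, q) / union_probability(cut_sets, q)


# Hypothetical cut sets and basic event probabilities (illustrative only).
CUT_SETS = [("PUMP_A", "PUMP_B"), ("PUMP_A", "VALVE"), ("TANK",)]
q_base = {"PUMP_A": 1.0e-2, "PUMP_B": 1.0e-2, "VALVE": 1.0e-3, "TANK": 1.0e-5}
# Assumed multistate inputs: only the pump unavailabilities are raised.
q_markov = {**q_base, "PUMP_A": 2.0e-2, "PUMP_B": 2.0e-2}

fv_base = {e: fussell_vesely(CUT_SETS, q_base, e) for e in q_base}
fv_markov = {e: fussell_vesely(CUT_SETS, q_markov, e) for e in q_markov}

ranking_markov = sorted(fv_markov, key=fv_markov.get, reverse=True)
print(fv_base)
print(fv_markov)
print(ranking_markov)
```

In this toy case the FV values of the pumps rise when their unavailabilities are replaced by larger ones, while the share of the unchanged tank event falls, which illustrates how changing the component-level inputs can shift the relative importance of basic events.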

CONCLUSION
In this study, a risk-based maintenance strategy was adopted for evaluating the effects of maintenance activities on the system risk criterion. Conventional reliability approaches including the FT method can only quantify the negative aspects of maintenance, while the upsides of the maintenance procedure in correcting degradations and preventing failures are generally neglected. This article proposed the Markov maintenance models to integrate the effects of maintenance measures and the components degradation on the system unavailability. By coupling these models with the FT method, the risk criterion was upgraded from the component level to the system level, whereupon the effects of MAs on the system unavailability were assessed. The evaluation was performed by comparing the unavailability calculated by the coupling process with the original unavailability obtained through the system FT using NPP PSA data for several important functions of VVER-1000 reactor safety systems (Table 6 and Figures 7, 8). The comparisons showed that the FT method underestimates the unavailability of the systems. In contrast, the MMM, due to its multistate nature and modeling of the degradation state for components, leads to a more realistic estimate of the unavailability computed at the system level, so that for all maintenance policies and for all functions, the estimates are higher than the original unavailability (Table 6). Also, the obtained results confirmed that due to the characteristics of the maintenance policies implemented at the component level, the system unavailability in policy 3 is lower than that of policy 2 and the system unavailability in policy 2 is similarly lower than that of policy 1 for all functions (Table 6).
The STI effects on the system unavailability were quantified around the reference STI for 1oo4 redundant components of VVER1000/V446 safety systems with an allowable tolerance equaling 25% of the interval (Table 7). As is expected, decreasing the STI from the reference value reduces the unavailability in all maintenance policies and the situation is reversed when the STI increases from the reference value (Table 7).
The effects of the components degradation rate and the STI value on the system unavailability were evaluated as well (Table 8). As is expected, the systems unavailability at fast degradation rises for all functions (Table 8) and the component aging factor increases the risk criterion at the system level. Therefore, modeling the degraded state for components under MAs is an essential step toward a risk-based maintenance optimization.
It is worth mentioning that the Markov models developed in this work can provide researchers with a new tool for evaluating risk-based maintenance measures. Upgrading the risk measures to the plant level, that is, core damage frequency (CDF), is being conducted by the authors. In this regard, it is possible to study different maintenance policies at the plant level from a risk-based point of view and evaluate the effects of alteration in TSs on the CDF. Thus, all systems of VVER1000/V446 NPP will be modeled in the SAPHIRE software and by establishing a link between MMMs, fault trees, and event trees of various initiating events, the CDF for different scenarios will be calculated.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.