Cyber-physical cascading failure and resilience of power grid: A comprehensive review

Smart grid technologies are based on the integration of the cyber network and the power grid into a cyber-physical power system (CPPS). The increasing cyber-physical interdependencies bring about tremendous opportunities for the modeling, monitoring, control, and protection of power grids, but also create new types of vulnerabilities and failure mechanisms threatening the reliability and resiliency of system operation. A major concern regarding the interdependent networks is the cascading failure (CF), where a small initial disturbance/failure in the network results in a seemingly unexpected large-scale failure. Although there has been a significant volume of recent work in the CF research of CPPS, a comprehensive review remains unavailable. This article aims to fill the gap by providing a systematic literature survey regarding the modeling, analysis, and mitigation of CF in CPPS. The open research questions for further research are also discussed. This article allows researchers to easily understand the state of the art of CF research in CPPS and fosters future work required towards full resolutions to the remaining questions and challenges.


Introduction
Smart grid technologies have been transforming power grid operation and control paradigms in recent decades. Beyond the physical power delivery infrastructure, a smart grid is equipped with smart sensing devices, advanced communication networks, and powerful computing resources for monitoring operating conditions, transferring data, and optimizing resource allocation for the grid, respectively. With the integration of the cyber network, the power grid evolves into the so-called cyber-physical power system (CPPS).
Although the cyber-physical nature of modern power grids is advantageous in many ways, it also poses major challenges to the reliability and resiliency of system operation. In CPPS, the cyber and the physical networks are highly interdependent. As a result, the grid becomes more vulnerable to natural disasters and man-made attacks. In the CPPS, a small malfunction/failure in one network can affect the functionality of the other network, which may in turn affect the former one; this vicious cycle may continue until a cascade of failures occurs with catastrophic consequences. For example, in September 2003, a severe blackout occurred in Italy due to the initial disconnection of one power station from the grid, which then led to the failure of several nodes in the cyber network. As a result, the grid could not be effectively monitored by the cyber network, leading to the failure of additional power stations and transmission lines (Buldyrev et al., 2010). Similarly, a cyber-attack on the Ukrainian power grid caused outages in 2015. The CF effects in the interdependent CPPS accelerated failure propagation in the grid, resulting in a large-scale blackout. A detailed survey on the CF in the power grid can be found in (Haes Alhelou et al., 2019).
Over the past decades, extensive research has been conducted in the field of cascading failure (CF) of the physical power grid (Guo et al., 2017) (Nakarmi et al., 2020). However, those studies are not enough to characterize the CF in CPPS because of the interdependency between the cyber and physical networks. In recent years, several failure propagation models have been proposed for analyzing the CF in interdependent networks. Some of the models incorporate the individual network properties whereas others mainly focus on unifying the mechanisms of failures in different networks (Buldyrev et al., 2010) (Ji et al., 2016). As a typical example of CF in interdependent networks, the study of CF in CPPS has emerged as an important research topic, and many interesting works have been published in recent years. Despite a few related attempts, a comprehensive review of the state-of-the-art techniques of CF modeling, analysis, and mitigation in CPPS remains unavailable to summarize and guide research in this field. Jufri et al. (2019) review the existing research on enhancing the resiliency of CPPS for preventing and mitigating CF, but do not discuss the modeling of failure propagation in CPPS. The interdependencies and CF in general cyber-physical systems (CPS) are reviewed in (Li et al., 2019), but that review does not provide in-depth coverage of the CPPS, especially the unique physical properties of power grids compared with other networks. Liu et al. (2021) provide a concise review of CF modeling and analysis in future power grids from two major perspectives, cyber network integration into the grid and high penetration of power electronics, but it lacks a detailed treatment of CF modeling and mitigation strategies. Guo et al. (2017) present a comprehensive survey on techniques for CF modeling and analysis in physical power grids. Although it discusses the impact of the cyber network, a systematic review from the cyber-physical perspective remains missing.
To fill the aforementioned gap, this article provides a thorough review of CF modeling, analysis, and mitigation in CPPS. First, background research regarding CF in power grids, communication networks, and general interdependent networks is reviewed. Next, the models of various CPPS components adopted by existing CF analyses are categorized. They include the models of power flow in the power grid, the models of data flow (routing) in the communication network, the models of cyber-physical interdependencies, and the models of intra-network and inter-network failure propagation. This is followed by a categorization of the CF analysis methods, including simulation-based methods, percolation-theory-based methods, and Markov-chain-based methods. Subsequently, mitigation strategies for CF in CPPS are summarized, and the linkage between CF mitigation and the concept of resilience is introduced. The paper concludes with an extensive discussion of the remaining challenges to be addressed and potential future research directions. The overall structure of the paper is given in Figure 1. Overall, the article answers many of the common questions regarding CF in CPPS and allows researchers to grasp a holistic picture of the landscape of this increasingly popular research field.
The rest of the paper is organized as follows. Section 2 presents a brief overview of background research on CF analysis in siloed power grids and communication networks and introduces general concepts and theories of CF in interdependent networks; essential concepts and definitions of CF in CPPS are given in Section 3; Section 4 describes in detail the modeling techniques for various components in CPPS for CF analysis; several categories of methodologies for CF analysis are described in Section 5; Section 6 presents CF mitigation strategies as well as their linkage to the concept of resilience; future research questions are summarized and discussed in Section 7. Finally, Section 8 concludes the paper.

Background research: Cascading failure in power grids, communication networks, and general interdependent networks
CF is a prominent phenomenon in many complex infrastructures such as the power grid (Schäfer et al., 2018), water systems (Sitzenfrei et al., 2011), gas systems (Bao et al., 2021), IoT networks (Zhao and Xing, 2020), etc. The main cause of CF is that the components in a complex network rely on and coordinate with each other to fulfill the functionalities of the network. As a result, a malfunction/fault in one component may affect the functionalities of other components and make them fail. This process may start with an insignificant failure in the network but progressively cause many more components to fail, and is therefore referred to as a cascading failure process. At the end of a CF, a large portion of the network may collapse, and the remaining portion may be unable to meet the demand of the operators and the users. The main challenge of CF research is to understand the mechanisms of CF, identify possible failure paths, recognize early failures, and take precautions to avoid a cascading effect. In recent years, the difficulty of modeling and analyzing CF has been exacerbated by the interdependency between multiple complex networks. In view of these challenges, many approaches have been developed for modeling, analysis, and mitigation of the CF of both individual and interdependent networks. While this review primarily focuses on CF in CPPS, this section first provides a brief description of the CF process in siloed power grids and communication networks; the general concepts of CF in interdependent networks are also introduced.

Cascading failure in power grid
In a power grid, CF can be viewed as a series of outages following an initial outage of a component. It can be analyzed with power flow models based on physical laws (e.g., Kirchhoff's laws and Ohm's law) and the capacity constraints of the grid components. In a power grid, electric power is transferred from generators to loads via transmission/distribution lines, i.e., branches. In this process, the power is distributed among different branches according to the power flow model. However, when disturbances occur and cause a branch failure, the power flow is redistributed among the remaining active branches to create a new equilibrium. In this process, the redistributed power may overload (i.e., exceed the flow capacity of) some active branches and/or cause under- or overvoltage at some buses, resulting in further failures in sequence due to the triggering of overcurrent or under- or overvoltage relays (Simpson-Porco et al., 2016). Additionally, due to the increased penetration of renewable energy resources (RES), power grids experience low-inertia conditions that make them prone to frequency instability and CF (Jalali et al., 2019). A number of CF events in power grids have been reported over the past few decades. They were initially triggered by a wide variety of mechanisms such as line overloading, device malfunction, and lack of coordination between operation and planning (Haes Alhelou et al., 2019). As the causes and propagation of CF are complex and diverse, many models and methodologies have been developed to study the CF in power grids. Among them, two main categories are briefly reviewed here: simplified statistical models and detailed physics-based models.
The simplified statistical models render a fast and approximate overview of probable CF paths while neglecting the detailed physical properties of a grid. As a result, these statistical models can simulate CF in a large-scale grid with tractable complexity. For example, the CASCADE model studies the loading effect of grid components on CF with several simplified assumptions about power grid properties (Dobson et al., 2004). CASCADE simulates CF with several iterative failure stages. The model initializes a disturbance and checks the loading conditions of all the grid components. A failure is triggered if an overloaded component exists, and the loads of the failed components are equally distributed among all other components. Then the next stage of failures starts by checking the loading conditions of all the remaining components again. This model shows that the distribution of the failed components follows a quasi-multinomial joint distribution and presents an analytical solution for calculating the probability distribution of the number of failed components due to an initial failure. Similar to the CASCADE model, the Branching Process (BP) model provides an analytical solution for calculating the probability distribution of the number of line outages and the amount of load shedding due to CF by estimating the influence of a failed component on the following stage of failures using stochastic processes (Qi et al., 2013). The CF simulation using the BP model is shown to yield fair approximations of the results achieved using more complex models, but with significantly lower computational complexity.
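The staged procedure described above can be illustrated with a minimal Python sketch. It follows the description in this section (a uniform initial disturbance, then equal redistribution of failed load among the surviving components) rather than the exact formulation of Dobson et al. (2004); the function name `simulate_cascade`, the common `capacity` threshold, and the example loads are assumptions made for illustration only.

```python
def simulate_cascade(loads, capacity, disturbance):
    """Staged overload cascade with equal redistribution of failed load.

    loads: initial load of each component (hypothetical values).
    capacity: common failure threshold (an assumption of this sketch).
    disturbance: extra load added to every component at the start.
    Returns a list of stages; each stage lists the indices that failed.
    """
    load = [x + disturbance for x in loads]
    alive = set(range(len(load)))
    stages = []
    while True:
        # Check the loading condition of every surviving component.
        failed = sorted(i for i in alive if load[i] > capacity)
        if not failed:
            break
        stages.append(failed)
        alive -= set(failed)
        if not alive:
            break
        # Share the load of the failed components equally among survivors.
        share = sum(load[i] for i in failed) / len(alive)
        for i in alive:
            load[i] += share
    return stages


stages = simulate_cascade([0.9, 0.6, 0.3, 0.2], capacity=1.0, disturbance=0.2)
print(stages)  # [[0], [1], [2, 3]]
```

In this toy run a single initial overload takes down all four components in three stages, whereas the same system with no disturbance has no failures at all.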
The main limitation of the simplified statistical models is their limited accuracy. In contrast, the detailed physics-based models study the CF by incorporating the physical properties of power grids. Within this category there are two subcategories of models: static and dynamic. In static models, the power grid dynamics are neglected, and the steady-state operating condition is typically analyzed using DC power flow models. For example, the ORNL-PSERC-Alaska (OPA) model considers the standard DC power flow and solves an integer linear programming (ILP) problem for generation and load redispatch after line outages due to overloading (Carreras et al., 2003). This model runs iteratively to find failed lines based on a probabilistic model with respect to overloading conditions. The static models with DC power flows are adopted in many CF analyses (Soltan et al., 2017) (Yan et al., 2015) and mitigation strategies (Das et al., 2022). In dynamic models, the dynamic behavior of the grid is captured throughout the CF process. To achieve a commensurate level of accuracy, AC power flow models are often adopted (Noebels et al., 2022). For example, the Cascading Outage Simulator with Multiprocess Integration Capabilities (COSMIC) model uses differential equations to represent the dynamics of generators and loads (Song et al., 2016). This model helps understand the CF caused by dynamic grid events such as switching and the disconnection or reconnection of large volumes of load or generation. Both static and dynamic models have benefits and drawbacks. The static models are faster than the dynamic models. However, in practice, the power grid responds to an initial outage with real-time control and protection mechanisms, which may result in a steady state different from the one predicted by static models. Dynamic models can therefore capture more failure mechanisms and predict the propagation of CF more precisely.
Beyond the methods described above, there are several other models for studying the CF in power grids, which can be categorized based on their properties, such as topological models, modified topological models, stochastic simulation models, etc. Specific examples include the multi-timescale quasi-dynamic model (Yao et al., 2016), the improved OPA model (Mei et al., 2009), and the Markov transition model (Wang et al., 2012) (Rahnamay-Naeini et al., 2014). Recently, machine-learning-based data-driven models have also been used to predict the CF propagation path in the power grid (Shuvro et al., 2019) (Pi et al., 2018). Detailed information about these methods can be found in the existing review papers (Guo et al., 2017) (Abedi et al., 2019) (Vaiman et al., 2012) and will not be elaborated here.

Cascading failure in communication network
Communication networks have become an inseparable part of modern society. Each individual infrastructure, e.g., power grid, water system, etc., is equipped with a communication network to allow for situational awareness and control. However, the communication network itself can suffer from a CF, deteriorating the performance of the infrastructures dependent on it. Communication networks transfer data among different devices (data sources and sinks) via links and routers. Routers find paths and forward data via links between sources and destinations. Links and routers have limited data transfer capacities, and they may malfunction when the volume of data flow surpasses their capacities. After the initial failure of a few components (links, routers), data flows are redistributed to the other active components, which may lead to further failures. This process can repeat until a significant portion of the network fails. In this subsection, the methodologies for CF analysis in communication networks are introduced first, followed by various network examples such as wireless sensor networks (WSN) and the internet of things (IoT).
The CF models for communication networks can be categorized into deterministic models and stochastic models (Lehmann and Bernasconi, 2010). In deterministic models, the data load of the failed components is distributed to other active components according to deterministic rules. For instance, Wang and Chen (2008) provide a load redistribution model in which the data load of a failed edge is redistributed to its neighboring edges based on their weights (i.e., flow capacities). Based on this model, the authors investigate the robustness of weighted networks against cascading failure and identify the weights that provide the best robustness in typical communication network models, including small-world and scale-free networks. Wang et al. (2020a), on the other hand, employ a global load redistribution model in which the load in the network is set to the node's betweenness centrality after an initial failure. The authors use this model to identify the network's critical nodes, the failure of which accelerates CF events in the network. However, since the deterministic load redistribution only approximates the load in the network for triggering the next stage of failure due to overload, it may not fully reflect the complete or most probable set of failures that may occur. Unlike deterministic models, stochastic models adopt a more comprehensive analytical approach. For example, Ren et al. (2018) propose a conditional Markov state transition model to describe the failure propagation in a network due to node overloading and also show how the failures are temporally dependent.
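As an illustration of a deterministic, local redistribution rule of the kind described above, the following Python sketch passes the load of a failed edge to the edges sharing an endpoint with it, in proportion to their weights. It is a simplified reading of the idea, not the model of Wang and Chen (2008): the function name `edge_cascade`, the separate `capacities` dictionary, and the four-node example are assumptions made for illustration.

```python
def edge_cascade(weights, loads, capacities, first_failure):
    """Local, weight-proportional load redistribution on edges.

    weights, loads, capacities: dicts keyed by edge tuples (u, v).
    Returns the edges in the order in which they fail.
    """
    load = dict(loads)
    alive = set(load)
    queue = [first_failure]
    failed_order = []
    while queue:
        edge = queue.pop(0)
        if edge not in alive:
            continue
        alive.remove(edge)
        failed_order.append(edge)
        # Neighbouring edges share an endpoint with the failed edge.
        nbrs = sorted(e for e in alive if set(e) & set(edge))
        total_weight = sum(weights[e] for e in nbrs)
        for e in nbrs:
            load[e] += load[edge] * weights[e] / total_weight
            # An edge whose load now exceeds its capacity fails next.
            if load[e] > capacities[e] and e not in queue:
                queue.append(e)
    return failed_order


# Hypothetical chain a-b-c-d with assumed weights, loads, and capacities.
weights = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "d"): 1.0}
loads = {("a", "b"): 1.0, ("b", "c"): 1.5, ("c", "d"): 0.5}
capacities = {("a", "b"): 2.0, ("b", "c"): 2.0, ("c", "d"): 1.2}
print(edge_cascade(weights, loads, capacities, ("a", "b")))
```

Raising the capacity of the middle edge to 3.0 in this example stops the cascade after the first failure, which hints at how capacity (weight) assignment affects robustness.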
Although the general deterministic and stochastic models largely reflect the CF behavior of communication networks, a few adjustments are required to describe the CF in WSN and IoT networks since they have some distinct features and utilize low-capacity components. As the use of WSN and IoT networks has increased significantly in recent years, the study of CF in these networks has received special attention in the literature (Fu et al., 2020) (Xing, 2021). To study the CF in WSN, Hu et al. (2015) set the node traffic to its betweenness centrality and consider traffic overload and invalid connectivity of the nodes as the causes of failures. With these assumptions, they describe a CF model for WSN considering the dynamic load change in the network. Fu et al. (2021) consider both node and link capacities in CF analysis and assume that a node can self-recover after a certain time, as may occur in WSN. With the rise of the internet of things (IoT), many devices are being connected, and it becomes necessary to study the CF of IoT infrastructures. A detailed review of CF analysis and reliability for IoT infrastructures is provided for a wide range of IoT applications in (Xing, 2021). Fu and Yang (2021) consider the layered architecture and realistic characteristics of IoT, and present a CF model driven by overload in relay nodes, base stations, and communication links.

Cascading failure in general interdependent networks
From the discussion of CF in siloed power grids and communication networks in the previous two subsections, it is seen that the network components may fail due to the redistribution of loads of the failed component within the same network. In the case of interdependent networks, the process of CF could be even more complicated. In addition to intra-network failure propagation, inter-network failure propagation may also occur. Unlike intra-network failure propagation, load redistribution does not take place across networks; rather, inter-network failure propagations are caused by the dependence of components in one network on the functionalities of the failed components in the other network. Due to the mutual dependency between the two networks, failures may propagate back and forth between networks and lead to a multi-network CF. There are several general methodologies for characterizing CF propagation in interdependent networks. In this subsection, we will review some of those methods and mention a few specific examples of CF in interdependent networks.
For the general modeling and analysis of CF in interdependent networks, it is challenging to incorporate the detailed physical properties of each network into the study. Rather, generic probabilistic analysis is usually adopted. Some of the methods for interdependent CF analysis are described below.
i) Percolation-theory-based methods. Any complex network can be represented as a graph, where the nodes represent the components of the network and the edges represent the connectivity among the components. Two complex networks can be coupled to create interdependent networks by considering dependency among the nodes of the two networks. As percolation theory provides a probabilistic framework for capturing the interaction among nodes and links in graphs, it is used for the CF analysis in many interdependent networks (Buldyrev et al., 2010).
ii) Markov-chain-based methods. The CF is a sequence of failure events occurring one after another, which can be analyzed using Markov chain models with the assumption that the current failure events depend only on the failure events that immediately precede them (Rahnamay-Naeini and Hayat, 2016).
iii) Branching-process-based methods. The branching process can be used for CF analysis assuming that each failed component in the current stage affects the next-stage failures with a certain probability (Qi et al., 2017).
iv) Machine-learning-based methods. With the advancement of artificial intelligence, there are increasing attempts to use data-driven methods to analyze CF in interdependent networks (Maghsoodi and Khansari, 2021). These methods require a large training dataset for learning the cascading failure paths within and across the networks.
CF can occur in many real-world cases of interdependent networks where two or more networks depend on each other for proper operation. For the electric power grid, the interdependent counterpart can range from water supply systems to natural gas networks and others. In interdependent electric power-water infrastructures, pump stations, control units, and storage tanks in the water network are dependent on power supply from nearby electric substations (Zhang et al., 2016) (Wang et al., 2022). Meanwhile, several types of power plants, such as coal-fired power plants and nuclear power plants, depend on water supply for proper operation. Similarly, in electric power-gas infrastructures, the two networks are coupled through electricity-driven gas compressors and gas-fired electricity generators (Bao et al., 2020). As a result, a malfunction in one network may affect the production process of the other. A catastrophic CF in power-gas infrastructures occurred in Texas, United States in February 2021, affecting millions of people and causing hundreds of billions of capital losses (Extreme winter weather causes U.S. blackouts, 2022) (Busby et al., 2021). Aside from the electric power grid, there are many other interconnected networks of significance. For instance, a syncretic railway network (SRN) comprising a regional railway network and an urban rail transit network is studied in (Liu et al., 2022a). A CF analysis is performed for an interdependent road-channel network to assess urban flood propagation on the road network due to channel failure (overflow), where the channel is responsible for draining the road's rainfall runoff (Dong et al., 2020).

Cascading failure in CPPS
The previous section reviews the background research of CF in power grids and communication networks, and introduces the general concept of CF in interdependent networks. This section will now concentrate on discussing the CF in CPPS. The CPPS model will be defined using graph theory and the CF propagation paths will be discussed in detail.

CPPS model
An abstract diagram of an interdependent CPPS is presented in Figure 2A. The power grid and the cyber network are represented by two separate graphs, each having nodes and edges. Note that the communication network comprises networking devices (e.g., routers) and communication media, whereas the cyber network comprises communication networks, sensors, and controllers. The nodes in the power grid represent the buses at substations and the edges represent branches such as transmission/distribution lines and transformers. The nodes in the power grid are heterogeneous and include generation buses, transmission/distribution buses, and load buses. Similarly, the nodes in the cyber network represent control centers, controllers, sensors, and routers, and the edges represent the data transfer media, i.e., communication links. The terminal devices (sensors/controllers) are installed at different buses in the power grid, which monitor and control the power grid and communicate with the control center via intermediate routers in the cyber network. In Figure 2A, terminal devices are shown in the cyber network and are coupled with routers to enable communication. The interdependence between the two networks is shown in dashed lines. Both unidirectional and bidirectional dependencies are shown, where the terminal devices of the cyber network and the nodes of the power grid have bidirectional dependencies, and routers have unidirectional dependencies on the power grid (Abdelmalak et al., 2022). The reason for this consideration is that the power grid supplies power to both terminal devices and routers, whereas only the terminal devices monitor and control the power nodes.

Failure propagation in interdependent CPPS
The power grid depends on the cyber network for proper operation and control, whereas the cyber network depends on the power grid for energy supply. Sensors monitor the operating conditions of the power grid and report measurements to the control centers using available routes in the cyber network. After analyzing the received data, the control centers send control commands to controllers via available routes. When power grid nodes cannot be monitored and controlled due to the failures of cyber nodes, it is referred to as inter-domain failure in the power grid (Zhang and Yağan, 2020). Similarly, when a cyber node shuts down because it does not receive energy supply from the power grid, it is referred to as inter-domain failure in the cyber network (Zhang and Yağan, 2020). Additionally, after a failure in the power grid, the load of the failed power transfer path is redistributed to the other active paths, leading to further overloading and power outages, which is referred to as intra-domain failure in the power grid. Similarly, a failure in the cyber network may result in additional failures in the network, which is referred to as intra-domain failure in the cyber network. Although the scale of the initial failures may be small, the process progressively triggers a cascade of failures with catastrophic consequences due to both intra-domain and inter-domain failure propagations. The mechanisms of failure propagation are explained with an example shown in Figures 2B-E. For demonstration, consider a fault that occurs at a generator node in the power grid. Due to this fault, the generator cannot supply power to two of its neighboring nodes, which leads to load shedding (Figure 2B). As the loads at two power nodes fail due to the failure in the power grid, it is an intra-domain failure in the power grid. Subsequently, the dependent nodes in the cyber network fail as they lose their power supply, which is referred to as the inter-domain failure in the cyber network (Figure 2C). Then, the failure propagates within the cyber network because some of the cyber nodes become disconnected from the network, which is an intra-domain failure in the cyber network (Figure 2D). This triggers the failure of generators at power nodes due to the lack of monitoring and control from the failed cyber nodes, which is the inter-domain failure in the power grid (Figure 2E). This process continues until the system stabilizes again, i.e., when no new failure is triggered, and the system ends up operating in a significantly degraded state.
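The four propagation mechanisms can be sketched as a fixed-point iteration over two graphs. The Python sketch below is a purely topological illustration, assuming that a power node survives only while connected to a live generator and that a cyber node survives only while connected to a control center; it ignores power flow and routing. All names (`cpps_cascade`, `fed_by`, `controlled_by`) and the small example system are hypothetical and do not reproduce Figure 2.

```python
from collections import deque


def reachable(sources, adj, alive):
    """Return the alive nodes connected to at least one alive source."""
    seen = {s for s in sources if s in alive}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in alive and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def cpps_cascade(power_adj, cyber_adj, generators, control_centers,
                 fed_by, controlled_by, initial_failures):
    """Iterate intra- and inter-domain failures until nothing changes."""
    failed = set(initial_failures)
    p_alive = set(power_adj) - failed
    c_alive = set(cyber_adj) - failed
    while True:
        before = (len(p_alive), len(c_alive))
        # Intra-domain (power): nodes cut off from every generator fail.
        p_alive = reachable(generators, power_adj, p_alive)
        # Inter-domain (cyber): cyber nodes lose their energy supply.
        c_alive = {c for c in c_alive
                   if fed_by.get(c) is None or fed_by[c] in p_alive}
        # Intra-domain (cyber): nodes cut off from the control center fail.
        c_alive = reachable(control_centers, cyber_adj, c_alive)
        # Inter-domain (power): nodes lose monitoring and control.
        p_alive = {p for p in p_alive
                   if controlled_by.get(p) is None or controlled_by[p] in c_alive}
        if (len(p_alive), len(c_alive)) == before:
            return p_alive, c_alive


# Hypothetical system: G1 feeds loads A and B; G2 feeds load C.
power_adj = {"G1": ["A", "B"], "A": ["G1"], "B": ["G1"], "G2": ["C"], "C": ["G2"]}
# Control center CC reaches terminal T through routers R1 and R2.
cyber_adj = {"CC": ["R1"], "R1": ["CC", "R2"], "R2": ["R1", "T"], "T": ["R2"]}
fed_by = {"R1": "A", "R2": "B", "T": "G2"}   # CC assumed to have backup power
controlled_by = {"G2": "T"}

print(cpps_cascade(power_adj, cyber_adj, ["G1", "G2"], ["CC"],
                   fed_by, controlled_by, initial_failures=["G1"]))
```

In this example the loss of G1 removes loads A and B, which powers down both routers, isolates terminal T from the control center, and finally takes G2 and its load offline, so only the control center remains.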

Cascading failure modeling in CPPS
With the advent of the CPPS, the study of dynamic interactions between the power grid and the cyber network has drawn the attention of research communities. With proper modeling, the characteristics of the interdependent systems can be captured, and further analysis can be performed to identify the vulnerabilities and reduce the catastrophic consequences of CFs. There are several important components to be considered in the CF modeling in CPPS. In this section, we will study the CPPS modeling techniques adopted by recent literature on CF analysis. These techniques are summarized in Table 1.

Modeling of the power flow in power grid
When a power branch/bus fails, the topology of the power grid changes, and the load of the failed branch/bus redistributes to the active branches/buses following the power flow model based on physical laws (e.g., Kirchhoff's laws and Ohm's law in AC circuits) and control laws (e.g., automatic generation control and economic dispatch). After the occurrence of the faults, the updated operating condition associated with the new topology can be obtained by power flow analysis. For analyzing CF, power flow models with varying accuracy and computational efficiency are considered. They can be broadly categorized as DC power flow models and AC power flow models (Cetinay et al., 2018). The DC models can approximate the power flow in the system with lower computational complexity when the voltage magnitude differences and phase angle differences along the branches are small. However, they introduce approximation errors in power flow solutions, especially when there are large differences of voltage magnitudes or phase angles between the two terminal buses of a branch, which typically happens under heavy loading conditions (Li et al., 2018a). Some methods adopt DC optimal power flow (DCOPF) to exploit the efficiency of the DC approximation, since optimal power flow problems are inherently computationally intensive (Chen et al., 2019a) (Pan et al., 2020). To obtain accurate operating conditions of the system, AC power flow models are more effective but come at the expense of higher computational complexity (Li et al., 2018a) (Gao et al., 2021). Instead of using the DC or AC models, it is assumed in (Zhang and Yağan, 2020) that the load of the failed branch is redistributed globally and equally among the active lines, arguing that this is a reasonable assumption under the DC power flow model and also follows the long-range nature of Kirchhoff's laws. With these assumptions, Zhang and Yağan (2020) show that the obtained simulation results match the analytical ones derived in the article. Prusty and Jena (2017) thoroughly review the probabilistic load flow models, uncertainty characterizations, and uncertainty handling methods, and propose an analytical model for estimating the probabilistic load flow results while accounting for the photovoltaic generation and load demand uncertainties.
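To make the flow redistribution after a branch outage concrete, the following Python sketch solves the standard DC power flow equations (bus susceptance matrix times phase angles equals net injections, with the slack angle fixed at zero) for a three-bus system before and after one line is removed. The network data, per-unit values, and helper names (`solve`, `dc_power_flow`) are assumptions chosen for illustration; the sketch does not handle islanding, where the reduced matrix becomes singular.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dc_power_flow(n_bus, lines, injections, slack=0):
    """DC power flow.

    lines: dict {(i, j): reactance in p.u.}.
    injections: net injected power per bus (generation minus load, p.u.).
    Returns the flow on each line, positive from bus i to bus j.
    """
    # Build the bus susceptance matrix.
    b = [[0.0] * n_bus for _ in range(n_bus)]
    for (i, j), x in lines.items():
        b[i][i] += 1 / x
        b[j][j] += 1 / x
        b[i][j] -= 1 / x
        b[j][i] -= 1 / x
    # Remove the slack bus and solve for the remaining phase angles.
    keep = [k for k in range(n_bus) if k != slack]
    reduced = [[b[r][c] for c in keep] for r in keep]
    angles = solve(reduced, [injections[k] for k in keep])
    theta = [0.0] * n_bus
    for k, t in zip(keep, angles):
        theta[k] = t
    return {(i, j): (theta[i] - theta[j]) / x for (i, j), x in lines.items()}


# Hypothetical 3-bus system: bus 0 generates 1.5 p.u.; buses 1 and 2 are loads.
lines = {(0, 1): 0.1, (0, 2): 0.1, (1, 2): 0.1}
injections = [1.5, -0.5, -1.0]
base = dc_power_flow(3, lines, injections)

# Outage of line (0, 2): all power must now pass through line (0, 1).
outage = {k: x for k, x in lines.items() if k != (0, 2)}
post = dc_power_flow(3, outage, injections)
print(base[(0, 1)], post[(0, 1)])
```

The flow on line (0, 1) rises from about 0.67 p.u. to 1.5 p.u. after the outage, so a line rated at, say, 1.0 p.u. would become overloaded and could be the next to trip.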

Modeling of intra-domain failure propagation in power grid
When the current in a branch exceeds its rated capacity, relay protection is activated and the branch will fail if no control action is taken within a certain time limit (Kiliçkiran et al., 2018). After the initial failure, the load of the failed branch is redistributed to the other active branches, which may overload them and result in additional failures. The modeling of failure propagation within a power grid can be classified into two categories, namely deterministic and stochastic models. In deterministic models, a branch fails instantly when there is an overcurrent flowing along the branch (Li et al., 2021), and a bus fails when the voltage of the bus exceeds the allowable thresholds (Gao et al., 2021). However, Gao et al. (2020) argue that as the power grid is equipped with an increasing number of renewable resources and controllable loads, the power flow will become more uncertain, and the deterministic model cannot accurately reproduce the failure behavior. Based on this argument, the authors propose a stochastic failure model for estimating the failed components in the grid. In this model, it is assumed that every electrical component can fail with a certain probability in any infinitesimal time interval, and a model is developed for determining the number of failed components at any given time. Besides failures due to overcurrents along branches and over/undervoltages at buses, there are other types of failures such as overheating failure and hidden failure (e.g., malfunction of protective devices) (Li et al., 2021). According to (Cordova-Garcia et al., 2019), the use of automatic active control strategies may lead to more frequent reconfiguration of the grid in order to optimize grid operating conditions, which may induce overheating in the grid and increase intra-domain failure propagation.

Modeling of data flow (routing) in communication network
Proper modeling of routing is an important requirement for the modeling, analysis, and mitigation of CF in CPPS. Because CPPS requires fast and secure communication between the terminal devices (sensors/controllers) installed at power grid nodes and the control center, different routing algorithms have been presented in the literature to fulfill these requirements (Sabbah et al., 2014). The routing algorithms find the minimum-cost paths between sources and destinations, using either global or decentralized information about the network to generate data transfer paths. In global-information-oriented algorithms, the routers or a central controller of a software-defined network (SDN) gather and store the global topology information to determine the optimal paths of data flows, whereas in decentralized-information-oriented algorithms, each router/node finds paths based on the information obtained from its neighboring nodes. Han et al. (2018) determine the routing paths centrally by applying the Floyd-Warshall algorithm to find the overall weighted shortest path between a terminal device and the control center, where the weights are the queue lengths of packets along the path. Li et al. (2021) use the publish-subscribe network (PSN) strategy to create a multicast tree between the control center and the terminal devices that fulfills the delay and bandwidth requirements. Global information is also used in (Cordova-Garcia et al., 2019) to minimize the packet transmission delay, propagation delay, and expected service delay along the path. Gao et al. (2021) consider decentralized information to construct the least-score paths from the source to the destination, where the score is the weighted sum of the queue length of packets at the neighboring nodes of the source and the shortest hop count from the neighboring nodes to the destination. Instead of using the direct score, Cai et al. (2016) use probabilistic score values for selecting the routes.
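As an illustration of centralized, global-information routing, the following minimal sketch applies the Floyd-Warshall all-pairs shortest-path algorithm to a small undirected network. It is not taken from the cited work: the link weights stand in for queue lengths, and the node names, weights, and helper functions are hypothetical.

```python
import math


def floyd_warshall(nodes, edge_weights):
    """All-pairs shortest paths on an undirected weighted graph.

    nodes: list of node ids.
    edge_weights: dict mapping (u, v) to a non-negative link weight
                  (assumed here to represent the queue length on the link).
    Returns (dist, nxt), where nxt[u][v] is the next hop from u towards v.
    """
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for (u, v), w in edge_weights.items():
        for a, b in ((u, v), (v, u)):  # each link is usable in both directions
            if w < dist[a][b]:
                dist[a][b] = w
                nxt[a][b] = b
    for k in nodes:  # allow node k as an intermediate hop
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def reconstruct_path(nxt, src, dst):
    """Follow next-hop pointers from src to dst; returns [] if unreachable."""
    if src == dst:
        return [src]
    if nxt[src][dst] is None:
        return []
    route = [src]
    while src != dst:
        src = nxt[src][dst]
        route.append(src)
    return route


if __name__ == "__main__":
    nodes = ["sensor", "r1", "r2", "cc"]
    queues = {("sensor", "r1"): 5, ("sensor", "r2"): 1,
              ("r1", "r2"): 1, ("r1", "cc"): 1, ("r2", "cc"): 3}
    dist, nxt = floyd_warshall(nodes, queues)
    print(dist["sensor"]["cc"], reconstruct_path(nxt, "sensor", "cc"))
```

In this toy network the least-congested route from the sensor to the control center ("cc") detours through both routers, because the direct links carry longer queues.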

Modeling of intra-domain failure propagation in communication network
When a fault occurs in the communication network, it may propagate within the network and affect its overall performance. The initial failure can occur due to internal faults of the devices, external disasters, or cyber-attacks. After the initial fault, the data flows of the faulty nodes are rerouted to the other active nodes, which may increase congestion in those nodes and overload them. As a result, the active nodes may malfunction and drop data packets (Gao et al., 2020). This process may continue and propagate through the entire network until additional measures are taken. Han et al. (2018) consider the round-trip time (RTT) of the data transfer and assume that the network malfunctions when the data cannot be transferred within the RTT threshold. The intra-domain failure in the network is simplified in (Zhang and Yağan, 2020) with the assumption that a node remains functional as long as it belongs to the largest connected component (the giant component) of the network.

Modeling of interdependencies
In CPPS, the power grid and the cyber network are coupled: the power grid depends on the cyber network for its monitoring and control, and the cyber network depends on the power grid for its energy supply. In the literature, two types of dependency are considered between the two networks, namely unidirectional and bidirectional interdependency. Gao et al. (2021) consider unidirectional interdependency between the power grid and the cyber network, where only the power grid depends on the cyber network for its operation and control, while the cyber network is independent of the grid on the argument that it has backup power sources installed. In the case of bidirectional interdependency, the following interdependencies are considered for analyzing the robustness of CPPS: i) one-to-one interdependency: each grid node is related to one cyber node (Zhang and Yağan, 2020); ii) one-to-multiple interdependency: each grid node is related to multiple cyber nodes or vice versa (e.g., redundant control) (Chen et al., 2018); iii) multiple-to-multiple interdependency: multiple nodes of one network are related to multiple nodes of the other network (Shao et al., 2011). There are some other interdependencies within the above three categories, such as topological-characteristic-based interdependency and random interdependency. In topological-characteristic-based interdependent models, an assortativity coefficient is defined to find nodes with similar topological characteristics (e.g., degree, betweenness) and couple them together (Liu et al., 2022b). To compare the performance of different types of interdependencies, Yagan et al. (2012) define a random interdependency, where the interdependent networks are partitioned into multiple subgroups with an equal number of nodes in each subgroup. Then, each node of an interdependent subgroup is correlated, with a certain probability, with j other nodes (where j ranges from 0 to the number of nodes in the subgroup). Abdelmalak et al. (2022) describe the modeling techniques for CPPS interdependence in detail, considering the power grid as a distributed and autonomous system. They provide evaluation criteria for interdependence modeling and describe potential applications of CPPS interdependence modeling techniques, of which cascading failure analysis is one.

Modeling of inter-domain failure in power grid and communication network
Inter-domain failure occurs when the failure/malfunction of one network affects the other network due to the interdependent nature of CPPS. A common assumption in inter-domain failure research is that a cyber node fails instantly when the corresponding grid node fails and vice versa. However, a few studies assume that a power or cyber node does not necessarily fail instantly when a node in the counterpart system fails, but instead fails with a certain probability (Qu et al., 2019). Gao et al. (2021) define a threshold on the average transmission time in the cyber network and assume that when the data cannot be transferred within the threshold, the control center loses control of the grid nodes with a certain probability. Cordova-Garcia et al. (2019) argue that the control-command transfer from the control center to the grid nodes is asynchronous, i.e., time-varying, which can escalate further failures in the power grid. To reduce the effect of asynchronous control, the authors propose a load-shedding scheme that sheds large loads at the nodes where the control action can be performed with low delay. Cai et al. (2016) use the overcurrent relay operating time as the threshold and assume that an overloaded grid branch will trip when the data-exchanging model requires more time steps for the data transfer than the threshold. They also assume that when a grid node fails, the dependent node in the cyber network fails with a low probability. Li et al. (2021) and Han et al. (2018) use the RTT as a threshold and identify a grid node as failed when the data transfer delay exceeds the threshold. Although most of the above models consider cyber failure propagation to the power grid, failure propagation in the opposite direction has also been modeled. Das et al. (2017) propose an influence model based on a networked Markov chain framework to incorporate power-dependent failure into the cyber network. In this model, the functionality state of a cyber node is modeled by combining its internal state with that of the related grid node. Qu et al. (2019) consider that both power and cyber nodes can fail due to capacity overload, thereby failing the dependent node in the counterpart system. Based on this assumption, the paper presents an optimal load allocation strategy in both the power grid and the cyber network to limit the effects of the CF.
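The delay-threshold rules above can be sketched in a few lines. The following is a hedged illustration rather than the procedure of any cited paper: a grid node whose control data arrives later than a threshold loses control with a given probability. The function name, the delay values, and the threshold are hypothetical.

```python
import random


def nodes_losing_control(delays_ms, threshold_ms, loss_probability, rng=None):
    """Return the grid nodes that lose control because of late data.

    delays_ms: dict mapping a grid node id to its data transfer delay.
    threshold_ms: delay above which control may be lost (assumed value).
    loss_probability: probability that a late node actually loses control;
                      1.0 reproduces the deterministic "fails instantly" rule.
    rng: optional random.Random instance, so that runs can be reproduced.
    """
    rng = rng or random.Random()
    lost = set()
    for node, delay in delays_ms.items():
        # Only nodes whose delay exceeds the threshold are at risk.
        if delay > threshold_ms and rng.random() < loss_probability:
            lost.add(node)
    return lost


if __name__ == "__main__":
    delays = {"bus1": 40, "bus2": 120, "bus3": 95}
    print(nodes_losing_control(delays, 100, 1.0))
    print(nodes_losing_control(delays, 100, 0.5, random.Random(7)))
```

Setting the probability to 1.0 recovers the common deterministic assumption, while values below 1.0 give the probabilistic coupling used in some of the studies discussed above.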

Cascading failure analysis in CPPS
The modeling of different components in the CPPS is described in the previous section. Based on these models, several categories of methods have been proposed in the literature to analyze the CF in CPPS. The objective of CF analysis is to examine the behavior of CPPS, especially the propagation of failures and their consequences, in the event of initial failures due to internal or external disturbances. These studies allow us to understand how the CPPS will respond during any failure event and what measures should be taken to mitigate the impacts based on the identified vulnerabilities. In this section, we categorize the existing methods for CF analysis and describe the concepts and features of each category in detail. A summary of the methods is provided in Table 2.

CF analysis based on deterministic or probabilistic simulation
The most widely used category of analysis in practice is numerical simulation. Simulation-based methods find the paths of CF propagation by computing system operating conditions with pre-determined CPPS models that capture detailed physical and operational properties, e.g., power flow analysis in the power grid and routing analysis in the cyber network. The simulation of the CF process begins with a selected initial failure of devices in either the power grid or the cyber network, and the sequence of subsequent failures is determined using both intra-domain and inter-domain failure propagation models. After the failures have propagated, the final states of the systems are obtained and the consequences are quantified. This process can be repeated many times to find a representative or critical set of failure propagation paths under various operating conditions, initial failure scenarios, or random sampling results. The existing simulation-based methods can be described in several categories. i) Asynchronous-model-based analysis. In this category, failures propagate in either the power grid or the cyber network at each stage. Once the failure propagation in one network is completed, the next stage of failure starts in the other network, incorporating both intra-domain and inter-domain effects. Zhang and Yağan (2020) use the asynchronous model to evaluate the robustness of CPPS against CF initiated by random attacks. Their model covers intra-domain failures in both networks, but only power-dependent failure in the cyber network is included for interdependency. Moreover, it simplifies the power flow model and ignores the data flow model in the cyber network. In contrast, Boyaci et al. (2022) consider AC power flow and use the asynchronous model to estimate the blackout probability triggered by attacks on both grid buses and branches. ii) Deterministic-model-based analysis. In this category, the assumption for CF analysis is that the occurrence of a failure can be deterministically calculated from physical laws. Li et al. (2021) consider deterministic failures in the power grid when the power flow exceeds the branch capacity and study the probability of the load loss ratio in the grid. iii) Stochastic-model-based analysis. Unlike deterministic models, this category incorporates the stochastic nature of the failures into the CF analysis. Gao et al. (2020) consider the stochasticity in power flow due to increased renewable penetration and develop a model for analyzing the CF under uncertain power flow patterns. Simulation-based analysis is the most practical and accurate method for understanding the behaviors of CF in CPPS once the operating conditions, initial failure conditions, and failure propagation models are determined. However, the astronomical number of possible conditions, scenarios, and random samples makes it very challenging to scale the methods to large systems. Therefore, efficient screening of critical cases is a significant challenge to address for the scalability of simulation-based methods.
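The stage-by-stage logic of the asynchronous model can be sketched as follows. This is a simplified topological illustration in the spirit of the interdependent-network model of Buldyrev et al. (2010), not the simulator of any paper cited in this section: it assumes one-to-one coupling, assumes that a node functions only while it belongs to the largest connected component of its own network, and ignores power flow and routing. All names and the example topology are hypothetical.

```python
from collections import deque


def giant_component(adjacency, alive):
    """Largest connected component among the nodes in `alive`."""
    best, seen = set(), set()
    for start in alive:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search restricted to alive nodes
            u = queue.popleft()
            for v in adjacency[u]:
                if v in alive and v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def alternating_cascade(power_adj, cyber_adj, coupling, initial_power_failures):
    """Alternate failure stages between the two networks until nothing changes.

    coupling: dict mapping each power node to its single cyber node.
    Returns the surviving power nodes, surviving cyber nodes, and stage count.
    """
    power_alive = set(power_adj) - set(initial_power_failures)
    cyber_alive = set(cyber_adj)
    stages = 0
    while True:
        stages += 1
        # Intra-domain stage in the power grid.
        power_alive = giant_component(power_adj, power_alive)
        # Inter-domain effect: a cyber node fails with its power node.
        cyber_alive = {coupling[p] for p in power_alive
                       if coupling[p] in cyber_alive}
        # Intra-domain stage in the cyber network.
        new_cyber = giant_component(cyber_adj, cyber_alive)
        # Inter-domain effect in the opposite direction.
        new_power = {p for p in power_alive if coupling[p] in new_cyber}
        if new_power == power_alive and new_cyber == cyber_alive:
            return power_alive, cyber_alive, stages
        power_alive, cyber_alive = new_power, new_cyber


if __name__ == "__main__":
    power_adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
    cyber_adj = {11: [13, 14], 12: [13], 13: [11, 12], 14: [11, 15], 15: [14]}
    coupling = {p: p + 10 for p in power_adj}
    print(alternating_cascade(power_adj, cyber_adj, coupling, [2]))
```

In the example, the loss of one power node first splits the grid, then isolates a cyber node, and finally removes a further power node, so two of the five node pairs survive. A detailed study would replace the connectivity rule with power flow and data flow models.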

CF analysis based on percolation theory
The application of percolation theory in complex networks is a well-established research direction. With the help of statistical physics principles and game theory, percolation theory analyzes the phase transition of a network to determine the giant clusters that appear within the network (Li et al., 2021). In the context of cascading failure, percolation theory is also extensively used to determine the transition threshold above which a catastrophic cascading failure will happen. For a given graph G(v, e), if the failure probability of the edges e is ϕ, then there exists a threshold of ϕ between 0 and 1 at which a giant failed/sustained component appears, and this threshold can be determined using percolation theory. When the failure probability is considered only for the edges, it is called bond percolation, whereas the consideration of failure in nodes is called site percolation (Li et al., 2021). With a defined control threshold of interdependency, Chen et al. (2018) measure the critical point (the initial failure fraction of the entire network) for CF in CPPS using percolation theory and find that both increasing the interdependency and decreasing the control threshold enhance the robustness of the system. In (Huang et al., 2013), the size of the functioning components after CF is calculated in a k-to-n interdependency model (each grid node is controlled by k cyber nodes, and each cyber node controls n grid nodes), and a relationship between robustness and cost is deduced to help determine a tradeoff between the two for building a reliable smart grid infrastructure. Percolation-theory-based methods provide a unique perspective and in-depth insight into the occurrence of CF in CPPS without extensive repeated simulations. However, these methods usually overlook the detailed physical and operational properties of both the power grid and the cyber network during the CF, and it must be carefully examined whether the generic simplified probabilistic model can truly represent the failure propagation patterns in a real-world CPPS (Parandehgheibi et al., 2014).
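The threshold behavior described above can be made concrete for the simplest textbook case. The sketch below assumes a single (not interdependent) Erdős–Rényi random graph with mean degree ⟨k⟩ whose edges fail independently with probability ϕ; the surviving graph then has mean degree (1 - ϕ)⟨k⟩, the giant-component fraction S satisfies S = 1 - exp(-(1 - ϕ)⟨k⟩S), and a giant component exists only for ϕ below 1 - 1/⟨k⟩. The function names are hypothetical, and the thresholds of interdependent CPPS models in the cited papers differ from this single-network result.

```python
import math


def giant_component_fraction(mean_degree, edge_failure_prob, iterations=500):
    """Solve S = 1 - exp(-(1 - phi) * <k> * S) by fixed-point iteration.

    Assumption: bond percolation on a single Erdos-Renyi random graph.
    Starting from S = 1 makes the iteration converge to the largest root.
    """
    effective_degree = (1.0 - edge_failure_prob) * mean_degree
    s = 1.0
    for _ in range(iterations):
        s = 1.0 - math.exp(-effective_degree * s)
    return s


def critical_edge_failure_prob(mean_degree):
    """Edge failure probability above which the giant component vanishes."""
    return max(0.0, 1.0 - 1.0 / mean_degree)


if __name__ == "__main__":
    k = 4.0
    print("critical phi:", critical_edge_failure_prob(k))
    for phi in (0.0, 0.5, 0.7, 0.8):
        print(phi, round(giant_component_fraction(k, phi), 4))
```

Running the loop shows the phase transition: the giant-component fraction shrinks as ϕ grows and collapses to zero once ϕ passes the critical value.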

CF analysis based on Markov chain
The Markov chain (MC) model is a stochastic model used to describe a sequence of linked events, where each event depends only on the immediately preceding event. Rahnamay-Naeini and Hayat (2016) describe an Inter-Dependent Markov Chain (IDMC) model for CF analysis, where two separate MC models are coupled to describe the inter-domain and intra-domain failures in CPPS. It uses two separate probabilities for power-dependent communication failure and communication-dependent power failure, respectively, which can be used to control the level of interdependency between the two networks. It also provides an analytical solution showing that two reliable networks may be combined into an unreliable one because of the interdependency. The MC model is also used to analyze the CF in cyber networks considering both intra-domain and inter-domain failures (Das et al., 2017). The article also considers the repairability of both cyber and power nodes, where a failed node can recover from failure with a certain probability. Based on simulations, it shows the impact of intra-domain and inter-domain failures on cyber networks and the positive impact of adding node repairability.
Similarly, Shuvro et al. (2017) study the impact of cyber failures on power grid reliability based on the MC model. They define a power-cyber interdependence function that captures the influence of cyber failures on the grid based on the hop distance to the control center and the degree centrality of cyber nodes. Based on the simulation results, the key insight of the article is that the cyber dependency of the grid has a significant impact on the probability of blackout. Similar to percolation theory, the Markov chain model provides elegant results for the entire profile of CF propagation paths with guided simulations. However, the simplifying assumptions about the characteristics of the power grid and the cyber network may lead to inaccurate results and miss severe individual failure cases. Furthermore, the assumptions about failure probabilities play a critical role in Markov chain models and must be carefully derived and verified.
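To show how a failure probability and a repair probability enter a Markov chain, the following toy sketch models a single node with two states (up and down). It is far simpler than the IDMC or influence models discussed above and is not taken from them; the probabilities and function names are hypothetical.

```python
def node_chain(fail_prob, repair_prob):
    """Transition matrix of a two-state node: state 0 is up, state 1 is down."""
    return [[1.0 - fail_prob, fail_prob],
            [repair_prob, 1.0 - repair_prob]]


def stationary_distribution(transition, iterations=1000):
    """Approximate the stationary distribution by repeated multiplication.

    transition[i][j] is the probability of moving from state i to state j.
    Assumes the chain is ergodic so that the iteration converges.
    """
    n = len(transition)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        dist = [sum(dist[i] * transition[i][j] for i in range(n))
                for j in range(n)]
    return dist


if __name__ == "__main__":
    # Assumed per-step probabilities: 10 % failure, 40 % repair.
    print(stationary_distribution(node_chain(0.1, 0.4)))
```

For this two-state chain the long-run availability is repair_prob / (fail_prob + repair_prob), so a higher repair probability directly raises the fraction of time the node is functional, which matches the qualitative benefit of repairability reported above.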

Cascading failure mitigation in CPPS
The ultimate goal of modeling and analyzing CF is to prevent its occurrence or mitigate its impact. Before discussing the mitigation strategies against CF, we first introduce the concept of resilience. According to the US National Infrastructure Advisory Council (NIAC), the resilience of infrastructure systems is defined as "their ability to predict, absorb, adapt, and/or quickly recover from a disruptive event such as natural disasters" (Wu and Li, 2021). In the case of CPPS, resilience can be explained by the functionality curve F(t) shown in Figure 3, which can be divided into three stages: preparedness, response, and recovery. In the preparedness stage, the system maintains its normal functionality, F(t0). In the response stage, the system initially manages to sustain its full functionality at F(t0) under certain failures by utilizing redundant components in the system. However, if the failure propagation continues and a CF is triggered, the functionality of the system starts to degrade sharply. As the system stabilizes upon the completion of failure propagation, the functionality has degraded from A to B at time t2, as shown in Figure 3. Points A and B can be used to quantify the vulnerability of CPPS during the response stage. Finally, in the recovery stage, adequate measures are taken to quickly restore the system's functionality to its initial level at time t3. In this section, CF mitigation strategies are described in accordance with the different stages of the resilience curve in Figure 3. The methods are summarized in Table 3.
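The functionality curve F(t) can be condensed into a single number. One commonly used choice, which is an assumption here and is not defined in this article, is the area under F(t) divided by the area that would have been obtained at the normal functionality level. The sketch below computes it from sampled points with the trapezoidal rule; the function name and sample values are hypothetical.

```python
def resilience_index(times, functionality):
    """Normalized area under a sampled functionality curve F(t).

    times: increasing sample times, from the start of the observation window
           to the end of recovery.
    functionality: F(t) at those times; the first value is taken as the
                   normal functionality level.
    Returns 1.0 when functionality never drops below its initial level.
    """
    if len(times) != len(functionality) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    area = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        # Trapezoidal rule on each interval between samples.
        area += 0.5 * (functionality[k] + functionality[k - 1]) * dt
    nominal = functionality[0] * (times[-1] - times[0])
    return area / nominal


if __name__ == "__main__":
    # Hypothetical curve: full service, a drop to 50 %, then full recovery.
    print(resilience_index([0, 1, 2, 3], [1.0, 1.0, 0.5, 1.0]))
```

Under this metric, hardening measures raise the index by limiting the depth of the drop, while response and recovery measures raise it by limiting how far and how long the curve stays below its normal level.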

System hardening
Hardening refers to preplanning measures taken during the design of CPPS so that the system can withstand a certain level of failure without triggering a CF. It takes place in the preparedness stage of the resilience curve in Figure 3 (Ghanbari et al., 2018) (Chen et al., 2019b). In the power grid, branches and generators should have headroom capacity above their normal loads so that when a failure occurs, the active components can handle the redistributed loads (Zhang and Yağan, 2016). Similarly, the cyber network should be designed such that it can handle the additional data flow through its nodes/edges after an initial failure. However, as capacity extension requires substantial capital cost, an optimal search for the critical nodes/edges whose capacity should be enhanced is necessary to mitigate CF (Ghanbari et al., 2018) (Wu et al., 2021a). The topology of the cyber and physical systems also has a great impact on the robustness of the system. For instance, it is found that the scale-free cyber network is more robust than the small-world network against random failures (Li et al., 2021). In (Chen et al., 2019b), the authors study the impact of different topologies on CF and suggest modifications to the topologies of both cyber and physical networks, considering their interdependence, to limit the scale and impact of the CF under different cyber-attack strategies.

Varying the interdependency
As the cyber-physical interdependence heavily affects the paths and properties of failure propagation, the robustness of coupled networks against cascading failures has been studied under different levels of interdependency (Banerjee et al., 2017) (Gao et al., 2012). Several approaches are considered in the literature to increase the robustness of CPPS by varying the interdependency between the power grid and the cyber network (Yagan et al., 2012) (Liu et al., 2022b) (Kong, 2019). One approach is to increase the number of autonomous nodes by decoupling the two networks. In the power grid, FACTS devices such as shunt, series, and unified controllers can be installed to keep the voltage of power nodes and the active and reactive power flows of power branches within a certain range without reliance on the cyber network (Han et al., 2018). Similarly, the dependency of cyber nodes on power nodes can be decreased by installing backup power sources, e.g., uninterruptible power supplies (UPS), on site. Here, effective placement of the FACTS devices and backup power sources is necessary, as discussed in (Kong, 2022). In contrast to the above approach, proper enhancement of cyber-physical interdependency can also make the CPPS robust against CF (Korkali et al., 2017). For instance, both one-to-multiple and multiple-to-multiple interdependencies are more robust against CF than one-to-one interdependency (Yagan et al., 2012). In these models, a cyber node can receive its energy supply from multiple power nodes, and a power node can be equipped with redundant sensors and controllers at multiple cyber nodes. Interdependence based on intra-domain characteristics also impacts the robustness, as shown in (Liu et al., 2022b), where an assortativity coefficient metric is defined based on the intra-domain characteristics to generate different levels of interdependency between the two networks and is used to evaluate the robustness of the system against CF. In (Kong, 2019), an optimal configuration of interdependence is obtained by employing sufficient power-disjoint communication routes for the data transfer. Lastly, it should be pointed out that this category of CF mitigation strategies takes place in both the preparedness stage and the response stage of the resilience curve in Figure 3. On the one hand, the system needs to be sufficiently prepared before a failure occurs, for example by installing FACTS devices and backup power sources, which belongs to the preparedness stage. On the other hand, the installed devices must be properly operated to limit the propagation of failures, which belongs to the response stage. For example, in (Han et al., 2018), the power input from the installed FACTS devices is treated as a constraint in order to achieve an optimal cascade mitigation strategy.

Optimal response and recovery
Following an initial failure in CPPS, response strategies can be adopted to halt the propagation of the failure in the networks, which belongs to the response stage of the resilience curve in Figure 3. The most difficult task in this process is determining the precise location of the initial fault in the networks. Tootaghaj et al. (2019) propose an optimal response strategy that can stop CF even if the fault location is unknown or only partially known. Here, the authors formulate cost flow assignment as a linear programming optimization problem to minimize the total cost of redispatching generation and shedding loads in the power grid. Neglecting the intrinsic properties of the individual networks, Chen et al. (2021) propose a strategy for increasing the robustness of interdependent networks by intentionally removing a few nodes and links after an initial failure. The authors argue that intentional node and link removal can effectively interdict the propagation path of cascading failure in an economical and efficient manner.
Although the response stage improves the robustness of the networks and minimizes the effects of CF, the system performance may still be partially degraded during a cascading failure. A fast recovery plan is necessary to bring the system back to its normal condition, as shown in the recovery stage of the resilience curve in Figure 3. As the recovery resources are often limited, several optimization models have been proposed for the best allocation of the resources to maximize the functionality of the network. Almoghathawi et al. (2019) propose multi-objective restoration models for K interdependent networks using mixed-integer programming (MIP) to maximize the resilience and minimize the cost subject to network flow, interdependence, available resource, and other related constraints. Zhao et al. (2016) propose a multi-stage recovery model using integer linear programming (ILP) and design two algorithms, based on relaxation and bounding of the ILP and on dynamic programming, for solving the problem in large-scale interconnected systems. Wu et al. (2021b) propose an optimization model that incorporates recovery resources, recovery activity execution modes, the precedence of damaged components, and the [...] (2022) assumes cyber-physical dependency and uses Q-learning to find the best sequence combination for recovering failed loads with limited resources. The model incorporates the recovery process within the framework of the cascading failure model. However, the method ignores critical power system operational constraints, such as bus voltage, and requires a large amount of memory with high computational complexity. Overall, although physical power grid restoration has been extensively studied, research on recovery plans that account for the impact of cyber networks, especially the interactions between the two networks in a CF, remains relatively immature.

Future research directions
Although abundant research has been conducted on the CF of siloed power grids, the CF of CPPS is a historically less explored topic that has received rapidly growing attention in recent years. As observed from the literature review, many recent attempts have been made to understand and tackle various challenges regarding CF in CPPS. However, as this is an emerging research topic, many factors still need to be carefully addressed to reach more accurate, efficient, and comprehensive solutions. Based on the literature review, possible future research directions are suggested as follows.
(1) Incorporation of heterogenous components and failure propagation mechanisms.In a CPPS, the power grid and the cyber network are driven by heterogeneous laws: the power flows are driven by circuit laws, and the data flows are driven by router forwarding policies.Furthermore, the components in each network itself, and the interaction mechanisms between components in different networks, are highly heterogenous.
Although the models and analyses of individual networks and components are relatively mature, the question remains how to develop a general theory or methodology that covers all heterogeneous components and mechanisms without unacceptable simplifications that significantly degrades accuracy.In fact, this is a major challenge encountered by the modeling and analysis of CF in any interdependent networks (Wang et al., 2020b).( 2) Creditable modeling of failure probabilities under scarce data.As renewable energy generation and demand-side participation become more prevalent, power grid operating points and dynamics become more uncertain and unpredictable.Therefore, stochastic models and methods are required to accurately portray the profile of possible or probable failure propagation paths.However, it is could be difficult to build trustworthy stochastic models as historical data of failures, especially large-scale CF, is often scarce (Tomsovic et al., 2005).Furthermore, the historical data of one system cannot be safely reused to characterize other systems, as the stochastic properties of the failures are high dependent on system-specific parameters such as geographical locations, network topologies, resource distribution, and operating paradigms.Therefore, future studies should consider how to model and validate the stochastic properties of failures in CPPS with high trustworthiness especially when historical data is scarce (Wu et al., 2021c) (Dobson, 2012).(3) Situational awareness during cyber-physical CF.The situational awareness of the physical power grid is ensured by successful sensing, communication, and computing via the cyber network.
When failures propagate to the cyber network, the situational awareness of the power grid may be degraded, which disables proper and timely decision making and accelerates the cascade of failures (Panteli et al., 2013).The cascading failure of the power grid in the Northeast region in North America in August 2003 was known to be partially attributed to the failure of cyber systems and the lack of situational awareness (Muir and Lopatto, 2004).Furthermore, situational awareness is required not only for the physical power grid but also for the cyber network itself.Hardware failures or cyber attacks must be quickly detected, identified, localized, and isolated to allow effective decision making and prevent wide spreading of failures.Therefore, it is essential to incorporate the factor of situational awareness into CF models and investigate effective measures to prevent the spread of failures under limited situational awareness or to agilely restore situational awareness (Edib et al., 2021) (Edib et al., 2020).Furthermore, the impact of the simultaneous failure of multiple networks, such as the SCADA and PMU networks, is worth further investigation.(4) Dynamic response to prevent failure propagation.As shown in the resilience curve in Figure 3, there are three stages to enhance the resilience of CPPS: preparedness, response, and recovery stages.
The CF mitigation strategies reviewed in Section 6 work mostly in the preparedness and recovery stages.However, targeting the preparedness and recovery stages only is not enough.In the preparedness stage, it is impossible to predict and prepare for all possible scenarios of failures/disturbances due to their astronomical numbers.Meanwhile, although a successful recovery stage is important for shorten the period of outages, it cannot really stop the failure propagation and limit the scale and degree of the performance degradation.Therefore, agile response actions during the CF are the key to preventing failure propagation and limit the consequences of CF.For instance, dynamic reconfiguration of the network, as one of the efficient response actions, has been widely studied for the siloed power grids and can also be explored for the CPPS (Ding et al., 2017) (Pournaras et al., 2013).It takes place during the response stage of the resilience curve, where the system is reconfigured after any failures such that the existing components can take over the responsibility of the failed ones without being overloaded.Additionally, a remedial action scheme (RAS) coordinating actions such as generation tripping, load shedding, or system reconfiguration is used to limit the impact of cascading in the response to contingencies that cannot be constrained with normal protection and control devices (Mahmoudi et al., 2017).Although RAS has been extensively researched for siloed power grids, it is critical to extend RAS methodologies to CPPS. 
(5) Scalability of CF analysis to large-scale systems. The existing simulation-based CF analysis methods, as discussed in Section 5, achieve higher accuracy at the expense of higher complexity because they consider the physical properties of CPPS during the analysis. When the system becomes large, the number of possible scenarios becomes astronomical, and it is impossible to enumerate all of them via simulation. On the other hand, the percolation theory-based CF analysis methods, as discussed in Section 5, do not consider detailed physical properties of the components and can provide scalable analytical solutions for understanding the consequences of CF without enumerating all possible scenarios. However, because they omit detailed physical properties, the latter methods may yield inaccurate results. Therefore, scalable yet accurate methods for CF analysis in CPPS remain a significant gap (Liu et al., 2022c). Note that efficient screening of critical initial contingency (failure) scenarios that may lead to severe CF is a possible solution that has been extensively studied for siloed power grids (Narimani et al., 2022). However, more investigation is required to extend these methodologies to CPPS.
(6) Cyber-physical CF due to malicious cyber attacks. The integration of a cyber network creates a large surface for cyber attacks against the power grid. There is growing research on the modeling of and defense against cyber attacks in power grids (Che et al., 2019). However, most of the existing literature focuses only on hardening strategies implemented in the preparedness stage, or on detection strategies implemented in the response stage (Clark and Zonouz, 2019). There is still a lack of strategies to suppress the impact of attacks in the response stage (e.g., attack isolation and data rerouting) and to recover system performance in the recovery stage (e.g., security upgrade, malware cleaning, and data recovery) (Sahu et al., 2021). These aspects require further investigation in order to establish a holistic framework for handling CF due to malicious cyber attacks.
(7) Use of distributed energy resources and edge computing resources to mitigate CF impacts. The CF phenomenon in CPPS is largely due to the centralized operation paradigm, the interdependency between different components, and the long-distance transfer of energy and data. Local and distributed operation paradigms can effectively reduce the complexity and interdependency of the networks and thus reduce the risk of CF. The decentralization of CPPS operation can be realized by distributed energy resources (DERs) in the power grid and edge computing resources in the cyber network, which can achieve self-sufficiency in local areas without long-distance transfer of energy and data (Maharjan et al., 2015; Li et al., 2018b). For example, in the event of bulk power grid failures, self-healing cyber-physical microgrids can be formed with DERs to maintain the power supply to critical loads (Vu et al., 2020). Similarly, edge computing resources can fulfill many real-time monitoring, optimization, control, and protection requirements without the need for a remote centralized controller (Liu et al., 2019; Gai et al., 2019).
(8) Incorporation of human factors in cyber-physical CF. Although most control and decision-making processes are automated in CPPS, humans are kept in the loop for many high-level applications. During a large-scale CF where pre-computed plans or intelligent real-time decision-making tools are unavailable or insufficient, human intervention plays a critical role in determining the course of the event. In power grids, CF events attributed to or magnified by human errors have been reported in the past (Anderson, 2004). Human errors can arise from a variety of factors, including the external environment, a lack of experience and expertise, and the complexity of the tasks (Bao et al., 2018). Note that failures occurring concurrently in and propagating across cyber and physical networks significantly increase the complexity of operational tasks and hinder human understanding of the situation and of the possible measures to be taken. Although numerous studies have explored the human aspects of CF in physical power grids, there is a lack of studies on the impact of human behavior in interdependent power grids and cyber networks.
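The scalability trade-off raised in item (5) can be made concrete with a small sketch of the percolation-style analysis. It follows the general idea of the interdependent-network model cited in the Introduction (Buldyrev et al., 2010) under strong simplifying assumptions: node i of the power grid and node i of the cyber network depend on each other one-to-one, and a node stays functional only if it remains in the largest connected component of both networks. The function names and the six-node topology are hypothetical, and no power flow or communication physics is modeled, which is exactly the source of both the scalability and the inaccuracy discussed above.

```python
from collections import deque


def largest_component(nodes, edges):
    """Return the largest connected component restricted to `nodes`.

    Ties are broken in favor of the component found first when
    scanning nodes in sorted order.
    """
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    best, seen = set(), set()
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def interdependent_cascade(n, power_edges, cyber_edges, initial_failures):
    """Iterate intra- and inter-domain failures until no node changes state."""
    alive = set(range(n)) - set(initial_failures)
    while True:
        # Intra-domain step in the power grid, then in the cyber network;
        # one-to-one dependency carries each loss across domains.
        after_power = largest_component(alive, power_edges)
        after_cyber = largest_component(after_power, cyber_edges)
        if after_cyber == alive:
            return alive
        alive = after_cyber


power = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
cyber = [(0, 3), (3, 1), (1, 4), (4, 2), (2, 5)]
print(interdependent_cascade(6, power, cyber, initial_failures=[2]))
```

In this hypothetical topology, the single failure of node 2 splits the power grid, the surviving power cluster has no internal cyber links, and the cascade leaves only one functional node. The computation uses only graph connectivity, so it scales well, but it cannot capture overloads or protection actions that a simulation-based method would represent.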

Conclusion
The increasing interdependency between power grids and cyber networks leads to the so-called CPPS with higher heterogeneity and complexity. In recent years, several large-scale CF events have been observed in CPPS, motivating research on the modeling, analysis, and mitigation of CF that considers the cyber-physical nature of smart grids. This paper systematically summarizes the state-of-the-art research on cyber-physical CF in CPPS. It starts with the motivation of the review, followed by background research conducted on siloed power grids and communication networks, as well as on interdependent networks in general. Then, existing techniques for the modeling, analysis, and mitigation of CF in CPPS are categorized, and their linkage with the concept of resilience is discussed. The literature survey portrays the vibrant research efforts on this topic, while also revealing many outstanding questions and challenges to be further addressed. This paper concludes by discussing possible future research directions and recommendations.

FIGURE 2
Failure propagation in interdependent CPPS. (A) Example CPPS model with interdependency. (B) Failure propagation in the power grid due to an initial failure in the power grid, i.e., intra-domain failure in the power grid. (C) Power-dependent cyber network failure, i.e., inter-domain failure in the cyber network. (D) Failure propagation in the cyber network due to its own failure, i.e., intra-domain failure in the cyber network. (E) Cyber-dependent power grid failure, i.e., inter-domain failure in the power grid.

TABLE 1
Summary of CPPS modeling.

TABLE 2
Comparison of cascading failure analyses in CPPS.

TABLE 3
Summary of cascading failure mitigation strategies in CPPS.
..., cost, and timing of recovery resources to achieve optimal recovery in terms of system resiliency. The authors solve the optimization model with a modified simulated annealing algorithm and quantify the model's real-time performance with a CF model. Li et al. (... availability