Injection-Induced Seismic Risk Management Using Machine Learning Methodology – A Perspective Study

Effective identification of induced seismicity and real-time management of seismic risks are hot topics due to increasing induced seismicity in areas related to energy exploitation. Existing decision-making tool for managing seismic risks, known as the traffic light system, is not robust enough. To meet the increasing needs for safe mining of energy at production sites, finding an advanced and efficient method to improve the traffic light system is essential. In recent years, machine learning, an advanced inductive and analytical method, has been widely used in seismology. In this context, research gaps associated with the identification and management of induced seismicity, as well as the current achievements of machine learning in addressing induced seismicity problems, are reviewed. A basic framework of using machine learning method to optimize the traffic light system in the industrial production process is first proposed. Then, its feasibility and rationality are demonstrated by similar cases. This framework may provide a reference for the development of a risk-based adaptive traffic light management system.


HIGHLIGHTS
-The progress and gaps in injection-induced seismicity are reviewed.
-The concepts and current challenges of traffic light systems are reviewed.
-The main methods of machine learning and their applications in induced seismicity are summarized. -The basic framework for improving the traffic light system using machine learning method is proposed. -The feasibility and rationality of the framework are demonstrated by similar cases.

INTRODUCTION
Earthquakes can cause the sudden release of elastic energy in the Earth. Generally, natural earthquakes are caused by crustal movements, and the majority of strong seismic events in the world are natural earthquakes and have tectonic origin. Unlike natural earthquakes, most induced seismic events are usually related to anthropogenic production activities. The mechanisms that account for induced seismicity include changes in the state of stress, pore pressure, volume, and applied forces or loads (McGarr et al., 2002). The earliest induced earthquake records can be traced back to the 1920s (Pratt and Johnson, 1926), and the most classic case of wastewater reinjection-induced seismicity occurred in Denver, Colorado, in the 1960s (Evans, 1966;Healy et al., 1968;Hsieh and Bredehoeft, 1981).
The potential of energy technologies to induce earthquakes is known (National Research Council, 2013), mainly including wastewater disposal (e.g., Pratt and Johnson, 1926;Davis and Frohlich, 1993;Currie et al., 2018), extraction and injection of natural gas (e.g., Grasso and Wittlinger, 1990;Herber and de Jager, 2014;Zbinden et al., 2017), geothermal energy exploitation (e.g., Majer and Peterson, 2007;Martínez-Garzón et al., 2016;Boyd et al., 2018), hydraulic fracturing (e.g., Rutqvist et al., 2015;Bao and Eaton, 2016;Ghofrani and Atkinson, 2016) and Carbon dioxide (CO 2 ) geological sequestration (Goertz-Allmann et al., 2014, 2017a. Frequently felt earthquakes (3 ≤ M < 4.5) can create negative public sentiments and even hinder the smooth progress of industrial production. For instance, in the Basel Geothermal Field, Switzerland, enhanced geothermal system (EGS) exploitation projects induced a series of locally felt earthquake events. The damage claims in Basel amounted to more than $9 million, and these projects were temporarily suspended due to strong protests from the general public (Giardini, 2009). A similar case occurred in the Groningen gas field, the Netherlands. In the 28 years since the gas was extracted in 1991, there have been approximately 320 earthquakes with M > 1.5, including 3 M≈3.5 earthquakes. These growing seismic events have caused widespread building damage, social concern and political upheaval (Sintubin, 2018;Vlek, 2018Vlek, , 2019a. Moderate magnitude earthquakes (4.5 ≤ M < 6) may threaten public safety and cause heavy economic losses. For instance, in the Changning shale gas block, Sichuan Basin, China, a moderate intensity earthquake (M L 5.7) occurred on 16 December 2018. In this event, 17 persons were injured, more than 390 houses were damaged, and 9 houses collapsed. The direct economic loss reached approximately 50 million CNY (Lei et al., 2019). Another famous induced earthquake is the M W 5.5 Pohang earthquake. The earthquake occurred on 15 November 2017, in Heunghae, Pohang in the North Gyeongsang Province in South Korea (Kim et al., 2018). The earthquake injured 82 people, killed one and left about 1,500 homeless, moreover, it caused serious damage to infrastructure and more than 75 million US dollars . More cases of human-induced seismicity can be found in some reports and papers (e.g., Gibson and Sandiford, 2013;Folger and Tiemann, 2017;Foulger et al., 2018). Therefore, managing the risks of induced seismicity is of great significance for the smooth progress of industrial production and the development of energy technology.
In recent years, due to the fossil energy crisis, more resources from deeper in the Earth are needed, and in turn, the induced seismicity becomes more prominent. Although the traffic light system (TLS) has been used to manage the risk of induced seismicity, there remain two unresolved problems at present: (1) the differences between natural and induced earthquakes cannot be completely distinguished; (2) the evolving risk of deep fluid injection-induced seismicity is still difficult to manage effectively and timely, especially in the post-injection phase after shut-in. To solve these two problems, many different types of data related to injection-induced seismicity are needed, including geological data, seismicity and operational parameters (Yang et al., 2017). Unfortunately, the characteristics of the induced seismicity and their relationships with the production parameters are usually hidden in these large and disorderly data, and we cannot extract effective information from the large amount of data merely by analytical or conventional methods.
The rapid development of computer science provides technical support for big data processing, especially machine learning techniques. Machine learning (ML) was proposed in the computational model theory of neural networks in the 1940s (McCulloch and Pitts, 1943). It is a computer algorithm that can automatically improve itself through experience (Mitchell, 1997), and also a transformation chain from data to decision. ML can use mathematical and statistical methods to determine the intrinsic regulation from various data. To date, it has evolved towards more advanced learning types that are closer to the human brain, such as deep learning (Hinton and Salakhutdinov, 2006), transfer learning (Pan and Yang, 2010) and deep reinforcement learning (Mnih et al., 2015). At the same time, ML has been widely used in many fields, especially in seismology, such as for the identification and prediction of earthquake events (e.g., Asim et al., 2016;DeVries et al., 2018;Lubbers et al., 2018;Rajguru et al., 2018;Corbi et al., 2019) and the classification of seismic remote sensing images (e.g., Afonso et al., 2016;Bialas et al., 2016;Frank et al., 2017), etc. However, the achievements in solving induced seismicity by using MLbased methods are rarely reported, especially in improving traffic light systems.
In this context, we attempt to propose a basic framework for improving current traffic light system that is, using machine learning method to build a proxy model. This model could well reflect the complex relationship between seismicity magnitude and operational parameters. The paper is organized as follows. The first section briefly reviews the research progress of induced earthquakes related to fluid injection, including the explanation of the mechanism, the discrimination of induced seismicity and the concept of the traffic light system. Then, the main methods of machine learning and their research achievements in induced seismicity are summarized and reviewed systematically. Finally, the basic framework of using machine learning method to optimize the traffic light system is elaborated.

REVIEW OF FLUID-INDUCED SEISMICITY
Fault reactivation induced by fluid injection in industrial activity is one of the main causes of seismicity (Ellsworth, 2013), and Figure 1 displays this process. Identified factors affecting fault reactivation mainly include the rate and volume of fluid injection (Nicol et al., 2011), in-situ stress conditions (Rutqvist et al., 2010), type and occurrence of the fault (Macdonald et al., 2012), and physical and mechanical properties of host rocks (Figueiredo et al., 2015). Notably, the most relevant role is played by fault orientation versus dominant stress orientation. In this section, the research progress and gaps in induced seismicity related to industrial activities are reviewed briefly.

Mechanisms of Induced Seismicity
The Coulomb failure criterion based on Mohr-Coulomb theory is commonly used to explain the mechanism of induced seismicity (Healy et al., 1968) and can be defined as: where CFS is Coulomb stress, τ is the shear stress change (MPa), µ is the friction coefficient on the fault, and σ is the effective normal stress change (MPa), which can be calculated according to the effective stress principle (Alcoverro, 2003): where σ is the normal stress (MPa) and P is the pore pressure (MPa). In Eq. (1), when CFS > 0, the fault loses stability and fails; otherwise, it remains stable.
Based on different ways of fluid injection or extraction in industrial production, Doglioni (2018) further summarized four possible mechanisms, that is, fluid removal from a stratigraphic reservoir that can cause normal faults-related earthquakes (also called graviquakes), reinjection quakes, hydrofracturing quakes and load quakes, as shown in Figure 2. For a more detailed explanation of the mechanism of fluid-induced seismicity, please refer to Shapiro (2015).
It is worth noting, however, that the Coulomb failure criterion can only be used to simply evaluate whether a fault is reactivated, and cannot truly reflect the entire physical process of fault failure induced by industrial activities.

Discrimination of Natural and Induced Seismicity
The first step for reducing seismic hazard is to effectively identify induced seismicity among a large number of earthquake events. Davis and Frohlich (1993) first proposed analytical approaches to discriminate between induced and natural earthquakes in industrial activity based on a set of YES or NO criteria. The analytical approaches for differentiation mainly include physicsbased probabilistic models, statistics-based seismicity models and source parameter approaches . Table 1 lists the basic information about these three discrimination methods, including theoretical bases, advantages, disadvantages, and application scopes.

Physics-Based Probabilistic Model
The physics-based model is used to understand the complicated relationship between fluid migration and seismic activity and reveal the physical processes of induced seismicity by using laboratory and in-situ measurements and numerical simulations (e.g., Guglielmi et al., 2015;Zbinden et al., 2017). In recent years, physics-based probabilistic models have been developed, taking into account physical and statistical-stochastic factors (Dahm et al., 2015).
This model can be used to quantify the probability of event rate change induced by stress changes. However, it is not suitable in case of insufficiently detailed data, such as production/exploitation data and rock parameters (e.g., Passarelli et al., 2012;Rinaldi and Nespoli, 2017).

Statistics-Based Seismicity Model
The statistics-based seismicity model directly uses the change in statistical parameters in an earthquake catalog to determine whether the rate of stress change caused by production exceeds the change in natural stress. It is worth mentioning that the changes in these parameters are likely related to human activities. The epidemic-type aftershock sequence (ETAS) model (Hainzl and Ogata, 2005) is often used to identify artificial factors in seismicity event statistics.
The ETAS model is a combination of a constant background activity rate and the behavior of aftershock sequences according to the Omori-Utsu law (Tokuji et al., 1995). The total occurrence rate can be calculated by the sum of Omori's law for aftershocks ν (t) and the background activity rate λ (t) (Lei et al., 2017): where K 0 is a constant, α is a constant related to the magnitude dependence, M i is the magnitude of the i-th earthquake, M c is the estimated cut-off magnitude of completeness,ĉ andp are constants for Omori's law.
Statistics-based methods do not need to consider complex physical constraints, so their advantage is that few input data or detailed models are required.

Source Parameter Approach
The source parameter approach attempts to discriminate between induced and natural earthquakes by searching specific rupture processes. It is generally believed that the hypocentral locations of induced earthquakes are different in the range of human activities. The source mechanism solution obtained by moment tensor (MT) inversion helps in understanding the orientation and type of fault and obtaining the subsurface stress field disturbance caused by fluid injection (Grigoli et al., 2017).
The moment tensor inversion is an important physical quantity describing different types of earthquakes. It consists of double-couple (DC) and non-DC components, where the non-DC includes isotropic (ISO) components and compensated linear vector dipole (CLVD) components (Gilbert, 1971): where MT is the moment tensor, which can be inversed by is Green's function, andũ is the ground motion. When a high proportion of ISO and non-DC components occurs, an earthquake is likely to be induced because natural earthquakes are often characterized by a nearly pure DC source mechanism. This contrast provides an effective means to discriminate induced from natural seismicity .
The moment tensor inversion is usually combined with other methods, such as waveform template matching (Skoumal et al., 2015), source spectra, and stress drop estimation (Clerc et al., 2016;Zhang et al., 2016). However, the source parameter approach requires high-quality data. If the structural model is not sufficiently accurate, a certain error appears in the analog waveform, and the results of waveform inversion may have significant deviations (Jechumtálová and Bulant, 2014).
It is worth noting that these three methods are often used in combination with detailed seismicity source parameters (Dahm et al., 2015). Some research cases and achievements in typical regions are given in Table 2.

Relationship Between Operational Parameters and Seismicity Magnitude
The operational parameters of fluid injection, such as wellhead pressure, total injection volume, injection rate and injection location, are the important factors affecting fault reactivation. Understanding the relation between reservoir engineering operations and corresponding seismic response is important towards the optimization of production and mitigating seismic hazard (Hofmann et al., 2018).
Some scholars have built analytical models based on statistical methods. For instance, Nicol et al. (2011) conducted statistical analysis of water injection data and seismic data from the 30 best reported seismicity sites. They found that the magnitude was positively correlated with the fluid injection, which could be expressed as M = 0.3353 ln V + 1.8061, where M is the earthquake magnitude and V is the water injection  This method not only allows us to understand the type of fault, but also reveals the specific motion of the fault before and after earthquake This method is greatly affected by external uncertain factors such as anisotropic media and data quality, and its robustness is difficult to be guaranteed This method is usually only applicable to some large seismic events of M L > 3 non-DC terms.
volume. McGarr (2014) studied some earthquakes induced by unconventional oil and gas production in the central and eastern United States and found that the maximum magnitude seemed to be proportional to the total volume of the injected fluid but with an upper limit. Under some assumptions, which can be found in section 3 of McGarr (2014), the upper limit magnitude for a given fluid injection activity could be estimated as M 0 (max) = G V, where G is the shear modulus. Galis et al. (2017) analyzed Both M L 5.7 and 5.3 earthquakes were induced by nearby hydraulic fracturing, the pore overpressure to induce M w > 3.5 events was ranged from 0.2 to 5.8 MPa a scaling relation between the largest magnitude M max −arr 0 of self-arrested earthquakes and the injected volume V with an analytical model M max −arr 0 ∝ V 3/ 2 . Numerical simulations are also used to predict the maximum seismic magnitude. Lei et al. (2017) studied the physical mechanisms of several moderate earthquake sequences in the Shangluo shale gas block and its environs (Sichuan Basin, China). They used the "TOUGH-FLAC" simulator to conduct coupled thermal-hydrological-mechanical (THM) analysis on the CFS evolution patterns caused by hydraulic fracturing injection in this area. Based on it, a numerical model containing four layers of different mechanical and hydraulic properties was established, and the critical pressure at the injection wellhead that could cause fault failure was determined. Focusing on the different shale gas basins in North America, Amini and Eberhardt (2019) studied the effect of tectonic stress regime on the magnitude and its distribution of induced seismicity associated with hydraulic fracturing. They used the 3D distinct-element method to simulate the fault slip under different stress regimes, and found that both the shear displacement magnitudes and the stress drop for the thrust and reverse fault case were much larger than those for the normal and strike-slip fault case.
Notably, the prediction accuracy of these analytical models is highly dependent on the input parameters (Eaton and Igonin, 2018), and the relationship between operational parameters and seismicity magnitude may not always be simple or straight. In addition, numerical simulations might have problems with calibration when proper parameters are difficult to select.

Traffic Light System for Safe Production
The traffic light system (TLS) is a powerful decision-making tool to manage industrial activities (Bommer et al., 2006). This system has been widely used to manage a risk associated with operationrelated earthquakes, often using two or more thresholds to mitigate unexpected seismic hazards (Baisch et al., 2019;Wei et al., 2020). In this section, the application and limitations of TLS in induced seismicity are reviewed.

Working Mechanisms and Application Cases
A TLS consists of decision variables (e.g., peak ground acceleration, peak ground velocity, earthquake magnitude and other parameters) and thresholds (Mignan et al., 2017). When the earthquake magnitude or ground shaking exceed the threshold, the alert levels will be automatically activated and relevant actions must be taken (e.g., stopping operations, reducing injection rate or volume) (Braun et al., 2020).
The TLS is usually divided into three alert levels to provide feedback to manage operational measures or actions. Figure 3 shows a workflow of the traffic light system. When the traffic light is green, conditions are normal, and production can continue as planned. When the light becomes orange, it means caution, and the operational parameters should be adjusted. However, if the light turns red, the production must be suspended.
At present, the model of predicting piecewise induced seismic activity rateλ (t, M ≥ M 0 ; θ)and the exceedance probability of risk assessment in TLS are as follows (Mignan et al., 2017): where V(t) is the cumulative injected fluid volume (m 3 ),V (t) is the injection flow rate (m 3 /day), θ a fb , b s , τ r is a set of model parameters describing the underground characteristics, a fb is the activation feedback in m −3 , b s is the earthquake size ratio, τ r is the mean relaxation time in days, M 0 is the minimum cutoff magnitude, t shut−in is the shut-in time (day), and M saf is the given safety magnitude (i.e., threshold magnitude). Notably, Y represents the probability that the magnitude exceeds the threshold. Whenever the magnitude of an induced seismic event exceeds the threshold, Y = 1.
For different industrial activities, criteria for setting-up the TLS may be very different. Baisch et al. (2019) summarized some examples of existing TLS that correspond to different industrial activities taking place in Europe, North America and Australia, as shown in Figure 4. Notably, the wastewater disposal in the Cavone oil field is mainly used to enhance oil (or gas) production and to balance the loss of volume due to the primary production. It is within the production reservoir and is considered quite safe as regards induced seismicity (Styles et al., 2014).
Meanwhile, Baisch et al. (2019) investigated the different effects of TLS on seismic activity caused by fluid injection and gas production based on observation data from 12 fluid-injection operations in geothermal reservoirs and gas production in 26 gas fields. Their research showed that, for an earthquake of a given strength, the effect of TLS on short-term fluid injection induced seismicity was significantly better than that of gas productioninduced seismicity.

Current Limitations
Remarkably, in TLS, the decision variables such as magnitude, PGV, and possibly other parameters are linked each other, and which decision variable to choose depends on how they contribute to the risk evaluation. When the appropriate reference variables for the decision are selected, their corresponding thresholds need to be further determined. Unfortunately, there are some challenges in determining the threshold effectively and timely.
Take the threshold magnitude as an example. Threshold magnitude is the transitional magnitude between different alert levels. The relationship between threshold magnitudes and operational parameters plays a vital role in TLS. However, the current definition of threshold magnitude M saf in Eq. (6) is mainly chosen on the basis of expert judgment and regulation (Grigoli et al., 2017;Mignan et al., 2017), which lacks objectivity and cannot reflect the full range of possible scenarios. Magnitude thresholds enforced by different jurisdictions may vary significantly. It would further result in TLS not being flexible enough to adapt to evolving risks. Therefore, at present, one of the main limitations of classical TLS is that the threshold magnitudes for different stages are difficult to reappraisal accurately and timely, especially in the post-injection phase.
An important lesson about the limitation of classical TLS has recently come from the Pohang earthquake. Notably, Woo et al. (2019) used a variety of methods including the construction of a velocity model, to conduct earthquake detection, the determination of hypocenters, magnitudes, focal mechanisms and stress inversion, and a clustering analysis, to conduct a seismic analysis on the earthquakes occurring around the EGS site in the past 10 years. It is worth noting that all those investigations were important in order to improve the seismological interpretation, which was needed for an effective application of TLS. As part of the EGS project, a clear causal relationship between the origin of the M W 5.5 Pohang earthquake and the injected fluid was discovered. That is, the earthquake initiated on the fault zone that was reactivated by fluid injection, representing a self-sustained rupture process that released a large FIGURE 3 | A complete workflow of the traffic light system. The traffic light system needs to make decisions based on the results of the risk assessment model, which consists of a seismic assessment model and a building damage assessment model. The judgment basis for the former model is magnitude, while that for the latter model is ground motion. amount of energy via tectonic loading rather than being a directly induced earthquake via fluid injection.
In the Pohang EGS project, the monitoring focus of classical TLS is the threshold magnitude. However, Ellsworth et al. (2019) and Lee et al. (2019) found that EGS stimulation could trigger large earthquakes that rupture beyond the stimulated volume, and the assumption that the maximum seismic magnitude was governed by the injection volume might no longer be true. It indicated that in addition to injection volume, more operational parameters should be considered in reappraising the threshold magnitude. Moreover, sufficient attention must be paid to postinjection seismicity after shut-in.
On the issue of post-injection earthquakes, Baisch et al. (2019) pointed out that the performance of TLS depends heavily on the prediction model that accounts for post-injection seismicity, so the post-injection phase must be considered in the design of TLS thresholds. The predicting model in Eq. (5) has the advantages of simplicity and stability, and can also consider the underground feedback in the post-injection phase. However, it assumes that the relationship between injection rate and overpressure is linear, and the fluid diffusion process at the post-injection phase is only represented by an exponential decay, which is different from the actual physical process (Mignan et al., 2017). In addition, the magnitude used in post-injection model does not change with time and cannot meet the real-time design of TLS thresholds in the post-injection phase.
Based on the 2017 Pohang earthquake experience, it is urgent to develop risk-based TLS based on the relationship between EGS and associated stimulus activities to adapt to evolving hazards. To this end, physical and statistical models of induced and triggered seismicity were also needed to further develop, as well as some advanced analyses. Machine learning methods may help us.

MACHINE LEARNING IN INDUCED SEISMICITY
Artificial intelligence (AI) is considered one of the most sophisticated technologies with research prospects and strategic value in the 21st century. The cornerstone of AI is machine learning, a simulation of the learning process of the human brain. ML has become a useful tool to study earthquakes in the field of earth science in recent years. In this section, the basic algorithms of ML and their application to induced seismicity are briefly reviewed.

Basic Algorithms of Machine Learning
According to the types of available datasets, ML can be roughly divided into supervised learning and unsupervised learning (Love, 2002). Supervised learning is the most basic type of ML. The goal of supervised learning is to train data from samples with known labels and finally gain generalization capabilities. Unsupervised learning can automatically find hidden structures in unlabelled data. Some information on supervised and unsupervised learning is introduced as follows.

Supervised Learning
The basic tasks of supervised learning are regression and classification. The algorithm models include but are not limited to linear regression, logistic regression, artificial neural networks (ANNs) and support vector machines (SVMs).
Linear regression is, as the definition implies, a regression algorithm (Franklin, 2005). The basic idea is to use a linear combination of features to approximate the predicted value: where y is a vector of predicted values, x is a matrix of features, ω is a matrix of optimizing coefficients, and b is a matrix of bias.
The optimal values of ω and b can be determined by square loss: where arg min f (x) stands for the argument of the minimum, that is to say, the set of points of the given argument for which the value of the given expression attains its minimum value. Notably, linear regression sometimes leads to over-fitting, and adding a regularization coefficientλ is an effective method to address the over-fitting problem (Cawley and Talbot, 2007;Aghajanyan, 2017).
Logistic regression is actually a classification algorithm that outputs predicted values in the range of 0 to 1, which is suitable for discrete labels. Although the logistic regression model is a non-linear model, it is still based on linear regression theory. The predicted value can be expressed as: where sigmoid (x) = 1 1+e −ar x , and a r is the learning rate.
The ANNs are an abstraction and approximation of the biological nervous system. The purpose of ANNs is to simulate the learning function of biological neural networks (Rumelhart et al., 1986). An ANN is a network in which many logical units are organized according to different layers, and the variable of each layer is the output. Then, the variable acts as the input for the next layer. Figure 5 shows an n-layer neural network. The first layer becomes the input layer, and the last layer is called the output layer. In addition, other layers are called hidden layers. There is an activation unit on each neuron that acts to turn the input into a specific output and a weight between different layers. Furthermore, the ANNs have a powerful non-linear adaptive information processing capability and can adapt sample data (Schmidhuber, 2015).
The core of SVMs is to find the one hyperplane with the largest margin out of many hyperplanes that divide the training set (Cortes and Vapnik, 1995). The mathematical representation is: min where m f is the number of features, φ (x) is the eigenvector after x mapping and f (x) = ω T φ (x) + b is the mathematical representation of the hyperplane.
The final performance of SVMs is directly determined by the kernel function (Crammer and Singer, 2002): where α L is the Lagrange multiplier and κ (x, is the kernel. Commonly used kernel functions are shown in Table 3. The linear function has the advantage of fewer parameters and fast speed but can be used only in the case of linear separability. Although the polynomial function can map a low-dimensional input space to a high-latitude feature space, its multiple parameters may make the computation complicated. In contrast, the radial basis function (RBF) has good performance in both large and small samples and requires fewer parameters than the polynomial function. Therefore, the RBF is widely used in SVMs.

Unsupervised Learning
The major task of unsupervised learning is data clustering. Clustering is one of the data mining techniques in which data are divided into groups of similar and dissimilar objects by using the intrinsic relations between data (Saroj and Kavita, 2016).
K-means clustering is the most basic clustering algorithm. The idea is to measure the similarity between different data objects by using the Euclidean distance. The similarity is inversely proportional to the Euclidean distance between different data objects. If the similarity between the data objects is greater, the Euclidean distance is smaller.
The local optimal solution of the K-means algorithm (y i ) is the minimum value of the sum of distances between all data points and their associated cluster centroids. It can be calculated as: where m t is the total number of samples, k is the total number of clusters, C j is the j-th cluster, u j is the cluster centroid of C j and n j is the number of samples in C j . The workflow of the K-means clustering algorithm is shown in Figure 6.
The K-means algorithm is very efficient when dealing with a large dataset. However, there are typically still two disadvantages: sensitivity to the initial value and ease of falling into a local optimal solution (Hung et al., 2013).
In short, these basic algorithms of ML do not have absolutely clear scopes of application. We should choose a relatively appropriate algorithm according to specific problems.

Applications of Machine Learning in Induced Seismicity
Incorporating novel ML into traditional seismological methods could provide great benefit. Kong et al. (2018) reviewed several current applications of ML in seismology, including earthquake detection and phase picking, real-time earthquake early warning, ground-motion prediction, seismic tomography and earthquake geodesy, etc. However, these applications are mainly focused on natural earthquakes. Since in the induced seismicity context one has to establish a causative connection between the human activities and detected seismicity, studying induced seismicity needs more data than studying natural earthquakes, especially operational data, which reflect the information of industrial activities. Unfortunately, some operational data are hardly available to the public. This is the main obstacle to applying ML to induced seismicity. For this reason, the current application of ML in induced seismicity is mainly focused on the laboratory study of mechanism, while the preliminary field applications are relatively rare.

Laboratory Study of the Mechanism
The seismogenic mechanism of induced seismicity is absolutely vital for managing induced seismicity, and laboratory rock testing is an important means to understand its mechanism. Energy stored in rocks is released in the form of elastic waves, which are called acoustic emissions (AEs), during micro-crack initiation, propagation and connection in the laboratory (Lei et al., 2003;Liu et al., 2019;Yue et al., 2019). The phenomena of micro-seismicity at the field scale and AEs in the laboratory are similar in nature (Sarout et al., 2017), and high-quality microseismic observations can effectively improve the statistical models that forecast expected seismic event magnitudes (Clarke et al., 2019). Therefore, FIGURE 5 | Diagram of an n-layer neural network. In this neural network, i is the number of layers and j is the number of units in each layer. x 1 ,x 2 ,. . .,x m represent input units, while y 1 ,y 2 ,. . .,y k represent output units. a (i) j represents the j-th activation unit in the i-layer, which is responsible for mapping the input of a neuron to the output. (i) represents the weight from the i-th layer to the i+1-th layer, whose function is to adjust the proportion of each input value before entering the activation function. Furthermore, we sometimes add a bias unit to each layer to reduce the error of the output value.

Function name Mathematical expression Instruction
Linear function κ (x, x i ) = x T x i This function is mainly used in the case of linear separability, and its fewer parameters and fast speed are ideal for classification of linear separable data Polynomial function κ (x, x i )= γx T x i + c This function can map a low-dimensional input space to a high-latitude feature space, but its multiple parameters may make the computation complicated This function can realize non-linear mapping. It has good performance in both large and small samples and needs fewer parameters than the polynomial function Sigmoid function κ (x, x i )= tanh ηx T , x i +θ This function is usually used to implement multi-layer neural networks a In the polynomial function, γ > 0, c≥0, and n is the degree of the polynomial. When γ = c = n = 1, we can obtain the linear function. b In the radial basis function,σ is the width of the Gaussian kernel, andσ > 0. c In the sigmoid function, tanh(x) is the hyperbolic tangent function, η > 0 andθ < 0.
the accurate identification and location of rock AE signals in the laboratory is of great significance for the study of induced seismicity. Liu et al. (2015) used wavelet transform and ANNs to study the AE characteristics of different rock specimens during fracture under uniaxial compression. They created an efficient way to identify rock types and noise based on wavelet transform and ANNs. Rouet-Leduc et al. (2017) and Hulbert et al. (2018) applied the ML method to identify AE signals from fault shear experiments and AE signals before both slow and fast slip modes. It was found that ML could not only effectively identify AE signals that were previously considered to represent low-amplitude fault zones but also successfully predict the timing and duration of laboratory earthquakes. Rock fracture and blast events have similar waveform characteristics, and they are always mixed together. To effectively distinguish the AE signals of rocks from environmental noise and other signals, Zhou et al. (2018) proposed a new method based on signal complexity analysis and ML. This method did not seek waveform parameters of detected signals and could classify rock fracture and blast signals automatically based on the self-learning capacity of back propagation neural networks (BPNNs). On this basis, (Zhou et al., 2019a(Zhou et al., ,b, 2020 further proposed an improved joint method based on discrete wavelet transform, modified energy ratio and Akaike information criterion (AIC) pickers, which effectively overcame FIGURE 6 | The workflow of the K-means clustering algorithm. This algorithm needs to select k initial cluster centroids in advance. Then, the positions of cluster centroids are continuously updated according to the similarity between data objects and these positions. When updated cluster centroids no longer change or the objective function converges, the iteration ends, and the final result can be obtained. the interference of spike noise and channel crosstalk and greatly improved the accuracy of automatic method for picking onset time of AE signals.
These results indicate that using ML method can capture more comprehensive and accurate rock fracture information at the laboratory scale, which can help us deepen understanding of the mechanism and further improve the physical and statistical models of induce seismicity.

Preliminary Exploration and Application in the Field
For seismicity induced by EGS exploitation, Holtzman et al. (2018) studied various small earthquake events in the Geysers geothermal field using the ML method, indicating that it could reveal the time-dependent spectral characteristics of seismicity signals and identify changes in the faulting process. In response to a large number of induced earthquakes in Oklahoma since 2009, Hincks et al. (2018) developed an advanced Bayesian network to model joint conditional dependencies among spatial, operational and seismic parameters, indicating that the injection depth was closely related to the release of the seismic moment. The model they proposed proved the feasibility and superiority of using ML to establish the mapping relationship between operational parameters and seismic information.

USING MACHINE LEARNING METHOD TO IMPROVE ADAPTIVE TRAFFIC LIGHT SYSTEM
The importance of TLS for managing induced seismic risks has been illustrated in section "Traffic Light System for Safe Production." One of the ways to improve TLS is to develop a riskbased adaptive TLS. The work of Mignan et al. (2017) has great reference value. They used statistical and actuarial approaches to improve the current prediction model (mentioned in section "Traffic Light System for Safe Production") of induced seismic activity, and further proposed an adaptive traffic light system (called ATLS) that can reflect the mapping relationship between earthquake magnitude and risk.
In this ATLS, the threshold magnitude is selected as the decision variable. They put forward a method to calculate the threshold magnitude so that it can be updated in time with the change of the planned injection scheme.
where M th is the threshold magnitude in ATLS, and the meaning of other parameters can be found in Eq. (5) and Eq. (6). Based on it, they then defined the ATLS as the operational threshold M th at which the injection was stopped in order to meet the safety target, and thus obtained the following system of equations [for the meaning of the symbols, please see Eq. (5) and Eq. (6)]: In Eq. (14), the first equation represents the situation of production safety. The probability that the magnitude exceeds the threshold is much less than 1 (i.e., Y << 1), and M th ≈ M saf can be obtained according to Eq. (13). The second equation represents the first time during the production process that the magnitude exceeds the threshold (i.e., Y = 1). At this point, the injection should be stopped immediately (t = t shut−in , V t shut−in = 0), and M th =M saf can be obtained according to Eq. (13). In particular, m 0 in Eq. (5) can be replaced by M th to consider induced seismic activity in the post-injection phase. The quantitative risk-based ATLS they proposed is a suite of simple closed-form expressions, with the advantages of high transparency and fast execution speed, which can provide the operator with the greatest chance of success. It is important to note, however, that the data set used to generate ATLS is relatively small (only data from 6 underground reservoir stimulation experiments were used for analysis), so its application scope needs further testing, although in principle its adoption would make any project in compliance with the safety threshold whatever the response of the underground.
The present study points the way for developing riskbased adaptive TLS and also illustrates that a large number of operational data are needed to improve TLS. In terms of big data processing, ML method is superior to traditional actuarial approach. Using ML methods to further improve the adaptive TLS has attractive potential.

Basic Framework
In order to solve the shortcomings of the ATLS proposed by Mignan et al. (2017), this section describes a basic framework for further improving the adaptive TLS by using ML method. Through the training and learning of a large number of samples, a proxy model that reflects the complex non-linear relationship between threshold magnitude and operating parameters can be obtained. The implementation process of the basic framework is shown in Figure 7.
The establishment of the proxy model requires sufficient geological, seismic and production data. Geological data include information such as geomechanical and hydraulic parameters, which can be obtained through geological exploration. Seismic data mainly contain waveform information that can be acquired by monitoring at nearby seismic stations. Production data can be divided into engineering data and operational data. The engineering data mainly include drilling information, such as drilling time, drilling position, drilling depth and the number of wells, while the operational data reflect the information about injection operational parameters, such as wellhead pressure, total injection amount, injection speed and injection position.
For a specific study region, the local geological data should be acquired in advance, which include geomechanical and hydraulic parameters. Next, we must collect seismicity wave data and the engineering dataset in this region over a relatively long period of time (e.g., 5 or 10 years) to ensure that the dataset is comparatively complete. It should be noted that the seismic wave data are mixtures of natural and induced earthquake information (i.e., unlabelled), while the engineering dataset contains some previously known drilling information (i.e., labeled). Then, the original data for earthquake events, including hypocentral location, focal mechanism solution and aftershock sequence type analysis, need to be pre-processed to obtain the seismic dataset. The seismic dataset and the engineering dataset are combined, and methods mentioned in section "Discrimination of Natural and Induced Seismicity" can be used to distinguish the induced seismic events, as shown in Figure 7A. There are some relations between induced seismicity data and engineering data. Finally, we can form a training dataset based on the relationship between the induced seismicity data and operational data. These data are taken as prior knowledge for ML. Since both induced seismicity data and operational data are labeled, we can use the supervised learning methods mentioned in section "Supervised Learning" to train this dataset and further fit a proxy model, as shown in Figure 7B.
Additionally, to ensure the generalization ability of the training model, over-fitting should be mitigated to the greatest extent. To improve the algorithms, we can divide the original dataset into three parts: training dataset, validation dataset and testing dataset before training. According to the performance of the training model on the cross-validation dataset and testing dataset, the generalization ability of this proxy model could be roughly determined (Santos et al., 2018).

Similar Cases
In this section, the theoretical feasibility of the basic framework will be demonstrated with some real cases. However, as mentioned in section "Applications of Machine Learning in Induced Seismicity, " there is almost no public report on the use of ML for injection-induced seismic risk management. To overcome this difficulty, we select some similar cases that use ML to manage reservoir-induced seismicity (RIS). At present, the application of ML in reservoir induced earthquake is mainly reflected in the prediction of earthquake magnitude based on reservoir parameters, which is very similar to the real-time evaluation of threshold magnitude in TLS. Samui and Kim (2013) used SVM and Gaussian process regression (GPR) models in ML to predict the magnitude of RIS based on reservoir properties. At first, they collected information from 30 worldwide historical cases of RIS into dataset. These data were taken as prior knowledge for SVM and GPR models. Reservoir depth H R , reservoir capacity V R , reservoir surface S R and earthquake magnitude M were suggested to be main influential factors of RIS. In that study, they used the comprehensive parameter E to characterize the three elements (i.e., H R , V R , S R ) of reservoir size, which could be expressed as E = S R H R V R (Baoqi, 1990). These parameters could well reflect the internal and exterior causes of RIS. Then, to develop the SVM and GPR, they divided the dataset into two types: the training dataset (containing information about 24 reservoirs) and the testing dataset (containing information about 6 reservoirs). The comprehensive parameter E and the reservoir depth H R were used as input variables, while the earthquake magnitude M was output. Finally, through learning process by SVM and GPR models, the complex mapping relationship between RIS magnitude and its influence factors was established, respectively. Thus the magnitude could be controlled by adjusting the input parameters in the relationship.
It is worth noting that the correlation coefficient (R) value is the criterion for determining the success of the learning process. The closer R is to 1, the more accurate the mapping relationship (overlearning needs to be excluded). For SVM model, the penalty coefficientC, the error insensitive zoneε and the radial basis function widthσ should be determined. As for GPR model, the design values that need to be determined are the Gaussian noise and the radial basis function widthσ.
Based on those studies, Samui and Kim (2014) further tested the feasibility of least square support vector machine (LSSVM) and relevance vector machine (RVM) in predicting RIS magnitude and compared the results with ANN and linear regression models. They found that the LSSVM and RVM models were more efficient and effective. Furthermore, Su (2008) also used the GPR model to predict the magnitude of RIS in western China. They considered more impact parameters in learning process, including reservoir depth, tectonic stress, lithologic character, activity background of seismicity, activity of fault, development degree of geology structure plane, connectivity between geology structure plane and reservoir water, and development degree of karst, etc. Their method has been applied to control RIS magnitude at the Three Gorges Reservoir in China. Although these achievements are limited to RIS, it provides ideas and confidence for using ML to manage the risk of injectioninduced seismicity also for other kinds of activities.

Some Statements
(1) Decision variable: Since the basic framework is based on the results of Mignan et al. (2017), it takes the threshold magnitude as the decision variable for induced seismic risk control. If PGV or other parameters are selected as decision variables, they must be converted to the corresponding magnitude.
(2) Application scope: The operational parameters mentioned in the basic framework mainly involve some fluid injection information. Therefore, it is mainly applicable to injection-induced seismicity, including wastewater reinjection, hydraulic fracturing and geothermal energy exploitation, etc. Notably, for different activities, it is necessary to collect corresponding key parameters according to the characteristics of the activities and take these parameters as the prior knowledge for ML training. The combination of ML methods with specific TLS models implemented for different activities is very important.
(3) Post-injection phase: The basic framework can also consider the post-injection phase. To do this, we need to expand the time window to a period of time after shut-in and collect the data. Then, the data before and after shut-in are trained to establish their proxy models, respectively.
(4) Main challenges: The complete collection of relevant data is a major challenge, because some production data is hardly available to the public. In addition, the efficient processing of large amounts of data is also a problem faced by ML.

CONCLUSION
This paper provides an overview of current research progress of induced seismicity related to fluid injection, and proposes a basic framework of using ML method to improve the adaptive TLS. The implementation process of the framework has been described in detail, and its feasibility and rationality have also been demonstrated by some cases related to reservoirinduced seismicity. The proposed framework uses the threshold magnitude as the decision variable and can also consider the post-injection phase after shut-in. Compared with the statistical and actuarial approaches, the proxy model based on ML training with a large amount of data can more comprehensively reflect the complex relationship between operational and seismic parameters. Moreover, it can be applied to various injectioninduced earthquakes by collecting characteristic data of different activities. However, the complete collection of relevant data is the primary challenge.
We hope that this basic framework can provide valuable references for developing the risk-based adaptive TLS models in the future.

AUTHOR CONTRIBUTIONS
QL designed the research project. MH and XL wrote the manuscript. All authors performed the research, analyzed the results, and finally approved the manuscript.