Empirical properties of inter-cancellation durations in the Chinese stock market

The order cancellation process plays a crucial role in the dynamics of price formation in order-driven stock markets and is important in the construction and validation of computational finance models. Based on the order flow data of 18 liquid stocks traded on the Shenzhen Stock Exchange in 2003, we investigate the empirical statistical properties of inter-cancellation durations, defined as the waiting times between two consecutive cancellations and measured in units of events. The inter-cancellation durations for both buy and sell orders of all the stocks favor a $q$-exponential distribution when the maximum likelihood estimation method is adopted; in contrast, the cancelled buy orders of 6 stocks and the cancelled sell orders of 3 stocks prefer a Weibull distribution when the nonlinear least-squares estimation is used. Applying the detrended fluctuation analysis (DFA), centered detrending moving average (CDMA) and multifractal detrended fluctuation analysis (MF-DFA) methods, we find that the inter-cancellation duration time series possess long memory and a multifractal nature for both buy and sell cancellations of all the stocks. Our findings show that order cancellation processes exhibit long-range correlated bursty behaviors and are thus not Poissonian.


Introduction
In an order-driven market, order submission and cancellation play the most important role in the process of price formation. For the order submission process, many studies have investigated the statistical properties of the ingredients of an order, including order price [1,2,3,4,5,6,7,8], order size or volume [9,10,11,12,13,14,15,16,17,18], order direction [7,19,20,21], and so on. Special attention has been paid to the probability distributions and memory effects of these ingredients, and many stylized facts have been documented.
Order cancellation is the process of removing orders from the limit-order book, which is a queue of limit orders waiting to be executed and is constructed according to price-time priority. If all orders placed at the best ask or best bid are cancelled, the mid-price, defined as the mean value of the best ask and best bid, will change. If cancellation occurs inside the limit-order book, it changes the structure of the book and has a potential impact on price fluctuations.
The motivation for order cancellation is related to the non-execution (NE) risk or the free option (FO) risk [22,23], and the former is the major reason for cancelling limit orders [23]. NE risk arises when the current security price moves away from the submitted price. Orders submitted at the front of the limit-order book cannot be transacted immediately, which makes the traders suffer an opportunity cost. Traders may cancel the stale orders and resubmit more aggressive ones to increase the transaction probability. So in order to reduce NE risk, buy traders potentially drive the security price up, and sell traders drive the price down. FO risk arises when important news arrives. The intrinsic value of the asset will be underestimated (for good news) or overestimated (for bad news) by the current price. To avoid being traded at an unfavorable price, traders will cancel their limit orders and resubmit less aggressive ones. So conversely, in order to reduce FO risk, buy traders potentially drive the price down, and sell traders drive the price up.
Since cancellation data were rarely recorded in the past, only a few studies have investigated the empirical regularities of order cancellation. With the development of information technology and computer science, it has become possible to record order flow data, which enables us to analyze the statistical properties of order cancellation and construct cancellation models. Ni et al. investigated the empirical regularities of inter-cancellation durations of 22 stocks in the Chinese stock market, and concluded that order cancellation is a non-Poisson process [24]. Liu presented a simple model of order revision and cancellation, and found that the frequency of order cancellation is positively related to order submission risk and stock capitalization, but negatively related to the bid-ask spread [25]. In an order-driven model, Daniels et al. assumed that order cancellation follows a Poisson process, and the model gives powerful predictions of stylized facts such as price diffusion and price impact [26]. In the empirical model proposed by Mike and Farmer, the order cancellation process is determined by three independent factors: the position in the order book relative to the opposite best price, the imbalance of buy and sell orders in the limit-order book, and the total number of orders stored in the limit-order book. This cancellation model gives an excellent prediction for the lifetime of cancelled orders [7].
In financial markets, a widely studied subject is the recurrence interval, defined as the waiting time between two consecutive events. Many scholars have analyzed the probability distribution of recurrence intervals of different variables such as returns, volatilities and trading volumes. However, the results are controversial. Power-law distributions [27,28,29,30] and stretched exponential distributions [31,32,33,34,35,36,37,38,39,40,41,42] are the main candidates used to fit the probability density function (PDF) in different financial markets. Other distributions have also been proposed as complements [43,44]. Interestingly, recurrence interval time series usually possess long memory [28,30,31,32,34,38,39,41,42,45,46]. Recurrence interval analysis has also been applied to other fields, such as the energy dissipation rate in three-dimensional fully developed turbulence [47].
The goodness of fit of the Weibull distribution and the q-exponential distribution has been assessed for the distribution of intertrade durations. Jiang et al. studied the limit order data of 18 liquid stocks listed in the Chinese stock market, and showed that the Weibull distribution gives a better fit than the q-exponential distribution with the maximum likelihood estimation method, while the q-exponential distribution outperforms the Weibull distribution with the nonlinear least-squares estimation method [62]. Politi and Scalas analyzed the tick-by-tick data set of DJIA stocks traded at NYSE in 1999, and found that the q-exponential distribution compares well to the Weibull distribution [63].
In this paper, we study the statistical properties of inter-cancellation durations in event time for both cancelled buy and sell orders of 18 stocks listed on the Shenzhen Stock Exchange. The rest of the paper is organized as follows. We first study the probability distributions of the inter-cancellation durations based on the maximum likelihood estimation and nonlinear least-squares estimation methods. We then discuss the memory effect and the multifractal nature.

Dataset
Our analysis is based on the order flow data of 18 liquid stocks traded on the Shenzhen Stock Exchange in 2003. In 2003, a trading day consisted of three periods: the opening call auction, the cool period and the continuous double auction. The opening call auction is held from 9:15 a.m. to 9:25 a.m. and refers to the one-time centralized matching of the buy and sell orders accepted during that period, which generates the opening price at 9:25 a.m. Following the opening call auction, the cool period is held from 9:25 a.m. to 9:30 a.m., during which the Exchange accepts orders routed from investors but does not process orders or cancellations. The main trading period is the continuous auction (9:30 a.m. - 11:30 a.m. and 1:00 p.m. - 3:00 p.m.), which refers to the continuous matching of buy and sell orders on a one-by-one basis. Our database records ultra-high-frequency order flow data whose time stamps are accurate to 0.01 s. It contains the details of order placement and order cancellation. For example, the stock Ping An Bank Co., Ltd. (000001) has 3,925,832 records in the whole year of 2003. We can therefore rebuild the limit-order book from this complete database according to the trading rules. Such databases are difficult to obtain: we only have the data of 23 stocks for the whole year of 2003, among which 5 stocks have erroneous records of order cancellation, so we select the remaining 18 stocks to study the statistical properties of order cancellation. The 18 stocks analyzed cover 9 CSRC (China Securities Regulatory Commission) industries, such as finance and insurance, real estate, transportation and machinery, to list a few. In addition, the Chinese stock index first went up and then fell in 2003, so both bull and bear market phases are present in the sample. The database we study is therefore broadly representative of the Chinese market.
The paper covers not only the cancellation data in the continuous auction, but also the cancellation data in the opening call auction and the cool period. We count the number of cancellations $N_C$ for both cancelled buy and sell orders of each stock, and then calculate the ratio r of $N_C$ to the number of all orders $N_A$ (including both submitted orders and cancelled orders). The results are listed in Table 1. We find that the ratio r fluctuates within a wide range, being [0.097, 0.195] for cancelled buy orders and [0.087, 0.189] for cancelled sell orders. An interesting feature is that the ratio for cancelled buy orders is close to that for cancelled sell orders for each stock, which implies that a stock with a large proportion of cancelled buy orders also has a large proportion of cancelled sell orders, and vice versa.
We further investigate the ratios r in each trading day for all the stocks. Figure 1 presents the linear relations between the daily values of $N_C$ and $N_A$ for both cancelled buy and sell orders of two stocks. The slopes γ of the fitted lines are calculated using the least-squares fitting method, and the values for the 18 stocks are listed in Table 1. The value of γ is close to the value of r for each stock, which implies that the cancellation ratio is nearly the same from one trading day to the next for both cancelled buy and sell orders of each stock.
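The slope fit described above can be sketched as follows. This is a minimal illustration and not the authors' code: the daily counts are synthetic placeholders, the function name `cancellation_slope` is hypothetical, and fitting with a free intercept is an assumption, since the text does not say whether the line is forced through the origin.

```python
import numpy as np

def cancellation_slope(n_all, n_cancel):
    """Ordinary least-squares slope and intercept of daily N_C against daily N_A."""
    slope, intercept = np.polyfit(n_all, n_cancel, deg=1)
    return slope, intercept

# Synthetic daily counts, for illustration only (not data from the paper).
rng = np.random.default_rng(0)
n_all = rng.integers(2000, 20000, size=240).astype(float)    # hypothetical daily N_A
n_cancel = 0.15 * n_all + rng.normal(0.0, 50.0, size=240)    # hypothetical daily N_C

gamma, intercept = cancellation_slope(n_all, n_cancel)
r = n_cancel.sum() / n_all.sum()    # cancellation ratio over the whole sample
print(f"gamma = {gamma:.3f}, r = {r:.3f}")
```

When the daily ratio is stable, as in this synthetic example, the fitted slope and the whole-sample ratio nearly coincide, which is the pattern the text reports for γ and r.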
In the paper, the inter-cancellation duration is defined as the interval between two consecutive cancellations in units of events, which reads
$$d(i) = t(i) - t(i-1),$$
where t(i) is the event time when the i-th cancellation takes place. The inter-cancellation duration d(i) is thus the number of orders (including both buy orders and sell orders) submitted between the (i − 1)-th cancellation and the i-th cancellation. We calculate the average values $\langle d \rangle$ of the inter-cancellation durations for both cancelled buy and sell orders of each stock, and list the results in Table 1. According to the definition of the inter-cancellation duration, we easily obtain the relation $r > 1/\langle d \rangle$, which is confirmed by the data listed in the table. The reason is that the ratio r is defined with respect to one kind of orders (buy orders or sell orders), while the inter-cancellation duration d counts both buy and sell orders submitted between two successive cancellations.
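As a small illustration of this definition, the sketch below extracts event-time durations from a flagged event stream. It rests on an assumption the text does not settle: every record in the stream (submission or cancellation, buy or sell) advances the event clock by one. The function name and the toy stream are hypothetical.

```python
import numpy as np

def inter_cancellation_durations(is_cancellation):
    """Return d(i) = t(i) - t(i-1) in event time.

    `is_cancellation[k]` is True when the k-th record of the event stream is a
    cancellation of the side under study (buy or sell). Assumption: every
    record, submission or cancellation, advances the event clock by one.
    """
    t = np.flatnonzero(np.asarray(is_cancellation, dtype=bool))   # event times t(i)
    return np.diff(t)

# Toy event stream (hypothetical): 1 marks a cancellation, 0 any other record.
flags = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
d = inter_cancellation_durations(flags)   # cancellations at t = 1, 4, 5, 9
mean_d = d.mean()
```

Here the cancellations sit at event times 1, 4, 5 and 9, so the durations are 3, 1 and 4.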

Probability distribution
The probability distributions of financial variables have crucial implications for asset pricing and risk management. In this section, we investigate the probability distributions of inter-cancellation durations for both cancelled buy and sell orders of the 18 stocks. The probability density functions (PDFs) P(d) of four randomly chosen stocks are presented in Figure 2(a). According to the empirical results reported in the literature and the curve shapes presented in Figure 2, we fit the distributions by Weibull distributions [60,61,62,24] and q-exponential distributions [62,63,24,66,67]. The Weibull distribution is characterized by a scale parameter a and a shape parameter b, and the q-exponential distribution by a scale parameter κ and a shape parameter q. The maximum likelihood estimation (MLE) method is first applied to estimate the parameters of the Weibull and q-exponential distributions. The MLE method mainly captures the bulk of the data being fitted. We find that durations in the range d ≤ 10 account for 64.9% (63.7%) of the data for cancelled buy (sell) orders. The parameters a and b of the Weibull distribution and the parameters κ and q of the q-exponential distribution estimated with the MLE method are listed in the left panel of Table 2.
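The MLE comparison can be sketched as below. The functional forms are assumptions, because the paper's display equations are not available here: the Weibull density is taken in the standard scale-shape form $(b/a)(d/a)^{b-1}\exp[-(d/a)^b]$ and the q-exponential density as $\kappa[1+(q-1)\kappa d]^{-q/(q-1)}$ with $q > 1$, which is one common convention among several. The data are synthetic, the function names are hypothetical, and the two fits are compared here by log-likelihood, not by the χ statistic used in the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def qexp_logpdf(d, kappa, q):
    """Log-density of the assumed q-exponential form (requires q > 1)."""
    return np.log(kappa) - q / (q - 1.0) * np.log1p((q - 1.0) * kappa * d)

def fit_qexp_mle(d):
    """Maximize the q-exponential log-likelihood over (kappa, q)."""
    def nll(p):
        kappa, q = p
        if kappa <= 0.0 or q <= 1.0:
            return np.inf                      # reject invalid parameters
        return -qexp_logpdf(d, kappa, q).sum()
    res = minimize(nll, x0=[1.0 / d.mean(), 1.5], method="Nelder-Mead")
    kappa, q = res.x
    return kappa, q, -res.fun

def fit_weibull_mle(d):
    """Weibull MLE with location fixed at 0; returns scale a, shape b, log-likelihood."""
    b, _, a = weibull_min.fit(d, floc=0)
    return a, b, weibull_min.logpdf(d, b, loc=0, scale=a).sum()

# Synthetic q-exponential sample via inverse-CDF sampling (illustration only).
rng = np.random.default_rng(1)
kappa_true, q_true = 0.5, 1.3
u = 1.0 - rng.uniform(size=20000)              # uniform on (0, 1]
sample = (u ** (-(q_true - 1.0)) - 1.0) / ((q_true - 1.0) * kappa_true)

kappa_hat, q_hat, ll_qexp = fit_qexp_mle(sample)
a_hat, b_hat, ll_weibull = fit_weibull_mle(sample)
print(f"q-exp: kappa={kappa_hat:.3f}, q={q_hat:.3f}; Weibull: a={a_hat:.3f}, b={b_hat:.3f}")
```

Because both models have two parameters, their maximized log-likelihoods can be compared directly without a penalty for model size.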
Since the Weibull distribution has the same number of parameters as the q-exponential distribution, the root mean square χ of the difference between the best fit and the empirical data is used to compare the performance of the two distributions, and it is presented in Table 2 as well. The values $\chi_{qE}$ of the q-exponential distribution are smaller than the values $\chi_{\mathrm{WBL}}$ of the Weibull distribution ($\chi_{qE} < \chi_{\mathrm{WBL}}$). So we conclude that the q-exponential distribution outperforms the Weibull distribution for the cancelled buy orders of each stock with the MLE method.

Table 2: Parameters of the Weibull and q-exponential distributions based on the MLE and NLSE methods for cancelled buy orders of 18 stocks. χ is the root mean square of the difference between the best fit and the empirical data.

In order to capture the tail behavior of the distribution, we then use the nonlinear least-squares estimation (NLSE) method to fit the distribution of cancelled buy orders, and the parameters of the Weibull and q-exponential distributions are listed in the right panel of Table 2. The parameters a and b calculated with the NLSE method are all smaller than those obtained with the MLE method for the 18 stocks. For the parameter κ, 3 of the 18 stocks have larger values with the NLSE method, while for the parameter q, 12 stocks have larger values with the NLSE method. We again use the root mean square χ to compare the performance of the two distributions with the NLSE method. According to the values of χ listed in Table 2, the result differs from that of the MLE method: 6 stocks prefer the Weibull distribution, and the remaining 12 stocks are better fitted by the q-exponential distribution. The mean values of the four parameters for cancelled buy orders are also presented in the last row of Table 2. The mean values of the four parameters obtained with the MLE method are larger than those obtained with the NLSE method.
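A least-squares fit of this kind can be sketched as follows. Several choices here are assumptions that the text does not confirm: the empirical density is log-binned, the residuals are taken between log-densities so that the tail carries weight, χ is evaluated in linear space, and the q-exponential density takes the form $\kappa[1+(q-1)\kappa d]^{-q/(q-1)}$. The helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def qexp_logpdf(d, kappa, q):
    """Log-density of the assumed q-exponential form (requires q > 1)."""
    return np.log(kappa) - q / (q - 1.0) * np.log1p((q - 1.0) * kappa * d)

def empirical_pdf(durations, n_bins=30):
    """Log-binned empirical density of positive durations; empty bins are dropped."""
    d = np.asarray(durations, dtype=float)
    edges = np.logspace(np.log10(0.99 * d.min()), np.log10(1.01 * d.max()), n_bins + 1)
    counts, _ = np.histogram(d, bins=edges)
    centers = np.sqrt(edges[:-1] * edges[1:])          # geometric bin centers
    pdf = counts / (counts.sum() * np.diff(edges))
    keep = counts > 0
    return centers[keep], pdf[keep]

def nlse_fit(x, pdf, logpdf_model, p0, bounds):
    """Least-squares fit in log space, plus the RMS difference chi in linear space."""
    popt, _ = curve_fit(logpdf_model, x, np.log(pdf), p0=p0, bounds=bounds)
    chi = np.sqrt(np.mean((np.exp(logpdf_model(x, *popt)) - pdf) ** 2))
    return popt, chi

# Noise-free check: fit exact q-exponential density values (illustration only).
x = np.logspace(0, 3, 40)
pdf = np.exp(qexp_logpdf(x, 0.5, 1.3))
(kappa_hat, q_hat), chi = nlse_fit(
    x, pdf, qexp_logpdf, p0=(1.0, 1.5), bounds=([1e-6, 1.001], [np.inf, 3.0])
)
```

For real data, `empirical_pdf` would supply `x` and `pdf`, and the same `nlse_fit` call with a Weibull log-density would give the second χ value for comparison.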
With the same procedure, we then analyze the probability distribution of cancelled sell orders with the MLE and NLSE methods, and obtain similar results. The parameters a and b of the Weibull distribution and the parameters κ and q of the q-exponential distribution are listed in Table 3. For the cancelled sell orders, the relation $\chi_{qE} < \chi_{\mathrm{WBL}}$ is satisfied for each stock when using the MLE method, which indicates that the q-exponential distribution is preferred over the Weibull distribution with the MLE method. However, 3 of the 18 stocks have smaller values of $\chi_{\mathrm{WBL}}$ and prefer the Weibull distribution with the NLSE method.

Table 3: Parameters of the Weibull and q-exponential distributions based on the MLE and NLSE methods for cancelled sell orders of 18 stocks. χ is the root mean square of the difference between the best fit and the empirical data.

As for the cancelled buy orders, the parameters a and b of the Weibull distribution calculated with the NLSE method are smaller than those obtained with the MLE method for the cancelled sell orders of each stock. However, for the q-exponential distribution, 2 stocks (000488 and 000720) have larger values of κ and smaller values of q with the NLSE method. The mean values of the four parameters for the cancelled sell orders are presented in the last row of Table 3 as well. The mean values of the parameters obtained with the MLE method are larger than those obtained with the NLSE method, except for the parameter q.
We rescale the inter-cancellation duration d to $d/\langle d \rangle$ and the probability density function P(d) to $P(d)\langle d \rangle$, where $\langle d \rangle$ is the mean value of the inter-cancellation durations d. The rescaled PDFs of inter-cancellation durations for the same four stocks are presented in Figure 2(b). We find that the four rescaled curves collapse onto each other, showing excellent scaling behavior. Given this scaling, we aggregate the inter-cancellation durations of all 18 stocks and treat them as an ensemble to obtain better statistics. The rescaled PDFs of the ensemble durations for both cancelled buy and sell orders are shown in Figure 3.
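The rescaling and pooling step can be sketched as below. The density of the rescaled variable $d/\langle d \rangle$ is automatically $P(d)\langle d \rangle$, so only the durations need to be divided by their mean. The stock labels and data are synthetic placeholders, and the function names are hypothetical.

```python
import numpy as np

def rescale(durations):
    """Divide a duration series by its own mean, giving d / <d>."""
    d = np.asarray(durations, dtype=float)
    return d / d.mean()

def pooled_ensemble(series_by_stock):
    """Concatenate the rescaled series of all stocks into one ensemble sample."""
    return np.concatenate([rescale(d) for d in series_by_stock.values()])

# Synthetic series with different mean durations (illustration only).
rng = np.random.default_rng(2)
series_by_stock = {
    "stock_A": rng.exponential(5.0, size=4000),
    "stock_B": rng.exponential(12.0, size=6000),
}
ensemble = pooled_ensemble(series_by_stock)
```

Each rescaled series has unit mean, so the pooled ensemble also has unit mean regardless of how many durations each stock contributes.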
The parameters of the Weibull and q-exponential distributions for the ensemble inter-cancellation durations are calculated and compared with the values in Table 2 and Table 3, which confirms that the scaling behavior truly exists. In addition, we find in Figure 3 that the Weibull distribution evidently deviates from the empirical data in the tail with the MLE method, which is consistent with the relation $\chi_{qE} < \chi_{\mathrm{WBL}}$ for both cancelled buy and sell orders.

Memory effect
Another important issue about financial time series is the memory effect. Many methods have been proposed for quantitatively measuring the memory effect, such as the rescaled range (R/S) analysis [68,69], the fluctuation analysis (FA) [70], the wavelet transform module maxima (WTMM) method [71,72], the detrended fluctuation analysis (DFA) [73], the detrending moving average (DMA) [74], and so on. Shao et al. compared the performance of the FA, DFA, and DMA methods using different long-range correlated time series, and found that the centered detrending moving average (CDMA) has the best performance and DFA is only slightly worse in some situations, while FA performs the worst [75]. In this paper we apply the DFA and CDMA methods to investigate the memory effect of the inter-cancellation duration series for both cancelled buy and sell orders of the 18 stocks. Figure 4 presents the detrended fluctuation functions $F_{\mathrm{DFA}}(s)$ with respect to the scale s using the DFA method for both cancelled buy and sell orders of four stocks, 000001, 000009, 000012 and 000021. Each curve reveals excellent power-law scaling over more than three orders of magnitude.
Applying the DFA method, the Hurst exponents H of the 18 stocks are estimated according to the power-law relation $F_{\mathrm{DFA}}(s) \sim s^{H}$; they are the slopes of the solid lines shown in the log-log plot of Figure 4. We list the Hurst exponents for both cancelled buy and sell orders of the 18 stocks in Table 4. The value of H for cancelled buy orders varies in the range [0.68, 0.82] with the mean value $\langle H \rangle = 0.76 \pm 0.04$, and for cancelled sell orders it varies from 0.68 to 0.85 with the mean value $\langle H \rangle = 0.76 \pm 0.04$. Since all the Hurst exponents are evidently larger than 0.5, we conclude that the inter-cancellation duration time series of both cancelled buy and sell orders possess long memory.
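The DFA estimate of H can be sketched as follows. This is a minimal version under stated assumptions: linear detrending, non-overlapping segments taken from the start of the series only, and synthetic Gaussian test series in place of duration data. The function names and the scale range are illustrative choices, not the paper's settings.

```python
import numpy as np

def dfa(x, scales):
    """Return the DFA fluctuation function F(s) with linear detrending."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                        # profile of the series
    F = np.empty(len(scales))
    for k, s in enumerate(scales):
        n_seg = len(y) // s
        segs = y[:n_seg * s].reshape(n_seg, s).T       # shape (s, n_seg)
        j = np.arange(s)
        coeffs = np.polyfit(j, segs, 1)                # one linear fit per segment
        resid = segs - np.vander(j, 2) @ coeffs        # detrended residuals
        F[k] = np.sqrt(np.mean(resid ** 2))
    return F

def hurst_exponent(scales, F):
    """Slope of log F(s) against log s."""
    return np.polyfit(np.log(scales), np.log(F), 1)[0]

# Synthetic test series (illustration only): white noise and its running sum.
rng = np.random.default_rng(3)
noise = rng.standard_normal(2 ** 14)
scales = np.unique(np.logspace(np.log10(16), np.log10(1024), 15).astype(int))
H_noise = hurst_exponent(scales, dfa(noise, scales))
H_walk = hurst_exponent(scales, dfa(np.cumsum(noise), scales))
print(f"H(white noise) = {H_noise:.2f}, H(random walk) = {H_walk:.2f}")
```

Uncorrelated noise should give an exponent near 0.5, which is the benchmark against which the reported values around 0.76 indicate long memory.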
Compared with the backward detrending moving average (BDMA) and the forward detrending moving average (FDMA), the centered detrending moving average (CDMA) performs better for one-dimensional time series [76]. We therefore choose the CDMA method to estimate the memory effect of the inter-cancellation duration series. The detrended fluctuation functions $F_{\mathrm{CDMA}}(s)$ calculated with the CDMA method for both cancelled buy and sell orders of four stocks are depicted in Figure 5. Excellent power-law scaling is observed in the log-log plot, which implies that the relation $F_{\mathrm{CDMA}}(s) \sim s^{H}$ is well satisfied. The Hurst exponents H are the slopes of the solid lines in the log-log plot. Using the least-squares fitting method, we calculate the Hurst exponents for both cancelled buy and sell orders of the 18 stocks, which are listed in Table 4 as well. The values of H obtained with the CDMA method are close to those calculated with the DFA method. Since all the Hurst exponents obtained with the two methods are well above 0.5, we conclude that the inter-cancellation duration time series for both cancelled buy and sell orders of the 18 stocks possess long memory.
On the other hand, the memory effect might be affected by the distribution of inter-cancellation durations. To test this hypothesis, we first shuffle the inter-cancellation duration series of each stock 100 times, and then calculate the Hurst exponent $H_{\mathrm{SFL}}$ of each shuffled series with both the DFA and CDMA methods. The mean Hurst exponents $\langle H_{\mathrm{SFL}} \rangle$ of the 100 shuffled series for both cancelled buy and sell orders of each stock are shown in Table 4. We find that the values of $\langle H_{\mathrm{SFL}} \rangle$ are all extremely close to 0.5 and significantly smaller than the original values of H. So we conclude that the distribution of the inter-cancellation duration series has little impact on its memory effect, and confirm that the inter-cancellation duration series of both cancelled buy and sell orders truly exhibit significant long memory for all the 18 stocks.
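The CDMA estimate and the shuffling test can be sketched as below. The assumptions are: odd window sizes so that the window is exactly centered, a synthetic long-memory series built by spectral filtering in place of duration data, and 10 shuffles instead of the 100 used in the paper to keep the example quick. All function names are hypothetical.

```python
import numpy as np

def cdma(x, windows):
    """Fluctuation function of the centered detrending moving average (odd windows)."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                       # profile of the series
    c = np.concatenate(([0.0], np.cumsum(y)))         # prefix sums of the profile
    F = np.empty(len(windows))
    for k, n in enumerate(windows):
        h = (n - 1) // 2
        moving_avg = (c[n:] - c[:-n]) / n             # mean of y over each centered window
        resid = y[h:len(y) - h] - moving_avg
        F[k] = np.sqrt(np.mean(resid ** 2))
    return F

def hurst_exponent(windows, F):
    """Slope of log F(n) against log n."""
    return np.polyfit(np.log(windows), np.log(F), 1)[0]

def long_memory_noise(n, hurst, rng):
    """Approximate long-memory noise with power spectrum ~ f^-(2H-1)."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    spectrum[1:] *= freqs[1:] ** (-(2.0 * hurst - 1.0) / 2.0)
    spectrum[0] = 0.0                                 # remove the mean
    return np.fft.irfft(spectrum, n)

rng = np.random.default_rng(4)
x = long_memory_noise(2 ** 14, hurst=0.8, rng=rng)   # synthetic, illustration only
windows = np.unique(np.logspace(1, 3, 12).astype(int) | 1)   # force odd sizes
H = hurst_exponent(windows, cdma(x, windows))
H_shuffled = np.mean(
    [hurst_exponent(windows, cdma(rng.permutation(x), windows)) for _ in range(10)]
)
print(f"H = {H:.2f}, mean shuffled H = {H_shuffled:.2f}")
```

Shuffling keeps the distribution of values but destroys their ordering, so an exponent that drops to about 0.5 after shuffling points to temporal correlations, not the distribution, as the source of the memory.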
The memory effect reflects the temporal persistence of inter-cancellation durations. It indicates the clustering behavior of order cancellation, which is caused by traders' similar reactions to the market. For example, when good news arrives, traders will immediately cancel their limit orders in order to avoid being transacted at an unfavorable price. Many cancellations occur in a short period, which results in the clustering behavior and long memory of order cancellation.

Multifractal nature
Multifractals are ubiquitous in nature and society [77], including financial time series. In this section, we investigate the multifractal properties of inter-cancellation durations by applying the multifractal detrended fluctuation analysis (MF-DFA) method [78], which is a generalization of the DFA method. The MF-DFA algorithm is described as follows.
Step 1. Consider an inter-cancellation duration series d(i), where i = 1, 2, · · · , N. Construct the cumulative sum sequence y(i) as follows,
$$y(i) = \sum_{j=1}^{i} d(j), \quad i = 1, 2, \cdots, N.$$
Step 2. Divide the sequence y(i) into $N_s$ disjoint segments with the same length s, where $N_s = [N/s]$, and [x] is the largest integer not larger than x. Each segment can be denoted as $y_v$ such that $y_v(j) = y(\ell + j)$ for $1 \le j \le s$, and $\ell = (v - 1)s$. Since the length N of the inter-cancellation duration series might not be a multiple of the segment size s, a remaining part (with length smaller than s) at the end of the sequence y(i) is not covered by the dividing procedure. We therefore select another $N_s$ disjoint segments starting from the end of the series to compensate for the remaining part, and then consider the $2N_s$ segments, which cover the whole sequence y(i).
Step 3. In each segment $y_v$, a polynomial function fitted by least-squares regression is used to represent the trend. The simplest function is a line, and in the paper we adopt the linear function $\tilde{y}_v(j)$ with $1 \le j \le s$ to remove the trend. The residual $\epsilon_v(j)$ in the segment $y_v$ can be calculated by
$$\epsilon_v(j) = y_v(j) - \tilde{y}_v(j).$$
Step 4. The detrended fluctuation function F(v, s) of the segment $y_v$ is defined as follows,
$$[F(v, s)]^2 = \frac{1}{s} \sum_{j=1}^{s} [\epsilon_v(j)]^2.$$
Step 5. The q-th order overall fluctuation function $F_q(s)$ is determined through
$$F_q(s) = \left\{ \frac{1}{2N_s} \sum_{v=1}^{2N_s} [F(v, s)]^q \right\}^{1/q},$$
where q can take any real value except for q = 0. When q = 0, according to L'Hôpital's rule we have
$$F_0(s) = \exp\left\{ \frac{1}{2N_s} \sum_{v=1}^{2N_s} \ln F(v, s) \right\}.$$
Step 6. Varying the segment size s from 10 to [N/6], a power-law relation between the function $F_q(s)$ and the scale s is determined, which reads
$$F_q(s) \sim s^{h(q)}.$$
According to the standard multifractal formalism, the multifractal scaling exponent τ(q) characterizes the multifractal nature, which reads
$$\tau(q) = q h(q) - D_f,$$
where $D_f$ is the fractal dimension of the geometric support of the multifractal measure. For one-dimensional time series analysis, we have $D_f = 1$. If the scaling exponent τ(q) is a nonlinear function of q, the series has a multifractal nature. Finally, the singularity strength function α(q) and the multifractal spectrum f(α) are obtained via the Legendre transform, that is,
$$\alpha(q) = \frac{\mathrm{d}\tau(q)}{\mathrm{d}q}, \qquad f(\alpha) = q\alpha - \tau(q).$$
We calculate the q-th order fluctuation functions $F_q(s)$ of inter-cancellation durations for both cancelled buy and sell orders of two stocks, 000009 and 000012, and present them in Figure 6. We find that the function $F_q(s)$ has an excellent power-law scaling with respect to the scale s. Using the least-squares fitting method, we obtain the slopes h(q) for q = −4, −2, 0, 2, 4, respectively. Figure 7 presents the scaling exponents τ(q) with respect to the order q and the multifractal spectra f(α) as a function of the singularity strength α for both cancelled buy and sell orders of the two stocks. We observe that the function τ(q) is nonlinear with respect to q, which illustrates that the inter-cancellation durations have a multifractal nature.
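Steps 1 to 6 and the Legendre transform can be sketched as follows. This is an illustrative implementation under stated assumptions: the mean is removed before the cumulative sum (which does not change linearly detrended residuals), the derivative of τ(q) is approximated by finite differences, and the input is synthetic white noise in place of duration data. The function name and the grid of q values are choices made for the example.

```python
import numpy as np

def mfdfa(x, scales, qs):
    """Return h(q), tau(q), alpha(q), f(alpha) from MF-DFA with linear detrending."""
    x = np.asarray(x, dtype=float)
    qs = np.asarray(qs, dtype=float)
    y = np.cumsum(x - x.mean())                    # Step 1 (mean removed; see lead-in)
    Fq = np.empty((len(qs), len(scales)))
    for k, s in enumerate(scales):
        n_seg = len(y) // s
        # Step 2: N_s segments from the start and N_s from the end.
        segs = np.vstack([
            y[:n_seg * s].reshape(n_seg, s),
            y[len(y) - n_seg * s:].reshape(n_seg, s),
        ]).T                                       # shape (s, 2 N_s)
        j = np.arange(s)
        coeffs = np.polyfit(j, segs, 1)            # Step 3: linear trend per segment
        resid = segs - np.vander(j, 2) @ coeffs
        F2 = np.mean(resid ** 2, axis=0)           # Step 4: F(v, s)^2
        for i, q in enumerate(qs):                 # Step 5
            if q == 0:
                Fq[i, k] = np.exp(0.5 * np.mean(np.log(F2)))
            else:
                Fq[i, k] = np.mean(F2 ** (q / 2.0)) ** (1.0 / q)
    # Step 6: h(q) is the log-log slope of F_q(s) against s.
    h = np.array([np.polyfit(np.log(scales), np.log(row), 1)[0] for row in Fq])
    tau = qs * h - 1.0                             # D_f = 1 for a time series
    alpha = np.gradient(tau, qs)                   # numerical d tau / d q
    f = qs * alpha - tau                           # Legendre transform
    return h, tau, alpha, f

# Synthetic white noise (illustration only).
rng = np.random.default_rng(5)
x = rng.standard_normal(2 ** 14)
scales = np.unique(np.logspace(1, np.log10(len(x) // 6), 20).astype(int))
qs = np.linspace(-4.0, 4.0, 17)
h, tau, alpha, f = mfdfa(x, scales, qs)
delta_alpha = alpha.max() - alpha.min()            # width of the spectrum
```

Even a monofractal series yields a small nonzero width in finite samples, which is one reason to compare the measured width against a shuffled benchmark as done in the text that follows.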
In addition, the strength of multifractality can also be measured by the width of the multifractal spectrum f(α), $\Delta\alpha = \alpha_{\max} - \alpha_{\min}$, and a larger value of ∆α corresponds to stronger multifractality. We calculate the singularity width ∆α for both cancelled buy and sell orders of the 18 stocks and list the results in Table 5. The value of ∆α for cancelled buy orders varies from 0.37 to 1.28 with the mean value $\langle \Delta\alpha \rangle = 0.74 \pm 0.24$, and for cancelled sell orders it varies in the range [0.41, 1.38] with the mean value $\langle \Delta\alpha \rangle = 0.68 \pm 0.25$. Since all the values of ∆α are larger than zero, we consider that the inter-cancellation duration series for both cancelled buy and sell orders of the 18 stocks have a multifractal nature, which is consistent with the results obtained from the scaling exponent τ(q).
As with the memory effect, the probability distribution might influence the multifractal nature of inter-cancellation durations. To test this influence, we shuffle the inter-cancellation durations 100 times. For each shuffled series, the width of the multifractal spectrum $\Delta\alpha_{\mathrm{SFL}}$ is obtained, and the mean values are listed in Table 5. The values of $\Delta\alpha_{\mathrm{SFL}}$ are clearly larger than zero, which indicates that the distribution of inter-cancellation durations contributes to the multifractality. We define the residual spectrum width R by removing the shuffled width $\Delta\alpha_{\mathrm{SFL}}$ from the original one ∆α, that is, $R = \Delta\alpha - \Delta\alpha_{\mathrm{SFL}}$, and list the values of R for the 18 stocks in Table 5. Since the values of R are evidently larger than zero, we conclude that inter-cancellation durations possess a multifractal nature for both cancelled buy and sell orders of all the stocks.

Conclusion
Order cancellation is an important issue in the dynamics of price formation in financial markets. We have carried out empirical investigations on the statistical properties of inter-cancellation durations (in units of events) using the order flow data of 18 liquid stocks traded on the Shenzhen Stock Exchange in the whole year of 2003.
We first study the probability distributions of inter-cancellation durations for both cancelled buy and sell orders, and find that the rescaled probability density functions exhibit scaling behavior. When fitting the probability distributions by the Weibull and q-exponential distributions, we find that both cancelled buy and sell orders prefer the q-exponential distribution with the MLE method. However, applying the NLSE method, we find that the cancelled buy orders of 6 stocks and the cancelled sell orders of 3 stocks prefer the Weibull distribution, which differs from the result obtained with the MLE method.
We then investigate the memory effect of inter-cancellation durations based on the detrended fluctuation analysis (DFA) and centered detrending moving average (CDMA) methods. Using the DFA method we obtain the average Hurst exponent of the 18 stocks $\langle H \rangle = 0.76$ for both cancelled buy and sell orders, and with the CDMA method it is $\langle H \rangle = 0.75$ for both cancelled buy and sell orders. According to the results from these two methods, the inter-cancellation duration series possess the same strength of long memory for both cancelled buy and sell orders.
Finally, we investigate the multifractal properties by applying the multifractal detrended fluctuation analysis (MF-DFA) method. We find that the average width of the multifractal spectrum is $\langle \Delta\alpha \rangle = 0.74$ for the cancelled buy orders of the 18 stocks and $\langle \Delta\alpha \rangle = 0.68$ for the cancelled sell orders. So we conclude that the inter-cancellation duration series have a multifractal nature, and that the series of cancelled buy orders have slightly stronger multifractality than those of cancelled sell orders. Our findings indicate that the cancellation process has a bursty behavior and possesses long-range correlations. Such non-Poisson behaviors have been unveiled in many other human dynamics [79].

Table 5: The width of the multifractal spectra ∆α of inter-cancellation durations for both cancelled buy and sell orders of 18 stocks based on the MF-DFA method. $\Delta\alpha_{\mathrm{SFL}}$ is the mean width of 100 shuffled inter-cancellation durations. R is the residual of spectrum width by removing the shuffled width $\Delta\alpha_{\mathrm{SFL}}$ from the original one ∆α.