Stochastic Models for Radon Daily Time Series: Seasonality, Stationarity, and Long-Range Dependence Detection

Siino, Marianna; Scudero, Salvatore; D’Alessandro, Antonino

doi:10.3389/feart.2020.575001

ORIGINAL RESEARCH article

Front. Earth Sci., 10 November 2020

Sec. Economic Geology

Volume 8 - 2020 | https://doi.org/10.3389/feart.2020.575001

This article is part of the Research TopicGas Geochemistry: New Progresses and ApplicationsView all 24 articles

Stochastic Models for Radon Daily Time Series: Seasonality, Stationarity, and Long-Range Dependence Detection

Marianna Siino

Salvatore Scudero^*

Antonino D’Alessandro

Istituto Nazionale di Geofisica e Vulcanologia, Osservatorio Nazionale Terremoti, Rome, Italy

This study detects the presence of seasonality, stationarity, and long-range memory structures in daily radon measurements from a permanent monitoring station in central Italy. The transient dynamics and the seasonality structure are identified by power spectral analysis based on the continuous wavelet transformation and a clear 1-year periodicity emerges. The stationarity in the data is assessed with the Dickey–Fuller test; the decay of the estimated autocorrelation function and the estimated Hurst exponent indicate the presence of long-range dependence. All the main characteristics of the data have been properly included in a modeling structure. In particular, an autoregressive fractionally integrated moving average (ARFIMA) model is estimated and compared with the classical ARMA and ARIMA models in terms of goodness of fit and, secondarily, of forecast evaluation. An autoregressive model with a noninteger value of the differencing parameter ( $d = 0.278$ ) resulted to be the most appropriate on the basis of the Akaike Information Criterion, the diagnostic on the residuals, and the root mean squared error. The results suggest that there is statistically significant evidence for not rejecting the presence of long memory in the radon concentration. The radon measurements are better characterized as being stationary, but with long memory and so, the statistical dependence decays more slowly than an exponential decay.

1.Introduction

The monitoring of soil radon ( ${}^{222}R_{n}$ ) emission is a relevant topic for the risk that this radioactive gas poses to human health but also for its relationship with environmental and geological processes. The radon signals usually present a complex dynamic structure that is directly and indirectly influenced by several factors, such as environmental and climatic conditions of the site and characteristics of the ground soil, tide, and solar effect. (Pinault and Baubron, 1996; Piersanti et al., 2015; Siino et al., 2019b). All these factors have a different effect on the signal, as they can result either in a trend, seasonal, or stochastic component. For instance, climate or tidal forces reflect in a multiple seasonality of the radon time series: hourly, diurnal, multi-day, annual, and even multi-annual cycles have been detected in different studies worldwide (Crockett et al., 2006; Udovičić et al., 2014; Yan et al., 2017; Crockett et al., 2018; Siino et al., 2019b; D’Alessandro et al., 2020). Of particular interest is the role of Rn as a potential earthquake precursor (Barbosa et al., 2015; Woith, 2015; Baskaran, 2016; Morales-Simfors et al., 2019) because the fracturing processes in the crust could enhance the mobility of Rn toward the surface (Toutain and Baubron, 1999; Woith, 2015). Similarly, anomalies can be the result of weather episodes which cannot be explained by meteorological variables. Whatever the cause, these anomalies can be masked within the signal, and a way to bring them to light would be to de-noise the signal from the trend and/or periodic components (Baykut et al., 2010; Siino et al., 2019b; D’Alessandro et al., 2020). As a matter of fact, it is a challenging task to untangle and properly quantify all of these effects on the radon fluctuations because Rn time series present generally a nonstationary behavior, not constant variability over time and a long-term memory (Donner et al., 2015).

Methodologically, time series analysis techniques are proper statistical tools to extract meaningful characteristics from data. Moreover, because long-term records of environmental variables show often long-range memory, some other tools are usually applied. The fractionally integrated moving average models [ARFIMA (p,d,q)] have been widely used in the literature to describe meteorological variables (Yaya and Fashae, 2015; Bowers and Tung, 2018), pollutants and soil gas (Pan and Chen, 2008; Donner et al., 2015; Belbute and Pereira, 2017; Reisen et al., 2018), and hydrological time series (Montanari et al., 1997; Wang et al., 2007). This class of models is used when the long-term correlations in the data decay more slowly than an exponential form, that is, a typical shape of autocorrelation in the autoregressive moving average [ARMA(p,q)] processes (Box et al., 2015). Furthermore, several studies investigate the predictability of the ARFIMA model assessing multi-step ahead performance with respect to others univariate time series forecasting methods such as a naive method, random walk (with drift), ARMA with trend and seasonality, and the exponential smoothing (Papacharalampous et al., 2018a; Papacharalampous et al., 2018b).

In the literature, the radon data have been described with different methods. Dunn and Henschel (1989) characterize a three-week record at an hourly frequency using simple autoregressive moving-average (ARMA) models. Later, the Box–Jenkins methodology often used in econometrics was applied to describe five-year-long radon time series considering a seasonal integrated, autoregressive moving averages model with exogenous variables (SARIMAX) also adding external covariates such as delayed atmospheric parameters (Stránský and Thinová, 2017). Donner et al. (2015) present complementary methods that have been applied for evaluating the presence of long-range correlations and fractal scaling in environmental radon measurements.

In this study, we analyze a 3-year-long radon concentration signal aiming at the assessment of a model which describes its dynamics with time series methodologies (Shumway and Stoffer, 2017). A comprehensive analysis of the seasonality structure is performed to detect clues about the stationarity and the presence of long-range memory in the data that could be related to geological processes. We estimate some ARFIMA models which explicitly consider simultaneously both the short-term and long-term correlation structures of the series. Moreover, we tested the forecast performance of the obtained models. The novelty of this analysis relies on the simultaneous estimation of seasonality and long-range memory in the estimation of proper ARFIMA stochastic models.

2.Materials and Methods

In this section, the time series methods used in the analysis of daily radon measurements are described following Shumway and Stoffer (2017) and Beran (2017). We briefly present some tools useful to check in an observed time series the presence of nonstationarity (in terms of seasonality and trend) and long-range memory behaviors.

The seasonality behavior in the data is studied by power spectral density based on the time-averaged continuous wavelet spectrogram (Daubechies, 1992; Conraria and Soares, 2011). To properly apply the stochastic models, the daily radon time series is examined for the presence of stationarity. The Dickey–Fuller test (Dickey and Fuller, 1979) is used for this purpose to determine the presence of a unit root in an autoregressive model. The presence of long-range memory has been assessed on the data estimating the Hurst exponent (Hurst, 1951) and looking at the shape of the estimated autocorrelation coefficients for several lags. The presence of long-term memory can justify the estimation of the ARFIMA models, and their structure is also explained.

2.1.Spectral Analysis for Seasonal Detection

In this paragraph, we briefly describe the spectral analysis in the time–frequency domain based on continuous wavelet transformation following the notation in Daubechies (1992) and Conraria and Soares (2011).

The space $L^{2} ()$ is the set of square integrable functions satisfying ${\int^{}}_{- \infty}^{+ \infty} | g (t) |^{2} d t < \infty$ and denoted by the capital letter, G(t) the Fourier transformation of a given function, $G (ω) = {\int^{}}_{- \infty}^{+ \infty} g (t) e^{(- i ω t)} d t .$ A function $ψ (t) \in L^{2} (R)$ that satisfies the admissibility condition $Ψ (0) = {\int^{}}_{- \infty}^{+ \infty} ψ (t) d t = 0$ is called “mother wavelet,” and a doubly indexed family (“wavelet daughters”) is generated by scaling and translating $ψ (\cdot)$ : $ψ_{τ, s} (t) = {| s |}^{- 1 / 2} ψ (\frac{t - τ}{s})$ with $s, τ \in$ and $s \neq 0$ . In this analysis, we use the well-known, quite flexible, and complex-valued Morlet mother wavelet that takes the form $ψ (t) = π^{- 1 / 4} e^{i ω t} e^{- t^{2} / 2}$ . The local wavelet power spectrum (WPS) based on the continuous wavelet transformation (CWT) of a given function $g (t) \in L^{2} ()$ with respect to the wavelet family

{| W P S |}_{g} (τ, s) = {| W_{x; ψ} (τ, s) |}^{2} = {| {\int^{​}}_{- \infty}^{+ \infty} g (t) {| s |}^{- 1 / 2} ψ^{*} (\frac{t - τ}{s}) d t |}^{2} (1)

where $*$ represents the complex conjugate operation, s is the scale parameter controlling the wavelet width, and τ controls the wavelet location in the time domain. The wavelet power spectrum Eq. 1 can be interpreted as the local variance of the time series.

To do a comparison with the classical spectral method, the previous quantity can be averaged over time (τ) obtaining the global wavelet power spectrum,

{\int^{​}}_{- \infty}^{+ \infty} {| W_{x; ψ} (τ, s) |}^{2} d τ [2]

The peaks in the global power spectral density indicate the prevalent periods in the data. In this study, the wavelet transformation and the computation of the global power spectrum are computed with the WaveletComp package (Roesch and Schmidbauer, 2018) in R statistical software (Team, 2005).

2.2.Autocorrelation and Partial Autocorrelation Functions

Given a time series ${y_{t}}$ , the autocorrelation is the similarity between the observations as a function of the time lag between them. The $j^{t h}$ order autocorrelation $ρ (j)$ can be estimated by using the formula

\hat{ρ} (j) = \frac{\hat{C o v} (y_{t}, y_{t - j})}{\hat{V a r} (y_{t})}, (3)

where

\hat{C o v} (y_{t}, y_{t - j}) = \frac{1}{n - 1} \sum_{t = j + 1}^{n} (y_{t} - \bar{y}) (y_{t - j} - \bar{y}) (4)

and

\hat{V a r} (y_{t}) = \frac{1}{n - 1} \sum_{t = j + 1}^{n} {(y_{t} - \bar{y})}^{2} (5)

In Eqs. 4 and 5, $\bar{y}$ is the mean of $y_{t}$ , and 5 is just the special case of 4 in which $j = 0$ . The empirical autocorrelation function (ACF) is $\hat{ρ} (j)$ defined in Eq. 3, computed in the data as a function of the lag j.

Moreover, another way to characterize the relationship between $| {y_{t}}$ and its lagged values is by the partial autocorrelation function, or PACF. The partial autocorrelation coefficient of order j, $ρ_{j}^{(j)}$ measures the effect (linear dependence) of $y_{t}$ on $y_{t - j}$ after removing the effect of $y_{t - 1}$ , $y_{t - 2}$ , $\dots$ $y_{t - j - 1}$ on both $y_{t}$ and $y_{t - j}$ . Each partial autocorrelation can be obtained as a series of regressions of the form:

y_{t} = γ^{(j)} + ρ_{1}^{(j)} y_{t - 1} + \dots + ρ_{j}^{(j)} y_{t - j} + ε_{t} (6)

The empirical PACF of order J is computed by running 6 for $j = 1, \dots, J$ and retaining only the estimate $ρ_{j}^{(j)}$ for each j. The shape of both the sample ACF and PACF provides a way to see which is the pattern of serial dependence, and it may help to suggest which kind of stochastic process would fit well the data.

If ${y_{t}}$ presents long-range memory, the correlation function 3 decays hyperbolically showing a power law distribution (Höll et al., 2019). Clauset et al. (2009) give an overview of the statistical methods that can be used to detect and characterize power law distribution in empirical data.

2.3.The Hurst Coefficient and the Rescaled Range (R/S) Method

The Hurst exponent (H) is an index of long-term memory of time series ${y_{t}}$ originally developed for hydrological data (Hurst, 1950; Hurst et al., 1965). It is defined in asymptotic terms of the rescaled range as $E [R (n) / S (n)] = C n^{H} as n \to \infty$ , where n is the number of data points in a time series, E is the expected value, C is a constant, $R (n)$ is the range of the first n cumulative deviations from the mean, and $S (n)$ is their standard deviation.

We consider the rescaled range (R/S) method to estimate H (Mandelbrot and Wallis, 1968; Mandelbrot and Wallis, 1969).

Given $y_{t}$ , the mean is computed ( $m = \frac{1}{n} {\sum^{}}_{i = 1}^{n} y_{i}$ ) and the mean adjusted series $x_{t} = y_{t} - m$ for $t = 1,2, \dots, n$ . Then, the cumulative series is $z_{t} = {\sum^{}}_{i = 1}^{t} x_{i}$ , and the range series is $R_{t} = max (z_{1}, z_{2}, \dots, z_{t}) - min (z_{1}, z_{2}, \dots, z_{t})$ for $t = 1,2, \dots, n$ . A standard deviation series S is computes as $S_{t} = \sqrt{\frac{1}{t} {\sum^{}}_{i = 1}^{t} {(y_{i} - m (t))}^{2}}$ where m(t) is the mean for the time series values through time t. The following series of the ratio is considered $(R_{t} / S_{t})$ for $t = 1,2, \dots, n$ .

The Hurst exponent is estimated as the slope of the line between $log [R_{t} / S_{t}]$ and $log t$ . The long memory structure exists when $0 < H < 1$ . If $H \geq 1$ , the process has infinite variance and is nonstationary. If $0 < H < 0.5$ , an antipersistence structure exists; if $0.5 < H < 1$ the series is persistence, instead when $H = 0.5$ , the process is a white noise. Other methods have been proposed in the literature to detect the presence of long-range temporal correlations in the presence of nonstationary in the data, that is, the detrended fluctuation analysis (Höll et al., 2019).

2.4.Autoregressive Fractionally Integrated Moving Average Model

Environmental data, and also radon measurements, can exhibit characteristics consistent with long-range memory in time series (Donner et al., 2015). Such characteristics consist in a specific structure of the autocorrelation function of the process.

If ${y_{t}}$ presents long-range memory, correlation function 3 decays hyperbolically, rather than showing the exponential decay that is a characteristic of an ARIMA $(p, 0, q)$ process. A way to characterize long-range dependence in observational data is by fitting autoregressive fractionally integrated moving average (ARFIMA $(p, d, q)$ ) models, which are a natural extension of the classic ARIMA $(p, d, q)$ models (Hosking, 1981). The ARFIMA models allow for handling explicitly both the short-term and the long-term correlation structures of a series. Let ${y_{t}}$ be a stationary process, an ARFIMA $(p, d, q)$ process where p and q are integers and d is real, represented as

ϕ (B) \nabla^{d} (y_{t} - x_{t} β) = θ (B) ε_{t} t = 1, \dots, T, [7]

where $\nabla^{d}$ is the fractional differencing operator $\nabla^{d} = {(1 - B)}^{d} = {\sum^{}}_{k = 0}^{\infty} (\begin{matrix} d \\ k \end{matrix}) {(- B)}^{k}$ , B is the backward shift operator defined by $B y_{t} = y_{t - 1}$ , and ${ε_{t}}$ is a white noise with variance $σ_{ε}^{2}$ . $ϕ (B) = 1 - ϕ_{1} B - \dots - ϕ_{p} B^{p}$ and $θ (B) = 1 - θ_{1} B - \dots - θ_{q} B^{q}$ are the autoregressive and the moving average operators, respectively. Note that the binomial coefficient can be defined for real values of d, $(\begin{matrix} d \\ k \end{matrix}) = \frac{d (d - 1) (d - 2) \dots (d - k + 1)}{k (k - 1) (k - 2) \dots 1}$ for $k \in ℕ$ and an arbitrary d.

The row vector $x_{t}$ contains the exogenous variables, and in our analysis, they are harmonic terms used to describe the seasonality in the radon time series. The parameter d in Eq. 7 describes the high-lag correlation structure of a time series, while the p and q parameters are chosen to describe the low-lag correlation structure.

An important aspect to assess in a time series is its stationarity; a process is defined as stationary when its mean, variance, or autocorrelation structure remains constant over time. For stationary series, $d \in (- 0.5, 0.5)$ , and the Hurst exponent associated with the process is given by $H = (2 d + 1) / 2$ . Consequently, long-range memory is present for $d \in$ (0, 0.5), while $d \in$ (−0.5, 0) indicates antipersistent fluctuations. When $| d | > 0.5$ , the process is nonstationary and its variance is infinite. The process exhibits short memory for $d = 0$ , corresponding to stationary and invertible ARMA (autoregressive moving average) model. Instead, the arbitrary restriction of d to integer values corresponds to the standard autoregressive integrated moving average (ARIMA) model, and in this case, the variable is $I (d)$ , and it becomes stationary after d differences and it is nonstationary after day 1 differences. For instance, an $I (1)$ variable can have a linear trend but no quadratic trend, and it can be transformed into a stationary series with the first-order differences. If a series exhibits long memory, it is neither stationary ( $I (0)$ ) nor it is a unit root process ( $I (1)$ ); it is an $I (d)$ process, with d a real number.

There are statistical tests to check stationarity, named unit root tests. The results are traditionally interpreted as that the effects of one-time shocks to the series are either transitory (if the series is stationary), or permanent (if the series is not stationary). The Dickey–Fuller test (DF) (Dickey and Fuller, 1979) tests the null hypothesis that the series is nonstationary; however, DF only considers the dichotomy between stationarity and nonstationarity. The rejection of the null provides evidence for a stationary series, then the ARMA model can be directly applied. Instead, if the null hypothesis is accepted, the series needs to be made stationary through differencing.

The model in Eq. 7 is estimated with exact maximum likelihood estimation explained in Veenstra (2013) using arfima package of R statistical software (Team, 2005). Usually, the model selection is performed evaluating simultaneously the goodness of fit and the forecast performances. The assessment of the goodness of fit can be done using the Akaike Information Criteria, $A I C = 2 k - 2 l n (L)$ , where k is the number of estimated parameters in the model Eq. 7 and L is the maximized value of the likelihood function for the estimated model, and the model with the smallest AIC value is preferred. Moreover, the assumptions of the model on the random component ( $ε_{t}$ ) are checked assessing the constant variability, the normality assumption, and the absence of correlation structure in the model residuals ( ${\hat{ε}}_{t}$ ).

The model family in Eq. 7 is fitted to time series data both to understand the data and to forecast (to predict future points in the series). Forecast evaluation can be done when the observed values are available. Usually, the observed data are divided into training and test samples. The model is fitted to the training sample, and then its k-step ahead forecast performance is evaluated on the test one. The root mean square errors (RMSEs) are used to check the forecast accuracy of the estimated models; it is given by $R M S E = \sqrt{{\sum^{}}_{i = 1}^{n} {(y_{i} - {\tilde{y}}_{i})}^{2} n^{- 1}}$ , where $y_{i}$ is the observed value for the ith observation and ${\tilde{y}}_{i}$ is the predicted one.

3.Results

3.1.Data

The analyzed radon time series is recorded at Pietralunga (PTRL, Italy, lat 43.44 N and long 12.44 E) between 28/09/2012 and 01/08/2015 for a total of 1,038 days. The PTRL station is in a framework of near real-time monitoring of soil radon emission to study earthquake preparatory processes, the Italian radon monitoring network (IRON) (Cannelli et al., 2018). The selected station is equipped with a Lucas cell, an alpha scintillation detector with an acquisition window of about 2 h (115 min of data acquisition followed by a 5-min standby time). In detail, the Lucas cell consists of a flask in which the inner surface is coated with silver-activated zinc sulfide (ZnS). It integrates a front-end electronics and measures radon concentration by counting the radon decay signals in the given acquisition window. The radon detector is located in a small room of a school basement, not disturbed by anthropogenic influences and without any kind of opening and/or aeration system. However, the pressure and the temperature could affect the radon measures. The PTRL site is characterized by a contained seasonal variability also if compared with other sites of the same network, even equipped with a borehole probe which should be more immune from such effects (c.f. Figure 2 in Siino et al. (2019b)). The radon concentration is measured in $B q / m^{3}$ , becquerel per cubic meter.

The raw time series of the mean daily concentrations is shown in Figure 1; the measured values range between 20.35 $B q / m^{3}$ and 377.86 $B q / m^{3}$ , and they show a clear seasonal signal connected with the temperature (Cannelli et al., 2018; Siino et al., 2019b), and the higher values are during the summer period.

FIGURE 1

FIGURE 1. (Left) Data, mean daily radon observations in $B q / m^{3}$ at Pietralunga (Umbria, Italy). The red vertical line separates the training set and the remaining 5% of the test set. (Central) Autocorrelation coefficient and (right) partial autocorrelation coefficient of the time series.

The 0.87% of the daily data are missing. Generally, it is a challenge to handle missing values, especially for time series data. Two possible ways to deal with the incomplete data can be to omit the entire record that contains information or impute the missing values. However, since a small percentage of the analyzed data presents missing values, they are filled by the weighted moving average method with a semi-adaptive window of 4 days. Weighted moving averages assign a linear weighting to the data points used to perform the imputation.

The data are divided into two subsets: the training set used to do the main analysis and to fit the stochastic models, and the test set (5% of the data identified by a vertical line in Figure 1) used to compare the models in terms of forecasts.

3.2.Seasonality, Stationarity, and Long-Memory Detection

We report the main results about the seasonality detection, the autocorrelation, and the long-memory analysis of the observed series. For the study of the dynamical and seasonal behaviors of the observed radon concentrations, we compute the spectral density analysis in the time–frequency domain based on the continuous wavelet transformation. The wavelet power spectrum is shown in Figure 2 where the period ranges from 16 to 512 days. The time–frequency regions with warm colors are characterized by high power, and the black lines indicate the significant maxima of the undulations of the wavelet power spectrum, and they give an indication of the permanent cycle period. The thick black contour indicates the 90% confidence level, and the lighter shade indicates regions inside the cone of influence due to the border effect.

FIGURE 2

FIGURE 2. (Left) Wavelet power spectrum of daily radon series in time–frequency domain with the CWT method. The black contour indicates the significant period with 90% confidence level. The lighter shade is the regions influenced by edge effects. (Central) The corresponding global power spectrum density marginalizing over time. The horizontal lines are for 180- and 365-day periods. (Right) The gray line is the observed time series, and the black curve is the fitted linear regression model with respect to harmonic terms to describe the 1-year periodicity.

Clearly, the series exhibits transient dynamics and the magnitude of the WPS is not constant over time fixing a specific frequency. We can observe a high value of the spectral power density at about 1-year periodicity that is persistently significant. Other high-power periodicities are present, even though not continuous over the entire period. A medium-power, ∼180-day cycle is recognizable in the first half of the series, and a ∼22-day cycle characterized by high-power appears around summer 2014. These cycles can be also observed in Figure 2 which shows the global wavelet power spectrum; the horizontal lines provide a reference at 180 and 365 days. The series show a clear 1-year periodicity and subordinate periodicities at about 180 days and three weeks. The longer cycles are probably related to the annual and semiannual cycles of the climatic variables (temperature, pressure, and rainfall), while the ∼22-day cycle is likely ascribable to the lunisolar gravitational influence which results in a tide effect on the flux or radon [see Siino et al. (2019b)]. This descriptive analysis is preparatory to decide which seasonality terms include in the model formulation for the explanatory variables ( $x_{t}$ in Eq. 7). In particular, we consider harmonic terms to describe the 1-year periodicity that is the only one persistent with a constant power over time. Figure 2 shows the estimated curves considering a regression linear model fitted on the data with harmonic terms for a 365-day period (with $R^{2}$ coefficient equals to 0.34).

The shapes of the correlogram and the partial correlogram provide indication about the properties of the time series and could indicate a plausible structure of the stochastic model in Eq. 7. In Figure 1, the estimated autocorrelation up to 400 lags seems to decay slower than an exponential one. Also from the autocorrelation, it is clear that the data exhibit a prevalent seasonal cycle which dominates the dependence structure. The partial autocorrelation coefficient is defined as the autocorrelation at each lag after controlling for the autocorrelation due to all preceding lags. It helps determine how many AR terms (i.e., lagged observations as predictors) should be included in Eq. 7. If there is a sharp drop in the PACF after p lags, then the previous p-values are responsible for the autocorrelation in the series, and the model should include p autoregressive terms. In our case, the highest and also significant value is at lag 1, with a value of the correlation equals to 0.792, and in the following lags ( $p > 1$ ), the autocorrelation coefficients are close to zero. It indicates that an autocorrelation term ( $p = 1$ ) can be included in the model.

The estimate of the Hurst exponent (H) with the rescaled range analysis (Section 2.3) is used to assess the presence of long memory. The obtained value is 0.785 indicating that the mean daily measurements have a persistent long-memory structure since $0.5 < H < 1$ . In the literature, there are several results consistent with our analysis. For instance, in Cuculeanu et al. (1996), the determined values of Hurst’s coefficient (0.809) highlight a persistent behavior of the gas. Also, Nikolopoulos et al. (2018) compute the rescaled range analysis for several time intervals obtaining a persistent Hurst exponent between 0.7 and 0.9 and in some periods between 0.9 and 1.

The Dickey–Fuller test is used to check the null hypothesis that the series is nonstationary, and thus, the rejection of the null provides evidence for a stationary series. The value of the test on the data in Figure 1 is −5.285, and the value of the p-value (1.53e−07) is lower than the significant level $α = 0.05$ , and we can reject the null hypothesis that the series has a unit root and hence is not stationary. According to this result, for our data, integration (first-order differences) is not necessary.

3.3.Modeling Results

The obtained results indicate that the studied radon concentrations present persistent long-memory structure, 1-year seasonality, and an absence of a trend. Also, according to the PACF, an autoregressive term can be appropriate to describe the short-term correlation.

Starting from these evidences, four models are estimated and compared, and all of them have the harmonic terms ( $sin (2 π t ω)$ and $cos (2 π t ω)$ ) as external covariates to describe the seasonality. Four candidate models are estimated:

• Model (a) is a fractional model with $p = q = 0$ , ARIMA $(0, d, 0)$

\nabla^{d} (y_{t} - x_{t} β) = ε_{t} t = 1, \dots, T

• Model (b) is an ARMA $(1,0)$ , so it is an autoregressive model of order 1 without differencing (also it can be indicated as an ARIMA (1,0,0) model)

(1 - ϕ_{1} B) (y_{t} - x_{t} β) = ε_{t} t = 1, \dots, T

• Model (c) is an ARFIMA $(1, d, 0)$ model

(1 - ϕ_{1} B) \nabla^{d} (y_{t} - x_{t} β) = ε_{t} t = 1, \dots, T

• Model (d) is ARIMA $(1,1,0)$ model with order of integration equal to $d = 1$

(1 - ϕ_{1} B) \nabla^{1} (y_{t} - x_{t} β) = ε_{t} t = 1, \dots, T

The Models (b), (c), and (d) have an autoregressive term ( $p = 1$ ) since doing several comparisons, a parsimonious model is obtained without moving average terms ( $q = 0$ ). Only in the Models (a) and (c), the fractional integration parameter is freely estimated. The estimated parameters, their standard errors, and significance are reported in Table 1. Also, additional information for each model such as the log-likelihood, the AIC, the range of the residuals, and the RMSE at 1- and 5-lag are shown.

TABLE 1

TABLE 1. Estimates of ARFIMA models with different orders where the response variable is the average daily radon measurements in Figure 1.

The results of the estimate models suggest that for all of them, the coefficients associated to the harmonic terms are significant, and the comparison with the models without the external covariates are worse (the results are not shown).

Examining the results of Models (a) and (c) where the parameter d is estimated varying in the real values, the fractional parameter for both models is between 0 and 1, thus allowing us to reject both the case of pure stationarity (I = 0) and the unit root model (I = 1). The estimated parameters are statistically significant at the 1% level and lie within the interval (0, 0.5). The confidence intervals for the estimated fractional-integration parameters are relatively narrow and always in the positive range of persistent long-memory.

The results show that Model (c) is the best model in terms of fitting since it has the lowest AIC and the shorter range of the residuals. For the RMSE at 1- and 5-lag forecast, Model (a) performs slightly better than Model (c); however, the fractional model has too simple parametrization, and it is not able to describe the autocorrelation dynamic in the data (see the residuals time series in Figure 3 and the diagnostics on the residuals Figures 4, 5).

FIGURE 3

FIGURE 3. Residual time series for the estimated Models (A–D) as labeled in Table 1.

FIGURE 4

FIGURE 4. Autocorrelation coefficients of the residuals of the estimated Models (A–D) as labeled in Table 1. The horizontal lines indicate the confidence interval at 95% for not significant correlation coefficient.

FIGURE 5

FIGURE 5. Partial autocorrelation coefficients of the estimated Models (A–D) as labeled in Table 1. The horizontal lines indicate the 95% confidence bounds for strict white noise.

FIGURE 6

FIGURE 6. Q‐Q norm of the residuals time series for the estimated Models (A–D) as labeled in Table 1 to check the normality assumption.

For all the estimated models, the assumption of constant variability along time appears respected (Figure 3), and there is not a marked pattern in the residuals, in particular the seasonality behavior in the original data is not present (Figure 1). The residual ACF (Figure 4) and PACF (Figure 5) of the fitted Models (a), (b), and (d) show that there are significant estimated correlation coefficients at short lags; therefore, these models are not adequate. Instead, for the Model (c), the residual ACF and PACF are not significant. The Ljung–Box test from 1 to 10 lags is computed to assess the absence of serial autocorrelation in the residuals. For the model (c), the null hypothesis is not rejected for all the considered lags. The q-q plot of the residuals is used to assess the normality assumption of the considered models. In Figure 6, for all the four models, there is a slight departure from the normality in the tails. Finally, the plots of observed and estimated values obtained with the four model at 1- and 5-lags are shown in Figures 7, 8 respectively.

FIGURE 7

FIGURE 7. Prediction at 1‐lag and observed radon measurements for the estimated Models (A–D) as labeled in Table 1.

FIGURE 8

FIGURE 8. Prediction at 5‐lag and observed radon measurements for the estimated Models (A–D) as labeled in Table 1.

4Discussion and Conclusions

Being sensitive to crustal stress, the soil radon discharge is widely considered as a promising earthquake precursor. Because of the influence of several environmental factors and local geological conditions, pre-seismic radon anomalies cannot be easily detected with conventional statistical methodologies.

The general approach is to model the observation and highlight the anomalies. This study tests a radon concentration time series, covering almost 3 years, for the presence of nonstationarity (seasonality and trend) and long memory. Overall, our results indicate that the radon series are better characterized as being stationary in the trend, but with persistent long-memory and 1-year seasonality. It is widely accepted that the periodic annual component in radon concentration time series is correlated to the climatic variables as temperature, atmospheric pressure, and rainfall (Siino et al., 2019a; Siino et al., 2019b; D’Alessandro et al., 2020).

The class of the ARFIMA model presented here provides a general framework for representing radon time series that display both short- and long-term persistence. The analysis of daily radon shows that the ARFIMA approach provides a better representation of the observed data with respect to the traditional ARMA and ARIMA models. More specifically, according to the model comparison, an ARFIMA model with an autoregressive term has a better fitting to the data. The estimated fractional integration parameters of this ARFIMA model is positive and smaller than 0.5 ( $d = 0.278$ ). It corresponds to a Hurst’s coefficient of H=0.778 that is consistent with the results obtained with the rescaled range analysis and in general with other literature results (Cuculeanu et al., 1996; Nikolopoulos et al., 2018). The proposed model has been also assessed in terms of its predictive capacity, however the performances are quite poor especially increasing the lag of prediction (moving from 1-day forecast to 5-day forecast).

The occurrence of long-range correlation in the time series has been also tested by the application of Detrended Fluctuation Analysis (DFA) (Höll et al., 2019). Also, this method indicates that the radon concentration can be considered as coming from a fractional Gaussian noise (fGn).

It is widely recognized that radon time series are strongly controlled by the combination between site-specific factors and large-scale variations (i.e., astronomical cycles) (Schery et al., 1984; Schumann et al., 1988; Aumento, 2002; Piersanti et al., 2015; Crockett et al., 2018). It is note worth that the proposed approach (based only on radon measurements) is able to describe with good reliability the data and also to perform short-term forecasts when accurate radon measurements are taken for a reasonably long time span.

Model residuals could be retrospectively compared with external evidence of transitory phenomena in the study area (seismic, meteorological, etc.). Having available the seismic catalogue of the area, we make an attempt to find the relationship between the found anomalies in the radon time series and the earthquakes. In this case study, there is no evidence between the residuals of the fitted model and the seismicity in the study area. However, it should be considered the absence of any relevant earthquakes during the observation period but only the occurrence of background seismicity. In fact, in spite of the well-known seismicity of the area, the larger recorded event was a $M_{w} = 3.9$ located at 9.5 km from the radon monitoring site.

In conclusion, our findings on the long-memory nature of radon measurements have important implications that can be useful for further analysis. The long-memory structure is the result of a long-lasting and aperiodic process such as a weather episode and changes in the circulation of geofluids, ground sealing. However, at this stage (i.e., a single time series), it is not possible to propose a comprehensive physical/geological or physical/meteorological mechanism that could account for the long memory in radon concentration time series; moreover, it would also be out of the purpose of this work. The extension of this methodology by applying the ARFIMA models to longer radon time series, or to series with a different measurement range, or recorded in other monitoring sites, could provide the missing hints.

The proposed approach represents an effective tool to analyze radon signals, and in particular to detect long-range memory in the times series, which are the necessary preliminary steps to explore the relationship between radon anomalies and seismic activity. Finally, it would be interesting for further analysis, to compare the forecast of radon observations, or the identification of preseismic anomalies with those obtained with other methods such as the linear regression analysis (Stojanovska et al., 2017), the artificial neural network approach (Pasini and Ameli, 2003), or the decision tree method (Zhang et al., 2020). It would also be interesting to consider other external covariates in the model formulation Eq. 7, such as weather variables (Stránský and Thinová, 2017).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author Contributions

AD and MS conceived the idea of the analysis approach. Data analyses and comments were made by MS. MS and SS prepared and reviewed the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We thank A. Piersanti e V. Cannelli (Istituto Nazionale di Geofisica e Vulcanologia) who provided the data.

References

Aumento, F. (2002). Radon tides on an active volcanic island: Terceira, azores. Geofisc. Int. 41, 499–505. doi:10.22201/IGEOF.00167169P.2002.41.4.501

CrossRef Full Text | Google Scholar

Barbosa, S., Donner, R., and Steinitz, G. (2015). Radon applications in geosciences—progress and perspectives. Eur. Phys. J. Spec. Top. 224, 597–603. doi:10.1140/epjst/e2015-02393-y

CrossRef Full Text | Google Scholar

Baskaran, M. (2016). Radon: a tracer for geological, geophysical and geochemical studies. New York, NY: Springer.

Google Scholar

Baykut, S., Akgül, T., İnan, S., and Seyis, C. (2010). Observation and removal of daily quasi-periodic components in soil radon data. Radiat. Meas. 45, 872–879. doi:10.1016/j.radmeas.2010.04.002

CrossRef Full Text | Google Scholar

Belbute, J. M., and Pereira, A. M. (2017). Do global co₂ emissions from fossil-fuel consumption exhibit long memory? A fractional-integration analysis. Appl. Econ. 49, 4055–4070. doi:10.1080/00036846.2016.1273508

CrossRef Full Text | Google Scholar

Beran, J. (2017). Statistics for long-memory processes. Upper Saddle River: Routledge.

Google Scholar

Bowers, M. C., and Tung, W.-w. (2018). Variability and confidence intervals for the mean of climate data with short-and long-range dependence. J. Clim. 31, 6135–6156. doi:10.1175/JCLI-D-17-0090.1

CrossRef Full Text | Google Scholar

Box, G. E., Jenkins, G. M., Reinsel, G. C., and Ljung, G. M. (2015). Time series analysis: forecasting and control. Hoboken: John Wiley & Sons.

Google Scholar

Cannelli, V., Piersanti, A., Galli, G., and Melini, D. (2018). Italian radon monitoring network (iron): a permanent network for near real-time monitoring of soil radon emission in Italy. Ann. Geophys. 61, 444. doi:10.4401/ag-7604

CrossRef Full Text | Google Scholar

Clauset, A., Shalizi, C. R., and Newman, M. E. (2009). Power-law distributions in empirical data. SIAM Review. 51, 661–703. doi:10.1137/070710111

CrossRef Full Text | Google Scholar

Conraria, L. A., and Soares, M. J. (2011). The continuous wavelet transform: a primer. NIPE Work. Paper. 16, 1–43. doi:10.1111/joes.12012

CrossRef Full Text | Google Scholar

Crockett, R. G., Gillmore, G. K., Phillips, P. S., Denman, A. R., and Groves-Kirkby, C. J. (2006). Tidal synchronicity of built-environment radon levels in the UK. Geophys. Res. Lett. 33, 1319–1334. doi:10.1029/2005GL024950

CrossRef Full Text | Google Scholar

Crockett, R. G., Groves-Kirkby, C. J., Denman, A. R., and Phillips, P. S. (2018). Significant annual and sub-annual cycles in indoor radon concentrations: seasonal variation and correction. Geol. Soc. Lond. Spl. Publ. 451, 35–47. doi:10.1144/SP451.2

CrossRef Full Text | Google Scholar

Cuculeanu, V., Lupu, A., and Sütö, E. (1996). Fractal dimensions of the outdoor radon isotopes time series. Environ. Int. 22, 171–179. doi:10.3390/environments6030029

CrossRef Full Text | Google Scholar

D’Alessandro, A., Scudero, S., Siino, M., Alessandro, G., and Mineo, R. (2020). Long-term monitoring and characterization of soil radon emission in a seismically active area. Geochem. Geophys. Geosyst. 21, e2020GC009061. doi:10.1029/2020GC009061

CrossRef Full Text | Google Scholar

Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: SIAM, Vol. 61.

Google Scholar

Dickey, D. A., and Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 74, 427–431. doi:10.1080/01621459.1979.10482531

CrossRef Full Text | Google Scholar

Donner, R. V., Potirakis, S. M., Barbosa, S. M., Matos, J. A., Pereira, A. J., and Neves, L. J. (2015). Intrinsic vs. spurious long-range memory in high-frequency records of environmental radioactivity. Eur. Phys. J. Spec. Top. 224, 741–762. doi:10.1140/epjst/e2015-02404-1

CrossRef Full Text | Google Scholar

Dunn, J., and Henschel, B. (1989). Statistical aspects of autoregressive-moving average models in the assessment of radon mitigation. Environ. Int. 15, 247–252. doi:10.16966/2576-6430.112

CrossRef Full Text | Google Scholar

Höll, M., Kiyono, K., and Kantz, H. (2019). Theoretical foundation of detrending methods for fluctuation analysis such as detrended fluctuation analysis and detrending moving average. Phys. Rev. 99, 033305. doi:10.1103/PhysRevE.99.033305

CrossRef Full Text | Google Scholar

Hosking, J. R. M. (1981). Fractional differencing. Biometrika 68, 165–176. doi:10.1093/biomet/68.1.165

CrossRef Full Text | Google Scholar

Hurst, H. (1951). Long-term storage capacity of reservoirs. Am. Soc. Civil Eng. 76, 770–799.

Google Scholar

Hurst, H., Black, R., and Simaika, Y. (1965). Long-term storage: an experimental study constable. London, UK: Constable.

Google Scholar

Mandelbrot, B. B., and Wallis, J. R. (1968). Noah, joseph, and operational hydrology. Water Res. Res. 4, 909–918. doi:10.1029/WR004i005p00909

CrossRef Full Text | Google Scholar

Mandelbrot, B. B., and Wallis, J. R. (1969). Robustness of the rescaled range r/s in the measurement of noncyclic long run statistical dependence. Water Resour. Res. 5, 967–988. doi:10.1029/WR005i005p00967

CrossRef Full Text | Google Scholar

Montanari, A., Rosso, R., and Taqqu, M. S. (1997). Fractionally differenced arima models applied to hydrologic time series: identification, estimation, and simulation. Water Res. Res. 33, 1035–1044. doi:10.1029/97WR00043

CrossRef Full Text | Google Scholar

Morales-Simfors, N., Wyss, R. A., and Bundschuh, J. (2019). Recent progress in radon-based monitoring as seismic and volcanic precursor: a critical review. Crit. Rev. Environ. Sci. Technol. 50, 1–34. doi:10.1080/10643389.2019.1642833

CrossRef Full Text | Google Scholar

Nikolopoulos, D., Matsoukas, C., Yannakopoulos, P., Petraki, E., Cantzos, D., and Nomicos, C. (2018). Long-memory and fractal trends in variations of environmental radon in soil: results from measurements in lesvos island in Greece. J. Earth Sci. Clim. Chang. 9, 1–11. doi:10.4172/2157-7617.1000465

CrossRef Full Text | Google Scholar

Pan, J.-N., and Chen, S.-T. (2008). Monitoring long-memory air quality data using arfima model. Environ. Off. J. Int. Environ. Soc. 19, 209–219. doi:10.1002/env.882

CrossRef Full Text | Google Scholar

Papacharalampous, G., Tyralis, H., and Koutsoyiannis, D. (2018a). One-step ahead forecasting of geophysical processes within a purely statistical framework. Geosci. Lett. 5, 12. doi:10.1186/s40562-018-0111-1

CrossRef Full Text | Google Scholar

Papacharalampous, G., Tyralis, H., and Koutsoyiannis, D. (2018b). Predictability of monthly temperature and precipitation using automatic time series forecasting methods. Acta Geophys. 66, 807–831. doi:10.1007/s11600-018-0120-7

CrossRef Full Text | Google Scholar

Pasini, A., and Ameli, F. (2003). Radon short range forecasting through time series preprocessing and neural network modeling. Geophys. Res. Lett. 30, 39. doi:10.1029/2002GL016726

CrossRef Full Text | Google Scholar

Piersanti, A., Cannelli, V., and Galli, G. (2015). Long term continuous radon monitoring in a seismically active area. Ann. Geophys. 58, 0437. doi:10.4401/ag-6735

CrossRef Full Text | Google Scholar

Pinault, J.-L., and Baubron, J.-C. (1996). Signal processing of soil gas radon, atmospheric pressure, moisture, and soil temperature data: a new approach for radon concentration modeling. J. Geophys. Res.: Solid Earth. 101, 3157–3171. doi:10.1029/95JB03121

CrossRef Full Text | Google Scholar

Reisen, V. A., Monte, E. Z., da Conceição Franco, G., Sgrancio, A. M., Molinares, F. A. F., Bondon, P., et al. (2018). Robust estimation of fractional seasonal processes: modeling and forecasting daily average so₂ concentrations. Math. Comput. Simulat. 146, 27–43. doi:10.1016/j.matcom.2017.10.004

CrossRef Full Text | Google Scholar

Roesch, A., and Schmidbauer, H. (2018). WaveletComp: computational wavelet analysis. R package version 1.1.

Google Scholar

Schery, S., Gaeddert, D., and Wilkening, M. (1984). Factors affecting exhalation of radon from a gravelly sandy loam. J. Geophys. Res.: Atmos. 89, 7299–7309. doi:10.1029/JD089iD05p07299

CrossRef Full Text | Google Scholar

Schumann, R., Owen, D., and Asher-Bolinder, S. (1988). “Factors affecting soil-gas radon concentrations at a single site in the semiarid western us,” in Proceedings of the 1988 EPA symposium on radon and radon reduction technology 2, Citeseer. Washington, DC, September 22–25, 1992.

Google Scholar

Shumway, R. H., and Stoffer, D. S. (2017). Time series analysis and its applications: with R examples. New York, NY: Springer.

Google Scholar

Siino, M., Alessandro, G., Buonmestieri, S., D’Alessandro, A., Mineo, R., and Scudero, S. (2019a). “Radon concentration in the ragusa province (sicily, Italy),” in International scientific conference man and Karst. Silicy, Italy, June 24–26, 2019.

CrossRef Full Text | Google Scholar

Siino, M., Scudero, S., Cannelli, V., Piersanti, A., and D’Alessandro, A. (2019b). Multiple seasonality in soil radon time series. Sci. Rep. 9, 8610. doi:10.1038/s41598-019-44875-z

CrossRef Full Text | Google Scholar

Stojanovska, Z., Ivanova, K., Bossew, P., Boev, B., Zora, S., Kolar, P., et al. (2017). Prediction of long-term in door radon concentration based on short-term measurements. Nucl. Technol. Radiat. 32, 77–84. doi:10.2298/NTRP1701077S

CrossRef Full Text | Google Scholar

Stránský, V., and Thinová, L. (2017). Radon concentration time series modelling and application discussion. Radiat. Protect. Dosim. 177, 155–159. doi:10.1093/rpd/ncx207

CrossRef Full Text | Google Scholar

Team, R. D. C. (2005). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Google Scholar

Toutain, J.-P., and Baubron, J.-C. (1999). Gas geochemistry and seismotectonics: a review. Tectonophysics 304, 1–27. doi:10.1016/S0040-1951(98)00295-9

CrossRef Full Text | Google Scholar

Udovičić, V., Filipović, J., Dragić, A., Banjanac, R., Joković, D., Maletić, D., et al. (2014). Daily and seasonal radon variability in the underground low-background laboratory in belgrade, Serbia. Radiat. Protect. Dosim. 160, 62–64. doi:10.1093/rpd/ncu109

CrossRef Full Text | Google Scholar

Veenstra, J. Q. (2013). Persistence and anti-persistence: theory and software. PhD thesis. Ontario (Canada): The University of Western Ontario.

Google Scholar

Wang, W., Van Gelder, P., Vrijling, J., and Chen, X. (2007). Detecting long-memory: Monte Carlo simulations and application to daily streamflow processes. Hydrol. Earth Syst. Sci. Discuss. 11, 851–862. doi:10.5194/hessd-3-1603-2006

CrossRef Full Text | Google Scholar

Woith, H. (2015). Radon earthquake precursor: a short review. Eur. Phys. J. Spec. Top. 224, 611–627. doi:10.1140/epjst/e2015-02395-9

CrossRef Full Text | Google Scholar

Yan, R., Woith, H., Wang, R., and Wang, G. (2017). Decadal radon cycles in a hot spring. Sci. Rep. 7, 12120. doi:10.1038/s41598-017-12441-0

CrossRef Full Text | Google Scholar

Yaya, O. S., and Fashae, O. A. (2015). Seasonal fractional integrated time series models for rainfall data in Nigeria. Theor. Appl. Climatol. 120, 99–108. doi:10.1007/s00704-014-1153-8

CrossRef Full Text | Google Scholar

Zhang, S., Shi, Z., Wang, G., Yan, R., and Zhang, Z. (2020). Groundwater radon precursor anomalies identification by decision tree method. Appl. Geochem. 121, 104696. doi:10.1016/j.apgeochem.2020.104696

CrossRef Full Text | Google Scholar

Keywords: autoregressive integrated moving average model, autoregressive fractionally integrated moving average model, continuous wavelet transform, radon, spectral analysis, forecast

Citation: Siino M, Scudero S and D’Alessandro A (2020) Stochastic Models for Radon Daily Time Series: Seasonality, Stationarity, and Long-Range Dependence Detection. Front. Earth Sci. 8:575001. doi: 10.3389/feart.2020.575001

Received: 23 June 2020; Accepted: 22 September 2020;
Published: 10 November 2020.

Edited by:

Francesco Italiano, National Institute of Geophysics and Volcanology, Italy

Reviewed by:

Kit Lai, Universiti Brunei Darussalam, Brunei
Dimirios Nikolopoulos, University of West Attica, Greece

Copyright © 2020 Siino, Scudero and D'Alessandro. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Salvatore Scudero, c2FsdmF0b3JlLnNjdWRlcm9AaW5ndi5pdA==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.