Stochastic Models for Radon Daily Time Series: Seasonality, Stationarity, and Long-Range Dependence Detection

This study detects the presence of seasonality, stationarity, and long-range memory structures in daily radon measurements from a permanent monitoring station in central Italy. The transient dynamics and the seasonality structure are identified by power spectral analysis based on the continuous wavelet transformation and a clear 1-year periodicity emerges. The stationarity in the data is assessed with the Dickey–Fuller test; the decay of the estimated autocorrelation function and the estimated Hurst exponent indicate the presence of long-range dependence. All the main characteristics of the data have been properly included in a modeling structure. In particular, an autoregressive fractionally integrated moving average (ARFIMA) model is estimated and compared with the classical ARMA and ARIMA models in terms of goodness of fit and, secondarily, of forecast evaluation. An autoregressive model with a noninteger value of the differencing parameter ( d = 0.278 ) resulted to be the most appropriate on the basis of the Akaike Information Criterion, the diagnostic on the residuals, and the root mean squared error. The results suggest that there is statistically significant evidence for not rejecting the presence of long memory in the radon concentration. The radon measurements are better characterized as being stationary, but with long memory and so, the statistical dependence decays more slowly than an exponential decay.


INTRODUCTION
The monitoring of soil radon ( 222 R n ) emission is a relevant topic for the risk that this radioactive gas poses to human health but also for its relationship with environmental and geological processes. The radon signals usually present a complex dynamic structure that is directly and indirectly influenced by several factors, such as environmental and climatic conditions of the site and characteristics of the ground soil, tide, and solar effect. (Pinault and Baubron, 1996;Piersanti et al., 2015;Siino et al., 2019b). All these factors have a different effect on the signal, as they can result either in a trend, seasonal, or stochastic component. For instance, climate or tidal forces reflect in a multiple seasonality of the radon time series: hourly, diurnal, multi-day, annual, and even multi-annual cycles have been detected in different studies worldwide (Crockett et al., 2006;Udovičić et al., 2014;Yan et al., 2017;Crockett et al., 2018;Siino et al., 2019b;D'Alessandro et al., 2020). Of particular interest is the role of Rn as a potential earthquake precursor Woith, 2015;Baskaran, 2016;Morales-Simfors et al., 2019) because the fracturing processes in the crust could enhance the mobility of Rn toward the surface (Toutain and Baubron, 1999;Woith, 2015). Similarly, anomalies can be the result of weather episodes which cannot be explained by meteorological variables. Whatever the cause, these anomalies can be masked within the signal, and a way to bring them to light would be to de-noise the signal from the trend and/or periodic components (Baykut et al., 2010;Siino et al., 2019b;D'Alessandro et al., 2020). As a matter of fact, it is a challenging task to untangle and properly quantify all of these effects on the radon fluctuations because Rn time series present generally a nonstationary behavior, not constant variability over time and a long-term memory .
Methodologically, time series analysis techniques are proper statistical tools to extract meaningful characteristics from data. Moreover, because long-term records of environmental variables show often long-range memory, some other tools are usually applied. The fractionally integrated moving average models [ARFIMA (p,d,q)] have been widely used in the literature to describe meteorological variables (Yaya and Fashae, 2015;Bowers and Tung, 2018), pollutants and soil gas (Pan and Chen, 2008;Donner et al., 2015;Belbute and Pereira, 2017;Reisen et al., 2018), and hydrological time series (Montanari et al., 1997;Wang et al., 2007). This class of models is used when the longterm correlations in the data decay more slowly than an exponential form, that is, a typical shape of autocorrelation in the autoregressive moving average [ARMA(p,q)] processes (Box et al., 2015). Furthermore, several studies investigate the predictability of the ARFIMA model assessing multi-step ahead performance with respect to others univariate time series forecasting methods such as a naive method, random walk (with drift), ARMA with trend and seasonality, and the exponential smoothing (Papacharalampous et al., 2018a;Papacharalampous et al., 2018b).
In the literature, the radon data have been described with different methods. Dunn and Henschel (1989) characterize a three-week record at an hourly frequency using simple autoregressive moving-average (ARMA) models. Later, the Box-Jenkins methodology often used in econometrics was applied to describe five-year-long radon time series considering a seasonal integrated, autoregressive moving averages model with exogenous variables (SARIMAX) also adding external covariates such as delayed atmospheric parameters (Stránský and Thinová, 2017). Donner et al. (2015) present complementary methods that have been applied for evaluating the presence of longrange correlations and fractal scaling in environmental radon measurements.
In this study, we analyze a 3-year-long radon concentration signal aiming at the assessment of a model which describes its dynamics with time series methodologies (Shumway and Stoffer, 2017). A comprehensive analysis of the seasonality structure is performed to detect clues about the stationarity and the presence of long-range memory in the data that could be related to geological processes. We estimate some ARFIMA models which explicitly consider simultaneously both the short-term and long-term correlation structures of the series. Moreover, we tested the forecast performance of the obtained models. The novelty of this analysis relies on the simultaneous estimation of seasonality and long-range memory in the estimation of proper ARFIMA stochastic models.

MATERIALS AND METHODS
In this section, the time series methods used in the analysis of daily radon measurements are described following Shumway and Stoffer (2017) and Beran (2017). We briefly present some tools useful to check in an observed time series the presence of nonstationarity (in terms of seasonality and trend) and longrange memory behaviors.
The seasonality behavior in the data is studied by power spectral density based on the time-averaged continuous wavelet spectrogram (Daubechies, 1992;Conraria and Soares, 2011). To properly apply the stochastic models, the daily radon time series is examined for the presence of stationarity. The Dickey-Fuller test (Dickey and Fuller, 1979) is used for this purpose to determine the presence of a unit root in an autoregressive model. The presence of long-range memory has been assessed on the data estimating the Hurst exponent (Hurst, 1951) and looking at the shape of the estimated autocorrelation coefficients for several lags. The presence of long-term memory can justify the estimation of the ARFIMA models, and their structure is also explained.

Spectral Analysis for Seasonal Detection
In this paragraph, we briefly describe the spectral analysis in the time-frequency domain based on continuous wavelet transformation following the notation in Daubechies (1992) and Conraria and Soares (2011).
The space L 2 ( ) is the set of square integrable functions satisfying +∞ −∞ g(t)| 2 dt < ∞ and denoted by the capital letter, G(t) the Fourier transformation of a given function, G(ω) +∞ −∞ g(t)e (−iωt) dt. A function ψ(t) ∈ L 2 (R) that satisfies the admissibility condition Ψ(0) +∞ −∞ ψ(t)dt 0 is called "mother wavelet," and a doubly indexed family ("wavelet daughters") is generated by scaling and translating ψ(·): ψ τ,s (t) |s| − 1/2 ψ t−τ s with s, τ ∈ and s ≠ 0. In this analysis, we use the wellknown, quite flexible, and complex-valued Morlet mother wavelet that takes the form ψ(t) π −1/4 e iωt e −t 2 /2 . The local wavelet power spectrum (WPS) based on the continuous wavelet transformation (CWT) of a given function g(t) ∈ L 2 ( ) with respect to the wavelet family where * represents the complex conjugate operation, s is the scale parameter controlling the wavelet width, and τ controls the wavelet location in the time domain. The wavelet power spectrum Eq. 1 can be interpreted as the local variance of the time series.
To do a comparison with the classical spectral method, the previous quantity can be averaged over time (τ) obtaining the global wavelet power spectrum, +∞ The peaks in the global power spectral density indicate the prevalent periods in the data. In this study, the wavelet transformation and the computation of the global power spectrum are computed with the WaveletComp package (Roesch and Schmidbauer, 2018) in R statistical software (Team, 2005).

Autocorrelation and Partial Autocorrelation Functions
Given a time series {y t }, the autocorrelation is the similarity between the observations as a function of the time lag between them. The j th order autocorrelation ρ(j) can be estimated by using the formula ρ j Cov y t , y t−j where and Var y t 1 n − 1 n t j+1 y t − y 2 (5) In Eqs. 4 and 5, y is the mean of y t , and 5 is just the special case of 4 in which j 0. The empirical autocorrelation function (ACF) is ρ(j) defined in Eq. 3, computed in the data as a function of the lag j.
Moreover, another way to characterize the relationship between {y t } and its lagged values is by the partial autocorrelation function, or PACF. The partial autocorrelation coefficient of order j, ρ (j) j measures the effect (linear dependence) of y t on y t−j after removing the effect of y t−1 , y t−2 , . . . y t−j−1 on both y t and y t−j . Each partial autocorrelation can be obtained as a series of regressions of the form: The empirical PACF of order J is computed by running 6 for j 1, . . . , J and retaining only the estimate ρ (j) j for each j. The shape of both the sample ACF and PACF provides a way to see which is the pattern of serial dependence, and it may help to suggest which kind of stochastic process would fit well the data.
If {y t } presents long-range memory, the correlation function 3 decays hyperbolically showing a power law distribution (Höll et al., 2019). Clauset et al. (2009) give an overview of the statistical methods that can be used to detect and characterize power law distribution in empirical data.

The Hurst Coefficient and the Rescaled Range (R/S) Method
The Hurst exponent (H) is an index of long-term memory of time series {y t } originally developed for hydrological data (Hurst, 1950;Hurst et al., 1965). It is defined in asymptotic terms of the rescaled range as E[R(n)/S(n)] Cn H as n → ∞, where n is the number of data points in a time series, E is the expected value, C is a constant, R(n) is the range of the first n cumulative deviations from the mean, and S(n) is their standard deviation.
We consider the rescaled range (R/S) method to estimate H (Mandelbrot and Wallis, 1968;Mandelbrot and Wallis, 1969).
Given y t , the mean is computed (m 1 n n i 1 y i ) and the mean adjusted series x t y t − m for t 1, 2, . . . , n. Then, the cumulative series is z t t i 1 x i , and the range series is is the mean for the time series values through time t. The following series of the ratio is considered (R t /S t ) for t 1, 2, . . . , n.
The Hurst exponent is estimated as the slope of the line between log[R t /S t ] and logt. The long memory structure exists when 0 < H < 1. If H ≥ 1, the process has infinite variance and is nonstationary. If 0 < H < 0.5, an antipersistence structure exists; if 0.5 < H < 1 the series is persistence, instead when H 0.5, the process is a white noise. Other methods have been proposed in the literature to detect the presence of long-range temporal correlations in the presence of nonstationary in the data, that is, the detrended fluctuation analysis (Höll et al., 2019).

Autoregressive Fractionally Integrated Moving Average Model
Environmental data, and also radon measurements, can exhibit characteristics consistent with long-range memory in time series . Such characteristics consist in a specific structure of the autocorrelation function of the process.
If {y t } presents long-range memory, correlation function 3 decays hyperbolically, rather than showing the exponential decay that is a characteristic of an ARIMA(p, 0, q) process. A way to characterize long-range dependence in observational data is by fitting autoregressive fractionally integrated moving average (ARFIMA(p, d, q)) models, which are a natural extension of the classic ARIMA(p, d, q) models (Hosking, 1981). The ARFIMA models allow for handling explicitly both the shortterm and the long-term correlation structures of a series. Let {y t } be a stationary process, an ARFIMA(p, d, q) process where p and q are integers and d is real, represented as where ∇ d is the fractional differencing operator respectively. Note that the binomial coefficient can be defined for for k∈N and an arbitrary d.
The row vector x t contains the exogenous variables, and in our analysis, they are harmonic terms used to describe the seasonality in the radon time series. The parameter d in Eq. 7 describes the high-lag correlation structure of a time series, while the p and q parameters are chosen to describe the low-lag correlation structure.
An important aspect to assess in a time series is its stationarity; a process is defined as stationary when its mean, variance, or autocorrelation structure remains constant over time. For stationary series, d ∈ (−0.5, 0.5), and the Hurst exponent associated with the process is given by H (2d + 1)/2. Consequently, long-range memory is present for d ∈ (0, 0.5), while d ∈ (−0.5, 0) indicates antipersistent fluctuations. When |d| > 0.5, the process is nonstationary and its variance is infinite. The process exhibits short memory for d 0, corresponding to stationary and invertible ARMA (autoregressive moving average) model. Instead, the arbitrary restriction of d to integer values corresponds to the standard autoregressive integrated moving average (ARIMA) model, and in this case, the variable is I(d), and it becomes stationary after d differences and it is nonstationary after day 1 differences. For instance, an I(1) variable can have a linear trend but no quadratic trend, and it can be transformed into a stationary series with the first-order differences. If a series exhibits long memory, it is neither stationary (I(0)) nor it is a unit root process (I(1)); it is an I(d) process, with d a real number.
There are statistical tests to check stationarity, named unit root tests. The results are traditionally interpreted as that the effects of one-time shocks to the series are either transitory (if the series is stationary), or permanent (if the series is not stationary). The Dickey-Fuller test (DF) (Dickey and Fuller, 1979) tests the null hypothesis that the series is nonstationary; however, DF only considers the dichotomy between stationarity and nonstationarity. The rejection of the null provides evidence for a stationary series, then the ARMA model can be directly    (d) is an integrated moving average model with order of integration equals to 1 (d 1). ϕ 1 is the estimate of the autoregressive coefficient in Eq. 7. β 1 and β 2 are associated to the harmonic terms (sin(2πtω) and cos(2πtω), where t is the time and ω 1/365) introduced in the model 7 as external variables xt to describe the observed seasonality at 365-day. The log-likelihood, the Akaike Information Criterion and the range of the model residuals are shown. The root mean square errors (RMSE) for the rolling forecasts at 1-and 5-lag are reported. The significance of the estimates is in terms of p-value.*p < 0.1; **p < 0.05; ***p < 0.01.  applied. Instead, if the null hypothesis is accepted, the series needs to be made stationary through differencing. The model in Eq. 7 is estimated with exact maximum likelihood estimation explained in Veenstra (2013) using arfima package of R statistical software (Team, 2005). Usually, the model selection is performed evaluating simultaneously the goodness of fit and the forecast performances. The assessment of the goodness of fit can be done using the Akaike Information Criteria, AIC 2k − 2ln(L), where k is the number of estimated parameters in the model Eq. 7 and L is the maximized value of the likelihood function for the estimated model, and the model with the smallest AIC value is preferred. Moreover, the assumptions of the model on the random component (ϵ t ) are checked assessing the constant variability, the normality assumption, and the absence of correlation structure in the model residuals ( ϵ t ).
The model family in Eq. 7 is fitted to time series data both to understand the data and to forecast (to predict future points in the series). Forecast evaluation can be done when the observed values are available. Usually, the observed data are divided into training and test samples. The model is fitted to the training sample, and then its k-step ahead forecast performance is evaluated on the test one. The root mean square errors (RMSEs) are used to check the forecast accuracy of the estimated models; it is given by RMSE n i 1 (y i −ỹ i ) 2 n −1 , where y i is the observed value for the ith observation andỹ i is the predicted one.

Data
The analyzed radon time series is recorded at Pietralunga (PTRL, Italy, lat 43.44 N and long 12.44 E) between 28/09/2012 and 01/ 08/2015 for a total of 1,038 days. The PTRL station is in a framework of near real-time monitoring of soil radon emission to study earthquake preparatory processes, the Italian radon monitoring network (IRON) (Cannelli et al., 2018). The selected station is equipped with a Lucas cell, an alpha scintillation detector with an acquisition window of about 2 h (115 min of data acquisition followed by a 5-min standby time). In detail, the Lucas cell consists of a flask in which the inner surface is coated with silver-activated zinc sulfide (ZnS). It integrates a front-end electronics and measures radon concentration by counting the radon decay signals in the given acquisition window. The radon detector is located in a  small room of a school basement, not disturbed by anthropogenic influences and without any kind of opening and/or aeration system. However, the pressure and the temperature could affect the radon measures. The PTRL site is characterized by a contained seasonal variability also if compared with other sites of the same network, even equipped with a borehole probe which should be more immune from such effects (c.f. Figure 2 in Siino et al. (2019b)). The radon concentration is measured in Bq/m 3 , becquerel per cubic meter. The raw time series of the mean daily concentrations is shown in Figure 1; the measured values range between 20.35 Bq/m 3 and 377.86 Bq/m 3 , and they show a clear seasonal signal connected with the temperature (Cannelli et al., 2018;Siino et al., 2019b), and the higher values are during the summer period.
The 0.87% of the daily data are missing. Generally, it is a challenge to handle missing values, especially for time series data. Two possible ways to deal with the incomplete data can be to omit the entire record that contains information or impute the missing values. However, since a small percentage of the analyzed data presents missing values, they are filled by the weighted moving average method with a semi-adaptive window of 4 days.
Weighted moving averages assign a linear weighting to the data points used to perform the imputation.
The data are divided into two subsets: the training set used to do the main analysis and to fit the stochastic models, and the test set (5% of the data identified by a vertical line in Figure 1) used to compare the models in terms of forecasts.

Seasonality, Stationarity, and Long-Memory Detection
We report the main results about the seasonality detection, the autocorrelation, and the long-memory analysis of the observed series. For the study of the dynamical and seasonal behaviors of the observed radon concentrations, we compute the spectral density analysis in the time-frequency domain based on the continuous wavelet transformation. The wavelet power spectrum is shown in Figure 2 where the period ranges from 16 to 512 days. The time-frequency regions with warm colors are characterized by high power, and the black lines indicate the significant maxima of the undulations of the wavelet power spectrum, and they give an indication of the permanent cycle period. The thick black contour  indicates the 90% confidence level, and the lighter shade indicates regions inside the cone of influence due to the border effect. Clearly, the series exhibits transient dynamics and the magnitude of the WPS is not constant over time fixing a specific frequency. We can observe a high value of the spectral power density at about 1-year periodicity that is persistently significant. Other high-power periodicities are present, even though not continuous over the entire period. A medium-power, ∼180-day cycle is recognizable in the first half of the series, and a ∼22-day cycle characterized by highpower appears around summer 2014. These cycles can be also observed in Figure 2 which shows the global wavelet power spectrum; the horizontal lines provide a reference at 180 and 365 days. The series show a clear 1-year periodicity and subordinate periodicities at about 180 days and three weeks. The longer cycles are probably related to the annual and semiannual cycles of the climatic variables (temperature, pressure, and rainfall), while the ∼22-day cycle is likely ascribable to the lunisolar gravitational influence which results in a tide effect on the flux or radon [see Siino et al. (2019b)]. This descriptive analysis is preparatory to decide which seasonality terms include in the model formulation for the explanatory variables (x t in Eq. 7). In particular, we consider harmonic terms to describe the 1-year periodicity that is the only one persistent with a constant power over time. Figure 2 shows the estimated curves considering a regression linear model fitted on the data with harmonic terms for a 365day period (with R 2 coefficient equals to 0.34).
The shapes of the correlogram and the partial correlogram provide indication about the properties of the time series and could indicate a plausible structure of the stochastic model in Eq. 7. In Figure 1, the estimated autocorrelation up to 400 lags seems to decay slower than an exponential one. Also from the autocorrelation, it is clear that the data exhibit a prevalent seasonal cycle which dominates the dependence structure. The partial autocorrelation coefficient is defined as the autocorrelation at each lag after controlling for the autocorrelation due to all preceding lags. It helps determine how many AR terms (i.e., lagged observations as predictors) should be included in Eq. 7. If there is a sharp drop in the PACF after p lags, then the previous p-values are responsible for the autocorrelation in the series, and the model should include p autoregressive terms. In our case, the highest and also significant value is at lag 1, with a value of the correlation equals to 0.792, and in the following lags (p > 1), the autocorrelation coefficients are close to zero. It indicates that an autocorrelation term (p 1) can be included in the model.  Table 1 to check the normality assumption.
Frontiers in Earth Science | www.frontiersin.org November 2020 | Volume 8 | Article 575001 The estimate of the Hurst exponent (H) with the rescaled range analysis (Section 2.3) is used to assess the presence of long memory. The obtained value is 0.785 indicating that the mean daily measurements have a persistent long-memory structure since 0.5 < H < 1. In the literature, there are several results consistent with our analysis. For instance, in Cuculeanu et al. (1996), the determined values of Hurst's coefficient (0.809) highlight a persistent behavior of the gas. Also, Nikolopoulos et al. (2018) compute the rescaled range analysis for several time intervals obtaining a persistent Hurst exponent between 0.7 and 0.9 and in some periods between 0.9 and 1.
The Dickey-Fuller test is used to check the null hypothesis that the series is nonstationary, and thus, the rejection of the null provides evidence for a stationary series. The value of the test on the data in Figure 1 is −5.285, and the value of the p-value (1.53e−07) is lower than the significant level α 0.05, and we can reject the null hypothesis that the series has a unit root and hence is not stationary. According to this result, for our data, integration (first-order differences) is not necessary.

Modeling Results
The obtained results indicate that the studied radon concentrations present persistent long-memory structure, 1year seasonality, and an absence of a trend. Also, according to the PACF, an autoregressive term can be appropriate to describe the short-term correlation.
Starting from these evidences, four models are estimated and compared, and all of them have the harmonic terms (sin(2πtω) and cos(2πtω)) as external covariates to describe the seasonality. Four candidate models are estimated: • Model (a) is a fractional model with p q 0, is an ARMA(1, 0), so it is an autoregressive model of order 1 without differencing (also it can be indicated as an ARIMA (1,0,0) model)  Table 1.
Frontiers in Earth Science | www.frontiersin.org November 2020 | Volume 8 | Article 575001 The Models (b), (c), and (d) have an autoregressive term (p 1) since doing several comparisons, a parsimonious model is obtained without moving average terms (q 0). Only in the Models (a) and (c), the fractional integration parameter is freely estimated. The estimated parameters, their standard errors, and significance are reported in Table 1. Also, additional information for each model such as the log-likelihood, the AIC, the range of the residuals, and the RMSE at 1-and 5-lag are shown.
The results of the estimate models suggest that for all of them, the coefficients associated to the harmonic terms are significant, and the comparison with the models without the external covariates are worse (the results are not shown).
Examining the results of Models (a) and (c) where the parameter d is estimated varying in the real values, the fractional parameter for both models is between 0 and 1, thus allowing us to reject both the case of pure stationarity (I 0) and the unit root model (I 1). The estimated parameters are statistically significant at the 1% level and lie within the interval (0, 0.5). The confidence intervals for the estimated fractional-integration parameters are relatively narrow and always in the positive range of persistent long-memory.
The results show that Model (c) is the best model in terms of fitting since it has the lowest AIC and the shorter range of the residuals. For the RMSE at 1-and 5-lag forecast, Model (a) performs slightly better than Model (c); however, the fractional model has too simple parametrization, and it is not able to describe the autocorrelation dynamic in the data (see the residuals time series in Figure 3 and the diagnostics on the residuals Figures 4, 5).
For all the estimated models, the assumption of constant variability along time appears respected (Figure 3), and there is not a marked pattern in the residuals, in particular the seasonality behavior in the original data is not present (Figure 1). The residual ACF ( Figure 4) and PACF ( Figure 5) of the fitted Models (a), (b), and (d) show that there are significant estimated correlation coefficients at short lags; therefore, these models are not adequate. Instead, for the Model (c), the residual ACF and PACF are not significant. The Ljung-Box test from 1 to 10 lags is computed to assess the absence of serial autocorrelation in the residuals. For the model (c), the null hypothesis is not rejected for all the considered lags. The q-q plot of the residuals is used to assess the normality assumption of the considered models. In Figure 6, for all the four models, there is a slight departure from the normality in the tails. Finally, the plots of

DISCUSSION AND CONCLUSIONS
Being sensitive to crustal stress, the soil radon discharge is widely considered as a promising earthquake precursor. Because of the influence of several environmental factors and local geological conditions, pre-seismic radon anomalies cannot be easily detected with conventional statistical methodologies. The general approach is to model the observation and highlight the anomalies. This study tests a radon concentration time series, covering almost 3 years, for the presence of nonstationarity (seasonality and trend) and long memory. Overall, our results indicate that the radon series are better characterized as being stationary in the trend, but with persistent long-memory and 1-year seasonality. It is widely accepted that the periodic annual component in radon concentration time series is correlated to the climatic variables as temperature, atmospheric pressure, and rainfall (Siino et al., 2019a;Siino et al., 2019b;D'Alessandro et al., 2020).
The class of the ARFIMA model presented here provides a general framework for representing radon time series that display both short-and long-term persistence. The analysis of daily radon shows that the ARFIMA approach provides a better representation of the observed data with respect to the traditional ARMA and ARIMA models. More specifically, according to the model comparison, an ARFIMA model with an autoregressive term has a better fitting to the data. The estimated fractional integration parameters of this ARFIMA model is positive and smaller than 0.5 (d 0.278). It corresponds to a Hurst's coefficient of H 0.778 that is consistent with the results obtained with the rescaled range analysis and in general with other literature results (Cuculeanu et al., 1996;Nikolopoulos et al., 2018). The proposed model has been also assessed in terms of its predictive capacity, however the performances are quite poor especially increasing the lag of prediction (moving from 1-day forecast to 5-day forecast).
The occurrence of long-range correlation in the time series has been also tested by the application of Detrended Fluctuation Analysis (DFA) (Höll et al., 2019). Also, this method indicates that the radon concentration can be considered as coming from a fractional Gaussian noise (fGn).
It is widely recognized that radon time series are strongly controlled by the combination between site-specific factors and large-scale variations (i.e., astronomical cycles) (Schery et al., 1984;Schumann et al., 1988;Aumento, 2002;Piersanti et al., 2015;Crockett et al., 2018). It is note worth that the proposed approach (based only on radon measurements) is able to describe with good reliability the data and also to perform short-term forecasts when accurate radon measurements are taken for a reasonably long time span.
Model residuals could be retrospectively compared with external evidence of transitory phenomena in the study area (seismic, meteorological, etc.). Having available the seismic catalogue of the area, we make an attempt to find the relationship between the found anomalies in the radon time series and the earthquakes. In this case study, there is no evidence between the residuals of the fitted model and the seismicity in the study area. However, it should be considered the absence of any relevant earthquakes during the observation period but only the occurrence of background seismicity. In fact, in spite of the well-known seismicity of the area, the larger recorded event was a M w 3.9 located at 9.5 km from the radon monitoring site.
In conclusion, our findings on the long-memory nature of radon measurements have important implications that can be useful for further analysis. The long-memory structure is the result of a long-lasting and aperiodic process such as a weather episode and changes in the circulation of geofluids, ground sealing. However, at this stage (i.e., a single time series), it is not possible to propose a comprehensive physical/geological or physical/meteorological mechanism that could account for the long memory in radon concentration time series; moreover, it would also be out of the purpose of this work. The extension of this methodology by applying the ARFIMA models to longer radon time series, or to series with a different measurement range, or recorded in other monitoring sites, could provide the missing hints.
The proposed approach represents an effective tool to analyze radon signals, and in particular to detect long-range memory in the times series, which are the necessary preliminary steps to explore the relationship between radon anomalies and seismic activity. Finally, it would be interesting for further analysis, to compare the forecast of radon observations, or the identification of preseismic anomalies with those obtained with other methods such as the linear regression analysis (Stojanovska et al., 2017), the artificial neural network approach (Pasini and Ameli, 2003), or the decision tree method (Zhang et al., 2020). It would also be interesting to consider other external covariates in the model formulation Eq. 7, such as weather variables (Stránský and Thinová, 2017).

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
AD and MS conceived the idea of the analysis approach. Data analyses and comments were made by MS. MS and SS prepared and reviewed the manuscript.