Insights On Streamflow Predictability Across Scales Using Horizontal Visibility Graph Based Networks

Streamflow is a dynamical process that integrates water movement in space and time within basin boundaries. The authors characterize the dynamics associated with streamflow time series data from about seventy-one U.S. Geological Survey (USGS) stream-gauge stations in the state of Iowa. They employ a novel approach called visibility graph (VG). It uses the concept of mapping time series into complex networks to investigate the time evolutionary behavior of dynamical system. The authors focus on a simple variant of VG algorithm called horizontal visibility graph (HVG). The tracking of dynamics and hence, the predictability of streamflow processes, are carried out by extracting two key pieces of information called characteristic exponent, {\lambda} of degree distribution and global clustering coefficient, GC pertaining to HVG derived network. The authors use these two measures to identify whether streamflow process has its origin in random or chaotic processes. They show that the characterization of streamflow dynamics is sensitive to data attributes. Through a systematic and comprehensive analysis, the authors illustrate that streamflow dynamics characterization is sensitive to the normalization, and the time-scale of streamflow time-series. At daily scale, streamflow at all stations used in the analysis, reveals randomness with strong spatial scale (basin size) dependence. This has implications for predictability of streamflow and floods. The authors demonstrate that dynamics transition through potentially chaotic to randomly correlated process as the averaging time-scale increases. Finally, the temporal trends of {\lambda} and GC are statistically significant at about 40% of the total number of stations analyzed. Attributing this trend to factors such as changing climate or land use requires further research.


Introduction
Transport of water in natural streams is one of the main components of the hydrologic cycle.
Like other natural systems, streamflow shows fluctuation over time. Intense rainfall events manifesting as peak streamflow and sometimes flooding, longer dry periods followed by low flows and droughts, snowmelt in higher latitudes after cessation of cold season are some examples illustrating streamflow fluctuations. Under changing climate, understanding increased demand of water or flooding issues requires the understanding of variability and predictability of underlying streamflow dynamics. The changing context of human and climate-induced changes in hydrologic cycle manifest into a new realm of streamflow predictability (Kumar, 2011) in addition to its traditional understanding in the context of water management and forecasting. In this study, the authors treat streamflow time-series as an output of the non-linear dynamical system and map it into complex networks using visibility-graph-based algorithm (e.g., Braga et al., 2016;Lacasa et al., 2012;Lacasa and Toral, 2010;Lacasa et al., 2008;Stephen et al., 2015).
Streamflow time series have been studied using different approaches, including Fourier transforms (e.g., Lundquist & Cayan, 2002), wavelet transforms (e.g., Coulibaly and Burn, 2004;Smith et al., 1998), chaos theory (e.g., Bordignon and Lisi, 2000;Porporato and Ridolfi, 1997) and stochastic modeling (e.g., Livina et al., 2003;Prairie et al., 2006;Wang, 2006). Most of these studies were motivated by the needs of streamflow forecasting but determining predictability lies on the ability to distinguish origin of underlying process (e.g. Lacasa and Toral, 2010). The distinction of the presence of low-dimensional chaos (determinism) or longrange randomness (stochasticity) in streamflow has long been a long standing issues that has not been comprehensively resolved.
The issue of mathematical and numerical modeling of river flow time-series to understand the predictability of streamflows requires distinguishing whether the underlying dynamics is deterministic or stochastic. In this direction, limited efforts have been documented in the hydrologic community. Porporato and Ridolfi (1997) and Bordignon and Lisi (2000) checked for the evidence of chaotic behavior considering non-linear dynamics measures such as phaseportrait of the attractor, largest Lyapunov exponent, correlation dimension and demonstrated that presence of a low-dimension chaotic (deterministic) component cannot be excluded. Bordignon and Lisi (2000) further verified the presence of chaotic behavior using the Deterministic versus Stochastic (DVS) algorithm proposed by Casdagli and Weigend (1993). Their study showed that non-linear river flow modeling enhances the predictability of streamflow. Other non-linear method i.e. Correlation Integral Analysis (CIA) was proposed by Grassberger and Procaccia (1983). Pasternack (1999) investigated the presence of low-dimensional chaos using CIA and showed that it is not possible to confirm its presence in streamflow time-series. However, the method could be a useful tool for evaluating model output characteristics. There has not been a clear consensus in the literature regarding identification of chaotic behavior in streamflow timeseries in terms of the methods applied, data size used and other factors used in their analyses (see e.g., Pasternack (1999), Koutsoyiannis (2006)).
To shed some more light into this debate, here we apply a new method, which provides a criterion that discriminates between chaotic and stochastic processes. In this method, any time series can be mapped into a network by using visibility graphs (e.g., Braga et al., 2016;Lacasa et al., 2012;Lacasa and Toral, 2010;Lacasa et al., 2008;Stephen et al., 2015). Such graphs have the ability to reveal many salient characteristics of the time series. Regardless of the type of the time series used, the constructed network, by definition, is connected, undirected, and invariant to affine transformations of the time series. Hence, the constructed network from a time series holds its inherent properties.
A simple variant to visibility graph is the horizontal visibility graph (HVG).
Mathematically, let [xi, i = 1, 2,…, N] be a time-series of N data, where i represents nodes. Two nodes i and j are connected if [xj, xj] > xk ∀ k| i k j . In other words, two nodes i and j in the graph are assumed connected, if one can draw a horizontal line in the time series joining xi and xj, which does not intersect any intermediate data height. A key feature of HVG graphs is that the nearest neighbors are visible to each other. Note that rescaling of horizontal and vertical axes, and horizontal and vertical translations do not affect the result of network obtained from HVG graphs. In Figure 1, we illustrate the concept of transforming time-series to a network using HVG. We use this approach to generate the network from streamflow time series to explore its underlying dynamics. horizontally visible from each other. For example, take node 4. We can draw a horizontal straight line between node 4 and node 2 that does not intersect the height of node 3. Thus, node 4 is connected to node 2. Similarly, node 4 is connected to nodes 3, 5 and 9, but not, for example to node 6, since there is no way we can draw a horizontal line without crossing the height of node 5. (b) The complete network. Lacasa and Toral (2010) employed HVG to characterize and distinguish between random and chaotic processes underlying a time series. They showed that time series maps to a network where degree i.e. the number of connections with other nodes have exponential distribution, thus enabling exact distinction between processes. It has been shown analytically (e.g., Braga et al. 2016;Lacasa and Toral 2010)  pre-processing can result in contradictory results and thus should be used with caution. In another study, Gonçalves et al. (2016) explored new ways to extract information from HVG using information theory. They showed that alternative distributions to degree distributions such as distance distribution and weight distribution can help extract efficient information especially for shorter time-series. Further, Serinaldi and Kilsby (2016) used a directed HVG to explore irreversibility, a signature of nonlinearity of streamflow through analysis of degree distributions.
In their comprehensive study, they used 699 unregulated daily time scale streamflow time-series across the conterminous United States (CONUS) to show that degree distributions have systematic sub-exponential behaviors of different strengths and quantified it through the information theory measures. Their findings show that streamflow dynamics are more complex than simple stochastic linear dynamics and irreversibility is a key feature.
In this work, we study unregulated streamflow time series at United States Geological Survey (USGS) stream gauge stations in the State of Iowa in U.S using networks derived from simple undirected HVG. Our objective is not just to explore the predictability of streamflow in terms of processes generating them but also to answer some questions not addressed by previous works. Specifically, we address the following questions: (1) What process underlies streamflow dynamics? (2) How does time resolution of streamflow time series data, as well as a normalization procedure, impact inference from HVG based networks? (3) Does the description of this process demonstrate spatial (basin) scale dependence? (4) Do the characteristics describing this process show temporal evolution?
This paper is organized as follows. In the methods section, we present the study area, data, and provide HVG application strategy across time-scales of streamflow time-series. In the results and discussion section, we seek answer(s) to the above questions based on results from our analysis. Finally, we draw conclusions from this work presenting avenues for further research on visibility-based network analysis of streamflow dynamics in hydrologic context.

Methods
We conducted this study in the domain of the State of Iowa with rivers draining to the Mississippi and Missouri Rivers. At present, around 140 USGS streamflow gauges monitor the streams and rivers providing data for this study. Figure 2 shows the spatial distribution of these stations over Iowa. The stream gauges measure drainage areas that range from about 7 km 2 to 37,000 km 2 . We considered USGS daily streamflow records with record length of at least 50 years. About seventy-one stations qualify this criterion, as depicted by dark green dots in Figure   2. There are about six stations out of seventy-one which are distant downstream from reservoirs.
However, we expect them to have minimal effect on our overall inference from hydrologic standpoint of streamflow predictability. Figure    From river network standpoint, about 65% of the state drains to the Mississippi River while about 35% of the state drains to the Missouri River (e.g., Ghimire et al., 2018;Ghimire et al., 2019). Most of the land use in the state is predominantly agricultural. The North-eastern part of the state can be described by deeply carved terrain, narrow valleys and relatively higher stream slopes while low-reliefs with relatively milder stream slope represent the rest of the state (e.g., Ghimire et al., 2019). As our goals include exploring the impact of time resolution of streamflow time series on our inference, we ought to use station with different record lengths. At daily time-scale, we used data from seventy-one stations with records of 50 years or longer.
However, for comparison of three time-scales: instantaneous (15 minutes), hourly, and daily, we use records between 2002 and 2018 as prior to 2002 only daily data are available. We adopted 15-minutes streamflow time series first and averaged them to generate subsequent hourly (four nodes) and daily scale (ninety-six nodes) streamflow time series for the entire state.
We constructed networks representing every year of the historical streamflow records for USGS stations described above. For each year there are 365 nodes represented by the index i = 1, 2, … ,365 such that N = 365 at daily time-scale. To avoid major seasonal trends between years, we normalized the time series using the mean and standard deviation of flows for each day of the year. We discuss later the impact of this normalization on inference derived from network. Let Xt(i) represent the flow in the year 't' on the day 'i'. Likewise, let xt(i) be the normalized flow for the year 't' on the day 'i'. Then, we define the normalization of time-series through equations (1) -(3).
(1) where ∑ (2) and ∑ (3) represent the mean and standard deviation of the flow respectively on the day 'i' obtained from the distribution over the years of historical records at the given station (n [50,115]). As an illustration, we present in Figure 4 envelope of the entire record of daily streamflow together with its mean and median (see Figure 4(a) and 4(c)) as well as the corresponding envelope of the normalized flows (see Figure 4(b) and 4(d)) for a large basin (16,800 km 2 ) of Cedar river at Cedar Rapids and a small basin (70 km 2 ) of Rapid Creek near Iowa City respectively. What is apparent is that the seasonality of flows clearly visible in the original series is much reduced in the normalized data. Next, we mapped the time series for every year using HVG to a complex network. An illustration of the network associated with two streamflow time series is presented in Figure 5. year associated with each node with their size representing the number of connections they make with their neighbors. Clearly, larger nodes of HVG derived network are associated with the major events. As Figures 5(a) shows, the nodes related to consecutive months (neighbors in the proximity) are connected to each other implying the information transfer during streamflow generation process in the form of antecedent soil moisture, baseflow, and others. We extracted two fundamental pieces of information from these networks to explain the underlying streamflow dynamics and hence, the streamflow predictability. First metric is the degree distribution. The degree, k corresponds to the number of connections a node can have with other nodes. An ensemble of nodes results in a distribution of k with P(k) representing its cumulative probability function i.e. P(K  k). There are documented efforts (e.g., Braga et al., 2016;Lacasa et al., 2012;Luque et al., 2009) illustrating that k resulting from HVG follows the exponential distribution of the form, ~ where λ is the decay parameter, also referred to as the characteristic exponent. For the purely random process, it has been analytically demonstrated (e.g., Braga et al. 2016;Lacasa and Toral 2010;Lacasa et al., 2012;Luque et al., 2009) that . For logistic map, an example of a deterministic chaotic time series, λ from HVG is equal 0.26 (see Lacasa and Toral, 2010). The decay parameter for the purely random process serves as the limit for describing underlying behavior of the streamflow process. If , the streamflow process is chaotic i.e. the dimensionality of the system is smaller. If , the streamflow is a stochastic process with some dependence structure, e.g. linear. In such a case, the process will have higher predictability that can lead to higher forecasting skill. For our analysis, we obtained the degree distribution for every year at each station.
The second information we extracted from the HVG derived network is called the global clustering coefficient, GC. It is a measure of likelihood of nodes to form clusters of tightly-knit groups. Because we explored only the clustering nature of the entire network rather than its local behavior, only the computation procedure of global clustering coefficient is discussed here. We with 1 corresponding to a full network of triangles. As GC approaches the value of 1, the network becomes more and more fully connected, which means that every node is connected to all other nodes, which in turn means that every node "sees" every other node, hence the process is perfectly linear.
To summarize: (a) If >0.41, then the process is a red noise (stochastic) process and thus linear. (b) If <0.41, then the process is a chaotic (nonlinear) process. GC is expected to be higher in the former cases than in the later cases. It follows that the estimation of  and GC provides insights on the predictability of the process in questions since a chaotic process is inherently more unpredictable that a linear stochastic process.
In the context of normalized flow of Cedar River at Cedar Rapids (see Figure 5(a)), the major events are less dominant in the entire network resulting in relatively simpler internal network structures. Consequently, the values of λ and GC are higher, and hence the higher streamflow predictability. The corresponding randomly shuffled time series in Figure 5(b), however, shows event-like signals (not true streamflow signal) dominating the entire network resulting in a random internal network structure. The resultant values of λ and GC are as expected smaller. In this study we computed GC for every year at each station in the same way as we did for the degree distribution.

Normalized Streamflow Time Series
In Figure 6 (Braga et al., 2016) suggesting that the streamflow process for the entire Iowa region at daily time-scale has its origin in the long-range correlated (persistence) stochastic process. Note that as λ increases, GC increases too, as expected when the process becomes more and more linear.

Natural (Raw) Streamflow Time Series
A rationale behind using normalized streamflow time series is to avoid potential seasonal trends in streamflow. Here, we perform similar analysis for λ and GC using natural and raw streamflow time series. In Figure 8, we present two-dimensional histograms with 1:1 relationship between metrics for raw and normalized streamflow for the pooled data for the state.
The distribution of λ around 1:1 line is almost symmetric as depicted by the percentage of pooled data. It illustrates that λ derived from HVG based network (see Figure 8(a)) is not overly sensitive (47% versus 53% toward normalized and raw data respectively) to the normalization of streamflow, at least at the daily time-scale. In Figure 8(b), we show the histogram for GC between normalized and raw streamflow data. Unlike λ, GC shows strong bias toward the normalized data (99%). As can be seen from the figure, the dynamic range of GC is much smaller compared to λ. The disparity in our inference from GC arises mainly owing to the disparity in complex internal structures of the network for two forms of streamflow time-series (see Figure 9). Figure 9 shows an example of the difference in network structure for Cedar River at Cedar Rapids for years 1904 and 2016. The natural streamflow data for the year 1904 (see Figure 9(b)) shows significant departure in the network internal structure from the year 2016 (see Figure 9(a)) mainly due to dominant base flow conditions in the network. Consequently, λ is enhanced while GC shows decline because the internal structure of the network is simplified. This is reflected in a large majority of GC obtained from natural streamflow depicting chaotic regime. It is at least illustrative of the fact that GC is more sensitive to the form of streamflow data in assessing the streamflow predictability based on HVG derived network.

Effect of Time Scale of Streamflow Time Series
With the availability of instantaneous (sampled every 15-minutes) streamflow data, we want to exploit its utility especially in the context of streamflow forecasting. Therefore, we devise an experiment to explore the variability of the predictability measure λ across three timescales viz.
15-minutes, hourly, and daily using the same set-up as described earlier for the daily time-scale streamflow time series with entire historical records. Note that we use the normalized streamflow for all three scales.
In Figure 10  respectively. The solid white dot inside each violin represents median while the thicker solid line inside represents interquartile range. Clearly, the daily time-scales have higher λ and higher GC indicating that averaging affects the dynamics (from chaotic at short time-scales to stochastic at longer time-scales).

Spatial and Temporal Dependence of λ
From hydrologic standpoint, it is important to understand the predictability of streamflow process across spatial scales. In Figure 11, we illustrate the relationship between λ and basin scale. Figure 11(a) depicts a clear spatial dependence of 〈 〉 emerging from the daily scale streamflow time-series. The solid black line represents the predictive model fit using power law which shows that the variability in streamflow predictability in terms of λ is explained by the basin scales. For detailed visualization of uncertainty in λ conditional on basin scales, we present violin plots in Figure 11(b). The uncertainty arises from variability over the historical periods of records conditional on windows of basin scales. Most of the point density of λ is around the median depicting the similar relationship as Figure 11(a). Clearly, larger the basin scales, longer the persistent behavior of streamflow (Ghimire and Krajewski, 2019) and hence the predictability of streamflow. This is supported by the fact that larger basin scales have been shown to display long-memory. This result has important implication for the water resources management and skillful streamflow forecasting. In addition, it is of interest to forecasting community to assess the evolutive trend of streamflow predictability measures λ and GC. To explore this trend, we fit a linear regression model using the ordinary least square method for each measure with t [years]. For λ, the model we fit is of the form: where a and b are intercept and slope of the model fit line respectively. Subsequently, we fit similar model as Eq. 5 to GC of the form: * * where a* and b* are intercept and slope parameters of linear model fit respectively employing ordinary least squares method. Figure Figure 12). Though the assumption of residuals distribution being normal is not ideal in this case, studies have shown that using the bootstrap regression also yields similar results (Braga et al., 2016). Moreover, a large majority of these stations are of scales larger than 1000 km 2 , which suggests that evolutive trend of streamflow predictability is more prominent at larger basin scales. Further, it is apparent at the central-eastern parts of the state which could be attributed to factors such as anthropogenic and climatic changes, and modifications to the base flow conditions. The explicit attribution of trend is beyond the scope of this study.

Summary and conclusions
We explored streamflow predictability across scales using seventy-one stations in the State of Iowa. Insights on the predictability of streamflow process were provided through the distinction between underlying stochastic and chaotic processes responsible for generating them, by estimating the characteristic exponent, λ from degree distributions and the global clustering coefficient, GC obtained from HVG derived complex networks. Our study answers some key questions set forth at the beginning of this paper pertaining to fundamentals of predictability of streamflow process from hydrologic standpoint.
 We showed that determining the predictability of streamflow process lies in the distinction between chaotic or stochastic processes. Our results based on HVG application to streamflow at daily time scale demonstrates that streamflow dynamics is a correlated stochastic process. The presence of correlated structure in streamflow dynamics indicates the potential for strong predictability of streamflow.
 The normalization of streamflow shows strong effect on the overall inference on predictability. We show that GC is more sensitive than λ to the form of streamflow data. for GC. We attribute this change in streamflow predictability across time-scales to change in the dynamics of the process itself as we average it over time.
 For hydrologic community, it is of interest to understand the predictability of streamflow process across spatial scales. We show a clear spatial scale dependence of λ for streamflow at the daily time-scale. In other words, larger the basin scales, the longer the persistent behavior of streamflow and hence, the predictability of streamflow. This result has important implication for water resources management and skillful streamflow forecasting.
 Finally, the forecasting community is always interested in assessing the evolutive trend of streamflow predictability. A simple linear regression-based evolution model fit shows that thirty-one stations show statistically significant trend in terms λ while twenty-six stations show statistically significant trend in terms of GC. The changes could have arisen from factors such as climate change induced activities, changing rainfall patterns, and land use patterns. The explicit attribution of the trend requires further research focusing solely on their impact to predictability of streamflow process.
As rainfall is the key agent of flooding in Iowa, it should be interesting to explore the effect of rainfall on predictability of streamflow. It is clear that at small scales, rainfall and streamflow are connected stronger than at large scales, where water transport separates the two. Water transport is in the river network where streamflow aggregation process plays an important role in shaping streamflow fluctuations (e.g., Ayalew et al., 2014aAyalew et al., , 2014bAyalew et al., , 2015. We neglected the effects of streamflow measurement errors as the USGS data are considered to be accurate within 5% (e.g., Ghimire and Krajewski, 2019). For our analysis, we expect the measurement uncertainty to have minimal impact on the overall inference. We performed a fundamental study on predictability of streamflow process through HVG based complex networks. We believe, our findings provide hydrologic context of interpreting underlying dynamics of streamflow process. Our analysis captures a wide range of spatial scales; hence results are deemed adequate in representing hydrologic processes across scales. Though nonlinearity is known to exist in the hydrologic process, our study does not explicitly explore nonlinearity from streamflow signals using HVG based approach. Therefore, exploiting this aspect of streamflow process and impact on predictability and incorporating them in our hydrologic modeling strategy is open to further research.
Though we implemented our analysis to Iowa, we believe that the streamflow process will demonstrate similar predictability across scales in regions with similar landscapes and climatology such as Upper Mississippi and Ohio River basins (Schilling et al., 2015) when derived from HVG based complex networks. Given that the hydro-climatologic conditions, landscapes, and base flow conditions are quite similar, the inference on streamflow dynamics across spatio-temporal scales is expected to be similar. Though we did not report it in this paper, one could separately perform the effect of time resolution (time-scale) on standard chaotic maps, which could validate our results in standalone approach.