Exploring the Potential of DSCOVR EPIC Data to Retrieve Clumping Index in Australian Terrestrial Ecosystem Research Network Observing Sites

Vegetation foliage clumping significantly alters the radiation environment and affects vegetation growth as well as the water and carbon cycles. The clumping index (CI) is useful in ecological and meteorological models because it provides new structural information in addition to the effective leaf area index. Previously generated CI maps using a diverse set of Earth Observation multi-angle datasets across a wide range of scales have all relied on a single approach, the normalized difference hotspot and darkspot (NDHD) method. We explore an alternative approach to estimate CI from space using the unique observing configuration of the Deep Space Climate Observatory Earth Polychromatic Imaging Camera (DSCOVR EPIC) and associated products at 10 km resolution. The performance was evaluated with in situ measurements at five sites of the Australian Terrestrial Ecosystem Research Network (TERN) spanning a diverse range of canopy structures from short and sparse to dense and tall forest. The DSCOVR EPIC data can provide meaningful CI retrievals at the given spatial resolution. Obtaining independent yet comparable CI retrievals with a completely different sensor and a new approach is encouraging for the general validity and compatibility of foliage clumping retrievals from space. We also assessed the spatial representativeness of the five TERN sites with respect to a particular point in time (field campaigns) for satellite retrieval validation. Our results improve our understanding of product uncertainty both in terms of the representativeness of the field data collected over the TERN sites and its relationship to Earth Observation data at different spatial resolutions.

INTRODUCTION
The clumping index (CI) quantifies the level of foliage grouping within distinct canopy structures relative to a random distribution (Nilson, 1971; Chen and Black, 1992). It provides the best agreement between the transmittance through a clumped canopy and Beer's exponential transmission law (Nilson, 1971; Kuusk, 2018). CI equals unity when leaves are distributed completely at random. Canopy foliage is usually clumped into various subcanopy structures such as crowns, branches, and twigs; clumping is thus defined as the situation where CI < 1. Regularly distributed foliage results in CI greater than unity. Clumping affects the interception and distribution of solar radiation within a canopy (Chen et al., 2003; Hill et al., 2011; Wei et al., 2019). In addition, the distribution of foliar nutrients and canopy evapotranspiration (ET) were found to be significantly influenced by CI (Thomas et al., 2011). Both ground and satellite ET estimates are greatly underestimated if CI is not considered (Chen et al., 2016). CI is also an important parameter for accurate canopy-level gross primary production (GPP) modeling (Ryu et al., 2011; Chen et al., 2012). Global and regional-scale CI maps have been generated from a diverse set of Earth Observation multi-angle datasets: POLarization and Directionality of the Earth's Reflectances (POLDER) data at ∼6 km resolution; the Bidirectional Reflectance Distribution Function (BRDF) product from the Moderate Resolution Imaging Spectroradiometer (MODIS) at 500 m resolution (He et al., 2012; Wei and Fang, 2016; Jiao et al., 2018); and Multi-angle Imaging SpectroRadiometer (MISR) data at 275 m resolution (Pisek et al., 2013).
All the products listed above share a common feature: they estimate CI through a single approach, namely its empirical relationship with the normalized difference between the hotspot and darkspot (NDHD) (Leblanc et al., 2005a):
$$\mathrm{CI} = A \cdot \mathrm{NDHD} + B \qquad (1)$$
where A and B are coefficients determined by linear regression, based on a set of model simulations made with the 4-Scale model in Chen et al. (2005). The coefficients vary with the assumed crown shape and solar zenith angle [see Table 2 in Chen et al. (2005)]. The NDHD index is defined as:
$$\mathrm{NDHD} = \frac{\mathrm{HS} - \mathrm{DS}}{\mathrm{HS} + \mathrm{DS}} \qquad (2)$$
where HS and DS denote the canopy reflectance at the hotspot and darkspot, respectively (Leblanc et al., 2001). The hotspot corresponds to the backscatter peak where the solar illumination and view directions coincide, leading to minimum shading in that view direction. The darkspot lies in the direction opposite to the hotspot, where the maximum shadowed area is visible, leading to minimum reflectance.
This brief research report explores a new alternative to NDHD for estimating CI from Earth Observation data. The approach exploits unique observations and products from a satellite that is quite different from traditional polar-orbiting or geostationary satellites. The Deep Space Climate Observatory (DSCOVR) is positioned near the first Lagrange point (L1) and offers continuous observations of the full, sunlit side of the Earth. DSCOVR carries a spectroradiometer, the Earth Polychromatic Imaging Camera (EPIC), which provides spectral images of the entire sunlit face of the Earth in 10 narrow channels (317 to 780 nm) (Marshak et al., 2018) every 1 h and 2 h in summer and winter, respectively.
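For comparison with the EPIC-based approach introduced next, the conventional NDHD relationship, CI = A·NDHD + B with NDHD = (HS − DS)/(HS + DS), can be sketched in Python as below. This is a minimal illustration only: the function names are hypothetical, and the coefficients `a` and `b` are deliberately left as parameters because their values depend on crown shape and solar zenith angle and must be taken from the regression tables in Chen et al. (2005).

```python
def ndhd(hotspot: float, darkspot: float) -> float:
    """Normalized difference between hotspot (HS) and darkspot (DS) reflectance."""
    total = hotspot + darkspot
    if total <= 0.0:
        raise ValueError("reflectances must sum to a positive value")
    return (hotspot - darkspot) / total


def ci_from_ndhd(ndhd_value: float, a: float, b: float) -> float:
    """Linear NDHD-CI relationship, CI = A * NDHD + B.

    A and B are NOT provided here: they depend on crown shape and solar
    zenith angle and must come from 4-Scale-based regression tables
    (Chen et al., 2005).
    """
    return a * ndhd_value + b
```

Keeping the coefficients as explicit arguments makes it clear that the NDHD method inherits its accuracy from the model-derived regression, which is one motivation for exploring an independent retrieval.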
In this study, the CI retrievals from DSCOVR/EPIC product data are validated against available in situ measurements obtained with digital hemispherical photography (DHP), carried out at selected sites of the Australian Terrestrial Ecosystem Research Network (TERN; Lowe et al., 2016) that span a diverse range of canopy structures from short and sparse to dense and tall forest.

METHOD
DSCOVR EPIC Vegetation Earth System Data Record (VESDR) Product
The DSCOVR EPIC version 1 Vegetation Earth System Data Record (VESDR) provides Leaf Area Index (LAI) as well as diurnal courses of Sunlit Leaf Area Index (SLAI), Normalized Difference Vegetation Index (NDVI), the Fraction of incident Photosynthetically Active Radiation (FPAR) absorbed by the vegetation, and the Directional Area Scattering Function (DASF). The product, on a 10 km sinusoidal grid with a temporal frequency of 65-110 min, is generated from the upstream DSCOVR EPIC L2 MAIAC surface reflectance product (Lyapustin et al., 2018). With the exception of LAI, all VESDR parameters vary with the sun-sensor geometry. The VESDR files also include the Solar Zenith Angle (SZA), Solar Azimuthal Angle (SAA), View Zenith Angle (VZA), and View Azimuthal Angle (VAA) at the same temporal and spatial resolutions. A quality assessment variable (QA_VESDR) is also provided; only EPIC observations with the best quality flag (QA_VESDR = 0) were used for the CI retrieval in this analysis. Note that the DSCOVR EPIC VESDR product is currently released at a provisional quality level. The EPIC level 2 VESDR product and accompanying documentation are available from the NASA Langley Atmospheric Science Data Center (https://asdc.larc.nasa.gov/project/DSCOVR). The VESDR product data were downloaded through NASA's Open-source Project for a Network Data Access Protocol (OPeNDAP; https://opendap.larc.nasa.gov/opendap/).
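The observation screening can be illustrated with a short Python sketch. It assumes that the QA_VESDR flag and the view zenith angle for one 10 km pixel have already been read into arrays (for example from the OPeNDAP service); the function name and the ±5° tolerance are hypothetical choices for illustration, not settings from this study. The 57° target reflects the retrieval geometry used for CI, where the geometry factor G is close to 0.5.

```python
import numpy as np


def select_observations(qa_vesdr, vza_deg, target_vza=57.0, tol_deg=5.0):
    """Boolean mask of EPIC VESDR observations usable for the CI retrieval.

    Keeps best-quality records (QA_VESDR == 0) whose view zenith angle lies
    within +/- tol_deg of target_vza. The 5 degree tolerance is a hypothetical
    choice for illustration, not a value taken from the study.
    """
    qa = np.asarray(qa_vesdr)
    vza = np.asarray(vza_deg, dtype=float)
    if qa.shape != vza.shape:
        raise ValueError("QA and VZA arrays must have the same shape")
    return (qa == 0) & (np.abs(vza - target_vza) <= tol_deg)


# Example: four hypothetical observations over one 10 km pixel.
mask = select_observations([0, 0, 1, 0], [56.0, 40.0, 57.5, 60.0])
print(mask)  # [ True False False  True]
```

The mask can then be applied to the SLAI and LAI arrays of the same pixel before any retrieval is attempted.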

Foliage Clumping Retrieval With DSCOVR EPIC Data
The DSCOVR/EPIC VESDR products of sunlit leaf area index (SLAI) and leaf area index (LAI) allow estimation of the sunlit fraction of leaf area:
$$\mathrm{SF} = \frac{\mathrm{SLAI}}{\mathrm{LAI}} \qquad (3)$$
CI provides the best agreement between the directional uncollided transmittance through a clumped canopy, t_0(θ), and Beer's exponential transmission law, exp(−τ), which is applicable to completely randomly distributed leaves:
$$t_0(\theta) = \exp(-\tau), \qquad \tau = \mathrm{CI}\,\frac{G(\theta)\,\mathrm{LAI}}{\cos\theta} \qquad (4)$$
where G is the geometry factor as a function of the viewing direction θ. We approximate SF based on Beer's law (Warren Wilson, 1967), i.e.,
$$\mathrm{SF} = \frac{1 - \exp(-\tau)}{\tau} \qquad (5)$$
Using SF from Equation (3) allows us to solve Equation (5) for τ. Finally, CI can then be estimated from Equation (4) as:
$$\mathrm{CI} = \frac{\tau \cos\theta}{G(\theta)\,\mathrm{LAI}} \qquad (6)$$
The geometry factor may not always be precisely known, but G approaches a value of 0.5 at around 57° irrespective of the orientation of canopy elements (Ross, 1981; Jupp et al., 2009; Woodgate et al., 2015). We therefore adopt G = 0.5 in the CI retrieval and use as input VESDR products collected under a suitable sun-sensor geometry, i.e., observations with a view zenith angle of around 57°.
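A minimal Python sketch of the retrieval chain follows. It assumes SF = SLAI/LAI, the random-canopy sunlit-fraction form SF = (1 − exp(−τ))/τ, and CI = τ cos θ/(G·LAI) with G = 0.5 near θ = 57°. Because (1 − exp(−τ))/τ decreases monotonically from 1 toward 0, it can be inverted for τ by bisection. The function names and the bisection tolerance are illustrative assumptions, not the VESDR processing code.

```python
import math

G_57 = 0.5  # geometry factor near 57 degrees view zenith, as adopted in the text


def sunlit_fraction(tau: float) -> float:
    """Assumed Beer's-law form SF = (1 - exp(-tau)) / tau; expm1 keeps small tau accurate."""
    return -math.expm1(-tau) / tau


def solve_tau(sf: float, tol: float = 1e-12) -> float:
    """Invert SF = (1 - exp(-tau)) / tau for tau by bisection.

    The function falls strictly from 1 (tau -> 0) toward 0, so a unique root
    exists for 0 < SF < 1. At tau = 1/SF the function is below SF, which
    gives a valid upper bracket.
    """
    if not 0.0 < sf < 1.0:
        raise ValueError("sunlit fraction must lie in (0, 1)")
    lo, hi = 1e-12, 1.0 / sf
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sunlit_fraction(mid) > sf:
            lo = mid  # fraction still too high, so tau must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


def retrieve_ci(slai: float, lai: float, vza_deg: float = 57.0, g: float = G_57) -> float:
    """CI = tau * cos(theta) / (G * LAI), with tau solved from SF = SLAI / LAI."""
    sf = slai / lai
    tau = solve_tau(sf)
    return tau * math.cos(math.radians(vza_deg)) / (g * lai)


# Hypothetical inputs: LAI = 3.0 and SLAI = 1.4 from a screened observation near 57 degrees.
print(round(retrieve_ci(slai=1.4, lai=3.0), 3))
```

Bisection was chosen over Newton iteration because the bracket is known analytically and convergence is guaranteed for any valid SF.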

Study Sites and Data for Validation
Australia's Terrestrial Ecosystem Research Network (TERN) is a distributed research infrastructure providing intensive monitoring of the physical and chemical environment and the biological components of ecosystems (Karan et al., 2016). In situ measurements of CI at different heights were collected using towers at five of TERN's SuperSites, which together offer a diverse range of canopy structures from short and sparse to dense and tall forest. Their locations and vegetation characteristics are summarized in … The Warra site (Neyland et al., 2000) is located in southwestern Tasmania, Australia. It represents a tall E. obliqua wet forest with a rainforest understory and a dense man-fern (Dicksonia antarctica) ground layer. The forests around the Warra site had mature heights in excess of 55 m, and the tallest E. obliqua within the LTER reaches a height of 90 m. Both the Tumbarumba and Warra sites experienced bushfires in the last 2 years. Our site descriptions, in situ validation data, and retrievals with DSCOVR EPIC data all correspond to the pre-fire period.
The vertical profiles of CI (i.e., CI for all vegetation above the given height) were obtained by climbing scaffolding/flux towers and taking leveled digital hemispherical photos (DHPs) along the climbed height. At each profile, several series of DHPs were usually acquired with a Nikon CoolPix 4500 digital camera and a Nikon FC-E8 fisheye lens under diffuse illumination conditions, following the protocol of Zhang et al. (2005). No leaves were present directly above the camera to obscure its field of view. The towers were masked from the photos before the analysis. The reference DHPs were obtained above the top of the tree canopy. Gap fraction profiles were extracted from the blue channel at a view zenith angle of 57° with the DHP software (v4.5; Canada Centre for Remote Sensing, Ottawa, Canada). Various methods exist to estimate CI (see Gonsamo and Pellikka, 2009; Woodgate et al., 2015; Chianucci et al., 2019). The method of Leblanc et al. (2005b) was previously shown to provide reliable clumping estimates in both simulated and real canopies (Leblanc and Fournier, 2014; Woodgate et al., 2017; Yan et al., 2019):
$$\mathrm{CI}_{\mathrm{CLX}}(\theta) = \frac{n \ln \bar{P}(\theta)}{\sum_{k=1}^{n} \ln P_k(\theta) / \mathrm{CI}_{\mathrm{CC}k}(\theta)} \qquad (7)$$
where CI_CLX(θ) is the CI determined with the method of Leblanc et al. (2005b), CI_CCk(θ) is the CI of segment k obtained with the Chen and Cihlar (1995) method as corrected by Leblanc (2002), P_k(θ) is the gap fraction of segment k, n is the total number of segments (segment size = 15°), P̄(θ) is the mean gap fraction, and θ is the view zenith angle. The segment size was set to 15° because it produced the smallest error of the three segment sizes tested (15°, 45°, and 90°) in the virtual Eucalypt stand of Woodgate et al. (2017). Equation (7) is used to estimate CI at each climbed height.
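The final combination step of the CLX method, CI_CLX(θ) = n ln P̄(θ) / Σ_k [ln P_k(θ)/CI_CCk(θ)], can be sketched in Python as below. The sketch assumes that the per-segment gap fractions P_k and the per-segment gap-size-based values CI_CCk have already been computed (the gap-size analysis behind CI_CCk is not reproduced). It simply rejects segments with a gap fraction of exactly 0 or 1 rather than applying any special treatment, and the function name is hypothetical.

```python
import numpy as np


def clumping_clx(gap_fractions, ci_cc):
    """Combine per-segment results into CI_CLX for one view zenith angle.

    gap_fractions : P_k(theta) for each azimuthal segment k (0 < P_k < 1)
    ci_cc         : CI_CCk(theta), the gap-size-based clumping of each segment
    Returns n * ln(mean P) / sum_k(ln P_k / CI_CCk).
    """
    p = np.asarray(gap_fractions, dtype=float)
    cc = np.asarray(ci_cc, dtype=float)
    if p.shape != cc.shape or p.ndim != 1 or p.size == 0:
        raise ValueError("expected two equal-length, non-empty 1-D arrays")
    if np.any((p <= 0.0) | (p >= 1.0)):
        raise ValueError("this sketch requires 0 < P_k < 1 in every segment")
    if np.any(cc <= 0.0):
        raise ValueError("CC clumping values must be positive")
    n = p.size
    return float(n * np.log(p.mean()) / np.sum(np.log(p) / cc))


# Made-up values for two segments with random within-segment gaps (CI_CC = 1):
# unequal gap fractions between segments alone yield CI_CLX < 1.
print(clumping_clx([0.10, 0.50], [1.0, 1.0]))
```

With 15° segments, a full azimuth ring would contain 24 segments; the two-segment call above only illustrates how between-segment heterogeneity lowers the combined value.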

Spatial Representativeness Assessment
An analysis of surface heterogeneity representativeness (Román et al., 2009; Wang et al., 2017) was used in this study to determine whether direct "point-to-pixel" comparisons were appropriate for all validation sites. The method employs variograms of surface albedo, derived from shortwave near-nadir surface reflectances (0.25-5.0 µm) generated from cloud-free 30 m Landsat/Operational Land Imager (OLI) data (Román et al., 2009). To facilitate processing of the 10 km subsets, the Landsat imagery was resampled to 90 m spatial resolution. The OLI data were collected as close to the sampling date as possible. Where no valid imagery was available within a reasonable window of the sampling date, imagery from the corresponding season of a different year was used. The analysis therefore illustrates the representativeness of each tower site with respect to a particular point in time. When a measurement site is spatially representative, the overall variability among the internal (1.0 km) components (here Landsat pixel reflectances) of the measurement site and that of its adjacent landscape, corresponding to the satellite pixel footprint, should be similar in magnitude. The variogram estimator (the variance of the albedo values obtained from the resampled 90 m Landsat imagery at a given distance) usually levels off upon reaching the variogram range, i.e., the distance beyond which the values are no longer spatially correlated (e.g., Figure 1B, points). A site can be judged spatially representative with respect to a given footprint when the sill value (i.e., the ordinate value at the range where the variogram levels off to an asymptote) is <5.0e-04 (Román et al., 2009; Wang et al., 2017).
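The variogram-based screening can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it computes the classical semivariance γ(h) = ½·mean[(z(x) − z(x + h))²] only along image rows and columns, estimates the sill crudely as the mean semivariance over the last few lags instead of fitting a variogram model, and applies the 5.0e-04 threshold quoted above. The function names and the plateau rule are hypothetical, not the implementation of Román et al. (2009).

```python
import numpy as np

SILL_THRESHOLD = 5.0e-4  # representativeness threshold quoted in the text


def semivariogram(albedo, max_lag):
    """Classical semivariance gamma(h) for lags h = 1..max_lag (in pixels).

    Simplification: only pixel pairs along rows and columns are pooled,
    so diagonal directions are ignored.
    """
    z = np.asarray(albedo, dtype=float)
    if z.ndim != 2 or not 1 <= max_lag < min(z.shape):
        raise ValueError("need a 2-D grid and 1 <= max_lag < smallest dimension")
    gamma = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        dx = (z[:, h:] - z[:, :-h]).ravel()  # pairs separated along rows
        dy = (z[h:, :] - z[:-h, :]).ravel()  # pairs separated along columns
        d = np.concatenate([dx, dy])
        gamma[h - 1] = 0.5 * np.mean(d ** 2)
    return gamma


def is_representative(albedo, max_lag, plateau_lags=3):
    """Crude check: sill taken as the mean gamma over the last plateau_lags lags."""
    gamma = semivariogram(albedo, max_lag)
    sill = float(gamma[-plateau_lags:].mean())
    return sill < SILL_THRESHOLD, sill
```

For a 10 km footprint at 90 m resolution, the albedo subset is roughly 111 × 111 pixels, and lag h corresponds to a distance of h × 90 m.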

RESULTS
The spatial representativeness was evaluated at three footprint sizes: 1, 6, and 10 km (Figure 1). All five sites may be considered spatially representative at the smallest (1 km) footprint around the time of the in situ measurements, although the variogram did not reach a clear asymptote at Cumberland Plain (Figure 1B). Spatial heterogeneity, indicated by an increase in the sill value, increased with footprint size at all sites. Only two sites (Tumbarumba and Warra) remained spatially representative up to the 10 km nominal pixel resolution of the DSCOVR EPIC VESDR product (sill value < 5.0e-04).
The landscape heterogeneity within the DSCOVR EPIC VESDR pixel footprint was also reflected in the agreement with the in situ CI values at the different sites. Good agreement between the EPIC-derived CI value and the in situ measurements (i.e., EPIC CI retrievals intersecting the vertical profiles collected with DHP) was observed at the most homogeneous sites, Tumbarumba (Figure 2B) and Warra (Figure 2E). The EPIC CI values did not agree with the vertical CI profiles at Whroo (Figure 2C) and Wombat (Figure 2D). At the Cumberland Plain site, the EPIC CI value intersected the range of CI variation with height (Figure 2A).

DISCUSSION
This study explored the potential of using an alternative approach, along with unique observations and products from the Earth Polychromatic Imaging Camera (EPIC) onboard the Deep Space Climate Observatory (DSCOVR) satellite, to estimate the clumping index (CI).
First, it must be acknowledged that our measurements are limited to a single location (tower) and a single moment in time for the vertical profile at each site. Any factor that increases the variance of the gap fraction (e.g., canopy type and size, density, disturbances) implies that a larger number of samples is needed. The relatively coarse nominal resolution of the EPIC sensor (10 km) makes validation of the product with in situ data a particularly challenging exercise. Of the five TERN sites with in situ CI measurements included in this study, only Tumbarumba and Warra may be deemed spatially representative of the coarse EPIC nominal pixel footprint. It is encouraging that CI values obtained with EPIC data agreed well with the in situ measurements at these two sites (Figures 2B,E). It should be noted that the spatial representativeness approach used in this study does not include land cover or vegetation type information. Many modeling studies using flux tower data use a land cover classification layer and assess the proportions of the various classes to determine whether a site is representative. Using Landsat OLI data and variograms, as originally proposed by Román et al. (2009) and applied in this study, may provide more detail and capture variation that a land cover or vegetation type based evaluation would miss. It should also be noted that our spatial representativeness evaluations may be valid only for the indicated moments in time. Although all five sites are classified as broadleaf evergreen vegetation dominated by Eucalypts, they may still experience seasonal dynamics of vegetation growth and decay (Duursma et al., 2016). The spatial representativeness of individual sites may change accordingly across seasons. Additionally, the Tumbarumba and Warra sites experienced intensive bushfires in 2019 and 2020, which may have affected their current spatial representativeness as well.
We recommend that future studies follow this example and carry out spatial representativeness assessments that match the time when the in situ measurements are collected.
The best agreement was usually observed not with the in situ measurements acquired close to the ground (h = 2 m), but rather with those taken higher up in the canopy. Similar behavior was previously observed for CI estimates obtained with other EO sensors (Pisek et al., 2013, 2015), as satellite measurements respond primarily to the structural effects in the upper levels of canopies (Pisek et al., 2015). This effect is further exacerbated when observations are made at oblique angles (Biriukova et al., 2020), as in our study. Ground measurements may also be biased by lower vegetation/understory layers that make the foliage distribution more random. Indeed, an understory layer was present at Tumbarumba and Warra when the in situ measurements were taken and during the period used for the comparison with DSCOVR EPIC data (i.e., the pre-2019/2020 fire period). Lower shrubs are also present around Cumberland Plain, another site where the EPIC CI retrievals intersected the vertical profile of CI measured along the tower height (Figure 2A). This agreement might be purely coincidental and should be treated with caution, since Cumberland Plain was the least spatially representative site at the EPIC nominal resolution. The actual EPIC measurements used in this study may in fact come from an area twice as large, owing to the large oblique angles around the hinge region (Delgado-Bonal et al., 2020). The two sites in Victoria, Whroo and Wombat, were found to be spatially non-representative at the EPIC nominal resolution. Correspondingly, the EPIC CI estimates did not match the available in situ measurements (Figures 2C,D), presumably because the tower-based measurements did not capture the variability across the larger area within the EPIC pixel footprint. The general range of the in situ CI values reported in this study agrees with values reported from other Eucalyptus-dominated sites in Australia (Macfarlane et al., 2007; Woodgate et al., 2017).
Our exploratory study is pertinent to the ongoing efforts to map and incorporate clumping information in ecosystem modeling at different scales (Ryu et al., 2012; He et al., 2018). It is encouraging that, using a different approach and different EO data, we obtained good-quality results that are comparable to previous efforts to map CI from space (Pisek et al., 2013; Wei and Fang, 2016; Jiao et al., 2018). This general agreement between different retrieval strategies and input data sources is important for increasing the overall confidence in, justification of, and general validity of clumping information retrieved from space in the future.
As part of the analysis, we also assessed the spatial representativeness of the five TERN forest ecosystem sites for the validation of satellite retrievals, using a different approach from that of Griebel et al. (2020) and extending the analysis to the spatial resolution of the EPIC sensor. Our results improve our understanding of product uncertainty both in terms of the representativeness of the field data collected over the TERN sites and its relationship to Earth Observation data at different spatial resolutions.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
JP conceived the project, collected data, ran the data analysis and interpretation, and led the writing of the manuscript. AE carried out the spatial representativeness analysis. WW helped with the field data collection at Tumbarumba. SA, AE, EP, CS, TW, WW, and YK discussed the results and contributed to writing the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This study was supported by Estonian Research Council Grants PUT232, PUT1355, and Mobilitas Pluss MOBERC11. WW was supported by an Australian Research Council DECRA Fellowship (DE190101182). The OzFlux and SuperSite network was supported by the National Collaborative Research Infrastructure Strategy (NCRIS) through the Terrestrial Ecosystem Research Network (TERN). YK was supported by the NASA DSCOVR project under grant 80NSSC19K0762.