phenoC++: An open-source tool for retrieving vegetation phenology from satellite remote sensing data

Satellite-retrieved vegetation phenology has great potential for application in characterizing seasonal and annual land surface dynamics. However, obtaining regional-scale vegetation phenology from satellite remote sensing data often requires extensive data processing and computation, which makes the accurate and rapid retrieval of regional-scale phenology a challenge. To retrieve vegetation phenology from satellite remote sensing data, we developed an open-source tool called phenoC++, which uses parallel technology in C++. phenoC++ includes six common algorithms: amplitude threshold (AT), first-order derivative (FOD), second-order derivative (SOD), third-order derivative (TOD), relative change rate (RCR), and curvature change rate (CCR). We implemented the proposed phenoC++ and evaluated its performance on a site scale with PhenoCam-observed phenology metrics. The result shows that SOS derived from MODIS images by phenoC++ with six methods (i.e., AT, FOD, SOD, RCR, TOD, and CCR) obtained r-values of 0.75, 0.76, 0.75, 0.76, 0.64, and 0.67, and RMSE values of 21.36, 20.41, 22.38, 19.11, 33.56, and 32.14, respectively. Satellite-retrieved EOS by phenoC++ with six methods obtained r-values of 0.58, 0.59, 0.57, 0.56, 0.36, and 0.40, and RMSE values of 52.43, 46.68, 55.13, 49.46, 71.13, and 69.34, respectively. Using PhenoCam-observed phenology as a baseline, SOS retrieved by phenoC++ was superior to MCD12Q2, while EOS retrieved by phenoC++ was slightly inferior to that of MCD12Q2. Moreover, compared with MCD12Q2 on a regional scale, phenoC++-retrieved vegetation phenology yields more effective pixels. The innovative features of phenoC++ are 1) integrating six algorithms for retrieving SOS and EOS; 2) quickly processing data on a large scale with simple input startup parameters; 3) outputting phenology metrics in GeoTIFF format image, which is more convenient to use with other geospatial data. phenoC++ could aid in investigating and addressing large-scale phenology problems of the ecological environment.


Introduction
Vegetation phenology is a first-order control of ecosystem productivity that reflects and affects the physiological, physical, and chemical processes of vegetation ecosystems. It also plays a key role in energy exchange and affects the carbon balance of the earth system (Foley et al., 1996;Ganguly et al., 2010;Elmore et al., 2012;Zhang et al., 2020;Gao et al., 2021). It is an essential component of the earth's ecology (DeFries et al., 1995;Foley et al., 1996). In the context of global warming, characterizing seasonal and annual land surface dynamics is critical for monitoring climate change scenarios (Peng et al., 2017;Caparros-Santiago et al., 2021;Kollert et al., 2021).
Satellite remote sensing data with rich historical records and suitable spatial-temporal resolutions are frequently employed to monitor the variation of vegetation phenology (Ganguly et al., 2010;Peng et al., 2017;Ao et al., 2020;Peng et al., 2021). Satellite-retrieved phenology has great application potential in characterizing both seasonal and annual land surface dynamics (De Beurs and Henebry, 2008;Zhang et al., 2018), such as monitoring climate-vegetation interaction and extreme events, modeling carbon cycles, crop-type discrimination, crop-yield estimation, and land cover mapping (Walker et al., 2012;Elmore et al., 2017;Bolton et al., 2020;Peng et al., 2021). Most studies based on satellite remote sensing data (such as AVHRR, MODIS, and SPOT VGT) have developed related algorithms for retrieving regional-or global-scale phenological products, such as amplitude threshold (AT) (ORNL Distributed Active Archive Center, Fischer, 1994;Zhou et al., 2016), first-order derivative (FOD) (Yu et al., 2003), second-order derivative (SOD) (Sakamoto et al., 2005), third-order derivative (TOD) (Tan et al., 2011), relative change rate (RCR) (Piao et al., 2006), and curvature change rate (CCR) (Zhang et al., 2003). Previous researchers integrated one or more phenological retrieval algorithms into software toolkits such as TIMESAT (Jönsson and Eklundh, 2004), phenofit (Kong et al., 2022), and the phenor R package . These are of great significance and aid in vegetation phenology. However, these software tools frequently require complex parameters to be entered, are slow to process data at runtime, lack open-source code, or lack ground-based observations to evaluate their performance results.
In the past, few ground-based observational datasets of vegetation phenology were publicly available online, preventing researchers from accessing ground validation data to evaluate the performance of retrieving algorithms (Zhou et al., 2016). Recently, some vegetation phenology datasets obtained from ground-based observations have been directly downloaded from the internet, including United States of America National Phenology Network (USA-NPN) data resources, the Pan-European Phenological database (PEP725) (Templ et al., 2018), and the PhenoCam Dataset (Seyednasrollah et al., 2019). The PhenoCam Dataset mainly applies red-green-blue (RGB) digital cameras to record the timing of the specific phenophases of plants, which differs from the USA-NPN and PEP725, which are collated by human observation. The use of digital cameras to observe changes in the phenological states of vegetation may reduce the uncertainty caused by non-uniformity compared to traditional human observations (Menzel, 2002;Richardson et al., 2018) and may be more appropriate for evaluating the performance of satellitebased vegetation phenology.
Our study developed an open-source tool called phenoC++ for retrieving vegetation phenology. The innovation of this tool is threefold. First, it integrates six algorithms for retrieving data at the start of season (SOS) and end of season (EOS). Second, phenoC++ quickly processes data, and the input startup parameters are simple. Third, it outputs phenology metrics in GeoTIFF format images, which are more convenient to use with other geospatial data. Furthermore, in this study, we implement this open-source tool to retrieve vegetation phenology and evaluate its performance on both site and regional scales using PhenoCamobserved phenophases and existing MODIS phenology products.
2 Data and methods

The algorithms of phenoC++ for retrieving vegetation phenology
The open-source tool phenoC++ includes six methods that correspond to different algorithms for retrieving vegetation phenology from satellite remote sensing data (see Table 1. These six methods have been widely used to retrieve SOS and EOS from remote sensing time-series data such as EVI2, NDVI, and LAI. In this study, we employed the EVI2 time series data from MOD09Q1 as input data for phenoC++ and obtained the EVI2 time series data from MOD09Q1 using the following equation: where EVI2 denotes a two-band enhanced vegetation index, P N denotes the near-infrared band reflectance, and P R denotes the red band reflectance. We used the Savitzky-Golay (S-G) method (Savitzky and Golay, 1964) to remove the outliers of the EVI2 time series and interpolated the eight-day EVI2 time series to one-day intervals. We divided the EVI2 time series of a growing season into a growth period and a dormancy period and used Eq. 2 to fit them.
where t denotes the day of the year, y(t) denotes the EVI2 value on date t, and a, b, c, and d are fitting parameters. We used this data series when fitting the EVI2 of vegetation dormancy with Eq. 2.

Test region
We selected the conterminous United States as the test region; it has a rich distribution of vegetation types, such as a large number of deciduous broadleaf forests distributed in its east and west, and vast grassland, shrub, and some deciduous broadleaf forests distributed in its center. In addition, many digital camera observation stations for tracking vegetation phenology have been established in the conterminous United States, and data from these stations can be used to evaluate the performance of vegetation phenology extracted from satellite remote sensing data using phenoC++.

Frontiers in Environmental Science
frontiersin.org

MODIS data
The MODIS/Terra surface reflectance eight-day 250-m product (MOD09Q1, Version 6) (Vermote et al., 2015) was used to retrieve the vegetation phenology in the developed open-source tool. MOD09Q1 includes two surface reflectance bands (red band, 620-670 nm; near-infrared band, 841-876 nm) with eight-day temporal resolution and an approximately 250-m spatial resolution. The MODIS/Terra and Aqua Land Cover Dynamics Yearly L3 Global 500 m SIN Grid (MCD12Q2, Version 6), which provides vegetation phenology over global land surfaces , was used as reference data for evaluating the performance of the developed open-source tool. Both MOD09Q1 and MCD12Q2 are available from https://search. earthdata.nasa.gov/. We used the greenup layer (i.e., SOS) and the dormancy layer (i.e., EOS) of MCD12Q2 for analysis. In the MCD12Q2 product, SOS and the EOS would be obtained when EVI2 crosses 15% of the segment EVI2 amplitude for the first or last time, respectively.

PhenoCam data
We used PhenoCam Dataset v2.0 (Seyednasrollah et al., 2019) as validation data for evaluating the performance of vegetation phenophases extracted from the MOD09Q1 EVI2 time series by phenoC++. The PhenoCam sites were established in 2008 and use networked digital cameras to track vegetation phenology. There are TABLE 1 Description of the six algorithms for extracting the start-of-season (SOS) and end-of-season (EOS) from the EVI2 time series.

Method
Index Zhang et al. (2003) Note: EVI2s represents EVI2 at the periods of sustained increase for phenological cycles (i.e., spring): EVI2a represents EVI2 at the periods of sustained decrease for phenological cycles (i.e., autumn); t represents the day of the year in the EVI2 time series; K ′ s represents the rate of change of the curvature of the logistic-fitted EVI2 time series during the periods of sustained increase for phenological cycles (i.e., spring); K ′ a represents the rate of change of curvature of the logistic-fitted EVI2 time series during the periods of sustained decrease for phenological cycles (i.e., autumn) (for more details, please see Zhang et al. ,2003); max () and min () denote the maximum and minimum of the time series data, respectively; dt(), dt 2 (), and dt 3 () denote obtaining the first-order, second-order, and third-order derivatives of time series data, respectively.

Frontiers in Environmental Science
frontiersin.org 500 sites, particularly located in North America. The PhenoCam Dataset v2.0 (Seyednasrollah et al., 2019) can be obtained from https://phenocam.sr.unh.edu/webcam/. We used the phenology records of 28 deciduous broadleaf forest (DBF) sites, 43 shrubland (SHR) sites, and 38 grassland (GRA) sites from 2009 to 2018 for satellite-retrieved phenology validation (as shown in Figure 1). In the PhenoCam Dataset, three threshold values (i.e., 10%, 25%, and 50%) of the green chromatic coordinate index mean (G cc -mean) were employed to retrieve vegetation phenophases. Among them, the threshold value of 50% of the vegetation index time series is rarely used to retrieve vegetation phenophases from remote sensing data, while the SOS and EOS retrieved by threshold values of 10% and 25% of the vegetation index time series are similar (Ruan et al., 2021). Therefore, SOS and EOS of the PhenoCam Dataset v2.0, which were retrieved from the threshold values of 25% of G cc -mean amplitude, were selected for validation.

Performance of phenoC++ on a site scale
After obtaining the EVI2 time series from MOD09Q1, we used phenoC++ with the six methods to retrieve the vegetation phenology from 2009 to 2018 and evaluated the retrieved vegetation phenology using PhenoCam-observed phenology. Pearson correlation coefficient and root-mean-square error (RMSE) were used to evaluate the performance of vegetation phenology retrieved by the open-source tool. Figure 2 shows the scatterplots for the regression result between the PhenoCamobserved SOS and the satellite-retrieved SOS using multiple methods. In Figure 2, most points are around 1: 1, demonstrating that SOS derived from two independent datasets is consistent. Compared with PhenoCam-observed SOS, the SOS derived from MOD09Q1 EVI2 by phenoC++ with six methods obtained r-values of 0. 75,0.76,0.75,0.76,0.64,and 0.67,and RMSE values of 21.36,20.41,22.38,19.11,33.56,and 32.14, respectively. The TOD method produced the worst SOS, while the other methods performed similarly to each other. Figure 3 shows the scatterplots for the regression analysis between the PhenoCam-observed EOS and the satellite-retrieved EOS using phenoC++ with six methods. Most points in Figure 3 are also around 1: 1, but they are more discrete than the SOS scatter points, as shown in Figure 2. Compared with PhenoCam-observed EOS, the EOS derived from MOD09Q1 EVI2 by phenoC++ with six methods obtained r-values of 0. 58,0.59,0.57,0.56,0.36,and 0.40,and RMSE values of 52.43,46.68,55.13,49.46,71.13,and 69.34, respectively. The CCR method obtained the worst EOS, and the other methods had similar performance to each other. Figures 2 and  3 show that, compared with PhenoCam-observed phenology, the accuracy of satellite-retrieved SOS is slightly higher than that of satellite-retrieved EOS, which is consistent with previous results reported by Li et al. (2019). This may be caused by the different sensitivities of G cc and EVI2 for detecting vegetation growth  Frontiers in Environmental Science frontiersin.org changes. During vegetation growth, G cc and EVI2 both rapidly increase. EVI2 decreases slightly during vegetation dormancy, but G cc rapidly decreases if leaf color changes. Figure 4 illustrates the regression analysis of SOS/EOS obtained from MCD12Q2 and the PhenoCam Dataset. We used it to indirectly evaluate the performance of phenoC++. The regression analysis indicated that the SOS of MCD12Q2 achieved an r-value of 0.45 and an RMSE value of 25.22, and the EOS of MCD12Q2 achieved an r-value of 0.64 and an RMSE value of 50.67. Comparison with the PhenoCam Dataset found that the SOS of MCD12Q2 is unfavorable with the SOS retrieved by the phenoC++. The least desirable method of phenoC++ (TOD) obtained an r-value of 0.64, and its performance shows that it is better than the MCD12Q2 SOS, with an r-value of 0.45. For EOS, that of MCD12Q2 (r = 0.64; RMSE = 50.67) is better than the EOS obtained by phenoC++. Nevertheless, the SOS/EOS for retrieval by phenoC++ (N ≈ 420) obtained more effective pixels in the PhenoCam observation sites than MCD12Q2 (N = 288). Through the regression analyses between the phenoC++-retrieved phenology/MCD12Q2 with PhenoCam-observed phenology, we found an RMSE range of about 20-55 days. This is similar to the results of Xin, et al. [42], who used satellite-retrieved phenology for comparison with US National Phenology Network data (RMSE~25-55 days), and Li,et al. [43], who used 30 m fine-resolution satellite-retrieved phenology to compare PhenoCam-observed phenology (RMSE about 25 days; r of SOS is 0.66 and r of EOS is 0.43). The main reason for the large RMSE could be that the observation scale between the satellite and PhenoCam is inconsistent, and the observation of spectral difference between the satellite sensor and PhenoCam-camera. Figure 5 shows the profile of phenophases obtained from MOD09Q1 by phenoC++ with six methods and MCD12Q2. The average SOS estimated by the six methods and the AT and SOD methods were similar to the MCD12Q2 SOS. Compared with the MCD12Q2 SOS, the FOD and RCR methods were overestimated, while the TOD and CCR methods were underestimated. Meanwhile, the average EOS estimated by the six methods and the AT and SOD methods were similar to the MCD12Q2 EOS. Compared with the MCD12Q2 EOS, the TOD and CCR methods were overestimated, while the FOD and RCR methods were underestimated. Overall, the phenophases (SOS and EOS) estimated by phenoC++ were in line with those of MCD12Q2 with reliable performance.
3.2 The performance of phenoC++ on a regional scale Figure 6 shows the spatial distribution for SOS over the conterminous United States, including the MCD12Q2 SOS and the SOS obtained from the MOD09Q1 EVI2 time series by phenoC++ with six methods. The SOS values obtained from MOD09Q1 EVI2 (Figures 6B-G) are close to MCD12Q2 SOS ( Figure 6A) on a regional scale, while the former SOS is generally slightly earlier than the latter SOS in the southwest of the USA and slightly more delayed than the latter SOS in the northeast. Compared with Figure 6A, the SOS retrieved by phenoC++ shows high-value (red) pixels in North Dakota, South Dakota, (C) location of the profile. The average value of phenophases was calculated every 0.05°along the profile. MCD12Q2 denotes the phenophases obtained from the MCD12Q2 product; AT, FOD, SOD, RCR, TOD, and CCR denote the phenophases estimated from the MOD09Q1 EVI2 time series using corresponding methods, and "mean" denotes the average value of the phenophases estimated from the fused EVI2 time series by six methods.

Frontiers in Environmental Science
frontiersin.org Minnesota, Iowa, Illinois, Indiana, Ohio, and Pennsylvania, especially in the results from the FOD and RCR methods. From Figures 6A-G, the SOS in the southwest of the USA (less than 80 days) is earlier than in the northeast of the USA (more than 80 days, mostly more than 140 days). This shows that the vegetation season begins earlier in the southwest of the USA than that in the northeast. In addition, the MCD12Q2 SOS ( Figure 6A) shows more data gaps in the southwest of the USA, which is not shown in the SOS retrieved by phenoC++. This phenomenon could be caused by the MOCD12Q2 algorithm's inability to effectively retrieve phenological dates from remote sensing data in places with relatively sparse vegetation. Figure 7 shows the spatial distribution of the EOS over the conterminous United States, including the MCD12Q2 EOS and the EOS obtained from the MOD09Q1 EVI2 time series with phenoC++ with six methods. The EOS values obtained from MOD09Q1 EVI2 (Figures 7B-G) are close to MCD12Q2 EOS ( Figure 7A) on the regional scale, especially in Figures 7B and 7D. The EOS obtained from MOD09Q1 EVI2 with the FOD

Frontiers in Environmental Science
frontiersin.org ( Figure 7C) and RCR ( Figure 7E) methods are slightly earlier than the EOS of MCD12Q2, and the EOS obtained from MOD09Q1 EVI2 with the TOD ( Figure 7F) and CCR ( Figure 7G) methods are generally slightly delayed than the EOS of MCD12Q2. From Figures 7A-G, the EOS in the northwest of the USA (about 240 days) is earlier than that in the southeast (more than 300 days, mostly more than 320 days). This suggests that the vegetation season ends later in the southeast of the USA than that in the northwest. In addition, like Figure 6A, Figure 7A also has data gaps in the southwest of the USA.

Discussion
Previous research has provided phenology-retrieval tools, such as in the R package Kong et al., 2022) and the MATLAB platform (Jönsson and Eklundh, 2004). However, their stability and speed in large-scale vegetation phenology retrieval are not as effective as programs written in C++. This study provides a C++ compiled running program (phenoC++), which is compiled under the Linux system. If users need to run it under Windows, they must install the C++ and GDAL environments before compiling it. phenoC++ only

Frontiers in Environmental Science
frontiersin.org retrieves two commonly used and critical phenology metrics (i.e., SOS and EOS). If users need to retrieve other phenology metrics or add more algorithms, the source code of phenoC++ can be modified. We compared the performance of TIMESAT software for retrieving the phenology metrics to that of phenoC++. Areas of 802 × 642 pixels and 5164 × 4193 pixels (about the area of Texas) from MOD09Q1 were used to test the performance of TIMESAT software and phenoC++. The test computer was configured as follows. The computer system was Ubuntu 16.04, and the CPU model was Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50 GHz (52 threads) with 512 GB of computer memory. After data preprocessing, TIMESAT ran for approximately 2.5 min in the 802 × 642-pixel region, occupying 0.2 GB of memory, while in the 5164 × 4193pixel test, the running time was approximately 25 min and the memory occupied 0.4 GB. After data preprocessing, the phenoC++ test in the 802 × 642-pixel region took approximately 0.63 min and used 0.2 GB of memory, while the test for 5164 × 4193 pixels took approximately 24.5 min and used 4.8 GB of memory. This shows that the running speeds of the TIMESAT and phenoC++ are similar and that phenoC++ occupies more memory when the same number of CPU threads are used. This is because TIMESAT uses multiple steps to retrieve vegetation phenology, whereas phenoC++ is integrated, which is more user-friendly. Compared with TIMESAT, phenoC++ also has the following advantages. 1) phenoC++ outputs six algorithms' results for SOS or EOS, while TIMESAT only outputs threshold method results for SOS and EOS. 2) phenoC++ outputs phenology metrics in GeoTIFF format images with geographical coordinates, which are more convenient for use with other geospatial data, while TIMESAT outputs binary files.
3) The code of phenoC++ is open source, while TIMESAT is not. In addition, we also tested phenoC++ for retrieving vegetation phenology from MOD09Q1 EVI2 with 250-m spatial resolution over the USA (22,731 × 9774 pixels). The run time of phenoC++ was approximately 4.5 h, and the memory occupied was approximately 57.9 GB. Therefore, phenoC++ is an efficient and easy-to-use software tool.

Conclusion
Vegetation phenology is a first-order control on ecosystem productivity, so its accurate and rapid retrieval from satellite remote sensing data is key to understanding the feedback between the climate and the biosphere. We developed phenoC++, a tool that uses parallel C++ technology to retrieve start-of-season (SOS) and end-of-season (EOS) vegetation data from satellite time series data. Compared to traditional tools, the innovative features of phenoC++ include 1) integrating six algorithms for retrieving SOS and EOS; 2) rapid large-scale data processing with simple input startup parameters; 3) outputting phenology metrics as GeoTIFF format images, which are more convenient to use with other geospatial data.
We implemented phenoC++ to quickly and easily obtain the spatial distribution of SOS and EOS at 250-m spatial resolution over the conterminous United States using MODIS time-series data. We then evaluated phenoC++ performance for retrieving SOS and EOS on a site scale using PhenoCam Dataset v2.0. The results show that SOS derived from MODIS images by phenoC++ with six methods obtained r-values of 0. 75,0.76,0.75,0.76,0.64,and 0.67,and RMSE values of 21.36,20.41,22.38,19.11,33.56,and 32.14, respectively. The satellite-retrieved EOS by phenoC++ with six methods obtained r-values of 0. 58,0.59,0.57,0.56,0.36,and 0.40,respectively,and RMSE values of 52.43,46.68,55.13,49.46,71.13,and 69.34, respectively. Using PhenoCam-observed phenology as a baseline, SOS retrieved by phenoC++ outperforms MCD12Q2 SOS, while EOS retrieved by phenoC++ is slightly inferior to that of MCD12Q2 EOS. Moreover, compared to MCD12Q2, phenoC++-retrieved vegetation phenology yielded more effective pixels on a regional scale. phenoC++ can rapidly produce robust vegetation phenology on a large scale. The SOS and EOS spatial distribution information on vegetation is more easily accessible through phenoC++, which will help solve large-scale ecological phenology problems.

Data availability statement
The original contributions presented in the study are included in the article/supplementary materials, further inquiries can be directed to the corresponding author.

Author contributions
YR and BR conceived and designed phenoC++ with contributions from QX and XZ. YR developed the application programming by parallel C++; BR analyzed the data and interpreted the results. YR drafted the manuscript. All authors commented on and approved the final manuscript.