Spatial and Temporal Resolution Improvement of Actual Evapotranspiration Maps Using Landsat and MODIS Data Fusion

Producing daily actual evapotranspiration (ETa) maps with high spatial resolution has always been a challenge for remote sensing research. This study assessed the feasibility of producing daily ETa maps with a high spatial resolution (30 m) for the sugarcane farmlands of Amir Kabir Sugarcane Agro-industry (Khuzestan, Iran) using three different scenarios. In the first scenario, the reflectance bands of Landsat 8 were predicted from the moderate resolution imaging spectroradiometer (MODIS) imagery using the spatial and temporal adaptive reflectance fusion model (STARFM) algorithm. Also, the thermal bands of Landsat 8 were predicted by the spatiotemporal adaptive data fusion algorithm for temperature mapping (SADFAT). Then, ETa amounts were calculated employing such bands and the surface energy balance algorithm for land (SEBAL). In the second scenario, the input data needed by SEBAL were downscaled using the MODIS images and different methods. Then, using the downscaled data and SEBAL, daily ETa amounts with a spatial resolution of 30 m were calculated. In the third scenario, ETa data acquired by MODIS were downscaled to the scale of Landsat 8. In the second and third scenarios, downscaling of the data was carried out by the ratio, regression, and neural networks methods with two different approaches. In the first approach, the Landsat image on day 1 and the relationship between the two MODIS images on day 1 and the other days were used. In the second approach, the simulated image on the previous day and the relationship between the two consecutive images of MODIS were used. Comparing the simulated ETa amounts with the ETa amounts derived from Landsat 8, the first scenario had the best result with an RMSE (root mean square error) of 0.68 mm day−1. The neural networks method used in the third scenario with the second approach had the worst result with an RMSE of 2.25 mm day−1, which was however a better result than the ETa amounts derived from MODIS with an RMSE of 3.19 mm day−1. The method developed in this study offers an efficient and inexpensive way to produce daily ETa maps with a high spatial resolution. Furthermore, we suggest that STARFM and SADFAT algorithms have acceptable accuracies in the simulation of reflectance and thermal bands of Landsat 8 images for homogeneous areas.


INTRODUCTION
Evapotranspiration (ET) is a major component of global water cycle (Sobrino et al., 2021). Precise estimation of actual evapotranspiration (ET a ) on different temporal and spatial scales is necessary for various applications such as natural resources and agricultural management, water resources management, irrigation planning, and soil and crop modeling (Steiner et al., 1991;Nassar et al., 2021). Given the limited number of weather stations and high cost and time required for collecting ground data, using remote sensing techniques can be useful for determining ET a , if a good output accuracy is guaranteed. One of the most common methods for determining ET a using remotely sensed data is the surface energy balance algorithm for land (SEBAL) (Bastiaanssen et al., 2002;Bastiaanssen et al., 2005;Li et al., 2021). Hafeez et al. (2002) compared the results of daily ET a calculated by SEBAL using advanced spaceborne thermal emission and reflection radiometer (ASTER) data with ET o (reference ET calculated by the FAO Penman-Monteith method), ET pan (ET measured at weather stations), and ET c (ET calculated having the crop coefficient K c 0.9) and indicated that SEBAL is of a good accuracy. In another study, Chandrapala and Wimalasuriya (2003) compared ET a resulted from SEBAL with ET a measured by a scintillometer and showed that the differences between them in 10-days and monthly periods were 17 and 1%, respectively. These studies concluded that SEBAL is of a good performance in estimating ET a .
A balance is established between spatial, temporal, and spectral resolutions in the designing process of satellites (Emelyanova et al., 2013;Zhu et al., 2018;Bai et al., 2020;Moreno-Martinez et al., 2020). Nevertheless, due to technical limitations, most satellites are not able to collect images with high temporal, spatial, and spectral resolutions simultaneously (Ghassemian, 2009;Bai et al., 2015). Many of satellites images of high spatial resolution have lower temporal resolution compared to those with low to average spatial resolutions (Emelyanova et al., 2013;Zhu et al., 2018;Bai et al., 2020). For example, due to their high spatial resolution, Landsat images are used for many environmental applications i.e., production of ET a maps Ping et al., 2018). However, there is a huge limitation in using the Landsat imagery for monitoring dynamic changes over land surfaces due to a 16days temporal resolution and the probability of cloud cover (Gao et al., 2006;Roy et al., 2008;Ping et al., 2018). On the other hand, the moderate resolution imaging spectroradiometer (MODIS) imagery has a daily temporal resolution, but its low spatial resolution limits its performance in environmental applications (Zhu et al., 2010;Gevaert and García-Haro, 2015;Ping et al., 2018). Thus, it has not so far been possible to monitor important environmental phenomena like ET a with the same sensor and high spatial and temporal resolutions. In order for an accurate spatial and temporal estimation of environmental ET a , image fusion methods are usually employed (Ha et al., 2013;Ping et al., 2018). In the process of image fusion, images acquired from two or more sensors are used as input data, and an output that has more information compared to the input images is derived (Roshan et al., 2015;Malhotra et al., 2021). This process of spatial resolution enhancement of remotely sensed data by fusion of separate data is called downscaling (Atkinson, 2013). For instance, downscaling of MODIS images using Landsat data could create images with a spatial resolution of 30 m and a temporal resolution of 1 day (Ke et al., 2016).
To date, various methods have been used and assessed for downscaling remotely sensed data in order to obtain ET a maps with high temporal and spatial resolutions. The downscaling can be conducted in three levels as downscaling of bands, downscaling of input data required for energy balance algorithms, and downscaling of ET a products. For instance, Gao et al. (2006) introduced an algorithm for combining images acquired from the enhanced thematic mapper plus (ETM+) and MODIS in order to achieve images with better clarity and higher temporal and spatial resolutions. They employed the spatial and temporal adaptive reflectance fusion model (STARFM) for prediction of reflectance bands in ETM+. They also used MODIS images at time 1 (the day on which Landsat images are available) and time 2 (the day on which Landsat images are not available) as well as ETM + images at time 1 to predict reflectance bands of ETM+ (i.e., bands 1, 2, 3, 4, 5, 7) at time 2. They compared the downscaled images with the ETM + images and demonstrated that the results were good.
Given the high accuracy of STARFM, it can be used for simulation of Landsat reflectance bands in order to monitor ET a . However, for achieving ET a maps with high temporal and spatial resolutions, thermal bands are also required. Hong et al. (2013) produced high-resolution daily ET a images using two different approaches. In the first approach, they obtained input data for SEBAL (e.g., surface albedo coefficient, normalized difference vegetation index [NDVI] and land surface temperature [LST]) from MODIS, and then downscaled these data to the spatial resolution of Landsat 7 using the subtraction and regression methods and obtained ET a using the downscaled data. In this approach, they calculated a mean absolute difference (MAD) of 0.55 mm day −1 for the subtraction method and 0.54 mm day −1 for the regression method. In the second approach, they obtained ET a using MODIS images and SEBAL, and then downscaled ET a obtained from MODIS to the spatial resolution of Landsat 7 using the subtraction and regression methods. In this approach, a MAD of 0.53 mm day −1 was calculated for the subtraction method and 0.57 mm day −1 for the regression method. Brindhu et al. (2013) conducted a study in Tamil Nadu and parts of the Thamiraparani River Basin (India) with a total area of 5,665 km 2 and downscaled LSTs obtained from MODIS with a spatial resolution of 960 m to the scale of Landsat 7 with a spatial resolution of 60 m using the Nonlinear DisTrad method. They compared the LSTs downscaled using the Nonlinear DisTrad method with the LSTs downscaled using the sharpening thermal (TsHARP) method and the LSTs obtained from Landsat 7. In their study, RMSEs were 0.96°K for the For monitoring ET a , reflectance bands are also required. Weng et al. (2014) modified STARFM in order to take into account the annual temperature cycle (ATC) and developed the spatiotemporal adaptive data fusion algorithm for temperature mapping (SADFAT) for downscaling thermal data. They attained a coefficient of determination (R 2 ) of 0.87 between the downscaled and Landsat thermal data. In another study, Mahour et al. (2017) evaluated the effect of downscaling of LSTs by the Co-Kriging method on the estimation of ET a for a farmland located in Qazvin (Iran). They used two approaches for downscaling. In the first approach, the LSTs obtained from MODIS with a spatial resolution of 1,000 m were downscaled to a spatial resolution of 250 m by the Co-Kriging method. They used the outcome and applied the surface energy balance system (SEBS) algorithm for estimating daily ET a with a spatial resolution of 250 m. In the second approach, they downscaled the ET a data obtained from MODIS with a resolution of 1,000 m using SEBS to a spatial resolution of 250 m using the Co-Kriging method. They used the LSTs obtained from Landsat 8 for accuracy assessment. Their results illustrated that a difference of 2.7o K between the LSTs obtained from Landsat 8 and the downscaled LSTs exists. Further, ET a calculated by SEBS and Landsat 8 was 5.76 mm day −1 and the downscaled ET a was 5.57 mm day −1 . The RMSE between the reference ET a and the downscaled ET a was 1.26 mm day −1 , and the RMSE between the reference LSTs and the downscaled LSTs was 3.67o K. They concluded that LST has a great impact on the estimation of ET a based on remote sensing data.
So far, there has been no study that uses downscaling techniques in three levels including downscaling of bands, downscaling of input data required for the energy balance algorithms, and downscaling of ET a images to obtain daily ET a images with a spatial resolution of 30 m. Thus, the present study aimed to employ downscaling techniques in three different scenarios for producing daily ET a maps with a spatial resolution of 30 m for the sugarcane farmlands of Amir Kabir Sugarcane Agro-industry, Khuzestan, Iran. In the first scenario, the reflectance and thermal bands of Landsat 8 were predicted using STARFM and SADFAT, respectively. Then, ET a was calculated using these bands and SEBAL. In the second scenario, the input data required for SEBAL were downscaled and then daily ET a with a spatial resolution of 30 m was calculated. In the third scenario, the ET a images obtained from MODIS were downscaled to the scale of Landsat 8. In the second and third scenarios, data downscaling was conducted by using the ratio, regression, and neural networks methods with two different approaches. In the first approach, for simulating the data with a spatial resolution of 30 m and a temporal resolution of 1 day, the Landsat image on day 1 and the relationship between the two MODIS images on day 1 and the other days were used. In the second approach, for simulating the data on the desired day, the simulated image on the previous day and the relationship between the two consecutive images of MODIS were used.

Study Area and Data Acquisition
The study area is the sugarcane farmlands of Amir Kabir Sugarcane Agro-industry. The company is located in south of Khuzestan province, Iran, at 48°16′ 49″ E and 31°2' 2" N. Amir Kabir Sugarcane Agro-industry is located 45 km from Ahwaz-Khorramshahr Road and bordered with Karun River from the east. The gross area of the abovementioned farmlands is 15,000 ha and the net area is 12,000 ha, which is divided into several 25-ha pieces ( Figure 1 Table 1.

Pre-Processing of Satellite Images
Due to the atmosphere between the sensor and land surface, the reflectance of the surface features before reaching the sensor is affected by two factors of absorption and scattering, leading to uncertainty in determination of the surface reflectance (Kaufman et al., 1997;Gao et al., 2000). In this study, the method introduced by Tasumi et al. (2008) was used for atmospheric correction of the MODIS and Landsat 8 images.

Downscaling of Reflectance Bands Using STARFM
In order to achieve images with high temporal and spatial resolutions, researchers used an image fusion algorithm for combining images of ETM + and MODIS (Gao et al., 2006). This algorithm is abbreviated as STARFM and is a common basic algorithm for temporal and spatial combination of satellite images. It predicts the value of the central pixel in Landsat images at the second time based on the calculation of correct weights for the neighboring pixels. Therefore, this algorithm moves over the Landsat classified image acquired on the first time as a moving window and downscales the image with low spatial resolution acquired at the second time with respect to the images of MODIS acquired at the first and second times and the classified image of Landsat acquired at the first time. The following equation was established for a heterogeneous pixel of MODIS: where (x i , y j ) indicates the location of the given pixel in Landsat and MODIS images, t 0 and t k are the dates on which the images of MODIS and Landsat were acquired.
Frontiers in Environmental Science | www.frontiersin.org December 2021 | Volume 9 | Article 795287 Equation 1 is logical for homogeneous pixels in MODIS image, but it should be considered that all pixels in MODIS image are not homogeneous and it is possible that the type of land cover changes over the prediction period. Thus, the neighborhood information was used for correcting the surface reflectance by applying a weight function: where w is the size of the moving window, (x w 2 , yw 2 ) is the central pixel of the window, and W ijk is a weight that indicates the contribution level of neighboring pixels in estimation of reflectance.
For ensuring that the information in Eq. 2 is correct and that the neighboring pixels are homogeneous, only spectrally similar pixels (pixels belonging to one spectral class) were used. A classified Landsat image was needed for achieving this. The ISOData algorithm was used as an unsupervised classification method.
Downscaling of Thermal Bands Using the SADFAT STARFM has been used for prediction of reflectance bands based on the assumption that surface reflectances in MODIS and Landsat images on the same day are similar (Gao et al., 2006;Masek et al., 2006). For homogeneous pixels, so long as this assumption i.e., band similarity between Landsat and MODIS thermal data on the same day is established, STARFM was used for downscaling thermal images and LST. However, changes of LST over time show that daily and seasonal variations are severe (Sabins, 1997;Weng et al., 2008). Weng et al. (2014) modified STARFM taking into account ATC and heterogeneous urban thermal landscape for prediction of thermal radiance and LST data. They named the new algorithm SADFAT. In this algorithm, the linear spectral mixture analysis (LSMA) was used for relating radiance in the Landsat and MODIS images so that temporal changes in the radiance could be included in the fusion model. For a homogeneous pixel in Landsat image, radiance at t p is equal to the total radiance of Landsat image at t 0 and a coefficient of difference in radiance in MODIS images at t p and t 0 . The required coefficient (a) can be determined through a regression relationship derived from the association between thermal bands of Landsat MODIS images. Since a large number of pixels contain more than one cover type, the radiance of the mixed pixel of the Landsat image was estimated according to the below processes.
The radiance change of the MODIS pixel from t1 to t2 was computed as:  where R is the radiance and M and L are abbreviations of MODIS and Landsat respectively. Seasonal change in LST was modeled using the annual temperature cycle (ATC) approximated using a sinusoidal function (Bechtel, 2012): where MAST is the mean annual surface temperature, YAST is the yearly amplitude surface temperature, w is the angular frequency, d is the day of year (DOY) relative to the equinox, and θ is the phase shift. Since spectral radiance is related to LST by the Plank's law, the radiance change of an L pixel from time t1 to t2 was quantified as: where θ is the phase shift or heat lag, c is the amplitude of the radiance variation, C is the constant, đ is the mean acquisition date, and d1 and d2 are the parameters input to the algorithm. Incorporating the ATC model, Eq. 3 can be re-written as: If the radiances of the kth L pixel at date t1 and t2 are known, Eq. 4 has an instance as: By combining Eqs 6, 7, Eq. 8 can be obtained: Since θ reflects the phase shift of a pixel and is associated with thermal properties of land surface materials, it can be regarded as constant as long as the land cover does not change in the observational period. Therefore, the ratio of the radiance change of kth L pixel to that of the corresponding M pixel is constant for a certain L pixel. Here, h k is called the conversion coefficient for consistency (Zhu et al., 2010).
Based on Eq. 8, if one pair of L and M radiance image at t 0 and another M radiance image at t p are available, the L radiance image at t p can be predicted using the following formula: where W i is the weight of the similar neighboring pixel and N is the number of similar pixels. The calculated radiance could be converted to LST using Planck's law. In Eq. 3, the weight of similar neighboring pixels of the central pixel should be calculated. This weight determines the contribution of the neighboring pixels to the calculation of a central pixel. Similarity between reflectance and thermal radiances in a pixel and the central pixel as well as the distance between the central pixel and a neighboring pixel increase the pixel weight.
In the MODIS sensor, thermal bands 31 and 32 are used to estimate LST. For this, the radiance was converted to the brightness temperature using the inverse of the Planck's law (Lu and Weng, 2006). where T i is the brightness temperature in Kelvin, RAD is the radiance in w m 2 μm.sr , h is Planck's constant, C is the velocity of light, K is Boltzmann's constant, and λ i is the wavelength in the middle of the band. The values of λ for bands 31 and 32 are 11.3 and 12.4 μm, respectively. To put it simply, the brightness temperature was obtained for each band as: The constant coefficients c 1 and c 2 for MODIS are considered 3.741775*10 −22 and 0.0143877, respectively. By obtaining brightness temperature, LST could also be obtained. Parodi (2000)  where Tb 31 and Tb 32 are brightness temperatures for bands 31 and 32 of MODIS sensor, respectively. In the Landsat 8 satellite, LST is obtained from bands 10 or 11. In this study, LST was obtained from band 10 using Eq. 13.

T s k 2 ln εNBk1
Rc + 1 where k 1 and k 2 are constant coefficients for band 10 (k 1 774.89 , k 2 1321.08). R c is the radiance of the thermal band in w m 2 μm.sr , and T s is land surface temperature (k).

Ratio and Regression Methods
In the ratio and regression methods, for simulation of the Landsat image at T1 (the day on which Landsat images are not available), a relationship was established between the two MODIS images at T0 and T1 by the ratio and regression methods. Then, by assuming that this relationship is also established between the two Landsat images at T0 (the day on which Landsat images are available) and T1 (Hong et al., 2013), the Landsat image at T1 was simulated ( Figure 2).

Neural Networks Method
The neural networks method has different algorithms, one of which is the multi-layer perceptron (MLP). MLP is formed of multiple layers and neurons. It consists of an input layer, one or more hidden layers, and an output layer. One of the differences between different neural networks is their distinct activation functions. In the hidden layer of MLP, mostly three types of sigmoid, hyperbolic and linear (identical) moving functions are used. In the present study, the sigmoid function was used in the hidden layer. Generally, in MLP networks, some factors should be determined properly including the number of hidden layers, the number of neurons in the hidden layers, the learning rate, the momentum, and the error threshold in order to reach a suitable model for solving a specific problem.

SEBAL for Calculation of ET a
The energy balance equation is the basis of remote sensing algorithms for calculation of ET a . The energy balance equation is as follows (Bastiaanssen et al., 1998): where R n is net radiation, G is soil heat flux, H is sensible heat flux, λET is latent heat flux, s is biomass energy, P is photosynthesis, and h is the horizontal component of sensible and latent heat fluxes.
In Eq. 4, photosynthesis and heat storage in plants could be ignored, because most plants consume less than 1% of the reached Sun radiation for photosynthesis. Daytime heat storage in plants could also be ignored, because it is important only when temperature changes fast (especially at sunset and sunrise) and when amounts of R n , H, and λET are small. Moreover, the horizontal component of sensible and latent heat fluxes indicates the net amount of energy exchanged by the plant in a horizontal direction. In dry weather, it can be equal to net Frontiers in Environmental Science | www.frontiersin.org December 2021 | Volume 9 | Article 795287 radiation, and despite its importance, it is usually not taken into account due to lack of a simple solution for evaluation. Considering such assumptions in the energy balance equation and elimination of those three components, Bastiaanssen et al. (1998) applied an algorithm of land surface energy namely SEBAL using the multi-step physical calculations. The equations and calculation processes for each parameter of R n , G and H are explained fully in the study of Bastiaanssen et al. (2002). Due to the high efficiency of SEBAL for calculation of ET a , this method was used to evaluate the efficiency of the used scenarios for calculation of daily ET a with a high spatial resolution.

RESULTS
Visual comparison of the simulated bands with the original bands of Landsat 8 (Figure 3) and the RMSE values provided in Table 2 indicated that STARFM had an acceptable performance. Also, comparison of LST determined based on the simulated bands FIGURE 5 | Comparison of ET a derived from the simulations and Landsat 8 image: (A) the first scenario, (B) the second scenario, the ratio method (the first and second approaches), (C) the second scenario, the regression method, the first approach, (D) the second scenario, the neural networks method, the first approach, (E) the second scenario, the regression method, the second approach, (F) the second scenario, the neural networks method, the second approach, (G) the third scenario, the ratio method (the first and second scenario), (H) the third scenario, the regression method, the first approach, (I) the third scenario, the neural networks method, the first approach, (J) the third scenario, the regression method, the second approach, (K) the third scenario, the neural networks method, the second approach.
Frontiers in Environmental Science | www.frontiersin.org December 2021 | Volume 9 | Article 795287 8 with LST obtained from the original image of Landsat 8 acquired on June 30, 2017 showed that SADFAT was able to simulate the LST distribution well (Figure 4).
In the second scenario, the input data of SEBAL (i.e., surface albedo, NDVI, leaf area index [LAI], and LST) were simulated by the ratio, regression, and neural networks methods with two approaches. In the ratio method, both approaches were the same and had identical performances (Table 3; Figure 5). In the regression method, the R 2 values obtained for the input data simulated by the first and the second approaches were the same ( Figure 5). ET a obtained by SEBAL for the input data simulated by the regression method with the first and second approaches had the same R 2 values but different RMSEs (Table 3; Figure 5). In the neural networks method, the first and second approaches were completely different (Table 3; Figure 5).
In the third scenario, ET a amounts simulated by the first and second approaches were identical and had equal performances (Table 3; Figure 5). In the regression method, the R 2 values calculated for ET a amounts simulated with the first and second approaches were the same, but the RMSEs were different (Table 3; Figure 5). In the neural networks method, the simulated ET a had different R 2 values and RMSEs (Table 3; Figure 5).
The simulated ET a images and the ET a images obtained from MODIS and Landsat 8 data on June 30, 2017 were visually FIGURE 5 | Comparison of ETa derived from the simulations and Landsat eight image: (A) the first scenario, (B) the second scenario, the ratio method (the first and second approaches), (C) the second scenario, the regression method, the first approach, (D) the second scenario, the neural networks method, the first approach, (E) the second scenario, the regression method, the second approach, (F) the second scenario, the neural networks method, the second approach, (G) the third scenario, the ratio method (the first and second scenario), (H) the third scenario, the regression method, the first approach, (I) the third scenario, the neural networks method, the first approach, (J) the third scenario, the regression method, the second approach, (K) the third scenario, the neural networks method, the second approach.
Frontiers in Environmental Science | www.frontiersin.org December 2021 | Volume 9 | Article 795287 compared ( Figure 6). In these images, pink points indicate minimum ET a and represent idle lands. Dark brown points show maximum ET a and illustrate lands with dense vegetation. As it is shown in image A2, pixels on the northern and eastern borders of the farmlands had the lower ET a than the other pixels, due to the neighboring barren lands.

Evaluation of the Used Scenarios
The first scenario with an RMSE of 0.68 mm day −1 had the best performance (Table 3). A major reason for this could be the simulation of the thermal band with higher accuracy in this scenario. In the first scenario, the value of RMSE for the simulated LST on June 30, 2017 is estimated to be 1.95°K. (Weng et al., 2014) used SADFAT for simulation of LST with the spatial resolution of Landsat 8 and compared the results with the images acquired by Landsat 8. They found RMSE values for the study area to be between 1.32°K to 2°K. Their finding is in agreement with the results of the present study. In SADFAT, changes of LST over the time gap between acquisition times of MODIS and Landsat images of the same day are taken into account. Whilst the ratio, regression, and neural networks methods function based on the assumption that LST values obtained from MODIS and Landsat images of the same day are similar. Mahour et al. (2017) also pointed out the importance of thermal data accuracy in determination of ET a . STARFM was used in the first scenario for simulating the reflectance bands. STARFM had a satisfactory performance in the study area ( Table 2). Gao et al. (2006), Hilker et al. (2009), andBhandari et al. (2012) also reported a high efficiency for STARFM in the simulation of Landsat reflectance bands. STARFM is greatly dependent on the homogeneity of study areas, and heterogeneity and variability of land cover in the area may significantly affect results of this algorithm (Gao et al., 2006). Thus, the homogeneity of the study area in the present study could be the main reason for the high R 2 (except band 5) and low RMSE values for the simulated bands. A major reason for the reduced R 2 of band 5 can be the existence of uncertainties related to downscaling. Bhandari et al. (2012) found low R 2 for bands 1 and 2 but suggested no specific reason for this. Rather than being an accuracy indicator, R 2 is a statistical measure to reveal how close the data are to the fitted regression line. Instead, RMSE is a measure of accuracy. Since the RMSE value for band 5 is reasonably well, therefore it may not distort the results. It is seen in Table 2 that the simulation accuracy varies between the different bands. This difference could have two major reasons. First, it is due to variable atmospheric effects in different wavelengths especially in shorter ones (Roy et al., 2008), which might have caused different atmospheric conditions at the acquisition times of the MODIS and Landsat images (Bhandari et al., 2012). Second, inadequate wavelength overlapping between the corresponding bands in the Landsat and MODIS images could have caused a spectral difference in the corresponding bands and consequently affected the outcome of the downscaling process (Pohl and Van Genderen, 1998).
The neural networks method in the third scenario with the second approach had the worst performance with an RMSE of 2.25 mm day −1 (Table 3). Nevertheless, it had a better accuracy than ET a obtained from the MODIS images with an RMSE of 3.19 mm day −1 . Therefore, it is deduced that the simulated actual ET a images in all three scenarios and using all methods with both approaches have better performances than the ET a images obtained from the MODIS data. Since the time required for the calculations in the third scenario is much shorter than that of the first and second scenarios and also according to Table 3, the regression method in the third scenario with the first approach with an RMSE of 0.87 mm day −1 is of a good accuracy as a quick and efficient method. Hong et al. (2013) aimed at producing ET a maps with a resolution of 30 m and daily temporal resolution. They used four methods namely input subtraction, output subtraction, input regression, and output regression. In the input subtraction and input regression methods, they downscaled the input data of SEBAL. They reported a higher accuracy for the input regression method compared to the input subtraction method. Furthermore, Spiliotopoulos et al. (2013) showed that the regression method had a greater accuracy than the subtraction method. Due to high similarity between the subtraction and ratio methods, it is derived that the results of the second scenario in the present study is similar to the results of the studies of Hong et al. (2013) and Spiliotopoulos et al. (2013). The lower accuracy of the ratio method compared to the regression and neural networks methods has different reasons. One of the reasons is that, in  the ratio method, a separate relationship is made between both pixels of the MODIS images acquired at times 1 and 2, and this relationship is only applied for the pixels of the Landsat image within that pixel of the MODIS image. In contrast, in the regression and neural networks methods, only one relationship is made between all pixels of the MODIS images acquired at times 1 and 2, and this relationship is applied for all pixels of the Landsat image at time 1, causing no severe change in the images downscaled by the regression and neural networks methods. Whilst the images downscaled by the ratio method have severe changes in some pixels and less changes in some others. By visual comparison of the images simulated by the ratio method and the images acquired by Landsat 8 (Figure 6), these severe changes can be seen in some pixels, while there are no severe changes in such pixels in the images simulated by the regression and neural networks methods. Another reason for this could be the lack of a complete geometric conformity of MODIS and Landsat images through the georeferencing process. The ratio and subtraction methods are more vulnerable to such geometric non-conformity between satellite images compared to the regression and neural networks methods (Hong et al., 2013). In STARFM and SDFAT, since a moving window is used for simulation of the central pixel value, the error due to the geometric non-conformity is much less than the ratio, regression, and neural networks methods. Moreover, other sources of uncertainties such as difference in viewing the sensor angle, Sun radiation angle, acquisition time, atmospheric correction, and calculation of emissivity could also affect the results (McCabe and Wood, 2006;Kim and Hogue, 2012).

Evaluation of the Used Approaches
The first and second approaches were similar in the ratio method, because the ratio method is linear and the relationship in this method is pixel to pixel. In the present study, the subtraction method was not used for simulation of the input data. However, due to similar processes in the subtraction and ratio methods, it can be concluded that the first and second approaches would be identical with similar performances in the subtraction method. In the second and third scenarios in the regression method, the first and second approaches had the same R 2 values, but the simulated ET a amounts by the regression method with the first approach were more accurate than the second approach. Moreover, in the neural networks method in both scenarios, the first approach has a better performance than the second approach. Thus, in the second and third scenarios, except in the ratio method in which both approaches have the same performance, the first approach has a better performance in the regression and neural networks methods. In the first approach, for simulating all images, the Landsat image was the reference image. Therefore, the interval between the reference image and the simulated image varied from 1 day to 15 days. In fact, in this approach, the downscaling methods were used once for simulation of the Landsat image on the desired date. In the second approach, the interval between the reference image and the simulated image is 1 day for all simulations. For simulating the Landsat image on the desired date, the simulated Landsat image for the day before was used. Actually, in the second approach, for simulating the image on the desired date, the downscaling methods are used 1 to 15 times. Thus, it can be deduced that the time gap in the first approach causes less error than using the downscaling methods several times in the second approach.

CONCLUSION
We applied three scenarios for downscaling of the MODIS images using the Landsat 8 images in order to estimate daily ET a with a spatial resolution of 30 m for the sugarcane farmlands. In total, our results show that STARFM and SADFAT algorithms have acceptable accuracies in the simulation of reflectance and thermal bands, respectively, of Landsat 8 images for homogeneous areas. We recommend using the first scenario for producing daily ET a maps with a spatial resolution of 30 m. Moreover, using the third scenario, the regression method, and the first approach is a simple process for quick production of accurate daily ET a maps with a spatial resolution of 30 m. Finally, it is suggested that future studies apply the developed method in this study to heterogeneous areas.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
HS and AS developed the idea and performed the study. SM, BM, and MN analyzed and validated the study and commented on the manuscript. MN corrected the technical issues of the manuscript. HS prepared the manuscript draft. All authors read the manuscript and confirmed it before submission.

ACKNOWLEDGMENTS
The German Federal Environmental Foundation (Deutsche Bundesstiftung Umwelt, DBU) is acknowledged for funding the Ph.D. studies of MN during preparation of the manuscript.
FIGURE 6 | regression method with the first approach, (J) ET a simulated in the third scenario by the neural networks method with the first approach, (K) ET simulated in the third scenario by the regression method with the second scenario, (L) ET a simulated in the third scenario by the neural networks with the second approach.