ORIGINAL RESEARCH article

Front. Environ. Sci., 04 April 2022

Sec. Atmosphere and Climate

Volume 10 - 2022 | https://doi.org/10.3389/fenvs.2022.826165

Feasibility of Random Forest and Multivariate Adaptive Regression Splines for Predicting Long-Term Mean Monthly Dew Point Temperature

  • 1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China

  • 2. Department of Civil and Environmental Engineering and Water Resources Research Center, University of Hawaii at Manoa, Honolulu, HI, United States

  • 3. Department of Civil and Environmental Engineering, College of Engineering, Chung-Ang University, Seoul, Korea

  • 4. Future Technology Research Center, National Yunlin University of Science and Technology, Douliou, Taiwan

  • 5. John von Neumann Faculty of Informatics, Obuda University, Budapest, Hungary

  • 6. Institute of Information Society, University of Public Service, Budapest, Hungary

  • 7. Institute of Information Engineering, Automation and Mathematics, Slovak University of Technology in Bratislava, Bratislava, Slovakia

Abstract

The accurate estimation of dew point temperature (Tdew) is important in climatological, agricultural, and agronomical studies. In this study, the feasibility of two soft computing methods, random forest (RF) and multivariate adaptive regression splines (MARS), is evaluated for predicting the long-term mean monthly Tdew. Various weather variables including air temperature, sunshine duration, relative humidity, and incoming solar radiation from 50 weather stations in Iran, as well as their geographical information (or a subset of them), are used in RF and MARS as inputs. Three statistical indicators, namely root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R), are used to assess the accuracy of Tdew estimates from both models for different input configurations. The results demonstrate the capability of the RF and MARS methods for predicting the long-term mean monthly Tdew. The combined scenarios in both the RF and MARS methods are found to produce the best Tdew estimates. The best Tdew estimates were obtained by the MARS model, with RMSE, MAE, and R of 0.17°C, 0.14°C, and 1.000, respectively, in the training phase; 0.15°C, 0.12°C, and 1.000 in the validation phase; and 0.18°C, 0.14°C, and 0.999 in the testing phase.

Introduction

Dew point temperature (Tdew) is defined as the temperature (at constant pressure) at which water vapor in the air condenses into liquid water. The accurate estimation of Tdew is required in many fields such as climatology, hydrology, meteorology, and agronomy (Emmel et al., 2010; Millán et al., 2010; Katul et al., 2012; Feld et al., 2013; Mohammadi et al., 2015; Mohammadi et al., 2016; Alizamir et al., 2020a). Tdew along with the wet bulb temperature can be used to compute ambient temperature (Snyder and Melo-Abreu, 2005; Shank, 2006; Mohammadi et al., 2016). The dew point also allows plants to adapt themselves to possible frosts (Mohammadi et al., 2016). Tdew is an essential element for plant survival, particularly in regions with low precipitation (Agam and Berliner, 2006). Tdew is necessary for estimating relative humidity and evapotranspiration (Hubbard et al., 2003). Robinson (2000) stated that Tdew is important for assessing long-term climate variability.

In recent years, soft computing and data mining approaches have been widely employed as powerful techniques for predicting Tdew. A review of the literature shows that random forest (RF) and multivariate adaptive regression splines (MARS) methods have rarely been utilized to estimate Tdew; however, they have been extensively used for predicting other hydro-climatological variables (Heddam et al., 2020; Kisi et al., 2021; Tan et al., 2021).

Shank et al. (2008) predicted Tdew at 20 weather stations in Georgia by feeding weather data into artificial neural networks (ANN). It was found that ANN could reliably predict Tdew. Zounemat-Kermani (2012) predicted hourly Tdew data via the ANN and multiple linear regression (MLR) approaches. Kisi et al. (2013) evaluated the robustness of generalized regression neural networks (GRNN), Kohonen self-organizing feature maps (KSOFM), and adaptive neuro-fuzzy inference system (ANFIS) for estimating Tdew at the Daegu, Pohang, and Ulsan stations in South Korea. The accuracies of GRNN and ANFIS were similar and better than that of KSOFM. Shiri et al. (2014) estimated daily Tdew data at two weather stations in the Republic of Korea using gene expression programming (GEP) and ANN models. Various combinations of climatic variables were used as inputs, and the accuracy of GEP was found to be higher than that of ANN. Kim et al. (2015) investigated the potential of multi-layer perceptron (MLP), GRNN, and MLR in estimating daily Tdew at two weather stations in California. They defined different combinations of weather data as model predictors. The results indicated that the Tdew estimates from GRNN were better than those of MLP. Mohammadi et al. (2015) evaluated the accuracy of the extreme learning machine (ELM), ANN, and support vector machine (SVM) approaches in predicting daily Tdew at Bandar Abbas and Tabas, Iran. The mean air temperature, relative humidity, atmospheric pressure, solar radiation, and vapor pressure were used as model inputs. The results revealed that ELM and ANN produced the best and worst daily Tdew estimates, respectively. Amirmojahedi et al. (2016) utilized a coupled model by combining ELM with wavelet transform (WT) for predicting daily Tdew in Bandar Abbas, South Iran. The accuracies of hybrid ELM-WT and single ELM were compared with those of SVM and ANN. Four different input scenarios were used in their models. Mohammadi et al. (2016) estimated daily Tdew at two stations in Iran by the ANFIS technique. Different ANFIS models were developed using various input combinations. Their results demonstrated that water vapor pressure was the most influential variable for the accurate prediction of Tdew. Mehdizadeh et al. (2017a) employed GEP to estimate daily Tdew at the Urmia and Tabriz stations in Northwest Iran. Various input scenarios were developed using meteorological variables and lagged Tdew data. Moreover, Tdew at each station was predicted using data from a nearby station. Qasem et al. (2019) estimated daily Tdew at the Tabriz station in Iran using GEP, SVM, and M5 model tree (M5), with M5 showing the highest performance. Naganna et al. (2019) attempted to increase the accuracy of estimating Tdew at two stations in India by coupling the MLP with two bio-inspired optimization algorithms. The hybrid methods outperformed the classic MLP. Alizamir et al. (2020b) recommended a deep echo state network (DESN) to forecast daily Tdew at two locations in the Republic of Korea. The proposed model produced the best performance compared to other soft computing methods. Dong et al. (2020) improved the performance of ELM by optimization algorithms to estimate daily Tdew in Yangling, China. They indicated the better accuracy of hybrid models compared to the classic ELM.

Given the importance of Tdew in various disciplines, particularly agriculture and hydrology, its precise prediction is vital. Therefore, this study investigated the applicability of random forest (RF) and multivariate adaptive regression splines (MARS) for predicting the long-term mean monthly Tdew. Temperature-, sunshine duration-, radiation-, other climatic variables-, geographical information-, and combined-based input scenarios were considered in this study.

Only a few studies used RF and MARS to predict Tdew (Shiri, 2018). Also, the correct choice of inputs for soft computing models plays an important role in achieving their optimal performance. Hence, this study attempted to find the best input combination.

Materials and Methods

Study Region and Data

The study area was Iran, which is located in southwest Asia. With an area of about 1,648,000 km², Iran spans latitudes 25°00′ N to 40°00′ N and longitudes 44°00′ E to 63°30′ E. The locations of the study stations are shown in Figure 1. Table 1 presents the geographical properties of the selected stations. As can be seen in Table 1, the long-term mean annual Tdew ranges from −2.58°C at Kerman to 20.70°C at Chabahar.

FIGURE 1

Spatial distribution of the studied stations in Iran.

TABLE 1

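| Station | Latitude (°N) | Longitude (°E) | Altitude (m) | Mean annual Tdew (°C) |
|---|---|---|---|---|
| Abadan | 30.37 | 48.25 | 6.6 | 10.03 |
| Ahwaz | 31.33 | 48.67 | 22.5 | 9.55 |
| Arak | 34.10 | 49.77 | 1708 | 0.02 |
| Ardabil | 38.25 | 48.28 | 1332 | 3.57 |
| Babolsar | 36.72 | 52.65 | −21 | 13.45 |
| Bam | 29.10 | 58.35 | 1066.9 | 2.60 |
| Bandar Abbas | 27.22 | 56.37 | 9.8 | 19.24 |
| Bandar Anzali | 37.48 | 49.45 | −23.6 | 13.26 |
| Bandar Lengeh | 26.53 | 54.83 | 22.7 | 19.40 |
| Birjand | 32.87 | 59.20 | 1491 | −0.82 |
| Bojnurd | 37.47 | 57.27 | 1112 | 3.72 |
| Bushehr | 28.97 | 50.82 | 9 | 16.98 |
| Chabahar | 25.28 | 60.62 | 8.0 | 20.70 |
| Dezful | 32.40 | 48.38 | 143 | 9.30 |
| Fasa | 28.97 | 53.68 | 1288.3 | 2.80 |
| Gorgan | 36.90 | 54.40 | 0 | 11.49 |
| Hamedan | 34.87 | 48.53 | 1741.5 | 0.64 |
| Ilam | 33.63 | 46.43 | 1337 | 0.42 |
| Iranshahr | 27.20 | 60.70 | 591.1 | 5.62 |
| Isfahan | 32.62 | 51.67 | 1550.4 | −0.02 |
| Jask | 25.63 | 57.77 | 5.2 | 20.67 |
| Karaj | 35.92 | 50.90 | 1312.5 | 2.58 |
| Kashan | 33.98 | 51.45 | 982.3 | 3.36 |
| Kerman | 30.25 | 56.97 | 1753.8 | −2.58 |
| Kermanshah | 34.35 | 47.15 | 1318.6 | 0.64 |
| Khorramabad | 33.43 | 48.28 | 1147.8 | 3.06 |
| Khoy | 38.55 | 44.97 | 1103 | 3.49 |
| Mashhad | 36.27 | 59.63 | 999.2 | 2.98 |
| Qazvin | 36.25 | 50.05 | 1279.2 | 2.35 |
| Qom | 34.70 | 50.85 | 877.4 | 2.02 |
| Ramsar | 36.90 | 50.67 | −20 | 13.06 |
| Rasht | 37.32 | 49.62 | −8.6 | 12.60 |
| Sabzevar | 36.20 | 57.65 | 972 | 1.40 |
| Sanandaj | 35.33 | 47.00 | 1373.4 | 0.34 |
| Saqez | 36.25 | 46.27 | 1522.8 | 0.81 |
| Sari | 36.55 | 53.00 | 23 | 13.13 |
| Semnan | 35.58 | 53.42 | 1127 | 2.84 |
| Shahrekord | 32.28 | 50.85 | 2048.9 | −0.82 |
| Shahrud | 36.42 | 54.95 | 1349.1 | 2.31 |
| Shiraz | 29.53 | 52.60 | 1484 | 1.87 |
| Tabas | 33.60 | 56.92 | 711 | 2.34 |
| Tabriz | 38.08 | 46.28 | 1361 | 1.37 |
| Tehran | 35.68 | 51.32 | 1190.8 | 1.48 |
| Torbat-e Heydarieh | 35.27 | 59.22 | 1450.8 | 1.21 |
| Urmia | 37.67 | 45.05 | 1328 | 2.72 |
| Yasuj | 30.68 | 51.55 | 1816.3 | −0.06 |
| Yazd | 31.90 | 54.28 | 1237.2 | −1.03 |
| Zabol | 31.03 | 61.48 | 489.2 | 4.64 |
| Zahedan | 29.47 | 60.88 | 1370 | −0.74 |
| Zanjan | 36.68 | 48.48 | 1663 | 0.90 |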

Geographical properties of the stations in Iran and long-term mean annual values of Tdew.

Meteorological data from 50 stations (compiled by the Iran Meteorological Organization, IMO) were utilized in this study. The data include long-term mean monthly dew point temperature (Tdew), minimum, maximum, and mean air temperatures (Tmin, Tmax, T), solar radiation (Rs), sunshine duration (S), relative humidity (RH), vapor pressure (Vp), and precipitation (P) between 1951 and 2015. Statistical characteristics of these variables are presented in Table 2. In this table, So and Ra denote the maximum possible sunshine duration and extraterrestrial radiation, respectively, which were calculated based on the relationships presented by Allen et al. (1998). La, Lo, and Alt are the latitude, longitude, and altitude of the study stations, respectively. We can observe that Tmin, So, Ra, and Vp have the highest correlations with Tdew in the temperature-, sunshine duration-, radiation-, and other meteorological variables-based input scenarios, respectively (Table 2). Figure 2 illustrates the long-term mean monthly values of the meteorological variables at the study stations.

TABLE 2

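| Parameter | Minimum | Maximum | Mean | Standard deviation | Coefficient of variation | Correlation with Tdew |
|---|---|---|---|---|---|---|
| Tdew, °C | −7.90 | 27.62 | 5.22 | 7.72 | 1.48 | 1.000 |
| Tmin, °C | −8.69 | 30.70 | 11.17 | 9.22 | 0.83 | 0.793 |
| Tmax, °C | 2.30 | 46.30 | 24.22 | 10.28 | 0.42 | 0.590 |
| T, °C | −2.79 | 38.00 | 17.82 | 9.72 | 0.55 | 0.695 |
| S, hr | 2.89 | 11.87 | 7.97 | 2.27 | 0.28 | 0.228 |
| So, hr | 9.35 | 14.65 | 12.00 | 1.55 | 0.13 | 0.465 |
| S/So, - | 0.25 | 0.88 | 0.66 | 0.14 | 0.22 | 0.055 |
| Rs, MJ m⁻² | 6.20 | 27.84 | 17.83 | 6.07 | 0.34 | 0.400 |
| Ra, MJ m⁻² | 14.65 | 41.70 | 30.37 | 8.55 | 0.28 | 0.491 |
| Rs/Ra, - | 0.38 | 0.69 | 0.58 | 0.07 | 0.12 | 0.055 |
| RH, % | 17.00 | 87.00 | 51.30 | 19.34 | 0.38 | 0.218 |
| Vp, hPa | 3.69 | 37.21 | 10.57 | 6.71 | 0.63 | 0.964 |
| P, mm | 0.00 | 308.91 | 29.63 | 39.59 | 1.34 | −0.009 |
| α, - | 1.00 | 12.00 | 6.50 | 3.45 | 0.53 | 0.142 |
| La, °N | 25.28 | 38.55 | 33.48 | 3.60 | 0.11 | −0.287 |
| Lo, °E | 44.97 | 61.48 | 52.56 | 4.62 | 0.09 | 0.113 |
| Alt, m | −23.60 | 2048.90 | 933.26 | 639.82 | 0.69 | −0.737 |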

Statistical characteristics of long-term mean monthly meteorological data.
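As a quick consistency check on Table 2, the coefficient of variation can be recomputed from the tabulated mean and standard deviation. The sketch below assumes the coefficient of variation is the standard deviation divided by the mean (the article does not state the definition); the three rows are copied from Table 2 and the function name is illustrative.

```python
# Assumption: coefficient of variation = standard deviation / mean.
# (mean, standard deviation) pairs copied from Table 2.
table2_rows = {
    "Tdew": (5.22, 7.72),
    "Tmin": (11.17, 9.22),
    "P": (29.63, 39.59),
}

def coefficient_of_variation(mean, std_dev):
    """Return the ratio of standard deviation to mean."""
    return std_dev / mean

for name, (mean, std_dev) in table2_rows.items():
    print(name, round(coefficient_of_variation(mean, std_dev), 2))
```

Under this assumption the recomputed ratios agree with the tabulated values of 1.48, 0.83, and 1.34 for Tdew, Tmin, and P.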

FIGURE 2

Long-term mean monthly meteorological variables in the study stations.

The data were split into three parts: 70% (420 months), 15% (90 months), and 15% (90 months) of the data were used for training, testing, and validating the models, respectively.
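The split can be illustrated with a minimal sketch. It assumes 600 samples in total (50 stations times 12 monthly means, which is consistent with 420 months being 70%) and a random shuffle; the article does not say how samples were assigned to each part, so the shuffling, the seed, and the function name are assumptions.

```python
import random

def split_indices(n_samples, train_frac=0.70, test_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train/test/validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: random assignment
    n_train = round(n_samples * train_frac)
    n_test = round(n_samples * test_frac)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # remaining samples
    return train, test, validation

# 50 stations x 12 long-term monthly means = 600 samples (assumed total)
train_idx, test_idx, val_idx = split_indices(50 * 12)
print(len(train_idx), len(test_idx), len(val_idx))
```

Each index appears in exactly one of the three parts, so no sample is shared between the training, testing, and validation sets.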

Random Forest

Random forest (RF), first developed by Breiman (2001), is a powerful ensemble learning algorithm. This model can be employed for regression, classification, and unsupervised learning problems (Liaw and Wiener, 2002). Many decision trees are created using the RF technique via permutation and continual variation of the elements influencing the intended parameter, before all created trees are incorporated for the prediction. Over-fitting, which may occur in the decision tree approach, is mitigated as the number of trees increases. Hence, as more trees are grown, the developed model generally becomes more accurate, and the error rate is reduced. In the RF, the bagging process is utilized to draw random samples for training each tree. Next, for each variable, the change in model prediction error is computed when the values of that variable are permuted across the out-of-bag observations (Trigila et al., 2015). Various bootstrap samples of the data, that is, samples drawn with replacement, were involved in the construction of the RF. Therefore, the observations left out of each bootstrap sample formed out-of-bag datasets, which were generated from the training dataset via the repetition of the sampling operation.

The number of trees is the most important feature affecting the accuracy of RF (Breiman, 2001). The optimal number of trees is determined by trial and error. In this study, 500 trees were used in the RF, as increasing the number of trees further did not improve its performance.
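The bagging step can be sketched as follows. This is only an illustration of bootstrap sampling and out-of-bag bookkeeping, not the authors' implementation; the helper name, the seed, and the use of three trees for display are assumptions, while the training size of 420 and the 500 trees come from the text.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample (with replacement).

    Returns the in-bag indices (duplicates allowed) and the
    out-of-bag indices that were never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag

rng = random.Random(42)   # arbitrary seed
n_train = 420             # training size reported in the text
for tree in range(3):     # 3 trees for illustration; the study used 500
    in_bag, oob = bootstrap_split(n_train, rng)
    print(f"tree {tree}: {len(set(in_bag))} distinct in-bag, {len(oob)} out-of-bag")
```

On average roughly a third of the training observations fall out of bag for any given tree, which is what makes out-of-bag error estimates possible without a separate hold-out set.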

Multivariate Adaptive Regression Splines

Multivariate adaptive regression splines (MARS) were initially presented by Friedman (1991). This is a non-parametric regression technique, in which the response/target variable can be estimated by using a series of coefficients and functions called basis functions. Cheng and Cao (2014) stated that one of the advantages of MARS is its ability to estimate the contributions of these basis functions. Therefore, the additive and interactive influences of input predictors are allowed to specify the target variable.

The typical form of a MARS model can be defined as follows:

$$y = c_o + \sum_{i=1}^{k} c_i \, b_i(x)$$

where y is the dependent variable predicted by MARS, x is the independent variable(s), co is a primary constant or bias, ci is the coefficient for the ith basis function, bi(x) indicates the ith basis function, and k is the number of basis functions.

The MARS model consists of two phases: forward and backward. The prediction process begins using an intercept, which is the average of the dependent parameter values. The basis functions are subsequently added continuously to the developed model. It should be noted that when the basis functions are added, the model considers the functions that cause a significant reduction in the sum of square errors. In the forward stage, an over-fitted MARS that includes a large number of knots is realized. Then, the backward stage prunes the model until a suitable MARS is presented based on the lowest value for the generalized cross-validation criterion.
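The general MARS form (a bias plus a weighted sum of basis functions) can be sketched for a single input. The sketch assumes hinge-type basis functions, max(0, x − t) and max(0, t − x) with knot t, which is the usual choice in MARS (Friedman, 1991) but is not spelled out in the text. The bias, coefficients, and knot below are hypothetical and are not fitted to the study data.

```python
def hinge(x, knot, direction):
    """Hinge basis function.

    direction = +1 gives max(0, x - knot); direction = -1 gives max(0, knot - x).
    """
    return max(0.0, direction * (x - knot))

def mars_predict(x, bias, terms):
    """Evaluate y = bias + sum_i c_i * b_i(x) for a single-input model.

    terms is a list of (coefficient, knot, direction) tuples.
    """
    return bias + sum(c * hinge(x, knot, d) for c, knot, d in terms)

# Hypothetical model for illustration only (not fitted to any data)
bias = 5.0
terms = [(0.8, 10.0, +1), (-0.5, 10.0, -1)]
print(mars_predict(12.0, bias, terms))
print(mars_predict(6.0, bias, terms))
```

Each pair of mirrored hinges lets the fitted response change slope at the knot, so a sum of such terms yields a piecewise-linear approximation of the target variable.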

Performance Investigation Metrics

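The accuracies of the models were evaluated using three statistical metrics: root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R). These metrics can be expressed as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(T_{o,i} - T_{p,i}\right)^{2}}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|T_{o,i} - T_{p,i}\right|$$

$$R = \frac{\sum_{i=1}^{N}\left(T_{o,i} - \bar{T}_{o}\right)\left(T_{p,i} - \bar{T}_{p}\right)}{\sqrt{\sum_{i=1}^{N}\left(T_{o,i} - \bar{T}_{o}\right)^{2}\,\sum_{i=1}^{N}\left(T_{p,i} - \bar{T}_{p}\right)^{2}}}$$

where To,i and Tp,i are the ith measured and predicted long-term mean monthly Tdew, respectively; $\bar{T}_{o}$ and $\bar{T}_{p}$ denote the mean of the measured and predicted values of the long-term mean monthly Tdew, respectively; and N is the number of data points.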

Low values for the RMSE and MAE indices, and a high value of the R index indicate higher performance of the model for predicting the long-term mean monthly Tdew.
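The three metrics can be computed with a few lines of standard-library Python. This is a minimal sketch using the usual definitions of RMSE, MAE, and the Pearson correlation coefficient; the sample values are made up for illustration and are not data from the study.

```python
import math

def rmse(measured, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(measured)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(measured, predicted)) / n)

def mae(measured, predicted):
    """Mean absolute error between two equal-length sequences."""
    n = len(measured)
    return sum(abs(o - p) for o, p in zip(measured, predicted)) / n

def correlation(measured, predicted):
    """Pearson correlation coefficient R."""
    mean_o = sum(measured) / len(measured)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(measured, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in measured)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_o * ss_p)

# Made-up values in degrees Celsius, for illustration only
measured = [1.0, 3.0, 5.0, 7.0]
predicted = [1.5, 2.5, 5.5, 6.5]
print(rmse(measured, predicted), mae(measured, predicted), correlation(measured, predicted))
```

Because RMSE squares the errors before averaging, it penalizes large individual errors more strongly than MAE, which is one reason the two are reported together.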

Results and Discussion

This study evaluated the performance of two soft computing approaches, RF and MARS, for predicting the long-term mean monthly Tdew at 50 stations in Iran. Thirty-one scenarios in six categories were considered to identify the most important variables affecting Tdew, and to determine the best input combinations. The RMSE, MAE, and R values were employed to assess the accuracy of the methods.

Performance of RF and MARS Approaches

The statistical indices of dew point estimates from the RF and MARS approaches for various input scenarios are presented in Tables 3 and 4, respectively.

TABLE 3

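| Type of scenarios | Inputs | Training RMSE (°C) | Training MAE (°C) | Training R | Validation RMSE (°C) | Validation MAE (°C) | Validation R | Testing RMSE (°C) | Testing MAE (°C) | Testing R |
|---|---|---|---|---|---|---|---|---|---|---|
| Temperature-based | Tmin | 4.92 | 4.06 | 0.782 | 4.28 | 3.24 | 0.886 | 2.62 | 1.79 | 0.922 |
| | Tmax | 6.17 | 4.78 | 0.628 | 6.56 | 4.71 | 0.657 | 3.60 | 2.89 | 0.899 |
| | T | 5.63 | 4.47 | 0.700 | 5.31 | 3.89 | 0.796 | 3.00 | 2.30 | 0.921 |
| | Tmin, Tmax | 3.44 | 2.80 | 0.906 | 2.38 | 1.89 | 0.965 | 1.61 | 1.23 | 0.941 |
| | Tmin, T | 3.78 | 3.07 | 0.881 | 2.84 | 2.22 | 0.954 | 1.90 | 1.35 | 0.930 |
| | Tmin, Tmax, T | 3.74 | 3.11 | 0.887 | 2.85 | 2.22 | 0.953 | 1.77 | 1.29 | 0.938 |
| Sunshine duration-based | S | 7.35 | 5.75 | 0.366 | 7.67 | 5.43 | 0.448 | 5.06 | 4.25 | 0.673 |
| | So | 6.44 | 5.12 | 0.593 | 6.53 | 4.86 | 0.672 | 4.38 | 3.86 | 0.838 |
| | S/So | 7.60 | 5.81 | 0.280 | 7.71 | 5.63 | 0.424 | 5.12 | 4.01 | 0.543 |
| | So, S | 5.98 | 4.66 | 0.664 | 5.80 | 4.23 | 0.756 | 4.04 | 3.32 | 0.783 |
| | So, S/So | 5.91 | 4.61 | 0.672 | 5.67 | 4.13 | 0.764 | 3.94 | 3.14 | 0.788 |
| | So, S, S/So | 5.85 | 4.55 | 0.686 | 5.80 | 4.32 | 0.762 | 3.82 | 3.14 | 0.807 |
| Radiation-based | Rs | 7.00 | 5.61 | 0.466 | 7.07 | 5.35 | 0.584 | 4.63 | 3.89 | 0.789 |
| | Ra | 6.55 | 5.17 | 0.573 | 6.70 | 4.97 | 0.655 | 4.40 | 3.76 | 0.824 |
| | Rs/Ra | 7.60 | 5.81 | 0.280 | 7.71 | 5.63 | 0.424 | 5.12 | 4.01 | 0.543 |
| | Ra, Rs | 6.20 | 4.87 | 0.627 | 6.02 | 4.57 | 0.736 | 4.18 | 3.40 | 0.785 |
| | Ra, Rs/Ra | 5.80 | 4.50 | 0.690 | 5.46 | 4.02 | 0.791 | 3.84 | 3.04 | 0.809 |
| | Ra, Rs, Rs/Ra | 5.90 | 4.48 | 0.674 | 5.66 | 4.19 | 0.772 | 3.71 | 3.03 | 0.822 |
| Other meteorological variables-based | RH | 6.90 | 5.40 | 0.484 | 6.79 | 5.28 | 0.628 | 4.84 | 3.78 | 0.231 |
| | Vp | 0.53 | 0.31 | 0.998 | 0.67 | 0.34 | 0.997 | 0.39 | 0.21 | 0.996 |
| | P | 7.37 | 5.73 | 0.370 | 7.54 | 5.77 | 0.554 | 5.35 | 4.28 | 0.495 |
| | Vp, RH | 0.57 | 0.32 | 0.997 | 0.70 | 0.37 | 0.997 | 0.35 | 0.21 | 0.997 |
| | Vp, P | 0.58 | 0.32 | 0.997 | 0.71 | 0.37 | 0.997 | 0.36 | 0.21 | 0.997 |
| | Vp, P, RH | 0.85 | 0.50 | 0.996 | 0.87 | 0.46 | 0.997 | 0.54 | 0.34 | 0.994 |
| Combined | Vp, Tmin | 0.56 | 0.32 | 0.998 | 0.68 | 0.36 | 0.997 | 0.35 | 0.21 | 0.997 |
| | Vp, So | 0.53 | 0.31 | 0.998 | 0.65 | 0.35 | 0.997 | 0.35 | 0.21 | 0.997 |
| | Vp, Ra | 0.54 | 0.31 | 0.998 | 0.67 | 0.36 | 0.997 | 0.36 | 0.22 | 0.997 |
| | Vp, Tmin, Ra | 0.77 | 0.55 | 0.996 | 0.70 | 0.45 | 0.998 | 0.45 | 0.29 | 0.995 |
| | Vp, Tmin, So | 0.76 | 0.54 | 0.996 | 0.71 | 0.45 | 0.998 | 0.43 | 0.27 | 0.995 |
| | Vp, Tmin, Ra, So | 0.63 | 0.43 | 0.997 | 0.64 | 0.38 | 0.998 | 0.38 | 0.23 | 0.996 |
| Geographical information-based | La, Lo, Alt, α | 2.31 | 1.78 | 0.959 | 2.29 | 1.83 | 0.968 | 1.69 | 1.36 | 0.943 |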

Statistical indices of Tdew estimates from the RF model for the training, validation, and testing phases.

Note: In the original table, bold values indicate the statistical metrics of the best input (Vp and So).

TABLE 4

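| Type of scenarios | Inputs | Training RMSE (°C) | Training MAE (°C) | Training R | Validation RMSE (°C) | Validation MAE (°C) | Validation R | Testing RMSE (°C) | Testing MAE (°C) | Testing R |
|---|---|---|---|---|---|---|---|---|---|---|
| Temperature-based | Tmin | 5.01 | 4.10 | 0.771 | 4.15 | 3.16 | 0.888 | 2.63 | 1.76 | 0.923 |
| | Tmax | 6.37 | 4.99 | 0.591 | 6.44 | 4.60 | 0.671 | 3.84 | 3.07 | 0.901 |
| | T | 5.80 | 4.67 | 0.678 | 5.63 | 4.09 | 0.763 | 3.25 | 2.43 | 0.928 |
| | Tmin, Tmax | 2.93 | 2.30 | 0.929 | 2.36 | 1.77 | 0.965 | 1.61 | 1.22 | 0.927 |
| | Tmin, T | 3.15 | 2.54 | 0.917 | 2.40 | 1.80 | 0.966 | 1.54 | 1.14 | 0.935 |
| | Tmin, Tmax, T | 2.88 | 2.27 | 0.931 | 2.21 | 1.61 | 0.970 | 1.52 | 1.19 | 0.936 |
| Sunshine duration-based | S | 7.52 | 5.96 | 0.307 | 7.74 | 5.60 | 0.437 | 5.07 | 4.33 | 0.660 |
| | So | 6.79 | 5.39 | 0.513 | 6.96 | 5.26 | 0.581 | 4.65 | 3.95 | 0.782 |
| | S/So | 7.70 | 5.94 | 0.231 | 7.89 | 5.73 | 0.389 | 5.25 | 4.21 | 0.499 |
| | So, S | 6.04 | 4.70 | 0.646 | 5.57 | 4.14 | 0.755 | 3.99 | 3.22 | 0.733 |
| | So, S/So | 6.04 | 4.64 | 0.645 | 5.57 | 4.00 | 0.761 | 3.93 | 3.11 | 0.743 |
| | So, S, S/So | 5.68 | 4.29 | 0.695 | 4.82 | 3.37 | 0.832 | 3.79 | 2.98 | 0.747 |
| Radiation-based | Rs | 7.08 | 5.73 | 0.445 | 7.23 | 5.58 | 0.543 | 4.83 | 4.10 | 0.743 |
| | Ra | 6.88 | 5.48 | 0.493 | 7.14 | 5.51 | 0.548 | 4.60 | 3.91 | 0.807 |
| | Rs/Ra | 7.70 | 5.94 | 0.231 | 7.89 | 5.73 | 0.389 | 5.25 | 4.21 | 0.499 |
| | Ra, Rs | 5.79 | 4.43 | 0.682 | 5.46 | 3.91 | 0.764 | 4.28 | 3.26 | 0.666 |
| | Ra, Rs/Ra | 5.81 | 4.45 | 0.678 | 5.61 | 4.33 | 0.750 | 3.98 | 3.07 | 0.745 |
| | Ra, Rs, Rs/Ra | 6.14 | 4.75 | 0.630 | 6.04 | 4.58 | 0.701 | 4.13 | 3.37 | 0.722 |
| Other meteorological parameters-based | RH | 6.99 | 5.43 | 0.464 | 6.92 | 5.39 | 0.584 | 5.40 | 4.07 | 0.083 |
| | Vp | 0.48 | 0.38 | 0.998 | 0.48 | 0.36 | 0.998 | 0.58 | 0.44 | 0.991 |
| | P | 7.55 | 5.96 | 0.296 | 7.79 | 5.92 | 0.439 | 5.43 | 4.48 | 0.508 |
| | Vp, RH | 0.35 | 0.27 | 0.999 | 0.28 | 0.22 | 1.000 | 0.37 | 0.27 | 0.997 |
| | Vp, P | 0.44 | 0.34 | 0.998 | 0.39 | 0.30 | 0.999 | 0.45 | 0.33 | 0.995 |
| | Vp, P, RH | 0.24 | 0.18 | 1.000 | 0.19 | 0.14 | 1.000 | 0.23 | 0.17 | 0.999 |
| Combined | Vp, Tmin | 0.28 | 0.22 | 0.999 | 0.20 | 0.15 | 1.000 | 0.23 | 0.19 | 0.999 |
| | Vp, So | 0.35 | 0.26 | 0.999 | 0.27 | 0.20 | 1.000 | 0.34 | 0.26 | 0.997 |
| | Vp, Ra | 0.32 | 0.24 | 0.999 | 0.25 | 0.18 | 1.000 | 0.31 | 0.22 | 0.998 |
| | Vp, Tmin, Ra | 0.24 | 0.18 | 1.000 | 0.16 | 0.12 | 1.000 | 0.21 | 0.17 | 0.999 |
| | Vp, Tmin, So | 0.19 | 0.15 | 1.000 | 0.16 | 0.12 | 1.000 | 0.19 | 0.15 | 0.999 |
| | Vp, Tmin, Ra, So | 0.17 | 0.14 | 1.000 | 0.15 | 0.12 | 1.000 | 0.18 | 0.14 | 0.999 |
| Geographical information-based | La, Lo, Alt, α | 2.60 | 2.04 | 0.944 | 2.16 | 1.67 | 0.971 | 2.51 | 1.83 | 0.866 |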

Statistical indices of Tdew estimates from the MARS model for training, validation, and testing phases.

Note: In the original table, bold values indicate the statistical metrics of the best input (Vp, Tmin, Ra, and So).

In the temperature-based input scenarios, Tmin and T both produced better results than Tmax. Tdew was found to have a higher correlation with Tmin than with T and Tmax. Therefore, better results were obtained by employing Tmin as the input. The superiority of Tmin compared to T and Tmax was also found by Mohammadi et al. (2016) and Mehdizadeh et al. (2017a). Tdew is more correlated with Tmin because cool air cannot retain water vapor for long, meaning the effect of Tmin on Tdew is greater than those of Tmax and T (Mehdizadeh et al., 2017a). To develop scenarios with more inputs, T and Tmax were added to Tmin. A similar strategy was followed to develop scenarios with multiple inputs for other categories. The input combination of Tmin and Tmax exhibited a better accuracy than Tmin and T. Also, the scenarios with all inputs generally yielded better results in comparison with the scenarios with fewer inputs, particularly single-input scenarios. Air temperature is typically measured at all weather stations. Therefore, it can be easily used as a possible input predictor to predict Tdew.

Among the sunshine duration-based scenarios, So and S/So were the best and the worst predictors, respectively. Input combinations So and S, and So and S/So generally produced a similar accuracy, particularly for the MARS model. Interestingly, the So and S/So scenario was slightly better than the So and S scenario in the RF approach. The full-input scenario performed best in both the RF and MARS approaches. However, the performance of this scenario is still not accurate enough for predicting Tdew. Additionally, a sunshine duration sensor is needed to measure the sunny hours, which may not be available at some locations. Therefore, the application of sunshine duration variables as the only input of the models is not recommended.

In the radiation-based scenarios, the input Ra showed the best accuracy, while the performance of the clearness index (Rs/Ra) was not as good as Rs. In general, the performance of the Ra and Rs/Ra input combinations was slightly better than that of Ra and Rs single-input predictors. The RF approach generally produced the highest accuracy with the full-input scenario in the radiation-based classes. However, for the MARS models, two-input scenarios exhibited better performance than the full-input scenario. Similar to the sunshine duration scenarios, radiation-based input combinations did not perform satisfactorily, resulting in higher values of RMSE and MAE and lower values of R. Solar radiation is measured by a pyranometer, a relatively expensive device that may not be available at weather stations in developing countries. Therefore, the use of radiation-based scenarios may be limited.

In the other meteorological scenarios, various combinations of RH, Vp, and P were examined. The results for the single-input scenarios show that Vp is the most influential input variable for the accurate prediction of Tdew. Also, the performance of this predictor is better than the most effective variables in temperature- (i.e., Tmin), sunshine duration- (i.e., So), and radiation-based (i.e., Ra) scenarios. For the Vp predictor, the RMSE, MAE, and R of Tdew estimates from the RF method in the testing phase were 0.39°C, 0.21°C, and 0.996, respectively. Corresponding values from the MARS method were 0.58°C, 0.44°C, and 0.991. Furthermore, the model with RH as input performed better than P. Comparing the statistical indices of single RH and P scenarios with the two- and full-input scenarios shows that the accuracy of Tdew predictions significantly increased by adding Vp to RH and P. For the two-input and three-input scenarios, the Vp and RH combination in the RF method, and the Vp, P, and RH combination in the MARS method were the best performers.

The most important variables of the four classes (i.e., Tmin, So, Ra, and Vp) were employed to develop the combined scenarios. The performance of Tmin, So, and Ra was not as good as that of Vp. However, the feasibility of Tmin, So, and Ra was considerably improved by adding Vp to them. In the combined-based classes with two inputs, Vp and Tmin in the MARS model, and Vp and So in the RF model yielded slightly better Tdew estimates. Interestingly, utilizing three-input and four-input scenarios did not necessarily increase the accuracy of the RF method. However, the accuracy of the MARS method was enhanced by increasing the number of predictors. All combined scenarios produced reliable results, as shown by their higher R values and lower RMSE and MAE values. Unfortunately, these scenarios require many weather variables, which are typically unavailable in developing countries. These scenarios can only be used to predict Tdew at weather stations that are able to measure all required meteorological parameters.

The long-term mean monthly Tdew can also be predicted from the geographical characteristics (i.e., latitude, longitude, and altitude) and periodicity (α), which denotes the month number (i.e., one for January and 12 for December). These predictors can be applied to predict the long-term mean monthly Tdew without using meteorological data. These results support the outcomes of previous studies (Kisi et al., 2015; Kisi and Sanikhani, 2015; Mehdizadeh et al., 2017b; Sanikhani et al., 2018) in which the geographical information and month number were successfully utilized in soft computing models to predict mean monthly time series of hydrological variables such as air and soil temperatures, precipitation, and reference evapotranspiration.
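The geographical information-based input set can be illustrated by building the twelve input rows for one station. This is a minimal sketch: the function name and the tuple layout (La, Lo, Alt, α) are assumptions, and the coordinates are those listed for Abadan in Table 1.

```python
def geo_periodic_inputs(latitude, longitude, altitude):
    """Return 12 input rows (La, Lo, Alt, alpha), one per calendar month.

    alpha is the month number: 1 for January through 12 for December.
    """
    return [(latitude, longitude, altitude, alpha) for alpha in range(1, 13)]

# Abadan: latitude 30.37 N, longitude 48.25 E, altitude 6.6 m (Table 1)
rows = geo_periodic_inputs(30.37, 48.25, 6.6)
for row in rows[:3]:
    print(row)
```

For a given station only α varies from row to row, so all within-station seasonal variation in the prediction has to be carried by the periodicity term.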

As can be seen in Tables 3 and 4, the Tmin, So, Ra, and Vp variables showed more accurate results than the other single-input scenarios. The better performance of these predictors in their respective scenario classes can be attributed to their high correlations with Tdew (see Table 2).

Comparison of MARS and RF Approaches for Different Input Scenarios

It can be concluded that the RF method is generally superior to the MARS method for the single-input temperature-, sunshine duration-, and radiation-based scenarios. However, the MARS approach generally showed a better performance for the multi-input scenarios. The geographical information-based scenario was superior in the RF method compared to the MARS method. In contrast, the other weather variable-based classes (except the single RH and single P inputs) and the combined scenarios performed better in MARS than RF.

Comparisons of the predicted and measured long-term mean monthly Tdew values for the best inputs in the training, validation, and testing phases are depicted in Figure 3. As can be seen in Figure 3, these inputs can accurately predict long-term mean monthly Tdew. As shown in Tables 3 and 4, the input combination of Vp and So in the RF approach, and Vp, Tmin, Ra, and So in the MARS model were the superior combinations in all three study phases (bold text in Tables 3 and 4). The estimates of long-term mean monthly Tdew using these inputs are very close to the measured data, particularly for the MARS method.

FIGURE 3

Dew point temperature (Tdew) predicted by the superior scenarios of RF and MARS approaches versus the measured values for the training, validation, and test phases.

The results revealed that the other weather variable-based (except the single RH and single P variables) and combined scenarios outperformed the other scenarios (Table … … ). However, for both methods, combined scenarios indicated a slightly better performance over other weather variables-based scenarios. Temperature-based combinations had better performance compared to sunshine duration- and radiation-based scenarios, which both had the lowest prediction accuracies. Furthermore, the accuracy of the geographical information-based combinations was better than the temperature-, sunshine duration-, and radiation-based scenarios. This confirms the feasibility of RF and MARS for predicting the long-term mean monthly Tdew from the geographical information and the periodicity term.

Conclusion

This study evaluated the performance of two soft computing approaches, random forest (RF) and multivariate adaptive regression splines (MARS), for predicting the long-term mean monthly Tdew. To specify the influential variables, different input combinations consisting of meteorological variables, geographical characteristics, and the periodicity component were employed as inputs in the RF and MARS models. The meteorological variables included minimum, maximum, and mean air temperatures (Tmin, Tmax, and T); actual sunshine duration, maximum possible sunshine duration, and sunshine duration ratio (S, So, and S/So); actual solar radiation, extraterrestrial radiation, and clearness index (Rs, Ra, and Rs/Ra); and relative humidity (RH), vapor pressure (Vp), and precipitation (P). Thirty-one input scenarios were considered in six different categories: temperature-, sunshine duration-, radiation-, other weather variable-, geographical information-based, and combined scenarios. The results obtained are summarized as follows:

  • For the single-input scenarios, Tmin, So, Ra, and Vp were the optimum inputs for the temperature-, sunshine duration-, radiation-, and other weather variables-based scenarios, respectively. Among these variables, Vp had the best performance.

  • Sunshine duration- and radiation-based scenarios showed the lowest accuracy, while the combined scenarios performed the best.

  • The geographical information-based scenarios were superior to the temperature-, sunshine duration-, and radiation-based scenarios. Therefore, the geographical properties and periodicity term can be used to predict the long-term mean monthly Tdew without using any meteorological data.

  • In general, the single-input scenarios had a higher accuracy for the RF model compared to the MARS model, while the multi-input scenarios in the MARS model outperformed the RF method.

  • The best multi-input combinations were Vp and So for RF, and Vp, Tmin, Ra, and So for MARS.

  • Vp can be used as the sole input in both the RF and MARS approaches to predict the long-term mean monthly Tdew with acceptable accuracy.

Often only a few input configurations have been used to estimate different hydrologic variables such as evaporation, solar radiation, and soil temperature. The various input scenarios used in this study can be tested in future works to find the best input combinations for estimating different variables of interest. Other standalone and coupled models can be used in future studies to estimate Tdew and compare the results with the outcomes of this work.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

All the authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Nomenclature

  • Tdew

    Dew point temperature

  • MARS

    Multivariate adaptive regression splines

  • RF

    Random forest

  • ANN

    Artificial neural networks

  • MLR

    Multiple linear regression

  • GRNN

    Generalized regression neural networks

  • KSOFM

    Kohonen self-organizing feature maps

  • ANFIS

    Adaptive neuro-fuzzy inference system

  • GEP

    Gene expression programming

  • MLP

    Multi-layer perceptron

  • ELM

    Extreme learning machine

  • SVM

    Support vector machine

  • WT

    Wavelet transform

  • M5

    M5 model tree

  • DESN

    Deep echo state network

  • Tmin

    Minimum air temperature

  • Tmax

    Maximum air temperature

  • T

    Mean air temperature

  • S

    Sunshine duration

  • So

    Maximum possible sunshine duration

  • Rs

    Solar radiation

  • Ra

    Extraterrestrial radiation

  • RH

    Relative humidity

  • Vp

    Vapor pressure

  • P

    Precipitation

  • La

    Latitude

  • Lo

    Longitude

  • Alt

    Altitude

  • y

    Dependent variable predicted using the MARS

  • x

    Independent variable in MARS

  • co

    Bias

  • ci

    Coefficient for the ith basis function of the MARS

  • bi(x)

    ith basis function

  • RMSE

    Root mean square error

  • MAE

    Mean absolute error

  • R

    Correlation coefficient

  • To,i

    ith measured long-term mean monthly Tdew

  • Tp,i

    ith predicted long-term mean monthly Tdew

  • T̄o

    Mean of the measured values of the long-term mean monthly Tdew

  • T̄p

    Mean of the predicted values of the long-term mean monthly Tdew
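The symbols listed above can be tied together in a short Python sketch. It assumes the standard MARS form y = c0 + Σ ci·bi(x) with hinge basis functions, and the usual definitions of RMSE, MAE, and the Pearson correlation coefficient R computed from measured (To,i) and predicted (Tp,i) values. The function names, knots, coefficients, and sample values are hypothetical and chosen only for illustration; they are not taken from the fitted models of this study.

```python
import math


def hinge(x, knot, sign=1):
    """Hinge basis function: max(0, x - knot) if sign=1, max(0, knot - x) if sign=-1."""
    return max(0.0, sign * (x - knot))


def mars_predict(x, c0, terms):
    """Evaluate y = c0 + sum_i c_i * b_i(x) for a single input x.

    `terms` is a list of (c_i, knot, sign) tuples (hypothetical values).
    """
    return c0 + sum(c * hinge(x, knot, sign) for c, knot, sign in terms)


def rmse(obs, pred):
    """Root mean square error between measured and predicted values."""
    n = len(obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)


def mae(obs, pred):
    """Mean absolute error between measured and predicted values."""
    n = len(obs)
    return sum(abs(o - p) for o, p in zip(obs, pred)) / n


def corr(obs, pred):
    """Pearson correlation coefficient R (assumed definition)."""
    mean_o = sum(obs) / len(obs)    # mean of the measured values
    mean_p = sum(pred) / len(pred)  # mean of the predicted values
    num = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    den = math.sqrt(
        sum((o - mean_o) ** 2 for o in obs) * sum((p - mean_p) ** 2 for p in pred)
    )
    return num / den


# Hypothetical measured and predicted Tdew values (degrees C).
t_obs = [1.0, 2.0, 3.0]
t_pred = [1.0, 2.0, 4.0]
print(rmse(t_obs, t_pred), mae(t_obs, t_pred), corr(t_obs, t_pred))
```

Lower RMSE and MAE and an R closer to 1 indicate closer agreement between predicted and measured Tdew, which is how the input scenarios are ranked in this work.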

Summary

Keywords

dew point temperature, random forest, multivariate adaptive regression splines, machine learning, big data, artificial intelligence

Citation

Zhang G, Bateni SM, Jun C, Khoshkam H, Band SS and Mosavi A (2022) Feasibility of Random Forest and Multivariate Adaptive Regression Splines for Predicting Long-Term Mean Monthly Dew Point Temperature. Front. Environ. Sci. 10:826165. doi: 10.3389/fenvs.2022.826165

Received

30 November 2021

Accepted

11 March 2022

Published

04 April 2022

Volume

10 - 2022

Edited by

Hong Liao, Nanjing University of Information Science and Technology, China

Reviewed by

Saeid Mehdizadeh, Urmia University, Iran

Wei Sun, Sun Yat-Sen University, China

Copyright

*Correspondence: Changhyun Jun; Shahab S. Band

This article was submitted to Atmosphere and Climate, a section of the journal Frontiers in Environmental Science
