Can Machine Learning be Applied to Carbon Emissions Analysis: An Application to the CO2 Emissions Analysis Using Gaussian Process Regression

In this paper, a nonparametric kernel prediction algorithm in machine learning is applied to predict CO2 emissions. A literature review has been conducted so that proper independent variables can be identified. Traditional parametric modeling approaches and the Gaussian Process Regression (GPR) algorithms were introduced, and their prediction performance was summarized. The reliability and efficiency of the proposed algorithms were then demonstrated through the comparison of the actual and the predicted results. The results showed that the GPR method can give the most accurate predictions on CO2 emissions.


INTRODUCTION
As the world's population continues to grow, carbon dioxide emissions are rising day by day, warming the environment and becoming a significant driver of climate change. Global efforts to mitigate climate change have focused on reducing the number of future days of extreme heat. Many studies and surveys have examined the reasons for high CO2 emissions in different countries. Most empirical studies take a parametric modeling approach to analyze the factors that initiate and sustain CO2 emissions. However, the traditional parametric approach fits a function of known form with a finite set of pre-determined parameters. This rigidity limits the predictive power of parametric models. In recent years, nonparametric machine learning techniques have played a dominant role in improving forecasts.
In this paper, a Bayesian nonparametric kernel prediction algorithm from machine learning is applied to predict CO2 emissions. A literature review was conducted to identify the proper independent variables. Classical least squares, robust least squares, and GPR algorithms are introduced, and their prediction performance is summarized together with the evaluation criteria used to measure model performance. The reliability and efficiency of the proposed algorithms are then demonstrated by comparing the actual data with the predicted results. We find that GPR gives the most accurate predictions of CO2 emissions.

LITERATURE REVIEW
Economic growth, energy use, and CO2 emissions are closely related. Kolstad and Krautkraemer (1993) point out that while the use of resources such as energy benefits growth, it has negative environmental impacts. Traditional growth theories such as the Solow growth model fail to consider the environmental impacts of growth (Solow, 1956). More modern growth theories study the interrelationship among energy, the environment, and economic growth (see, for example, Kolstad and Krautkraemer (1993), Jorgenson and Wilcoxen (1993), or Xepapadeas (2005) for a brief review).
Empirical studies show that economic growth and energy consumption are associated with CO2 emissions. Recently, Hu et al. (2020) studied the main drivers of carbon emissions among the Belt and Road countries and found that CO2 emissions have increased significantly due to economic growth. Similarly, Shahbaz et al. (2013) found that in Indonesia CO2 emissions increased with rapid economic growth, while Shahbaz et al. (2016) found that economic growth led to CO2 emissions in Bangladesh and Egypt. Meanwhile, other studies discovered a bidirectional causal relationship among the three variables. Munir et al. (2020) find a causal relationship between GDP and energy consumption in the major countries of the ASEAN (Association of Southeast Asian Nations), while Liu and Hao (2018) find that in energy-exporting countries there is a bidirectional relationship between CO2 emissions, energy use, and GDP per capita. Similarly, Kahouli (2018) observes a feedback effect between energy consumption, CO2 emissions, and economic growth. Mohmannd et al. (2020) examined the causal relationship among transportation infrastructure, economic growth, and transportation emissions in Pakistan from 1971 to 2017. The results show short-run causality running from transportation infrastructure, economic growth, and fuel consumption to CO2 emissions, and a long-run relationship between economic growth and infrastructure.
Apart from growth and energy consumption, industrialization, population growth, and income level also contribute a large share of global carbon emissions. Minx et al. (2011) found that industrialization accounted for the rapid increase in CO2 emissions in China from 2002 to 2007, while Zhang et al. (2014) found that the growth of the tertiary industry can decrease CO2 emission intensity. Nasir et al. (2021) examined the connection between CO2 emissions, industrialization, economic growth, energy consumption, and several related factors in Australia from 1980 to 2014. Their results indicate that all variables affect CO2 emissions. Li et al. (2021) discussed the effect of economic growth and economic structure on per capita CO2 emissions in 147 countries from 1990 to 2015. The results show that at the global level, economic growth and economic structure have the most significant positive effects. Studies on population have thus far concentrated on the relationship between population growth and the increase in emissions. The effect of population growth on CO2 emissions can be summarized as follows (Birdsall, 1992): on the one hand, a larger population increases the demand for energy for power generation, industry, and transport; on the other hand, population growth increases emissions from deforestation. Empirically, Knapp and Mookerjee (1996) conducted a Granger causality test on annual data from 1880 to 1989 to determine the relationship between global population growth and carbon dioxide emissions. The results show a short-term dynamic relationship between carbon dioxide emissions and population growth. Very recently, Zhang et al. (2020) analyzed the nexus between CO2 emissions, GDP, and fuel consumption in China and ASEAN countries. They found that carbon density, energy intensity, GDP, and population are positively correlated with CO2 emissions. Empirical findings also show that, because developing countries face greater population pressure, they record more carbon emissions per year than developed countries (Shi, 2003).
In the past decade, the theory and methodology of the Environmental Kuznets Curve (EKC) have been used to analyze the relationship between the income and the carbon emissions of a region (Dinda, 2004; Williams and Rasmussen, 1996). According to the EKC, at relatively low income levels, emissions increase as income increases. After a certain point, emissions decline with income. Thus, CO2 emissions vary with the level of income. Luo et al. (2021) investigated the factors influencing Shanghai's CO2 emissions from 1995 to 2017. They found that personal disposable income is one of the top drivers of CO2 emissions. Yuan et al. (2014) examined the long-term relationship between China's per capita income, energy consumption, and CO2 emissions from 1953 to 2008. They found unidirectional Granger causality between gross national income and CO2 emissions.
Based on the literature above, we conclude that economic growth, energy consumption, population, industrialization, and income can be classified as the predominant factors affecting CO2 emissions. Other factors might also affect CO2 emissions in China, for example, R&D (Nguyen et al., 2020; Jones, 1995), financial development (Bhattacharya et al., 2017; Zaidi et al., 2019; Wang et al., 2020), and the degree of foreign direct investment (Essandoh et al., 2020; Le et al., 2020; Khan and Rana, 2021). This paper focuses on how well the different prediction models perform on an information set that includes only the most predominant drivers of CO2 emissions and leaves the less important ones to be captured by the stochastic terms in the models.
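The inverted-U shape described by the EKC is often illustrated with a quadratic in log income. The sketch below assumes that functional form, which is not taken from this paper; the function names and the coefficients are hypothetical and chosen only to show where the turning point falls.

```python
def ekc_log_emissions(log_income, a, b, c):
    """Assumed quadratic EKC form: log E = a + b*log Y + c*(log Y)**2."""
    return a + b * log_income + c * log_income ** 2


def ekc_turning_point(b, c):
    """Log income at which emissions peak; an inverted U needs b > 0 and c < 0."""
    if not (b > 0 and c < 0):
        raise ValueError("an inverted U needs b > 0 and c < 0")
    return -b / (2.0 * c)


# Hypothetical coefficients, for illustration only.
a, b, c = 0.0, 2.0, -0.25
peak = ekc_turning_point(b, c)

# Emissions rise before the turning point and fall after it.
before = ekc_log_emissions(peak - 1.0, a, b, c)
at_peak = ekc_log_emissions(peak, a, b, c)
after = ekc_log_emissions(peak + 1.0, a, b, c)
```

Below the turning point the slope `b + 2*c*log_income` is positive, and above it the slope is negative, which is the pattern the EKC describes.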

METHODOLOGY
Gaussian Process Regression (GPR) is a non-parametric Bayesian regression method (Gershman and Blei, 2012). It does not assume a fixed functional form for the relationship between inputs and outputs and lets the data determine the complexity of the underlying functions by means of Bayesian inference (Williams, 1998). Consider the output $y$ of a function $w$ at input $x$ with independent and identically distributed random noise $\varepsilon \sim N(0, \sigma_n^2)$. The function together with the random noise can be written as:

$$y = w(x) + \varepsilon$$

In classical linear regression, $w(x)$ is deterministic whereas the noise term is random. In Gaussian process regression, however, $w(x)$ is assumed to be random and to follow a Gaussian process. A Gaussian process is an extension of the multivariate Gaussian distribution to infinite dimensions; any finite subset sampled from the Gaussian process follows a multivariate Gaussian distribution (MacKay, 1998). The distribution over functions can be described by the Gaussian process

$$w(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big)$$

where $x$ is the input variable, $m(x)$ is the mean function, and $k(x, x')$ is the covariance function. These two functions are defined respectively as:

$$m(x) = \mathbb{E}[w(x)], \qquad k(x, x') = \mathbb{E}\big[(w(x) - m(x))(w(x') - m(x'))\big]$$

A finite collection of function values sampled from the Gaussian process follows a multivariate Gaussian distribution:

$$\big[w(x^{(1)}), w(x^{(2)}), \ldots, w(x^{(n)})\big]^T \sim N(\mu, K)$$

where $K$ is an $n \times n$ matrix with entries $K_{ij} = k(x^{(i)}, x^{(j)})$ and $\mu$ has entries $\mu_i = m(x^{(i)})$. Given a training set that contains observation points $y = [y(x^{(1)}), y(x^{(2)}), \ldots, y(x^{(n)})]^T$ and function values $w = [w(x^{(1)}), w(x^{(2)}), \ldots, w(x^{(n)})]^T$, it follows that the conditional distribution $p(y \mid w)$ and the Gaussian prior $p(w)$ are $N(w, \sigma_n^2 I)$ and $N(\mu, K)$, respectively. By definition, the set of observations $y$ and the function value $w(x_p)$ at a prediction point $x_p$ follow a joint multivariate Gaussian distribution. The joint distribution $p(w(x_p), y)$ is defined as

$$\begin{bmatrix} w(x_p) \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} m(x_p) \\ \mu \end{bmatrix}, \begin{bmatrix} k(x_p, x_p) & k_p^T \\ k_p & K + \sigma_n^2 I \end{bmatrix} \right)$$

Here, $I$ is the identity matrix, $\sigma_n^2$ is the unknown variance of the random noise, and $(k_p)_i = k(x_p, x^{(i)})$ for $i = 1, 2, \ldots, n$.

Using Bayes' rule, the predictive posterior $p(w(x_p) \mid y) \sim N(\bar{w}_p, \Sigma_p)$ can be obtained, and the mean $\bar{w}_p$ and variance $\Sigma_p$ are defined by

$$\bar{w}_p = m(x_p) + k_p^T (K + \sigma_n^2 I)^{-1} (y - \mu), \qquad \Sigma_p = k(x_p, x_p) - k_p^T (K + \sigma_n^2 I)^{-1} k_p$$

The covariance function $k(x^{(i)}, x^{(j)})$ determines the characteristics of the Gaussian process. It models the dependence between the function values at different input points $x^{(i)}$ and $x^{(j)}$, and it is often called the kernel of the Gaussian process. There are many possible options for the prior covariance function. A popular kernel is the exponential covariance function, which allows the model to generate a non-negative definite covariance matrix for any set of input points (Williams and Rasmussen, 1996). The exponential covariance function is defined as

$$k(x^{(i)}, x^{(j)}) = \sigma_f^2 \exp\left( -\frac{\lVert x^{(i)} - x^{(j)} \rVert}{l} \right) + \sigma_n^2 \delta_{ij}$$

where $l$ is the characteristic length scale, $\sigma_f^2$ is the signal variance, and $\delta_{ij}$ is the Kronecker delta. The $\sigma_n^2 \delta_{ij}$ term places the noise variance on the diagonal, so the matrix built from this function corresponds to $K + \sigma_n^2 I$, the covariance of the noisy observations $y$. Gaussian process regression employs a set of hyperparameters $\theta$, including $l$, $\sigma_f^2$, and $\sigma_n^2$, to increase or reduce the prior correlation between points and consequently the variability of the resulting function. The hyperparameters $\theta$ can be optimized by maximizing the log-likelihood:

$$\log p(y \mid \theta) = -\frac{1}{2} (y - \mu)^T (K + \sigma_n^2 I)^{-1} (y - \mu) - \frac{1}{2} \log \lvert K + \sigma_n^2 I \rvert - \frac{n}{2} \log 2\pi$$

More details on Gaussian process regression can be found in the book by Williams and Rasmussen (2006), which is freely available online at www.GaussianProcess.org/gpml.
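The predictive mean, predictive variance, and log-likelihood described in this section can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation used in the study. It assumes a zero mean function, an exponential kernel of the form sigma_f^2 * exp(-distance / length) with Euclidean distance, and a Cholesky factorisation as the solver. All function names and the toy data are hypothetical.

```python
import math


def exp_kernel(a, b, sigma_f, length):
    """Assumed exponential covariance between input vectors a and b (noise-free part)."""
    dist = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return sigma_f ** 2 * math.exp(-dist / length)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) z = b by forward, then backward substitution."""
    n = len(b)
    u = [0.0] * n
    for i in range(n):
        u[i] = (b[i] - sum(L[i][k] * u[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (u[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def gpr_fit(X, y, sigma_f, length, sigma_n):
    """Factorise K + sigma_n^2 I and return (L, alpha), alpha = (K + sigma_n^2 I)^{-1} y."""
    n = len(X)
    Ky = [[exp_kernel(X[i], X[j], sigma_f, length) + (sigma_n ** 2 if i == j else 0.0)
           for j in range(n)] for i in range(n)]
    L = cholesky(Ky)
    return L, chol_solve(L, y)


def gpr_predict(X, L, alpha, x_p, sigma_f, length):
    """Posterior mean and variance of w(x_p) under a zero prior mean."""
    k_p = [exp_kernel(x_p, xi, sigma_f, length) for xi in X]
    mean = sum(k * a for k, a in zip(k_p, alpha))
    v = chol_solve(L, k_p)
    var = exp_kernel(x_p, x_p, sigma_f, length) - sum(k * vi for k, vi in zip(k_p, v))
    return mean, var


def log_marginal_likelihood(L, alpha, y):
    """log p(y | theta); uses log|K + sigma_n^2 I| = 2 * sum(log L_ii)."""
    n = len(y)
    data_fit = -0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
    complexity = -sum(math.log(L[i][i]) for i in range(n))
    return data_fit + complexity - 0.5 * n * math.log(2.0 * math.pi)


# Toy data (invented): four one-dimensional inputs and their noisy outputs.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.8, 0.9, 0.1]
L, alpha = gpr_fit(X, y, sigma_f=1.0, length=1.0, sigma_n=0.1)
mean_near, var_near = gpr_predict(X, L, alpha, [1.0], sigma_f=1.0, length=1.0)
mean_far, var_far = gpr_predict(X, L, alpha, [100.0], sigma_f=1.0, length=1.0)
lml = log_marginal_likelihood(L, alpha, y)
```

Near a training input the posterior mean stays close to the observed value and the variance is small; far from the data the prediction reverts to the prior (mean 0, variance sigma_f^2). Hyperparameters could be chosen by evaluating `log_marginal_likelihood` for several candidate settings and keeping the largest value, which is the optimisation the text describes.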

EMPIRICAL RESULTS
The literature review identified five independent variables: economic growth, energy consumption, population, industrialization, and income. In this study, the GPR method and the other algorithms are applied to study carbon emissions in China. Economic growth is approximated by GDP (100 million RMB), energy consumption by per capita energy consumption (tons of standard coal), population by population size (10,000 people), industrialization by the percentage of secondary industry in China, and income by the average annual salary (RMB).
The data on GDP, population size, energy consumption, percentage of secondary industry, and average annual salary are collected from the China City Statistical Yearbook. CO2 emissions data come from four main sources of energy consumption: electricity, fuel, heating, and transportation. These data can be obtained and calculated from the China Urban Construction Statistical Yearbook, the China City Statistical Yearbook, and the Intergovernmental Panel on Climate Change. Since some of these data are not available after 2014, the data in this paper cover the years 2002 to 2014.
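The two least-squares baselines compared in this study could be fit to such data roughly as follows. This is a sketch under stated assumptions: the paper does not say which robust loss it uses, so Huber weights with iteratively reweighted least squares are assumed here, the threshold `delta` is arbitrary, and the function names and toy data are hypothetical.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z


def weighted_least_squares(X, y, w):
    """Minimise sum_i w_i * (y_i - x_i . beta)^2 via the normal equations."""
    p = len(X[0])
    XtWX = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X)) for b in range(p)]
            for a in range(p)]
    XtWy = [sum(wi * xi[a] * yi for wi, xi, yi in zip(w, X, y)) for a in range(p)]
    return solve_linear(XtWX, XtWy)


def classical_ls(X, y):
    """Ordinary least squares: every observation has weight 1."""
    return weighted_least_squares(X, y, [1.0] * len(y))


def robust_ls(X, y, delta=1.0, iterations=20):
    """Iteratively reweighted least squares with Huber weights (assumed loss)."""
    beta = classical_ls(X, y)
    for _ in range(iterations):
        resid = [yi - sum(b * x for b, x in zip(beta, xi)) for xi, yi in zip(X, y)]
        # Residuals larger than delta are down-weighted in proportion to their size.
        w = [1.0 if abs(r) <= delta else delta / abs(r) for r in resid]
        beta = weighted_least_squares(X, y, w)
    return beta


# Toy data (invented): y = 1 + 2x with one large outlier at the last point.
X = [[1.0, float(x)] for x in range(10)]   # first column is the intercept
y = [1.0 + 2.0 * x for x in range(10)]
y[9] += 50.0
beta_cls = classical_ls(X, y)
beta_rob = robust_ls(X, y)
```

On the toy data the single outlier pulls the classical slope well away from 2, while the Huber-weighted fit stays much closer, which is the usual motivation for including a robust baseline.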

Statistical Analysis of Prediction Results
The criteria commonly used to assess prediction performance are used in this study to evaluate the quality of the fit. Table 1 reports the root mean squared error (RMSE), the mean squared error (MSE), the R-square, and the mean absolute error (MAE). A well-fitted model should have an R-square close to 1, whereas the RMSE, the MSE, and the MAE should be as small as possible. As Table 1 shows, Exponential GPR provides the best fit to the data, as it has the smallest RMSE, MSE, and MAE, and an R-square closest to 1.
Frontiers in Energy Research | www.frontiersin.org September 2021 | Volume 9 | Article 756311
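The four criteria named above can be computed directly from paired actual and predicted values. The sketch below uses their standard definitions; the function name and the toy numbers are hypothetical and are not the values in Table 1.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MSE, MAE and R-square for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "RMSE": math.sqrt(mse),
        "MSE": mse,
        "MAE": mae,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Toy example (invented): three exact predictions and one that is off by 2.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Because RMSE and MSE square the errors, a single large miss affects them more than it affects MAE, so reporting all three gives a fuller picture of the error distribution.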

Data Visualization
Because the data set is large and difficult to view as a whole, visualization is needed, especially for representative scenarios. The prediction results were analyzed at the model level, to assess the overall accuracy of the three models, and at the individual component level, to examine the estimates produced by the three models over the range of a particular variable. At the overall level, the actual and predicted values of CO2 emissions are compared and their deviation is examined. Figure 1 compares the actual values with the CO2 emissions predicted by the three models; for each model, the predicted value is plotted against the actual value. For a good fit, each plot should resemble a straight line at 45°. However, compared with the exponential GPR model, the classical least squares model and the robust least-squares model produce predicted values that are larger than the actual values over the range of 3.5-4 logarithm units of CO2 emissions. This means that, relative to the exponential GPR model, the classical least squares model and the robust least-squares model overestimate CO2 emissions over this range. The same issue can be observed in Figure 2, which shows the deviation between the actual and predicted CO2 emissions for the three models. Figure 2 shows that, compared with the other two models, the deviations for the exponential GPR model cluster more closely around the horizontal line that represents no deviation. This suggests that the exponential GPR model provides a much better fit than the other two models.
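The range-specific overestimation read off the figures can also be checked numerically by averaging the deviations inside a chosen range of actual values. The sketch below assumes the deviation is defined as predicted minus actual, so a positive mean indicates overestimation; the paper does not state the sign convention, and the function name and data are hypothetical.

```python
def mean_deviation_in_range(actual, predicted, low, high):
    """Mean of (predicted - actual) over observations with low <= actual <= high.

    Returns None when no observation falls inside the range.
    """
    devs = [p - a for a, p in zip(actual, predicted) if low <= a <= high]
    if not devs:
        return None
    return sum(devs) / len(devs)


# Toy values in logarithm units (invented): two mid-range points are overpredicted.
actual = [3.2, 3.6, 3.8, 4.2]
predicted = [3.2, 3.9, 4.1, 4.2]
bias_mid = mean_deviation_in_range(actual, predicted, 3.5, 4.0)
bias_low = mean_deviation_in_range(actual, predicted, 3.0, 3.4)
bias_none = mean_deviation_in_range(actual, predicted, 5.0, 6.0)
```

A model whose points lie on the 45° line would give a mean deviation near zero in every range, so a clearly positive value in one range corresponds to the upward bias described for the two least-squares models.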
Apart from analyzing the prediction results at the overall model level, the performance of the three models is also evaluated at the individual component level, where the estimates produced by the selected models are analyzed over the range of particular variables. Figures 3-7 plot the actual and predicted values of CO2 emissions against each of the most predominant factors in the models. Figure 3 plots the predicted values of CO2 emissions against the logarithm of GDP measured in 10,000 Chinese Yuan. Ideally, the predicted values should be as close as possible to the actual values for all observations. As shown in Figure 3C, the CO2 emissions predicted by the exponential GPR model are quite close to the actual values over the range of the logarithm of GDP, even though a small number of deviations can be observed. In contrast, Figures 3A,B reveal that the classical least squares and the robust least-squares models overestimate CO2 emissions over the range of 3.2-3.7 logarithm units of GDP. This implies that, conditional on GDP, the exponential GPR model provides more accurate CO2 emissions predictions than the other two models. Figures 4-6 show similar results. The CO2 emissions predicted by the exponential GPR model are close to the actual values over the entire range of population size (see Figure 4), energy consumption (see Figure 5), and the level of industrialization (see Figure 6). However, when the classical least squares and the robust least-squares models are used, extreme deviations between the actual and predicted values can be observed. In Figures 4A,B, the classical least squares and the robust least-squares models overestimate CO2 emissions over the range of 2.7-2.8 logarithm units of population size. Similarly, in Figures 5A,B, CO2 emissions are overestimated by the classical least squares and the robust least-squares models over the range of 0.5-1 logarithm units of per capita energy consumption. In Figure 6, although less obvious, CO2 emissions are overestimated by the classical least squares and the robust least-squares models over the range of 40-50% of secondary industry in China. Figure 7 shows how the predicted values deviate from the actual values when the independent variable is non-Gaussian owing to the presence of extreme data points. As with the evidence presented above, extreme upward bias over a particular range can be observed when the classical least squares and the robust least-squares models are used; the models overestimate CO2 emissions over the range of 4.5 to 4.75 logarithm units of average annual salary. The extreme bias disappears when the exponential GPR model is used. Moreover, when the exponential GPR model is used, the deviations between the actual values and the predicted values are smaller for the extreme data values observed over the ranges of 1 to 1.5 and 5 to 5.5 logarithm units of average annual salary.
... Exponential GPR. Notes: 1) The horizontal axis represents actual CO2 emissions in logarithm, and the vertical axis represents predicted CO2 emissions in logarithm. 2) CO2 emissions are measured in ten thousand tons of standard coal.
In summary, Figures 3-7 show that the CO2 emissions predicted by the exponential GPR model, conditional on individual components (i.e., GDP, population size, energy consumption, and industrialization), are quite close to the actual values. Even the underlying distribution of ...
FIGURE 2 | The deviation of actual value and prediction of CO2 emissions between the selected models. (A) Classical Least Squares, (B) Robust Least Squares, (C) Exponential GPR. Notes: 1) The horizontal axis represents actual CO2 emissions in logarithm, and the vertical axis represents the deviation of the actual CO2 emissions from the predicted value. 2) CO2 emissions are measured in ten thousand tons of ...

CONCLUSION AND FUTURE WORKS
In this paper, the Gaussian process regression method is proposed for CO2 emissions analysis in China. The traditional linear regression approach is limited by its rigid functional form, and it often encounters an overfitting problem. The Gaussian process regression approach relaxes the parametric assumption by applying Bayesian nonparametric inference. The prediction accuracy of the exponential GPR model was compared with that of the classical least squares and the robust least-squares models. The results of the study indicate that Gaussian process regression gives the most accurate predictions of CO2 emissions compared with the other two traditional models and is therefore applicable to CO2 emissions prediction analysis to enhance forecast performance. The prediction performance of the selected methods was assessed using only the five predominant factors affecting carbon emissions. Future research should further review the completeness of the set of driving factors and the effectiveness of the model predictions, comparing them with other commonly used models.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
NM was responsible for conceptualization, methodology, and funding acquisition. WS performed the formal data analysis and wrote the original draft. TH performed the data collection and original arrangement. FL contributed to the formal methodology and to writing-review and editing.