Driving Factors of CO2 Emissions: Further Study Based on Machine Learning

Greenhouse gas emissions, especially carbon dioxide (CO2) emissions, are viewed as one of the core causes of climate change, which has become one of the most important environmental problems in the world. This paper investigates the relation between CO2 emissions and economic growth, industry structure, urbanization, research and development (R&D) investment, actual use of foreign capital, and growth rate of energy consumption in China between 2000 and 2018. The study is important for China, which has pledged to peak its CO2 emissions by 2030 and achieve carbon neutrality by 2060. We apply a suite of machine learning algorithms to the training set (2000–2015) and predict the levels of CO2 emissions for the testing set (2016–2018). Using root mean square error (RMSE) for model selection, the results show that the nonlinear k-nearest neighbors (KNN) model performs best among the linear models, nonlinear models, ensemble models, and artificial neural networks tested on the present dataset. Using the KNN model, we conduct a sensitivity analysis of CO2 emissions around the centroid position. The findings indicate that not all provinces should pursue further industrialization: some should stay at a relatively mild stage of industrialization, while selected others should industrialize as quickly as possible, because CO2 emissions eventually decrease after a saturation point. For urbanization, there is an optimal range for a province, within which CO2 emissions are at a minimum, likely as a result of technological innovation in energy usage and efficiency. Moreover, China should increase its R&D investment intensity from the present level, as this will decrease CO2 emissions. If R&D reinvestment is associated with actual use of foreign capital, policy makers should prioritize the use of foreign capital for R&D investment in green technology. Last, economic growth requires consuming energy; however, policy makers must refrain from letting energy consumption grow beyond a certain optimal rate. These findings provide a guide for policy makers to achieve the dual-carbon strategy while sustaining economic development.


INTRODUCTION
Greenhouse gas emissions, especially carbon dioxide (CO2) emissions, are viewed as one of the core causes of climate change, which has become one of the most important environmental problems in the world (Rehman et al., 2021a). In his opening remarks at the press conference on the World Meteorological Organization (WMO) State of the Climate 2019 report, UN Secretary-General António Guterres reported that 2019 was the second hottest year on record. According to the WMO's flagship State of the Global Climate report, the global average temperature in 2020 was about 1.2°C above the preindustrial level.
To mitigate the threat of runaway climate change, the Paris Agreement calls for limiting global warming to well below 2°C, and preferably to 1.5°C, compared to preindustrial levels. This requires global emissions to peak as soon as possible, fall rapidly by 45 percent from 2010 levels by 2030, and continue to drop steeply to reach net zero emissions by 2050 (Bertram et al., 2021). At the current level of nationally determined contributions, the world is far off track in meeting this target. Greenhouse gas emissions of developed countries and economies in transition declined by 6.5 percent over the period 2000-2018. Meanwhile, the emissions of developing countries rose by 43.2 percent from 2000 to 2013. The rise is largely attributable to increased industrialization and enhanced economic output measured in terms of GDP.
Carbon dioxide emissions have been the primary source of extreme environmental pollution (Rehman et al., 2021a). With rapidly growing agriculture and farm mechanization, the agricultural sector has become a factor in the global surge in emissions of CO2 and other greenhouse gases (Rehman et al., 2021b).
Economic, social, and environmental sustainability are the three core pillars of the UN's Sustainable Development Goals (SDGs) (Rehman et al., 2021a). In September 2019, Heads of State and Government gathered at the SDG Summit at the United Nations Headquarters in New York to follow up and comprehensively review progress in the implementation of the 2030 Agenda for Sustainable Development and the 17 SDGs. The summit resulted in the adoption of the Political Declaration, whose core message is to take action in response to climate emergencies. Relevant research shows that if economic growth and climate and environmental sustainability are to be achieved at the same time, emission reduction policies need to be incorporated into the economic growth policies of various countries (Li et al., 2021; Rehman et al., 2021a).
China, the world's second largest economy, strives to achieve environmental sustainability through a series of policies and measures. At the General Debate of the 75th session of the United Nations General Assembly on 22 September 2020, President Xi Jinping announced that China will scale up its Intended Nationally Determined Contributions by adopting more vigorous policies and measures, aiming to have CO2 emissions peak before 2030 and to achieve carbon neutrality before 2060.
Generally speaking, various economic activities affect carbon emissions (Liu et al., 2021). They include industrial structure (Shen et al., 2021), energy consumption, trade, and urbanization (Kasman and Duman, 2015), the consumption structure of fossil fuel and cleaner fuel, foreign investment (Elliott and Sun, 2013), and technology advancement (Yu and Du, 2018).
Against this background, this paper studies the relation between CO2 emissions and China's economic growth, industrial structure, urbanization, R&D investment, foreign investment, and energy consumption growth from 2000 to 2018, and predicts the level of CO2 emissions.
China is an ideal case for studying the driving factors of CO2 emissions because it accounted for the highest level of CO2 emissions across the globe in 2017 (Ma et al., 2021). President Xi Jinping addressed the General Assembly of the United Nations and declared China's national goal of becoming carbon neutral by 2060. China plays a key role in achieving the 2030 Sustainable Development Agenda of the United Nations. To meet both this agenda and the Paris Agreement, China must reach its carbon emissions peak by 2030 and carbon neutrality by 2060 while sustaining a certain level of economic growth. To this end, China has formulated a "dual-carbon" strategy. It is therefore vital to study the drivers that influence CO2 emissions. Economic growth and CO2 emissions go hand in hand, as economic activities give rise to CO2 emissions; economic growth is therefore the core factor affecting CO2 emissions. Industrialization and urbanization are the two main lines of China's economic and social development, and they capture the CO2 emissions of the production side and the consumption side, respectively (Cao et al., 2016; Han et al., 2019). Industrialization and urbanization are compound factors affecting carbon emissions, because both processes include factors that drive CO2 emissions and factors that limit them. Industrialization changes the industrial structure, and CO2 emissions differ across industries. On the one hand, urbanization affects the CO2 emissions caused by residents' consumption, which differs considerably between urban and rural residents. On the other hand, urbanization is the movement of industries and population between areas. Therefore, urbanization also reflects the different carbon emission performance of urban and rural areas.
Technological progress, foreign investment, and energy consumption are the specific factors of CO2 emissions: technological progress reduces CO2 emissions through the exploration and use of clean energy; foreign investment reflects the pollution haven effect (lax environmental regulation, good market access to high-income countries, and corruption opportunities) (Candau and Dienesch, 2017); and energy consumption determines the quantity of CO2 emissions. This paper contributes to the literature in two ways. 1) It is a comprehensive study: we build a framework that includes three levels of six driving factors of CO2 emissions, as shown in Figure 1. The most important factors include economic growth, industrialization, urbanization, technology progress, foreign direct investment, and energy consumption. 2) Most existing studies use an ordinary least squares (OLS) framework to explore the relation between carbon emissions and related factors, which makes it difficult to avoid omitted-variable or endogeneity issues (Kasman and Duman, 2015). An increasing number of recent studies (Li et al., 2021; Liu et al., 2021) use the cross-sectionally augmented autoregressive distributed lag (CS-ARDL) approach developed by Chudik and Pesaran (2015) for short- and long-term CO2 emissions forecasts. This research instead applies a suite of machine learning algorithms to predict CO2 emissions using the factors discussed. Machine learning avoids the omission of variables and endogeneity issues. In addition, the trends and relations between CO2 emissions and the various factors are predicted.
The rest of this paper is organized as follows. Literature Review provides a literature review on CO2 emissions. Data and the Variables describes the data and variables under study. Methodology describes the machine learning algorithms deployed for predicting the level of CO2 emissions. Results compares the accuracy of predictions among various machine learning algorithms. Discussions discusses the results using the best performing model, while Conclusion and Policy Implications concludes the paper.

LITERATURE REVIEW
Economic scale, economic structure, and technological level are the three major factors affecting the environment (Grossman and Krueger, 1995). Economic scale is the output of the economy; more economic output means more pollution, because economic growth requires more resource investment and more energy consumption. Economic structure refers to industry structure, and changes in industry structure can reduce pollution. As the economy develops, the share of secondary industry, especially energy-intensive industry, declines and the share of tertiary industry, which consumes less energy, increases, so pollution falls. Technology progress improves the efficiency of resource use and reduces energy consumption, so technological level is an important factor influencing energy intensity and pollution. Many studies build on these three environmental factors and extend them. The research can be classified into studies of the relation between economic growth and CO2 emissions; between industry structure, technology, and CO2 emissions; and between urbanization and CO2 emissions. However, results differ with research focus, theory, and method. There are thus three parallel literatures on the factors that influence CO2 emissions.
The first group of studies has investigated the relation between CO2 emissions, economic growth, and energy consumption.
The Environmental Kuznets Curve (EKC) is often used to discuss the relation between environmental pollution and economic growth, and it is also the main method for analyzing the relation between CO2 emissions and economic growth (Lin and Jiang, 2009). Grossman and Krueger (1991) found a U-shaped relation between economic growth and CO2 emissions. However, the result is the opposite when CO2 is used as the environmental indicator. Holtz-Eakin and Selden (1995), Sachs et al. (1999), Friedl and Getzner (2003), and Galeotti et al. (2006) found that the relation between CO2 emissions and economic growth is inverted U-shaped. In contrast, Shafik (1994), Martin (2008), and Murshed and Dao (2020) found that per capita CO2 emissions increase in parallel with per capita income, with no turning point. Moomaw and Unruh (1997), Martinez-Zarzoso and Bengochea-Morancho (2004), Friedl and Getzner (2003), and Akpan and Chuku (2011) found that the relation between CO2 emissions and economic growth is N-shaped. Saidi and Hammami (2015) examined the effect of energy use and CO2 emissions on economic growth for 58 countries, and their empirical results showed that CO2 emissions negatively affected economic growth. Rahman et al. (2020), Liu et al. (2012), and Lantz and Feng (2006) found that per capita GDP has no relation with CO2 emissions.
The Environmental Kuznets Curve describes an inverted U-shaped relation between economic growth and environmental pollution in developed countries: as these countries, consciously or unconsciously, adjusted their economic structure and energy consumption structure and moved faster along the inverted U-shaped path, overall environmental quality first deteriorated and then improved as economic growth accumulated (Lin and Jiang, 2009). Acheampong (2018) found that, at the global level, energy consumption has a negative impact on economic growth, economic growth has a negative impact on CO2 emissions, and CO2 emissions have a positive impact on economic growth. In the Asia-Pacific region, economic growth does not cause CO2 emissions, but in Caribbean-Latin America, there is feedback causality between economic growth and carbon emissions.
The second group of studies has investigated the relation between CO2 emissions, industry structure, and technology progress. Bernardini and Galli (1993) found that energy intensity shows a declining trend as income increases. Three reasons lie behind this decline. First of all, as the economy develops, the final demand structure changes with the stage of industrialization. In the preindustrial stage, agriculture is the leading industry in economic development, and economic growth is driven by basic needs, which can be met with low energy intensity. In the stage of industrialization, the infrastructure network needs to be built up to facilitate large-scale production and consumption. The primitive accumulation of capital stock related to industrialization can increase energy intensity, but it eventually reaches a saturation point, at which the consumption of materials tends to replace durables rather than create them. In the postindustrial stage, manufacturing declines relative to services, and energy intensity in service-based economies is smaller than that in manufacturing-oriented economies. Shahbaz et al. (2018) and Khan et al. (2019) found that financial development helps control CO2 emissions in both France and China. However, Liu et al. (2021) found that a 1% increase in financial development raises CO2 emissions by 0.17-0.52%. Technology progress is the dominant factor of long-run economic growth with scarce resources. Technology change has a positive influence on energy efficiency and a negative influence on energy intensity (Lin and Du, 2014; Sadorsky, 2013; Yu et al., 2021). Ang (2009) used a framework grounded in modern growth theory to analyze the role of R&D activity and technology progress in reducing pollution. Technology progress is the result of R&D investment, which contributes to energy intensity reduction (Young, 1998).
Wei et al. (2010) extended Antweiler's model (Antweiler et al., 2001) to analyze the factors influencing CO2 emissions. The study found that GDP, industrialization, and free trade have a positive influence on CO2 emissions, but independent research and development and technology imports contribute to reducing CO2 emissions.
One source of technology progress is independent innovation; another is FDI and trade. FDI and trade are a latecomer advantage for countries that develop later. Elliott and Sun (2013) found that FDI has a negative influence on energy intensity. A further study investigates the roles of export diversification and composite country risks in carbon emissions abatement. The researchers found that lowering country risks, undergoing a renewable energy transition, and enhancing environment-related technological innovations help reduce CO2 emissions in the long run.
The third group of studies has investigated the relation between CO2 emissions and urbanization. A large body of literature on urbanization and its impact on CO2 emissions is available for reference. Many studies have directly investigated the positive impact of urbanization on CO2 emissions (Behera and Dash, 2017; York et al., 2003; Zhang and Lin, 2012). Shahbaz et al. (2017) provided evidence that urbanization leads to higher demand for food, housing, transportation, land, and energy and causes serious environmental degradation. For instance, traffic congestion, waste management, and poor sanitation can cause pollution and health problems in most urban areas.
A number of studies have tested the linear impact of urbanization on global CO2 emissions. Some contributions support it, such as York et al. (2003), Cole and Neumayer (2004), Liddle and Lung (2010), Wang et al. (2012), and Behera and Dash (2017), while others refute it, such as Hossain (2011) and Liu and Bae (2018). Specifically, York et al. (2003) used panel data from 143 countries to document the positive impact of urbanization on CO2 emissions. Cole and Neumayer (2004) and Liddle and Lung (2010) reached similar findings using panel data and the Stochastic Impacts by Regression on Population, Affluence, and Technology (STIRPAT) model. Wang et al. (2012) applied PLS with the STIRPAT model in Beijing, China, and concluded that urbanization is the most influential factor with an adverse impact on environmental quality. Subsequently, Wang et al. (2013) found in a provincial study that urbanization, industrial growth, income levels, and population stimulate CO2 emissions. Behera and Dash (2017) used a panel cointegration test to study the positive impact of urbanization on carbon emissions in South and Southeast Asian countries. Conversely, several studies produced uncertain results and reported a negligible impact of urbanization on CO2 emissions (Hossain, 2011; Liu and Bae, 2018).
To a large extent, a country's level of economic development may moderate the nature of the relation between urbanization and pollution (Fan et al., 2006; Li and Lin, 2015; Poumanyvong and Kaneko, 2010). However, higher urbanization growth rates and development rates can improve the environment by promoting technological innovation in energy usage and efficiency, increasing awareness of environmental issues, and using green technologies (Bekhet and Othman, 2017). Urbanization has an inverted U-shaped relation with CO2 emissions in Asia (Fan et al., 2020). However, Zhu et al. (2012) found limited support for an inverted U-shaped relation between CO2 emissions and urbanization in 20 emerging economies. There is a long-run bidirectional positive relation between CO2 emissions, urbanization, and energy consumption in MENA countries, although the long-run relation depends on the countries' income and development (Al-mulalia et al., 2013). Urbanization has a positive influence on CO2 emissions: during the urbanizing stage, more energy is consumed, which increases CO2 emissions (Lin and Du, 2013). CO2 emissions are higher in big cities or urban agglomeration areas because of high residential consumption of electricity, gas, heating, and transportation energy (Bai et al., 2019). In east and central China, the centers and surroundings of urban agglomerations (UAs) featured high levels (high-high cluster) of total CO2 emissions and low levels (low-low cluster) of per-unit-GDP CO2 emissions. The Yangtze River Delta, the Beibu Gulf, and the Guangdong-Hong Kong-Macao UAs were more efficient at emission reduction as their cities grew in scale, while cities of the Beijing-Tianjin-Hebei UA and the Chengdu-Chongqing UA performed less efficiently (Cui et al., 2020).
Two main methodologies are used by the three groups of studies. The first is econometrics. Econometric methods include spatial autocorrelation analysis, semiparametric fixed effects (Zhu et al., 2012), panel threshold regression (Zi et al., 2016; Du and Xia, 2018), the autoregressive distributed lag model and vector error correction model (Bekhet and Othman, 2017), two-stage least squares (2SLS) and the augmented Stochastic Impacts by Regression on Population, Affluence, and Technology (STIRPAT) model (Bai et al., 2019), and autoregressive distributed lag (ARDL) (Ang, 2009). Econometric methods have been used to estimate the long-run relationship and the short-run dynamics between environmental pollution and its determinants. To address multicollinearity and overfitting, a recent study introduced the least absolute shrinkage and selection operator (LASSO) regression model, which can pinpoint the most important determinants, to investigate the driving factors of household carbon emissions (Shi et al., 2020). Another method, cross-sectionally augmented autoregressive distributed lags (CS-ARDL), can account for cross-sectional dependency, slope heterogeneity, and structural breaks in the data (Li et al., 2021; Ma et al., 2021). The second methodology is calculating the quantity of CO2 emissions. Many studies are based on the Kaya identity and the Logarithmic Mean Divisia Index (LMDI) (Ang and Zhang, 2000). Using these methods, researchers calculate industrial, regional, and national CO2 emissions (Yang and Li, 2017). Index decomposition analysis (IDA), developed on the basis of LMDI, has become one of the most popular methods. However, IDA calculates the technology efficiency of the economic system, not the efficiency of energy usage (Lin and Du, 2013).
Frontiers in Environmental Science | www.frontiersin.org August 2021 | Volume 9 | Article 721517
Wang (2011) developed a method based on the production-theory decomposition approach (PDA), which uses an output-oriented distance function to decompose energy production into technology efficiency, technology program, and input alternative. Lin and Du (2014) proposed a combined framework (the L-D framework) of index decomposition and production theory. Yang et al. (2019) then used the L-D framework to calculate the CO2 emissions of major industries.
There are two gaps in the above literature. First, the choice of factors is constrained by econometric methods, which cannot support all factors at once (Shi et al., 2020), so studies tend to select only one or two important factors. In fact, the factors form a hierarchical structure and inevitably influence each other. Second, although the methodologies reviewed above are very useful and have been adopted with many successes, they are subject to restrictions such as collinearity and causality issues among variables. By contrast, it is not necessary to consider these issues in machine learning. Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention. Machine learning aims to develop algorithms that can learn and create statistical models for data analysis and prediction. The algorithms should be able to learn by themselves from the data provided and make accurate predictions without having been specifically programmed for a given task.

DATA AND THE VARIABLES

CO2 Emissions
The Intergovernmental Panel on Climate Change (IPCC) introduced three methods of calculating CO2 emissions (Y) from fossil fuel combustion in both stationary and mobile sources. "Method 1" is based on the amount of fuel burned and the emission factor, and it is practicable (Wang et al., 2010). Thus, this method is adopted in this paper. The method is specified as follows:

$$\mathrm{CO_2} = \sum_{i} E_i \times NCV_i \times CEF_i \qquad (1)$$

In Eq. 1, CO2 represents the amount of carbon dioxide emissions to be estimated; $i$ represents the various energy fuels, including coal, coke, coke oven gas, blast furnace gas, converter gas, other gas, crude oil, gasoline, kerosene, diesel, fuel oil, liquefied petroleum gas, natural gas, and liquefied natural gas; $E_i$ represents the combustion consumption of each energy source; $NCV_i$ is the average low calorific value of each energy source, used to convert energy consumption into energy units (TJ); $CEF_i$ represents the carbon dioxide emission factor of the energy consumed, which is calculated by Eq. 2:

$$CEF_i = CC_i \times COF_i \times \frac{44}{12} \qquad (2)$$

In Eq. 2, $CC_i$ is the carbon content of the energy source, $COF_i$ is its carbon oxidation factor, and 44/12 is the ratio of the molecular weight of CO2 to the atomic weight of carbon. The value of $COF_i$ is usually 1, which means that the energy is completely oxidized. In this paper, coal and coke are set to 0.99 and the rest is 1 (Chen, 2011
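
The fuel-by-fuel calculation described above (consumption times calorific value times emission factor, summed over fuels) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the two-fuel inventory, and every numeric value are hypothetical placeholders rather than the paper's data, and the 44/12 carbon-to-CO2 conversion is assumed as the usual molar-mass ratio.

```python
# Sketch of a "Method 1"-style estimate: sum over fuels of
# consumption x net calorific value x CO2 emission factor.
# All numeric inputs are hypothetical placeholders, not the paper's data.

CO2_PER_C = 44.0 / 12.0  # assumed molar-mass ratio of CO2 to carbon


def emission_factor(carbon_content, oxidation_factor):
    """CO2 emission factor from carbon content and carbon oxidation factor."""
    return carbon_content * oxidation_factor * CO2_PER_C


def total_co2(fuels):
    """Sum E_i * NCV_i * CEF_i over all fuels i."""
    total = 0.0
    for fuel in fuels:
        energy = fuel["consumption"] * fuel["ncv"]  # physical units -> energy units
        cef = emission_factor(fuel["carbon_content"], fuel["oxidation_factor"])
        total += energy * cef
    return total


# Hypothetical two-fuel inventory; units are illustrative only.
example_fuels = [
    {"name": "coal", "consumption": 100.0, "ncv": 0.02,
     "carbon_content": 26.0, "oxidation_factor": 0.99},
    {"name": "natural gas", "consumption": 50.0, "ncv": 0.04,
     "carbon_content": 15.0, "oxidation_factor": 1.0},
]

example_total = total_co2(example_fuels)
```

In practice the calorific values and carbon contents would come from an official inventory source; only the oxidation factors of 0.99 for coal and coke and 1 for other fuels are taken from the text.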

Industrial Structure Rationalization Index
Industrial structure rationalization (X1) reflects the coordination among different industries; moreover, it reflects the efficiency of energy usage (Gan et al., 2011). It is measured by the Theil index (Gan et al., 2011), which is defined as:

$$TL = \sum_{i=1}^{n} \left(\frac{Y_i}{Y}\right) \ln\left(\frac{Y_i / L_i}{Y / L}\right)$$

where TL is the Theil index, Y is GDP, L is employment, i indexes industries, and n is the number of industry sectors. When the economy is in equilibrium, TL = 0 and the industrial structure is rational. The data for the industrial structure rationalization index are derived from the Chinese Statistical Yearbook (2001-2019).
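
As a rough illustration of how such an index behaves, the sketch below assumes the commonly used form TL = sum over sectors of (Y_i/Y) ln[(Y_i/L_i)/(Y/L)], with output Y and employment L per sector. The function name and the three-sector figures are hypothetical and are not drawn from the yearbook data.

```python
import math


def theil_index(output, employment):
    """Theil-type index: sum_i (Y_i / Y) * ln((Y_i / L_i) / (Y / L))."""
    total_output = sum(output)
    total_employment = sum(employment)
    aggregate_productivity = total_output / total_employment
    tl = 0.0
    for y_i, l_i in zip(output, employment):
        share = y_i / total_output
        tl += share * math.log((y_i / l_i) / aggregate_productivity)
    return tl


# Hypothetical three-sector economy (primary, secondary, tertiary).
# Equal productivity in every sector gives an index of zero.
balanced = theil_index([10.0, 40.0, 50.0], [10.0, 40.0, 50.0])
# Employment concentrated in a low-output sector gives a positive index.
unbalanced = theil_index([10.0, 40.0, 50.0], [40.0, 30.0, 30.0])
```

The index is zero when every sector has the same output per worker and grows as sectoral productivity diverges from the economy-wide average.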

Other Variables and Data
This paper also includes other important variables. They are GDP, urbanization, research and development (R&D) investment, actual use of foreign capital, and growth rate of energy consumption. Data for GDP (X2) and actual use of foreign capital (X5)

METHODOLOGY
This study uses a number of machine learning algorithms, each of which learns a function f that maps the input variables (X1, X2, ..., X6) to the output variable (Y) so that Y = f(X1, X2, ..., X6). Several types of algorithms have been adopted in this study, and they are briefly described here.

Linear Models: Linear Regression, Lasso, and ElasticNet
When we make assumptions about the learning process, we can simplify it considerably, but the assumptions also limit what can be learned. Algorithms that simplify the function to a known form are called linear models. Examples of this class include linear regression and logistic regression. In this study, we tested three linear models: linear regression (LR), least absolute shrinkage and selection operator (LASSO), and Elastic Net (EN); the latter two add regularization penalties to the loss function during training. Linear models provide the benchmark against which other machine learning algorithms are measured. However, linear models are not expected to provide good predictions because CO2 emissions are complicated and depend on many factors. Furthermore, those factors are not expected to relate to CO2 emissions linearly.
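
To make the role of the regularization penalty concrete, the sketch below evaluates a penalized squared-error loss for a linear model. It assumes one common parameterisation (mean squared error plus alpha times a mix of L1 and L2 terms controlled by `l1_ratio`); the function name, parameter names, and data are hypothetical, and library implementations may scale the terms differently.

```python
def penalized_loss(weights, xs, ys, alpha=0.0, l1_ratio=1.0):
    """Mean squared error of a linear model plus an elastic-net style penalty.

    alpha = 0 gives plain linear regression, l1_ratio = 1 a Lasso-type
    penalty, and 0 < l1_ratio < 1 a mix of L1 and L2 penalties.
    """
    n = len(ys)
    squared_errors = 0.0
    for row, y in zip(xs, ys):
        prediction = sum(w * x for w, x in zip(weights, row))
        squared_errors += (y - prediction) ** 2
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    penalty = alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)
    return squared_errors / n + penalty


# Hypothetical one-feature data that the weight 3.0 fits exactly.
toy_xs = [[1.0], [2.0]]
toy_ys = [3.0, 6.0]
plain = penalized_loss([3.0], toy_xs, toy_ys)
lasso_like = penalized_loss([3.0], toy_xs, toy_ys, alpha=0.1, l1_ratio=1.0)
mixed = penalized_loss([3.0], toy_xs, toy_ys, alpha=0.1, l1_ratio=0.5)
```

Even when the fit is perfect, the penalized variants report a positive loss, which is what pushes the fitted coefficients toward zero during training.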

Nonlinear Models: Classification and Regression Tree, Support Vector Regression, and k-Nearest Neighbors Regression
When we do not make strong assumptions about the form of the mapping function, the algorithms are called nonlinear models. Examples of this class include Classification and Regression Tree (CART), Support Vector Regression (SVR), and k-Nearest Neighbors (KNN). These models are useful for problems involving datasets with a large number of features, many of which may be correlated. As the name implies, CART works for both classification and regression problems. SVR, as the name suggests, is a regression algorithm, and it should not be confused with Support Vector Machine (SVM), which is for classification. The major difference between the two is that there is only one slack variable in SVM, whereas there are two slack variables in SVR during the optimization that locates the hyperplane. The KNN algorithm can be applied to both classification and regression problems. In classification, the algorithm predicts the class to which the output variable belongs by computing a local probability, while in regression it predicts the value of the output variable by using a local average. One of the strengths of machine learning is that it can work with nonlinear data. If a system is nonlinear (i.e., a system that contains CO2 emissions and its six input variables), nonlinear models would be more appropriate.
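
The "local average" used by KNN regression can be shown with a minimal sketch. It assumes Euclidean distance and unweighted averaging, which are common defaults but are not stated in the text; the function name and the toy data are hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    distances = []
    for features, target in zip(train_x, train_y):
        distances.append((math.dist(features, query), target))
    distances.sort(key=lambda pair: pair[0])  # closest points first
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical two-feature training set.
toy_train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
toy_train_y = [1.0, 2.0, 3.0, 10.0]
local_average = knn_predict(toy_train_x, toy_train_y, [0.1, 0.1], k=3)
nearest_only = knn_predict(toy_train_x, toy_train_y, [0.1, 0.1], k=1)
```

Because the prediction depends on distances between feature vectors, features measured on very different scales need to be put on a comparable scale first.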

Ensemble Methods
Traditionally, machine learning applications consisted of a single learner (say, a Decision Tree). Ensemble methods were then introduced, which combine many learners to improve on the performance of any single one of them.

Bagging Methods: Random Forest and Extra Trees
Bagging is a method of merging the same type of predictions. The idea of bagging is then simple: we want to fit several independent models and "average" their predictions in order to obtain a model with a lower variance. In bagging, weak learners are trained in parallel using randomness, and each model receives an equal weight. Bagging decreases variance, not bias, and solves overfitting issues in a model.
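
A minimal sketch of the bagging idea follows: several base learners are fit on bootstrap resamples of the data and their predictions are averaged with equal weight. The base learner here is a one-nearest-neighbor rule on a single feature, chosen only because it is short and high-variance; it stands in for the trees used by Random Forest and Extra Trees, and all names and data are hypothetical.

```python
import random


def one_nn_predict(sample, query):
    """High-variance base learner: return the target of the nearest point."""
    nearest = min(sample, key=lambda row: abs(row[0] - query))
    return nearest[1]


def bagged_predict(data, query, n_models=25, seed=0):
    """Average the predictions of base learners fit on bootstrap resamples."""
    rng = random.Random(seed)
    predictions = []
    # The models are independent of each other, so this loop could run in parallel.
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        predictions.append(one_nn_predict(bootstrap, query))
    return sum(predictions) / len(predictions)  # equal weight for every model


# Hypothetical one-feature data: (x, y) pairs.
toy_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 5.0), (4.0, 4.0)]
bagged_value = bagged_predict(toy_data, query=1.4)
```

Averaging over resamples smooths out the jumps of any single base learner, which is the variance reduction the paragraph above refers to.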

Boosting Methods: XGBoost, AdaBoost, and Gradient Boost
Boosting models fall inside this family of ensemble methods. Boosting is a method of merging different types of predictions. Boosting decreases bias, not variance. In boosting, models are weighted based on their performance. Boosting should not be confused with bagging. In boosting, the weak learners are trained sequentially.
AdaBoost is a specific boosting algorithm developed for classification problems. The weakness is identified by the weak estimator's error rate.
Gradient boosting approaches the problem a bit differently. Instead of adjusting weights of data points, Gradient boosting focuses on the difference between the prediction and the ground truth. XGBoost builds the model by calculating similarity scores between the observations that end up in a node. Also, XGBoost allows for regularization, reducing the possible overfitting of individual trees and therefore of the ensemble model.
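
The sequential, residual-fitting character of gradient boosting can be sketched as follows. This is a simplified illustration for squared-error loss with one-split regression stumps on a single feature; it is not XGBoost or AdaBoost, and the function names, learning rate, and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Difference between the ground truth and the current prediction.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, predictions


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


# Hypothetical one-feature data.
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [1.0, 1.2, 3.1, 2.9, 5.2, 5.0]
_, _, fit_1 = boost(toy_x, toy_y, n_rounds=1)
_, _, fit_20 = boost(toy_x, toy_y, n_rounds=20)
```

Each round corrects what the previous rounds left unexplained, so the training error shrinks as rounds are added; this is also why boosting needs regularization to avoid overfitting.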

Artificial Neural Networks
Neural networks consist of nodes connected by links. They have three types of layers: an input layer with a node for each input, hidden layers where learning occurs in training and inputs are processed on trained nets, and an output layer with a node for each target variable, which passes information outside the network. Learning takes place in the hidden layer nodes, each of which consists of a summation operator and an activation function. Note that for neural networks, the inputs should be scaled (i.e., standardized) to account for differences in the units of the data. This is important as scaling could improve the performance by a considerable margin (Chaudhari, 2019).
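
The structure of a hidden node, a summation operator followed by an activation function, can be illustrated with a small forward pass. The tanh activation, the single hidden layer, the linear output node, and all weights below are assumptions made for the example and are not taken from the study's network.

```python
import math


def hidden_node(inputs, weights, bias):
    """Summation operator followed by a tanh activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(weighted_sum)


def forward(inputs, hidden_layer, output_weights, output_bias):
    """One hidden layer feeding a single linear output node."""
    hidden = [hidden_node(inputs, w, b) for w, b in hidden_layer]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network: two inputs, two hidden nodes, one output.
toy_hidden_layer = [([1.0, 1.0], 0.0), ([1.0, -1.0], 0.0)]
toy_output = forward([0.0, 0.0], toy_hidden_layer, [0.5, 0.5], 0.1)
```

Training would adjust the weights and biases to reduce prediction error; only the forward computation is shown here.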
In recent times, ANNs have become a popular and helpful model for classification, clustering, and pattern recognition in many disciplines (Abiodun et al., 2018). Given their versatility, one would expect them to work well. However, neural networks usually require much more data than traditional machine learning algorithms. In fact, the amount of data required depends both on the complexity of the problem and on the complexity of the chosen algorithm. Given that the present study has only 400+ rows of panel data, whether this would impose any limitation on the accuracy of this method remains to be seen.

RESULTS
To get the best results, it is necessary to understand the data first by inspecting their descriptive statistics and plotting their histograms. The descriptive statistics and histograms of the original data between 2000 and 2015 are shown in Table 1 and Figure 2A, respectively. Inspection of the data suggests that better results could be obtained by taking the natural logarithm of X2, X4, X5, and Y. The descriptive statistics and histograms of the logarithms of X2, X4, X5, and Y are shown in Table 2 and Figure 2B, respectively.
Data scaling is important for some machine learning algorithms, e.g., KNN and ANN, and less critical for others such as linear regression. For consistency and easy comparison, the second step of data preparation is standardization of the data with its mean and standard deviation rather than normalization of the data with its maximum and minimum values. This is because the data are Gaussian-like rather than bounded by a maximum and minimum, as shown in the histograms.
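The two preparation steps, a natural-log transform followed by standardization, can be sketched as follows. The numbers are made up, and fitting the mean and standard deviation on the training years only is an assumption made here for illustration; the text does not state how the scaler was fitted.

```python
import math
from statistics import mean, pstdev


def standardize(train_column, test_column):
    """Scale both columns with the mean and standard deviation of the training column."""
    mu = mean(train_column)
    sigma = pstdev(train_column)

    def scale(column):
        return [(value - mu) / sigma for value in column]

    return scale(train_column), scale(test_column)


# Made-up positive values standing in for a skewed variable such as GDP.
raw_train = [120.0, 340.0, 560.0, 980.0]
raw_test = [450.0, 1500.0]

# Step 1: natural logarithm (requires strictly positive values).
log_train = [math.log(value) for value in raw_train]
log_test = [math.log(value) for value in raw_test]

# Step 2: standardization to zero mean and unit standard deviation.
z_train, z_test = standardize(log_train, log_test)
```

After this step the training column has mean 0 and standard deviation 1, so distance-based methods such as KNN are not dominated by whichever variable has the largest units.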

Linear Models: Linear Regression, Lasso, and ElasticNet
Three linear models have been applied to the scaled dataset using k-fold cross validation. There is no formal rule for the choice of k. In the present study, we set k = 5 so that the length of the validation data matches that of the testing set (i.e., 2016–2018). The box-and-whisker plots of the mean and standard deviation of each validation for the three models are shown in Figure 3. It can be seen that the spread for the linear regression model is tighter than that of the other two linear models. However, after the Lasso and ElasticNet models are tuned for their hyperparameters and used to fit the whole set of training data (i.e., without k-fold cross validation), the rmse values between the prediction and the actual data of the training set (2000–2015) are all the same at 0.5482. Furthermore, when the models are applied to the testing set (2016–2018), the rmse values of the three models are practically the same at 0.6732. To show how good the models are, we plot the actual against the prediction in Figure 4. For a good fit, the points should be close to the dotted line. As can be seen, the linear models can hardly be said to predict CO2 emissions. This prompts us to apply nonlinear models.
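For readers unfamiliar with the procedure, the sketch below shows k-fold cross validation scored by rmse. It is a simplified illustration rather than the pipeline used here: it assumes a single predictor, an ordinary least-squares line, noiseless toy data, and hypothetical helper names.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the row indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return slope, y_bar - slope * x_bar


def cross_validate(x, y, k=5):
    """Fit on k-1 folds, score on the held-out fold, and repeat for every fold."""
    scores = []
    for fold in k_fold_indices(len(x), k):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        predictions = [slope * x[i] + intercept for i in fold]
        scores.append(rmse([y[i] for i in fold], predictions))
    return scores


# Noiseless toy data, so every fold should score close to zero.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
scores = cross_validate(x, y, k=5)
```

The spread of `scores` across folds is what a box-and-whisker plot of cross-validation results summarizes.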

Nonlinear Models: Classification and Regression Tree, Support Vector Regression, and k-Nearest Neighbors Regression
Similar to the linear models, we applied k-fold cross validation to the three nonlinear models. The box-and-whisker plots of the three models are shown in Figure 5. It can be seen that the performance of SVR and KNN is better than that of CART. The graphs of actual against prediction for SVR and KNN are shown in Figures 6 and 7, respectively. Figures 6 and 7 show that the nonlinear models, especially KNN, have done much better in predicting CO2 emissions. In particular, in the actual-against-prediction graphs, the points are much tighter and closer to the dotted lines. For KNN, the rmse values are 0.1750 and 0.3641 for the training set and testing set, respectively, when the number of neighbors is set to 2. This is a remarkable improvement over the linear models.
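A k-nearest neighbors regressor with two neighbors can be sketched as below. The sketch assumes Euclidean distance on already standardized inputs and equal weighting of the neighbors; these settings and the toy data are assumptions for illustration, since the text specifies only the number of neighbors.

```python
import math


def knn_predict(train_X, train_y, query, k=2):
    """Average the targets of the k training rows closest to the query point."""
    ranked = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / k


# Made-up standardized inputs and targets.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
train_y = [1.0, 2.0, 4.0, 10.0]

# The two closest rows to the query are [1.0, 0.0] and [0.0, 0.0].
prediction = knn_predict(train_X, train_y, [0.9, 0.1], k=2)
```

With equal weighting and two neighbors, a prediction at a training point averages that point with its closest neighbor, so the training rmse is generally not zero.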

Ensemble Methods
In this study, five ensemble methods, two bagging and three boosting algorithms, are applied. As k-fold cross validation randomly divides the dataset, the box-and-whisker plots change every time we run the models. Figure 8 shows the typical results for four runs. It can be seen that Extra Trees (ET) consistently outperformed the other four models in the present study. If we apply the Extra Trees algorithm to fit the combined training and validation dataset, the prediction fits the actual data almost perfectly, as shown in Figure 9A. However, when it is applied to the testing data in Figure 9B, it gives an rmse of 0.4128 when the number of trees (or estimators) is 20. One thing to note for the ET model is that the rmse is relatively stable with respect to the number of trees: the values of rmse are 0.4370, 0.4128, and 0.4155 when the number of trees is 10, 20, and 50, respectively. Though the ET model is the best among the five ensemble models under study, its performance is not as good as that of the KNN model discussed above.

Artificial Neural Network
As mentioned in Artificial Neural Networks, there are three types of layers in ANN. To apply ANN, one needs to determine the number of layers and the number of neurons used in each layer. On top of the k-fold cross validation that introduces randomness, the stochastic nature of the model results in different output every time we run the model. Therefore, it is necessary to experiment with combinations of these parameters to get the best results. As the number of instances in our dataset is only slightly over 400, experimentation showed that one hidden layer is sufficient. After random search, it is found that the number of neurons should be between 6 and 15 in both the input layer and the hidden layer. Then, we run the model at least 10 times for each combination of neurons in the input layer and hidden layer. It is found that the best combination is six neurons in the input layer and 10 neurons in the hidden layer. With this configuration of the network, we run the model 30 times. Then, we take the average of the results, which is shown in Figure 10. The values of the rmse of the training and testing data are 0.2430 and 0.4849, respectively. It can be seen that though the ANN model performs better than the linear models, it is not as good as the nonlinear models. Comparatively, its accuracy is only a distant second to the KNN model.
Based on rmse for model selection, the results presented above show that the KNN model performed the best, the ANN model achieved a distant second, and ET came third in predicting CO2 emissions with the dataset described in Data and the Variables. In the next section, we shall make use of the KNN model and perform a sensitivity analysis that would assist policy makers in setting policies to reduce CO2 emissions.

DISCUSSIONS
Having established that the KNN model performs the best on the dataset, we use it to perform a sensitivity analysis of the independent variables on CO2 emissions. We would like to determine how the target variable, CO2 emissions, is affected by changes in the input variables. As there are six input variables, we need to select a base case before we conduct the sensitivity analysis. The base case consists of the input variables at their most common values. The procedure is described below.
From the histograms, we have divided each variable into 10 bins of equal width that cover the minimum and maximum. For each variable, say X1-Industrial Structure Rationalization, we pick the midpoint value, X1M, of the bin that contains the highest number of data points. With six variables, we have X1M, X2M, ..., and X6M accordingly. Let us call this the "centroid" of the input variables. Now, we can vary the value of one variable, say X1-Industrial Structure Rationalization, from minimum to maximum while keeping the values of the other five variables constant at the midpoint values of their highest bins. In this way, we can inspect the sensitivity of the variable X1-Industrial Structure Rationalization around the centroid. We can repeat this analysis for the other input variables and form a more complete picture of how the six variables affect CO2 emissions around the centroid. The result is shown in Figure 11.
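The centroid construction described above can be sketched as follows. The column values are made up, the function name `modal_bin_midpoint` is hypothetical, and breaking ties in favor of the lowest bin is an assumption, since the text does not say how ties are handled.

```python
def modal_bin_midpoint(values, n_bins=10):
    """Midpoint of the most populated of n_bins equal-width bins spanning min to max."""
    low, high = min(values), max(values)
    if high == low:          # degenerate column: every value is identical
        return low
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in values:
        # The maximum would fall just past the last bin, so clamp it into that bin.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    fullest = counts.index(max(counts))   # first bin wins if several are tied
    return low + (fullest + 0.5) * width


# Made-up columns standing in for two of the six input variables.
columns = {
    "X1": [0.0, 1.5, 3.2, 3.5, 3.8, 6.4, 10.0],
    "X3": [0.0, 0.5, 4.1, 4.6, 4.9, 7.7, 20.0],
}
centroid = {name: modal_bin_midpoint(column) for name, column in columns.items()}
```

Because each coordinate is chosen independently, the centroid is a representative point in input space and need not coincide with any single observed province-year.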
First, when all six variables are at the centroid, the predicted Ln(CO2 emissions), Y, is 5.4960 (equivalent to approximately 243.7 million tonnes of CO2 emissions, since exp(5.4960) ≈ 243.7), shown with the symbol ○ in Figure 11. Then, when we adjust one of the variables, the change in Ln(CO2 emissions), Y, is summarized in the following.
Industrial Structure Rationalization, X1: The effect of industrial structure rationalization on CO2 emissions is shown in Figure 11A. It can be seen that their relationship is nonlinear and nonmonotonic. This demonstrates a strength of machine learning: it is able to pick up the nonlinearity of the variables and make better predictions. When the industrial structure rationalization increases from 0.7486 to 1.6507, CO2 emissions increase. Beyond that range, the effect is the opposite. This can be interpreted to mean that industrial structure is not the only target for policy makers. The industrial rationalization index reflects the equilibrium in the economy; it includes output value, employment across sectors of industry, and industrial rationalization. If the industrial structure is rationalized, the industry, especially the output of the secondary and tertiary industries, should be in equilibrium, and regional disparity should decrease continuously. But the negative influence of industrialization will lead to an increase in CO2 emissions. This means that decreasing CO2 emissions requires not only economic equilibrium but also a balance between industrialization and harmful gas emissions. Therefore, policy makers should develop the economy of each province as rapidly as possible. This is because CO2 emissions will eventually decrease after the saturation point at the postindustrialization stage, as explained by Bernardini and Galli (1993). On the other hand, the industrial structure rationalization should stay at 0.7486 for some provinces, as their CO2 emissions would then be at a minimum.
GDP on a natural log scale, X2: The effect of GDP on CO2 emissions is shown in Figure 11B. It can be seen that CO2 emissions are the most sensitive when GDP ranges from exp(5.0442) to exp(5.6349) (equivalent to 155.12 billion dollars to 280.03 billion dollars). As mentioned at the beginning of Methodology, X2 is taken logarithmically. Each interval of the x-axis represents about 1.8 times the previous level. Every country would like to develop its economy. Therefore, it is unlikely that a country would sacrifice economic growth to curb CO2 emissions. Figure 11B shows that CO2 emissions will increase when the economy grows, which will harm the environment. Furthermore, China cannot simply grow its economy without considering CO2 emissions, because one of the pledges China has committed to in the Paris Accord is to peak CO2 emissions by 2030. However, with the advancement of technology, it is possible to reduce emissions without economic sacrifice. One thing that must be noted is that in Figure 11B, there is no inverted U-shaped relation between CO2 emissions and economic growth as found by Galeotti et al. (2006). However, we can see that when GDP grows beyond exp(8.5883) (equivalent to 5,368 billion dollars), CO2 emissions would level off and could even come down. This suggests that China can fulfill its Paris Accord pledges.
Urbanization, X3: The impact of urbanization on CO2 emissions is very mixed and complicated, as shown in the literature reviewed in Literature Review. While most of the previous studies indicate a positive relationship between urbanization and CO2 emissions, in this study a flanged U-shape is observed, as shown in Figure 11C. Given that urbanization is now 0.4975 at the centroid, policy makers of China can aim to reduce CO2 emissions by increasing urbanization to the trough region between 0.571 and 0.6445. This decrease is likely a result of technological innovation in energy usage and efficiency, increasing awareness of environmental issues, and the use of green technologies (Bekhet and Othman, 2017).
R&D Reinvestment Intensity on a Natural Log Scale, X4: R&D reinvestment intensity stimulates technological advancement, and it also affects economic growth. It can be seen from Figure 11D that CO2 emissions increase mildly when reinvestment intensity increases from −6.5673 to its centroid position of −4.4802. Afterwards, they decrease mildly. Looking at the figure, the reinvestment intensity is now at a critical point at its centroid position. If it decreases from its current position, CO2 emissions would decrease too, but this is likely to be accompanied by a decrease in economic growth. The implication is that China should increase its reinvestment intensity so that it could advance technology more rapidly, improve energy usage and efficiency, and contribute to reducing CO2 emissions accordingly.
Actual Use of Foreign Capital on a Natural Log Scale, X5: The impact of actual use of foreign capital on CO2 emissions is shown in Figure 11E. It is observed that CO2 emissions increase rapidly when actual use of foreign capital increases from 5.1206 to 5.9213. Then, they level off and even decrease gradually afterwards. According to the pollution haven hypothesis, foreign firms in dirty sectors are more likely to relocate polluting activities from developed countries to poorly regulated developing countries to avoid domestic environmental control costs, which directly undermines the environmental interests of recipient countries like China. This implies that the higher the actual use of foreign capital (FDI), the higher the CO2 emissions, termed the "direct" mechanism. However, there is also an "indirect" mechanism that affects CO2 emissions. Foreign capital could act as a channel for environmentally friendly technologies, and more stringent environmental regulations can be designed and implemented in low-emissions provinces to attract clean foreign capital. So the two mechanisms have opposite effects on CO2 emissions. In China, actual use of foreign capital in most of the provinces has well passed 5.9212 and already reached its centroid position of 10.7257. The implication is that the impact of actual use of foreign capital is not significant, as CO2 emissions are quite stable around that position.
FIGURE 10 | Actual vs. prediction of ANN (first layer: six inputs, six neurons; second layer: 10 neurons).
FIGURE 11 | Sensitivity analysis of variables around the centroid of KNN (2). ○ represents the centroid of the data (the most populated bin of the data). Only one variable varies while the other variables are kept unchanged at their centroid values.
Frontiers in Environmental Science | www.frontiersin.org August 2021 | Volume 9 | Article 721517
Growth Rate of Energy Consumption, X6: When the economy is robust and growing, more energy is consumed, which results in higher CO2 emissions. Energy consumption is in an interesting situation now, because CO2 emissions are at a local maximum when the growth rate of energy consumption is at 0.0132, as shown in Figure 11F. Interestingly, when the growth rate was higher than 0.0132 during the period under study, CO2 emissions decreased unless the growth rate was too rapid, beyond the 0.145 level. A possible explanation is that China has made good use of foreign capital and R&D investment, so that cleaner and greener energy such as hydropower and nuclear power has been used when the growth rate increases from 0.0132 to 0.145. Last, but not least, policy makers should refrain from consuming energy beyond a growth rate of 0.145, because CO2 emissions increase sharply beyond that level. Also, the results between 0.5403 and 0.9356 can be ignored, as there are only one or two (or even zero) data points in that range.
Note that the above analysis applies to the centroid position and provides an overall picture for China as a whole. This approach can also be applied to other positions that might be more relevant to individual provinces.
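Several of the variables above are reported on a natural log scale, so the quoted levels follow from exponentiation. The short check below reproduces the conversions quoted for GDP (X2); the variable names are hypothetical, and the comments state what the text reports.

```python
import math

# Log-scale GDP values quoted in the X2 discussion.
log_gdp_low, log_gdp_high, log_gdp_plateau = 5.0442, 5.6349, 8.5883

gdp_low = math.exp(log_gdp_low)          # text reports 155.12 billion dollars
gdp_high = math.exp(log_gdp_high)        # text reports 280.03 billion dollars
gdp_plateau = math.exp(log_gdp_plateau)  # text reports 5,368 billion dollars

# A step of (5.6349 - 5.0442) on the log axis multiplies the level by exp(step).
step_ratio = math.exp(log_gdp_high - log_gdp_low)  # text reports about 1.8
```

The same conversion applies to the other log-scaled variables, X4 and X5, and to the predicted target Y.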
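The one-variable-at-a-time sweep used in this section can be sketched as below, and the same function works for any base point, not only the centroid. The toy data, the equally weighted two-neighbor KNN with Euclidean distance, and the names `knn_predict` and `sweep_feature` are illustrative assumptions.

```python
import math


def knn_predict(train_X, train_y, query, k=2):
    """Average the targets of the k training rows closest to the query point."""
    ranked = sorted(zip((math.dist(row, query) for row in train_X), train_y))
    return sum(target for _, target in ranked[:k]) / k


def sweep_feature(train_X, train_y, base_point, feature_index, n_points=5, k=2):
    """Vary one input from its minimum to its maximum, holding the others at base_point."""
    column = [row[feature_index] for row in train_X]
    low, high = min(column), max(column)
    curve = []
    for step in range(n_points):
        value = low + (high - low) * step / (n_points - 1)
        query = list(base_point)        # copy so the base point is left unchanged
        query[feature_index] = value    # only this variable moves
        curve.append((value, knn_predict(train_X, train_y, query, k)))
    return curve


# Made-up standardized inputs (two variables) and log-scale targets.
train_X = [[0.0, 1.0], [1.0, 1.2], [2.0, 0.9], [3.0, 1.1], [4.0, 1.0]]
train_y = [5.0, 5.4, 5.9, 5.6, 5.2]
centroid = [2.0, 1.0]
curve = sweep_feature(train_X, train_y, centroid, feature_index=0)
```

Replacing `centroid` with the values of a particular province gives a sensitivity curve tailored to that province. Because KNN predictions are averages of observed targets, the curve is piecewise constant and always stays within the observed range of the target.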

CONCLUSION AND POLICY IMPLICATIONS
China has pledged under the Paris Accord to peak CO2 emissions by 2030, and the Chinese government has declared a 2060 carbon-neutrality goal. Meeting these commitments requires proactive measures to reduce carbon emissions while maintaining continuous economic growth and improving living standards. Against this background, this paper analyzed the effects of the industrial structure rationalization index, GDP, urbanization, R&D reinvestment, actual use of foreign capital, and growth rate of energy consumption in forecasting CO2 emissions.
Data across 30 provincial administrative regions of China from 2000 to 2018 are used for the study. Data from 2000 to 2015 are used as the training set, and data from 2016 to 2018 are used as the testing set. We apply a suite of machine learning algorithms on the training set and predict the levels of CO2 emissions for the testing set. The machine learning algorithms include linear and nonlinear models, ensemble methods with boosting and bagging, and artificial neural networks. Employing rmse for model selection, results show that the k-nearest neighbors (KNN) model performs the best for the present dataset when the number of neighbors is set to two.
Using the KNN model, we conducted a sensitivity analysis of CO2 emissions around the centroid position with respect to the independent variables. The overall findings revealed that economic growth measured by GDP, X2, contributes to higher CO2 emissions. As China needs to maintain its economic growth to continuously improve living standards, this brings several implications for policy makers when setting policies concerning the other variables. First, in terms of industrial structure rationalization, X1, not all provinces should develop their industrialization. Some provinces should stay at a relatively mild industrialization stage so that their CO2 emissions would be at a minimum. Other provinces should develop their economy as rapidly as possible, because CO2 emissions will eventually decrease after the saturation point. In this way, the duration of the high CO2 emissions that come with industrialization would be as short as possible. Second, in terms of urbanization, X3, there is an optimal range for a province. To minimize CO2 emissions, provinces should try to achieve urbanization between 0.571 and 0.6445. Within this range, CO2 emissions would be at a minimum, and the decrease is likely a result of technological innovation in energy usage and efficiency. It also suggests that a province should not be too densely populated. Third, the result for R&D reinvestment intensity, X4, suggests that China should increase its reinvestment intensity further. At present, there is a positive relationship between CO2 emissions and reinvestment intensity. Therefore, it seems that monies for R&D reinvestment have not yet been put, or not put enough, into green technology. Only when there is a further increase of R&D reinvestment intensity into green technology will there be a decrease in CO2 emissions. Fourth, it is found that the impact of the actual use of foreign capital, X5, on CO2 emissions is relatively insignificant when compared with the other variables.
If we assume that R&D reinvestment is associated with actual use of foreign capital, policy makers should prioritize the use of foreign capital for R&D investment in green technology. That would reduce CO2 emissions while maintaining economic growth. Last, it is possible to increase the growth rate of energy consumption, X6, gradually if R&D reinvestment and use of foreign capital are directed towards cleaner and greener energy sources such as hydropower and nuclear power. Policy makers must refrain from consuming energy beyond a growth rate of 0.1450 for economic growth. Otherwise, CO2 emissions would increase rapidly and may jeopardize the pledges China committed to in the Paris Accord and the 2060 carbon-neutrality declaration. In summary, the above policy implications provide a blueprint for policy makers for ensuring environmentally sustainable economic development in China.
It is worth noting that the approach applied in this study can easily be replicated for other countries to make better forecasts of CO2 emissions for the future. The major constraint of this approach is the data limitation. For successful application of machine learning, the amount of data required is usually greater than for traditional econometric models. With more data, more advanced machine learning algorithms can be applied to further check the robustness of the findings.

DATA AVAILABILITY STATEMENT
Publicly available datasets were used and analyzed in this study. The sources of data are listed in detail in the Data and the Variables section of this paper.