The Major Driving Factors of Carbon Emissions in China and Their Relative Importance: An Application of the LASSO Model

China is one of the biggest energy consumers and carbon emitters in the world. Understanding the factors affecting carbon emissions is critical for policymakers to control the rising trend of carbon emissions. This paper investigates the relative importance of carbon emissions drivers in China. A literature review was carried out to determine a set of predominant independent variables; the LASSO model is then introduced to rank the relative importance of these independent variables. The results show that 1) carbon emissions were mainly driven by economic growth and energy consumption, followed by population size and industrialization; and 2) income growth slowed down carbon emissions during the studied period, but it is the least significant of the factors considered. The ranking allows policymakers to focus on the most critical contributors to carbon emissions and gives them more flexibility in determining policy interventions.


INTRODUCTION
Climate change is now a global challenge. The increase of carbon emissions, which are now at the highest level in history, is the main cause of global warming and climate change. Given that environmental deterioration has become a serious issue, studies on the driving factors of carbon emissions have become highly significant at both the international and domestic level. Over the past few decades since 1978, China has experienced remarkable economic growth, which has been associated with deteriorating environmental conditions in the country (Ma et al., 2018). Taking measures to control the rising trend of carbon emissions is increasingly recognized as critical in the Chinese government's efforts to mitigate climate change (Liu et al., 2011). As China is the world's largest carbon emitter, its effort to achieve its reduction target is significant for global climate change control. Understanding the driving factors of carbon emissions in Chinese cities is critical for policymakers to control the rising trend of carbon emissions.
Countries have signed agreements of economic cooperation, which have increased globalization throughout the world (Zaidi et al., 2019). Globalization has promoted worldwide growth in trade, energy consumption and intensity, industrial expansion and people's income. Whether developing countries can increase growth rates with the help of globalization without damaging the environment is a critical question.
Energy intensity and carbon emissions have a long-term linkage (Shahbaz et al., 2016). When globalization brings innovative technology and increases a country's energy intensity, it also increases the level of carbon emissions.
The relationship between carbon emissions and energy consumption and economic growth has gained little attention in the literature. From a methodological perspective, most previous studies have used traditional panel regression models, whereas we introduce the Least Absolute Shrinkage and Selection Operator (LASSO) regression model to investigate the relative importance of the factors influencing carbon emissions in China. Compared with traditional regression methods, LASSO obtains a refined regression model which provides the smallest possible forecast error with a minimum number of regressors, and it offers important information on the relative importance of the variables. The benefits of using the LASSO regression model can be summarised as follows: 1) LASSO adds a first-order penalty on the regression coefficients, which allows it to select the relevant predictors of the dependent variable (Hastie et al., 2019); 2) the independent variables can be ranked by importance according to how the parameters of the LASSO model change, which gives policymakers more flexibility in determining policy interventions (Shi et al., 2020).
The remainder of this paper is structured as follows: Literature Review presents the literature linked to energy consumption, carbon emissions, income, population size, industrialization, and economic growth. Methodology introduces the LASSO regression model. LASSO Regression Results presents the empirical results and discusses the key drivers of carbon emissions. Conclusion and Policy Implications comprises the conclusion of the study and its policy implications.

LITERATURE REVIEW
Numerous studies have shown that economic growth and energy consumption are important driving factors for carbon emissions. Shahbaz et al. (2013) examined the relationship between economic growth, energy consumption, financial development, trade openness, and carbon emissions from 1975 to 2011 in Indonesia. Their study applied Vector Error Correction Model (VECM) causality analysis, finding that economic growth and energy consumption increase carbon emissions. Munir et al. (2020) examined the relationship between carbon emissions, energy consumption, and economic growth for the five main Association of Southeast Asian Nations (ASEAN) countries. They applied the Granger non-causality test and found that there was causality between GDP and energy consumption for all countries. Meanwhile, Shahbaz et al. (2016) examined the direction of causality among carbon emissions, energy consumption, and economic growth in 11 countries for the period 1972-2013. The authors used a novel approach and found that economic growth caused carbon emissions in Bangladesh and Egypt. Hu et al. (2020) used the Tapio decoupling model and the Kaya-LMDI (Logarithmic Mean Divisia Index) model to investigate the spatiotemporal evolution of decoupling and the driving factors of carbon emissions of 57 Belt and Road Initiative countries from 1991 to 2016. According to the results, most countries' carbon emissions significantly increased due to economic growth. Liu and Hao (2018) examined the relationship between energy consumption and economic growth of 69 countries along the Belt and Road between 1970 and 2013. Their paper applied the VECM model, fully modified OLS, and dynamic OLS approaches. According to the findings, there was a bidirectional causality between carbon emissions, energy use, and GDP per capita for the energy-exporting countries.
Zaman and Moemen (2017) examined the interrelationship between energy consumption, economic growth, and carbon emissions under six alternative and plausible hypotheses in the context of low and middle-income countries, high-income countries, and the aggregated panel, from 1975 to 2015. The results supported energy-induced emissions in different regions of the world. Kahouli (2018) examined the linkages between electricity consumption, carbon emissions, R&D stocks, and economic growth for Mediterranean countries over 1990-2016. The findings revealed that there were strong feedback effects between electricity consumption, carbon emissions, R&D stocks, and economic growth. Awodumi and Adewuyi (2020) adopted a non-linear autoregressive distributed lag (ARDL) technique to examine the role of non-renewable energy in economic growth and carbon emissions of oil-producing countries in Africa during 1980-2015. The results confirmed that there is an asymmetric effect of per capita consumption of petroleum and natural gas on economic growth and carbon emissions per capita in all the selected countries except Algeria. Correspondingly, Mohmand et al. (2020) investigated the causal relationship between transport infrastructure, economic growth, and transport emissions in Pakistan from 1971 to 2017. The results showed a short-term causality running from transport infrastructure, economic growth, and fuel consumption to carbon emissions. In the long run, a bidirectional relationship exists between economic growth and infrastructure.
In addition to economic growth and energy consumption, other factors like industrialization, population size, and income level are also important driving forces for carbon emissions. Nasir et al. (2021) examined the relationship between carbon emissions, economic growth, energy consumption, industrialization, and other factors in Australia from 1980 to 2014. The study found that all variables have affected carbon emissions. Li et al. (2021) discussed the effect of economic growth, economic structure, and other factors on per capita carbon emissions in 147 countries from 1990 to 2015. The results showed that at the global level, economic growth and economic structure were respectively the most significant positive and negative factors affecting carbon emissions. Luo et al. (2021) investigated the factors influencing carbon emissions in Shanghai during 1995-2017. They applied the LMDI method and Granger causality test, and they found that motor vehicle amount, disposable personal income, and carbon intensity are the top three driving factors of carbon emissions. Zhang et al. (2020) analysed the decoupling elasticity between carbon emissions, GDP, and energy consumption in China and ASEAN countries throughout 1990-2014. Based on the LMDI method, the authors found that carbon density, energy intensity, GDP, and population have a positive relationship with carbon emissions.
Based on the above literature, it can be concluded that economic growth, energy consumption, population size, industrialization and income can be classified as the predominant primary driving factors for carbon emissions. Other factors, such as foreign direct investment (Essandoh et al., 2020; Le et al., 2020; Khan and Rana, 2021), financial development (Bhattacharya et al., 2017; Zaidi et al., 2019; Wang et al., 2020), oil price (Brini et al., 2017; Shahbaz et al., 2021), renewable energy consumption (Vo et al., 2020; Assi et al., 2021; Magazzino et al., 2021) and innovation can be largely classified as secondary driving factors which affect the primary factors. For example, it has already been shown that foreign direct investment, financial development and innovation are all positively related to growth (Jones, 1995; Muhammad and Khan, 2019), while oil price and renewable energy consumption will affect energy consumption as a whole. Instead of looking at the detailed components of each driving factor, this study limits its focus to ranking the relative importance of the primary driving factors.

METHODOLOGY
The Least Absolute Shrinkage and Selection Operator (LASSO) is a shrinkage and selection method for the linear regression model which enhances its prediction accuracy and limits the regression coefficients within a certain range at the same time. It was originally proposed by Robert Tibshirani (Tibshirani, 1996) based on Leo Breiman's non-negative garrote (Breiman, 1995). The main objective of LASSO is to obtain a refined regression model which provides the smallest possible forecast error with a minimum number of regressors. Given a set of regressors $x_1, x_2, \ldots, x_p$ and the regressand $y$, LASSO fits the linear model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$. The selection criterion is to minimize the objective function
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|,$$
where $\hat{y}_i$ denotes the fitted value of the linear model, $n$ is the number of observations, $p$ signifies the number of regressors and $\lambda$ represents a non-negative regularization parameter. The first sum of the objective function is the usual sum of squared errors of the multiple linear least squares regression, while the second sum is the LASSO regularization term. The regularization term has no effect when $\lambda$ is small enough, in which case the LASSO regression method is the same as the least squares method. However, when $\lambda$ is large enough, all the regression coefficients are forced to be zero; for this reason, the LASSO solutions are shrunken versions of the least squares estimates. It follows that the coefficients will change from zero to nonzero one after another as the value of $\lambda$ is continuously adjusted in descending order. The quality of the LASSO estimator can be measured by the mean squared error (MSE), defined as $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. MSE is a reliable quality measurement for LASSO regression model selection: the smaller the MSE, the higher the quality, and vice versa. Hence, the LASSO estimation method can be used to derive a model which provides the smallest possible forecast error with a minimum number of regressors.
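The behaviour described above at the two extremes of λ can be illustrated with a minimal sketch. It assumes the objective is the unscaled sum of squared errors plus λ times the sum of absolute slope coefficients, with an unpenalized intercept; some software scales the squared-error term (for example by 1/(2n)), which rescales λ but not the logic. The coordinate-descent solver, the function names and the synthetic data are all illustrative assumptions, not the procedure or data used in this study.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_objective(beta0, beta, X, y, lam):
    """Sum of squared errors plus lam times the L1 norm of the slopes."""
    resid = y - (beta0 + X @ beta)
    return np.sum(resid ** 2) + lam * np.sum(np.abs(beta))

def mse(beta0, beta, X, y):
    """Mean squared error of the fitted linear model."""
    resid = y - (beta0 + X @ beta)
    return np.mean(resid ** 2)

def fit_lasso(X, y, lam, n_sweeps=1000, tol=1e-10):
    """Cyclic coordinate descent for the objective above (assumed form)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring removes the intercept
    col_sq = np.sum(Xc ** 2, axis=0)
    beta = np.zeros(X.shape[1])
    resid = yc.copy()                        # residual when all slopes are zero
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(X.shape[1]):
            # Inner product of column j with the partial residual
            rho = Xc[:, j] @ resid + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid -= Xc[:, j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    beta0 = y_mean - x_mean @ beta
    return beta0, beta

# Synthetic illustration only (hypothetical data, not the city panel).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Smallest lambda at which every slope is zero under this objective.
lam_max = 2.0 * np.max(np.abs((X - X.mean(axis=0)).T @ (y - y.mean())))

b0_big, b_big = fit_lasso(X, y, 1.01 * lam_max)  # penalty dominates
b0_ols, b_ols = fit_lasso(X, y, 0.0)             # penalty switched off
mse_big, mse_ols = mse(b0_big, b_big, X, y), mse(b0_ols, b_ols, X, y)
```

With the penalty above `lam_max` every slope stays at zero and the fit reduces to the sample mean; with `lam = 0.0` the solver converges to the least squares coefficients and the MSE is at its lowest.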
The LASSO regression method has been applied by Shi et al. (2020) to study carbon emissions in China at household level. The present study follows the approach given in their paper to find the optimal model at city level. The full details of the approach can be found in their paper; a summary is given below. In Shi et al. (2020), the value of λ is adjusted as a descending geometric sequence. The maximum value of λ is set in such a manner that all coefficients except the intercept are forced to be zero; the value of λ is then adjusted downward geometrically such that the minimum value of λ is 1.00E-04 times the maximum value. One hundred specifications are constructed and estimated between the maximum and minimum values of λ. Specification (1) uses the minimum value of λ, and specification (100) uses the maximum value of λ.
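The λ grid described above can be sketched as follows. The function name `lambda_grid` and the value passed as `lam_max` are illustrative assumptions; the maximum λ actually used is determined by the data and is not reproduced here. Only the number of specifications (100) and the ratio of minimum to maximum (1.00E-04) are taken from the text.

```python
import numpy as np

def lambda_grid(lam_max, n_specs=100, ratio=1e-4):
    """Geometric lambda grid indexed so that grid[k - 1] belongs to
    specification (k): specification (1) has the smallest lambda and
    specification (n_specs) the largest."""
    return np.geomspace(ratio * lam_max, lam_max, n_specs)

# Constant ratio between the lambdas of two neighbouring specifications.
step = 1e-4 ** (1.0 / 99)

grid = lambda_grid(lam_max=1.0)  # lam_max = 1.0 is a placeholder value

# Cross-check against lambda values quoted in the results section:
# specification (83) -> (82) and specification (64) -> (63).
check_82 = 0.077513 * step
check_63 = 0.013234 * step
```

Each specification's λ is about 0.911 times that of the next higher-numbered specification, and the λ values quoted for neighbouring specifications in the results (0.077513 and 0.070627; 0.013234 and 0.012059) agree with this ratio up to rounding.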
According to the order in which the coefficients appear, it is possible to identify the regressor which is the most important for model prediction. The more important factors appear earlier, whereas the less important factors appear later. Thereby, unlike the traditional regression method which signifies a set of significant factors, LASSO offers important information on the relative importance of the variables. In essence, LASSO provides a ranking for the significance of the variables. The ranking has important policy implications for policy makers when resources are scarce; it allows them to focus on the most critical areas and allocate resources effectively on those areas.
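The ranking rule can be made concrete with a short sketch that reads the entry order off a coefficient path. It assumes the path is stored as an array whose row k-1 holds the coefficients of specification (k), so that λ increases with the row index. The entry specifications used in the example (82, 66 and 63, with Log GDP_PC never entering) are those reported in the results section; the coefficient magnitudes are placeholders, and `entry_ranking` is a hypothetical helper name.

```python
import numpy as np

def entry_ranking(coef_path, names):
    """Rank variables by the highest-numbered specification (largest lambda)
    at which their coefficient is nonzero. Variables that never enter are
    placed last with an entry of None."""
    nonzero = coef_path != 0
    ranking = []
    for j, name in enumerate(names):
        specs = np.flatnonzero(nonzero[:, j])
        entry = int(specs.max()) + 1 if specs.size else None
        ranking.append((name, entry))
    # Entering at a larger specification number means entering earlier.
    ranking.sort(key=lambda item: -1 if item[1] is None else item[1],
                 reverse=True)
    return ranking

names = ["Log POPULATION", "IND", "Log INCOME", "Log GDP_PC"]
entry_spec = {"Log POPULATION": 82, "IND": 66, "Log INCOME": 63}

# Placeholder path: a coefficient of 1.0 marks "nonzero", nothing more.
path = np.zeros((100, len(names)))
for j, name in enumerate(names):
    if name in entry_spec:
        path[: entry_spec[name], j] = 1.0   # nonzero in specifications 1..entry

ranking = entry_ranking(path, names)
```

The resulting list orders Log POPULATION ahead of IND and Log INCOME, with Log GDP_PC last because its coefficient never becomes nonzero.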
Apart from the relative importance of the variables, LASSO also provides important information on the robustness of the coefficients. Unlike the traditional regression method which estimates the model only once, LASSO estimates the model repeatedly for different values of λ. Thus, the values and the significance of the coefficients can be observed when the value of λ is adjusted downward. Therefore, the model's robustness can also be studied by observing the significance of the variables for different specifications.

LASSO REGRESSION RESULTS
Six main factors of carbon emissions have been considered in this study. The details of the variables of the linear model are given in Table 1. There were 286 prefecture and above-prefecture level (PAA) cities in China by the end of 2014. However, most of the data are not available for two cities in Tibet (Lasa and Rikaze). Therefore, we use a panel that excludes these two cities (see Table 2 below); this is a strongly balanced panel dataset, which gives more robust results.
Table 3 presents the LASSO regression results, including specification (99). The results can be summarized as follows: CO2 is expected to be positively related to Log GDP; without major policy interventions, improving energy efficiency alone is unlikely to offset the negative environmental impacts attributed to economic growth. Meanwhile, with an increase in energy consumption per head, the carbon emission level is expected to increase unless the increase in energy consumption per head is driven mainly by a decrease in population.
Table 4 unveils the third factor affecting carbon emissions in China. From Table 4, it is apparent that when the value of λ is reduced from 0.077513 in specification (83) to 0.070627 in specification (82), Log POPULATION becomes nonzero, and the MSE decreases by 82% from 0.193211 in specification (99) to 0.035029 in specification (82). This indicates that Log POPULATION is an important variable, since its inclusion as an additional variable in the regression model significantly reduces the model's MSE. Shahbaz et al. (2016) and Hu et al. (2020) found similar results. This finding is alarming because, given a fixed carbon emission target, the larger the population, the lower the allowable per capita emission level. Put differently, as the population in China grows, everyone's pollution rights decline. Note that GDP per capita (GDP_PC) is calculated by dividing the GDP of a country by its population.
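Because GDP_PC is defined as a ratio, taking logarithms turns that definition into an exact linear relation among the three logged regressors. Written out in the variable names used here (the typesetting is illustrative):

```latex
\log(\mathrm{GDP\_PC})
  = \log\!\left(\frac{\mathrm{GDP}}{\mathrm{POPULATION}}\right)
  = \log(\mathrm{GDP}) - \log(\mathrm{POPULATION})
```

Any one of the three logged variables is therefore an exact linear combination of the other two.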
Having all three variables (Log GDP, Log POPULATION, and Log GDP_PC) in the model will introduce perfect multicollinearity, which will prevent the least squares method from solving the system of equations. Evidently, it is feasible to drop one variable to avoid perfect multicollinearity. The challenge is that a conventional regression model cannot automatically identify the set of linearly dependent variables and correctly pick the most significant variables out of the set. With LASSO, a solution can be found that omits the least significant variable and includes the most important ones. Based on the results presented in Table 3, it can be inferred that, out of the three linearly related variables, Log GDP is the most significant factor affecting carbon emissions. Together with the evidence in Table 4, it can be concluded that Log GDP and Log POPULATION are more important factors affecting carbon emissions in China than Log GDP_PC, that is, Log(GDP/POPULATION). In other words, GDP per capita, which shows the manner in which the economy grows with the population, is not as important as GDP and population by themselves.
Table 5 presents the LASSO regression results for specifications (66) and (67). It can be observed that when the value of λ is reduced from 0.017495 in specification (67) to about 0.01594 (one geometric step lower) in specification (66), the variable IND becomes nonzero. Moreover, the MSE decreases from 0.035029 in specification (82) to 0.013801 in specification (66). Therefore, it can be concluded that IND is the fourth influencing factor for carbon emissions in China. The results are in line with expectations, since Bhattacharya et al. (2017), Zaman and Moemen (2017) and Liu and Hao (2018) found similar results. Over the past few decades, China has focused on using its cheap labour force and coal to expand its manufacturing sector. Thus, its energy-intensive manufacturing sector is expected to be one of the main drivers of carbon emissions in China.
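The perfect multicollinearity among the three logged variables can be checked numerically. The sketch below uses synthetic values (hypothetical numbers, not the city panel) and shows that a design matrix containing Log GDP, Log POPULATION and Log GDP_PC together loses one rank, while dropping any one of the three restores full column rank.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50

# Hypothetical synthetic values, for illustration only.
log_gdp = rng.normal(loc=16.0, scale=1.0, size=n)
log_pop = rng.normal(loc=6.0, scale=0.5, size=n)
log_gdp_pc = log_gdp - log_pop     # log(GDP / POPULATION)

# Intercept plus all three logged variables: four columns in total.
X_all = np.column_stack([np.ones(n), log_gdp, log_pop, log_gdp_pc])
rank_all = np.linalg.matrix_rank(X_all)

# Dropping Log GDP_PC leaves three linearly independent columns.
X_dropped = X_all[:, :3]
rank_dropped = np.linalg.matrix_rank(X_dropped)
```

Here `rank_all` is smaller than the number of columns of `X_all`, so the least squares normal equations have no unique solution, whereas `X_dropped` has full column rank.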
Table 6 shows the LASSO regression results for specifications (63) and (64). It reveals the fifth significant factor affecting carbon emissions in China. According to Table 6, when the value of λ is reduced from 0.013234 in specification (64) to 0.012059 in specification (63), the variable Log INCOME becomes nonzero. Moreover, the MSE decreases from 0.013801 in specification (66) to 0.013131 in specification (63). Thus, it can be concluded that Log INCOME is the fifth influencing factor for carbon emissions in China. Among previous studies, Le et al. (2020), Awodumi and Adewuyi (2020), Apergis et al. (2018) and Nasir et al. (2021) concluded that there was a negative relationship between income and carbon emissions. The variable INCOME is a measure of the average annual salary of Chinese citizens. As the income level of Chinese citizens increases, carbon emissions fall, although the effect is comparatively small. Notably, the variable Log GDP_PC remains zero, and it stays at zero for the remaining 62 specifications (i.e., specifications 1-62). This finding is consistent with what has been presented in Table 3 and Table 4. In essence, the LASSO regression method identified the two most important variables out of the three perfectly linearly related variables and ignored the least important one.
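The MSE figures quoted in the text can be put side by side to compare how much the error falls between the specifications at which successive variables enter. The sketch below uses only the values reported above (reading the specification (99) figure as 0.193211); each drop is cumulative over all intermediate specifications, so it reflects both the newly entered variable and the looser penalty on variables already in the model.

```python
# (specification, MSE) pairs quoted in the text.
reported_mse = [
    (99, 0.193211),
    (82, 0.035029),  # Log POPULATION has entered
    (66, 0.013801),  # IND has entered
    (63, 0.013131),  # Log INCOME has entered
]

def relative_drops(points):
    """Relative MSE reduction between consecutive reported specifications."""
    drops = []
    for (s0, m0), (s1, m1) in zip(points, points[1:]):
        drops.append((s0, s1, (m0 - m1) / m0))
    return drops

drops = relative_drops(reported_mse)
for s0, s1, d in drops:
    print(f"({s0}) -> ({s1}): MSE falls by {d:.1%}")
```

The three reductions are roughly 82%, 61% and 5%, which is consistent with the reading that Log INCOME adds comparatively little explanatory power once the other variables are in the model.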
Lastly, it can be observed that the significance of the variables remains stable throughout the regression process. Put succinctly, once a variable turns nonzero, it does not turn to zero again at smaller values of λ. This implies that the LASSO estimation results presented in this study are robust.
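This stability property can be stated as a simple check on a coefficient path. The sketch below assumes the same storage convention as the specifications (row k-1 holds specification (k), with λ increasing in the row index); the function name and the two toy paths are hypothetical.

```python
import numpy as np

def support_is_stable(coef_path):
    """Return True if no coefficient that is nonzero at some lambda
    becomes zero again at any smaller lambda."""
    nonzero = coef_path != 0
    # Reverse the rows so that lambda decreases from one row to the next;
    # the set of nonzero coefficients must then only grow.
    descending = nonzero[::-1]
    return bool(np.all(descending[1:] >= descending[:-1]))

# Toy paths with three specifications each (rows: specification 1, 2, 3).
stable_path = np.array([[0.9, 0.4, 0.0],
                        [0.5, 0.0, 0.0],
                        [0.0, 0.0, 0.0]])
unstable_path = np.array([[0.9, 0.0],
                          [0.0, 0.0],    # first variable drops out again
                          [0.3, 0.0]])

stable_ok = support_is_stable(stable_path)
unstable_ok = support_is_stable(unstable_path)
```

A path that passes this check has the property described above; it is a check on the selected set of variables only and says nothing about coefficient magnitudes.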

CONCLUSION AND POLICY IMPLICATIONS
In this paper, the main driving factors of China's carbon emissions have been analysed. The evidence shows that, firstly, carbon emissions were mainly driven by economic growth (GDP) and per capita energy consumption (ENERGY), followed by population size (POPULATION) and the percentage of secondary industry (IND). Secondly, annual average salary (INCOME) slowed down the growth of carbon emissions during the studied period, but it is the least significant factor among the five factors. Thirdly, out of the three perfectly linearly related variables (Log GDP, Log POPULATION, and Log GDP_PC), Log GDP is the most significant factor affecting carbon emissions, followed by population size (Log POPULATION), whereas Log GDP_PC is the least important factor and is dropped automatically. In other words, GDP per capita, which shows the manner in which the economy grows with the population, is not as important as economic growth and population by themselves. Lastly, since the significance of the variables remains stable throughout the regression process, the above results are robust. This study contributes to the existing literature in the following ways. The LASSO regression method not only addressed the problem of perfect multicollinearity and identified a set of significant factors influencing carbon emissions in China but also provided important information on the ranking of the variables' significance. It allows policymakers to identify the main areas of focus, allocate resources to those prioritized areas, and reduce carbon emissions more effectively through more flexible policy interventions.
Based on the relative importance of the driving factors, this research makes the following policy recommendations. The first is to promote green manufacturing to achieve sustainable economic development. As the largest developing country in the world, China is facing an extremely difficult task in maintaining sustainable growth. China's energy structure is still dominated by coal, and coal occupies an important share in energy conversion. The large amount of coal consumption, especially the direct combustion of coal in the terminal production process, puts tremendous pressure on carbon emission reduction. Green manufacturing can be enhanced in two directions. On the one hand, advanced and applicable energy-saving, low-carbon, and environmentally friendly technologies can be used to transform traditional industries into low-carbon factories to reduce process-related emissions. On the other hand, efficient and effective policies can be implemented to promote the use of renewable energy, such as solar and wind energy, instead of fossil fuels to reduce energy-related emissions.
Secondly, population is also an important factor affecting carbon emissions in China. The larger the population, the higher the total emissions and the lower the allowable per capita emission level given a fixed carbon emission target. Thus, considering the relative importance of the variable and the huge population size in China, lowering energy consumption per head is the second most important task in reducing carbon emissions in China. It is essential to promote the green life movement extensively to encourage residents to adopt a green, low-carbon, civilized and healthy lifestyle in terms of clothing, food, housing, transport, and travel.
In this study, the focus is limited to the relative importance of the primary driving factors (i.e., economic growth, energy consumption, population size, percentage of secondary industry and average annual salary) on carbon emissions in China. These effects are worth investigating in future research from a global perspective. In addition, future research could incorporate other non-linear analytical techniques into the analysis to complement existing studies.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.