A Theoretical Model of Sequential Combinatorial Games of Subsidies and Penalties: From Waste to Renewable Energy

Subsidies and penalties are two main regulation methods adopted by authorities to promote the development of renewable energy. Due to the possibility of subsidy fraud, it is necessary to explore effective ways to combine these two policies. In this article, subsidy and penalty policies are incorporated into a sequential game theory model to explore the impact of different regulatory mechanisms on the promotion of renewable energy from recycled resources. We take biodiesel production from used cooking oil (UCO) as an example. UCO can be converted into environmentally friendly biodiesel or mixed with fresh cooking oil, which yields highly profitable but inferior cooking oil containing harmful carcinogens. There are two mechanisms in the sequential combination model: spot checks after the subsidy and the subsidy after spot checks. In both cases, fines are imposed if fraud is found during spot checks. The amounts of subsidies and fines also need to be determined. We show that the effects of subsidies depend on the timing of their implementation. Ex-ante subsidies have no effect. When spot checks are performed first, larger subsidies increase the probability of producing inferior cooking oil because they lower the probability of spot checks. When combined with penalties, however, ex-post subsidies have a positive effect on biodiesel production; that is, penalties and subsidies have a synergy effect on renewable energy production. In an infinitely repeated game, the shutdown threat of a grim trigger strategy (GTS) induces biodiesel production much more easily than the penalty threat of a tit-for-tat strategy (TFT). When penalties are large enough, TFT can achieve the goal of legal production as effectively as GTS. The sooner illegal production is observed, the lower the penalties required to induce the processor to produce legally. Compared to subsidies, penalties are more effective in encouraging processors to produce renewable energy rather than illegal products.
Moreover, our simulation results suggest that higher fines or profits from legal production are more likely to stimulate renewable energy production than subsidies. Our findings enrich our knowledge of the link between government regulations and the promotion of renewable energy.


INTRODUCTION
Faced with the challenges of energy shortages and rising greenhouse gas emissions, countries around the world have gradually adopted various policies to stimulate the use of environmentally friendly renewable energy, among which subsidies and penalties are the main means of intervention (al Irsyad et al., 2017; Saghir et al., 2019; Chen et al., 2021). For example, the Japanese government provides investment subsidies for solar energy projects to encourage clean energy supply (Kimura and Suzuki, 2006) and allocates direct subsidies for biofuel companies. In Europe, Germany and Spain provide subsidies of around €60/MWh and €300/MWh, respectively, to wind and solar renewable energy producers (Abrell et al., 2019). At the same time, subsidy fraud has become a serious problem. Kumar (2019) stated that 70% of the subsidy was provided to ineligible solar rooftop projects in India. In 2017, the world's leading electric car maker, Tesla, lost the electric vehicle subsidies from the German government after it was accused of gaming the subsidy system (Lambert, 2017). Penalties are introduced to reduce misbehavior by renewable energy producers. The Chinese government imposes threefold fines for solar photovoltaic subsidy fraud (Yuan et al., 2015; Liu et al., 2021).
Although both subsidies and penalties are widely used, implementing them effectively remains a challenge. Chang et al. (2011) pointed out that direct subsidy is the main driver of solar water heater market expansion in Taiwan, but a high-level subsidy might have a negative impact on users or on the sustainability of the industry. Zhang et al. (2014) studied the subsidy effect in the biofuel processing industry at different stages, from investment and raw material input to the final product, and found that the investment subsidy is less effective than the other two. A benefit-cost analysis of subsidizing residential solar panels in the United States was conducted by Tibebu et al. (2021), and the results showed that, in comparison with the federal tax credit, the optimal subsidy schedule can raise net benefits by 250%. In terms of penalties, Lu et al. (2018) developed a penalty-cost-based design mechanism, which can reduce the cost of net zero energy building (NZEB) owners by half. Although a few studies attempt to design and apply a reward-penalty mechanism to lessen over-generation from renewable energy systems (Md et al., 2017) and to reduce the risk in auctions of renewable energy support (Kreiss et al., 2017), the literature on the effective combination of subsidies and penalties is still limited.
From the perspective of sustainable energy use, biodiesel is increasingly being adopted as an important alternative to clean energy in the field of transportation, which is a key factor in global climate change (Fischer and Schrattenholzer, 2001). There are two main sources of biodiesel: oily plants, such as the nonedible Jatropha oil, and fatty acid-rich waste including UCO. On the basis of a life cycle assessment, biodiesel processed from UCO has nearly 74% reduced impact on the environment and 80% reduced global warming effect compared with nonedible Jatropha oil (Sajid et al., 2016). In addition, the usage of UCO as a feedstock in biodiesel production (Rincón et al., 2019) can lead to a cost reduction of 60-90% compared to other fatty acid-rich waste (Marchetti et al., 2008). According to Kharina et al. (2018), subsidizing UCO-to-biodiesel production can save the government 345 billion rupiahs (approximately US $24M) per year, in comparison with the subsidy of the same amount of palm biodiesel.
Due to the above reasons, many countries and regions have adopted incentive measures to promote the production of biodiesel from UCO and to reduce the possibility of illegal processing of UCO into other products. Mixing UCO with fresh edible oil can lead to higher profits for processors but serious health problems for consumers (Cai et al., 2015;Ortner et al., 2016;Liu, 2018;Tsai, 2019). For example, biodiesel produced from UCO can enjoy the double counting of carbon dioxide emission reduction in Europe. In Japan, the government subsidizes biodiesel companies to reduce incentives for illegal UCO processing. In China and Japan, restaurants can be fined or shut down if they buy processed UCO or sell it to illegal institutions. Zhang et al. (2017) stated that the penalty mechanism is one of the key determinants that affect the performance of biofuel companies.
The game modeling method is suitable for the theoretical study of situations in which stakeholders have different objectives. The dynamic game model developed by Zhang et al. (2014) compares the incentive effects of different subsidy modes on UCO supply for biofuel refining and on sales of UCO-refined products. With a non-cooperative game-theoretical model, Wang et al. (2017) assess how the consumer's response capability promotes the penetration of distributed photovoltaic systems. A three-stage Stackelberg game model is used to explore the optimal subsidy policy for green energy trading among user residents, service providers, and the grid (Wu et al., 2021). However, it remains unclear how governments should effectively combine subsidy and penalty measures and what the effect would be if subsidy and penalty measures were taken simultaneously. We integrate subsidy and penalty policies into one game-theoretical model to examine the influence of different regulatory mechanisms on the processing decisions of UCO processors.
The rest of this article is structured as follows. The Model Specification section discusses the specifications in the model. The Simulation Results section shows the simulation results. We offer concluding remarks and practical implications in the Conclusions and Practical Implications section.

MODEL SPECIFICATION
We assume that a processor can produce two kinds of products: a good product, such as renewable energy, with low profit under the government subsidy policy, and a bad product, such as unqualified products or carcinogens, with high profit. For example, China has had a financial support policy for the promotion of new energy cars since 2012, under which pure electric vehicles are subsidized according to cruising range. In order to get a subsidy of up to US$14k, some car enterprises choose to falsely report the cruising range and produce low-standard batteries instead of high-standard batteries. Taking UCO as an example, we consider a waste UCO processor that can produce either biodiesel legally or inferior cooking oil illegally. A local government can perform a spot check to see whether the processor produces biodiesel rather than inferior cooking oil. The spot check can be implemented before or after providing subsidies¹. To encourage the processor to convert the waste oil into biodiesel fuel through a chemical process, the government provides a subsidy to the processor if the processor produces biodiesel and otherwise punishes the processor with a fine.

One-Stage Game Model With Subsidy Given Before Spot Checks
Consider the static game with complete information. We assume that the cost of recycling waste oil is negligible and focus on the cost difference between the production of biodiesel and inferior cooking oil. The strategic-form representation of the game where the subsidy is given first and then a spot check is performed is in Table 1.
In this table, $E_1$ is the subsidy provided by the government if the government does not perform a spot check or confirms that the processor produces biodiesel during a spot check. $C_h$ and $C_l$ are the processor's production costs of biodiesel and inferior cooking oil, with $C_h > C_l$. $R_l$ and $R_h$ are the benefits for the processor of producing biodiesel and inferior cooking oil, with $R_l < R_h$. $M$ is the difference in the profits of these two products, $M = (R_h - C_l) - (R_l - C_h)$. $s$ and $(1 - s)$ are the probabilities of producing inferior cooking oil and biodiesel. $r$ is the probability of a random check. $E_2$ is the penalty that is charged by the government if the processor is found to produce inferior cooking oil during a spot check. $K$ is the recycling or disposal cost of inferior cooking oil. $V^+$ is the social benefit of preventing the reuse of waste oils by restaurants and households, which makes consumers better off. $V^-$ is the social cost of the reuse of waste oils by restaurants and households, which makes consumers worse off. $C_g$ is the cost of the random check and is an increasing function of the probability of a random check, $dC_g/dr > 0$.
Given these payoffs, we find the following pure and mixed Nash equilibria.
Scenario 1. If $C_g + K > E_2 + V^-$, the sum of the cost of the spot check and the recycling or disposal cost of inferior cooking oil is larger than the sum of the fine and the social cost of the reuse of waste oils by consumers. In this case, there exists a unique pure Nash equilibrium (production of inferior cooking oil, no spot checks). Because the technology for differentiating inferior cooking oil from normal cooking oil is limited, the cost of the spot check is so large that it is not efficient for the government to check. Fortunately, the cost of the spot check decreases with the continuous development of techniques for identifying inferior cooking oil.
Scenario 2. If $C_g + K < E_2 + V^-$ and $E_2 < M$, then the penalty is smaller than the difference in the profits of these two products. Because the penalty is not large enough and the profits from producing inferior cooking oil are far higher than those from producing biodiesel, the processor benefits from producing inferior cooking oil instead of biodiesel even after paying the penalty. In this case, there is a unique pure Nash equilibrium (production of inferior cooking oil, spot checks).
Scenario 3. If $C_g + K < E_2 + V^-$ and $E_2 > M$, there is a mixed Nash equilibrium as follows:
$$r^* = \frac{M}{E_2}, \qquad (1)$$
$$s^* = \frac{C_g}{E_2 - K + V^-}. \qquad (2)$$
The processor is indifferent between producing biodiesel and producing inferior cooking oil when
$$r E_2 = M. \qquad (3)$$
The government is indifferent between performing a spot check and not performing a spot check when
$$s\,(E_2 - K + V^-) = C_g. \qquad (4)$$
Equations 3, 4 pin down the equilibrium probabilities of performing a spot check and producing inferior cooking oil as given by Eqs 1, 2.
Lemma 1.
a) When $C_g + K > E_2 + V^-$, there is a pure Nash equilibrium with the processor producing inferior cooking oil and the government not checking.
b) When $C_g + K < E_2 + V^-$ and $E_2 < M$, there is a pure Nash equilibrium with the processor producing inferior cooking oil and the government always checking.
c) When $C_g + K < E_2 + V^-$ and $E_2 > M$, there is a mixed Nash equilibrium with $r^* = M/E_2$ and $s^* = C_g/(E_2 - K + V^-)$.
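The three cases above can be sketched as a small classifier. This is only an illustration: the function name, the returned labels, and the numerical values are hypothetical, and the boundary cases with equality (which the lemma does not cover) are assigned to the mixed branch by assumption. It also assumes $C_g > 0$ so that the denominator of $s^*$ is positive whenever case (a) does not apply.

```python
def equilibrium_subsidy_first(M, E2, Cg, K, V_minus):
    """Classify the one-stage equilibrium when the subsidy precedes the spot check.

    Returns (label, r, s): r is the probability of a spot check and
    s is the probability of producing inferior cooking oil.
    """
    if Cg + K > E2 + V_minus:
        # Case (a): checking is too costly, so the government never checks.
        return "pure: inferior oil, no check", 0.0, 1.0
    if E2 < M:
        # Case (b): the fine is too small to offset the profit difference.
        return "pure: inferior oil, check", 1.0, 1.0
    # Case (c): mixed equilibrium; equality cases land here by assumption.
    r_star = M / E2
    s_star = Cg / (E2 - K + V_minus)
    return "mixed", r_star, s_star


# Hypothetical parameter values, chosen only for illustration.
label, r, s = equilibrium_subsidy_first(M=4.0, E2=8.0, Cg=2.0, K=1.0, V_minus=3.0)
print(label, r, s)
```

Note that the subsidy $E_1$ is not an argument of the function, which mirrors the fact that it does not appear in $r^*$ or $s^*$.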
Lemma 1 says that there is a unique pure Nash equilibrium with the processor producing inferior cooking oil and the local government not checking the firm when the sum of the cost of the spot check and the recycling or disposal cost of inferior cooking oil is greater than the sum of the penalty the local government imposes on the processor and the social cost of the reuse of waste oils by consumers. This outcome makes sense because the cost of checking is so high for the government that no check is performed, so the processor's best response to the government's strategy is to produce inferior cooking oil because of the high profit. However, when the sum of the cost of the spot check and the recycling or disposal cost of inferior cooking oil is lower than the sum of the penalty and the social cost of the reuse of waste oils by consumers, and the penalty is lower than the difference in the profits of these two products, the government always performs the spot check, but the processor's best response is still to produce inferior cooking oil to earn the higher profit. When the sum of these costs is lower than the sum of the penalty and the social cost, and the penalty is greater than the difference in the profits of these two products, there is no pure Nash equilibrium. If the government performs the spot check, the processor produces biodiesel; if the government does not perform the spot check, the processor produces inferior cooking oil. If the processor produces biodiesel, the best response for the government is to not check; if the processor produces inferior cooking oil, the best response for the government is to perform the spot check and punish the processor. There is then a mixed Nash equilibrium, characterized by the probability of the processor producing inferior cooking oil and the probability of the government performing a spot check.
We take the partial derivatives of $r^*$ and $s^*$, respectively, with respect to $E_1$, $E_2$, $M$, $C_g$, $K$, and $V^-$, and find the following results.
(a) $\frac{\partial r^*}{\partial E_2} = -\frac{M}{E_2^2} < 0$, so $r^*$ is a decreasing function of $E_2$, meaning that a larger penalty reduces the probability that the government performs the check. If the government wants to decrease the frequency of spot checks, it must increase the penalty on the production of inferior cooking oil. The intuition for this result is that the direct policy for reducing the production of inferior cooking oil is to reduce the processor's incentive to produce it by increasing the penalty. In the extreme case where the penalty is so high that it exceeds the extra benefits from producing inferior cooking oil, the processor has no incentive to produce inferior cooking oil, and there is no need to make a spot check.
$\frac{\partial s^*}{\partial E_2} = -\frac{C_g}{(E_2 - K + V^-)^2} < 0$, so $s^*$ is a decreasing function of $E_2$, meaning that a larger penalty lowers the probability of the processor producing inferior cooking oil. If the penalty is sufficiently large, the processor has no incentive to take the risk of producing inferior cooking oil.
(b) $\frac{\partial r^*}{\partial M} = \frac{1}{E_2} > 0$, so $r^*$ is an increasing function of $M$, meaning that an increase in the difference between the profits of the two products increases the probability of the government performing a spot check. To reduce the frequency of spot checks, it is best to shrink the difference in the profits of the two products by promoting technological progress in the manufacturing process, increasing exports to raise the market price of biodiesel, or improving the identification technology for consumers to decrease the demand for inferior cooking oil.
$\frac{\partial s^*}{\partial C_g} = \frac{1}{E_2 - K + V^-} > 0$, so $s^*$ is an increasing function of $C_g$ if $E_2 > K$, meaning that the higher the cost of the spot check is, the higher the probability of the processor producing inferior cooking oil. Equivalently, the higher the cost of the spot check is, the lower the probability of the processor producing biodiesel.
$\frac{\partial s^*}{\partial K} = \frac{C_g}{(E_2 - K + V^-)^2} > 0$, so $s^*$ is an increasing function of $K$, meaning that increasing the cost to the government of disposing of inferior cooking oil increases the probability of the processor producing inferior cooking oil. As this disposal cost increases, the government's incentive to check the UCO process decreases and the processor has a higher probability of producing inferior cooking oil.
$\frac{\partial s^*}{\partial V^-} = -\frac{C_g}{(E_2 - K + V^-)^2} < 0$, so $s^*$ is a decreasing function of $V^-$, meaning that the higher the social cost of the reuse of waste oils by consumers is, the lower the probability of the processor producing inferior cooking oil. As the social cost of the reuse of waste oils by consumers increases, reflected in huge public healthcare costs and a variety of diseases such as cancer, the probability of the government performing the spot check rises under the pressure of public concern over waste oils; as a result, the processor is less likely to produce inferior cooking oil.
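The signs of these comparative statics can be checked numerically. The sketch below takes $r^* = M/E_2$ and $s^* = C_g/(E_2 - K + V^-)$ from Lemma 1 and approximates each partial derivative by a central finite difference; the helper `slope`, the dictionary keys, and the baseline values are hypothetical and serve only as an illustration.

```python
def r_star(M, E2):
    # Equilibrium probability of a spot check (subsidy given first).
    return M / E2


def s_star(Cg, E2, K, V_minus):
    # Equilibrium probability of producing inferior cooking oil.
    return Cg / (E2 - K + V_minus)


def slope(f, x, h=1e-6):
    """Central finite-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Hypothetical baseline satisfying C_g + K < E_2 + V^- and E_2 > M.
M, E2, Cg, K, V = 4.0, 8.0, 2.0, 1.0, 3.0

slopes = {
    "dr/dE2": slope(lambda x: r_star(M, x), E2),
    "dr/dM": slope(lambda x: r_star(x, E2), M),
    "ds/dE2": slope(lambda x: s_star(Cg, x, K, V), E2),
    "ds/dCg": slope(lambda x: s_star(x, E2, K, V), Cg),
    "ds/dK": slope(lambda x: s_star(Cg, E2, x, V), K),
    "ds/dV": slope(lambda x: s_star(Cg, E2, K, x), V),
}

for name, value in slopes.items():
    print(f"{name}: {'positive' if value > 0 else 'negative'}")
```

A finite-difference check of this kind only confirms the signs at one parameter point; it does not replace the analytical derivatives.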
Figure 1 shows that as the social cost, $V^-$, of the reuse of waste oils by consumers increases, the government increases the frequency of spot checks in the short run. In the long run, the government reduces the frequency of spot checks as the probability of the processor producing inferior cooking oil decreases. This demonstrates how a higher social cost decreases the probability of the processor producing inferior cooking oil: it increases the incentive for the local government to perform spot checks. A larger penalty, $E_2$, results in a lower probability of the government performing a check. To decrease the social cost of the reuse of waste oils by consumers, $V^-$, the government can increase the penalty for producing inferior cooking oil, $E_2$, proportionately² to reduce the probability of the processor producing inferior cooking oil, $s^*$, and thereby further shrink the production of inferior cooking oil and the corresponding social cost related to public healthcare and a variety of diseases. When the cost of the random spot check is moderate³ and the penalty for producing inferior cooking oil is small, $E_2 < M$, the government always checks. When the penalty is large enough, $E_2 > M$, the government checks with a positive probability regardless of the cost of the random spot check.
The two blue lines in Figure 2 show that as the difference in the profits of the two products increases, so does the probability that the government performs a check. The two brown lines show that as the penalty for producing inferior cooking oil increases, the frequency of spot checks decreases. In the short run, the processor is less likely to produce inferior cooking oil when the penalty is higher; in the long run, however, a higher penalty increases rather than reduces the probability of the processor producing inferior cooking oil, because the higher penalty decreases $r^*$, the probability of the government performing a spot check.

² That is, there is a one-to-one relationship between the social cost of the reuse of waste oils by consumers and the penalty for producing inferior cooking oil.
³ When $C_g + K < E_2 + V^-$, the cost of the random spot check is moderate; when $C_g + K > E_2 + V^-$, the cost of the random spot check is so large that the government always chooses not to check.

October 2021 | Volume 9 | Article 719214
Proposition 2 (incentive paradox). An ex-ante subsidy has no effect on the probabilities of the processor producing inferior cooking oil or the government performing a spot check. Increasing the penalty for the processor does not deter it from producing inferior cooking oil in the one-stage game but does reduce the frequency of spot checks by the local government. Increasing the social cost of the reuse of waste oils by consumers decreases the probability of the processor producing inferior cooking oil.

One-Stage Game Model With Subsidy Given After Spot Checks
The strategic-form representation of the game in which the spot check is performed first and the subsidy is then given is shown in Table 2. Given these payoffs, we find the following pure and mixed Nash equilibria.
Scenario 1. When $C_g + K > E_2 + V^-$, that is, when the sum of the cost of the spot check and the recycling or disposal cost of inferior cooking oil is larger than the sum of the penalty and the social cost of the reuse of waste oils by consumers, there exists a unique pure Nash equilibrium (production of inferior cooking oil, no spot check). As shown above, there is no pure Nash equilibrium if the cost of the spot check is too small.
Scenario 2. When […]
Scenario 3. In all other cases, there is a mixed Nash equilibrium with
$$r^{**} = \frac{M}{E_1 + E_2}, \qquad (5)$$
$$s^{**} = \frac{C_g + E_1}{E_1 + E_2 - K + V^-}. \qquad (6)$$
The processor is indifferent between producing biodiesel and producing inferior cooking oil when
$$r\,(E_1 + E_2) = M. \qquad (7)$$
The government is indifferent between performing a spot check and not performing a spot check when
$$s\,(E_1 + E_2 - K + V^-) = C_g + E_1. \qquad (8)$$
Equations 7, 8 can be solved to obtain the equilibrium probabilities of the government performing a spot check and the processor producing inferior cooking oil as given by Eqs 5, 6.
We take the partial derivatives of $r^{**}$ and $s^{**}$ with respect to $E_1$, $E_2$, $M$, $K$, $C_g$, and $V^-$ to find the following.
a) $\frac{\partial r^{**}}{\partial E_1} = -\frac{M}{(E_1 + E_2)^2} < 0$, so $r^{**}$ is a decreasing function of $E_1$, meaning that a larger subsidy lowers the probability of the government performing a check. When a check is performed before the subsidy is given, a larger subsidy lowers the opportunity cost of the check. If the government wants to decrease the frequency of the checks, it must increase the subsidy for the production of biodiesel. Theoretically, if the subsidy is so high that it compensates for the difference in the profits of the two products, the processor has no incentive to produce the inferior cooking oil. However, the subsidy is also a cost of government supervision and the management of waste oil. The government faces a tradeoff between the cost of a spot check and the cost of a subsidy, both of which aim to reduce the production of inferior cooking oil and increase the production of biodiesel.
$\frac{\partial s^{**}}{\partial E_1} = \frac{E_2 - K - C_g + V^-}{(E_1 + E_2 - K + V^-)^2} > 0$ when $E_2 > K + C_g$, so $s^{**}$ is then an increasing function of $E_1$: the larger the subsidy is, the higher the probability of producing inferior cooking oil. To deter the production of inferior cooking oil, the government may decrease the subsidy and increase the frequency of spot checks on the processor. For example, as the subsidy for new energy cars increased in China, more enterprises simply modified unqualified electric cars to obtain the subsidy fraudulently. To eliminate subsidy fraud, the subsidy for new energy vehicles has been reduced gradually since 2016. In 2020, one car enterprise received a total of more than US$0.5B from two national subsidies for new energy vehicles, which was higher than its profit (from the Ministry of Industry and Information Technology of China). In other words, it would have lost money without these huge subsidies.
b) $\frac{\partial r^{**}}{\partial E_2} = -\frac{M}{(E_1 + E_2)^2} < 0$, so $r^{**}$ is a decreasing function of $E_2$, meaning that a larger penalty decreases the probability that the government performs a check. A larger penalty deters the processor from illegally producing inferior cooking oil and increases the probability of the processor producing biodiesel legally, which decreases the probability and cost of a spot check.
$\frac{\partial s^{**}}{\partial E_2} = -\frac{C_g + E_1}{(E_1 + E_2 - K + V^-)^2} < 0$, so $s^{**}$ is a decreasing function of $E_2$. A larger penalty weakens the incentive of the processor to take the risk of producing inferior cooking oil and lowers the probability of the processor producing inferior cooking oil. Moreover, $\frac{\partial^2 r^{**}}{\partial E_1 \partial E_2} = \frac{2M}{(E_1 + E_2)^3} > 0$ and, when $E_2 > K + C_g$ and $V^- > E_1 + C_g$, $\frac{\partial^2 s^{**}}{\partial E_1 \partial E_2} = -\frac{(E_2 - K - C_g) + (V^- - E_1 - C_g)}{(E_1 + E_2 - K + V^-)^3} < 0$, so the combined effect of penalty and subsidy is positive on the probability of spot checks and negative on the probability of producing inferior cooking oil.
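The effects of the subsidy and the penalty under this timing can be illustrated numerically. The sketch below assumes the mixed-equilibrium probabilities $r^{**} = M/(E_1 + E_2)$ and $s^{**} = (C_g + E_1)/(E_1 + E_2 - K + V^-)$, which are the forms implied by the partial derivatives reported here; the function name and all parameter values are hypothetical.

```python
def equilibrium_check_first(M, E1, E2, Cg, K, V_minus):
    """Mixed-equilibrium probabilities when the spot check precedes the subsidy.

    Returns (r, s): r is the probability of a spot check and
    s is the probability of producing inferior cooking oil.
    """
    r = M / (E1 + E2)
    s = (Cg + E1) / (E1 + E2 - K + V_minus)
    return r, s


# Hypothetical baseline with E2 > K + Cg and E1 + E2 > M.
base = dict(M=4.0, E1=2.0, E2=8.0, Cg=2.0, K=1.0, V_minus=3.0)

r0, s0 = equilibrium_check_first(**base)
r1, s1 = equilibrium_check_first(**{**base, "E1": 3.0})  # larger subsidy
r2, s2 = equilibrium_check_first(**{**base, "E2": 9.0})  # larger penalty

print("larger subsidy: checks fall:", r1 < r0, "| inferior oil rises:", s1 > s0)
print("larger penalty: checks fall:", r2 < r0, "| inferior oil falls:", s2 < s0)
```

In this example a larger subsidy and a larger penalty both reduce the frequency of checks, but only the larger penalty also reduces the probability of inferior cooking oil.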
When both measures are used with spot checks performed first, the processor has a lower incentive to produce inferior cooking oil because of large penalties ($E_2 > K + C_g$) and the huge social cost of the reuse of waste oils by consumers ($V^- > E_1 + C_g$), and a higher incentive to produce biodiesel because of increasing penalties and subsidies ($E_1 + E_2 > M$).
c) $\frac{\partial r^{**}}{\partial M} = \frac{1}{E_1 + E_2} > 0$, so $r^{**}$ is an increasing function of $M$. Increasing the difference in the profits of the two products increases the probability of the government performing a spot check. A larger difference in the profits of the two products increases the probability of the processor producing inferior cooking oil illegally, which results in a higher probability of the government performing a spot check. This result is similar to the result when the subsidy is given before the spot check is performed.
$\frac{\partial s^{**}}{\partial C_g} = \frac{1}{E_1 + E_2 - K + V^-} > 0$, so $s^{**}$ is an increasing function of $C_g$. A larger cost of performing a spot check results directly in a lower probability of a spot check, which results in a higher probability of the processor producing inferior cooking oil.
$\frac{\partial s^{**}}{\partial K} = \frac{C_g + E_1}{(E_1 + E_2 - K + V^-)^2} > 0$, so $s^{**}$ is an increasing function of $K$, meaning that increasing the disposal cost of inferior cooking oil increases the probability of the processor producing inferior cooking oil.
$\frac{\partial s^{**}}{\partial V^-} = -\frac{C_g + E_1}{(E_1 + E_2 - K + V^-)^2} < 0$, so $s^{**}$ is a decreasing function of $V^-$: increasing the social cost lowers the probability of the processor producing inferior cooking oil.
Proposition 3. The relative timing of the subsidy and spot check is important. When the subsidy is given before the spot check is performed, ex-ante subsidies have no impact on the probabilities of the processor producing inferior cooking oil or the government performing a spot check. When a spot check is performed before the subsidy is given, ex-post subsidies alone have a negative effect on the probability of producing biodiesel, but combined with penalties they have a positive effect on the probability of producing biodiesel from UCO.
The difference between these two timings is the effect of subsidies. When the subsidy is given before the spot check is performed, subsidies have no effect on the probabilities of the processor producing inferior cooking oil or the government performing a spot check. When the spot check is performed before the subsidy is given, a larger subsidy combined with large penalties decreases the probability of the processor producing inferior cooking oil. The government may increase the subsidy and penalty to lower the probability of the processor producing inferior cooking oil and increase the probability of the processor producing biodiesel.

Frontiers in Energy Research | www.frontiersin.org
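One way to make the synergy claim concrete is to estimate the cross-partial derivative of $s^{**}$ with respect to the subsidy and the penalty. The sketch below assumes $s^{**} = (C_g + E_1)/(E_1 + E_2 - K + V^-)$, as implied by the derivatives in this section, and uses a central finite difference; the helper `cross_partial` and the parameter values are hypothetical and are chosen so that $E_2 > K + C_g$ and $V^- > E_1 + C_g$.

```python
def s_check_first(E1, E2, Cg, K, V_minus):
    # Probability of producing inferior cooking oil when checks come first.
    return (Cg + E1) / (E1 + E2 - K + V_minus)


def cross_partial(f, x, y, h=1e-3):
    """Central finite-difference estimate of d^2 f / (dx dy) at (x, y)."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)


# Hypothetical values with E2 > K + Cg and V_minus > E1 + Cg.
E1, E2, Cg, K, V_minus = 2.0, 8.0, 1.0, 1.0, 6.0

synergy = cross_partial(
    lambda e1, e2: s_check_first(e1, e2, Cg, K, V_minus), E1, E2
)
print("cross-partial of s** is negative:", synergy < 0)
```

A negative cross-partial means that raising the penalty makes an additional unit of subsidy less likely to encourage inferior cooking oil, which is the sense in which the two instruments reinforce each other in this sketch.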
Proposition 4. Comparing the two methods, the probability of biodiesel production caused by the first method (subsidy given before spot checks) is higher than that caused by the second method (spot checks are performed before subsidy given).
Given the probability of the government performing a spot check in the $n$th stage and the probability of producing biodiesel derived above, the conditional probability that biodiesel production is caused by the first method (subsidy given before spot checks) can be written as […], where $p_{\mathrm{Check1}}$ and $p_{\mathrm{Check2}}$ represent the probabilities of the government performing a spot check after or before the subsidy; $p_{\mathrm{BIO|Check1}}$ and $p_{\mathrm{BIO|Check2}}$ represent the probabilities of the government performing a spot check after or before the subsidy and the processor producing biodiesel; and $p_{\mathrm{BIO}}$ is the total probability of the processor producing biodiesel. The conditional probability that biodiesel production is caused by the second method (spot checks performed before the subsidy is given) can be written as […]. Substituting Eqs 1, 2, 5, 6 into these conditional probabilities, we have […] and $p_{\mathrm{Check2|BIO}} > p_{\mathrm{Check1|BIO}}$, since […]. This means that the probability that biodiesel production is caused by the second method (spot check performed before the subsidy is given) is greater than the probability that it is caused by the first method (subsidy given before spot checks). A spot check performed before the subsidy has a greater effect on biodiesel production than one performed after the subsidy. To improve the processor's incentive to produce biodiesel, it is preferable to perform the spot check first and then apply the subsidy or penalty accordingly.

Dynamic Infinitely Repeated Game Model
We assume as above that there is one waste oil processor that can produce either biodiesel legally or inferior cooking oil illegally. The local government encourages the waste oil processor to produce biodiesel legally and gives the processor a subsidy according to its reported biodiesel production. If the waste oil processor deviates from producing biodiesel and produces inferior cooking oil illegally instead, and this is discovered during a spot check, the local government punishes the waste oil processor with a fine.
Because government supervision is a repeated process, both the local government and the processor know the results of the most recent spot check. Both sides readjust their strategies given the outcome of the most recent stage of the game. The processor decides whether to produce biodiesel legally or inferior cooking oil illegally, and the local government adjusts its probability of performing a spot check to adjust its cost. The game between the local government and the waste oil processor is an infinitely repeated game of complete information.
The following two strategies, the grim trigger strategy (GTS) and the tit-for-tat strategy (TFT), can be used to punish a processor that produces inferior cooking oil in the infinitely repeated game.
First, GTS is as follows: start by cooperating, that is, the waste oil processor produces biodiesel and the government does not perform a spot check at stage 1, and continue to cooperate until the waste oil processor deviates to produce inferior cooking oil. Once a deviation is observed, the government immediately punishes the waste oil processor by shutting it down, so that it can no longer produce either legal biodiesel or illegal inferior cooking oil.
Suppose that both the government and the processor choose to cooperate at the beginning of the infinitely repeated game, that the government performs a spot check in the $n$th stage, and that the processor continues to produce biodiesel.
For biodiesel production to be a Nash equilibrium, the expected payoff to the processor of producing biodiesel must be no lower than its expected payoff from producing inferior cooking oil. It is only when the government punishes the processor for producing poor-quality cooking oil that the processor's profit is lower when producing inferior cooking oil than when producing biodiesel. The expected payoff to the processor of producing biodiesel must be greater than or equal to the expected payoff of producing inferior cooking oil in the first $n$ stages, until the illegal production is discovered and the processor is shut down in the $n$th stage, as shown in Eq. 13:
$$\sum_{t=0}^{\infty} \delta^t (R_l - C_h + E_1) \ge \sum_{t=0}^{n-1} \delta^t (R_h - C_l + E_1), \qquad (13)$$
which results in $\frac{1}{1-\delta}(R_l - C_h + E_1) \ge \frac{1-\delta^n}{1-\delta}(R_h - C_l + E_1)$, and we have the following result:
$$\delta \ge \left(\frac{M}{R_h - C_l + E_1}\right)^{1/n},$$
where the numerator is the profit difference between legal biodiesel production and illegal inferior cooking oil production. Moreover, the denominator represents the benefit of cooperating to produce legal biodiesel, and the numerator represents the incentive to cheat by producing illegal inferior cooking oil. Given the discount factor, $\delta$, the smaller the one-time incentive not to cooperate, $M$, relative to the benefit of cooperating, $(R_h - C_l + E_1)$, the greater the probability of cooperation to produce legal biodiesel. Given the ratio of the incentive not to cooperate to the benefit of cooperating, the greater the discount factor, that is, the more important the future outcomes, the greater the probability of cooperation over the infinite horizon. Moreover, given this ratio, since $\frac{\partial \delta}{\partial n} = -\frac{1}{n^2}\left(\frac{M}{R_h - C_l + E_1}\right)^{1/n} \ln\left(\frac{M}{R_h - C_l + E_1}\right) > 0$, the bigger $n$ is, the greater the required discount factor $\delta$. Thus, the later the production of illegal inferior cooking oil is observed, the greater the discount factor needed for cooperation, that is, the more patient the processor must be and the more weight it must place on future benefits.
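The threshold on the discount factor can be explored numerically. Rearranging the inequality $\frac{1}{1-\delta}(R_l - C_h + E_1) \ge \frac{1-\delta^n}{1-\delta}(R_h - C_l + E_1)$ gives $\delta^n \ge M/(R_h - C_l + E_1)$ with $M = (R_h - C_l) - (R_l - C_h)$. The sketch below computes the smallest such $\delta$ for several detection stages $n$; the function names and the numerical values are hypothetical and purely illustrative.

```python
def min_discount_factor_gts(M, R_h, C_l, E1, n):
    """Smallest delta with delta**n >= M / (R_h - C_l + E1)."""
    ratio = M / (R_h - C_l + E1)
    return ratio ** (1.0 / n)


def gts_condition_holds(delta, R_l, C_h, R_h, C_l, E1, n):
    """Check the closed-form inequality for legal production under GTS."""
    legal = (R_l - C_h + E1) / (1 - delta)
    illegal = (1 - delta ** n) / (1 - delta) * (R_h - C_l + E1)
    return legal >= illegal


# Hypothetical revenues, costs, and subsidy.
R_l, C_h, R_h, C_l, E1 = 10.0, 6.0, 12.0, 4.0, 2.0
M = (R_h - C_l) - (R_l - C_h)  # profit difference between the two products

thresholds = {n: min_discount_factor_gts(M, R_h, C_l, E1, n) for n in (1, 2, 5)}
for n, d in thresholds.items():
    print(f"n = {n}: minimum discount factor = {d:.4f}")
```

In this example the threshold rises with $n$, in line with the statement that later detection requires a more patient processor.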
The second strategy is known as TFT. It starts with cooperation: the waste oil processor produces biodiesel and the government does not perform a spot check at stage 1. Thereafter, each player does what its rival did in the most recent period (either cooperate or cheat). Once the waste oil processor deviates to produce inferior cooking oil, the government immediately reverts to punishment and performs a spot check with certainty in the remaining periods, in order to push the waste oil processor back to cooperating and producing legal biodiesel again.
which is equivalent to $\frac{1-\delta^{n+1}}{\delta^{n}(1-\delta)}\cdot\frac{M}{E_2} \le 1$, where $M/E_2$ represents the profit difference-penalty ratio; as the penalty increases, the ratio tends to 0. The ratio condition $M/E_2 \le 1$ is equivalent to $E_2 \ge M$, which means that the penalty is large enough to offset the illegal profit difference between inferior cooking oil and biodiesel production. Otherwise, when the penalty is not large enough, such that $E_2 < M$, the waste oil processor will always produce inferior cooking oil even if the government performs a spot check at every stage.
Here the numerator, $M$, is the profit difference between legal biodiesel production and illegal inferior cooking oil production. It represents the incentive to cheat by producing illegal inferior cooking oil; in other words, it is also the cost of cheating in the future, and it can be regarded as a threat of cheating. The denominator, the penalty $E_2$, represents the benefit of cooperating to produce legal biodiesel in the future, and it can be regarded as a promise of cooperating in the future.
Given the discount factor, δ, the smaller the one-time incentive not to cooperate, $M$, relative to the benefit of cooperating, $E_2$, the greater the probability of cooperation to produce legal biodiesel. Given the ratio of the incentive not to cooperate to the benefit of cooperating, the greater the discount factor, that is, the more important future outcomes are, the greater the probability of cooperation over the infinite horizon.
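The link between the TFT condition and the requirement $E_2 \ge M$ can be made explicit. The sketch below assumes that the TFT condition has the form $\delta^{n}(1-\delta) - (1-\delta^{n+1})\,M/E_2 \ge 0$, which is how $f(\delta)$ is written later in this section, and that $0 < \delta < 1$.

```latex
\begin{align*}
\frac{1-\delta^{n+1}}{\delta^{n}(1-\delta)}
  &= \frac{1+\delta+\dots+\delta^{n}}{\delta^{n}}
   = 1 + \delta^{-1} + \dots + \delta^{-n} \;\ge\; 1, \\[4pt]
\frac{1-\delta^{n+1}}{\delta^{n}(1-\delta)}\cdot\frac{M}{E_2} \le 1
  &\;\Longrightarrow\; \frac{M}{E_2} \le 1
  \;\Longleftrightarrow\; E_2 \ge M .
\end{align*}
```

Under this reading, $E_2 \ge M$ is necessary but not sufficient: the factor multiplying $M/E_2$ exceeds 1 and grows with $n$, so later detection calls for a larger penalty at any given δ.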
Proposition 5. When the discount factor, δ, is sufficiently large, producing biodiesel is a perfect Nash equilibrium for the processor under the grim trigger strategy or the tit-for-tat strategy in the infinitely repeated game. The discount factor, δ, must satisfy $\delta \ge \delta_{GTS}$ under GTS or $\delta \ge \delta_{TFT}$ under TFT. If $M/E_2 > 1$, producing illegal inferior cooking oil is a dominant strategy for the processor even if there is a penalty.
Since either GTS or TFT can be a Nash equilibrium, for a given $n$, GTS will be a Nash equilibrium of the infinitely repeated game only if the discount factor satisfies $\delta \ge \delta_{GTS}$. Let $f(\delta) = \delta^{n}(1-\delta) - (1-\delta^{n+1})\frac{M}{E_2} \ge 0$ and let $\delta_{TFT}$ be the solution to $f(\delta) = 0$; TFT will be a Nash equilibrium of the infinitely repeated game only if $\delta \ge \delta_{TFT}$. Next, we substitute
Proposition 6. The discount factor required for GTS is lower than that required for TFT, $\delta_{GTS} < \delta_{TFT}$, when the penalty is not big enough. Producing biodiesel is then a perfect Nash equilibrium that is easier for the processor to sustain under GTS than under TFT in the infinitely repeated game. Given the ratio of the incentive to produce illegal inferior cooking oil to the benefit of producing biodiesel, the threat of shutting down induces biodiesel production much more easily than the threat of a penalty.
Frontiers in Energy Research | www.frontiersin.org October 2021 | Volume 9 | Article 719214
That is, only if the penalties are large enough does the tit-for-tat strategy require a discount factor as low as that of the grim trigger strategy to be a perfect Nash equilibrium with biodiesel production. We have the following lemma.
Lemma 7. There exists a sufficient condition on the penalty under which the tit-for-tat strategy is a perfect Nash equilibrium in the infinitely repeated game and producing inferior cooking oil is impossible. Moreover, the sooner the production of illegal inferior cooking oil is observed, the lower the penalty required to induce the processor to produce biodiesel in the infinitely repeated game.
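The dependence of the TFT penalty on the detection stage can be sketched numerically. The Python sketch below is not the authors' MATLAB code. It assumes the TFT condition reads (1 − δ^(n+1)) / (δ^n (1 − δ)) · M / E_2 ≤ 1, and all function names and numeric values are hypothetical. Under that reading, the smallest admissible penalty for a given δ is M times the multiplier, and a threshold δ below 1 exists only when M / E_2 < 1 / (n + 1); the latter is an observation derived from the formula, not a statement taken from the text.

```python
def tft_multiplier(delta, n):
    """(1 - delta**(n+1)) / (delta**n * (1 - delta)), written as a finite sum."""
    return sum(delta ** (-k) for k in range(n + 1))


def min_penalty_tft(profit_gap, delta, n):
    """Smallest penalty E_2 with multiplier * profit_gap / E_2 <= 1."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return profit_gap * tft_multiplier(delta, n)


def tft_threshold(profit_gap, penalty, n, tol=1e-12):
    """Smallest delta in (0, 1) satisfying the assumed TFT condition, or None."""
    ratio = profit_gap / penalty
    # The multiplier decreases towards n + 1 as delta approaches 1, so a
    # threshold below 1 exists only if ratio * (n + 1) < 1.
    if ratio * (n + 1) >= 1.0:
        return None
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # bisection on the monotone multiplier
        mid = 0.5 * (lo + hi)
        if tft_multiplier(mid, n) * ratio <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical values: profit gap M = 1 (arbitrary units), delta = 0.9.
penalties = [min_penalty_tft(1.0, 0.9, n) for n in (1, 5, 10)]
print([round(p, 3) for p in penalties])
print(tft_threshold(1.0, 5.0, 1), tft_threshold(1.0, 1.5, 1))
```

The list of penalties rises with n, which mirrors the statement of the lemma that earlier detection lowers the penalty needed. A return value of `None` marks a penalty too small for any discount factor below 1 under the assumed condition.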

SIMULATION RESULTS
Policy makers want an efficient regulation method that leads to biodiesel production rather than inferior cooking oil production in the infinitely repeated game. Although this policy objective is clear and unique, to the best of our knowledge the exact amount of subsidy or penalty for each outcome has not been calculated in the literature. We believe the reason is that an analytic calculation is not possible, so simulations are required. Our simulation code is written in MATLAB. It begins by drawing net profits for the processor producing biodiesel and inferior cooking oil based on the current situation in Shanghai, China. Then, we vary the stage (from stage 1 to stage 20) at which the processor's illegal production is discovered, to show the resulting changes in the long-run discount factor for the GTS and TFT strategies and the changes in penalties. Finally, we vary the profit difference between biodiesel and inferior cooking oil production, as well as the subsidy, to show the effect of changes in the profits of illegal and legal production on penalties. One set of values corresponds to one strategy. Given these values, we use our theoretical model to calculate the corresponding discount factor, penalty, and subsidy.
Figure 3 compares the discount factors that GTS and TFT require to be Nash equilibria as the stage n at which illegal inferior cooking oil production is detected increases from 1 to 20. The simulation parameters, based on the case of Shanghai, are as follows: net revenue for biodiesel is $190/t and net revenue for inferior cooking oil is $318/t; the annual UCO output is 35,000 tons; the subsidy is $0.63M per year, and the penalty is $6.17M. As shown above, the larger n, the greater the discount factor δ for both the GTS and TFT strategies.
However, the later the production of illegal inferior cooking oil is observed, the greater the discount factor needed to sustain biodiesel production under TFT relative to GTS. In other words, it is more difficult for TFT than for GTS to induce the processor to produce biodiesel, since TFT requires more patience about future benefits, with a discount factor close to 1. Figure 4 shows that as the stage at which illegal inferior cooking oil production is observed increases from 1 to 20, the penalty increases dramatically. That is, the later illegal inferior cooking oil production is detected, the higher the penalty required to force the processor to produce biodiesel.
Figure 5 compares, for GTS, the discount factors that result from changes in the profit of biodiesel and in the subsidy. When the profit of biodiesel and the subsidy increase by the same amount per year, the required discount factor is higher for the increase in subsidy than for the increase in the profit of biodiesel production. Moreover, the effect of an increase in subsidies on the discount factor is far smaller than that of an increase in the profit of biodiesel production. To guide biodiesel production, it is therefore better to raise the profit of biodiesel, or to shrink the profit difference between biodiesel and inferior cooking oil, than to increase subsidies.
Figures 6, 7 show the effect of changes in the profits of illegal and legal production on penalties when the subsidy is given, and on the sum of subsidies and penalties, with n = 1. The blue line represents the changes in penalties and in the sum of subsidies and penalties when the profit of legal biodiesel production decreases from its highest value of $318/t in steps of $1.5/t while the profit of illegal inferior cooking oil production remains constant at $318/t.
Then, the green line represents the changes in penalties and in the sum of subsidies and penalties when the profit of illegal inferior cooking oil production increases from its lowest value of $190/t in steps of $1.5/t while the profit of legal biodiesel production remains constant at $190/t. The change in the profit difference between inferior cooking oil and biodiesel production is the same in both cases, increasing from $1.5/t to $150/t, which means that there are two ways to shrink the profit difference: either decrease the profit of illegal inferior cooking oil production or increase the profit of legal biodiesel production. Both lines rise as the profit difference increases, but the effect of the same profit difference differs. The lower the blue line, the higher the profit growth in legal biodiesel production and the lower the penalties; the higher the green line, the higher the profit growth in illegal inferior cooking oil production and the higher the penalties. The green line is steeper and more elastic than the blue line. That is, a given change in the profit difference that results from an increase in the profit of illegal inferior cooking oil production requires larger penalties, or a larger sum of subsidies and penalties, than the same change resulting from a decrease in the profit of legal biodiesel production. As a result, an increase in the profit of legal biodiesel production reduces penalties and subsidies more than an increase in the profit of illegal inferior cooking oil production. Given the amount of the penalty, the subsidy should also increase as the profit difference increases.
FIGURE 4 | Increase in penalties for TFT as the stage at which illegal production is discovered changes.
Lemma 8. The later the production of illegal inferior cooking oil is observed, the more patience TFT requires relative to GTS to induce the processor to cooperate and produce biodiesel in the infinitely repeated game.
Moreover, the same increase in the profit of biodiesel and in the subsidy has different effects, and it is much easier and better to raise the profit of biodiesel than to raise subsidies in order to guide biodiesel production. Increasing profits from legal production is more effective in reducing penalties and subsidies than increasing profits from illegal production.
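The GTS side of these simulations can be approximated in a few lines. The Python sketch below is not the authors' MATLAB code; it evaluates the grim trigger threshold implied by the repeated-game inequality, δ^n ≥ M / (R_h − C_l + E_1), with the Shanghai parameters listed in this section. It assumes that annual profit equals per-ton net revenue times annual UCO output and that M is the annual profit gap. The names and the increment `bump` are hypothetical.

```python
# Shanghai parameters quoted in the text.
OUTPUT_TONS = 35_000      # annual UCO output (t)
BIODIESEL_NET = 190.0     # net revenue of biodiesel ($/t)
INFERIOR_NET = 318.0      # net revenue of inferior cooking oil ($/t)
SUBSIDY = 0.63e6          # subsidy E_1 ($/year)


def gts_threshold(legal_profit, illegal_profit, subsidy, n):
    """Smallest delta with delta**n >= M / (illegal_profit + subsidy).

    M = illegal_profit - legal_profit is the assumed annual profit gap.
    """
    gap = illegal_profit - legal_profit
    if gap <= 0:
        return 0.0  # no incentive to cheat, so any delta works
    return (gap / (illegal_profit + subsidy)) ** (1.0 / n)


# Assumption: annual profit = per-ton net revenue * annual output.
legal = BIODIESEL_NET * OUTPUT_TONS
illegal = INFERIOR_NET * OUTPUT_TONS

# Threshold as the detection stage n runs from 1 to 20.
gts_by_stage = {n: gts_threshold(legal, illegal, SUBSIDY, n)
                for n in range(1, 21)}

# Comparison at n = 1: add the same hypothetical amount either to the
# subsidy or to the annual biodiesel profit.
bump = 0.5e6
delta_more_subsidy = gts_threshold(legal, illegal, SUBSIDY + bump, 1)
delta_more_profit = gts_threshold(legal + bump, illegal, SUBSIDY, 1)

print(round(gts_by_stage[1], 3), round(gts_by_stage[20], 3))
print(round(delta_more_subsidy, 3), round(delta_more_profit, 3))
```

Under these assumptions the threshold rises with n, and an equal increment lowers the threshold more when it is added to biodiesel profit than when it is added to the subsidy, in line with the comparison reported for Figure 5. The subsidy enters only the denominator, whereas a higher biodiesel profit shrinks the numerator M directly.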

CONCLUSIONS AND DISCUSSION
In this article, a sequential game theory model is developed to study the combined effects of subsidies and penalties on the promotion of renewable energy production, using biodiesel production from UCO as an example. The model considers three aspects: the intensity of government punishment, the relative timing of government subsidies, and the cost of government regulation. Regardless of the timing of the subsidy, it is more effective to raise penalties and reduce the cost of spot checks. When the combined cost of spot checks and of recycling or disposal exceeds fines and social losses to consumers, the government does not conduct spot checks and processors tend to produce illegal outputs. Otherwise, the government conducts random checks, and the processor produces illegal products with a positive probability. The timing of a subsidy or penalty also plays an important role in regulation. Ex-ante subsidies have no effect on the probability that processors produce illegal products or that the government conducts spot checks, so more subsidies are not always better. These results are consistent with Chang et al. (2011), who find empirically that high levels of subsidies may have a negative impact on users or on sustainable resource production. Larger penalties do not prevent more processing of illegal products, but they do reduce the need for spot checks. Ex-post subsidies reduce the probability of spot checks, which increases illegal production.
Extending our model to an infinitely repeated game, we find a negative relationship between the penalty and the discount factor. A small discount factor makes processors less patient, so they are more willing to produce illegal products for short-term gain. In such cases, severe penalties such as closure or heavy fines may be necessary. In the infinitely repeated game, the earlier illegal production can be detected, the lower the penalty required to induce the processor to produce renewable energy products such as biodiesel. Lu et al. (2018) show that a penalty-cost-based mechanism can reduce the cost of renewable energy users. Our study shows that the combination of subsidies and penalties can not only increase the production of renewable energy but also decrease the probability of negative output. In addition, efficient and effective monitoring of misconduct can reduce the penalty needed to deter illegal products. Of the two combined regulation methods, the ex-post subsidy is more effective than the ex-ante subsidy. This can explain why China's solar photovoltaic incentive policy has changed from ex-ante subsidies such as the "Golden Sun Plan" to ex-post subsidies such as the feed-in tariff policy (Yuan et al., 2015).
Our results provide a theoretical basis for the government's regulatory (GTS or TFT) strategies. When the penalties are not large enough, GTS induces renewable energy production more easily than TFT in an infinitely repeated game. When the penalties are large enough, TFT achieves the same objective as GTS in biodiesel production. The simulation results show that a high profit margin on renewable energy is a better incentive for producers to process legal products than subsidies.
Previous research mostly focuses on the positive benefits of reward and punishment policies (e.g., Kreiss et al., 2017; Md et al., 2017), whereas our model provides more detailed dynamic characteristics of the subsidy and penalty mechanisms. Our study further emphasizes the importance of the policy environment. The subsidy and penalty mechanism should be adjusted dynamically as the policy environment changes, for example in renewable energy producers' cost structure, production and management efficiency, and technology levels.

Practical Implications
First, ex-post subsidies outperform ex-ante subsidies. In practice, a successful ex-post subsidy mechanism requires detailed subsidy rules and monitoring of sustainable energy projects throughout their life cycle. Participation of third-party organizations and timely evaluation of every subsidy can contribute to the success of regulation.
Second, the grim trigger strategy (GTS) outperforms the tit-for-tat strategy (TFT). Generally, GTS is superior to TFT when the penalty is not too severe. To achieve the same effect as GTS, the penalties under TFT must be severe enough, even to the point of shutting down the business. This echoes an old Chinese saying, "desperate diseases need desperate remedies". The case of Taiwan is an illustration. In 2014, the maximum fine on illegal production of UCO increased from $0.28M to $3.13M after a severe scandal involving the mixing of refined UCO with fresh cooking oil. Since then, the tougher penalty has greatly reduced the opportunistic behavior of UCO processors. On the other hand, companies should not only comply with environmental regulation but also take a more proactive approach by seeking competitive advantage from sustainability practices.
Third, subsidy and penalty measures have a synergistic effect on renewable energy production. Subsidies and penalties must be combined, since higher subsidies may lead to more illegal output in the absence of penalties. Regardless of the timing of the subsidies, the imposition of penalties always has a positive impact on sustainable production. Two other elements, namely the detection of misbehavior and profit margins, are important factors influencing the implementation of subsidies and penalties. Less punishment is needed if misbehavior can be detected in time. Hence, governments should pay more attention to the use of big data, information transmission, and other technologies to improve the efficiency and effectiveness of regulation. Less subsidy is needed if companies can gain more profit from technological and managerial innovation. Thus, governments can provide corresponding research and development funds to incentivize technological and managerial innovation. The intensity of subsidies and penalties should also be adjusted according to governments' ability to detect misbehavior and companies' level of technological or managerial innovation.
Our model has some limitations, such as the assumption of only one processor. In practice, there may be more than one processor. If we relax this assumption, competition between processors might lower profits, and our theoretical results might need to be revised. Furthermore, in future work we will extend our complete-information model to a game with incomplete information.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, and further inquiries can be directed to the corresponding author. The code generated for this study is available on request from the corresponding author.

AUTHOR CONTRIBUTIONS
All authors have significantly contributed to the article. XM and YD are the leaders of this research team who organized and designed the study. XM completed the main model design and simulation analysis. ZT and YD were responsible for supervision, funding acquisition, and the writing of part I and part II. All authors have read and agreed to the published version of the manuscript.