Evaluating Asymmetric Approaches to the Estimation of Standard Uncertainties for Emission Factors in the Fuel Sector of Costa Rica

A new environmental challenge for Costa Rica involves the precise and reliable quantification of data from its fossil-fueled transportation sector. In the context of greenhouse gas inventories, uncertainty assessment, as the best quality parameter of any estimation or measurement, takes on new relevance by becoming a mandatory requirement in ISO 14064-1:2018. However, users have found a significant limitation when quantifying the standard (measurement) uncertainties associated with emission factors that have asymmetric probability distributions. The present article sought to take advantage of fitting asymmetric distributions to estimate and compare possible standard uncertainties for the official emission factors of Costa Rica, specifically for the fuel sector. Five asymmetric distributions and a “symmetrization” method (symmetric approximation of an asymmetric distribution) were chosen based on their application and previous use, and fitted to the data. Standard uncertainties were estimated as standard deviations from the parameters of each distribution. To evaluate the fit, quantiles of interest were extracted from simulated populations and compared with the original data values. A systematically better fit was evidenced for the asymmetric triangular and generalized extreme value distributions, both for CO2 emission factors with smaller asymmetries and for CH4 and N2O emission factors with greater asymmetries. This was not the case for the other distributions, among which the log-normal distribution with the correction factor suggested in the literature showed the worst fit. The use of the former two distributions is recommended to estimate the standard uncertainties associated with the emission factors from the official Costa Rican database and other emission factors with similar asymmetries.


INTRODUCTION
Costa Rica is recognized worldwide as a pioneer in environmental protection and the fight against climate change (UN Environment Programme, 2019). Its efforts to achieve high recovery of its forest cover (Allen and Padgett-Vásquez, 2017), ecotourism as a key component of its economy (Courvisanos and Ameeta, 2006) and an electricity grid that is almost 100 % renewable (ICE Group, 2020) stand out. The new environmental challenge for Costa Rica involves its fossil-fueled transportation sector that generates large amounts of emissions and a variety of other problems (Fendt, 2017). In recent years the government of Costa Rica has launched national policies and plans that seek to decarbonize its economy (Government of Costa Rica, 2019), including its current transportation system and the environmental impact it generates. To achieve the success of these policies, it is essential to have precise and reliable data that allow the correct quantification of measures taken and guarantee the success of environmental efforts in favor of its conservation.
Measurement uncertainty, referred to in this study simply as "uncertainty" and defined as the doubt about the true value of a quantity that remains after making a measurement or estimation, is the best quality parameter of any measurement or estimation and reflects the impossibility of knowing its value exactly (Joint Committee for Guides in Metrology [JCGM], 2008). Among the accepted methodologies for estimating uncertainty, the application of the law of propagation of uncertainty, or Gauss's formula, included in the Guide to the Expression of Uncertainty in Measurement (GUM) stands out. This methodology is based on modeling an output quantity y as a known function of several input quantities and handles the uncertainties associated with the input quantities by modeling them as random variables. This approach uses the standard deviations of these random variables and their correlations to produce an approximate evaluation of the standard (measurement) uncertainty u_y, a measurement uncertainty expressed as a standard deviation associated with the output quantity y (Joint Committee for Guides in Metrology [JCGM], 2012). It should be highlighted that the conventional technique described in the GUM does not necessarily require the probability distributions of the input quantities to be symmetrical (Possolo et al., 2019).
In the context of greenhouse gas (GHG) inventories, uncertainty evaluation was initially considered a nonmandatory requirement by ISO 14064-1 (2006), the main international standard for GHG quantification and reporting by organizations. Thus, uncertainty estimates have not yet been implemented by many organizations that have already quantified their emissions and mitigated their impact on the environment, depriving them of a quality measure that strengthens confidence in their results. Nevertheless, uncertainty in GHG inventories has been pointed out as a key component to consider (Rypdal and Winiwater, 2001; EPA, 2002; Jonas et al., 2010) and several studies have been developed on this topic, including EPA (1996), Ritter et al. (2010), Milne et al. (2015), and Solazzo et al. (2020). With the publication of the new version of ISO 14064-1 (2018), uncertainty assessment takes on a new dimension by becoming a mandatory requirement. This situation, added to Costa Rica's national-scale efforts to include uncertainty assessment through the National Carbon Neutrality Program (PPCN, by its Spanish acronym), has aroused national interest in the application of uncertainty estimation methodologies in GHG quantification (DCC and PMR, 2020; DCC and LCM, 2020).
A common scenario found in GHG inventories is the indirect quantification of emissions, where emissions are not measured directly as an amount of gas released into the atmosphere. Instead, emissions (E) are estimated from other data values associated with the activity that causes the emission (d) and emission factors (f) that relate these data to the amount of gas emitted, as shown in Equation (1). Emissions from energy consumption are a typical example of indirect quantification: the emissions from, e.g., a furnace can be estimated from the amount of fuel consumed and an emission factor that relates the liters of fuel to the amount of GHG released by the combustion process.
Following the application of Gauss's formula (Joint Committee for Guides in Metrology [JCGM], 2008; Possolo and Iyer, 2017) to Equation (1), the approximate standard uncertainty of E (u_E) can be easily estimated from the values and standard uncertainties of d (u_d) and f (u_f), according to Equation (2).
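As an illustration of the propagation step described above, the following minimal Python sketch assumes the multiplicative model E = d · f with uncorrelated d and f, for which first-order propagation gives u_E = sqrt((f · u_d)² + (d · u_f)²). The function name and the numerical inputs are hypothetical and are not taken from the official database; the study itself worked in R.

```python
import math


def emission_with_uncertainty(d, u_d, f, u_f):
    """Return (E, u_E) for E = d * f, assuming d and f are uncorrelated."""
    E = d * f
    # First-order propagation: the sensitivity coefficients are
    # dE/dd = f and dE/df = d, so each input uncertainty is scaled
    # by the value of the other input.
    u_E = math.sqrt((f * u_d) ** 2 + (d * u_f) ** 2)
    return E, u_E


# Hypothetical values, for illustration only:
# 1000 L of fuel with u_d = 10 L, and f = 2.6 kg/L with u_f = 0.05 kg/L.
E, u_E = emission_with_uncertainty(d=1000.0, u_d=10.0, f=2.6, u_f=0.05)
print(E, u_E)
```

For a pure product, the same result can be read in relative terms: the relative standard uncertainty of E is the root sum of squares of the relative standard uncertainties of d and f.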
However, users have found a significant limitation when quantifying u_f, since the literature usually refers to asymmetric ranges of variation when describing the behavior of emission factors (Intergovernmental Panel on Climate Change [IPCC], 2000, 2006), including the official database of Costa Rican emission factors (IMN, 2020). This is attributable to the behavior observed when studying these factors, which usually present important dispersions depending on the study conditions. Small values are frequently found but relatively high values are plausible too, making asymmetry the commonly encountered behavior (Bharvirkar, 1999; Frey, 2007). Furthermore, the problem of propagating uncertainties expressed asymmetrically is not addressed in the GUM and has been treated on a limited basis by Barlow (2004) and Audi et al. (2017) using "symmetrized" approximations and by Possolo et al. (2019). Asymmetric ranges are also reported in the IPCC guidelines (Intergovernmental Panel on Climate Change [IPCC], 2000, 2006), although they do not delve into the methodologies for estimating standard uncertainties for these asymmetric ranges.
The present article sought to take advantage of fitting asymmetric distributions to estimate and compare possible standard uncertainties for the official emission factors of Costa Rica, specifically for the fuel sector. This sector was chosen for its strategic importance in Costa Rican national environmental policies, and because it represents the largest group of factors and contains the greatest asymmetries within the entire official database. It should be noted that this study addresses the uncertainties of the emission factors as standard uncertainties (standard deviations) according to the GUM guidelines, not as probability ranges or confidence intervals as is the typical approach on the topic (Intergovernmental Panel on Climate Change [IPCC], 2006; Choulga et al., 2020; Solazzo et al., 2020).
It is expected that this study will serve as a guide for the interpretation and manipulation of uncertainties within the process of implementing uncertainty estimation in GHG inventories, helping to obtain a more reliable quantification of emission data from fuels in Costa Rica and other countries around the world.

MATERIALS AND METHODS
Five asymmetric distributions were chosen based on their application and previous use in related technical documents, including Intergovernmental Panel on Climate Change [IPCC] (2000, 2006), Possolo et al. (2019), and DCC and LCM (2020): the asymmetric triangular distribution, the log-normal distribution, the Fechner distribution, the skew-normal distribution, and the generalized extreme value distribution. An example of these distributions is shown in Figure 1.
These distributions were fitted to the official database of Costa Rican emission factors for the fuel sector (IMN, 2020). This database includes the accepted value for the emission factor f and two additional values that delimit an asymmetric interval of possible values: U_R (also called "right uncertainty" or "upper uncertainty", corresponding to the upper bound of the 95 % probability range estimated for the emission factor) and U_L (also called "left uncertainty" or "lower uncertainty", corresponding to the lower bound of the 95 % probability range estimated for the emission factor). Subsequently, this information was used to estimate the parameters of each distribution in order to calculate its theoretical variance (σ²) and its corresponding standard uncertainty u_x = √(σ²). The standard uncertainty was also estimated and compared for each emission factor using method 1 of "symmetrization" proposed by Audi et al. (2017). The details of each distribution and the chosen symmetrization method are shown below. Except where otherwise specified, the distribution mean µ was set to f.
Following the methodology proposed by Possolo et al. (2019), each input f was considered to have an underlying asymmetric probability distribution qualified by U_L and U_R, both positive. According to IPCC good practices and guidelines (Intergovernmental Panel on Climate Change [IPCC], 2000, 2006), the interval [f − U_L, f + U_R] covers the true value of f with an approximate probability of 95 %. Also, it was assumed that probabilities 0 < p_L < p_M < p_R < 1 are specified such that Pr(X ≤ f − U_L) = p_L, Pr(X ≤ f) = p_M, and Pr(X ≤ f + U_R) = p_R, where X denotes the random variable modeling the uncertainty associated with f (Possolo et al., 2019). In all cases, p_L and p_R were set to 0.025 and 0.975, respectively.
For the Fechner, skew-normal, and generalized extreme value distributions, p_M was set to 0.50 and the applied methods sought to best reproduce f − U_L, f, and f + U_R as the specified percentiles of the corresponding distribution, using a non-linear, unweighted least-squares method.
Finally, to evaluate the fit of each distribution, populations of size 10^6 were simulated with the estimated uncertainties or the distribution parameters that originated them. Since the present study focuses on uncertainty estimation, the 2.5 % (q_2.5) and 97.5 % (q_97.5) quantiles were extracted from each generated population and compared with the original values of U_L and U_R included in the official database. The largest absolute relative error (RE) was reported as the fitting statistic, according to Equation (3).
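The fit check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the relative errors are taken with respect to the interval bounds f − U_L and f + U_R and expressed in percent, which may differ from the exact definition used in Equation (3); the helper names are hypothetical; and a population of 2 × 10^5 (rather than 10^6) is used to keep the example quick.

```python
import random


def empirical_quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list (0 <= p <= 1)."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def largest_relative_error(sample, f, U_L, U_R):
    """Largest absolute relative error (in %) between the simulated
    2.5 % / 97.5 % quantiles and the target bounds f - U_L and f + U_R.
    The choice of denominator and the percent scale are assumptions."""
    s = sorted(sample)
    q_lo = empirical_quantile(s, 0.025)
    q_hi = empirical_quantile(s, 0.975)
    re_lo = abs((q_lo - (f - U_L)) / (f - U_L))
    re_hi = abs((q_hi - (f + U_R)) / (f + U_R))
    return 100.0 * max(re_lo, re_hi)


# Hypothetical sanity check: a normal population whose true 95 % interval
# is f +/- 1.959964 should reproduce the bounds almost exactly.
rng = random.Random(2020)
population = [rng.gauss(10.0, 1.0) for _ in range(200_000)]
re_percent = largest_relative_error(population, f=10.0, U_L=1.959964, U_R=1.959964)
print(round(re_percent, 3))
```

Because the denominator is the bound itself, this statistic grows quickly when f − U_L approaches 0, which is worth keeping in mind when comparing values across gases.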
For all the calculations, data processing, statistical evaluation, and simulations, the free environment for statistical computing R, version 3.6.1 (R Core Team, 2020), was used. The programming code used in this study can be consulted in the Supplementary Material.

Asymmetric Triangular Distribution (u_Tri)
As its name indicates, this distribution consists of a triangle whose maximum height is not in its center, and can be asymmetric to the left or to the right. The asymmetric triangular distribution consists of three parameters: the mode µ, the lower extreme value a, and the upper extreme value b. The variance of any triangular distribution is theoretically estimated by following Equation (4).
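A minimal Python sketch of one way to obtain a and b is given below. It assumes that the mode is fixed at f and that the two small tail triangles below f − U_L and above f + U_R each carry probability 0.025; whether this matches the system in Equation (5) exactly is an assumption. A plain fixed-point iteration is swapped in for the Broyden and Newton solvers of nleqslv, and all function names and input values are hypothetical. The variance uses the standard closed form for a triangular distribution, (a² + b² + µ² − ab − aµ − bµ)/18.

```python
import math
import random


def fit_triangular(f, U_L, U_R, p=0.025, rel_tol=1e-13, max_iter=10_000):
    """Find (a, b) so that a triangular density with mode f leaves
    probability p below f - U_L and probability p above f + U_R."""
    left, right = U_L, U_R  # widths f - a and b - f, started at the interval
    for _ in range(max_iter):
        width = left + right
        # Tail-triangle areas give (left - U_L)^2 = p * left * width
        # and (right - U_R)^2 = p * right * width.
        new_left = U_L + math.sqrt(p * left * width)
        new_right = U_R + math.sqrt(p * right * width)
        converged = abs(new_left - left) + abs(new_right - right) <= rel_tol * width
        left, right = new_left, new_right
        if converged:
            break
    return f - left, f + right


def triangular_variance(a, b, mode):
    """Closed-form variance of a triangular distribution on [a, b]."""
    return (a * a + b * b + mode * mode - a * b - a * mode - b * mode) / 18.0


def triangular_cdf(x, a, b, mode):
    """Cumulative distribution function of the triangular distribution."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= mode:
        return (x - a) ** 2 / ((b - a) * (mode - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - mode))


# Hypothetical emission factor with an asymmetric 95 % interval.
f, U_L, U_R = 2.0, 0.5, 1.5
a, b = fit_triangular(f, U_L, U_R)
u_tri = math.sqrt(triangular_variance(a, b, f))

# Simulated population, drawn with the standard-library generator.
rng = random.Random(1)
population = [rng.triangular(a, b, f) for _ in range(100_000)]
print(a, b, u_tri)
```

Note that the fitted lower extreme a can fall below 0 when f − U_L is close to 0, so a check on the sign of a is a cheap way to anticipate negative simulated values.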
For the estimation of the parameters, the geometric estimation of the areas of the two half triangles was used (Petty and Dye, 2013). Considering µ = f, the equation system shown in (5) was obtained. This system was solved using the Broyden and Newton methods (Dennis and Schnabel, 1996) included in the functions of the R-package nleqslv (Hasselman, 2018). For the population simulation, the computational facilities provided by the R-package triangle (Carnell, 2019) were used.

Log-Normal Distribution (u_LN)
As its name indicates, the logarithmic normal distribution corresponds to the probability distribution of a variable whose natural logarithm follows a normal distribution. Its most common uses include modeling the multiplicative products of independent factors and strictly positive continuous variables with wide ranges (Intergovernmental Panel on Climate Change [IPCC], 2006). This distribution has two parameters: the mean µ and the variance σ², corresponding to the mean and variance of the variable's natural logarithm, which is normally distributed. According to the IPCC guidelines, it is possible to estimate the distribution variance σ² from the uncertainties U_L and U_R and the mean µ following Equation (6), where σ_g corresponds to the geometric standard deviation of the distribution, estimated according to Equation (7).
Finally, a correction factor F_C is recommended to slightly increase u as u_LNC = u_LN · F_C for cases with high standard uncertainties. For the simulation process, R-package stats functions were used (R Core Team, 2020).
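To make the role of the geometric standard deviation concrete, the following Python sketch converts a 95 % interval into σ_g and then into a standard deviation. It is not the IPCC procedure of Equations (6) and (7): as a stated assumption, σ_g is obtained here by treating the interval bounds as the 2.5 % and 97.5 % percentiles of a log-normal distribution, and the correction factor F_C is not applied. The function name and inputs are hypothetical. The final step relies on the standard log-normal identity that the standard deviation equals the arithmetic mean times sqrt(exp((ln σ_g)²) − 1).

```python
import math
from statistics import NormalDist


def lognormal_from_bounds(mean, lower, upper, coverage=0.95):
    """Return (sigma_g, u) for a log-normal variable with arithmetic mean
    `mean`, assuming `lower` and `upper` are its central coverage bounds."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 1.96 for 95 %
    s = math.log(upper / lower) / (2.0 * z)         # standard deviation of ln X
    sigma_g = math.exp(s)                           # geometric standard deviation
    u = mean * math.sqrt(math.exp(s * s) - 1.0)     # standard deviation of X
    return sigma_g, u


# Hypothetical emission factor: f = 2.0 with U_L = 0.5 and U_R = 1.5.
# Note that two parameters cannot honor f and both bounds at once in general,
# so the mean is simply set to f here.
sigma_g, u_ln = lognormal_from_bounds(mean=2.0, lower=2.0 - 0.5, upper=2.0 + 1.5)
print(sigma_g, u_ln)
```

Since a log-normal distribution has only two parameters, matching the mean and both bounds simultaneously is generally not possible, which is one reason simulated quantiles may not reproduce both bounds.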

Fechner Distribution (u_Fech)
Also known as the split normal distribution (Wallis, 2014), this distribution "consists of two half-normal distributions with the same mode, one to the left of the mode, the other to the right of the mode, and with their respective densities suitably rescaled so that the resulting probability density is continuous" (Possolo et al., 2019). This distribution can be asymmetric to the left or to the right, depending on the magnitude of their respective parameters.
The Fechner distribution has three parameters: the mode µ, the left variance σ_L², and the right variance σ_R². The Fechner distribution variance is theoretically estimated following Equation (9). A Fechner distribution may be a suitable model if 0.410 < U_R/U_L < 2.44 for a coverage probability of 95 % (Possolo et al., 2019).
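The following Python sketch illustrates the variance of a split normal distribution and the suitability range quoted above. It uses the standard closed form (1 − 2/π)(σ_R − σ_L)² + σ_L·σ_R, which is assumed to correspond to Equation (9), and checks it against a simulated population. The function names and parameter values are hypothetical, and this is not the fanplot-based code of Possolo et al. (2019).

```python
import math
import random
import statistics


def fechner_variance(sigma_L, sigma_R):
    """Variance of a split normal with left/right scale parameters."""
    return (1.0 - 2.0 / math.pi) * (sigma_R - sigma_L) ** 2 + sigma_L * sigma_R


def sample_fechner(mode, sigma_L, sigma_R, n, rng):
    """Draw n values: the left half is chosen with probability
    sigma_L / (sigma_L + sigma_R), then a half-normal offset is applied."""
    p_left = sigma_L / (sigma_L + sigma_R)
    draws = []
    for _ in range(n):
        if rng.random() < p_left:
            draws.append(mode - abs(rng.gauss(0.0, sigma_L)))
        else:
            draws.append(mode + abs(rng.gauss(0.0, sigma_R)))
    return draws


def ratio_in_suggested_range(U_L, U_R, low=0.410, high=2.44):
    """Check the 95 % suitability range quoted from Possolo et al. (2019)."""
    return low < U_R / U_L < high


# Hypothetical parameters, for illustration only.
rng = random.Random(0)
draws = sample_fechner(mode=5.0, sigma_L=1.0, sigma_R=2.0, n=200_000, rng=rng)
sample_var = statistics.pvariance(draws)
theory_var = fechner_variance(1.0, 2.0)
print(sample_var, theory_var)
```

In this sketch the sample variance should agree with the closed form up to simulation noise, and ratio_in_suggested_range(0.5, 1.5) returns False because the ratio 3.0 lies outside the quoted range.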
For the estimation of the parameters and the simulation process, the code already generated by Possolo et al. (2019) was used, which uses the computational facilities of R-package fanplot (Abel, 2015).

Skew-Normal Distribution (u_SN)
This distribution is a generalization of the Gaussian (normal) distribution that considers an additional bias parameter (α). This bias allows the distribution to be asymmetric to the left or to the right, depending on the value of α (for the traditional Gaussian distribution, α = 0). The skew-normal distribution has three parameters: a location parameter µ, a scale parameter ω > 0, and the bias parameter α (Azzalini and Capitanio, 2014). The skew-normal distribution variance is theoretically estimated following Equation (10). A skew-normal distribution may be a suitable model if 0.410 < U_R/U_L < 2.44 for a coverage probability of 95 % (Possolo et al., 2019).
For the estimation of the parameters and the simulation process, the code already generated by Possolo et al. (2019) was used, which uses the computational facilities of R-package sn (Azzalini, 2020).
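For reference, the standard closed-form moments of the skew-normal distribution in the location, scale, and bias parameterization described above can be written as follows. That this matches Equation (10) symbol for symbol is an assumption; δ is an auxiliary quantity introduced here for compactness.

```latex
\delta = \frac{\alpha}{\sqrt{1+\alpha^{2}}}, \qquad
\mathrm{E}[X] = \mu + \omega\,\delta\,\sqrt{\frac{2}{\pi}}, \qquad
\mathrm{Var}[X] = \omega^{2}\left(1 - \frac{2\,\delta^{2}}{\pi}\right)
```

For α = 0 the auxiliary quantity δ vanishes and the variance reduces to ω², the Gaussian case. Note also that the mean differs from the location parameter µ whenever α ≠ 0.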

Generalized Extreme Value Distribution (u_GEV)
The extreme value distributions are generally considered to comprise three distribution families, including the Gumbel, Fréchet, Gompertz, and reverse Weibull distributions (Possolo et al., 2019). These distributions may all be represented as members of a single family of generalized distributions with a common cumulative distribution function, known as the generalized extreme value distribution (GEVD). The GEVD has three parameters: a location parameter µ, a scale parameter ω > 0, and a shape parameter ξ (Johnson et al., 1995). The GEVD variance is theoretically estimated following Equation (11), where g_k = Γ(1 − kξ) and Γ is the gamma function. For the estimation of the parameters and the simulation process, the code already generated by Possolo et al. (2019) was used, which uses the computational facilities of R-package evd (Stephenson, 2002).
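The variance expression described above can be sketched in Python as follows. The sketch uses the standard closed form ω²(g_2 − g_1²)/ξ² with g_k = Γ(1 − kξ), valid for ξ < 1/2 and ξ ≠ 0, and the Gumbel limit ω²π²/6 for ξ = 0; this is assumed to correspond to Equation (11). Sampling is done by inverting the cumulative distribution function. The function names and parameter values are hypothetical, and this is not the evd-based code of Possolo et al. (2019).

```python
import math
import random
import statistics


def gev_variance(omega, xi):
    """Variance of the generalized extreme value distribution."""
    if xi >= 0.5:
        raise ValueError("the variance is not finite for xi >= 1/2")
    if xi == 0.0:
        return omega ** 2 * math.pi ** 2 / 6.0  # Gumbel limit
    g1 = math.gamma(1.0 - xi)
    g2 = math.gamma(1.0 - 2.0 * xi)
    return omega ** 2 * (g2 - g1 ** 2) / xi ** 2


def sample_gev(mu, omega, xi, n, rng):
    """Draw n values by inverting the GEV cumulative distribution function."""
    draws = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        e = -math.log(u)
        if xi == 0.0:
            draws.append(mu - omega * math.log(e))
        else:
            draws.append(mu + omega * (e ** (-xi) - 1.0) / xi)
    return draws


# Hypothetical parameters, for illustration only.
rng = random.Random(7)
draws = sample_gev(mu=0.0, omega=1.0, xi=0.1, n=200_000, rng=rng)
sample_var = statistics.pvariance(draws)
theory_var = gev_variance(1.0, 0.1)
print(sample_var, theory_var)
```

The standard uncertainty then follows as the square root of the variance; for shape parameters approaching 1/2 the variance grows without bound, so a fitted ξ near that limit deserves a second look.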

Symmetrization Method (u_Sym)
This method consists of a simple approximation of an asymmetric distribution as a "symmetric" normal distribution. Audi et al. (2017) refer to it as method 1, and it only requires estimating the difference between U_R and U_L. The resulting value is considered equivalent to the coverage interval (with the same coverage probability as the interval [f − U_L, f + U_R]) for a normal distribution centered at µ = (U_R + U_L)/2. Considering the properties of the normal distribution, it is possible to obtain the relationship shown in Equation (12), where σ is the standard deviation for the "symmetrized" distribution. For the simulation process, R-package stats functions were used (R Core Team, 2020).
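A minimal Python sketch of this kind of symmetrization is shown below. It assumes that the two inputs are the lower and upper bounds of the 95 % interval, so that the normal distribution is centered at their midpoint and its standard deviation is the interval width divided by twice the 97.5 % normal quantile; whether this reproduces Equation (12) exactly is an assumption. The function names and values are hypothetical. The second helper reports the share of the symmetrized normal distribution that falls below 0.

```python
from statistics import NormalDist


def symmetrize(lower, upper, coverage=0.95):
    """Return (center, sigma) of the normal distribution whose central
    coverage interval is [lower, upper]."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 1.96 for 95 %
    center = (lower + upper) / 2.0
    sigma = (upper - lower) / (2.0 * z)
    return center, sigma


def negative_fraction(center, sigma):
    """Probability mass of the symmetrized normal below zero."""
    return NormalDist(center, sigma).cdf(0.0)


# Hypothetical interval close to 0, for illustration only.
center, sigma = symmetrize(lower=0.1, upper=3.0)
print(center, sigma, negative_fraction(center, sigma))
```

Computing the negative fraction analytically, before simulating, gives a quick indication of whether a symmetrized normal is physically reasonable for an emission factor whose interval lies close to 0.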

RESULTS AND DISCUSSION
Tables 1-3 show the complete results for each of the asymmetric distributions and the symmetrization method applied to all the official Costa Rican emission factors for the fuel sector. These results correspond to the estimated standard uncertainties (u) and the maximum relative errors obtained for the quantiles of interest (RE) after the respective simulations and interval estimations. All estimated standard uncertainties are reported as absolute uncertainties. However, due to their widespread use in the GHG sector, the corresponding relative standard uncertainties are shown in Supplementary Tables 1-3.
Table 1 and Supplementary Table 1 show the results corresponding to the CO2 emission factors (left column of Figure 2 as example). These factors have the particularity of reporting the least asymmetric intervals of the entire analyzed database. All distributions have standard uncertainty values similar to each other, except for the corrected log-normal distribution. This behavior was expected, since the correction factor F_C is only recommended for cases with high standard uncertainties. The asymmetric triangular distribution systematically showed the largest standard uncertainties, while the smallest standard uncertainties were evidenced with the corrected log-normal distribution or the symmetrization method. The greatest differences between the standard uncertainties were evidenced in the emission factor for airplane gasoline, corresponding to the most asymmetric interval in the tables. It should be noted that the uncertainties estimated for this factor with the Fechner (u_Fech) and skew-normal (u_SN) distributions may not be suitable for the interval asymmetry, because in both cases the recommendation 0.410 < U_R/U_L < 2.44 of Possolo et al. (2019) is not met.
When evaluating the distribution fitting in Table 1 and Supplementary Table 1 through the RE values for the simulated populations, the best fit is obtained for the asymmetric triangular and GEV distributions, with RE < 0.03 in all cases. The Fechner and skew-normal distributions followed, with RE < 0.05 in all cases except for the airplane gasoline factor mentioned above (RE_Fech = 0.58, RE_SN = 0.59). The symmetrization method and the log-normal distribution showed similar fits, with a slight improvement for the former (RE_Sym ≤ 0.35). Finally, as expected, the corrected log-normal distribution did not prove useful for such small asymmetries and small relative uncertainty values (RE_LNC ≤ 7.11).
TABLE 1 | Absolute standard uncertainties u and largest relative error RE estimated for CO2 emission factors for the fuel sector using various asymmetric approaches. The best estimate f and expanded asymmetric uncertainties U_L and U_R were taken from the official database of Costa Rican emission factors (IMN, 2020).
Table 2 and Supplementary Table 2 show the results corresponding to the CH4 emission factors (middle column of Figure 2 as example), while Table 3 and Supplementary Table 3 do the same for the N2O emission factors (right column of Figure 2 as example). The factors for both gases showed percentage variation intervals with asymmetries greater than those of the CO2 factors and similar to each other. The corrected log-normal distribution systematically showed the largest standard uncertainties (except for one case in N2O), while the smallest standard uncertainties were evidenced for the log-normal distribution or the symmetrization method. However, the results obtained for the symmetrization method should be taken with caution. Given the characteristics of the normal distribution assumed by this method and the proximity of the intervals to 0, the simulated populations presented a significant percentage of negative values (between 5 % and 10 %), which contradicts the physical nature of an emission factor. The asymmetric triangular, Fechner, skew-normal, and GEV distributions can in theory also produce negative results, but the proportions of values below 0 obtained in the simulated populations were considered insignificant (<0.2 %). For this reason, only the symmetrization method is not recommended for the evaluated CH4 and N2O emission factors.
TABLE 3 | Absolute standard uncertainties u and largest relative error RE estimated for N2O emission factors for the fuel sector using various asymmetric approaches. The best estimate f and expanded asymmetric uncertainties U_L and U_R were taken from the official database of Costa Rican emission factors (IMN, 2020). Values of RE are shown inside the brackets.
The greatest differences between standard uncertainties for the CH4 emission factors were evidenced for lubricants, due to the little similarity between the corrected log-normal distribution and the other distributions. It should be noted that these cases do not correspond to the factor with the greatest asymmetry in Table 2 and Supplementary Table 2, as occurred with the CO2 emission factors. For the N2O emission factors, the greatest differences were evidenced for the gasoline without catalyst in land transportation factor. This factor does correspond to the most asymmetric interval in Table 3 and Supplementary Table 3, with reduced values for both log-normal methods, while the highest value was evidenced in the GEV distribution (the only emission factor with this behavior).
It should be noted again that most of the u_Fech and u_SN values may not be suitable for the emission factor asymmetries of both gases due to non-compliance with the recommendation 0.410 < U_R/U_L < 2.44 (Possolo et al., 2019). Compliance with this criterion was only evidenced for five CH4 emission factors and five N2O emission factors. The uncertainties estimated with the log-normal distribution, u_LN, should also be highlighted. According to the IPCC guidelines (Intergovernmental Panel on Climate Change [IPCC], 2006), these uncertainties could be underestimated due to their high percentage values relative to the emission factor (u_rel > 50 %). Therefore, it would initially be recommended to use u_LNC instead.
When evaluating the distribution fitting for the factors of both gases through the RE values for the simulated populations, the best fit is again obtained for the asymmetric triangular distribution (RE_Tri < 0.2) and the GEV distribution (RE_GEV ≤ 0.5), respectively. The case of the N2O emission factor for gasoline without catalyst in land transportation should be highlighted: both methods seem to fit adequately (RE_Tri = 0.05, RE_GEV = 0.11), but their relative standard uncertainties differ by more than 9 % (u_Tri = 68.71 %, u_GEV = 78.10 %). This situation does not occur in any other emission factor for CO2, CH4, or N2O, where the RE values and standard uncertainties of both distributions seem to be consistent with each other. For the Fechner and skew-normal distributions, the ten emission factors in which their use may be appropriate showed very good fits as well (RE_Fech < 0.1, RE_SN ≤ 0.5), while mixed fits were evidenced for the other factors (0.6 < RE < 21.9) and for the log-normal distribution (0.6 < RE_LN < 28.2). The symmetrization method, already not recommended for these factors, was also not very effective when its fit was evaluated (4.8 < RE_Sym < 14.8). It is noteworthy that, according to the IPCC guidelines (Intergovernmental Panel on Climate Change [IPCC], 2006), the high uncertainties would justify the use of the correction factor F_C to improve the estimation of the log-normal distribution. However, the corrected log-normal distribution fits were the worst for all factors of both gases (8.4 < RE_LNC < 32.3).
In this context, a deeper analysis is recommended on the use of Equation (8).
Although most of the proposed methods can be used to estimate the standard uncertainties of the emission factors included in the official Costa Rican database for the fuel sector (except for the symmetrization method for CH4 and N2O emission factors), the asymmetric triangular and GEV distributions obtained the best fits in the present study. For practical purposes, either of these two distributions could be chosen to estimate the standard uncertainty for Costa Rican fuel emission factors, with little difference in the results in most cases. Their use can also be recommended to address the estimation of standard uncertainties for other emission factors, even for other countries, as long as the probability of obtaining negative values is not a concern. Where it is a concern (for example, emission factors with U_L values close to 0), the use of a log-normal distribution is recommended, or truncated variants of the distributions evaluated in the present study, which prevent the presence of negative values, can be explored.

CONCLUSION
From the present study, the applicability of different asymmetric distribution approaches to the estimation of standard uncertainties of emission factors in GHG inventories according to GUM guidelines was evident, specifically for the Costa Rican official database of these factors in the fuel sector. The comparability of the different methods applied was also appreciable, with significant differences in the fits obtained for factors whose probability ranges or coverage intervals are reported with greater asymmetries. Overall, a systematically better fit was evidenced for the asymmetric triangular and generalized extreme value distributions, both for CO2 emission factors with smaller asymmetries and for CH4 and N2O emission factors with greater asymmetries. The observed fit was not as good for the other distributions, with the log-normal distribution applying the correction factor suggested in the literature showing the worst results.
Therefore, either the asymmetric triangular or the generalized extreme value distribution is recommended to estimate the standard uncertainties associated with the emission factors from the official Costa Rican database for the fuel sector. Their use is also recommended within the process of implementing uncertainty estimation in GHG inventories and improving the quality of its results, considering other emission factors for Costa Rica or other countries as well, provided that their intervals of variation are not very close to 0. Further, depending on the characteristics of the total emissions inventory (e.g., expected asymmetry), the probability distributions addressed in this study can be considered for the total uncertainty of the inventory, broadening the spectrum of possibilities already used.
Finally, it is considered that the present study provides the expected guidance for the interpretation and manipulation of emission factor standard uncertainties. It will hopefully ease the process of implementing uncertainty estimation in GHG inventories and help obtain a more precise and reliable quantification of emission data from the fossil-fueled transportation sector in Costa Rica and other countries around the world.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
GM-C contributed to the conception of the study, performed the statistical analysis, and wrote the first draft of the manuscript. BC-J revised it critically for important intellectual content and approved the content for publication. Both authors contributed to the article and approved the submitted version.

FUNDING
The publication of this study was covered by the NDC Support Programme of the United Nations Development Programme (UNDP).