Comprehensive review of carbon quantification by improved forest management offset protocols

Improved forest management (IFM) has the potential to remove and store large quantities of carbon from the atmosphere. Around the world, 293 IFM offset projects have produced 11% of offset credits by voluntary offset registries to date, channeling substantial climate mitigation funds into forest management projects. This paper summarizes the state of the scientific literature for key carbon offset quality criteria—additionality, baselines, leakage, durability, and forest carbon accounting—and discusses how well currently used IFM protocols align with this literature. Our analysis identifies important areas where the protocols deviate from scientific understanding related to baselines, leakage, risk of reversal, and the accounting of carbon in forests and harvested wood products, risking significant over-estimation of carbon offset credits. We recommend specific improvements to the protocols that would likely result in more accurate estimates of program impact, and identify areas in need of more research. Most importantly, more conservative baselines can substantially reduce, but not resolve, over-crediting risk from multiple factors.


1. Introduction
Forests play a critical role in meeting greenhouse gas mitigation objectives with their potential to store large quantities of carbon and to act as an ongoing sink removing carbon from the atmosphere (Griscom et al., 2017; Fargione et al., 2018; Austin et al., 2020). Forest climate change mitigation activities generally fall into three broad categories: conserving existing forests; increasing forest extent through reforestation, afforestation, and agroforestry; and changing the management of existing forests to increase carbon in forests and forest products (improved forest management, IFM). Opportunities for increasing carbon sinks generally fall within the latter two categories, while forest conservation is focused on protecting existing forest carbon storage. Forest carbon activities can also have a range of ecosystem and societal co-benefits, including maintaining and enhancing biodiversity and providing forest products (Kremen and Merenlender, 2018).
Around the world, 293 carbon offset projects to date have channeled substantial carbon funding into improved forest management (So et al., 2023). Offsets are seen as a critical source of funds for IFM and an important alternative mitigation option to high-cost and hard-to-abate sources of emissions. This paper examines how well currently used IFM carbon offset protocols align with the scientific literature on carbon accounting, forest management, and land use change and how they can be amended to more accurately estimate program carbon benefits.
Studies suggest that IFM has the potential to increase carbon stocks by 0.2-2.1 Gt CO2e/year globally (Griscom et al., 2017; Roe et al., 2019; Austin et al., 2020) without compromising the fiber and ecosystem co-benefits provided by managed forestlands. IFM includes a broad range of practices that increase carbon in forests and forest products (see Ontl et al., 2020; Ameray et al., 2021; Kaarakka et al., 2021 for detailed reviews of the range of IFM practices). For example, extending rotations can increase carbon stored on the landscape with continued or increased timber production for forests managed below maximum productivity (Sohngen and Brown, 2008; Foley et al., 2009; Nunery and Keeton, 2010). Reduced-impact logging in tropical forests can reduce forest degradation and increase or preserve soil carbon stocks, making forestry more sustainable and the conversion to agriculture less likely (Sasaki et al., 2016; Nabuurs et al., 2017; Ellis et al., 2019). Improved forest management can also make forests less susceptible to future carbon reversals from wildfire, drought, and pests (Anderegg et al., 2020).
In regulatory and voluntary carbon offset markets, carbon registries establish offset protocols that define project eligibility criteria and methods for monitoring and calculating the carbon impacts of each participating project. The registries also require third-party verification and issue offset credits. Each offset credit should represent one metric ton of carbon dioxide (tCO2) emissions reduced or removed from the atmosphere. The protocols set the standard for the quality of the carbon offsets and their design allocates carbon financing toward eligible project types. Offset quality, the degree to which offset credits represent real emissions reductions and removals, is determined by protocol rules around additionality (would the project activities have occurred without the offset income?), counterfactual baselines (what would have happened without the offset income?), leakage (does the project cause increased emissions outside of project accounting boundaries?), durability (is the risk that stored carbon will be released back into the atmosphere managed and accounted for?), and carbon accounting (are the methods for monitoring and calculating carbon stocks, fluxes, and process emissions accurate and conservative?).
Peer-reviewed and non-peer-reviewed studies of IFM offset projects and protocols have shown evidence of over-crediting and non-conservative methodological rules. Studies of the California Air Resources Board (ARB) forest offset protocol found that the protocol is likely to significantly over-generate credits due to its methods for assessing project baselines (Badgley et al., 2022b; Coffield et al., 2022), leakage (Haya, 2019), and risk of reversal (Anderegg et al., 2020; Badgley et al., 2022a), as well as to create incentives counter to long-term carbon stability in fire-prone areas (Herbert et al., 2022). Several peer-reviewed and investigative case study analyses of projects using different IFM protocols identified substantial over-crediting (van Kooten et al., 2015; Elgin, 2020; Koberstein and Applegate, 2021).
Offset quality is essential for four main reasons. First, polluters often purchase offsets instead of directly reducing their own emissions. When used this way, offsets do not reduce emissions but rather trade where emissions reductions occur. When more offsets are generated than the program's actual climate benefits, they can reduce overall climate action. Second, when forest carbon is used to offset fossil fuel or other greenhouse gas emissions, offsets trade a known quantity of emissions for a much less certain and less durable quantity of reductions or removals (Haya, 2010; Haya et al., 2020). Third, the protocols send investment signals into the offset market sectors. If protocols result in over-crediting, climate mitigation funds will be over-allocated into less valuable activities. Fourth, over-crediting also creates a credibility problem for the offset market as a whole, undermining its ability to continue to direct private funds into effective climate mitigation. It is therefore critical that IFM protocols reflect current science and conservatively account for uncertainties.
To our knowledge, no study has yet comprehensively compared IFM offset protocols to the science of carbon accounting, forest management, and land use change to assess offset quality at the protocol level. The objective of this study is to qualitatively compare the IFM offset protocols against the scientific literature on quantifying IFM carbon impacts, with a particular focus on additionality, baselines, leakage, durability, and forest carbon accounting. Each section and our concluding discussion describe specific ways that the protocols can be improved to avoid overcrediting and to effectively support improved forest management practices that increase carbon storage in existing forests.

1.1. Background
Three voluntary offset market registries have generated the vast majority of IFM offset credits globally to date: American Carbon Registry (ACR), Climate Action Reserve (CAR), and Verified Carbon Standard (VCS). Each has offset protocols generating credits for voluntary use. All three also act as registries for the California Air Resources Board (ARB) offset program, hosting ARB-approved offset protocols and managing the monitoring, reporting, and verification processes for offset credits that can be used by California emitters to meet the state's cap-and-trade emissions targets.
Most IFM protocols were developed by interested stakeholders, including project developers, before the registry put them through a public vetting process. A list of the protocols reviewed for this study, along with the number of projects and credits issued by each, is shown in Table 1. We reviewed all IFM protocols with credits issued on voluntary market registries as of March 2022. While this analysis focuses on voluntary offset registries, governments also issue tradable credits from improved forest management projects, such as the UK Woodland Carbon Code.
Forest projects accounted for 30% of the total offset credits issued by voluntary registries in 2022 (Figure 1, top panel; So et al., 2023), mostly from REDD+ [Reducing Emissions from Deforestation and (forest) Degradation], which is the primary type of avoided deforestation offset (21% of 2022 credits), IFM (6%), and afforestation/reforestation (3%). IFM projects have generated 193 million offset credits since the first credits were issued in 2008. This represents 28% of the total forest-based offset credits and 11% of all offset credits generated. While 293 IFM projects in seven countries have been issued offset credits, nearly all issued credits (94%) were in the United States and most (80%) are registered under the ARB compliance offset protocol (Figure 1, lower panel). Further, IFM projects generated close to half of all offset credits from projects in the United States.
To date, most IFM offset credits across all registries have been generated for reducing forest carbon losses by significantly reducing harvesting compared to the chosen baseline scenarios. While some projects support the types of activities highlighted in the literature as having high IFM potential (e.g., improving forest health for greater productivity and resilience, extended timber rotations, and reduced impact logging), so far the majority of credits are from activities that more resemble conservation and avoided degradation than IFM.
All protocols assess project impacts and the number of credits generated as the difference in carbon emissions and removals in the baseline scenario compared to actual levels. As relevant to the particular type of activity, all protocols take into account the major sources of carbon emissions and sinks affected by IFM projects-onsite carbon loss from logging and forest treatments, forest growth, process emissions (e.g., from equipment), and carbon held in harvested wood products. All protocols include procedures for reducing credits generated by an uncertainty deduction, and all set a proportion of credits aside in an insurance buffer pool which can be used to cover reversals such as from fire. Projects that reduce harvesting compared to the baseline also account for estimated displacement of timber harvesting to other lands (leakage). These carbon accounting factors are all discussed in the following section.

2. Review of quality criteria

2.1. Additionality and baselines
A project's baseline represents land management that most likely would have occurred in the absence of the offset program and is the scenario against which a project's carbon impact is measured. The "true" baseline (counterfactual) is inherently uncertain, because once a project takes place, the baseline cannot be observed. Baseline choice has a large effect on the number of credits issued, so baseline credibility and conservativeness are important to the quality of offset credits (Griscom et al., 2009).
For IFM projects, it is hard to distinguish additionality from baselines. Unlike most types of offset projects that involve a single action in time, such as building a landfill gas capture system, IFM involves a change in practice over the project lifetime. Additionality (would the project activities have occurred without the offset income?) and baselines (what would have happened without the offset income?) are closely related questions. ARB and CAR protocols combine them and treat all divergence from the baseline as additional, while ACR and VCS use separate baseline and additionality assessments (Table 2).

2.1.1. Summary of literature on IFM offset project baselines
Badgley et al. (2022b) documented that most ARB projects define their baseline at, or very close to, the minimum level allowed by the protocol. For most projects the minimum allowed baseline is the regional average carbon stock density for the forest type. Badgley et al. found that many participating projects are composed of species with greater carbon stocks than the regional/forest type average as defined by the protocol. Because carbon stocks often change gradually over space but the minimum baseline is defined regionally, there is a strong incentive to enroll lands with naturally higher carbon stocks than the regional average. Badgley et al. estimated that this has led to over-crediting of close to 30% across the study's projects compared to what would have been credited if a more refined method was used to determine the minimum allowed baseline. Coffield et al. (2022) used remote sensing-based datasets to compare the outcomes of 37 California-based ARB IFM offset projects with similar "control" lands. They found lack of evidence that the offset program influenced land management and therefore lack of project additionality. van Kooten et al. (2015) investigated a large VCS IFM project in British Columbia that assumed a "lumber liquidator" counterfactual, namely that an alternative forest owner would have aggressively logged the forest. van Kooten et al. found that in this case, the chosen baseline created substantially more carbon credits than would have been generated if a more likely sustainable management scenario was used as the baseline.

FIGURE 1
Current trends in forest-based carbon offset markets (based on data from So et al., 2023). The upper panel shows the breakdown of credits issued by project type for forest and non-forest carbon offset projects. The lower panel shows the trend in IFM credit issuances by program/registry.

Qualitative research also has consistently identified problem areas in baseline setting. Several studies identified asymmetric information as a pervasive, inherent problem in baseline setting for IFM projects. Asymmetric information creates uncertainty for the program administrator and third-party verifier but not the project developer, who implements a project with full information (van Kooten et al., 2009; Asante and Armstrong, 2016; Gren and Aklilu, 2016). For example, one study highlighted the trend of pulp timberland acquisitions by real estate investment trusts (REITs) and timber investment management organizations (TIMOs), who aggressively harvest and then sell the land to carbon project developers (Gifford, 2020). The project developers can report a low baseline carbon stocking as a result of the recent harvesting. This is an example of how a complex management history and asymmetric information make accurate baseline-setting difficult.
One study documented how program administrators deflated baselines in order to reduce barriers to entry in IFM projects. The study quoted one project developer stating that "if baselines are set too high, many potential projects will not be viable for participation" (Ruseva et al., 2017). To the best of our knowledge, only one study finds "strong evidence of additionality" of projects under ARB's IFM protocol; it suggests that baseline/additionality criteria may be too strict and may impede projects with "multiple desirable features." However, an expanded discussion by the authors suggested that they based their assessment on their observation that some rather than all projects are likely to be additional (Anderson and Perkins, 2017). Their survey of landowners with IFM projects showed that 5 of 17 (29%) self-reported that they were either not confident or unsure whether the offset credits generated by their projects "represent additional carbon sequestration that would not have happened without the forest offset program."

2.1.2. Description of the protocols

TABLE 2
Baseline and additionality approaches by protocol.
Protocol: ARB and CAR-U.S.
Baseline: 100-year baseline model; aligned with legal and other obligations; must be financially feasible; not lower than common practice if initial stocks are above common practice; otherwise, typically at initial stocks.
Additionality: Standardized approach; any forest carbon above the baseline is considered additional.
Protocol: CAR-Mexico
Baseline: Initial carbon stocks.
Additionality: Standardized approach; any forest carbon above initial carbon stocks is considered additional.
Protocol: ACR
Baseline: Economic baseline; assume harvest to the level that maximizes net present value (NPV) over many rotations.
Additionality: Project-by-project; financial barriers, exceed common practice, exceed regulation.
Protocol: VCS
Baseline: Different baseline approaches (e.g., NPV and historical management).
Additionality: Project-by-project; not most financially beneficial option or experience other barriers, exceed common practice.

2.1.2.1. ARB and CAR-U.S. protocols
ARB and CAR-U.S. protocols define the baseline as the average onsite carbon stocks over a modeled 100-year baseline management scenario that should be no lower than the minimum baseline level allowed (Figure 2). Typical baselines are set at around 30% below initial carbon stocks (calculated from Badgley et al., 2021), and just above common practice (Badgley et al., 2022b). The ARB and CAR-U.S. protocols only require that the baseline scenario is financially feasible and complies with all legal and contractual requirements. Further, the chosen baseline scenario does not need to be shown to be the most feasible or likely without offsets. Setting the baseline below initial or historic carbon stocks raises an over-crediting concern. Instead of being credited for taking action, the forest owner is credited for not taking action that would have reduced the carbon stocks on their lands. In other words, the assumption is that in the absence of offset payments, the landowner would change their management practice in a way that releases carbon. Non-additional crediting has arguably been the most significant quality challenge for carbon offsets generally (Cames et al., 2016; Haya et al., 2020). For the majority of IFM projects with baselines below historical levels, additionality assessment is even more challenging because it is being tested for not taking an action.
In addition, timing of credit generation against the baseline is another quality concern for the majority of these projects. Although baselines are derived from modeled scenarios that are intended to represent realistic harvesting over time (decreasing solid orange line in Figure 2), in the 1st year of the project, project credits are issued against the 100-year-average baseline, which usually represents a sharp, unlikely drop from initial carbon stocks (flat dotted orange line in Figure 2). Thus, even in cases where the baseline is an accurate reflection of the true without-offsets scenario over decades, a large proportion of credits are generated in the 1st year of the project for reductions that will actually take place over a much longer period of time. In effect, this means that future reductions can be used to offset current emissions.
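The timing effect described above can be illustrated with a minimal sketch. It assumes a hypothetical baseline that declines linearly over 100 years and a project that simply holds its initial stock in year 1; real modeled baselines follow harvest and regrowth cycles, and the function names and stock values here are illustrative only, not taken from any protocol.

```python
def modeled_baseline(initial_stock, final_stock, years=100):
    """Hypothetical baseline path: stocks fall linearly from the initial
    stock to `final_stock` by the end of the modeling period."""
    drop_per_year = (initial_stock - final_stock) / years
    return [initial_stock - drop_per_year * t for t in range(1, years + 1)]


def first_year_credits(project_stock, baseline_path):
    """Compare year-1 crediting against the period-average baseline with
    crediting against the year-1 value of the modeled path itself."""
    average_baseline = sum(baseline_path) / len(baseline_path)
    return {
        "against_100yr_average": project_stock - average_baseline,
        "against_year1_of_path": project_stock - baseline_path[0],
    }


# Assumed numbers: stocks of 100 units per hectare falling to 40 units.
path = modeled_baseline(initial_stock=100.0, final_stock=40.0)
credits = first_year_credits(project_stock=100.0, baseline_path=path)
print(credits)
```

With these made-up numbers the average baseline sits about 30% below initial stocks, so roughly 30.3 units are credited in year 1 against the average, compared with about 0.6 units if credits tracked the modeled path year by year.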

2.1.2.2. CAR-Mexico protocol
By using ton-year accounting, the CAR-Mexico protocol is structured differently from all other protocols discussed in this paper. Under this approach, the project developers decide on the length of time they commit to maintaining credited carbon stocks, ranging from one to 100 years. A chosen term of 100 years earns full credits without discounting. Any shorter commitment earns a fraction of the calculated carbon impact such that a 1-year commitment earns 1% of the calculated carbon benefits, and a term of 50 years earns 50%.
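The ton-year discounting described above can be sketched as a simple proportional rule. The text gives three points (1 year earns 1%, 50 years earns 50%, 100 years earns 100%); treating every intermediate term as linear is an assumption of this sketch, and the function names are hypothetical.

```python
def ton_year_credit_fraction(commitment_years):
    """Fraction of the calculated carbon benefit that is credited.

    Assumes a linear rule consistent with the three points stated in the
    text: 1 year -> 1%, 50 years -> 50%, 100 years -> 100%.
    """
    if not 1 <= commitment_years <= 100:
        raise ValueError("commitment must be between 1 and 100 years")
    return commitment_years / 100


def credited_tonnes(calculated_benefit_tco2, commitment_years):
    """Credits issued for a given calculated benefit and commitment term."""
    return calculated_benefit_tco2 * ton_year_credit_fraction(commitment_years)


for term in (1, 50, 100):
    print(term, credited_tonnes(1000.0, term))
```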
Using initial carbon stocks as the baseline is more conservative than other protocols and reduces over-crediting risk. However, flexibility in the term of the commitment increases risk of non-additional crediting. For example, terms that match rotation lengths can potentially earn offset credits without any change to harvest schedules.

2.1.2.3. ACR protocol
The ACR protocol uses net present value (NPV) to set the baseline. Project baselines are typically set to a 20-year crediting period and based on a 100-year NPV-maximizing harvest schedule. In general, the approach of setting the baseline as the scenario that maximizes NPV is sound for landowners who seek to maximize profit over a long term, like industrial forest owners who have access to reliable markets. However, this method may poorly predict the management decisions of other landowners who may manage for multiple goals like ecosystem or recreation benefits (Butler et al., 2016). Even where landowners wish to maximize long-term profit alone, irregular market demand may push them to shift their management away from what a simple NPV analysis would predict (Keegan et al., 2011). For example, small plantation owners in the U.S. Southeast currently have limited access to wood markets and, as a result, have older trees, on average, than is economically optimal (Grove et al., 2020). In addition, NPV calculations are based on internal costs, which can be difficult for verifiers to verify.
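To make the idea of an NPV-maximizing harvest schedule concrete, the following is a minimal sketch and not the ACR procedure. It assumes a hypothetical growth curve, a single timber price, no harvest or regeneration costs, and clear-cutting at a fixed rotation within a 100-year horizon; all parameter values and function names are illustrative assumptions.

```python
import math


def stand_volume(age, v_max=500.0, k=0.04):
    """Hypothetical growth curve: merchantable volume at a given stand age."""
    return v_max * (1.0 - math.exp(-k * age)) ** 3


def schedule_npv(rotation, price=50.0, rate=0.05, horizon=100):
    """NPV of clear-cutting every `rotation` years within the horizon.

    Each harvest yields price * volume at rotation age, discounted back
    from the year in which it occurs.
    """
    return sum(
        price * stand_volume(rotation) / (1.0 + rate) ** year
        for year in range(rotation, horizon + 1, rotation)
    )


def npv_maximizing_rotation(candidates=range(10, 101, 5), **kwargs):
    """Pick the candidate rotation length with the highest NPV."""
    return max(candidates, key=lambda rotation: schedule_npv(rotation, **kwargs))


best = npv_maximizing_rotation()
print(best, round(schedule_npv(best), 2))
```

A landowner who manages for recreation or faces weak local markets would not necessarily follow the schedule such a calculation selects, which is the concern raised in the paragraph above.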

2.1.2.4. VCS protocols
The VCS IFM protocols use multiple approaches to baseline-setting, including historical baselines, legal baselines, common practice baselines, and baselines based on documented management activities. Therefore, there are multiple pathways for establishing a baseline within a single protocol, and these approaches can be applied with more or less rigor. Such flexibility is logical given the diversity of lands that might seek to enroll. However, it also allows project developers to pick the most advantageous baseline, which may lead to over-crediting. Such flexibility means potential offset credit buyers must conduct enhanced diligence to determine how appropriate the chosen baseline is.
VCS uses two additionality tools for its forestry projects which both closely mirror the Clean Development Mechanism (CDM) approach to additionality testing. Landowners must demonstrate that the project is not the most cost-effective land management approach or that other barriers would have prevented the landowner from carrying out the land management credited under the offset project. The landowner must also demonstrate that the credited land management approach is not common practice. In general, these tests have proven to be insufficient in ensuring the additionality of CDM projects (Haya, 2010; Cames et al., 2016), leaving additionality to be assessed primarily with baseline-setting as with the ARB and CAR-U.S. protocols.

2.1.3. Persistent issues and baseline recommendations
Where good data on forest harvest exist, baseline uncertainty can potentially be reduced and conservativeness increased by developing baselines on historical practice, initial carbon stocks, similar lands with "dynamic" baselines, and NPV for landowners where NPV is reasonably predictive with some restrictions.

FIGURE 2
A sample ARB project and baseline scenario based on a project in Oregon (ACR ). The pattern shown is similar in many other IFM offset projects. "A" represents the credits generated in the 1st year of the project from the difference in actual onsite carbon stocks compared to the 100-year-average baseline. "B" represents the credits generated in subsequent years of the project from forest growth.

When NPV is used as the baseline, project developers should describe their capacity to harvest at this level and also the market conditions and mill capacity to absorb this harvest. Project developers wishing to use NPV can justify their case by demonstrating that they have a strong history of harvesting on similar lands, or better yet, can demonstrate a history of NPV harvesting on that project property. For projects that cannot demonstrate NPV-type harvest schedules, NPV is likely inappropriate.
Baselines that reflect current carbon stocking of the participating parcel are usually more conservative than broad regional averages. Such baselines only credit removals through growth.
When past management actions are used as baselines, statistical land use models can be used to provide quantitative estimates on the likelihood of harvest given a project's characteristics (Lewis, 2010). Such models can be used to create credible baselines and importantly, these models can be used to simulate alternative baselines which might reflect different market conditions (Radeloff et al., 2012).
The use of dynamic baselines is similar to control plots in experimental science. In this system, properties similar to the offset property in past management, market conditions, ecosystem, landowner type, etc., can be used as the baseline for offset projects. Matching methods developed for causal inference can be used to create comparison sets (Andam et al., 2008; Ferraro and Hanauer, 2014). Each year, the carbon values of the offset and the baseline properties can be compared, and credits can be issued on the basis of this comparison.
An advantage of dynamic baselines is that by observing similar properties in each year, changing market conditions can be integrated into baselines. For example, consider an offset in an area where mill capacity falls dramatically. Under static baselines, the offset would continue to generate credits, even though in reality there may be no market for timber in the area. Conversely, if a new technology increases the profit of harvesting, more credits could be granted. Dynamic baselines solve this problem by accurately reflecting baseline conditions relative to the project in pre-defined time periods. Such baselines might be particularly useful in areas where markets are in rapid flux, where forest managers cannot show that they have historically managed for NPV, or where land use is rapidly changing.
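A dynamic baseline of this kind can be sketched in two steps: select comparison parcels that resemble the project, then credit each year's difference in carbon stock change. The sketch below assumes simple nearest-neighbor matching on covariates already scaled to comparable units, and an annual credit equal to the project's stock change minus the mean change on matched controls. The parcel data, covariate names, and crediting rule are all hypothetical; published matching methods are considerably more involved.

```python
def nearest_controls(project, candidates, k=2):
    """Pick the k candidate parcels closest to the project on its covariates
    (squared Euclidean distance; covariates assumed pre-scaled)."""
    def distance(parcel):
        return sum(
            (parcel["covariates"][name] - value) ** 2
            for name, value in project["covariates"].items()
        )
    return sorted(candidates, key=distance)[:k]


def annual_credit(project, controls, year):
    """Project stock change in `year` minus the mean change on the controls."""
    def change(parcel):
        return parcel["stocks"][year] - parcel["stocks"][year - 1]
    baseline_change = sum(change(c) for c in controls) / len(controls)
    return change(project) - baseline_change


# Hypothetical parcels: stocks are carbon per hectare at years 0, 1, 2.
project = {"id": "p", "covariates": {"site_index": 0.60, "mill_distance": 0.30},
           "stocks": [100, 102, 104]}
candidates = [
    {"id": "c1", "covariates": {"site_index": 0.60, "mill_distance": 0.35},
     "stocks": [100, 98, 97]},
    {"id": "c2", "covariates": {"site_index": 0.55, "mill_distance": 0.30},
     "stocks": [100, 99, 99]},
    {"id": "c3", "covariates": {"site_index": 0.10, "mill_distance": 0.90},
     "stocks": [100, 80, 60]},
]

controls = nearest_controls(project, candidates, k=2)
for year in (1, 2):
    print(year, annual_credit(project, controls, year))
```

Because the baseline is re-observed each year, a drop in harvesting on the control parcels (for example after a mill closure) automatically shrinks the credit, which a static baseline would not do.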
With all of these options, adverse selection might still lead to over-crediting. Because landowners or project developers will always know more than registries and verifiers about what would have happened without the offset income, adverse selection is a persistent issue. Statistically, adverse selection can be thought of as an unobserved variable that is correlated with the treatment decision (project enrollment) and the outcome (forest harvest). If this unobserved variable is correlated with increased enrollment and decreased forest harvest, the baseline is an overestimate of the true counterfactual. For example, this might be the case where a landowner has a strong conservation ethic and prefers to preserve rather than cut down their trees. A case like this can lead to overcrediting, because such a landowner is unlikely to harvest, even in the absence of the program.
Using historical forest harvest data can help to control conditions that lead to adverse selection, especially if these conditions do not change over time. For example, in the case of a conservation-minded landowner, if they have held similar preferences in the past, a baseline that takes into account their historical harvest levels would not overcredit (even though we cannot measure their land management philosophy). At the same time, a baseline based on regional averages or NPV alone would likely over-credit.
However, while historical baselines can help to account for unobserved variables that do not change over time, they cannot account for cases where the unobserved variable is not static. An example of this could be when a property is inherited or purchased by a new landowner. The application of a historical baseline for a property that had been harvested, but was purchased by a conservation NGO and then later enrolled in an offset program, could lead to over-crediting because the true counterfactual for the new landowner differs from that of the previous landowner.
Dynamic baselines cannot directly account for the problem of adverse selection. To the extent that similar properties also have similar unobserved variables, then matching may reduce the impact of adverse selection. However, there is limited empirical evidence for this. Indeed, using nearby non-enrolled parcels as "control plots" could actually increase the effect of unobserved variables: if some parcels enroll and others do not, then it may precisely be an unobserved variable that is influencing this self-selection, biasing the dynamic baseline in favor of overcrediting.

2.2. Leakage
The Intergovernmental Panel on Climate Change (IPCC, 2007) defines leakage as "the unanticipated increase or decrease in greenhouse gas (GHG) benefits outside of the project's accounting boundary as a result of the project activities." Three types of leakage are relevant for forest-based offset programs: activity leakage, output market leakage, and land market leakage (Meyfroidt et al., 2020). The latter two types of leakage are collectively referred to as market leakage. Activity leakage occurs when mobile factors of production (labor and capital) are no longer needed in the offset program area and are reallocated to similar activities outside of the program area. Output market leakage occurs when changes in harvesting inside the project area affect timber prices and change harvesting outside the project area by non-participating forest managers. Land market leakage occurs when changes in timber harvesting on offset project lands change the value of timber land relative to other land uses and provide incentives for land conversion into managed timber land or from timber land into other uses.
There is no broad agreement on how offset registries should incorporate leakage into their IFM protocols. The approach taken by the protocols is to deduct credits from a project based on a specified leakage rate. The protocols differ in the leakage rate applied, when and how it is applied, and whether the protocols account for activity leakage explicitly. Each of these aspects of leakage is discussed below and summarized in Table 3.

2.2.1. Market leakage rate
All protocols have a mechanism for deducting leakage when timber harvesting is lower in a project relative to the baseline. All protocols use a leakage rate that reflects the assumed percent of onsite carbon loss (or gain) from a change in timber harvesting due to the offset projects that are lost (or gained) in other forests to which the harvesting is displaced.
ACR applies a 10% leakage rate if the project reduces harvesting by 5-25% compared to the baseline, and 40% if reduction in harvesting is more than 25% compared to the baseline. In the ARB, CAR-U.S., and CAR-Mexico protocols, leakage is deducted at a constant rate of 20%. Leakage rates used by all of the VCS protocols reviewed vary based on the carbon density, defined as the ratio of merchantable biomass to total biomass, of the forests where the displaced harvesting is assumed to occur compared to the forest enrolled in the carbon project. If harvesting is expected to shift to a forest with a ratio of merchantable biomass more than 15% lower than the project forest, a higher leakage rate (70%) is applied; if the destination forest produces more than 15% more merchantable biomass, relative to the project forest, a lower leakage rate applies (20%); if displacement occurs in a similar forest type, a 40% leakage rate is applied. VCS's extended rotation protocol (VM0003) also prescribes a 10% leakage rate if the rotation extension is <10 years and the harvest reduction over this time frame is <25%. VCS protocols exclude international leakage from their deduction formulas and allow for project-specific justifications for the application of a 0% leakage rate.
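The protocol rules above can be summarized as a small lookup sketch. Several details are assumptions of this sketch and not statements from the protocols: the ACR rate below a 5% harvest reduction is not given in the text and is treated here as zero, the VCS 15% threshold is interpreted as a relative difference in merchantable biomass ratio, and the VM0003 10% case and the VCS option to justify a 0% rate are omitted. Function names are hypothetical.

```python
def acr_leakage_rate(harvest_reduction):
    """ACR: `harvest_reduction` is the fractional cut relative to baseline."""
    if harvest_reduction > 0.25:
        return 0.40
    if harvest_reduction >= 0.05:
        return 0.10
    return 0.0  # assumption: the text states no rate below a 5% reduction


def arb_car_leakage_rate():
    """ARB, CAR-U.S., and CAR-Mexico: constant 20% deduction."""
    return 0.20


def vcs_leakage_rate(project_ratio, destination_ratio):
    """VCS: compare merchantable-to-total biomass ratios of the destination
    forest and the project forest (15% threshold read as relative)."""
    relative_difference = (destination_ratio - project_ratio) / project_ratio
    if relative_difference < -0.15:
        return 0.70  # destination yields less merchantable wood per unit biomass
    if relative_difference > 0.15:
        return 0.20  # destination yields more merchantable wood per unit biomass
    return 0.40      # similar forest type


def leakage_deduction(carbon_from_reduced_harvest, rate):
    """Credits deducted for displaced harvesting."""
    return carbon_from_reduced_harvest * rate


print(leakage_deduction(1000.0, acr_leakage_rate(0.30)))
print(leakage_deduction(1000.0, arb_car_leakage_rate()))
print(leakage_deduction(1000.0, vcs_leakage_rate(0.50, 0.40)))
```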
The academic literature has estimated forest carbon leakage using two general methods. Partial and general equilibrium models are complex optimization models based on economic theory of how markets function and calibrated to real-world data. Behavioral parameters, such as supply and demand elasticities, are drawn from the economic literature. These models are designed to capture the interconnectedness of different markets. General equilibrium models capture all economic flows within an economy, while partial equilibrium models usually focus in more detail on a subsection of the overall economy. Equilibrium models are generally used for ex-ante economic and policy analysis. Causal econometric models, which are an ex-post evaluation methodology that use statistical techniques to evaluate programs, have been utilized to assign causal attribution to leakage from other project types (e.g., Roopsind et al., 2019), but not IFM programs or projects. Challenges in applying causal inference methods to IFM include difficulty in observing a plausible harvesting counterfactual and the challenge of isolating program effects when so many IFM offset programs are currently being implemented with different rules.
Studies estimating leakage rates from reducing harvest activities have found a wide range of plausible leakage rates depending on different locations, spatial scales, time horizons, and methodological approaches. Some studies focused on national IFM programs (primarily in the United States), while others focused on global estimates. Studies in the United States context showed that leakage rates are generally higher than those commonly used in the protocols. In an econometric study of the effects of an 85% reduction in harvest on public lands in the Pacific Northwest of the United States during the 1990s, Wear and Murray (2004) found substantial evidence of output market leakage as softwood lumber prices increased by 15%. They estimated that nearly 84% of the timber harvest restriction shifted to unrestricted areas. Of that 84% leakage, they found that 43% occurred within the region, 15% in other U.S. markets, and an additional 26% in Canadian markets. Using a general equilibrium model, Gan and McCarl (2007) estimated leakage rates from U.S. forest offset programs to be in the 75-78% range, including both domestic and global leakage.
One challenge in applying rates from the published literature to the protocols is that most studies, rather than quantifying leakage in units of carbon, estimate leakage of another metric like harvested wood products (Wear and Murray, 2004) or economic welfare (Gan and McCarl, 2007). Murray et al. (2004) and Murray et al. (2005) applied modeling frameworks that estimate carbon leakage directly. Murray et al. (2004) showed that domestic leakage rates (ignoring international leakage and focused on carbon instead of timber) for forest offset set-aside programs in the United States can vary from 16 to 68% depending on where the offset occurs in the country and carbon density of the protected forest. Murray et al. (2005) also conducted extensive carbon leakage analysis of forest sector carbon programs but did not focus explicitly on improved forest management, which is the focus of the protocols reviewed here. Sun and Sohngen (2009) used a global economic optimization model and found that set-aside programs applied globally, which permanently reduced the land available for forest harvest, resulted in leakage rates of 47-52%, depending on the specific land taken out of production. Several studies in countries other than the United States showed significant variation in IFM leakage rates. Kallio and Solberg (2018) estimated leakage rates of 60-100% from harvest reduction projects in Norway. While the model had a relatively limited temporal and carbon accounting framework, it found that the variation in leakage rates is driven by the degree of harvest reduction, the type of forest product considered (e.g., pulpwood vs. sawlogs), and the forest product supply elasticity. By contrast, Sohngen and Brown (2004) estimated leakage rates of 2-38% for a Bolivian forest set-aside program. The country-to-country differences were likely driven by each country's integration into global wood product markets.
Based on findings from the literature and factors identified in Murray et al. (2004), leakage risk is likely to be highest in tight timber markets with responsive supply and in regions where non-participating land can produce similar timber products. One important caveat is that the economic equilibrium models used in the academic literature assumed that all actors have perfect information and as a result may slightly overestimate leakage risk in practice when markets are slower to adjust. More research is needed to update and refine understanding of leakage in IFM carbon projects. One particularly important area of future research is in leakage from short-term harvest deferrals.

. . . Activity leakage
There is variation in how the protocols consider market vs. activity leakage. CAR and ARB do not distinguish between market and activity leakage; any activity leakage is effectively included in the 20% market leakage rate. ACR and VCS monitor activity leakage separately. Under both of these registries, if production declines by more than 5% relative to the baseline, the landowner must demonstrate that no leakage occurs on other lands they manage or operate outside of the offset project. Landowners can demonstrate that no activity leakage occurs with historical harvesting records, or forest management plans prepared at least 2 years prior to the start of the project showing no change in harvesting on non-project lands with the implementation of the offset project. ACR includes a third option where landowners can demonstrate that they are not engaging in activity leakage if all lands owned by the landowner are certified as sustainable, such as by the Forest Stewardship Council (FSC).
These requirements prevent the most flagrant forms of activity leakage, but there are plausible cases when activity leakage might still occur. For example, a landowner could write a forest management plan with increased levels of harvesting and then enroll part of their lands in a carbon project 2 years later. As another example, FSC certification does not prevent any increase in harvesting, and thus activity leakage could easily occur on FSC-certified land. On the other hand, cumbersome activity leakage rules may prevent timberland owners from being able to enroll portions of their forest holdings as carbon projects due to the inability to manage unenrolled lands in response to changing wood product markets.

. . . Timing of the leakage deduction
In addition to market leakage rates, the timing of the leakage deduction can have large effects on the number of credits issued. Prior research found that the ARB and CAR-U.S. protocols tend to greatly over-credit at the start of each project, due to a timing mismatch in the construction of the baseline scenario (Haya, 2019;Haya and Stewart, 2019). Most ARB IFM projects start with carbon stocks far above estimated baseline levels; initial carbon stocks 40-50% higher than baseline levels are typical (Haya, 2019). This is based on the assumption that without the offset program, timber would be aggressively harvested, reducing onsite carbon stocks substantially. This initial onsite carbon above the 100-year-average baseline is credited in the first reporting period, promptly generating a large number of credits without requiring any change in land management.
However, the displacement of harvesting (leakage) associated with that large reduction in harvesting is not all deducted in the project's 1st year, but rather is deducted evenly over the 100-year life of the project. This results in over-crediting at the start of the project, which is gradually paid back over the project life. We are not aware of any academic literature that has examined the correct timing of harvest displacements in timber markets. A conservative approach would apply the leakage deduction in the year that harvest was assumed to occur in the baseline and is credited by the project. Haya (2019) estimated that this correction would reduce the number of credits generated by the ARB protocol by 35%, and if the correction were combined with a higher leakage rate of 40-80%, crediting would be reduced by 51-82%. Levels of over-crediting would be even higher if reversals were not adequately monitored and compensated for after the end of the final reporting period in which credits were issued (Haya, 2019). The CAR-Mexico, ACR, and VCS protocols do not have this timing issue.
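The timing effect can be illustrated with a stylized calculation. This sketch assumes a single upfront credited quantity and a fixed leakage rate, and it ignores buffer pool contributions, forest growth, and later reporting periods; the function names and the example values are hypothetical and are not intended to reproduce the estimates in Haya (2019).

```python
def leakage_schedule(credited_reduction, leakage_rate, spread_years, horizon=100):
    """Yearly leakage deductions when the total deduction is spread evenly
    over `spread_years` (1 = deducted entirely in the first year)."""
    total_leakage = leakage_rate * credited_reduction
    per_year = total_leakage / spread_years
    return [per_year if year < spread_years else 0.0 for year in range(horizon)]


def first_year_net_credits(credited_reduction, leakage_rate, spread_years):
    """Credits issued in year one after that year's leakage deduction."""
    schedule = leakage_schedule(credited_reduction, leakage_rate, spread_years)
    return credited_reduction - schedule[0]


if __name__ == "__main__":
    initial_above_baseline = 1000.0  # tCO2 credited in year one (hypothetical)
    rate = 0.20                      # constant rate used by ARB and CAR

    spread = first_year_net_credits(initial_above_baseline, rate, spread_years=100)
    upfront = first_year_net_credits(initial_above_baseline, rate, spread_years=1)
    print("deduction spread over 100 years:", spread)
    print("deduction taken in year one:   ", upfront)
```

Both schedules deduct the same total over the project life. The difference is that the even spread defers nearly all of the deduction, so almost the full upfront quantity is issued as credits in the first year.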
Leakage can also result in positive carbon outcomes when the project increases timber harvesting, thus leading to less harvesting elsewhere. Most protocols do not account for this reverse leakage from increased harvesting compared to the baseline, which is a form of conservativeness built into the protocols. The exception is the CAR protocols, which allow reverse leakage to be counted if cumulative leakage from the project start is positive. While accounting for leakage annually is more conservative, cumulative leakage accounting may create more incentive for forest owners to decrease harvesting temporarily and conduct thinning to enable increases in harvesting later from an older, better managed forest.

. . . Recommendations on leakage
Leakage is a complex economic phenomenon that is both hard to quantify and likely varies considerably across many dimensions, including IFM project type, location, and supply and demand conditions. The risk of over-crediting due to leakage would be reduced considerably if baselines were set more conservatively as described above. More conservative baselines that involve no or little difference in harvesting compared to the project would involve lower estimates of leakage, and so uncertainty in the leakage rate would have less impact on the number of credits generated. ARB and CAR-U.S. protocols, which attribute leakage evenly over 100 years, are likely to over-credit significantly in the 1st year of each project that chooses a baseline lower than initial carbon stocks (which is the case for most projects). This source of over-crediting can be easily removed if leakage were deducted at the same time that the onsite benefits of reducing harvest are credited.
Current literature does not provide much guidance on the appropriate leakage rate to apply in specific contexts. Generally, the literature supports higher leakage rates than are currently used, although there are only a few studies that are mostly decades old and based on national or global economic equilibrium models or statistical evidence from large policy changes. For projects that reduce harvesting permanently, a higher leakage rate than those used by current protocols would be conservative given the large uncertainties. However, there is a risk that large, immediate leakage deductions may discourage extended rotation projects with only temporary leakage risk. This may be partially remedied without overcrediting by assuming leakage plays out over several years. This would strike a balance between the ARB and CAR-U.S. protocols (which average baseline harvesting, and therefore leakage deductions, over 100 years), and ACR, VCS, and CAR-Mexico protocols (which deduct leakage immediately). In addition, assessing leakage cumulatively would better reflect the impact of projects that defer rather than reduce harvesting. Currently only the CAR protocols credit projects for reverse leakage when increased harvesting compared to the baseline is likely to cause less harvesting elsewhere. These credits can be earned if cumulative emissions from leakage over the project lifetime are still positive. Lastly, discretion for projects to choose the leakage rate, as offered by all VCS protocols reviewed, has the potential to lead to under-counting leakage impacts.

. . Durability
Carbon stored in ecosystems is inherently impermanent. Forest carbon can be released through natural occurrences like fire, drought, disease, and wind, and through human actions like harvesting and land use conversion. Protocols address these risks of reversal with commitments to maintain carbon storage over a designated period (the project term), incentives to design projects to reduce reversal risk, and recourse if reversals do occur.
The project term describes the length of time during which a project is contracted to maintain credited carbon stocks. Some protocols create incentives for forest management that reduces reversal risk. All registries host an insurance buffer pool to replace credits if a reversal does occur. Buffer pool contributions are designed to cover the calculated likelihood that those carbon stocks will be reversed, i.e., re-emitted to the atmosphere. Programs and projects vary widely across project term, risk of reversal, and reversal recourse.
The reviewed protocols have varied project terms that range from a year to a century (Table 4). The CAR-U.S. and ARB forest offset protocols have the longest project terms: 100 years from the date of credit issuance. By contrast, other protocols define the project term from the project start date rather than from the last credit issuance. For example, a VCS project with a term of 30 years may generate credits in year 20 that are only guaranteed for the remaining 10 years.
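The difference between the two term definitions can be made concrete with a small sketch. The function name and its parameters are hypothetical; the two cases simply restate the examples in the paragraph above.

```python
def guaranteed_storage_years(issuance_year: int, term_years: int,
                             term_from_issuance: bool) -> int:
    """Years of storage guaranteed for a credit issued in `issuance_year`.

    term_from_issuance=True  : the term restarts at each credit issuance
                               (as described for CAR-U.S. and ARB).
    term_from_issuance=False : the term runs from the project start date.
    """
    if term_from_issuance:
        return term_years
    return max(term_years - issuance_year, 0)


if __name__ == "__main__":
    # A 30-year term counted from project start, credit issued in year 20.
    print(guaranteed_storage_years(20, 30, term_from_issuance=False))
    # A 100-year term counted from credit issuance, credit issued in year 20.
    print(guaranteed_storage_years(20, 100, term_from_issuance=True))
```

Under a start-date definition, credits issued late in a project carry progressively shorter guarantees, which is one reason nominal project terms are hard to compare across protocols.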
For large registries, buffer pools can be made up of a large, diverse pool of credits that offer significant risk mitigation for individual projects. Each protocol has a different approach to allocating buffer pool credits, and the protocols distinguish between intentional and unintentional reversals. Intentional reversals can include, for example, negligence on the part of the project developer or active harvesting. Unintentional reversals include natural reversals, like fire and disease, and human-caused reversals that are outside the control of the project operator. Notably, the ACR and VCS buffer pools can be used to cover both intentional and unintentional reversals, while ARB, CAR-U.S., and CAR-Mexico buffer pools can only be used to cover unintentional reversals. Under these protocols, intentional reversals must be replaced. VCS allows a portion of buffer pool credits to be returned to the salable credit pool if the risk of reversal within the project lifetime can be shown to decline over time.
. . . Do the protocols adequately ensure durability?
Project terms are highly variable across protocols, but even the longest term (100 years) does not constitute a truly permanent offset equivalent to reducing fossil fuel emissions. Forest credits used to offset fossil fuel emissions convert carbon permanently stored as fossil fuels into carbon stored in trees in the short-term carbon cycle. If the end of a project term represents a reversal event, then nonpermanent carbon storage (like all IFM projects) can more accurately be understood as delaying, not fully neutralizing, emissions (Herzog et al., 2003). Decisions about the appropriate duration of carbon storage fundamentally depend on assumptions about the future, and academics have called the default choice of 100 years "political" (Archer et al., 2009;Allen et al., 2016). In practice, project terms in IFM projects can range from 1 to 100 years, and there is not yet a widely adopted framework for comparing these different terms. Even taking for granted that these projects do not represent permanent offsets, questions remain about whether the current approach (relying on buffer pools) can achieve the promised durability. Three key limitations of buffer pools could critically undermine their usefulness. First, none of the reviewed protocols take climate change into account in estimating buffer pool allocations and so may not reflect increasing risks of reversal over decadal time scales. For example, the ARB protocol for U.S.-based projects includes a buffer allocation of 2-4% for fire, 3% for biotic risks, and 3% for "other episodic catastrophic events" (e.g., drought). However, because annual acreage of forest fires in the United States is projected to quadruple by the end of the century even under a moderate emissions scenario (Anderegg et al., 2022), current buffer pool allocations may prove insufficient on the basis of wildfire risk alone. 
If recent wildfire trends continue in the United States, the entirety of the buffer pool for existing ARB projects will be consumed well before its intended lifetime is up (Badgley et al., 2022a). The ACR and VCS protocols have similarly low buffer allocations for natural disturbances, although no systematic assessment of these buffer pools has been conducted in the academic literature. A proposed VCS risk calculation tool may remedy this by using Climatic Impact Drivers (CIDs) to project increased risk.
Second, some registries may not have a sufficiently diversified offset portfolio to effectively mitigate risk through the buffer pool mechanism. Such systemic risks may arise when a large proportion of projects in a registry are similar and/or exist in a constrained geographic area or ecological type. For example, the ARB compliance offset pool, which is composed mostly of IFM projects entirely in the United States (Badgley et al., 2022b), may be exposed to systemic forest risks that decrease the efficacy of the buffer pool as a risk mitigation tool.
Third, a buffer pool is defined by the quality of its constituent credits. Buffer pools composed of low quality credits have little value. Extensive work has shown systematic issues with additionality, baselines, leakage, and carbon accounting for land-based offset projects across protocols (e.g., Haya, 2019;West et al., 2020;Badgley et al., 2022b). Further, the ACR protocol allows project developers to put credits into the buffer pool from any ACR project (not just the project under consideration), which creates a perverse incentive to fill the buffer pool with low-value, potentially non-additional credits.

. . . Recommendations on durability
Broadly, climate change is expected to push forest systems toward younger, shorter, less carbon-dense forests (McDowell et al., 2020). These future forests are expected to have higher rates of mortality due to climate-exacerbated disturbances, making the carbon they store less durable (Anderegg et al., 2020; McDowell et al., 2020). Many types of disturbances are expected to increase in both frequency and severity. Offset registries should incorporate these increasing risks into the rules defining buffer pool allocations. If possible, reversal risk should be defined in a spatially explicit way to reflect the fact that different types of risks vary tremendously depending on the location, species composition, and stand structure (Anderegg et al., 2020). Further, existing protocols give minimal incentive to reduce disturbance hazards and could be updated to more actively reward management activities like prescribed burning, species selection, and thinning that increase resistance to reversals (Stephens et al., 2020; Herbert et al., 2022). New time accounting frameworks have been proposed to clarify the value of shorter project terms. These fall into two broad categories: vertical and horizontal stacking of offset credits. Vertical stacking approaches, which include ton-year accounting like that used by the CAR-Mexico protocol, involve purchasing multiple short-term credits upfront to offset emitted CO2. The multiple approaches to vertical stacking can have highly varied results depending on which assumptions are made (Levasseur et al., 2012; Groom and Venmans, 2022) and have been criticized for simply postponing climate impacts (Carton et al., 2021).
Horizontal stacking, sometimes called offset rental or leasing, involves repeat purchasing of offset credits after they expire or after a reversal occurs (Herzog et al., 2003), which, if adequately enforced, could ameliorate some of the challenges of short durability terms.

. . Carbon accounting
Carbon accounting in the context of IFM protocols includes a variety of measurement and estimation techniques that attempt to accurately and precisely quantify carbon stocks in biomass and harvested wood products, as well as changes in these stocks that result from project activities (Table 5). Major sources of uncertainty in estimating onsite carbon stocks in the biomass pools fall into four categories: (i) accuracy of measurements in the field; (ii) choice of allometric models (including selection of wood density values and root:shoot ratios); (iii) sampling uncertainty related to plot size; and (iv) sampling uncertainty related to statistical representativeness of the plots within the whole landscape (Chave et al., 2004;Temesgen et al., 2015). For the soil and litter pools, substantial uncertainty exists around both the processes of organic carbon cycling, as well as accurately quantifying highly variable carbon stocks across space. Lastly, uncertainty surrounding carbon benefits from harvested wood products primarily relates to life cycle considerations, such as duration of use or potential climate benefits from product substitution.
All protocols include estimation of carbon stocks in aboveground and belowground biomass, with the exception of the VCS protocol for the Conversion of Logged to Protected Forests (VM0010), which presumes that root biomass is likely to remain constant or moderately increase. Typically, when a carbon pool is excluded from project-level carbon accounting, the decision is justified by an assumption that the change in the pool will be negligible under approved project activities, or will result in net carbon accumulation and thus can be excluded for conservative estimation. For example, in the context of the soil carbon pool, the stock is only estimated and included in project emissions to subtract losses from disruptive management activities or site preparation from a project's carbon benefit. Carbon pools with relatively smaller stocks compared to living tree biomass, such as standing or lying dead biomass or aboveground non-tree vegetation, are included or excluded on the basis of whether the activities eligible under the protocol are likely to have significant impacts on these stocks.
We discuss the protocol methods for estimating carbon in aboveground biomass, belowground biomass, soil carbon stocks, and harvested wood products in the following sections. Further, we identify several accounting practices that may be uncertain or yield systematic errors in carbon accounting.

. . . Aboveground biomass
The protocols employ standardized approaches to measurement of aboveground carbon stock changes. High-level guidance from the IPCC tends to distinguish between "stock change" and "flux" approaches to measuring carbon sources and sinks. While "flux" approaches measure GHG exchanges to and from forested systems, "stock change" approaches quantify carbon stocks across pools as well as the changes in them. The protocols that we reviewed primarily use stock change approaches, which include plot-based inventories with extrapolation to the project area, field measurement of trees, and use of allometric equations (which describe non-linear relationships between a tree's biomass and its more easily measured parameters, such as its height and/or diameter).
The protocols tend to provide appropriately rigorous, high-level guidance on inventory design under a stock change approach that aligns with recommendations from the IPCC (2019). Forest structure and composition (and thus aboveground biomass) can be highly variable. The protocols allow flexibility in carbon accounting such that project developers can adapt methods to local conditions and efficiently conduct monitoring, reporting, and verification. Protocols allow either permanent or temporary sample plots (ACR, ARB) as well as stratified random or systematic random plot designs (CAR-U.S.). Both approaches can produce unbiased and precise estimates of aboveground carbon stocks, but will depend on local forest structure and composition as well as the field inventory design used. IFM projects in regions with fewer relevant datasets may use less appropriate allometric equations and thus less robust estimates of aboveground biomass (Yuen et al., 2016). Depending on the methods used, overestimation of aboveground carbon stocks can occur (Clough et al., 2016), but this is likely to be less consequential to the overall validity of a forest carbon project than other considerations (e.g., baselines and leakage).
Methods for quantifying forest carbon stocks and their changes are rapidly evolving, including through the integration of field-based methods and remote sensing. Although challenges associated with accurately measuring changes in below-canopy forest structure for some remote sensing types (e.g., optical imagery) may limit their application to IFM projects (Asbeck and Frey, 2021), we expect technological advances to improve their future utility. However, a full discussion of these future opportunities is beyond the scope of this study, and we refer the reader to other reviews of the topic (Goetz and Dubayah, 2011; Xiao et al., 2019).

. . . Belowground biomass
Belowground biomass refers to living roots, typically comprising 15-25% of total living biomass in a forest (Jackson et al., 1996). The belowground biomass pool does not include soil carbon, microbial carbon, or dead roots (although living roots contribute directly to each of these other pools via complex processes including root death, root exudates, and interactions of mycorrhizal fungi). Belowground biomass estimation models vary widely across protocols. Because empirical measurement of belowground biomass is difficult and time-consuming (requiring excavating, cleaning, sorting, and weighing roots), belowground biomass is estimated indirectly based on aboveground biomass measurements. The IFM protocols estimate belowground biomass using allometric equations or root:shoot ratios, which are inherently unable to capture detailed natural variation and, additionally, may introduce systematic errors by being inappropriately matched to the system in question (Ledo et al., 2018). Root:shoot ratios assume that belowground biomass occurs in a fixed ratio to aboveground biomass, whereas allometric equations allow for non-linear relationships. VCS protocols tend to provide the greatest flexibility in ratio selection for belowground biomass estimation. VCS establishes basic criteria for eligible models, including peer-review, appropriate parameterization, and consistency with the original scope of the study. Regions with more abundant literature documenting root:shoot ratios enable developers to select estimates that produce the greatest number of credits. For example, VM0003 allows for use of the standard root:shoot ratios cited in Cairns et al. (1997), or any root:shoot value from research literature or national inventories with comparable climate and forest type. VM0012 is more stringent, requiring the use of the Cairns et al. ratios unless project-specific measurements have been taken. VM0010 is the only protocol that excludes belowground biomass entirely.

e Dead wood stocks can be excluded unless the project scenario produces greater levels of slash than the baseline and slash is burned as part of forest management. If slash produced in the project case is left in the forest to become part of the dead wood pool, dead wood may be excluded. Project proponents may elect to include the pool (where included the pool must be estimated in both the baseline and with project cases) as long as the dead wood pool represents <50% of total carbon volume on the site in any given modeled year.
f The protocol provides an approach for accounting for this pool, but also allows for exclusion of wood products if transparent and verifiable information can demonstrate that carbon stocks in wood products are rising faster in the project case than in the baseline or are decreasing faster in the baseline than in the project case.
g Dead wood from logging (slash) is included in the baseline.
Both CAR and ARB require that projects in Washington, California, and Oregon use the Cairns et al. ratios. For other contiguous states, CAR and ARB protocols provide region-specific component ratio methods (which further divide aboveground and belowground biomass into subcompartments). ACR requires use of USFS merchantable volume equations tailored for region and species, which are then extrapolated to belowground biomass using ratios in Jenkins et al. (2003).
Because relatively little empirical belowground biomass data exists for validating either the allometric or root:shoot ratio approaches, it is not well-understood which of these approaches is preferable, what magnitude of error they may introduce, and whether they systematically over- or underestimate belowground biomass according to vegetation type, region, or climate regime (Xing et al., 2019). Across protocols, the Cairns et al. (1997) and Jenkins et al. (2003) reviews underpin nearly all belowground biomass estimates in IFM projects. Efforts to "spot-check" the validity of these simple modeling approaches have sometimes revealed large errors: for example, Xing et al. (2019) used empirical data to reveal that a root:shoot ratio approach overestimated belowground biomass in a Canadian poplar forest by between 18 and 42%.
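The structural difference between the two estimation approaches can be sketched as follows. All parameter values here are hypothetical and chosen only for illustration; they are not the coefficients of Cairns et al. (1997), Jenkins et al. (2003), or any protocol. The power-law form is one common shape for an allometric relationship and is assumed here.

```python
def bgb_root_shoot(agb: float, root_shoot_ratio: float) -> float:
    """Belowground biomass as a fixed fraction of aboveground biomass."""
    return root_shoot_ratio * agb


def bgb_allometric(agb: float, a: float, b: float) -> float:
    """Belowground biomass from a power-law allometric model, a * agb**b.

    With b < 1 the implied root:shoot ratio declines as the stand gets
    denser, which a fixed ratio cannot represent.
    """
    return a * agb ** b


def percent_error(estimated: float, measured: float) -> float:
    """Relative error of an estimate against a field measurement, in %."""
    return (estimated - measured) / measured * 100.0


if __name__ == "__main__":
    ratio = 0.25        # hypothetical fixed root:shoot ratio
    a, b = 0.50, 0.85   # hypothetical allometric parameters
    for agb in (50.0, 150.0, 300.0):  # aboveground biomass, Mg/ha
        fixed = bgb_root_shoot(agb, ratio)
        curve = bgb_allometric(agb, a, b)
        print(agb, round(fixed, 1), round(curve, 1),
              round(percent_error(fixed, curve), 1))
```

Because the two models diverge as aboveground biomass changes, the choice between them (and the choice of ratio) can shift credited carbon even when the field inventory is identical.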

. . . Soil carbon
Soils comprise 56% of the carbon stock within managed ecosystems across the United States, and 80% of the terrestrial carbon pool globally (Lal, 2008; Domke et al., 2017). IFM protocols rarely require the measurement or estimation of soil organic carbon (SOC) stocks and fluxes due to the assumption that changes in the soil pool are negligible relative to credit volumes and due to the considerable expense and logistical challenge of measuring the soil carbon stock accurately and comprehensively (Paustian et al., 2019). ACR and VCS IFM protocols fail to account for advances in soil science, and potentially omit declines in SOC caused by certain IFM practices. In some instances this omission could enable over-crediting by neglecting substantial losses in soil organic matter that are likely not recuperated during the crediting period (Johnson and Curtis, 2001; Jandl et al., 2007; Noormets et al., 2015; Johnson and Henderson, 2018). A growing body of literature indicates that site preparation and ongoing management can cause significant disturbance to soil stocks, especially in litter, organic, and topsoil carbon pools, partially eroding the benefits of biomass stock increases (Jandl et al., 2007; Achat et al., 2015). In the crediting context, the primary consideration should be whether soil disturbance and SOC stock declines under IFM exceed the baseline. Some IFM practices, such as extended rotation and retention of coarse woody material, are unlikely to yield significant or persistent changes in the soil carbon stock, and may prevent SOC losses that may have occurred under the baseline (Mayer et al., 2020). In contrast, mechanical site preparation, such as thinning, planting, removal of brush or shrubbery, or partial harvesting, may have significant and long lasting negative impacts on the SOC pool (Walmsley and Godbold, 2010; Zhang et al., 2018). The CAR and ARB protocols most appropriately and conservatively include these fluxes by requiring that projects with site preparation, harvesting, or treatment (deep ripping, furrowing, or plowing where soil disturbance exceeds 25% of project area or is not done on contours) estimate the loss of soil carbon as a product of biomass removal, mineral soil exposure, and frequency of disturbance. Estimated carbon stocks and losses are calculated using predetermined coefficients, which are determined by the soil order, harvesting intensity, disturbance frequency, site treatment, and tree type composition. This is aligned with a growing body of evidence demonstrating that harvesting can yield losses between 8 and 11% in the top meter of soil (James and Harrison, 2016). Similarly, thinning and removal of dead biomass reduce organic matter inputs, compact topsoil, mix soil layers, and reduce the total SOC stock (Mayer et al., 2020; Kaarakka et al., 2021).
These impacts are most substantial in the organic layer and topsoil (0-10 cm) even under conventional thinning practices, demonstrating losses of ∼25 and 5% of total SOC stock 10 years after management, respectively (Achat et al., 2015). SOC stocks are not homogenous and can be considered relatively recalcitrant or labile depending on the degree to which the carbon is mineral-associated or particle-associated organic matter (Lavallee et al., 2020). On average, the top 20 cm of forest soils in the United States contain ∼230 tCO2/ha (Cao et al., 2019), thus a loss of 15% of this stock across only 20% of the project area may reduce total project credits on the order of 7 tCO2/ha. For context, across the 74 projects reviewed by Badgley et al. (2022b), credit issuances averaged 73 tCO2/ha, implying an average project could over-credit by 10% or more without violating CAR or ARB SOC stock estimation requirements. However, this is only relevant to crediting outcomes if the SOC stock under IFM declines more substantially than the baseline, which is unlikely in projects that involve a reduction in harvesting.
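The order-of-magnitude estimate above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the variable names are ours and it is not part of any protocol's calculation method.

```python
# Worked arithmetic for the soil carbon example in the text.
# All inputs are the figures quoted above; the variable names are illustrative.

soc_top20cm = 230.0    # tCO2/ha in the top 20 cm of US forest soils (Cao et al., 2019)
loss_fraction = 0.15   # share of that stock lost where soil is disturbed
area_fraction = 0.20   # share of the project area that is disturbed
avg_issuance = 73.0    # tCO2/ha average credit issuance (Badgley et al., 2022b)

# Project-wide SOC loss, averaged over the whole project area.
soc_loss = soc_top20cm * loss_fraction * area_fraction

# The same loss expressed as a share of the average credit issuance.
share_of_credits = soc_loss / avg_issuance

print(f"SOC loss: {soc_loss:.1f} tCO2/ha")
print(f"Share of average issuance: {share_of_credits:.1%}")
```

Under these inputs the loss is about 7 tCO2/ha, or roughly a tenth of the average issuance. A larger disturbed area or a larger loss fraction would scale the result proportionally.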
Only CAR and ARB allow for the inclusion of the SOC pool, and require it if the stock is likely to decline due to site preparation disturbances or other management activities. Appropriately, none of the IFM protocols include an option for additional crediting from increases in SOC. All VCS and ACR IFM protocols presume that impacts on soil carbon would be negligible or positive relative to the baseline. To rigorously incorporate the impact of SOC losses within IFM projects, protocols would need to quantify not only the impact of project management practices, but also the alternative impact to the soil carbon stock under the baseline scenario.

Harvested wood products (HWPs)
The harvest of biomass for use in wood products is included in all reviewed IFM protocols with the exception of the CAR-Mexico protocol, whose projects are not expected to significantly alter the production of wood products. The ARB, ACR, CAR-U.S., and VCS protocols all offer detailed methodologies for estimating the carbon stock stored in wood products. The methodologies require an estimate of the carbon stock for both baseline and project HWPs. In general, they follow a similar process where project proponents must estimate (a) the volume of timber removed in the project and baseline scenarios, (b) the merchantable carbon in these HWPs, (c) the carbon loss due to mill processing, and (d) the decay of HWP carbon in final products and landfills over a 100-year horizon. This decay rate varies based on the lifetime of the product category.
For example, in ARB and CAR-U.S. projects, carbon in HWPs is annualized across a 100-year decay function to generate a HWP "storage factor." This means that each year, carbon flowing into the HWP pool is immediately discounted to its 100-year average value. In other words, a large portion of carbon reduced in the forest as a result of harvesting is assumed to instantaneously decay. Since much of that carbon is actually released over decades rather than immediately, carbon benefits and credits are overestimated for the first 50 years of the project if the project harvests less than projected in the baseline scenario, which is the case for most IFM projects. ACR and VCS protocols use similar "storage factor" approaches for estimating carbon in HWPs.
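The difference between a static storage factor and a temporally explicit decay curve can be illustrated with a minimal sketch. It assumes, purely for illustration, that HWP carbon decays exponentially with a 30-year half-life; the protocols use product-specific decay curves, and the names `HALF_LIFE_YEARS` and `fraction_remaining` are hypothetical.

```python
import math

# Hypothetical assumption: HWP carbon decays exponentially with a 30-year
# half-life. Real protocols use product-specific decay curves.
HALF_LIFE_YEARS = 30.0
HORIZON_YEARS = 100


def fraction_remaining(age_years: float) -> float:
    """Fraction of HWP carbon still stored `age_years` after harvest."""
    return math.exp(-math.log(2) * age_years / HALF_LIFE_YEARS)


# "Storage factor": the average fraction stored over the 100-year horizon,
# applied in full in the year of harvest.
storage_factor = (
    sum(fraction_remaining(t) for t in range(HORIZON_YEARS)) / HORIZON_YEARS
)

# First age at which the actual stored fraction drops below the storage factor.
crossover_year = next(
    t for t in range(HORIZON_YEARS) if fraction_remaining(t) < storage_factor
)

for age in (0, 10, 25, 50, 99):
    print(
        f"age {age:2d}: temporally explicit {fraction_remaining(age):.2f} "
        f"vs storage factor {storage_factor:.2f}"
    )
print("crossover year:", crossover_year)
```

Before the crossover year, the wood products hold more carbon than the storage factor recognizes. Because the baseline scenario usually assumes the larger harvest, its HWP stock is understated in those early decades, which inflates the credited difference between project and baseline.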
All of the protocols we reviewed exaggerate the emissions associated with the production of HWPs by ignoring their displacement of more fossil-intensive alternatives. Substitution benefits are typically high where wood displaces construction materials such as steel or concrete (Smyth et al., 2017; Geng et al., 2019) and vary widely for energy products, such as biomass used to generate electricity and heat, based on the product displaced (Cabiyo et al., 2021). Ignoring these benefits results in some over-crediting and also shifts protocol incentives toward projects that reduce harvesting.

Recommendations on forest carbon accounting
The accuracy and precision of estimating forest carbon stocks within IFM protocols should improve over time as measurement technologies, inventories, allometric equations, and root:shoot ratios improve. IFM protocols generally provide appropriate selection criteria for plot distribution, measurement, and carbon stock estimation and distribution methods. The accuracy of a given site's carbon stock estimate is likely to be most significantly impacted by the availability of regionally tailored and species-specific allometric equations and root:shoot ratios to approximate the impact of IFM practices on biomass distribution. Accounting for carbon in harvested wood products is more straightforward than estimating carbon in the ecosystem, and unnecessary over-crediting in the early decades of a project could easily be avoided by modeling HWPs in a temporally realistic way instead of immediately discounting them to their 100-year "storage factor." Lastly, the protocols should account for potentially significant and lasting losses in soil carbon pools as a result of disruptive site preparation and management methods. While CAR and ARB have already incorporated literature-driven methods to account for reductions in the soil carbon stock of a project, more research is needed to understand how specific practices, species, and soil types respond to interventions.

Discussion and conclusions
Carbon offsets have the potential to direct substantial funds into improved forest management, helping realize the potential for forest management to sequester carbon and achieve a range of other environmental and societal benefits. Carbon offset quality matters. Offsets are designed to compensate for known GHG emissions, reducing the overall cost of meeting an emissions target. If they generate more credits than their actual impact, they can reduce and obscure the efficacy of climate change mitigation efforts. In this paper, we compare the offset protocols that have generated offset credits from IFM globally with literature on quantifying carbon impacts from IFM activities. Focusing on all major elements of carbon accounting-baselines, additionality, leakage, durability, and carbon pool quantification-we document shortcomings of each protocol, and suggest specific ways they could be improved to reduce the risk of over-crediting.
The most important area for reducing over-crediting is changing the way baselines are determined. All protocols, except for CAR-Mexico, offer substantial flexibility in setting project baselines. When there is flexibility, project developers have a financial incentive to choose the option that generates the most credits. ARB and CAR-U.S. allow the developer discretion to use any modeled baseline that is financially, legally, and contractually feasible, and not below the minimum allowed baseline, which is defined as the regional average for most projects. With that discretion, most developers choose baselines at or very close to minimum allowed levels (Badgley et al., 2022b).
Similarly, for the ACR protocol, baselines are defined as the scenario with the highest net present value (NPV) for the landowner. While NPV is a conceptually accurate way to predict land management for industrial forest owners, it is not a good predictor for many landowners seeking to manage for multiple uses, like recreational or ecosystem benefits. Further, it can be difficult for verifiers to assess NPV claims due to information asymmetries. All four VCS protocols provide developers with flexibility in choosing the baseline scenarios. Only the CAR-Mexico protocol prohibits baselines below initial carbon stocks, but the ability for project developers to choose any crediting period between 1 and 100 years increases the risk of non-additional crediting.
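To illustrate why NPV-based baselines are hard to verify, the following sketch selects the highest-NPV scenario from two hypothetical harvest schedules. The cash flows, discount rates, and function names are assumptions made for illustration and are not taken from the ACR protocol.

```python
# Hypothetical cash flows (arbitrary currency units) by year, starting at year 0.
scenarios = {
    "harvest_now": [1000, 0, 0],
    "harvest_in_two_years": [0, 0, 1100],
}


def npv(cash_flows, rate):
    """Net present value of yearly cash flows at a constant discount rate."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


def npv_baseline(rate):
    """Name of the scenario with the highest NPV at the given discount rate."""
    return max(scenarios, key=lambda name: npv(scenarios[name], rate))


for rate in (0.04, 0.05):
    print(f"discount rate {rate:.0%}: baseline = {npv_baseline(rate)}")
```

In this toy case a one-point change in the assumed discount rate flips the baseline from deferred to immediate harvest. Discount rates, prices, and costs are largely private information, which is one reason verifiers may struggle to assess NPV claims.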
In the current market, flexible baseline setting rules have resulted in a large portion of credits being generated from claims that projects prevent forest carbon loss with large reductions in timber harvesting. These projects look more similar to conservation or avoided degradation projects than to improved forest management. While these baselines might be accurate for some projects with potential for real climate benefit, the flexibility all protocols give can lead to significant over-crediting.
Several changes to the protocols could result in more accurate and conservative baselines. Baselines set at current levels or at past practice for the particular parcel (rather than at a broad regional average), dynamic baselines, or, for some forest lands, NPV baselines are more conservative than current methods, which have systematically resulted in aggressive harvesting baselines. Choosing baselines at or close to initial carbon stocks, and avoiding the deep baselines currently used, allows landowners to be credited for changing their land management practice (relative to past practice, present practice, or dynamically tracked similar lands) rather than for not changing it. Avoiding aggressive harvesting baselines would also lessen over-crediting from leakage and harvested wood product accounting and improve the effectiveness of reversal buffer pools by improving the quality of the credits in them.
NPV baselines are justifiable for industrial timberland owners who can show a history of management consistent with NPV and who have steady access to contract labor and mills. Dynamic baselines, while unproven in the market, offer a number of advantages because they can adjust to market conditions over time. However, until dynamic baselines are applied to real-world settings, their strengths and weaknesses may not be completely understood.
All of these baseline setting methods still risk over-crediting due to adverse selection. Adverse selection can occur because landowners that do not need to change their forest management practice to earn offset credits are the most likely to participate and earn credits against standardized rules, undetected due to information asymmetries.
While setting more conservative baselines is likely to remedy a large portion of over-crediting risk under current protocols, we identified several other areas where the current protocols could be better aligned with the scientific literature.
One important correction to the ARB and CAR-U.S. protocols is to fix a contradiction in the baseline scenario. Currently, in the first year of a project, landowners are rewarded for the difference between actual onsite carbon stocks and the often much lower baseline level, while deductions for leakage and carbon in harvested wood products in that year are based on 100-year average harvest rates. A straightforward correction is to assume levels of harvesting in the baseline that match any assumed drop in onsite carbon stocks. In order to avoid discouraging projects that extend rotations by reducing harvesting for short periods, leakage deductions could be applied over several years, and all protocols could account for positive leakage cumulatively rather than annually when harvesting is larger in the project than in the baseline scenario. Similarly, protocols could avoid over-crediting by crediting against temporally explicit HWP decay functions rather than using static HWP "storage factors" for a given time period.
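One possible reading of cumulative leakage accounting is sketched below for a project that extends rotations, harvesting nothing for three years and then harvesting more than the baseline. The 20% leakage rate and the harvest series are hypothetical, and the protocols may define the calculation differently.

```python
from itertools import accumulate

LEAKAGE_RATE = 0.20  # hypothetical leakage rate, for illustration only

# Hypothetical harvested carbon per year (tCO2).
baseline_harvest = [10, 10, 10, 10, 10, 10]
project_harvest = [0, 0, 0, 20, 20, 20]  # extended rotation: defer, then catch up

# Positive when the project harvests less than the baseline in that year.
annual_shortfall = [b - p for b, p in zip(baseline_harvest, project_harvest)]

# Annual accounting: deduct leakage in every year with a shortfall and
# ignore the years in which the project harvests more than the baseline.
annual_deduction = sum(LEAKAGE_RATE * max(s, 0) for s in annual_shortfall)

# Cumulative accounting: the deduction follows the running net shortfall,
# so later over-harvest relative to baseline offsets earlier deductions.
running_shortfall = list(accumulate(annual_shortfall))
cumulative_deduction = LEAKAGE_RATE * max(running_shortfall[-1], 0)

print("annual accounting deduction:    ", annual_deduction)
print("cumulative accounting deduction:", cumulative_deduction)
```

In this example total harvest is identical in both scenarios over six years, so cumulative accounting ends with no net leakage deduction, while annual accounting permanently deducts leakage for the three deferral years.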
The science on leakage is not yet robust enough to develop rules that satisfactorily address leakage risk from projects that reduce harvesting. The protocols have opted to apply low leakage rates, which are generally inconsistent with the scant literature available. It would be prudent to apply higher leakage rates until new data and methods can be developed to support a more refined approach.
The protocols likely under-allocate credits to the buffer pool, in large part because they do not adequately address the increasing risk of reversal due to climate change. Larger buffer pool deductions along with regularly updating the protocols based on the latest science would help to address this issue. Protocols may also consider incentivizing, and avoid dis-incentivizing, practices that reduce carbon in the short run but increase resilience in the long run, like thinning and fuels treatments that reduce the risk of catastrophic wildfire (North and Hurteau, 2011; Herbert et al., 2022).
Finally, methods for estimating onsite carbon stocks in the protocols allow for a great deal of flexibility. If implemented properly, current rules are sufficient to ensure high integrity. However, this flexibility also allows for less accurate carbon accounting, including through the use of reference literature for allometric equations and root:shoot ratios that may not be appropriate or conservative for the project under development. While the implications of this flexibility have not yet been systematically studied in the context of IFM projects, it appears to be relatively less consequential than the baseline, leakage, and durability issues identified above.
These changes will significantly reduce the risk of over-crediting and bring protocols more in line with the scientific literature. Still, we highlight one persistent challenge with ensuring the quality of IFM offset credits: uncertainty in the true baselines. Our recommendations reduce, but do not eliminate, the risk of over-crediting from baseline choices. Due to the inherent uncertainty in true baselines, baseline setting rules necessarily involve a tradeoff between false positives and false negatives (Trexler et al., 2006). If baselines err on the side of inclusiveness by allowing projects to choose baselines well below initial carbon stocks, they can accommodate worthwhile projects on lands at risk of degradation or conversion, but this flexibility also allows lands not at high risk of being degraded or converted to choose similar baselines, leading to over-crediting (false positives). Choosing more conservative baselines as we recommend means that some valuable projects will earn fewer credits than their true climate impact and some opportunities for real climate mitigation will be missed (false negatives). The greater the baseline uncertainty, the greater the tradeoff between false negatives and false positives. Setting the baseline at the average, given uncertainty, is not sufficient to avoid over-crediting because of information asymmetry and adverse selection.
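A toy example can show why a baseline set at the average still over-credits under adverse selection. The parcel values and the enrollment rule below are hypothetical simplifications: each landowner is assumed to know the carbon stock they would maintain without the program, and to enroll only when doing so earns credits without any change in management.

```python
# Hypothetical parcels: carbon stock (tCO2/ha) each landowner would maintain
# with no offset program, i.e., the unobservable "true" baseline.
true_baseline = [60, 80, 100, 120, 140]

# Standardized rule: the protocol baseline is the average across parcels.
standard_baseline = sum(true_baseline) / len(true_baseline)

# Assumed enrollment rule: a landowner joins without changing management
# only if business as usual already sits above the standardized baseline.
enrolled = [stock for stock in true_baseline if stock > standard_baseline]

# Credits are issued for the gap between actual stock and the standard baseline.
credits_issued = sum(stock - standard_baseline for stock in enrolled)

# No enrolled landowner changed practice, so the real climate impact is zero.
real_impact = 0.0

print("standardized baseline:", standard_baseline)
print("enrolled parcels:", enrolled)
print("credits issued:", credits_issued, "| real impact:", real_impact)
```

Although the standardized baseline is unbiased across all parcels, only the parcels for which it is too low choose to enroll, so every credit issued in this example is non-additional.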
Another potential solution to the inevitability of adverse selection (and, more broadly, to the incentive for project developers to take advantage of flexible rules to choose the option that results in the most credits) is to build more sources of under-crediting into the protocols so that if over-crediting occurs for any particular project, the integrity of the portfolio of projects under a protocol as a whole is not compromised.
If a higher burden of evidence for quality were required across the whole offset market, the number of credits generated by each project would shrink, and the price would go up. Poor quality of IFM and other project types keeps offset prices lower than what is needed to effectively drive mitigation without over-crediting. Expected growing demand in the voluntary market and constrained supply will likely push carbon prices higher in the future, allowing offsets to play a larger role in driving real change with more accurate protocols.
IFM has a large potential to reduce emissions and sequester carbon through forest restoration, conservation of ecologically important forests, increased stand productivity through changed management, extended rotations of working forestlands, restoration of degraded forests, and reduced-impact logging. Carbon offsetting has the potential to create meaningful incentives to achieve this potential. This study identified ways to bring the IFM protocols better in line with the literature on carbon accounting and forest management to significantly reduce the risk of over-crediting. Most importantly, more conservative baselines that avoid the assumption of significantly increased harvesting can substantially reduce over-crediting risk, but do not resolve it due to persistent uncertainty and adverse selection. Better aligning protocol rules with current understanding of carbon accounting practices will help reallocate carbon financing toward projects that can have meaningful climate impact.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.