Review and Synthesis of Estimation Strategies to Meet Small Area Needs in Forest Inventory

Small area estimation is a growing area of research for making inferences over geographic, demographic, or temporal domains smaller than those in which a particular survey data set was originally intended to be used. We aimed to review a body of literature to summarize the breadth and depth of small area estimation and related estimation strategies in forest inventory and management to-date, as well as the current state of terminology, methods, concerns, data sources, research findings, challenges, and opportunities for future work relevant to forestry and forest inventory research. Estimation methodologies explored include direct, indirect, and composite estimation within design-based and model-based inference bases. A variety of estimation methods in forestry have been applied to extensive multi-resource inventory systems like national forest inventories to increase the precision of estimates on small domains or subsets of the overall populations of interest. To avoid instability and large variances associated with small sample sizes when working with small area domains, forest inventory data are often supplemented with information from auxiliary sources, especially from remote sensing platforms and other geospatial, map-based products. Results from many studies show gains in precision compared to direct estimates based only on field inventory data. Gains in precision have been demonstrated in both project-level applications and national forest inventory systems. Potential gains are possible over varying geographic and temporal scales, with the degree of success in reducing variance also dependent on the types of auxiliary information, scale, strength of model relationships, and methodological alternatives, leaving considerable opportunity for future research and growth in small area applications for forest inventory.


INTRODUCTION
The frequency and sophistication of statistical methods in forest inventory have grown steadily since their earliest adoption by forest researchers, with an overall goal of providing information of sufficiently high quality to inform decision-making (Schumacher, 1945). One ongoing trend involves the use of data collected as a part of broad regional or national forest inventories to produce estimates for areas smaller than the surveys were originally designed to address (Magnussen et al., 2014). This trend reflects a situation described by W. A. Fuller in a plenary presentation delivered to a 1998 workshop on Environmental Monitoring Surveys Over Time hosted by the University of Washington, Seattle (Fuller, 1999): "The client will always require more than is specified at the design stage. For example, the client will explain that they require estimates only at the regional or national level and then, when data are available, ask for county estimates." While the quotation cites a typical circumstance, many situations arise in forest resource assessment where stakeholders recognize that data from multiple sources and scales could be leveraged to give improved estimates for increasingly small subsets of survey populations-whether based on geographic areas, time periods, or demographic subsets of larger populations from which sample data were collected.
One means of addressing this need is through small area estimation (SAE), a set of statistical methods aimed at providing estimates of parameters of interest for population subsets known as small area domains, typically by linking information from auxiliary sources with sample observations gathered over a larger population that encompasses multiple small area domains. A small area in this case can be described as any geographic, temporal, or categorical domain for which an established approach for direct estimation does not provide adequate precision (Rao and Molina, 2015). Usual quantities of interest include finite population parameters such as area totals or means, especially where tolerances for estimator accuracy have been specified as a part of the sample design. Domains in SAE are typically subsets of larger survey populations, such as those found in regional or national economic, health, or agricultural surveys focused on multiple attributes of interest (Schreuder et al., 1993). These surveys may involve decades of repeated sampling and data collection targeting dozens of attributes, or be limited to a single attribute observed at just one point in time.
SAE methods seek to improve the precision of estimates for small area domains of interest (DOI) using data observed from other domains to increase the amount of sample information available, an approach often described as "borrowing strength." The indirect (i.e., outside-of-domain) data are generally linked to small area DOI through one or more auxiliary variables and a model relationship that holds across multiple domains. In this sense, the domain (d) direct data (y ∈ d) are linked to the indirect data (y / ∈ d) via a model relationship involving the auxiliary data (x) and corresponding observations of y. In the example Fuller (1999) described, counties of interest are the small area domains, with the sample data collected in a specific county serving as its source of direct information. Sample observations from surrounding areas-including other counties-are the source of indirect data. In a national forest inventory (NFI) application consistent with Fuller's example both y ∈ d and y / ∈ d would be collected on observational units selected by statistical sampling. Auxiliary information to construct a model for y ∼ x might come from remote sensing or other geospatial data sets separate from the NFI sample observations, or from other sources including surveys of forest landowners, agencies, or enterprises that keep records of timber harvesting, tree-planting, or other management-related activities. Other arrangements and sources of information are possible, but the overall pattern of direct, indirect, and auxiliary information, and a model y ∼ x is present in most SAE applications (Rao, 2008).
Methods now widely associated with SAE largely appeared in technical literature beginning in the 1970s to address needs for increased accuracy when estimating for small area domains within population, social-economic, and public health surveys (Federal Committee on Statistical Methodology, 1993). Since then, SAE has been adopted in applications aimed at estimating incomes (Fay and Herriot, 1979), census groupings, and crop areas (Battese et al., 1988), among others. Methods for SAE have continued to develop over time as new statistical and computational tools have become available, together with widespread availability and cost-effectiveness of modern data sets. Remote sensing technologies such as satellite imagery or aerial laser scanning (ALS) have played a key role in accelerating the application of SAE methodologies to forest inventory settings (Pfeffermann, 2002(Pfeffermann, , 2013Sugasawa and Kubokawa, 2020;Coulston et al., 2021).
Interest in applying SAE across disciplines has grown over time but most applications in forest inventory began to appear in published work over the past several decades (Burk and Ek, 1982;Anderson and Breidenbach, 2007;Breidenbach et al., 2010;Goerndt et al., 2011). SAE reduces forest inventory estimator errors for small area domains, offering an efficient and cost-effective option for reducing uncertainty compared to increasing sampling intensity by installing additional forestinventory field plots (Magnussen et al., 2014). SAE techniques have been used to enhance precision of NFI-derived estimates (Breidenbach and Astrup, 2012;Frank et al., 2020), in forest stand inventories (Ver Planck et al., 2018), and from surveys of wood processors or commercial landowner inventories (Green et al., 2020;Coulston et al., 2021), but are not limited to those uses (Affleck and Gregoire, 2015). The ability of SAE to increase estimator precision in small areas where data are otherwise too sparse to satisfy tolerance specifications makes it attractive for applications in forest inventory (Guldin, 2021). For example, the Norwegian NFI has employed national canopy height maps from aerial remote sensing as auxiliary data sources since about 2010 to address needs for better local information in producing municipal forest statistics and forest-management related inventories (Astrup et al., 2019;Breidenbach et al., 2020). SAE has been used with forest inventory data from the U.S. Department of Agriculture Forest Service Forest Inventory and Analysis (FIA) program to generate estimates of forest attributes in small areas such as biofuel supply areas around co-firing power plants (Goerndt et al., 2019). In forests where field plots can be precisely referenced to high-quality geospatial auxiliary data (e.g., ALS), SAE can provide increased precision of estimates for arbitrarily small spatial areas-accounting for spatial correlations in sample data when warranted-even where no direct sample data lie within some DOI (Babcock et al., 2018;Pascual et al., 2018). Current research aims to combine forest biomass data from field plots with canopy-height measurements from the NASA GEDI spaceborne lidar platform and other remotelysensed auxiliary information in a SAE framework, signaling the first efforts to obtain global scale forest biomass estimates in an inferential framework (Patterson et al., 2019).
While the potential value of SAE for use in forest inventory and monitoring is high, the number of applications reported in technical literature to date is relatively small. A main objective of this work is to provide an overview of published findings of small area applications in forest inventory including relevant considerations for their implementation in other, perhaps novel, settings. An underlying goal is to provide a backdrop of SAErelated concepts and terminology, as the clear and consistent use of statistical language is important to wider adoption of these tools in future work. A further objective is to provide clarification, without a heavy emphasis on mathematical statistics, where ample terminology and notation can pose challenges to those interested in pursuing small area applications, possibly for the first time. We begin in Section 2 with background on relevant statistical paradigms of design-and model-based inference relevant to SAE, including an important extension of designbased inference known as model-assisted estimation. Section 3 presents terminology for direct and indirect estimators along with an overview of composite estimators used in a majority of SAE methods classified as either unit-level or area-level methods. In Section 4, we summarize key findings of published SAE research in forestry, with synthesis and comparison to designbased approaches including model-assisted estimation. Section 4 also includes some discussion of variance estimators in small-area inventory applications along with emerging topics in SAE. We conclude in Section 5 with some take-home findings of the work.

Design-Based Estimators
The design-based framework for inference from statistical sampling is a pillar of many SAE procedures, especially arealevel estimators that will be discussed in Section 3.3 below. Sampling is likely familiar to most forest inventory specialists as it provides the basis for establishing statistical properties of estimators in well-designed inventories (Shiver and Borders, 1996;Gregoire and Valentine, 2007;Thompson, 2012). The sample design assigns probabilities to population units (e.g., plots, trees, etc.) in the sampling frame for being selected in a particular random draw from a finite population, with the overall probability of any unit being included in a sample determined from its selection probability in the context of the sampling scheme. While the attributes of each unit-whether sampled or not-are treated as fixed quantities, randomness in design-based methods arises through the process of sampling. Gregoire (1998) explained this detail stating, "in the design-based framework, the population is regarded as fixed whereas the sample is regarded as a realization of a stochastic process." Design-based methods rely on the randomization distribution of sampling and possible estimates that could be obtained by following the sample design and its implementation in the sampling scheme. A key consequence is the lack of reliance on mathematical assumptions of how elements in the population are distributed, or of model relationships assumed about how two or more variables in the population are related (Sterba, 2009).
The usual goal of design-based sampling and subsequent estimation is to obtain reliable estimates of finite population parameters in an inferential framework. In forest inventories, population totals, means, or proportions for various attributes of interest are typical subjects of estimation. No assumptions about the underlying structure or distributions of population units being surveyed are required for valid inference in the design-based framework. Design-based estimators compatible with their sample designs are design-unbiased, meaning the expected value of the estimator over all possible samples equals the true population parameter being estimated. It follows that design-based estimators are design consistent, such that, "both the design bias and the variance go to zero as the sample size increases" (Skinner and Wakefield, 2017).
The Horvitz-Thompson (H-T) estimator is a design-based estimator widely used in forest inventory and introduced in many texts starting with simple random sampling. Any finite population total for attribute y can be estimated using the H-T estimatorτ whereτ y is the estimated population total, s denotes the set of observed values in the sample, y i is the observed value of attribute y on the i th sample unit, and π i is the probability that y i is included in s. In this form design-based estimators rely entirely on observed sample data and sample design weights, i.e., the inclusion probabilities in the denominator of Equation (1), to estimate an attribute of interest. Standard errors of an estimate require pairwise joint inclusion probabilities P(y i ∩ y j ) (i = j), calculated as the product of π i and π j when random sampling is from non-overlapping population units. Thus, the inference base for H-T estimator makes use of properties of the sample design such as a sampling scheme that draws samples according to design weights, independence of sample observations, and sampling distribution properties including applicability of Student's t-distribution and the Central Limit Theorem (Sterba, 2009). Sampling methods relying on a design-based framework do so largely for its desirable properties of unbiasedness and asymptotic consistency, often sought in inventory settings. These methods, however, rely on direct datavalues of y observed directly by sampling from the population of interest-which can be expensive to obtain. Any need for estimates on small subsets of the population will likely result in a need for increased sampling intensity and additional expense (Fuller, 1999).

Model-Assisted Estimators
Models can be used within a design-based framework to improve estimates in what are often called model-assisted estimators (Särndal et al., 1992;McConville et al., 2017McConville et al., , 2020. Like the H-T estimator (Equation 1), model-assisted estimators are direct estimators in that they rely on values of y only from population DOI. One example is post-stratification which has formed the basis of forest inventory estimation strategies in the U.S. for many decades (Bechtold and Patterson, 2005). Among the simplest model-assisted estimators are linear combinations of auxiliary variables x -either univariate or multivariate -available for all sampled units and having known population means (µ x ) or totals (τ x ). Using the H-T estimator (1), the model-assisted estimator can be written as in Stahl et al. (2016) τ MA whereτ MA y is the estimated model-assisted population total for attribute y, U denotes the (universal) set of all population units, and predicted values are obtained from a model, i.e.,ŷ i = m(x i ), oftentimes a linear regression model. The first term on the righthand side of Equation (2) requires at least that population totals τ x for all predictors are known, while the second summand is a H-T estimator of sample deviations from model-predicted values, which should be positive if the model underpredicts for y ∈ s and negative if the model overpredicts. Särndal et al. (1992) presented an unbiased estimator for the variance of Equation (2) when the coefficients of a linear model m are known constants, an application known as the difference estimator. The difference estimator is design-based because both the estimatorτ MA y and it's estimated variance are unbiased over all possible samples due to the sample design. Thus, althougĥ τ MA is a design-based estimator, the phrase "model-assisted" is used to indicate that a model relationship is involved in the estimation of τ .
The regression estimator, or generalized regression estimator (GREG) is based on the same form as Equation (2) in cases whereŷ is predicted from a regression model. GREG has been applied in forest inventory solutions that include poststratification, ratio estimators, LASSO, ridge, and elastic-net regressions (Stehman, 2009;McConville et al., 2020). Extensions using non-linear, semiparametric, and non-parametric predictive modeling techniques have also been demonstrated in forest inventory applications (Opsomer et al., 2007;Tipton et al., 2013;Kangas et al., 2016). Although taking the same form as the difference estimator, as distinguished by Särndal et al. (1992), the linear predictor m(x) in GREG consists of a regression model y = xβ fit to paired sample values of x and y. Some authors distinguish between the difference estimator and GREG as either external or internal, respectively, based on the sources of information from which their coefficients are derived (Kangas et al., 2016;Stahl et al., 2016). Still within the realm of direct, model-assisted estimators, the term modified GREG is used to distinguish models where coefficients are derived using data outside the population of interest (Rao and Molina, 2015). For GREG to be approximately design-unbiased, sample inclusion probabilities for x and y are used in a weighted-least-squares fit of the model to sample observations. Inclusion probabilities are also used in an approximate variance formulation for the GREG estimator detailed by Särndal et al. (1992, ch. 6). Confidence intervals can be reliably constructed for these estimators, but are usually used for major domains as their variance can be large or unstable in domains having small sample sizes (Särndal, 1984;Lehtonen and Veijanen, 2009). We note that design-based approaches including H-T and model-assisted estimation are sometimes categorized as SAE (see Figure 2.1 in Rahman and Harding, 2017;Hill et al., 2021); however, they are often used where interest lies in only a single population domain, such as in the methods demonstrated by McConville et al. (2020) for estimating forest attributes in a single county in Utah, USA.

Model-Based Estimators
A second pillar of many SAE procedures is the model-based framework for inference, which we introduce here as being distinct from a purely design-based framework, including modelassisted estimators described in Section 2.2. In model-based estimation, univariate or multivariate statistical models are formulated to establish the basis for assigning probabilities to observed data, for characterizing probabilistic relationships between variables, or for both. Unlike the fixed-quantity view of population units in design-based estimation, modelbased approaches treat observable units in a population as realizations or instances of random variables that underlie the observable population. For this reason these conceptual models of population variables and their statistical distributions are sometimes referred to as "superpopulation models" but are just as often simply called models (Gregoire, 1998;Skinner and Wakefield, 2017).
Model-based estimators can be useful when random sampling is impractical, or when assigning inclusion probabilities to sample observations requires some assumption about the statistical or spatial distributions of population units (e.g, Radtke and Bolstad, 2001). Sterba (2009) emphasized the utility of model-based estimators in surveys where selection probabilities were unknown, particularly in some forms of non-random sampling. Perhaps most important in the context of this review, models serve an important purpose in providing a statistical link between sample observations of y and auxiliary data x, to make use of auxiliary information in ways that increase the precision of parameter estimates. The parameters of interest in forest inventory likely include population totals or means; additionally, parameters of the models themselves, such as regression parameters or ratios linking auxiliary and sample data may be of interest (Gregoire, 1998). Model-based methodology includes many tools to assist with identifying best models to fit data, to estimate parameters and standard errors for model predictors, and to adopt complex models and analyze more complex data structures than might be possible from sampling alone (Rao and Molina, 2015). While the complex and expansive nature of model-based methodologies leads to a wide array of SAE techniques that make use of models, they place a somewhat greater burden on analysts to adequately address model selection, goodness of fit, or checking of model assumptions-all model concepts that would not be required in design-based approaches.

TERMINOLOGY AND METHODS
In Sections 2.1-2.3, estimation was presented as a means of obtaining reliable information about population parameters of interest-either for fixed finite populations or underlying model Frontiers in Forests and Global Change | www.frontiersin.org superpopulations-along with rationale for making statistical inferences under design-based and model-based paradigms. In moving to settings where the goal involves making reliable estimates for subsets or domains of interest within broader populations, the presentation will be expanded to include ideas and terminology better suited to the purposes and practice of SAE.

Direct, Indirect, and Composite Estimators
Apart from the design-and model-based modes of inference introduced above, estimators in general can be described as being either direct, indirect, or composite in nature, depending on the sources of information they make use of. Domain-direct estimates use observed values y i only from sample units in a particular domain, i.e., i ∈ s d . Domain-specific totals can be estimated directly asτ where DIR indicates thatτ is a direct estimator, and d denotes the domain of interest. Substituting w i = 1/π i in Equation 3 shows how the H-T estimator (Equation 1) serves as the domaindirect estimator when data are restricted to sample units selected in domain d, i.e., when i ∈ s d . The benefit of direct domain estimation is that no explicit model assumptions are required (cf. Sterba, 2009), and sampling weights can be used, allowing for design-unbiased estimates. A fundamental concern of SAE is that direct estimation often leads to unacceptably large or unstable standard errors for domains with small sample sizes (n d ). Additionally, no direct estimates are possible for domains having no sampled units, i.e., when n d = 0. Indirect domain estimation seeks to remedy the direct domain estimator's shortcoming of large variance when n d is small by increasing the "effective sample size" using information outside of the domain of interest together with a statistical model (Rao and Molina, 2015, pg. 35). The indirect approach is manifest in what is often called a synthetic estimator, e.g., which gives the indirect or synthetic estimate (Schaible, 1993) of the total for the d th small area domain, with domain d auxiliary total τ xd , and regression coefficientsβ estimated from (x, y) data sampled across the entire population to borrow strength for estimating on d. Benefits of the synthetic estimator are that it can allow for estimates to be made in non-sampled units, and likely has a smaller variance than direct domain estimates, especially where n d is small. However, Equation (4) as given does not account for between-domain heterogeneity and thus can be biased for specific domains. Additionally the indirect estimator does not necessarily trend toward the unknown domain total τ d as n d increases.
Not all synthetic estimators are regression models, but the example in Equation (4) was chosen to illustrate some additional details. First, when the coefficients are estimated only from data sampled in the domain of interest, i.e., whenβ in Equation (4) is replaced byβ d , the estimator is considered to be a model-assisted (GREG) estimator having design-based properties for inference. The same is true when the regression model is fit using sample observations weighted by their design weights using weighted least squares (Särndal et al., 1992;McConville et al., 2020, Section 6.4). Second, a synthetic estimator's inference base may depend on assumptions about its model form being correct, and whether its parameters estimated from one set of domains are suitable for making predictions for other domains. As with any estimator, care should be taken to verify what conditions must be met to satisfy the inference base for GREG.
The synthetic estimator can be used in concert with the direct domain estimator to create a composite estimator that balances the strengths and weaknesses of direct and indirect estimators (Rao and Molina, 2015): The weighted average of the direct and indirect estimators comprises the composite estimatorτ COMP d , whereτ DIR d is a direct estimator, e.g., (3),τ SYN d is an indirect estimator, e.g., (4), and γ d ∈ [0, 1] is a domain-specific weighting factor, also known as a shrinkage factor. An optimal solution for minimum

Unit Level SAE
Unit-level SAE employs synthetic models that operate at the scale of observational or sample units in the population, typically field plots in forest inventory applications. The synthetic model in a unit-level estimator is used to predictŷ on all population units in a domain d, regardless of how many (or whether) sample data for y were observed in that domain. Predictors from auxiliary variables x and the model relationship y ∼ x provide the means of generating predictions forŷ ∈ d. Here x is either known for every unit in d or known in aggregate, as in a case where a domain-specific mean for x, denotedx d , is known. In either case, paired values of (x, y) are observed on sampled units across the broader population, i.e., the indirect data, and used to fit or train a synthetic model. Errors in the synthetic model predictions are partitioned into two components (see Rao and Molina, 2015, model 4.3.1). A domain-specific error v d is attributed to synthetic model variance that applies equally to all y ∈ d, and withindomain residual errors e di -independent of v d -that apply to individual sample units y i ; i ∈ d.
The unit-level composite estimator developed by Battese et al. (1988, hereafter BHF) is an excellent example from which to study unit-level SAE, as the data and models the authors used to demonstrate the application are integrated into the R "sae" package (Molina and Marhuenda, 2015). BHF used the nestederror linear regression model to estimate crop areas for corn and soybeans in Iowa, using USDA Statistical Reporting Service field survey data from 1978 as direct observations of y, and Landsat 2 multispectral scanner (57 x 79 m) raster imagery as auxiliary information for x.
Unit-level approaches rely on matching individual sample unit observations to the auxiliary data; thus, the satellite image pixels in BHF were assigned to corresponding 250-ha (1 sq. mile) field survey units (BHF called the units "segments")-about 555 raster cells per segment. The aggregated Landsat and field survey data formed (x, y) pairs for 37 sampled segments in a population of D = 12 counties, with each county treated as a small-area domain of interest. The BHF synthetic model used a response y = sample segment area (ha) growing corn, in a linear regression model with an intercept and two predictors x 1 = Landsat pixels classified as corn and x 2 = pixels classified as soybeans, with d∈D n d = 37 observed segments. Using their synthetic model, similar to Equation (4), BHF were able to predict domain means for corn area per segment in counties asμ withx d calculated from all segments in county d, using Landsat pixel class counts of a given crop in each segment (see Table  1, Battese et al., 1988). By estimatingβ as fixed effects andv d as random effects using mixed linear regression modeling, BHF obtained the empirical best linear unbiased predictor (EBLUP) of domain-specific meansμ EBLUP d =x ′ dβ +v d , a key contribution of their work because of the favorable properties of the EBLUP. Goerndt et al. (2013, Equation 5) and Costa et al. (2009, Equation 13) presented the BHF unit-level nested error EBLUP in the form of a composite estimator with domain-level, i.e., countylevel estimates obtained from direct and synthetic regression estimates weighted aŝ whereμ DIR d is a sample-direct estimate of the mean y for small area domain d andβ is the vector of fixed effects regression coefficients obtained by linear mixed modeling with domainspecific random effects. The term σ 2 v in Equation (7) denotes the variance among domain random effects v d ∼ N (0, σ 2 v ), and σ 2 e as the residual variance of population units-the variance unaccounted for by the EBLUP-within domains, i.e., e di ∼ N (0, σ 2 e ). This assumption assigns a single residual variance to all domains, with σ 2 e estimated from the full set of sample residuals. As when using optimal shrinkage weightsγ d in Equation (5), the weights in Equation (7) ensure that as the domain-specific direct estimator variance gets small, such as when n d is large, the EBLUP tends towardμ DIR d . Unit-level paired (x, y) data typically provide an informationrich means of linking indirect data from broad and extensive populations to specific domains of interest. Further, when y is observed by non-random sampling, the model-based properties of the composite estimator support approximate and asymptotic inference bases where direct estimation alone would not. Efforts should be made to verify the veracity of the underlying statistical model and stochastic processes, especially when data collection employs stratification, clustering, or disproportionate sampling among some elements of the population (Sterba, 2009). In applications involving large data sets the computational requirements of unit-level analyses can be demanding, especially when objectives include the validation of variance estimates approximated by Taylor series linearization or when using resampling procedures to estimate variance components (Prasad and Rao, 1990;González-Manteiga and Morales, 2008). Computationally demanding synthetic models can also pose challenges for SAE, but software advances have made steady gains in providing tools to address such challenges (McRoberts et al., 2007). As with any model-based inference approach, customary steps involving model selection, goodnessof-fit, and checking other assumptions are important in unitlevel SAE. In cases where influential points, heteroscedastic error variances, or non-independence of residuals present problems, developments have been made to help overcome limitations of the BHF approach (Babcock et al., 2015;Breidenbach et al., 2018).

Area Level SAE
Unlike unit-level approaches, area-level SAE employs synthetic models that operate at the scale of subpopulation domains, rather than individual sample or observational units. As a consequence, auxiliary data do not need to be paired one-toone with individual sample observations. Instead, domain-direct estimates (e.g.,τ DIR d orμ DIR d ) are paired with domain-specific observations, such as domain means or totals of x ∈ d, which we denote as x d . The paired domain data (ŷ DIR d , x d ) are then used to develop regression models or train other types of synthetic models for use in SAE. Rao and Molina (2015, model 4.2.5) presented a regression-based area-level model where ψ d are domain-specific model errors and ǫ d are errors due to sampling on the domain-direct estimates that appear on the left-hand side in Equation (8). In the basic area-level model, Rao and Molina (2015) explain that b d are domain specific positive constants set to b d = 1 in the model presented by Fay and Herriot (1979), which, for direct estimates of domain totals can be expressed asτ Fay-Herriot (F-H) models (Fay and Herriot, 1979) like the one shown in Equation (9) are often synonymous with area-level SAE, as they have seen considerable use in area-level applications since their introduction. An EBLUP derived from the F-H model, expressed as a composite estimator having a form similar to Equation (5) shows the roles of direct and synthetic estimatorŝ with weightsγ d =σ 2 ψ d /(σ 2 ψ d + σ 2 ǫ d ) composed as the model variance relative to the total variance. A difference in the shrinkage weights' formulations for Equations (7) and (10) is that σ 2 ǫ d in Equation (10) is assumed to be known, but typically specified as the variance estimated from sample observations y ∈ d. Wang and Fuller (2003) developed a modified area-level estimator that accounts for the empirical estimation of domaindirect variances, which has been demonstrated in forest inventory applications (Magnussen et al., 2017). Similar to the unit-level model,β in Equation (10) is a vector of fixed-effects coefficients from a linear mixed model having domain-specific random effects. As in Equation (7) the F-H EBLUP tends toward the direct estimator when the direct variance is low, and toward the synthetic estimator when the direct variance is high. Because the synthetic estimator in the F-H model operates on subpopulation domains, the same aggregated measures x d from the auxiliary data that were used in estimating the regression coefficients are also used in predictingτ FH d in Equation (10). In contrast, the unitlevel EBLUP uses an aggregated measure such asx d in Equation (7), despite the model coefficients having been estimated using observed x (and y) values from individual sample units.
In applications framed in a geographic context, such as forest inventories, where field plot observations are often paired to auxiliary data from remote sensing, area level modeling obviates the need for highly accurate plot coordinates, and can be used when there is a degree of misalignment between plots and auxiliary data (Goerndt et al., 2011). This concern is highly relevant when the protection of confidential information prevents the release of exact coordinates of sampled locations to unauthorized personnel. Instead, only direct linkage to a specific domain is needed for each plot. This also facilitates using sample units that possess indistinct sampling boundaries, such as where linear transects or variable-radius plots are used in field sampling (Ver Planck et al., 2018). This method also tends to require less processing time which lends itself to analysis involving very large datasets.

Other SAE Methods
In addition to the aforementioned model forms, there are other methods which serve as alterations or variations of the above model types and as such are not mutually exclusive from them. Such methods include Bayes methods, and nearest neighbor models which are worthy of particular mention due to their use in forestry SAE (McRoberts, 2012;Babcock et al., 2018;Ver Planck et al., 2018). Bayesian methods, both empirical Bayes (EB) and hierarchical Bayes (HB) offer some advantages over their frequentest counterparts such as being able to model different target variable types such as binary or count data (He and Sun, 2000). HB also gives posterior distributions of the small area parameters, and can therefore avoid relying on unrealistic asymptotic assumptions (Pfeffermann, 2013). Bayesian methods offer flexibility in specifying spatial structures such as spatially correlated random effects (Wang et al., 2018).
Nearest neighbor techniques offer a similar set of benefits for SAE, for example, where estimators for categorical, binary, or count variables are involved. They are non-parametric in that no distributional assumptions regarding response or predictor variables are necessary, which has proven useful when multivariate auxiliary information is used to construct synthetic models. Nearest neighbor models can also accommodate correlated sample and auxiliary data that may arise in spatial or temporal domains (McRoberts et al., 2007;McRoberts, 2012).

APPLICATIONS IN FOREST INVENTORY
We now turn to selected examples related to SAE in forest inventory from published research ( Table 1). Note that in Table 1, we exclude a large body of literature employing poststratification, a widely-used model-assisted estimation technique. Instead we aim to focus on estimators less-commonly used in existing forest inventory production processes. Post-stratification notwithstanding, model-assisted estimators including GREG were among the most widely-used and earliest-adopted methods for increasing estimator precision using models to link auxiliary information from remote-sensing with field sample data. As such, we elected to include a number of model-assisted applications in our example references, even though some of the selected examples do not involve "small area" domains as the term is often used in SAE (Table 1).
Among the reasons authors have given for adopting modelassisted estimators was the need to ensure design-based inference in large, multi-resource sample designs (Reich and Aguirre-Bravo, 2009;Naesset et al., 2011). Others cited the need for precise estimation in small area domains nested within broader population surveys as a motivating factor (Goerndt et al., 2011;McRoberts, 2012;Magnussen et al., 2014). A few noted that consistency and additivity of estimates from small areas nested within larger domains were motivating concerns (Reich and Aguirre-Bravo, 2009;Nagle et al., 2019). Others noted the need for sample survey organizations to conduct generic inference, i.e., to make compatible estimates of all forest attributes simultaneously by using the same model to define survey weights (Opsomer et al., 2007;Johnson et al., 2008;McConville et al., 2020). Nearly all research referenced in Table 1 reported the potential for increased efficiency in estimating forest inventory attributes as a reason for pursuing the work, with the high cost of increasing field sample sizes frequently noted as an operational constraint.
Lidar and digital aerial photogrammetry (DAP) were major data sources used as auxiliary information, with some authors using lidar or DAP point cloud metrics, e.g., height percentiles or pulse return densities aggregated to unit or area levels, and others using canopy height models (CHM) processed from point cloud data (Steinmann et al., 2013;Babcock et al., 2015). Field plots in forest inventories were the primary source of directly-sampled observations, with more than half of the selected studies using NFI or other land-management agency field sample observations, Methods included are generalized regression estimators (GREG), nearest neighbor (NN), unit-level empirical best linear unbiased predictor (U-EBLUP), area-level best linear unbiased predictor (A-EBLUP), and hierarchical Bayes (HB). Direct data sources include national forest inventories (NFI), non-national based forest inventories (FI), plus directly sampled tree data. Auxiliary data used include tree branch data, light detection and ranging (lidar), digital aerial photogrammetry (DAP), Landsat, timber products outputs (TPO) surveys, national land cover database (NLCD), management records, solar radiation models (SRM), elevation derivatives for national applications (EDNA), moderate resolution imaging spectroradiometer (MODIS), and interferometric synthetic aperture radar (InSAR). and most others using plots installed for management or research purposes, such as on state, university, private, or public experimental and working forests (Anderson and Breidenbach, 2007;Mauro et al., 2017;Green et al., 2020). Several studies aimed to examine and augment existing estimation frameworks in forest inventory settings to address potential violations in underlying assumptions. Examples include accounting for heteroscedasticity of variance in synthetic model residuals and spatial or temporal autocorrelation among measurements on observational units or areas of interest (Babcock et al., 2018;Breidenbach et al., 2018;Ver Planck et al., 2018;Mauro et al., 2019). Estimating the change of forest attributes over time with SAE methods was demonstrated successfully by Mauro et al. (2019) and by Coulston et al. (2021), both of which used repeated measurements from forest field plots in their work.
Multiple studies applying one or both unit-level and area-level EBLUP-based SAE appear in Table 1. The choice of adopting unit-or area-level SAE in forest inventory applications can depend on limitations of sample or auxiliary data sets, for instance, where georeferencing inaccuracies introduce significant errors in pairing field observations to remotely-sensed or other geospatial data sets, e.g., maps (Naesset et al., 2011;Green et al., 2020). Similar challenges in pairing field data to remote-sensing can occur where plot designs and raster layers are incompatible, such as when variable radius field plots (i.e., angle gauge sampling) are used, or when field plot sizes are small compared to the pixel size in available auxiliary data sources (Goerndt et al., 2011;Ver Planck et al., 2018;Temesgen et al., 2021). Although area-level estimators are flexible to accommodate situations where precise georeferencing is impractical, a tradeoff may arise due to the loss of information from aggregating unit-level observations to domain-or area-level scales; further, aggregating sample observations also reduces effective numbers of observations available for synthetic model development (Magnussen et al., 2017).
When georeferencing enables pairing of auxiliary and sample data, sampled units can all serve as (x, y) observations for fitting or training synthetic models. The models can then be used to make predictions on raster cells or other spatial units that correspond to unobserved population units (Babcock et al., 2018;Frank et al., 2020). It follows that unit-level SAE estimators can be used to map predictions or estimates at finer spatial resolutions than area-level methods, the latter of which allow for mapping SAE predictions only at domain levels. Another potential advantage of the unit-level approach is the ability to define arbitrary spatial domains subsequent to sampling without necessarily losing substantial power of inference (Pascual et al., 2018). Gains in efficiency between unit-level and arealevel SAE methods have been compared in multiple studies, generally confirming the potential for greater gains from unitlevel approaches Breidenbach et al., 2018).

Model-Assisted Estimation
Model-assisted estimators have proven to reduce estimator errors considerably, e.g., when measured by the relative efficiency of a model-assisted estimator compared to its simplest direct counterpart-often characterized as simple random samplingcf. Equations (2) and (3) McRoberts et al. (2013) demonstrated substantial variance reduction 100%(1 − RE) = 84% using a non-linear regressionbased model-assisted estimator of growing stock volume per unit area in a 1,300 km 2 study area in southeastern Norway. The gains corresponded to over sixfold increase in apparent sample size, calculated as RE −1 . Aerial lidar (0.7 pulses m -2 ) served as the auxiliary data, with direct observations from n = 145 NFItype (200 m 2 ) field plots over the study area. They noted similar variance reduction (82%) using the same method to estimate growing stock volume over a one-half partition of the study area represented by n = 69 field plots, thus demonstrating how sample data from a larger area can be used to increase the precision of estimates in a smaller area (McRoberts et al., 2013). Model-assisted estimation has been demonstrated to increasing precision in multiple population sub-domains including Breidenbach and Astrup (2012), who tested GREG as a model-assisted estimator in a similar sized study area in Norway divided into 14 municipalities, each having from 1 to 35 NFI sample plots of a total n = 145. They noted gains in precision were smaller and more variable than McRoberts et al. (2013), with RE ranging from 0.35 to 0.87 in eight municipalities having n d ≥ 6. They concluded that sample sizes n d < 6 in the other six municipalities gave unstable estimates, a finding similar to Naesset et al. (2011). Their data revealed that despite GREG's limitation in areas with small n d , its estimates deviated from sample-direct design-based estimates by less than half as much as synthetic model estimates alone, a result consistent with the design-unbiased property of GREG in the model-assisted framework. Naesset et al. (2013) observed consistent improvement in precision of estimates of aboveground forest biomass using GREG with aerial lidar auxiliary data in model-assisted two-stage sampling. Their gains were greatest (RE = 0.125) for all cover classes combined (n = 632), and only slightly more modest 0.09 ≤ RE ≤ 0.20 for individual age and productivity classes, all of which had class-specific sample sizes n c ≥ 46.
Reduced RMSE of synthetic model estimates when compared to direct estimator variance has been demonstrated in a number of applications involving the pairing of ALS and NFI-type field sample data (Naesset et al., 2011;Järnstedt et al., 2012;Nord-Larsen and Schumacher, 2012;Kotivuori et al., 2016;Nilsson et al., 2017;Novo-Fernandez et al., 2019). A variety of synthetic modeling approaches have been tested including parametric and non-parametric regression modeling, Random Forests and other ensemble predictive models, and nearestneighbor imputation, often with a goal of identifying suitable auxiliary data sources for estimating forest biophysical attributes (Latifi et al., 2010;Popescu et al., 2011;Bright et al., 2012;Rahlf et al., 2014). A common theme in these studies is the direct examination of synthetic model prediction errors (e.g., using cross validation) without formulating the models in a design-based or composite modeling framework to mitigate potential synthetic estimator bias (cf. McRoberts et al., 2013;Irulappa-Pillai-Vijayakumar et al., 2019;McConville et al., 2020). In model-assisted applications the usual goal is to increase the precision of population-level estimates, and less often to produce estimates for domains that divide a larger population into small areas where direct estimator instability can be a concern. Where investigated, model-assisted estimators were able to reduce small area uncertainties considerably, within limits of direct-data sampling intensity and the strength of model relationships involving indirect and auxiliary data (Breidenbach and Astrup, 2012).

Unit-Level SAE
Research comparing unit-level SAE to model-assisted estimators has shown that gains in precision are generally greater for unit-level EBLUPS than model-assisted estimates (e.g., GREG) primarily when direct-domain sample sizes are small. In directly comparing unit-level EBLUPs to model-assisted GREG estimates, Breidenbach and Astrup (2012) noted an average RE = 0.86, meaning unit-level SAE MSEs were lower than direct estimate variances by an additional 14%, on average, compared to GREG. Not all EBLUP MSEs were smaller than those computed for GREG. Comparisons showed greatest gains in five of six municipalities having 6 ≤ n d ≤ 17, but smaller-even negative-gains (RE = 0.93 and 1.15) were noted in two municipalities having n d ≥ 29. Such findings indicate that unit-level EBLUPS may not have as clear of an advantage over GREG in reducing estimator variance when domain sample sizes are relatively large; nonetheless, even when n d was relatively large, EBLUP performance was not much worse than modelassisted regression estimates (Breidenbach and Astrup, 2012). EBLUPS showed considerable stability across the range from 1 ≤ n d ≤ 35 compared to GREG, with unit-level SAE relative errors ranging from just [7.0, 12.4] % compared to a range of [0.6, 25.4] % for GREG, even with three municipalities having n d = 1 excluded for the GREG results since their errors could not be calculated (Breidenbach and Astrup, 2012). Magnussen et al. (2014) reported virtually identical gains for model-assisted and unit-level EBLUP estimates (both RE = 0.49) when compared to direct volume per hectare estimates for forest districts in Switzerland. The near identical results may have been related to comparatively large average sample sizesn d = 79 across the D = 108 Swiss forest districts.
Of the studies listed in Table 1 that compared unitlevel EBLUPs to domain-direct estimates, the largest variance reductions noted (RE = 0.03) were in a study of forest volume in Burgos Province, Spain, made up of D = 54 stands in an area covering about 13.65 km 2 (Pascual et al., 2018). The same authors reported more modest gains (RE = 0.50) in a 3 km 2 management area having D = 6 stands near Cercedilla, Spain. The authors attributed the greater gains at Burgos as being due to the lower sampling intensity there (about 0.5 %) compared to a 4 % sampling intensity at the Cercedilla site (Pascual et al., 2018). Large gains were reported for unit-level volume EBLUPs (RE = 0.09) tested by Mauro et al. (2017) in an area roughly 8 km 2 . A high degree of positive skewness was evident in the distribution of n d across domains, with roughly 30% of map unit domains having n d ≤ 2 despiten d = 10.3 across D = 64 map units having any sample observations . In estimating volume change in D = 24 stands experimentally manipulated for three levels of forest structural diversity, Mauro et al. (2019) reported unit-level gains (RE = 0.71) for 7-year volume change estimates. The sample design relied on 1 remeasured plot per 8 ha on a systematic grid, for a sparse but narrow range of domaindirect sample sizes, with 3 ≤ n d ≤ 10 andn d = 6.3 (Mauro et al., 2019). Breidenbach et al. (2018) reported greater efficiencies RE = 0.43 and RE = 0.28 compared to direct estimate variances, with the latter result including a formulation that accounted for heteroskedasticity of variance in synthetic model residuals.
By artificially reducing sample sizes from the available n = 680 NFI plots across D = 12 county-sized domains for estimating Oregon Coast Range forest volumes, Goerndt et al. (2013) demonstrated diminishing efficiency gains with increasing n d in unit-level EBLUPS from RE = 0.41 (n = 136), to RE = 0.57 (n = 204), to RE = 0.71 (n = 272). They also found that alternative unit-level composite estimators calculated with smoothed variances performed well in terms of increased precision and low apparent biases in unit-level SAE (Costa et al., 2003;Goerndt et al., 2013). The alternative estimators employed multiple linear regression, as well as nearest neighbor and gradient-nearest-neighbor imputation in synthetic models to achieve balances between bias and precision of SAE (Ohmann and Gregory, 2002).

Area-Level SAE
A number of studies have shown area-level SAE precision gains for timber volume or biomass compared to sample-direct estimates. Breidenbach et al. (2018), for example, reported a reduction in overall standard error from 32.7 m 3 ha −1 for direct volume estimates compared to F-H area-level RMSE of 23.1 m 3 ha −1 (RE = 0.50). A nearly identical gain in efficiency (RE = 0.50) was reported by Goerndt et al. (2011) for volume estimates in forest stands treated as small area domains with intentionallyreduced sample sizes between 2 ≤ n d ≤ 4. Relative gains tended to be less when the simulated small n d were increased by factors of two and three, with no gains for larger sample size increases. Despite finding modest gains when n d was large, area-level EBLUPS were demonstrably superior-in terms of RE and lack of apparent biases-to two synthetic estimators and two composite James-Stein type estimators tested (James and Stein, 1961;Goerndt et al., 2011). In testing area-level SAE with counties as small-area domains, a composite estimator similar to F-H showed 0.43 ≤ RE ≤ 0.91 over a 20-state region of the northeastern U.S. (Goerndt et al., 2019). In the same study a composite estimator based on a non-parametric nearestneighbor (NN) synthetic model showed slightly less gain in efficiency than the F-H type approach. Despite this, Goerndt et al. (2019) indicated the NN approach may have lower model bias and they cautioned against potential biases in model-based estimators, pointing out the need for thorough checking of model assumptions to ensure validity of model-based inferences.
The pattern of decreasing gains in efficiency with increasing n d was also noted by Mauro et al. (2017), who reported an average RE = 0.48 over D = 84 management areas (domains), while no gain (RE = 1.13) was noted in 14 of the domains having n d ≥ 25 sample plots (cf. Goerndt et al., 2011Goerndt et al., , 2019. Green et al. (2020) noted average RE = 0.79 in F-H estimates of timber volume across D = 40 stands, with little or no efficiency gains in stands where direct estimates were already quite precise (relative standard errors < 10%). Greater gains (RE ≈ 0.35) were noted in stands having direct relative standard errors > 25%. Findings such as these indicate that where domain-direct estimates are already quite precise, as in cases where n d is large or variation within domains is inherently low, F-H type estimators may exhibit a limited ability to further increase precision over domain-direct estimates. Magnussen et al. (2017) reported RE ranging from 0.44 to 0.77 in four study areas in Spain, Norway, Switzerland, and Germany, using a modification of F-H that treats domain-specific variances as estimates rather than known constants (Wang and Fuller, 2003). In the same study they found greater gains 0.28 ≤ RE ≤ 0.34 by including a non-stationary spatial correlation process in an area-level composite estimator which accounted for the spatial covariance structure in model residuals (Chandra et al., 2012(Chandra et al., , 2015. In a third approach Magnussen et al. (2017) used the HB approach of Datta and Mandal (2015) to obtain efficiency gains intermediate (0.27 ≤ RE ≤ 0.81) compared to their baselinespecifying empirically estimated variances-and non-stationary spatial F-H approaches. The Bayesian approach demonstrated several advantages related to estimated posterior distributions for specific domains, especially when the random-effect variance was largely attributable to a small number of domains from the larger population (Magnussen et al., 2017). Coulston et al. (2021) evaluated the performance of area-level F-H models including a simultaneous autoregressive (SAR) model of residual spatial correlation among domains in estimating forest removals, noting the spatial model improved efficiency of estimates at scales of individual counties, but not at the scale of larger survey regions encompassing groups of 12-20 counties each in the southeastearn United States. Ver Planck et al. (2018) compared F-H gains with no accounting for spatial autocorrelation (RE = 0.23) to a conditionally-autoregressive (CAR) F-H model that further reduced estimator variance (RE = 0.19), noting that the CAR model performance was generally greater in domains (stands) sharing boundaries with high numbers (> 10) of neighboring stands. The non-stationary spatial process (Chandra et al., 2012) may improve upon simpler CAR and SAR modeling approaches, as it employs a distance-weighted measure of correlation among domains rather than a simple binary model based on domain adjacency. Spatial models provide a promising tool for future applications of area-level SAE that account for non-trivial spatial correlations among small area domains likely to be realistic in many forest inventory applications (Finley et al., 2011;Magnussen et al., 2017).

Unit Level vs. Area Level SAE
Although area-level EBLUPS have shown clear gains in efficiency over direct estimates in forest inventory SAE applications, still greater gains have been demonstrated using unit-level approaches when suitable data are available and model assumptions are met . In comparing both approaches to direct estimates of forest volume, Mauro et al. (2017) observed halving of variance (RE = 0.48) and ten-fold reduction (RE = 0.09) for area-level and unitlevel estimates, respectively. A notable feature in the study was the large proportion-slightly more than half-of the 84 stand groupings defined as small area domains containing n d ≤ 6 field plots, linking greatest gains in efficiency to domains having relatively small n d . Their unit-level results achieved variance reductions among the largest of any reported in forest inventory literature Pascual et al., 2018). By comparison, Breidenbach et al. (2018) reported RE = 0.50 (see Section 4.1.3) for area-level and RE = 0.28 (see Section 4.1.2) for unit-level estimators applied to a common data set. Their domain-direct sample sizes were also small [4 ≤ n d ≤ 7], so the source of differential gains between the two studies' unitlevel estimators may be related to other factors including the strength of the synthetic model relationships, which we weren't able to compare between the two studies Breidenbach et al., 2018). A distinct barrier to achieving large gains in efficiency with unit-level SAE, especially compared to area-level approaches, is the ability to accurately georeference field plots and spatial auxiliary data sources. Green et al. (2020) noted this as a possible reason for the lack of gains in their unit-level models compared to area-level SAE.

Variance Estimation
Variances for model coefficients and estimates of finitepopulation parameters for design-based direct or model-assisted estimates (Sections 2.1 and 2.2) are in most cases calculable using commercially available statistical software packages (Molina and Marhuenda, 2015;Breidenbach, 2018;McConville et al., 2018;Hill et al., 2021). Other variance estimators are documented in research literature in sufficient detail to facilitate calculation with scientific programming software (McRoberts, 2012;Mandallaz et al., 2013;Babcock et al., 2015;Magnussen et al., 2017;Mauro et al., 2017;Frank et al., 2020). In design-based model-assisted estimators, variance calculations exist in closed form for some estimators such as GREG, or as approximations for others including ratio estimators (Särndal et al., 1992;Breidenbach and Astrup, 2012;Mandallaz, 2013;Magnussen et al., 2018). Because variance calculations often require accounting for nonindependence of observations or heteroskedasticity of residuals, or where algorithmic synthetic models such as non-parametric or nearest-neighbor modeling are employed, iterative methods can be used to estimate approximate variances, even where closedform solutions exist when model assumptions allow for them (McRoberts et al., 2007). Numerical approaches, such as leaveone-out cross validation or parametric bootstrap estimation have proven useful in variance estimation, although care must be taken to ensure that bootstrap data-generating mechanisms are aligned with error correlation structures and distributional assumptions.
Standard errors of domain-direct estimates are required inputs for FH and other area-based SAE, which can pose a problem when n d is insufficient to give stable variance estimates (Särndal, 1984;Breidenbach and Astrup, 2012). A solution for such applications is the use of generalized variance estimators (Valliant, 1987;Wolter, 2007;Goerndt et al., 2013;Coulston et al., 2021). Generalized variance functions tend to give variance estimates that are highly dependent on n d , e.g., Coulston et al. (2021), which may differ from direct variance estimates in domains having sufficient sampling intensity to produce stable standard errors.
Closed form solutions typically do not exist for variance estimation for SAE composite estimators, e.g., when domainlevel estimates are obtained as EBLUPs (Fay and Herriot, 1979;Battese et al., 1988;Rao and Molina, 2015). Variance of EBLUPs is assessed by mean squared errors (MSE) rather than standard errors to distinguish EBLUPS from design-unbiased estimators. The MSEs are routinely calculated in SAE software as additive combinations of terms representing uncertainties associated with (a) prediction of random effects, (b) estimation of regression model coefficients, and (c) estimation of the random-effect variance, i.e., the variance among small-area domains (Prasad and Rao, 1990). Alternatives involving parametric bootstrap variance estimation are employed in some applications, as are variances determined from posterior distributions in Bayesian analyses (Prasad and Rao, 1990;Babcock et al., 2015;Molina and Marhuenda, 2015).

Emerging Applications
Bayesian methods have been applied to SAE across disciplines for several decades (Morris, 1983;Ghosh and Rao, 1994), with recent applications in forest inventory settings as well (e.g., Finley et al., 2011). Babcock et al. (2018) used HB to estimate aboveground biomass with coupled auxiliary data from Landsat and lidar, to resolve incomplete coverage of remote sensing data.
Of the HB models they tested, one incorporating spatial random effects with lidar as auxiliary data led to 0.33 ≤ RE ≤ 0.51 across their 4 areas of interest. By incorporating coregionalization and adding tree cover derived from Landsat to complement incomplete lidar coverage, the range of RE decreased to 0.16 ≤ ≤ 0.35. Ver Planck et al. (2018) also saw success in formulating the F-H approach in a HB framework to increase precision of aboveground biomass SAE over direct estimates (RE = 0.23). Further improvement was demonstrated by adding conditional autoregressive random effects to account for spatial correlation (RE = 0.80), with the autoregressive model giving greater gains in precision in domains that shared boundaries with larger numbers of neighbors (Ver Planck et al., 2018). Adding a spatial random effect did not always lead to greater predictive performance, however, as shown by Babcock et al. (2015), who attributed modest to negative gains (0.83 ≤ RE ≤ 1.27) to possible overfitting.

RE
Estimating population parameters for multiple attributes of interest is an important concern in forest inventory applications ranging from local-scale stand assessments to regional and national multi-resource inventories (Babcock et al., 2013;Lochhead et al., 2018). Multivariate F-H and spatial F-H approaches have been developed and incorporated into statistical software, but their application to forest inventory has been limited (Molina and Marhuenda, 2015;Benavent and Morales, 2016). While multiple domain-specific estimates can be obtained using SAE in separate modeling procedures for each attribute, doing so fails to maintain logical consistencies among estimates, and overlooks potential gains in efficiency that may otherwise be realized by accounting for cross-attribute correlations Coulston et al., 2021). Generic inference in model-assisted design-based estimation affords consistency in multivariate estimates, but may come at a cost of increased standard errors in attributes uncorrelated or weakly correlated with selected auxiliary variables (McConville et al., 2020). Nearest-neighbor approaches have been used successfully in multivariate forest inventory settings, including multivariate model-assisted estimation to improve estimator efficiencies while preserving consistency among estimates (Chirici et al., 2016;McRoberts et al., 2017). Hierarchical Bayesian multivariate methods for SAE have been demonstrated for both unit-level and area-level settings suitable for forest inventory applications (Datta et al., 1998;Arima et al., 2017). Recent advances have also increased computational efficiencies for Bayesian analyses that can be applied to multivariate SAE involving very large data sets, expanding opportunities for further advances in this area (Finley et al., , 2017Datta et al., 2016;Babcock et al., 2018).
Another development in SAE applications for forest inventory aims at modeling sample and observational units at a level approaching that of individual trees rather than forest plots or larger subpopulation domains (Naesset, 2002;Mauro et al., 2016). Frank et al. (2020) tested a semi-individual tree (s-ITC) model approach-analogous to a unit-level approach-by segmenting tree crowns in lidar point clouds and delineating s-ITC units around them. Compared to a unit-level approach the s-ITC model showed a RE = 1.04 for volume, RE = 0.71 for basal area, RE = 1.38 for stem density, and RE = 0.48 for quadratic mean diameter. Despite attaining similar precision as plot-based unit-level EBLUPs for volume estimates at the population level, the s-ITC approach showed potential for its increased spatial resolution and ability to estimate population parameters more closely related to individual trees, such as mean diameter .
Applications of enhanced estimators such as model-assisted and SAE are not limited to cases where estimates per unit of land area are needed. Affleck and Gregoire (2015) compared estimation of crown biomass using randomized branch sampling to provide sample data for GREG. In their examination of a univariate estimator, model-assisted estimation led to improvement in the precision of estimates across a range of simulated sample sizes in randomized branch sampling (n = 5, 10, 20 and RE = 0.54, 0.71, and 0.97, respectively), although the authors cautioned against the possible trade-off between precision and design-unbiasedness (Affleck and Gregoire, 2015). Gains in precision were not limited to one sampling scheme. Increased precision was also seen for n = 5 based on other branch sampling methods: probability proportional to size sampling (RE = 0.94), simple random sampling (RE = 0.28), and stratified random sampling (RE = 0.28). The potential gains in biomass estimation at the tree level demonstrates how SAE may prove a useful tool in other contexts than estimating forest inventory attributes for small geographic areas.

CONCLUSIONS
Small area estimation (SAE) is a growing area of research in forest inventory owing largely to its ability to support modelbased inference in small area domains lacking sufficient sample data to provide stable estimates using purely design-based estimation. A variety of modeling techniques can be employed in the SAE framework with linear mixed modeling among the most widely used to-date. A unifying requirement is the use of auxiliary data that allows estimators to both borrow strength from indirect sample data that co-vary with auxiliary observations and to provide auxiliary population parameters as predictors in synthetic models for composite estimation. The availability of large data sets like those collected in NFIs, along with increasingly available auxiliary data such as DAP, ALS, satellite remote-sensing, or other digital map products has made SAE of particular interest to forest inventory specialists. Increases in precision from SAE can provide efficient alternatives to sample intensification when insufficient sample data are available to meet needed tolerances for estimator error.
The examples summarized here demonstrate potential benefits of SAE, along with some limitations researchers have encountered in applying evolving model-assisted design-based estimators or composite estimators that lie within unit-level or area-level SAE frameworks. As with any model-based approaches considerable attention should be paid to model assumptions including distributional assumptions for residuals, correct model form (e.g., linear vs. non-linear), careful selection of model predictors, and accounting for correlation structures among synthetic model residuals. Challenges may arise from a lack of correspondence between sample and auxiliary data such as when precise pairing of (x, y) observations is impractical, or when auxiliary data are unavailable for specific areas or time periods of interest. Consideration of alternative approaches including design-based and model-assisted estimation, area-level, or unitlevel (model-based) SAE is needed to ensure suitability of methods given inferential needs. Many topics for further research have been identified in the literature reviewed here, pointing out opportunities for future improvement in forest inventory applications of SAE. Although no tool can address needs in all circumstances, methods like those reviewed here provide flexible, efficient alternatives to reduce the need for sample intensification in many cases and to meet tolerance specifications in others at reduced cost by increasing estimator efficiency.

AUTHOR CONTRIBUTIONS
GD, PR, and JC: conception, design, and outline. GD and PR: literature review and document drafts. JC, PG, BW, and GM: reviewed drafts, provided corrections and revisions, discussion of needed changes, additions, and redesign. All authors contributed to the article and approved the submitted version. this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2022 Dettmann, Radtke, Coulston, Green, Wilson and Moisen. This is an open-access article distributed under the terms of the Creative Commons
Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.