Prior Specification for More Stable Bayesian Estimation of Multilevel Latent Variable Models in Small Samples: A Comparative Investigation of Two Different Approaches

Bayesian approaches for estimating multilevel latent variable models can be beneficial in small samples. Prior distributions can be used to overcome small sample problems, for example, when priors that increase the accuracy of estimation are chosen. This article discusses two different but not mutually exclusive approaches for specifying priors. Both approaches aim at stabilizing estimators in such a way that the Mean Squared Error (MSE) of the estimator of the between-group slope will be small. In the first approach, the MSE is decreased by specifying a slightly informative prior for the group-level variance of the predictor variable, whereas in the second approach, the decrease is achieved directly by using a slightly informative prior for the slope. Mathematical and graphical inspections suggest that both approaches can be effective for reducing the MSE in small samples, thus rendering them attractive in these situations. The article also discusses how these approaches can be implemented in Mplus.

Bayesian approaches for estimating multilevel latent variable models can be beneficial in small samples. Prior distributions can be used to overcome small sample problems, for example, when priors that increase the accuracy of estimation are chosen. This article discusses two different but not mutually exclusive approaches for specifying priors. Both approaches aim at stabilizing estimators in such a way that the Mean Squared Error (MSE) of the estimator of the between-group slope will be small. In the first approach, the MSE is decreased by specifying a slightly informative prior for the group-level variance of the predictor variable, whereas in the second approach, the decrease is achieved directly by using a slightly informative prior for the slope. Mathematical and graphical inspections suggest that both approaches can be effective for reducing the MSE in small samples, thus rendering them attractive in these situations. The article also discusses how these approaches can be implemented in Mplus.
Keywords: Bayesian estimation, Markov chain Monte Carlo, multilevel modeling, structural equation modeling, small sample As van de Schoot et al. (2017) pointed out, the number of applications of Bayesian approaches is growing quickly, mainly because software that is easy to use such as Mplus ) is providing Bayesian estimation as an option. Bayesian approaches can be beneficial in several respects, for example, by offering greater flexibility (e.g., Hamaker and Klugkist, 2011;Muthén and Asparouhov, 2012;Lüdtke et al., 2013) or fewer estimation problems (e.g., Hox et al., 2012;Depaoli and Clifton, 2015;Zitzmann et al., 2016), particularly when latent variable models are estimated. One major difference between Bayesian and traditional Maximum Likelihood (ML) estimation is that the former not only uses the information from the data at hand (i.e., the likelihood function) but combines it with additional information from what is called the prior distribution. Inferences are based on the result of this combination, that is, the posterior distribution. Scholars have advised researchers against the use of default priors in an automatic fashion and have encouraged them to specify priors on their own (e.g., McNeish, 2016;Smid et al., 2020). This may be an obstacle to some researchers. However, the prior can also be considered a feature of Bayesian estimation that can be used to improve estimation by choosing a favorable prior-a task that is particularly challenging but also particularly worth pursuing when the sample size is small.
The choice of prior has received a lot of attention in the methodological literature (e.g., Natarajan and Kass, 2000;Gelman, 2006;Chung et al., 2013), and scholars have made different suggestions about how priors can be specified in advantageous ways. Only recently, Smid et al. (2020) discussed how priors can be "thoughtfully" constructed on the basis of previous knowledge about the parameter of interest (e.g., on the basis of a previous study or a meta-analysis) in order to reduce small-sample bias. However, it has been argued that the variability of an estimator should not be ignored when evaluating the quality of a method (e.g., Greenland, 2000;Zitzmann et al., 2020), particularly when the sample size is small. Therefore, other suggestions for specifying the prior have been aimed at reducing the Mean Squared Error (MSE), which combines bias and variability: MSE = bias 2 + variability. One such approach was proposed by Zitzmann et al. (2015), who focused on the between-group slope in multilevel latent variable modeling. The authors suggested that researchers should suitably modify the estimator of the group-level variance of the predictor variable because this will result in a more stable (i.e., more accurate) estimator of the slope. To this end, a slightly informative prior is specified for the group-level variance of the predictor to pull the variance estimates away from zero (i.e., the indirect approach). By doing so, the estimates of the slope will not be too large, and the MSE of the estimator of the slope will be reduced. Notably, in contrast to Smid et al.'s (2020) suggestion, the prior does not need to match previous knowledge or the true value of the parameter in the population. Rather, an incorrect prior whose location deviates from the parameter in the population might reduce the MSE even more than a correct prior will. Zitzmann et al. (2015) found that for a standardized predictor (standardized at Level 1), a slightly informative inverse gamma prior for the group-level variance provided a somewhat biased but much more accurate (because it had a smaller MSE) estimator in small samples. Alternatively, to reduce the MSE of the estimator of the slope, one can specify a slightly informative prior directly for the slope in order to shrink the estimates and thereby ensure that they will not be too large (i.e., the direct approach).
In the present article, we mathematically work out the idea behind the direct approach for a simple multilevel latent variable model, and we contrast this approach with the indirect approach and with ML. Then, we graphically show the benefits that both approaches have over ML when the sample size is small. Finally, we discuss how these approaches can be implemented in Mplus.

EXAMPLE MODEL
Before we go into detail, we present an example model that we will use later to illustrate the different strategies. The model was suggested by Lüdtke et al. (2008) as one way to yield (asymptotically) unbiased estimates of between-group slopes in contextual studies (see also Asparouhov and Muthén, 2019). To this end, on the group level in the model, the dependent variable Y is predicted by a latent variable (i.e., the latent group mean) instead of the unreliable manifest group mean of the predictor variable, which is why the model was named the multilevel latent covariate model (Lüdtke et al., 2008). Such latent group means have become part of many more complex multilevel structural equation models that are commonly applied in research practice (see Preacher et al., 2010Preacher et al., , 2016, for overviews of such models).
More specifically, the individual-level predictor X splits into two uncorrelated and normally distributed parts: a betweengroup part X b , which is the latent group mean, and a withingroup part X w , which is the individual deviation from X b . For a person i = 1, . . . , n in group j = 1, . . . , J, the decomposition thus reads: X b,j is distributed around µ X with variance τ 2 X , whereas the deviation X w,ij has variance σ 2 X . Hereafter, we will also call σ 2 X and τ 2 X the within-group and between-group variances of X, respectively.
Applying Raudenbush and Bryk's (2002) notation, the regression at the individual level reads: where β w is the (fixed) within-group slope that describes the relationship between the predictor and the dependent variable at the individual level, and the ε ij are normally distributed residuals. The residual variance is σ 2 Y . At the group level, the intercept β 0j is regressed on X b : where α is the overall intercept, and β b is the between-group slope (i.e., the relationship between X and Y at the group level). The δ j are normally distributed residuals with variance τ 2 Y . Here, we focus on the between-group slope (β b ), which is of great interest in many applications of multilevel models (e.g., in the analysis of contextual effects). When the data are balanced (i.e., equal numbers of persons per group), the ML estimator of β b is given by: whereτ 2 X andτ YX are sample estimators of the grouplevel variance of X and the group-level covariance of X and Y, respectively. Some statistical properties of the ML estimator in Equation 4 need to be discussed first to be able to compare this estimator with the Bayesian estimators later on. First, by using the firstorder Taylor expansion (e.g., Casella and Berger, 2001; see also Grilli and Rampichini, 2011) and ignoring terms involving higher order factors such as 1 n 2 (n−1) or 1 n 2 for better readability, then the bias ofβ b can roughly be approximated by: This equation could be simplified further, but we continue to use this expression here to emphasize formal similarities with the biases of the Bayesian estimators (see below). However, even in its current form, it is evident from Equation (5) that the bias critically depends on the sample size (because J occurs in the denominator) and that the bias is generally non-zero in small samples. However, if we let J become large, the bias diminishes because 1 J−1 becomes very small-a property of the estimator that we refer to as "asymptotic unbiasedness." In a similar way, we can yield an approximation of the variability ofβ b : Similar to the bias, the variability depends on the sample size in such a way that the variability will be small when J is large. Because the MSE ofβ b is the sum of the squared bias and the variability, this measure will be small as well. Taken together, the more information the data provide, the more the overall accuracy of the estimator improves. Whereas the asymptotic properties are favorable, the ML estimator tends to be biased in small samples, and it has high variability and thus a large MSE in these situations (e.g., McNeish, 2017). This challenges the usefulness of the ML estimator when the sample size is small because the result from a single study might be highly inaccurate. Therefore, scholars have called for alternative estimators that are less variable and thus more accurate (i.e., they have a smaller MSE), although they might be more biased than ML. In the multilevel literature, such estimators have been suggested by Chung et al. (2013), Greenland (2000), Grilli andRampichini (2011), andZitzmann et al. (2015), for example. Next, we develop the direct strategy, and recap the indirect strategy of specifying the prior.

THE DIRECT STRATEGY
We refer to the first strategy as the direct strategy because the prior is specified directly for the between-group slope (β b ). To illustrate, we assume a normal prior, which can be formalized as: which should be read as "β b is normally distributed with mean a and variance b." However, for better interpretability, we employ another, more convenient parameterization. Instead of a and b, we use the terms β 0 and As we will show, β 0 and ν 0 can be meaningfully interpreted. One way of expressing the likelihood for the slope is: whereτ 2 Y andτ 2 X are the sampling variances of τ 2 Y and τ 2 X , respectively. If we combine the prior in Equation (9) with the likelihood, we obtain the following posterior: which is also a normal distribution. The mean of this distribution defines the Bayesian Expected A Posteriori (EAP) estimator, which is the standard choice for a point estimator in Bayesian estimation (Note that the Bayes module in Mplus uses the median of the posterior). With w = J ν 0 +J , this Bayesian estimator can also be expressed as: As can be seen from the equation, the estimator is simply the weighted average of the mean of the prior (β 0 ) andβ b , which suggests straightforward interpretations for the parameters of the prior. One may think of β 0 as the prior guess for the betweengroup slope and ν 0 as the prior sample size (see also Hoff, 2009). These interpretations are substantiated by the observation that the larger ν 0 , the smaller w, and the more the estimates shrink toward β 0 . Less technically speaking, when we are more confident in β 0 , the prior will gain more weight, and the posterior will shift to the mean of the prior. However, when we choose ν 0 to be very small, w will be close to 1, andβ b will be similar toβ b , which justifies the view that the modified estimator includes the original ML estimator as a limiting case. Notice that the prior guess does not need to represent previous knowledge about β b . Rather, it could be set to a value that is much smaller than what previous studies have suggested and also much smaller than the parameter in the population. However, such an "incorrect" prior guess might still be beneficial, particularly when the sample size is small. To be able to compare the properties of the Bayesian estimator with the ML estimator and with the Bayesian estimator from the second strategy of specifying the prior, we again use the Taylor expansion, and we ignore terms involving higher order factors. A rough approximation of the bias ofβ b is then given by: Frontiers in Psychology | www.frontiersin.org Similar to the ML estimator,β b is generally biased when the sample size is small. However, the bias vanishes when J approaches infinity because w approaches 1, and 1 J−1 approaches 0 (asymptotic unbiasedness). Moreover, if ν 0 is set to a value close to 0, the bias will become similar to the bias ofβ b .
The variability ofβ b can be approximated as: With a very large J, the variability becomes negligibly small, and the same holds for the MSE. However, the more interesting questions are: How does the MSE ofβ b depend on the prior parameters (β 0 , ν 0 ) when the sample size is small, and how must they be chosen such that the MSE will be smaller than the MSE of ML? Before we compare the different choices for (β 0 , ν 0 ), we present another strategy for specifying the prior. Alternatively to specifying the prior directly for the between-group slope, one can also specify a prior for the group-level variance of the predictor, thereby also modifying the estimator of the slope. We call this strategy the indirect strategy.

THE INDIRECT STRATEGY
The principle that underlies the indirect strategy was discovered in the early years of Structural Equation Modeling (SEM), where models were fit on the basis of the variances and covariances of variables. One observation was that when the sample size was small, covariance matrices tended to be on the border of positive definiteness (e.g., a variance estimate close to 0, correlations close to −1 or 1; e.g., van Driel, 1978;Dijkstra, 1992;Kolenikov and Bollen, 2012). Hence, estimators of slope parameters tended to have high variability and thus also a large MSE. This led Yuan and Chan (2008) to develop the ridge technique to mitigate such problems as it modifies the estimator of the covariance matrix by adding a small value to the main diagonal (see also Yuan and Chan, 2016;Yang and Yuan, 2019). The main idea behind this technique can also be adapted for Bayesian estimation. Papers by Chung et al. (2013), Chung et al. (2015), or Zitzmann et al. (2015) are good examples of this. By means of simulation, Zitzmann et al. (2015) verified that specifying a slightly informative prior for the group-level variance of the predictor that pulls estimates of this variance slightly away from zero can increase the accuracy of the estimator of the between-group slope by reducing its MSE. Note that pulling the variance estimates away from zero corresponds to adding a value to these estimates. A formal argument for why such a prior reduces the MSE was only recently presented by Zitzmann et al. (2020). For reasons of completeness and comparability with the two previously presented estimators, we illustrate the strategy here once more, using the example model from above. Rather than beginning with the assumption of a normal prior for the between-group slope, we begin with a gamma prior for the inverse of the group-level variance of the predictor variable (τ 2 X ): where a and b are the parameters of the gamma distribution. 2 Equation (15) reads "τ 2 X is inverse-gamma distributed." As for the normal prior in the previous section, we employ a reparameterization for better interpretability later on. If we set a to ν 0 2 and b to ν 0 τ 2 0 2 , the prior reads: where, as we will show, τ 2 0 and ν 0 have interpretations similar to those of the parameters of the (reparameterized) normal prior.
The likelihood for the inverse of the group-level variance can be written as: whereτ 2 X is the sample variance. Combined with the prior in Equation (16), we yield the inverse gamma posterior: As Zitzmann et al. (2020) showed in their Appendix C, the mean of this distribution can be approximated as: where w = J ν 0 +J . This equation defines the Bayesian EAP estimator of τ 2 X . It is interesting to note that the equation resembles Equation (12). The right-hand side of the equation is also a weighted average, and τ 2 0 and ν 0 can be thought of as the prior guess and the prior sample size, respectively (see Hoff, 2009;Lüdtke et al., 2018;Zitzmann et al., 2020).
Adding a prior for τ 2 X also has consequences for the estimator of the between-group slope (β b ). Replacing the denominator in Equation (4) (τ 2 X ) withτ 2 X results in: This new estimator is indicated by a tilde (∼) in order to better differentiate it from the ML estimator and from the Bayesian estimator that results from the direct strategy of specifying the prior (Equation 12).
To derive some properties ofβ b , we apply exactly the same reasoning that led to the respective properties ofβ b andβ b . Accordingly, the bias ofβ b is roughly: where f is used as an abbreviation for the ratio Notice that the equation implies thatβ b is generally biased when the sample size is finite, whereas the bias diminishes when J approaches infinity (asymptotic unbiasedness). Moreover, the bias becomes similar to the biases ofβ b andβ b when we let ν 0 become very small. The variability ofβ b is: (22) Similar to the previous equation, we can easily infer that when J is large, the variability will be small and, thus, the MSE, which combines bias and variability, will be small as well-an observation that once again demonstrates the role of the sample size in determining the accuracy of estimations. However, as mentioned above, it is much more interesting to ask how the prior parameters τ 2 0 and ν 0 must be chosen such that the MSE will be reduced in comparison with ML in small samples.

COMPARING THE MSEs IN SMALL SAMPLES
In this section, we investigate the MSE of the different strategies for specifying priors in small samples for different choices of the prior parameters, using the example model from above to simulate data that are typical in psychology. Because it is difficult to infer from the equations how the MSEs compare with each other, they were plotted against the sample size to allow for graphical comparisons.
In accordance with Lüdtke et al. (2008), we considered the case of standardized variables (standardized at Level 1), and 3 We would like to state that we recognized a typo in the bias formula of Zitzmann et al.'s (2020) original publication. There should be a minus sign in front of the first term of the curly-bracketed expression. Equation (21) presents the corrected formula. However, the numerical results on which Figure 3 in Zitzmann et al. (2020) was based were not affected by the typo because these results were generated from formulas that were correct and also provided even more precise approximations than the ones presented in the article (because terms with higher order factors were not omitted).
we assumed that the between-group slope (β b ) was 0.7 in the population. Moreover, we set the number of persons per group (n) to 5, which is not uncommon in many subdisciplines of psychology, including organizational, personality, and social psychology. The ICC of the predictor was 0.1, which could be considered small-to medium-sized compared with typical ICCs (Snijders and Bosker, 2012;Zitzmann et al., 2015). The sample size at the group level (J) was varied from 20 to 60 groups because these numbers represent small sample sizes (e.g., Hox et al., 2012; see also Hox et al., 2010) and the aim was to compare the estimators in these situations. Figure 1 depicts a normalized version of the MSE, the Root Mean Squared Error (RMSE), for five different estimators of the slope. The first estimator in the figure is the ML estimator (solid black line). The second estimator (blue dashed line) is the Bayesian estimator that results when the direct strategy is combined with a correct prior for β b (i.e., the prior guess, β 0 , equals the parameter in the population). Because β b was 0.7 in the population, a correct prior for β b was specified by setting β 0 equal to this value. The third estimator (blue dotted line) also resulted from the direct strategy. However, β 0 was set to 0 (and thus well below 0.7) in order to shrink estimates that were too large toward zero. The fourth estimator (red dashed line) resulted from the indirect strategy with a correct prior for the group-level variance of the predictor (τ 2 X ). The prior guess (τ 2 0 ) was set to 0.1, which was the value of τ 2 X in the population. 4 The fifth estimator (red dotted line) resulted from the indirect strategy as well. However, β 0 was set to 1, which was above the parameter in the population. Thus, estimates of the variance were pulled away from zero, and, therefore, the estimates of the slope were shrunken. The three different panels of Figure 1 show the RMSEs for different values of ν 0 : 0.1 (upper left), 1.0 (upper right), and 5.0 (lower left). The first two values can be considered choices that are only slightly informative, whereas the latter is more informative and was used here to illustrate what happens to the RMSE when the priors become more informative.
As can be seen in the Figure 1, the different estimators tended to provide different RMSEs. The RMSE was largest for the ML estimator, whereas the RMSE was reduced when a Bayesian estimator was used. The reduction was particularly pronounced when J was very small. In addition and more important, the extent of the reduction also depended on the strategy for specifying the prior and the choices for the prior parameters. Although the direct strategy reduced the RMSE overall, the RMSE was slightly smaller when this strategy was combined with an incorrect prior (i.e., β 0 = 0) than with a correct prior (i.e., β 0 = 0.7). Moreover, the choice of a larger ν 0 was associated with a smaller RMSE. However, the smallest RMSEs emerged when the indirect strategy was used with an incorrect prior (i.e., τ 2 0 = 1, which was also the upper bound of τ 2 X due to standardization). With a larger value of ν 0 = 1, the RMSE was reduced relative to a ν 0 of 0.1. However, setting ν 0 to 5 did not yield an RMSE that was even smaller. Rather, the RMSE was slightly larger than with a ν 0 of 1 because the bias induced by the prior outweighed the variability in the computation of the RMSE. Additional results FIGURE 1 | The analytically derived Root Mean Squared Error (RMSE) in estimating the between-group slope for the direct and the indirect approach as a function of the sample size at the group level (J) and the prior distribution. Results are shown for n = 5 persons per group and an intraclass correlation of ICC = 0.1. ML, maximum likelihood; direct, the prior was specified directly for the between-group slope; indirect, the prior was specified for the group-level variance of the predictor variable; correct, correct prior (i.e., the prior guess equaled the value of the parameter in the population); incorrect, incorrect prior (i.e., the prior guess deviated from the parameter in the population); ν 0 , prior sample size.
are presented in the Appendix. Figure A1 shows the RMSEs of the different estimators for a larger number of 10 persons per group, whereas Figure A2 shows the RMSEs for a higher ICC of .2. Although the RMSEs were smaller in Figures A1, A2 compared with Figure 1, the big picture was similar overall: The different estimators provided different RMSEs. All Bayesian estimators provided smaller RMSEs than the ML estimator in very small samples except the indirect strategy with an incorrect informative prior.
To sum up, both strategies for specifying the prior offer attractive ways to obtain more accurate estimators of the between-group slope in small samples when used with slightly informative priors. Especially when no previous knowledge exists about the parameters, the choice of a relatively small prior guess for the between-group slope or a relatively large prior guess for the group-level variance of the predictor could be useful when these choices are combined with a small ν 0 in the low onedigit range. Although somewhat biased, the resulting Bayesian estimators of the slope were found to be more accurate than ML when the sample size was small.

DISCUSSION
It has been argued that Bayesian approaches can be beneficial when the sample size is small because prior distributions can be used to increase estimation accuracy. In the present article, we focused on the between-group slope because this parameter is often of interest in multilevel latent variable modeling. Two approaches for specifying priors can be distinguished, both of which are aimed at reducing the MSE of the estimator of the between-group slope: In the first approach, a slightly informative prior is specified directly for the slope, whereas in the indirect approach, the MSE is reduced by using a slightly informative prior for the group-level variance of the predictor variable. In the present article, we worked out the former approach mathematically and compared it with the indirect approach and with ML. Graphical inspections suggested that both approaches can be very effective in reducing the MSE compared with ML in small samples, rendering them attractive for researchers. We would like to add that these approaches are not mutually exclusive and that researchers can also apply them simultaneously by specifying slightly informative priors for the slope as well as for the group-level variance of the predictor variable. To provide initial information about how such a simultaneous application of the two approaches performs, we conducted an additional simulation study with 20 to 60 groups, 5 persons per group, and an ICC of the predictor variable of 0.1. Figure 2 depicts the RMSE for five different estimators of the slope. The first estimator is the ML estimator (solid black line). The second estimator (blue dashed line) is the Bayesian estimator that resulted when the direct strategy and the indirect strategy were simultaneously applied and combined with correct priors for the between-group slope and the group-level variance of the predictor, respectively. The third estimator (blue dotted FIGURE 2 | The simulated Root Mean Squared Error (RMSE) in estimating the between-group slope for the combined approach as a function of the sample size at the group level (J) and the prior distribution. Results are shown for n = 5 persons per group and an intraclass correlation of ICC = 0.1. ML, maximum likelihood; correct/correct, correct priors (i.e., the prior guesses equaled the values of the parameters in the population) were specified for the between-group slope and the group-level variance of the predictor variable; correct/incorrect, a correct prior was specified for the between-group slope, and an incorrect prior (i.e., the prior guess deviated from the parameter in the population) was specified for the group-level variance of the predictor variable; incorrect/correct, an incorrect prior was specified for the between-group slope, and a correct prior was specified for the group-level variance of the predictor variable; incorrect/incorrect, incorrect priors were specified for the between-group slope and the group-level variance of the predictor variable; ν 0 , prior sample size. line) also resulted from combining the two strategies. However, whereas the direct strategy was combined with a correct prior, the indirect strategy was combined with an incorrect prior. The fourth estimator (red dashed line) resulted from simultaneously applying the direct strategy with an incorrect prior and the indirect strategy with a correct prior. The fifth estimator (red dotted line) resulted from the simultaneous application of the two strategies as well. However, both strategies were combined with incorrect priors. The three different panels of the figure show the RMSEs for different values of the prior sample size. Again, the RMSE was largest for the ML estimator, whereas the RMSE was reduced when a Bayesian estimator was used, particularly when this estimator was combined with slightly informative priors and the sample size was small. Thus, the overall finding from this simulation confirmed that a simultaneous application of the two approaches (i.e., specifying slightly informative priors for the slope as well as for the group-level variance of the predictor variable) can also be beneficial. However, because the consequences of such a use could not be studied exhaustively here, it would be interesting to conduct a more thorough simulation on this topic in future research.
Although our findings were generally favorable and could be considered a successful "proof of concept, " a word of caution is nevertheless needed. Our demonstrations were very limited. For example, the specific conditions we studied do not completely reflect real data. Future research should consider a wider range of conditions for more conclusive findings. Moreover, the example model we used was overly simple. Realistic models typically involve more than one predictor and also multiple indicators per construct. However, one can derive the Bayesian estimators analogously in this more general multivariate case. Zitzmann (2018) even showed that in a multilevel SEM with two latent predictors with three indicators each, a slightly informative inverse Wishart prior for the covariance matrix of the predictors led to more accurate estimators of the between-group slopes, particularly when the samples size was small. Finally, the MSEs of the estimators we derived were only rough approximations. These approximations can nevertheless be useful for deriving hypotheses about which prior works well under which condition.
Before we come to Mplus, we wish to acknowledge that parameter stabilization does not require Bayesian estimation. In fact, the idea of using slightly informative priors is similar to using techniques from the frequentist framework (Hastie et al., 2009). For example, the weighting parameter (w) of the Bayesian estimator in Equation (12) has an effect similar to that achieved by the penalty in regularized SEM (e.g., Jacobucci et al., 2016), and the weighting parameter in Equation (19) corresponds with the tuning parameter in ridge generalized least squares (e.g., Yuan and Chan, 2016) and regularized consistent partial least squares estimation (e.g., Jung and Park, 2018). Despite the existence of these methods, we employed Bayesian estimation here for reasons of convenience and because this type of estimation is an option in Mplus, which is the software that many researchers use to fit multilevel latent variable models.
Mplus does not use Bayesian estimation as the default, and users must request it by setting ESTIMATOR to BAYES. Next, to yield a more accurate estimator of the between-group slope by using a slightly informative prior for this parameter, users must specify such a prior manually. In Mplus, normal priors are parameterized as in Equation (8), where a is the mean and b is the variance. Thus, to specify a normal prior with the prior guess (β 0 ) and the prior sample size (ν 0 ) equaling 0 and 1, respectively, users must compute a and b first. Given a = β 0 and b = τ 2 Y ν 0 τ 2 X , we yield an a of 0 and a b of τ 2 Y τ 2 X . Because τ 2 Y and τ 2 X are unknown, they need to be replaced with, for example, their sample estimates. Assuming that these estimates areτ 2 Y = 0.15 andτ 2 X = 0.1, then the prior is specified by the following line of code:

MODEL PRIORS:
Name of slope~N( 0, 1.5); Our findings suggest that this prior increases the accuracy of estimation in small samples. Choosing an even smaller value for b can also be useful in these situations. Alternatively, one could also specify a slightly informative prior for the group-level variance of the predictor. To be able to do this, users must compute the parameters a and b in Equation (15) because Mplus uses this parameterization of the inverse gamma prior. Setting both τ 2 0 and ν 0 to 1 results in a = b = 1 2 , using a = ν 0 2 and b = ν 0 τ 2 0 2 . The following code line implements the prior with Mplus:

MODEL PRIORS:
Name of variance~IG( 0.5, 0.5); For a standardized predictor, this prior is quite effective when the sample size is small. Specifying somewhat larger values (e.g., by setting ν 0 = 2) might increase estimation accuracy even further (Depaoli and Clifton, 2015).
To conclude, we worked out and discussed Bayesian approaches that perform better than ML in small samples, and we offered some practical guidance on how to implement these approaches with Mplus. We hope that this article will help researchers in the field of psychology move beyond using Bayesian estimation as "just another estimator" and will help them make choices that are beneficial when their aim is to fit multilevel latent variable models and the sample size is small.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
SZ: writing, mathematical derivations, and graphic design. CH: writing. MH: writing and lead. All authors contributed to the article and approved the submitted version. FIGURE A2 | The analytically derived Root Mean Squared Error (RMSE) in estimating the between-group slope for the direct and the indirect approach as a function of the sample size at the group level (J) and the prior distribution. Results are shown for a small number of n = 5 persons per group and an intraclass correlation of ICC = 0.2. ML, maximum likelihood; direct, the prior was specified directly for the between-group slope; indirect, the prior was specified for the group-level variance of the predictor variable; correct, correct prior (i.e., the prior guess equaled the value of the parameter in the population); incorrect, incorrect prior (i.e., the prior guess deviated from the parameter in the population); ν 0 , prior sample size.