Evaluating fit indices in a multilevel latent growth model with unbalanced design: a Monte Carlo study

This study informed researchers about the performance of different level-specific and target-specific model fit indices in the Multilevel Latent Growth Model (MLGM) with unbalanced design. As the use of MLGMs is relatively new in applied research domain, this study helped researchers using specific model fit indices to evaluate MLGMs. Our simulation design factors included three levels of number of groups (50, 100, and 200) and three levels of unbalanced group sizes (5/15, 10/20, and 25/75), based on simulated datasets derived from a correctly specified MLGM. We evaluated the descriptive information of the model fit indices under various simulation conditions. We also conducted ANOVA to calculated the extent to which these fit indices could be influenced by different design factors. Based on the results, we made recommendations for practical and theoretical research about the fit indices. CFI- and TFI-related fit indices performed well in the MLGM and could be trustworthy to use to evaluate model fit under similar conditions found in applied settings. However, RMSEA-related fit indices, SRMR-related fit indices, and chi square-related fit indices varied by the factors included in this study and should be used with caution for evaluating model fit in the MLGM.


Introduction
Social science researchers are often interested in understanding how characteristics of individuals or entities change over time (Wang et al., 2015).These characteristics could be observations about general behavior or overall academic performance, or they could be observations about specific constructs, such as depression, communication skills, attitudes toward teachers or parents, or math ability (Baumert et al., 2012).Longitudinal studies describe the changing pattern of characteristics of interest.Longitudinal studies also investigate the questions such as of how change comes about, how much change occurs, how the change process might differ across observations, and the determinants of that change over a set period.
If research questions consider both change over time and nested data, the use of Multilevel Latent Growth Model (MLGM) have been advocated as a method for analysis.MLGM, an multilevel structure equation model (SEM) extends the Latent Growth Model (LGM) model by accommodating the dependence between observations due to nested longitudinal data (Shi Pan and Liu 10.3389/fpsyg.2024.1366850Frontiers in Psychology 02 frontiersin.orget al., 2019).The nested longitudinal data include repeated measures for each individual nested within the groups, thus forming a threelevel structure.Based on previous research (e.g., Longford and Muthén, 1992;Linda et al., 1993), the three-level structure can be specified with a two-level model.In this two-level model, individual related parameters are estimated in the within-level model, and group related parameters are evaluated in the between-level model.MLGM can output different parameters for different levels, allowing researchers to separately study different levels.
Combining both the benefits of multilevel models and LGM, MLGM is ideally suited for addressing the research questions concerning multilevel longitudinal data (Hsu et al., 2016).A MLGM combines advantages of LGM (e.g., ability to incorporate indirect effects, complex measurement error structures, and multiple group analysis) while also correcting extent of clustering (Palardy, 2008).MLGM investigate both observations and group trajectories within one analysis.In MLGM, the individual level model and group level model have different latent intercepts and latent slopes, so individual level and group level can have different growth patterns (Rappaport et al., 2019).Further, MLGMs can include characteristics of both observation and group levels to explain the influence of various characteristics on: the change patterns of two levels, the change of measured attributes of observations within each group, and the change of all observations' measured attributes (Hsu et al., 2016).In addition, compared to LGM, which only considers the means of measured attributes of time points, MLGM measures both the means of different times points and the means of different groups; this can assist researchers' understanding of the overall status of the measured attributes of different groups.Muthén (1997) compared the model estimation results of MLGM, such as model fit indices and standard errors of parameters, to results of other SEM models.Compared to LGM and multilevel Confirmatory Factor Analysis, MLGM computed the best model evaluation information, indicating that MLGM was the appropriate hypothesized model for multilevel longitudinal data.
Figure 1 shows a two-level linear growth MLGM with four constant growth time points.IW (η1) represents the intercept of an individual's growth trajectory and LW (η2) represents the slope of an individual's growth trajectory.Y1-Y4 represent four continuous outcomes for individuals, and ε1-ε4 represent the degree of deviation between the observed outcome and the expected outcome of individuals.Λ represents the factor loading for individual-level; φ represents the factor variances and covariances for individual-level; θε represents the error variances and covariances for individual-level.η is the latent variable means for individual-level.IB (η1) represents the intercept of a group's growth trajectory and LB (η2) represents the slope of a group's growth trajectory.Y1-Y4 represent four continuous outcomes for groups, and ε1-ε4 represent the degree of deviation between the observed outcome and the expected outcome for groups.Λ represents the factor loading for group-level; φ represents the factor variances and covariances for group-level; θε represents the error variances and covariances for group-level.η is the latent variable means for group-level.Under this condition, both group-level and individual-level's matrices Λ will be fixed.The matrices φ and the matrix θε of both levels will be estimated.Typically, factor loadings of different levels are set to be equal to obtain unbiased parameter estimates and statistical inferences (Muthén, 1997).
The terms balanced and unbalanced are frequently encountered with longitudinal analysis approaches.A balanced design describes multilevel longitudinal data in which equal observations are planned to be measured at the different groups, whereas an unbalanced design occurs when the number of observations planned to be measured at each group is not the same.It is common to encounter an unbalanced design in empirical situations (Graham and Coffman, 2012).For example, states' educational policies may have a general requirement for the number of students in each class or school.The students in each class or school will fluctuate around this general number.Consider that policy states the number of students in each class to be 20; however, the actual number of students could be 18, 19, 20, or 21 per classroom.
When researchers are evaluating an MLGM, typical SEM model fit indices are relied upon and commonly accepted cutoff values (or "rules of thumb") are used for interpretation (e.g., Hu and Bentler, 1999).One common approach to evaluate MLGM is to use typical SEM model fit indices (e.g., the root mean square error of approximation [RMSEA], comparative fit index [CFI], Tucker-Lewis index [TLI], and standardized root mean square residual [SRMR]) to assess the model fit.However, there are problems with using typical SEM fit indices to judge the MLGM fit.The typical SEM fit indices are likely to be dominated by large sample size (Yuan and Bentler, 2000;Ryu and West, 2009;Hox et al., 2010) and are more sensitive to misspecification in the within-level model (Hsu et al., 2015).In MLGM, individual (i.e., within) level has much larger sample size than group (i.e., between) level.As a solution to the problems of typical SEM model fit indices when using MLGM, researchers have developed level-specific and target-specific model fit indices to detect whether the poor fit of the hypothesized MLGM comes from individual level model or group level model (Hsu et al., 2015).
There is little guidance for researchers interested in using level-specific and target-specific model fit indices for unbalanced MLGM.With balanced data, as each cluster has constantly measured attributes, one covariance and one mean structure could represent the relationship between subjects within each cluster.The mean structure of balanced data is calculated by summing the values of all individuals in the cluster to divide the fixed number of individuals (Distefano, 2016).However, as each cluster in unbalanced data do not have same numbers of subjects, one mean structure could not stand for mean structure of all clusters.Each cluster's covariance is also different due to different numbers of measured attributes.The non-constant covariance structure within each level may cause a severe concern, especially if separate trajectories for subjects and clusters are of interest (Grund et al., 2018).Different number of subjects in each cluster may cause misspecification of mean and covariance structures for each level required by the model estimation (e.g., Laird 1988;Little and Rubin, 2019) and result in low statistical power for overall MLGM estimation (Diggle, 2002).Therefore, this study aims to fill the gaps in these literature.There has not been any study of what happens under the 'best' circumstances (i.e., correctly specified model).In this study, a correctly specified MLGM was simulated considering two design factors: different group sizes and unbalanced observation sizes to investigate the performance of different model fit indices under these different conditions.A partially saturated model (PS model) has been proposed to obtain the level-specific fit indices (Ryu and West, 2009).A PS model means that in an MLGM, either a within-level model or a betweenlevel model is a saturated model.A PS model can be obtained by correlating all the observed variables and allowing all the covariances or correlations to be freely estimated at the between-level or withinlevel model.Ryu and West (2009) demonstrated that the PS method calculates level-specific fit indices with reasonable non-convergence rates and low Type I error rates.With a high non-convergence rate, a model fails to achieve equilibrium during analysis (Hox et al., 2010).Ryu and West (2009) indicated that PS model generated low non-convergence rate and was appropriate to generate level-specific fit indices.Using PS method, Ryu and West calculated the betweenlevel specific fit indices (PS_B) and within-level specific fit indices (PS_W), meaning that different levels will be evaluated by different fit indices (Ryu and West, 2009;Hsu et al., 2015).
Previous literature, investigating level-specific fit indices' performance, has been conducted for different multilevel SEMs (MSEM): multilevel confirmatory factor analysis, multilevel path models, multilevel nonlinear models, and MLGM (Ryu, 2014;Schermelleh-Engel et al., 2014;Hsu et al., 2016).Ryu and West (2009) simulated a multilevel confirmatory factor, indicating that within-level specific fit indices correctly indicated the within-level model's poor model fit, and between-level specific fit indices successfully detect the lack of fit in the between-level model.Based on Ryu and West (2009) results, Ryu (2014) illustrated the level-specific model evaluation using empirical data and provided recommendations to researchers interested in using level specific fit indices for MSEM.As an extension of the above two studies, Hsu et al. (2016) considered the impact of intraclass correlation coefficients (ICCs) on the performance of levelspecific fit indices in simulated MSEM.The ICC is defined as the ratio between group-level variance and total variance (Cohen et al., 2013).Hsu et al. 's results showed that the ICC does not significantly affect the effectiveness of between-level specific model fit indices.ICC did not influence all within-level fit indices.When ICC was very low, CFI PS_W and TLI PS_W can still detect the misspecification for between-level models, whereas SRMRB and RMSEA PS_W did not work.
Only one study to date has concentrated on the level-specific fit indices in MLGM (Hsu et al., 2019).In line with Wu and West (2010) study, this simulation study extended Wu and West (2010) single level LGM to a two-level MLGM model with the same accelerating quadratic trajectory and time points.The estimated MLGM had five time points, and each time point was assumed to be on a standardized scale (i.e., M = 0 and SD = 1).The parameter settings were simulated based on empirical data from the Longitudinal Surveys of Australian Youth (LSAY); Following Wu et al. (2015) simulation study, Hsu et al. (2019)'s study simulated the number of clusters (NC) as, 50, 100, 200, and cluster sizes (CS) were designed into three levels, 5, 10, and 20.The results showed that CFI-and SRMR-related fit indices were not affected by small NC or CS.The RMSEA-related fit indices were likely to be influenced by small NC or CS.TLI-related fit indices needed a moderate NC (100) and CS (10).The results also indicated that within-level specific fit indices, RMSEA PS_B , CFI PS_B , and TLI PS_B , were not sensitive to the misspecified between-covariance structure, whereas SRMR B was recommended to detect this misspecification.As for the misspecified between-mean structure, RMSEA PS_B , CFI PS_B , and TLI PS_B were suggested.Among them, RMSEA PS_B was recommended as it was found to be more sensitive to detecting misspecification.

Target specific fit indices for MLGM
In addition to level-specific fit indices, our research also evaluated the performance of target-specific fit indices.Target-specific fit indices for MLGM examine whether the misspecification comes from the covariance structure or the mean structure of between-level or withinlevel model (DiStefano et al., 2013).Hsu et al. (2019) extended the investigation of target-specific fit indices' performance from the context of LGM to MLGM.The authors outlined a practical way to compute the target-specific fit indices for the between-level covariance structure fit indices and the between-level mean structure fit indices.The target-fit indices for MLGM only need to be estimated at the between-level model.Because fixing the means of growth factors at zero, the misspecifications of the whole MLGM could only be attributed to the within-covariance structure (Muthén, 1997).The fit indices for between-level covariance structure (T_S_COV) include χ 2 T_S_COV , RMSEA T_S_COV , CFI T_S_COV , TLI T_S_ COV , and SRMR T_S_COV , and the fit indices for the between-level mean structure (T_S_MEAN) has χ 2 T_S_ Mean , RMSEA T_S_ Mean , CFI T_S_Mean , TLI T_S_Mean , and SRMR T_S_Mean .
Based on Wu and West (2010) and Ryu and West (2009) research, Hsu et al. (2019) generated T_S_MEAN fit by saturating the within-level model and the covariance structure of the between-level model.T_S_ COV fit indices were created by saturating the within-level model and the mean structure of the between-level model.The researchers studied the influence of the sample size, cluster size, and type of misspecification on the sensitivity of target-specific fit indices for MLGM.The results indicated that RMSEA T_S_COV , CFI T_S_COV , and TLI T_S_ COV showed higher sensitivity to misspecified between-variance structure than RMSEA PS_B , CFI PS_B , and TLI PS_B .In addition, the RMSEA T_S_COV yielded a higher sensitivity than the other two fit indices.χ 2 T_S_COV is also favored because of its high statistic power using for different sample size conditions.SRMR T_S_COV is not recommended when the cluster size is less than 5.As for a misspecified between-mean structure, RMSEA T_S_ Mean , CFI T_S_Mean and TLI T_S_Mean did not show a higher sensitivity than RMSEA PS_B , CFI PS_B , and TLI PS_B .Hsu et al. (2019) recommended researchers use RMSEA PS_B , CFI PS_B , and TLI PS_B to detect misspecified between structures.Both SRMR T_S_mean and SRMR B are not recommended because they had means and variances close to 0.

Unbalanced design for MLGM
As researchers can rely on level-specific and target-specific model fit indices to judge an MLGM model's acceptability, testing if the levelspecific and target-specific fit indices perform acceptably under MLGM with an unbalanced design is needed.Previous studies investigating the level-specific and target-specific model fit indices for MLGM have only examined a balanced design.Few studies concerning the MLGM model fit when unbalanced designs are present.Studies have used the effect of estimation direct maximum likelihood (direct ML) to address unbalanced issues in MLGM (Ryu and West, 2009).Direct ML conceptualizes the unbalanced design as a form of missing data.However, direct ML can only provide traditional model fit indices for MLGM and could not output levelspecific and target-specific model fit indices.
For a balanced MLGM with G balanced groups, each group has n observations.The total sample size N equals nG.The MLGM defines the within group covariance matrix as SPW and the between group covariance matrix as S*B.The formulas for the SPW & S*B covariance matrices are: In the above two equations, Y gi represents for the response for each observation, Y g represents the mean response of n observations in each group, and Y indicates for the mean response of all N observations in the data.
In an unbalanced MLGM situation, as groups have unequal numbers of individuals, SPW may still represent the within group covariance matrix because the SPW formula directly pools together all observations, regardless of group size.S*B, however, cannot represent the covariance matrix for each group because each group could have a distinct group size, n.Different S*B matrices will be calculated for each group.In this way, the aggregate covariance for unbalanced multilevel data no longer represents sufficient statistics for model estimation and may cause problems for model estimation.
Although unbalanced multilevel longitudinal data is common with many educational research applications, there is not yet a study that has investigated the influence of unbalanced multilevel longitudinal data on model fit indices of MLGM.Hox and Maas (2001) simulated an unbalanced multilevel data at one time point to investigate the performance of a multilevel confirmatory factor model with unbalanced data.The results indicated that unbalanced data had little impact on the accuracy of parameter estimates of the within level model.However, for the between level, the variances of model fit indices tended to be underestimated, so the standard errors of parameter estimates were too small to be accepted.As the Hox and Maas (2001)' investigation conducted with model at only one time point, results will not improve in a MLGM situation.Model estimation for MLGM, including model fit, parameters estimation, and standard errors, may be substantially biased if the unbalanced nature is not considered.
The following research questions were examined using a Multilevel Latent Growth Model with an unbalanced design.This study examined the performance of different level-specific and targetspecific model fit indices when evaluating unbalanced MLGM with different sampling errors (1) How are level-specific and target-specific fit indices impacted by sampling error and unbalanced design?(2) Do the level-specific and target-specific fit indices demonstrate reasonable sensitivity to sampling error and unbalanced design?
Based on previous research, parameter settings from the LSAY (Longitudinal Study of American Youth) was used to simulate the correctly specified MLGM model (Hsu et al., 2019).The parameters used for the population model are based on one MLGM study of LSAY, which contains 3,102 students from grade 7 to grade 11 nested within 52 schools (Hsu et al., 2019).In line with previous MLGM simulation studies (Wu and West, 2010), a five-wave MLGM model was measured in this research.The five-time points, denoted as V1-V5, were assumed to be continuous data distributed on the standardized scale (i.e., Mean = 0 and SD = 1).The intraclass correlation coefficients (ICCs) of five-time points ranged from 0.15 to 0.19, showing that cluster-level should not be ignored in this MLGM study (Hox et al., 2010).In the between-level model, the parameter settings for the mean structure and covariance structure are presented in matrices αB and ΦB, and mean structure and covariance structure in within model are presented in matrices αW and ΦW.According to Wu and West (2010) in the SEM framework, the general population quadratic model for the population does not consider the covariance between the intercept and slope and covariance between linear and slope.In this way, we set the nondiagonal values in the matrix at both between-level and withinlevel be zero for simplicity.
The error variances for five-time points of between level model are set to 11.91, 15.25, 10.32, 12.59, and 1.93 and are uncorrelated over time.The error variances for five-time points of within level model are set to 1.80, 1.28, 0.06, 0.54, and 0.31, and these scores are also uncorrelated over time (Kwon, 2011).
The estimation of all the population models was carried out in Mplus 7.11 (Muthén and Muthén, 2017), using maximum likelihood estimation with robust standard errors (ESTIMATOR = MLR).The maximum number of iterations were set to 100 (ITERATIONS = 100) with 95 convergence criterion set to 0.000001 (CONVERGENCE = 0.000001).MLR are robust to non-normality and non-independence of observations when used with TYPE = COMPLEX (Muthén and Muthén, 2017).Our simulated datasets contain small sample sizes, which were non-normal samples.The students of simulated datasets are nested within each cluster, meaning the datasets are non-independence.MLR was the appropriate estimator for Mplus (Pornprasertmanit et al., 2013).

Design conditions: NG and unbalanced GS
NG conditions were based on Wu et al. (2015) studies, and set at 50, 100, and 200.To maximize the effect of imbalance, the group sizes were chosen to be as different as possible.The highest number 200 conforms to Boomsma's (1983) recommended lower limit for achieving good maximum likelihood estimates with normal data.The lower values 50 and 100 have been chosen because, in empirical multilevel modeling, it is hard to collect data from as many as 200 groups (Hox and Maas, 2001).
As with Hox and Maas (2001) simulation study and the regression rule of thumb for multilevel research, each predictor requires at least 10 observations (Bryk and Raudenbush, 1992).The averages of unbalanced CS conditions are manipulated into three levels, 10, 20, and 50.Unbalanced data were simulated as follows (Grilli and Rampichini, 2011).In each level, we employ two distinct group sizes, with exactly half the groups being small and the other half being large.I assigned small to half of the groups and large to the other half.The two numbers (large and small) were computed under the restriction that the coefficient of variation Var nj is approximately 0.5 (Grilli and Rampichini, 2011).The coefficient of variation indicates the degree of imbalance of the design, and 0.5 is a moderate degree of imbalance.We choose a moderate imbalance as our GS is relative small, and we do not want extremely small unbalanced influence the evaluation outcomes.The large group size is three times as large as the small group size.For unbalanced GS average is 10, small size is 5, and large sample size is 15; For unbalanced GS average is 20, small size is 10, and large sample size is 30; For unbalanced GS average is 50, small size is 25, and large sample size is 75.These three levels of CS range are also consistent with two large-scale national educational databases: the Early Childhood Longitudinal Study (Tourangeau et al., 2009) and the Early Childhood Longitudinal Study (Tourangeau et al., 2015).
In this way, we have 9 total simulation conditions.Based on the recommendation that 1,000-5,000 replication is required to produce a stable result in Monte Carlo studies (Mundform et al., 2011), 1,000 complete datasets based on population model were generated for each simulation condition.SAS 9.4 was used to simulate the datasets.

Data analysis and outcomes
First we examined the descriptive information for model fit indices under different design factors.We also generated box plots to show the distribution of model fit indices under design factors.The second part of the analysis evaluated the sensitivity of both levelspecific and target-specific model fit indices to different design factors.ANOVA with an individual model fit index's values as the dependent variables were conducted to evaluate influence of design factors.For ANOVA, we calculated the effect size, eta-squared (η2), by dividing the Type III sum-of-square attributable to each design factor or the interaction between factors by the corrected total sum-of-square.η2 describes the proportion of the variability accounted for a particular design factor or interaction effect term.In this study, each simulation condition has the same number of simulated datasets, resulting in orthogonal design factors.In this way, the Type III sum-of-squares from different factors were additive and non-overlapping, meaning the η2 of each design factor could be calculated separately without considering other factors.Following Cohen (1988) study, we considered a moderate η2 of 0.0588 to identify practically significant design factors for the fit indices' values.Note that when a fit index had a standard deviation close to 0, the impact of design factors on the fit index were trivial, even though the η2 s were larger than 0.0588.As for our analysis, when fit indices have extremely low variability, we regarded design factors do not affect the fit indices.

Calculate level-specific and target-specific fit indices
χ 2 is commonly used because it is easier to compute than other model fit indices.It can also be used with categorical data and to check the if there is a "difference" between different groups of participants.Deviations from normality and small sample may result in poor χ 2 value even though the model is appropriately specified (Goos et al., 2013).For well-fitted models, cut off values of SRMR are supposed to be less than 0.05, and values as high as 0.08 are sometimes also deemed acceptable (Bentler and Dudgeon, 1996;Chou et al., 1998).However, when there are many parameters in the model and large sample sizes, SRMR also gives acceptable values even though the hypothesized model does not fit the dataset (Boulton, 2011).Unlike χ 2 and SRMR, RMSEA is not affected by the sample size, which means that RMSEA can still evaluate the model with small sample sizes (Clarke et al., 2008).Even though TLI is not affected significantly by the sample size, the TLI value can show poor fit when other fit indices are pointing toward good fit in models where small samples are used (Bentler, 1990;Boulton, 2011).CFI is relatively independent of sample size and yields better results for studies with a small sample size (Chen et al., 2012).
χ 2 PS_B can be obtained by specifying a hypothesized between model and saturating the within model (Hox et al., 2010).A saturated model can be seen as a just-identified model with zero degrees of freedom, and thus has a χ 2 test statistic equal to zero.As a result, χ 2 PS_B only reflect the model fit of the hypothesized between model (Hox et al., 2010).After χ 2 PS_B is obtained, between-level specific fit indices: RMSEA PS_B , CFI PS_B , TLI PS_B, SRMR B can be computed, because these fit indices are calculated based on χ2 PS_B .In the same way, within-level specific fit indices (χ 2 PS_W , RMSEA PS_W , CFI PS_W , TLI PS_W, SRMR W ) can be also be computed.Target-specific fit indices for the mean structure only (χ 2 T_S_MEAN , RMSEA T_S_Mean , CFI T_S_Mean , TLI T_S_ Mean , and SRMR T_S_ Mean ) can be derived by saturating the covariance structure of the between level of MLGM, whereas target-specific fit indices for the covariance structure only (χ 2 T_S_COV , RMSEA T_S_COV , CFI T_S_COV , TLI T_S_ COV , and SRMR T_S_COV ) can be derived by saturating the mean structure of the between level of MLGM.

Results
Under the various design conditions, the convergence rates over the 1,000 replications were 100% across all cells in the design.Thus, even under the smallest sample size [number of groups (NG) = 50, with an unbalanced group size (GS) = 5], the analysis was unlikely to encounter convergence problems.Results were summarized across all replications.Traditional cutoff criteria of the fit indices used with typical SEM studies (e.g., RMSEA <0.06; CFI and TLI >0.95; SRMR <0.08; Hu and Bentler, 1999) were examined with simulated model to determine if these recommended levels were able to accurately identify correct models across different number of groups and unbalanced group sizes.Table 1 summarizes the means and standard deviations of all fit indices.
Descriptive statistics in Table 1 showed that when NG increased from 50 to 200 and unbalanced GS increased from 5/15 to 25/75, χ 2 PS_B and χ 2 PS_W mean values and standard deviation values decreased.Both indices showed that the average estimated χ 2 PS_B and χ 2 PS_W approached the expected value (i.e., 4 degrees of freedom) when NG = 50 and unbalanced GS = 25/75.A total sample size over 1,250 was necessary for χ 2 PS_w and χ 2 PS_B to appropriately identify correct between-level and within-level models.For the χ 2 T_S_COV and χ 2 T_S_MEAN , the descriptive values mean of χ 2 T_S_COV and χ 2 T_S_MEAN did not approach acceptable model fit when total sample size increased.All means of RMSEA PS_B and RMSEA PS_W approached acceptable model fit (i.e., <0.06) and standard deviation values decreased as sample size increased.Also, all RMSEA T_S_COV mean values did not approach levels indicative of acceptable model under the tested sample sizes.For the RMSEA T_S_MEAN , values yielded an acceptable model, except under the smallest sample size condition (NG = 50, unbalanced GS = 5/15).The standard deviation of RMSEA T_S_COV and RMSEA T_S_MEAN decreased as sample sizes increased.Means of CFI-and TLI-related fit indices were indicative of good model fit (i.e., >0.95) under almost all simulation conditions.There was only one mean value, TLI T_S_COV , that yielded a value under the stated cutoff (NG of 50, unbalanced GS of 5/15).All means of SRMR B , SRMRw, and SRMR T_S_MEAN produced values indicating acceptable model fit (i.e., <0.08); however, means of SRMR T_S_COV were larger, approaching the cutoff of poor model fit under all simulation conditions.The standard deviations values of SRMRB and SRMRw were close to a value 0 and standard deviation of SRMR T_S_COV and SRMR T_S_MEAN were also small.

ANOVA review of conditions
To determine factors that affected model fit indices, a three-way ANOVA with 3 (NG: 50, 100, and 200) x 3 (unbalanced GS: 5/15, 10/30, and 25/75) levels was conducted, with each fit index as the outcome variable.The η2 for each design factor is presented in Table 2.We provide a visual representation of the influential design factors on the fit indices' values with boxplots in Figures 2, 3, respectively.
Based on η2 values in Table 2, only three indices: RMSEA PS_B , were impacted by NG, with η2 values of 0.06, 0.13, and 0.26, respectively.Further, the boxplots in Figure 2 showed that the RMSEA PS_B computed under all simulation conditions were large at lower sample sizes.As the NG increased, the median RMSEA PS_B decreased with values of 0.12, 0.04, to 0.001 associated with NG levels of 50, 100, and 200, respectively.The interquartile ranges also became smaller, indicating the values of RMSEA PS_B were less dispersed.
SRMR B and SRMR W also demonstrated large variability under the simulation conditions (shown in Figure 3).As the NG varied from 50, 100, to 200, the median SRMR B decreased from 0.007, to 0.005, to 0.0035 and the median SRMR W decreased from 0.0025, to 0.0018, to 0.001.The interquartile ranges of all SRMR B and SRMR W also became smaller.
Figure 4 demonstrated the variabilities of SRMR B and SRMR W computed under all simulation conditions were large.As the unbalanced GS increased, the median SRMR B decreased (with values of 0.007, 0.005, to 0.0032) as did the median SRMRW decreased (values of 0.0024, 0.0018, to 0.001) for unbalanced GS levels of 5/15, 10/30, to 25/75, respectively.The interquartile ranges of all SRMR B and SRMR W also became smaller, indicating the values of SRMR B and SRMR W were less dispersed.Both the η2 and boxplots showed SRMR B and SRMR W were affected by factor unbalanced GS.With increase of the unbalanced GS, SRMR B and SRMR W showed lower model fit values (Figure 5).

Discussion
In practice, given that researchers are not aware if the model is correctly specified, researchers are recommended to use χ 2 PS_B and χ 2 PS_W when the sample size is large enough (NG > 100 and unbalanced GS > 10/30).Based on our findings, researchers need to be aware that using χ 2 T_S_COV and χ 2 T_S_MEAN will cause the over-rejection of correctly specified unbalanced MLGM.Based on these findings, χ 2 T_S_COV and χ 2 T_S_MEAN are not recommended to evaluate unbalanced MLGM.These findings are different from balanced MLGM study, where Hsu et al. (2019) showed that all χ 2 test statistics, χ 2 PS_B , χ 2 PS_W , χ 2 T_S_COV , and χ 2 T_S_MEAN , can detect the misspecified balanced MLGM when NG is larger than 200 and GS is larger than 20.RMSEA PS_B and RMSEA PS_W are recommended to researchers to evaluate unbalanced MLGM with similar conditions used in the study.Based on our outcomes, RMSEA T_S_COV is not recommended to evaluate unbalanced     Based on the results, we recommend researchers to collect at least 50 for NG, regardless of the GS and if the MLGM design is unbalanced or balanced.When the NG is small, the amount of the sampling errors presents with the between-level related specific model fit indices will increase due to that small samples might commit a Type II error for χ 2 PS_B .The large sampling error causes some between-level or targetspecific fit indices for the between-covariance or between-mean structure to fail to identify a correctly specified between-level model.In this way, between-level related specific model fit indices, χ 2 PS_B , Box plot of RMSEAPS_B, RMSEAPS_W, and RMSEAT_S_COV values derived from correctly specified MLGM Models by unbalanced GS (5/15, 10/30, and 25/75).The combination of NG and unbalanced GS determine the total sample size in MLGMs and may also influence the performance levelspecific and target-specific model fit indices.Pervious research illustrated that total sample size highly influenced the values of fit indices because of the issue of total sample size discrepancy, that is, the difference between a sample covariance matrix and the covariance matrix of the population (Marsh et al., 2014;Wu et al., 2009;Wu and West, 2010).When the total sample size is small, the discrepancy between the sample covariance matrix and the population covariance matrix will increase, and the discrepancy between a sample covariance matrix and covariance matrix reproduced by a correctly specified model will also increase.The large discrepancy will cause the fit indices to indicate poor fit for a correctly specified model.In contrast to the between-model evaluation, both NG and unbalanced GS jointly determine the sample size of the within-level model and influence the performance of within-level specific model fit indices.As for the effect of unbalanced GS, χ 2 PS_W failed to identify correctly specified within-level model when the unbalanced GS was small (e.g., unbalanced GS < 10/30).When the unbalanced GS is small, the amount of the sampling errors calculated in within-level specific model fit indices increases.Based on the findings from our analysis and from previous research, we highly recommend applied researchers to collect average GS of 20 and consider NG when evaluating within-level related specific model fit indices, regardless of if the MLGM design is unbalanced or balanced.

Limitations and future research direction
In the study, we set the coefficient of variation constant throughout different group sizes, meaning that conclusions are limited to only one degree of imbalance.Future study need to vary the degree of imbalance to check the influence of imbalance on the evaluation of model fit indices.Misspecifications for the between and within models were not modeled.As there is not yet literature informing researchers about the performance of different level-specific and target-specific model fit indices in unbalanced MLGM, a correctly specified MLGM was simulated to fill this gap as a first step.As misspecifications in MLGM are possible, this aspect deserves systematic investigation in future simulation studies.By investigating different misspecifications, we can study the indices' sensitivity, which measures the extent to which specific fit indices could reflect the discrepancy between correctly specified models and misspecified hypothesized models.We expected desirable fit indices to demonstrate reasonable sensitivity to minor misspecifications and to be able to detect moderate misspecifications at both levels.Besides, in our study, we considered a few numbers of design factors.Other design factors, such as a different number of time points and trajectories, can be manipulated in future studies.
FIGURE 1Two-level multilevel latent growth model.
MLGM.Researchers are recommended to use RMSEA T_S_MEAN when the sample size is large enough (NG > 50 and unbalanced GS > 5/15).Except TLI T_S_COV , all the other CFI-related fit indices and TLI-related fit indices are recommended to evaluate unbalanced MLGM.Researchers are recommended to use TLI T_S_COV when the sample size is large enough (NG > 50 and unbalanced GS > 5/15).SRMR B , SRMRw, and SRMR T_S_MEAN are recommended to evaluate

FIGURE 3
FIGURE 3 Box plot of SRMRB and SRMRW values derived from correctly specified MLGM models by NG (50, 100, and 200).
unbalanced MLGM.The means of SRMR T_S_COV had approached values did not approach the values indicative of good model fit (i.e., <0.08) under all simulation conditions.Our results differed from previous level-specific and targetspecific fit indices study conducted under balanced MLGM.Hsu et al. (2019) balanced MLGM study concluded that RMSEA PS_B , RMSEA PS_W , RMSEA T_S_COV , CFI T_S_COV , TLI T_S_COV , RMSEA T_S_Mean , and TLI T_S_ Mean were significantly influenced by NG and should not be used.Hsu et al. (2019) also concluded that CFI PS_B , TLI PS_B , RMSEA PS_W , CFI T_S_COV , TLI T_S_COV , RMSEA T_S_Mean , CFI T_S_Mean , and TLI T_S_ Mean were influenced by different GSs.A plausible explanation for the differences could be due to the unbalanced GS.Our study simulated three unbalanced GS, 5/15, 10/30, and 25/75.The balanced GS used in Hsu et al. (2019)'s design are: 5, 10, and 20.Hox and Maas (2001) investigated the performance of a MCFA (multilevel confirmatory factor model) on unbalanced data.The results indicated that unbalanced data had impact on the accuracy of model estimation of the between level model.As unbalanced MLGM have different between-level and within-level at each time point, the unbalanced data have an large influence on the model estimation of both levels.

TABLE 1
Descriptive statistics of model fit indices by NG and unbalanced GS for the accelerating growth trajectory.mean square error of approximation.CFI, comparative fit index.TLI, Tucker-Lewis index.SRMR, standardized root mean square residual.Subscripted PS, partially saturated model method.Subscripted TS, target-specific fit indices.Subscripted B, between-level model.Subscripted W, within-level model.Subscripted COV, fit index for evaluating betweencovariance structure.Subscripted MEAN, fit index for evaluating between-mean structure.

TABLE 2 η2
values from ANOVA design by fit index.
10.3389/fpsyg.2024.1366850Frontiers in Psychology 10 frontiersin.org CFI PS_B , TLI PS_B , RMSEA PS_B , and SRMR B , require NG at least to be large enough (e.g., NG > 50) to be able to identify correctly specific MLGM with unbalanced design.However, if the NG cannot be at least 50 NG, RMSEA PS_B , SRMRB and SRMR W are not recommended.Based on our results, χ 2 PS_B and χ 2 PS_W are not recommended with unbalanced MLGM designs when researchers have a NG smaller than 50.If researchers have moderate NG (around 100 cases), we recommend researchers to collect a GS larger than 5/15.