AUTHOR=Sen Sedat, Cohen Allan S. TITLE=The Impact of Test and Sample Characteristics on Model Selection and Classification Accuracy in the Multilevel Mixture IRT Model JOURNAL=Frontiers in Psychology VOLUME=Volume 11 - 2020 YEAR=2020 URL=https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2020.00197 DOI=10.3389/fpsyg.2020.00197 ISSN=1664-1078 ABSTRACT=The standard item response theory (IRT) model assumption of a single homogeneous population may be violated in real data. Mixture extensions of IRT models have been proposed to account for latent heterogeneous populations, but these models are not designed to handle multilevel data structures. Ignoring the multilevel structure is problematic because it results in lower-level units being aggregated with higher-level units. This, in turn, yields less accurate results because of dependencies in the data. Multilevel mixture IRT models have been developed to account for such dependencies. Multilevel data structures cause dependencies between levels but can be modeled in a straightforward way in multilevel mixture IRT models. An important step in the use of multilevel mixture IRT models is assessing the fit of the model to the data. This fit is often determined based on relative fit indices. Previous research on mixture IRT models has shown that the performance of these indices and the classification accuracy of these models can be affected by several factors, including percentage of class-variant items, number of items, magnitude and size of clusters, and mixing proportions of latent classes. As yet, no studies have been reported that examine these issues for multilevel extensions of mixture IRT models. The current study aims to investigate the effects of several features of the data on the accuracy of model selection and parameter recovery.
Results are reported on a simulation study designed to examine the following features of the data: percentages of class-variant items (30%, 60%, and 90%), numbers of latent classes in the data (1 to 3 latent classes at level 1 and 1 or 2 latent classes at level 2), numbers of items (10, 30, and 50), numbers of clusters (50 and 100), cluster size (10 and 50), and mixing proportions (equal [.5 and .5] vs. non-equal [.25 and .75]). Simulation results indicated that multilevel mixture IRT models yielded less accurate estimates when the number of clusters and the cluster size were small. The sample-size-dependent fit indices (BIC and SABIC) performed poorly at the smaller level-1 sample size. Overall, the SABIC index performed better than the other fit indices.
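
To illustrate why BIC and SABIC are described as sample size dependent, the following minimal sketch computes both indices from their standard definitions and lists the per-parameter penalty at each total sample size implied by the design. It assumes that the total sample size is the number of clusters multiplied by the cluster size; the function names and that choice of N are assumptions made for illustration, not the authors' implementation.

```python
import math


def bic(log_lik: float, n_params: int, n: int) -> float:
    """Bayesian information criterion: -2 logL + p * ln(N)."""
    return -2.0 * log_lik + n_params * math.log(n)


def sabic(log_lik: float, n_params: int, n: int) -> float:
    """Sample-size adjusted BIC: -2 logL + p * ln((N + 2) / 24)."""
    return -2.0 * log_lik + n_params * math.log((n + 2) / 24)


# Assumption: total N = number of clusters * cluster size, using the
# cluster counts (50, 100) and cluster sizes (10, 50) from the design.
design = [(c, s, c * s) for c in (50, 100) for s in (10, 50)]

for n_clusters, cluster_size, n_total in design:
    # Penalty added per estimated parameter under each index.
    pen_bic = math.log(n_total)
    pen_sabic = math.log((n_total + 2) / 24)
    print(f"clusters={n_clusters:3d} size={cluster_size:2d} N={n_total:4d} "
          f"BIC penalty/param={pen_bic:.2f} SABIC penalty/param={pen_sabic:.2f}")
```

Both penalties grow with N, and the SABIC penalty is smaller than the BIC penalty at every sample size in the design, so SABIC penalizes additional latent classes less heavily.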