Estimating CDMs Using the Slice-Within-Gibbs Sampler

In this paper, the slice-within-Gibbs sampler is introduced as a method for estimating cognitive diagnosis models (CDMs). Compared with other Bayesian methods, the slice-within-Gibbs sampler can accommodate a wide range of prior specifications; moreover, with the aid of auxiliary variables, it can also be applied to complex CDMs, especially when different identifiability constraints are imposed. To evaluate its performance, two simulation studies were conducted. The first study confirmed the viability of the slice-within-Gibbs sampler for estimating CDMs, mainly the G-DINA and DINA models. The second study compared the slice-within-Gibbs sampler with other commonly used Markov chain Monte Carlo algorithms, and the results showed that it converged much faster than the Metropolis-Hastings algorithm and was more flexible than Gibbs sampling in the choice of prior distributions. Finally, a fraction subtraction dataset was analyzed to illustrate the use of the slice-within-Gibbs sampler in the context of CDMs.


INTRODUCTION
Cognitive diagnosis models (CDMs) aim to provide a finer-grained evaluation of examinees' attribute profiles. As psychometric tools, CDMs have been employed in both educational and noneducational contexts (Rupp and Templin, 2008; de la Torre et al., 2018). Thus far, several reduced and general CDMs have been proposed. Examples of the former are the deterministic inputs, noisy "and" gate (DINA; Junker and Sijtsma, 2001) model and the deterministic inputs, noisy "or" gate (DINO; Templin and Henson, 2006) model, whereas examples of the latter are the generalized DINA (G-DINA; de la Torre, 2011) model, the log-linear CDM (Henson et al., 2009), and the general diagnostic model (GDM; von Davier, 2008). When applying CDMs, a fundamental issue is model identifiability, which hinges on the structure of the Q-matrix. For different models, different identifiability conditions have been proposed, including strict identifiability (Liu et al., 2013; Chen et al., 2015; Xu, 2017) and milder identifiability conditions (Chen et al., 2020; Gu and Xu, 2020).
Basically, two estimation methods have been widely used in the CDM literature. The first is the Expectation-Maximization (EM) algorithm within the frequentist framework (de la Torre, 2009, 2011; Huo and de la Torre, 2014; Chiu et al., 2016; George et al., 2016; Minchen et al., 2017; Kuo et al., 2018). However, the main motivation of CDMs is to identify examinees' latent attribute profiles, and Bayesian methods are often more natural for reaching this goal. The second most commonly used method is the Markov chain Monte Carlo (MCMC) method (de la Torre and Douglas, 2004; Culpepper, 2015; Culpepper and Hudson, 2018; Zhan et al., 2018, 2019; Jiang and Carter, 2019). Usually, to use the Metropolis-Hastings (MH) algorithm, it is necessary to choose a proposal distribution that leads to optimal sampling efficiency; however, empirically determining the optimal proposal distribution can be challenging in practice. Culpepper (2015) first introduced Gibbs sampling to the DINA model, and Zhang et al. (2020) applied Pólya-Gamma Gibbs sampling based on auxiliary variables to the DINA model. Culpepper and Hudson (2018) introduced a Bayesian method for the reduced reparameterized unified model (rRUM; DiBello et al., 1995; Roussos et al., 2007).
With the development of identifiability theory, more complex restrictions need to be taken into account, and estimating more general models while incorporating the corresponding identifiability conditions has become a technically appealing task. In this paper, a sampling method called the slice-within-Gibbs sampler is introduced, in which the identifiability constraints are easy to impose. The slice-within-Gibbs sampler avoids the tedious tuning of proposal distributions required by the MH algorithm and converges faster than the MH algorithm with misspecified proposal distributions. In addition, it offers more flexibility than Gibbs sampling in the choice of priors, and it can be applied to more general models more easily than the Pólya-Gamma Gibbs sampling and the Gibbs sampling. In line with the original idea of the slice-within-Gibbs sampler, the data are still augmented with auxiliary variables to make sampling from complicated posterior densities feasible. Existing theoretical results on the convergence and stability of the slice-within-Gibbs sampler guarantee that the method is equally applicable to psychometric models in general, and CDMs in particular. As such, this paper focuses mainly on demonstrating the usage, as well as evaluating the performance, of the slice-within-Gibbs sampler in conjunction with CDMs.
The remainder of this paper is organized as follows. Section 2 provides an overview of CDMs, mainly the G-DINA and DINA models. A detailed slice sampler algorithm for the DINA model is presented in section 3, followed by some advantages of the algorithm. In section 4, two simulations are conducted to illustrate the feasibility of the sampler and its advantages over other MCMC methods. Section 5 contains an application of the slice-within-Gibbs sampler to fraction subtraction data, and section 6 provides a discussion of the findings and limitations of this work and possible future research directions.

OVERVIEW OF CDMs
Suppose there are a total of I examinees and J items with K required attributes in a test. Let Y ij denote the binary response of examinee i to item j, and Y = {Y ij } I×J be the response matrix. In CDMs, it is often assumed that the latent trait of examinees is quantified by K-dimensional vectors, called attribute profiles. That is, for the ith examinee, the latent profile is α i = (α i1 , α i2 , . . . , α iK ), where α ik ∈ {0, 1}, and α ik = 1 means that examinee i has mastered the kth attribute, whereas α ik = 0 otherwise. Therefore, there exist at most C = 2 K different attribute profile classes, denoted by α c = (α c1 , α c2 , . . . , α cK ), c = 1, 2, . . . , C. The association between items and attributes is specified by the Q-matrix Q = {q jk } J×K (Tatsuoka, 1983), where q jk = 1 means the kth attribute is required to answer the jth item correctly, and q jk = 0 otherwise.
CDMs model the item response Y ij using the following Bernoulli distribution:

Y ij | α i = α c , Ω j ∼ Bernoulli(f ij ), (1)

where f ij = P(Y ij = 1 | α i = α c , Ω j ) is the probability of answering item j correctly for examinee i with attribute pattern α c , and Ω j denotes the unknown parameter set of item j. The likelihood of the data can be written by obtaining the weighted sum across the different attribute profiles. More specifically, assuming an identically and independently distributed latent membership, π c = P(α i = α c ), the joint likelihood can be written as

L(Y | Ω, π) = ∏ i=1,...,I ∑ c=1,...,C π c ∏ j=1,...,J f cj Y ij (1 − f cj ) 1−Y ij , (2)

where f cj = P(Y ij = 1 | α i = α c , Ω j ).
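To make the weighted-sum structure of the likelihood concrete, the following Python sketch evaluates the marginal log-likelihood of a generic CDM from a matrix of class-conditional success probabilities. The function name and array layout are illustrative conventions of ours, not part of the original paper.

```python
import numpy as np

def marginal_log_likelihood(Y, f, pi):
    """Marginal log-likelihood of a CDM: for each examinee, sum the
    class-conditional response probabilities weighted by pi.

    Y  : (I, J) binary response matrix.
    f  : (C, J) matrix with f[c, j] = P(Y_ij = 1 | alpha_c, Omega_j).
    pi : (C,) latent class membership probabilities.
    """
    # log P(Y_i | alpha_c) for every examinee/class pair
    log_pY = Y @ np.log(f).T + (1 - Y) @ np.log(1 - f).T   # (I, C)
    # Weight by pi_c and sum over classes with a stable log-sum-exp
    m = log_pY + np.log(pi)
    mx = m.max(axis=1, keepdims=True)
    per_examinee = mx[:, 0] + np.log(np.exp(m - mx).sum(axis=1))
    return per_examinee.sum()
```

For instance, with two classes, equal π, and f rows of 0.2 and 0.8 on two items, both the all-correct and the all-incorrect response patterns have marginal probability 0.5 × 0.04 + 0.5 × 0.64 = 0.34.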

The G-DINA Model
The G-DINA model is a saturated CDM that subsumes a number of reduced CDMs. In this model, P(Y ij = 1|α i , Ω j ) in Equation (1) is expressed as a function of the main effects and interactions of the required attributes for each item. Following de la Torre (2011), let K* j = ∑ k=1,...,K q jk denote the number of required attributes for item j. For notational convenience, but without loss of generality, let the first K* j attributes be required for item j, and let α* ij = (α i1 , α i2 , . . . , α iK* j ) be the reduced vector of α i associated with item j. The f ij in the G-DINA model for item j is

f ij = δ j0 + ∑ k δ jk α ik + ∑ k′>k ∑ k δ jkk′ α ik α ik′ + · · · + δ j12···K* j ∏ k=1,...,K* j α ik ,

where δ j0 is the intercept; δ jk is the main effect of α ik ; δ jkk′ is the two-way interaction effect of α ik and α ik′ ; and δ j12···K* j is the K* j -way interaction effect of α i1 , . . . , α iK* j . Aside from the identity link, the G-DINA model can be expressed using log and logit links (de la Torre, 2011).
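As an illustration of the identity link, the following hypothetical Python function assembles the G-DINA success probability for one item by adding every δ effect whose attribute subset has been fully mastered. The dictionary encoding of the δ parameters (subset tuples as keys) is our own convention for this sketch.

```python
from itertools import combinations

def gdina_prob(alpha_star, delta):
    """Identity-link G-DINA success probability for a single item.

    alpha_star : 0/1 mastery indicators for the K*_j required attributes.
    delta      : dict mapping attribute subsets (sorted index tuples) to
                 effects, e.g. {(): intercept, (0,): main effect of the
                 first required attribute, (0, 1): two-way interaction}.
    """
    K_star = len(alpha_star)
    p = 0.0
    for size in range(K_star + 1):
        for subset in combinations(range(K_star), size):
            # A delta term contributes only if every attribute in the
            # subset is mastered (the empty subset is the intercept).
            if subset in delta and all(alpha_star[k] == 1 for k in subset):
                p += delta[subset]
    return p
```

Setting delta = {(): 0.1, (0, 1): 0.8} reproduces a two-attribute DINA item with g = 0.1 and 1 − s = 0.9, previewing the reduction discussed below.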

The DINA Model
The DINA model is one of the most commonly used CDMs, and its f ij is given by

f ij = (1 − s j ) η ij g j 1−η ij , η ij = I(α* ij = 1),

where g j and s j are the guessing and slip parameters, and α* ij = 1 = (1, 1, . . . , 1) T denotes that examinee i possesses all the required attributes of item j. In the DINA model, Ω j = {g j , s j }.
As many researchers have already noted, the DINA model is a special case of the G-DINA model. The former can be derived from the latter by setting δ j0 = g j , δ j0 + δ j12···K* j = 1 − s j , and the remaining parameters to zero. Thus, in the DINA model, only the K* j -way interaction is taken into account, which indicates that a correct response is expected only when all the required attributes have been mastered.
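The conjunctive DINA response rule takes only a few lines; `dina_prob` below is a hypothetical helper of ours, with the indicator η ij computed from the Q-matrix row as described above.

```python
def dina_prob(alpha, q, g, s):
    """DINA item response probability: 1 - s when every attribute required
    by the Q-matrix row q is mastered, g otherwise."""
    eta = int(all(a >= qk for a, qk in zip(alpha, q)))   # conjunctive rule
    return (1 - s) ** eta * g ** (1 - eta)
```

For example, with q = (1, 1, 0), g = 0.1, and s = 0.2, an examinee who has mastered both required attributes succeeds with probability 0.8, while any other examinee succeeds with probability 0.1.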

Identifiability of Restricted Latent Class Models
For most common statistical inferences, the identifiability of the model is a precondition. To guarantee identifiability when estimating CDMs, we follow a set of sufficient conditions presented by Xu (2017) for a class of restricted latent class models. Specifically, these CDMs need to satisfy the following assumptions: (i) (monotonicity relations) for any attribute profiles α i ⪰ q j and α i′ ⋡ q j , examinee i has a success probability on item j no smaller than that of examinee i′; and (ii) if q j = e k for some k = 1, 2, . . . , K, examinees with α k = 1 have a strictly higher success probability on item j than those with α k = 0. Here, α ⪰ q holds if and only if α k ≥ q k for every k ∈ {1, 2, . . . , K}, α ⋡ q means there exists at least one k ∈ {1, 2, . . . , K} such that α k < q k , 0 = (0, 0, . . . , 0) T , and e k is the vector whose kth element is one and whose remaining elements are zero.
Both the G-DINA and DINA models are considered restricted latent class models. Specifically, for the DINA model, the above assumptions are equivalent to 1−s j > g j . For the G-DINA model, the transformation is more complicated and will be discussed in section 3.2.
Identifiability in restricted latent class models is guaranteed under the following sufficient conditions (Xu, 2017): (C1) the Q-matrix contains two K × K identity submatrices; that is, after row rearrangement, Q can be written as the stacked blocks I K , I K , and Q′, where I K is a K × K identity matrix and Q′ is the submatrix formed by the remaining J − 2K items; and (C2) for any item in Q′, examinees who possess none of the required attributes have the lowest success probability, that is, P(Y ij = 1 | α i = 0) = min α P(Y ij = 1 | α i = α).

INTRODUCING THE SLICE-WITHIN-GIBBS SAMPLER FOR CDMs
In this section, we introduce the slice-within-Gibbs sampler as a method of estimating CDMs. Moreover, we list its advantages.

Using the Slice-Within-Gibbs Sampler to Estimate CDMs
First, the joint posterior distribution of the model parameters (Ω, π) can be written as

P(Ω, π | Y) ∝ L(Y | Ω, π) P(Ω, π),

where P(Ω, π) denotes the joint prior distribution. The sampler then iterates over the following steps.

Step 1: Sample the auxiliary (slice) variables. For each response, draw U ij ∼ Uniform(0, f ij ) if Y ij = 1, and V ij ∼ Uniform(0, h ij ) if Y ij = 0, where h ij = 1 − f ij denotes the failure probability.
Step 2: Sample the item parameters Ω j , j = 1, 2, . . . , J, from a truncated (uniform) full conditional distribution on (Ω L j , Ω R j ), where the bounds Ω L j and Ω R j are derived from the identifiability restrictions together with the slice inequalities 0 < U ij < f ij and 0 < V ij < h ij . In the DINA model, for example, the slice inequalities over the examinee sets {i : Y ij = 1} and {i : Y ij = 0} translate into interval constraints on g j and s j . In the G-DINA model, each δ parameter is similarly drawn from a uniform distribution on (δ* L , δ* R ), where δ* L and δ* R are the lower and upper bounds, respectively, determined from both the slice inequalities and the identifiability conditions of the restricted models.
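Step 2 amounts to drawing each item parameter uniformly from a slice of its full conditional. As a minimal sketch, the generic shrinkage-based slice update below works for any bounded scalar parameter with a computable (unnormalized) log-density; it is an illustrative stand-in for the exact interval computation used in the paper, and the interval bounds can encode identifiability restrictions such as s j + g j < 1.

```python
import math
import random

def slice_sample_step(x, log_density, lower, upper):
    """One shrinkage-based slice-sampling update for a scalar parameter
    supported on (lower, upper).

    A uniform height under the unnormalized density at the current value
    defines the slice; candidates are drawn uniformly from the bracket,
    which is shrunk toward the current point until a candidate lands
    inside the slice.
    """
    log_u = log_density(x) + math.log(random.random())   # auxiliary height
    lo, hi = lower, upper
    while True:
        x_new = random.uniform(lo, hi)
        if log_density(x_new) > log_u:
            return x_new
        # Candidate fell outside the slice: shrink the bracket.
        if x_new < x:
            lo = x_new
        else:
            hi = x_new
```

In the DINA model, for instance, g j could be updated with lower = 0 and upper = 1 − s j so that every draw respects the monotonicity restriction.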
Step 3: Update the latent membership probabilities π and the latent profiles α i . Following Huebner and Wang (2011) and Culpepper (2015), the prior of π is assumed to follow a Dirichlet(ϕ 0 , . . . , ϕ 0 ) distribution. The full conditional distribution of the latent class probabilities π can then be written as

π | α ∼ Dirichlet(ϕ 0 + n 1 , . . . , ϕ 0 + n C ),

where n c = ∑ i I(α i = α c ) is the number of examinees currently assigned to class c. In this process, α i is sampled from the categorical distribution with

P(α i = α c | Y, Ω, π) ∝ π c ∏ j f cj Y ij (1 − f cj ) 1−Y ij .

A number of differences exist in updating the item parameters using the MH algorithm, Gibbs sampling, and the slice-within-Gibbs sampler. The MH algorithm samples the new value from a proposal distribution p proposal (Ω j ); in this paper, we adopted truncated normal distributions as the proposal distributions. Within the Gibbs sampling framework, samples are drawn directly from the full conditional posterior distributions, a feature inherited by the slice-within-Gibbs sampler. For practicality, conjugate priors are normally employed in Gibbs sampling; in the "dina" R package, for example, Culpepper (2015) used the Gibbs sampler to estimate the DINA model. In contrast, to make sampling more convenient and flexible to implement, the slice-within-Gibbs sampler transforms the posterior distributions of the item parameters into uniform distributions by introducing auxiliary variables. For updating the latent membership probabilities π and the latent profiles α i , however, the same formulas are used by all the samplers.
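Step 3 can be sketched as follows, assuming the class-conditional success probabilities are available as a matrix; the vectorized layout, function name, and default ϕ 0 = 1 are our own conventions for this illustration.

```python
import numpy as np

def update_pi_and_alpha(Y, f, pi, phi0=1.0, rng=None):
    """One Gibbs update of the latent profiles and class probabilities.

    Y : (I, J) responses; f : (C, J) class-conditional success
    probabilities; pi : (C,) current class probabilities.
    Each alpha_i is drawn from its categorical full conditional
    (proportional to pi_c * P(Y_i | alpha_c)), then pi is drawn from
    Dirichlet(phi0 + n_1, ..., phi0 + n_C).
    """
    if rng is None:
        rng = np.random.default_rng()
    I, C = Y.shape[0], f.shape[0]
    # log pi_c + log P(Y_i | alpha_c) for every examinee/class pair
    log_w = Y @ np.log(f).T + (1 - Y) @ np.log(1 - f).T + np.log(pi)
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    classes = np.array([rng.choice(C, p=w[i]) for i in range(I)])
    counts = np.bincount(classes, minlength=C)           # n_1, ..., n_C
    pi_new = rng.dirichlet(phi0 + counts)
    return classes, pi_new
```

With well-separated classes, the categorical weights concentrate on the generating profile, so the sampled memberships recover the true classes almost surely.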

About the Monotonicity Restrictions
When applying the slice-within-Gibbs sampler, the monotonicity restrictions need to be incorporated when sampling the item parameters to ensure identifiability. For the DINA model, this constraint is easy to implement, namely s j + g j < 1. For other, more complex CDMs, however, it is more involved. In this part, we describe how to restrict the parameters specifically in the G-DINA model, taking K = 3 as an example. When K = 3, there exist at most 2 K = 8 parameters per item and C corresponding classes in the G-DINA model. Inequalities (5) and (6) are actually equivalent to a set of inequalities obtained by considering all combinations of the q-entries.
Therefore, the corresponding bounds can be imposed parameter by parameter, with similar formulas applied to the other main-effect parameters.

Some Advantages of the Slice-Within-Gibbs Sampler
The MH algorithm typically relies heavily on the proposal distribution to achieve sampling efficiency. In unidimensional cases, some researchers suggest that about 50% of candidates need to be accepted for a proposal distribution to be optimal; the optimal acceptance probability drops to around 25% when sampling two- or three-dimensional parameters (Patz and Junker, 1999). For more complex CDMs, this probability is expected to drop even further. Compared with the MH algorithm, the slice-within-Gibbs sampler, as an extension of the Gibbs sampler, inherits the latter's high efficiency. Specifically, the slice-within-Gibbs sampler avoids choosing a proposal distribution because the posterior itself acts as the proposal distribution. This gives the slice-within-Gibbs sampler an acceptance probability equal to 1, which makes it highly efficient.
In contrast to the Gibbs sampler, the slice-within-Gibbs sampler has greater flexibility in choosing the prior distributions. Although the Gibbs sampler is highly efficient, it requires easy-to-use conjugate prior distributions, which can be challenging to find in practice. This is not an issue for the slice-within-Gibbs sampler: its efficiency is not affected by the choice of prior distributions, and it can obtain satisfactory results even when misspecified priors are adopted. Thus, the slice-within-Gibbs sampler not only has a relatively high convergence rate, but also overcomes the dependence on conjugate priors. Moreover, based on Theorem 7 in Mira and Tierney (2002), it can easily be shown that the slice-within-Gibbs sampler, when used with CDMs, is uniformly ergodic because f ij is bounded by 1. It should be noted that few other MCMC algorithms exhibit this robust property (Mira and Tierney, 1997; Roberts and Rosenthal, 1999).

SIMULATION STUDY
In this section, two simulation studies were conducted to evaluate the performance of the slice-within-Gibbs sampler in the CDM context. Simulation Study 1 was designed mainly to examine the extent to which the slice-within-Gibbs sampler can accurately recover the parameters of the DINA and G-DINA models; Simulation Study 2 was designed to document the advantages of the slice-within-Gibbs sampler over the MH algorithm and Gibbs sampling in estimating the DINA model.

Simulation Study 1

Design
In Simulation Study 1, the number of attributes for the DINA and G-DINA models was fixed to K = 5 and K = 3, respectively, whereas the number of items was set to J = 30. The Q-matrices for the DINA model given in Figures 1, 2 and for the G-DINA model given in Table 1 were designed to ensure the identifiability of restricted latent class models.
In this study, the number of examinees I was set to 500, 1,000, and 3,000. As for the item parameters of the DINA model, five different conditions were considered. Following Huebner and Wang (2011) and Culpepper (2015), four noise levels (i.e., item qualities) were considered: (1) a low noise level, s j = g j = 0.1; (2) a high noise level, s j = g j = 0.2; (3) the slip parameter higher than the guessing parameter, s j = 0.2, g j = 0.1; and (4) the guessing parameter higher than the slip parameter, s j = 0.1, g j = 0.2. A fifth condition was considered where, as in Zhan et al. (2018), the negative correlation between the item parameters observed in empirical data was taken into account; specifically, (logit(g j ), logit(s j )) was generated from a bivariate normal distribution with negatively correlated components. The generating item parameters for the G-DINA model are given in Table 2. Finally, the latent class membership probabilities π were set to be equal across the different latent classes. In this simulation study, all priors were set to be noninformative. With respect to the item parameters, the priors of the slip and guessing parameters in the DINA model were set to Uniform(0, 1), whereas P(δ) ∝ 1 over the support set was assumed for the G-DINA model.
Two criteria were used to evaluate the quality of the parameter recovery, namely, the bias and the root mean squared error (RMSE) of s, g, δ, and π across 25 replications. In both simulation studies, the slice-within-Gibbs sampler was iterated 20,000 times for each replication, with the first 10,000 iterations discarded as burn-in.
To evaluate the convergence, four chains started at overdispersed starting values were run. The potential scale reduction factor (PSRF) R (Brooks and Gelman, 1998) was computed using the R package "coda" (Plummer et al., 2006). A value of R less than 1.1 (Brooks and Gelman, 1998) was used as the criterion for chain convergence.
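The PSRF computation that the authors delegate to the "coda" package can be sketched for a single parameter as follows; this is the basic between/within-chain formula, without the degrees-of-freedom correction that coda's implementation applies.

```python
import numpy as np

def psrf(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    chains : (M, N) array of M parallel chains with N retained draws each.
    Returns sqrt(pooled variance estimate / mean within-chain variance);
    values near 1 indicate the chains have mixed.
    """
    chains = np.asarray(chains, dtype=float)
    M, N = chains.shape
    B = N * chains.mean(axis=1).var(ddof=1)        # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()          # within-chain variance
    var_plus = (N - 1) / N * W + B / N             # pooled estimate
    return float(np.sqrt(var_plus / W))
```

Four overdispersed chains that have reached the same stationary distribution yield a value close to 1, while chains stuck in different regions exceed the 1.1 threshold used here.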

Results
It was verified that the number of iterations and the burn-in were sufficient for the chains to converge. For example, Figure 3 shows the R values for the G-DINA model with sample size I = 1,000: all parameters dropped below 1.1 by the 7,266th iteration. Table 3 shows the parameter recovery results of the slice-within-Gibbs sampler under the DINA model, Table 4 shows the parameter recovery under the condition with negatively correlated item parameters, and Tables 5, 6 show the results under the G-DINA model across different sample sizes and item qualities.
For the smallest sample size (i.e., I = 500), the maximum absolute bias of the item parameter estimates was 0.011 and 0.053 for the DINA and G-DINA models, respectively; the RMSE was below 0.044 and 0.182 for the DINA and G-DINA models, respectively. With the exception of the higher-order interaction terms when K* j = 3, these results indicate that satisfactory estimates can be obtained for the DINA and G-DINA models using the slice-within-Gibbs sampler even with a sample size as small as I = 500. As the tables show, the performance of the slice-within-Gibbs sampler improved as the number of examinees increased. When I = 3,000, the absolute bias and RMSE of all the item parameters were smaller, and their maximum values dropped to 0.003 and 0.019, respectively, for the DINA model, and to 0.003 and 0.074, respectively, for the G-DINA model. For the condition where the item parameters were negatively correlated, the average bias and RMSE were comparable to those obtained under the low-noise condition. Finally, for the latent membership probabilities, all the parameters were estimated extremely accurately (i.e., bias of 0.00) for both models. Moreover, the maximum RMSEs at I = 500 were 0.010 and 0.015 for the DINA and G-DINA models, respectively, and improved with larger sample sizes.
To better understand the properties of the slice-within-Gibbs sampler, Figures 1, 2 show the detailed results for I = 500 under the DINA model. Consistent with the results in Culpepper (2015) and de la Torre (2009), which were obtained using different estimation algorithms, worse results were obtained for items that required more attributes. The deterioration in the quality of the item parameter estimates as the number of required attributes increased can be clearly observed in Figure 2, which displays the RMSE of the estimates. It should be noted that the guessing parameter estimates did in fact slightly improve with more required attributes; however, the improvement did not compensate for the stark deterioration in the slip parameter estimates. These results underscore the fact that, given a fixed sample size, the quality of the item parameter estimates of the DINA model can be affected by the number of required attributes. Finally, Figures 1, 2 indicate that item quality had only a small impact on the recovery of the individual latent class membership probabilities.
In sum, the results of Simulation Study 1 indicate that the slice-within-Gibbs sampler can provide accurate estimates of the DINA and G-DINA model parameters. Moreover, it can provide results consistent with those of previously implemented algorithms.

Simulation Study 2
This simulation study had two goals: (1) to compare the efficiency of the slice-within-Gibbs sampler with that of the MH algorithm; and (2) to compare the slice-within-Gibbs sampler and the Gibbs sampler in terms of their flexibility in specifying priors. For this study, the MH algorithm, Gibbs sampling, and the slice-within-Gibbs sampler were compared in the context of the DINA model.

Design
The simulated data contained I = 500 examinees, J = 30 items and K = 5 attributes. All the slip and guessing parameters were set to 0.1, and the Q-matrix given in Figure 1 was used.
For the MH algorithm, there exist infinitely many choices of proposal distributions. For demonstration purposes, this simulation study considered only the following two cases of proposal distributions.
• Case 1: A larger step between iterations, where s j ∼ N(s (t) j , 1) and g j ∼ N(g (t) j , 1); and
• Case 2: A smaller step between iterations, where s j ∼ N(s (t) j , 0.001) and g j ∼ N(g (t) j , 0.001).

For Gibbs sampling, the Beta family of distributions provides the conjugate priors for the item parameters. Following Culpepper (2015), only the conjugate prior Beta(1, 1) was considered.
For the slice-within-Gibbs sampler, both conjugate and non-conjugate priors were considered. Below are the two cases of the priors and their specific instances.
• Case 3: For conjugate priors, Beta(1, 1), Beta(1, 2), and Beta(2, 2) were used; and
• Case 4: For non-conjugate priors, N(0, 1)I (0,1) (x), N(2, 1)I (0,1) (x), Uniform(0, 2)I (0,1) (x), and Exp(1)I (0,1) (x) were used.

Results
Table 7 presents the recovery results of the slice-within-Gibbs sampler with a uniform prior and the MH algorithm under Cases 1 and 2. The results show that the accuracy of the MH parameter estimates was greatly influenced by the variance of the proposal distribution. Specifically, the parameter estimates under Case 2 were worse than those under Case 1, which indicates that, for this particular condition, a smaller step between iterations was not as good as a larger step. It is also noteworthy that, despite the use of a uniform prior, the slice-within-Gibbs sampler provided estimates that were as good as, if not better than, those obtained using the MH algorithm under Case 1.

Figure 4, which contains the R values for the slice-within-Gibbs sampler and the MH algorithm across iterations, shows the differing convergence rates of the two methods. As can be seen from the figure, π converged at the fastest rate, followed by g. For the MH algorithm, Case 1 converged faster than Case 2: Case 1 reached convergence by the 1,000th iteration, whereas Case 2 did not even reach convergence for some parameters, which indicates that the variance of the proposal distribution in Case 2 was too small to sufficiently explore the posterior distribution. In comparison, all the parameters estimated using the slice-within-Gibbs sampler reached convergence by the 2,000th iteration.

Table 8 presents the recovery results of the slice-within-Gibbs sampler with the Beta(1, 2) prior under Case 3 and the N(0, 1)I (0,1) (x) prior under Case 4. Figure 5 shows the bias and RMSE of the slice-within-Gibbs sampler under Case 3 (i.e., conjugate priors) and the Gibbs sampler under Beta(1, 1).
It can be seen that the slice-within-Gibbs sampler performed similarly to the Gibbs sampler, particularly for g j and π c . Although the estimates of s j had larger variability across the four priors, none of the priors was uniformly best across the 30 items. The figure also shows that the slice-within-Gibbs sampler provided comparable results under the family of Beta priors. Finally, regardless of the Beta prior used, the bias and RMSE of s j were always higher than those of g j and π c , which is consistent with the previous results.

Figure 6 presents the recovery of the slice-within-Gibbs sampler under Case 4. It should be noted that the Gibbs sampler does not work under these specific priors. In contrast, the slice-within-Gibbs sampler can be applied with different non-conjugate, even misspecified, priors. The figure shows that the biases of s j , g j , and π c were close to zero, and the corresponding RMSEs were below 0.05. Despite the use of non-conjugate priors, these results were almost the same as those obtained using the Gibbs sampler under Beta(1, 1).

Table 9 compares the convergence rates of the Gibbs and slice-within-Gibbs samplers. Specifically, the simulated data were based on the DINA model with I = 500 examinees, J = 30 items, K = 5 attributes, and the Q-matrix in Figure 1. For comparison purposes, two criteria were used to evaluate the convergence

rates, namely, the iteration at which all the parameters reached convergence and the time required to complete 20,000 iterations. Based on 100 replications, Table 9 shows that the Gibbs sampler converged much earlier and was about 1.29 times faster than the slice-within-Gibbs sampler.
Overall, the results of Simulation Study 2 indicate that, depending on the proposal distribution, the slice-within-Gibbs sampler can be dramatically more or slightly less efficient than the MH algorithm. However, the MH algorithm is advantageous only to the extent that the proposal distribution is optimal, whereas the slice-within-Gibbs sampler can be implemented with a wide range of prior distributions. Similarly, although the slice and Gibbs samplers are comparable, the former, unlike the latter, is not restricted to the use of conjugate priors.

EMPIRICAL EXAMPLE

Data
The empirical example involved the fraction subtraction data previously analyzed by Tatsuoka (1990), Tatsuoka (2002), and de la Torre (2009). The data analyzed here consisted of the responses of 536 students to 15 fraction subtraction items. The five attributes measured by the test were: α 1 , subtracting basic fractions; α 2 , reducing and simplifying; α 3 , separating whole from fraction; α 4 , borrowing one from whole; and α 5 , converting whole to fraction. The corresponding Q-matrix is given in Table 10.

Methods and Results
The DINA and G-DINA models were fitted to the data, and the corresponding parameters were estimated using the slice-within-Gibbs sampler, with monotonicity constraints imposed to stabilize the estimates given the relatively small sample size. Incidentally, the Gibbs sampler was not considered for these data due to the difficulty of finding conjugate priors that can also accommodate the monotonicity constraints. The expected a posteriori (EAP) estimates and the corresponding standard errors (SEs) were computed for the DINA and G-DINA models. Finally, the deviance information criterion (DIC) was employed to select between the two models. Figures 7, 8 show the R values for the G-DINA and DINA analyses of the empirical data, respectively. In addition to the convergence of the chains, the figures also show that the DINA model converged faster than the G-DINA model for these data.
In terms of the DIC, the model with the smaller value is preferred (Spiegelhalter et al., 2002). In fitting the fraction subtraction data, the DICs of the DINA and G-DINA models were 27,719.86 and 27,017.43, respectively, which indicates that the G-DINA model provided a better fit to the data. Thus, only results pertaining to the G-DINA model are presented below. Table 11 contains the EAP estimates of the latent membership parameters, π c , and their corresponding SEs under the G-DINA model. The eight latent classes with the largest memberships were: π(1, 1, 1, 1, 1) = 0.335, π(1, 1, 1, 0, 0) = 0.138, π(1, 1, 1, 1, 0) = 0.118, π(0, 0, 0, 1, 0) = 0.110, π(1, 1, 1, 0, 1) = 0.083, π(0, 0, 1, 0, 0) = 0.079, π(1, 0, 1, 0, 0) = 0.035, and π(1, 1, 0, 1, 0) = 0.014. Together, they accounted for over 91% of the latent class memberships. In terms of individual attribute mastery, α 1 through α 5 had prevalences of 0.771, 0.723, 0.812, 0.620, and 0.463, respectively, which makes α 3 and α 5 the easiest and most difficult attributes to master. It can be noted that the latent classes showing mastery of all but one of the three easiest attributes, as in (0, 1, 1, 1, 1), (1, 0, 1, 1, 1), and (1, 1, 0, 1, 1), had the lowest latent class memberships. In this example, the latent classes with the largest class memberships also had the larger SEs. Table 12 gives the G-DINA model estimates for the fraction subtraction items in terms of the latent group success probabilities P(α* ij ). The item parameter estimates clearly show why the G-DINA model was preferred over the DINA model: for the DINA model to provide a satisfactory fit to the data, all parameters except the intercept and the highest-order interaction effect must be equal to zero. This was not the case with, say, the items that require two attributes (i.e., items 8, 9, and 12). For these items, P(00) < P(10) and P(00) < P(01), indicating that the main effects are not equal to zero.
The remaining multi-attribute items also indicate that the conjunctive assumption of the DINA model was not tenable. As a rough measure of item discrimination, the index P j (1) − P j (0) was computed for each item. All but two items had values greater than 0.70, and the average item discrimination was 0.829. These results indicate that the fraction subtraction items are highly discriminating.
For comparison purposes, the EM estimates of the same items were also obtained using the R package "GDINA" (Ma and de la Torre, 2020), and are given in Table 12. It can be noted that, for the one-attribute items (i.e., items 1 and 5), the slice-within-Gibbs sampler and EM estimates are highly comparable. However, for the multi-attribute items, the estimates can be quite disparate, except for P j (1), which was comparable across the two methods. The difference could be due to the small sample size relative to the complexity of the G-DINA model for the multi-attribute items. To better understand the behavior of the slice-within-Gibbs sampler and the EM algorithm vis-a-vis the fraction subtraction data, we conducted a simulation study in which data were generated based on the parameters obtained using the slice-within-Gibbs sampler. Figure 9 shows the mean absolute error (MAE) of the two sets of estimates across 100 replications. The figure indicates that the slice-within-Gibbs sampler had smaller MAEs for most of the parameters and was, thus, a more reliable method for the fraction subtraction data.

DISCUSSION
In this work, the slice-within-Gibbs sampler is introduced as a method for estimating CDMs. Unlike the MH algorithm, the slice-within-Gibbs sampler obviates the need to choose an optimal proposal distribution; unlike the Gibbs sampler, it has the flexibility to work with a wider range of prior distributions. As shown in the simulation studies and the empirical example, it can be used to estimate complex CDMs, such as the G-DINA model. Thus, the slice-within-Gibbs sampler provides a viable alternative estimation procedure in the context of CDMs.
Based on the results in Table 9, additional work is needed to speed up the implementation of the slice-within-Gibbs sampler so that researchers can fully take advantage of its flexibility to estimate a wide range of CDMs.
In the present work, only two CDMs (i.e., the DINA and G-DINA models) were employed to illustrate the slice-within-Gibbs sampler. However, the sampler can easily be extended to other CDMs (e.g., the additive CDM, the GDM), other attribute structures (e.g., higher-order CDMs; de la Torre and Douglas, 2004), and potentially to CDMs that incorporate various types of covariates.
Finally, it should be noted that other MCMC sampling procedures that use auxiliary variables are currently available. One such procedure is the Hamiltonian Monte Carlo (HMC; Duane et al., 1987; Neal, 2011) algorithm. The HMC algorithm is based on Hamiltonian dynamics, has a physical interpretation, and can provide useful intuitions. As an extension of the MH algorithm, it exploits gradient information to draw samples from the posterior. Because the HMC algorithm often proposes large moves that are accepted with probability close to one, its efficiency is higher than that of the MH algorithm. Future research should systematically compare the performance of the slice-within-Gibbs sampler with that of the HMC algorithm, as well as other auxiliary-variable sampling procedures, in estimating CDMs.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://CRAN.R-project.org/package=CDM.

ETHICS STATEMENT
Written informed consent was obtained from the individual(s), and minor(s)' legal guardian/next of kin, for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
XX and JG worked on the technical details of the article. XX and JZ completed the writing, with the support of JT. JT and JZ provided a few suggestions on the focus and direction of the research. All authors contributed to the article and approved the submitted version.