Exploring Multiple Strategic Problem Solving Behaviors in Educational Psychology Research by Using Mixture Cognitive Diagnosis Model

A mixture cognitive diagnosis model (CDM), called the mixture multiple-strategy Deterministic Inputs, Noisy "and" Gate (MMS-DINA) model, is proposed to investigate individual differences in the selection of solution strategies on multiple-strategy items. The MMS-DINA model is an effective psychometric and statistical approach for practical skills diagnostic testing: it not only allows multiple problem-solving strategies, but also allows different strategies to be associated with different levels of difficulty. A Markov chain Monte Carlo (MCMC) algorithm is given to estimate the model parameters, and four simulation studies are presented to evaluate the performance of the algorithm. Based on the available MCMC outputs, two Bayesian model selection criteria are computed to guide the choice between the single-strategy DINA model and multiple-strategy DINA models. An analysis of fraction subtraction data is provided as an illustrative example.


INTRODUCTION
Multiple classification latent class models, namely cognitive diagnosis models (CDMs), have been developed specifically to diagnose the presence or absence of multiple fine-grained skills required for solving problems in an examination (Doignon and Falmagne, 1999; Junker and Sijtsma, 2001; Tatsuoka, 2002; de la Torre and Douglas, 2004; Templin and Henson, 2006; DiBello et al., 2007; Haberman and von Davier, 2007; de la Torre, 2009, 2011; Henson et al., 2009; von Davier, 2014; Chen et al., 2015). Compared with traditional item response theory models, one advantage of multiple classification latent class models is that they can provide effective measurement of student learning and progression, support the design of better teaching instruction, and guide possible interventions for different individual and group needs.
However, most CDMs only consider the probability that examinees solve a problem in one way. In fact, examinees may solve a problem in different ways. Fuson et al. (1997) found that elementary school children used more than one strategy to solve multi-digit addition and subtraction problems. Moreover, in eye-movement studies, Gorin (2007) showed that subjects often used very different cognitive strategies when solving similar reading tasks. More specifically, an example of multiple-strategy use given by de la Torre and Douglas (2008) in educational research is the analysis of fraction subtraction data comprising the responses of 2,144 examinees to 15 fraction subtraction items. The attributes required for fraction subtraction are as follows: (a) performing basic fraction subtraction operation; (b) simplifying/reducing; (c) separating whole number from fraction; (d) borrowing one from whole number to fraction; (e) converting whole number to fraction; (f) converting mixed number to fraction; (g) column borrowing in subtraction (de la Torre and Douglas, 2008). As an illustration, they use two strategies to solve 4 4/12 − 2 7/12. Strategy 1 requires attributes a, b, c, and d. Strategy 2 requires attributes a, b, and f. The detailed calculation processes are shown in de la Torre and Douglas (2008).
de la Torre and Douglas (2008) proposed a multiple-strategy Deterministic Inputs, Noisy "and" Gate (MS-DINA) model to address the fraction subtraction problem. The DINA model (Haertel, 1989; Doignon and Falmagne, 1999; Junker and Sijtsma, 2001; de la Torre and Douglas, 2004; de la Torre, 2009) is the most popular and widely used among the various CDMs; it assumes that examinees are expected to answer an item correctly only when they possess all the required attributes. The MS-DINA model is a straightforward extension of the DINA model that incorporates multiple strategies for cognitive diagnosis based on competing assumptions. However, as de la Torre and Douglas (2008) indicated, although the simplicity of the MS-DINA model is appealing, it makes the restrictive assumption that the item parameters are the same for different strategies, which implies that the application of each strategy is equally difficult. Another limitation of the MS-DINA model is that the joint distribution of the attributes is expressed as a function of a higher-order continuous ability; this most special form of the saturated model may not apply to all cases (Huo and de la Torre, 2014). Moreover, the MS-DINA model cannot provide information about the strategies selected by examinees: when multiple strategies are available, the probability of each strategy being used cannot be obtained, yet strategy diagnosis for examinees is an important part of multiple-strategy cognitive diagnosis.
To maximize the diagnostic value of multiple-strategy (MS) assessment and to overcome the limitation of identical item parameters across strategies, we propose in this paper a cognitive diagnosis framework for analyzing MS data. Specifically, the framework describes a psychometric model that can exploit multiple-strategy information: a multiple-strategy model called the mixture multiple-strategy DINA (MMS-DINA) model. The details of the framework are laid out in section 2. In section 3, an MCMC algorithm is employed to estimate the model parameters. In section 4, four simulation studies are used to evaluate the viability of the proposed framework and to simulate realistic testing conditions for assessing the performance of the MCMC algorithm under several different criteria. Based on the available MCMC outputs, two Bayesian model selection criteria are computed to guide the choice between the single-strategy DINA model and multiple-strategy DINA models. An empirical example of fraction subtraction illustrates the application of the proposed MMS-DINA model in section 5. The final section concludes the article with a discussion and some directions for further research.

Multiple-Strategy DINA Model
The MS-DINA model (de la Torre and Douglas, 2008; Huo and de la Torre, 2014) is a straightforward extension of the DINA model that allows several different solution strategies for each item. Let u_ij denote the observed response of the ith examinee to the jth item, where i = 1, 2, . . . , N and j = 1, 2, . . . , J; u_ij = 1 if the ith examinee answers the jth item correctly, and 0 otherwise. The ith examinee's attribute mastery profile α_i can be represented by a vector of length K, where α_ik = 1 if the ith examinee masters the kth attribute, and 0 otherwise.
Suppose each item has as many as M distinct strategies that would suffice to solve it. A strategy is defined as a subset of the K attributes that could be used together to solve the item. This may be coded by constructing M different matrices, Q_1, . . . , Q_M, where the element in the jth row and kth column of Q_m (m = 1, 2, . . . , M) is denoted q_jkm. The latent variable

η_ijm = ∏_{k=1}^{K} α_ik^{q_jkm}

denotes whether examinee i has all the attributes required to apply the mth strategy to the jth item. Let

η_ij = max(η_ij1, . . . , η_ijM).

The variable η_ij is 1 if examinee i satisfies the attribute requirements of at least one of the M strategies. Therefore, the item response function of the MS-DINA model is given as

P(u_ij = 1 | η_ij) = (1 − s_j)^{η_ij} g_j^{1 − η_ij},

where the parameter s_j is the slipping parameter, the probability of slipping on the jth item when an examinee has mastered all the required attributes for at least one of the strategies, and g_j is the guessing parameter, the probability of correctly answering the jth item when an examinee does not master all the required attributes for any of the strategies.
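The computation of η_ij and the MS-DINA response probability can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the function and variable names are ours:

```python
import numpy as np

def ms_dina_prob(alpha, Q_list, s, g):
    """P(u_ij = 1) under the MS-DINA model.
    alpha:  (N, K) 0/1 attribute mastery matrix
    Q_list: list of M (J, K) 0/1 Q-matrices, one per strategy
    s, g:   (J,) slipping and guessing parameters (shared across strategies)
    """
    # eta_m[i, j, m] = 1 iff examinee i has every attribute strategy m requires for item j
    eta_m = np.stack([np.all(alpha[:, None, :] >= Q[None, :, :], axis=2)
                      for Q in Q_list], axis=2)
    eta = eta_m.any(axis=2)               # satisfied by at least one strategy
    return np.where(eta, 1 - s, g)        # (N, J) correct-response probabilities
```

For example, with one item, two strategies requiring {1, 2} and {1} respectively, an examinee mastering only attribute 1 still gets probability 1 − s_j because the second strategy suffices.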

Mixture Multiple-Strategy DINA Model
We can see that the MS-DINA model assumes that the slipping and guessing parameters are the same for different strategies.
The assumption that the application of each strategy is equally difficult is too restrictive, as indicated by de la Torre and Douglas (2004). de la Torre and Douglas (2008) suggested a variant of the multiple-strategy model in order to remove this limitation; however, as they discussed, a feasible approach for estimating the parameters of that model could not be provided because of identifiability issues. Inspired by their work, we propose a multiple-strategy model that overcomes the assumption of identical item parameters across strategies. One way to solve the problem is to use a discrete mixture model. Discrete mixture models assume that a data set is composed of distinct subpopulations of observations described by different parametric distributions (Titterington et al., 1985). Thus, a mixture multiple-strategy DINA (MMS-DINA) model is proposed to allow different strategies to be associated with different levels of difficulty. The item response function of the MMS-DINA model is given by

P(u_ij = 1 | α_i) = Σ_{m=1}^{M} π_m p_ijm,  with p_ijm = (1 − s_jm)^{η_ijm} g_jm^{1 − η_ijm},   (2)

where M is the number of strategies, p_ijm is the probability of a correct response when the ith examinee adopts the mth strategy on the jth item, and π_m (m = 1, 2, . . . , M) is a mixing proportion satisfying Σ_{m=1}^{M} π_m = 1. In addition to determining the specific strategy, the mixing proportion parameters are related to the distribution of α: the mean of the latent attributes for examinees using strategy m is µ_m. The parameters s_jm and g_jm denote the slipping and guessing parameters of the mth strategy for the jth item, respectively. When the number of strategies is one (i.e., M = 1), the MMS-DINA model in Equation (2) reduces to the DINA model.
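The mixture response probability in Equation (2) can be sketched as follows, with strategy-specific slipping and guessing parameters stored in (J, M) arrays. Names and array layout are ours, not the paper's:

```python
import numpy as np

def mms_dina_prob(alpha, Q_list, S, G, pi):
    """Marginal P(u_ij = 1) under the MMS-DINA mixture,
    sum_m pi_m * (1 - s_jm)^eta_ijm * g_jm^(1 - eta_ijm).
    S, G: (J, M) strategy-specific slipping/guessing; pi: (M,) mixing proportions."""
    probs = 0.0
    for m, Q in enumerate(Q_list):
        # eta_ijm under strategy m
        eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2)   # (N, J)
        p_m = np.where(eta, 1 - S[:, m], G[:, m])
        probs = probs + pi[m] * p_m
    return probs
```

With M = 1 and pi = [1.0] this reduces to the ordinary DINA probability, mirroring the reduction noted above.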

Bayesian Estimation
Within a fully Bayesian framework, a Metropolis-Hastings-within-Gibbs sampling algorithm (Geman and Geman, 1984; Casella and George, 1992; Chib and Greenberg, 1995; Gilks, 1996; Patz and Junker, 1999a,b) is used to estimate the model parameters. MCMC methods have been found to be particularly useful for estimating mixture distributions (Diebold and Robert, 1994), including mixtures that involve random effects within classes (Lenk and DeSarbo, 2000). A common MCMC strategy is to sample a class membership parameter for each observation at each stage of the Markov chain (Robert, 1996). For the current model, a strategy membership parameter c_i ∈ {1, 2, . . . , M} is sampled for each examinee i along with a latent attribute vector α_i. The item response function of the MMS-DINA model in Equation (2) can then be expressed as

P(u_ij = 1 | α_i, c_i = m) = (1 − s_jm)^{η_ijm} g_jm^{1 − η_ijm},

where the latent variable c_i takes a value in the set {1, 2, . . . , M} and indicates which strategy the ith examinee uses.
Step 1: Sample the mixing proportions π. Assuming conditional independence between the mixing proportions and all parameters except the strategy memberships of the examinees, the mixing proportions have a full conditional posterior distribution of the form

p(π | all other parameters) ∝ p(c | π) f_prior(π) ∝ ∏_{m=1}^{M} π_m^{n_m + β_m − 1},

where n_m is the number of examinees using strategy m and the prior is Dirichlet(β_1, . . . , β_M). This full conditional distribution is Dirichlet(β_1 + n_1, β_2 + n_2, . . . , β_M + n_M).
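Because the Dirichlet prior is conjugate to the multinomial strategy counts, Step 1 is a standard conjugate draw. A minimal NumPy sketch (function name and argument layout are ours):

```python
import numpy as np

def sample_pi(c, M, beta, rng):
    """Step 1: draw mixing proportions from Dirichlet(beta_m + n_m),
    where n_m counts examinees currently assigned to strategy m.
    c: (N,) integer strategy memberships in {0, ..., M-1}
    beta: (M,) Dirichlet hyper-parameters."""
    n = np.bincount(c, minlength=M)        # strategy counts n_1, ..., n_M
    return rng.dirichlet(beta + n)
```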
Step 2: Sample a strategy membership c_i for each examinee, where i = 1, . . . , N. Assuming independence of examinees, the full conditional posterior distribution of c_i can be written as

p(c_i = m | all other parameters) ∝ π_m ∏_{j=1}^{J} p(u_ij | α_i, c_i = m) ∏_{k=1}^{K} Bernoulli(α_ik; µ_mk),

where u_i = (u_i1, . . . , u_iJ)′ is the item response vector for examinee i across items, J and K are the numbers of items and attributes, respectively, and Bernoulli(α_ik; µ_mk) is the Bernoulli density evaluated at α_ik with parameter µ_mk, the kth component of µ_m.
Step 3: Sample the attribute means µ_m for each strategy. Assuming the attribute distribution parameters are independent of all parameters except the attribute vectors of the examinees in the mth strategy, the full conditional distribution of µ_mk can be written as

p(µ_mk | all other parameters) ∝ ∏_{i=1}^{N} Bernoulli(α_ik; µ_mk)^{I(c_i = m)} f_prior(µ_mk),

which, with a Beta(λ_1, λ_2) prior, results in the following full conditional distribution for µ_mk:

µ_mk | · ∼ Beta(λ_1 + Σ_{i=1}^{N} I(c_i = m) α_ik, λ_2 + Σ_{i=1}^{N} I(c_i = m)(1 − α_ik)),

where I(·) denotes the indicator function: I(c_i = m) = 1 if the ith examinee chooses the mth strategy, and 0 otherwise.
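Step 3 is a conjugate Beta update, given the Beta(λ_1, λ_2) prior on each attribute mean used later in the simulations. A sketch, with names of our choosing:

```python
import numpy as np

def sample_mu(alpha, c, M, lam1, lam2, rng):
    """Step 3: for each strategy m and attribute k, draw
    mu_mk ~ Beta(lam1 + sum_i I(c_i=m) alpha_ik,
                 lam2 + sum_i I(c_i=m) (1 - alpha_ik))."""
    K = alpha.shape[1]
    mu = np.empty((M, K))
    for m in range(M):
        a = alpha[c == m]                                  # attribute rows in strategy m
        mu[m] = rng.beta(lam1 + a.sum(axis=0),
                         lam2 + (len(a) - a.sum(axis=0)))
    return mu
```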
Step 4: Sample a latent attribute vector α_i for each examinee, where i = 1, . . . , N. Assuming independence of examinees, the full conditional distribution of α_i can be written as

p(α_i | all other parameters) ∝ ∏_{j=1}^{J} p(u_ij | α_i, c_i) ∏_{k=1}^{K} Bernoulli(α_ik; µ_{c_i,k}).

Step 5: Sample the item parameters s_jm and g_jm for each strategy and each item. Assuming conditional independence across items, the full conditional distribution of s_jm and g_jm can be written as

p(s_jm, g_jm | all other parameters) ∝ ∏_{i=1}^{N} p(u_ij | α_i, c_i = m)^{I(c_i = m)} f_prior(s_jm, g_jm),

where u_j = (u_1j, . . . , u_Nj)′ is the item response vector for item j across examinees, and N is the number of examinees.
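Step 5 has no conjugate form, so (s_jm, g_jm) are updated by Metropolis-Hastings. The following is a simplified random-walk sketch for a single item and strategy; for brevity it assumes flat priors on (0, 0.5) rather than the 4-Beta prior used later in the simulations, and the function name and step size are ours:

```python
import numpy as np

def mh_update_item(u_j, eta_jm, s, g, rng, step=0.05):
    """One random-walk Metropolis update of (s_jm, g_jm), using the
    Bernoulli likelihood of item j for the examinees assigned to strategy m.
    u_j: (n_m,) responses; eta_jm: (n_m,) boolean eta_ijm values."""
    def loglik(s_, g_):
        p = np.where(eta_jm, 1 - s_, g_)
        return (u_j * np.log(p) + (1 - u_j) * np.log(1 - p)).sum()
    s_new = s + rng.normal(0, step)
    g_new = g + rng.normal(0, step)
    if not (0 < s_new < 0.5 and 0 < g_new < 0.5):
        return s, g                                     # outside support: reject
    if np.log(rng.uniform()) < loglik(s_new, g_new) - loglik(s, g):
        return s_new, g_new                             # accept proposal
    return s, g
```

In the full sampler this update runs once per item, strategy, and MCMC iteration, cycling with Steps 1-4.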

Bayesian Model Assessment
Within the Bayesian framework, the deviance information criterion (DIC; Spiegelhalter et al., 2002) and the logarithm of the pseudo-marginal likelihood (LPML; Geisser and Eddy, 1979; Ibrahim et al., 2001) are used to compare three models: the DINA model, the MS-DINA model, and the MMS-DINA model. For illustration, we provide only the most complicated case, the DIC and LPML computations for the MMS-DINA model; the formulas for the DINA and MS-DINA models are similar. These two criteria are based on the log-likelihood functions evaluated at the posterior samples of the model parameters, so the DIC and LPML of the MMS-DINA model can be computed easily.
Let Θ^(r) = {α_i^(r), s_jm^(r), g_jm^(r), π_m^(r)}, for i = 1, . . . , N, j = 1, . . . , J, m = 1, . . . , M, and r = 1, . . . , R, denote the rth MCMC sample from the posterior distribution. The joint likelihood function of the responses can be written as

L(u | Θ) = ∏_{i=1}^{N} ∏_{j=1}^{J} p(u_ij | Θ),   (11)

where p(u_ij | Θ) = Σ_{m=1}^{M} π_m (1 − s_jm)^{η_ijm} g_jm^{1 − η_ijm} is the response probability. The logarithm of the joint likelihood function in (11) evaluated at Θ^(r) is given by

log L(u | Θ^(r)) = Σ_{i=1}^{N} Σ_{j=1}^{J} log p(u_ij | Θ^(r)).   (12)

Since the pointwise log-likelihoods log p(u_ij | Θ^(r)), i = 1, . . . , N, j = 1, . . . , J, are readily available from the MCMC sampling outputs, (12) is easy to compute. The DIC is then calculated as

DIC = 2 · (1/R) Σ_{r=1}^{R} Dev(Θ^(r)) − Dev(Θ̄),   (13)

where Dev(Θ) = −2 log L(u | Θ) and Θ̄ denotes the posterior means of the parameters; the model with a smaller DIC fits the data better. A Monte Carlo estimate of the conditional predictive ordinate (CPO; Gelfand et al., 1992; Chen et al., 2000) is given by

CPO_ij = [ (1/R) Σ_{r=1}^{R} 1 / p(u_ij | Θ^(r)) ]^{−1}.   (14)

Note that a maximum-value adjustment in computing log(CPO_ij) plays an important role in numerically stabilizing (14). A summary statistic of the CPO_ij is the sum of their logarithms, called the LPML:

LPML = Σ_{i=1}^{N} Σ_{j=1}^{J} log(CPO_ij).   (15)

The model with a larger LPML has a better fit to the data.
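Both criteria can be computed directly from the saved pointwise log-likelihoods. A minimal NumPy sketch, with the maximum-value stabilization for the CPO harmonic-mean estimator (the array layout and function names are ours):

```python
import numpy as np

def lpml(loglik_draws):
    """LPML from an (R, N, J) array with loglik_draws[r, i, j] = log p(u_ij | theta^(r)).
    log CPO_ij = log R - max_r(-ll) - log sum_r exp(-ll - max), the stabilized
    harmonic-mean estimator."""
    R = loglik_draws.shape[0]
    neg = -loglik_draws
    m = neg.max(axis=0)                               # (N, J) stabilizer
    log_cpo = np.log(R) - m - np.log(np.exp(neg - m).sum(axis=0))
    return log_cpo.sum()

def dic(loglik_draws, loglik_at_mean):
    """DIC = 2 * (mean deviance over draws) - deviance at the posterior means,
    where deviance = -2 * total log-likelihood.
    loglik_at_mean: (N, J) pointwise log-likelihoods at the posterior mean parameters."""
    dev_bar = (-2 * loglik_draws.sum(axis=(1, 2))).mean()
    dev_hat = -2 * loglik_at_mean.sum()
    return 2 * dev_bar - dev_hat
```

A smaller DIC and a larger LPML both indicate better fit, matching the decision rules above.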

The Accuracy Evaluation of Parameter Estimation
To implement the MCMC sampling algorithm, chains of length 10,000 with an initial burn-in period of 5,000 are chosen. Fifty replications are used in the following simulation studies. Three indices are used to assess the accuracy of the parameter estimates. Let ϑ be the parameter of interest, and assume that M = 50 data sets are generated. Also, let ϑ̂^(m) and SD^(m)(ϑ) denote the posterior mean and the posterior standard deviation of ϑ obtained from the mth simulated data set, for m = 1, . . . , M.
The Bias for parameter ϑ is defined as

Bias(ϑ) = (1/M) Σ_{m=1}^{M} (ϑ̂^(m) − ϑ),

the mean squared error (MSE) for parameter ϑ is defined as

MSE(ϑ) = (1/M) Σ_{m=1}^{M} (ϑ̂^(m) − ϑ)²,

and the average posterior standard deviation is defined as

SD(ϑ) = (1/M) Σ_{m=1}^{M} SD^(m)(ϑ).

In addition, four criteria are used to assess the accuracy of the examinee classification methods: (h) the marginal correct classification rate for each attribute; (t) the proportion of examinees classified correctly on all K attributes; (v) the proportion of examinees classified correctly on at least K − 1 attributes; (z) the proportion of examinees classified incorrectly on K − 1 or K attributes.
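The three recovery indices are straightforward to compute from the replication-level posterior summaries; a sketch for a single parameter (the function name is ours):

```python
import numpy as np

def recovery_indices(est, sd, true_val):
    """Bias, MSE, and average posterior SD for one parameter.
    est, sd: length-M sequences of posterior means and posterior SDs
    across the M simulated data sets; true_val: the generating value."""
    est, sd = np.asarray(est, float), np.asarray(sd, float)
    bias = (est - true_val).mean()
    mse = ((est - true_val) ** 2).mean()
    return bias, mse, sd.mean()
```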

Simulation 1
This simulation study is conducted to evaluate the parameter recovery of the proposed model using the MCMC algorithm as the number of examinees increases. Here, we fix the test length and the number of attributes.

Simulation Designs
The following manipulated conditions are considered. Test length is fixed at 20, and 2 strategies with 5 attributes are used in this simulation. The corresponding Q matrix of the 20 items is the same as in de la Torre (2008, p. 605), and the number of examinees is N = 500, 1,000, or 2,000. Fully crossing the levels produces 3 simulation conditions (1 test length × 3 sample sizes). The true values of the slipping and guessing parameters are set to 0.3 and 0.1, respectively. Assuming independence among examinees and among attributes, the true value of α_ik is generated from Bernoulli(0.5). We thus obtain an N × 5 matrix α = (α_1, α_2, . . . , α_i, . . . , α_N)′, where the ith row vector α_i denotes the ith examinee's true cognitive state. The hyper-parameters of the prior distributions are fixed as follows: β_1 = β_2 = 0.01 and λ_1 = λ_2 = 0.5. Following de la Torre and Douglas (2004), we assume the priors of the slipping and guessing parameters to be 4-Beta(1, 2, 0.1, 0.5). Response data are simulated using the MMS-DINA model, and 50 replications are considered to evaluate parameter recovery in this simulation.
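The generating mechanism described above can be sketched in NumPy: draw each examinee's strategy, then attributes, then responses under the strategy-specific item parameters. The strategy proportions and attribute means passed in below are illustrative placeholders, not the paper's exact generating values, and the function name is ours:

```python
import numpy as np

def simulate_mms_dina(N, Q_list, S, G, pi, mu, rng):
    """Generate MMS-DINA data: c_i ~ Categorical(pi),
    alpha_ik ~ Bernoulli(mu[c_i, k]), then
    u_ij ~ Bernoulli((1 - s_j,ci)^eta * g_j,ci^(1 - eta))."""
    M, (J, K) = len(Q_list), Q_list[0].shape
    c = rng.choice(M, size=N, p=pi)                      # strategy memberships
    alpha = (rng.random((N, K)) < mu[c]).astype(int)     # attribute profiles
    u = np.empty((N, J), dtype=int)
    for m in range(M):
        idx = np.where(c == m)[0]
        eta = np.all(alpha[idx, None, :] >= Q_list[m][None, :, :], axis=2)
        p = np.where(eta, 1 - S[:, m], G[:, m])          # strategy-m response probs
        u[idx] = (rng.random(p.shape) < p).astype(int)
    return u, alpha, c
```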
To evaluate the convergence of the parameter estimation, we consider only the minimum sample size condition, that is, N = 500 examinees. Two methods are used to check the convergence of our algorithm. One is the "eyeball" method, which monitors convergence by visually inspecting the history plots of the generated sequences (Hung and Wang, 2012); the other is the Gelman-Rubin method (Gelman and Rubin, 1992; Brooks and Gelman, 1998). Convergence is checked by monitoring the trace plots of the parameters over consecutive sequences of 10,000 iterations. The trace plots show that all parameter estimates converge quickly, and we set the first 5,000 iterations as the burn-in period. In addition, the values of the potential scale reduction factor (PSRF; Brooks and Gelman, 1998) are calculated; the PSRF values of all parameters are less than 1.2, indicating that all chains converge as expected. Note that ABias, AMSE, and ASD denote the average Bias, average MSE, and average SD over all item parameters.
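The PSRF used here can be computed from parallel chains in a few lines. This is a minimal sketch of the basic Gelman-Rubin statistic for one scalar parameter, without the split-chain refinement of Vehtari et al. (2019):

```python
import numpy as np

def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for a scalar parameter.
    chains: (C, L) array of C parallel chains of length L (post burn-in)."""
    C, L = chains.shape
    means = chains.mean(axis=1)
    B = L * means.var(ddof=1)                   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()       # within-chain variance
    var_hat = (L - 1) / L * W + B / L           # pooled posterior variance estimate
    return np.sqrt(var_hat / W)
```

Values near 1 indicate that the chains are mixing over the same distribution; the 1.2 cutoff applied above corresponds to psrf(chains) < 1.2 for every monitored parameter.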

Recovery Results Based on Minimum Sample Sizes
As an illustration, we show only the Bias, MSE, and SD for the slipping and guessing parameters based on 500 examinees. For strategy 1, the Bias is between 0.083 and 0.110 for the slipping parameters and between 0.053 and 0.096 for the guessing parameters. The MSE is between 0.007 and 0.019 for the slipping parameters and between 0.004 and 0.013 for the guessing parameters. The SDs are about 0.057 and 0.020 for the slipping and guessing parameters, respectively. For strategy 2, the Bias is between 0.087 and 0.107 for the slipping parameters and between 0.069 and 0.114 for the guessing parameters. The MSE is between 0.007 and 0.011 for the slipping parameters and between 0.006 and 0.018 for the guessing parameters. The SDs are about 0.057 and 0.022 for the slipping and guessing parameters, respectively. We consider criterion (h) in this simulation study, and the results show that the marginal correct classification rates are consistently high for the MMS-DINA model. Based on criteria (t) through (z), we find that the MMS-DINA model consistently classifies examinees correctly on at least K − 1 attributes at a high rate and produces few severe misclassifications. Thus, the classification method based on the MMS-DINA model is effective.

Item Parameters Recovery Based on Different Sample Sizes
Given the total test length, when the number of examinees increases from 500 to 2,000, the average Bias, MSE, and SD for the slipping and guessing parameters decrease. For example, under the first strategy, the average Bias of the slipping parameters decreases from 0.101 to 0.079, the average MSE decreases from 0.010 to 0.006, and the average SD decreases from 0.057 to 0.044. The average Bias of the guessing parameters decreases from 0.078 to 0.058, the average MSE decreases from 0.011 to 0.008, and the average SD decreases from 0.021 to 0.016. The accuracy of item parameter estimation for different numbers of examinees is summarized in Table 1. We find that as the number of examinees increases, the item parameter estimates become more accurate. In summary, the algorithm is effective and accurate under the conditions of simulation study 1.

Simulation 2
This simulation study is conducted to assess the parameter recovery of the proposed model using the MCMC algorithm as the number of items increases. Here, we fix the sample size and the number of attributes.

Simulation Designs
The following manipulated conditions are considered. The number of examinees is fixed at 1,000, and the number of items is J = 20 or 30. Two strategies with five attributes are considered in this simulation. The corresponding Q matrix of the 20 items is the same as in de la Torre (2008, p. 605), and the Q matrix of the 30 items is shown in Table 2. Fully crossing the levels yields two conditions (2 test lengths × 1 sample size).
The true values and prior distributions for the parameters are the same as in simulation 1. To implement the MCMC sampling algorithm, chains of length 10,000 with an initial burn-in period of 5,000 are chosen, and fifty replications are considered. The following conclusions can be drawn. Given the total number of examinees, when the number of items increases from 20 to 30, the average Bias, MSE, and SD for the slipping and guessing parameters increase. For example, for the first strategy, the average Bias of the slipping parameters increases from 0.086 to 0.093, the average MSE increases from 0.007 to 0.009, and the average SD increases from 0.048 to 0.051. The average Bias of the guessing parameters increases from 0.063 to 0.087, the average MSE increases from 0.009 to 0.014, and the average SD increases from 0.018 to 0.023. The accuracy of item parameter estimation for different numbers of items is reported in Table 3.

Simulation 3
This simulation study is conducted to evaluate the recoveries of the proposed model using the MCMC algorithm as the number of attributes increases. Here, the sample size and the test length are fixed.

Simulation Designs
The following manipulated conditions are considered. The number of examinees is fixed at 1,000, and the number of items is fixed at J = 40. Two strategies with seven attributes are considered in this simulation. The corresponding Q matrix of the 40 items is shown in Table 4. The true values and prior distributions for the parameters are the same as in simulation 1. To implement the MCMC sampling algorithm, chains of length 10,000 with an initial burn-in period of 5,000 are chosen, and fifty replications are considered. The recovery results of the item parameters are shown in Table 5.
We find that when the number of attributes increases, the maximums of the average Bias, MSE, and SD for all of the

Simulation 4
In this simulation study, we use the DIC and LPML model assessment criteria to evaluate model fitting.

Simulation Designs
In this simulation, the number of examinees is N = 1,000 and the test length is fixed at 20. The Q matrix from de la Torre (2008, p. 605) is used. Three cognitive diagnosis models are considered: the DINA model, the MS-DINA model, and the MMS-DINA model. We therefore evaluate model fitting in these three cases. The true values and prior distributions for the parameters are the same as in simulation 1. To implement the MCMC sampling algorithm, chains of length 10,000 with an initial burn-in period of 5,000 are chosen. The results of the Bayesian model assessment based on the 50 replications are shown in Table 6; the reported DIC and LPML values are averages over the 50 replications. From Table 6, we find that when the DINA model is the true model, the DINA model fits the data best, as expected. The average DIC for the DINA model is 17,605.31 (the corresponding LPML is given in Table 6). When the data come from the mixture multiple-strategy model, the single-strategy DINA model is clearly ineffective in fitting the data, and the MS-DINA model fits better than the DINA model. No matter which model (DINA or MS-DINA) generates the data, the MMS-DINA model fits better than the other misspecified models; the MMS-DINA model is effective under many model-fitting conditions. In summary, the Bayesian assessment criteria are effective for identifying the true model and can be used in the subsequent real data study.

Data
To study the applicability of the mixture multiple-strategy DINA model, we consider a real data set containing the responses of 528 middle school students to 15 fraction subtraction items, a subset of the data originally used and described by Tatsuoka (2002). The Q-matrix design is given in de la Torre and Douglas (2008). Two strategies are considered for solving the 15 items, where the attribute definitions are the same as in the introduction. The prior distributions described in the simulation studies are used.

Bayesian Model Assessment
Three comparative models, the DINA model, the MS-DINA model, and the MMS-DINA model, are used to fit the fraction subtraction data. The deviance information criterion (DIC; Spiegelhalter et al., 2002) and the logarithm of the pseudo-marginal likelihood (LPML; Geisser and Eddy, 1979; Ibrahim et al., 2001) are computed with the "coda" R package (Plummer et al., 2006).

Results
The estimated posterior means and SDs for the MMS-DINA model are shown in Table 7. The estimates of the slipping parameters range from 0.10 to 0.23, and the estimates of the guessing parameters range from 0.10 to 0.25. For item 2, students can choose between two strategies: the first strategy examines four attributes (attributes 1, 2, 3, and 4), and the second examines two attributes (attributes 1 and 6). The more attributes an item measures under a strategy, the fewer examinees master all of them, since an examinee can answer the item correctly through mastery only by possessing every required attribute; consequently, a larger share of correct responses must be attributed to guessing. Accordingly, for item 2, the estimated guessing parameter under the first strategy is 0.22, higher than the estimate of 0.18 under the second strategy. Similarly, for item 4, the first strategy examines five attributes (attributes 1, 2, 3, 4, and 5) and the second strategy examines three attributes (attributes 1, 5, and 6); the corresponding guessing estimates are 0.18 and 0.12, respectively. When the numbers of attributes examined under the two strategies are the same, the guessing estimates of the two strategies are essentially equal. For example, four attributes are examined under both strategies for item 15, and the guessing probability is 0.11 under both. In addition, the three items most prone to slipping under strategy 1 are items 6, 5, and 15, with slipping estimates of 0.21, 0.20, and 0.17, respectively. Under strategy 2, the three items most prone to slipping are items 6, 13, and 2, with slipping estimates of 0.17, 0.15, and 0.14, respectively.
To depict each examinee's tendency toward a particular strategy, we plot the probabilities of choosing the different strategies for all 528 examinees. In Figure 1, we find that 432 examinees use the first strategy to answer all 15 items. Compared with the first strategy, the number of examinees who adopt the second strategy is relatively small, only 96.

CONCLUSIONS AND DISCUSSION
The goal of this article is to investigate a discrete mixture version of a multiple-strategy model for cognitive diagnosis. A unique feature of the mixture model (the MMS-DINA model) presented here is its capacity to remove the restriction of identical item parameters across strategies. The model-based approach provides a natural generalization of the DINA model that allows different strategies to have different item parameters for each item. In the simulation studies, several designs examine the accuracy of the estimation algorithm from different perspectives, and the results indicate that the MCMC algorithm can be used to obtain accurate parameter estimates. Thus, this research provides researchers with a tool for exploring the practicability of the MMS-DINA model, which can in turn pave the way for applications of CDMs in practical educational settings to inform instruction and learning. In addition, two Bayesian model assessment criteria are considered to evaluate model fitting among the DINA, MS-DINA, and MMS-DINA models. We find that when the data are generated from the simple single-strategy DINA model, the MMS-DINA model fits the data better than the MS-DINA model. This may be because each strategy is selected with a certain probability in the MMS-DINA model, unlike the MS-DINA model, which randomly chooses one strategy from multiple strategies; the Q matrix used in the MS-DINA model may then be inconsistent with the Q matrix of the data-generating DINA model, resulting in biased estimates and poor fit. When the data are generated from the MMS-DINA model, the DINA model is the worst fitting model, which can be attributed to its relatively simple structure and the resulting under-fitting. Finally, we draw a valuable conclusion: no matter which model (DINA or MS-DINA) generates the data, the MMS-DINA model fits better than the other misspecified models. However, in the real data analysis, the DINA model is preferred for this data set because its relatively simple formulation does not lead to worse fit compared with the MS-DINA and MMS-DINA models.

FIGURE 1 | The probabilities of examinees choosing different strategies. The y-axis shows the probability that each examinee uses the first strategy to answer the items: 0 indicates an examinee uses the first strategy with 0% probability, and 1 indicates an examinee uses the first strategy with 100% probability.
Classification methods based on CDMs play an important role in cognitive diagnosis, because in some educational settings it is desirable to classify examinees as masters or non-masters of multiple discrete latent attributes. In the simulation study, as an illustration, we applied the MMS-DINA model to the situation in which 500 examinees answer 20 items; the results indicate that it classifies most examinees correctly on at least K − 1 of the K skills and produces few severe misclassifications.
Because there are a large number of parameters in the MMS-DINA model, we can only rely on the MCMC algorithm to estimate them. However, the computational burden of the MCMC algorithm becomes heavy when a large number of examinees or items is considered, or when a large MCMC sample size is used. Therefore, it is desirable to develop a stand-alone R package with C++ or Fortran back ends for more extensive large-scale assessment programs. In addition, the convergence of the Bayesian algorithm needs to be further investigated in future studies. First, for the PSRF, we use a relatively relaxed cutoff of 1.2 for determining convergence, based on the previous literature (Brooks and Gelman, 1998; Fagua et al., 2019). In fact, we cannot be sure that 1.2 is really sufficient to determine convergence, and educational psychologists should be careful when adopting it. This is because the effective sample size (ESS) can be small, in which case the summary statistics of the chain provide only poor approximations of the Bayesian estimates; in particular, the mean of the chain might not be very close to the expected value of the posterior distribution. Therefore, in more substantive applications of the model, a more conservative PSRF cutoff (e.g., PSRF < 1.05) should ideally be used (Vehtari et al., 2019; Zitzmann and Hecht, 2019). However, it is unknown how long it would take to achieve a PSRF of 1.05, and this would be a great challenge for the MMS-DINA model because of the large number of unknown parameters to be estimated. Achieving a cutoff of 1.05 requires running longer Markov chains to reach the required ESS, which is very time-consuming and requires a large amount of computer memory. Extensive simulation studies are needed in later stages to give definite results. Second, we also need to investigate whether the obtained standard errors are accurate by examining coverage rates. These studies, however, are beyond the scope of the present study, whose purpose is to analyze the different solution strategies of examinees by constructing the MMS-DINA model.
There are several avenues for further research on multiple-strategy models. In this paper, we focus on comparing multiple-strategy models within the most commonly used DINA framework and on exploring the cognitive processes by which examinees solve items using different strategies, without comparing other multiple-strategy cognitive diagnostic models, such as the MS higher-order DINA model, saturated MS CDMs such as MS generalized DINA models, or MS log-linear cognitive diagnosis models. As Li et al. (2016) point out, finding the most appropriate model for a data set among the numerous cognitive diagnosis models needs further exploration. Therefore, in later research, we will focus on comparing different MS CDMs to identify the advantages, disadvantages, and scope of application of each model. In addition, different classification methods may be helpful in both item selection and final examinee classification (Xu et al., 2003; Cheng, 2009). Also, note that a strategy is merely defined here by the set of attributes required by a particular approach to solving a problem. One can imagine that a strategy might instead be determined by a set of attributes together with a procedure and sequence for using them; depending on how the attributes are defined, this will not always be the case, and one may consider different methods of using the same attributes. Furthermore, in this study we analyze only two strategies. When the number of strategies increases, the performance of the MMS-DINA model needs further investigation: whether the identification conditions are still satisfied, whether the parameter estimates are recovered well, and whether computational efficiency deteriorates because of the larger number of parameters.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. The data can be found at: https://cran.r-project.org/web/packages/CDM/index.html.