Cognitive Diagnosis Modeling Incorporating Item-Level Missing Data Mechanism

The aim of cognitive diagnosis is to classify respondents' mastery status of latent attributes from their responses on multiple items. Since respondents may answer some but not all items, item-level missing data often occur. Even if the primary interest is to provide diagnostic classification of respondents, misspecification of missing data mechanism may lead to biased conclusions. This paper proposes a joint cognitive diagnosis modeling of item responses and item-level missing data mechanism. A Bayesian Markov chain Monte Carlo (MCMC) method is developed for model parameter estimation. Our simulation studies examine the parameter recovery under different missing data mechanisms. The parameters could be recovered well with correct use of missing data mechanism for model fit, and missing that is not at random is less sensitive to incorrect use. The Program for International Student Assessment (PISA) 2015 computer-based mathematics data are applied to demonstrate the practical value of the proposed method.


INTRODUCTION
Cognitive diagnosis has recently received increasing concern in psychological and educational assessment, which can provide fine-grained classifications and diagnostic feedback for respondents from their performance on test items (Leighton and Gierl, 2007;Rupp et al., 2010). It is a useful tool to identify students' mastery status of different latent skills based on their responses to test items and to evaluate patients' presence of mental disorders based on their responses to diagnostic questions. More specifically, cognitive diagnosis has been used to study fraction subtraction (de la Torre and Douglas, 2004), language proficiency (von Davier, 2008;Chiu and Köhn, 2016), psychological disorders (Templin and Henson, 2006;Peng et al., 2019), and so forth.
Various cognitive diagnosis models (CDMs), also called diagnosis classification models, have been developed, such as the deterministic inputs, noisy "and" gate (DINA) model (Macready and Dayton, 1977;Junker and Sijtsma, 2001) and the deterministic inputs, noisy "or" gate (DINO) model (Templin and Henson, 2006). Most CDMs are parametric and model the probability of item response as a function of latent attributes. The simplicity and interpretability make the parametric CDMs popular in practice. More general CDMs, such as the log-linear cognitive diagnosis model (LCDM; Henson et al., 2009) and the generalized DINA model (de la Torre, 2011), assume a more flexible relationship between the item responses and latent attributes. Moreover, higher-order latent trait models for cognitive diagnosis (de la Torre and Douglas, 2004) have been introduced to link the correlated latent traits by a general high-order ability.
Multiple items are often used for cognitive diagnosis. When respondents choose to answer some but not all items, item-level missing data occur (Chen et al., 2020). Respondents may refuse to answer items that they deem too difficult, quit the test early because it is too long, or just skip items because of carelessness. Missing data lead to loss of information and may result in biased conclusions (Glas and Pimentel, 2008;Köhler et al., 2015;Kuha et al., 2018). Many studies have employed a complete case analysis that only use subjects without missing data (e.g., Xu and Zhang, 2016;Chen et al., 2017;Zhan et al., 2019a). In this case, the subjects with missing data cannot receive any diagnostic feedback and, more importantly, may produce biased results when subjects with complete data are systematically different from those with missing data (Pan and Zhan, 2020).
In the literature, three missing data mechanisms should be distinguished: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) (Rubin, 1976;Little and Rubin, 2020). The MCAR holds if the probability of missingness is independent of both the observed and unobserved responses, whereas the MAR holds if the probability of missingness is independent of the unobserved responses given the observed responses. If either of these two conditions cannot be satisfied, i.e., the probability of missingness depends on the unobserved responses, the MNAR occurs. If the missing data mechanism is MCAR or MAR, unbiased estimation can be obtained from the observed data; if the missing data mechanism is MNAR, a model for the missing data mechanism should be included to obtain valid estimations of the primary parameters. Limited approaches have been proposed in CDMs incorporating missing data mechanisms. Ömür Sünbül (2018) considered MCAR and MAR in the DINA model. Recently, Ma et al. (2020) have used the sequential process model to accommodate omitted items due to MNAR, where the first internal task, i.e., making the decision to either skip the item or respond to it, is assumed to be affected by a latent categorical variable representing the response tendency. As stated by Heller et al. (2015), CDMs have connections to knowledge space theory (KST), which has been developed by Doignon and Falmagne (1985) (see also Doignon and Falmagne, 1999;Doignon, 2011). de Chiusole et al. (2015) and Anselmi et al. (2016) have developed models for the analysis of MCAR, MAR, and MNAR data in the framework of KST. In their work, the MCAR holds if the missing response pattern is independent of the individual's knowledge state (i.e., the collection of all items that an individual is capable of solving in a certain disciplinary domain) and of the observed responses; the MAR holds if the missing response pattern is conditionally independent of the knowledge state given the observed responses; and the MNAR holds if the missing response pattern depends on the knowledge state. In CDMs, the attribute profile (i.e., the collection of all attributes that an individual masters in a certain disciplinary domain) is similar to the knowledge state in KST. In our study, an additional latent categorical variable is introduced to indicate missingness propensity. This latent categorical variable affects the probability of missingness for each item and, in the meantime, may affect the latent attributes through the influence of the general high-order ability. The missing data mechanism can then be incorporated into cognitive diagnosis. Similar ideas can be seen in Holman and Glas (2005), Rose et al. (2015), and Kuha et al. (2018).
In this paper, we propose a joint cognitive diagnosis modeling including a higher-order latent trait model for item responses and a missingness propensity model for item-level missing data mechanism. We take the DINA model as an example for illustration because of its popular use, and the latent traits are linked by a general high-order ability. For a flexible specification of the missingness propensity model, the latent missingness propensity is represented by a categorical variable. The MNAR holds if the distribution of the general high-order ability depends on the latent classes, whereas the MCAR holds if the distribution of the general high-order ability is independent of the latent classes.
The rest of this paper is organized as follows. Section 2 presents the proposed joint model for item responses and item-level missing data mechanism. The Bayesian approach is then developed for model parameter estimation using JAGS. In section 3, simulation studies are conducted to compare the performance of the parameter recovery under different missing data mechanisms. Real data analysis using the PISA 2015 computer-based mathematics data is given in section 4. Some concluding remarks are given in section 5.

JOINT MODELING INCORPORATING ITEM-LEVEL MISSING DATA MECHANISM
We consider N subjects taking a test of I items, and there are K latent attributes to be evaluated. Let Y ni be the response for subject n(n = 1, · · · , N) to item i(i = 1, · · · , I). Let R ni be a missingness indicator corresponding to Y ni , where R ni = 1 if Y ni is observed and R ni = 0 if Y ni is missing.

The Missingness Propensity Model
Let ξ n denote the latent missingness propensity for subject n. ξ n is unobserved and has C categories, which we refer to as latent missingness classes. The missingess probability of (R n1 , · · · , R nI ) is given by where p(R n1 , · · · , R nI ) is the joint probability of (R n1 , · · · , R nI ), p(R ni |ξ n = c) is the conditional probability of R ni given ξ n = c and p(ξ n = c) is the probability of ξ n = c.

The High-Order DINA Model
The DINA model describes the probability of item response as a function of latent attributes as follows: where p(Y ni = 1) is the probability of a correct response for subject n to item i; s i and g i are the slipping and guessing probability for item i respectively, and 1 − s i − g i is the item discrimination index for item i (IDI i ; de la Torre, 2008), α nk is the kth (k = 1, · · · , K) latent attribute for subject n, with α nk = 1 if subject n masters attribute k and α nk = 0 otherwise. The Q matrix is a I × K matrix with binary entries q ik (Tatsuoka, 1983). For each i and k, q ik = 1 indicates that attribute k is required to answer item i correctly and q ik = 0 otherwise. Equation (4) can be reparameterized, and it is called the reparameterized DINA model (DeCarlo, 2011) as where β i = logit(g i ) and δ i = logit(1 − s i ) − logit(g i ) are called item intercept and interaction parameter, respectively. As stated in the literature (de la Torre and Douglas, 2004;Zhan et al., 2018a), attributes in a test are often correlated, and a higher-order structure for the attributes can be formulated by where p(α nk = 1) is the probability of subject n's mastery of attribute k; θ n is a general (higher-order) ability for subject n; and γ k and λ k are the slope and intercept parameter for attribute k. Denote γ = (γ 1 , · · · , γ K ) and λ = (λ 1 , · · · , λ K ). Following Zhan et al. (2018a), a positive slope parameter is assumed, which means the probability of mastery of attribute k increases as the general ability θ n grows. Including a higher-order structure for cognitive diagnosis can not only reduce the number of model parameters for correlated latent attributes but also obtain an assessment for subjects' overall ability. The distribution of the general ability θ n may be affected by the latent missingness classes of ξ n , and we suppose that for n = 1, · · · , N and c = 1, · · · , C. If µ c or σ 2 c (c = 1, · · · , C) may vary between different classes, the missing data mechanism is MNAR, and we set µ 1 = 0 and σ 2 1 = 1 for model identification. If µ c and σ 2 c remain unchanged for different c, the probability of missingness does not depend on the responses and MCAR holds, and this is where we set θ n ∼ N(0, 1) for identification.

Bayesian Parameter Estimation
The parameters of the proposed model can be estimated using the Bayesian MCMC approach. JAGS (version 4.2.0; Plummer, 2015) and the R2jags package (Su and Yajima, 2020) in R (R Core Team, 2019) were used for estimation, and the JAGS code can be found in the Supplementary Material. The priors of the model parameters are given below. For the majority of the parameters, the conjugate priors are used. The priors and the hyper priors for the item parameters are assigned the same as those given in Zhan et al. (2018a), please find them for details. Moreover, the noninformative prior is used for Dirichlet distribution. The (truncated) normal distribution and the inverted gamma distribution priors are chosen to obtain dispersed values for each corresponding parameter.
(π 1 , · · · , π C ) ∼ Dirichlet(1, · · · , 1), The hyper priors are specified: where I is a 2 × 2 identity matrix. In this case, the mean guessing and slipping probabilities are approximately equal to 0.1. In this paper, the number of latent missingess classes C is taken as fixed. In fact, C can be selected by some information criterion, for example, deviance information criterion (DIC; Spiegelhalter et al., 2002), which can result in a statistically optimal number. In practice, it is efficient to determine C beforehand, using latent class analysis (Linzer and Lewis, 2011) just for the missingness indicators. In the following simulation studies, the number of latent missingess classes C is fixed to 2 for simplicity. For the values of C >2, the results are similar and we report some results for C = 3 in the Supplementary Material.

SIMULATION STUDY
Two simulation studies were conducted to evaluate the empirical performance of the proposed method. Simulation 1 aimed to examine the parameter recovery using the Bayesian MCMC FIGURE 1 | K-by-I Q matrix for simulation study 1. Blank means "0," gray means "1"; "*" denotes items used in I = 15 conditions; K = the number of attributes; I = test length. algorithm when the simulated data were generated under MNAR. Under different conditions, models with MCAR and MNAR were fitted to the simulated data, respectively, where the distribution of the general higher-order ability was unrelated or related to the latent missingness classes in MCAR or MNAR. In Simulation 2, our purpose was to study the sensitivity of incorrect use of MNAR for model fit. Parameter recovery related to diagnostic classification are reported here. The other parameters about the missing data mechanisms can be recovered well but not reported here since they are not our primary interest.

Simulation Study 1
In Simulation 1, three factors were manipulated, including (a) sample sizes (N) at two levels of 500 and 1,000; (b) test length (I) at two levels of 15 and 30; and (c) the probability of missingness for each item, high missingness (HM) and low missingness (LM).
Five attributes (K = 5) were measured and the simulated Q matrices for two test length I = 15 and I = 30 were given in Figure 1, which were used in Zhan et al. (2018b). Most of the model parameters were assigned by referring to the real data analysis presented in Zhan et al. (2018a). Specifically, π = (0.3, 0.7), representing unequal probabilities for each latent class; which was the mean vector and the covariance matrix for a bivariate normal distribution generating β i and δ i . Other assignments for the model parameters, for example, equal probabilities for each latent class as those used in Ma et al. (2020), also make sense, and the results are similar to the above parameter settings.
In each of the eight conditions, models with MCAR and MNAR were fitted to the simulated data, respectively. Thirty replications were implemented for each fitted model. Pilot runs showed that the algorithm converged using 20, 000 iterations with a burn-in phase of 10, 000 iterations. The convergence of the chains was monitored by multivariate potential scale reduction factor, which were < 1.1 (Gelman and Rubin, 1992). The bias and the root mean square error (RMSE) of the Bayesian estimates were computed to assess the parameter recovery. For evaluating the classification of each attribute and attribute profiles, the attribute correct classification rate (ACCR) and the pattern correct classification rate (PCCR) were computed. More formally, if the kth (k = 1, · · · , K) latent attribute for subject n (n = 1, · · · , N), denoted as α k,n , is estimated byα k,n , ACCR for the kth latent attribute and PCCR can be expressed as ACCR = N n=1 I(α k,n = α k,n )/N and PCCR = N n=1 [ K k=1 I(α k,n = α k,n )]/N, respectively. Here I(α k,n = α k,n ) is an indicator function such that I(α k,n = α k,n ) = 1 ifα k,n = α k,n and zero otherwise. Figure 2 presents the recovery of the item mean vector and the item covariance matrix for the models with MCAR and MNAR in all eight conditions. First, the patterns of the item parameter recovery were similar between MCAR and MNAR, especially the RMSEs of MCAR and MNAR were close in each condition. The item mean vector can be well recovered, as the bias were small and the RMSEs were relatively smaller than those of the item covariance matrix. The bias of the item mean vector under MNAR were smaller than those under MCAR at a higher sample size N = 1, 000. The RMSEs of the item mean vector decreased as test length increased. The bias and RMSEs of the item covariance matrix were relatively higher, and decreased as test length increased. The sample size had no consistent impact on the estimates of the item covariance matrix. Moreover, the missingness probabilities had little impact on the item parameter recovery. Table 1 summarizes the item parameter recovery for models with MCAR and MNAR in high missingness conditions. The results for low missingness are similar and presented in Supplementary Figure 1. The mean absolute value (MAV) of the bias and RMSE are reported. In each condition, the MAVs of the bias and RMSE under MCAR and MNAR were close. Detailed information about the recovery of each item parameter with MCAR and MNAR was similar and not reported here.   Table 2 summarizes the estimation of general ability parameter in high missingness conditions. The results for low missingness are similar and presented in Supplementary Figure 2. The MAV and Range of the bias and RMSE are reported. The correlation between the true and the estimated general abilities is also given. The MAVs of the bias under MCAR were much higher than those under MNAR, and the bias under MCAR were all negative, which can be seen from their Ranges. The MAVs of the RMSEs under MCAR were also higher than those under MNAR. The correlations under MNAR were higher than those under MAR. From the above results, we find that when data are generated with MNAR, incorrect use of missing data mechanism could lead to biased estimation of the general abilities. Figures 3, 4 presents the recovery of higher-order parameters between the attributes and the general ability in each condition. For the attribute slope parameters, the bias was closer to zero and the RMSEs were relatively small. For the attribute intercept parameters, their recovery under MNAR were good with small bias and RMSEs. Under MCAR, the bias and RMSEs for attribute intercept parameters were large, with absolute values >1.0, and all the bias were negative. Figure 5 shows the correct classification rates for each attribute and attribute profiles in all conditions. All ACCRs were > 0.90 and all PCCRs were > 0.70, which indicated a good recovery of the mastery status of attributes. ACCRs and PCCRs raised as test length increased and changed little as sample size increased. The correct classification rates of MCAR and MNAR were very close in each condition. These results may be caused by the fact that the impact of the missing data mechanism on the latent attributes is indirect through the general high-order ability, and we assume the same model for item response under both missing data mechanisms.

Simulation Study 2
The aim of Simulation 2 was to empirically examine the sensitivity of incorrect use of MNAR for model fit. In this study, the simulated data were generated with MCAR, i.e., the distribution of the general higher-order ability is independent of the latent missingness classes, and we set θ n ∼ N(0, 1) for identification. The other settings were the same as those used for N = 500, J = 15 and low missingness in Simulation 1, which is a weak condition studied in Simulation Study 1. FIGURE 3 | Recovery of the attribute intercept parameters. HM, high missingness; LM, low missingness. Table 3 presents the bias and RMSE for the item mean vector, the item covariance matrix and the higher-order structure parameters. The recovery of the item parameter mean vector and covariance matrix were similar under MCAR and MNAR. For the higher-order structure parameters, unlike the results in Simulation Study 1, the recovery of the parameters under MNAR were as good as those under MCAR. Table 4 summarizes the recovery of the item and the general ability parameter, where the mean, standard deviation, minimum, and maximum of the bias and RMSE are reported. It also shows the correct classification rates for each attribute and attribute profiles. The results under MCAR and MNAR were similar, with mean bias close to zero and approximately equal correct classification rates for the recovery of each attribute and attribute profile. From Tables 3, 4, we found that when the data were generated under MCAR, the missing data mechanism used in the model fit had little impact on the results for diagnostic classification in our framework.

REAL DATA ANALYSIS
To illustrate the application of the proposed method, the PISA 2015 computer-based mathematics data were used. The data include item scores and response times for each item, and we only select dichotomous item scores for illustration. Four attributes belonging to the mathematical content knowledge were evaluated, i.e., change and relationship (α 1 ), quantity (α 2 ), space and shape (α 3 ) and uncertainty and data (α 4 ). I = 9 items with dichotomous responses were selected, with items IDs CM033Q01, CM474Q01, CM155Q01, CM155Q04, CM411Q01, CM411Q02, CM803Q01, CM442Q02, and CM034Q01. The Q matrix was shown in Table 5. Item responses with code 0 (no credit), code 1 (full credit), and code 9 (noresponse) were considered here. The responses "nonresponse" (code 9) were treated as missing data and might be due to a MNAR mechanism. N = 758 test-takers from Albania with responses 0, 1, and 9 to each of the nine items were used for analysis. The missing rate (i.e., the proportion of "nonresponse") for each item ranged from 0 to 14.78%. The models with missing data mechanisms MCAR and MNAR were both fitted to the data. The number of latent missingess classes C = 2 was determined by latent class analysis. Then, the analysis was specified in the same way as the simulation study. The DIC was applied to compare model fit for models under different missing data mechanisms.
The DIC values under MCAR and MNAR were 11454.2 and 10406.3, respectively, which indicated that MNAR was preferred with a lower DIC value. We were only interested in the results concerning diagnostic classification. Table 6 reports the estimated parameters and its corresponding standard deviations for the item mean vector, the item covariance matrix, and the attribute intercept and slope parameters. The results for the item mean vector and covariance matrix were similar under different missing data mechanisms, and the estimated 12 was −1.069, which indicated that items with a higher intercept corresponded to a lower interaction. µ β was estimated to be −2.033, which means that the mean guessing probability was approximate 0.12. All estimated attribute intercept and slope parameters were positive. The estimations for the item covariance matrix and the attribute intercept and slope parameters were poor, which were consistent with previous studies (de la Torre and Douglas, 2004;Zhan et al., 2018a). Table 7 reports the estimated item parameters. All δ i were positive, which means that all items satisfy g i < 1−s i . Only β i for CM033Q01 was positive, which means that the guessing probability of this item is higher than 0.5.
Though there were 16 possible attribute patterns for four attributes, 15 attribute patterns except (0101) were found in the estimated attributes under both MCAR and MNAR. Figure 6 presents the top five most frequent attribute patterns in the real data under MCAR and MNAR. The most prevalent attribute pattern was (0000) and the second most prevalent pattern was (1111) under both MCAR and MNAR, where the corresponding proportions were slightly different. The third and the fourth most prevalent patterns under MCAR and MNAR were reverse. The above results indicate that the missing data mechanisms have some influences on the estimated attribute patterns.

CONCLUSIONS
When multiple items are used to classify subjects' mastery status of latent attributes, it is almost inevitable that item-level  missing data will occur. It is possible that the missing data mechanism is related to the item responses, and without very strong assumptions, item-level non-response can be thought to depend on some latent variables. Motivated by this idea, we have proposed a joint modeling method incorporating item responses and missing data mechanism for cognitive diagnosis. A latent categorical variable is employed to describe the latent missingness propensity, which can avoid distributional assumptions and result in a more flexible model. Then, the latent missingness classes are linked to each item missingness indicator by the logistic function. Applying the hierarchical modeling framework, the general higher-order ability's distribution is affected by the latent missingness class in the case of MNAR and is independent of the latent missingness class in the case of MCAR. A Bayesian MCMC method is used to estimate the model parameters under the missing data mechanisms MCAR and MNAR. The simulation study demonstrates that when the data are generated under MNAR, the estimated general ability and the attribute intercept parameters have higher bias if an incorrect missing data mechanism is used for model fit; when the data are generated under MCAR, the results between different missing data mechanisms do not have much difference. Similar results about the impact of the missing data mechanisms have been found by de Chiusole et al. (2015) in the framework of KST. The PISA 2015 computer-based mathematics data are used to explore the magnitude and direction of item and person parameters, and the results support the MNAR in the real data analysis.
Our proposed method can be further investigated in several aspects. First, the proposed model has good performance with DINA model used for illustration, and the joint modeling method can be extended to other types of CDMs for further studies. Second, this study assumes that each latent attribute is a binary variable. When polytomous attributes are involved in CDMs (Zhan et al., 2019b), modified higher-order CDMs could be utilized in the framework of our model. Third, multiple sources of data about a subject behavior, for example, response time and other process data, can be combined to build up a more

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found at: http://www.oecd.org/pisa.

AUTHOR CONTRIBUTIONS
NS provided original idea of the paper. XW provided the technical support. Both authors contributed to the article and approved the submitted version.