Item-Weighted Likelihood Method for Measuring Growth in Longitudinal Study With Tests Composed of Both Dichotomous and Polytomous Items

In this paper, a new item-weighted scheme is proposed to assess examinees' growth in longitudinal analysis. A multidimensional Rasch model for measuring learning and change (MRMLC) and its polytomous extension are used to fit the longitudinal item response data. The new item-weighted likelihood estimation method is suitable not only for complex longitudinal IRT models but also for unidimensional IRT models, for example, the combination of the two-parameter logistic (2PL) model and the partial credit model (PCM; Masters, 1982) with a varying number of categories. Two simulation studies are carried out to illustrate the advantages of the item-weighted likelihood estimation method over the traditional maximum a posteriori (MAP) estimation method, the maximum likelihood estimation (MLE) method, Warm's (1989) weighted likelihood estimation (WLE) method, and the type-weighted maximum likelihood estimation (TWLE) method. Simulation results indicate that the item-weighted likelihood estimation method recovers examinees' true ability levels better than the existing likelihood estimation methods (MLE, WLE, and TWLE) and the MAP estimation method, for both complex longitudinal IRT models and unidimensional IRT models, with smaller bias, root-mean-square error, and root-mean-square difference, especially at low and high ability levels.


INTRODUCTION
The measurement of change has long been a topic of interest to both practitioners and methodologists (e.g., Dearborne, 1921; Woodrow, 1938; Lord, 1963; Fischer, 1973, 1976, 1995; Rasch, 1980; Andersen, 1985; Wilson, 1989; Embretson, 1991, 1997; von Davier and Xu, 2011; Barrett et al., 2015). Item response theory (IRT), and in particular the family of Rasch models (RM), provides a new perspective on modeling change. Andersen (1985) proposed a multidimensional Rasch model for modeling growth when the same items are administered repeatedly on different occasions. Embretson (1991) presented a special multidimensional Rasch model for measuring learning and change (MRMLC) based on IRT. Embretson's model postulates the involvement of K abilities for K occasions. Specifically, the MRMLC assumes that on the first occasion (k = 1), performance depends on initial ability. The MRMLC further assumes that on later occasions (k > 1), performance depends on k−1 additional abilities, termed "modifiabilities," as well as on initial ability. Thus, the number of abilities increases at each time point. In Andersen's model the same items are repeated over occasions, which may lead to practice or memory effects and result in local dependency among item responses (von Davier and Xu, 2011), whereas items in Embretson's MRMLC are not necessarily repeated. Fischer (2001) extended the MRMLC to polytomous items by extending the partial credit model (PCM; Masters, 1982). This paper extends Embretson's method to measure growth based on item responses from mixed-format tests composed of both dichotomous and polytomous items, which are frequently used in large-scale educational assessments such as the National Assessment of Educational Progress (NAEP) and the Program for International Student Assessment (PISA). For polytomous items, each response category provides information.
If the categories within an item are close together, the item information will be peaked near the center of the category location parameters. However, if the categories are spread further apart, each can add information at a different location. Therefore, the item information for a polytomous item can have multiple peaks and can be spread over a broader part of the ability range. Thus, polytomous items may contain more information than dichotomous items (e.g., Donoghue, 1994; Embretson and Reise, 2000, p. 95; Jodoin, 2003; Penfield and Bergeron, 2005; Yao, 2009; Christine, 2010; Tao et al., 2012). How to use the potential difference in information between item types to improve estimates of the latent trait is the main concern of our study.
As mentioned above, it has been demonstrated that polytomous items can often provide more information than dichotomous items about the level of the estimated latent trait. Meanwhile, different items of the same type may provide different amounts of information about the latent trait. To improve the precision of ability estimation, the aim of this study is to develop an efficient item-weighting scheme that assigns different weights to different items in accordance with the amount of information each provides at a given latent trait level. As early as 40 years ago, Lord (1980) considered optimal item weights for dichotomously scored items. Tao et al. (2012) proposed a bias-reduced item-weighted likelihood estimation method, and Sun et al. (2012) proposed weighted maximum a posteriori estimation, both of which focused on differentiating the information gained from different item types. In their methods, the weights were either pre-assigned and known or automatically selected such that the weights assigned to the polytomous items were larger than those assigned to the dichotomous items. These methods assign different weights to different item types rather than to different items, so items of the same type all have the same weight. For convenience, we call these weighting methods type-weighted estimation. However, different items of the same type may carry different information at a given latent trait level; assigning the same weight to all items of one type may not be statistically optimal in terms of the precision and accuracy of ability estimation, because it neglects differences in individual item contributions. It is expected that assigning a weight to each item based on its own contribution may increase measurement precision.
The remainder of this paper is organized as follows. First, we present the MRMLC and its polytomous extension, followed by the proposed item-weighted likelihood estimation (IWLE) method and two other ability estimation methods: Warm's (1989) weighted likelihood estimation (WLE) and type-weighted maximum likelihood estimation (TWLE). Second, we show that the IWLE is consistent and asymptotically normal with mean zero and a variance-covariance matrix, and that the bias of the IWLE is of order $n^{-1}$. Third, a simulation study is conducted to compare the proposed IWLE method with MLE, MAP, WLE, and TWLE. Fourth, a second simulation study is conducted to show that the IWLE can also be applied to general unidimensional item response models. Finally, we conclude the paper with a discussion.

MRMLC and Its Polytomous Extension
The MRMLC assumes that the probability of a correct response by person l on item i at occasion k can be written as:

$$P(U_{ilk} = 1 \mid \theta_{l1}, \ldots, \theta_{lk}, b_i) = \frac{\exp\left(\sum_{v=1}^{k} \theta_{lv} - b_i\right)}{1 + \exp\left(\sum_{v=1}^{k} \theta_{lv} - b_i\right)}, \tag{1}$$

where $U_{ilk}$ is the response variable with values in {0, 1}, $\theta_{l1}$ is the initial ability of person l on the first occasion (v = 1), $\theta_{l2}, \ldots, \theta_{lk}$ are the modifiabilities that correspond to occasions k > 1, and $b_i$ is the item difficulty. Although the MRMLC may be applied to multiple occasions, for clarity the model is presented here with only two occasions. To simplify the notation, the examinee subscript is not shown in the following derivations. Using the abbreviated notations $P_{i1}$ and $P_{i2}$ for the probability of a correct item response on Occasions 1 and 2, respectively,

$$P_{i1} = \frac{\exp(\theta_1 - b_i)}{1 + \exp(\theta_1 - b_i)}, \tag{2}$$

and

$$P_{i2} = \frac{\exp(\theta_1 + \theta_2 - b_i)}{1 + \exp(\theta_1 + \theta_2 - b_i)}. \tag{3}$$

Regarding the polytomous items, we use the abbreviated notations $P_{ij1}$ and $P_{ij2}$ to denote the probability of selecting response category j (where j = 1, ..., h) of polytomous item i on Occasions 1 and 2, respectively; these follow the PCM (Equations 4 and 5), with ability $\theta_1$ on Occasion 1 and $\theta_1 + \theta_2$ on Occasion 2.

Frontiers in Psychology | www.frontiersin.org

To develop a conditional maximum likelihood estimation method for item parameters in the learning process model, Embretson (1991) constructed a data design structure for item calibration in which item blocks are counterbalanced over occasions across groups. This data design matrix is needed to determine the occasion on which an item appears for an individual. Every item must be observed on every occasion, but to preserve local independence, an item should be administered only once to an individual across the two occasions. To incorporate Embretson's design structure, two groups of examinees are asked to respond to unique items on two occasions, and $k_{ig}$ is defined as a binary variable that indicates the occasion on which item i is administered to group g (g = 1, 2).
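Specifically,

$$k_{ig} = \begin{cases} 0, & \text{if item } i \text{ is administered to group } g \text{ on Occasion 1}, \\ 1, & \text{if item } i \text{ is administered to group } g \text{ on Occasion 2}. \end{cases}$$

Thus, the probability $P_g$ of a response vector $u = (u_1, \ldots, u_n)$ in group g for n items, conditional on the ability vector $(\theta_1, \theta_2)$, the item difficulty vector b, and the item occasion vector $k_g = (k_{1g}, \ldots, k_{ng})$, is given by:

$$P_g(u \mid \theta_1, \theta_2, b, k_g) = \prod_{i=1}^{n} \frac{\exp\left[u_i(\theta_1 + k_{ig}\theta_2 - b_i)\right]}{1 + \exp(\theta_1 + k_{ig}\theta_2 - b_i)}.$$

First, suppose that person l is assigned to a test condition group g that receives the item set $\mathcal{I}$. For the following considerations, it is assumed that some of the items $\mathcal{I} = \{I_1, \ldots, I_n\}$ are presented at time point (Occasion) 1, called the "pretest" and denoted $\mathcal{I}_1$, and some items are presented at time point 2, called the "posttest" and denoted $\mathcal{I}_2$, following Fischer (2001). The nonempty item subsets $\mathcal{I}_1$ and $\mathcal{I}_2$ may be completely different, may overlap, or may be identical. For convenience, however, a notation is adopted in which $\mathcal{I}_1$ and $\mathcal{I}_2$ are considered disjoint subsets of $\mathcal{I}$, with $\mathcal{I}_1 = \{I_1, \ldots, I_{n_1}\}$ and $\mathcal{I}_2 = \{I_{n_1+1}, \ldots, I_n\}$. The cases in which $\mathcal{I}_1$ and $\mathcal{I}_2$ overlap are implicitly covered; it suffices to let some $I_a \in \mathcal{I}_1$ have the same parameters as some $I_b \in \mathcal{I}_2$. Let us consider mixed-format tests; specifically, in the pretest, the k items $I_1, \ldots, I_k$ are dichotomous and the $n_1 - k$ items $I_{k+1}, \ldots, I_{n_1}$ are polytomous; in the posttest, the $m - n_1$ items $I_{n_1+1}, \ldots, I_m$ are dichotomous and the $n - m$ items $I_{m+1}, \ldots, I_n$ are polytomous.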
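The two-occasion structure of the MRMLC can be illustrated with a short numerical sketch. The code below is only an illustration under stated assumptions: the function names, the example ability values, and the 0/1 coding of the occasion indicator (0 for Occasion 1, 1 for Occasion 2) are choices made for this sketch and are not taken from the original derivation.

```python
import math


def mrmlc_prob(thetas, b, occasion):
    """P(correct) under the MRMLC at a given occasion.

    thetas[0] is the initial ability; thetas[1:] are modifiabilities.
    The effective ability at occasion k is the sum of the first k entries.
    """
    eta = sum(thetas[:occasion]) - b
    return 1.0 / (1.0 + math.exp(-eta))


def response_vector_prob(u, thetas, b, k_g):
    """Probability of a dichotomous response vector u for one group.

    k_g[i] is 0 if item i is given at Occasion 1 and 1 if it is given at
    Occasion 2 (this 0/1 coding is an assumption of the sketch).
    """
    prob = 1.0
    for u_i, b_i, k_i in zip(u, b, k_g):
        p = mrmlc_prob(thetas, b_i, occasion=1 + k_i)
        prob *= p if u_i == 1 else 1.0 - p
    return prob


theta = (0.0, 0.8)  # hypothetical initial ability and modifiability
p1 = mrmlc_prob(theta, b=0.0, occasion=1)  # depends on theta_1 only
p2 = mrmlc_prob(theta, b=0.0, occasion=2)  # depends on theta_1 + theta_2
```

With a positive modifiability, the same item difficulty yields a higher success probability at Occasion 2 than at Occasion 1, which is how the model represents growth.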

Maximum Likelihood Estimator
Now we consider the problem of likelihood estimation of the ability $\theta = (\theta_1, \theta_2)$. Given local independence, the likelihood function of the responses is the product of two types of likelihood functions:

$$L(\theta \mid U) = L_d(\theta \mid U)\, L_p(\theta \mid U),$$

where

$$L_d(\theta \mid U) = \prod_{i=1}^{k} P_{i1}^{u_i} Q_{i1}^{1-u_i} \prod_{i=n_1+1}^{m} P_{i2}^{v_i} Q_{i2}^{1-v_i}$$

and

$$L_p(\theta \mid U) = \prod_{i=k+1}^{n_1} \prod_{j=1}^{h} P_{ij1}^{u_{ij}} \prod_{i=m+1}^{n} \prod_{j=1}^{h} P_{ij2}^{v_{ij}},$$

with $Q_{i1} = 1 - P_{i1}$ and $Q_{i2} = 1 - P_{i2}$. The response matrix U contains the responses to dichotomous items, $u_i$ and $v_i$, and the responses to polytomous items, $u_{ij}$ and $v_{ij}$. The conventional maximum likelihood estimator (MLE) $\hat{\theta}$ can be obtained by maximizing the log-likelihood function $\log L(\theta \mid U)$.

Weighted Likelihood Estimator
Warm (1989) proposed a weighted likelihood estimation (WLE) method for dichotomous IRT models. Compared with maximum likelihood estimation, Warm's weighted likelihood estimation yields less biased estimates. Penfield and Bergeron (2005) extended this method to the generalized partial credit model (GPCM). The weighted likelihood function of a mixed-type model can be expressed as:

$$WL(\theta \mid U) = w(\theta)\, L(\theta \mid U),$$

where $w(\theta)$ is the weighting function, with $w(\theta) = I(\theta)^{1/2}$ in one- or two-parameter IRT models. The weighting function $w(\theta)$ is multiplied by the likelihood function $L(\theta \mid U)$, and the product is maximized. The WLE was proved to yield asymptotically normally distributed estimates with finite variance and with bias of only $o(n^{-1})$.

Item-Weighted Maximum Likelihood Estimator
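In this section, we consider the following item-weighted likelihood function:

$$IWL(\theta \mid U) = IWL_d(\theta \mid U)\, IWL_p(\theta \mid U),$$

where

$$IWL_d(\theta \mid U) = \prod_{i=1}^{k} \left(P_{i1}^{u_i} Q_{i1}^{1-u_i}\right)^{w_i(\theta)} \prod_{i=n_1+1}^{m} \left(P_{i2}^{v_i} Q_{i2}^{1-v_i}\right)^{w_i(\theta)}$$

and

$$IWL_p(\theta \mid U) = \prod_{i=k+1}^{n_1} \prod_{j=1}^{h} P_{ij1}^{u_{ij} w_i(\theta)} \prod_{i=m+1}^{n} \prod_{j=1}^{h} P_{ij2}^{v_{ij} w_i(\theta)}$$

are the item-weighted likelihood functions of the dichotomous model and the polytomous model of a mixed-format longitudinal test, respectively. Here the weight vector is $w(\theta) = (w_1(\theta), \ldots, w_n(\theta))$, with

$$w_i(\theta) = \frac{I_i(\theta)}{I(\theta)}, \quad i = 1, \ldots, n.$$

Note that $\sum_{i=1}^{n} w_i(\theta) = 1$. Here $I_i(\theta)$ is the information function of item i, which takes one form for a dichotomous item and another for a polytomous item; $P_i$ is the probability of a correct response to item i, $Q_i = 1 - P_i$, $P_{ij}$ is the probability of selecting response category j (where j = 1, ..., h) of polytomous item i, and $I(\theta) = \sum_{i=1}^{n} I_i(\theta)$ is the test information function for a test consisting of both dichotomous and polytomous items (Muraki, 1993). The weights of the items are thus determined by the ratio of the information of each item to that of the test at a given ability level.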
In IRT, the item and test information functions describe how well an examinee's ability is estimated over the whole ability scale; they are usually used to calculate the standard error of measurement and the reliability. Since the test information is a function of proficiency (or whatever trait or skill is measured) and of the items on the test, the expression for the proposed weights involves the ability level θ and the item parameters. The weights are "adaptive" in the sense that they are estimated from the ability level and the individual test items. Because the weights are determined by the ratio of the information of each item to that of the test, the more information an item carries at a given ability level, the larger the weight assigned to it. Under the proposed weighting method, the weight for a polytomous item is then larger than that for a dichotomous item, and the weights for items of the same type differ because the amounts of item information differ. The weight assigned to each item indicates its contribution to the precision of ability estimation. This item-weighting scheme exploits the information obtained both from different types of items and from different items of the same type, and may lead to more accurate estimates of the latent trait than weighting all items equally. If all items with the same scoring procedure have the same item information at a given latent trait level, their weights are equal. Hence, the proposed item-weighted likelihood method may be viewed as an extension of the method proposed by Tao et al. (2012). The item-weighted likelihood estimator (IWLE) can be obtained by maximizing the item-weighted log-likelihood function $\log IWL(\theta \mid U)$ (for derivation details, see Supplementary Appendix A). The maximum likelihood estimator was shown to have bias of $O(n^{-1})$ (Lord, 1983).
When the weights are determined at a given ability level, and under the assumptions made by Lord (1983), the item-weighted maximum likelihood estimator also has bias of $O(n^{-1})$. The approach and techniques of this derivation are taken from, and closely parallel, the derivations in Lord (1983). The asymptotic properties of the IWLE can be obtained by generalizing those of Bradley and Gart (1962) (for more details, see Supplementary Appendix B).
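The information-ratio weights can be illustrated numerically. The following is a minimal sketch under several assumptions that are not taken from the derivation above: item information is treated as a scalar evaluated at each item's effective ability ($\theta_1$ at Occasion 1, $\theta_1 + \theta_2$ at Occasion 2), dichotomous items follow the Rasch form, polytomous items follow a PCM scored 0, 1, ..., m with the score variance as item information, and all function names and parameter values are hypothetical.

```python
import math


def rasch_info(ability, b):
    """Item information P * Q of a Rasch item at the given ability."""
    p = 1.0 / (1.0 + math.exp(-(ability - b)))
    return p * (1.0 - p)


def pcm_probs(ability, steps):
    """Category probabilities of a PCM item scored 0..len(steps)."""
    cumulative = [0.0]
    for step in steps:
        cumulative.append(cumulative[-1] + (ability - step))
    top = max(cumulative)  # subtract the maximum for numerical stability
    expo = [math.exp(c - top) for c in cumulative]
    total = sum(expo)
    return [e / total for e in expo]


def pcm_info(ability, steps):
    """Item information of a PCM item: the variance of the item score."""
    probs = pcm_probs(ability, steps)
    mean = sum(j * p for j, p in enumerate(probs))
    return sum(j * j * p for j, p in enumerate(probs)) - mean ** 2


def item_weights(theta1, theta2, dich_items, poly_items):
    """Weights w_i = I_i / I for (difficulty, occasion) and (steps, occasion) items."""
    def effective(occasion):
        return theta1 + (theta2 if occasion == 2 else 0.0)

    infos = [rasch_info(effective(occ), b) for b, occ in dich_items]
    infos += [pcm_info(effective(occ), steps) for steps, occ in poly_items]
    test_info = sum(infos)
    return [info / test_info for info in infos]


dich = [(-1.0, 1), (0.0, 1), (1.0, 2)]                  # hypothetical items
poly = [([-1.0, 0.0, 1.0], 1), ([-0.5, 0.5, 1.5], 2)]   # hypothetical items
weights = item_weights(0.0, 0.8, dich, poly)
```

In this example the weights sum to one, the two polytomous items receive larger weights than any of the dichotomous items, and dichotomous items of the same type receive different weights depending on how close their difficulty is to the examinee's effective ability.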

Type-Weighted Maximum Likelihood Estimator
In contrast to the MLE, the type-weighted maximum likelihood estimator (TWLE) yields a usable ability estimator for mixed-type tests composed of both dichotomous and polytomous items (Sun et al., 2012). The type-weighted likelihood function of a mixed-type model can be expressed as:

$$TWL(\theta \mid U) = L_d(\theta \mid U)^{w_1(\theta)}\, L_p(\theta \mid U)^{w_2(\theta)},$$

where the weights $w_1(\theta)$ and $w_2(\theta)$ depend on the ratio parameters α and β and on

$$I_d = \sum_{i=1}^{k} I_i + \sum_{i=n_1+1}^{m} I_i \quad \text{and} \quad I_p = \sum_{i=k+1}^{n_1} I_i + \sum_{i=m+1}^{n} I_i,$$

the test information of the dichotomous and polytomous models based on the longitudinal model, respectively. According to the weighting scheme proposed by Sun et al. (2012), the ratio parameters α and β are determined so that the weight assigned to the polytomously scored items is larger than that assigned to the dichotomously scored items. Three steps are needed to determine the ratio parameters α and β and the two weights. First, we obtain the ML estimator $\hat{\theta}_0$ and take it as the initial estimator. Second, if $I_d(\hat{\theta}_0) < I_p(\hat{\theta}_0)$, both ratio parameters are set equal to 1. Otherwise, we may set α and β to a small value ε (such as ε < 0.4) to make sure that $I_d(\hat{\theta}_0) < I_p(\hat{\theta}_0)$. Then, no change is needed for either α or β if $w_1(\hat{\theta}_0) < w_2(\hat{\theta}_0)$. Otherwise, we may increase α in increments of 0.05 or less, or decrease β in increments of 0.05 or less, adjusting α and β until $w_1(\hat{\theta}_0) < w_2(\hat{\theta}_0)$. Third, we maximize the type-weighted log-likelihood function $\log TWL(\theta \mid U)$ to obtain $\hat{\theta}$ with the α and β values obtained above. If $w_1(\hat{\theta}) < w_2(\hat{\theta})$, then $\hat{\theta}$ is the TWLE. Otherwise, the ratio parameters are adjusted further, following the above process, until $w_1(\hat{\theta}) < w_2(\hat{\theta})$.
The three weighted estimators above, TWLE, WLE, and IWLE, use different weighting schemes. In the TWLE, larger weights are assigned to the polytomous items and smaller weights to the dichotomous items. This method assigns different weights only to different item types, not to different items, so items of the same type all have the same weight. However, different items of the same type may carry different information at a given latent trait level; assigning the same weight to all items of one type may not be statistically optimal in terms of the precision and accuracy of ability estimation, because it neglects differences in individual item contributions. The proposed IWLE assigns different weights to different items in accordance with the amount of information each item provides at a given latent trait level, with the weights determined by the ratio of the information of each item to that of the test. Incorporating item weights into the likelihood function for ability estimation in this way may increase measurement precision. The WLE provides a bias correction to the maximum likelihood method: the weight function is multiplied by the likelihood function $L(\theta \mid U)$, and the estimate is obtained by solving a weighted log-likelihood equation. The WLE and IWLE are both consistent and asymptotically normal with mean zero and a variance-covariance matrix, and the bias of the estimators is of order $n^{-1}$.

SIMULATION STUDY 1
Simulation Design
In this section, the performance of the three weighting methods, WLE, TWLE, and IWLE, is compared. To investigate the effects of test length and of the proportion of dichotomous and polytomous items in a mixed-format test on the properties of the θ estimators, nine artificial tests were constructed at each time point: three short (10 items with 7, 5, and 3 dichotomous items), three medium (30 items with 20, 15, and 10 dichotomous items), and three long (60 items with 40, 30, and 20 dichotomous items). The 3 levels of test length are representative of those encountered in measurement settings that use fixed-length tests. The 3 levels of the proportion of dichotomous and polytomous items (λ = 2, 1, 0.5) were selected to allow a thorough investigation of the properties of the different weighting methods.
The item parameters and ability parameters were set as follows. The difficulty parameters of the dichotomous items were randomly generated from the standard normal distribution N(0, 1). The polytomously scored items were constructed with four categories. The step parameters of each polytomous item were randomly generated from four normal distributions. This pattern of location parameters centers the items on zero and thus centers the test on zero. In the simulation, 17 equally spaced $\theta_1$ values were considered, ranging from −4.0 to 4.0 in increments of 0.5. We set 3 values of $\theta_2$ (0.6, 0.8, and 1.0) for 3 different initial ability levels: high ($\theta_1$ larger than 2), medium ($\theta_1$ between −2 and 2), and low ($\theta_1$ smaller than −2), respectively. Thus, a high initial ability has a low gain, a medium initial ability has a moderate gain, and a low initial ability has a high gain. At each level of $(\theta_1, \theta_2)$, N = 1000 replications were simulated for all 9 tests. In each replication, the dichotomous item responses were simulated according to the MRMLC as presented in Equations 2 and 3, and the polytomous item responses were simulated according to the PCM as presented in Equations 4 and 5. For response patterns consisting of all correct responses to the dichotomous items and all 4s on the polytomous items, or of all incorrect responses to the dichotomous items and the lowest category on all polytomous items, the Newton-Raphson algorithm cannot converge, and thus the likelihood estimators cannot be obtained. These response patterns were removed from the analysis, and the same item responses were scored using the WLE, TWLE, and IWLE procedures. In the simulation, the θ in the weight for each item is taken as $\hat{\theta}$, the MLE of θ. All levels of the number of items, the proportion of dichotomous and polytomous items, and the number of examinees were crossed, resulting in 27 conditions of test properties at each time point.
For each of the 27 conditions of test properties, the WLE, TWLE, and IWLE were obtained for each of the response patterns.
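The ability grid and the generation of dichotomous responses described above can be sketched as follows. This is only an illustrative sketch: the function names and the random seed are hypothetical, the boundary values $\theta_1 = \pm 2$ are placed in the medium group by assumption, and only the dichotomous part of a 30-item test with 20 dichotomous items per occasion is shown.

```python
import math
import random


def ability_grid():
    """17 equally spaced theta1 values, each paired with its theta2 (gain)."""
    grid = []
    for step in range(17):
        theta1 = -4.0 + 0.5 * step
        if theta1 > 2.0:
            theta2 = 0.6  # high initial ability, low gain
        elif theta1 < -2.0:
            theta2 = 1.0  # low initial ability, high gain
        else:
            theta2 = 0.8  # medium initial ability (boundaries placed here by assumption)
        grid.append((theta1, theta2))
    return grid


def simulate_dichotomous(ability, difficulties, rng):
    """Draw 0/1 responses from the Rasch form at the given effective ability."""
    responses = []
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(ability - b)))
        responses.append(1 if rng.random() < p else 0)
    return responses


rng = random.Random(2024)  # hypothetical seed, for reproducibility only
b_pre = [rng.gauss(0.0, 1.0) for _ in range(20)]   # pretest difficulties ~ N(0, 1)
b_post = [rng.gauss(0.0, 1.0) for _ in range(20)]  # posttest difficulties ~ N(0, 1)

grid = ability_grid()
theta1, theta2 = grid[0]
pre = simulate_dichotomous(theta1, b_pre, rng)             # Occasion 1: theta1
post = simulate_dichotomous(theta1 + theta2, b_post, rng)  # Occasion 2: theta1 + theta2
```

Repeating the last three lines 1000 times for every pair in `grid` would give the replications for the dichotomous part of one test condition.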

Evaluation Criteria
The bias, absolute bias, root mean squared error (RMSE), and root mean squared difference (RMSD) of the ability estimates were used as evaluation criteria to examine all estimation methods. The absolute bias is calculated using Equation 13. In Equation 13, θ denotes the true ability value and $\hat{\theta}_l$ the corresponding ability estimate for the l-th replication.
RMSE and RMSD are calculated using Equations 14 and 15, respectively, where N is the number of replications. In the simulation studies, the number of replications is fixed at N = 1000.
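For orientation, two of these criteria can be sketched in code. The sketch assumes the standard textbook definitions of bias (the mean of $\hat{\theta}_l - \theta$ over replications) and RMSE (the square root of the mean squared deviation from the true value); these forms, the function names, and the example values are assumptions and may differ in detail from Equations 13 to 15.

```python
import math


def bias(estimates, true_theta):
    """Mean signed deviation of the estimates from the true ability."""
    return sum(e - true_theta for e in estimates) / len(estimates)


def rmse(estimates, true_theta):
    """Square root of the mean squared deviation from the true ability."""
    squared = sum((e - true_theta) ** 2 for e in estimates)
    return math.sqrt(squared / len(estimates))


estimates = [0.9, 1.1, 1.2, 0.8]  # hypothetical estimates over four replications
b = bias(estimates, 1.0)
r = rmse(estimates, 1.0)
```

In this toy example the deviations cancel, so the bias is essentially zero even though the RMSE is not, which is why both criteria are reported.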

Results of Simulation
The weights of IWLE for 6 dichotomous and 3 polytomous items are shown in Figures 1 and 2. The purpose of these figures is to give more intuition about our item-weighting scheme. The weights are based on the individual test items and the ability level, with $\theta_1$ ranging from −4.0 to 4.0 and 3 values of $\theta_2$ (0.6, 0.8, and 1.0). The figures show that different items are assigned different weights. In addition, the weights assigned to polytomous items are larger than those assigned to dichotomous items. Table 1 shows the correlation between the estimated abilities and the true abilities for all three weighting estimation methods under nine conditions. The higher correlation obtained with the IWLE ability estimates indicates that the IWLE produces better ability estimates. The results in Table 1 also indicate that longer tests yield higher correlations between the estimated abilities and the true abilities. Among tests of the same length, a higher proportion of polytomous to dichotomous items also yields a higher correlation between the estimated abilities and the true abilities.
FIGURE 1 | The weights of IWLE based on $\theta_1$ for dichotomous items (items 1 to 6) and polytomous items (items 7 to 9) in test 1.
FIGURE 2 | Weights based on θ (θ = ($\theta_1$, $\theta_2$)) at 17 ability levels for dichotomous items (items 1 to 6) and polytomous items (items 7 to 9) in test 2. $n_1$d + $n_2$p means the ($n_1$ + $n_2$)-item test with $n_1$ dichotomous items and $n_2$ polytomous items.
The simulation results for the 3 test lengths show similar trends for the three weighting estimators: WLE, TWLE, and IWLE. Because of space limitations, only those for the 30-item test are presented. The complete results can be obtained from the author.
Examining these results, the following general trends are observed. The absolute bias is nearly zero for all three estimators when $|\theta_1| < 2$ or $\theta_2 = 0.8$, but IWLE has considerably smaller absolute bias than the other two estimators when $|\theta_1| > 2$ or $\theta_2 = 0.6$ and 1. We note that in the 3 simulation scenarios the absolute bias of IWLE is slightly larger than that of WLE at some levels of $\theta_1$ when $|\theta_1| < 2$, but is considerably smaller than that of WLE at the low and high levels of ability. IWLE consistently displays smaller absolute bias than TWLE, and substantially smaller absolute bias at the low and high levels of ability. In addition, the absolute bias of WLE is smaller than that of TWLE at the extremes of the ability scale. However, changes are observed when the proportion of dichotomous and polytomous items in the mixed-type test changes. As the number of polytomous items increases, the absolute bias produced by TWLE and WLE becomes more similar, although TWLE still produces slightly larger absolute bias than WLE at the extremes of the ability scale. Similar patterns are observed for the RMSD produced by the three estimators. The RMSD of IWLE is slightly larger than that of WLE at some levels of $\theta_1$ when $|\theta_1| < 2$, but is considerably smaller than that of WLE and TWLE at the low and high levels of ability.
To investigate the performance of the proposed IWLE method further, a simulation study was conducted to compare the five estimators, MLE, MAP [with a non-informative prior distribution U(−4, 4)], WLE, TWLE, and IWLE, under the above simulation conditions. Figures 3-8 show the RMSE results. The RMSE presented in Figures 3-5 shows that, among the five $\theta_1$ estimation methods, IWLE has a slightly larger RMSE when $|\theta_1| < 2$, but a considerably smaller RMSE than MLE, MAP, WLE, and TWLE at extreme levels of the latent trait. The RMSE of WLE is very similar to that of MLE and TWLE. MAP has lower RMSE than MLE, WLE, TWLE, and IWLE in the middle of the ability range because of shrinkage. The RMSE plotted in Figures 6-8 shows similar patterns for $\theta_2$.
The proposed IWLE method outperforms MLE, MAP, WLE, and TWLE in terms of controlling the absolute bias, RMSE, and RMSD at the low and high levels of ability, but has slightly larger RMSE and RMSD in the middle range of the ability scale.
In general, test length had a dramatic impact on the relative performance of the five estimators. The strongest differences between the five estimators are obtained when the test is short. The absolute bias, RMSE, and RMSD of the five estimation methods decrease slightly as test length increases. The proportion of dichotomous and polytomous items in a mixed-format test also appears to affect the absolute bias, RMSE, and RMSD of the five estimation methods.

SIMULATION STUDY 2
When we care only about the ability of the examinee, without considering ability growth over multiple time points, unidimensional IRT models are the focus of many educational psychometricians. In fact, our IWLE method can be used not only with multidimensional IRT models but also with unidimensional IRT models. In this simulation study, we evaluate the accuracy of the IWLE method in unidimensional models.
FIGURE 9 | RMSE of the two θ estimation methods MLE and IWLE for 10p+20d.
The proposed IWLE method is applied to unidimensional IRT models for a mixed-format test that combines the two-parameter logistic model and the partial credit model. We consider the following item-weighted likelihood function:

$$IWL(\theta \mid U) = IWL_d(\theta \mid U)\, IWL_p(\theta \mid U),$$

where

$$IWL_d(\theta \mid U) = \prod_{i=1}^{k} \left[P_i(\theta)^{u_i} Q_i(\theta)^{1-u_i}\right]^{w_i(\theta)}$$

and

$$IWL_p(\theta \mid U) = \prod_{i=k+1}^{n} \prod_{j=1}^{h} P_{ij}(\theta)^{u_{ij} w_i(\theta)}. \tag{24}$$

Here $P_i(\theta)$ is determined by the dichotomously scored items, with $Q_i(\theta) = 1 - P_i(\theta)$, and $P_{ij}(\theta)$ is determined by the polytomously scored items. The weight $w_i(\theta)$ assigned to item i is defined as in Equation 4, and $\sum_{i=1}^{n} w_i(\theta) = 1$. The 3 levels of test length (10, 30, and 60 items) and the 3 levels of the proportion of dichotomous and polytomous items (λ = 2, 1, 0.5) were selected. The item parameters were generated as in Simulation Study 1, and 17 equally spaced θ values were considered, ranging from −4.0 to 4.0 in increments of 0.5.
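To make the unidimensional procedure concrete, a minimal sketch follows. It rests on several assumptions that are not taken from the paper: a simple grid search replaces the Newton-Raphson iterations, the weights are the information ratios evaluated once at the MLE and then held fixed, polytomous responses are scored 0, 1, ..., m, and all names and parameter values are hypothetical.

```python
import math

GRID = [-6.0 + 0.01 * t for t in range(1201)]  # search grid for theta


def pcm_probs(theta, steps):
    """Category probabilities of a PCM item scored 0..len(steps)."""
    cumulative = [0.0]
    for step in steps:
        cumulative.append(cumulative[-1] + (theta - step))
    top = max(cumulative)
    expo = [math.exp(c - top) for c in cumulative]
    total = sum(expo)
    return [e / total for e in expo]


def item_terms(theta, dich, poly, u_d, u_p):
    """Per-item log-likelihoods and informations at theta."""
    logliks, infos = [], []
    for (a, b), u in zip(dich, u_d):  # 2PL items with (a, b) parameters
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        logliks.append(math.log(p if u == 1 else 1.0 - p))
        infos.append(a * a * p * (1.0 - p))
    for steps, u in zip(poly, u_p):   # PCM items; information = score variance
        probs = pcm_probs(theta, steps)
        logliks.append(math.log(probs[u]))
        mean = sum(j * q for j, q in enumerate(probs))
        infos.append(sum(j * j * q for j, q in enumerate(probs)) - mean ** 2)
    return logliks, infos


def mle(dich, poly, u_d, u_p):
    """Grid-search maximizer of the unweighted log-likelihood."""
    return max(GRID, key=lambda t: sum(item_terms(t, dich, poly, u_d, u_p)[0]))


def iwle(dich, poly, u_d, u_p):
    """Grid-search maximizer of the item-weighted log-likelihood."""
    theta_hat = mle(dich, poly, u_d, u_p)
    _, infos = item_terms(theta_hat, dich, poly, u_d, u_p)
    weights = [info / sum(infos) for info in infos]  # w_i = I_i / I at the MLE

    def weighted_loglik(t):
        logliks, _ = item_terms(t, dich, poly, u_d, u_p)
        return sum(w * ll for w, ll in zip(weights, logliks))

    return max(GRID, key=weighted_loglik), weights


dich = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]  # hypothetical (a, b) pairs
poly = [[-1.0, 0.0, 1.0]]                     # hypothetical step parameters
estimate, weights = iwle(dich, poly, u_d=[1, 1, 0], u_p=[2])
```

Fixing the weights at the MLE keeps the weighted objective a positively weighted sum of the per-item log-likelihoods, so a mixed response pattern such as the one above still has an interior maximum.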
The simulation results for the three test lengths show similar trends. The proposed IWLE method outperforms the MLE in terms of absolute bias, RMSE, and RMSD at the low and high levels of ability. However, the IWLE has slightly larger absolute bias, RMSE, and RMSD than the MLE in the middle range of the ability scale. Figures 9-11 show the RMSE results for the 30-item test. According to the simulation results, the IWLE can also be applied to general unidimensional IRT models for tests composed of both dichotomous and polytomous items.

DISCUSSION AND CONCLUSION
In this study, an improved IWLE procedure that incorporates item weights into the likelihood function for ability estimation is proposed. The weights are "adaptive" in the sense that they are estimated from the ability level and the individual test items. We assign different weights to different items in accordance with the amount of information each item provides at a given latent trait level, with the weights determined by the ratio of the information of each item to that of the test. We also give rigorous derivations of the asymptotic properties and the bias of the IWL estimators. The results of the simulation studies demonstrate that the proposed IWLE method outperforms the usual MLE, MAP, WLE, and TWLE in terms of controlling absolute bias, RMSE, and RMSD, especially at low and high ability levels. Latent trait estimation is one of the most important components of IRT, but when an examinee scores high (or low) on a test, we know that the examinee is high (or low) on the trait without having a very precise estimate of how high (or low). The examinee's level could be considerably higher (or lower) than the scale of the test instrument reaches. In this case, improving latent trait estimation, especially at the extreme levels of the ability scale, is worthy of attention.
Improving latent trait estimation is always important in longitudinal survey assessments, such as the Early Childhood Longitudinal Study (ECLS) and the PISA (von Davier and Xu, 2011), which aim at tracking the growth of a representative sample of the target population over time. The proposed weighting scheme can also be applied to general unidimensional item response models. Other issues should be explored further. First, the proposed weighting scheme could be generalized to other application settings in which latent ability needs to be estimated for each person, such as computerized adaptive testing (CAT). Second, although the Rasch model and the PCM are commonly used in practical tests, there are other, more general item response models, for instance the three-parameter logistic (3PL) model and the generalized partial credit model. It is therefore worth studying the extension of the IWLE to these more complex models, with different test lengths and sample sizes. Third, more than two occasions can be considered in a longitudinal study, so the proposed weighting method can be generalized to deal with more general situations. Finally, the proposed IWLE method can be extended to multidimensional longitudinal IRT models.
From a practical point of view, we would not use a test whose items are far too difficult or far too easy, because each item should have enough discrimination to distinguish examinees with different ability levels. In fact, the test items are pre-calibrated for reliability and validity before the actual assessment. When examinees take the pre-calibrated test, some answer all items correctly while others do not. In this case, extreme ability estimates will occur. Thus, extreme abilities occur because there are large differences between examinees' abilities rather than because the items are too difficult or too easy (the test items are pre-calibrated, reliable, and valid). In addition, in the actual assessment the examinees are obtained through a multistage stratified sample. In the first stage, the sampling population is classified by district, and schools are selected at random. In the second stage, students are selected at random from each school. Therefore, some examinees will have extreme abilities. For example, some examinees with high abilities answer all the items correctly, and some examinees with low abilities answer all the items incorrectly. Traditional methods (WLE and TWLE) fail to estimate these extreme abilities well, whereas our IWLE method is more accurate in estimating them. This is the main advantage of our item-weighted scheme.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.