GENERAL COMMENTARY article

Front. Psychol., 30 October 2018

Sec. Quantitative Psychology and Measurement

Volume 9 - 2018 | https://doi.org/10.3389/fpsyg.2018.01988

Commentary: On the Importance of the Speed-Ability Trade-Off When Dealing With Not Reached Items

  • 1. Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany

  • 2. National Board of Medical Examiners, Philadelphia, PA, United States

In their 2018 article, Tijmstra and Bolsinova (T&B) discuss how to deal with items that are not reached because of low working speed in ability tests (Tijmstra and Bolsinova, 2018). An important contribution of the paper is its focus on the question of how to define the targeted ability measure. In this note, we aim to add further aspects to this discussion and to propose alternative approaches.

Challenges in estimating optimal ability

Ignoring the dimensional structure

To show the effects of too low a working speed, T&B (p. 6) consider a model, referred to here as Equation (1), that combines effective working speed and optimal ability.

T&B assume two respondent groups: compliers, who work at the optimal speed, and non-compliers, who work at lower than optimal speed, which implies that their effective ability exceeds their optimal ability if γp > 0. We refer to the latter group as slow non-compliers (slowNCs).

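For compliers (who work at the optimal speed), the model in (1) reduces to a one-dimensional IRT model, since effective and optimal ability coincide. For non-compliers, defining α1i = αi and α2ip = −γpαi and taking the deviation of effective speed from optimal speed as the second person dimension, a person-specific two-dimensional IRT model results (Equation 2) that depends on the speed-ability trade-off (SAbT) parameter γp.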

Apart from specific experimental settings, which are rarely feasible in large-scale assessments, this model cannot be estimated in practice, so T&B resort to fixing γp to a constant for their simulations. This specifies a regular two-dimensional IRT model for data generation, and using a unidimensional model for analysis will of course result in biased ability estimates. This bias is quantified in Equation (3).

Only compliers, who work at the optimal speed, or respondents with γp = 0 would obtain unbiased person parameter estimates from a unidimensional model. Thus, the bias results not from how missing responses are treated but from ignoring the dimensional structure.
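As a sketch of the structure described in this section, the model, its two-dimensional rewriting, and the resulting bias can be written as follows. The notation is an assumption and may differ from T&B's: a logistic two-parameter form with item intercept β_i is assumed, and θ*_p, θ_p, τ_p, and τ*_p denote optimal ability, effective ability, effective speed, and optimal speed. Only α1i = αi, α2ip = −γpαi, and γp are taken from the text.

```latex
\begin{align}
\operatorname{logit} P(X_{pi}=1)
  &= \alpha_i\bigl(\theta_p^{*} - \gamma_p(\tau_p - \tau_p^{*})\bigr) + \beta_i \\
  &= \alpha_{1i}\,\theta_p^{*} + \alpha_{2ip}\,(\tau_p - \tau_p^{*}) + \beta_i,
  \qquad \alpha_{1i} = \alpha_i,\quad \alpha_{2ip} = -\gamma_p\alpha_i \\
\theta_p - \theta_p^{*} &= -\gamma_p\,(\tau_p - \tau_p^{*})
\end{align}
```

Under this sketch, the right-hand side of the last line is zero only if τ_p = τ*_p or γ_p = 0. For γ_p > 0 it is positive for respondents working slower than optimal and negative for respondents working faster than optimal.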

Respondents faster than optimal

T&B only consider non-compliance in the form of lower than optimal speed. However, most non-complying respondents show higher than optimal speed. Even respondents who manage to respond to all items within the time limit will typically not work at exactly the optimal speed but at a higher speed. This was noted by Kuhn and Ranger (2015) and shown in our own empirical data analyses (up to 70% of respondents without missing values finish the test some time before the time limit; Pohl, 2018; Pohl et al., under review; Ulitzsch et al., under review). Thus, a third group is needed in this discussion, which we will call faster non-compliers (fastNCs). Note that fastNCs, who will likely reach all items, will also receive biased estimates according to Equation (3). Hence, the issue of estimating optimal ability cannot be solved solely by focusing on the treatment of missing values.
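The claim that respondents who reach all items still receive biased estimates can be illustrated with a small simulation. The following is a minimal sketch, not the simulation design of T&B or of this commentary. It assumes a two-parameter logistic model with known item parameters, a fixed trade-off parameter `gamma`, speed deviations of 0, -1, and +1 for compliers, slowNCs, and fastNCs, and grid-search maximum likelihood for the unidimensional ability estimate. All names and numeric values are hypothetical.

```python
import numpy as np


def simulate_bias(gamma=0.5, n_per_group=300, n_items=100, seed=0):
    """Mean (unidimensional estimate - optimal ability) per respondent group.

    No responses are missing: every simulated respondent answers all items.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(0.8, 1.6, n_items)  # item discriminations (assumed)
    beta = rng.normal(0.0, 1.0, n_items)    # item intercepts (assumed)

    # Deviation of effective speed from optimal speed (assumed values).
    deviations = {"complier": 0.0, "slowNC": -1.0, "fastNC": 1.0}

    # Log-likelihood building blocks on a grid of candidate ability values.
    grid = np.linspace(-6.0, 6.0, 481)
    logit_grid = alpha[None, :] * grid[:, None] + beta[None, :]
    log_p = -np.log1p(np.exp(-logit_grid))  # log P(correct) on the grid
    log_q = -np.log1p(np.exp(logit_grid))   # log P(incorrect) on the grid

    mean_bias = {}
    for group, dev in deviations.items():
        theta_opt = rng.normal(0.0, 1.0, n_per_group)
        # Effective ability under the assumed linear trade-off.
        theta_eff = theta_opt - gamma * dev
        logits = alpha[None, :] * theta_eff[:, None] + beta[None, :]
        prob = 1.0 / (1.0 + np.exp(-logits))
        x = (rng.random(prob.shape) < prob).astype(float)

        # Unidimensional ML estimate: grid point with the highest likelihood.
        loglik = x @ log_p.T + (1.0 - x) @ log_q.T
        theta_hat = grid[np.argmax(loglik, axis=1)]
        mean_bias[group] = float(np.mean(theta_hat - theta_opt))
    return mean_bias


if __name__ == "__main__":
    for group, bias in simulate_bias().items():
        print(f"{group}: mean bias = {bias:.2f}")
```

With a positive `gamma`, the mean difference between the unidimensional estimate and optimal ability should be close to zero for compliers, positive for slowNCs, and negative for fastNCs, although no response is missing in any group.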

Evaluation of missing data approaches

Assumption on the missing data process

When evaluating the performance of approaches for estimating optimal ability, one must consider a more realistic missing data mechanism, one that takes into account that (a) there are fastNCs and (b) not reached items also occur due to quitting. In fact, in low-stakes assessments quitting seems to be the main reason for not reached items (up to 90% of not reached items are due to quitting; see Pohl, 2018; Pohl et al., under review; Ulitzsch et al., under review). Taking both into account will alter the results.

Performance of the missing data treatments

T&B conclude that incorrect scoring performs best among the approaches compared. First, this result is somewhat surprising, since it stands in stark contrast to other published research on this approach (Lord, 1974; De Ayala et al., 2001; Rose et al., 2010; Pohl et al., 2014), which shows that incorrect scoring results in highly biased parameter estimates whenever missing values do not occur only on otherwise incorrect responses. Second, note that scoring missing values as incorrect results in a different definition of the target ability for different subgroups. For slowNCs with missing values, scoring these as incorrect results in an overcorrection for speed while aiming at estimating optimal ability. For compliers and fastNCs, who have no missing data, no correction for speed is made, and effective ability is estimated instead.

Discussion of proposed solutions

We appreciate the solutions proposed by T&B and want to add further aspects for consideration:

Non-speeded power tests rely on respondents (a) being aware of their own SAbT function and (b) being highly motivated to optimize performance. The first assumption is unlikely to hold in many applications. The second assumption may hold in high-stakes assessments, whereas in low-stakes assessments, for which the missing data approaches have been suggested, empirical data (e.g., Cosgrove and Cartwright, 2014; Pohl et al., under review) suggest otherwise. Also note that this solution requires moving away from measuring optimal ability for a given time limit and instead measuring effective ability given the chosen speed.

Item-level time limits help respondents to manage time and reduce variability in chosen speed. However, note that this solution (a) cannot resolve the issue of differences in speed across respondents, as there will still be fastNCs, and (b) induces other problems, such as increased item omission rates or rapid guessing.

An alternative solution

One may conjecture that effective speed and effective ability more closely mirror real-life behavior, which is typically the target in large-scale assessments (OECD, 2017). These may even be better predictors of later outcomes than optimal ability: in everyday situations there is no information on optimal speed, but persons typically choose their speed given external time limits.

Pohl et al. (under review) and Ulitzsch et al. (under review) suggest describing respondents' performance by a profile of all dimensions of performance, namely effective ability, effective speed, and test endurance (as a measure of quitting behavior), and using these dimensions for evaluating and comparing performance. This allows for a richer description of differences in performance and disentangles the different aspects involved. It also allows differences in performance to be explained (e.g., Sachse et al., in preparation). If stakeholders are interested in only one score per domain, for example for country rankings, we suggest using a constructive approach: deciding either empirically (through prediction of key outcomes) or by means of a validity argument how to combine ability, speed, and test endurance, and developing a composite score that reflects the combination one wants to focus on. One advantage of such an approach is that this composite is defined in the same way for all respondents (not just for those with missing values). Note that this solution also works for omitted responses; these just need a slightly different modeling approach (Ulitzsch et al., 2018; Ulitzsch et al., under review).
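The empirical route to a composite score can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure of the cited work: it assumes that estimates of the three profile dimensions are already available, standardizes them, and obtains weights by ordinary least squares on a single key outcome. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def composite_weights(profile, outcome):
    """Return standardized profile scores and least-squares weights.

    profile: array of shape (n_respondents, 3) with columns
             effective ability, effective speed, test endurance.
    outcome: array of shape (n_respondents,) with the key outcome.
    """
    z = (profile - profile.mean(axis=0)) / profile.std(axis=0)
    design = np.column_stack([np.ones(len(z)), z])  # intercept + dimensions
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return z, coef[1:]  # drop the intercept


def composite_score(z, weights):
    """Weighted sum of standardized dimensions, defined for every respondent."""
    return z @ weights


if __name__ == "__main__":
    # Synthetic data with assumed true weights, for illustration only.
    rng = np.random.default_rng(1)
    profile = rng.normal(size=(2000, 3))
    outcome = profile @ np.array([0.6, 0.3, 0.1]) + rng.normal(0.0, 0.5, 2000)
    z, weights = composite_weights(profile, outcome)
    print(np.round(weights, 2))
    print(composite_score(z, weights)[:5])
```

Weights chosen by a validity argument instead of by prediction could be passed to `composite_score` directly, which keeps the definition of the composite identical across respondents in both cases.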

Statements

Author contributions

SP wrote the first draft of the manuscript including the general outline of argumentation. MvD discussed these with SP and added further ideas. SP and MvD both revised the manuscript.

Funding

This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) as part of the project Using response times to account for missing data in competence tests (Grant No. PO1655/3-1) as well as part of the project Analyzing relations between latent competencies and context information in the National Educational Panel Study within the Priority Programme 1646: Education as a Lifelong Process (Grant No. PO1655/2-1).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  • 1. Cosgrove J., Cartwright F. (2014). Changes in achievement on PISA: the case of Ireland and implications for international assessment practice. Large-Scale Assess. Educ. 2:2. 10.1186/2196-0739-2-2

  • 2. De Ayala R. J., Plake B. S., Impara J. C. (2001). The impact of omitted responses on the accuracy of ability estimation in item response theory. J. Educ. Meas. 38, 213–234. 10.1111/j.1745-3984.2001.tb01124.x

  • 3. Kuhn J.-T., Ranger J. (2015). Measuring speed, ability, or motivation: a commentary on Goldhammer (2015). Measurement 13, 173–176. 10.1080/15366367.2015.1105065

  • 4. Lord F. M. (1974). Estimation of latent ability and item parameters when there are omitted responses. Psychometrika 39, 247–264. 10.1007/BF02291471

  • 5. OECD (2017). PISA 2015 Assessment and Analytical Framework: Science, Reading, Mathematic, Financial Literacy and Collaborative Problem Solving, Revised Edition. Paris: PISA, OECD Publishing.

  • 6. Pohl S. (2018). Using Response Times to Model Missing Values in Competence Tests. Invited talk at the Department of Methodology and Statistics. Tilburg University.

  • 7. Pohl S., Gräfe L., Rose N. (2014). Dealing with omitted and not reached items in competence tests: evaluating approaches accounting for missing responses in IRT models. Educ. Psychol. Measur. 74, 423–452. 10.1177/0013164413504926

  • 8. Rose N., von Davier M., Xu X. (2010). Modeling nonignorable missing data with item response theory (IRT). ETS Res. Rep. Ser. 2010, i–53. 10.1002/j.2333-8504.2010.tb02218.x

  • 9. Tijmstra J., Bolsinova M. (2018). On the importance of the speed-ability trade-off when dealing with not reached items. Front. Psychol. 9:964. 10.3389/fpsyg.2018.00964

  • 10. Ulitzsch E., Pohl S., von Davier M. (2018). Using nonresponse times to account for omitted items in competence tests, in Presentation at the 19. Annual Meeting of the National Council on Measurement in Education (NCME) (Washington, DC).

Summary

Keywords

missing values, response time, not reached items, speed-ability trade-off, time limit, speed-accuracy

Citation

Pohl S and von Davier M (2018) Commentary: On the Importance of the Speed-Ability Trade-Off When Dealing With Not Reached Items. Front. Psychol. 9:1988. doi: 10.3389/fpsyg.2018.01988

Received

31 July 2018

Accepted

27 September 2018

Published

30 October 2018

Edited by

Ioannis Tsaousis, University of Crete, Greece

Reviewed by

Yong Luo, National Center for Assessment in Higher Education, Saudi Arabia

*Correspondence: Steffi Pohl

This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
