<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Educ.</journal-id>
<journal-title>Frontiers in Education</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Educ.</abbrev-journal-title>
<issn pub-type="epub">2504-284X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/feduc.2021.648324</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Education</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>An Item Response Modeling Approach to Cognitive Load Measurement</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Ehrich</surname> <given-names>John Fitzgerald</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1032385/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Howard</surname> <given-names>Steven J.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/189748/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Bokosmaty</surname> <given-names>Sahar</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1255016/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Woodcock</surname> <given-names>Stuart</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/988625/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Faculty of Arts, Macquarie University</institution>, <addr-line>Sydney, NSW</addr-line>, <country>Australia</country></aff>
<aff id="aff2"><sup>2</sup><institution>School of Education, Faculty of Social Sciences, University of Wollongong</institution>, <addr-line>Wollongong, NSW</addr-line>, <country>Australia</country></aff>
<aff id="aff3"><sup>3</sup><institution>School of Education and Professional Studies, Faculty of Arts, Education and Law, Griffith University</institution>, <addr-line>Southport, QLD</addr-line>, <country>Australia</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Kate M. Xu, Open University of the Netherlands, Netherlands</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Andrew J. Martin, University of New South Wales, Australia; Melina Klepsch, Abt. Lehr-Lernforschung, Universit&#x00E4;t Ulm, Germany</p></fn>
<corresp id="c001">&#x002A;Correspondence: John Fitzgerald Ehrich, <email>john.ehrich@mq.edu.au</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Assessment, Testing and Applied Measurement, a section of the journal Frontiers in Education</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>22</day>
<month>04</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>6</volume>
<elocation-id>648324</elocation-id>
<history>
<date date-type="received">
<day>31</day>
<month>12</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>03</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 Ehrich, Howard, Bokosmaty and Woodcock.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Ehrich, Howard, Bokosmaty and Woodcock</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>The accurate measurement of the cognitive load a learner encounters in a given task is critical to the understanding and application of Cognitive Load Theory (CLT). However, as a covert psychological construct, cognitive load represents a challenging measurement issue. To date, this challenge has been met mostly by subjective self-reports of cognitive load experienced in a learning situation. In this paper, we find that a valid and reliable index of cognitive load can be obtained through item response modeling of student performance. Specifically, estimates derived from item response modeling of relative difficulty (i.e., the difference between item difficulty and person ability locations) can function as a linear measure that combines the key components of cognitive load (i.e., mental load, mental effort, and performance). This index of cognitive load (<italic>relative</italic> difficulty) was tested for criterion (concurrent) validity using the performance of Year 2 learners (<italic>N</italic> = 91) on standardized educational numeracy and literacy assessments. Learners&#x2019; working memory (WM) capacity significantly predicted our proposed cognitive load (relative difficulty) index across both numeracy and literacy domains. That is, higher levels of WM were related to lower levels of cognitive load (relative difficulty), in line with fundamental predictions of CLT. These results illustrate the validity, utility and potential of this objective item response modeling approach to capturing individual differences in cognitive load across discrete learning tasks.</p>
</abstract>
<kwd-group>
<kwd>cognitive load</kwd>
<kwd>item response theory</kwd>
<kwd>mental effort</kwd>
<kwd>working memory</kwd>
<kwd>standardized test</kwd>
</kwd-group>
<counts>
<fig-count count="2"/>
<table-count count="4"/>
<equation-count count="1"/>
<ref-count count="54"/>
<page-count count="11"/>
<word-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1">
<title>Introduction</title>
<p>The core goal of cognitive load theory (CLT) is the creation of learning environments that make optimal use of learners&#x2019; cognitive resources and reduce any demands extraneous to learning in order to optimize learning success (<xref ref-type="bibr" rid="B33">Paas et al., 2003</xref>, <xref ref-type="bibr" rid="B32">2004</xref>). In addition to the inherent complexity of information that is to be learned, the method of presenting information to learners also affects the cognitive load learners experience when acquiring knowledge and skills. However, the understanding and application of CLT requires methods to appraise cognitive load, which could be expected to differ across tasks, contexts and learners. To date, cognitive load has been indexed mostly by subjective self-reports of the load experienced in a learning situation. In this study, we evaluated a more objective and sensitive approach to indexing cognitive load experienced by learners.</p>
<sec id="S1.SS1">
<title>Cognitive Load: Definition, Sources and Measurement</title>
<p>Cognitive load is considered to be a complex multidimensional construct that consists of: (1) causal factors relating to the task, the learner and their interactive components; and (2) assessment factors such as mental load (ML), mental effort (ME), and performance (e.g., <xref ref-type="bibr" rid="B35">Paas and van Merri&#x00EB;nboer, 1994</xref>). The cognitive resources needed for a certain task comprise ML, which is a result of a task&#x2019;s content, presentation, structure, complexity and difficulty (<xref ref-type="bibr" rid="B30">Paas, 1992</xref>). On the other hand, the cognitive resources that are devoted to a task comprise ME (<xref ref-type="bibr" rid="B30">Paas, 1992</xref>; <xref ref-type="bibr" rid="B33">Paas et al., 2003</xref>). ME is intrinsic to the learner, and constitutes the degree to which cognitive resources are mobilized to enable processing and completion in complex tasks (<xref ref-type="bibr" rid="B35">Paas and van Merri&#x00EB;nboer, 1994</xref>). Causal factors relating to the task involve aspects such as the novelty of the task and environmental conditions, while causal factors relating to the learner involve aspects such as working memory (WM) capacity and expertise. These task and learner factors interact to further influence performance through their influence on, for example, motivation.</p>
<p>Cognitive load can be understood within three broad categories &#x2013; intrinsic, extraneous, and germane (<xref ref-type="bibr" rid="B45">Sweller et al., 2019</xref>). Intrinsic cognitive load has to do with the complexity of the information which is being processed and subsumes the idea of &#x201C;element interactivity&#x201D; (<xref ref-type="bibr" rid="B45">Sweller et al., 2019</xref>). Element interactivity depends on the nature of the information and the prior knowledge of the learner processing the information. For example, complex tasks which require the processing of multiple interconnected elements are considered to have high element interactivity. By contrast, extraneous cognitive load has to do with how information is presented and the instructional procedures involved in the task. Manipulations of the presentation of instructional procedures can affect the level of element interactivity. Finally, germane cognitive load refers &#x201C;[&#x2026;] to the WM resources available to deal with the element interactivity associated with intrinsic cognitive load&#x201D; (<xref ref-type="bibr" rid="B42">Sweller, 2010</xref>, p. 126). Therefore, germane cognitive load is linked to both intrinsic and extraneous cognitive load. Germane cognitive load resources can only be utilized if extraneous cognitive load is not depleting WM resources. Moreover, germane cognitive load can redistribute WM resources to process complex tasks with high element interactivity (<xref ref-type="bibr" rid="B45">Sweller et al., 2019</xref>).</p>
<p>As a covert psychological construct, which can be expected to vary across tasks, contexts and learners, cognitive load constitutes a serious challenge in terms of its accurate measurement. Without precision in its capture, application of CLT is limited to the identification of conditions under which learning is superior or inferior, without the ability to accurately tailor these principles to the specific tasks, conditions and learners involved in a particular learning situation. For instance, the split attention effect would suggest that when learners are novices, essential information should be well integrated; however, this might not be expected at higher levels of expertise. Applying this principle to optimize learning outcomes across diverse tasks (e.g., in reading, numeracy, and science), diverse learners (e.g., in expertise and WM capacity), and contexts that differ from those in which the research was conducted is complicated without the ability to carefully appraise changes in cognitive load as conditions change.</p>
<p>When cognitive load is measured, it is most often done through a subjective rating of invested ME on a Likert scale (e.g., <xref ref-type="bibr" rid="B28">Marcus et al., 1996</xref>; <xref ref-type="bibr" rid="B47">Tindall-Ford et al., 1997</xref>; <xref ref-type="bibr" rid="B39">Salden et al., 2004</xref>; <xref ref-type="bibr" rid="B15">Halabi et al., 2005</xref>). A primary reason is that this method is straightforward, is simple to apply, shows evidence of reliability and construct validity, and does not interfere with learning (<xref ref-type="bibr" rid="B36">Paas et al., 1994</xref>; <xref ref-type="bibr" rid="B44">Sweller et al., 1998</xref>). For instance, <xref ref-type="bibr" rid="B30">Paas (1992)</xref> used a one-dimensional 9-point symmetrical category rating scale (Likert-type scale) for assessing learners&#x2019; ME in different phases of learning and performance. The scale ranged from 1 (very low mental effort) to 9 (very high mental effort), on which learners rate their ME during a learning and performance task. <xref ref-type="bibr" rid="B36">Paas et al. (1994)</xref> tested this subjective scale for its measurement properties and found that it had good reliability (e.g., Cronbach &#x03B1; = 0.82) and was sensitive to small variations in cognitive load. Such evidence is taken to suggest that learners are capable of introspecting on their cognitive processes and using this to quantify their ME.</p>
<p>However, this scale has been interpreted by some cognitive load researchers by substituting &#x201C;mental effort&#x201D; with &#x201C;task difficulty&#x201D; (e.g., <xref ref-type="bibr" rid="B8">Ayres, 2006</xref>; <xref ref-type="bibr" rid="B13">Cierniak et al., 2009</xref>). By itself, asking learners to rate the difficulty of learning tasks as a measure of ME is problematic. While ME and task difficulty are no doubt related, as a consequence of factors such as prior knowledge, they are not identical (<xref ref-type="bibr" rid="B49">van Gog and Paas, 2008</xref>). For instance, when tasks are very difficult for learners, research shows they are often not stimulated to put in the required ME (<xref ref-type="bibr" rid="B52">Wright, 1984</xref>; <xref ref-type="bibr" rid="B53">Wright et al., 1986</xref>) and, as a result, the ME they invest may not be reflective of the task&#x2019;s cognitive load. Despite this, <xref ref-type="bibr" rid="B43">Sweller et al. (2011</xref>, p. 74) state that the subjective ME scale has &#x201C;[&#x2026;] been shown to be the most sensitive measure available to differentiate the cognitive load imposed by different instructional procedures.&#x201D;</p>
<p>From these scales, ME (cognitive load) is indexed through a combination of the learning result and learners&#x2019; ME. That is, a learning experience is considered more optimal if it has a higher average performance than an alternative condition. Yet when two instructional conditions record the same average performance the learning condition that requires less ME has higher instructional efficiency. Accordingly, the learning condition that needs more ME is considered to be less efficient than the one that requires learners to exert less ME. Using a cognitive load framework, <xref ref-type="bibr" rid="B34">Paas and van Merri&#x00EB;nboer (1993)</xref> suggested a method for quantifying this instructional efficiency. Their formula, <inline-formula><mml:math id="INEQ1"><mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>-</mml:mo><mml:mi>R</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msqrt><mml:mn>2</mml:mn></mml:msqrt></mml:mfrac></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></inline-formula> reconciles: (E), the relative efficiency of the instructional condition; (P), the standardized z-scores for test performance scores; and (R), the standardized z-scores for the ratings of cognitive load related to the task. Based on this formula, a learning condition would be more efficient when lower subjective ratings of cognitive load correspond with higher performance scores. These scores are calculated per learner and per task, and interpreted relative to an ideal slope of 1, where instructional efficiency = 0 (or performance is equal to ME). Proximity above or below this slope denotes high or low mental efficiency, respectively. This mental efficiency model has since been expanded to include factors such as motivation (<xref ref-type="bibr" rid="B19">Hummel et al., 2004</xref>).</p>
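<p>To make the arithmetic of this efficiency index concrete, the following is a minimal sketch of the formula as described above, not the original authors&#x2019; implementation. The function names and the example scores are hypothetical, and the use of the sample standard deviation for standardization is an assumption, since the text does not state which form was used.</p>
<preformat>
```python
import math
import statistics


def z_scores(values):
    """Standardize raw scores to mean 0 and unit standard deviation.

    Assumption: the sample standard deviation is used.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def instructional_efficiency(performance, effort):
    """Return E = (P - R) / sqrt(2) for each learner.

    P and R are the z-scores of performance and of rated mental effort.
    """
    p = z_scores(performance)
    r = z_scores(effort)
    return [(p_n - r_n) / math.sqrt(2) for p_n, r_n in zip(p, r)]


# Hypothetical data for three learners: test scores and 9-point effort ratings.
performance = [4, 6, 8]
effort = [7, 5, 3]
efficiency = instructional_efficiency(performance, effort)
for learner, e in enumerate(efficiency, start=1):
    print(f"learner {learner}: E = {e:+.3f}")
```
</preformat>
<p>In this illustrative data set, the learner with the highest performance and lowest rated effort receives a positive E (high efficiency), the learner with the lowest performance and highest rated effort receives a negative E, and the learner at the mean on both variables receives E = 0.</p>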
<p>However, <xref ref-type="bibr" rid="B16">Hoffman and Schraw (2010)</xref> have pointed out fundamental measurement concerns with <xref ref-type="bibr" rid="B34">Paas and van Merri&#x00EB;nboer&#x2019;s (1993)</xref> cognitive load efficiency model beyond the well-documented issues of using self-report measures, such as measurement error arising from rater bias and over-confidence (e.g., <xref ref-type="bibr" rid="B41">Stone, 2000</xref>; <xref ref-type="bibr" rid="B11">Burson et al., 2006</xref>). Hoffman and Schraw note that task performance scores and ME scores are not commensurable and do not share a common unit of measurement. Calculations derived from incommensurable variables are problematic for interpretative and computational reasons (see <xref ref-type="bibr" rid="B16">Hoffman and Schraw, 2010</xref>).</p>
<p>Recently, studies have attempted to measure the different aspects of cognitive load (e.g., <xref ref-type="bibr" rid="B26">Leppink et al., 2013</xref>; <xref ref-type="bibr" rid="B21">Klepsch et al., 2017</xref>; <xref ref-type="bibr" rid="B25">Krell, 2017</xref>). For example, <xref ref-type="bibr" rid="B25">Krell (2017)</xref> developed a seven-point Likert scale to measure self-reported levels of cognitive load. In this study, Krell used an item response theory (IRT) approach to test the linear functioning of the self-report scale. This scale consists of 12 items, half of which measure ML (i.e., the cognitive capacity to process tasks) and the other half of which measure ME (i.e., the investment of cognitive capacity by persons to process tasks). Krell tested the scale on a large sample of high school students completing a standardized science test. Krell found evidence that ML and ME were different dimensions and some evidence suggesting a causal relationship between ML and performance but not between ME and performance.</p>
<p>Although the majority of cognitive load researchers have used subjective self-report, a range of objective cognitive load measurement techniques have also been explored (for overviews see <xref ref-type="bibr" rid="B33">Paas et al., 2003</xref>; <xref ref-type="bibr" rid="B31">Paas et al., 2008</xref>). Whereas subjective techniques are normally used to get an estimate of overall cognitive load, that is, experienced load based on the whole task procedure, continuous objective techniques can be used to determine the dynamics of cognitive load through fluctuations in cognitive load from the beginning to the end of the task (<xref ref-type="bibr" rid="B54">Xie and Salvendy, 2000</xref>; <xref ref-type="bibr" rid="B33">Paas et al., 2003</xref>). Such approaches include neuroscientific measures (e.g., <xref ref-type="bibr" rid="B7">Antonenko et al., 2010</xref>; <xref ref-type="bibr" rid="B17">Howard et al., 2015</xref>) and physiological measurements such as heart rate (e.g., <xref ref-type="bibr" rid="B35">Paas and van Merri&#x00EB;nboer, 1994</xref>), pupil dilation (<xref ref-type="bibr" rid="B48">van Gerven et al., 2004</xref>), and blood glucose levels (e.g., <xref ref-type="bibr" rid="B40">Scholey et al., 2001</xref>).</p>
<p>Other objective cognitive load measurement techniques involve the use of secondary tasks. Secondary-task techniques are based on the assumption that performance on a secondary task can be used to reflect the level of cognitive load imposed by a primary task, and have been used successfully by several cognitive load researchers (e.g., <xref ref-type="bibr" rid="B12">Chandler and Sweller, 1996</xref>; <xref ref-type="bibr" rid="B28">Marcus et al., 1996</xref>). A recent and promising example of this technique is the rhythm method (<xref ref-type="bibr" rid="B37">Park and Br&#x00FC;nken, 2015</xref>; <xref ref-type="bibr" rid="B23">Korbach et al., 2018</xref>). With this technique participants have to execute a previously practiced rhythm continuously by foot tapping (secondary task) while learning (primary task). Eye-tracking analysis is another objective technique to measure cognitive load. These studies investigate fixation time and number of fixations on visual stimuli as indications of ME and cognitive load (see <xref ref-type="bibr" rid="B22">Korbach et al., 2017</xref>; <xref ref-type="bibr" rid="B24">Krejtz et al., 2018</xref>).</p>
<p>In summary, cognitive load has been measured primarily through the use of subjective self-report scales. Less common objective measures of cognitive load have been attained through brain imaging, the monitoring of physiological processes, the use of secondary tasks, and eye tracking. While such studies (e.g., neuroscientific (fMRI) approaches to cognitive load measurement) have shown great potential (<xref ref-type="bibr" rid="B50">Whelan, 2007</xref>) they are cumbersome, intrusive, require considerable technical expertise beyond the capability of most CLT researchers, and are unclear about which type of cognitive load is being measured. Moreover, such measurement approaches lack ecological validity and occur within laboratory settings outside of the typical classroom learning environment. An ideal measure of cognitive load would be objective, unobtrusive, and measurable within a typical classroom environment.</p>
</sec>
<sec id="S1.SS2">
<title>A Measure of Cognitive Load Through Rasch Modeling</title>
<p>Self-report Likert scale ratings do not constitute measures in so far as, technically, they are <italic>observations</italic> and, as such, do not meet the basic requirements of measurement (<xref ref-type="bibr" rid="B51">Wright, 1997</xref>). Likert scale raw scores provide ordinal data, which means that: (1) the scale is finite or limited to a small number of observations (e.g., 5-, 7-, or 9-point); and (2) that differences between observations (i.e., ratings) are not equidistant from each other, as in an interval or ratio level scale. For a scale to qualify as a linear measure it needs to be boundless, or not limited to a finite set of observations and, critically, needs to consist of equally divisible units. Hence, a serious problem of measurement error arises when Likert scales are used as substitute measures in parametric analyses, such as analyses of variance (ANOVA) (<xref ref-type="bibr" rid="B51">Wright, 1997</xref>). Ideally, a behavioral measure of ME would be derived through an objective procedure that fulfills the measurement principles of a linear continuum with interval-level units. Item response modeling presents such an opportunity, while using some of the same data (e.g., performance) as CLT efficiency indices.</p>
<sec id="S1.SS2.SSS1">
<title>The Rasch Model</title>
<p>The <xref ref-type="bibr" rid="B38">Rasch (1960)</xref> model, or the one parameter logistic model (1PL), is a commonly used model in IRT. The Rasch model is a mathematical model of probability predicated on a hierarchy of item difficulties. This hierarchy of item difficulty is determined by conformity to a Guttman scalar pattern. The model depicts the probability of getting an item correct/incorrect as a logistic function of the distance between a person&#x2019;s location (ability) and an item&#x2019;s location (difficulty). These location estimates are situated on the same linear scale (i.e., logit scale). This relationship is expressed below in mathematical form for dichotomous data (e.g., correct/incorrect test answers):</p>
<disp-formula id="S1.Ex1">
<mml:math id="M1">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo rspace="5.3pt">-</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B4;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo rspace="5.3pt">+</mml:mo>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo rspace="5.3pt">-</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B4;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>P</italic>{<italic>X</italic><sub><italic>ni</italic></sub> = <italic>x</italic>} is the probability that person <italic>n</italic> gives response <italic>x</italic> to item <italic>i</italic>, and where <italic>x</italic> represents either a correct (<italic>x</italic> = 1) or an incorrect (<italic>x</italic> = 0) response. Person locations are denoted as &#x03B2;<sub><italic>n</italic></sub> and item locations as &#x03B4;<sub><italic>i</italic></sub> (<xref ref-type="bibr" rid="B6">Andrich et al., 2010</xref>).</p>
<p>According to this model, an item&#x2019;s difficulty is defined as being equal to the level of ability at which 50% of persons respond successfully to that item. When the difficulty of any given item exceeds the ability of any given group of persons, a smaller percentage of persons respond successfully. A major strength of this model is that an analysis of raw data provides reliable and valid independent (stand-alone) measures of a person&#x2019;s ability and the difficulty of items. These reliable person ability and item difficulty parameters, attained through a person-item interaction, make possible an objective measure of cognitive load. That is, the difference between item difficulty <italic>&#x03B4;</italic> and person ability &#x03B2; or (&#x03B4; <italic>&#x2013;</italic>&#x03B2;) provides an objective and performance-derived estimate of <italic>relative</italic> difficulty (or cognitive load experienced by the learner as a function of the learning task). The more the difficulty of an item exceeds the ability of the person, the greater the relative difficulty of <italic>that</italic> item for <italic>that</italic> person and, hence, the greater the cognitive load involved in correctly solving the item.</p>
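<p>As an illustration only, the dichotomous Rasch probability and the relative difficulty index (&#x03B4; &#x2013; &#x03B2;) described above can be sketched as follows. The function names and the logit values are hypothetical, and the sketch assumes that person and item locations have already been estimated; it does not perform the estimation itself.</p>
<preformat>
```python
import math


def rasch_probability(beta, delta, x=1):
    """Probability of response x (1 = correct, 0 = incorrect) for a person
    at location beta on an item at location delta (both in logits)."""
    logit = beta - delta
    return math.exp(x * logit) / (1.0 + math.exp(logit))


def relative_difficulty(delta, beta):
    """Relative difficulty (delta - beta): positive when the item is
    harder than the person is able."""
    return delta - beta


# Hypothetical locations: one person (beta) and two items (delta), in logits.
beta = 0.5
for delta in (0.5, 1.5):
    rd = relative_difficulty(delta, beta)
    p_correct = rasch_probability(beta, delta, x=1)
    print(f"delta={delta:.1f}  relative difficulty={rd:+.1f}  P(correct)={p_correct:.3f}")
```
</preformat>
<p>When the item and person locations coincide, the relative difficulty is 0 and the probability of a correct response is 0.5; as the relative difficulty increases, the probability of success falls.</p>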
<p>This approach reflects the interaction between measurable elements of cognitive load (i.e., ML, ME, and performance) and calibrates them within a single scalable trait/dimension. ML is captured through the transformation of raw performance data into reliable estimates of item/task difficulty. ME is estimated through the transformation of raw performance data into ability measures (and the degree to which these vary with respect to difficulty estimates). This relative difficulty of items is analogous to ME as <italic>a measure of the amount of cognitive load</italic> involved in correctly responding to the task/item. This provides a summary interval-level measurement of cognitive load derived by an objective mathematical procedure. It is important to note that this proposed cognitive load measure involves intrinsic cognitive load only and does <italic>not encompass extraneous cognitive load</italic>. The proposed measure deals solely with the complexity of the tasks and/or difficulty of the test questions (element interactivity) and the background knowledge of the learners (e.g., their numeracy and literacy abilities).</p>
<p>By contrast with <xref ref-type="bibr" rid="B34">Paas and van Merri&#x00EB;nboer&#x2019;s (1993)</xref> efficiency model, which stems from calculations involving incommensurable variables, this IRT approach provides a psychometrically sound alternative. For example, <xref ref-type="bibr" rid="B34">Paas and van Merri&#x00EB;nboer&#x2019;s (1993)</xref> efficiency model <italic>uses two distinct scales</italic> to derive a measure of cognitive efficiency/cognitive load and calculates the difference between <italic>z</italic> score performance and <italic>z</italic> score effort as an efficiency measure. By contrast, a probabilistic IRT analysis transforms the raw data of a <italic>single</italic> performance measure and derives item difficulty and person ability parameters from this measurement scale (i.e., test or task scores). The IRT probabilistic transformation of raw performance scores yields these two parameter estimates, which are located on a single logit scale in interval-level units. Hence, the subtraction of the ability estimates from the difficulty estimates for each person-item interaction is psychometrically sound, as these estimates share a common logit scale.</p>
</sec>
</sec>
<sec id="S1.SS3">
<title>The Present Study</title>
<p>We understand the concept of test validity as defined by <xref ref-type="bibr" rid="B20">Kane (2013)</xref> who presents an argument-based approach. In this approach &#x201C;&#x2026;to validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the test scores&#x201D; (p. 1). This validity framework consists of (1) stating the proposed interpretation and use of the test scores and (2) evaluating the plausibility of such proposals (<xref ref-type="bibr" rid="B20">Kane, 2013</xref>).</p>
<p>In the current study, and following from <xref ref-type="bibr" rid="B20">Kane&#x2019;s (2013)</xref> argument-based approach, we specifically propose that IRT-derived statistics from standardized numeracy and literacy test scores can provide proxy measures to determine variance in learners&#x2019; intrinsic cognitive load. In order to evaluate the plausibility of this proposal we present two types of validity evidence: construct validity and concurrent criterion validity. Evidence of construct validity is demonstrated through an IRT analysis of the National Assessment Program &#x2013; Literacy and Numeracy (NAPLAN) standardized test data (e.g., correct item functioning, reliability testing, and fit to the Rasch model). Moreover, we evaluate the plausibility of this proposal by attaining concurrent criterion validity evidence. Our hypothesis (H1) for criterion validity was that WM should inversely predict the relative difficulty/cognitive load requirement of learners. That is, concordant with CLT, higher WM capacity would decrease the experience of cognitive load, and such a finding would give preliminary support for the utility of this index to measure learners&#x2019; cognitive load.</p>
</sec>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="S2.SS1">
<title>Participants</title>
<p>Ninety-one primary school-aged learners in Grade 2 (aged 7&#x2013;8 years) participated in this study. Learners were recruited across three regional (<italic>n</italic> = 29) and two metropolitan schools (<italic>n</italic> = 62), with a balanced gender ratio of boys (<italic>n</italic> = 42) and girls (<italic>n</italic> = 49). All learners spoke English as their first language and had no known developmental delay or disorder.</p>
</sec>
<sec id="S2.SS2">
<title>Measures</title>
<sec id="S2.SS2.SSS1">
<title>Learning Assessment</title>
<p>An out-of-circulation version of Australia&#x2019;s National Assessment Program &#x2013; Literacy and Numeracy (NAPLAN) test was administered as the learning task (<xref ref-type="bibr" rid="B1">ACARA, 2011</xref>). Specifically, a numeracy test (35 multiple-choice questions) and a language conventions test which consists of a spelling subtest (25 multiple-choice questions) and a grammar subtest (25 multiple-choice questions) of NAPLAN were selected to provide raw performance data. These assessments were administered in a group setting within the students&#x2019; classrooms, which followed the protocols of the NAPLAN test.</p>
</sec>
<sec id="S2.SS2.SSS2">
<title>Working Memory</title>
<p>Phonological and visual-spatial WM was measured by respective &#x201C;Not This&#x201D; and &#x201C;Mr Ant&#x201D; tasks from the Early Years Toolbox (EYT; <xref ref-type="bibr" rid="B18">Howard and Melhuish, 2017</xref>). These tasks are administered via iPad to collect scores and timing measures.</p>
</sec>
<sec id="S2.SS2.SSS3">
<title>Phonological WM</title>
<p>The iPad-based EYT &#x201C;Not This&#x201D; task (<xref ref-type="bibr" rid="B18">Howard and Melhuish, 2017</xref>) involves the presentation of an auditory instruction, against a blank screen, to find a stimulus that does not have certain characteristics of color, shape, or size (or a combination of these; e.g., &#x201C;Point to a shape that is not red and not a circle&#x201D;). After a brief retention interval, participants are shown a stimulus array from which to identify a stimulus that satisfies the auditory instruction. The task increases in complexity from level one (one feature to recall) to level eight (eight features to recall). Each level consists of five trials, and at least three successful responses are required to proceed to the next level. The task ends when a participant fails to achieve at least three successful trials within a level, or on completion of level eight. WM capacity is estimated using a point score, calculated as one point for each successive level, starting at the first, in which at least three trials are performed correctly, and then 1/5 of a point for each successful trial thereafter.</p>
</sec>
<sec id="S2.SS2.SSS4">
<title>Visual-Spatial WM</title>
<p>The iPad-based EYT &#x201C;Mr Ant&#x201D; task (<xref ref-type="bibr" rid="B18">Howard and Melhuish, 2017</xref>) involves recall of an increasing number of stickers placed on various locations of a cartoon ant. The task increases in complexity from level one (recalling the placement of one sticker) to level eight (recalling the placement of eight stickers). The task consists of three trials per level and failure on all trials at a given level (or completion of level eight) ends the task. In test trials, a cartoon ant with sticker/s is presented for 5 s, followed by a blank screen for 4 s, before the return of the cartoon ant without any stickers. Participants respond by tapping on the location of the missing sticker/s. WM capacity is estimated by a point score, calculated as: 1 point for each successive level, starting at the first, in which at least two trials are performed correctly and then 1/3 of a point for each successful trial thereafter.</p>
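<p>The point-score rules described for the two EYT tasks can be illustrated with a short sketch. This reflects one reading of the scoring descriptions above and is not the EYT application&#x2019;s own code: the function name <monospace>point_score</monospace> and its arguments are hypothetical, and &#x201C;thereafter&#x201D; is assumed to cover every trial attempted from the first level that is not passed.</p>
<preformat>
```python
from fractions import Fraction


def point_score(correct_per_level, trials_per_level, pass_threshold):
    """Return a WM point score from per-level counts of correct trials.

    Hypothetical helper: one point for each successive level (from level 1)
    with at least `pass_threshold` correct trials, then 1/trials_per_level
    of a point for every correct trial from the first unpassed level onward.
    """
    score = Fraction(0)
    still_passing = True
    for correct in correct_per_level:
        if still_passing and correct >= pass_threshold:
            score += 1  # full point for a successive passed level
        else:
            still_passing = False
            score += Fraction(correct, trials_per_level)  # partial credit
    return float(score)


# "Not This": five trials per level, three correct needed to pass.
not_this = point_score([5, 4, 3, 2], trials_per_level=5, pass_threshold=3)

# "Mr Ant": three trials per level, two correct needed to pass.
mr_ant = point_score([3, 2, 1, 1, 0], trials_per_level=3, pass_threshold=2)

print(not_this, round(mr_ant, 2))
```
</preformat>
<p>In this illustration the first learner passes three levels and gains 2/5 of a point on the fourth, and the second passes two levels and gains two further thirds of a point before failing all trials at level five. The input counts are invented for illustration and are not study data.</p>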
</sec>
</sec>
<sec id="S2.SS3">
<title>Procedure</title>
<p>NAPLAN tests were administered in two group sessions within students&#x2019; classrooms, across 2 days, starting with language conventions. This order and spacing is consistent with NAPLAN administration (<xref ref-type="bibr" rid="B9">Board of Studies Teaching and Educational Standards NSW (BOSTESNSW), 2015</xref>). Absent students completed the missed test on the day of their return to school. After completion of the NAPLAN assessments, the WM tasks were administered individually, in a single session, in a quiet room. The tasks were administered in a fixed random order, as follows: RSPM; Mr Ant; and Not This. The classroom teacher was present throughout the testing phase and was on hand to assist students who had questions.</p>
</sec>
</sec>
<sec id="S3">
<title>Results</title>
<sec id="S3.SS1">
<title>Rasch Analyses</title>
<p>The proposed indices of cognitive load were derived from Rasch modeling analyses of the NAPLAN test performances (numeracy and language conventions). These data were analyzed using the dichotomous Rasch model, run on Rasch Unidimensional Measurement Modeling (RUMM) 2030 software (<xref ref-type="bibr" rid="B6">Andrich et al., 2010</xref>; for a complete interpretation of Rasch analysis, see <xref ref-type="bibr" rid="B46">Tennant and Conaghan, 2007</xref>). The overall fit statistics indicated good fit of the data to the Rasch model for both tests (chi-square all <italic>p</italic> &#x003E; 0.05) (see <xref ref-type="table" rid="T1">Table 1</xref> for a summary of fit statistics). The Person Separation Index (PSI), a reliability index on the transformed logistic data, indicated very good reliability for both tests (0.85&#x2013;0.86), as did the Cronbach&#x2019;s alpha reliability indices (0.86&#x2013;0.94).</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Rasch analysis summary statistics of the NAPLAN numeracy and language conventions tests.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center" colspan="2">Item trait Interaction<hr/></td>
<td valign="top" align="center">PSI</td>
<td valign="top" align="center">&#x03B1;</td>
</tr>
<tr>
<td valign="top" align="left">Test type</td>
<td valign="top" align="center">Value (df)</td>
<td valign="top" align="center"><italic>p</italic></td>
<td/>
<td/>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Numeracy</td>
<td valign="top" align="center">88.3 (70)</td>
<td valign="top" align="center">0.07</td>
<td valign="top" align="center">0.85</td>
<td valign="top" align="center">0.86</td>
</tr>
<tr>
<td valign="top" align="left">Language conventions</td>
<td valign="top" align="center">105.8 (96)</td>
<td valign="top" align="center">0.23</td>
<td valign="top" align="center">0.86</td>
<td valign="top" align="center">0.94</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>&#x002A;<italic>p</italic>s &#x003C; 0.05 are statistically significant. PSI, person separation index.</italic></attrib>
</table-wrap-foot>
</table-wrap>
<p>Individual item misfit to the Rasch model is identified by fit residuals outside the acceptable range (below &#x2212;2.50 or above 2.50). Residuals constitute the difference between the observed values and the theoretical Rasch estimates. Individual item misfit can also be detected by significant chi-square and <italic>F</italic> statistics, where a non-significant <italic>p</italic> value (&#x003E;0.05) indicates good fit to the Rasch model. Misfit can also be detected by examination of an item&#x2019;s item characteristic curve (ICC). ICCs plot the observed values against the theoretical Rasch-derived estimates, represented as an s-shaped curve; the closer the observed values lie to the theoretical curve, the better the fit, and vice versa.</p>
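<p>For reference, the dichotomous Rasch model that underlies these residuals and ICCs can be written as follows. The notation is introduced here for illustration and is not taken from the RUMM2030 output: <italic>&#x03B8;<sub>n</sub></italic> denotes the ability of person <italic>n</italic> and <italic>&#x03B4;<sub>i</sub></italic> the difficulty of item <italic>i</italic>, both in logits, and <italic>x<sub>ni</sub></italic> is the scored response (1 = correct, 0 = incorrect).</p>
<disp-formula><tex-math>P_{ni} = P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}</tex-math></disp-formula>
<p>In standard presentations of the model, the standardized residual for a single person-item response compares the observed score with this expected value:</p>
<disp-formula><tex-math>z_{ni} = \frac{x_{ni} - P_{ni}}{\sqrt{P_{ni}\,(1 - P_{ni})}}</tex-math></disp-formula>
<p>The theoretical ICC for an item is <italic>P<sub>ni</sub></italic> plotted against <italic>&#x03B8;<sub>n</sub></italic> with <italic>&#x03B4;<sub>i</sub></italic> held fixed. The item-level fit residual reported by the software is a further transformation of these person-item residuals, and its exact form is not reproduced here.</p>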
<p>One item in the language conventions test (item 48) was found to misfit the model (&#x03C7;<sup>2</sup> = 0.72, <italic>p</italic> &#x003C; 0.001) at a Bonferroni-adjusted alpha of 0.001 and was removed from the analysis. In addition, item 25 in the language conventions test had an extreme score (defined as all responses correct or all incorrect) and was not used in the analysis. Otherwise, individual item fit was acceptable for all items of each test. Overall, all tests showed evidence of good reliability and construct validity (as good fit to the unidimensional Rasch model and correct functioning of items). The spread of items relative to the ability of the learners in the numeracy and language conventions tests is depicted in <xref ref-type="fig" rid="F1">Figures 1</xref>, <xref ref-type="fig" rid="F2">2</xref>, respectively.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Wright map of the spread of learner ability and item difficulty on the NAPLAN numeracy test (in logits). Learner abilities (on the left) range from the least able on the bottom to the most able on the top of the graph. Item difficulties (on the right) range from the least difficult on the bottom to the most difficult on the top. The map indicates that the test was difficult with the majority of learners indicating their ability levels were lower than the difficulty of the majority of items.</p></caption>
<graphic xlink:href="feduc-06-648324-g001.tif"/>
</fig>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Wright map of the spread of learner ability and item difficulty on the NAPLAN language conventions test (in logits). Learner abilities (on the left) range from the least able on the bottom to the most able on the top of the graph. Item difficulties (on the right) range from the least difficult on the bottom to the most difficult on the top. The map indicates that the test was very difficult for 20 learners whose ability fell below the easiest item (item 26). Overall, the majority of items fell above the ability of the majority of learners, indicating a difficult test.</p></caption>
<graphic xlink:href="feduc-06-648324-g002.tif"/>
</fig>
<p>The high reliability indices and the sound functioning of items according to the Rasch model constitute significant evidence of the precision of the test score data, which we use to formulate our proposed intrinsic cognitive load measure. Following <xref ref-type="bibr" rid="B20">Kane&#x2019;s (2013)</xref> validity argument approach, such evidence of the precision of our test score data supports the plausibility and generalizability of our proposed measure.</p>
<sec id="S3.SS1.SSS1">
<title>Relative Difficulty/Cognitive Load Measures</title>
<p>Essentially, our proposed cognitive load index is a measure of the relative difficulty of test items. This relative difficulty measure was calculated from the IRT analysis of the NAPLAN numeracy and language conventions test data. Specifically, the relative difficulty/cognitive load measures were calculated for each test dimension by subtracting the IRT-derived person ability estimate from the item difficulty estimate for each person-item interaction. The descriptive statistics for these measures are reported in <xref ref-type="table" rid="T2">Table 2</xref> in logits and represent the mean relative difficulty/cognitive load across person-item interactions in each of the two test domains.</p>
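<p>The subtraction described above can be sketched in a few lines. The sketch assumes that item difficulties and person abilities have already been estimated in logits (for example, exported from the Rasch software). The function names are hypothetical, the numbers are illustrative rather than study data, and averaging over items to obtain one value per learner is an assumption about how the person-item values were aggregated.</p>
<preformat>
```python
import math


def relative_difficulty(item_difficulties, person_ability):
    """Item difficulty minus person ability (logits), one value per item."""
    return [delta - person_ability for delta in item_difficulties]


def mean_relative_difficulty(item_difficulties, person_ability):
    """Average relative difficulty across all items for one learner."""
    diffs = relative_difficulty(item_difficulties, person_ability)
    return sum(diffs) / len(diffs)


def rasch_probability(person_ability, item_difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(item_difficulty - person_ability))


# Illustrative logit values only (not taken from the study).
items = [-1.0, 0.5, 2.0]
ability = -0.5

print(relative_difficulty(items, ability))
print(mean_relative_difficulty(items, ability))
print(rasch_probability(ability, items[1]))
```
</preformat>
<p>A positive relative difficulty means the item is located above the learner on the logit scale, so under the Rasch model the probability of a correct response falls below 0.5; a negative value means the reverse.</p>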
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Descriptive statistics for item response derived measures of relative difficulty/cognitive load for the numeracy and language conventions tests.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Relative difficulty</td>
<td valign="top" align="center">Mean</td>
<td valign="top" align="center">SD</td>
<td valign="top" align="center">Skewness</td>
<td valign="top" align="center">Kurtosis</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Numeracy</td>
<td valign="top" align="center">3.31</td>
<td valign="top" align="center">1.29</td>
<td valign="top" align="center">0.20 (0.25)</td>
<td valign="top" align="center">0.54 (0.50)</td>
</tr>
<tr>
<td valign="top" align="left">Language conventions</td>
<td valign="top" align="center">4.20</td>
<td valign="top" align="center">2.11</td>
<td valign="top" align="center">0.33 (0.25)</td>
<td valign="top" align="center">&#x2212;1.23 (0.50)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>These metrics are denoted in logit values and indicate the average amount of cognitive load capacity utilized to complete the full tests (numeracy and language conventions) per test taker. SD = standard deviation. Standard errors are denoted in parentheses.</italic></attrib>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="S3.SS1.SSS2">
<title>Multiple Regression Analyses</title>
<p>The results of the multiple regression for Model 1 (numeracy relative difficulty/cognitive load) indicated that the two WM predictors significantly explained 20% of the variance [<italic>R</italic><sup>2</sup> = 0.20, <italic>F</italic>(2,87) = 10.63, <italic>p</italic> &#x003C; 0.001]. Phonological WM made the strongest contribution to explaining numeracy relative difficulty/cognitive load and accounted for 9% unique variance, while visual-spatial WM contributed 6% unique variance. As phonological WM increased by one standard deviation, the relative difficulty/cognitive load index decreased by 0.31 standard deviations (&#x03B2; = &#x2212;0.31, <italic>p</italic> &#x003C; 0.01); as visual-spatial WM increased by one standard deviation, the index decreased by 0.26 standard deviations (&#x03B2; = &#x2212;0.26, <italic>p</italic> &#x003C; 0.05). Model 2 (language conventions relative difficulty/cognitive load) indicated that the predictors explained 7% of the variance [<italic>R</italic><sup>2</sup> = 0.07, <italic>F</italic>(2,87) = 3.18, <italic>p</italic> &#x003C; 0.05]. However, only phonological WM significantly contributed to unique variance (6%). As phonological WM increased by one standard deviation, the relative difficulty/cognitive load index decreased by 0.26 standard deviations (&#x03B2; = &#x2212;0.26, <italic>p</italic> &#x003C; 0.05). Correlations of these variables are listed in <xref ref-type="table" rid="T3">Table 3</xref> and results of the regression models are summarized in <xref ref-type="table" rid="T4">Table 4</xref>.</p>
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Summary of intercorrelations.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Measure</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">4</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1. Numeracy (relative difficulty)</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2212;0.539&#x002A;&#x002A;&#x002A;</td>
<td valign="top" align="center">&#x2212;0.369&#x002A;&#x002A;&#x002A;</td>
<td valign="top" align="center">&#x2212;0.338&#x002A;&#x002A;</td>
</tr>
<tr>
<td valign="top" align="left">2. Language conventions (relative difficulty)</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2212;0.262&#x002A;</td>
<td valign="top" align="center">&#x2212;0.095</td>
</tr>
<tr>
<td valign="top" align="left">3. Phonological working memory</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2212;0.221&#x002A;</td>
</tr>
<tr>
<td valign="top" align="left">4. Visual spatial working memory</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>&#x002A;<italic>p</italic>s &#x003C; 0.05; &#x002A;&#x002A;<italic>p</italic>s &#x003C; 0.01; &#x002A;&#x002A;&#x002A;<italic>ps</italic> &#x003C; 0.001.</italic></attrib>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T4">
<label>TABLE 4</label>
<caption><p>Multiple regression results for working memory predicting relative difficulty/cognitive load measures.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center"><italic>B</italic></td>
<td valign="top" align="center">SE B</td>
<td valign="top" align="center">&#x03B2;</td>
<td valign="top" align="center"><italic>t</italic></td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><bold>Model 1</bold></td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Numeracy</bold></td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Constant</td>
<td valign="top" align="center">5.846</td>
<td valign="top" align="center">0.573</td>
<td/>
<td valign="top" align="center">10.201&#x002A;&#x002A;&#x002A;</td>
</tr>
<tr>
<td valign="top" align="left">Phonological WM</td>
<td valign="top" align="center">&#x2212;0.529</td>
<td valign="top" align="center">0.169</td>
<td valign="top" align="center">&#x2212;0.311</td>
<td valign="top" align="center">&#x2212;3.130&#x002A;&#x002A;</td>
</tr>
<tr>
<td valign="top" align="left">Visual spatial WM</td>
<td valign="top" align="center">&#x2212;0.289</td>
<td valign="top" align="center">0.111</td>
<td valign="top" align="center">&#x2212;0.260</td>
<td valign="top" align="center">&#x2212;2.609&#x002A;</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Model 2</bold></td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Language conventions</bold></td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Constant</td>
<td valign="top" align="center">6.426</td>
<td valign="top" align="center">1.009</td>
<td/>
<td valign="top" align="center">6.372&#x002A;&#x002A;&#x002A;</td>
</tr>
<tr>
<td valign="top" align="left">Phonological WM</td>
<td valign="top" align="center">&#x2212;0.707</td>
<td valign="top" align="center">0.297</td>
<td valign="top" align="center">&#x2212;0.255</td>
<td valign="top" align="center">&#x2212;2.375&#x002A;</td>
</tr>
<tr>
<td valign="top" align="left">Visual spatial WM</td>
<td valign="top" align="center">&#x2212;0.058</td>
<td valign="top" align="center">0.195</td>
<td valign="top" align="center">&#x2212;0.032</td>
<td valign="top" align="center">0.767</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>&#x002A;<italic>p</italic>s &#x003C; 0.05; &#x002A;&#x002A;<italic>p</italic>s &#x003C; 0.01; &#x002A;&#x002A;&#x002A;<italic>p</italic>s &#x003C; 0.001 are statistically significant.</italic></attrib>
</table-wrap-foot>
</table-wrap>
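<p>For illustration, the unstandardized coefficients (<italic>B</italic>) in Table 4 can be read as prediction equations. The symbols are introduced here and are not the authors&#x2019; notation: RD denotes the relative difficulty/cognitive load index in logits, and PWM and VSWM denote the phonological and visual-spatial WM point scores, respectively.</p>
<disp-formula><tex-math>\widehat{\mathrm{RD}}_{\mathrm{numeracy}} = 5.846 - 0.529\,\mathrm{PWM} - 0.289\,\mathrm{VSWM}</tex-math></disp-formula>
<disp-formula><tex-math>\widehat{\mathrm{RD}}_{\mathrm{language}} = 6.426 - 0.707\,\mathrm{PWM} - 0.058\,\mathrm{VSWM}</tex-math></disp-formula>
<p>Read this way, and assuming the predictors were entered as raw point scores, a one-point increase in phonological WM corresponds to a predicted decrease of 0.529 logits in numeracy relative difficulty with visual-spatial WM held constant.</p>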
</sec>
</sec>
</sec>
<sec id="S4">
<title>Discussion</title>
<p>The aim of the current study was to evaluate the potential of item response modeling to generate an objective measure of intrinsic cognitive load. Results indicated that valid and reliable indices of intrinsic cognitive load can be attained by item response modeling of raw test data (or other series of complex tasks/problems within a single domain) at an interval scale level. The two parameter estimates (item difficulty and person ability) combine into a single scalable measure, in logits, subsuming critical elements of the measurable aspects of cognitive load: ML (i.e., task difficulty) and ME (performance measures transposed into ability logits). In support of our hypothesis (H1), the resulting relative difficulty indices (that is, the item difficulty estimates minus the person ability estimates) were related to cognitive resources in the expected direction, which supports their use as an estimate of cognitive load. This IRT approach to estimating intrinsic cognitive load is superior to subjective self-report measures as it meets the requirements of objective measurement (<xref ref-type="bibr" rid="B5">Andrich, 2004</xref>).</p>
<p>Our findings provide clear validity evidence for the plausibility of our interpretations and utility of our IRT-based measure to indicate a learner&#x2019;s intrinsic cognitive load capacity. This evidence was demonstrated through a concurrent criterion validity approach in that a learner&#x2019;s WM capacity was found to significantly predict our proposed cognitive load index within both numeracy and literacy domains. We found both phonological and visual spatial WM scores significantly accounted for 20% of the variance of cognitive load in the numeracy domain. This finding is consistent with prior research which has found that phonological and visual spatial WM are important predictors of numeracy processing (<xref ref-type="bibr" rid="B3">Alloway and Alloway, 2010</xref>; <xref ref-type="bibr" rid="B4">Alloway and Passolunghi, 2011</xref>). While phonological WM significantly captured 7% of the variance of our novel cognitive load index in the language conventions domain (combined spelling and grammar tasks), visual-spatial WM played no significant role.</p>
<p>A possible explanation for these results (that is, the small amount of variance captured by phonological WM and the lack of a predictive role for visual-spatial WM in our cognitive load measure) may lie in the nature of the language conventions spelling and grammar tasks. In the language conventions sections of the NAPLAN tests, the spelling items consist of identification of misspelt words. The mental resources needed for this type of processing do not require deliberate thought and essentially require retrieval from long-term memory if the word is known and guessing in the case of an unknown word (though in some cases the application of spelling rules may apply). Similarly, in the grammatical section of the language conventions test the format consists of short cloze activities where a sentence is presented, and students choose the correct missing grammatical form. Here, knowledge of the correct conjugation or form of the verb or auxiliary is all that is needed to successfully complete the task. The degree to which deliberate thought is needed to control the processing of information is minimal and hence the ME and WM capacities on these tasks would not be optimal. According to <xref ref-type="bibr" rid="B35">Paas and van Merri&#x00EB;nboer&#x2019;s (1994)</xref> cognitive load model, the automatic processing of information bypasses the requirement of drawing on ME resources and feeds directly into performance. Hence, this type of automatic processing may have sufficiently limited the cognitive capacity requirements in the language conventions domain.</p>
<p>Our findings may also simply be reflective of the reduced role of visual spatial WM in language processing. For example, it is well established that visual spatial WM is important for early numeracy processing (<xref ref-type="bibr" rid="B29">McKenzie et al., 2003</xref>; <xref ref-type="bibr" rid="B10">Bull et al., 2008</xref>). Moreover, in the year three NAPLAN numeracy tests many questions comprise visual &#x201C;patterns&#x201D; (or similar) and consequently involve visual processing along the lines of what was assessed by the visual spatial WM tasks. By contrast, such item types requiring visual processing were not present in the language conventions test used. Therefore, this may explain the lesser role of visual spatial WM processing as a predictor of our proposed cognitive load index.</p>
<p>Overall, however, our findings indicated that higher levels of cognitive resources were related to lower levels of cognitive load requirements and vice versa. This is consistent with fundamental underpinnings of CLT (<xref ref-type="bibr" rid="B45">Sweller et al., 2019</xref>), which suggest that: cognitive load and WM capacity share an inverse relationship, such that deficiency in one aspect can be rectified by reduction in the other; and that a reduction in cognitive load can facilitate learning and performance.</p>
<p>Our proposed IRT modeling approach to cognitive load measurement provides a relatively simple and straightforward procedure to attain reliable and valid estimates of intrinsic cognitive load. While IRT modeling and Rasch analysis have been available to social scientists and psychologists for many decades, few have taken advantage of their superior measurement capabilities. Moreover, the creative potential of IRT modeling and its applications to cognitive load research, as well as educational and psychological research in general, has yet to be realized.</p>
<p>As we have shown in this study, IRT modeling can provide an objective measure of intrinsic cognitive load outside of subjective self-report. This is particularly pertinent given the difficulty in attaining reliable self-report measures on cognitive processing of younger children (i.e., less than 7 years) (<xref ref-type="bibr" rid="B14">Conjin et al., 2020</xref>). The ability to ascertain reliable and valid measures of intrinsic cognitive load through a performance-based objective mathematical procedure is highly beneficial, especially for cognitive load researchers interested in measuring younger learners&#x2019; cognitive load. Moreover, this objective IRT modeling approach has ecological validity in that the performance data (i.e., tasks, problems, and questions) are collected unobtrusively within the classroom learning environment. Introducing IRT and Rasch modeling into the cognitive load research paradigm offers exciting measurement opportunities beyond subjective self-report approaches.</p>
<sec id="S4.SS1">
<title>Limitations</title>
<p>We wish to acknowledge several limitations of this study. First, while our study has demonstrated the utility and validity of IRT modeling to quantify intrinsic cognitive load, it is important to note that IRT analysis requires large sample sizes. In the current study, sample size was less of an issue because we used standardized tests which have already been validated with large (nationwide) samples using IRT analyses (<xref ref-type="bibr" rid="B2">ACARA, 2020</xref>). Normally, a reliable IRT analysis requires a sample of around <italic>N</italic> = 200 (<xref ref-type="bibr" rid="B27">Linacre, 1994</xref>). Hence, IRT analysis may be beyond the scope of typical smaller experimental classroom-based cognitive load investigations. Second, our sample of learners was younger than the target age of the tests, and this was reflected somewhat in the IRT analysis, in that many learners found the tests difficult.</p>
</sec>
<sec id="S4.SS2">
<title>Future Directions</title>
<p>The current study has shown that our relative difficulty/cognitive load index varies with WM in relation to intrinsic cognitive load. Further validation of this measure would benefit from evaluation of the index to determine whether it varies according to the learner task following CLT principles (e.g., extraneous and germane load) and through construct (i.e., convergent) validity testing to establish the measure&#x2019;s relationship with other cognitive load scales (e.g., <xref ref-type="bibr" rid="B30">Paas, 1992</xref>; <xref ref-type="bibr" rid="B26">Leppink et al., 2013</xref>; <xref ref-type="bibr" rid="B25">Krell, 2017</xref>). Such research is needed to show that our proposed cognitive load index varies with theoretical variations in cognitive load. Additionally, it would be desirable to investigate the performance of our proposed cognitive load index with learners at varying stages of age and development. Finally, our proposed cognitive load index may be a useful measure for those undertaking intervention research where the index can be used to assess shifts in relative difficulty (cognitive load) scores across stages of learner development.</p>
</sec>
</sec>
<sec id="S5">
<title>Data Availability Statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="S6">
<title>Ethics Statement</title>
<p>The studies involving human participants were reviewed and approved by University of Wollongong. Written informed consent to participate in this study was provided by the participants&#x2019; legal guardian/next of kin.</p>
</sec>
<sec id="S7">
<title>Author Contributions</title>
<p>All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><collab>ACARA</collab> (<year>2011</year>). <source><italic>NAPLAN. Australian Curriculum Assessment and Reporting Authority (ACARA).</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.nap.edu.au/naplan/naplan.html">http://www.nap.edu.au/naplan/naplan.html</ext-link> <comment>(accessed January 2, 2020)</comment>.</citation></ref>
<ref id="B2"><citation citation-type="journal"><collab>ACARA</collab> (<year>2020</year>). <article-title>Reliability and Validity of NAPLAN. Australian Curriculum Assessment and Reporting Authority.</article-title> Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.nap.edu.au/resources">https://www.nap.edu.au/resources</ext-link> <comment>(accessed January 2, 2020)</comment>.</citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alloway</surname> <given-names>T. P.</given-names></name> <name><surname>Alloway</surname> <given-names>R. G.</given-names></name></person-group> (<year>2010</year>). <article-title>Investigating the predictive roles of working memory and IQ in academic attainment.</article-title> <source><italic>J. Exp. Child Psychol.</italic></source> <volume>106</volume> <fpage>20</fpage>&#x2013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1016/j.jecp.2009.11.003</pub-id> <pub-id pub-id-type="pmid">20018296</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alloway</surname> <given-names>T. P.</given-names></name> <name><surname>Passolunghi</surname> <given-names>M. C.</given-names></name></person-group> (<year>2011</year>). <article-title>The relationship between working memory, IQ, and mathematical skills in children.</article-title> <source><italic>Learn. Individ. Dif.</italic></source> <volume>21</volume> <fpage>133</fpage>&#x2013;<lpage>137</lpage>. <pub-id pub-id-type="doi">10.1016/j.lindif.2010.09.013</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andrich</surname> <given-names>D.</given-names></name></person-group> (<year>2004</year>). <article-title>Controversy and the Rasch model: a characteristic of incompatible paradigms?</article-title> <source><italic>Med. Care</italic></source> <volume>42(Suppl. 1)</volume> <fpage>1</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1097/01.mlr.0000103528.48582.7c</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andrich</surname> <given-names>D.</given-names></name> <name><surname>Sheridan</surname> <given-names>B.</given-names></name> <name><surname>Luo</surname> <given-names>G.</given-names></name></person-group> (<year>2010</year>). <source><italic>RUMM2030: A Windows Program for the Rasch Unidimensional Measurement Model (User Manual: Part 1 Dichotomous Data).</italic></source> <publisher-loc>Perth, WA</publisher-loc>: <publisher-name>RUMM Laboratory</publisher-name>.</citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Antonenko</surname> <given-names>P.</given-names></name> <name><surname>Paas</surname> <given-names>F.</given-names></name> <name><surname>Grabner</surname> <given-names>R.</given-names></name> <name><surname>van Gog</surname> <given-names>T.</given-names></name></person-group> (<year>2010</year>). <article-title>Using electroencephalography to measure cognitive load.</article-title> <source><italic>Educ. Psychol. Rev.</italic></source> <volume>22</volume> <fpage>425</fpage>&#x2013;<lpage>438</lpage>. <pub-id pub-id-type="doi">10.1007/s10648-010-9130-y</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ayres</surname> <given-names>P.</given-names></name></person-group> (<year>2006</year>). <article-title>Impact of reducing intrinsic cognitive load on learning in a mathematical domain.</article-title> <source><italic>Appl. Cogn. Psychol.</italic></source> <volume>20</volume> <fpage>287</fpage>&#x2013;<lpage>298</lpage>. <pub-id pub-id-type="doi">10.1002/acp.1245</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><collab>Board of Studies Teaching and Educational Standards NSW (BOSTESNSW)</collab> (<year>2015</year>). <source><italic>NAPLAN.</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.boardofstudies.nsw.edu.au/naplan/">http://www.boardofstudies.nsw.edu.au/naplan/</ext-link> <comment>(accessed May 31, 2016)</comment>.</citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bull</surname> <given-names>R.</given-names></name> <name><surname>Espy</surname> <given-names>K. A.</given-names></name> <name><surname>Wiebe</surname> <given-names>S. A.</given-names></name></person-group> (<year>2008</year>). <article-title>Short-term memory, working memory, and executive functioning in preschoolers: longitudinal predictors of mathematical achievement at age 7 years.</article-title> <source><italic>Dev. Neuropsychol.</italic></source> <volume>33</volume> <fpage>205</fpage>&#x2013;<lpage>228</lpage>. <pub-id pub-id-type="doi">10.1080/87565640801982312</pub-id> <pub-id pub-id-type="pmid">18473197</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burson</surname> <given-names>K. A.</given-names></name> <name><surname>Larrick</surname> <given-names>R. P.</given-names></name> <name><surname>Klayman</surname> <given-names>J.</given-names></name></person-group> (<year>2006</year>). <article-title>Skilled or unskilled, but still unaware of it: perceptions of difficulty drive miscalibration in relative comparisons.</article-title> <source><italic>J. Pers. Soc. Psychol.</italic></source> <volume>90</volume> <fpage>60</fpage>&#x2013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1037/0022-3514.90.1.60</pub-id> <pub-id pub-id-type="pmid">16448310</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chandler</surname> <given-names>P.</given-names></name> <name><surname>Sweller</surname> <given-names>J.</given-names></name></person-group> (<year>1996</year>). <article-title>Cognitive load while learning to use a computer program.</article-title> <source><italic>Appl. Cogn. Psychol.</italic></source> <volume>10</volume> <fpage>151</fpage>&#x2013;<lpage>170</lpage>. <pub-id pub-id-type="doi">10.1002/(sici)1099-0720(199604)10:2&#x003C;151::aid-acp380&#x003E;3.0.co;2-u</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cierniak</surname> <given-names>G.</given-names></name> <name><surname>Scheiter</surname> <given-names>K.</given-names></name> <name><surname>Gerjets</surname> <given-names>P.</given-names></name></person-group> (<year>2009</year>). <article-title>Explaining the split attention effect: is the reduction of extraneous cognitive load accompanied by an increase in germane cognitive load?</article-title> <source><italic>Comput. Hum. Behav.</italic></source> <volume>25</volume> <fpage>315</fpage>&#x2013;<lpage>324</lpage>. <pub-id pub-id-type="doi">10.1016/j.chb.2008.12.020</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Conijn</surname> <given-names>J. M.</given-names></name> <name><surname>Smits</surname> <given-names>N.</given-names></name> <name><surname>Hartman</surname> <given-names>E. E.</given-names></name></person-group> (<year>2020</year>). <article-title>Determining at what age children provide sound self-reports: an illustration of the validity-index approach.</article-title> <source><italic>Assessment</italic></source> <volume>27</volume> <fpage>1604</fpage>&#x2013;<lpage>1618</lpage>. <pub-id pub-id-type="doi">10.1177/1073191119832655</pub-id> <pub-id pub-id-type="pmid">30829047</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Halabi</surname> <given-names>A. K.</given-names></name> <name><surname>Tuovinen</surname> <given-names>J. E.</given-names></name> <name><surname>Farley</surname> <given-names>A. A.</given-names></name></person-group> (<year>2005</year>). <article-title>Empirical evidence on the relative efficiency of worked examples versus problem-solving exercises in accounting principles instruction.</article-title> <source><italic>Issues Account. Educ.</italic></source> <volume>20</volume> <fpage>21</fpage>&#x2013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.2308/iace.2005.20.1.21</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hoffman</surname> <given-names>B.</given-names></name> <name><surname>Schraw</surname> <given-names>G.</given-names></name></person-group> (<year>2010</year>). <article-title>Conceptions of efficiency: applications in learning and problem solving.</article-title> <source><italic>Educ. Psychol.</italic></source> <volume>45</volume> <fpage>1</fpage>&#x2013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1080/00461520903213618</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Howard</surname> <given-names>S.</given-names></name> <name><surname>Burianova</surname> <given-names>H.</given-names></name> <name><surname>Ehrich</surname> <given-names>J.</given-names></name> <name><surname>Kervin</surname> <given-names>L.</given-names></name> <name><surname>Calleia</surname> <given-names>A.</given-names></name> <name><surname>Barkus</surname> <given-names>E.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Behavioural and fMRI evidence of the differing cognitive load of domain-specific assessments.</article-title> <source><italic>Neuroscience</italic></source> <volume>297</volume> <fpage>38</fpage>&#x2013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroscience.2015.03.047</pub-id> <pub-id pub-id-type="pmid">25818553</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Howard</surname> <given-names>S. J.</given-names></name> <name><surname>Melhuish</surname> <given-names>E. C.</given-names></name></person-group> (<year>2017</year>). <article-title>An early years toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: validity, reliability, and preliminary norms.</article-title> <source><italic>J. Psychoeduc. Assess.</italic></source> <volume>35</volume> <fpage>255</fpage>&#x2013;<lpage>275</lpage>. <pub-id pub-id-type="doi">10.1177/0734282916633009</pub-id> <pub-id pub-id-type="pmid">28503022</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hummel</surname> <given-names>H. G. K.</given-names></name> <name><surname>Paas</surname> <given-names>F.</given-names></name> <name><surname>Koper</surname> <given-names>E. J. R.</given-names></name></person-group> (<year>2004</year>). <article-title>Cueing for transfer in multimedia programmes: process worksheets vs. worked-out examples.</article-title> <source><italic>J. Comput. Assist. Learn.</italic></source> <volume>20</volume> <fpage>387</fpage>&#x2013;<lpage>397</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-2729.2004.00098.x</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kane</surname> <given-names>M. T.</given-names></name></person-group> (<year>2013</year>). <article-title>Validating the interpretations and uses of test scores.</article-title> <source><italic>J. Educ. Meas.</italic></source> <volume>50</volume> <fpage>1</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1111/jedm.12000</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klepsch</surname> <given-names>M.</given-names></name> <name><surname>Schmitz</surname> <given-names>F.</given-names></name> <name><surname>Seufert</surname> <given-names>T.</given-names></name></person-group> (<year>2017</year>). <article-title>Development and validation of two instruments measuring intrinsic, extraneous, and germane cognitive load.</article-title> <source><italic>Front. Psychol.</italic></source> <volume>8</volume>:<issue>1997</issue>. <pub-id pub-id-type="doi">10.3389/fpsyg.2017.01997</pub-id> <pub-id pub-id-type="pmid">29201011</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Korbach</surname> <given-names>A.</given-names></name> <name><surname>Br&#x00FC;nken</surname> <given-names>R.</given-names></name> <name><surname>Park</surname> <given-names>B.</given-names></name></person-group> (<year>2017</year>). <article-title>Measurement of cognitive load in multimedia learning: a comparison of different objective measures.</article-title> <source><italic>Instr. Sci.</italic></source> <volume>45</volume> <fpage>515</fpage>&#x2013;<lpage>536</lpage>. <pub-id pub-id-type="doi">10.1007/s11251-017-9413-5</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Korbach</surname> <given-names>A.</given-names></name> <name><surname>Br&#x00FC;nken</surname> <given-names>R.</given-names></name> <name><surname>Park</surname> <given-names>B.</given-names></name></person-group> (<year>2018</year>). <article-title>Differentiating different types of cognitive load: a comparison of different measures.</article-title> <source><italic>Educ. Psychol. Rev.</italic></source> <volume>30</volume> <fpage>503</fpage>&#x2013;<lpage>529</lpage>. <pub-id pub-id-type="doi">10.1007/s10648-017-9404-8</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krejtz</surname> <given-names>K.</given-names></name> <name><surname>Duchowski</surname> <given-names>A. T.</given-names></name> <name><surname>Niedzielska</surname> <given-names>A.</given-names></name> <name><surname>Biele</surname> <given-names>C.</given-names></name> <name><surname>Krejtz</surname> <given-names>I.</given-names></name></person-group> (<year>2018</year>). <article-title>Eye tracking cognitive load using pupil diameter and microsaccades with fixed gaze.</article-title> <source><italic>PLoS One</italic></source> <volume>13</volume>:<issue>e0203629</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0203629</pub-id> <pub-id pub-id-type="pmid">30216385</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krell</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Evaluating an instrument to measure mental load and mental effort considering different sources of validity evidence.</article-title> <source><italic>Cogent Educ.</italic></source> <volume>4</volume>:<issue>1280256</issue>. <pub-id pub-id-type="doi">10.1080/2331186X.2017.1280256</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leppink</surname> <given-names>J.</given-names></name> <name><surname>Paas</surname> <given-names>F.</given-names></name> <name><surname>van der Vleuten</surname> <given-names>C. P. M.</given-names></name> <name><surname>van Gog</surname> <given-names>T.</given-names></name> <name><surname>van Merri&#x00EB;nboer</surname> <given-names>J. J. G.</given-names></name></person-group> (<year>2013</year>). <article-title>Development of an instrument for measuring different types of cognitive load.</article-title> <source><italic>Behav. Res. Methods</italic></source> <volume>45</volume> <fpage>1058</fpage>&#x2013;<lpage>1072</lpage>. <pub-id pub-id-type="doi">10.3758/s13428-013-0334-1</pub-id> <pub-id pub-id-type="pmid">23572251</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Linacre</surname> <given-names>J. M.</given-names></name></person-group> (<year>1994</year>). <article-title>Sample size and item calibration stability.</article-title> <source><italic>Rasch Meas. Trans.</italic></source> <volume>7</volume>:<issue>328</issue>.</citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marcus</surname> <given-names>N.</given-names></name> <name><surname>Cooper</surname> <given-names>M.</given-names></name> <name><surname>Sweller</surname> <given-names>J.</given-names></name></person-group> (<year>1996</year>). <article-title>Understanding instructions.</article-title> <source><italic>J. Educ. Psychol.</italic></source> <volume>88</volume> <fpage>49</fpage>&#x2013;<lpage>63</lpage>.</citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McKenzie</surname> <given-names>B.</given-names></name> <name><surname>Bull</surname> <given-names>R.</given-names></name> <name><surname>Gray</surname> <given-names>C.</given-names></name></person-group> (<year>2003</year>). <article-title>The effects of phonological and visual-spatial interference on children&#x2019;s arithmetical performance.</article-title> <source><italic>Educ. Child Psychol.</italic></source> <volume>20</volume> <fpage>93</fpage>&#x2013;<lpage>108</lpage>.</citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name></person-group> (<year>1992</year>). <article-title>Training strategies for attaining transfer of problem-solving skill in statistics: a cognitive-load approach.</article-title> <source><italic>J. Educ. Psychol.</italic></source> <volume>84</volume> <fpage>429</fpage>&#x2013;<lpage>434</lpage>. <pub-id pub-id-type="doi">10.1037/0022-0663.84.4.429</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name> <name><surname>Ayres</surname> <given-names>P.</given-names></name> <name><surname>Pachman</surname> <given-names>M.</given-names></name></person-group> (<year>2008</year>). &#x201C;<article-title>Assessment of cognitive load in multimedia learning environments: theory, methods, and applications</article-title>,&#x201D; in <source><italic>Recent Innovations in Educational Technology that Facilitate Student Learning</italic></source>, <role>eds</role> <person-group person-group-type="editor"><name><surname>Robinson</surname> <given-names>D. H.</given-names></name> <name><surname>Schraw</surname> <given-names>G. J.</given-names></name></person-group> (<publisher-loc>Charlotte, NC</publisher-loc>: <publisher-name>Information Age</publisher-name>), <fpage>11</fpage>&#x2013;<lpage>35</lpage>.</citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name> <name><surname>Renkl</surname> <given-names>A.</given-names></name> <name><surname>Sweller</surname> <given-names>J.</given-names></name></person-group> (<year>2004</year>). <article-title>Cognitive load theory: instructional implications of the interaction between information structures and cognitive architecture.</article-title> <source><italic>Instr. Sci.</italic></source> <volume>32</volume> <fpage>1</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1023/b:truc.0000021806.17516.d0</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name> <name><surname>Tuovinen</surname> <given-names>J.</given-names></name> <name><surname>Tabbers</surname> <given-names>H.</given-names></name> <name><surname>van Gerven</surname> <given-names>P.</given-names></name></person-group> (<year>2003</year>). <article-title>Cognitive load measurement as a means to advance cognitive load theory.</article-title> <source><italic>Educ. Psychol.</italic></source> <volume>38</volume> <fpage>63</fpage>&#x2013;<lpage>71</lpage>. <pub-id pub-id-type="doi">10.1207/s15326985ep3801_8</pub-id> <pub-id pub-id-type="pmid">33486653</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name> <name><surname>van Merri&#x00EB;nboer</surname> <given-names>J. J. G.</given-names></name></person-group> (<year>1993</year>). <article-title>The efficiency of instructional conditions: an approach to combine mental effort and performance measures.</article-title> <source><italic>Hum. Factors</italic></source> <volume>35</volume> <fpage>737</fpage>&#x2013;<lpage>743</lpage>. <pub-id pub-id-type="doi">10.1177/001872089303500412</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name> <name><surname>van Merri&#x00EB;nboer</surname> <given-names>J. J. G.</given-names></name></person-group> (<year>1994</year>). <article-title>Instructional control of cognitive load in the training of complex cognitive tasks.</article-title> <source><italic>Educ. Psychol. Rev.</italic></source> <volume>6</volume> <fpage>351</fpage>&#x2013;<lpage>371</lpage>. <pub-id pub-id-type="doi">10.1007/bf02213420</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name> <name><surname>van Merri&#x00EB;nboer</surname> <given-names>J. J. G.</given-names></name> <name><surname>Adam</surname> <given-names>J. J.</given-names></name></person-group> (<year>1994</year>). <article-title>Measurement of cognitive load in instructional research.</article-title> <source><italic>Percept. Mot. Skills</italic></source> <volume>79</volume> <fpage>419</fpage>&#x2013;<lpage>430</lpage>. <pub-id pub-id-type="doi">10.2466/pms.1994.79.1.419</pub-id> <pub-id pub-id-type="pmid">7808878</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname> <given-names>B.</given-names></name> <name><surname>Br&#x00FC;nken</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>The rhythm method: a new method for measuring cognitive load&#x2014;an experimental dual-task study.</article-title> <source><italic>Appl. Cogn. Psychol.</italic></source> <volume>29</volume> <fpage>232</fpage>&#x2013;<lpage>243</lpage>. <pub-id pub-id-type="doi">10.1002/acp.3100</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rasch</surname> <given-names>G.</given-names></name></person-group> (<year>1960</year>). <source><italic>Probabilistic Models for Some Intelligence and Attainment Tests.</italic></source> <publisher-loc>Chicago, IL</publisher-loc>: <publisher-name>University of Chicago Press</publisher-name>.</citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Salden</surname> <given-names>R. J. C. M.</given-names></name> <name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name> <name><surname>Broers</surname> <given-names>N. J.</given-names></name> <name><surname>van Merri&#x00EB;nboer</surname> <given-names>J. J. G.</given-names></name></person-group> (<year>2004</year>). <article-title>Mental effort and performance as determinants for the dynamic selection of learning tasks in air traffic control training.</article-title> <source><italic>Instr. Sci.</italic></source> <volume>32</volume> <fpage>153</fpage>&#x2013;<lpage>172</lpage>. <pub-id pub-id-type="doi">10.1023/b:truc.0000021814.03996.ff</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scholey</surname> <given-names>A. B.</given-names></name> <name><surname>Harper</surname> <given-names>S.</given-names></name> <name><surname>Kennedy</surname> <given-names>D. O.</given-names></name></person-group> (<year>2001</year>). <article-title>Cognitive demand and blood glucose.</article-title> <source><italic>Physiol. Behav.</italic></source> <volume>73</volume> <fpage>585</fpage>&#x2013;<lpage>592</lpage>. <pub-id pub-id-type="doi">10.1016/s0031-9384(01)00476-0</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stone</surname> <given-names>N. J.</given-names></name></person-group> (<year>2000</year>). <article-title>Exploring the relationship between calibration and self-regulated learning.</article-title> <source><italic>Educ. Psychol. Rev.</italic></source> <volume>12</volume> <fpage>437</fpage>&#x2013;<lpage>476</lpage>.</citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sweller</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>Element interactivity and intrinsic, extraneous, and germane cognitive load.</article-title> <source><italic>Educ. Psychol. Rev.</italic></source> <volume>22</volume> <fpage>123</fpage>&#x2013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1007/s10648-010-9128-5</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sweller</surname> <given-names>J.</given-names></name> <name><surname>Ayres</surname> <given-names>P.</given-names></name> <name><surname>Kalyuga</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <source><italic>Cognitive Load Theory.</italic></source> <publisher-loc>London</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sweller</surname> <given-names>J.</given-names></name> <name><surname>van Merri&#x00EB;nboer</surname> <given-names>J. J. G.</given-names></name> <name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name></person-group> (<year>1998</year>). <article-title>Cognitive architecture and instructional design.</article-title> <source><italic>Educ. Psychol. Rev.</italic></source> <volume>10</volume> <fpage>251</fpage>&#x2013;<lpage>296</lpage>.</citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sweller</surname> <given-names>J.</given-names></name> <name><surname>van Merri&#x00EB;nboer</surname> <given-names>J. J. G.</given-names></name> <name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name></person-group> (<year>2019</year>). <article-title>Cognitive architecture and instructional design: 20 years later.</article-title> <source><italic>Educ. Psychol. Rev.</italic></source> <volume>31</volume> <fpage>261</fpage>&#x2013;<lpage>292</lpage>. <pub-id pub-id-type="doi">10.1007/s10648-019-09465-5</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tennant</surname> <given-names>A.</given-names></name> <name><surname>Conaghan</surname> <given-names>P. G.</given-names></name></person-group> (<year>2007</year>). <article-title>The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper?</article-title> <source><italic>Arthritis Care Res.</italic></source> <volume>57</volume> <fpage>1358</fpage>&#x2013;<lpage>1362</lpage>. <pub-id pub-id-type="doi">10.1002/art.23108</pub-id> <pub-id pub-id-type="pmid">18050173</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tindall-Ford</surname> <given-names>S.</given-names></name> <name><surname>Chandler</surname> <given-names>P.</given-names></name> <name><surname>Sweller</surname> <given-names>J.</given-names></name></person-group> (<year>1997</year>). <article-title>When two sensory modes are better than one.</article-title> <source><italic>J. Exp. Psychol. Appl.</italic></source> <volume>3</volume> <fpage>257</fpage>&#x2013;<lpage>287</lpage>. <pub-id pub-id-type="doi">10.1037/1076-898x.3.4.257</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Gerven</surname> <given-names>P. W. M.</given-names></name> <name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name> <name><surname>van Merri&#x00EB;nboer</surname> <given-names>J. J. G.</given-names></name> <name><surname>Schmidt</surname> <given-names>H. G.</given-names></name></person-group> (<year>2004</year>). <article-title>Memory load and the cognitive pupillary response in aging.</article-title> <source><italic>Psychophysiology</italic></source> <volume>41</volume> <fpage>167</fpage>&#x2013;<lpage>174</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-8986.2003.00148.x</pub-id> <pub-id pub-id-type="pmid">15032982</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Gog</surname> <given-names>T.</given-names></name> <name><surname>Paas</surname> <given-names>F. G. W. C.</given-names></name></person-group> (<year>2008</year>). <article-title>Instructional efficiency: revisiting the original construct in educational research.</article-title> <source><italic>Educ. Psychol.</italic></source> <volume>43</volume> <fpage>16</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1080/00461520701756248</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whelan</surname> <given-names>R. R.</given-names></name></person-group> (<year>2007</year>). <article-title>Neuroimaging of cognitive load in instructional multimedia.</article-title> <source><italic>Educ. Res. Rev.</italic></source> <volume>2</volume> <fpage>1</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1016/j.edurev.2006.11.001</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wright</surname> <given-names>B.</given-names></name></person-group> (<year>1997</year>). <article-title>A history of social science measurement.</article-title> <source><italic>Educ. Meas. Issues Pract.</italic></source> <volume>16</volume> <fpage>36</fpage>&#x2013;<lpage>52</lpage>.</citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wright</surname> <given-names>R.</given-names></name></person-group> (<year>1984</year>). <article-title>Motivation, anxiety, and the difficulty of avoidant control.</article-title> <source><italic>J. Pers. Soc. Psychol.</italic></source> <volume>46</volume> <fpage>1376</fpage>&#x2013;<lpage>1388</lpage>. <pub-id pub-id-type="doi">10.1037/0022-3514.46.6.1376</pub-id> <pub-id pub-id-type="pmid">6737218</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wright</surname> <given-names>R.</given-names></name> <name><surname>Contrada</surname> <given-names>R.</given-names></name> <name><surname>Patane</surname> <given-names>M.</given-names></name></person-group> (<year>1986</year>). <article-title>Task difficulty, cardiovascular response, and the magnitude of goal valence.</article-title> <source><italic>J. Pers. Soc. Psychol.</italic></source> <volume>51</volume> <fpage>837</fpage>&#x2013;<lpage>843</lpage>. <pub-id pub-id-type="doi">10.1037/0022-3514.51.4.837</pub-id> <pub-id pub-id-type="pmid">3783427</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>B.</given-names></name> <name><surname>Salvendy</surname> <given-names>G.</given-names></name></person-group> (<year>2000</year>). <article-title>Prediction of mental workload in single and multiple task environments.</article-title> <source><italic>Int. J. Cogn. Ergon.</italic></source> <volume>4</volume> <fpage>213</fpage>&#x2013;<lpage>242</lpage>. <pub-id pub-id-type="doi">10.1207/s15327566ijce0403_3</pub-id></citation></ref>
</ref-list>
</back>
</article>