Reliability and Validity of the Utrecht Tasks for Attention in Toddlers Using Eye Tracking (UTATE)

van Baar, Anneloes L.; de Jong, Marjanneke; Maat, Martine; Hooge, Ignace T. C.; Bogičević, Lilly; Verhoeven, Marjolein

doi:10.3389/fpsyg.2020.01179

ORIGINAL RESEARCH article

Front. Psychol., 08 June 2020

Sec. Human Developmental Psychology

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.01179

This article is part of the Research Topic: Understanding Trajectories and Promoting Change from Early to Complex Skills in Typical and Atypical Development: A Cross-Population Approach

Anneloes L. van Baar^1*

¹Department of Child and Adolescent Studies, Utrecht University, Utrecht, Netherlands
²Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, Netherlands

Attention problems hinder many children in their cognitive and social-emotional development. Children at risk for developmental problems, such as preterm-born infants, are particularly known for attention difficulties. Early identification of attention difficulties is important so that appropriate stimulation can be applied to reduce further problems. Specifically designed instruments with good psychometric characteristics are needed to detect difficulties in attention and thereby contribute to early identification. The Utrecht Tasks for Attention in Toddlers using Eye tracking (UTATE) is an instrument to measure orienting, alerting and executive attention capacities in young children. Reliability and validity of the UTATE are addressed in three studies reported in this paper. A sample of 95 term-born children assessed at 18 months of age provided data for both the second and third study. In addition, three small samples were used: the first consisted of 12 children at 18 months with test-retest data, reported in the first study; the two others, used in the third study, consisted of 14 children measured at 12 months and 15 children examined at 24 months. The UTATE yielded reliable information on eye movements, and some first support for construct and predictive validity was found. Low scores on the UTATE at 18 months were related to slower cognitive development as measured with the Bayley-III-NL at 24 months. Furthermore, a first indication was found that the UTATE is able to detect some age differences in attention. It is concluded that the UTATE can be used to study attention capacities in toddlers that underlie cognitive functioning and development, but further research is necessary.

Introduction

Many children experience problems in attention development (e.g., Mahone and Schneider, 2012). Although the development of attention capacities starts early in life, problems are usually not recognized before school entry (Ruff and Rothbart, 1996; Atkinson and Braddick, 2012). To detect attention problems at an earlier age, reliable and valid instruments that objectively measure attention capacities are needed. For this reason, the Utrecht Tasks for Attention in Toddlers using Eye tracking (UTATE) was developed (De Jong et al., 2016b). The UTATE consists of four tasks that are administered on an eye tracker and is intended to measure the functioning of three theoretically distinguished attention systems: orienting, alerting, and executive attention (Posner and Petersen, 1990). Orienting concerns the ability to activate attention and shift between visual targets, as becomes evident from relocation of the gaze (Posner and Petersen, 1990; Atkinson and Braddick, 2012). Alerting is the ability to attain and sustain attention for important cues in the environment. Executive attention is considered to be a more internal and endogenous system of attention, which entails directed attention and inhibition of behavior (Colombo, 2001; Atkinson and Braddick, 2012).

Several studies have provided information regarding the reliability and validity of the UTATE. A pilot study showed that the UTATE was feasible for use with 18-month-old children: the toddlers cooperated well during the procedure, and the data were of good quality and captured individual variation (De Jong et al., 2016b). Furthermore, sufficient split-half reliability was found (De Jong et al., 2015, 2016b). In a second study, factorial validity of the UTATE was shown by a confirmatory factor analysis providing evidence for three underlying factors (i.e., orienting, alerting, and executive attention), as was expected based on the theory underlying the design of the tasks (De Jong et al., 2016a). Another study showed first evidence for clinical validity, as the UTATE differentiated between a group of children at risk for attention difficulties (i.e., preterm children) and a typically developing group of children (De Jong et al., 2015). Further evaluation of the potential and the psychometric characteristics of the UTATE is important, also to allow other researchers to use our findings from eye-tracking studies to evaluate attention capacities in toddlers and perhaps even to develop improved instruments based on the UTATE. Therefore, the current paper reports on the test-retest reliability of the UTATE (the focus of study 1), its convergent, divergent and predictive validity (the focus of study 2), and an exploration of the results at different ages (the focus of study 3).

Reliability indicates the consistency of an instrument, which can be measured in different ways, such as split-half reliability and test-retest reproducibility. Split-half reliability is a measure of the internal consistency of an instrument. For the UTATE, split-half reliability was studied by deriving the outcome variables separately for the even- and odd-numbered trials of the tasks. The correlation between the variables of the even- and odd-numbered trials indicated the strength of reliability (Field, 2009). Although this method gave a first impression of the reliability of the UTATE for toddlers at the age of 18 months, a drawback was that, because the data are split, only half of the data is used to compute split-half reliability. In addition, for one of the tasks (i.e., the delayed response task), an appropriate split was not possible because one of the outcome variables could not be evenly divided among even- and odd-numbered trials. Therefore, another type of reliability of the UTATE, test-retest reliability, is investigated in study 1 of this paper, again with toddlers at the age of 18 months. Test-retest reliability is a measure of consistency over time and is examined by administering an instrument twice within a short time span, such as 2 weeks. A short interval is especially important for a construct such as that assessed by the UTATE, as attention skills in early infancy and toddlerhood are subject to developmental and maturational changes. Strong correlations between measurements at two moments within a short period of time are seen as evidence of test-retest reliability.
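The odd/even split described above can be illustrated with a minimal Python sketch. It assumes that an outcome variable is simply the mean of a per-trial score (for example, a latency); the actual UTATE variables are derived as described in De Jong et al. (2016b), and the function names and toy data below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_scores(trials_per_child):
    """Average odd- and even-numbered trials separately for each child.

    Assumption: the outcome variable is the mean of the per-trial scores.
    """
    odd_means, even_means = [], []
    for trials in trials_per_child:
        odd = trials[0::2]   # trials 1, 3, 5, ...
        even = trials[1::2]  # trials 2, 4, 6, ...
        odd_means.append(sum(odd) / len(odd))
        even_means.append(sum(even) / len(even))
    return odd_means, even_means


def split_half_reliability(trials_per_child):
    """Correlate the odd-trial and even-trial outcome across children."""
    odd_means, even_means = split_half_scores(trials_per_child)
    return pearson_r(odd_means, even_means)


# Hypothetical per-trial scores for four children (not UTATE data).
toy_data = [
    [310, 295, 320, 300],
    [410, 430, 400, 420],
    [280, 300, 290, 270],
    [350, 340, 365, 355],
]
print(round(split_half_reliability(toy_data), 2))
```

The sketch also makes the drawback visible: each half-score rests on only half of the trials, and a variable that cannot be divided evenly over odd and even trials cannot be handled this way.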

Construct validity refers to the ability of an instrument to actually measure a certain construct (Cohen and Swerdlik, 2010), in this case attention. To determine construct validity, both convergent and divergent validity have to be established; these are addressed in study 2. Convergent validity indicates that a measure is equally suitable for identifying attention skills as other measures of attention (Cohen and Swerdlik, 2010). To investigate convergent validity, the orienting system as measured by the UTATE is compared with mother-reported attention shifting skills of toddlers. The alerting system as measured by the UTATE is compared to mother-reported attention focusing skills, and to observed on-task persistence of the toddlers during a free and a structured play setting, coded by trained professionals. The executive attention system as measured by the UTATE is compared to mother-reported effortful control, a temperament dimension suggested to be closely related to executive attention (Rothbart et al., 2007). A moderately sized correlation between the UTATE and other measures of attention, evaluated with different kinds of instruments, is seen as evidence of convergent validity. Divergent validity is accepted when the attention systems as measured by the UTATE are not, or are less strongly, related to constructs not supposed to reflect attention (Cohen and Swerdlik, 2010). As attention capacities underlie many cognitive activities, it is difficult to determine constructs to which they might not be related (Atkinson and Braddick, 2012). The orienting, alerting and executive attention systems of the UTATE are compared to mother-reported social-emotional functioning and communication skills, with which no, or only weak, relationships are expected.

Predictive validity of the UTATE is supported when the attention systems are related to later measures of attention capacities and to developmental outcomes that build on attention capacities, such as cognitive capacities at older ages. For all three attention systems, predictive validity is studied in study 2 by comparing the UTATE measure at 18 months of age to cognitive functioning assessed with a developmental test at 24 months of age. In addition, orienting, alerting, and executive attention are compared to mother-reported attention shifting, attention focusing, and effortful control, respectively, measured at 24 months of age.

Part of the validity of an instrument is whether it is able to detect expected developmentally specific patterns. Although no studies have empirically tested the development of orienting, alerting, and executive attention during the second year of life, theoretically it is expected that attention capacities change and improve during the first years of life (e.g., Ruff and Rothbart, 1996). In study 3, we explore whether the UTATE is feasible for use with 12- and 24-month-old children. In addition, we examine whether the UTATE is capable of detecting age differences in attention skills by comparing the performance of 12-, 18-, and 24-month-old children. In this way, a first impression is presented of the potential of the UTATE for studying age-related development of attention capacities.

Study 1 – Reliability

Research Question

To what extent is the performance on the UTATE related to performance on the UTATE within the following 2 weeks? In other words, is the test-retest reliability of the UTATE adequate?

Materials and Methods

Participants and Procedure

The participants formed a convenience sample. They were recruited by students through their own networks; the students asked parents of children aged around 18 months to participate in a study consisting of two assessments with the UTATE and a short questionnaire on demographic background characteristics. Parents and caretakers considered the children to be healthy at the time of assessment. One of the children could only be assessed once, because of an unexpected holiday at the time of the second appointment. The sample with data for both measurements consisted of 12 healthy Dutch children aged 16–22 months (M = 19.00, SD = 1.86, 41.7% boys). The UTATE was administered twice with, on average, 8 days in between (M = 8.33 days, SD = 2.93, range 6–15). The UTATE was administered in a lab setting (n = 1), at the child's child care centre (n = 8), or at home (n = 3). For all children, the location was the same at the first and second measurement moment.

The research project was approved by the Medical Ethical committee of the University Medical Centre Utrecht. All parents gave informed consent for their child’s participation.

Measures

UTATE

The UTATE consists of four tasks: (1) In the disengagement task, a stimulus was first presented at the center of the screen, and after 2 s a second stimulus appeared at the left or the right side of the central stimulus. This task included 20 trials. (2) In the face task, first two pictures of identical child faces were shown, and after 8.5 s one of the pictures changed into a new picture and stayed on the screen together with the previously shown picture for 8 s. This task consisted of eight trials. (3) In the alerting task, a stimulus was presented on the screen for 32 trials, and in half of the trials this was preceded by a signaling sound. (4) In the delayed response task, a dog was hiding in one out of two doghouses and the child was asked to search for the dog. This is the only task that makes use of an instruction for the child. A voice-over directs the child toward a dog on the screen and tells the child the dog wants to play “hide-and-seek.” The child is told to pay attention, because the dog is going to hide himself. The dog moves to one of two doghouses for 1000 ms, before he disappears. A worm pops up in the center of the screen, accompanied by a little music, to distract the child from the doghouses, and after a delay the child is asked by a voice-over to search for the dog. This task consisted of 18 trials, in which the delay increased by 2 s after every three trials, from 0 to 10 s. Details regarding the instrument, apparatus and procedure are described in De Jong et al. (2016b) and in the manual (see Supplementary Material). Fixations were classified with the method described in Hooge and Camps (2013). Thirteen variables were derived from these tasks (see Tables 1, 2). The whole UTATE procedure took about 18 min.
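The delay structure of the delayed response task (18 trials, three trials per delay, delays from 0 to 10 s in steps of 2 s) can be written out as a short Python sketch. The function name and parameters are illustrative only and are not taken from the UTATE software.

```python
def delay_schedule(n_trials=18, trials_per_delay=3, step_s=2):
    """Return the delay in seconds for each trial, in presentation order.

    The delay starts at 0 s and grows by `step_s` after every
    `trials_per_delay` trials.
    """
    return [step_s * (trial // trials_per_delay) for trial in range(n_trials)]


delays = delay_schedule()
print(delays)
# [0, 0, 0, 2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 8, 10, 10, 10]
```

With the default values, six delay levels of three trials each account for all 18 trials, which is consistent with the 0 to 10 s range given above.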

Table 1. Descriptions of the variables from the four eye-tracker tasks.

Table 2. Means and standard deviations at the first and second measurement moment and test-retest reliability: correlations between the variables measured at the first and second moment.

Statistical Analyses

As test-retest reliability cannot be computed for latent constructs, which have to be derived for each sample separately, Pearson’s correlations were computed between the 13 variables that were derived from the UTATE at the first and second measurement moment. The 13 variables are ordered by the latent constructs on which they load (De Jong et al., 2016a). We adopt the criteria used in previous studies including neurocognitive tasks, where correlations between 0.50 and 0.70 were interpreted as “adequate reliability” and above 0.70 as “good reliability” (Kindlon et al., 1995; Kuntsi et al., 2001; Karalunas et al., 2016). SPSS version 25.0 was used for the analysis with α set at 0.05, one-tailed, in view of the expected positive correlations. Post-hoc power analyses were done using the G*Power tool, version 3.1.9.7 (Faul et al., 2009).
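To give a feel for what a post-hoc power analysis of a one-tailed correlation test involves, the following Python sketch uses the Fisher z approximation. This is an assumption made for illustration: it is not the routine implemented in G*Power, so its output will differ somewhat from the power values in Table 2. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power_one_tailed(r, n, alpha=0.05):
    """Approximate power to detect a positive correlation `r` with `n` pairs.

    Uses the Fisher z transformation, under which atanh(r) is roughly
    normal with standard error 1 / sqrt(n - 3). Requires |r| < 1 and n > 3.
    """
    if n <= 3:
        raise ValueError("n must be larger than 3")
    standard_normal = NormalDist()
    z_effect = math.atanh(r) * math.sqrt(n - 3)   # expected test statistic
    z_crit = standard_normal.inv_cdf(1 - alpha)   # one-tailed critical value
    return standard_normal.cdf(z_effect - z_crit)


# Illustration with n = 12 children and two correlations of different size.
for r in (0.85, 0.21):
    print(r, round(approx_power_one_tailed(r, n=12), 2))
```

With only 12 children, the approximation suggests that large correlations are detected with high power, whereas small ones are not, which is relevant when interpreting the lower correlations.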

Results

The means, standard deviations and correlations between the variables at both measurement moments as well as the power of the results are presented in Table 2 for each attention system.

Orienting

For the variables that measure functioning of the orienting system, test-retest reliability was good for transition rate in the disengagement task (r = 0.85) and adequate for mean dwell time and proportion of correct refixations in the disengagement task (r = 0.55 for both variables). For latency in the disengagement task and mean dwell time in the face task, the correlations were slightly below the cut-off of 0.50 (i.e., r = 0.46 and 0.49, respectively). For transition rate in the face task, the test-retest reliability was low with a correlation of −0.07.

Alerting

For the alerting variables, test-retest reliability was good for total dwell time in the alerting task (r = 0.79) and total dwell time in the delayed response task (r = 0.86). Test-retest reliability was adequate for total dwell time in the face task (r = 0.53) and slightly below the cut-off for total dwell time in the disengagement task (r = 0.49). For latency difference in the alerting task, test-retest reliability was low with a correlation of 0.21.

Executive Attention

Test-retest reliability was good for number of correct searches in the delayed response task (r = 0.71) and adequate for mean delay in the delayed response task (r = 0.63).
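The correlations reported above can be checked against the stated cut-offs (0.50 to 0.70 adequate, above 0.70 good) with a small Python sketch. The r values are copied from the text; the dictionary keys, the label "below cut-off" and the function names are our own shorthand.

```python
# Test-retest correlations as reported in the Results, grouped by system.
REPORTED_R = {
    "orienting": {
        "transition rate (disengagement)": 0.85,
        "mean dwell time (disengagement)": 0.55,
        "proportion correct refixations (disengagement)": 0.55,
        "latency (disengagement)": 0.46,
        "mean dwell time (face)": 0.49,
        "transition rate (face)": -0.07,
    },
    "alerting": {
        "total dwell time (alerting)": 0.79,
        "total dwell time (delayed response)": 0.86,
        "total dwell time (face)": 0.53,
        "total dwell time (disengagement)": 0.49,
        "latency difference (alerting)": 0.21,
    },
    "executive attention": {
        "correct searches (delayed response)": 0.71,
        "mean delay (delayed response)": 0.63,
    },
}


def classify(r):
    """Label a correlation with the criteria adopted in the paper."""
    if r > 0.70:
        return "good"
    if r >= 0.50:
        return "adequate"
    return "below cut-off"


def tally(reported):
    """Count adequate-or-good variables and total variables per system."""
    counts = {}
    for system, variables in reported.items():
        ok = sum(classify(r) != "below cut-off" for r in variables.values())
        counts[system] = (ok, len(variables))
    return counts


for system, (ok, total) in tally(REPORTED_R).items():
    print(f"{system}: {ok} of {total} adequate to good")
```

The sketch does not separate "slightly below the cut-off" from "low"; the text makes that distinction descriptively rather than with a numeric threshold.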

Discussion

In this study, test-retest reliability of the UTATE was examined by studying the relationship between the two measurement moments for the variables from the tasks that underlie the three latent factors: orienting, alerting, and executive attention. For the orienting measures, the results showed adequate to good reliability for 3 out of 6 variables, reliability slightly below the cut-off for two variables, and low reliability for one variable. For the alerting measures, adequate to good reliability was found for 3 out of 5 variables, reliability slightly below the cut-off for one variable, and low reliability for one variable. Reliability was adequate to good for both variables that measure executive attention.

The goal of this study was to obtain additional information regarding the reliability of the UTATE, to complement the previously examined split-half reliability (De Jong et al., 2015, 2016b). This is needed as split-half reliability is not the best method to investigate reliability for every variable, for example because of lack of variation (i.e., proportion of correct refixations) and the inability to make an appropriate split (i.e., mean delay in the delayed response task). This view is supported by the finding that test-retest reliability was adequate to good for both of these variables. When information from split-half reliability and test-retest reliability is combined, reliability was adequate to good for 5 out of 6 orienting measures. For latency in the disengagement task, test-retest reliability was slightly below the cut-off and split-half reliability was moderate. This indicates that the measure of orienting is reliable, as the reliabilities of the variables from which this measure is constructed were mostly adequate to good. For the alerting measure, the same can be concluded, as reliabilities were adequate to good for 4 out of 5 measures.

Reliability was low for one measure underlying the orienting factor, transition rate in the face task. As the means for this variable were almost the same on both measurement occasions and the standard deviations were not large, the variation in scores may have been too small to show a clear relationship over time. Reliability was also low for latency difference in the alerting task, and this variable was previously found to show a very small factor loading (De Jong et al., 2016a). The alerting system is thought to reflect the ability to achieve and maintain a state of alertness (Posner and Petersen, 1990). Whereas most variables of the alerting system mainly reflect sustained attention, latency difference specifically reflects the ability to achieve a state of alertness. The variation in achieving a state of alertness between two measurement moments apparently is large, and this ability seems to differ from the ability to maintain alertness. As we previously found that an extra analysis excluding the variables with non-significant factor loadings also resulted in a model with good fit indices, we kept these variables in our models for theoretical reasons, expecting this to have little influence on the measure of alerting (De Jong et al., 2015). For further study of the attention capacities of toddlers, the latent factors are considered to be of greater importance than the separate variables of the tasks.

Finally, the executive attention measure can be considered reliable as reliabilities of both variables were adequate to good.

A limitation of the current study is the small sample size (n = 12). Although in a previous study we found very similar results regarding split-half reliability in a pilot sample of 16 and the full sample of 196 children (De Jong et al., 2015, 2016b), further research with larger samples is needed to confirm our findings with respect to test-retest reliability.

In sum, reliability was adequate to good on at least one of the two methods (i.e., split-half reliability and test-retest reliability) for 11 out of 13 variables of the UTATE. This study showed that a combination of different types of reliability assessment provides a more complete picture of the reliability of an instrument, as one type of reliability measure does not suit every variable. For further studies we suggest using the three latent factors of orienting, alerting and executive attention capacities rather than the separate variables of the tasks that constitute those factors. Based on our findings we conclude that, overall, the UTATE is a reliable instrument.