Vocabulary Assessment With Tablets in Grade 1: Examining Effects of Individual and Contextual Factors and Psychometric Qualities

A tablet application was designed to assess children’s receptive vocabulary in French using a classical four-choice picture paradigm and 240 words which varied in word frequency. Results showed (1) an effect of socio-demographic zone, with lower correct response scores and longer reaction times for children in disadvantaged areas, (2) an effect of word frequency, with higher correct response scores and shorter reaction times for frequent words than for rare words, and (3) an effect of age and gender on correct responses in favor of girls and older children. More interestingly, an interaction effect on correct responses revealed that for rare words, the difference between girls and boys was higher, again in favor of girls, in the normal socio-demographic zones. We used an Item Response Theory analysis to examine the psychometric qualities of each item. This then allowed us to select two shortened equivalent versions of the test which were very closely matched to certain psychometric properties. In the same way as other reading-related skills assessed using new technologies (computer or tablet), receptive vocabulary with its two parameters of speed and accuracy can be integrated as an important component of reading ability.


INTRODUCTION
The last two decades of research have brought about an increasing interest in the use of mobile phones and touch-screen tablets for assessing, collecting, and developing early literacy skills (see Frank et al., 2016; Herodotou, 2018 for recent reviews). One of these literacy skills is vocabulary. Vocabulary knowledge underpins all language skills, and most specifically reading, and is considered to be an important component of academic success (Rice and Hoffman, 2015). Vocabulary is the knowledge an individual has about a word, and the retrieval of this knowledge is underpinned by processes controlling the speed of and access to this knowledge (Oakhill et al., 2012). Vocabulary is often conceptualized in two dimensions: receptive and expressive (see Pearson et al., 2007 for a review). Expressive vocabulary refers to known words that the individual is able to produce and use correctly in a context or sentence. Receptive vocabulary refers to words that an individual is able to understand, either orally or in writing. It is on the latter that we will focus in this article. In general, receptive vocabulary is larger than expressive vocabulary (i.e., individuals understand more words than they use; Pearson et al., 2007). Applications that test receptive vocabulary knowledge (i.e., understanding a word by reading or hearing it) are easy to develop (Neumann and Neumann, 2019), whereas expressive vocabulary assessment requires implementing a more complicated vocal recording procedure.
The use of digital tools seems to bring advantages in assessment. Indeed, computers or tablets are attractive to children due to their fun aspect (e.g., games), especially for children with learning difficulties or disabilities (see Marble-Flint et al., 2019 for a study in children with autism spectrum disorder). Moreover, these new technologies permit the standardized administration of the tasks and, more importantly, the automated collection of a series of data, namely response types (correct responses and errors) and response times (Frank et al., 2016; Neumann and Neumann, 2019). It is now generally acknowledged that two parameters, namely speed and accuracy, are important when assessing language skills (Oakhill et al., 2012, 2019; Richter et al., 2013). Children do not all learn at the same speed, do not all have the same vocabulary, and therefore do not all have the same language proficiency. These inter-individual differences may be due to different factors (e.g., gender, age, socio-economic level, or lexical frequency). The aim of this study was (a) to use a touchscreen tablet to examine the receptive vocabulary of French-speaking children in Grade 1 as a function of certain individual, contextual and word-related factors, and (b) to evaluate the psychometric properties of this test. We briefly present how receptive vocabulary is measured in the literature and then introduce a number of factors affecting vocabulary performance.

How to Assess Receptive Vocabulary?
Receptive vocabulary is traditionally assessed by means of paper-based tests using a four-choice picture paradigm in which children are asked to match a target word to one of four pictures. The most widely used and best known test of receptive vocabulary is the Peabody Picture Vocabulary Test (in American English, PPVT-4, Dunn and Dunn, 2007), which has been adapted for use in many other languages, such as British English (British Picture Vocabulary Scale, Dunn et al., 2009) and French ("Échelle de Vocabulaire en Images Peabody" [Peabody Picture Vocabulary Scale], Dunn et al., 1993). Recently, attention has turned to increasing the use of tablets for learning and assessing language skills, especially in children with language difficulties (Marble-Flint et al., 2019). This growing interest in tablet devices has made it possible to develop applications for assessing vocabulary using, for instance, a form of the picture-choice paradigm in multilingual children (Schaefer et al., 2019) as well as in monolingual English-speaking children (Schaefer et al., 2016). The authors pointed out several benefits of using digital tools to measure vocabulary, such as speed and ease of test administration for the experimenter, and ease of use and comprehension for the children. Another benefit of using digital tools in language assessment is the collection of accurate measures of reaction times and errors. These measures are important because they represent the efficiency of a process (e.g., Richter et al., 2013). Reaction time operationalizes how quickly an individual accesses information (see De Boeck and Jeon, 2019 for a review), and errors reflect the accuracy of the information (Oakhill et al., 2012; Richter et al., 2013). In addition, measuring these two indicators also helps determine response strategies, i.e., whether participants react quickly but make errors, or whether they react more slowly and make fewer errors.
Using digital tools to assess literacy skills is a new challenge.

What Factors May Affect Vocabulary Level?
Paper-based assessments of participants' individual vocabulary-related characteristics have been conducted (e.g., Dunn and Dunn, 1981, 1997; Fenson et al., 2007; Taylor et al., 2013; Rice and Hoffman, 2015). A gender effect has been shown, but there is no clear consensus about this. Thirty years ago, Dunn and Dunn (1981) noted that boys performed better than girls both on the original PPVT and on the PPVT-Revised. However, in a more recent study, girls were found to have a larger vocabulary size than boys (Fenson et al., 2007). A longitudinal study, from 2;6 to 21 years of age, showed a gender effect on vocabulary growth with an advantage for young girls which then leveled out with age before turning into an advantage for boys from ages 10 to 21 years (Rice and Hoffman, 2015). The gender effect seems to change as individuals become older. Researchers have not yet provided a clear explanation of this effect (see Rice and Hoffman, 2015, for an explanation in terms of hormones). However, it seems likely that it is partially explained by children's interest in language (e.g., as the interest in reading varies according to gender and/or age; see Hoff, 2006). Moreover, Rice and Hoffman (2015) showed an effect of age, with the rate of vocabulary acquisition increasing up to 12 years before slowing again. Vocabulary increases with age, and therefore with educational level (Taylor et al., 2013). Vocabulary and reading abilities have a bidirectional relation. Indeed, the more words individuals know, the better their reading comprehension is; and the more they read, the more their vocabulary grows (e.g., Oakhill et al., 2019). One of the main factors explaining vocabulary growth is thus reading practice. Moreover, reading comprehension involves knowing the meaning of words and retrieving this information accurately and quickly when reading. The richness of a person's vocabulary can therefore depend on the speed at which words are activated.
Finally, Taylor et al. (2013) examined receptive vocabulary development, measured with a short version of the PPVT-III (Dunn and Dunn, 1997), in a sample of 4,332 Australian children from 4 to 8 years. They found a negligible effect of gender, but showed that coming from a disadvantaged socioeconomic area (i.e., families with low socioeconomic status) was related to a lower rate of growth in receptive vocabulary. Indeed, the difference between children from families with high vs. low socioeconomic status derived from the quality of the conversation between them and the other members of their families (e.g., Hart and Risley, 1995; Fletcher and Reese, 2005). Children from families of a higher socioeconomic status hear more differentiated words (e.g., nouns, verbs, adjectives, etc.) than children from families with lower socioeconomic status (Hoff, 2006). Socioeconomic status affects children's language development in different ways.
Word characteristics, and especially word frequency, also affect vocabulary performance during assessments and need to be taken into account when considering vocabulary (Schaefer et al., 2016). Word frequency measures how often a word occurs in an individual's daily life. The more frequent a word is, the better it is known and, therefore, the more likely it is to be part of children's vocabulary. Including rare and frequent words in a test makes it possible to vary the difficulty of accessing the information about the words stored in the lexicon. The retrieval of the lexical representation of a frequent word is faster and more accurate than that of a rare word. Word knowledge also depends on reading time: The more exposed to reading children are, including shared book reading activities in younger children, the more their vocabulary will increase, and the more they will be able to redefine words they already know (e.g., Cunningham, 2005; Oakhill et al., 2019). In addition, the quality of the lexical representation of a word depends on the links between different levels of lexical representation: orthographic, phonological and semantic (Perfetti and Stafura, 2014). The stronger the links between the different levels of representation, the more precise the lexical representation. The impact of these four factors (gender, age, socio-economic status and word frequency) on vocabulary performance will be examined in the current study.

PRESENT STUDY
The receptive vocabulary of French-speaking children was assessed using tablets at the beginning of formal reading and writing instruction, i.e., in Grade 1. To address the first aim of the paper, we examined the factors which could impact vocabulary performance, such as the socio-demographic zones of schools (related to the socioeconomic status of the families), individual characteristics (gender, age), and word frequency. We operationalized older vs. younger children by their date of birth, which fell in either semester 1 or semester 2. We expected that children from schools situated in disadvantaged areas, boys and the younger children would achieve lower scores. Moreover, we also expected to observe an effect of word frequency, with the correct response scores decreasing and the response times increasing from frequent words to rare words. In addition, we examined the influence of distractors in the task using two levels of lexical representations: phonological and semantic. This last point will help determine where the children's difficulties lie, that is, whether the accuracy of the information is based more on semantic or phonological information. Finally, in order to examine the psychometric properties of the vocabulary test, an Item Response Theory (IRT) analysis was run to select the best items and then to construct a new version of the vocabulary test.
Thus, based on the most discriminating items, we will propose two shorter versions of the test.

METHODS

Participants
A total of 281 first graders (M age = 75.9 months; SD = 5.1; see footnote 1) took part in this study. They were assessed at the beginning of the school year (October). They were schooled in two zones (see footnote 2), one with major "specific educational needs," i.e., part of the so-called "Réseau d'Education Prioritaire" [Priority education network] (REP+; 3 schools; n = 102; 50 boys/52 girls), and the other outside of the REP (7 schools; n = 179; 86 boys/93 girls). All necessary consents were obtained from the parents and academic authorities.

Material and Procedure
A large number of words (240) were presented during two sessions of 30 min each (2 × 120 words). These consisted of 173 common nouns, 20 adjectives, and 47 verbs. They were selected from a set of available pictures and were divided into five categories (48 × 5 words) according to their frequency using the UG1 index from the Manulex database (Lété et al., 2004), going from the most frequent words in C1 to the least frequent words in C5. The children performed a traditional task in which they first heard a word and then saw four pictures on the screen (Figure 1). Three distractors were presented for each target word: one had a phonological unit (DPho; syllable or rime) in common with the target word, one was from the same semantic field as the target word (DSem), and the third was unrelated to the target word (strange item; DStr).
The children sat in front of the tablets wearing headphones. They heard a word and then had to touch the screen with their finger as soon as they wanted to respond (Figure 1). If they did not clearly hear the target word, they could ask to hear it again. Of the total responses, 5.94% were collected after a second hearing of the word. The children had 15 s to give each response. After that, a new word was proposed.
The target words were presented randomly and the response time for each response (RT) was recorded. All data (type of response and RT) were saved on a web server and then collected. The test reliability was high with Cronbach's α = 0.97.
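As an illustration of the reliability index reported above, the following minimal sketch computes Cronbach's α from a matrix of 0/1 item scores. The function name `cronbach_alpha` and the toy data are hypothetical; the authors' software and exact computation are not described in the text.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per child, one 0/1 score per item)."""
    k = len(scores[0])
    # Variance of each item across children.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total score across children.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): 5 children, 4 items.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 2))
```

Population variances are used for both the items and the total score; using sample variances throughout would give the same ratio.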
In order to eliminate aberrant response times (outliers), a two-step operation was performed for each participant before the analyses described below were performed: (1) response times that were over two standard deviations from the mean were replaced by the mean response time and (2) after replacement, a mean response time was recalculated.
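The two-step outlier treatment can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say whether the cutoff was applied to both tails or which standard deviation formula was used, so the sketch assumes a two-sided cutoff and the sample standard deviation. The function name `clean_rts` and the response times are hypothetical.

```python
from statistics import mean, stdev


def clean_rts(rts, n_sd=2.0):
    """Replace response times beyond n_sd standard deviations by the mean, then re-average."""
    m = mean(rts)
    sd = stdev(rts)  # assumption: sample SD, computed before replacement
    # Step 1: replace outliers (assumed two-sided) by the participant's mean RT.
    cleaned = [m if abs(rt - m) > n_sd * sd else rt for rt in rts]
    # Step 2: recompute the mean RT on the cleaned values.
    return cleaned, mean(cleaned)


# Hypothetical RTs (ms) for one participant, with one aberrant value.
rts = [1000.0, 1100.0, 900.0, 1050.0, 950.0, 1000.0, 1000.0, 1000.0, 1000.0, 5000.0]
cleaned, recalculated_mean = clean_rts(rts)
print(cleaned[-1], recalculated_mean)
```

In this toy case the initial mean is 1400 ms, only the 5000 ms response exceeds the cutoff, and the recalculated mean drops to 1040 ms.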
1 The age of each child was not recorded.
2 In the French educational system, schools are divided into three zones according to their socio-economic status and their children's learning difficulties. Two zones with "specific educational needs" are distinguished: REP and REP+, in which a majority of parents have lower incomes. In REP+, the percentage of manual workers and unemployed persons is the highest (74.3%) compared to that of REP (60%) and non-REP (37.8%) (data published in 2017 by the French Ministry of Education).

RESULTS
We first examined the effects of individual, contextual and word-related factors on the collected responses (correct responses and RTs) and types of errors using MANOVA, which takes into account the effects of the different independent variables on the combination of dependent variables. Then, we conducted IRT analyses to examine the psychometric properties of all the items and select the items that were sufficiently discriminating to assess receptive vocabulary. After selecting the best items, that is to say those with good properties, we constructed two shortened versions of the test which exhibited the same properties, in order to avoid a long test administration time.

Effects of Individual, Contextual and Textual Factors
Descriptive data are presented in Table 1. Two successive MANOVAs with the same design were run, one on the correct response scores and the second on the RT, with three between-subjects factors, Zone (REP+ vs. outside of REP), Gender (boys vs. girls) and Age (old, born in semester 1 vs. young, born in semester 2) and one within-subjects factor, Word Frequency (C1, C2, C3, C4, and C5).
Finally, we carried out a MANOVA on the types of errors, with a between-subjects factor Zone and a within-subjects factor Type of Error, for the three distractors: phonological (DPho), semantic (DSem) and strange (DStr). We expected a significant interaction between Zone and Type of Error, with the differences between the children in the two zones being greater for DSem and DPho. A significant effect of Zone was again revealed, F(1, 279) = 33.04, p < 0.0001, η² = 0.11, with the children from REP+ making more errors than their peers outside of REP (31.4 vs. 23.3). We also found a significant effect of Type of Error, F(2, 588) = 510.98, p < 0.0001, η² = 0.65, with the number of errors decreasing from DPho (32.6) to DSem (32.1) to DStr (17.4). There was no significant difference between the first two of these and the expected significant interaction was not found (p > 0.05). No other significant effects were found.

Psychometric Properties: An IRT Analysis
We conducted an IRT analysis using a two-parameter logistic model (2PL) to obtain the difficulty and discrimination coefficients of the items. Here, we present two types of curves, namely item characteristic curves (icc) in Figure 3A, and the test characteristic curve (tcc) in Figure 3B. In the first graph (icc), high-discrimination items are represented by a steep slope, whereas items with a flat slope discriminate poorly. A negative discrimination coefficient (see "luire," Figure 3A) indicates that the corresponding item was not informative for the purposes of the test (Baker, 2001). The second graph (tcc) covers all the items. It is obtained by adding, for each theta value (θ), the probabilities relative to all the items. Figure 3B presents three contrasted θ values corresponding to three expected scores (correct responses): the mean, the mean minus one standard deviation and the mean plus one standard deviation. The more accentuated curve on the left side (negative θ) shows that the test was more difficult for the children with a lower vocabulary level (expressed as the latent trait).
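To make the two curves concrete, the sketch below evaluates a 2PL item characteristic curve, P(θ) = 1 / (1 + exp(−a(θ − b))), and a test characteristic curve as the sum of these probabilities over items. It assumes the standard discrimination/difficulty parameterization (a, b), which may differ from the one used by the authors' software. The three discrimination values are those reported for Figure 3A; the difficulty values (all set to 0) are hypothetical placeholders, since the text does not report them.

```python
import math


def icc(theta, a, b):
    """2PL probability of a correct response for ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tcc(theta, items):
    """Expected number of correct responses: sum of item probabilities at theta."""
    return sum(icc(theta, a, b) for a, b in items)


# (discrimination, difficulty); difficulties are hypothetical placeholders.
items = [
    (2.45, 0.0),   # highest discrimination ("cadre")
    (1.03, 0.0),   # mean discrimination ("marmite")
    (-0.55, 0.0),  # negative discrimination ("luire")
]

for theta in (-1.0, 0.0, 1.0):
    probs = [round(icc(theta, a, b), 3) for a, b in items]
    print(theta, probs, round(tcc(theta, items), 3))
```

With a negative discrimination coefficient, the probability of a correct response decreases as θ increases, which is why such an item carries no useful information about vocabulary level.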

Selection of Items for Two New Versions of the Test
FIGURE 3 | (A) icc for three contrasted items as a function of their discrimination coefficients, indicating the lowest (α = -0.55; luire; shine), the highest (α = 2.45; cadre; frame) and a mean discrimination coefficient (α = 1.03; marmite; pot); (B) tcc with three expected scores (mean score m = 155, m - sd = 119, and m + sd = 191) and the corresponding latent trait scores (N = 281). Notes: icc, item characteristic curves; tcc, test characteristic curve; m, mean; sd, standard deviation.

A two-step process was used: (1) we calculated the point-biserial coefficient (r_pb) of each item and (2) we constructed two versions with items matched on their difficulty coefficients from the IRT analysis. Taking r_pb < 0.25 as our threshold, 42 items were discarded. These items were also those with the lowest discrimination coefficients (and obviously those with negative values). We then matched two series of items on the basis of their difficulty coefficients. We finally obtained two versions (2 × 99 items) of the test with very similar difference indexes (Table 2). Moreover, after conducting an IRT analysis for each version, we can observe that the two curves (Figure 4) have very similar slopes and that the theta values are also very similar for the average correct response scores in the two versions. Finally, with regard to the two tcc, we can confirm that versions A and B have the same properties.
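The two-step selection can be illustrated with the following sketch. The point-biserial coefficient is computed as the correlation between a 0/1 item score and the total score, and items below the 0.25 threshold are discarded. The matching rule shown here (sorting the retained items by difficulty and distributing adjacent pairs alternately) is only one plausible way to build two matched forms; the text does not specify the authors' procedure. All names and data are hypothetical.

```python
from statistics import mean, pstdev


def point_biserial(item, totals):
    """Correlation between a dichotomous (0/1) item and the total score."""
    ones = [t for x, t in zip(item, totals) if x == 1]
    zeros = [t for x, t in zip(item, totals) if x == 0]
    if not ones or not zeros:
        return 0.0  # no variance on the item: treated as uninformative
    p = len(ones) / len(item)
    return (mean(ones) - mean(zeros)) / pstdev(totals) * (p * (1 - p)) ** 0.5


def split_matched(difficulty):
    """Split items into two forms with adjacent difficulties paired across forms."""
    ordered = sorted(difficulty, key=difficulty.get)
    form_a, form_b = [], []
    # Walk through adjacent pairs; alternate which form gets the easier item.
    for i in range(0, len(ordered) - 1, 2):
        easier, harder = ordered[i], ordered[i + 1]
        if (i // 2) % 2 == 0:
            form_a.append(easier)
            form_b.append(harder)
        else:
            form_a.append(harder)
            form_b.append(easier)
    return form_a, form_b  # an unpaired last item is left out


# Step 1: point-biserial filter on toy data (5 children).
totals = [4, 3, 2, 1, 0]
item = [1, 1, 1, 0, 0]
r_pb = point_biserial(item, totals)
keep = r_pb >= 0.25

# Step 2: matched split on hypothetical difficulty coefficients.
difficulty = {"w1": -1.0, "w2": -0.9, "w3": 0.0, "w4": 0.1,
              "w5": 1.0, "w6": 1.2, "w7": 2.0}
form_a, form_b = split_matched(difficulty)
print(round(r_pb, 3), keep, form_a, form_b)
```

Alternating the assignment within pairs keeps the mean difficulty of the two forms close, which is the property the matched versions are meant to have.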

DISCUSSION
The aim of this paper was to use tablets to assess the receptive vocabulary level of French-speaking children at the beginning of Grade 1 as a function of individual, contextual and word-related factors and to examine the psychometric properties of this new test implemented on a tablet. We used a conventional, easy four-picture choice task.
We found an effect of educational zone, with children outside of REP performing better than those from REP+. The children in REP+ came from families with low socioeconomic status. They made more errors than the children from outside of REP and exhibited shorter RTs. This latter finding might seem surprising. Indeed, we might have expected the RTs of the children from lower socio-economic zones to be longer because their less developed vocabulary knowledge should have caused them to hesitate when responding. In the light of these unexpected faster RTs combined with more errors, we assume that the children in REP+ may ultimately have favored speed over accuracy. At the same time, the poorer receptive vocabulary abilities of the lower socioeconomic status children (i.e., REP+ in our study) are an expected result (see, for example, Taylor et al., 2013). According to Hoff (2006), this could result from differences in the interactions between these children and their parents. Families with low socioeconomic status might use a different language style (e.g., less complex grammar and syntax). Moreover, children from families with higher socioeconomic status read (with their parents) more often than those from families with lower socioeconomic status (e.g., Fletcher and Reese, 2005). As a result, children from families with higher socioeconomic status are more likely to understand and know words and develop language abilities, and more specifically improve their vocabulary knowledge (e.g., Hoff, 2006; Oakhill et al., 2019).
We also found an effect of age. The older children had lower RTs and higher correct response scores than their younger counterparts in the same school year. These results confirm that receptive vocabulary develops rapidly in early childhood (e.g., Taylor et al., 2013; Rice and Hoffman, 2015), especially at the beginning of formal education. Indeed, we show an age effect when only a few months separate the birth of children (i.e., born in the first half of the year vs. born in the second half of the year).
Furthermore, receptive vocabulary performance varies as a function of word frequency. RTs decreased and the number of correct responses increased with increasing word frequency. In this study, we used a frequency index calculated in G1 (Lété et al., 2004). The retrieval of the lexical representation of a frequent word is faster and more accurate than that of a rare word. When children with a low level of vocabulary read low-frequency words, then, even if they can decode the words they read, one of the dimensions of the words will be missing, i.e., the semantic representation (see Perfetti and Stafura, 2014), and this will hinder or prevent reading comprehension. Vocabulary should therefore be assessed as a core component of reading ability and word frequency is an important factor that needs to be taken into account when developing a corresponding assessment tool.
More interestingly, an interaction effect between gender, zone and word frequency was found. The word frequency effect was larger for girls than for boys as a function of socioeconomic zone (i.e., REP+ vs. outside of REP), with girls outperforming boys on rare words in normal zones (outside of REP). Our results are consistent with those of Fenson et al. (2007) and Rice and Hoffman (2015), showing that girls have better language abilities than boys (see Hoff, 2006, for a review). In our study, we did not have information about the reading practices of the children with their parents. That is to say, we were unaware of their "home literacy environment" (see Sénéchal et al., 2017). It might be worthwhile including this factor in future work because research shows that children with a rich home literacy environment (books, magazines, shared book-reading, etc.) are more likely to develop vocabulary (Hart and Risley, 1995). Our study confirms that receptive vocabulary knowledge depends on different factors, which may be individual (e.g., gender, age), contextual (e.g., socioeconomic zone), or related to word characteristics (e.g., word frequency).
The design of our test allowed us to observe how the types of error vary depending on the relation shared with the target item (e.g., semantic, phonological, or strange, i.e., not related to the target). We thus examined the types of errors made by the children and this enabled us to identify and understand their difficulties. The children made more "phonological" and "semantic" errors than "strange" errors. The "phonological" errors may be accounted for by an auditory attentional bias. The children might have made errors because they did not concentrate enough to respond accurately, or they might not have heard the word clearly. However, as we have pointed out, they were able to listen to each word again. The "semantic" errors are due more obviously to imprecise knowledge of the target word (see Perfetti and Stafura, 2014). Finally, the "strange" errors that we observed would seem to suggest that the target words were completely unknown.
Another important result of our study is that we can now propose two versions of a French receptive vocabulary test available in the form of a tablet application. Indeed, the initial test might have made use of too many items and was therefore time-consuming. We are now in a position to shorten this version to produce two truncated versions which could be used as complementary tools in further studies. For instance, one version could be used at the beginning of an interventional study, and the second at the end of the study in order to avoid or attenuate the classical test-retest effect. These two versions have the same good psychometric properties. The IRT analyses have shown that the discrimination coefficients are sufficiently high to identify children with low or high receptive vocabulary abilities.
Finally, using a tablet has various advantages, which have been repeatedly highlighted in the literature (Frank et al., 2016; Schaefer et al., 2016, 2019; Herodotou, 2018; Neumann and Neumann, 2019). Indeed, digital tools are generally popular with children and increase their motivation. Moreover, the receptive format test is easy to administer and easy to understand for children. It could also be used with atypical populations of children, e.g., those with an autism spectrum disorder (Marble-Flint et al., 2019), with specific language impairments, or with difficulties in learning to read in the case of children at the beginning of Grade 1. In addition, the use of computerized tools in language assessment allows for the collection of accurate response times and errors, which are two essential indicators to consider when estimating the efficiency of language processes (i.e., the speed of access to and accuracy of the information requested; e.g., Oakhill et al., 2012; Richter et al., 2013).
To conclude, our article provides a digital vocabulary assessment tool that is fun and easy to administer to children. We have provided two equivalent versions with several important items, which will allow the measurement of receptive vocabulary without any test-retest effect. We found effects of individual (i.e., gender, age), contextual (i.e., socioeconomic zone), and lexical (i.e., word frequency) variables. These variables should be considered in the development of future norms.

LIMITATIONS AND FURTHER RESEARCH
First, not all children have the same prior experience with tablets. Children who have already used a tablet are more likely to be confident in indicating their responses to the tests. It may therefore be necessary to take some time to familiarize children with the tablet (Frank et al., 2016). Second, we had no information about home language, the home literacy environment or maternal education. However, these factors could influence the level of receptive language (Schaefer et al., 2016; Sénéchal et al., 2017; see Hoff, 2006 for a review). Third, given that vocabulary has two dimensions, i.e., breadth and depth (Ouellette, 2006), the second of these will be developed in the next version of our test. A more complete version, including the two dimensions, would then be available for administration to older children in subsequent grades. Using the same linguistic material as is employed in computer-based assessments of reading, word reading (Auphan et al., 2019), and comprehension (Beauvais et al., 2018), vocabulary will be another important reading-related skill which can be examined in order to define the profiles and difficulties of readers. In such an approach, it will be possible to take account of both parameters, namely speed and accuracy. Of course, to obtain a good and reliable all-round tool for assessing both reading and vocabulary, all the tasks will need to be implemented on tablets. Furthermore, a comparison with a paper-and-pencil vocabulary test should be conducted to confirm the advantage of the computerized version (Neumann and Neumann, 2019). In addition, future work will propose a standardized norm of this vocabulary test associated with a reading test implemented on tablets. Another limitation is that the sample is composed solely of Grade 1 children.
Standards applicable to a larger population (i.e., at different levels and/or by age) should be the subject of future studies. This will then provide a standard of reference for researchers, practitioners, and teachers.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
JE, AM, LC, CG, and PA contributed to the design and development of the study. LC and CG provided access to the study population. JE organized the database and performed the statistical analysis. PA, AM, JE, and ED participated in the interpretation of the results, as well as in the choice of theory. ED wrote the introduction and the discussion parts of the manuscript. JE wrote the method and results parts of the manuscript. All authors contributed to the revision of the manuscript, read and approved the submitted version.