The Role of Statistical Learning and Verbal Short-Term Memory in Impaired and Typical Lexical Development

Purpose: Studies on the interface between statistical learning and language have focused mainly on its role in word segmentation and its association with grammar skills, while research on its role in lexical development is scarce. The current study explores whether and how statistical learning and verbal short-term memory are associated with lexical skills in typically developing (TD) German-speaking primary school children (Experiment 1) and Hungarian-speaking children with developmental language disorder (DLD, Experiment 2). Methods: We used the language-appropriate versions of the Peabody Picture Vocabulary Test to measure individual differences in vocabulary. Statistical learning skills were assessed with the Weather Prediction task, in which participants learn probabilistic cue-outcome associations based on item-by-item feedback. Verbal short-term memory span was assessed with the forward digit span task. Results: Hierarchical linear regression modelling was used to test the contribution of different functions to vocabulary size. In TD children, statistical learning skills had an independent contribution to vocabulary size over and above age, fluid intelligence, receptive grammatical abilities and short-term memory, whereas verbal short-term memory did not have an independent contribution. The pattern was reversed in DLD: vocabulary size was predicted by short-term memory skills over and above age, fluid intelligence, receptive grammar and statistical learning, whereas statistical learning had no independent contribution. Conclusion: Our results suggest that lexical development relies to different degrees on different underlying memory processes in typical development and in developmental language disorder. This qualitative difference is discussed in the light of different stages of lexical development, as well as the contribution of the different human memory systems to vocabulary acquisition.


INTRODUCTION
Our environment serves as a rich source of statistical information. We face a huge number of incoming stimuli, some of them random, but most of them organized by underlying patterns. These patterns are not always easy to detect. Their extraction is supported by a domain-general learning mechanism: statistical learning (SL), the ability to identify structure and patterns based on distributional information in the environment (Saffran et al., 1996; Frost et al., 2015). Statistical learning has been associated with a wide variety of cognitive skills and mechanisms, including reading (Arciuli and Simpson, 2012; Schmalz et al., 2019), language development (Weiss et al., 2010; Hsu and Bishop, 2014), language processing (Conway et al., 2010), computational thinking (Kandlhofer et al., 2019), music cognition (Pearce, 2018) and numerical cognition (Levy et al., 2020). The current study focuses on the role of statistical learning in lexical development, and tests how statistical learning and verbal short-term memory skills contribute to vocabulary knowledge in typically developing children (Experiment 1) and children with developmental language disorder (Experiment 2).
The contribution of statistical learning to language acquisition has always been among the key topics of statistical learning research. The first study to use the concept of statistical learning was designed to understand what role linguistic experience plays in word segmentation (Saffran et al., 1996). Later research diverged in various directions in both lexical and grammatical acquisition. Studies addressed how adjacent and non-adjacent dependencies are acquired (Uddén et al., 2012; Gervain and Werker, 2013), how specific structural information is learned (Wonnacott et al., 2008; Wonnacott, 2011) vs. how general patterns are extracted from repeated exposure (Friederici et al., 2006; Saffran, 2002; for a review, see Gomez and Gerken, 2000).
Lexical acquisition relies on distributional information and is supported by statistical learning on many levels. Statistical learning helps reorganize the initial phoneme space of infants to fit the phoneme inventory of their mother tongue (e.g., Kuhl, 2000; Werker and Curtin, 2005), which is a prerequisite of effective word learning. The cornerstone study by Saffran et al. (1996), which motivated a wealth of later research on segmentation, tested whether infants can identify word boundaries in continuous sound streams based on statistical information alone: differences in transitional probabilities between syllables within words vs. across word boundaries. The extraction of language-specific distributions of phonemes (phonotactics) also supports segmentation (e.g., Jusczyk et al., 1994). Beyond establishing word boundaries, distributional information is also useful in mapping words onto their referents and in resolving referential ambiguity. There are usually multiple candidate referents available in the environment for a novel word, and repeated exposure with cross-situational correlations between words and referents is required to resolve the ambiguity (Yu and Smith, 2007; Scott and Fisher, 2012; Monaghan et al., 2015; Rebuschat et al., 2021). The seminal word segmentation task by Saffran et al. (1996) modelled how infants extract word-like units from speech. In this task, participants (infants, children and adults; see e.g., Aslin et al., 1999) are exposed to a continuous stream of CV syllables (each composed of a consonant and a vowel). Unknown to the participants, the syllables are ordered into triplets. Within a triplet, the probability of syllable transition is high, whereas between-triplet transitions have a low probability. Infants are able to differentiate between triplet units (with both transitions being high) and triplet fragments (one transition being high, the other one low; Saffran et al., 1996).
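To make the transitional-probability cue concrete, the following Python sketch builds a continuous syllable stream from a few trisyllabic nonsense words and estimates TP(x → y) as the frequency of the pair xy divided by the frequency of x. The word list, the stream length and the rule that no word follows itself are illustrative assumptions, not the stimuli or design parameters of Saffran et al. (1996).

```python
import random
from collections import Counter

def make_stream(words, n_words, seed=0):
    """Concatenate randomly chosen trisyllabic words into one continuous syllable stream.
    Assumption (illustrative only): the same word never occurs twice in a row."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in words if w != prev])
        stream.extend(word)
        prev = word
    return stream

def transitional_probabilities(stream):
    """TP(x -> y) = count of the pair (x, y) / count of x as the first member of a pair."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}

# Hypothetical CV-syllable nonsense words (not the original stimuli)
words = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku"), ("pa", "do", "ti")]
stream = make_stream(words, n_words=300)
tps = transitional_probabilities(stream)

print(tps[("tu", "pi")])           # within a word: 1.0
print(tps.get(("ro", "go"), 0.0))  # across a word boundary: roughly 1/3
```

Within-word transitions come out at 1.0 because each syllable occurs in only one word, while transitions across word boundaries stay around 1/3 with four words; this dip in transitional probability is the statistical cue to a word boundary.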
Beyond the extraction of word boundaries, SL also operates in establishing object-label mappings, as demonstrated by studies using the cross-situational word learning paradigm. In cross-situational word learning, participants are exposed to a number of novel objects and their corresponding labels at the same time, without any explicit mapping between objects and labels. After all labels have been presented, a new set of objects is presented with new labels. Participants have to identify which label corresponds to which object across a number of sets, in which the same object-label pair appears along with different other objects and labels. This word learning mechanism has been shown to operate in infants (Yu and Smith, 2007; Yu, 2008, 2013), children (Suanda et al., 2014) and adults (Yu and Smith, 2007; Vouloumanos, 2008).
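The logic of cross-situational learning can be illustrated with a minimal co-occurrence counting learner: every label heard on a trial is credited with every object in view, and after several trials each label is mapped to the object it co-occurred with most often. This is a simplified sketch under our own assumptions (hypothetical labels and objects, pure frequency counting); it is not the model used in the studies cited above.

```python
from collections import Counter, defaultdict

def cooccurrence_learner(trials):
    """trials: list of (labels, objects) pairs with no explicit label-object pairing.
    Returns, for each label, the object it co-occurred with most often."""
    counts = defaultdict(Counter)
    for labels, objects in trials:
        for label in labels:
            for obj in objects:
                counts[label][obj] += 1  # credit every object present on this trial
    return {label: objs.most_common(1)[0][0] for label, objs in counts.items()}

# Hypothetical trials: each one on its own is ambiguous (two labels, two objects)
trials = [
    ({"dax", "wug"}, {"ball", "cup"}),
    ({"dax", "blick"}, {"ball", "shoe"}),
    ({"wug", "blick"}, {"cup", "shoe"}),
]
mapping = cooccurrence_learner(trials)
print(mapping)
```

No single trial disambiguates the labels, but across the three trials each correct pairing (dax with ball, wug with cup, blick with shoe) co-occurs twice, whereas each spurious pairing co-occurs only once.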
Few studies have addressed the role statistical learning plays in vocabulary growth; most of them had a different primary focus and were only tangentially related to our research question. A modelling study by Yu (2008) examined how lexical knowledge at 2 years of age contributes to later statistical vocabulary learning. Results showed that the more words a child stores in their vocabulary, the easier it is to use statistical learning in further vocabulary acquisition. That is, while statistical learning in fact contributed to vocabulary acquisition, this effect was moderated by vocabulary knowledge. A further large-scale study by Spencer and colleagues (Spencer et al., 2015) addressed how statistical learning on a word segmentation task is related to written language skills in 4- to 10-year-old children. While they primarily focused on literacy skills, the finding most relevant to the current study is that word segmentation was associated with several vocabulary measures.
Along with statistical learning, comprehensive models of vocabulary acquisition also emphasize the role of other skills, such as short-term memory and categorization, in the acquisition of lexical knowledge (Tomasello, 2000; Hirsch et al., 2016). In the following paragraphs, we briefly review how these cognitive abilities contribute to lexical development.
The associations between verbal short-term memory capacity and early language acquisition are well documented (Gathercole, 1995, 2000; Adams, 1996; Baddeley et al., 1998). Gathercole and Baddeley (1993) specifically argued that the main function of the phonological loop is to store new phonological forms for learning new words. In line with this proposal, vocabulary development has been found to be associated with verbal short-term memory in both monolingual and bilingual children (Masoura and Gathercole, 2005; Gathercole, 2006; Verhagen and Leseman, 2016). The significance of verbal short-term and working memory deficits as clinical markers of developmental language disorder is also well known (Gathercole and Baddeley, 1990; Bishop et al., 1996; Botting and Conti-Ramsden, 2001; Conti-Ramsden et al., 2001).
Category learning is important in every aspect of language learning, from the acquisition of speech sound categories through the emergence of syntactic categories to forming conceptual categories and labelling them. In many cases, categorization processes are probabilistic and rely on statistical information available to the learner. Previous studies have addressed how categorization is associated with vocabulary growth and found that both knowledge of (Borovsky et al., 2016) and interest in (Ackermann et al., 2020) a category domain help learners acquire new words related to that category. Individual differences in the early spurt of vocabulary growth have been linked with differences in when toddlers become able to exhaustively sort objects into categories (Gopnik and Meltzoff, 1987). Similarly, categorization performance in infancy has been found to predict later vocabulary (Ferguson et al., 2015).
In this paper, we focus on how statistical category learning contributes to vocabulary knowledge. As discussed above, there is ample evidence for the association between developmental language disorder and verbal short-term memory. This strong evidence is complemented by sporadic evidence suggesting positive (Andrade and Baddeley, 2011) as well as negative associations (Virag et al., 2015; Conway, 2020) between statistical learning and short-term/working memory. Given these inconsistent findings, we extended our focus to the relative contribution of statistical category learning and short-term memory to vocabulary knowledge in typically developing school-age children and in children with developmental language disorder.
To this end, we used the Weather Prediction Task, a probabilistic category learning paradigm (Knowlton et al., 1994; Lukács and Kemény, 2015). In this task, participants are shown various combinations of geometric shapes and have to decide whether they predict sunshine or rain. At the beginning of the task, participants are unaware that each cue is probabilistically associated with one of the outcomes. Two of the cues have strong predictive value (each is associated with one of the outcomes in 85.7% of its appearances), while the other two are weaker predictors (each leads to one outcome in 70% of its appearances). Participants are not informed about the distributional nature of the task, but receive feedback after each decision. To achieve optimal performance, participants not only have to identify the association between each cue and outcome, but also have to combine the predictive values of cues that are present at the same time. We expected statistical learning efficiency on the Weather Prediction Task to be associated with individual variance in vocabulary size over and above age, grammar skills, fluid intelligence and verbal short-term memory. This association is examined in typically developing primary school children (Experiment 1) and children with developmental language disorder (Experiment 2).
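What it means to combine the predictive values of co-present cues can be illustrated with an idealized calculation. The sketch below sums the log-odds of the single-cue validities reported for this task, which is the normative combination only under assumptions we add here: the cues are conditionally independent given the outcome, sunshine and rain are equally frequent, and absent cues are ignored. The actual outcome probabilities of each cue combination are those listed in Table 2 and need not match these idealized values; the names `CUE_VALIDITY` and `p_sunshine` are hypothetical.

```python
import math

# Single-cue validities as described for the task: P(sunshine | cue present)
CUE_VALIDITY = {"cue1": 0.857, "cue2": 0.70, "cue3": 0.30, "cue4": 0.143}

def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1 - p))

def p_sunshine(present_cues, validity=CUE_VALIDITY):
    """Idealized (naive-Bayes-style) combination: sum the log-odds of the present cues.
    Assumes conditional independence, equal base rates, and ignores absent cues."""
    total = sum(logit(validity[c]) for c in present_cues)
    return 1 / (1 + math.exp(-total))

print(round(p_sunshine(["cue1"]), 3))          # 0.857
print(round(p_sunshine(["cue1", "cue2"]), 3))  # 0.933: two sunshine cues reinforce each other
print(round(p_sunshine(["cue1", "cue4"]), 3))  # 0.5: opposing cues cancel out
```

The point of the example is qualitative: a learner who only tracks single cues cannot tell that two sunshine cues together are more predictive than either alone, or that a strong sunshine cue and a strong rain cue together leave the outcome uncertain.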

Participants
Altogether 50 children participated in Experiment 1; three children were later excluded from the analyses due to missing data, resulting in a final sample of 47 participants. Descriptive statistics of the participants are provided in Table 1. All children were native speakers of German and were considered monolingual. Children were recruited from first and third grades of primary schools in and around Graz, Austria. All children were tested individually in their schools. Parents provided written informed consent in accordance with the stipulations of the Institutional Ethical Committee and the Declaration of Helsinki.

Procedure
All children were tested individually in quiet rooms in their schools. Tasks were administered in random order in a single session lasting at most 1 h. Children could take self-paced breaks between tasks.

Tasks
Vocabulary. We used the standardized German version of the Peabody Picture Vocabulary Test (PPVT-4, Lenhard et al., 2015) to assess children's vocabulary knowledge. In this test, children see four pictures while hearing a single word and have to point to the picture that matches the word. We used the raw score of the PPVT-4, which is the number of the last item answered correctly minus the number of errors.
Statistical Learning. The short version of the Weather Prediction Task (Gluck et al., 2002; Kemény and Lukács, 2010) was used as a measure of statistical learning. The Weather Prediction Task is a probabilistic categorization paradigm. Participants see one, two or three of four possible cues and have to decide whether the weather will be sunshine or rain. Cues are simple geometric shapes: a square, a triangle, a pentagon and a rhombus. Since these shapes have no common associations with weather conditions, participants are initially expected to guess. Immediately after each choice, the correct outcome (sunshine or rain) is revealed. Unknown to the participants, each cue is associated with the outcome with a pre-set probability: Cue1 (square) predicts sunshine in 85.7% of its appearances, Cue2 (triangle) in 70%, Cue3 (pentagon) in 30%, and Cue4 (rhombus) in 14.3%. In all other cases, the cue is associated with rain. Participants are not informed about these predictive values; they are expected to learn the contingencies from feedback. Participants complete one block of 50 trials. The frequency of each cue combination and its outcome probability are provided in Table 2. Statistical learning performance is measured as the percentage of trials with correct answers, i.e., trials on which the participant chose the more probable outcome.
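The scoring rule can be sketched as follows. The function is hypothetical: it assumes each trial is recorded as the programmed probability of sunshine for the cue combination shown (as in Table 2) together with the child's response, and it skips any combination with a probability of exactly 0.5, for which neither outcome is more probable (the text does not say whether such combinations occur).

```python
def wpt_percent_correct(trials):
    """Percentage of trials on which the more probable outcome was chosen.

    trials: iterable of (p_sun, response) pairs, where p_sun is the programmed
    probability of sunshine for the cue combination shown and response is
    "sun" or "rain". Trials with p_sun == 0.5 are skipped (an assumption).
    """
    scored = correct = 0
    for p_sun, response in trials:
        if p_sun == 0.5:
            continue  # no more probable outcome to score against
        optimal = "sun" if p_sun > 0.5 else "rain"
        scored += 1
        if response == optimal:
            correct += 1
    return 100 * correct / scored if scored else float("nan")

# Hypothetical responses to four trials
example = [(0.857, "sun"), (0.3, "sun"), (0.143, "rain"), (0.7, "sun")]
print(wpt_percent_correct(example))  # 75.0
```

Scoring against the more probable outcome rather than against the feedback actually shown means that a child who always chooses optimally scores 100%, even though the probabilistic feedback sometimes contradicts the choice.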
Verbal Short-Term Memory (STM). We used a Digit Span task to assess verbal STM (Racsmány et al., 2005). Participants heard a sequence of digits and had to repeat them in the same order. Digits were presented at a rate of one per second. Sequences started at three items and increased in length from block to block. For each length, four sequences were presented in a block. If a participant correctly repeated at least half of the sequences (i.e., two out of four), the task continued with the next sequence length. Verbal STM span was defined as the longest sequence length the participant was able to repeat.
Grammar Skills. Since previous studies have shown that 1) statistical learning is associated with grammar skills (Misyak and Christiansen, 2012; Kidd and Arciuli, 2016), 2) syntactic information supports meaning acquisition (Fisher et al., 1994) and 3) lexical and grammatical development are closely related (Dale et al., 2000; Moyle et al., 2007; but see Brinchmann et al., 2019), we controlled for grammar skills when examining the association between vocabulary and statistical learning. We used the standardized German version of the Test for the Reception of Grammar (TROG-D, Fox, 2016) to assess grammar skills. As in the PPVT-4, children see four pictures, hear one utterance, and have to match the utterance with one of the pictures. The TROG-D is composed of 21 blocks of increasing syntactic complexity, each containing four items. A block is considered passed if the participant responds correctly to at least three of its four items. We used the raw score, i.e., the number of passed blocks, as the measure of grammar skills.
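Both the digit span stopping rule and the TROG-D block criterion can be written down as small functions. This is a sketch under our own assumptions: responses are stored per block as lists of True/False values, digit span blocks are consecutive lengths starting at three, and a child who fails the first block receives a span of 0 (the text does not specify this case). The function names are hypothetical.

```python
def digit_span(blocks):
    """blocks: dict mapping sequence length -> list of booleans (one per sequence).
    A block is passed if at least half of its sequences were repeated correctly;
    testing stops at the first failed block, and the span is the longest passed length."""
    span = 0
    for length in sorted(blocks):
        results = blocks[length]
        if 2 * sum(results) >= len(results):
            span = length
        else:
            break
    return span

def trog_raw_score(blocks):
    """Number of TROG blocks with at least three of their four items correct."""
    return sum(1 for items in blocks if sum(items) >= 3)

span = digit_span({3: [True, True, False, True],
                   4: [True, False, True, False],
                   5: [False, False, True, False]})
score = trog_raw_score([[True, True, True, False],
                        [True, True, False, False],
                        [True, True, True, True]])
print(span, score)  # 4 2
```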
Fluid Intelligence. Vocabulary is part of crystallized intelligence (Ullstadius et al., 2002; Kaufman et al., 2015), and fluid intelligence is argued to serve as a basis for crystallized intelligence (Cattell, 1987; Rindermann et al., 2010). Consequently, we also included fluid intelligence, measured with Raven's Colored Matrices (Raven et al., 1987), as a control variable. In each item, participants see a picture with a missing part and choose the missing piece from six options by pointing at it. There are 36 items altogether; we used the raw score, i.e., the number of correct responses.

Results
First, we tested whether the group performed above chance on the statistical learning task, using a one-sample t-test on mean statistical learning performance with 0.5 as the test value. The mean score of the group (M = 60.64%, SD = 14.06%) was significantly above chance level (50%), t(46) = 5.186, p < 0.001.
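To check the test statistic, the one-sample t value can be recomputed from the reported summary statistics as t = (M − 50) / (SD / √n), with n − 1 degrees of freedom. The sketch below does this in Python with the Experiment 1 values; with the raw scores, `scipy.stats.ttest_1samp` would return the same statistic together with a p value.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """One-sample t statistic and degrees of freedom from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

# Experiment 1 summary statistics as reported (percent correct, n = 47)
t, df = one_sample_t(mean=60.64, sd=14.06, n=47, mu0=50.0)
print(f"t({df}) = {t:.2f}")  # t(46) = 5.19
```

The result agrees with the reported t(46) = 5.186 up to rounding of the mean and standard deviation.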
Second, we correlated the target variables (statistical learning performance and verbal short-term memory) and the control variables (age, grammar skills and fluid intelligence) with the vocabulary measure. Details of the correlations are provided in Table 3. Vocabulary correlated significantly with statistical learning, r(46) = 0.421, p = 0.003, but not with verbal short-term memory, r(46) = 0.136, p = 0.362. Among the control measures, vocabulary correlated significantly with age, r(46) = 0.531, p < 0.001, grammar skills, r(46) = 0.546, p < 0.001, and fluid intelligence, r(47) = 0.467, p < 0.001.
Third, we assessed the individual contribution of statistical learning and verbal short-term memory to vocabulary over and above age, grammar skills and fluid intelligence. We conducted two hierarchical linear regressions, both with vocabulary as the dependent variable. In the first regression analysis, we entered the control variables Age, Fluid Intelligence and Grammar Skills as well as Verbal STM in Step 1, and additionally Statistical Learning in Step 2. The two models were compared with an analysis of the R² change, which provides the individual contribution of Statistical Learning to vocabulary over and above the controlled variables. In the second regression analysis, we tested the individual contribution of Verbal STM to vocabulary: we entered the control measures and statistical learning in Step 1, and Verbal STM in Step 2, and again report the analysis of the R² change. Details of the regression models and coefficients are presented in Table 4.
Statistical Learning and Vocabulary. Most importantly, a significant regression equation was found in both steps, F(4, 46) = 10.944, p < 0.001 for Step 1, and F(5, 46) = 13.311, p < 0.001 for Step 2. There was a significant R² change in Step 2, F(1, 41) = 11.663, p = 0.001, ΔR² = 0.108. The coefficients of Grammar skills and Statistical Learning were significant in the Step 2 model, t(46) = 4.871, p < 0.001, β = 11.808 for Grammar skills and t(46) = 3.415, p = 0.001, β = 55.141 for Statistical Learning. Verbal STM was not a significant predictor, t(46) = 0.421, p = 0.676, β = 1.190.
Table note (partially recovered): […] Lukács et al., 2013). c Percent of correct predictions on the Weather Prediction task (Kemény and Lukács, 2010). d Raw score on Raven's Colored Matrices (Raven et al., 1987). e Number of correct blocks on the Digit Span task (Racsmány et al., 2005).
Table 2 note. The first column (Cues) shows which cues are present in a given combination: A is cue1, B is cue2, C is cue3, D is cue4. Frequency is the number of appearances within a block of 50 trials. The third column provides the probability that the given cue or combination leads to sunshine.
Frontiers in Communication | www.frontiersin.org July 2021 | Volume 6 | Article 700452

Discussion
Experiment 1 suggests that statistical learning abilities, more specifically probabilistic categorization skills, are significant factors in vocabulary development, explaining an additional 10.8% of the variance in receptive vocabulary over and above age, grammar skills, fluid intelligence and verbal short-term memory. On the other hand, verbal STM did not have an independent contribution to predicting differences in receptive vocabulary in school-age typically developing children. These results provide evidence that statistical learning, more specifically probabilistic categorization, is an important factor in lexical development. This is in line with previous suggestions, reviewed in the introduction, about the relationship between word learning and statistical learning: objects and their labels co-occur numerous times, but noise and incongruent mappings have to be discounted (e.g., Roembke et al., 2018).
Verbal short-term memory was not associated with vocabulary size in our study. This contrasts with previous findings on the role of verbal STM in language development. Previous studies observed associations both between verbal short-term memory and vocabulary (Baddeley et al., 1998) and between verbal short-term memory and statistical learning (e.g., Misyak and Christiansen, 2012). One possible explanation is methodological: verbal STM spans may not vary enough between individuals, and this small variance may account for the lack of association with other skills. However, this is unlikely, since span tasks have generally been developed to reflect individual differences (Daneman and Carpenter, 1980; Engle et al., 1992).
The lack of association between verbal short-term memory and statistical learning also contrasts with previous findings, but it may be explained by the choice of tasks. On the one hand, most studies used phonological short-term memory as a measure of verbal STM (among others: Andrade and Baddeley, 2011; Gathercole and Baddeley, 1990; Masoura and Gathercole, 2005). Our use of the digit span task may explain the lack of association, as this task loads more on semantic/declarative information than on phonological processes. On the other hand, we also used an atypical task to assess statistical learning. Although the Weather Prediction task clearly relies on distributional information, it has not been widely used as a statistical learning task. Most statistical learning tasks focus on the acquisition of sequential information, where the frequencies of transitions between elements are central (e.g., Saffran et al., 1996). Misyak and Christiansen (2012) examined the associations between statistical learning, working and short-term memory, and language learning, and found that only statistical learning of adjacent dependencies was related to verbal short-term memory; no association was observed with statistical learning of non-adjacent dependencies. That is, even for sequential learning, task characteristics had an important impact on the association with short-term memory.
Several previous neuropsychological studies have used the Weather Prediction task to assess learning in patients with brain injury (Knowlton et al., 1994; Hopkins et al., 2004). A number of these studies focused on patients with anterograde amnesia, that is, patients who have difficulty acquiring new knowledge (Knowlton et al., 1994; Knowlton et al., 1996). Results of several studies have shown that despite a serious deficit in verbal short-term memory, amnesic patients with medial temporal lobe damage performed at the level of control participants (Knowlton et al., 1994; Knowlton and Squire, 1993; but see Hopkins et al., 2004; Zaki, 2005). Such neuropsychological studies argue for the independence of short-term memory functions and statistical learning performance, at least in the case of the Weather Prediction task. Despite the use of a non-sequential statistical learning task that is relatively independent of verbal short-term memory, we still found a positive association with language measures.
Overall, Experiment 1 suggests that statistical learning on a probabilistic category learning task is associated with vocabulary. The aim of Experiment 2 is to assess this association in developmental language disorder, as both statistical learning and vocabulary measures are reduced in this population.

EXPERIMENT 2
Studying cognitive functions in atypical linguistic development can also contribute to our understanding of the cognitive bases of linguistic competence. Developmental Language Disorder is a neurodevelopmental disorder characterized by below-age-level spoken language comprehension and expression (Leonard, 2014; McGregor et al., 2020), while other cognitive functions are relatively spared (Leonard, 2014). Developmental language disorder shows high comorbidity with reading impairment and some other developmental disorders (Young et al., 2002), such as developmental coordination disorder (Beitchman et al., 1996) and ADHD (Hill, 2001). A number of theoretical accounts have been proposed to explain the core deficit underlying DLD. These explanations range from language-specific impairments to domain-general deficits that explain both the core linguistic problem and co-occurring problems in other domains (Ullman and Pierpont, 2005; Leonard, 2014). In the current study, we focus on two accounts that suggest core deficits in phonological working memory (Gathercole and Baddeley, 1990) and in statistical learning (Evans et al., 2009).
Phonological working memory was among the first candidates proposed to explain linguistic deficits (Gathercole and Baddeley, 1990). Children with developmental language disorder have significantly shorter spans than their typically developing peers. Phonological working memory was also found to constrain vocabulary acquisition, at least in typical development (Gathercole and Baddeley, 1993). Later studies proposed that short-term or working memory deficits are clinical markers of developmental language disorder, and also assumed that reduced STM capacity plays a direct role in the impaired acquisition of structural aspects of language (Adani et al., 2014; Friedmann et al., 2009; Stavrakaki, 2020; Stavrakaki and Van der Lely, 2010; but see Bishop, 2006). Accordingly, we expect verbal short-term memory to play a stronger role in vocabulary acquisition in developmental language disorder than in typical development.
Similarly, while numerous studies have shown a robust statistical learning deficit in children with DLD across different tasks and domains (Gabriel et al., 2012; Hedenius et al., 2011; Kemény and Lukács, 2010; Lukács and Kemény, 2014; Lum et al., 2010; Lum et al., 2012; for meta-analyses, see Lum et al., 2014; Lammertink et al., 2017; Obeid et al., 2016), most of these studies focused on the learning deficit itself, and results on its association with lexical knowledge are few. One exception is the study by Evans and colleagues (2009), who argue that statistical learning is a central factor in lexical development. They found a positive relationship between vocabulary size and statistical learning abilities (in a word segmentation task) in typically developing children, but no such association in children with developmental language disorder. An association was found, however, when the length of training was increased (Experiment 2 of Evans et al., 2009). Haebig et al. (2017) tested two key mechanisms of word learning, statistical learning and fast mapping, in children with developmental language disorder, children with Autism Spectrum Disorder (ASD), and typically developing (TD) children. Children with DLD showed impaired statistical learning performance and no association between segmentation skills and word learning abilities, whereas children with ASD and TD children showed comparable levels of statistical learning, as well as an association between statistical learning and word learning. In sum, we expect statistical learning to play a smaller role in the vocabulary acquisition of children with developmental language disorder. Experiment 2 addresses the same questions with the same methods as Experiment 1 in a group of monolingual Hungarian-speaking children with developmental language disorder. Based on previous results, we expected to see no association between statistical learning and vocabulary size in DLD (in accordance with Evans et al., 2009; Haebig et al., 2017).
On the other hand, we expect a stronger association between verbal short-term memory and vocabulary. This expectation is supported by the observation that children with developmental language disorder are slower in their vocabulary acquisition (and in language acquisition more generally), and verbal short-term memory skills may be stronger predictors of lexical development in earlier stages/at smaller vocabulary sizes.

Participants
Altogether 45 children were included in the Developmental Language Disorder group; one child had to be excluded due to missing data. Descriptive statistics are provided in Table 1. All children were native speakers of Hungarian and were considered monolingual. Children were recruited from two special schools for children with language impairment, to whose groups and classes they had been referred by speech and language therapists working in clinical practice. In each institution, recruitment took between 2 and 3 months. No eligible children declined participation. All children met the inclusion and exclusion criteria for DLD that are standardly used in research (Leonard, 1997; Tager-Flusberg, 2000). Each child scored above 85 on the Raven Coloured Progressive Matrices (Raven et al., 1987), a measure of fluid intelligence. No child had a hearing impairment or a history of neurological impairment, and no child in the DLD group had any known comorbidities. Each child scored at least 1.25 SDs below age norms on at least two of the four language tests administered. The four tests included two receptive tests, the Hungarian standardizations of the Peabody Picture Vocabulary Test (PPVT, Csányi, 1974) and of the Test for Reception of Grammar (TROG-H, Lukács et al., 2012; Lukács et al., 2013), and two expressive tests, the Hungarian Sentence Repetition Test (Magyar Mondatutánmondási Teszt, MAMUT, Kas and Lukács, 2011) and a nonword repetition test (Racsmány et al., 2005). Table 5 provides further information about the tests on which children with DLD were significantly below age expectations. All children were tested with the informed consent of their parents, in accordance with the principles set out in the Declaration of Helsinki and the stipulations of the local Institutional Review Board.

Methods and Procedure
Methods and procedure were equivalent to those of Experiment 1, except that Hungarian language tests were used: the Hungarian adaptation of the Test for the Reception of Grammar (Lukács et al., 2012) and the Hungarian version of the Peabody Picture Vocabulary Test (Csányi, 1974). Apart from the language of the materials, the tasks, procedures and calculation of raw scores were identical.

Results
As in Experiment 1, we first conducted a one-sample t-test on statistical learning performance with 50% (chance level) as the test value. Children with DLD scored significantly above chance, t(43) = 4.245, p < 0.001, with a mean performance of 57.13% (SD = 11.27%).
We computed bivariate correlations between vocabulary and the target variables (statistical learning and verbal STM), as well as between vocabulary and the control variables (age, grammar skills and fluid intelligence). Details of correlations are provided in Table 3. Vocabulary correlated significantly with verbal short-term memory, r(43) = 0.445, p = 0.002, but not with statistical learning, r(43) = −0.165, p = 0.287. Considering the control variables, vocabulary correlated significantly with age, r(43) = 0.509, p < 0.001, and grammar skills, r(43) = 0.374, p = 0.012, but not with fluid intelligence, r(43) = −0.050, p = 0.746.
Statistical Learning and Vocabulary. As in Experiment 1, we used hierarchical linear regression with Vocabulary as the dependent variable. Age, Fluid Intelligence, Grammar Skills and Verbal STM were entered in Step 1, and Statistical Learning was additionally entered in Step 2. Details of the hierarchical regression are provided in Table 4. Both steps resulted in significant models, F(4, 43) = 8.682, p < 0.001 for Step 1 and F(5, 43) = 7.502, p < 0.001 for Step 2. Unlike in Experiment 1, entering statistical learning scores did not significantly increase the explained variance in vocabulary, F(1, 38) = 1.941, p = […].
Verbal STM and Vocabulary. We conducted a hierarchical linear regression analysis with Vocabulary as the dependent variable. Age, Fluid Intelligence, Grammar Skills and Statistical Learning were entered in Step 1, and Verbal STM in Step 2. Again, both steps resulted in significant models, F(4, 43) = 6.882, p < 0.001 for Step 1 and F(5, 43) = 7.502, p < 0.001 for Step 2. Unlike in Experiment 1, entering Verbal STM significantly increased the explained variance in vocabulary, F(1, 38) = 6.264, p = 0.017, ΔR² = 0.083.
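The R² change test used in both experiments compares the Step 1 model with the Step 2 model through F = (ΔR² / Δk) / ((1 − R²_full) / (n − k_full − 1)), where k is the number of predictors in each model. The Python sketch below fits both steps by ordinary least squares with NumPy and applies this formula. The data are synthetic random numbers standing in for the real measures, so the printed values say nothing about the study's results; the function names are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def r2_change_test(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic and degrees of freedom for the R^2 increase from Step 1 to Step 2."""
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2

# Synthetic stand-in data (hypothetical): four Step 1 predictors plus one Step 2 predictor
rng = np.random.default_rng(0)
n = 47
step1 = rng.normal(size=(n, 4))   # e.g. age, fluid intelligence, grammar, verbal STM
added = rng.normal(size=n)        # e.g. statistical learning score
y = step1 @ np.array([0.5, 0.2, 0.6, 0.1]) + 0.4 * added + rng.normal(size=n)

r2_step1 = r_squared(step1, y)
r2_step2 = r_squared(np.column_stack([step1, added]), y)
f, df1, df2 = r2_change_test(r2_step1, r2_step2, n, k_reduced=4, k_full=5)
print(f"dR2 = {r2_step2 - r2_step1:.3f}, F({df1}, {df2}) = {f:.3f}")
```

With five predictors in Step 2, df2 = n − 6, which gives the F(1, 41) reported for the 47 children of Experiment 1 and the F(1, 38) reported for the 44 children of Experiment 2. A p value can be obtained with `scipy.stats.f.sf(f, df1, df2)`.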

GENERAL DISCUSSION
The current study investigated the contribution of statistical categorization abilities and verbal short-term memory to vocabulary knowledge. Statistical learning was tested with a probabilistic category learning task, the Weather Prediction task (Knowlton et al., 1994). Forward digit span was used as a measure of verbal short-term memory. Both factors have been suggested to contribute significantly to vocabulary growth, and both have been shown to be impaired in developmental language disorder. We expected statistical learning in probabilistic categorization to play an important role in vocabulary acquisition in typically developing children, whereas a greater contribution of verbal short-term memory was expected in the language-impaired population, due to their slower pace of vocabulary development. These hypotheses were only partially supported by our results. Our results suggest that statistical learning plays an important role in typical lexical development, explaining over 10% of the variance in receptive vocabulary scores even after controlling for the effects of age, grammar skills, fluid intelligence and verbal short-term memory. In contrast, verbal short-term memory was not associated with vocabulary measures in typically developing children. Children with developmental language disorder showed the reverse pattern: there was no effect of statistical learning, whereas verbal short-term memory explained more than 8% of the variance in vocabulary knowledge.
The first and most important result is that statistical learning plays an important role in vocabulary development, at least in typical development. This result supports and extends previous findings on the importance of different forms of statistical learning in lexical acquisition by demonstrating such a contribution with a probabilistic categorization task. This type of learning has features analogous to cross-situational learning, in which participants have to identify word-object mappings across several trials; such learning has been observed across different age groups (Yu and Smith, 2007; Yu, 2008, 2013; Vouloumanos, 2008; Suanda and Namy, 2012; Suanda et al., 2014; Roembke et al., 2018). One study specifically examined how cross-situational statistical word learning is affected by reduced working memory resources (Roembke and McMurray, 2021), which were modelled with a dual-task setting and resulted in lower word learning performance. Although the reported performance decrease was small, this finding is an important step towards understanding how statistical word learning may be affected in developmental language disorder, a clinical population with reduced working memory. While a reduced working memory span may be an important factor in explaining smaller vocabulary sizes and slower lexical development in DLD, we found no association between this form of statistical learning and working memory in either experiment, suggesting that different forms of statistical learning contribute differently to vocabulary.
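Cross-situational learning, as described above, can be illustrated with one simple associative account: the learner counts how often each word co-occurs with each object across individually ambiguous trials and maps each word to its most frequent companion. The Python sketch below illustrates only that counting idea; the nonwords, objects and trials are hypothetical, and it is not the model used in any of the cited studies.

```python
from collections import defaultdict

def cooccurrence_counts(trials):
    """Count word-object co-occurrences across individually ambiguous trials.

    Each trial is (words, objects); every word heard is paired with every object
    in view, because a single trial does not reveal which word names which object.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for words, objects in trials:
        for word in words:
            for obj in objects:
                counts[word][obj] += 1
    return counts

def best_mappings(counts):
    """Map each word to the object it co-occurred with most often."""
    return {word: max(objs, key=objs.get) for word, objs in counts.items()}

# Hypothetical trials with invented nonwords: two words and two objects per trial.
trials = [
    (["dax", "wug"], ["ball", "dog"]),
    (["dax", "blick"], ["ball", "cup"]),
    (["wug", "blick"], ["dog", "cup"]),
]
print(best_mappings(cooccurrence_counts(trials)))
# {'dax': 'ball', 'wug': 'dog', 'blick': 'cup'}
```

No single trial disambiguates any word, but across the three trials each word co-occurs twice with its referent and only once with each competitor.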
It is also important to note that the novelty of the current study lies in its use of a statistical learning task focusing on categorization rather than the typical word segmentation task (Saffran et al., 1996). We used the Weather Prediction task (Knowlton et al., 1994; Kemény and Lukács, 2013), a probabilistic category learning task. The task has traditionally been considered to tax procedural memory functions, and performance on it is intact in amnesia (Knowlton et al., 1994, 1996; but see Zaki, 2005; Lagnado et al., 2006; Newell et al., 2007; Kemény and Lukács, 2013; Kemény, 2014), which suggests that this form of statistical learning is independent of short-term memory functions. By analogy with this dissociation, the relation between statistical learning and vocabulary measures may also depend on the choice of task. This is in line with previous studies showing low correlations even between different versions of the same statistical learning task (Siegelman et al., 2017a; Siegelman et al., 2017b).

We observed a different pattern of associations in DLD, where vocabulary showed a stronger association with short-term memory capacity. The absence of such an association in our TD group, by contrast, is at odds with the wealth of previous studies highlighting the role of verbal short-term memory in vocabulary acquisition. A potential explanation lies in the age of the participants: the association between verbal STM and vocabulary may be especially strong at the earlier stages of lexical acquisition, and indeed, previous studies reported significant associations in younger children (Masoura & Gathercole, 2005; Ferguson et al., 2015; Verhagen & Leseman, 2016). The strength of the association may decrease with age and/or with the growth of the lexicon, which could also account for its presence in the DLD group. Developmental Language Disorder is often characterized by slower linguistic development (Leonard, 1997; but see Larkin et al., 2013).

Note. (a) Vocabulary was assessed with the Hungarian Peabody Picture Vocabulary Test (PPVT; Csányi, 1974); (b) Grammar with the Test for the Reception of Grammar (TROG-H; Lukács et al., 2013); (c) Nonword repetition with the Hungarian Nonword Repetition Task (Racsmány et al., 2005); (d) Sentence repetition with the Magyar Mondatutánmondási Teszt (Hungarian Sentence Repetition Test, MAMUT; Kas and Lukács, in prep).

Frontiers in Communication | www.frontiersin.org July 2021 | Volume 6 | Article 700452
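To make the structure of a probabilistic category learning task such as the Weather Prediction task more concrete, the Python sketch below simulates a simplified version: on each trial a subset of four cue cards is shown, the outcome (sun or rain) is drawn probabilistically, and a learner receives feedback after each prediction. The cue validities, the rule combining them, and the learner (a simple delta-rule model, swapped in here for illustration) are all assumptions, not the parameters or model of the task used in the study. Because no cue predicts the outcome perfectly, even an ideal learner reaches only about 61% correct under these assumed validities, while a learner that learns nothing stays near the 50% chance level.

```python
import random

# Hypothetical cue validities: P(sun | card shown) for cards 0-3.
# Illustrative assumptions only, not the parameters of the task used in the study.
CUE_VALIDITIES = [0.8, 0.6, 0.4, 0.2]

def generate_trial(rng):
    """Show a random non-empty subset of the four cards and draw an outcome.

    Assumption for this sketch: P(sun) is the mean validity of the cards shown.
    """
    while True:
        cards = [i for i in range(4) if rng.random() < 0.5]
        if cards:  # redraw if no card was selected
            break
    p_sun = sum(CUE_VALIDITIES[i] for i in cards) / len(cards)
    outcome = 1 if rng.random() < p_sun else 0  # 1 = sun, 0 = rain
    return cards, outcome

def simulate(n_trials=200, learning_rate=0.1, seed=1):
    """Run a delta-rule learner with trial-by-trial feedback; return proportion correct."""
    rng = random.Random(seed)
    weights = [0.0] * 4  # one associative weight per card
    correct = 0
    for _ in range(n_trials):
        cards, outcome = generate_trial(rng)
        activation = sum(weights[i] for i in cards)
        prediction = 1 if activation > 0 else 0  # predict sun if summed evidence is positive
        if prediction == outcome:
            correct += 1
        # Feedback: nudge the weights of the shown cards toward +1 (sun) or -1 (rain)
        error = (1.0 if outcome == 1 else -1.0) - activation
        for i in cards:
            weights[i] += learning_rate * error
    return correct / n_trials

print(f"Proportion correct over 2000 trials: {simulate(n_trials=2000):.3f}")
```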
If children with DLD lag behind their typically developing peers in linguistic development, one could argue that the observed association between verbal short-term memory and vocabulary is characteristic of typical language development at the DLD group's language age. That is, children with DLD may show the profile of language and cognitive abilities of younger TD children, who are in a developmental phase where lexical development relies more heavily on verbal short-term memory. This hypothesis is supported by the fact that 20 of the children with DLD scored at least 1.25 SD below their age expectations on the PPVT.

Since the probabilistic categorization task taps procedural learning (Knowlton et al., 1996; but see Lagnado et al., 2006), testing it in both DLD and typically developing groups has important implications for the Procedural Deficit Hypothesis of Specific Language Impairment (Ullman and Pierpont, 2005) and for the different patterns of cooperation and competition between the memory systems in clinical populations. The Procedural Deficit Hypothesis proposes that language impairment is the consequence of a deficit in domain-general procedural memory functions. Procedural memory is responsible for the acquisition and storage of process-like information, such as riding a bike, categorization (Knowlton et al., 1996) or grammar use (Ullman et al., 1997), and has also been assumed to underlie statistical learning (Cleeremans et al., 1998; Perruchet and Pacton, 2006; Simor et al., 2019). Procedural memory is complemented by declarative memory, which is responsible for the acquisition and storage of fact-like information, such as dates and phone numbers (e.g., Squire et al., 1993). Children with developmental language disorder are assumed to be primarily impaired in their procedural memory functions and to use their declarative memory to compensate for this deficit (Ullman and Pullman, 2015).
Since the two groups of the current study differ along several factors, comparing them does not allow a direct assessment of the procedural deficit hypothesis or of declarative compensation. Statistical learning performance, however, was above chance in both groups, and the groups differed only slightly from each other. That is, we have no evidence for a general procedural deficit, even considering that the clinical group was on average 1 year older than the typical group. Instead, we provide evidence that children with developmental language disorder benefit less from their statistical learning abilities when it comes to vocabulary development. This does not imply that training procedures relying on statistical learning are not beneficial for children with DLD; rather, it highlights the importance of higher exposure and more repetitions of trials with difficult patterns during training (Evans et al., 2009; Plante and Gómez, 2018). Training in statistical learning might not only enhance core statistical learning abilities, but also support the use of distributional regularities within the linguistic domain.
Instead of statistical learning, children with DLD rely more on verbal short-term memory. Since verbal STM is the input of the declarative memory system (Blumenfeld and Ranganath, 2007), one could argue that this result reflects declarative compensation; however, this suggestion should be treated with caution. While statistical learning was comparable across the two groups, verbal STM is clearly reduced in developmental language disorder, in line with previous proposals that verbal STM is a marker of DLD (Gathercole & Baddeley, 1993). If verbal STM is deficient in DLD, whereas statistical learning is at least not clearly deficient, it could be misleading to interpret this reliance as compensatory. Nevertheless, the pattern of associations between statistical learning abilities and vocabulary differs between the two groups.
Our study is not without limitations. One of these is the lack of a direct comparison between the clinical and the typical groups: the typical group of Experiment 1 and the clinical group of Experiment 2 came from different countries, spoke different languages and were not matched on age. This made it impossible to directly compare the two groups on their linguistic skills, verbal short-term memory skills or statistical learning abilities. As a result, in this study we could not provide evidence either for or against the procedural deficit hypothesis (Ullman & Pierpont, 2005) or the verbal short-term memory deficit account (Gathercole & Baddeley, 1993) by comparing the two groups. On the other hand, the aim of the current study was to examine how memory skills contribute to linguistic abilities; this contribution should be relatively independent of the target language, and the analyses would have to be conducted separately even if the two groups were matched on age. Our study focused on vocabulary knowledge in school-age children at a relatively late stage of lexical development. It would also be interesting to test the contribution of the same set of skills in younger children, where stronger associations might be expected. Similarly, repeating the study, especially Experiment 1, with more age groups could provide further insights into how the relative contribution of verbal short-term memory and statistical learning to vocabulary changes with age.
A further limitation of the study is that the dependent variable of Experiment 2 (PPVT), as well as one of the control variables (TROG-H), were selection variables for developmental language disorder. As explained above, our research focus was how statistical learning and verbal short-term memory contribute to vocabulary development. Both of these skills have been found to be impaired in developmental language disorder (Evans et al., 2009; Masoura & Gathercole, 2005; Lukács et al., 2016; but see Lukács & Kemény, 2014). Consequently, performance on these measures in the DLD group is expected to fall at the lower end of the population distribution. One might argue that the reduced variability of the measures could lead to invalid results. While this is true in principle, we did not observe lower variability in the DLD group than in the TD group (see Table 1 for descriptives). The standard deviations of Vocabulary (21.81 in TD, 20.83 in DLD) and Statistical Learning (14.05% in TD, 11.37% in DLD) are slightly smaller in the clinical group, while the standard deviation of Verbal Short-Term Memory is slightly larger in DLD (0.77 in TD, 0.86 in DLD). The difference in variability is considerably larger for Grammar skills, with greater variability in DLD (SD 0.91 in TD and 2.40 in DLD). That is, the similar variability across groups does not support the conclusion that the atypical pattern in DLD is due to the use of selection variables. Moreover, the comparison of a typical (TD) and an extreme (DLD) group was the central aim of the current study, which could not have been achieved otherwise.
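The variability comparison above can be made explicit by converting the reported standard deviations into variance ratios (DLD variance divided by TD variance), where a ratio near 1 indicates similar spread. The Python sketch below uses only the standard deviations given in this paragraph; it is a descriptive illustration, not a formal test of equal variances, which would also require the group sizes.

```python
# Standard deviations as reported in the text: (TD, DLD).
reported_sd = {
    "Vocabulary": (21.81, 20.83),
    "Statistical learning (%)": (14.05, 11.37),
    "Verbal STM": (0.77, 0.86),
    "Grammar": (0.91, 2.40),
}

def variance_ratio(sd_dld, sd_td):
    """Ratio of DLD variance to TD variance; values near 1 indicate similar spread."""
    return (sd_dld / sd_td) ** 2

for measure, (sd_td, sd_dld) in reported_sd.items():
    print(f"{measure}: {variance_ratio(sd_dld, sd_td):.2f}")
# Vocabulary: 0.91
# Statistical learning (%): 0.65
# Verbal STM: 1.25
# Grammar: 6.96
```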

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors upon request, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics committee of the University of Graz (Experiment 1) and the Ethics committee of the Budapest University of Technology and Economics (Experiment 2). Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
FK and ÁL contributed equally to the design of the study, data collection and publication. Statistical analyses were conducted by FK.

FUNDING
This work was supported by the Momentum Research Grant of the Hungarian Academy of Sciences (Momentum 96233 "Profiling learning mechanisms and learners: individual differences from impairments to excellence in statistical learning and in language acquisition," PI: ÁL). The authors acknowledge financial support from the University of Graz.