# THE ROLE OF LETTER-SPEECH SOUND INTEGRATION IN TYPICAL AND ATYPICAL READING DEVELOPMENT

EDITED BY: Jurgen Tijms, Silvia Brem, Gorka Fraga González and Iliana I. Karipidis

PUBLISHED IN: Frontiers in Psychology and Frontiers in Human Neuroscience

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-698-3 DOI 10.3389/978-2-88963-698-3

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# THE ROLE OF LETTER-SPEECH SOUND INTEGRATION IN TYPICAL AND ATYPICAL READING DEVELOPMENT

Topic Editors:

Jurgen Tijms, University of Amsterdam, Netherlands; Silvia Brem, Psychiatrische Klinik der Universität Zürich, Switzerland; Gorka Fraga González, University of Zurich, Switzerland; Iliana I. Karipidis, Stanford University, United States

Fluency is the quintessence of effective reading. Fluent reading is of paramount importance for socio-economic success, and reading is considered a crucial marker of an individual's life course. Approximately 5% of children are affected by developmental dyslexia, exhibiting inaccurate word recognition, spelling, and phonological decoding and, most importantly, severely dysfluent reading, which remains their most characteristic and persistent deficit. Unable to meet society's literacy demands, individuals with dyslexia are at severe risk for adverse academic, economic, and psychosocial consequences.

Recently, it has been proposed that the development of automatic letter-speech sound (LSS) integration is critical for the acquisition of fluent reading skills and, in particular, that a failure to develop automatic LSS integration results in impaired reading fluency. In support of this view, neurocognitive research has suggested that the development of automatized processing of LSS associations is an essential step in the formation of a functional neural network for reading. Furthermore, both neurocognitive and behavioural studies have suggested less efficient LSS integration in children with dyslexia than in typical readers. Finally, results from intervention studies have suggested that training LSS associations might be a promising approach to ameliorating dysfluent reading in children with dyslexia.

Nonetheless, there is still a considerable gap in our understanding of the mechanisms by which learning LSS associations relates to (dys)fluent reading.

Citation: Tijms, J., Brem, S., González, G. F., Karipidis, I. I., eds. (2020). The Role of Letter-Speech Sound Integration in Typical and Atypical Reading Development. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-698-3

# Table of Contents


Mirjam Keetels, Milene Bonte and Jean Vroomen


Stephanie N. Del Tufo, Stephen J. Frost, Fumiko Hoeft, Laurie E. Cutting, Peter J. Molfese, Graeme F. Mason, Douglas L. Rothman, Robert K. Fulbright and Kenneth R. Pugh

*106 Performance in Sound-Symbol Learning Predicts Reading Performance 3 Years Later*

Josefine Horbach, Kathrin Weber, Felicitas Opolony, Wolfgang Scharke, Ralph Radach, Stefan Heim and Thomas Günther

*114 Longitudinal Task-Related Functional Connectivity Changes Predict Reading Development*

Gregory J. Smith, James R. Booth and Chris McNorgan

*127 Reading Independently and Reading With a Narrator: Eye Movement Patterns of Children With Different Receptive Vocabularies*

Zhuqing Su, Yifang Wang, Yadong Sun, Jinhong Ding and Zhuoya Ma

*135 Letter and Speech Sound Association in Emerging Readers With Familial Risk of Dyslexia*

Joanna Plewko, Katarzyna Chyl, Łukasz Bola, Magdalena Łuniewska, Agnieszka Dębska, Anna Banaszkiewicz, Marek Wypych, Artur Marchewka, Nienke van Atteveldt and Katarzyna Jednoróg

*148 Deficient Letter-Speech Sound Integration is Associated With Deficits in Reading but not Spelling*

Ferenc Kemény, Melanie Gangl, Chiara Banfi, Sarolta Bakos, Corinna M. Perchtold, Ilona Papousek, Kristina Moll and Karin Landerl


Linda Romanovska, Roef Janssen and Milene Bonte

*203 Early Brain Sensitivity to Word Frequency and Lexicality During Reading Aloud and Implicit Reading*

Luís Faísca, Alexandra Reis and Susana Araújo


Elpis V. Pavlidou and Louisa Bogaerts

# Editorial: The Role of Letter-Speech Sound Integration in Normal and Abnormal Reading Development

Jurgen Tijms<sup>1</sup>\*, Gorka Fraga-González<sup>2</sup>, Iliana I. Karipidis<sup>3</sup> and Silvia Brem<sup>2</sup>

<sup>1</sup> Department of Psychology, University of Amsterdam, Amsterdam, Netherlands, <sup>2</sup> Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry, University of Zurich, Zurich, Switzerland, <sup>3</sup> Center for Interdisciplinary Brain Sciences Research, Department of Psychiatry and Behavioral Sciences, Stanford Medicine, Stanford University, Stanford, CA, United States

Keywords: reading development, dyslexia, letter-speech sound integration, reading disabilities, neural circuits

#### **Editorial on the Research Topic**

#### **The Role of Letter-Speech Sound Integration in Normal and Abnormal Reading Development**

Learning to read is a central focus of education as it enables us to successfully participate in society and develop as individuals. In this process, a crucial milestone toward expert reading is the ability to read fluently, that is, to quickly and effortlessly access the meaning of print. However, this is not an innate skill and has to be learned at school through explicit instruction and extensive practice. Specific brain networks functionally specialize to serve reading-related cognitive processes. In particular, areas involved in visual recognition of symbols and auditory processing of language are at the heart of this adaptation. Their specialization is essential in the first stages of learning how to read. As a vital part of this process, the crossmodal (audiovisual) integration of letters and speech sounds constitutes the key starting point toward an effective brain network that enables fluent reading (Blomert, 2011).

Letter-speech sound (LSS) integration is an important process to study in order to better understand progress in typical reading acquisition and impairments in developmental dyslexia. A failure in the automatization of LSS associations has been consistently reported in impaired readers of different orthographies (Richlan, 2014). In addition, optimizing this associative process seems to be a key ingredient for positive outcomes of training and intervention aimed at improving reading fluency (Mehringer et al., 2020; Patel et al.). From a neurodevelopmental perspective, neuroimaging studies suggest that input from audiovisual integration areas into visual processing areas is a primary driver of visual specialization, which is crucial in later stages of reading network development (Brem et al., 2010; Fraga González et al., 2017; Pleisch et al., 2019).

Several important research questions remain to be clarified. First, what constitutes typical and atypical developmental pathways of LSS integration, and how do these relate to successful and unsuccessful reading acquisition? Second, do more fundamental deficits, such as deficits in general associative learning mechanisms, obstruct LSS integration in impaired readers? Third, and more relevant to clinical practice, what is the optimal way to facilitate LSS integration and automatization? In this Frontiers Research Topic, we present a collection of original research articles from different disciplines that directly or indirectly advance our knowledge of these questions. We conceptually divided the papers into three sections: (1) cognitive and linguistic processes related to the development of letter-speech sound integration, (2) a neurocognitive window into letter-speech sound binding, and (3) interventions to support reading development.

Edited and reviewed by: Jon Andoni Duñabeitia, Nebrija University, Spain

> \*Correspondence: Jurgen Tijms j.tijms@uva.nl

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 18 May 2020 Accepted: 28 May 2020 Published: 07 July 2020

#### Citation:

Tijms J, Fraga-González G, Karipidis II and Brem S (2020) Editorial: The Role of Letter-Speech Sound Integration in Normal and Abnormal Reading Development. Front. Psychol. 11:1441. doi: 10.3389/fpsyg.2020.01441

## COGNITIVE AND LINGUISTIC PROCESSES RELATED TO LETTER-SPEECH SOUND INTEGRATION

This first section is dedicated to mechanisms related to the associative learning processes involved in LSS integration. The studies in this section help us specify cognitive abilities and perceptual processes important for reading acquisition, as well as deficits in specific learning disorders, including dyslexia and dysgraphia. Experimental manipulation of the visual and auditory properties of LSS offers valuable insights into the fundamental mechanisms of these cognitive and linguistic processes. In addition, relating performance in such experimental paradigms to literacy outcomes provides a better understanding of how specific aspects of cognition relate to typical and atypical literacy development. For instance, artificial script learning paradigms are excellent experimental tools for examining these issues and for identifying new predictors of reading abilities.

The first paper of this section (Pavlidou and Bogaerts) examined the role of implicit statistical learning (ISL) in reading acquisition. The results point to the relevance of perceptual modality in ISL and show an interesting association between visual ISL and phonological awareness. Pavlidou and Bogaerts discuss how visual ISL could facilitate the mapping of letters onto speech sounds in novice readers. Keetels et al. approached the atypical integration of letters and speech sounds in dyslexia by examining the ability to adjust one's perceptual interpretation of ambiguous speech input in accordance with contextual information. Results revealed that adults with dyslexia, in contrast to typical readers, were unable to use text to recalibrate their phoneme categories, whereas no deviations were found in their ability to recalibrate on the basis of lipread speech. This result supports an LSS integration deficit in dyslexia, but suggests that it does not extend to a more general audiovisual integration deficit. Interestingly, Romanovska et al., using the same paradigm, failed to find a difference in recalibration by written text between young readers with and without dyslexia. The authors emphasize the importance of taking dynamic developmental processes into account and, specifically, point to the potential role of changes in the temporal integration window for LSS coupling during development. From a linguistic perspective, Caccia et al. showed that in Italian, pitch is the most reliable acoustic cue for stress perception in words for adults and, less markedly so, for typically reading children. In contrast, Italian children with dyslexia did not seem to rely on pitch for stress perception as much as typical readers did. Based on these results, Caccia et al. discuss the relevance of language-specific features in studying atypical reading development.

Although previous research has shown many similarities between reading disabilities (dyslexia) and spelling disabilities (dysgraphia), there is a dearth of research specifically dedicated to dysgraphia. Döhla et al. therefore investigated which cognitive deficits are associated with spelling deficits in dysgraphia. A cluster analysis revealed that children with dysgraphia could be split into two distinct clusters, one with auditory deficits and the other with deficits in visual magnocellular functions. The authors discuss the implications of these findings for developing more individually tailored interventions.

The last two studies of this section used an artificial LSS learning paradigm to mimic the initial stages of reading acquisition. Horbach et al. developed a paradigm in which preliterate children had to map Morse-code symbols to speech sounds, and subsequently followed the children's reading development over a 3-year period. Performance on this learning paradigm turned out to be a particularly relevant predictor of reading fluency and reading comprehension skills 3 years later. Law et al. also examined artificial LSS learnability, using Hebrew letters and speech sounds, and compared children with dyslexia and typical readers in grade 3. The results showed a reduced ability in children with dyslexia to use the newly learned LSS correspondences for reading words presented in the novel script. However, in contrast to Horbach et al., this study found no significant independent contribution of artificial LSS learning to reading skills.

## A NEUROCOGNITIVE WINDOW INTO LETTER-SPEECH SOUND BINDING

The studies included in this section used a wide range of non-invasive neuroimaging and neuromodulation (or stimulation) methods to study letter-speech sound binding and how it relates to reading. These methods include event-related potentials (ERPs) derived from electroencephalography (EEG), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), transcranial direct current stimulation (tDCS), and magnetic resonance spectroscopy (MRS). Common to all studies is a special interest in measures of brain activation and connectivity in parietotemporal brain regions, because of their important role in processing audiovisual information.

In the first paper of this section, Richlan et al. present a review of the neural networks associated with LSS integration in typical and atypical reading development. The review suggests a putative neurocognitive deficit specific to the crossmodal integration of letters and speech sounds that hinders the emergence of a functional neural system for reading. Consistent with this suggestion, Plewko et al. show that differences in brain activity during LSS association between children with and without familial risk for dyslexia can be detected at very early stages of reading acquisition, in kindergarten and first grade. Their results suggest that an increased response of the left superior temporal cortex to incongruent LSS pairs reflects an early stage of automatization. The authors suggest that the absence of such a distinct response to incongruent information during this early stage of reading acquisition may cause reading problems through deficient suppression of irrelevant information. In an ERP study, Kemény et al. also demonstrate deviant neural responses in a Stroop-like LSS integration interference task in children with combined reading and spelling deficits, compared to typically developing peers. Notably, the ERPs of children with isolated spelling deficits did not differ from those of typically developing children, suggesting that deficits in automatized LSS associations may be specifically associated with reading impairments.

Both Younger and Booth and Xu et al. emphasize the role of parietotemporal regions in the early stages of learning to read. Using transcranial direct current stimulation (tDCS) to manipulate parietotemporal function, Younger and Booth revealed that stimulation of these brain areas can enhance the learning of new grapheme-phoneme associations. Interestingly, their results suggest that whereas parietotemporal function may be critical to new, initial learning, its role in continued reading development is likely to change afterwards. In line with these findings, the MEG study of Xu et al. with Finnish school children emphasizes the crucial role of parietotemporal cortex in the early phase of reading. They show that the audiovisual integration effects of letters and speech sounds are most pronounced in parietotemporal regions and correlate with reading and writing skills.

Faísca et al. provide novel insights into the processes involved in visual word recognition. In their ERP study, they show task-dependent effects of lexicality on early ERPs (within 200 ms) in expert adult readers. This result suggests that visual word recognition is not simply a consequence of letter-speech sound integration, but results from an interplay between various bottom-up and top-down processes.

The study of Smith et al. uses fMRI to examine developmental changes in functional connectivity in children's neural reading network over a 2.5-year period. This study provides evidence that improvements in reading skill over time are predicted by the nature and degree of changes in connectivity patterns within the reading network. More specifically, an overall increase in processing coherence among regions of the reading network was shown to be a critical driver of growth in reading proficiency.

The final study in this section, by Del Tufo et al., uses MRS to examine the mediating role of crossmodal integration of visual and spoken word representations in the relationship between neurochemical concentrations and reading proficiency. The results revealed that the cross-modal word matching effect mediated the relationship between increased glutamate (a suggested index of "neural noise", or random variability in neuronal firing) and poorer reading ability, as well as the relationship between increased choline and poorer reading ability. In addition, lower GABA and higher N-acetyl-aspartate (NAA) predicted faster cross-modal matching reaction times. These results are discussed in the context of a biochemical framework in which the ability of neurochemistry to predict reading ability may be at least partially explained by cross-modal integration.

## INTERVENTIONS TO SUPPORT READING DEVELOPMENT

This section presents several intervention approaches for reading in different languages that aim at supporting struggling readers or facilitating typical reading development. In some of these interventions, LSS integration constitutes a central training component.

Patel et al. evaluated the effects of GraphoLearn, a game-based intervention that trains LSS mappings, on the English reading skills of young struggling readers in India. In this small-scale study, a group of 7-year-old children was randomly allocated to either GraphoLearn training or a control math-game training. The results revealed that, compared to the control training, GraphoLearn led to significant improvements in children's letter-sound knowledge, a critical factor in early reading development.

The three other studies of this section did not focus on children with reading impairments but investigated the potential of specific intervention mechanisms to promote reading development. In the study of Pinto et al., a training called PASSI (Promoting the Achievement of Sound-Sign Integration) aimed to improve kindergarteners' conceptual knowledge of the Italian writing system. A total of 159 Italian children (3–5 years old) were randomly assigned to either the experimental training or a control group. Results revealed positive effects of this type of training on several emergent literacy skills. Siu et al. investigated whether metalinguistic awareness training (addressing phonetic-symbolic and semantic-symbolic mappings) or working memory training could benefit Chinese reading skills. In this small-scale study, second graders in Hong Kong were randomly assigned to a metalinguistic training group, a working memory training group, or a waitlist control group. The results suggest that metalinguistic awareness training enhanced phonological skills, whereas working memory training enhanced memory span; however, both training groups improved similarly in word reading fluency in Chinese and English compared to the control group. In the last study, Su et al. compared the effects of two reading styles, reading with a narrator vs. reading independently, on the eye movements and early literacy skills of young children (4–6 years old). Children were randomly assigned to one of the two reading approaches, and the effects of the narrator were evaluated by analyzing eye movement patterns. Based on differences in fixation patterns during text reading, Su et al. concluded that kindergarten children profit from reading with a narrator because it supports the acquisition and consolidation of mappings between speech and text.

## CONCLUDING REMARKS

To sum up, we present a collection of studies from diverse fields that contribute to our understanding of a milestone in human cognition: reading acquisition. The studies in the first section point to associative learning mechanisms and markers that are important contributors to our capacity to integrate letters and speech sounds and become successful readers. The neuroimaging studies in section two advance our knowledge of the typical and atypical development of brain networks that enable fluent reading, clarifying the role of crossmodal integration regions as part of a connected reading network. Finally, the importance of LSS integration for reading finds support in the intervention studies in section three, which demonstrate its relevance to the design of support and remediation programs for reading across languages. Although there are still important gaps in our current models of typical and atypical reading abilities and reading acquisition, we are confident that the studies presented in this Frontiers Research Topic extend our knowledge and open interesting new avenues for future research. The investigation of audiovisual integration of letters and speech sounds constitutes a fascinating and complex topic that encompasses interdisciplinary knowledge from both fundamental and applied research domains. We look forward to more exciting new findings and hope that the integrative efforts of so many international and multidisciplinary labs will continue to yield progress in our understanding of human learning and cognition.

#### REFERENCES

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Tijms, Fraga-González, Karipidis and Brem. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Parietotemporal Stimulation Affects Acquisition of Novel Grapheme-Phoneme Mappings in Adult Readers

Jessica W. Younger<sup>1</sup>\* and James R. Booth<sup>1,2</sup>

<sup>1</sup> Department of Communication Sciences and Disorders, University of Texas at Austin, Austin, TX, United States, <sup>2</sup> Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, United States

Neuroimaging work from developmental and reading intervention research has suggested that one cause of reading failure may be a lack of engagement of parietotemporal cortex during the initial acquisition of grapheme-phoneme (letter-sound) mappings. Parietotemporal activation increases following grapheme-phoneme learning and successful reading intervention. Further, stimulation of parietotemporal cortex improves reading skill in lower-ability adults. However, it is unclear whether these improvements following stimulation are due to enhanced grapheme-phoneme mapping abilities. To test this hypothesis, we used transcranial direct current stimulation (tDCS) to manipulate parietotemporal function in adult readers as they learned a novel artificial orthography with new grapheme-phoneme mappings. Participants received real or sham stimulation to the left inferior parietal lobe (L IPL) for 20 min before training. Over the course of 3 days, they received explicit training on 10 novel words each day. Learning of the artificial orthography was assessed at a pre-training baseline session, at the end of each of the three training sessions, in an immediate post-training session, and in a delayed post-training session about 4 weeks after training. Stimulation interacted with baseline reading skill to affect learning of trained words and transfer to untrained words. When training was paired with real stimulation, lower-skill readers showed better acquisition, whereas higher-skill readers showed worse acquisition, compared to readers who received sham stimulation. However, readers of all skill levels showed better maintenance of trained material following parietotemporal stimulation, indicating a differential effect of stimulation on initial learning and consolidation. Overall, these results indicate that parietotemporal stimulation can enhance the learning of new grapheme-phoneme relationships in readers with lower reading skill. Yet, while parietotemporal function is critical to new learning, its role in continued reading improvement likely changes as readers progress in skill.

Keywords: transcranial direct current stimulation, parietotemporal cortex, reading acquisition, artificial orthography, reading skill

#### Edited by:

Silvia Brem, Psychiatrische Klinik der Universität Zürich, Switzerland

#### Reviewed by:

Kristina Moll, Ludwig-Maximilians-Universität München, Germany; Fabio Richlan, University of Salzburg, Austria

> \*Correspondence: Jessica W. Younger jrwise@utexas.edu

Received: 03 January 2018 Accepted: 06 March 2018 Published: 23 March 2018

#### Citation:

Younger JW and Booth JR (2018) Parietotemporal Stimulation Affects Acquisition of Novel Grapheme-Phoneme Mappings in Adult Readers. Front. Hum. Neurosci. 12:109. doi: 10.3389/fnhum.2018.00109

## INTRODUCTION

Reading is a fundamental educational skill important for academic and vocational success (Gerber, 2012), yet not every child develops into a fluent reader. Poor readers exhibit a number of reading-related behavioral deficits, including deficits in phonological awareness, grapheme-phoneme (letter-sound) mapping, and reading fluency (Shaywitz and Shaywitz, 2005; Siegel, 2006). These behaviors have been related to a primarily left-hemisphere network of brain regions that show reduced activation in individuals with poor reading ability, including inferior frontal, parietotemporal, and occipitotemporal areas (Richlan et al., 2011; Richlan, 2014). While neuroimaging research has converged on neural patterns associated with poor reading, it is not yet clear which of these patterns are causes and which are consequences of poor reading. One proposed theory of reading failure is that reduced activity in parietotemporal regions involved in grapheme-phoneme integration results in impaired learning of the letter-sound mappings critical for reading (Pugh et al., 2001; Schlaggar and McCandliss, 2007; Blau et al., 2010; Blomert, 2011). This idea is supported by developmental research showing that the most consistently underactivated region in children with dyslexia is parietotemporal cortex, whereas adults with dyslexia show more pronounced underactivation in occipitotemporal areas associated with orthographic processing (Richlan et al., 2011).

Further support is provided by studies of pre-readers with and without risk for dyslexia, as well as by intervention studies. In pre-readers, the most consistently reported anatomical and functional differences are in parietotemporal areas (Vandermosten et al., 2016). Indeed, successful reading intervention is marked by increased parietotemporal activation in both children (Simos et al., 2002, 2007; Shaywitz et al., 2003, 2004; Odegard et al., 2008; Yamada et al., 2011) and adults (Eden et al., 2004). Although intervention also modulates activity in inferior frontal and occipitotemporal areas, parietotemporal activity has been shown to predict the response to intervention (Odegard et al., 2008; Rezaie et al., 2011a,b), further supporting its potential causal role in the development of reading skill. However, reduced occipitotemporal activation has also been observed (Specht et al., 2009; Raschle et al., 2011, 2012; Vandermosten et al., 2016). Without longitudinal data from sufficiently large samples to determine whether at-risk children go on to develop dyslexia, it cannot be established which of the neural differences associated with risk for dyslexia are also significant predictors of its development.

Thus, while parietotemporal activity and knowledge of grapheme-phoneme mappings have been established as critical for reading improvement in struggling readers, whether parietotemporal activity is causally related to learning grapheme-phoneme mappings is not yet clear. To better support remedial reading programs for both children and adults, we must better understand the role of parietotemporal activity in new learning. The neural effects associated with learning new grapheme-phoneme relationships in non-impaired individuals can be examined in adults by training them to read a new orthography. This orthography could be a previously unknown writing system or an artificial one created to control or manipulate factors such as mapping consistency or the visual complexity of characters. Similar to intervention and developmental studies, orthographic learning studies in adults have shown learning-related increases in left hemisphere reading regions (Hashimoto and Sakai, 2004; Bitan et al., 2005; Callan et al., 2005; Mei et al., 2014; Takashima et al., 2014; Taylor et al., 2017). Further, parietotemporal areas show training-related increases specifically related to accuracy gains on grapheme-phoneme mappings in the new language (Hashimoto and Sakai, 2004; Callan et al., 2005; Takashima et al., 2014). Parietotemporal cortex is also involved in reading untrained "transfer" words in the newly learned script, with activity similar to that found when reading English pseudowords (Mei et al., 2014; Takashima et al., 2014; Taylor et al., 2017). However, relationships between individual differences in activation and learning have largely been left unexamined, and the predictive relationship between parietotemporal activity and learning outcomes seen in intervention studies has not yet been established in orthographic learning studies.
One study demonstrated a positive relationship between increases in parietotemporal activity and increases in accuracy on trained words, but found no relationship with transfer word or retention performance (Deng et al., 2008). However, this study focused on training semantic-orthographic, not grapheme-phoneme, relationships. The only studies examining pre-training neural predictors of orthography learning in adults have been restricted to orthographic processing areas in occipitotemporal regions (Xue et al., 2006; Cao et al., 2013). Even less is known about the neural predictors of long-term retention of a newly learned orthography, though one study indicates that visual attention prior to learning may be an important factor (Cao et al., 2013). Thus, whether parietotemporal activation is related to learning grapheme-phoneme mappings, including the ability to transfer and retain this information, remains unknown.

One method used to experimentally examine the role of parietotemporal regions in reading is neuromodulation. Neuromodulation alters neural activity in the targeted region(s), which often leads to physiological or behavioral changes (Nitsche et al., 2008; Nitsche and Paulus, 2011; Stagg and Nitsche, 2011; Horvath et al., 2015, 2016). One such neuromodulation tool is transcranial direct current stimulation (tDCS), in which a weak electrical current is delivered to the scalp. Anodal (positive) current is thought to lower the firing threshold of neurons in brain regions under the electrode, while cathodal (negative) current is thought to raise it (Nitsche and Paulus, 2000; Stagg and Nitsche, 2011). Anodal stimulation is therefore presumed to enhance behavior, whereas cathodal stimulation is presumed to inhibit it (Jacobson et al., 2012), though this traditional pattern does not always hold (Wiethoff et al., 2014; Bestmann et al., 2015). Studies applying anodal tDCS to parietotemporal areas have demonstrated stimulation-related improvements in reading ability in low-skill adults (Turkeltaub et al., 2012; Younger et al., 2016) and adolescents with dyslexia (Costanzo et al., 2016a). These studies thus support previous research showing a relationship between changes in parietotemporal function and changes in reading skill. However, this relationship has yet to be extended to new learning.

Only one previous study has paired stimulation with reading skill training to determine whether stimulation to parietotemporal areas can facilitate reading intervention in children with dyslexia (Costanzo et al., 2016b). Children received training on reading speed and grapheme-phoneme mappings with real or sham stimulation aimed at enhancing left lateralization of parietotemporal cortex. Reading speed was trained via tachistoscopic presentation, in which words were flashed on the screen for a limited time, after which children read the word aloud. Grapheme-phoneme mappings were trained with a task in which children completed the written form of a word corresponding to a presented picture and a task in which children rearranged syllables to form real words. Children who received real stimulation during training showed improved accuracy for low-frequency words and improved reading speed for nonwords compared with children who received sham stimulation. Gains for the stimulation group relative to the sham group were maintained over a 6-week period, showing that stimulation can have a lasting impact on performance. Further, because the effects were found on skills not directly targeted by the training, there is some support for the effects of stimulation transferring to untrained skills. This study demonstrates the potential for parietotemporal stimulation to enhance reading interventions. Yet, because both reading speed and grapheme-phoneme mappings were trained, it is unclear whether stimulation benefitted both processes or only one. Further, stimulation affected only two of eight reading measures, and while there is some evidence of transfer, the absence of effects on behaviors more directly related to the training is at odds with previous tDCS research. Thus, while stimulation to parietotemporal areas does seem to affect reading-related learning, many open questions remain.

The goal of the current study, therefore, was to examine the effect of parietotemporal stimulation on learning new grapheme-phoneme mappings in adults varying in reading skill. We taught adults to read a novel writing system for English, allowing us to examine learning rates when only the visual representation of a word is novel, not its sound or meaning. This design ensured that any potential effects could be attributed to learning new visual representations rather than to processing novel or meaningless sounds. We then compared learning curves between readers who received real or sham stimulation to the parietotemporal cortex. We predicted that individual differences in reading skill prior to learning would interact with stimulation to affect learning. Specifically, we expected stimulation to steepen learning curves more for low skill readers than for high skill readers because of diminishing returns on the effect of stimulation. To ensure that grapheme-phoneme rules were learned and that readers did not simply memorize whole-word mappings, we examined performance on both trained and novel, untrained transfer words. Similar effects of stimulation on trained and untrained transfer words would indicate that parietotemporal stimulation affected acquisition of the new grapheme-phoneme mappings. Finally, we determined whether parietotemporal stimulation facilitated long-term maintenance of newly learned material, which would indicate lasting benefits of stimulation-facilitated learning, as seen in previous studies.

## MATERIALS AND METHODS

## Participants

In total, 89 right-handed, 18–35-year-old native English-speaking adults with normal or corrected-to-normal vision enrolled in the study. All participants reported no history of neurological disorder, psychiatric disorder, significant head trauma, hearing loss, substance abuse, seizures or migraine, metal implants, or current pregnancy. Of the initial 89, 79 participants completed all training sessions and were considered for the analysis. An additional 16 (eight in each stimulation group) were excluded for showing no evidence of learning during training (performance not significantly above chance at both the final training session and the final test of words). The remaining 63 participants included in the analysis had at least average intelligence (>85 standard score) as measured by the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 2008). All participants scored within two standard deviations of the mean on all standardized assessments (>70 and <130 standard score), with the exception of the WASI, on which the maximum score was 135. Participants were pseudorandomly assigned to receive real or sham stimulation to the left inferior parietal lobe (L IPL) based on baseline standardized test performance, to ensure equivalent performance and numbers of participants across stimulation groups. Of those who met all performance criteria, 32 (25 female) received real stimulation to the L IPL and 31 (21 female) received sham stimulation. Two-sample t-tests revealed no significant group differences in any participant characteristic or standardized test score (**Table 1**).

#### Procedure

Participants took part in a single-blind, sham-controlled study completed over six sessions. The training procedure is depicted in **Figure 1**. The first five sessions took place 24–48 h apart, and the sixth took place approximately 4 weeks after the fifth session (mean 4.7 weeks; range 1.4–8 weeks). During the first session, participants completed a battery of standardized tests to determine reading ability and a baseline test of the training stimuli. During the second, third and fourth sessions, participants received 20 min of real or sham stimulation followed by training on 10 new words in the artificial orthography. They were then tested on the entire training set of 30 words and a unique set of 20 untrained "transfer" words that followed the same grapheme-phoneme patterns as the trained words. In the fifth session, participants did not receive stimulation but completed a cumulative test of all 30 trained words and 20 unique transfer words to assess final knowledge of the artificial orthography. The sixth (retention) session was similar to the fifth: participants did not receive stimulation and were tested on all 30 trained words and on 20 unique untrained transfer words.

#### Standardized Testing

Participants completed a battery of standardized tests to assess general intelligence and reading ability. Intelligence was measured by the nonverbal (Performance) IQ index of the WASI (Wechsler, 1999). All participants had at least average intelligence (>85 standard score), per inclusionary criteria. Reading fluency was assessed by the Phonemic Decoding Efficiency (PDE) and Sight Word Efficiency (SWE) subtests of the Test of Word Reading Efficiency (TOWRE; Torgesen et al., 1999). The TOWRE requires participants to read as many pseudowords (PDE) or words (SWE) as possible in 45 s. Untimed reading skill was assessed by the Word Identification and Word Attack subtests of the Woodcock-Johnson III Tests of Achievement (Woodcock et al., 2007). These tests require participants to read increasingly difficult words (Word Identification) or pseudowords (Word Attack) without time limits. The TOWRE was administered again in the fifth session, after the final test, to assess whether stimulation had affected English reading ability.

#### TABLE 1 | Group characteristics and standardized test performance.

Groups were similar in demographics and performed similarly on all measures of reading skill. IQ was measured by the Performance subscale IQ index of the Wechsler Abbreviated Scale of Intelligence. Note: all tests have µ = 100, σ = 15, except for Reading Rate.

In keeping with previous studies demonstrating an effect of tDCS on reading skill, we used SWE performance as the metric of reading skill for analytical purposes. However, the maximum standard score an adult can earn on the SWE is 113 (mean 100, SD 15). Because we wanted to assess a wide range of reading abilities (up to two standard deviations above or below the mean), we used a modified metric. In addition to recording the total number of words correctly read in 45 s in accordance with the standardized protocol, we allowed all participants to read the entire list of words and recorded the time taken to read the whole list. We then calculated a reading rate score by dividing the total number of correctly read words by the number of seconds required to complete the list. To relate reading rate to standardized test performance, we calculated the reading rates corresponding to standard scores of 87, 100 and 113, which were 1.911, 2.177 and 2.28 words per second, respectively.
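The reading rate score is simple arithmetic. The sketch below shows one way it could be computed and then centered for use as the skill covariate in the analysis. The function names and the example participant are hypothetical; only the 2.177 words-per-second centering value comes from the text.

```python
# Minimal sketch of the modified SWE reading-rate metric.
# Function names are hypothetical; 2.177 words/s is the rate reported for a standard score of 100.

RATE_AT_SS100 = 2.177  # words per second corresponding to a standard score of 100


def reading_rate(words_correct: int, seconds_to_finish: float) -> float:
    """Correctly read words divided by the seconds needed to read the entire list."""
    if seconds_to_finish <= 0:
        raise ValueError("completion time must be positive")
    return words_correct / seconds_to_finish


def centered_rate(rate: float, center: float = RATE_AT_SS100) -> float:
    """Center a reading rate on the rate that corresponds to a standard score of 100."""
    return rate - center


if __name__ == "__main__":
    # Hypothetical participant: 100 words read correctly, 50 s to finish the list.
    r = reading_rate(100, 50.0)
    print(round(r, 3), round(centered_rate(r), 3))
```

A negative centered value simply means the participant read more slowly than the rate associated with an average standard score.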

#### Artificial Orthography and Training Procedure

Participants were trained on an artificial orthography using a Klingon-like script created for a previous successful artificial orthography training study (Brennan and Booth, 2015). The orthography is composed of letter-like characters that correspond to English phonemes and are combined to make English words. Because participants learned real English words rather than pseudowords, they had access to semantic representations during learning. This design approximates learning an orthography for which the spoken words and their meanings are already known. Symbols resembling English letters had previously been pruned from the artificial orthography. The remaining graphemes were randomly assigned to 10 consonant (/b/, /d/, /g/, /k/, /m/, /n/, /p/, /r/, /s/, /t/) and five vowel (/æ/, /i/, /I/, /A/, /U/) phonemes. Words were constructed with a CVC structure and a transparent 1:1 grapheme-to-phoneme ratio, such that each letter represents one and only one sound. Consequently, although participants learned English words, the training words did not always have the same number of letters as their English counterparts. For example, "beet" is written with three graphemes, corresponding to the /b/, /i/ and /t/ phonemes in the word. This low (1:1) grapheme-phoneme ratio was used to encourage a decoding-based learning strategy and to discourage a holistic strategy of memorizing whole-word forms or attempting to translate the symbols into English. Further, it maximized the potential for transfer to new words. Inconsistencies between the number of consonant graphemes in the English and artificial orthography spellings occurred in approximately 25% of the 130 words participants were exposed to throughout training and testing. These inconsistencies primarily related to digraphs; e.g., the digraph "ck" was represented with one letter, "k", in the artificial orthography. Only seven words had inconsistent consonant spellings unrelated to digraphs (e.g., "cat" was represented as "kat").
For a full list of stimuli, see Supplementary Table S1.
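The transparent 1:1 mapping can be illustrated with a small sketch. It randomly assigns one placeholder glyph per phoneme and spells CVC words phoneme by phoneme. The glyph names ("G00", "G01", ...) are hypothetical stand-ins for the Klingon-like symbols, and the random assignment only mirrors the random grapheme assignment described above; it is not the study's actual symbol set.

```python
# Illustrative sketch of a transparent 1:1 phoneme-to-grapheme code for CVC words.
# Glyph names ("G00", "G01", ...) are hypothetical placeholders for the real symbols.
import random

CONSONANTS = ["b", "d", "g", "k", "m", "n", "p", "r", "s", "t"]
VOWELS = ["æ", "i", "I", "A", "U"]  # vowel phoneme labels as written in the text


def make_orthography(seed: int = 0) -> dict:
    """Randomly assign one distinct placeholder glyph to each of the 15 phonemes."""
    rng = random.Random(seed)
    phonemes = CONSONANTS + VOWELS
    glyphs = [f"G{i:02d}" for i in range(len(phonemes))]
    rng.shuffle(glyphs)
    return dict(zip(phonemes, glyphs))


def spell(cvc: tuple, code: dict) -> list:
    """Spell a CVC word one grapheme per phoneme, so every spelling has three letters."""
    onset, vowel, coda = cvc
    if onset not in CONSONANTS or coda not in CONSONANTS or vowel not in VOWELS:
        raise ValueError(f"not a CVC word over this phoneme inventory: {cvc}")
    return [code[p] for p in cvc]


if __name__ == "__main__":
    code = make_orthography()
    # "beet" -> /b/ /i/ /t/ and "back" -> /b/ /æ/ /k/: three graphemes each,
    # even though both English spellings have four letters.
    print(spell(("b", "i", "t"), code), spell(("b", "æ", "k"), code))
```

Because every grapheme maps to exactly one phoneme, any new CVC word over the same inventory can be decoded from the letter-sound rules alone, which is what makes untrained transfer words possible.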

Each participant learned a total set of 30 words, divided into three training lists of 10 words each. In each training list, each consonant was used twice (once as the first and once as the last letter of a word), and each vowel was used twice. Five sets of 20 "transfer" words were also created following the same procedures. These transfer sets were tested but not trained, allowing us to determine how well participants generalized the underlying grapheme-phoneme rules of the orthography. Word sets were equated for English word frequency, and the construction of the word lists ensured that the occurrence of each letter was equated. As such, while semantics was accessible to participants, it could not have affected learning: words could not be predicted from the first two letters alone, and all three letters had to be processed to identify the word correctly.
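The positional constraints on each training list can be checked mechanically. The sketch below is a hypothetical checker over (onset, vowel, coda) phoneme tuples; it tests only the consonant and vowel counts described above, not the frequency or letter-occurrence equating across sets. The `cyclic_list` helper builds a pattern that satisfies the constraints, but its items are phoneme patterns, not necessarily real English words.

```python
# Sketch of a checker for the training-list constraints described above.
# Words are (onset, vowel, coda) phoneme tuples; the example list is hypothetical.
from collections import Counter

CONSONANTS = ["b", "d", "g", "k", "m", "n", "p", "r", "s", "t"]
VOWELS = ["æ", "i", "I", "A", "U"]


def satisfies_list_constraints(words, consonants=CONSONANTS, vowels=VOWELS) -> bool:
    """True if, across one 10-word list, each consonant occurs once as an onset
    and once as a coda, and each vowel occurs exactly twice."""
    onsets = Counter(w[0] for w in words)
    codas = Counter(w[2] for w in words)
    nuclei = Counter(w[1] for w in words)
    return (
        len(words) == len(consonants)
        and all(onsets[c] == 1 and codas[c] == 1 for c in consonants)
        and all(nuclei[v] == 2 for v in vowels)
    )


def cyclic_list(consonants=CONSONANTS, vowels=VOWELS):
    """Pair each onset with the next consonant as coda and use each vowel twice."""
    n = len(consonants)
    return [(consonants[i], vowels[i // 2], consonants[(i + 1) % n]) for i in range(n)]


if __name__ == "__main__":
    print(satisfies_list_constraints(cyclic_list()))
```

Counting onsets and codas separately is what enforces the "once as the first and once as the last letter" rule; a single overall consonant count would accept lists that violate it.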

Training took place over three sessions, in each of which 10 of the 30 training words were each presented twice. This low number of training trials per word was chosen to minimize potential ceiling effects on learning. On each training trial, a word was presented for a total of 4000 ms. At 2500 ms, the corresponding auditory word was played (approximately 600 ms long), and the written word remained on the screen for the remaining 1500 ms after the onset of the auditory word. Participants were instructed to say the word aloud at some point during the trial (see **Figure 2**). Although the verbal responses were not recorded, the requirement to say the word ensured attention to the task and aided learning.
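The trial timing can be laid out as a short event list. This sketch assumes, so that the trial totals 4000 ms, that the final 1500 ms of visual presentation is counted from the onset of the auditory word. Inter-trial intervals are not specified in the text, so the per-session total below covers stimulus time only. All names are hypothetical.

```python
# Sketch of one training trial's timeline in ms (names are hypothetical).
# Assumes the written word stays on screen until 4000 ms, i.e. 1500 ms after audio onset.
TRIAL_MS = 4000
AUDIO_ONSET_MS = 2500
AUDIO_DUR_MS = 600  # approximate duration of the auditory word

WORDS_PER_SESSION = 10
PRESENTATIONS_PER_WORD = 2


def trial_events():
    """Return (time_ms, event) pairs for one training trial."""
    return [
        (0, "written word appears"),
        (AUDIO_ONSET_MS, "auditory word starts"),
        (AUDIO_ONSET_MS + AUDIO_DUR_MS, "auditory word ends (approx.)"),
        (TRIAL_MS, "written word disappears"),
    ]


def session_trials() -> int:
    """Training trials per session: 10 new words, each presented twice."""
    return WORDS_PER_SESSION * PRESENTATIONS_PER_WORD


def session_display_ms() -> int:
    """Total stimulus time per session, ignoring unspecified inter-trial intervals."""
    return session_trials() * TRIAL_MS


if __name__ == "__main__":
    for t, event in trial_events():
        print(t, event)
```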

After each training block, the entire set of 30 training words and one set of transfer words were tested. As such, the number of words participants had been explicitly trained on prior to testing differed across training days. In the first and second training sessions, 20/30 and 10/30 of the training words, respectively, were similar to transfer words in that participants had not yet learned their correct pronunciation; however, participants had previously been exposed to these symbol strings during the baseline test. During each test trial, a written word was presented on the screen for 2000 ms, followed by an auditory word. Participants were asked to determine whether the visual and auditory stimuli corresponded to the same word and pressed a button to respond. Participants were not provided feedback on these tests. Each visual word was presented twice: once matched with its correct audio and once mismatched. The foil for a target word was a word from the same set that shared at least one letter with the target. To prevent learning from the test, foil pairs were always presented together. A different transfer set was tested after each training block to ensure that transfer words were completely novel at each test. Each test thus consisted of 60 trials of trained words (30 words × 2) and 40 trials of untrained transfer words (20 words × 2).
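The structure of a test block can be sketched as follows. Each word contributes one match trial and one mismatch trial whose auditory foil shares at least one letter with it. This is a hypothetical illustration: foils are drawn at random here, and the study's rule that foil pairs were always presented together is not reproduced. Words are represented as letter tuples.

```python
# Hypothetical sketch of one match/mismatch test block.
# Foils are drawn at random; the study's fixed foil pairing is not reproduced here.
import random


def build_test(words, rng):
    """Return shuffled (visual_word, auditory_word, is_match) trials, two per word."""
    trials = []
    for w in words:
        foils = [f for f in words if f != w and set(f) & set(w)]
        if not foils:
            raise ValueError(f"no foil shares a letter with {w}")
        trials.append((w, w, True))                   # match trial
        trials.append((w, rng.choice(foils), False))  # mismatch trial
    rng.shuffle(trials)
    return trials


def trained_before_test(training_session: int) -> int:
    """Words explicitly trained before the test that follows training session 1, 2 or 3."""
    return 10 * training_session


if __name__ == "__main__":
    example = [("b", "æ", "d"), ("d", "i", "g"), ("g", "æ", "b"), ("b", "i", "g")]
    for trial in build_test(example, random.Random(0)):
        print(trial)
```

With the study's set sizes, this structure yields 30 × 2 = 60 trained-word trials and 20 × 2 = 40 transfer-word trials per test.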

#### tDCS

Direct current was administered using a battery-driven DC stimulator (NeuroConn) via two saline-soaked electrodes (5 cm × 5 cm; 25 cm²). The anode was placed over the L IPL (P3) according to the international 10-20 system for electroencephalography (EEG) electrode placement (Herwig et al., 2003). The cathode (return) electrode was placed over the contralateral supraorbital frontal region. During real stimulation, 1.5 mA of current (current density 0.06 mA/cm²) was delivered for 20 min. During sham stimulation, the current ramped up to 1.5 mA for 30 s and then faded out over 5 s. This procedure allows participants to feel the initial sensations associated with stimulation (e.g., tingling or itching) without inducing any after-effects of stimulation (Nitsche and Paulus, 2000). These stimulation parameters replicate those used in previous reading studies (Turkeltaub et al., 2012; Younger et al., 2016) and are within the safety limits established in prior studies in humans and animals (Iyer et al., 2005; Nitsche et al., 2008; Bikson et al., 2009). All participants watched a silent movie for 20 min during real or sham stimulation (Antal et al., 2007; Gill et al., 2015).
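The reported current density follows directly from the current and electrode area (1.5 mA / 25 cm² = 0.06 mA/cm²). The sketch below reproduces that figure and, as an extra derived quantity that is not reported in the text, the total charge and charge density of a 20-min real session, ignoring ramp periods. Function names are hypothetical.

```python
# Sketch of basic tDCS dosage arithmetic (function names are hypothetical).
# Charge values are derived here, not reported in the text; ramp periods are ignored.


def current_density(current_ma: float, area_cm2: float) -> float:
    """Current density under one electrode in mA/cm^2."""
    return current_ma / area_cm2


def total_charge_coulombs(current_ma: float, duration_min: float) -> float:
    """Total charge delivered: current (A) times duration (s)."""
    return (current_ma / 1000.0) * (duration_min * 60.0)


def charge_density(current_ma: float, duration_min: float, area_cm2: float) -> float:
    """Total charge per unit electrode area in C/cm^2."""
    return total_charge_coulombs(current_ma, duration_min) / area_cm2


if __name__ == "__main__":
    print(current_density(1.5, 25.0))           # real-stimulation parameters
    print(total_charge_coulombs(1.5, 20.0))
    print(charge_density(1.5, 20.0, 25.0))
```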

#### Analysis

Accuracy on trained and transfer words across the six testing sessions was analyzed using a multivariate latent growth curve modeling approach (McArdle and Nesselroade, 2003) in Mplus v7.3 (Muthén and Muthén, 2012). Data were analyzed using full information maximum likelihood (FIML) estimation so that all data, including data from participants with missing sessions, were taken into account. Latent growth curve modeling estimates an intercept, the starting value of a measurement, and a slope representing change from the intercept across measurement points. Accuracy in the baseline session was entered as the initial measurement, or intercept (path weight of 0), for both trained and transfer words. The slope therefore estimated the change in accuracy beyond baseline over the remaining sessions for all participants, regardless of baseline performance. Because the learning curve may not be linear, path weights for the three training sessions were freely estimated, while the path weight for the final testing session (day 5) was fixed to 4. Because no additional training with the artificial orthography occurred between the final testing session and the retention session, the path weight for the retention session was also fixed to 4. Further, we expected the direction of change to reverse between the first five sessions and the retention test, such that accuracy would increase over the first five sessions but decrease at the retention test. We therefore entered an additional slope to model change between the final testing session and the retention test. For this second slope, path weights for all other sessions were fixed to 0 and the retention session was fixed to 1. This approach allowed us to examine the effects of stimulation and skill on acquisition and retention of the new orthography separately.
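The loading (path weight) structure of this piecewise growth model can be written as a 6 × 3 matrix with columns for the intercept, the acquisition slope and the retention slope. The sketch below builds that matrix and computes model-implied session means. It is not Mplus syntax: the three free training loadings and the factor means are hypothetical values chosen for illustration, whereas in the actual analysis they were estimated.

```python
# Sketch of the piecewise latent growth curve loadings (not Mplus syntax).
# Free training loadings and factor means below are hypothetical illustration values.
SESSIONS = ["baseline", "training 1", "training 2", "training 3", "final test", "retention"]


def loading_matrix(free_loadings):
    """Rows: sessions. Columns: intercept, acquisition slope, retention slope."""
    l1, l2, l3 = free_loadings
    return [
        [1.0, 0.0, 0.0],  # baseline: acquisition loading fixed at 0
        [1.0, l1, 0.0],   # training session 1: freely estimated
        [1.0, l2, 0.0],   # training session 2: freely estimated
        [1.0, l3, 0.0],   # training session 3: freely estimated
        [1.0, 4.0, 0.0],  # final test (day 5): fixed at 4
        [1.0, 4.0, 1.0],  # retention: acquisition fixed at 4, retention slope fixed at 1
    ]


def implied_means(loadings, factor_means):
    """Model-implied mean accuracy per session (loading matrix times factor means)."""
    return [sum(l * a for l, a in zip(row, factor_means)) for row in loadings]


if __name__ == "__main__":
    lam = loading_matrix((1.5, 2.5, 3.2))
    for name, mu in zip(SESSIONS, implied_means(lam, (0.5, 0.08, -0.05))):
        print(f"{name:>11}: {mu:.3f}")
```

Because the retention slope loads only on the last session, a negative retention-slope mean lowers the retention estimate without changing the acquisition trajectory, which is what lets acquisition and forgetting be tested separately.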
Model fit was assessed using the root mean square error of approximation (RMSEA) and the Comparative Fit Index (CFI). CFI compares the fit of the target model with that of a null model in which all variables are assumed to be uncorrelated; CFI values range between 0 and 1, with 1 indicating the best fit. RMSEA is an absolute measure of fit based on the discrepancy between the observed and model-implied covariance matrices, with 0 indicating perfect fit of the target model and larger values indicating worse fit. Traditionally, CFI > 0.90 and RMSEA < 0.05 are considered good model fit, whereas CFI values between 0.80 and 0.90 and RMSEA values between 0.05 and 0.08 are generally considered acceptable but suboptimal (Hooper et al., 2008).
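For reference, the sketch below implements the standard textbook formulas for RMSEA and CFI from model and null-model chi-square statistics, plus the "good fit" cut-offs quoted above. Software packages differ in details (for example, some use N rather than N − 1 in the RMSEA denominator), so this is an illustration of the definitions, not a reproduction of Mplus output.

```python
# Sketch of the textbook RMSEA and CFI formulas (not Mplus's exact implementation).
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_null: float, df_null: int) -> float:
    """CFI = 1 - max(chi2_m - df_m, 0) / max(chi2_m - df_m, chi2_null - df_null, 0)."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_null - df_null, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def meets_good_fit(cfi_value: float, rmsea_value: float) -> bool:
    """The 'good fit' cut-offs quoted in the text: CFI > 0.90 and RMSEA < 0.05."""
    return cfi_value > 0.90 and rmsea_value < 0.05


if __name__ == "__main__":
    print(meets_good_fit(0.978, 0.039))  # indices reported for trained words
    print(meets_good_fit(0.787, 0.095))  # indices reported for transfer words
```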

#### Covariates

To determine the effects of covariates on the intercept and slopes, the intercept and slopes were regressed on covariates entered into the model. Covariates of interest were stimulation group, reading skill, and the interaction between reading skill and group. Group was entered as a dummy-coded variable, with 2 representing real stimulation and 4 representing sham. Reading rates centered on the rate corresponding to a mean standard score of 100 (2.177 words per second) were entered to represent reading skill. A group by skill interaction term was computed by multiplying the group dummy variable by the centered reading rate. Additional covariates were entered to control for previously demonstrated effects of age, IQ and sex on stimulation. Age was centered on 18, the youngest age in the sample; IQ was centered on the population mean standard score (100); and sex was dummy coded as 1 or 2. Each of these values was then multiplied by the group dummy variable to obtain an interaction term, and the intercept and slopes were additionally regressed on these three interaction terms.
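The covariate coding can be summarized as one row of predictors per participant. The sketch below builds such a row using the codes and centering values given in the text; the function and key names are hypothetical, and which sex is coded 1 versus 2 is not stated, so the sex code is passed through as given.

```python
# Sketch of one participant's covariate row (function and key names are hypothetical).
# Group codes and centering values follow the text; the sex code (1 or 2) is passed through.
GROUP_CODE = {"real": 2, "sham": 4}
RATE_CENTER = 2.177  # words/s at a standard score of 100
AGE_CENTER = 18      # youngest age in the sample
IQ_CENTER = 100      # population mean standard score


def covariate_row(group: str, rate: float, age: float, iq: float, sex_code: int) -> dict:
    """Return main effects and group interaction terms as entered in the model."""
    g = GROUP_CODE[group]
    skill = rate - RATE_CENTER
    age_c = age - AGE_CENTER
    iq_c = iq - IQ_CENTER
    return {
        "group": g,
        "skill": skill,
        "group_x_skill": g * skill,
        "age": age_c,
        "iq": iq_c,
        "sex": sex_code,
        "group_x_age": g * age_c,
        "group_x_iq": g * iq_c,
        "group_x_sex": g * sex_code,
    }


if __name__ == "__main__":
    print(covariate_row("sham", 2.277, 20, 110, 1))
```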

#### Missing Data

Not all participants had usable data from all testing sessions. Seventeen participants (nine stimulation, seven sham group) did not complete the retention test session. In some cases, individual responses were not recorded due to technical errors or slow responses. Trials were excluded if the response time was less than 300 ms or no response was recorded (including responses on keys other than the instructed response keys). Data from a testing session were considered unusable, and entered into the latent growth curve model as missing, if so many responses were missing that performance could no longer be reliably distinguished from chance (more than 22 and 13 missing responses for the trained and transfer tests, respectively). Thus, at all included time points, participants responded to at least 63% (trained) and 67% (transfer) of trials, whether correctly or incorrectly. All participants had at least three time points of usable data, and missingness was not systematically related to reading skill or stimulation group. The number of participants at each time point ranged from 45 (the retention test) to the full set of 63. All time points met minimum covariance coverage (10%), with values ranging from 68.3% to 100%.
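The trial- and session-level exclusion rules can be expressed as a short filter. This sketch assumes that "missing responses" means trials excluded under the trial-level rule (no valid keypress, or a response faster than 300 ms). The response keys are hypothetical. The minimum response rates it computes, 38/60 ≈ 63.3% and 27/40 = 67.5%, correspond to the percentages given above.

```python
# Sketch of the missing-data rules (response keys are hypothetical).
# Assumes "missing responses" = trials excluded by the trial-level rule.
MIN_RT_MS = 300
N_TRIALS = {"trained": 60, "transfer": 40}
MAX_MISSING = {"trained": 22, "transfer": 13}


def is_valid_trial(rt_ms, key, allowed_keys) -> bool:
    """A trial counts only if a permitted key was pressed at least 300 ms after onset."""
    return rt_ms is not None and rt_ms >= MIN_RT_MS and key in allowed_keys


def session_usable(trials, test: str, allowed_keys) -> bool:
    """trials: list of (rt_ms or None, key or None). Unusable if missing > threshold."""
    missing = sum(not is_valid_trial(rt, key, allowed_keys) for rt, key in trials)
    return missing <= MAX_MISSING[test]


def min_response_rate(test: str) -> float:
    """Smallest proportion of answered trials in a usable session."""
    return (N_TRIALS[test] - MAX_MISSING[test]) / N_TRIALS[test]


if __name__ == "__main__":
    print(round(min_response_rate("trained"), 3), round(min_response_rate("transfer"), 3))
```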

## RESULTS

Standardized parameter estimates of each covariate on the intercept and training and transfer slopes are reported in **Table 2**. Standardized parameters indicate the estimated standard deviation change in intercept and slopes given one standard deviation change in the predictor variable.
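As a reminder of how such standardized values relate to raw regression coefficients, the sketch below applies the usual fully standardized conversion, b × SD(predictor) / SD(outcome), and its inverse. For latent intercepts and slopes, the outcome SD would be the model-implied SD of that growth factor. The numbers in the example are hypothetical, not estimates from Table 2.

```python
# Sketch of the usual fully standardized conversion (example values are hypothetical).


def standardize(b: float, sd_x: float, sd_y: float) -> float:
    """SD change in the outcome per 1-SD change in the predictor."""
    return b * sd_x / sd_y


def unstandardize(beta: float, sd_x: float, sd_y: float) -> float:
    """Invert the conversion to recover the raw-unit coefficient."""
    return beta * sd_y / sd_x


if __name__ == "__main__":
    beta = standardize(0.5, 2.0, 4.0)
    print(beta, unstandardize(beta, 2.0, 4.0))
```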

## Trained Words

Model fit indices indicated that the model fit the trained-word data well (RMSEA = 0.039; CFI = 0.978). Significant effects of skill and of the group by skill interaction on the intercept indicated that higher skill readers tended to perform better at baseline. However, within the stimulation group, lower skill readers tended to show the lowest baseline performance, whereas within the sham group, higher skill readers tended to show the lowest baseline performance.

TABLE 2 | Parameter estimates (standard error) for each covariate on the intercept, training slope and retention slope for trained and transfer words.

Positive effect of group indicates an advantage for sham, and a negative effect of group indicates an advantage for stimulation. Skill had a significant effect and interaction with group on training slopes for both trained and transfer words. Stimulation affected the retention slope for trained words, while skill affected the retention slope for transfer words. *p < 0.05.

FIGURE 3 | Model estimated training (A) and retention (B) slopes for trained words. During training (A), low skill readers (blue) benefitted from real stimulation (solid), showing steeper learning curves than those who received sham stimulation (dashed). High skill readers (red) showed smaller training-related gains following real stimulation (solid) than following sham stimulation (dashed). During retention (B), those who received real stimulation (solid) showed less forgetting than those who received sham stimulation (dashed). Plots reflect the model estimated performance for an 18-year-old male with average intelligence (reflecting mean-centered scores of 0) at two standard deviations below (low) and above (high) the group mean reading skill.

There were significant effects of skill and of the group by skill interaction on the training slope after controlling for significant interactions between stimulation group and both age and IQ. The negative parameter estimate for skill indicates that the training slope became smaller as skill increased. Because skill was treated as a continuous variable, we interpreted the interaction by using the model to estimate the effect of group on the training slope at three skill levels: the sample mean of the centered reading rate and two standard deviations above and below it (mean 0.111; SD 0.309). There was a significant effect of group favoring stimulation at the lower skill level, but a significant effect favoring sham at the higher skill level. Thus, stimulation benefited the training slope for lower skill readers but stunted it for higher skill readers. Given the significant effects of variables of no interest (such as the interaction between stimulation group and age), results were visualized by calculating the model-estimated performance of the same hypothetical participant across stimulation groups and skill levels. In this way, the visualization shows the effects of stimulation group and skill in the absence of any effects of demographic variables. **Figure 3A** shows the model-predicted performance for an 18-year-old male with average IQ (reflecting centered scores of 0 for these covariates of no interest) and a reading rate either two standard deviations below (low skill) or above (high skill) the sample mean centered reading rate. All subsequent plots use these same parameters. Note that despite differences in intercept (baseline performance), slope effects are calculated assuming an intercept of 0. As such, slope would only be affected by baseline performance if participants reached a ceiling for accuracy, preventing further improvement. As **Figure 3** shows, participants did not reach ceiling; indeed, the group with the highest baseline accuracy achieved only the third highest accuracy at the final testing session.
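The simple-slope procedure for the group by skill interaction can be sketched as follows. The effect of group on the training slope at a given centered skill value s is b_group + b_interaction × s, and its standard error follows from the sampling variances and covariance of the two estimates. The probe levels use the sample mean and SD of the centered reading rate given above; the coefficients and (co)variances in the example are hypothetical. The sketch also assumes the other group interaction terms (age, IQ, sex) contribute nothing at the covariate values being probed.

```python
# Sketch of simple slopes for the group x skill interaction.
# Coefficients and (co)variances passed in are hypothetical; the probe levels use
# the sample mean and SD of the centered reading rate reported in the text.
import math

SKILL_MEAN, SKILL_SD = 0.111, 0.309


def probe_levels(mean: float = SKILL_MEAN, sd: float = SKILL_SD, k: float = 2.0) -> dict:
    """Centered skill values at which the group effect is evaluated."""
    return {"low": mean - k * sd, "average": mean, "high": mean + k * sd}


def simple_slope(b_group: float, b_interaction: float, skill: float) -> float:
    """Effect of group on the training slope at a given centered skill value."""
    return b_group + b_interaction * skill


def simple_slope_se(var_group: float, var_interaction: float,
                    cov_group_interaction: float, skill: float) -> float:
    """SE from Var(b_g) + s^2 Var(b_gs) + 2 s Cov(b_g, b_gs)."""
    return math.sqrt(var_group + skill ** 2 * var_interaction
                     + 2 * skill * cov_group_interaction)


if __name__ == "__main__":
    for name, s in probe_levels().items():
        est = simple_slope(0.1, -0.4, s)  # hypothetical coefficients
        se = simple_slope_se(0.01, 0.04, -0.005, s)
        print(f"{name:>7}: skill={s:+.3f} effect={est:+.3f} z={est / se:+.2f}")
```

Under the coding described in the Covariates section (2 = real, 4 = sham), a negative value at a given skill level favors stimulation and a positive value favors sham.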

There was no effect of skill on the retention slope; rather, there was a significant effect of stimulation group after controlling for a significant group by age interaction. The sham group showed a steeper negative retention slope than the stimulation group. Thus, regardless of skill level, the stimulation group forgot less in the interval between the final testing session and the retention test (see **Figure 3B**).

#### Transfer Words

Model fit indices indicated that the model fit the transfer-word data less well than the trained-word data (RMSEA = 0.095; CFI = 0.787). Given work showing that model fit indices tend to over-reject acceptable models in samples smaller than 100 (Kenny et al., 2015), the model was considered acceptable. The same pattern of results was found for the transfer-word intercept: higher skill readers tended to have higher baseline performance, and the interaction showed the same pattern within each group.

FIGURE 4 | Model estimated training (A) and retention (B) slopes for transfer words. During training (A), low skill readers (blue) who received real stimulation (solid) showed steeper learning curves for transfer to novel words than those who received sham stimulation (dashed). High skill readers (red) were less able to transfer letter knowledge to novel words following real stimulation (solid) than following sham stimulation (dashed). During retention (B), high skill readers (red), regardless of stimulation group, showed less decline in transfer performance than low skill readers (blue). Plots reflect the model estimated performance for an 18-year-old male with average intelligence (reflecting mean-centered scores of 0) at two standard deviations below (low) and above (high) the group mean reading skill.

TABLE 3 | Parameter estimates (standard error) for the effect of group for lower, average and higher skill readers.

Positive effect of group indicates an advantage for sham, and a negative effect of group indicates an advantage for stimulation. Stimulation improved the training curve for low skill readers but interfered with learning for high skill readers. *p < 0.05.

The training slope for transfer words also showed similar effects of skill and of the group by skill interaction, although the only additional significant effect was a group by IQ interaction, not the group by age interaction found for trained words. Skill again had a negative effect on the training slope for transfer words. To determine the direction of the group by skill interaction, we performed the same simple slope calculations as for the trained words. We obtained a similar pattern of results, with stimulation tending to benefit the training slope at lower levels of reading skill and to stunt it at higher levels (see **Figure 4A**). However, in this case, the effect of group at the lower reading skill level was not significant (see **Table 3**).

On the retention slope, there was a significant effect of skill but not of group, in contrast to the trained-word model. Reading skill had a positive effect on the retention slope for transfer words, indicating that poorer readers showed a greater decrease in transfer-word performance between the final testing session and the retention test (see **Figure 4B**). Stimulation had no effect on retention, nor did it interact with skill to significantly affect retention.

## DISCUSSION

The goal of this study was to determine whether parietotemporal stimulation could improve learning and long-term retention of new grapheme-phoneme relationships in adults with lower reading skill. As predicted, parietotemporal stimulation improved acquisition rates for lower skill adults. Yet it negatively impacted the learning curves of higher skill adults. The effects of stimulation also transferred to untrained material, with stimulation benefitting the transfer word learning curves of lower skill readers and impairing those of higher skill readers. Further, stimulation improved long-term retention of trained material across all skill levels. This study supports prior research showing that pre-learning parietotemporal activity predicts response to reading intervention. It also goes beyond previous orthographic learning studies, which have shown that training affects parietotemporal cortex activity, by suggesting that parietotemporal activity can in turn affect new learning, including transfer and long-term retention.

That stimulation affected individuals of varying skill levels differently suggests that our readers varied in the composition of their reading network at baseline, most likely in the parietotemporal area targeted by stimulation. By manipulating parietotemporal function, we provide evidence supporting the importance of this region for word learning from explicit instruction (Wong et al., 2007; Richardson et al., 2010; López-Barroso et al., 2013). The results of the current study suggest that, for adult learners, new learning depends on an optimal balance between semantic, phonological and orthographic information. Connectionist models of reading suggest that semantics is reached via two pathways: an orthography to semantics pathway and an orthography to phonology to semantics pathway. Both pathways contribute to word reading, but the division of labor between them differs depending on the type of word being read (e.g., exception words, high frequency words, pseudowords; Harm and Seidenberg, 1999, 2004; Seidenberg, 2005). The phonologically mediated pathway is less efficient but initially dominant when learning to read, whereas the more efficient orthography to semantics pathway is formed and strengthened over time. Even when the orthography to semantics pathway is fully formed, the phonologically mediated pathway remains a significant contributor to word reading, with the sum of outputs from the two pathways being greater than the output of either pathway on its own (Harm and Seidenberg, 2004). According to this model, one reason for lower reading skill may be a weaker phonologically mediated pathway. Lower skill but non-impaired readers, such as the readers in the current study, may still achieve reasonable reading skill by relying more on the orthography to semantics pathway. The orthography to semantics pathway thus plays a dominant role regardless of word type, which ultimately results in an overall less efficient reading network (Harm and Seidenberg, 2004). In our study, parietotemporal stimulation likely strengthened this phonologically mediated pathway, resulting in better learning in lower skill readers. However, the same boost to an already strong phonologically mediated pathway in higher skill readers may have made this less efficient pathway a stronger contributor throughout learning, preventing the more efficient orthography to semantics pathway from contributing effectively as it developed later in learning. Indeed, neural connectivity studies in typical adult readers have suggested that readers who tend to rely on one processing stream regardless of word type are more likely to have lower reading ability than readers whose neural strategy shifts depending on word type (Levy et al., 2009). Thus, readers who continued to rely on the phonologically mediated pathway could successfully acquire the orthography, but at a slower rate than readers who were able to shift the division of labor between the two pathways over the course of learning.

Stimulation had a positive effect on learning grapheme-phoneme relationships, but only for readers with lower initial reading skill, as measured by real word reading fluency. These findings underscore the importance of considering baseline performance when determining the effect of stimulation, and may reconcile conflicting reports of the effect of stimulation on reading in healthy adults (Turkeltaub et al., 2012; Thomson et al., 2015; Younger et al., 2016; Westwood and Romani, 2017). Studies examining either lower skill adults or adults with dyslexia have demonstrated positive effects of left hemisphere stimulation on reading ability (Turkeltaub et al., 2012; Younger et al., 2016). However, two studies found a null effect on reading ability after left hemisphere stimulation, with one showing a positive effect after right hemisphere stimulation (Thomson et al., 2015; Westwood and Romani, 2017). Both of these studies, though, examined adults within the typical range of reading ability and did not account for individual differences in baseline performance. As suggested by previous research, examining all skill levels together may have diluted the effects of stimulation, resulting in a null effect (Benwell et al., 2015; Hsu et al., 2016).

The differential effect of stimulation depending on baseline skill level is consistent with previous stimulation studies as well (Tseng et al., 2012; Benwell et al., 2015; Hsu et al., 2016; Katz et al., 2017), yet the results of our study extend these studies in an important way. Previous research has indicated potential diminishing returns of stimulation, with the benefit of stimulation decreasing as baseline performance increases (Tseng et al., 2012; Katz et al., 2017). Our study shows not just diminishing returns but a significant negative effect of stimulation as skill increases. While other studies have shown that anodal stimulation, generally thought to have a positive effect on behavior, can in some cases have a negative effect (Antal et al., 2007; Jacobson et al., 2012; Sandrini et al., 2012), to our knowledge ours is the first study showing that anodal stimulation can have a positive effect for some individuals and a negative effect for others, depending on baseline skill level (though see Wiethoff et al., 2014 for other examples of individual differences in the direction of effect). This result supports our previous findings reported in Younger et al. (2016), in which we demonstrated that stimulation can have a negative and not just a null effect.

The effects of parietotemporal stimulation extended beyond explicitly trained words to novel transfer words. Although the effect did not reach significance for the lower skill readers, the same pattern of effects was found for transfer words as for trained words. These results suggest that parietotemporal stimulation affected learning of grapheme-phoneme mappings at the letter level, and did not simply improve rote memorization of trained whole word forms. Previous orthographic learning studies have shown that transfer depends on the type of instruction received during training (Bitan et al., 2005; Cao et al., 2013; Mei et al., 2014; Hirshorn et al., 2016; Taylor et al., 2017), even when training is not on individual letters but on entire word forms (Yoncheva et al., 2010, 2015). Yoncheva et al. (2010) taught participants to read words using the same orthography, but directed attention to grapheme-phoneme mappings at either the letter level or the word level. While both groups achieved high accuracy on explicitly trained words, only the group whose attention was directed towards letter-level mappings was able to identify novel words (Yoncheva et al., 2010). In the current study, all participants received the same instructions, with explicit attention to the letter-level mappings embedded within the words. Transfer ability was thus modulated not by instruction, but by individual differences in pre-training reading skill and by parietotemporal stimulation. Therefore, individual differences in skill and neural function prior to training influence the learning of grapheme-phoneme mappings that transfer to untrained material.

Despite the skill by stimulation interaction on acquisition rates, parietotemporal stimulation benefitted retention of trained material across all skill levels. This result suggests that parietotemporal stimulation may have differential effects on initial learning and consolidation, and that these two stages interact with baseline skill differently. Previous studies examining the effect of stimulation during cognitive training have shown differential effects on initial and later performance (Reis et al., 2009, 2015; Martin et al., 2014), possibly due to a specific effect on consolidation (Alonzo et al., 2012). In some cases, there are no immediate effects of stimulation, and benefits emerge only after a delay period (Antonenko et al., 2018). Thus, while parietotemporal stimulation interacted with skill to affect acquisition, stimulation may be more universally beneficial to the consolidation of learned material. However, the long-term benefits of stimulation were seen only for explicitly trained words and did not extend to novel words. Although transfer effects of tDCS are inconsistent, several studies, including Costanzo et al. (2016b), have shown long-term benefits of stimulation on tasks that were not performed during the initial training period (for review see Berryhill, 2017). One possible explanation for the lack of maintained transfer effects in the current study is the spacing of stimulation sessions. Work examining tDCS-enhanced working memory training has shown that stimulation has a greater effect when sessions are spaced a few days apart (Au et al., 2016). Most stimulation sessions in the current study took place on consecutive days, and consecutive sessions were never more than 48 h apart. In contrast, Costanzo et al. (2016b) delivered three stimulation sessions over the course of a week. Thus, not only the type of training but also the timing of stimulation sessions may be an important factor in the optimal design of a tDCS-facilitated intervention.

The current study provides promising evidence that parietotemporal stimulation can enhance training on grapheme-phoneme mapping for lower skill readers. Yet it does not allow us to make a definitive statement regarding the specificity of parietotemporal stimulation or the underlying source of these behavioral effects. We chose to stimulate the parietotemporal cortex given its demonstrated role in grapheme-phoneme mapping. However, this area is also associated with cognitive skills such as visual attention, which can also influence reading skill (Bosse et al., 2007; Shaywitz and Shaywitz, 2008; Vidyasagar and Pammer, 2010; Gabrieli and Norton, 2012; Heim et al., 2015). Studies using a similar target site have also shown that stimulation can affect visual attention (Minamoto et al., 2014) and working memory (Hill et al., 2016; Trumbo et al., 2016; Möller et al., 2017). These cognitive mechanisms are related to grapheme-phoneme processing, and thus may mediate the relationship between reading skill, parietotemporal stimulation and grapheme-phoneme mapping. More comprehensive profiles of reading ability may provide additional insight into the type of reader most likely to respond to stimulation-enhanced training. Further, the effects of stimulation can spread to regions functionally and structurally connected to the target region (Turi et al., 2012; Bikson et al., 2013; Park et al., 2013; Choe et al., 2016). It is therefore possible that stimulation also affected related reading regions such as the inferior frontal gyrus and occipitotemporal cortex. Conversely, stimulation of any of these connected regions might produce the same behavioral effects. These spreading effects may have combined with direct stimulation of the parietotemporal cortex to affect learning more than stimulation of the parietotemporal cortex in isolation would have.
Given the anatomical and functional connection between parietotemporal and occipitotemporal cortex (Yeatman et al., 2013), parietotemporal cortex stimulation may be more beneficial to reading skill compared to other stimulation targets (Younger et al., 2016). Neuroimaging data could be used to address how neuroanatomy interacts with stimulation to affect behavior.

## CONCLUSION

The current study provides evidence that the parietotemporal cortex plays an influential role in learning grapheme-phoneme mappings. Parietotemporal stimulation enhanced acquisition of letter-sound mappings of a novel orthography in lower skill readers, and this knowledge was both generalized to untrained material and maintained over a delay period. Thus, parietotemporal stimulation may be an effective tool to support reading instruction for those who struggle by both enhancing existing grapheme-phoneme mappings and supporting the acquisition of new ones. However, stimulation did not benefit all readers equally; higher skill readers were negatively affected, possibly because stimulation interfered with the optimal division of labor between processing pathways. Thus, while parietotemporal function is critical to new learning, its role in continued reading improvement likely changes as readers progress in skill.

## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Institutional Review Board at the University of Texas at Austin with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Institutional Review Board at the University of Texas at Austin.

## AUTHOR CONTRIBUTIONS

JWY and JRB conceived and designed the experiments; analyzed and interpreted the data. JWY performed the experiments; drafted the manuscript. JRB revised the manuscript. All the authors read and approved the final version of the manuscript.

## ACKNOWLEDGMENTS

This research was supported by the University of Texas at Austin. The authors would also like to thank the numerous research assistants who assisted in data collection throughout the study.

## REFERENCES


## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2018.00109/full#supplementary-material



**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Younger and Booth. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Selective Deficit in Phonetic Recalibration by Text in Developmental Dyslexia

#### Mirjam Keetels<sup>1</sup> \*, Milene Bonte<sup>2</sup> and Jean Vroomen<sup>1</sup>

<sup>1</sup> Cognitive Neuropsychology Laboratory, Department of Cognitive Neuropsychology, Tilburg University, Tilburg, Netherlands, <sup>2</sup> Maastricht Brain Imaging Center, Department Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands

Upon hearing an ambiguous speech sound, listeners may adjust their perceptual interpretation of the speech input in accordance with contextual information, like accompanying text or lipread speech (i.e., phonetic recalibration; Bertelson et al., 2003). As developmental dyslexia (DD) has been associated with reduced integration of text and speech sounds, we investigated whether this deficit becomes manifest when text is used to induce this type of audiovisual learning. Adults with DD and normal readers were exposed to ambiguous consonants halfway between /aba/ and /ada/ together with text or lipread speech. After this audiovisual exposure phase, they categorized auditory-only ambiguous test sounds. Results showed that individuals with DD, unlike normal readers, did not use text to recalibrate their phoneme categories, whereas their recalibration by lipread speech was spared. Individuals with DD demonstrated similar deficits when ambiguous vowels (halfway between /wIt/ and /wet/) were recalibrated by text. These findings indicate that DD is related to a specific letter-speech sound association deficit that extends over phoneme classes (vowels and consonants), but – as lipreading was spared – does not extend to a more general audio–visual integration deficit. In particular, these results highlight diminished reading-related audiovisual learning in addition to the commonly reported phonological problems in developmental dyslexia.

#### Edited by:

Jurgen Tijms, University of Amsterdam, Netherlands

#### Reviewed by:

Jarmo Hamalainen, University of Jyväskylä, Finland

Marie Lallier, Basque Center on Cognition, Brain and Language, Spain

#### \*Correspondence:

Mirjam Keetels M.N.Keetels@uvt.nl; M.N.Keetels@TilburgUniversity.edu

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 12 February 2018; Accepted: 23 April 2018; Published: 15 May 2018

#### Citation:

Keetels M, Bonte M and Vroomen J (2018) A Selective Deficit in Phonetic Recalibration by Text in Developmental Dyslexia. Front. Psychol. 9:710. doi: 10.3389/fpsyg.2018.00710

Keywords: phonetic recalibration, orthographic information, dyslexia, letters, speech perception

#### INTRODUCTION

Children learn to associate graphemes with speech sounds during reading acquisition. The automatic coupling of graphemes with speech sounds is crucial for becoming a fluent reader in an alphabetic script. Although most children successfully master these skills, individuals with developmental dyslexia (DD) experience difficulties in reading and spelling despite adequate intelligence and intact sensory abilities (Lyon et al., 2003). Mounting evidence suggests that individuals with DD show deficits in grapheme-phoneme or letter-speech sound associations (Blau et al., 2009, 2010; Froyen et al., 2011; McNorgan et al., 2013; Zaric et al., 2014), in addition to commonly observed phonological processing difficulties (Snowling, 2000; Ramus, 2003; Blomert, 2011).

Blau et al. (2009) were the first to demonstrate these letter-speech sound integration impairments in a functional magnetic resonance imaging (fMRI) study. Adult dyslexic and fluent age-matched readers were presented with letters and speech sounds either in isolation (visual or auditory) or combined (congruent or incongruent). As in earlier studies, fluent readers showed enhanced superior temporal gyrus activation for congruent letter-speech sound pairs compared to incongruent pairs (Van Atteveldt et al., 2004), indicating automatic detection of letter-speech congruencies. Blau et al. (2009), however, did not find such a congruency effect in adult dyslexic readers, indicating reduced letter-speech sound integration (see also Blau et al., 2010 for similar findings in dyslexic children).

Studies using electroencephalography (EEG) have further investigated the neural time course of letter-speech integration deficits in individuals with DD. These studies have typically used an audiovisual variant of the oddball paradigm. In the classical oddball paradigm, a mismatch negativity (MMN) response is evoked between 100 and 250 ms after the onset of a deviant sound presented in a sequence of repeating standard stimuli (see Näätänen et al., 2007 for a review). Using an audiovisual oddball paradigm, Froyen et al. (2008) demonstrated that normal readers show an enhanced MMN response to a deviant speech sound /o/ in a stream of standard speech sounds /a/ when both the standard and deviant sounds are presented together with the letter 'a' (compared to the MMN in an auditory-only condition without letter stimuli). This enhanced audiovisual MMN indicates that in fluently reading adults, letters and speech sounds are integrated early and automatically. Furthermore, these audiovisual effects have been shown to appear gradually in typically reading children after several years of reading instruction (Froyen et al., 2009; Zaric et al., 2014), whereas they are reduced or absent in children with dyslexia (Froyen et al., 2011; Zaric et al., 2014). Both reading-related audiovisual effects in typically reading children and their reduction in dyslexia have also been reported in EEG and fMRI studies using other paradigms and different types of stimuli, including individual letters/speech sounds, syllables or words (McNorgan et al., 2013; Mittag et al., 2013; Kronschnabel et al., 2014; Zaric et al., 2014; Moll et al., 2016). For contradictory results, however, see Nash et al. (2017).
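The MMN measure described above is conventionally quantified from a deviant-minus-standard difference wave. The sketch below is not the analysis used in any of the cited studies; it computes the mean difference-wave amplitude in the 100–250 ms window mentioned in the text and assumes baseline-corrected, trial-averaged ERPs from a single electrode, sampled at common time points. The toy waveforms are invented, and for simplicity the same flat standard is reused for both conditions. A more negative MMN in the letter condition than in the auditory-only condition corresponds to the audiovisual enhancement.

```python
def mean_mmn_amplitude(times_ms, standard_uv, deviant_uv, window_ms=(100.0, 250.0)):
    """Mean amplitude (microvolts) of the deviant-minus-standard wave inside a window."""
    if not (len(times_ms) == len(standard_uv) == len(deviant_uv)):
        raise ValueError("time axis and waveforms must have equal length")
    lo, hi = window_ms
    diff_in_window = [dev - std
                      for t, std, dev in zip(times_ms, standard_uv, deviant_uv)
                      if lo <= t <= hi]
    if not diff_in_window:
        raise ValueError("no samples fall inside the analysis window")
    return sum(diff_in_window) / len(diff_in_window)


# Toy waveforms sampled every 50 ms (invented values, not real data).
times = [0, 50, 100, 150, 200, 250, 300, 350, 400]
standard = [0.0] * len(times)
deviant_auditory_only = [0.0, 0.0, -1.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0]
deviant_with_letter = [0.0, 0.0, -2.0, -2.0, -2.0, -2.0, 0.0, 0.0, 0.0]

mmn_a = mean_mmn_amplitude(times, standard, deviant_auditory_only)
mmn_av = mean_mmn_amplitude(times, standard, deviant_with_letter)
# A negative difference means a larger (more negative) audiovisual MMN.
print(mmn_a, mmn_av, mmn_av - mmn_a)
```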

A key question in current research on dyslexia concerns the domain-specificity of this audiovisual deficit: is it restricted to matching graphemes with phonemes, or is it a more general deficit in the integration of audiovisual information (Hahn et al., 2014)? At this point, findings in the literature are contradictory. Some studies suggest that individuals with DD have problems with more general audio–visual integration processes. For example, in a reaction time experiment, Harrar et al. (2014) showed that individuals with DD have problems with multisensory integration of simple non-linguistic stimuli, which would be indicative of a more general multisensory deficit. Using an ERP paradigm in which visual symbol patterns had to be matched with predicted sound patterns, Widmann et al. (2012) also showed that dyslexic children had difficulty forming unitary audiovisual object representations (though see Widmann et al., 2014, who showed that the gamma response in the audio–visual task is mostly due to microsaccades). Studies using lipread speech have found mixed results. For example, Baart et al. (2012) demonstrated comparable phonetic recalibration effects by lipread speech in dyslexic and fluent readers. De Gelder and Vroomen (1998), however, reported that poor readers were also poor lipreaders, and a recent study by Van Laarhoven et al. (2018) found that both children and adults with DD have deficits in the ability to benefit from lipread speech when speech was presented in background noise (see also Hayes et al., 2003; Ramirez and Mann, 2005). Taken together, evidence on the domain-specificity of audio–visual association deficits in DD is not consistent at this point. Furthermore, audiovisual integration has typically been studied using either text, lipread speech, or non-linguistic information, without direct comparisons of these different types of information within the same groups.

In the present study we investigate the domain-specificity of the audiovisual processing deficit in DD by comparing the influence of written text and lipread speech on the perception of ambiguous speech sounds. If the deficit reflects a more general audiovisual deficit, impaired audiovisual processing in dyslexia should be observed with both types of information. We used lipread speech as a comparison stimulus because, like text, it involves visual information that matches speech sounds. Importantly, however, letters differ from lipread speech in that letter-speech sound combinations are arbitrary, culturally determined, and need explicit training during literacy acquisition (Liberman, 1992); some studies even challenge the idea that written text may influence speech perception (Mitterer and Reinisch, 2015). In contrast, the association between lip movements and speech sounds does not need to be learned explicitly, as there are strong biological constraints between perception and production (Kuhl and Meltzoff, 1982).

In the current study, either written text or lipread speech was presented together with ambiguous speech sounds during an exposure phase to induce phonetic recalibration. The context information (text or lipread speech) is thought to induce a shift in the perception of the ambiguous speech sound in order to reduce the intersensory conflict. This shift can then be measured as an aftereffect with subsequently presented ambiguous speech sounds. Phonetic recalibration was first demonstrated by Bertelson et al. (2003) who used an ambiguous speech sound halfway between /aba/ and /ada/ (henceforth: A?) dubbed onto the video of a face articulating either /aba/ or /ada/ (henceforth: VbA? or VdA?, where Vb = visual 'aba' stimulus, Vd = visual 'ada' stimulus, and A? = ambiguous auditory stimulus). Results showed that after exposure to an ambiguous speech sound combined with the video of a face articulating /aba/ (exposure to VbA?), an auditory-only ambiguous test sound was perceived as more /b/-like than after exposure to that same ambiguous sound combined with an /ada/ video (exposure to VdA?). The common interpretation is that lipread speech shifts the interpretation of the ambiguous sounds in order to reduce the intersensory conflict. This shift is thus observable as an aftereffect. Further research has also shown that this shift induced by lipread speech can be decoded in auditory cortical activity patterns (Kilian-Hutten et al., 2011).

In order to control for a simple response bias or a priming effect reflecting that a particular phoneme was heard in the previous exposure phase (e.g., participants respond /d/ simply because they heard /d/ in the preceding exposure phase), we included, as in Bertelson et al. (2003; Experiment 2), audiovisual exposure stimuli that do not induce recalibration, namely audiovisual congruent stimuli with auditory non-ambiguous sounds: VbAb and VdAd. VbAb and VdAd do not induce recalibration because there is no conflict between the heard and lipread information that would induce a shift in the phoneme boundary. In previous studies, these stimuli have sometimes induced contrastive aftereffects, in which responses go in the direction opposite to the exposure stimuli (i.e., fewer /b/ responses after exposure to VbAb than VdAd), indicative of selective speech adaptation (Eimas and Corbit, 1973). This effect is usually quite small, however, as selective speech adaptation requires larger amounts of exposure (Vroomen et al., 2007).
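The aftereffect logic of these exposure conditions can be expressed as a difference in the proportion of /b/ responses to the auditory-only test sounds after the two exposure types. The sketch below assumes each test trial is coded as 'b' or 'd'; the response lists are invented, and the function and variable names are illustrative only. A positive difference after VbA? versus VdA? exposure indicates recalibration, while a negative difference after VbAb versus VdAd exposure indicates a contrastive (selective adaptation) aftereffect.

```python
def proportion_b(responses):
    """Proportion of test trials categorized as /b/."""
    if not responses:
        raise ValueError("no responses")
    return sum(1 for r in responses if r == "b") / len(responses)


def aftereffect(after_b_exposure, after_d_exposure):
    """Proportion of /b/ responses after /b/-type exposure minus after /d/-type exposure."""
    return proportion_b(after_b_exposure) - proportion_b(after_d_exposure)


# Invented responses to ambiguous auditory-only test sounds.
after_VbA_amb = ["b"] * 7 + ["d"] * 3  # following exposure to VbA?
after_VdA_amb = ["b"] * 4 + ["d"] * 6  # following exposure to VdA?
after_VbAb = ["b"] * 4 + ["d"] * 6     # following exposure to VbAb
after_VdAd = ["b"] * 5 + ["d"] * 5     # following exposure to VdAd

recalibration = aftereffect(after_VbA_amb, after_VdA_amb)  # > 0: recalibration
adaptation = aftereffect(after_VbAb, after_VdAd)           # < 0: contrastive aftereffect
print(f"recalibration: {recalibration:+.2f}, adaptation: {adaptation:+.2f}")
```

Computing both differences with the same function makes the control logic explicit: a pure response bias toward the exposure phoneme would push both differences in the same (positive) direction, whereas recalibration and selective adaptation predict opposite signs.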

Phonetic recalibration by lipread speech has now been replicated many times, also in other laboratories with other tokens and other phonemes (Samuel and Kraljic, 2009; Kilian-Hutten et al., 2011; Reinisch et al., 2014; Kleinschmidt and Jaeger, 2015). Most relevant for the present study is that phonetic recalibration can also be induced by orthographic information (Keetels et al., 2016). As with lipread speech, normal readers adjust their phoneme boundary if an ambiguous speech sound is accompanied by text that specifies what the ambiguous phoneme should be. Recently, Bonte et al. (2017) replicated this text-induced recalibration effect in an fMRI paradigm and furthermore showed that it was accompanied by subtle changes in auditory cortical activity. More specifically, their results showed that it was possible to consistently predict whether participants perceived the same ambiguous speech sounds as /aba/ or /ada/ based on activity patterns in the posterior superior temporal gyrus (STG). This finding indicates that letter-speech sound associations can adjust the auditory cortical representation of ambiguous speech in typically reading adults.

This raises the question of whether individuals with DD have a deficit in using text to induce phonetic recalibration. Notably, Baart et al. (2012) already found that recalibration by lipread speech is comparable in individuals with DD and normal readers. If recalibration by lipread speech is indeed spared in DD, we might expect an orthography-specific deficit in the processing and integration of graphemes and phonemes rather than a more general audiovisual integration problem. In particular, this would indicate diminished reading-related audiovisual learning in DD, in addition to previously reported deficits in detecting letter-speech sound (in)congruency and the commonly reported phonological problems.

## EXPERIMENT 1

#### Materials and Methods

#### Participants

Thirty-six students from Tilburg University participated. Eighteen of them had been formally assessed and diagnosed with dyslexia by a remedial educationalist or psychologist (15 female; average age 20.2 ± 1.9 SD). The diagnosis was made at ages ranging from approximately 7 to 20 years. Most of them had participated in a training or rehabilitation program, ranging from extra reading lessons at school to remedial teaching programs at external organizations. Seven of them reported having one or more relatives with an official dyslexia diagnosis, five reported having no relatives with dyslexia, and the others were not sure. The other eighteen participants had no diagnosis of dyslexia (13 female; average age 20.0 ± 2.0 SD) and served as a control group. Dyslexic students were invited by email and were paid for their participation; students without dyslexia participated for course credit. We determined our sample size based on our lab's previous experience with the phonetic recalibration paradigm (Baart et al., 2012: 22 subjects in both the DD and Control group; Bertelson et al., 2003: 10 subjects in Experiment 2; Keetels et al., 2016: 22 subjects in both Experiments 1 and 2), which indicates that about 20 participants per group should yield robust and significant behavioral recalibration/adaptation effects. All participants reported normal hearing and normal or corrected-to-normal vision and were fluent speakers of Dutch. They took part in the experiment individually and were unaware of its purpose. This study was carried out in accordance with the recommendations of the local ethics committee (EC-2014.38), which approved the protocol. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

#### Reading Fluency Tests

Reading fluency was tested using two Dutch standardized tests that measure single word reading for real words ('Eenminuut-test' or EMT, Brus and Voeten, 1997) and pseudowords ('De Klepel,' Van den Bos et al., 1999). Participants had to read aloud as many words as possible within a fixed time (1 min for EMT, 2 min for De Klepel). For both tests, reading fluency scores were calculated by subtracting the number of mistakes from the total number of words read. As expected, the DD group was less efficient in reading (number of correctly read real words = 77.8 ± 3.2 SEM, pseudowords = 75.1 ± 3.4 SEM) than the Control group (real words = 101.7 ± 2.31 SEM; pseudowords = 98.6 ± 3.5 SEM) [independent samples t-test: t(34) = 6.04, p < 0.001, η² = 0.52 on real words; t(34) = 4.83, p < 0.001, η² = 0.41 on pseudowords].
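The reported effect sizes can be checked against the t statistics. For an independent-samples t test, eta squared is commonly recovered as t²/(t² + df), with df = 18 + 18 − 2 = 34 here; we assume this is the conversion behind the reported values, and it reproduces them to two decimals. The fluency score is simply the number of words read minus the number of mistakes; the word counts in the example call are hypothetical.

```python
def fluency_score(words_read, mistakes):
    """Reading fluency score: words read within the time limit minus mistakes."""
    return words_read - mistakes


def eta_squared_from_t(t, df):
    """Eta squared for a two-group comparison, from the t statistic and its df."""
    return t ** 2 / (t ** 2 + df)


n_dd, n_control = 18, 18
df = n_dd + n_control - 2  # 34

print(fluency_score(words_read=80, mistakes=2))  # hypothetical counts
print(round(eta_squared_from_t(6.04, df), 2))    # real words (EMT)
print(round(eta_squared_from_t(4.83, df), 2))    # pseudowords (De Klepel)
```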

#### Stimuli and Materials

Participants were seated in front of a 17-inch (600 pixels × 800 pixels) CRT-monitor (100 Hz refresh rate) at a distance of approximately 60 cm. The stimuli were identical to those used in Bertelson et al. (2003). In short, we used the audiovisual recording of a male Dutch speaker pronouncing the non-words /aba/ and /ada/.

The audio was synthesized into a nine-token /aba/–/ada/ continuum (i.e., A1–A9) by changing the second formant (F2) in eight steps of 39 Mel using the 'Praat' speech editor (Boersma and Weenink, 1999). The offset frequency of the first vowel (before the closure) and the onset frequency of the second vowel (after the closure) were 1100 Hz for /aba/ and 1678 Hz for /ada/ (see Figure 1 in Vroomen et al., 2004b). All sound files lasted 640 ms. From this nine-token continuum, we used the outermost tokens (A1 and A9; henceforth Ab and Ad, respectively) and the three middle tokens (A4, A5, and A6; henceforth A?−1, A?, and A?+1, respectively). The audio was delivered binaurally through headphones (Sennheiser HD201). The sound level of the stimuli was approximately 66 dB SPL when measured at 5 mm from the earphone.
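The equal 39-Mel steps between the two F2 endpoints can be checked numerically. This sketch assumes the widely used mel formula m = 2595·log10(1 + f/700); the text does not say which mel formula was used, and other mel formulas give different step sizes. Under this assumption, the 1100–1678 Hz range divides into eight steps of about 39.2 mel, consistent with the reported value. The sketch only computes F2 target frequencies for the nine tokens; it does not resynthesize any audio.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    # Assumed mel formula (O'Shaughnessy): 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    # Exact inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def f2_continuum(f_start_hz: float, f_end_hz: float, n_tokens: int = 9) -> list[float]:
    """F2 targets equally spaced on the mel scale, endpoints included."""
    m0, m1 = hz_to_mel(f_start_hz), hz_to_mel(f_end_hz)
    step = (m1 - m0) / (n_tokens - 1)
    return [mel_to_hz(m0 + i * step) for i in range(n_tokens)]


tokens = f2_continuum(1100.0, 1678.0)              # A1 (/aba/) ... A9 (/ada/)
span_mel = hz_to_mel(1678.0) - hz_to_mel(1100.0)   # total span; /8 gives approx. 39.2 mel
middle_tokens = tokens[3:6]                         # A4, A5, A6 (A?-1, A?, A?+1)
```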

Visual stimuli consisted either of the three letters of the non-words 'aba' or 'ada,' or of the video of the lip movements of the speaker pronouncing 'aba' or 'ada.' The letters were presented in lowercase, in gray (RGB: 128,128,128) Arial Black font, on a dark background in the center of the screen (W: 5.5°, H: 2.5°). The letters were displayed for 1200 ms. When combined with speech sounds, the letters were presented 450 ms before sound onset, because informal pilot testing in Keetels et al. (2016) showed that this interval best induced perceptual synchrony between the inner speech of the silently read letters (the internal voice that is 'heard' while reading) and the externally presented speech sound.

For the video presentations, we used the video tracks of the audiovisual recording of the male Dutch speaker pronouncing the non-words /aba/ and /ada/. The videos showed the speaker's face from the forehead to the chin and lasted 2130 ms. Videos were displayed as a sequence of 71 bitmaps, each shown for 30 ms (including a four-bitmap black-to-color fade-in and a five-bitmap color-to-black fade-out). The image subtended 9° × 6.5° (height × width) and was presented on a black background at the center of the screen.
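The timing parameters above can be cross-checked with simple arithmetic: 71 bitmaps at 30 ms each give the stated 2130-ms video duration, and in letter exposure the 640-ms sound (starting 450 ms after letter onset) ends before the 1200-ms letter display does. A minimal sketch; the `Event` type and function name are hypothetical, and the code models only nominal timings, not actual display or audio latencies.

```python
from dataclasses import dataclass

FRAME_MS = 30         # each bitmap shown for 30 ms
N_FRAMES = 71         # bitmaps per video
FADE_IN_FRAMES = 4    # black-to-color fade-in
FADE_OUT_FRAMES = 5   # color-to-black fade-out


@dataclass(frozen=True)
class Event:
    name: str
    onset_ms: int
    duration_ms: int

    @property
    def offset_ms(self) -> int:
        return self.onset_ms + self.duration_ms


def letter_exposure_trial(sound_ms: int = 640, letters_ms: int = 1200,
                          lead_ms: int = 450) -> list[Event]:
    """Nominal timeline of one letter exposure; time 0 is letter onset."""
    return [Event("letters", 0, letters_ms), Event("sound", lead_ms, sound_ms)]


video_ms = N_FRAMES * FRAME_MS                                             # 2130 ms
full_color_ms = (N_FRAMES - FADE_IN_FRAMES - FADE_OUT_FRAMES) * FRAME_MS   # frames between fades

letters, sound = letter_exposure_trial()
# Portion of the sound during which the letters are still on screen
overlap_ms = min(letters.offset_ms, sound.offset_ms) - max(letters.onset_ms, sound.onset_ms)
```

Under these nominal values the whole sound falls within the letter display, and the letters stay on screen for 110 ms after sound offset.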

#### Design and Procedure

Participants were repeatedly presented with Exposure-Test mini-blocks, each consisting of eight audiovisual exposures (exposure phase) followed by six auditory-only test trials (test phase). See **Figure 1** for a schematic set-up of the Exposure-Test mini-block design. In the exposure phase, three within-subjects factors were varied: Exposure-type (Letter or Video), Exposure-sound (Ambiguous or Non-ambiguous), and Exposure-token ('aba' or 'ada'). The exposure stimuli thus contained either letters or videos as visual stimuli, paired either with the ambiguous speech sound (VbA? or VdA?) or with the congruent non-ambiguous speech sound (VbAb or VdAd). The inter-stimulus interval (ISI) between successive exposure sounds was 800 ms. The audiovisual exposure phase was followed, after 1500 ms, by six auditory-only test trials. The test sounds were the most ambiguous token on the continuum (A?), its more 'aba-like' neighbor (A?−1), and its more 'ada-like' neighbor (A?+1). Each of the three test sounds (A?−1, A?, A?+1) was presented twice in random order. The participant's task was to indicate whether the test sound was more /aba/-like or more /ada/-like by pressing a corresponding key on a response box. The inter-trial interval (ITI) was 1250 ms.

Each participant completed 80 Exposure-Test mini-blocks, in which each of the 8 exposure conditions (2 Exposure-type × 2 Exposure-sound × 2 Exposure-token) was presented 10 times, yielding 20 repetitions of each test sound per exposure condition. There was a short pause after every 16 mini-blocks. The audiovisual exposure conditions varied randomly between mini-blocks. Total testing lasted ∼60 min.
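The design above can be sketched as a session generator: 8 exposure conditions × 10 repetitions gives 80 mini-blocks, and because each test phase presents each of the three test sounds twice, every test sound is heard 20 times per exposure condition. This is a minimal sketch with a hypothetical data layout and function name; it assumes simple unconstrained shuffling, since the text does not say whether, for example, immediate repeats of a condition were restricted.

```python
import itertools
import random

EXPOSURE_TYPES = ("Letter", "Video")
EXPOSURE_SOUNDS = ("Ambiguous", "Non-ambiguous")
EXPOSURE_TOKENS = ("aba", "ada")
TEST_SOUNDS = ("A?-1", "A?", "A?+1")
REPS_PER_CONDITION = 10
EXPOSURES_PER_BLOCK = 8
PAUSE_EVERY = 16


def build_session(seed=None):
    """Return randomized mini-blocks and the block counts after which to pause."""
    rng = random.Random(seed)
    conditions = list(itertools.product(EXPOSURE_TYPES, EXPOSURE_SOUNDS, EXPOSURE_TOKENS))
    order = conditions * REPS_PER_CONDITION   # 8 conditions x 10 = 80 mini-blocks
    rng.shuffle(order)                        # conditions vary randomly between mini-blocks
    blocks = []
    for cond in order:
        tests = list(TEST_SOUNDS) * 2         # each test sound twice per test phase
        rng.shuffle(tests)                    # random order within the test phase
        blocks.append({"condition": cond,
                       "n_exposures": EXPOSURES_PER_BLOCK,
                       "tests": tests})
    # Pause after every 16th mini-block (none after the final block)
    pauses = list(range(PAUSE_EVERY, len(blocks), PAUSE_EVERY))
    return blocks, pauses


blocks, pauses = build_session(seed=1)
```

Changing the condition tuple to two factors and the repetition count to 10 gives the 40-block layout of Experiment 2.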

#### Results

The results for the ambiguous and non-ambiguous exposure sounds were analyzed separately because previous studies have demonstrated that different mechanisms underlie phonetic recalibration (induced by intersensory conflict) and selective speech adaptation (mainly dependent on the acoustic nature of the exposure stimuli) (Eimas and Corbit, 1973; Samuel, 1986; Vroomen et al., 2004a; Samuel and Lieblich, 2014). **Figures 2** and **3** display the group-averaged proportions of /d/-responses to the test sounds after exposure to ambiguous and non-ambiguous sounds, respectively. As expected, after exposure to ambiguous sounds there were more /d/ responses after VdA? than after VbA? (indicative of phonetic recalibration), whereas after non-ambiguous exposure there were fewer /d/ responses after VdAd than after VbAb (indicative of selective speech adaptation). The individual proportion of /d/-responses on the auditory-only test trials was calculated for each combination of Exposure-type (Letter or Video), Exposure-sound (Ambiguous or Non-ambiguous), Exposure-token (Vb or Vd), and Test-sound (A?−1; A?; A?+1).

[Figure caption, beginning lost in extraction: "… after ambiguous Exposure-sounds. Graphs separately depict Letter (upper four graphs) and Video (lower two graphs) Exposure-types for the Control group (left graphs) and DD group (right graphs). Aftereffects represent the overall difference between the two Exposure-tokens (VdA? – VbA? for consonants, and ViA? – VeA? for vowels). Error bars represent the standard errors of the mean."]

#### Aftereffects Following Exposure to Ambiguous Sounds (Recalibration)

A repeated measures ANOVA with within-subjects factors Exposure-type (Letter or Video), Exposure-token (Vb or Vd), and Test-sound (A?−1; A?; A?+1) and between-subjects factor Dyslexia (DD or Control group) was performed on the log-odds transformed proportions of /d/-responses on the test trials. The log-odds transformation was applied to better meet the assumption of normally distributed data. Where Mauchly's test indicated a violation of sphericity, degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity.
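The log-odds (logit) transform maps a proportion p to ln(p/(1 − p)). Because each cell contains a finite number of trials (20 per Test-sound per exposure condition), a participant can respond /d/ on 0% or 100% of trials, where the plain logit is infinite. The sketch below uses the empirical logit with a +0.5 correction as one common remedy; the text does not state how (or whether) such cells were handled, so this correction is an illustrative assumption, and the function name is hypothetical.

```python
import math


def log_odds(k: int, n: int, correction: float = 0.5) -> float:
    """Empirical logit of k /d/-responses out of n test trials.

    The +correction terms keep the result finite when k == 0 or k == n.
    This correction is an assumption, not the authors' documented choice.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return math.log((k + correction) / (n - k + correction))


# One hypothetical participant: /d/-responses out of 20 trials per Test-sound
cells = {"A?-1": 3, "A?": 10, "A?+1": 18}
transformed = {sound: log_odds(k, 20) for sound, k in cells.items()}
```

The transform is symmetric around 50% responding: `log_odds(k, n)` equals `-log_odds(n - k, n)`.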

The analysis showed a main effect of Exposure-token [F(1,34) = 79.96, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.70], which interacted with Exposure-type [F(1,34) = 36.49, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.52], indicating that letter- and lipread-induced aftereffects (i.e., the difference between the Vb and Vd Exposure-tokens) differed. Importantly for the present study, this interaction differed between the DD and Control group [Exposure-token × Exposure-type × Dyslexia: F(1,34) = 3.58, p = 0.034, one-tailed, η<sup>2</sup><sub>p</sub> = 0.10]; it is examined further with post hoc t-tests below.

The ANOVA showed a main effect of Test-sound [F(2,68) = 212.23, p < 0.001, Greenhouse-Geisser corrected, η<sup>2</sup><sub>p</sub> = 0.86], which interacted with Exposure-token [F(2,68) = 4.82, p = 0.011, η<sup>2</sup><sub>p</sub> = 0.12]. Numerical comparison of the means shows larger aftereffects overall at the most ambiguous Test-sound. A three-way interaction between Test-sound, Exposure-type, and Dyslexia was also found [F(2,68) = 5.00, p = 0.01, η<sup>2</sup><sub>p</sub> = 0.13], possibly reflecting a somewhat shallower Test-sound function for the DD group after letter exposure than after lipread exposure. The four-way interaction was not significant [F(2,68) = 0.112, p = 0.89, η<sup>2</sup><sub>p</sub> = 0.003]. None of the other effects were significant (all p-values > 0.17).
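Partial eta squared can be recovered from each reported F ratio and its degrees of freedom via η²p = F·df_effect / (F·df_effect + df_error), which offers a quick consistency check on the reported statistics. A Greenhouse-Geisser correction multiplies both df by the same ε, so it leaves this ratio unchanged. With 36 participants in two groups, the error df for these effects is 34 (or 68 for two-df effects). A minimal sketch with a hypothetical function name:

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio: F*df1 / (F*df1 + df2)."""
    return f * df_effect / (f * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared) from the recalibration ANOVA
reported = {
    "Exposure-token": (79.96, 1, 34, 0.70),
    "Exposure-token x Exposure-type": (36.49, 1, 34, 0.52),
    "Exposure-token x Exposure-type x Dyslexia": (3.58, 1, 34, 0.10),
    "Test-sound": (212.23, 2, 68, 0.86),
    "Test-sound x Exposure-token": (4.82, 2, 68, 0.12),
}

recomputed = {
    name: round(partial_eta_squared(f, d1, d2), 2)
    for name, (f, d1, d2, _) in reported.items()
}
```

Each recomputed value, rounded to two decimals, matches the reported one.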

To further explore the theoretically important three-way interaction between Exposure-token, Exposure-type, and Dyslexia, data were pooled over Test-sound (A?−1; A?; A?+1), and aftereffects were computed as in previous studies by subtracting the proportion of /d/ responses after exposure to VbA? from that after exposure to VdA? (Van Linden and Vroomen, 2007; Keetels et al., 2015, 2016). Aftereffects indicative of recalibration should then have a positive sign.
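The aftereffect measure above can be sketched as follows. Because every Test-sound contributes the same number of trials, averaging the three per-sound proportions equals the pooled proportion. The dictionary layout, function name, and example proportions are hypothetical (not data from the study). The same difference computed for non-ambiguous exposure (VdAd minus VbAb) is expected to be negative under selective speech adaptation.

```python
from statistics import mean

TEST_SOUNDS = ("A?-1", "A?", "A?+1")


def aftereffect(prop_d: dict[tuple[str, str], float]) -> float:
    """Proportion of /d/-responses after Vd minus after Vb, pooled over test sounds.

    prop_d maps (exposure_token, test_sound) -> proportion of /d/ responses,
    with exposure_token in {"Vb", "Vd"}.
    """
    vd = mean(prop_d[("Vd", s)] for s in TEST_SOUNDS)
    vb = mean(prop_d[("Vb", s)] for s in TEST_SOUNDS)
    return vd - vb


# Hypothetical proportions for one participant after ambiguous exposure
example = {
    ("Vb", "A?-1"): 0.10, ("Vb", "A?"): 0.40, ("Vb", "A?+1"): 0.70,
    ("Vd", "A?-1"): 0.20, ("Vd", "A?"): 0.60, ("Vd", "A?+1"): 0.80,
}
effect = aftereffect(example)  # positive sign: recalibration toward the visual token
```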

#### **Letter-induced aftereffects**

After exposure to ambiguous sounds combined with letter stimuli, aftereffects were 0.05 and 0.14 for the DD and Control group, respectively. Because we had a clear prediction that the DD group should show smaller letter-induced recalibration effects, we used a one-tailed independent samples t-test, which showed that the effect was stronger for the Control group than for the DD group [t(34) = 2.35, p = 0.013, one-tailed, η<sup>2</sup> = 0.14]. Two one-sample t-tests with a Bonferroni-corrected alpha level of 0.025 (0.05/2) per test showed that the aftereffects were significantly different from zero for the Control group [t(17) = 4.35; p < 0.001, η<sup>2</sup> = 0.53], but not for the DD group [t(17) = 1.51; p = 0.15, η<sup>2</sup> = 0.12]. Dyslexic readers thus showed no reliable letter-induced recalibration effect, whereas fluent readers did.
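The η² values reported for these t-tests are consistent with the conversion η² = t² / (t² + df), and the Bonferroni-corrected alpha is the family-wise alpha divided by the number of tests. A minimal sketch; the conversion formula is inferred from the reported numbers rather than stated in the text, and the names are hypothetical.

```python
def eta_squared_from_t(t: float, df: int) -> float:
    """Effect size for a t-test: t^2 / (t^2 + df)."""
    return t * t / (t * t + df)


FAMILY_ALPHA = 0.05
N_TESTS = 2
ALPHA_PER_TEST = FAMILY_ALPHA / N_TESTS   # Bonferroni-corrected alpha: 0.025

# Letter-induced aftereffects, Experiment 1
group_difference = eta_squared_from_t(2.35, 34)   # reported eta^2 = 0.14
control_vs_zero = eta_squared_from_t(4.35, 17)    # reported eta^2 = 0.53
dd_vs_zero = eta_squared_from_t(1.51, 17)         # reported eta^2 = 0.12
```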

#### **Lipread-induced aftereffects**

After exposure to ambiguous sounds combined with lipread speech, aftereffects were 0.24 and 0.23 for the DD and Control group, respectively. Separate one-sample t-tests with a Bonferroni-corrected alpha level of 0.025 (0.05/2) tested the aftereffects against zero and showed that both groups had lipread-induced aftereffects [DD group: t(17) = 6.38, p < 0.001, η<sup>2</sup> = 0.71; Control group: t(17) = 8.12, p < 0.001, η<sup>2</sup> = 0.80]. Furthermore, an independent samples t-test showed that these effects did not differ in size [t(34) = 0.014, p = 0.98, η<sup>2</sup> < 0.001]. Dyslexic and fluent readers thus both showed lipread-induced recalibration of comparable magnitude.

#### **Lipread vs. letter-induced aftereffects**

Two paired-sample t-tests with a Bonferroni-corrected alpha level of 0.025 (0.05/2) compared the lipread- and letter-induced aftereffects within the DD and the Control group. In both groups, the letter-induced aftereffects were significantly smaller than the lipread-induced aftereffects [DD group: t(17) = 3.07, p < 0.01, η<sup>2</sup> = 0.58; Control group: t(17) = 4.85, p < 0.001, η<sup>2</sup> = 0.43].

#### **Correlation between reading fluency scores and aftereffects**

No significant correlations were found between the word or pseudo-word reading fluency scores and the letter-induced aftereffects (real words: r = 0.13, p = 0.44; pseudo-words: r = 0.26, p = 0.12) or the lipread-induced aftereffects (real words: r = 0.20, p = 0.23; pseudo-words: r = 0.14, p = 0.42). However, when the reading scores were correlated with the difference between the lipread- and letter-induced aftereffects, a trend emerged (real words: r = 0.28, p = 0.10; pseudo-words: r = 0.32, p = 0.06), suggesting a larger difference between lipread- and letter-induced aftereffects at lower reading fluency. The absence of significant effects might be explained by the overlap in reading scores between the groups (pseudo-word reading scores ranged from 55 to 113 in Controls and from 57 to 106 in DDs). These reading scores also show that our dyslexic group consisted of compensated dyslexic adults who were, however, all formally diagnosed with dyslexia, whereas participants in the control group were not.
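The difference-score analysis can be sketched as follows: compute each participant's lipread-minus-letter aftereffect and correlate it with reading fluency. The function names are hypothetical, and the sketch assumes the correlations were computed over all 36 participants pooled across groups, which the text does not state. It stops at the t statistic (df = n − 2), since converting t to a p value requires a t-distribution CDF, for example from SciPy. Under the pooling assumption, r = 0.13 with n = 36 gives t(34) ≈ 0.76.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two paired samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need paired samples of equal length (n >= 3)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def difference_scores(lipread: list[float], letter: list[float]) -> list[float]:
    """Per-participant lipread-induced minus letter-induced aftereffect."""
    return [a - b for a, b in zip(lipread, letter)]


t_letter_real = t_for_r(0.13, 36)   # approx. 0.76 under the pooling assumption
```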

#### Aftereffects Following Exposure to Non-ambiguous Sounds (Selective Speech Adaptation)

A repeated measures ANOVA on the data of the non-ambiguous exposure-sound trials showed a main effect of Exposure-token [F(1,34) = 28.10, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.45], indicative of selective speech adaptation effects (i.e., the difference between VbAb and VdAd exposure). This effect did not interact with Exposure-type [F(1,34) = 0.021, p = 0.88, η<sup>2</sup><sub>p</sub> = 0.001] or with Dyslexia [F(1,34) = 0.37, p = 0.55, η<sup>2</sup><sub>p</sub> = 0.011], and there was no three-way interaction between these factors [F(1,34) = 0.24, p = 0.63, η<sup>2</sup><sub>p</sub> = 0.007]. These findings thus indicate that selective speech adaptation effects after letter and lipread exposure did not differ between the DD and Control group (aftereffects after letter exposure were −0.15 and −0.08 for the DD and Control group, respectively, and aftereffects after lipread exposure were −0.12 for the DD and −0.08 for the Control group).

The analysis also showed a main effect of Test-sound [F(2,68) = 261.10, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.89], which did not interact with Dyslexia [F(2,68) = 2.09, p = 0.13, η<sup>2</sup><sub>p</sub> = 0.06]. Furthermore, an interaction between Test-sound and Exposure-token was found [F(2,68) = 3.19, p = 0.05, η<sup>2</sup><sub>p</sub> = 0.09], which in turn interacted with Exposure-type [F(2,68) = 5.31, p = 0.007, η<sup>2</sup><sub>p</sub> = 0.14], showing that aftereffects were somewhat larger for the most ambiguous Test-sound after letter exposure, but not after lipread exposure. None of the other effects were significant (all p-values > 0.13).

Taken together, Experiment 1 demonstrates that dyslexic readers had difficulties using text to recalibrate their /b-d/ phoneme boundary, whereas their recalibration by lipread speech was comparable to that of normal readers. This points to a rather specific deficit in the processing and integration of graphemes and phonemes in DD, rather than a more general problem with audiovisual integration. In Experiment 2, we investigated whether this deficit is replicated when vowels are used instead of consonants.

## EXPERIMENT 2

In Experiment 2, we investigated whether vowels, rather than consonants, can be recalibrated by text. It has been argued that dyslexic readers may have specific difficulties in processing stop consonants because the acoustic cues that distinguish stop consonants from each other are short and easily masked by other acoustic information (Tallal, 1980). It is therefore important to assess whether recalibration by text is spared when vowels instead of consonants are used. To do so, we created an ambiguous vowel halfway between /I/ and /e/ and embedded it in the CVC context /w?t/. This sound was then accompanied by the letters 'wit' or 'wet,' both high-frequency words in Dutch (meaning 'white' and 'law,' respectively). We chose real words instead of pseudo-words to avoid subtle differences in the reading of nonword stimuli due to commonly reported phonological processing difficulties (Yap and Vanderleij, 1993; Snowling, 1995; Taroyan and Nicolson, 2009). The question was whether individuals with DD would still show deficits in using written high-frequency real words to induce recalibration of the ambiguous vowel.

#### Materials and Methods

Experimental procedures were as in Experiment 1 with the following changes.

#### Participants

Thirty-seven students from Tilburg University participated and received course credits or payment for their participation. Nineteen of them had been formally diagnosed with dyslexia (average age 21.2 ± 2.22 SD; 11 had also participated in Experiment 1), and the other eighteen had no diagnosis of dyslexia and served as a control group (average age 19.3 ± 2.2 SD; two had also participated in Experiment 1).

#### Reading Fluency Tests

Numerical comparison showed that individuals with DD were less efficient readers (number of correctly read real words = 75.9 ± 3.7 SEM, pseudo-words = 72.5 ± 4.1 SEM) than the Control group (number of correctly read real words = 92.7 ± 3.4 SEM; pseudo-words = 101.6 ± 1.5 SEM), which was confirmed by two independent samples t-tests with a Bonferroni-corrected alpha level of 0.025 (0.05/2) [t(35) = 3.35, p = 0.002, η<sup>2</sup> = 0.24 for real words; t(35) = 6.44, p < 0.001, η<sup>2</sup> = 0.54 for pseudo-words].

#### Stimuli and Materials

For the auditory stimuli, we used the audio tracks of a recording of a male Dutch speaker pronouncing the words /wet/ and /wIt/. The audio was synthesized into a 19-token /wet/–/wIt/ continuum (i.e., A1–A19) with Tandem-STRAIGHT (Kawahara et al., 2008) by changing the spectrum and fundamental frequency of the individual tokens. All sound files lasted 595 ms. From this nineteen-token continuum, we used the outermost tokens (A1 and A19; henceforth Ae and Ai, respectively) and three tokens from the middle of the continuum (A8, A10, and A12; henceforth A?−1, A?, and A?+1, respectively). These three middle tokens were chosen because pilot testing showed that their categorization curve was comparable to that of the middle tokens of the /aba/–/ada/ continuum of Experiment 1. The audio was delivered binaurally through headphones (Sennheiser HD201) at approximately 64 dB SPL, measured at 5 mm from the earphone.

Visual stimuli consisted of the presentation of the three letters of the Dutch words 'wit' and 'wet'. As in Experiment 1, the letters were gray on a dark background with a duration of 1200 ms and presented 450 ms before the onset of the audio.

#### Design and Procedure

As in Experiment 1, Exposure-Test mini-blocks were presented in which Exposure-sound (Ambiguous or Non-ambiguous), Exposure-token ('wit' or 'wet'), and Test-sound (A?−1; A?; A?+1) were varied. The participant's task was to indicate whether the test sound was more like /wIt/ or /wet/. Each participant completed 40 Exposure-Test mini-blocks in which each of the 4 exposure conditions [Exposure-sound (Ambiguous/Non-ambiguous) × Exposure-token ('wit'/'wet')] was presented 10 times.

#### Results

Analyses were performed on the log-odds transformed individual proportions of /i/-responses (i.e., 'wit' responses) on the auditory-only test trials (see **Figures 2** and **3**). As in Experiment 1, where Mauchly's test indicated a violation of sphericity, degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity.

#### Aftereffects Following Exposure to Ambiguous Sounds (Recalibration)

A repeated measures ANOVA with within-subjects factors Exposure-token (Vi or Ve) and Test-sound (A?−1; A?; A?+1) and between-subjects factor Dyslexia (DD or Control group) was performed on the log-odds transformed proportions of /i/-responses to the test sounds. A main effect of Test-sound was found [F(2,70) = 348.474, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.91], indicating more /i/-responses overall for Test-sounds that were more /i/-like. Importantly, there was also an effect of Exposure-token [F(1,35) = 12.49, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.26], indicative of phonetic recalibration, and this effect interacted with Dyslexia [F(1,35) = 3.70, p = 0.032, one-tailed, η<sup>2</sup><sub>p</sub> = 0.10], showing a significant group difference in phonetic recalibration with vowels. None of the other effects were significant (all p-values > 0.61).

To measure aftereffects, data were pooled, as before, over the three Test-sounds (A?−1; A?; A?+1), and the difference in the proportion of /i/-responses was computed between exposure to ViA? and VeA?. After exposure to ambiguous sounds, aftereffects were 0.05 and 0.10 for the DD and Control group, respectively. Two one-sample t-tests with a Bonferroni-corrected alpha level of 0.025 (0.05/2) showed that the effect was significantly different from zero for the Control group [t(17) = 3.80; p = 0.001, η<sup>2</sup> = 0.46], but not for the DD group [t(18) = 1.16; p = 0.26, η<sup>2</sup> = 0.07]. In line with the data of Experiment 1, dyslexic readers thus showed no letter-induced phonetic recalibration, whereas fluent readers did.

#### Aftereffects Following Exposure to Non-ambiguous Sounds (Selective Speech Adaptation)

A repeated measures ANOVA was also performed on the log-odds transformed proportions of /i/-responses after exposure to non-ambiguous sounds. This analysis showed a main effect of Test-sound [F(2,70) = 489.82, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.93], which interacted with Exposure-token [F(2,70) = 13.78, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.28], showing that aftereffects were strongest at the most ambiguous test sound. A main effect of Exposure-token was found [F(1,35) = 379.97, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.92], indicative of selective speech adaptation (i.e., the difference between ViAi and VeAe exposure). This effect interacted with Dyslexia [F(1,35) = 4.28, p = 0.046, η<sup>2</sup><sub>p</sub> = 0.11] owing to slightly more negative aftereffects in dyslexic readers (−0.40 and −0.32 for the DD and Control group, respectively). None of the other effects were significant (all p-values > 0.68). Post hoc one-sample t-tests showed that aftereffects were significantly smaller than zero in both the DD group [t(18) = 14.22, p < 0.001, η<sup>2</sup> = 0.92] and the Control group [t(17) = 13.53, p < 0.001, η<sup>2</sup> = 0.92].

## DISCUSSION

As developmental dyslexia has been associated with reduced integration of text and speech sounds, we investigated whether this deficit becomes manifest when text is used to induce phonetic recalibration. More specifically, we investigated whether dyslexic readers use orthographic information to recalibrate their phoneme boundary and compared this to their ability to use lipread speech for recalibration. In Experiment 1, adults with DD showed no text-induced recalibration of a /b-d/ phoneme boundary, whereas their lipread-induced recalibration was normal. In Experiment 2, the same absence of text-induced recalibration was found for an /e-I/ boundary. Together, these results demonstrate that dyslexic readers do not use disambiguating orthographic information to adjust their phoneme boundaries in the way that fluent readers do.

Importantly, dyslexic readers' recalibration by lipread speech was comparable to that of normal readers. This is in line with Baart et al. (2012), who showed that dyslexic and fluent readers have comparable lipread recalibration effects. Together, these data speak to the question of whether deficits in grapheme-phoneme association in DD are specific to visual orthographic information or result from a more general auditory-visual association deficit (Blomert, 2011; Hahn et al., 2014). Our data clearly suggest that dyslexic readers have a specific orthographic integration deficit. Further research is needed, though, to address this question in a broader context. In particular, others have found that DD might be associated with deficits in more general audiovisual integration processes. For example, Harrar et al. (2014) showed that dyslexic individuals have problems with multisensory integration of simple nonlinguistic stimuli, Francisco et al. (2017) showed a correlation between reading errors and audiovisual temporal sensitivity for speech and non-speech stimuli, and Widmann et al. (2012) showed that dyslexic children did not integrate visual symbolic and auditory sensory information into a unitary audiovisual object representation (though see Widmann et al., 2014). Of relevance for the present study, it remains to be examined whether individuals with DD might have more subtle integration problems with auditory and lipread speech than we could observe here (De Gelder and Vroomen, 1991). For example, it is conceivable that recalibration by lipreading was at ceiling in both groups, and that lipreading deficits in DD would become visible if the lipread stimuli were more varied and more difficult than the relatively easy-to-lipread /b-d/ contrast. Lipreading deficits in DD might also be less noticeable in these repetitive listening conditions and become more evident in more challenging conditions, such as speech presented in noise. This would be in line with other studies showing that adults and children with DD gain less from lipreading when speech is presented in noise (Hayes et al., 2003; Ramirez and Mann, 2005; Van Laarhoven et al., 2018).

In the present study, we found that individuals with DD showed intact phonetic recalibration when it was induced by lipread information, but not when it was induced by text. This raises the question whether dyslexic readers might also show deficits in another well-studied form of speech recalibration, namely phonetic recalibration driven by lexical information. Lexical recalibration was first demonstrated by Norris et al. (2003) and is a form of phonetic recalibration in which the lexical context of a spoken word provides the disambiguating information for a phonetically ambiguous sound. For example, a speech sound halfway between /f/ and /s/ is heard as /f/ when embedded in the Dutch word witlof (i.e., chicory) but as /s/ when embedded in naaldbos (i.e., pine forest). Although we are not aware of studies investigating lexical recalibration in dyslexia, Blomert et al. (2004) showed that dyslexic and normally reading children exhibit comparable context effects in speech perception at auditory, phonetic, and phonological levels of processing. Together with the presently observed absence of a general problem in audiovisual recalibration of speech, we would thus predict normal lexically driven recalibration in dyslexia. This prediction would also be in line with the typical dyslexia profile of phonological deficits combined with spared non-phonological language skills (Ramus et al., 2013). Further research is needed to examine this question.

Since the results of the present study demonstrate that dyslexic readers show specific deficits in grapheme-phoneme associations, the question arises whether training in grapheme-phoneme associations would reduce reading and spelling problems in DD. In a recent study, Fraga Gonzalez et al. (2015) investigated whether an intensive 6-month letter-speech sound integration training leads to improved reading fluency in dyslexic children. The results indicated faster improvements in word reading and spelling in dyslexic children who followed the training than in a control group of dyslexic children without training. Comparable findings were reported by Žaric et al. (2015), who further showed that the deficient audiovisual ERP modulations (MMN and a late negativity) typically shown by dyslexic readers presented with letter-speech sound stimuli are reduced by letter-speech sound training. Future research might therefore investigate whether dyslexic readers develop orthographically induced recalibration after longer periods of training with letter-speech sound combinations.

Exposure to non-ambiguous speech sounds led to selective speech adaptation effects in both visual conditions (orthographic and lipread). This fits previous reports demonstrating that selective speech adaptation aftereffects depend mostly on the acoustic nature of the exposure stimulus (Eimas and Corbit, 1973; Samuel, 1986; Vroomen et al., 2004a; Samuel and Lieblich, 2014) rather than on the combination of auditory and visual stimuli, as is the case for phonetic recalibration. Given that the same auditory stimulus was used for non-ambiguous exposure in both orthographic and lipread conditions, it is not surprising that both visual conditions induced selective speech adaptation effects. In addition, the finding that both dyslexic and normal readers showed selective speech adaptation aftereffects suggests the absence of general speech perception deficits in dyslexia (see also Ramus, 2003; Blomert, 2011).

In Experiment 1, the audiovisual timings of lipread speech versus text (relative to the ambiguous sound) may be somewhat different from each other, but in our view this is not crucial for the interpretation of the data. With lipread speech, the sound and lip-movements were synced, but in the orthographic context, the text was presented 450 ms prior to the speech sound. At first sight, it may seem then that the text precedes the audio whereas the video does not. However, it is important to note that the videos also contain anticipatory information such that 'b' or 'd' can be lipread before the ambiguous sound is heard (although their exact timing is difficult to measure). Both the orthographic and the lipread context thus provide visual information about 'b' or 'd' before the crucial part of the sound is heard. This is in agreement with data showing that the effect of written text on the reported clarity of noise-vocoded speech is most pronounced when text is presented before (rather than after) speech, and that this effect only declines when text is presented more than 120 ms after speech onset (Sohoglu et al., 2014).

Another interesting finding that deserves further discussion is that lipread speech induced larger recalibration effects than text. This may seem surprising because 'viseme' categories for lipread speech (classes of phonemes that look the same) do not have a one-to-one correspondence with phonemes. For example, lipread information about bilabial closure can correspond to the phonemes /b/, /p/, and /m/, whereas the textual information 'b' unambiguously corresponds to the sound /b/. In essence, lipread speech thus contains less phonetic information than text, yet it induces larger recalibration effects. Similar observations have been made in EEG studies using an audiovisual mismatch negativity paradigm [MMN, a component of the event-related potential (ERP) reflecting pre-attentive auditory change detection], in which deviant text or lipread speech was used to induce an illusory change in a sequence of identical ambiguous sounds halfway between /aba/ and /ada/. Deviant lipread speech induced a so-called McGurk-MMN, but deviant text did not (Stekelenburg et al., 2018). Text thus appears to have weaker effects on sound processing than visual speech, also when measured at the neurophysiological level with EEG. It should be mentioned, though, that in fMRI, both lipread and text-speech sound associations induce changes in speech perception that are measurable as subtle changes in auditory cortical activity (Kilian-Hütten et al., 2011; Bonte et al., 2017). After both lipread- and text-based recalibration, participants' perceptual interpretation of the ambiguous speech sounds can thus be retrieved from posterior auditory cortical activity patterns, indicating that both types of inducer stimuli can serve a disambiguating role in phonetic adjustments. A potential reason why lipread speech is usually more potent than text is that lipread sound-sight associations are natural and acquired early in life, whereas letter-speech sound associations are culturally defined and acquired at school age through extensive reading training (Liberman, 1992). Following this line of reasoning, it is perhaps not surprising that the earlier acquired lip-speech sound associations induce larger effects than the later acquired text-speech sound associations. Admittedly, though, further research is needed to fully elucidate the different effects that text and lipread speech have on speech sound processing.

To summarize, the present study demonstrates that, unlike fluent readers, dyslexic readers do not show orthographically induced recalibration. Together with previous findings, this suggests that individuals with DD have difficulties in learning and applying letter-speech sound associations. Since dyslexic readers did not show deficits in lipread-induced phonetic recalibration, these findings additionally point toward auditory-visual association deficits in DD that are specific to orthographic information, rather than a general auditory-visual integration deficit.

## REFERENCES

Brus, B. T., and Voeten, M. J. M. (1997). Een-Minuut-Test. Amsterdam: Pearson.

## AUTHOR CONTRIBUTIONS

All authors contributed to the study design. Testing and data collection were supervised by MK. MK performed the data analysis and drafted the manuscript. MB and JV provided critical revisions. All authors approved the final version of the manuscript for submission.

## FUNDING

MB was supported by NWO-VIDI Grant 452-16-004.

## ACKNOWLEDGMENTS

We thank Elsemiek Nabben and Lemmy Schakel for testing and Merel Burgering for creating the /wet/–/wIt/ continuum.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a past co-authorship with one of the authors, MB.

Copyright © 2018 Keetels, Bonte and Vroomen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# GraphoLearn India: The Effectiveness of a Computer-Assisted Reading Intervention in Supporting Struggling Readers of English

Priyanka Patel<sup>1</sup>\*, Minna Torppa<sup>2</sup>, Mikko Aro<sup>1</sup>, Ulla Richardson<sup>3</sup> and Heikki Lyytinen<sup>4,5</sup>

<sup>1</sup> Department of Education, University of Jyväskylä, Jyväskylä, Finland, <sup>2</sup> Department of Teacher Education, University of Jyväskylä, Jyväskylä, Finland, <sup>3</sup> Centre for Applied Language Studies, University of Jyväskylä, Jyväskylä, Finland, <sup>4</sup> Department of Psychology, University of Jyväskylä, Jyväskylä, Finland, <sup>5</sup> Niilo Mäki Institute, Jyväskylä, Finland

#### Edited by:

Jurgen Tijms, University of Amsterdam, Netherlands

#### Reviewed by:

Anna Steenberg Gellert, University of Copenhagen, Denmark

Jan Christopher Frijters, Brock University, Canada

#### \*Correspondence:

Priyanka Patel priyanka.v.patel@student.jyu.fi; prpatel@student.jyu.fi

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 30 March 2018 Accepted: 04 June 2018 Published: 26 June 2018

#### Citation:

Patel P, Torppa M, Aro M, Richardson U and Lyytinen H (2018) GraphoLearn India: The Effectiveness of a Computer-Assisted Reading Intervention in Supporting Struggling Readers of English. Front. Psychol. 9:1045. doi: 10.3389/fpsyg.2018.01045

India, a country with a population of more than 1.3 billion individuals, houses the world's second largest educational system. Despite this, hundreds of millions of individuals in India are still illiterate. As English-medium education sweeps the country, many are forced to learn in a language that is foreign to them. Those living in poverty struggle even more to learn English, as it tends to be a language to which they have had no prior exposure and for which they have no support at home. Low-quality schools and poor instructional methods further exacerbate the problem. Without access to quality education, these individuals continue to struggle and are ultimately never given the chance to break the cycle of poverty. The aim of this study was to determine whether GraphoLearn, a computer-assisted reading tool, could be used to support the English reading skills of struggling readers in India. Participants were 7-year-old Grade 3 students (N = 30) attending an English-medium public school in Ahmedabad, India. English was not a native language for any of the students, and all were reading below a Grade 1 level despite having attended school for 2 years. Sixteen students played GraphoLearn while the remaining 14 played a control math game, for 20–30 min a day over a period of 8 weeks. GraphoLearn led to significant improvements in children's letter-sound knowledge, a critical factor in early reading development. Overall, the study opens doors for GraphoLearn as a potential intervention to support struggling readers of English in India, including those who are learning a non-native language and come from at-risk backgrounds.

Keywords: GraphoLearn, reading intervention, computer-assisted learning, phonics, grapheme-phoneme correspondence, English language learners, India

## INTRODUCTION

Despite international moves and agreements to improve literacy around the world, many developing countries are still struggling with high rates of illiteracy. India, a country with a population of 1.3 billion individuals, has a literacy rate of only 72% among those 15 years and older (UNESCO, 2015). In a country developing as quickly as India, a literacy rate that leaves hundreds of millions of people unable to read is highly concerning, as it puts many individuals at risk of never being able to reach opportunities and act as contributing members of society. With 17 official languages (as recognized by the United Nations) and more than 700 dialects (Mitra et al., 2003; Dixon et al., 2011), and with 21% of the population, or 269 million people, living below the poverty line (The World Bank, 2011), solving India's literacy crisis is an extremely large task.

Education plays a major role in literacy; therefore, some believe that one strategy for combating the problem may be to look at countries with successful education systems and borrow interventions that can be implemented elsewhere (Ojanen et al., 2015). Children in India, especially those living in poverty, face many problems in education. Slum and other low-income children are forced to attend low-quality schools, which are under-resourced and use poor teaching methods (Cheney et al., 2005; Kingdon, 2007). With a country-wide push toward English-medium education, these students are studying in a language to which they may have had no prior exposure and for which they have no support at home. Because of factors such as these, many children struggle to learn English and attain a quality education. In turn, many of these children will never have the option of higher education and will once again find themselves stuck in the cycle of poverty. According to The World Bank (2012), 45% of the poor are illiterate, compared with 26% of the non-poor.

The purpose of this study was to determine whether GraphoLearn, a computer-assisted tool for reading instruction, originally created for struggling readers of Finnish, could be used to support struggling readers of English in India. The major focus is on slum children attending government-aided public schools in Ahmedabad, India, who are non-native speakers of English, and at high risk of never achieving fluent English literacy.

### English in India

English was originally brought to India by the British, who arrived in the 1600s and established trade posts through the East India Company (Mehrotra, 1998). Throughout British rule, English was used between traders and merchants, as well as by Christian missionaries (Mehrotra, 1998). During this time, English was viewed as a language of the elite, a view that has persisted even after Indian independence in 1947 (Mishra and Stainthorp, 2007). Because India is a highly multicultural country, English has been maintained and acts as a common bridging language across states (Mitra et al., 2003). British rule also brought a tradition of English-medium education to India (Annamalai, 2004), which was maintained because no other language in the country would have been accepted by the linguistic minorities (Mishra and Stainthorp, 2007).

In present-day India, it is common for individuals to use a variety of languages in everyday life (Mishra and Stainthorp, 2007). One language may be used at work or school, another with peers, and the mother tongue with family and other relatives. Today, English is the only language that is taught in all states and in the largest number of schools across the country (Annamalai, 2004). Individuals who speak English are coveted by employers (Mitra et al., 2003; Annamalai, 2004), and English has become a very important language, particularly in higher education (Mehrotra, 1998; Annamalai, 2004; Cheney et al., 2005), with the majority of high-level institutions providing instruction only in English. As a result, English can influence the standard of living in India, with those who have better English skills getting better job opportunities and, in turn, better pay (Mehrotra, 1998; Mitra et al., 2003). As parents realize the opportunities that come with learning English, many are actively choosing to enroll their children in English-medium schools. This is true even for parents from slum areas, who have come to accept that the ability to read, write, and speak English will increase opportunity for their children (Mehrotra, 1998; Mitra et al., 2003; Dixon et al., 2011). Currently, 90 million children across various socioeconomic statuses are becoming literate in English (Kaila and Reese, 2009).

However, children growing up in slum communities are at a large disadvantage when it comes to learning English (Annamalai, 2004). In English-medium schools, English is the primary language of instruction, meaning that all subjects are taught in English, with regional and other languages taught as second and/or third languages. Slum children often have no exposure to English before entering school, as their parents typically cannot speak or communicate in English. These parents are also likely to be illiterate in their mother tongue (Dixon et al., 2011), meaning that their children will have had no exposure to literacy in any language before school entry. According to Nag (2013), children who lack such support, for example a print-rich environment with access to reading material or an adult to read to them, tend to develop profiles similar to those of children with dyslexia or other reading difficulties. Thus, these children are at high risk even before they enter school.

Parents from the lower levels of society typically have two choices of school for their children: government-aided public schools or low-income, unaided private schools (Cheney et al., 2005). Due to the high demand for English, there has been a "mushrooming" of low-cost private schools (Tooley and Dixon, 2005), and English is now also taught as a primary language in public government schools. In most of these public and private schools, teaching quality is low and children are forced to learn by rote a language they do not fully understand (Annamalai, 2004; Dixon et al., 2011). In contrast, there are many private schools across the country that follow international board curricula and provide high-quality English education. However, these schools charge high fees, making them inaccessible to the low-income population (Cheney et al., 2005).

According to the latest Annual Status of Education Report (ASER, 2016), 95.9% of children aged 6–14 across India are enrolled in school. Although school enrollment is high, the learning achievements of these enrolled children are consistently low (Kingdon, 2007). Across all languages, only 47.8% of children in Grade 5 are able to read a Grade 2 level text (ASER, 2016). For English, only 19.3% of all surveyed Grade 3 children could read simple words such as "day" or "sit" (ASER, 2016). Although the ASER report only surveys children in rural India, data from the National Achievement Survey (NAS) show that the situation in urban India is not strikingly different. The NAS language assessment for Grade 3 students has three measures: listening comprehension, word recognition, and reading comprehension. Across the nation, the average score was 257 out of a total of 500, leaving approximately 50% of Grade 3 students unable to perform at grade level (NCERT, 2014).

## Grapheme-Phoneme Correspondences and Early Reading Acquisition

Learning to read in any language requires understanding the links between the spoken language and its written form. More specifically, those who are learning to read must understand the grapheme-phoneme correspondences (GPCs) that occur within a particular language. It is well established that knowledge of grapheme-phoneme correspondences directly affects fluent reading (e.g., Ehri, 2005) and that such knowledge is necessary for the further development of reading skills.

However, the ease of reading acquisition is strongly determined by the orthographic depth of a language. Many researchers agree that reading acquisition in English is much more complicated than in many other languages because of its deep orthography (see Seymour et al., 2003). The grapheme-phoneme correspondences in English are more complex and context-dependent, and there is therefore still some disagreement on how early reading instruction in English should proceed. Some argue that English and other opaque orthographies might be more effectively introduced through larger units such as rimes, rather than at the level of single graphemes and phonemes (Goswami, 1986, 1988), as these larger units tend to be more consistent. On this view, English-speaking children may benefit more if instruction focuses on these larger rime units, as they can then use rime analogies from words they already know to read unfamiliar words (Goswami and Bryant, 1990).
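The rime-analogy idea can be made concrete with a small sketch. It assumes a deliberately simplified, letter-based split of a one-syllable written word into an onset (the letters before the first vowel letter) and a rime (the rest), lets onset letters stand in for their own sounds, and uses a hypothetical mini-lexicon of known words with rough pronunciations; none of these names come from the studies cited here. Reading "beak" by analogy with a known word "peak" then amounts to keeping the new onset and borrowing the known rime's pronunciation.

```python
from typing import Dict, Optional, Tuple

VOWEL_LETTERS = set("aeiou")  # simplification: "y" is always treated as a consonant


def onset_rime(word: str) -> Tuple[str, str]:
    """Split a one-syllable written word at its first vowel letter."""
    w = word.lower()
    for i, ch in enumerate(w):
        if ch in VOWEL_LETTERS:
            return w[:i], w[i:]
    raise ValueError(f"no vowel letter in {word!r}")


def read_by_analogy(unknown: str, known: Dict[str, str]) -> Optional[str]:
    """Pronounce an unfamiliar word by borrowing the rime of a known word.

    `known` (hypothetical) maps familiar spellings to rough pronunciations
    written as "onset|rime", e.g. {"peak": "p|eek"}. Returns "onset|rime"
    for the unfamiliar word, or None if no known word shares its written rime.
    """
    onset, rime = onset_rime(unknown)
    for word, pronunciation in known.items():
        _, known_rime = onset_rime(word)
        if known_rime == rime:
            rime_sound = pronunciation.split("|")[1]
            return f"{onset}|{rime_sound}"  # onset letters stand in for their sounds
    return None


# Usage: "beak" read by analogy with the known word "peak".
lexicon = {"peak": "p|eek", "cat": "k|at"}
beak = read_by_analogy("beak", lexicon)
```

The split ignores multi-syllable words and cases such as "y" acting as a vowel; it is meant only to show the analogy step, not to model English spelling.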

However, some studies have failed to find any significant differences between instruction based on grapheme-phoneme correspondences and instruction based on onset-rime units (e.g., Haskell et al., 1992; Levy and Lysynchuk, 1997). A study conducted by Christensen and Bowey (2005) compared children participating in two explicit decoding programs, one based on orthographic rimes and the other based on grapheme-phoneme correspondences. The study also involved a control group that received implicit phonics instruction. Not surprisingly, both explicit instruction groups outperformed the implicit control group in reading and spelling. Interestingly, the study also showed differences between the orthographic rime group and the grapheme-phoneme correspondence group, with the grapheme-phoneme correspondence group performing better at reading and spelling unfamiliar words. The role of grapheme-phoneme correspondences in reading development has also been established among children who are non-native speakers of English. Researchers in Canada compared children who were native speakers of either English or Punjabi, all of whom were attending school in English. They found that both groups relied on grapheme-phoneme correspondences when presented with unfamiliar words. Similarly, for both groups, reading errors were due to an inability to apply grapheme-phoneme correspondences to unfamiliar words, with poor readers being less skilled at this application (Chiappe and Siegel, 1999).

Nevertheless, there tends to be consensus that early reading instruction through phonics (individual phonemes or onset-rime units) should follow a systematic approach in which children are taught to connect spoken language segments to their corresponding written forms (Wyse and Goswami, 2008; Kyle et al., 2013). Automatization of this phonics knowledge plays a critical role in early reading development and later reading skill (Ehri, 1998; Juel and Minden-Cupp, 2000).

## Reading Instruction: From Rote Memorization to Systematic Phonics

Children studying English in India, particularly those in low-income schools, are taught English by rote (Annamalai, 2004; Dixon et al., 2011). Students learn the names of letters rather than their sounds, and are then expected to learn "common" words as wholes, essentially learning to recognize words by sight. Sentences, too, are learned through rote memorization: someone points to the words written on the board, which are then chanted by the rest of the class (Dixon et al., 2011). Through such rote learning methods, children are unable to blend or decode unfamiliar words and are therefore only able to "read" words that are familiar to them, and even then often with limited comprehension. The NAS uses reading comprehension as the primary measure of the language knowledge of Grade 5 students across India. In 2015, Grade 5 students nationally scored an average of only 48.2% on the reading comprehension assessment (NCERT, 2015). This provides evidence against the use of such rote methods of reading instruction to teach English in India.

One of the most popular methods of early reading instruction in English-speaking countries is systematic phonics. The phonics approach involves explicitly instructing readers on the links between letters and their corresponding sounds, and on how these links are used to read words. Synthetic phonics approaches, in which children learn small units of language (graphemes and phonemes), are believed to be the most logical way to support early reading development (e.g., Seymour and Duncan, 1997; Hulme et al., 2002). Major correspondences are taught, as well as vowel sounds, digraphs, blends, onsets, and rimes (Ehri et al., 2001). There is ample support for systematic, synthetic phonics programs among native speakers of English (e.g., Ehri et al., 2001; Johnston and Watson, 2005). Fortunately, there is also strong evidence in favor of synthetic phonics programs for children learning English as a second language. A study by Stuart (1999) compared reading instruction for 5-year-old children through a synthetic phonics program, Jolly Phonics, with a more holistic program that placed no explicit importance on phonics. The majority of the sample (96 out of 112 children) were learning English as a second language. Results showed a significant positive effect of the Jolly Phonics intervention on the children's reading and writing development, which persisted even a year after the initial intervention. Based on these results, the author concluded that early, structured, rapid, and focused teaching of phonetic manipulation actively supports the development of this knowledge, even for children who are non-native speakers of the language (Stuart, 1999). A follow-up study by Stuart also showed that even children who have not been taught using phonics at the start of school can catch up through structured and intensive phonics training (Stuart, 2004).
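The decoding step that synthetic phonics teaches, mapping each grapheme to a phoneme and then blending the phonemes, can be sketched as follows. The grapheme-phoneme table is a small hypothetical example using IPA-like symbols with a single, context-free reading per grapheme; real English needs context-dependent rules. Segmentation tries the longest matching grapheme first so that digraphs such as "sh" and "ck" are read as one unit.

```python
# Hypothetical mini grapheme-phoneme table (IPA-like symbols, one reading per grapheme).
GPC = {
    "sh": "ʃ", "ch": "tʃ", "th": "θ", "ck": "k", "ee": "iː",
    "a": "æ", "e": "ɛ", "i": "ɪ", "o": "ɒ", "u": "ʌ",
    "b": "b", "c": "k", "d": "d", "f": "f", "g": "g", "h": "h",
    "j": "dʒ", "k": "k", "l": "l", "m": "m", "n": "n", "p": "p",
    "r": "r", "s": "s", "t": "t", "v": "v", "w": "w", "z": "z",
}
MAX_GRAPHEME = max(len(g) for g in GPC)


def segment(word):
    """Split a word into graphemes, trying the longest known grapheme first."""
    graphemes, i = [], 0
    while i < len(word):
        for size in range(MAX_GRAPHEME, 0, -1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in GPC:
                graphemes.append(chunk)
                i += size
                break
        else:
            raise KeyError(f"no correspondence for {word[i]!r} in {word!r}")
    return graphemes


def sound_out(word):
    """Map each grapheme to its phoneme, then blend the phonemes into one string."""
    phonemes = [GPC[g] for g in segment(word.lower())]
    return phonemes, "".join(phonemes)


# Usage: "ship" -> graphemes sh, i, p -> phonemes ʃ, ɪ, p -> blended "ʃɪp".
phonemes, blended = sound_out("ship")
```

The longest-match rule is a design choice for the sketch: it reflects the phonics practice of teaching digraphs as single units, but it would misread words where two letters only look like a digraph (e.g., across a morpheme boundary).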

Such findings on the effectiveness of phonics teaching among second language learners are important for the Indian context, as children in India are predominantly bilingual (and in some cases even multilingual), which creates a unique educational situation. Most children are exposed to their mother tongue before entering school, where they may begin to study in a language to which they have had no previous exposure. If the mother and father happen to speak different languages, children may already encounter two different languages before starting formal schooling (Mishra and Stainthorp, 2007).

Synthetic phonics approaches have made their way to developing countries more recently, India being one such country of study. Dixon and colleagues tested the Jolly Phonics intervention with children attending English-medium, low-income private schools in Hyderabad, India. An experimental group received the intervention from the teacher for an hour per day for 6 months, and a control group received the traditional English instruction, typically involving rote learning and whole-word recognition. Results showed a statistically significant difference between the experimental and control groups, with the experimental group performing better on tasks of reading, spelling, and sounding out letters and words (Dixon et al., 2011). Effect sizes (d) were particularly strong for tasks assessing the sound value of letters (16.18), blending (1.20), sentence dictation (1.01), and spelling (0.86). Findings such as these strongly support the idea that phonics interventions could successfully improve emergent English literacy in India.
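The effect sizes reported for Dixon et al. (2011) are Cohen's d values. A common definition divides the difference between two group means by their pooled standard deviation. The source does not state which variant was used (for example, pre-test versus post-test standard deviations, or a small-sample correction such as Hedges' g), so the sketch below shows only the basic pooled formula as an illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd
```

By Cohen's conventional benchmarks, d values of about 0.2, 0.5, and 0.8 are read as small, medium, and large; a d above 1 means the group means differ by more than one pooled standard deviation.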

## Why Technology?

As can be seen, a number of factors work against slum community children in India when it comes to learning to read in English. Coming from homes where parents may also be illiterate, children are suddenly required to learn in a language to which they may have had no prior exposure. Mother tongue instruction also may not be seen as an ideal option in a place like India, where English is given such high importance and has the potential to open many more doors. However, the rote methods teachers currently use are clearly not helping students achieve. Thus, although these children attend English-medium schools, they may never acquire sufficient English literacy. The few studies that have used synthetic phonics instruction to teach English in India have produced promising results (Dixon et al., 2011). However, given the numerous demands faced by teachers in India, as well as a potential lack of skill, changing instructional methods may seem intimidating for many. Technology, on the other hand, has the potential to help teachers overcome some of these barriers and, in turn, allow them to provide the high-quality literacy instruction that all children deserve.

India has long been a strong player in the IT industry (Mitra et al., 2003; Kingdon, 2007). The Indian Market Research Bureau, together with the Manufacturers' Association for Information Technology (MAIT-IMRB), has reported the tablet market in India to be growing at a rate of 73% (as cited in Central Square Foundation, 2015). Smartphone use is also becoming widespread as more and more low-cost models come onto the market (Central Square Foundation, 2015). As a result, the Indian government has also been actively working to integrate technology into education through various initiatives. One such initiative is the "ICT@Schools" scheme. According to the Ministry of Human Resource Development, the government has spent 2585 crore Indian rupees (25.85 billion rupees, roughly 400 million USD) to install technological infrastructure in about 86,000 schools across the country (as cited in Central Square Foundation, 2015).

Researchers have found not only that technology-led instruction benefits children's learning (Banerjee et al., 2007) but also that it is cost effective and time effective (Muralidharan et al., 2017). Insights from studies across the educational technology sector in India have shown the benefits of, and continuing need for, technology that allows for differentiated instruction through personalized learning (Central Square Foundation, 2015). Although technology is greatly influencing modern educational spaces, there has been criticism of using technology as the sole intervention. A meta-analysis comparing technology use for direct versus support instruction found a slightly greater effect for support instruction (see Tamim et al., 2011). Consistent results have been found in India when technology as a complement to the teacher was compared with technology as a teacher substitute. Linden (2008) found that students who received a math intervention as a substitute for the teacher-delivered curriculum performed significantly worse than students who received the intervention as a complement to teacher instruction. Similarly, a study comparing the effects of a computer-based intervention with teacher-implemented activities found that different students benefited from different interventions, with lower-performing students benefiting more from the teacher-implemented activities and higher-performing students benefiting more from the computer-based intervention (He et al., 2008).

## The GraphoLearn Method

GraphoLearn,<sup>1</sup> previously known as GraphoGame, is a theoretically driven computer-assisted tool for early reading that provides training on the connections between spoken and written language by explicitly instructing on grapheme-phoneme correspondences. The structure of the game is based on a theory of teaching small units, or 1–2 phonemes first, as this phonetic knowledge has been shown to be a strong predictor of later reading skill (e.g., Seymour and Duncan, 1997; Hulme et al., 2002). It was originally devised for readers of a transparent orthography, Finnish, based on longitudinal data that was collected through the Jyväskylä Longitudinal Study of Dyslexia (Lyytinen et al., 2007, 2009; Richardson and Lyytinen, 2014).

<sup>1</sup>http://info.grapholearn.com/

The Finnish version of GraphoLearn has been adapted to other languages around the world, English being one, and results have been promising in many countries across various languages (e.g., Saine et al., 2011; Kyle et al., 2013; Ojanen et al., 2015; Ruiz et al., 2017). To date, there has been no study which has used GraphoLearn to support non-native speakers of English.

There are two versions of GraphoLearn English: GraphoLearn English-Rime and GraphoLearn English-Phoneme. Prior to the current study, only one published study had investigated GraphoLearn English. Kyle et al. (2013) tested the efficacy of the two versions as a supplementary tool for students who were native English speakers in the United Kingdom. Results showed significant improvements in the basic reading skills of the intervention groups compared with the controls for both game versions, but the authors were unable to conclude that one version was more effective than the other. The present study used GraphoLearn English-Rime, which incorporates the idea of teaching slightly larger rime units in addition to single grapheme-phoneme correspondences because of the orthographic complexity of English (e.g., Goswami, 1986, 1988). In both game versions, players are first introduced to single grapheme-phoneme correspondences. However, rather than introducing them all at once, GraphoLearn English-Rime introduces grapheme-phoneme correspondences in sets of about 7–8 items. These individual letters are then combined to form rime units, and finally whole words. Later in the game, players are also shown whole words in which they must isolate or blend various grapheme-phoneme correspondences or rime units. Grapheme-phoneme correspondences are presented in order from the most frequent and consistent to the least frequent and least consistent (Kyle et al., 2013). Kyle et al. (2013) reported that, for the game version used in this study, the effect size was large for BAS spelling (0.66) and TOWRE non-word reading (1.43) and medium for BAS reading (0.66) and TOWRE sight word reading (0.53).
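The sequencing principle described above (small sets of correspondences, introduced from the most to the least frequent and consistent) can be sketched as a sort followed by chunking. The frequency and consistency scores, the choice to rank by consistency first and frequency second, and the default set size of 8 are assumptions made for illustration; Kyle et al. (2013) describe the ordering principle, not this code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Correspondence:
    grapheme: str
    phoneme: str
    frequency: float    # hypothetical: how often the grapheme occurs in early texts (0-1)
    consistency: float  # hypothetical: share of occurrences read with this phoneme (0-1)


def plan_sets(items: List[Correspondence], set_size: int = 8) -> List[List[Correspondence]]:
    """Rank correspondences (most consistent, then most frequent, first) and chunk them."""
    ranked = sorted(items, key=lambda c: (c.consistency, c.frequency), reverse=True)
    return [ranked[i:i + set_size] for i in range(0, len(ranked), set_size)]


# Usage with made-up scores: the inconsistent "ough" ends up in the last set.
items = [
    Correspondence("a", "æ", frequency=0.9, consistency=0.6),
    Correspondence("s", "s", frequency=0.8, consistency=0.95),
    Correspondence("t", "t", frequency=0.85, consistency=0.98),
    Correspondence("ough", "ʌf", frequency=0.1, consistency=0.1),
]
sets = plan_sets(items, set_size=3)
```

Ranking on a tuple is one simple way to combine the two criteria; a weighted score would be an equally plausible alternative.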

## The Present Study

The study reported here examined the efficacy of GraphoLearn, a computer-assisted reading tool, in improving the basic English reading skills of slum children in India by supporting the development of their grapheme-phoneme knowledge, reading, and spelling. GraphoLearn was provided as a supplement to teacher instruction for third grade students in an English-medium, government-aided public school in Ahmedabad, India. The school was approached on the basis of information from the class teacher showing that the children had very low literacy levels. We chose Grade 3 so that we could assume the children had at least 2 prior years of spoken English exposure (starting from Grade 1). Based on previous studies using synthetic phonics (Stuart, 1999, 2004; Dixon et al., 2011) and previous GraphoLearn studies (Kyle et al., 2013), we expected to see improvements in student performance.

## MATERIALS AND METHODS

## Ethics Statement

Permission to run the study was obtained from the Ahmedabad Municipal Corporation School Board, as well as from the principal and the class teacher. Parents of the children (both pilot and full study) provided written informed consent prior to the start of the intervention. The study was carried out in accordance with guidelines given by the University of Jyväskylä Ethics Committee. Ethics approval was not required under the University of Jyväskylä Ethics Committee guidelines and national regulations. However, a statement from the Ethics Committee can be provided upon request.

## Pilot

Prior to the full study, a pilot was conducted with 16 children from a second government-aided public school. These students were also in Grade 3 and had demographics similar to those of the children in the full study. The pilot phase ran for 3 weeks, and its primary purpose was to identify the types of difficulties that might arise in the full study, in the hope of circumventing them later. After the pilot period, some changes were made before the start of the full study. The math game for the control group was changed, as the originally selected game was not long enough for students to play throughout the entire study period. The paper-pencil tasks were also changed. It was originally planned to conduct a standardized phoneme deletion task as used by Kyle and colleagues (Kyle et al., 2013). However, when it was attempted with the children during the pilot, it was obvious that most children did not understand the task. Therefore, the standardized phoneme deletion task was not included in the full study.

## Participants

Thirty-one third graders, aged 7–8, participated in the study. Data provided by the teacher showed that the children were, on average, performing drastically below grade level in literacy. Due to the lack of specialists in the school, it was unknown whether any children had additional special learning needs, but no students had any formal diagnoses of such problems. Consent for all participating students was obtained at the end of second grade, before they left for the summer holidays, to ensure that the study could begin as soon as possible once they returned. Parents were invited to the school and taken through the consent form, as many were illiterate in English. In total, 43 parents provided written informed consent; however, only 31 children ended up participating in the study, as some children left the school before the start of the study, while others had extremely irregular attendance or joined the school after the start of the study and therefore could not be included.

Students were randomly allocated to either the experimental group, which played GraphoLearn (n = 16), or the control group, which played a math game (n = 15). Groups were primarily matched on age and gender, but basic reading skills, such as letter-sound knowledge, were also considered based on the information provided by the teacher. All students came from low-income homes, with a majority living below the poverty line, and all students were learning English as a second or third language, with no exposure to English at home. All the children except one had been enrolled in the school since Grade 1, and they had all been in the same classroom with the same teacher in both Grades 1 and 2. At the end of the study, three students were unable to participate in all or some parts of the post-test due to illness. One control group student's data were removed because the student did not participate in any of the post-testing. The data of the other two students, both in the GraphoLearn group, were not removed, because one participated in the GraphoLearn post-tests and the other in the paper-pencil post-tests. Significance values and effect sizes were not affected by eliminating these students' data, and their data have therefore been retained. Final group sizes at post-test were n = 16 for the GraphoLearn group and n = 14 for the control group. As a reward for the participation and cooperation of the class teacher and students, a set of 20 English story books was donated to the classroom at the end of the intervention period.
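Allocation that is random yet matched on age and gender is often done with a matched-pair design. The sketch below shows one such scheme; it is not a description of the authors' exact procedure, which the text does not specify. Children are stratified by gender, sorted by age within each stratum, and paired consecutively, and a seeded random draw decides which member of each pair goes to which group. The tuple layout and group labels are hypothetical.

```python
import random


def matched_pair_allocation(children, seed=None):
    """Allocate children to two groups, pairing by age within gender strata.

    children: list of (child_id, gender, age_in_months) tuples (hypothetical layout).
    Returns a dict mapping "experimental" and "control" to lists of child_ids.
    """
    rng = random.Random(seed)
    groups = {"experimental": [], "control": []}

    # Stratify by gender.
    strata = {}
    for child in children:
        strata.setdefault(child[1], []).append(child)

    for members in strata.values():
        members.sort(key=lambda c: c[2])  # neighbours in this order are closest in age
        for i in range(0, len(members), 2):
            pair = members[i:i + 2]
            if len(pair) == 2:
                rng.shuffle(pair)  # coin flip: which pair member gets the intervention
                groups["experimental"].append(pair[0][0])
                groups["control"].append(pair[1][0])
            else:
                # Leftover child in an odd-sized stratum: assign at random.
                groups[rng.choice(["experimental", "control"])].append(pair[0][0])
    return groups
```

Any unpaired child in an odd-sized stratum is assigned by a separate random draw, so the two groups can differ slightly in size. Matching on additional variables, such as the teacher-reported letter-sound knowledge mentioned above, would require a different pairing key.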

## Procedure

Both groups of children played their respective games (GraphoLearn versus math) for 20–30 min per day, 6 days a week, over a period of 8 weeks. The children played the game on an individual tablet with headphones. All play was done during the regular school day where children were pulled out of their classroom in batches of 12 and then taken to a separate room where the tablets were set up for them. The researcher was present during all play sessions with the students.
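The planned amount of play can be worked out directly from this schedule. The figures below are a nominal maximum, assuming every session was held and attended; absences would reduce actual exposure.

$$
6\ \tfrac{\text{sessions}}{\text{week}} \times 8\ \text{weeks} = 48\ \text{sessions}
$$

$$
48 \times 20\ \text{min} = 960\ \text{min} = 16\ \text{h}, \qquad 48 \times 30\ \text{min} = 1440\ \text{min} = 24\ \text{h}
$$

Each child's nominal exposure therefore lies between 16 and 24 hours of play over the 8 weeks.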

## GraphoLearn

GraphoLearn provides adaptive practice in which players see a set of letters or letter strings and hear a corresponding speech sound. Players are expected to select, from 4 to 7 options, the written unit that corresponds to the sound they hear through the headphones. GraphoLearn requires players to create an individual avatar, after which they are taken through a series of streams, each divided into levels. GraphoLearn English-Rime has a total of 25 streams, each containing 5 to 9 levels. The first seven streams each start with a level that introduces players to a small set of individual grapheme-phoneme correspondences (7–8 items), some new and some reviewed from previous streams. Once introduced, these correspondences are combined to form larger rime units, which are then presented in the context of words. Later in the game, players are introduced to more complex grapheme-phoneme correspondences (e.g., blends and digraphs) and to sounds that have multiple possible spellings. After every four streams, there is an assessment stream in which players are assessed on letter-sounds, rime units, and word recognition. Throughout the game, players are presented with auditory targets, which they must match with the correct visual target among the items presented on the screen. The streams are ordered by difficulty, progressing from the easiest to the more difficult connections between spoken and written English. To support spelling skills, word formation levels are included in 15 streams: players are presented with blocks on the screen containing either individual letters or onset and rime patterns, which they must drag into boxes in the correct order to spell a target word (see **Figure 1**).

To further support the development of phonological awareness, 11 of the streams include rhyming tasks, in which players must select the target that rhymes with the auditory target presented. In all levels, incorrect choices trigger automatic feedback, allowing players to correct themselves. Players must score above 80% on each level within a stream in order to move on to the next stream. To further build motivation, players are rewarded within each level with stars and coins, which they can trade in to purchase items for their avatar. Data from the game are automatically saved to an external server when players exit the game, provided the device has an active internet connection (for a detailed description of GraphoLearn English, see Kyle et al., 2013).
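The mastery rule described above can be sketched as a simple gate. This is a minimal illustration, assuming each level is scored as the proportion of correct selections and reading "above 80%" as a strict inequality; the function names and data layout are hypothetical and are not taken from GraphoLearn's software.

```python
from typing import Dict, List

# "Above 80%" per the text; a strict inequality is assumed here.
MASTERY_THRESHOLD = 0.80


def level_score(responses: List[bool]) -> float:
    """Proportion of correct selections within one level (hypothetical scoring)."""
    if not responses:
        return 0.0
    return sum(responses) / len(responses)


def can_advance(stream_results: Dict[str, List[bool]]) -> bool:
    """True only if every level in the stream is scored above the threshold."""
    return all(level_score(r) > MASTERY_THRESHOLD for r in stream_results.values())
```

For example, a stream with one level at 9/10 correct and another at exactly 8/10 would not be passed, because 80% is not above the threshold.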

### Math Game

The math game played by the control group was a Grade 3 level game called "Math for Kids," selected from the Google Play store. It presented basic arithmetic problems (addition, subtraction, and multiplication), and students were required to select the correct answer from four options. Students could choose from three difficulty levels (easy, medium, and hard), and their progress was saved, so they could continue each session where they had left off. Like GraphoLearn, the math game contained multiple sublevels within each level. The game rewarded children with stars, and children were instructed to move on to the next level only after collecting at least two stars. The game provided no visual or auditory English input other than at the beginning, when children had to select their level. The main purpose of the math game was to ensure that both groups of children spent equivalent amounts of time outside the classroom using the technology. As can be seen in **Table 1**, there were no significant differences between the two groups in the number of days played or in playing time.

#### Measures and Assessment Procedure

Students were assessed before and after the intervention using three tasks in the GraphoLearn software and four paper-pencil tasks. The in-game assessment included letter-sound knowledge, rime unit recognition, and whole word recognition tasks. The standardized paper-pencil tasks were the Single Word Reading subtest from the British Ability Scale (BAS II; Elliot et al., 1996) and the Test of Word Reading Efficiency (TOWRE; Torgesen et al., 1999), which includes sight word reading and non-word reading. Students also completed a modified version of the spelling subtest from the BAS II. Students were pre-tested and post-tested by the researcher in the days preceding and the days following the intervention period. Students were taken out of the class and completed the BAS II reading, TOWRE sight words, and TOWRE non-words tasks one-on-one with the researcher. The BAS II spelling assessment was given as a whole-class dictation, and the GraphoLearn in-game assessments were given to students in groups of 12 on the tablets. Both the GraphoLearn and control groups were given basic instructions on how the GraphoLearn assessments work before starting the assessment tasks, and all students were instructed to inform the researcher once they finished an assessment task and before starting the next one. This ensured that children did not play levels they were not meant to play and that each of the three assessment tasks was played only once at pre-test and once at post-test.

TABLE 1 | Group characteristics.

#### In-Game Assessments

All students completed three in-game assessments in GraphoLearn. The letter-sound task required children to pick, from the options, the letter that corresponded to the sound presented to them (see **Figure 2A**). The rime unit task required children to pick the 2–3 letter string that corresponded to the pronunciation presented to them (see **Figure 2B**). Finally, the word recognition task required children to pick the written word that matched the spoken word presented to them (see **Figure 2C**). In all three tasks, as in the rest of the game, players were presented with an auditory target that they had to match with a visual target. In total, the letter sounds task contained 24 trials, the rime units task 24 trials, and the word recognition task 47 trials. The rime units and word recognition tasks were discontinued if players chose incorrectly more than 50% of the time. The average number of trials completed in each of the three tasks is given in **Table 2**. Both the experimental and control groups completed the assessment level before and at the end of the intervention period.

#### Paper-Pencil Assessments: Reading

All students in the study completed the Single Word Reading subtest from the British Ability Scale II (BAS II; Elliot et al., 1996), which measures single-word reading accuracy. The test was administered according to the manual and required children to read single words of increasing difficulty, listed in groups of 10. The test is discontinued after children miss eight or more words within one group. Internal reliability of the BAS II word reading task has been reported to be 0.98 and test-retest reliability 0.97 (test review; Thomson, 1997). Students also completed the Test of Word Reading Efficiency (TOWRE; Torgesen et al., 1999). The TOWRE requires students to read aloud as many items as they can accurately from a list of sight words and from a list of non-words, each within 45 s. Practice words were given for each section. Internal reliability ranges from 0.86 to 0.98, and test-retest reliability has been reported to be between 0.82 and 0.97 for both tasks (test review; Tanna, 2009). It is important to note that these assessments are not standardized for Indian children; therefore, only raw scores are reported.
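The discontinue rule for the BAS II word reading subtest can be sketched as follows. This minimal illustration assumes that the raw score is the number of words read correctly, that a group in which eight or more words are missed is still counted before testing stops, and that responses are recorded in presentation order; it is not taken from the BAS II manual, and the names are hypothetical.

```python
from typing import Sequence

GROUP_SIZE = 10          # words are listed in groups of 10
DISCONTINUE_ERRORS = 8   # stop after eight or more misses within one group


def raw_reading_score(responses: Sequence[bool]) -> int:
    """Count correctly read words, stopping after the first group with
    DISCONTINUE_ERRORS or more misses (that group's correct words still count)."""
    score = 0
    for start in range(0, len(responses), GROUP_SIZE):
        group = responses[start:start + GROUP_SIZE]
        correct = sum(group)
        score += correct
        if len(group) - correct >= DISCONTINUE_ERRORS:
            break  # discontinue: later groups are not administered
    return score
```

For instance, a child who reads all of group 1, half of group 2, and only 2 words of group 3 stops after group 3 with a raw score of 17, regardless of any later responses.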


fpsyg-09-01045 June 22, 2018 Time: 16:50 # 8

FIGURE 2 | Example screens from the GraphoLearn in-game assessments: (A) is from the letter sounds task, (B) is from the rime units task, and (C) is from the word recognition task.

TABLE 2 | Average number of trials completed within the in-game assessments at


#### Paper-Pencil Assessments: Spelling

All students also completed a spelling subtest taken from the British Ability Scale II (Elliot et al., 1996). The task contained a mixture of verbs, nouns, and adjectives, some of which can be spelled phonetically. The dictation did not follow the manual's instructions, which suggest different starting points based on age; instead, the first 30 words on the list were dictated to all students, each with its accompanying sentence. The word and its accompanying sentence were said a maximum of three times, and students were expected to write down the word. The score was the number of correctly spelled items out of 30.

## Fidelity to the Program

Fidelity to the GraphoLearn intervention was monitored through the detailed game logs sent to the GraphoLearn server. These logs include the number of days played and the seconds spent playing, as well as the first and last play days. For the control group, days and time (in minutes) were recorded manually by the researcher. In addition, the primary researcher was present during all play sessions to ensure that the children were engaged in playing their respective games.

TABLE 3 | Descriptive statistics and group comparisons on GraphoLearn tasks.

## RESULTS

Prior to analyses, the distributions of all measures were assessed for normality. At pre-test, the BAS II reading measure had two outlying scores and the TOWRE non-words measure had one, each causing a right-skewed distribution. These scores were winsorized (replaced with a value closer to the rest of the distribution while retaining the order of values) to meet the assumption of normality. The remaining measures (GraphoLearn letter-sounds, rime units, and word recognition; TOWRE sight words; and spelling) were normally distributed at both time points.
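The text does not state the exact replacement rule or outlier criterion. One common convention that keeps the rank order is to give each outlying score a value one unit above the highest non-outlying score; the minimal sketch below assumes that convention and an analyst-chosen upper cutoff (the function name and cutoff are hypothetical). SciPy's `scipy.stats.mstats.winsorize` offers a percentile-based alternative, which clips values to a boundary and can therefore create ties.

```python
from typing import List


def winsorize_upper(scores: List[float], cutoff: float) -> List[float]:
    """Pull scores above `cutoff` in to just above the highest retained score.

    Each distinct outlying value is replaced by (highest retained score + rank),
    so outliers stay above the bulk of the data and keep their relative order.
    """
    retained = [s for s in scores if s <= cutoff]
    if not retained:
        raise ValueError("cutoff removes every score")
    ceiling = max(retained)
    outlying = sorted(set(s for s in scores if s > cutoff))
    replacement = {v: ceiling + rank for rank, v in enumerate(outlying, start=1)}
    # Retained scores are never keys in `replacement`, so they pass through unchanged.
    return [replacement.get(s, s) for s in scores]
```

With made-up scores [3, 5, 7, 8, 30, 45] and a cutoff of 10, the two right-tail outliers become 9 and 10, shrinking the skew while preserving the ordering.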

## Pre-test and Post-test Group Comparisons

The pre-test and post-test means and standard deviations in the two study groups, as well as group comparison results, are reported in **Table 3** for the GraphoLearn tasks and **Table 4** for the paper-pencil tasks.

First, independent samples t-tests were conducted to examine group differences at pre-test and post-test. Because of the small sample size, group differences were also analyzed using a non-parametric test (Mann–Whitney U); as these results did not differ from those of the t-tests, the t-test results are reported. Effect sizes and their confidence intervals were also calculated for all measures at pre-test and post-test using Cohen's d with pooled standard deviation. Following the criteria defined by Cohen (1988), d ≥ 0.2 is a small effect, d ≥ 0.5 is a medium effect, and d ≥ 0.8 is a large effect. The results (see **Table 3**) showed no pre-test group differences in the GraphoLearn tasks. Although the effect size for letter-sounds was small (0.30) and in favor of the control group, its confidence interval crossed zero. At post-test, group differences in favor of the GraphoLearn group were significant for all GraphoLearn tasks: letter-sounds (t(27) = 5.73, p < 0.001), rime units (t(27) = 2.31, p = 0.029), and word recognition (t(27) = 2.07, p = 0.048). Effect sizes were large for GraphoLearn letter-sounds (2.13) and rime units (0.85) and medium for word recognition (0.77); however, only the letter-sounds effect had a confidence interval that did not cross zero (1.22, 3.04).

TABLE 4 | Descriptive statistics and group comparisons on paper-pencil tasks.

<sup>∗</sup>p ≤ 0.05, ∗∗p ≤ 0.01, ∗∗∗p ≤ 0.001.

On the paper-pencil tasks, there were no significant group differences at either pre-test or post-test (see **Table 4**). Pre-test effect sizes (d) were negligible to small and consistent with the t-test finding of no significant group differences in BAS II reading (0.13), TOWRE sight words (0.24), TOWRE non-words (0.23), and spelling (0.24). Post-test effect sizes were negligible, again consistent with the absence of significant group differences in BAS II reading (0.03), TOWRE sight words (0.08), TOWRE non-words (0.09), and spelling (0.05). Confidence intervals for all paper-pencil measures crossed zero at both pre-test and post-test.
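The effect-size calculation used in these comparisons can be sketched as follows. Cohen's d with pooled standard deviation follows the text; the confidence-interval method is not stated, so this sketch assumes the common large-sample standard error SE = √((n₁ + n₂)/(n₁n₂) + d²/(2(n₁ + n₂))). As a rough check, d = 2.13 with an assumed split of 15 and 14 children (consistent with t(27)) gives approximately (1.22, 3.04), the interval reported for post-test letter-sounds; this suggests, but does not confirm, that a similar approximation was used.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def cohens_d(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


def d_confidence_interval(d: float, n1: int, n2: int,
                          level: float = 0.95) -> Tuple[float, float, float]:
    """Approximate CI for d from the large-sample standard error (an assumption).

    Returns (lower, upper, standard_error).
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return d - z * se, d + z * se, se
```

An interval whose lower bound is above zero, as for letter-sounds, is what the text treats as evidence of a reliable group difference.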

### Group Comparisons of Development From Pre-test to Post-test

A repeated measures ANOVA was used to examine the effects of time (change from pre-test to post-test), group (GraphoLearn versus control), and the time × group interaction (group differences in change) on the scores.

For the GraphoLearn tasks (letter-sounds, rime units, and word recognition), there was a significant main effect of time on all three tasks (see **Table 3**), with both groups improving from pre-test to post-test (see **Figure 3**). For the letter-sounds task, there was a significant main effect of group as well as a significant time × group interaction, with the GraphoLearn group showing significantly higher scores and greater improvement than the control group. For the rime units task, neither the main effect of group nor the time × group interaction was significant, although both p-values approached the 0.05 significance level (p = 0.09). Finally, for the word recognition task, there were no significant group effects or time × group interactions. For the paper-pencil tasks (BAS II reading, TOWRE sight words, TOWRE non-words, and spelling), there was a main effect of time on all measures (see **Table 4**), with both groups improving from pre-test to post-test (see **Figure 4**). There were, however, no significant effects of group and no significant time × group interactions for the paper-pencil assessments.

## Group Comparisons of Gain Scores

Finally, the groups were compared using gain scores, calculated for each individual by subtracting the pre-test score from the post-test score. Means and standard deviations of the gain scores for both groups are given in **Table 5**, along with group comparisons, effect sizes (Cohen's d), and confidence intervals for the effect sizes for GraphoLearn versus control. The standard errors of the effect sizes are given in parentheses.
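A minimal sketch of the gain-score comparison, using made-up numbers for illustration; the helper names are hypothetical. With only two time points, the time × group interaction F(1, N − 2) from the repeated measures ANOVA equals the square of a pooled-variance independent-samples t-test on these gain scores, so the two analyses test the same hypothesis about group differences in change.

```python
import math
from statistics import mean, variance
from typing import List, Sequence


def gain_scores(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Post-test minus pre-test, computed per individual."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired per individual")
    return [b - a for a, b in zip(pre, post)]


def pooled_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Student's independent-samples t statistic (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


# Illustrative (fabricated) scores for three children per group.
treated = gain_scores([10, 12, 9], [18, 19, 15])   # [8, 7, 6]
control = gain_scores([11, 10, 12], [13, 12, 13])  # [2, 2, 1]
t = pooled_t(treated, control)
interaction_f = t ** 2  # equals the time x group F in a 2 x 2 mixed design
```

Here t comes out at approximately 8, so the corresponding interaction F would be approximately 64.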

Regarding the GraphoLearn tasks, there was a very large effect on the letter-sound task (2.49), and its confidence interval did not cross zero (1.52, 3.47), allowing us to conclude that there was a significant difference in favor of the GraphoLearn group. There were medium effects for the rime units (0.64) and word recognition (0.52) tasks; however, the confidence intervals for these measures crossed zero. Regarding the paper-pencil tasks, the GraphoLearn versus control comparison yielded medium effect sizes for TOWRE non-word reading (0.62) and spelling (0.74). The effect size was small for TOWRE sight word reading (0.31) and almost zero for BAS II single-word reading. Confidence intervals for all paper-pencil measures crossed zero (see **Table 5**).

## DISCUSSION

The present study examined whether GraphoLearn, a computer-assisted reading tool, could effectively support the development of basic English reading skills in struggling readers in India. The participants were Grade 3 children from slum communities in India who were learning English as a non-native language and who typically had no exposure to English outside the school environment. Students were divided into an experimental group, which played GraphoLearn, and a control group, which played a simple math game, for 20–30 min per day over a period of 8 weeks. Despite the short play period (∼7.5 h) and limited sample size, participants made significant gains, and the effect size was promising at least for letter-sound knowledge, a critical skill for early reading development.

TABLE 5 | Means and effect sizes of group differences in gains.

∗∗∗p ≤ 0.001.

The GraphoLearn intervention group showed the greatest improvements on the letter-sounds task. Group differences were significant, the effect size of the gains from pre-test to post-test was large, and the confidence interval of the effect size did not cross zero, allowing us to conclude that the intervention had an effect on letter-sound knowledge.

The results show that GraphoLearn can effectively support the development of English letter-sound knowledge in Indian children, even though participants were non-native speakers and were exposed to the intervention for a limited amount of time. That GraphoLearn supported letter-sound knowledge to this extent is important, because letter-sound knowledge has been identified as a critical building block in early reading development, even for non-native readers of English (Muter and Diethelm, 2001). There is also evidence that letter-sound knowledge affects early literacy skills, particularly word reading (Hulme et al., 2012). GraphoLearn can thus be seen as a beneficial intervention for bilingual children as well, in line with the previous finding that bilingual children can benefit just as much as native English speakers from literacy interventions that explicitly emphasize grapheme-phoneme relationships (Lesaux and Siegel, 2003).

Although the rime unit and word recognition tasks had medium to large effect sizes, their confidence intervals crossed zero. With our small sample, statistical power was limited, so future studies should examine the effects of GraphoLearn English with a larger sample. The lack of significant effects may also be partially due to the short playtime. Participants in this study were non-native speakers of English and had only about 7.5 h of play time, compared with 11 h in the study by Kyle and colleagues with native speakers of English (Kyle et al., 2013). Owing to the structure of the game, only about 60% of participants reached stream eight, where explicit practice of all rime units and their accompanying whole words begins. Further studies are required to determine whether greater play time produces significant effects on the GraphoLearn rime units and word recognition tasks.

Paper-pencil measures of reading and spelling were administered to determine whether skills learned in-game transferred to a non-game assessment. Although effect sizes of the gains were medium for the non-words and spelling tasks, the confidence intervals of these effect sizes crossed zero, reflecting non-significant group differences. Because no measures standardized for this population were available, we used measures designed for native English-speaking children. Unfortunately, this created a less than ideal testing situation, as the tasks were also quite far from what the game explicitly taught. In addition, because none of the participants had enough time to finish the game, there were many items (e.g., complex GPCs such as "the rule of e") that participants were not exposed to, and therefore could not learn from the game, but that were required on the paper-pencil measures. As with the in-game assessments, further studies will be required to determine whether longer exposure to the game produces transferable skills. It is also important that future studies use measures standardized for such populations.

Overall, the intervention suggests that GraphoLearn could succeed in the Indian context, where the importance of English is growing, yet support for learning the language is lacking for many. We hope that future studies with a larger sample, greater play time, and more appropriate measures will show GraphoLearn to be comparable to the few intervention studies using phonics programs in the Indian setting (e.g., Nag-Arulmani et al., 2003; Dixon et al., 2011), while demanding comparatively fewer resources. As a tool, GraphoLearn combines successful aspects of previous interventions while providing individualized learning for students and easily accessible data for teachers, factors crucial for implementation and success in a country like India (Central Square Foundation, 2015; Muralidharan et al., 2017). The generalizability of these results remains an open question; going forward, it is therefore important to test whether results improve when GraphoLearn is used over a longer period of time, with a larger population, and in other parts of India where demands may differ. Nonetheless, this study provides a good first step in examining how technology, and GraphoLearn in particular, can be used to support the English reading skills of struggling readers in India.

## Limitations

There are a few limitations that must be taken into consideration when evaluating the results of this study. As mentioned, one major limitation was the small sample size. With a sample of only 30 children, we were limited in the statistical approaches that could be used on the data, and we recognize that a bigger sample would have provided more statistical power. The small sample size also limited our capacity to control for unobserved variables; therefore, although we used random assignment, its methodological rigor can only be considered "moderate." A second limitation was the limited intervention time. Although the study was carried out over 8 weeks, the students played for only about 7.5 h. Most missed play time was due to student absenteeism and/or the school being unexpectedly closed. Because of the limited play time, no student was able to complete all the streams. Although these factors limit the results of this study, such problems are very real for teachers in India. Therefore, what we see as limited may be what we would actually see if teachers were expected to carry out such an intervention themselves. Third, a methodological limitation is the repeated exposure of the GraphoLearn group to the in-game assessments. As previously mentioned, GraphoLearn is built so that students encounter an assessment stream after every four practice streams. Thus, students who played GraphoLearn had repeated exposure to the in-game assessments throughout the intervention period, whereas the control group was exposed to the in-game assessments only once at pre-test and once at post-test. This was unavoidable, as the in-game measures were necessary to test the skills exactly as taught by the game. In addition, the paper-pencil measures, which were standardized for native English speakers, were somewhat difficult for the participants of this study. In the future, this could be avoided by developing experimental measures standardized for this particular population. A fourth limitation, from the point of view of practical implications, was the full-time presence of a researcher during the intervention period. The presence of an adult who was fully focused on the participating children may have increased motivation. The researcher also constantly supported students by calling them if they were not in school and making it possible for them to play at any time of the school day. These conditions are not realistic when the game is implemented in everyday practice. Similarly, we as researchers had access to sufficient equipment and resources (i.e., tablets, headphones, and a working internet connection) for children to be able to play regularly. Going forward, it is important that future studies take the realities of implementation into consideration so as to increase the chances of sustainability (Central Square Foundation, 2015). Future studies could also examine the cost-effectiveness of GraphoLearn as an intervention tool in such localities. Finally, based on the current study, we do not know how well the effects will be maintained over time. Future studies should conduct follow-ups to determine whether effects are maintained after the intervention. Going forward, it would also be important to use assessments normed for Indian students in order to obtain more accurate results.

## Practical Implications

The current study sheds light on the ability of computer-assisted reading tools, like GraphoLearn, to support children in India who struggle to read. A logical next step would be to test GraphoLearn English on a larger scale over a longer period. As mentioned previously, students' exposure to the game was quite limited due to many uncontrollable factors. Thus, future studies should focus on exposure over a longer duration to determine whether it boosts effects and enables students to transfer the skills they learn in the game to real-life situations.

GraphoLearn also opens the door to providing interventions in children's mother tongues and other native languages. According to the 2001 census, 41% of India's population, or more than 422 million individuals, are Hindi speakers. Despite this large number of speakers, there is still a great need for ed-tech developers to cater to students who are studying in a native language in India (Central Square Foundation, 2015).

It has become clear that technology has the potential to enhance learning, particularly in developing countries where differentiation is necessary but difficult for a teacher alone to achieve (Muralidharan et al., 2017). However, there are still critical considerations that must be taken into account before implementing technology in schools. According to The World Bank (2018), technology should be used as a complement to teachers rather than a replacement for them. A study in India comparing technology used as a teacher substitute within school with technology used as a teacher complement out of school showed that children in the within-school group learned significantly less (Linden, 2008). As suggested by Muralidharan et al. (2017), it may be most efficient to use technology to create what they call a "blended learning" environment, in which teachers use the information gathered from the technology to guide further instruction. In the current study, GraphoLearn was used as an in-school intervention meant to supplement teacher instruction. However, because teachers were not using phonics methods to teach English, there was no teacher involvement, and the game therefore became an isolated activity that the children performed during the day. A previous study of the effectiveness of GraphoLearn in Zambia showed that an intervention design in which both students and teachers were trained on and played GraphoLearn led to the greatest improvements in student learning (Jere-Folotiya et al., 2014). Thus, it must be considered how the technology can be used in greater collaboration with teachers. GraphoLearn could provide teachers in India with an alternative to the currently used "rote-memorization" approach and further increase the use of phonics as a method of teaching English literacy in India.

## AUTHOR CONTRIBUTIONS


PP performed this study as part of her master's thesis (Patel, 2018). She collected the data and is the main author of the paper. MT supervised PP in her master's thesis and supported planning of data collection, data analysis, and writing of the paper. MA provided guidance on the writing and proofreading of the paper. UR and HL provided their expertise on the details of the GraphoLearn software, as well as supporting the writing and proofreading of the paper.

## REFERENCES


## FUNDING

HL's work was supported by Academy of Finland with grants #292493 and #311737. MT's work was supported by Academy of Finland with grants #276239, #284439, and #313768.

### ACKNOWLEDGMENTS

We would like to thank the Ahmedabad Municipal Corporation School Board for granting permission to carry out this study in their schools, and the teachers for being cooperative throughout the entire intervention period. This article is based on the master's thesis written by the first author in the Department of Education at the University of Jyväskylä, Finland.


to support reading acquisition. Nord. Psychol. 59, 109–126. doi: 10.1027/1901-2276.59.2.109


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer JF declared a shared affiliation, with no collaboration, with one of the authors UR to the handling Editor.

Copyright © 2018 Patel, Torppa, Aro, Richardson and Lyytinen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Brain Responses to Letters and Speech Sounds and Their Correlations With Cognitive Skills Related to Reading in Children

Weiyong Xu<sup>1,2</sup>\*, Orsolya B. Kolozsvari<sup>1,2</sup>, Simo P. Monto<sup>1,2</sup> and Jarmo A. Hämäläinen<sup>1,2</sup>

<sup>1</sup> Department of Psychology, University of Jyväskylä, Jyväskylä, Finland, <sup>2</sup> Jyväskylä Centre for Interdisciplinary Brain Research, University of Jyväskylä, Jyväskylä, Finland

Letter-speech sound (LSS) integration is crucial for the initial stages of reading acquisition. However, the relationship between the cortical organization supporting LSS integration, including unimodal and multimodal processes, and reading skills in early readers remains unclear. In the present study, we measured brain responses to Finnish letters and speech sounds in 29 typically developing Finnish children using magnetoencephalography in a child-friendly audiovisual integration experiment. Brain source activations in response to auditory, visual, and audiovisual stimuli, as well as the audiovisual integration response, were correlated with reading skills and with cognitive skills predictive of reading development after controlling for the effect of age. Regression analysis showed that, of the brain measures, the auditory late response around 400 ms had the largest association with phonological processing and rapid automatized naming abilities. In addition, the audiovisual integration effect was most pronounced in the left and right temporoparietal regions, and activity in several of these temporoparietal regions correlated with reading and writing skills. Our findings indicate the important role of temporoparietal regions in the early phase of learning to read and their unique contribution to reading skills.

#### Edited by:

Gorka Fraga González, University of Amsterdam, Netherlands

#### Reviewed by:

Joao Correia, Maastricht University, Netherlands

Mario Braun, University of Salzburg, Austria

> \*Correspondence: Weiyong Xu weiyong.w.xu@jyu.fi

Received: 30 April 2018; Accepted: 16 July 2018; Published: 03 August 2018

#### Citation:

Xu W, Kolozsvari OB, Monto SP and Hämäläinen JA (2018) Brain Responses to Letters and Speech Sounds and Their Correlations With Cognitive Skills Related to Reading in Children. Front. Hum. Neurosci. 12:304. doi: 10.3389/fnhum.2018.00304

Keywords: letter-speech sound integration, brain development, magnetoencephalography, auditory cortex, language learning, reading

## INTRODUCTION

Letter-speech sound (LSS) integration is a key step in learning to read in alphabetic languages. How early readers' language circuits develop and reorganize to support automatized LSS integration, and how such integration relates to the development of fluent reading, are crucial questions from both a theoretical and a practical point of view (Shankweiler et al., 2008; Dehaene et al., 2015). Research has shown that in early readers, print-speech convergence (as measured by coactivation in fMRI) in the left reading network (inferior frontal gyrus, inferior parietal cortex, and fusiform gyrus) is a significant predictor of reading achievement measured 2 years later (Preston et al., 2016). Another study, using four contrasting languages to find common indices of successful literacy acquisition, observed highly similar neural organization for print-speech convergence across the languages. Furthermore, such print-speech convergence was suggested as a common brain signature of reading proficiency (Rueckl et al., 2015). However, little is known about the interrelationships between the brain mechanisms of speech perception, letter processing, and LSS integration and the development of reading skills during childhood.

In order to understand the development of LSS integration, which is a form of audiovisual integration, auditory and visual processes also need to be taken into account. The maturation of the auditory and visual cortices is reflected in changes in the auditory and visual evoked responses. In general, auditory evoked responses change greatly from childhood to adulthood, with a tendency toward shorter latencies and smaller amplitudes (Albrecht et al., 2000). For example, the auditory P1 and N1b (the supratemporal component of the N1) peaks show large age-related decreases in latency. In addition, auditory P1, P1-N1b, and N2 peak amplitudes change throughout childhood, with accelerated change around the age of 10 years (Ponton et al., 2000). For the visual components, activation in children is clearly delayed compared with adults, and this delay increases progressively from occipital areas (related to low-level visual analysis) to occipitotemporal areas (related to letter and letter-string analysis) and further to temporal areas (related to written word perception) (Parviainen et al., 2006).

It has been shown that audiovisual speech produces audiovisual interaction effects reflected as both suppression of the visual response to lipreading and reduced auditory responses to the speech sound compared with unimodal conditions (Besle et al., 2004, 2008). One study used audiovisual speech and audiovisual non-linguistic stimuli to investigate the developmental pattern of audiovisual interactions in the age range of 5–19 years (Tremblay et al., 2007). The results showed that the strength of audiovisual speech integration significantly correlated with age, whereas the performance on non-speech tasks seemed to be similar across all ages. These findings suggest independent maturational processes for audiovisual speech and non-speech during childhood. Converging evidence from electrophysiological research revealed a systematic relationship between brain responses underlying audiovisual integration (of simple audiovisual sounds and objects) in the time range of the auditory N1 response (about 120 ms) and age between 7 and 16 years (Brandwein et al., 2011). Multisensory processes are thus still developing even in late childhood and this maturation is likely to be reflected in learning and automatization of LSS correspondences, as well as in the associations with reading skill development.

As children learn to read, their sensitivity to print is paralleled by changes in an occipitotemporal negativity, the N1 (or N170), to words as measured by event-related potentials (Brem et al., 2010; Maurer et al., 2010; Fraga González et al., 2014). This visual N1 has been shown to develop with reading skills, following an inverted U-shaped developmental trajectory with maximum N1 tuning (selectivity for print) in the second grade and a further decrease of N1 tuning in adults (Maurer et al., 2005, 2006). Neuroimaging studies have localized the print-sensitive visual N1 to a region within the left fusiform gyrus called the "visual word form area" (VWFA) (McCandliss et al., 2003; Dehaene and Cohen, 2011). In one recent study (Bach et al., 2013), 19 non-reading kindergarteners were trained in letter-speech sound associations with Graphogame (Lyytinen et al., 2009) for 8 weeks. The N1 and the VWFA activation in these kindergarteners significantly improved the prediction of second-grade reading skills over behavioral data alone, and together with the behavioral measures they explained up to 88% of the variance in reading (Bach et al., 2013). Therefore, the visual N1 is considered a sensitive index of visual letter-string processing that reflects processes important for reading fluency (Fraga González et al., 2014, 2017).

Audiovisual integration, defined as the interaction between the auditory and visual modalities, and its developmental trajectory remain poorly understood. The additive model, which compares multisensory responses to the sum of the responses from the constituent unisensory conditions [responses to audiovisual stimuli – responses to (auditory stimuli + visual stimuli)], has frequently been used in electrophysiological studies of multisensory integration (Calvert and Thesen, 2004; Stein and Stanford, 2008; Sperdin et al., 2009). Another common approach in audiovisual research is to study the congruency effect (Jones and Callan, 2003; Ojanen et al., 2005; Hein et al., 2007; Rüsseler et al., 2017), which involves a contrast between congruent and incongruent audiovisual pairs. LSS integration in alphabetic languages consistently activates several language and cross-modal brain regions in adults. Regions, particularly in the superior temporal cortices, have been shown in fMRI studies to have heteromodal properties (van Atteveldt et al., 2009). These regions have also been implicated by magnetoencephalography (MEG) findings showing LSS integration sites in the left and right superior temporal sulci (STS) (Raij et al., 2000). Feedback projections from this heteromodal region have also been shown in fMRI studies to modify the response in a modality-specific region of the primary auditory cortex (van Atteveldt et al., 2004). Top-down factors generated by different task demands and instructions also clearly affect multisensory integration (Andersen et al., 2004). For example, the use of explicit vs. implicit and passive vs. active experimental tasks has been shown to influence the brain responses related to LSS integration (van Atteveldt et al., 2007; Blau et al., 2008).
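Both contrasts reduce to sample-wise arithmetic on averaged responses. The Python sketch below assumes each condition's average is already available as a list of amplitudes on a shared time axis; the function names are illustrative and do not come from any toolbox.

```python
from typing import List

Waveform = List[float]  # averaged response, one value per time sample


def additive_model_effect(a: Waveform, v: Waveform, av: Waveform) -> Waveform:
    """AV - (A + V) at each sample.

    Negative values mean the audiovisual response is smaller than the sum
    of the unimodal responses (a sub-additive, i.e. suppressive, interaction).
    """
    if not (len(a) == len(v) == len(av)):
        raise ValueError("waveforms must have the same number of samples")
    return [avi - (ai + vi) for ai, vi, avi in zip(a, v, av)]


def congruency_effect(avc: Waveform, avi: Waveform) -> Waveform:
    """Congruent minus incongruent audiovisual response at each sample."""
    if len(avc) != len(avi):
        raise ValueError("waveforms must have the same number of samples")
    return [c - i for c, i in zip(avc, avi)]


if __name__ == "__main__":
    # Toy values: the bimodal response is 0.5 below the unimodal sum.
    print(additive_model_effect([1.0, 2.0], [0.5, 1.0], [1.0, 2.5]))
```

Running it prints `[-0.5, -0.5]`. The integration comparison in the Statistical Analysis section is written as A + V − AVC, which is the same quantity with the opposite sign, so suppressive integration appears there as positive values.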

Many studies have shown that accessing the phonological representations of written words and letter strings also involves parietal areas, particularly the supramarginal gyrus (BA 40) and the angular gyrus (BA 39) (Price, 2000; Pugh et al., 2000b; Schlaggar and McCandliss, 2007). Activation in these two posterior regions was found to correlate significantly with performance in a cross-modal (auditory and visual) language task (Booth et al., 2003). Furthermore, neuroimaging studies have confirmed that activation in the supramarginal gyrus and the angular gyrus is associated with phonological (Buchsbaum and D'Esposito, 2008; Sliwinska et al., 2015) and semantic processing (Binder et al., 2009) of written words, respectively. Parietal regions also show differences during phonological processing in children with reading difficulties (Vandermosten et al., 2016). Taken together, several temporoparietal brain regions are suggested to be involved in integrating visual and auditory information for the purpose of reading.

In contrast to the natural relationship between auditory and visual information in audiovisual speech, the association between letters and speech sounds is mostly based on agreed conventions. Although knowledge of letter-speech sound associations seems easy to acquire within 1 year of reading instruction (Hardy et al., 1972), EEG studies using the mismatch negativity (MMN) paradigm (Näätänen, 2001) have found that beginning readers show protracted development of letter-speech sound associations beyond the early school years (Froyen et al., 2009), and such orthographic–phonological integration could serve as a neural signature of successful or failing reading development (Blomert, 2011). Studies on dyslexia have revealed reduced audiovisual integration (indexed by the cross-modal MMN) that is associated with a more fundamental deficit in the auditory processing of speech sounds leading to reading failure (Blau et al., 2009; Žarić et al., 2014). Therefore, audiovisual integration is considered an important marker of reading fluency and has been shown to facilitate visual specialization (indexed by the print-sensitive N1 in the VWFA) in learning to read (Fraga González et al., 2016, 2017).

Although LSS integration has been shown to be important for reading development (Blau et al., 2009, 2010; Blomert and Froyen, 2010; Blomert, 2011), reading is also dependent on other cognitive skills. Several behavioral measures such as phonological awareness, verbal short-term memory and rapid automatized naming (RAN) have been shown to be closely associated with reading skills and provide a good estimation of risk for dyslexia (Pennington and Lefly, 2001; Puolakanaho et al., 2007; Melby-Lervåg et al., 2012). These cognitive measures have been shown to be important mediators of the prediction of reading outcome from brain responses as measured by ERPs (Lohvansuu et al., 2018).

In the present study, we used MEG to measure auditory responses to speech sounds, visual responses to letters, and audiovisual integration-related responses to letter-speech sound combinations, with the aim of linking these brain responses to reading development. Previous studies (Froyen et al., 2008, 2009; Blomert, 2011) have often used an audiovisual oddball design and shown a long developmental trajectory for LSS integration. We used an experimental design with equal numbers of unimodal and bimodal stimuli, as well as equal numbers of congruent and incongruent audiovisual stimuli. This allows a more direct examination of LSS integration and separates the unimodal effects from the audiovisual effects. We used regression-based methods (controlling for age) to explore the relationship between the neural responses to speech sounds, visual letters, and audiovisual combinations and behavioral measures of cognitive skills. We expected to see associations between responses to speech sounds and phonological and reading skills (e.g., Lohvansuu et al., 2018), between the visual N1 and reading skills (e.g., Brem et al., 2010; Maurer et al., 2010; Fraga González et al., 2014), and, importantly, between the brain measures of LSS integration and reading skills (Blau et al., 2009; Blomert and Willems, 2010; Blomert, 2011; Preston et al., 2016; Fraga González et al., 2017).

## MATERIALS AND METHODS

## Participants

All participants were Finnish-speaking school children (6–11 years) recruited through the National Registry of Finland. None of the participants had neurological disorders, problems caused by permanent head injuries, ADHD, delayed language development, or language-specific disorders, and none were taking medication affecting the central nervous system. In total, 32 Finnish children participated in the experiments. Of these, three were excluded: two because of excessive head movements and one because of a low head position in the MEG helmet. The data included in the present study consisted of 29 children (mean age 8.17 years, SD: 1.05 years; 19 girls, 10 boys; 1 left-handed). All included participants had normal hearing as tested with audiometry and normal or corrected-to-normal vision. This study was carried out in accordance with the recommendations of the Ethics Committee of the University of Jyväskylä, which approved the protocol. All children and their parents were informed about the project and gave written consent to participate in the study in accordance with the Declaration of Helsinki. All subjects received gifts (movie tickets or shopping vouchers) as compensation for participation.

## Stimuli and Task

The stimuli consisted of eight Finnish capital letters (A, E, I, O, U, Y, Ä, and Ö) written in black Calibri font and their corresponding speech sounds ([a], [e], [i], [o], [u], [y], [æ], and [ø]). Four categories of stimuli, auditory only (A), visual only (V), audiovisual congruent (AVC), and audiovisual incongruent (AVI), were presented in random order, with 112 trials of each type. The experiment lasted ca. 20 min in total, with two short breaks. The auditory stimuli lasted 300 ms and the visual stimuli 400 ms. In the audiovisual trials, the auditory and visual stimuli started at the same time. Each trial lasted 1500 ms and started with a fixation cross at the center of the screen for 500 ms, followed by the presentation of the auditory, visual, or audiovisual stimuli (**Figure 1**). The visual stimuli were projected at the center of the screen on a gray background. On a screen 1 m away from the participants, the fixation cross measured 0.6 cm × 0.6 cm and the letters 2 cm × 2 cm. The sounds were delivered through insert earphones using an MEG-compatible lo-fi sound system at a comfortable loudness level. The stimuli were presented with Presentation software (Neurobehavioral Systems, Inc., Albany, CA, United States) running on a Windows computer. The experiment was conducted in a child-friendly setting in which we told a story about a cartoon character's adventure in a forest. To keep the participants' attention equally on the auditory and visual stimuli, they were instructed to press a button with their right hand when they saw an animal drawing or heard an animal sound. In total, eight animal drawings and their corresponding sounds were used as target stimuli, and they occurred with 10% probability. Feedback (hit or miss) was given immediately after each button press.
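To make the design concrete, here is a minimal Python sketch that builds one randomized trial list with 112 trials per condition and the timing given above. Several details are assumptions not stated in the text: each vowel is labelled by its letter, each letter appears equally often (14 times) per condition, incongruent trials pair a letter with a randomly chosen different vowel, and the 10% target probability is read as 10% of all trials. It is an illustration only, not the authors' Presentation script.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Letters from the text; each speech sound is identified by its letter.
LETTERS = ["A", "E", "I", "O", "U", "Y", "Ä", "Ö"]
CONDITIONS = ["A", "V", "AVC", "AVI"]
TRIALS_PER_CONDITION = 112
TARGET_PROBABILITY = 0.10  # assumption: share of targets among all trials


@dataclass
class Trial:
    condition: str          # "A", "V", "AVC", "AVI" or "TARGET"
    letter: Optional[str]   # letter shown on screen, if any
    sound: Optional[str]    # vowel played (labelled by its letter), if any
    fixation_ms: int = 500  # fixation cross before stimulus onset
    trial_ms: int = 1500    # total trial length


def make_trial_list(seed: int = 0) -> List[Trial]:
    """Build a shuffled trial list with 112 trials per condition plus targets."""
    rng = random.Random(seed)
    trials: List[Trial] = []
    for condition in CONDITIONS:
        for i in range(TRIALS_PER_CONDITION):
            letter = LETTERS[i % len(LETTERS)]  # 14 repetitions per letter
            if condition == "A":
                trials.append(Trial(condition, None, letter))
            elif condition == "V":
                trials.append(Trial(condition, letter, None))
            elif condition == "AVC":
                trials.append(Trial(condition, letter, letter))
            else:  # AVI: pair the letter with a different, randomly chosen vowel
                other = rng.choice([x for x in LETTERS if x != letter])
                trials.append(Trial(condition, letter, other))
    # Add animal targets so that they make up about 10% of all trials.
    n_targets = round(len(trials) * TARGET_PROBABILITY / (1 - TARGET_PROBABILITY))
    trials.extend(Trial("TARGET", None, None) for _ in range(n_targets))
    rng.shuffle(trials)
    return trials
```

Under these assumptions the list holds 448 letter and speech-sound trials plus 50 animal targets (50/498, about 10%).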

## MEG and MRI

306-channel MEG data were recorded in a magnetically shielded room using an Elekta Neuromag® TRIUX™ system (Elekta AB, Stockholm, Sweden) with a 1000 Hz sampling rate and a 0.1–330 Hz band-pass filter. The head position relative to the sensors in the helmet was monitored continuously with five digitized head position indicator (HPI) coils attached to the scalp: three on the forehead and one behind each ear. The positions of the HPI coils were determined relative to three anatomical landmarks (nasion and left and right preauricular points) using the Polhemus Isotrak digital tracker system (Polhemus, Colchester, VT, United States) at the beginning of the recording. To allow co-registration with individual magnetic resonance images (MRIs), an additional set of scalp points (>100) randomly distributed over the skull was also digitized. The electrooculogram (EOG) was recorded with two electrodes attached diagonally, slightly below the left eye and slightly above the right eye, and a ground electrode attached to the collarbone. The MEG was recorded in the 68° upright gantry position.

FIGURE 1 | [...] stimuli (400 ms). The total length of each trial was 1500 ms.

Individual structural MR images were acquired at a private company offering MRI services (Synlab Jyväskylä). T1-weighted 3D-SE images were collected on a GE 1.5 T (GoldSeal Signa HDxt) MRI scanner using a standard head coil with the following parameters: TR/TE = 540/10 ms, flip angle = 90°, matrix size = 256 × 256, slice thickness = 1.2 mm, sagittal orientation.

## Behavioral Assessment

Cognitive skills were tested during a separate visit. The behavioral tests included the Wechsler Intelligence Scale for Children, Third Edition (Wechsler, 1991) for children above 6 years and the Wechsler Preschool and Primary Scale of Intelligence (Wechsler, 2003) for 6-year-olds. The block design (visuospatial reasoning), vocabulary (expressive vocabulary), and digit span (forward and backward; working memory) subtests were administered. In the block design test, the children are shown how to arrange red and white blocks to form a design and must build the same design. In the more difficult items, the children are shown the design only as a picture and must build it. In the vocabulary test, the children hear a word and must describe its meaning. In the digit span test, series of numbers are read aloud to the participant, who must repeat them in either forward or backward order. These tests were used to assess the children's general cognitive skills and served as control variables for possible associations between the phonology and reading measures and the MEG indices.

Phonological awareness was tested using the Phonological processing task from NEPSY II (Korkman et al., 2007). In this task, the child is first asked to repeat a word and then to create a new word by leaving out a syllable or a phoneme, or by replacing one phoneme in the word with another phoneme.

Rapid automatized naming was assessed with a task (Denckla and Rudel, 1976) in which pictures of five common objects or five letters had to be named as quickly and as accurately as possible. The objects and letters were arranged in five rows, each containing 15 items. The task was audio-recorded, and the naming time in seconds was calculated from the recording for use in the analyses.

Three reading tests were included: word list reading using a standardized test (Häyrinen et al., 1999), scored as the number of correctly read words in 45 s; non-word list reading based on the Test of Word Reading Efficiency (Torgesen et al., 1999), scored as the number of correctly read non-words in 45 s; and pseudoword text reading (Eklund et al., 2015), scored as the number of correctly read words and the total reading time. Writing to dictation was also assessed: the child heard 20 words and had to write them on a sheet of paper, and the number of correctly written words was used as the score.

## Data Analysis

Data were first processed with MaxFilter™ 3.0 (Elekta AB) to remove external interference and correct for head movements. Bad channels were identified manually, excluded, and later reconstructed in MaxFilter. The temporal extension of the signal-space separation method (tSSS) was applied in 30 s buffers (Taulu et al., 2004; Taulu and Kajola, 2005; Taulu and Simola, 2006). For movement compensation, head position was estimated in 200 ms time windows with a 10 ms step. The MEG data were transformed to the mean head position across the recording session.

Data were then analyzed using MNE-Python (version 0.15) (Gramfort et al., 2013). First, continuous MEG recordings were low-pass filtered at 40 Hz and epoched into trials from −200 to 1000 ms relative to stimulus onset. The data were then manually checked to remove head movement-related artifacts and electronic jump artifacts. Next, independent component analysis (ICA) using the fastICA algorithm (Hyvärinen and Oja, 2000) was applied to remove eye blinks, horizontal eye movements, and cardiac artifacts. Epochs exceeding peak-to-peak amplitudes of 1 pT/cm for gradiometers or 3 pT for magnetometers were excluded from further analysis. Event-related fields were obtained by averaging the trials of the four conditions separately. The sum of the auditory and visual responses (A + V) was calculated by first equalizing the number of epochs between the two unimodal conditions and then adding the event-related fields of the auditory-only and visual-only conditions. Because the noise in A + V is the sum of the noise in two averages, its level corresponds to that of a single average of half as many trials. Therefore, to match the noise levels of the A + V and AVC conditions and make them comparable, a subset of AVC trials was created by randomly selecting half the number of trials from the AVC condition.
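The epoch rejection, the A + V sum, and the noise-matched AVC subset can be sketched in plain Python as follows. This simplified illustration treats each epoch as a single channel and is not the authors' MNE-Python code; all function names are hypothetical. The rejection limits are converted to SI units, the form in which MNE-Python expects them (e.g., `reject=dict(grad=1e-10, mag=3e-12)` passed to `mne.Epochs`). Sizing the AVC subset from the equalized unimodal trial count is an assumption about how "half the number of trials" was defined.

```python
import math
import random
from typing import List, Sequence, Tuple

Epoch = List[float]  # one channel, one trial (simplified)

# Rejection limits from the text, in SI units.
GRAD_REJECT_T_PER_M = 1e-10  # 1 pT/cm
MAG_REJECT_T = 3e-12         # 3 pT


def peak_to_peak(epoch: Sequence[float]) -> float:
    return max(epoch) - min(epoch)


def reject_epochs(epochs: Sequence[Epoch], limit: float) -> List[Epoch]:
    """Keep only epochs whose peak-to-peak amplitude does not exceed limit."""
    return [e for e in epochs if peak_to_peak(e) <= limit]


def average(epochs: Sequence[Epoch]) -> List[float]:
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def summed_unimodal(a_epochs, v_epochs, rng: random.Random) -> Tuple[List[float], int]:
    """Equalize epoch counts, then add the A and V averages (A + V)."""
    n = min(len(a_epochs), len(v_epochs))
    a_avg = average(rng.sample(list(a_epochs), n))
    v_avg = average(rng.sample(list(v_epochs), n))
    return [x + y for x, y in zip(a_avg, v_avg)], n


def noise_matched_avc(avc_epochs, n_unimodal: int, rng: random.Random) -> List[float]:
    """Average a random AVC subset of n_unimodal // 2 trials.

    A + V carries the noise of two n-trial averages, sqrt(2) * sigma / sqrt(n),
    which equals the noise of a single average of n / 2 trials.
    """
    k = n_unimodal // 2
    return average(rng.sample(list(avc_epochs), k))


def residual_noise_sd(sigma: float, n_trials: int) -> float:
    """Standard deviation of independent noise left after averaging n_trials."""
    return sigma / math.sqrt(n_trials)
```

`residual_noise_sd` captures the reasoning behind the subset size: with independent noise of standard deviation σ per trial, the A + V sum has noise σ·√2/√n, which equals σ/√(n/2) for a single average of n/2 trials.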

Individual MRIs were processed in FreeSurfer (RRID: SCR\_001847, Martinos Center for Biomedical Imaging, Charlestown, MA, United States) to obtain the cortical surface for source modeling. For three participants, age- and gender-matched MRIs of other children were used instead (MRIs were not available for two children, and the third had a poor-quality cortical surface reconstruction). The FreeSurfer-reconstructed cortical surface was decimated to about 4098 evenly distributed vertices per hemisphere with 4.9 mm spacing. A cortically constrained, depth-weighted (p = 0.8) L2 minimum-norm estimate (wMNE) (Hämäläinen and Ilmoniemi, 1994; Lin et al., 2006) was calculated using a one-layer boundary element model (BEM) from the inner skull surface, for all current dipoles with a loose orientation constraint of 0.2. The noise covariance matrix was estimated from the 200 ms pre-stimulus baseline of the raw data over all conditions. For each current dipole, the source amplitude was calculated as the norm of the dipole's vector. Source amplitudes were averaged within each of the 68 brain regions defined by the Desikan-Killiany atlas (Desikan et al., 2006). To capture the full extent of the sensory event-related fields, the auditory source region was defined as the combination of the superior temporal and transverse temporal areas, and the visual source region as the combination of the lateral occipital, cuneus, pericalcarine, and lingual areas. In addition, the fusiform area was defined as a region of interest for the N170 component based on previous studies (Cohen et al., 2000; Dehaene and Cohen, 2011).
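The amplitude and region definitions above can be sketched as follows, assuming a loose-orientation solution that gives three current components per vertex at one time point, and a mapping from Desikan-Killiany label names (FreeSurfer `aparc` names, without hemisphere suffix) to vertex indices. Pooling the vertices of several labels before averaging is an assumption; the text does not say whether combined regions were averaged over vertices or over label means. In MNE-Python, `apply_inverse` with `pick_ori=None` similarly pools loose orientations by taking the norm.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, float, float]  # one dipole's (x, y, z) current components

# Desikan-Killiany labels combined into the regions described in the text.
AUDITORY_LABELS = ["superiortemporal", "transversetemporal"]
VISUAL_LABELS = ["lateraloccipital", "cuneus", "pericalcarine", "lingual"]
FUSIFORM_LABELS = ["fusiform"]


def dipole_amplitude(vec: Vector) -> float:
    """Source amplitude as the Euclidean norm of a loose-orientation dipole."""
    return math.sqrt(sum(c * c for c in vec))


def region_amplitude(
    vectors: Dict[int, Vector],
    label_vertices: Dict[str, Sequence[int]],
    labels: Sequence[str],
) -> float:
    """Mean dipole amplitude over all vertices belonging to the given labels.

    Assumption: vertices of the combined labels are pooled before averaging.
    """
    vertices: List[int] = [v for name in labels for v in label_vertices[name]]
    return mean(dipole_amplitude(vectors[v]) for v in vertices)
```

Because the norm discards the sign of the current, source amplitudes are non-negative, which is why peaks in these waveforms are maxima rather than polarity-specific extrema.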

In total, five auditory and visual event-related fields were investigated in the present study: the auditory N1m, N2m, and late component, and the visual P1m and N170m. Peak latencies of these sensory responses were identified at the sensor level (magnetometers) from the grand averages of the auditory-only and visual-only conditions. The peak latencies were 109 ms (left) and 105 ms (right) for the auditory N1m, 241 ms (left) and 247 ms (right) for the auditory N2m, and 448 ms (left) and 463 ms (right) for the auditory late component. They were 104 ms (left) and 97 ms (right) for the visual P1m and 204 ms (left) and 192 ms (right) for the visual N170m. For all four conditions (A, V, AVC, and AVI), source-level activity in the auditory or visual source regions was extracted as the mean source activity in a 50 ms time window centered on these peak latencies. For the auditory late component, a longer 100 ms time window was used because of the extended time course of the response. In addition, individual peak latencies were detected for each participant within the time window of each component in source space.
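Extracting a component amplitude then amounts to averaging a source waveform in a window centered on the grand-average peak. The sketch below uses the sampling rate, epoch start, latencies, and window widths given in the text. Treating the windows as inclusive of both edges (51 samples for 50 ms at 1000 Hz) is an assumption, as is taking the individual peak as the maximum of the non-negative source amplitude within the same window.

```python
from typing import Dict, Sequence, Tuple

SFREQ_HZ = 1000.0   # sampling rate from the text
TMIN_MS = -200.0    # epoch start relative to stimulus onset

# Grand-average sensor-level peak latencies (ms) from the text: (left, right).
PEAKS_MS: Dict[str, Tuple[float, float]] = {
    "N1m": (109, 105),
    "N2m": (241, 247),
    "LC": (448, 463),  # auditory late component
    "P1m": (104, 97),
    "N170m": (204, 192),
}
WINDOW_MS = {"N1m": 50, "N2m": 50, "LC": 100, "P1m": 50, "N170m": 50}


def _window_indices(center_ms: float, width_ms: float) -> Tuple[int, int]:
    """Inclusive sample indices of a window centered on center_ms."""
    step_ms = 1000.0 / SFREQ_HZ
    start = round((center_ms - width_ms / 2 - TMIN_MS) / step_ms)
    stop = round((center_ms + width_ms / 2 - TMIN_MS) / step_ms)
    return start, stop


def window_mean(waveform: Sequence[float], center_ms: float, width_ms: float) -> float:
    """Mean amplitude in a window centered on a peak latency."""
    start, stop = _window_indices(center_ms, width_ms)
    segment = waveform[start:stop + 1]
    return sum(segment) / len(segment)


def individual_peak_ms(waveform: Sequence[float], center_ms: float, width_ms: float) -> float:
    """Latency (ms) of the largest value within the same window."""
    start, stop = _window_indices(center_ms, width_ms)
    idx = max(range(start, stop + 1), key=lambda i: waveform[i])
    return TMIN_MS + idx * 1000.0 / SFREQ_HZ
```

For example, the left N1m window spans 84–134 ms and the left late-component window spans 398–498 ms.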

## Statistical Analysis

First, partial correlations (controlling for age in months) were calculated between the cognitive skill measures (see above) and the mean amplitudes and peak latencies of the sensory brain responses in the four conditions using the SPSS Statistics 24 software package (IBM Corp., Armonk, NY, United States). For the integration (A + V – AVC) and congruency (AVC – AVI) comparisons, individual source waveforms in the 68 brain regions of the Desikan-Killiany atlas were entered into nonparametric permutation t-tests with temporal clustering (Maris and Oostenveld, 2007) implemented in the Mass Univariate ERP Toolbox (Groppe et al., 2011). The analysis time window was 0–1000 ms after stimulus onset, and 2000 permutations were used. The cluster alpha was 0.05 for both the integration and congruency comparisons, and the family-wise p values were corrected for multiple comparisons. For regions that showed significant (p < 0.05) integration or congruency effects, partial correlations (controlling for age in months) were calculated between cognitive skills and brain responses in each of these regions averaged over the time window of the significant clusters. In addition to the source amplitude values, a laterality index [(left − right)/(left + right)] was calculated for the activity in the fusiform gyrus to examine differences in the development of hemispheric specialization for print, as shown, for example, by Maurer et al. (2008, 2010).
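The temporal clustering step can be illustrated with a minimal sketch in the spirit of Maris and Oostenveld (2007); it is not the Mass Univariate ERP Toolbox implementation. It assumes one difference waveform per child (e.g., (A + V) − AVC in one atlas region), flips the sign of whole waveforms to build the null distribution, and uses the summed t values of a cluster as its mass. The cluster-forming threshold is passed in directly (about 2.048 for 29 children, df = 28, two-tailed alpha 0.05); the (count + 1)/(n_perm + 1) p value is one common choice and may differ from the toolbox's.

```python
import math
import random
from typing import List, Sequence, Tuple

Cluster = Tuple[int, int, float]  # (first sample, last sample, summed t)


def paired_t(diffs: Sequence[Sequence[float]]) -> List[float]:
    """One-sample t value against zero at each time point (subjects x times)."""
    n = len(diffs)
    t_values = []
    for column in zip(*diffs):
        m = sum(column) / n
        var = sum((x - m) ** 2 for x in column) / (n - 1)
        se = math.sqrt(var / n)
        t_values.append(m / se if se > 0 else 0.0)
    return t_values


def _sign(t: float, threshold: float) -> int:
    return 1 if t > threshold else -1 if t < -threshold else 0


def find_clusters(t_values: Sequence[float], threshold: float) -> List[Cluster]:
    """Group adjacent supra-threshold samples of equal sign into clusters."""
    clusters: List[Cluster] = []
    i, n = 0, len(t_values)
    while i < n:
        sign = _sign(t_values[i], threshold)
        if sign == 0:
            i += 1
            continue
        j = i
        while j + 1 < n and _sign(t_values[j + 1], threshold) == sign:
            j += 1
        clusters.append((i, j, sum(t_values[i:j + 1])))
        i = j + 1
    return clusters


def cluster_permutation_test(diffs, threshold: float, n_perm: int = 2000, seed: int = 0):
    """Sign-flip permutation test on cluster mass; returns (start, stop, mass, p)."""
    rng = random.Random(seed)
    observed = find_clusters(paired_t(diffs), threshold)
    null_max = []
    for _ in range(n_perm):
        signs = [rng.choice((-1.0, 1.0)) for _ in diffs]
        flipped = [[s * x for x in row] for s, row in zip(signs, diffs)]
        masses = [abs(c[2]) for c in find_clusters(paired_t(flipped), threshold)]
        null_max.append(max(masses, default=0.0))  # largest cluster per permutation
    return [
        (start, stop, mass,
         (1 + sum(m >= abs(mass) for m in null_max)) / (n_perm + 1))
        for start, stop, mass in observed
    ]
```

Comparing each observed cluster with the distribution of the largest permuted cluster controls the family-wise error across time points within one region; correction across the 68 regions is not included in this sketch.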

In addition, linear regression analyses were performed in SPSS Statistics 24 with the cognitive skills as dependent variables. Children's age was entered into the model first, followed by the brain responses as independent variables, to determine whether the different brain responses explain independent or overlapping portions of the variance in the cognitive skills. Dependent and independent variables were selected based on significant partial correlations.
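The age-controlled statistics described above can be illustrated with a small pure-Python sketch: a first-order partial correlation computed by correlating the residuals of both variables after regressing each on age, the laterality index, and the R² change when a brain measure is added after age. This is not the SPSS procedure: stepwise selection and significance tests (for a first-order partial correlation, df = n − 3) are omitted, and all function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def residualize(y: Sequence[float], covariate: Sequence[float]) -> List[float]:
    """Residuals of y after simple linear regression on the covariate."""
    mx, my = mean(covariate), mean(y)
    sxx = sum((x - mx) ** 2 for x in covariate)
    slope = sum((x - mx) * (v - my) for x, v in zip(covariate, y)) / sxx
    return [v - my - slope * (x - mx) for x, v in zip(covariate, y)]


def partial_corr(x, y, age) -> float:
    """Correlation of x and y with the linear effect of age removed from both."""
    return pearson(residualize(x, age), residualize(y, age))


def laterality_index(left: float, right: float) -> float:
    """(left - right) / (left + right); positive values mean left dominance."""
    return (left - right) / (left + right)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y: Sequence[float], predictors: Sequence[Sequence[float]]) -> float:
    """R^2 of an ordinary least-squares fit with an intercept."""
    X = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    my = mean(y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - my) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def r2_change(y, age, brain) -> float:
    """Variance explained by a brain measure over and above age (step 2 - step 1)."""
    return r_squared(y, [age, brain]) - r_squared(y, [age])
```

With a single brain predictor, the R² change equals the squared semipartial correlation, which is why it is smaller than the squared partial correlation whenever age explains part of the score.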

## RESULTS

## Cognitive Skills and Behavioral Performance

Descriptive statistics of the participants' cognitive skill measures and their behavioral performance in the cover task during MEG experiment are presented in **Table 1**.

## Grand Averages

Grand averages of combined gradiometer channels in auditory only, visual only, audiovisual congruent and audiovisual incongruent conditions are shown in **Figure 2**. The waveforms were averaged over left and right temporal and occipital gradiometer channels (within the four circles shown in the sensor layout map).

The auditory and visual responses were identified in the magnetometer channels based on their topographies and timings. For visualization purposes, the topographies of the auditory N1m, N2m, and late component, and of the visual P1m and N170m, are shown at the local maxima of the global field power (GFP) in **Figure 3**.

TABLE 1 | Descriptive statistics of the participants' cognitive skill measures and behavioral performance in the cover task during the MEG experiment (N = 29).

## Correlations Between Cognitive Skills and Sensory Brain Responses

No significant correlations were found between the scores in the cognitive skill measures for visuo-spatial reasoning (block design), general verbal skills (vocabulary), or verbal working memory (digit span) and the sensory brain responses. No significant correlations were found between age and the sensory brain responses.

Consistent correlations were found between phonological processing accuracy, rapid naming speed of letters, and the auditory N1m, N2m, and LC responses (see **Tables 2**–**4**). No consistent correlation patterns were observed between peak latencies and cognitive skills (see **Supplementary Material**). In addition, left auditory cortex activity in the late time window in response to the audiovisual stimuli showed rather systematic associations with phonology, rapid naming of letters and objects, and non-word list reading accuracy. N170m amplitude in the left fusiform gyrus in the audiovisual conditions (both AVI and AVC) was significantly correlated with phonological processing. The auditory-only, audiovisual congruent, and audiovisual incongruent conditions showed similar patterns of correlation with cognitive skills, indicating a high overlap between these brain measures. For all of the correlations, larger brain responses were associated with better performance in the behavioral tasks.

Next, linear regressions were used to predict phonological processing and rapid naming (dependent variables) from age and the brain responses that showed significant partial correlations (independent variables). Age was entered into the model first, followed by the significant auditory and visual variables (stepwise method) and finally by the significant audiovisual variables (also stepwise). This model was used to disentangle possible overlapping variance explained by the auditory/visual and audiovisual brain responses. In the multiple regression model (**Table 5**), the auditory late component in the left hemisphere was the only significant predictor of phonological skills and RAN letters.

The scatterplots (**Figure 4**) show that, in general, the larger the source activity in the auditory cortex the more likely it is that the child has better phonological processing skills and faster rapid naming abilities.

## Integration and Congruency Effects

#### Integration Effect (A + V vs. AVC)

Cluster-based permutation tests revealed an audiovisual integration effect in multiple brain regions in the parietal and temporal areas after ca. 250 ms (p < 0.05), as shown in **Figure 5**. In total, eight significant clusters were found in eight brain regions of the Desikan-Killiany atlas: the left (317–499 ms) and right (315–818 ms) inferior parietal, left (391–585 ms) and right (306–797 ms) supramarginal, right (271–529 ms) precuneus, right (551–755 ms) postcentral, and right superior (535–827 ms) and middle (346–749 ms) temporal cortices.

#### Congruency Effect (AVC vs. AVI)

The cluster-based permutation test did not reveal significant effects (p > 0.05) in the congruency comparison.

## Correlations Between Cognitive Skills and the Brain Activity Related to Multimodal Integration

The difference between the AVC and A + V conditions was calculated and the average source amplitudes from the different brain regions in the time window identified by the permutation test were used for the correlation analyses with cognitive skills (**Table 6**). Representative partial correlations between suppressive integration and behavioral tests are shown in **Figure 6**.

## DISCUSSION

In this study, auditory and visual responses, as well as the audiovisual integration of letters and speech sounds, were correlated with children's cognitive skills. The results revealed that auditory processing, especially in the late time window, was the driving force behind the correlation between sensory evoked fields and phonological skills. The visual N170 in the left fusiform gyrus in the audiovisual conditions was also correlated with phonological skills. In addition, suppressive audiovisual integration was localized mainly in temporoparietal brain regions and contributed to reading skills independently of the sensory evoked fields.

It has been shown that the sequence of activation in response to speech sounds is strikingly different in children compared with adults (Wunderlich et al., 2006; Parviainen et al., 2011). Children showed prolonged responses to sounds with a major peak at 250 ms in both the left and right hemispheres (Parviainen et al., 2011), whereas in adults a corresponding effect occurred at about 100 ms, specifically in the left hemisphere (Parviainen et al., 2005). This matches the current findings, which showed a major negative-going peak around 250 ms after speech sound onset. The response at 250 ms is usually followed by a second activity peak around 400 ms (Čeponienė et al., 2001, 2005, 2008).

The auditory late component seems to be sensitive to speech sounds: in a study of children, a strong late activation around 400 ms was observed for speech sounds compared with other types of sounds (Parviainen et al., 2011). Other studies have suggested that activity in the late component time window is related to late stages of phonological processing (Stevens et al., 2013; Bann and Herdman, 2016) or to orthographic-phonological mapping (Weber-Fox et al., 2003). However, in our study the late processing (around 413 ms) seemed to be linked to the auditory stimuli. This fits with previous studies suggesting that this time window could reflect the effect of speech sound representations (Szymanski et al., 1999; Ceponiene et al., 2001, 2005; Kuuluvainen et al., 2016) and that it is sensitive to phonological priming (Bonte and Blomert, 2004). The response has also been suggested to be important for receptive language processing (Ceponiene et al., 2008), which also matches the correlation pattern of the current study. Overall, this could imply that the later stages of integrative speech sound processing are also important for learning to read and for phonological skills. Although the activity around 400 ms seems to mature early in development (Kushnerenko et al., 2002), there is still substantial variation in the response amplitude at school age that is systematically linked with cognitive skills related to language processing.

TABLE 2 | Partial correlations (controlling for age) between cognitive skills and the auditory responses in the auditory only condition.

Note: The auditory components are left (L) and right (R) auditory N1m, N2m and late component (LC). <sup>∗</sup>p < 0.05, <sup>∗∗</sup>p < 0.01.

TABLE 3 | Partial correlations (controlling for age) between cognitive skills and the visual responses in the visual only condition.

Note: VC, visual cortices; FG, fusiform gyrus; LI, laterality index. The visual responses are left (L) and right (R) visual P1m in the visual cortices and N170m in the fusiform gyrus.

TABLE 4 | Partial correlations (controlling for age) between cognitive skills and the auditory and visual responses in the audiovisual conditions [the first row in each cell is audiovisual congruent (AVC) and the second row audiovisual incongruent (AVI)].

Note: AC, auditory cortices; VC, visual cortices; FG, fusiform gyrus; LI, laterality index. The auditory components are left (L) and right (R) auditory N1m, N2m and late components. The visual components are left (L) and right (R) visual P1m in the visual cortices and N170m in the fusiform gyrus. <sup>∗</sup>p < 0.05, <sup>∗∗</sup>p < 0.01.

TABLE 5 | Linear regression analysis with phonological skills and rapid naming as dependent variables; age was entered first in the model, followed by the brain responses that showed significant partial correlations as predictors (independent variables).

Note: Beta = standardized Beta coefficient; R² change = unique variance accounted for at each step of the two-step (enter method for age; stepwise for brain measures) multiple regression analyses. <sup>∗</sup>p < 0.05, <sup>∗∗</sup>p < 0.01.
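Tables 2–4 report partial correlations between brain responses and cognitive scores, controlling for age. As a minimal sketch (not the authors' analysis code; the function names and toy values are hypothetical), the first-order partial correlation of a brain measure x with a cognitive score y, controlling for age z, can be computed from the three pairwise Pearson correlations:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical toy data; "age" is chosen to be uncorrelated with both measures.
brain = [1, 2, 3, 4]
phonology = [2, 1, 4, 3]
age = [1, -1, -1, 1]
print(round(partial_corr(brain, phonology, age), 3))  # 0.6
```

Because the toy age values are uncorrelated with both measures, the partial correlation here equals the zero-order correlation; with real data the two generally differ. A first-order partial correlation is conventionally tested with n − 3 degrees of freedom.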

In the current study, we found correlations between the N1, N2, and late component and phonological processing in both the auditory and AV conditions. However, the regression analysis showed that only the left auditory late component explained unique variance among the brain measures, implying that the early responses do not account for variance in phonological processing independently of the late activity. Other studies have also found correlations between brain responses in the time window around 100 ms and preschool cognitive skills. For example, the auditory P1 response has been shown to be associated with phonological and pre-reading skills in typically developing children (Kuuluvainen et al., 2016). In addition, in children at risk for dyslexia, P1 amplitudes elicited by speech sound stimuli were smaller than in controls (Lovio et al., 2010). Similarly, in a study investigating event-related potentials to tones in children with multiple risk factors for dyslexia (Hämäläinen et al., 2015), the amplitudes in the P1–N2 time window were correlated with letter knowledge and phonological skills. The N1 and N2 time window has also been shown to be sensitive to reading level differences in response to phonological priming (Bonte, 2004) and nonspeech stimuli (Espy et al., 2004).

The N2 response has been linked to reading and reading-related skills in previous studies. For example, the N2m has been found to correlate with reading skills in children (Parviainen et al., 2011), and the N2 response has been reported to have larger amplitudes in response to speech and non-speech sounds in dyslexic children compared with a control group, with these enhanced brain responses correlating with reading skills (Hämäläinen et al., 2013). Furthermore, brain activity in the N2 time window has been found to correlate with phonological skills, as well as with reading and writing accuracy, in children with dyslexia (Lohvansuu et al., 2014). In the current study, N2m response strength in the left hemisphere was correlated with phonological skills, further supporting the hypothesis that this time window is important for the development of language-related skills.

We also found a significant correlation between rapid naming ability and auditory late component amplitude. Previous research (Kuuluvainen et al., 2016) showed a similar relationship between the N4 and rapid naming speed in preschool children, where the N4 was suggested to be linked to accessing phonological representations. Overall, the correlation patterns found in the current study between phonological and rapid naming abilities and auditory brain responses are consistent with, and support, the earlier literature.

Audiovisual responses shared a large portion of variance with the auditory responses, and both showed an association with phonology. To disentangle the contributions of auditory processing from those of audiovisual processing, we ran regression analyses with both auditory and audiovisual brain responses as predictors. After the left auditory late response was taken into account, the responses to the audiovisual stimuli explained no unique variance in phonological skills. The regression analyses thus showed the auditory response to be the driving force behind the association with phonological skills.
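Table 5 summarizes a two-step regression in which age is entered first and brain measures are then added. As a minimal sketch under simplifying assumptions (one brain predictor at a time, ordinary least squares with an intercept; the function names and toy values are hypothetical, and the authors' stepwise procedure is not reproduced), the R² change of step 2 can be computed from pairwise correlations:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_two_predictors(y, x1, x2):
    """R^2 of the OLS regression y ~ x1 + x2 (with intercept), from correlations."""
    ry1, ry2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)


def hierarchical_r2(y, age, brain):
    """Step 1: age only. Step 2: age + brain. Returns (R2_step1, R2_step2, R2_change)."""
    r2_step1 = pearson(y, age) ** 2
    r2_step2 = r2_two_predictors(y, age, brain)
    return r2_step1, r2_step2, r2_step2 - r2_step1


# Hypothetical toy data in which the score is exactly age + brain response.
age = [1, -1, -1, 1]
brain = [1, 2, 3, 4]
score = [2, 1, 2, 5]
step1, step2, change = hierarchical_r2(score, age, brain)
print(round(step1, 3), round(step2, 3), round(change, 3))  # 0.444 1.0 0.556
```

With a single added predictor, the R² change equals the squared semipartial correlation of the brain measure with the score; in a stepwise step, the candidate with the largest R² change would be entered first.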

The N170 amplitude and the laterality index of the N170 were not significantly correlated with any of the cognitive skills in the visual only condition. Most previous studies (Cohen et al., 2000; Dehaene et al., 2002; Maurer et al., 2005, 2008; Dehaene and Cohen, 2011) found brain specialization for letter strings and whole words in the visual word form area (VWFA), as indexed by N170 responses in EEG/MEG. The presentation of single letters in our study, instead of letter strings or words, could therefore have led to the lack of findings for the N170 response in the visual only condition. However, previous studies (McCandliss and Noble, 2003; Maurer et al., 2010) have suggested that the left lateralization of the N170 for words is partly driven by an automatic link between the orthographic and phonological systems. Interestingly, the N170 response in the left fusiform area correlated significantly with phonological skills in both the audiovisual congruent and incongruent conditions. This result could suggest top-down feedback activation of the VWFA and the lateral inferior temporal cortex from auditory and audiovisual integration sites. It has been reported that the VWFA can be activated during speech processing through top-down modulation (Dehaene et al., 2010; Desroches et al., 2010; Yoncheva et al., 2010). Such auditory/audiovisual modulation fits well with the significant correlation between phonological processing and N170 responses in the left fusiform gyrus in the audiovisual conditions in our study. Similar results were found in an MEG study in which the strength of occipitotemporal letter-string-sensitive activation was also correlated with phonological skills in children (Parviainen et al., 2006).

TABLE 6 | Partial correlations (controlling for age) between cognitive skills and averages of the brain responses in the regions and time windows where significant audiovisual integration effects were revealed by the cluster-based permutation analyses.

Note: <sup>∗</sup>p < 0.05, <sup>∗∗</sup>p < 0.01.
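The laterality index (LI) of the N170 discussed above summarizes hemispheric asymmetry in a single number. The exact definition used in the study is not given in this section; a common convention, assumed here, is LI = (L − R)/(L + R) computed on non-negative response strengths, so that +1 means fully left-lateralized, −1 fully right-lateralized and 0 symmetric. A minimal sketch (the function name and values are hypothetical):

```python
def laterality_index(left, right):
    """(L - R) / (L + R) for non-negative response strengths (assumed convention)."""
    if left < 0 or right < 0:
        raise ValueError("response strengths must be non-negative")
    total = left + right
    if total == 0:
        raise ValueError("left + right must be greater than zero")
    return (left - right) / total


# Hypothetical left and right fusiform N170m strengths (arbitrary units).
print(laterality_index(30.0, 10.0))  # 0.5
```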

When the summed unimodal responses were compared with the audiovisual responses, a suppressive audiovisual integration effect was found in the right temporal and both the left and right parietal regions. These regions partly match those of a previous MEG study of LSS integration in adults (Raij et al., 2000), in which a suppressive integration effect was found in the right temporo-occipito-parietal junction and the left and right STS. In the current study, we found suppressive audiovisual integration effects mostly in the temporoparietal areas, but not in the frontal cortices reported by Raij et al. (2000). This could be due to differences in experimental design: an active implicit audiovisual task was used in our study, whereas Raij et al. (2000) used an active explicit matching task, which could recruit more top-down, task-related audiovisual attention processes (van Atteveldt et al., 2007). The dorsal (temporoparietal) system, including the supramarginal and angular gyri in the inferior parietal lobule and the posterior superior temporal gyrus (pSTG), is thought to be involved in mapping visual print onto the phonological and semantic structures of language (Sandak et al., 2004). Compared with the rather consistent findings in the superior temporal cortex for LSS integration in adults (Raij et al., 2000; van Atteveldt et al., 2004), early readers seem to recruit more widely distributed temporoparietal cortical networks to support learning the association of orthography with phonological codes (Pugh et al., 2013). The suppressive LSS integration effect in the parietal areas in the rather late time window could be related to top-down modulation of audiovisual processing and could reflect less automatic processing of the stimuli than in adults. Pugh et al. (2013) also found a correlation between the BOLD response in the precuneus and reading skills, similar to the current study, and interpreted this finding as reflecting part of the visual attention network that seems to affect reading development. They also suggested that it could reflect the integration of visual, language and attentional processes. The lack of a suppressive integration effect in the left superior temporal areas could be related to less automatic processing of the multimodal stimuli in early readers (Froyen et al., 2009; Blomert, 2011).
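The integration effect described above contrasts the audiovisual response with the sum of the unimodal responses, AV − (A + V), with negative values indicating suppression. The sketch below is a simplified, hypothetical illustration rather than the study's pipeline: it averages the difference within a single time window and applies an exact sign-flip permutation test across participants, instead of the cluster-based permutation analysis over regions and time points used in the study. Function names and values are assumptions.

```python
from itertools import product
from statistics import mean


def integration_effect(av, a, v):
    """Sample-by-sample AV - (A + V); negative values indicate suppression."""
    return [s_av - (s_a + s_v) for s_av, s_a, s_v in zip(av, a, v)]


def window_mean(times_ms, values, start_ms, end_ms):
    """Mean of the values whose time stamps fall within [start_ms, end_ms]."""
    return mean(val for t, val in zip(times_ms, values) if start_ms <= t <= end_ms)


def sign_flip_p(effects):
    """Exact two-sided sign-flip permutation p-value for a mean effect of zero.

    Enumerates all 2**n sign patterns, so it is only practical for small n."""
    observed = abs(mean(effects))
    patterns = list(product((1, -1), repeat=len(effects)))
    extreme = sum(
        abs(mean(s * e for s, e in zip(signs, effects))) >= observed - 1e-12
        for signs in patterns
    )
    return extreme / len(patterns)


# Hypothetical per-participant window means (300-600 ms) of AV - (A + V).
effects = [-1.0, -2.0, -3.0, -4.0]
print(sign_flip_p(effects))  # 0.125
```

For realistic sample sizes, a random subset of sign patterns would replace full enumeration, and a cluster-based approach would additionally control for multiple comparisons across time points and regions.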

The timing of this integration effect was mostly from about 300 to 600 ms in the present study, which matches previous studies using similar stimuli and paradigms (Raij et al., 2000; Herdman et al., 2006; Jost et al., 2014). The relatively late time window is probably due to the fact that bimodal audiovisual integration happens after the early unimodal processing of sound in the auditory cortex and of print in the visual cortex (Raij et al., 2000), and it possibly involves feedback projections to the auditory cortex at a late stage of processing (van Atteveldt et al., 2004).

Significant partial correlations were found between the audiovisual integration effect and phonological skills, rapid naming abilities, and reading and writing skills. Phonological skills were correlated with the strength of the audiovisual integration effect in the right inferior parietal and precuneus regions, while rapid naming of letters was correlated with the strength of the audiovisual integration in the left supramarginal and right precuneus regions. Previous research has found similar associations between both structural (gray matter volume indices; Raschle et al., 2011) and functional (Raschle et al., 2012) changes in these temporoparietal regions and pre-reading skills such as phonology and rapid naming. Moreover, activation in the left parietal lobe (angular gyrus) was correlated with individual at-risk index scores for dyslexia in pre-readers (Specht et al., 2009). Reduced LSS integration has been suggested to be linked to a deficit in auditory processing of speech sounds, which in turn predicts phonological skills (Blau et al., 2009). Consistent correlations were found between the strength of the audiovisual integration effect in the right precuneus and reading skills such as word list, nonword list and nonword text reading accuracy. This matches well with the results of a recent fMRI study that used a similar brain-behavior correlation analysis and showed activation in the precuneus to print and speech sounds of words and pseudowords to be correlated with reading-related skills (Pugh et al., 2013). Finally, writing skills were also significantly correlated with the strength of the audiovisual integration effect in the right superior and middle temporal regions. This might suggest that the skills required in writing to dictation are more closely associated with auditory processes for speech than those required for reading (Hämäläinen et al., 2009).
Taken together, these results highlight the important role of LSS integration in the temporoparietal area in early reading acquisition (Blomert and Froyen, 2010; Blomert and Willems, 2010).

Audiovisual congruency did not produce significant effects in the brain responses in the present study. Here we discuss possible reasons for this. First, the congruency effect, which depends heavily on task demands (Andersen et al., 2004; van Atteveldt et al., 2007), also seems to interact with the brain imaging method (fMRI vs. MEG). For example, several previous fMRI studies of children have found a congruency effect using implicit active tasks similar to ours (Blau et al., 2010; Brem et al., 2010). In contrast, the use of an active explicit matching task in fMRI has been reported to overrule the congruency effect (van Atteveldt et al., 2007). However, the MEG study of Raij et al. (2000) used an active task forcing the participants to relate letters to sounds and reported an audiovisual congruency effect in the heteromodal superior temporal cortex. Therefore, task demands seem to modulate MEG and BOLD responses differently. Second, it is also possible that the children in the present study had not yet established fully automatized LSS integration, as many of them had had only 1 or 2 years of reading instruction. Previous research using the mismatch negativity (MMN) paradigm (Froyen et al., 2009; Blomert, 2011) has shown a protracted developmental trajectory of LSS integration, and this may be reflected in the absence of a congruency effect in the present study. Finally, almost all previous electrophysiological studies examining letter-speech sound congruency in children (Froyen et al., 2008, 2010; Žarić et al., 2014) have used an oddball paradigm; it is likely that congruency effects are pronounced in the oddball paradigm but not in the simple LSS paradigm used in the present study. The audiovisual integration and congruency comparisons indicate that the children seemed to utilize more general multimodal integration processes for letters and speech sounds, but had not reached a fully automatic level of integration, as shown by the absence of a congruency effect.

A cohort of beginning readers with a relatively wide age range (6–11 years) was recruited to examine reading and reading-related cognitive skills as continua. Age was controlled for in all of the correlation and regression analyses, but it did not seem to have a large impact on the results. This finding is similar to that of, for example, Pugh et al. (2013). It seems that the correlations were driven more by the learning of these cognitive skills than by general maturation of the central nervous system.

According to the general neurodevelopmental theory of reading proposed by Pugh et al. (2000a) and Cornelissen et al. (2010), the temporal and dorsal parietal networks are crucial for the early stage of reading acquisition. Working together with anterior regions (especially the inferior frontal gyrus), the dorsal (temporoparietal) reading system is involved in the emergence of phonological awareness (Katzir et al., 2005) and in forming associations between orthography, phonology, and semantics (Pugh et al., 2001). Such associations then shape the organization and connectivity of left occipitotemporal regions, including the VWFA (Dehaene and Cohen, 2011), to support fluent reading in advanced readers. The present study highlights the important role of the temporoparietal route in developing phonological awareness and forming automatic LSS integration in early readers.

A possible concern regarding our study relates to the accuracy of MEG source reconstruction in children, which could be affected by many factors, including the relatively large distance between the child's head and the MEG sensors, imprecise cortical surface reconstruction, suboptimal forward and inverse solution parameters for the child brain, and potential MEG-MRI coregistration errors. These could lead to brain activity being misallocated to regions neighboring its true location in the source analyses. In general, we followed the recommended analysis practice proposed by Jas et al. (2017) and carefully checked the quality of the data at each step. Furthermore, MEG is less sensitive than EEG to the conductivity parameters of the head tissues, which should allow better reconstruction of source activity in children. In addition, we used relatively large brain regions (and, for the LSS integration effects, a whole-brain analysis) to capture most of the brain activity in the different conditions while taking possible limitations in localization accuracy into account.

## CONCLUSION

In conclusion, brain-behavior analyses were used to explore the relationship between behavioral tasks measuring different cognitive skills and brain responses related to auditory and visual processing of letters and speech sounds in beginning readers. Regression analysis identified the auditory late component in response to speech sounds as the most significant predictor of phonological skills and rapid naming. In addition, the audiovisual integration effect was found in left and right temporoparietal regions, and several of these temporal and parietal regions contributed to reading and writing skills. Findings from the current study point to the important role of temporoparietal regions in learning letter-speech sound associations in early reading development. A more detailed neurocognitive model, including additional measures such as functional connectivity, is needed to better understand the cortical organization and developmental trajectory of LSS integration in children learning to read.

## AUTHOR CONTRIBUTIONS

WX, JH, and OK designed the study. WX, JH, and OK performed the MEG experiments. WX, JH, and SM analyzed the data. All authors discussed the results and contributed to the final manuscript.

## FUNDING

This work was supported by the European Union projects ChildBrain (Marie Curie Innovative Training Networks, no. 641652), Predictable (Marie Curie Innovative Training Networks, no. 641858), and the Academy of Finland (MultiLeTe #292466).

## ACKNOWLEDGMENTS

We would like to thank Katja Koskialho, Ainomaija Laitinen, and Sonja Tiri for their help in data collection.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2018.00304/full#supplementary-material





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Xu, Kolozsvari, Monto and Hämäläinen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Improving Conceptual Knowledge of the Italian Writing System in Kindergarten: A Cluster Randomized Trial

Giuliana Pinto, Lucia Bigozzi, Christian Tarchi\* and Monica Camilloni

*Department of Education and Psychology, University of Florence, Florence, Italy*

This study assessed the efficacy of PASSI (*Promoting the Achievement of Sound-Sign Integration*), an intervention to improve children's conceptual knowledge of the Italian writing system in kindergarten, an emergent literacy predictor of reading and spelling acquisition that centers on letter-speech sound integration. PASSI implements an embedded-explicit approach in which teachers target specific subskills (reflection on the graphic, symbolic and phonological aspects of written signs) and emphasize children's contextualized interactions with oral and written language. One hundred fifty-nine Italian children participated in this study. Six teachers (and their three respective classes) were randomly assigned to the experimental group, and six teachers were assigned to the control group. All children were tested twice, before and after the intervention, on invented spelling of words and numbers, knowledge of the alphabet, orthographic awareness, and drawing. Children's visual-motor integration skills were also assessed as a control variable. The data were analyzed through the complex samples general linear model (GLM) approach. The results confirmed the efficacy of PASSI in promoting children's conceptual knowledge of the writing system and related emergent literacy skills. Theoretical and educational implications of the results are presented and discussed.

Keywords: emergent literacy, conceptual knowledge of the writing system, invented spelling, knowledge of letters, orthographic awareness

#### Edited by:

*Jurgen Tijms, University of Amsterdam, Netherlands*

#### Reviewed by:

*George Manolitsis, University of Crete, Greece; Mack Burke, Texas A&M University, United States*

> \*Correspondence: *Christian Tarchi christian.tarchi@unifi.it*

#### Specialty section:

*This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology*

Received: *07 May 2018* Accepted: *19 July 2018* Published: *07 August 2018*

#### Citation:

*Pinto G, Bigozzi L, Tarchi C and Camilloni M (2018) Improving Conceptual Knowledge of the Italian Writing System in Kindergarten: A Cluster Randomized Trial. Front. Psychol. 9:1396. doi: 10.3389/fpsyg.2018.01396*

## INTRODUCTION

This study assessed the efficacy of an intervention to improve children's conceptual knowledge of the Italian writing system in kindergarten. Past studies have shown that children's early competence in this construct is predictive of future reading fluency scores (Bigozzi et al., 2016b) and of reading and spelling disorders (Bigozzi et al., 2016a). Children's conceptual knowledge of the writing system is generally assessed through an invented spelling task, in which children create letter-speech sound integrations that correspond to their level of knowledge of the writing system. This factor integrates phonological awareness with grapho-motor skills (Berninger et al., 2008) and visual attention (Germano et al., 2014).

Past studies have demonstrated that children's early literacy skills can be stimulated through educational programs and interventions (Bus and van Ijzendoorn, 1999; Justice and Pullen, 2003). Emergent literacy interventions can be designed through an embedded approach (i.e., emphasizing children's daily self-initiated, naturalistic, and contextualized interactions with oral and written language), an explicit approach (i.e., structured, sequenced and directed instruction targeting specific skills), or a combination of both (Justice and Kaderavek, 2004). Recently, studies have led researchers to reconsider phonological awareness as the main predictor of literacy acquisition and have brought to light multicomponent constructs, such as children's emergent conceptual knowledge of the writing system (Ouellette and Sénéchal, 2017; Pinto et al., 2017). This construct can be considered the emergent antecedent of the integration process that characterizes formal spelling.

In this study, we focused on children's conceptual knowledge of the Italian writing system to provide kindergarten teachers with an evidence-based intervention that could facilitate reading and spelling acquisition once the children are in primary school. The intervention fosters children's reflection on the characteristics of different symbolic systems used for graphic representations, namely, the invented spelling of words, invented spelling of numbers, invented reading, and drawing skills. In this study, invented spelling and invented reading are defined as children's early attempts to represent words in print before they can conventionally read and spell words (Ouellette and Sénéchal, 2008).

## Learning to Spell in the Italian Language

The target language in this study was Italian, which differs in a few respects from other writing systems, including English, the language on which most research on learning to read and spell is based. Cross-linguistic studies have suggested the existence of a near-universal pattern of reading and spelling development across alphabetic languages (Furnes and Samuelsson, 2010; Ziegler et al., 2010; Caravolas et al., 2012; Landerl et al., 2013), with some moderating effect of orthographic consistency on the rate of development and patterns of impairment (Paulesu et al., 2001; Seymour et al., 2003; Furnes and Samuelsson, 2010). The two dominant theories describing cross-script diversity in reading and spelling development are the orthographic depth hypothesis and psycholinguistic grain size theory (Daniels and Share, 2018). Orthographic depth refers to the fact that alphabetic orthographies differ in the transparency of their grapheme-phoneme correspondences. Transparent orthographies have a nearly 1:1 correspondence, whereas opaque orthographies are characterized by equivocal grapheme-phoneme correspondences (Katz and Frost, 1992). By implication, phonological skills should be more strongly associated with reading and spelling development in deep orthographies than in shallow ones (Katz and Frost, 1992; Daniels and Share, 2018). According to psycholinguistic grain size theory, the consistency of spelling–sound mappings may modulate the importance of phonological skills: whereas in shallow orthographies readers and spellers can rely on single letters (and their phonemic correspondences), in deep orthographies they need to rely on larger grain sizes, such as rhymes (Ziegler and Goswami, 2005; Daniels and Share, 2018; Diamanti et al., 2018).
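Orthographic depth can be made concrete by asking how often a unit maps onto a single counterpart, in either direction (grapheme to phoneme, or phoneme to grapheme). The sketch below uses one simple, hypothetical, type-based consistency score: the share of each unit's most frequent mapping, averaged over units. Published consistency measures differ in detail (for example, by weighting words by frequency), and the toy pairs are illustrative only, not data from the study.

```python
from collections import Counter, defaultdict


def consistency(pairs):
    """Average, over source units, of the share of their most frequent target.

    1.0 means every source unit maps onto exactly one target (a 1:1 mapping)."""
    targets = defaultdict(Counter)
    for source, target in pairs:
        targets[source][target] += 1
    shares = [c.most_common(1)[0][1] / sum(c.values()) for c in targets.values()]
    return sum(shares) / len(shares)


# Hypothetical aligned (grapheme, phoneme) pairs: two graphemes share /k/.
pairs = [("c", "k"), ("c", "k"), ("q", "k"), ("a", "a"), ("o", "o")]
forward = consistency(pairs)                        # grapheme -> phoneme
backward = consistency([(p, g) for g, p in pairs])  # phoneme -> grapheme
print(round(forward, 3), round(backward, 3))  # 1.0 0.889
```

In this toy set every grapheme has a single pronunciation, but /k/ has two spellings, so the mapping is fully consistent in one direction and less consistent in the other.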

Interestingly, the rate of reading and spelling development mirrors the transparency of the writing system (Sprenger-Charolles et al., 2011). Italian, the target language in this study, has a transparent orthography. More specifically, regularity is higher in grapheme-phoneme relations (forward regularity) than in phoneme-grapheme relations (backward regularity; Wimmer and Mayringer, 2002). For example, when reading, the grapheme "c" corresponds to the phoneme /k/ if followed by a consonant (including "h") or by one of the vowels "a," "o," or "u." In contrast, the same grapheme corresponds to the phoneme /tʃ/ when followed by "i" or "e." There are no exceptions to this rule. However, when writing, the phoneme /k/ can correspond to two different graphemes: "c" as in /ˈkwɔko/ ("cuoco," en. tr. "chef") or "q" as in /kwi/ ("qui," en. tr. "here"). In Italian, children generally learn to spell through grapheme-phoneme mapping (the sublexical procedure) and shift to recognizing known words by sight alone (the lexical procedure) in later grades (Notarnicola et al., 2012; Bigozzi et al., 2017). Typically, mastery of the sublexical procedure is achieved at the end of first grade (Notarnicola et al., 2012), reaching a ceiling effect at the end of second grade (Cossu et al., 1995). As noted above, the regularity of the Italian orthographic system is higher in reading (forward regularity) than in spelling (backward regularity) (Notarnicola et al., 2012; Bigozzi et al., 2016a): when spelling, some phonemes may correspond to more than one grapheme, and the correct spelling can be identified only through the context. Finally, developmental studies have shown that spelling plays a fundamental role in both reading and writing acquisition (Pinto et al., 2015a), further supporting the importance of spelling in literacy development in Italian.
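The reading rule for "c" quoted above is deterministic, which is what makes the forward (reading) direction regular. A minimal sketch that encodes the rule as stated in the text, with a few accented vowels added as an assumption (function names are hypothetical):

```python
FRONT_VOWELS = set("ieìíèé")   # 'c' before these is read /tʃ/
BACK_VOWELS = set("aouàòóù")   # 'c' before these is read /k/
VOWELS = FRONT_VOWELS | BACK_VOWELS


def read_c(next_letter):
    """Phoneme for the grapheme 'c' given the letter that follows it."""
    nxt = next_letter.lower()
    if nxt in FRONT_VOWELS:
        return "tʃ"
    if nxt in BACK_VOWELS or (nxt.isalpha() and nxt not in VOWELS):
        return "k"  # back vowel or any consonant, including 'h'
    raise ValueError(f"unsupported context after 'c': {next_letter!r}")


def c_phonemes(word):
    """Apply read_c to every 'c' that is followed by another letter.

    Simplification: geminate 'cc' before i/e (e.g. 'accento') is not handled."""
    w = word.lower()
    return [read_c(w[i + 1]) for i in range(len(w) - 1) if w[i] == "c"]


for w in ["casa", "cuoco", "chi", "cena", "ciao"]:
    print(w, c_phonemes(w))
```

The reverse direction has no such single rule: /k/ before /w/ may be spelled "c" (cuoco) or "q" (qui), so the correct choice depends on knowledge of the specific word.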

## Symbolic Systems: Spelling Words, Spelling Numbers, and Drawing

According to the emergent literacy approach, children's preschool competences and knowledge of the nature and conceptual meaning of a writing system begin early in life and influence the formal learning of conventional literacy processes (Whitehurst and Lonigan, 1998; Lonigan et al., 2013). Drawing, numeracy and literacy are all fundamental components of children's emergent understanding of symbolic systems (Whitehurst and Lonigan, 1998) and have been found to be strong predictors of later achievement in formal literacy processes (Yamagata, 2007; Bigozzi et al., 2016b). Past studies have also identified children's ability to differentiate symbolic systems in kindergarten (e.g., letters from numbers) as a predictor of improvement in reading achievement through primary school (Spira et al., 2005). These symbolic systems share some characteristics. All of them allow the expression of mental representations. Both drawing and writing leave visible marks, unlike speaking or reading (Tolchinsky Landsmann, 2003). Drawing can be defined as a process governed by certain rules (Goodnow and Levine, 1973; Freeman, 1987), and, like writing, it is characterized by recurrent graphic patterns, such as lines, dots, and circles. However, these symbolic systems differ in some respects, and children's knowledge of and attitudes toward symbols might be domain specific (Tolchinsky Landsmann and Karmiloff-Smith, 1992). Writing is subject to more restrictions than drawing. The writing system can be segmented into discrete units (i.e., a word can be segmented into letters), and this set of units thus forms a closed system in which nothing can be added without drastically changing its meaning (Tolchinsky Landsmann and Karmiloff-Smith, 1992).
Concerning literacy and numeracy, a first relevant difference is that we use an alphabetic system to spell words, whereas numerals can be represented by digits that, in turn, can be spelled out in alphabetic writing. Additionally, a string of repeated identical symbols cannot be a correct representation of a word (e.g., ppppp), whereas it can be a correct representation of a number (e.g., 88888). Prior research has found that the conventional use of numbers appears developmentally earlier and more frequently than the conventional use of letters (Yamagata, 2007), suggesting that an intervention targeting children's early attempts at spelling might have different effects on words vs. numbers.

## The Construct of Conceptual Knowledge of the Writing System

Conceptual knowledge of the writing system includes two components. The first component is awareness of the existence of different symbolic systems to represent meanings, for instance, written language, numeric language, and drawing (see previous paragraph). The second component is invented spelling, which is the systematic matching of sounds that are included in words with signs that are not necessarily conventional (Liberman, 1971; Read, 1971; Puranik et al., 2011, 2013; Read and Treiman, 2013). Conventional signs are the actual letters of the alphabet. In contrast, "invented" signs are written productions that, although not yet letters, include some of the properties of the writing system. Invented spelling refers to children's spontaneous attempts to represent words in print (Read, 1971; Ouellette and Sénéchal, 2017), and several subskills are involved, all of which need to be addressed by an intervention aiming to improve children's conceptual knowledge of the writing system. Certainly, phonological awareness is involved in invented spelling, as children need to be able to discriminate between the sounds included in a word (Vernon and Ferreiro, 1999; Martins and Silva, 2006). This construct also requires children to reflect on their level of knowledge of the writing system and provides them with insight into the structure of their writing system (Treiman, 1998; Read and Treiman, 2013). Overall, invented spelling is a developmental step in which children attempt to merge the phonological and orthographic characteristics of a word (Adams, 1998; Ouellette and Sénéchal, 2008). Visual-motor skills are also involved and allow children to apply and execute their knowledge of phonological-orthographic connections (Pinto and Camilloni, 2012; Read and Treiman, 2013).

Several studies have explored children's emerging conceptual knowledge of the writing system and emphasized similarities and differences across languages. In many countries, before the onset of formal schooling, children learn to identify the shapes of letters and know the names of letters (Treiman et al., 2007b); they show some knowledge of the horizontal orientation of writing in their language (Treiman et al., 2007a, 2015); and they show some understanding of the symbolic nature of writing and of how words symbolize meaning in a different way than pictures do (Treiman et al., 2016). Rather than being acquired all at once, children's emergent conceptual knowledge of the writing system develops progressively. For instance, children learn some letters before others (Puranik et al., 2013). In a study conducted with 296 preschool children aged 4–5 years, Puranik et al. (2011) found that print knowledge and letter writing were related to name-writing skills, whereas print knowledge, alphabet knowledge and name writing were related to letter-writing skills. Only letter-writing skills significantly contributed to the prediction of spelling skills. Thus, letter writing is an important antecedent of spelling, but this knowledge is supported by several other emergent writing skills, confirming the multicomponential nature of children's conceptual knowledge of the writing system. Conceptual knowledge of the writing system has been found to be related to literacy acquisition in both transparent (e.g., Italian; Bigozzi et al., 2016a,b) and opaque orthographies (e.g., English; Ouellette and Sénéchal, 2008).

Taken together, these results suggest that early instruction in conceptual knowledge of the writing system at the preschool level may be promising to enhance emergent as well as formal literacy skills (Puranik et al., 2011).

## Developing Conceptual Knowledge of the Writing System Through Intervention

There are several reasons to believe that children's conceptual knowledge of the writing system can be improved through intervention, the most important being that this construct is context dependent. Before entering primary school, children are surrounded by several symbolic representations of the world (Ferreiro, 1988; Ravid and Tolchinsky Landsmann, 2002) and become increasingly able to discriminate written language from other symbolic systems by grasping several basic features of written language, such as dimensionality, linearity, directionality, horizontality, and, finally, letters as a conventional system of shapes (Levin and Bus, 2003; Treiman et al., 2007a,b, 2015; Puranik et al., 2013). Children also develop pragmatic competence in written language through exposure to adults' use of writing (e.g., shopping lists) and through interaction with adults or peers in writing-mediated activities (e.g., storytelling) (Aram and Levin, 2004).

Even though most intervention studies have targeted the subskills included in children's conceptual knowledge of the writing system, only a few studies have specifically addressed invented spelling. Silva and Martins (2003) verified the efficacy of an invented spelling intervention in fostering the phonological awareness of 30 Portuguese children. The intervention aimed to lead children to think about the rules of spelling and to help them move from pre-phonetic to early phonemic spellings. The intervention proved to be effective, suggesting that it is possible to promote both phonological awareness and the gradual learning of the alphabetic principle. In a follow-up study, the authors (Martins and Silva, 2006) suggested that invented spelling intervention programs could replace, or at least complement, phonological awareness programs to prevent difficulties in learning to read. Ouellette and Sénéchal (2008) identified children's invented spelling levels and trained them at a level above their own. The experimental group outperformed the control group in invented spelling, orthographic awareness and word reading. Rieben et al. (2009) compared four conditions: three early spelling practices that mimicked teaching activities (invented spelling, copied spelling, and invented spelling with feedback on orthography) and a control group. According to their results, invented spelling with feedback on orthography was more effective than invented spelling alone or copied spelling in improving children's orthographic awareness, but not in the phonologically oriented tasks.

Levin and Aram (2013) discussed these three works and their limitations. According to these researchers, the first two studies (Martins and Silva, 2006; Ouellette and Sénéchal, 2008) designed developmentally tailored interventions that constrained children's progress in their conceptual knowledge of the writing system. In fact, their results were far from optimal, as the authors of the original articles also acknowledged. In contrast, Rieben et al. (2009) tested several types of intervention, and although each was effective, children's gains were restricted to the type of feedback received. According to Levin and Aram (2013), the problem was that explanations addressed only the orthographic aspects of the writing system rather than the integrated alphabetic code underlying spelling and reading. To overcome such limitations, Levin and Aram (2013) compared the effects of two mediation routines on children's gains in invented spelling and in other early literacy skills. In the process-product mediation group, the experimenter explained the implicit and explicit processes involved in invented spelling immediately after each child's invented spelling attempt, whereas in the product mediation group, the experimenter showed the correct spelling of the word after the child's attempt. According to their results, the process-product mediation strategy was more effective than the product mediation strategy in enhancing knowledge of letters, as well as the segmentation, spelling and decoding of words. This result suggests that explaining all the steps involved in phoneme-grapheme mapping, along with displaying the correct spelling, contributes to the development of early literacy skills, except for naming letters and word decoding. However, as the authors also noted, this approach has its limitations: the intervention was adapted to the phono-orthographic characteristics of Hebrew, and invented spelling interventions in other writing systems have to be adjusted to their characteristics.

#### Rationale and Research Questions

Based on these theoretical premises, we developed PASSI (Promoting the Achievement of Sound-Sign Integration), an intervention to improve children's conceptual knowledge of the writing system in kindergarten, before the onset of formal schooling. PASSI differs from prior intervention studies in its target skill, design and activities. PASSI targets the simultaneous integration of the dual code, decoding and coding, in three different symbolic systems (word writing, number writing, and drawing), given the importance of acquiring early the ability to distinguish between symbolic systems (Spira et al., 2005). Children's metacognitive reflection on written language is triggered by activating both their coding hypotheses (when inventing spellings) and their decoding hypotheses (when inventing readings). All three symbolic systems rely on conventional rules to be effective, but the rules change from system to system (Spira et al., 2005; Yamagata, 2007). When drawing, children have to include signs in their production that look like the object they want to represent. In contrast, words and numbers are conventional, arbitrary signs that represent sounds and meanings without any similarity in form. Words represent several types of meaning (concrete and abstract), whereas numbers represent quantities. To convey meaning effectively, words and numbers must also follow a conventional syntax: signs need to be written (and read) in a specific order to create a relationship among them. The same sign can take on different meanings depending on the symbolic system in which it is included. A circle can be a tire if we are drawing, an "o" if we are spelling words, or a "0" if we are writing numbers. The simultaneous activation of these three symbolic systems increases children's conceptual awareness of the differences among them.

To increase children's conceptual knowledge of the writing system, we adopted an embedded-explicit approach, which not only fostered children's spontaneous engagement with the oral and written language present in their natural environment (Read and Treiman, 2013) but also included more systematic, structured skill-focused activities. We targeted specific subcomponents of the invented spelling construct, namely, the graphic, symbolic and phonological aspects of written signs and the relationships among them. All activities aimed to stimulate reflection on, and the construction of, the written sign rather than to anticipate the formal learning of reading and writing. The choice of an embedded-explicit approach shaped the research design of this study, a cluster randomized trial, which increased its ecological validity. Since the intervention was delivered by teachers over 15 weeks, we could not randomly assign individual students to conditions; instead, we randomly assigned teachers (and their classes) to conditions. Appropriate statistical methods were used to adjust for intra-cluster correlation.

This study examined the efficacy of PASSI on children's performance in the invented spelling of words and numbers, orthographic awareness and knowledge of letters in Italian, with children's visual-motor integration included as a control variable. We hypothesized that PASSI would be more effective than the regular curriculum followed by the control group in enhancing the invented spelling of words and numbers, orthographic awareness and knowledge of letters. To test the domain-specific nature of the intervention, we also verified its efficacy on children's drawing skills, hypothesizing that we would not find any significant improvement, notwithstanding the dependence of drawing on school practices (Tarchi and Pinto, 2015). Note that emergent literacy and emergent numeracy follow different developmental paths (Yamagata, 2007), which in turn can influence the beneficial effect of PASSI on children's gains.

## METHOD

#### Participants and Setting

One hundred fifty-nine Italian children, attending two different schools in a city in Central Italy, participated in this study. All children were born in Italy and spoke Italian as their mother tongue. At the time of the study, no participant had been diagnosed with a physical or mental disability, was undergoing a diagnostic process, or had been identified by the teachers as having special educational needs; thus, all participants could be defined as typically developing. Six classes and 12 teachers (two per class) participated. All classes belonged to the same school district, characterized by a middle-high socio-economic level and by teaching practices that followed the national guidelines released by the Ministry of Education. The target language was Italian, which has a transparent orthography.

In Italy, kindergartens follow national guidelines set by the Ministry of Education and include activities targeting the development of grapho-motor skills, literacy-related skills, and sensorial skills. Children are generally not exposed to formal teaching of reading and spelling, which begins in first grade. The participating schools were not following any specific program targeting the skills relevant to this study and adhered to the national curriculum. All schools and classes were also comparable in terms of the presence, visibility and accessibility of meaningful written-language materials. Interviews with the participating teachers confirmed that the experimental and control group classrooms did not differ in emergent literacy instruction.

## Procedure

Teachers were considered eligible to participate in this study if they were tenured and had more than 5 years of teaching experience. The outcome measures were collected in the same manner for both groups. All tests were individually administered and coded by two trained experimenters who were blind to the treatment condition. The pre-test measures were assessed at the beginning of the school year, in October. In November, the experimental group teachers attended training on the PASSI intervention, which was also offered to the control group teachers once the study was concluded and all data had been collected. The intervention took place over 15 weeks, from mid-January to the end of April. The post-test measures were assessed in May.

#### Fidelity of Implementation

Fidelity of implementation was verified through multiple procedures (O'Donnell, 2008). Teachers received specific training, as explained above, and all teachers in the experimental group attended both training meetings. During the meetings, the instructor discussed the theory underlying the intervention and defined what it meant to implement the intervention with fidelity (O'Donnell, 2008). For each activity, the instructor specified which critical components and processes were necessary to implement the intervention with fidelity and which components teachers could adapt to their classroom. The instructor also assessed the teachers' knowledge and understanding of PASSI.

After the training, each teacher was supervised by a member of the research team that created PASSI. Teachers were provided with a manual that included a detailed description of each activity, to increase the probability of fidelity of implementation (O'Donnell, 2008). The supervisor monitored the implementation of the program through weekly meetings with each teacher to identify significant deviations from the intervention. As part of routine school practice, the supervisor also held meetings with the control group teachers, during which their activities were monitored. No sign of contamination from the experimental condition was identified in any of the participating classrooms.

A research assistant was assigned to every classroom as a participant observer and took field notes during 20% of the sessions. The field notes confirmed that PASSI was implemented in every classroom without departures from the instructions.

#### Inclusion Criteria

From the initial sample of 159 children, 11 children (seven from the treatment group and four from the control group) were excluded because they did not take part in the pre-test or post-test assessment or because they were absent from school during the treatment period. As a result, there were no missing data in the sample used in the analyses. From this sample, we also excluded 24 children (11 from the treatment group and 13 from the control group) who showed formal mastery of reading and writing in kindergarten, that is, children who could correctly spell all of the words in the invented spelling of words task and who knew all of the letters of the alphabet as a result of informal extracurricular activities. The final sample included 124 children. Sample characteristics (number, age, and gender) are described in **Table 1**. See **Figure 1** for a flow chart of participant allocation.

#### Description of Experimental Groups

#### **Experimental group training**

PASSI was implemented by the classroom teachers. The six teachers in the experimental group received specific training on how to implement the intervention in the classroom. The training consisted of two 2-h meetings with one of the authors of this article. In the first meeting, the researchers explained the theoretical principles of PASSI, whereas the second meeting was a workshop on the actual activities to implement. The researchers explained the activities and gave the teachers the activity sheets. The teachers simulated the activities and received feedback from the instructor. Finally, the instructor assessed the teachers' knowledge and understanding of PASSI.

The invented spelling intervention had two aims: (i) to emphasize and enrich the symbolic material present in the child's educational environment (e.g., books, newspapers, magazines, boards, and street signs) and (ii) to create meaningful contexts in which the symbolic material can be used (e.g., activities on thematic drawing and invented spelling). The activities were designed to resemble children's everyday routines and to offer playful scenarios in which the children could concretely use symbolic material. Each activity lasted approximately one and a half hours. Activities were performed twice a week at the beginning of the school day for three and a half months (15 weeks). Overall, the children worked on 30 activities, 10 for each category (graphic sign, orthographic sign, and numeric sign). Within each category, five activities stimulated decoding processes, and five stimulated coding processes. Each activity involved up to three tasks. At the end of each set of activities, the teacher was encouraged to report his or her observations of the children's contextualized behaviors (e.g., whether they were collaborating with peers, frequently seeking the adult's help, working independently, or working with curiosity) and individual competences (whether the children achieved the activity objectives fully, partially, or not at all). The activities varied by type and classroom structure. Some activities were based on activity sheets, some required recycled materials, some were games, others were based on storytelling, and some were discussion based. Regarding classroom structure, some activities were addressed to the whole class (for instance, discussion-based and game activities), some involved small groups, some required children to work in pairs, and others involved individual work. For instance, in the activity "There's mail for you" (targeting the orthographic sign, production), the teacher shows the children several objects related to mail (e.g., envelopes, stamps, letters, and postcards). Then, the teacher fosters discussion with the following questions: What is this envelope for? What is a stamp for? What are letters for? Have you ever been to a post office? What is the difference between a letter and a postcard? Have you ever sent a postcard? What did you want to say with it? The teacher then shows a few examples of letters so that the children can familiarize themselves with this type of writing. Finally, the children work in pairs, with one child "dictating" to the other a letter on a topic of his or her choice (e.g., birthday wishes to a classmate, a farewell to a family member, or telling dad about the school day). This activity fosters children's conceptual knowledge of the specific forms that writing takes when we change the medium on which we write. Children improve their awareness of the relationship between conventional rules of writing and the specific context (e.g., writing a letter or a birthday card).

#### TABLE 1 | Sample characteristics (age in months and *n*).

Below, we provide an overview of the activities involved in PASSI; see **Supplemental Materials** for a more detailed description of the intervention with examples of activities.

#### **Targeting the graphic sign**

To improve the children's ability to graphically represent signs, we implemented activities such as: Creating shapes with cardboard or a rope; drawing shapes with chalk on the floor and then having the children walk on them; observing the shape of objects present in the children's everyday life and describing their perimeter; guessing hidden shapes by seeing only some detail; noticing that if we partially modify a sign, then the whole configuration of the sign changes as well; reflecting on the difference among real objects, objects in a picture, and drawn objects; reflecting on the different representations of the same object; identifying the essential traits to characterize an object through drawing; and understanding that the same graphical sign can be assigned different meanings if represented in a different position in relation to the context.

#### **Targeting the orthographic sign**

We implemented the following activities: Activities to familiarize the children with usual and unusual writing instruments; guessing games to discriminate written words from scribbles; activities in which the children played with letters; activities in which the children had to find letters within complex patterns; and activities in which the children had to read street signs.

#### **Targeting the numeric sign**

To help the children differentiate among symbolic systems, we also targeted the writing of numbers. We implemented the following activities: Nursery rhymes in which the children associated the names of numbers with their written representation; games to associate each number with its symbolic sign; activities in which the children used written numbers to discriminate positions and quantities; and activities in which the children had to recognize numbers within complex patterns. We also constructed a clock to identify daily activities through the hours of the day and a thermometer with a number line.

#### **Control group**

We asked the control group teachers to schedule the early literacy activities typically performed in the regular school curriculum in the same time slots in which the experimental group children were working on the PASSI intervention, for the same length of time and with the same frequency. More specifically, the control group worked on the following skills:

#### **Grapho-motor skills**

Playing with materials, creating and transforming small and large objects by hand, gluing and taping, cutting, filing, tracing contours, drawing straight and curved lines, drawing labyrinths and paths, painting, coloring, and drawing repeated ornaments.

#### **Literacy skills**

Listening to and telling stories, inventing stories, illustrating stories, inventing nursery rhymes, playing with words (e.g., "which words do you know that begin with the letter . . . ?"), recognizing initial and final phonemes in a word, reflecting on the length of words, segmenting and combining words in syllables and phonemes.

#### **Sensorial skills**

Discriminating the basic colors; mixing them to create new colors; discriminating sounds, rhythms, and high and low sounds; discriminating smooth and rough materials and soft and hard materials; discriminating flavors (sweet, sour, and savory); mixing water and flour to make bread or pizza; whipping cream and baking simple sweets; discriminating the smells of flowers, beverages, food, perfumes, and glue; and describing the differences.

#### Measures

#### Invented Spelling (Bigozzi et al., 2016a,b)

The children were asked to write, as best they could, the following words: their name, mum [mamma], dad [babbo], child [bambino], and little bird [uccellino]. The children's invented spellings of words were categorized into four sequential schemes: graphic scheme, pseudo-writing, symbolic scheme, and conventional spelling (see **Table 2**). Each item was coded, and a mean score was calculated, so participants' scores could range from a minimum of 1 to a maximum of 4. Internal consistency was good (α = 0.91). Two raters scored all the children's attempts, and inter-rater reliability was strong (κ = 0.90). In the few instances of discrepancy, the two scorers discussed each item until a consensus was reached.
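
The item scoring and the agreement statistic described above can be sketched in Python. This is a minimal illustration, not the authors' scoring procedure: it assumes the four schemes are coded 1 (graphic scheme) to 4 (conventional spelling) in the order listed, which matches the stated 1 to 4 range, and it computes unweighted Cohen's κ, since the text does not say whether a weighted variant was used. The function names are hypothetical.

```python
from collections import Counter

# Assumed numeric codes for the four sequential schemes (1 = least advanced).
SCHEMES = {"graphic": 1, "pseudo-writing": 2, "symbolic": 3, "conventional": 4}


def invented_spelling_score(item_codes):
    """Mean scheme code across a child's coded items; ranges from 1 to 4."""
    values = [SCHEMES[code] for code in item_codes]
    return sum(values) / len(values)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one identical category for every item
    return (observed - expected) / (1 - expected)


# One hypothetical child: name, mamma, babbo, bambino, uccellino.
child = ["symbolic", "pseudo-writing", "symbolic", "conventional", "pseudo-writing"]
print(invented_spelling_score(child))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is preferred over simple agreement rates when categories are unevenly used.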

#### Invented Spelling of Numbers

The children were also asked to write all the numbers that they knew. Two independent raters attributed a global score to each child's production and categorized it into four sequential schemes following the previous coding scheme: graphic scheme, pseudo-writing, symbolic scheme, and conventional spelling (see **Table 3**). To attribute scores to the children's productions, we adapted Yamagata's (2007) coding system, which originally had three main categories and eight subcategories.

TABLE 2 | Coding system for the invented spelling of words task.


Our first sequential scheme, the graphic scheme, corresponds to Yamagata's first category, graphic products (subcategories 1 and 2). Our second sequential scheme, pseudo-writing, corresponds to Yamagata's second category, writing-like products (subcategories 3, 4, 5, 6, and 7). We added a third sequential scheme, the symbolic scheme, to code children's productions that were similar to conventional spelling but differed from it in some detail (e.g., the number "3" written with three humps). Finally, our fourth sequential scheme, conventional spelling, corresponds to Yamagata's third category, conventional products (subcategory 8). Two raters scored all the children's attempts, and inter-rater reliability was strong (κ = 0.91). In the few instances of discrepancy, the two scorers discussed each item until a consensus was reached.
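
The correspondence between Yamagata's (2007) subcategories and the four schemes can be written as a lookup table. This is a sketch under assumptions: the text does not state which Yamagata subcategory a near-conventional production (such as a "3" drawn with three humps) would otherwise fall into, so the added symbolic scheme is represented here by a separate, hypothetical rater flag (`near_conventional`), and the 1 to 4 ordinal score simply mirrors the word task rather than anything stated for the number task.

```python
# Correspondence stated in the text: Yamagata (2007) subcategory -> sequential scheme.
YAMAGATA_TO_SCHEME = {
    1: "graphic", 2: "graphic",                                # graphic products
    3: "pseudo-writing", 4: "pseudo-writing",                  # writing-like products
    5: "pseudo-writing", 6: "pseudo-writing", 7: "pseudo-writing",
    8: "conventional",                                         # conventional products
}

# Sequential order of the four schemes; "symbolic" has no Yamagata counterpart.
SCHEME_ORDER = ["graphic", "pseudo-writing", "symbolic", "conventional"]


def code_number_production(yamagata_subcategory, near_conventional=False):
    """Return (scheme, ordinal 1-4) for one child's number production.

    near_conventional is a hypothetical rater flag for productions that resemble
    conventional numerals but differ in some detail (the added symbolic scheme).
    """
    if near_conventional:
        scheme = "symbolic"
    elif yamagata_subcategory in YAMAGATA_TO_SCHEME:
        scheme = YAMAGATA_TO_SCHEME[yamagata_subcategory]
    else:
        raise ValueError(f"unknown Yamagata subcategory: {yamagata_subcategory!r}")
    return scheme, SCHEME_ORDER.index(scheme) + 1


print(code_number_production(5))
print(code_number_production(7, near_conventional=True))
```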

#### Knowledge of the Alphabet (Aram and Biron, 2004)

The children were asked to recognize the letters of the alphabet from a set of 21 printed letters. One point was assigned for every letter correctly recognized, for a maximum of 21 points. The reliability score was good (α = 0.88).

#### Orthographic Awareness (Levy et al., 2006)

Twelve pairs of sign strings, each pairing a word with a non-word, were presented on cardboard. In each pair, the non-word included a characteristic that violated the rules of the writing system (i.e., scribbles, letter-like shapes, figures, non-linearity, excessive spacing, a single letter, a mix of letters and numbers, the same letter repeated, letters written upside down, letters written backwards, only consonants, or only vowels). The children had to identify which stimulus corresponded to a word that could be read. The children received one point for each word correctly identified, for a maximum of 12 points. The reliability score was good (α = 0.87).
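
Cronbach's α, reported for this and the other scored measures, can be computed from a children × items score matrix. A minimal sketch, assuming items scored 0/1 as in this task (for dichotomous items α coincides with KR-20); the toy data are invented for illustration and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per child, one column per item).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances (n - 1 denominator) are used consistently throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance(item) for item in zip(*scores)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy 0/1 data: four children x three items.
toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(round(cronbach_alpha(toy), 2))
```

Because α compares the item variances with the variance of the total score, it rises when items tend to be passed or failed together, that is, when they measure a common construct.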

#### Drawing Skills

To assess whether the child was able to communicate different types of information depending on the request, we assigned a drawing task with a "contrastive" instruction. The child had to draw a person standing still and then a person running. The children's productions were coded on the basis of the differences between the two drawings in several dimensions: head orientation; body orientation; feet orientation; the representation of elbows, arms, ankles, knees, hair, and clothes; and the distance between the feet (Morra, 2005). The differentiation score could range between 0 and 11 points. Two raters scored all the children's attempts, and inter-rater reliability was strong (κ = 0.95). In the few instances of discrepancy, the two scorers discussed each item until a consensus was reached.

#### Visual-Motor Integration, VMI (Beery and Buktenica, 2000)

This test evaluates how children integrate their visual and motor skills by asking them to copy 18 geometrical shapes of increasing complexity. One point was assigned for every shape correctly copied, so scores could range between 0 and 18 points. The reliability score was good (α = 0.91). Two raters scored all of the children's attempts, and inter-rater reliability was strong (κ = 0.87). In the few instances of discrepancy, the two scorers discussed each item until a consensus was reached.

TABLE 3 | Coding system for the invented spelling of numbers task.

## Research Design and Data Analysis

To test the hypotheses of this study, we used a parallel cluster randomized design with a control group<sup>1</sup> and pre-test and post-test comparisons between the two groups (Campbell et al., 2004). The research design followed all indications of the Declaration of Helsinki (World Medical Association, 2013) and was approved by the Ethics Committee of the Department of Psychology at the University of Florence, Italy. Written informed consent was obtained from the participants' parents, and we strictly adhered to the privacy requirements of Italian law. Six teachers (and their three respective classes) were randomly assigned to the experimental group and six to the control group. The two groups were assessed with the same tests at pre-test and post-test and differed in that the experimental group received the 15-week invented spelling intervention, whereas the control group followed the regular curriculum. Given the nested nature of the data, appropriate statistical procedures were applied.
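
Cluster randomization assigns whole classes, rather than individual children, to conditions. The sketch below illustrates the idea with Python's `random` module; the class labels, the seed, and the equal split are illustrative assumptions, and the text does not describe the randomization mechanism the authors actually used.

```python
import random


def assign_classes(class_ids, seed=None):
    """Randomly split whole classes (clusters) into two equal-sized arms."""
    rng = random.Random(seed)            # seeded generator for a reproducible allocation
    shuffled = list(class_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {cid: ("experimental" if i < half else "control")
            for i, cid in enumerate(shuffled)}


def child_conditions(child_to_class, class_assignment):
    """Each child inherits the condition of his or her class."""
    return {child: class_assignment[cls] for child, cls in child_to_class.items()}


# Six hypothetical class labels, three per arm.
allocation = assign_classes(["A", "B", "C", "D", "E", "F"], seed=2020)
children = {"child01": "A", "child02": "A", "child03": "D"}
print(child_conditions(children, allocation))
```

Because children in the same class share a condition, their outcomes are not independent, which is why the analysis has to account for the classroom clustering.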

The principal descriptive statistics (mean, standard deviation, skewness, and kurtosis coefficients) were calculated. We applied monotonically increasing transformations to all variables that were not normally distributed (Fox, 2008). Differences between post-test and pre-test performances (gain scores) were calculated for each variable and used as dependent variables. Because the study was a parallel cluster randomized trial with a control group, in which classes and their teachers were randomly assigned to the control or experimental condition, we analyzed the data using complex samples general linear model (GLM) analyses, with group as a fixed factor, pre-test scores as covariates, and classroom as the cluster variable.
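
A rough analogue of this analysis is an ordinary least squares regression of each gain score on a group dummy and a pre-test covariate, with standard errors clustered by classroom. The sketch below is a swapped-in technique, not the authors' procedure: it implements the basic CR0 sandwich estimator in pure Python, whereas complex samples procedures may use Taylor-series linearization with their own small-sample adjustments, so results would not match exactly. All function names and the toy data are hypothetical.

```python
from collections import defaultdict


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is not of full rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cluster_robust_ols(y, X, clusters):
    """OLS coefficients with CR0 cluster-robust standard errors.

    y: outcome per child (e.g., post-test minus pre-test gain)
    X: one row per child, e.g. [1, group, pretest] (leading 1 = intercept)
    clusters: cluster label per child (e.g., classroom)
    """
    k = len(X[0])
    Xt = [list(col) for col in zip(*X)]
    bread = invert(matmul(Xt, X))                        # (X'X)^-1
    Xty = [[sum(x * yi for x, yi in zip(col, y))] for col in Xt]
    beta = [b[0] for b in matmul(bread, Xty)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for yi, row in zip(y, X)]
    # Sum the score contributions x_i * e_i within each cluster.
    scores = defaultdict(lambda: [0.0] * k)
    for row, e, g in zip(X, resid, clusters):
        for j in range(k):
            scores[g][j] += row[j] * e
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(k)]
            for i in range(k)]
    cov = matmul(matmul(bread, meat), bread)            # sandwich: B M B
    # Clamp tiny negative diagonal values caused by floating-point rounding.
    se = [max(cov[j][j], 0.0) ** 0.5 for j in range(k)]
    return beta, se


# Invented toy data: group (0 = control, 1 = PASSI), pre-test score, classroom.
group = [0, 0, 0, 0, 1, 1, 1, 1]
pre = [2, 3, 1, 4, 2, 3, 1, 4]
room = ["c1", "c1", "c2", "c2", "e1", "e1", "e2", "e2"]
gain = [1.0, 0.5, 1.5, 0.5, 2.0, 2.5, 3.0, 1.5]
beta, se = cluster_robust_ols(gain, [[1, g, p] for g, p in zip(group, pre)], room)
print(beta[1], se[1])  # pre-test-adjusted group effect on gains and its clustered SE
```

Clustering the standard errors keeps children from the same classroom from being treated as independent observations. With only six clusters, as in this study, plain CR0 standard errors tend to be too small, which is one reason dedicated procedures apply small-sample corrections.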

## RESULTS

## Descriptive Statistics

The descriptive statistics for the experimental and control groups are presented in **Table 4**. On average, the children's conceptual knowledge of the writing system fell between the pseudo-writing and symbolic schemes. The children recognized an average of six letters, although performance on the knowledge of letters task varied widely. The children were able to distinguish approximately half of the words presented from non-words. In contrast, drawing skills were quite low, with children barely able to differentiate between a running and a still person when drawing. Finally, VMI performance was in line with what is expected of children of this age, with an average of half of the 18 geometrical shapes correctly reproduced.

All variables included in the study correlated with each other, except for drawing at pre-test with the invented spelling of words at post-test, drawing at pre-test with knowledge of letters at post-test, and drawing at pre-test with drawing at post-test (see **Table 5**). These results confirm the stability of the emergent literacy construct, which includes several interconnected literacy-related skills (Lonigan et al., 2000).

TABLE 5 | Correlations between dependent variables.

*1, pre-test; 2, post-test.* \*\**p* < *0.01.*

<sup>1</sup>Groups of individuals (school classes in this study), rather than individuals, are randomly assigned to study groups (Campbell et al., 2004).

#### Pre-test Differences

Before comparing the effect of the intervention on the students' gains over time, we checked baseline equivalence on the pre-test measures. The two groups did not differ in any measure: the invented spelling of words [t = −1.630, df = 122, p = 0.11; 95% CI = −0.044; 0.444], the invented spelling of numbers [t = −1.294, df = 122, p = 0.198; 95% CI = −0.309; 0.065], knowledge of letters [t = −0.830, df = 122, p = 0.408; 95% CI = −4.126; 1.688], orthographic awareness [t = −1.403, df = 122, p = 0.163; 95% CI = −0.339; 0.294], drawing [t = −0.140, df = 122, p = 0.889; 95% CI = −0.339; 0.294], or VMI [t = 0.391, df = 122, p = 0.696; 95% CI = −1.356; 2.024].
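
The reported degrees of freedom are consistent with independent-samples Student t tests with pooled variance, for which df = n₁ + n₂ − 2 = 124 − 2 = 122. A minimal sketch follows; because Python's standard library has no t distribution, the two-sided critical value for the confidence interval is passed in as a parameter (`t_crit`, an assumption to be looked up in a table or computed with a statistics package) rather than computed.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(a, b, t_crit=None):
    """Student's independent-samples t test with pooled variance.

    Returns (t, df, ci), where ci is the confidence interval for mean(a) - mean(b)
    when a two-sided critical value t_crit is supplied, otherwise None.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean(a) - mean(b)
    t = diff / se
    ci = (diff - t_crit * se, diff + t_crit * se) if t_crit is not None else None
    return t, df, ci


t, df, _ = pooled_t_test([1, 2, 3], [2, 3, 4])
print(round(t, 3), df)
```

A confidence interval for the group difference that includes zero, as in all of the baseline comparisons above, corresponds to a non-significant difference at the matching alpha level.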

## Effects of Intervention on Dependent Variables

According to the results from the complex samples GLMs, group significantly explained differences in growth from pre-test to post-test in the following variables: invented spelling of words, knowledge of letters, and orthographic awareness. Growth in orthographic awareness was also explained by pre-test performance in knowledge of letters and visual-motor integration. The effect sizes of the complex samples GLMs were moderate for orthographic awareness and knowledge of letters, and low for invented spelling of words, invented spelling of numbers, drawing, and visual-motor integration (see **Table 6**).

TABLE 6 | Results from the complex samples GLM analyses with differences between post-test and pre-test scores as dependent variables, group as factor, pre-test scores as covariates, and classroom as cluster variable.
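The model summarized in Table 6 regresses gain scores on group and pre-test covariates while treating classroom as the cluster variable. The complex samples GLM procedure itself is not reproduced here. As a swapped-in, minimal stand-in, the sketch below fits ordinary least squares to gain scores (post − pre) with an intercept, a group dummy (1 = experimental) and a single pre-test covariate, and computes a cluster-robust (sandwich, CR0, no small-sample correction) standard error for the group coefficient. The single-covariate design, the function names and the toy data are illustrative assumptions.

```python
from collections import defaultdict
from math import sqrt


def _solve(a, b):
    """Solve the small linear system a @ x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def gain_score_model(pre, post, group, cluster):
    """OLS of gain (post - pre) on [intercept, group, pre] with a cluster-robust SE.

    Returns the group coefficient and its CR0 sandwich standard error.
    """
    x = [[1.0, float(g), float(p)] for g, p in zip(group, pre)]
    y = [float(b - a) for a, b in zip(pre, post)]  # gain scores
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    # Row of (X'X)^-1 for the group coefficient (X'X is symmetric).
    inv_row = _solve(xtx, [0.0, 1.0, 0.0])
    # Per-classroom score vectors u_g = sum over children of x_i * residual_i.
    scores = defaultdict(lambda: [0.0] * k)
    for r, yi, c in zip(x, y, cluster):
        resid = yi - sum(bj * rj for bj, rj in zip(beta, r))
        for j in range(k):
            scores[c][j] += r[j] * resid
    # Var(b_group) = sum_g (inv_row . u_g)^2, the group entry of the sandwich.
    var = sum(sum(w * u for w, u in zip(inv_row, s)) ** 2 for s in scores.values())
    return beta[1], sqrt(var)
```

In practice, a statistics package with survey or cluster-robust options would be preferred; the sketch only makes visible how clustering enters the standard error through classroom-level sums of residual scores rather than child-level ones.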

## DISCUSSION

This study tested the efficacy of PASSI, an intervention targeting 3- to 5-year-old children's conceptual knowledge of the Italian writing system, in enhancing early literacy skills. The results partially confirmed the research hypothesis, in line with prior studies on this construct (Silva and Martins, 2003; Ouellette and Sénéchal, 2008; Rieben et al., 2009; Levin and Aram, 2013). Overall, the interaction between group and time was significant for all emergent literacy skills, confirming that this set of early skills can be enhanced through interventions (Bus and van Ijzendoorn, 1999; Justice and Pullen, 2003).

More specifically, PASSI was effective in improving both conceptual knowledge of the writing system (as assessed by the invented spelling of words task) and literacy-related skills (i.e., knowledge of the alphabet and orthographic awareness). This result has practical implications, given the relevance of children's conceptual knowledge of the writing system for the acquisition of reading and spelling (Ouellette and Sénéchal, 2017) and for the prediction of related disorders (Bigozzi et al., 2016a). PASSI aims to trigger children's metalinguistic reflection on the writing system by giving them insight into its structure (Treiman, 1998), while also offering teachers insight into children's conceptual knowledge of the writing system. In particular, PASSI targets the simultaneous integration of the dual code (decoding and coding) in three different symbolic systems (word writing, number writing, and drawing). It does so through an embedded-explicit approach in which teachers target specific subskills (reflection on the graphic, symbolic, and phonological aspects of written signs) and emphasize children's contextualized interactions with oral and written language (Justice and Kaderavek, 2004). PASSI targets children's phonological awareness by improving the integration between this skill and other related emergent literacy skills, in light of the limited role that phonological awareness plays in transparent writing systems (Ziegler and Goswami, 2005; Notarnicola et al., 2012; Bigozzi et al., 2016a; Daniels and Share, 2018; Diamanti et al., 2018). As suggested by several theories (e.g., psycholinguistic grain size theory), the consistency of spelling–sound mappings modulates the importance of phonological skills in reading and spelling acquisition in transparent orthographies (Ziegler and Goswami, 2005; Bigozzi et al., 2017; Daniels and Share, 2018; Diamanti et al., 2018).

However, similar to prior studies on invented spelling interventions, the impact of PASSI on children's conceptual knowledge of the writing system was small (Silva and Martins, 2003; Ouellette and Sénéchal, 2008; Rieben et al., 2009; Levin and Aram, 2013). Notably, although at a descriptive level the experimental group showed an increase in performance on the invented spelling of numbers task whereas the control group showed a decrease, the difference did not reach the conventional threshold for significance. These data are consistent with the view that emergent literacy and numeracy are not overlapping domains but differ in some respects (Tolchinsky Landsmann, 2003), and the result can be explained by several hypotheses. First, the result could depend on the fact that the conventional use of numbers appears developmentally earlier and more frequently than the conventional use of letters (Yamagata, 2007). Alternatively, the invented spelling tasks might have been too easy for the children, as shown by the means in both conditions, which might have led to an underestimation of the benefit of the treatment. Or, the result might depend on the design of the invented spelling of numbers task: we assigned a global score to children's performance, and such a score may not capture the fact that children knew more numbers at time 2 than at time 1. Future studies should replicate the design of this study with an improved version of the invented spelling of numbers task. Finally, whereas prior studies on emergent literacy interventions have shown that their efficacy is domain-specific (e.g., Lonigan et al., 2013), previous studies showed that broader literacy teaching approaches provided by parents (e.g., direct teaching of literacy skills) also promoted counting skills (e.g., LeFevre et al., 2009; Anders et al., 2012; Manolitsis et al., 2013). Thus, the result might reflect the domain-specific nature of PASSI.

Interestingly, the efficacy of PASSI was higher for growth in knowledge of letters and orthographic awareness, relevant skills that develop before the onset of formal literacy instruction and are connected to reading and spelling acquisition (Silva and Martins, 2003; Treiman et al., 2007a,b, 2015, 2016; Ouellette and Sénéchal, 2008; Rieben et al., 2009; Puranik et al., 2011; Levin and Aram, 2013). Knowledge of the alphabet was measured through a letter recognition task rather than a letter writing task. Prior studies have emphasized the importance of letter writing for later spelling acquisition (Puranik et al., 2013); however, in this study we were interested in children's emerging knowledge about letters. Note that this recognition task places different cognitive demands on children than the other tasks, which involve writing, and this difference may influence the results. Rather than a related skill, orthographic awareness represents an important component of children's conceptual knowledge of the writing system, which systematically interacts with phonological awareness in their attempts at spelling (Adams, 1998; Ouellette and Sénéchal, 2008).

Conversely, PASSI did not contribute to improving children's drawing or visual-motor integration skills, which confirms the specificity of this intervention and emphasizes that, developmentally, drawing and writing are already separate domains for the child (Tolchinsky Landsmann and Karmiloff-Smith, 1992). Notably, children's visual-motor integration skills were associated with growth in orthographic awareness, confirming the involvement of domain-general skills in the application and execution of knowledge about phonological-orthographic connections (Pinto and Camilloni, 2012; Read and Treiman, 2013).

For these reasons, the results from this study are in accordance with Levin and Aram's (2013) criticism of prior invented spelling interventions that used developmentally tailored designs (Silva and Martins, 2003; Ouellette and Sénéchal, 2008), which could constrain children's progress. Concerning the role played by children's visual-motor skills, while our data confirm the involvement of this bottom-up construct in children's emergent literacy skills (Pinto and Camilloni, 2012), the fact that group explained most of the variance in the dependent variables, even after controlling for children's visual-motor integration skills, confirms that children rely on other sources of information when "inventing spelling," namely, their knowledge of the structure of the writing system.

## Limitations and Directions for Future Research

Although children's emergent attempts at spelling must be considered in light of the characteristics of their writing system (Levin and Aram, 2013; Read and Treiman, 2013) and its transparency (Ziegler et al., 2010), PASSI might also be effective in other languages. Previous studies have confirmed the efficacy of invented spelling interventions in both transparent (e.g., Portuguese; Silva and Martins, 2003) and opaque orthographies (e.g., French; Ouellette and Sénéchal, 2008; Rieben et al., 2009; or Hebrew; Levin and Aram, 2013). We speculate that the embedded component of PASSI, in which children's spontaneous interactions with the symbolic systems in their environment are emphasized and supported, might be cross-linguistically similar, whereas the explicit component of PASSI, which includes children's engagement with grapheme-phoneme correspondences, might be language-bound and should be adapted when used in different contexts. Future studies should test these speculations on the generalizability of PASSI to other languages.

Similar to other studies on invented spelling interventions (Silva and Martins, 2003; Ouellette and Sénéchal, 2008; Rieben et al., 2009; Levin and Aram, 2013), the efficacy of PASSI on conceptual knowledge of the writing system was only moderate. This similarity in results suggests that children's conceptual knowledge of the writing system is difficult to enhance through intervention. One reason might be the multicomponential nature of this construct: an intervention needs to target each of its components (e.g., phonological awareness, orthographic awareness, and visual-motor integration) as well as the integration among them (e.g., sound-sign mapping). Consequently, students might find it difficult to transfer what was learned during the intervention to other tasks. Indirect support for this comes from the fact that Ouellette and Sénéchal's (2008) intervention did not improve the reading of words not included in the intervention, which suggests that the intervention was not effective in targeting children's grapho-phonemic mapping skills (Levin and Aram, 2013). A second reason might be the difficulty that children face in differentiating between symbolic systems, for example, drawing and writing. Schools frequently adopt a mixed writing and drawing approach in kindergarten, in which children are asked to write a word in response to an associated picture. Such a practice should be reconsidered, given that prior studies have shown that mixing these two systems might delay automaticity in writing (Adi-Japha and Freeman, 2001), a phenomenon that might have contributed to reducing the efficacy of PASSI.

Results from the present study are limited to the tasks adopted to measure emergent literacy skills and to the control variables included, namely drawing and visual-motor integration skills. The effect of PASSI on growth in invented spelling was small for words and non-significant for numbers, but these results might depend on the characteristics of the tasks (e.g., the limited number of items for invented spelling of words, or the absence of a fixed endpoint for invented spelling of numbers). In terms of control variables, future studies should verify whether other developmentally relevant skills moderate the beneficial effects of PASSI (e.g., mental state talk, given its context dependency; Pinto et al., 2016).

In this study, teachers, not students, were randomly assigned to conditions. We statistically controlled for the nesting of data within these clusters through a complex samples GLM approach. However, given the lack of a true experimental control in the clustered design employed in this study, we could only produce evidence supporting the efficacy of the PASSI intervention, which should be confirmed through future studies employing a randomized trial research design.
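Assigning whole classes rather than individual children means that children within a class are not independent observations. A standard way to quantify this cost (not a computation reported in the article) is the Kish design effect, DEFF = 1 + (m − 1) · ICC, where m is the average cluster size and ICC is the intraclass correlation; the clustered sample then carries roughly the information of n / DEFF independent children. In the sketch below, n = 124 follows from df = 122 in the pre-test comparisons, whereas the class size of 20 and the ICC of 0.10 are hypothetical values chosen for illustration, not estimates from this study.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish design effect for (approximately) equal-sized clusters."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_n(n: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is roughly 'worth'."""
    return n / design_effect(mean_cluster_size, icc)


# Hypothetical: 124 children in classes of about 20, ICC = 0.10
print(round(design_effect(20, 0.10), 2), round(effective_n(124, 20, 0.10), 1))
```

Even a modest ICC shrinks the effective sample considerably, which is one reason why clustered designs call for cluster-aware analyses such as the complex samples approach described above.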

### Implications for Applied Practice

Prior studies have emphasized the importance of children's conceptual knowledge of the writing system as a specific predictor of reading and spelling disorders (Bigozzi et al., 2016a). This predictor was assessed in kindergarten, before the onset of formal literacy instruction. This study supports the hypothesis that this important process can be fostered through a targeted intervention. This result has relevant practical implications, given that it suggests that teachers can act on a specific risk factor for reading and spelling disorders while children are still in kindergarten, contributing to preventing these disorders or at least reducing their severity through a primary prevention approach. PASSI is not an intervention promoting early learning to read and spell in a formal manner; rather, it fosters emergent literacy processes associated with literacy acquisition. Thus, PASSI is an "ecological" instrument that can be integrated into daily kindergarten routines. Moreover, it is crucial to implement early interventions supporting children's spelling acquisition, so that decoding and coding processes are automatized before children need them for more complex tasks, such as composing (Pinto et al., 2015b) or reflecting (Bigozzi et al., 2011).

## CONCLUSION

In conclusion, this study confirms the validity of PASSI, an intervention on children's conceptual knowledge of the writing system that targets emergent literacy skills through an embedded-explicit approach (Justice and Kaderavek, 2004) rather than through tailored interventions (Levin and Aram, 2013). This emergent literacy construct is a cognitive precursor of reading and spelling acquisition (Bigozzi et al., 2016a,b; Pinto et al., 2017). PASSI promoted engagement with graphic, orthographic, and numeric signs by emphasizing children's daily self-initiated, naturalistic, and contextualized interactions with oral and written language, in addition to implementing structured, sequenced, and directed instruction targeting specific skills. In this manner, the children developed specific emergent literacy skills, such as knowledge of letters and orthographic awareness, and an integrated conceptual knowledge of the writing system, all of which are significantly associated with later reading and spelling development (Puranik et al., 2011).

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01396/full#supplementary-material

## REFERENCES


primary school students' writing and reading skills. Read. Writ. 28, 1–23. doi: 10.1007/s11145-015-9569-9.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Pinto, Bigozzi, Tarchi and Camilloni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Grapheme-Phoneme Learning in an Unknown Orthography: A Study in Typical Reading and Dyslexic Children

Jeremy M. Law<sup>1,2</sup>\*, Astrid De Vos<sup>2,3</sup>, Jolijn Vanderauwera<sup>2,3</sup>, Jan Wouters<sup>3</sup>, Pol Ghesquière<sup>2</sup> and Maaike Vandermosten<sup>3</sup>

<sup>1</sup> School of Interdisciplinary Studies, University of Glasgow, Glasgow, United Kingdom, <sup>2</sup> Parenting and Special Education Research Unit, KU Leuven, Leuven, Belgium, <sup>3</sup> Laboratory for Experimental ORL, KU Leuven, Leuven, Belgium

In this study, we examined the learning of new grapheme-phoneme correspondences in individuals with and without dyslexia. Additionally, we investigated the relation between grapheme-phoneme learning and measures of phonological awareness, orthographic knowledge, and rapid automatized naming, with a focus on the unique contribution of grapheme-phoneme learning to word and non-word reading achievement. Training of grapheme-phoneme associations consisted of a 20-min training program in which eight novel letters (Hebrew) had to be paired with speech sounds taken from the participant's native language (Dutch). Eighty-four third-grade students, 20 of whom were diagnosed with dyslexia, participated in the training and testing. Our results indicate a reduced ability of dyslexic readers to apply newly learned grapheme-phoneme correspondences when reading words consisting of these novel letters. However, we did not observe a significant independent contribution of grapheme-phoneme learning to reading outcomes. Instead, results from the regression analysis suggest that failure to read may be due to differences in phonological and/or orthographic knowledge rather than to differences in the grapheme-phoneme conversion process itself.

Keywords: dyslexia, literacy, phonological awareness, orthographic knowledge, letter-speech sound learning, grapheme-phoneme-correspondences, artificial script, children

## INTRODUCTION

The ability to encode and decode meaning using a collection of distinct markings, or, simply put, the ability to read and write, is an impressive accomplishment. Unlike spoken language, which most people master without direct instruction, reading and writing require explicit instruction over a period of several years.

Most children acquire adequate grapheme-phoneme knowledge within the first year of reading instruction. However, full automatization and integration of separate phonemes and graphemes into a single audiovisual unit (also known as grapheme-phoneme binding) takes several years of literacy experience and requires the formation of a functional neural circuitry linking the print and spoken language processing networks, which allows a child to emerge as a skilled reader (Blomert and Vaessen, 2009; Blomert, 2011; Holloway et al., 2013; Clayton and Hulme, 2018). Because it has been theorized that a disruption in the development of integrated grapheme-phoneme representations interferes with the acquisition of accurate and/or fluent word reading, our research investigates how these grapheme-phoneme mappings are learned by typical and dyslexic readers and what their unique contribution to reading is.

#### Edited by:

Silvia Brem, Psychiatrische Klinik der Universität Zürich, Switzerland

#### Reviewed by:

Hannah Marie Nash, University of Leeds, United Kingdom

Joana Acha, Basque Center on Cognition, Brain and Language, Spain

> \*Correspondence: Jeremy M. Law jeremy.law@glasgow.ac.uk

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 31 March 2018 Accepted: 18 July 2018 Published: 15 August 2018

#### Citation:

Law JM, De Vos A, Vanderauwera J, Wouters J, Ghesquière P and Vandermosten M (2018) Grapheme-Phoneme Learning in an Unknown Orthography: A Study in Typical Reading and Dyslexic Children. Front. Psychol. 9:1393. doi: 10.3389/fpsyg.2018.01393

In developmental dyslexia, which affects 5–7% of children, learning to read is severely and persistently impaired in the absence of any motivational, perceptual, or environmental explanation (Snowling, 2000). It has been proposed that the poor decoding abilities observed in people with dyslexia stem from a cognitive deficit in the development of, and/or access to, phonological representations, leading to difficulty in processing the sounds of oral language (Snowling, 2000; Boets et al., 2013). The ability to attend to and manipulate speech sounds is essential for the formation and automatization of grapheme-phoneme correspondences, which in turn are the foundation of accurate and fluent decoding. Research across various age groups (Shaywitz et al., 2007) and across languages differing in orthographic transparency (Ziegler and Goswami, 2005) has provided support for the phonological deficit theory of dyslexia. Furthermore, the observation that a phonological deficit exists prior to the onset of formal reading instruction, together with its significant relation to later literacy achievement, has lent support to the phonological deficit's potential causal role in dyslexia (Wagner and Torgesen, 1987; Pennington and Lefly, 2001; Boets et al., 2011; Snowling and Melby-Lervåg, 2016; Law et al., 2017b; but see Castles and Coltheart, 2004, for a critical review of causal evidence based on longitudinal studies).

Although phonological deficits have been shown to account for a significant portion of the variance in the reading performance of dyslexic readers, a significant portion of the variance remains unexplained. To account for this remaining variance, recent studies have begun to explore alternative cognitive variables that may contribute to the literacy difficulties of individuals with dyslexia, such as orthographic processing (Ziegler et al., 2010; Boros et al., 2016), morphological awareness (Law et al., 2015, 2018; Cavalli et al., 2017), or statistical learning (Krishnan et al., 2016; Schmalz et al., 2017; Vandermosten et al., 2018). An additional factor that has been proposed to act as a critical bridge between the previously identified cognitive and behavioral deficits of individuals with dyslexia lies in the formation and integration of grapheme-phoneme mappings. Past research has theorized that the deviant grapheme-phoneme integration of individuals with dyslexia results from phonological problems (Snowling, 2000). However, Blomert and Willems (2010) observed difficulties in grapheme-phoneme integration that were independent of phonological difficulties. Furthermore, brain potential and neuroimaging research comparing the neural activation of dyslexic and typical readers also supports the hypothesis of less integrated grapheme-phoneme representations in poor readers. Such studies have demonstrated that individuals with dyslexia exhibit reduced activity in response to grapheme-phoneme associations in the superior temporal sulcus, a region strongly associated with the neural integration of grapheme-phoneme pairs. However, the same individuals with dyslexia demonstrated adequate knowledge when graphemes and phonemes were presented separately. These results suggest a problem in automatic integration (Froyen et al., 2009; Blau et al., 2010; Blomert, 2011; however, see Nash et al., 2017). In addition, in a study by Froyen et al. (2011), 11-year-old children with dyslexia exhibited brain responses that showed no evidence of letter-speech sound integration and were comparable to the weak association effects found in typical first-grade readers.

Although these studies provide evidence supporting the presence of letter-speech sound learning deficits in dyslexia, they cannot monitor the learning process of integrating graphemes and phonemes, because they use native graphemes and phonemes that have already been practiced for years. When native phonemes and graphemes are used, differences in grapheme-phoneme coupling might be driven by prior letter knowledge and by the amount of practice and reading experience. From this perspective, the studies by Aravena et al. (2013, 2017) and Karipidis et al. (2018) differed from past studies in testing newly learned associations through the use of a novel script. The use of an artificial orthography, in contrast to the subjects' native orthography, allows the initial learning process involved in creating grapheme-phoneme associations to be characterized, rather than merely assessing the product of this learning. Such a novel script also permits control over the conditions under which the associations are formed, such as length of exposure and instruction methodology.

In the studies by Aravena et al. (2013, 2017), children had to learn eight basic grapheme-phoneme correspondences using unfamiliar Hebrew letters and speech sounds derived from the participants' native language (Dutch). Results from the Aravena et al. (2013) study indicated that typical and dyslexic readers mastered basic knowledge of the newly learned grapheme-phoneme associations equally well. However, dyslexic children performed worse on a word reading task within the artificial orthography. These results led Aravena et al. (2013) to conclude that the process of letter-speech sound binding is impaired in dyslexia. In a follow-up study by Aravena et al. (2017), typical readers were found to outperform dyslexic readers in accuracy and speed on a letter–speech sound matching task, thus supporting the initial claim. However, the inconsistency between the studies raises questions about the nature of these observed differences in performance. It could be argued that if a grapheme-phoneme binding deficit is present in readers with dyslexia, it should be observed in the assessment of newly learned letter-speech sound associations in both studies, and not only as a deficit in reading with these newly learned associations, which partly relies on phonological skills, as reported in Aravena et al. (2013).

Building on these concerns, Peterson and Pennington (2015) argued that the interpretation of these results is ambiguous, as they could be a function of the unimodal phonological deficit of individuals with dyslexia rather than of a letter-speech sound binding deficit per se. The cross-modal integration required to form accurate audiovisual representations of letter-speech sound correspondences is complex and involves two very different representational systems, namely phonological and orthographic representations (Litt and Nation, 2014; Clayton and Hulme, 2018). Thus, dyslexic readers' failure to accurately integrate letter-speech sound correspondences could be attributable to a failure of phonological and/or orthographic processes in isolation, or to a problem with the association process itself. For instance, research has demonstrated early problems in phonological processing, before the introduction of print, in children later found to have dyslexia. These early difficulties, measured via letter naming, rapid naming, morphological awareness, and phonological awareness, have been shown to be strong predictors of later reading (Boets et al., 2007; Torppa et al., 2010; Law et al., 2017a,b).

Therefore, any future work exploring letter-speech sound binding deficits would need to control for unimodal phonological processing and orthographic processing across groups. Such a design would aid in reducing any ambiguity and permit an assessment of letter-speech sound binding deficits and their independent contribution to reading (Peterson and Pennington, 2015).

Although Aravena et al. (2017) included measures of phonological awareness as controls, they did not include orthographic processing controls when exploring how letter-speech sound learning contributes to predicting individual differences in reading and spelling ability, an issue the present study sets out to address.

## The Present Study

To address this gap in the literature, the current behavioral study examines letter-speech sound learning in individuals with dyslexia while controlling for unimodal phonological and orthographic processing across groups. The aims of this study were twofold. First, we examined whether letter-speech sound binding deficits of individuals with dyslexia were behaviorally detectable within the initial stages of learning an unfamiliar script. Second, we examined how letter-speech sound learning in individuals with and without dyslexia relates to phonological and orthographic processing, and what its independent contribution to reading outcomes is. Since most of the previous studies reviewed above found letter-sound binding deficits in dyslexic readers (i.e., Blomert, 2011; Aravena et al., 2013, 2017; with the noted exception of Nash et al., 2017), we hypothesized that such deficits would be observable in our dyslexic group. Additionally, we hypothesized that the ability to learn letter-sound connections would be a good predictor of reading, independent of phonological awareness (Blomert and Willems, 2010; Aravena et al., 2017), orthographic skills, and naming speed.

To achieve these ends, this study used the same implicit learning task and assessment procedure as Aravena et al. (2017). Children with dyslexia and typical readers completed a short, computer-game-based training program aimed at teaching eight basic letter-speech sound correspondences in an unfamiliar script (Hebrew), paired with familiar phonemes derived from their native language (Dutch), thus allowing the groups to be compared during the initial phase of learning a novel script. Additionally, an artificial script paradigm controls for differences in previous exposure to the experimental stimuli, thus more closely mimicking the early phase of learning to read, irrespective of prior linguistic knowledge.
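The logic of the artificial-script paradigm can be made concrete with a toy model: eight novel graphemes are paired with familiar native-language phonemes, reading a "word" in the novel script means converting each grapheme to its phoneme in turn, and a letter-speech sound matching trial asks whether a heard sound belongs to a shown letter. The specific letter-sound pairings, the example word, and the function names below are hypothetical; the pairings actually used in the training program are not given in this section, and the real Hebrew letters carry different sounds. Words are represented as grapheme lists in reading order, which sidesteps the right-to-left display of Hebrew characters.

```python
# Hypothetical pairing of eight Hebrew letters with Dutch phonemes (illustrative only).
GRAPHEME_TO_PHONEME = {
    "\u05d0": "a",  # alef
    "\u05d1": "b",  # bet
    "\u05d3": "d",  # dalet
    "\u05d4": "e",  # he
    "\u05db": "k",  # kaf
    "\u05de": "m",  # mem
    "\u05e1": "s",  # samekh
    "\u05d8": "t",  # tet
}


def decode(graphemes):
    """Serial grapheme-phoneme conversion of a word given in reading order."""
    return "".join(GRAPHEME_TO_PHONEME[g] for g in graphemes)


def matching_accuracy(trials):
    """Proportion correct on letter-speech sound matching trials.

    Each trial is (grapheme, heard_phoneme, child_said_match).
    """
    correct = sum(
        (GRAPHEME_TO_PHONEME[g] == heard) == said_match
        for g, heard, said_match in trials
    )
    return correct / len(trials)


print(decode(["\u05db", "\u05d0", "\u05d8"]))
```

The two functions mirror the distinction drawn above: a child can score well on isolated matching trials yet still struggle with `decode`-style word reading, which additionally requires fluent serial conversion and blending.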

## METHODS

## Participants

All children were recruited from the longitudinal study reported by Vanvooren et al. (2017). For this study, a total of 84 third-grade children were assessed. The sample was subdivided based on reading status: 20 participants were dyslexic, while 64 were typical readers matched on chronological age. The mean chronological age of the participants was 8 years and 3 months (range: 7 years 9 months to 8 years 8 months). All participants were native Dutch speakers and had normal non-verbal IQ, that is, a standardized score ≥80 on the Block Design subtest of the Wechsler Intelligence Scale for Children-III (WISC-III-NL; Kort et al., 2005). Based on parental/guardian questionnaires, none of the participants had a history of brain damage, language problems, psychiatric symptoms, or uncorrected visual or auditory problems.

Similar to Vanvooren et al. (2017), dyslexia status was determined based on evidence of persistent and severe literacy deficits, defined as a score below the 10th percentile on the standardized word reading tasks and/or the spelling task at two consecutive test moments (i.e., in second and third grade). Children identified as dyslexic on the basis of a spelling score below the 10th percentile also performed below the 10th percentile in reading at at least one test moment and below the 25th percentile in reading at all test moments. In total, 20 participants in the current sample were classified as dyslexic and 64 as typical readers.
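The classification rule can be written as a small function. This is one possible reading of the criterion, under stated assumptions: each child has a single word-reading percentile and a single spelling percentile per test moment (grade 2 and grade 3); "and/or" is read as persistently low reading or persistently low spelling; and the reading conditions reported for spelling-identified children are treated as additional requirements rather than as a post hoc description. The function name and argument layout are hypothetical.

```python
def is_dyslexic(reading_pct, spelling_pct, cutoff=10, reading_ceiling=25):
    """Classify one child from percentile scores at consecutive test moments.

    reading_pct, spelling_pct: percentiles in test order, e.g. [grade 2, grade 3].
    """
    persistent_reading = all(p < cutoff for p in reading_pct)
    persistent_spelling = all(p < cutoff for p in spelling_pct)
    if persistent_reading:
        return True
    if persistent_spelling:
        # Assumed extra conditions for children identified through spelling:
        # reading below the cutoff at least once and below the 25th percentile throughout.
        return (any(p < cutoff for p in reading_pct)
                and all(p < reading_ceiling for p in reading_pct))
    return False


print(is_dyslexic([5, 8], [40, 50]), is_dyslexic([8, 30], [5, 7]))
```

Writing the rule out this way makes its edge cases explicit, for example a child with persistently poor spelling whose reading rises above the 25th percentile in grade 3 would not be classified as dyslexic under this reading of the criterion.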

## Background Measures

All participants completed a testing battery to provide a better understanding of the cognitive and literacy skills of each group. All tests were administered in a single session. Descriptive statistics and t- and p-values from the independent t-tests for each background measure are given in **Table 2**.

Word reading was assessed through the EMT (Brus and Voeten, 1999). This standardized task requires students to read aloud as accurately and quickly as possible a list of 116 Dutch words of increasing difficulty, printed in four columns. The participants were given 1 min to read as many words as possible. The raw score is calculated as the number of words read correctly. The EMT has been found to be a reliable measure (r = 0.87), as determined through the use of a parallel test method (Brus and Voeten, 1999).

Pseudoword reading was assessed with the Klepel (Van den Bos et al., 1994). Children are instructed to read aloud, as quickly and as accurately as possible, a list of 116 pseudowords that follow Dutch grapheme-phoneme correspondence rules. The raw score is the number of pseudowords read correctly in 2 min. The Klepel is a standardized test with a reported parallel-test reliability of r = 0.91.

Phonological awareness was assessed with a phoneme deletion task that has previously been used in Dutch reading research (Boets et al., 2010) and with other populations of a similar age (Hecht et al., 2001). Children are presented with 28 single-syllable non-words and asked to delete a target phoneme from each. The task consists of two parts. In the first part, 10 items yield a real word after deletion of the target phoneme (e.g., DROOS without /d/; an English equivalent would be DROPE without /d/). In the second part, 18 items yield a non-word after deletion (e.g., WAPT without /t/). There is no time limit. Two practice items precede each part to familiarize the child with the procedure. Each correctly solved item earns one point (maximum = 28). Internal consistency of the test is 0.93 (Evers et al., 2009–2012).

Orthographic knowledge was measured at the start of grade 3 with a pseudo-homophone task (Bekebrede et al., 2010) consisting of two practice items and 70 test items, presented visually. Bekebrede and colleagues reported an internal consistency (Cronbach's alpha) of 0.68. Each item consists of three answer alternatives that are orthographically different but phonologically related (e.g., "voet - voed - foet"; an English equivalent would be "fox - phox - focks"). The child has to select the orthographically correct alternative. The orthographic knowledge score is the number of correct items (maximum = 70).
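To illustrate what an internal-consistency coefficient such as Cronbach's alpha summarizes, here is a minimal sketch that computes alpha from a participants-by-items score matrix using the standard formula α = k/(k - 1) · (1 - Σσ²ᵢ/σ²ₜ). The data layout and the `cronbach_alpha` name are assumptions for illustration; the reported coefficients come from the test publications, not from this code.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a participants x items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used throughout; the ratio is the same with
    sample variances as long as one convention is applied consistently.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))                  # one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Four children, three dichotomously scored (0/1) items.
print(round(cronbach_alpha([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]), 3))  # 0.75
```

Alpha rises when items covary (the total-score variance grows faster than the summed item variances), which is why it is read as internal consistency.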

Rapid automatized naming (RAN) was assessed with four serial naming tasks using five familiar colors, objects, numbers, and letters (van den Bos et al., 2002). According to Velvis (1998), the average test-retest reliability of the battery is 0.74. For each stimulus type, the child is presented with a card showing 50 items arranged randomly in five columns. The four tasks were administered individually in the same fixed order: numbers, letters, pictures, and colors. The child was instructed to name the items as quickly and accurately as possible, and the number of correctly named items per second was recorded. A composite score, RAN-Total, was created by averaging the z-scores of the four naming tasks.
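A minimal sketch of the RAN-Total composite, assuming z-scores are computed against the study sample's own mean and standard deviation (the source does not say whether sample or normative values were used). Because scores are items named per second, higher values mean faster naming, so a higher composite indicates better RAN. The `ran_total` function and the example data are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a list of scores against its own mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ran_total(items_per_second):
    """Average each participant's z-scores across the RAN tasks.

    items_per_second maps a task name to a list of scores (items named
    correctly per second), with participants in the same order in every list.
    """
    z_by_task = [z_scores(scores) for scores in items_per_second.values()]
    return [mean(per_child) for per_child in zip(*z_by_task)]


# Hypothetical scores for three children on the four tasks.
scores = {
    "numbers": [2.0, 1.5, 1.0],
    "letters": [2.2, 1.6, 1.1],
    "pictures": [1.2, 1.0, 0.8],
    "colors": [1.0, 0.9, 0.6],
}
print([round(z, 2) for z in ran_total(scores)])
```

Standardizing before averaging keeps the faster alphanumeric tasks (numbers, letters) from dominating the composite simply because their raw rates are larger.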

## Letter-Speech Sound Learning

This study used the same training and assessment procedure as Aravena et al. (2017). All children completed a short computer-game-based training program aimed at teaching 8 basic letter-speech sound correspondences in an unfamiliar script (Hebrew), which allowed the groups to be compared in the initial phase of learning a novel script. Hebrew letters were used to remove any influence of prior knowledge of, associations with, and experience with Dutch orthography. By adopting an artificial orthography such as the one reported in **Table 1**, a-priori differences in exposure to the experimental stimuli are assumed to be controlled for and ruled out.

The artificial orthography of Aravena et al. (2013, 2017), reported in **Table 1**, consists of eight Hebrew graphemes matched to phonemes from the participants' native language, Dutch. The script comprises four vowels and four consonants, and the graphemes are presented from left to right. Blomert (2011) noted that the quality of audio-visual integration of letter-speech sound correspondences in the brain is reflected in the time course of neural activation of the target units and, at the behavioral level, in response latencies during identification. Therefore, both accuracy and response latency were measured when participants identified the newly learned letter-speech sound correspondences.

#### Training Method

Training was carried out with an interactive computer game in which the 8 new letter-sound couplings were learned. Children had to match a target speech sound, presented through headphones, with the corresponding Hebrew symbol shown inside an animated balloon on the computer screen. The game challenged the children to burst, or pop, the corresponding balloon as quickly and as accurately as possible. A correct match caused the balloon and the surrounding balloons to disappear; if the wrong balloon was popped, no balloons disappeared. This paradigm let children learn the correct couplings through trial and error, thus allowing for implicit learning. When all balloons had been popped, the child moved on to the next "balloon field." Clearing several balloon fields advanced the child to the next level, in which difficulty increased through the addition of extra letters to the balloon field, from 2 up to 8 different letters. The faster the child played, the more points were awarded. Each child was trained for 20 min per session, regardless of the level of difficulty attained.

#### Letter-Speech Sound Identification Within the Artificial Orthography

The Lexy association test was administered directly after training. Each child heard a target Dutch speech sound through headphones and was instructed to select the corresponding Hebrew letter from two alternatives presented on a computer screen (chance level = 50%). All Hebrew letters were the same as those used in the training game. Children were instructed to indicate as quickly and as accurately as possible which sign corresponded to the sound they heard, yielding measures of both accuracy and speed of access to the learned couplings. The maximum score was 56 correct. Response time was recorded in milliseconds for each item, and a Lexy rate score was computed as the average time (ms) per correctly answered item.
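The two Lexy scores can be sketched as follows, assuming each item is logged as a (correct, response time in ms) pair; the log format and the `score_lexy` name are hypothetical. Because each item offers two alternatives, a child responding at random would be expected to score about 28 of 56, so accuracy is only informative well above that level.

```python
from statistics import mean


def score_lexy(trials):
    """Score the Lexy association test.

    trials: list of (correct, rt_ms) pairs, one per item (56 in the study).
    Returns (accuracy, rate): accuracy is the number of correct items and
    rate is the mean response time in ms over correctly answered items only.
    """
    correct_rts = [rt for correct, rt in trials if correct]
    accuracy = len(correct_rts)
    rate = mean(correct_rts) if correct_rts else float("nan")
    return accuracy, rate


# Three example items: the incorrect (slow) response does not enter the rate.
print(score_lexy([(True, 800), (False, 1500), (True, 1000)]))
```

Restricting the rate to correct items keeps fast guesses on wrong answers from making a child look quicker at accessing the learned couplings.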

#### Reading Within the Artificial Orthography

The 3MAST, a reading task in the artificial orthography, was administered after the Lexy association test. It consists of 22 high-frequency Dutch words written in the artificial orthography. Each word consisted of two to four Hebrew letters, and the words were printed in two columns of 11 on a card. The child was given 3 min to read aloud as many words as possible. As for the EMT and Klepel, a rate score (words per second) was calculated from the number of words read correctly within 3 min.
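A minimal sketch of the rate scores for the three timed reading tasks, using the time limits given above. It assumes the full time limit is the denominator even if a child finishes the 22-word 3MAST list early; the source does not say how early finishers were handled. The names are illustrative.

```python
# Time limits (in seconds) as described for each timed reading task.
TIME_LIMIT_S = {"EMT": 60, "Klepel": 120, "3MAST": 180}


def rate_score(n_correct, task):
    """Words read correctly per second within the task's time limit."""
    if n_correct < 0:
        raise ValueError("n_correct must be non-negative")
    return n_correct / TIME_LIMIT_S[task]


# For example, 18 of the 22 3MAST words read correctly gives 0.1 words/s.
print(rate_score(18, "3MAST"))
```

Expressing all three tasks in words per second puts tasks with different time limits on a common scale.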

TABLE 1 | Novel letters from Hebrew alphabet and corresponding Dutch speech sounds.

IPA, International Phonetic Alphabet.

TABLE 2 | Performance and group comparisons on literacy and cognitive tasks.


^Failed Levene's test for equality of variances.

\*Significant p-value after applying the FDR procedure.

<sup>a</sup>Mann-Whitney U test.

<sup>b</sup>Eta-squared (η<sup>2</sup>).

## Statistical Analyses

Statistical analyses were performed with SPSS 20.0 (IBM Corp., 2011). Within each group, the Shapiro-Wilk test (p > 0.05) indicated that all variables were normally distributed except the Lexy accuracy score and the 3MAST rate score. To approach normality, the Lexy accuracy score was transformed with a reflected logarithmic transformation; the transformed scores were normally distributed and were used in the analyses. Because none of the transformations tried normalized the 3MAST rate scores, nonparametric tests were used for analyses involving this variable. Homogeneity of variance was assessed with Levene's test for equality of variances. Groups were compared with independent-samples t-tests for all measures except the 3MAST task, which was compared with a Mann-Whitney U test. To reduce the likelihood of false-positive conclusions, all group comparisons were corrected for multiple testing with the False Discovery Rate (FDR) procedure, a simple sequential Bonferroni-type procedure that has been proven to control the false discovery rate for independent test statistics (Benjamini and Hochberg, 1995). Concurrent relations among orthographic knowledge, phonological awareness, RAN, reading, and the outcome measures of the letter-speech sound learning tasks were evaluated with Pearson and Spearman correlations.
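The FDR correction and the reflected log transformation can be illustrated in plain Python. This is a minimal sketch of the Benjamini-Hochberg step-up procedure and of one common form of the reflected transformation, log10(max + 1 - x); the exact reflection constant used in the study is not reported, so `max + 1` is an assumption, the function names are illustrative, and SPSS was the tool actually used.

```python
import math


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (True = reject H0) in the input order.
    With p-values sorted ascending, find the largest rank i such that
    p_(i) <= (i / m) * q, and reject every hypothesis ranked 1..i.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank                       # keep the largest passing rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= k_max
    return reject


def reflect_log10(scores):
    """Reflected log transform for negatively skewed scores.

    Reflection (max + 1 - x) turns a ceiling-heavy distribution into a
    right-skewed one that log10 can compress; it also reverses direction,
    so higher transformed values mean LOWER original accuracy.
    """
    k = max(scores) + 1
    return [math.log10(k - x) for x in scores]


# Step-up behaviour: 0.042 passes its threshold (5/5 * 0.05), so all five
# hypotheses are rejected even though 0.039 exceeds 3/5 * 0.05.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042]))
```

The step-up search (largest passing rank, then reject everything below it) is what distinguishes this procedure from a per-test threshold. The sign reversal of the reflected scores matters when interpreting correlations that involve the transformed Lexy accuracy.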

To assess the unique variance in word and non-word reading explained by letter-speech sound binding, a series of hierarchical linear regression analyses was conducted across both groups. In each model, word or non-word reading performance was predicted by phonological awareness (PA), orthographic knowledge (Ortho), RAN, and the two outcome measures of letter-speech sound learning and binding: Lexy and 3MAST.

Prior to conducting the hierarchical multiple regressions, assumption checks indicated that the sample size (n = 84) was adequate for the five independent variables included in the analysis (Tabachnick et al., 2001). The assumption of singularity was also met, as none of the independent variables (phonological awareness, orthographic knowledge, RAN, Lexy, and 3MAST) was a combination of the others. The correlations (see **Table 3**) showed that no independent variables were highly correlated, except orthographic knowledge with both phonological awareness and RAN. However, because the variance inflation factors (VIFs) were all within acceptable limits (all VIFs < 1.67), the assumption of no multicollinearity was considered met (Hair et al., 1998). Histograms and P-P plots of the standardized residuals for each model, together with scatter plots, indicated that the assumptions of normality, linearity, and homoscedasticity were satisfied (Hair et al., 1998).
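Each VIF is 1 / (1 - R²ⱼ), where R²ⱼ comes from regressing predictor j on all the other predictors. A minimal NumPy sketch, assuming the predictors are held as columns of a matrix; the simulated data only loosely mimic orthographic knowledge sharing variance with PA and RAN and are not the study's data.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of a predictor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an OLS regression
    (with intercept) of predictor j on all other predictors.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Simulated data: "ortho" shares variance with "pa" and "ran".
rng = np.random.default_rng(0)
pa = rng.normal(size=84)
ran = rng.normal(size=84)
ortho = 0.5 * pa + 0.5 * ran + rng.normal(size=84)
lexy = rng.normal(size=84)
mast = rng.normal(size=84)
v = vif(np.column_stack([pa, ortho, ran, lexy, mast]))
print(v.round(2))
```

A VIF near 1 means a predictor is nearly independent of the others; values well above common cut-offs (often 5 or 10) would signal problematic collinearity.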

## RESULTS

## Differentiating Typical Readers and Individuals With Dyslexia

Consistent with past studies, typical and dyslexic readers differed on all cognitive measures (phonological awareness, orthographic knowledge, and rapid naming) as well as on both literacy measures, as reported in **Table 2**.

\*p < 0.05.

\*\*p < 0.01.

\*\*\*p < 0.001.

<sup>∧</sup> Spearman correlations.

To examine whether the brief letter-speech sound training could differentiate participants with and without dyslexia, the groups were compared on all measures related to letter-speech sound learning. Mean scores and test statistics for each comparison are also reported in **Table 2**. Both groups performed equally well on the speed and accuracy measures of novel letter-sound identification (Lexy; ps > 0.400). In contrast, children with dyslexia on average underperformed when applying the newly learned script in the 3MAST word reading task (p < 0.001). The eta-squared effect size indicated that nearly 17% of the variability in reading rate in the artificial orthography could be accounted for by reading status.
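The source does not state which formula produced the reported eta-squared values. As an illustration only, here is a minimal sketch of two common conversions: η² = t²/(t² + df) for an independent-samples t-test, and r² = z²/N (from r = z/√N) as a frequently used analogue for a Mann-Whitney comparison such as the 3MAST. Which of these the study applied is an assumption, and the function names are hypothetical.

```python
def eta_squared_from_t(t, df):
    """Eta-squared for an independent-samples t-test: t^2 / (t^2 + df).

    For two groups this equals SS_between / SS_total, i.e. the proportion of
    score variance accounted for by group membership.
    """
    return t ** 2 / (t ** 2 + df)


def r_squared_from_z(z, n_total):
    """A common effect size for a Mann-Whitney U test: r = z / sqrt(N), squared."""
    return z ** 2 / n_total


# Check against a direct SS decomposition for two tiny groups:
# groups [1, 2, 3] and [4, 5, 6] give SS_between = 13.5, SS_total = 17.5,
# and a pooled-variance t of sqrt(13.5) with df = 4.
print(round(eta_squared_from_t(13.5 ** 0.5, 4), 4))
print(round(13.5 / 17.5, 4))
```

Both printed values agree, which illustrates why eta-squared can be read as "proportion of variance explained by reading status".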

#### Correlation Analysis

To determine the concurrent relations between measures of letter-speech sound learning within the artificial orthography, phonological awareness, orthographic knowledge, RAN, word and non-word reading, Pearson and Spearman correlations were calculated and are reported in **Table 3**.

As expected, phonological awareness, orthographic knowledge, and RAN correlated significantly with both word and non-word reading. The 3MAST measure of reading in the artificial orthography correlated significantly with Lexy accuracy and rate, that is, with letter knowledge in the novel script. The 3MAST also correlated significantly with phonological awareness, orthographic knowledge, and word and pseudoword reading, but not with RAN. The Lexy scores did not correlate with any of the assessed variables except the 3MAST. When IQ and age were controlled for across all participants to rule out spurious effects, all relations were maintained.
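Controlling for IQ and age in a correlation amounts to a partial correlation. A minimal NumPy sketch of the Pearson version via residualization, on simulated data with hypothetical variable names; for the Spearman correlations, one common approach is to rank-transform all variables first, which is an assumption here, since the source does not describe its procedure.

```python
import numpy as np


def residualize(v, covariates):
    """Residuals of v after OLS regression (with intercept) on the covariates."""
    A = np.column_stack([np.ones(len(v)), covariates])
    beta, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ beta


def partial_corr(x, y, covariates):
    """Pearson partial correlation of x and y controlling for covariates."""
    rx = residualize(np.asarray(x, dtype=float), covariates)
    ry = residualize(np.asarray(y, dtype=float), covariates)
    return float(np.corrcoef(rx, ry)[0, 1])


# Simulated example: x and y are related only through a shared covariate.
rng = np.random.default_rng(0)
n = 5000
iq, age = rng.normal(size=n), rng.normal(size=n)
x = iq + rng.normal(size=n)
y = iq + rng.normal(size=n)
covs = np.column_stack([iq, age])
print(round(float(np.corrcoef(x, y)[0, 1]), 2), round(partial_corr(x, y, covs), 2))
```

Here the raw correlation is substantial but the partial correlation is near zero, the pattern a spurious, covariate-driven relation would show; the study reports the opposite pattern (relations maintained).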

#### Predicting Individual Differences in Literacy Achievement

To test the unique variance in reading explained by each predictor, a series of hierarchical multiple regression analyses was conducted with word reading and non-word reading as the dependent variables. In each analysis, all independent variables except the target variable were entered at step one, and the target variable was entered at step two. This allowed us to assess the unique contribution of each independent variable to individual differences in word and non-word reading over and above the other variables. Inter-correlations among the regression variables are reported in **Table 3**, and the regression statistics in **Table 4**.
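The two-step procedure can be sketched with ordinary least squares: fit the model without the target predictor, refit with it, and test the R² change. This is a minimal NumPy sketch under stated assumptions: predictors are passed as a dictionary of arrays, the helper names are hypothetical, and the simulated data do not reflect the study's values.

```python
import numpy as np


def r_squared(y, X):
    """R^2 of an OLS regression of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def unique_contribution(y, predictors, target):
    """Two-step hierarchical regression for one target predictor.

    Step 1 enters every predictor except `target`; step 2 adds `target`.
    Returns (R^2 change, F change, df1, df2).
    """
    controls = [v for name, v in predictors.items() if name != target]
    X1 = np.column_stack(controls)
    X2 = np.column_stack(controls + [predictors[target]])
    r2_step1, r2_step2 = r_squared(y, X1), r_squared(y, X2)
    df1, df2 = 1, len(y) - X2.shape[1] - 1
    delta = r2_step2 - r2_step1
    f_change = (delta / df1) / ((1.0 - r2_step2) / df2)
    return delta, f_change, df1, df2


# Simulated data: reading depends on ortho and PA but not on RAN.
rng = np.random.default_rng(1)
n = 500
preds = {name: rng.normal(size=n) for name in ("pa", "ortho", "ran")}
reading = 0.8 * preds["ortho"] + 0.3 * preds["pa"] + rng.normal(size=n)
for name in preds:
    d, f, df1, df2 = unique_contribution(reading, preds, name)
    print(f"{name}: R2 change = {d:.3f}, F({df1}, {df2}) = {f:.2f}")
```

Rotating which predictor is entered last gives each variable's unique contribution over and above all the others, as described above.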

The full model of PA, orthographic knowledge, RAN, Lexy accuracy and 3MAST predicting word reading was statistically significant, R<sup>2</sup> = 0.770, F(4, 76) = 63.693, p < 0.001, adjusted R<sup>2</sup> = 0.758. Similarly, the full model predicting non-word reading was statistically significant, R<sup>2</sup> = 0.688, F(4, 76) = 41.814, p < 0.001, adjusted R<sup>2</sup> = 0.671.

Adding PA to the prediction of word reading, after controlling for the variance explained by the other independent variables, produced a statistically significant increase in R<sup>2</sup> (R<sup>2</sup> change = 0.012, F(1, 76) = 4.065, p = 0.047). Orthographic knowledge and RAN likewise contributed uniquely to word reading, with R<sup>2</sup> change = 0.177, F(1, 76) = 58.655, p < 0.001 and R<sup>2</sup> change = 0.054, F(1, 76) = 17.721, p < 0.001, respectively.

The hierarchical multiple regression analysis with non-word reading as the dependent variable showed that PA [R<sup>2</sup> change = 0.038, F(1, 76) = 9.357, p = 0.003] and orthographic knowledge [R<sup>2</sup> change = 0.150, F(1, 76) = 36.549, p < 0.001] each produced a statistically significant increase in R<sup>2</sup> when entered as the target variable. RAN-Total, however, did not offer an additional contribution [R<sup>2</sup> change = 0.010, F(1, 76) = 2.464, p = 0.121].
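The model statistics reported above follow from standard OLS formulas, so recomputing them is a quick consistency check. This is a minimal sketch, assuming k = 4 predictors in the full model (the value implied by F(4, 76) for word reading) and 76 residual degrees of freedom. Because the reported R² values are rounded to three decimals, recomputed F values match the reported ones only to within a few tenths. The function names are illustrative.

```python
def overall_f(r2, k, df_resid):
    """F statistic for the full regression model with k predictors."""
    return (r2 / k) / ((1.0 - r2) / df_resid)


def adjusted_r2(r2, k, df_resid):
    """Adjusted R^2, with n recovered as df_resid + k + 1."""
    n = df_resid + k + 1
    return 1.0 - (1.0 - r2) * (n - 1) / df_resid


def f_change(delta_r2, r2_full, df_resid, q=1):
    """F for an R^2 change of q added predictors, tested against the full model."""
    return (delta_r2 / q) / ((1.0 - r2_full) / df_resid)


# Word reading, using the rounded values reported above (k = 4, df = 76).
print(round(overall_f(0.770, 4, 76), 1))     # 63.6; reported F(4, 76) = 63.693
print(round(adjusted_r2(0.770, 4, 76), 3))   # 0.758; reported 0.758
print(round(f_change(0.177, 0.770, 76), 1))  # 58.5; reported 58.655 (orthographic knowledge)
```

The same formulas with R² = 0.688 reproduce the non-word reading values to within rounding.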

Although reading in the artificial orthography, as measured with the 3MAST task, was significantly related to word and non-word reading, adding the 3MAST task to the model showed that reading in the newly learned orthography did not explain any additional variance in word or non-word reading once PA, orthographic knowledge, and RAN were controlled for. Therefore, the results of these analyses do not support our research hypothesis, which predicted a significant independent contribution of our measure of letter-speech sound binding to reading outcomes.

TABLE 4 | Unique variance in word and non-word reading accounted for by measures of letter-speech sound binding (3MASTRate), phonological awareness (PA), orthographic knowledge, and rapid naming (RAN-Total) (R<sup>2</sup> change and standardized Beta).

\*p < 0.05.

\*\*p < 0.01.

\*\*\*p < 0.001.

## DISCUSSION

This study examined letter-speech sound learning using a novel script (Hebrew) paired with speech sounds from the participants' native language (Dutch). Children with dyslexia and chronologically age-matched controls underwent a 20-min training program aimed at teaching them eight basic letter-speech sound correspondences in a novel orthography. Both identification of the learned correspondences and word reading in the novel script were then assessed. Additionally, this study investigated how letter-speech sound learning relates to phonological awareness, orthographic knowledge, and RAN, as well as the combined and unique contributions of these variables to word and non-word reading. In line with past research, phonological awareness, orthographic knowledge, and RAN were related to both word and non-word reading. Additionally, our measure of reading in the artificial orthography correlated significantly with word and non-word reading.

Results from this study demonstrated that most children in both groups were able to master the new letter-speech sound correspondences within the allotted time using our implicit learning game. The groups did not differ significantly on the measure of grapheme-phoneme learning; thus we could not support past findings of differences between children with and without dyslexia when matching, under time pressure, graphemes of an artificial script with phonemes (Aravena et al., 2013, 2017; Karipidis et al., 2018). However, measures of the child's ability to apply these newly learned correspondences to decode words did differ significantly between the reading groups (dyslexic vs. typical readers). These results replicate the finding of Aravena et al. (2013, 2017) of group differences in word reading rate in the novel script (3MAST rate). Aravena et al. (2013) argued that such group differences suggest a difficulty in the early stages of cross-modal integration of phonemes and graphemes in children with dyslexia. Aravena et al. (2013, 2017) further suggested that a deficit in cross-modal integration of phonemes and graphemes is a key contributor to the literacy difficulties of individuals with dyslexia, and that it can be invoked at any time by presenting a novel script, as was done in the 3MAST task. However, our interpretation of these results, and our support for the assertion of Aravena and colleagues, is limited: the 3MAST task was administered directly after a short training period and therefore cannot be considered a direct measure of the integration of letter-speech sound correspondences, which would require a greater amount of training time.
Although the 3MAST task is believed to assess an individual's ability to use newly learned letter-sound correspondences instrumentally, it also relies in part on other processes such as verbal short-term memory and phonological awareness (i.e., phoneme blending). This makes it ambiguous to interpret the 3MAST task as a measure of the early stages of letter-speech sound learning, especially since the expected group differences in the Lexy task were not observed. Additionally, given that performance on the 3MAST task correlated with existing reading ability, the relationship could run in the opposite direction: performance on the 3MAST task could be a consequence of reading impairment or reading experience rather than a cause. The regression analyses showed that our measure of reading in an artificial script made no meaningful contribution to explaining individual differences in word and non-word reading after controlling for phonological awareness and orthographic knowledge, suggesting that the zero-order association between the 3MAST task and reading outcomes reflects unimodal phonological and orthographic processing.

However, because our study involved learning an artificial orthography, it was assumed, as argued by Aravena et al. (2013, 2017), that group differences in reading experience would be controlled for. Given the artificial orthography, a relation between our Dutch measure of orthographic knowledge and reading in the artificial orthography was not expected, yet it was observed. One possible explanation lies in the impact of reading experience on brain development. Neurodevelopmental studies of children and ex-illiterates have demonstrated clear neural changes in response to reading experience (Dehaene et al., 2010; Thiebaut de Schotten et al., 2014). As reading improves, activation increases in the left occipito-temporal region, or visual word form area (Shaywitz et al., 2002). Moreover, learning to read does not merely alter the visual cortex response to written words; it also produces measurable changes in left-hemisphere language areas associated with phoneme perception and articulation (Turkeltaub et al., 2003). Therefore, even though our novel-script reading measure controls for specific past reading experience, differences in brain circuitry arising from individual differences in literacy experience may still manifest as more efficient processing of newly learned orthographic units and/or of the phoneme articulation and blending that reading in the artificial orthography also requires.

Furthermore, Nash et al. (2017) demonstrated that the degree of letter-sound integration in children with dyslexia was appropriate for their reading level, suggesting that compromised letter-speech sound integration may be a function of reading level. In contrast to our design, however, Nash and colleagues used a grapheme-phoneme task based on the participants' native language rather than an artificial orthography, which has been argued to remove the influence of past orthographic knowledge (see also Blomert and Willems, 2010). Thus, the direction of the relationship remains ambiguous, as our research design limits causal interpretation.

## CONCLUSION

The present data cannot directly support the notion of a letter-speech sound integration deficit in dyslexic children. However, the findings do indicate that children with dyslexia are less able to use newly acquired phoneme-grapheme rules to decode words. As discussed above, this is most likely a result of reduced reading experience/level and of differences in the phonological skills required for decoding. Our study could not support letter-speech sound learning as an independent contributor to the reading difficulties of individuals with dyslexia. However, our findings are in line with past research implicating impaired phonological awareness as the source of reading impairment (Ramus et al., 2003; Law et al., 2017a).

In summary, our results suggest that after a short period of training, children with dyslexia can learn to identify letter-speech sound correspondences but fail when applying them. This could be due either to (1) other reading-related processes needed to perform this task, such as phonological skills, RAN, or verbal short-term memory, or specific neurological changes related to increased print exposure, or (2), following the rationale of Aravena et al. (2013, 2017), a lack of full integration of, or access to, the new grapheme-phoneme correspondences, which is essential when these correspondences must be applied in a reading task (see also Van Atteveldt et al., 2004). Based on the evidence provided in this study, and for the reasons outlined in the Discussion, we believe the best explanation for the pattern of results reported here lies with the influence of past reading experience and reading-related processes, not with a lack of full integration of, or access to, the new grapheme-phoneme correspondences.

## ETHICS STATEMENT

This study has been approved by the KU Leuven Research Ethics Committee. Written informed consent was obtained from all parents and/or guardians of each participating child included in the study.

## AUTHOR CONTRIBUTIONS

All listed authors contributed to the editing and preparation of this manuscript. JL was the lead author, responsible for the initial drafts of the manuscript and for conducting and reporting the data analysis. AD, JV, and MV carried out and supervised data collection and subject recruitment. MV was responsible for the design and construction of the study and tasks. In addition to providing support during the writing process, JW and PG offered guidance throughout all stages of the study, from design through data collection and analysis.

## ACKNOWLEDGMENTS

This research was supported by a postdoctoral grant awarded to Maaike Vandermosten by the Research Foundation Flanders (FWO). Additional support was provided by the DBOF of the Research Council of KU Leuven (KU Leuven-DBOF/12/014) and KU Leuven Research Council OT/12/044.

## REFERENCES


imaging study of dyslexic children. Brain 133, 868–879. doi: 10.1093/brain/awp308


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Law, De Vos, Vanderauwera, Wouters, Ghesquière and Vandermosten. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Neurochemistry Predicts Convergence of Written and Spoken Language: A Proton Magnetic Resonance Spectroscopy Study of Cross-Modal Language Integration

Stephanie N. Del Tufo<sup>1,2,3</sup>\*, Stephen J. Frost<sup>3</sup>, Fumiko Hoeft<sup>3,4</sup>, Laurie E. Cutting<sup>1,2,3,5,6</sup>, Peter J. Molfese<sup>3,7</sup>, Graeme F. Mason<sup>8,9</sup>, Douglas L. Rothman<sup>8,10</sup>, Robert K. Fulbright<sup>3,8</sup> and Kenneth R. Pugh<sup>3,8,11</sup>\*

#### Edited by:

*Jurgen Tijms, University of Amsterdam, Netherlands*

#### Reviewed by:

*Milene Bonte, Maastricht University, Netherlands*

*Chris McNorgan, University at Buffalo, United States*

#### \*Correspondence:

*Stephanie N. Del Tufo, stephanie.del.tufo@vanderbilt.edu*

*Kenneth R. Pugh, kenneth.pugh@yale.edu*

#### Specialty section:

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

Received: *31 March 2018* Accepted: *30 July 2018* Published: *04 September 2018*

#### Citation:

*Del Tufo SN, Frost SJ, Hoeft F, Cutting LE, Molfese PJ, Mason GF, Rothman DL, Fulbright RK and Pugh KR (2018) Neurochemistry Predicts Convergence of Written and Spoken Language: A Proton Magnetic Resonance Spectroscopy Study of Cross-Modal Language Integration. Front. Psychol. 9:1507. doi: 10.3389/fpsyg.2018.01507*

*<sup>1</sup> Department of Special Education, Peabody College, Vanderbilt University, Nashville, TN, United States, <sup>2</sup> Vanderbilt Brain Institute, Vanderbilt University School of Medicine, Nashville, TN, United States, <sup>3</sup> Haskins Laboratories, New Haven, CT, United States, <sup>4</sup> Department of Psychiatry, University of California, San Francisco, San Francisco, CA, United States, <sup>5</sup> Peabody College of Education and Human Development, Vanderbilt University, Nashville, TN, United States, <sup>6</sup> Vanderbilt Kennedy Center, Vanderbilt University, Nashville, TN, United States, <sup>7</sup> Section on Functional Imaging Methods, Laboratory of Brain and Cognition, Department of Health and Human Services, National Institutes of Mental Health, National Institutes of Health, Bethesda, MD, United States, <sup>8</sup> Department of Radiology and Biomedical Imaging, Yale University School of Medicine, New Haven, CT, United States, <sup>9</sup> Department of Psychiatry, Yale University School of Medicine, New Haven, CT, United States, <sup>10</sup> Department of Biomedical Engineering, Yale University School of Medicine, New Haven, CT, United States, <sup>11</sup> Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States*

Recent studies have provided evidence of associations between neurochemistry and reading (dis)ability (Pugh et al., 2014). Based on a long history of studies indicating that fluent reading entails the automatic convergence of the written and spoken forms of language, and on our recently proposed Neural Noise Hypothesis (Hancock et al., 2017), we hypothesized that individual differences in cross-modal integration would mediate, at least partially, the relationship between neurochemical concentrations and reading. Cross-modal integration was measured in 231 children using a two-alternative forced-choice cross-modal matching task with three language conditions (letters, words, and pseudowords) and two levels of difficulty within each condition. Concentrations of the neurometabolites choline (Cho), glutamate (Glu), gamma-aminobutyric acid (GABA), and N-acetyl-aspartate (NAA) were then measured in a subset of this sample (*n* = 70) with magnetic resonance spectroscopy (MRS). A structural equation mediation model revealed that the effect of cross-modal word matching mediated the relationship between increased Glu (proposed to be an index of neural noise) and poorer reading ability. In addition, the effect of cross-modal word matching fully mediated a relationship between increased Cho and poorer reading ability. Multilevel mixed-effects models confirmed that lower Cho predicted faster cross-modal matching reaction times, specifically in the hard word condition. These Cho findings are consistent with previous work in both adults and children showing a negative association between Cho and reading ability. We also found two novel neurochemical relationships: lower GABA and higher NAA predicted faster cross-modal matching reaction times. We interpret these results within a biochemical framework in which the ability of neurochemistry to predict reading ability may be at least partially explained by cross-modal integration.
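The mediation logic summarized above can be illustrated with the simplest product-of-coefficients decomposition. This is a minimal OLS sketch on simulated data, not the structural equation model the study fitted; it omits the bootstrap or other inference needed to test an indirect effect, and all variable names (`glu`, `matching`, `reading`) are hypothetical.

```python
import numpy as np


def ols_coefs(y, X):
    """OLS coefficients [intercept, b1, b2, ...] of y on X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def simple_mediation(x, m, y):
    """Product-of-coefficients decomposition for one mediator.

    a  : effect of x on the mediator m
    b  : effect of m on y, controlling for x
    c  : total effect of x on y
    c' : direct effect of x on y, controlling for m
    With OLS on the same cases, c = c' + a * b holds exactly.
    """
    a = ols_coefs(m, x)[1]
    _, c_prime, b = ols_coefs(y, np.column_stack([x, m]))
    c = ols_coefs(y, x)[1]
    return {"a": a, "b": b, "indirect": a * b, "direct": c_prime, "total": c}


# Simulated data (hypothetical): metabolite -> cross-modal score -> reading.
rng = np.random.default_rng(2)
n = 70
glu = rng.normal(size=n)
matching = 0.5 * glu + rng.normal(size=n)
reading = -0.6 * matching + rng.normal(size=n)
effects = simple_mediation(glu, matching, reading)
print({k: round(float(v), 3) for k, v in effects.items()})
```

"Full" mediation, in this framing, corresponds to a direct effect c' that is negligible while the indirect effect a·b carries the total effect.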

Keywords: magnetic resonance spectroscopy (MRS), reading, multisensory, cross-modal, reading disability (RD), developmental dyslexia

## INTRODUCTION

Most children in the United States education system begin learning to read in kindergarten, a process that continues formally in the classroom until they are 10 or 11 years old. Whereas learning to read requires explicit instruction, the ability to perceive and produce one's native language typically develops without instruction. Thus, children begin kindergarten with knowledge of their native speech sounds. Despite these differences in how listening and reading are acquired, it is well established that intact speech perception and production facilitate learning to read (Mattingly, 1971; Liberman, 1973). In fact, fluent reading requires learning the correspondences between letters and speech sounds (Marsh et al., 1981; Frith, 1985). Moreover, auditory and visual letter learning jointly influence each other (Perfetti, 1987). Thus, learning letter-speech sound associations plays a central role in models of reading development (e.g., Ehri and Wilce, 1985; Share and Stanovich, 1995).

## Importance of Cross-Modal Integration for Reading

Information from different sensory modalities (e.g., visual and auditory inputs) must be integrated, assimilated, and organized as intersensory information. Early studies found that auditory-visual integration improves with age and, particularly relevant here, is correlated with reading skills (Birch and Belmont, 1965; see Kavale, 1980, 1982 for meta-analyses). Across 31 studies, Kavale (1980) reported a mean correlation between audio-visual integration and reading ability of r = 0.329 (range: 0.025–0.617). One type of audio-visual integration in particular, that of spoken and written language, has close ties to reading ability. Spoken-written language integration is often considered a separate, or special, type of intersensory "audio-visual" integration (see van Atteveldt et al., 2007; Froyen et al., 2010). This integration of spoken and written language has been shown at the level of the word (Frost et al., 1988), syllable (Massaro et al., 1988), and letter (see Blomert and Froyen, 2010 for review). Furthermore, letter-speech sound integration is often considered an early indicator of developmental reading outcome (see Blomert, 2011 for review).

Across the spectrum of reading ability, the poorest readers are those with a profound reading disability (RD; often referred to as developmental dyslexia) (see Gabrieli, 2009 for brief review). As noted above, learning to read requires that unfamiliar visual symbols (i.e., letters) be associated with familiar auditory sounds (i.e., speech sounds). Information must therefore be integrated both within (i.e., unimodal or intramodal) and between (i.e., bimodal or intermodal) the auditory and visual modalities. RD has historically been characterized by a unimodal impairment: a deficit in phonological awareness, that is, in the use, manipulation, and processing of speech sounds (Bradley and Bryant, 1978; Liberman et al., 1989). However, Birch (1962) posited early on that reading impairment could instead reflect a bimodal impairment, specifically an inability to integrate intersensory information. In support of this idea, some studies have found that individuals with RD do struggle with auditory-visual integration (e.g., Birch and Belmont, 1965; Snowling, 1980; Siegel and Faux, 1989). Others have argued that an impairment in phonological awareness leads to an impairment in auditory-to-visual integration (e.g., Frith, 1985): an individual with RD cannot adequately learn to perceive speech sounds, which makes it difficult to establish robust mappings between speech sounds and letter forms. However, children with RD have shown unimpaired unimodal perception, in the form of auditory-auditory matching (Snowling, 1980; Siegel and Faux, 1989) and visual-visual matching (Maurer et al., 2010), and have demonstrated typical letter mastery (Blomert and Willems, 2010). This indicates that unimodal perception, whether auditory or visual, does not account for the difficulty underlying cross-modal integration. Cross-modal integration therefore provides unique insight into both unimpaired and impaired reading development.

Cross-modal integration begins to develop early. In children with typical reading abilities, electrophysiological responses to printed orthography are seen as early as first grade, when children are typically 6–7 years old (Maurer et al., 2006); this has been suggested to mark the beginning of automatization of the reading system (Chein and Schneider, 2005). Letter-speech sound associations are learned quickly (Ziegler and Goswami, 2005), and the neural responses accompanying these associations are adult-like by second grade (Maurer et al., 2006). Although automatization of this integrated process continues later in development (Booth et al., 2001; Froyen et al., 2008), capturing the early stages of cross-modal neural responses requires studying cross-modal interaction during early elementary school. Whereas typically developing children quickly learn the relationship between auditory and visual letters, the trajectory is less straightforward for poorer readers and children with RD. On cross-modal tasks, individuals with RD showed irregular letter-speech sound integration at the beginning of reading development, and this integration remained irregular into adulthood (Blau et al., 2009, 2010). Perhaps most intriguingly, cross-modal integration in reading-disabled children was found to decrease over the course of reading instruction, suggesting an entirely different cross-modal developmental trajectory in typical readers than in readers with reading impairments (see Blomert, 2011 for review).

## Links Between Neurochemistry and Reading

There is ample evidence for a biological basis of reading ability and disability, yet the exact biological mechanisms remain unknown. One method that holds promise for understanding these mechanisms is proton (¹H) magnetic resonance spectroscopy (MRS), a non-invasive technique that measures biochemical resonances to determine neurometabolite concentrations in vivo. Across developmental disorders, neurometabolite concentrations have been found to differ from those of typically developing age-matched peers (Perlov et al., 2009; Baruth et al., 2013). Thus far, only a limited number of studies have investigated the relationship between neurometabolite concentrations and reading (see Del Tufo and Pugh, 2012 for review). In adults, choline (Cho) levels were higher in those with poorer phonological ability (Bruno et al., 2013) and higher in individuals with RD than in their typically developing peers (Rae et al., 1998; Laycock et al., 2008); see **Table 1**: ¹H-MRS Findings in Reading and Reading Disability. In an initial study from our group, Pugh and colleagues explored this relationship in emergent readers, establishing that higher levels of Cho measured in a midline occipital region were negatively correlated with children's reading ability. Moreover, Pugh et al. (2014) found that reading skill is negatively correlated with glutamate (Glu), a neurometabolite that is involved in a large number of neuronal metabolic pathways and can serve as an index of system excitability.

Hancock et al. (2017) recently proposed a "Neural Noise Hypothesis of Developmental Dyslexia." In brief, the hypothesis holds that increased neural excitability, which produces neural noise in cortical networks, is a key contributor to RD. Here, "neural noise" refers to random variability in neuronal firing. Although the underlying biochemical mechanism is not yet fully established, Hancock et al. (2017) offer examples of genetic pathways from two highly replicated dyslexia candidate genes (DCDC2 and KIAA0319), both known to affect neural noise. DCDC2 mutations increase neural noise through a direct effect on glutamatergic signaling and hyperexcitability (Meng et al., 2005; as evidenced by Che et al., 2014, 2016). KIAA0319 mutations disrupt neuronal migration and the formation of local excitatory-inhibitory circuits (Paracchini et al., 2006; Peschansky et al., 2010; Huang and Hsueh, 2015). Hancock et al. (2017) posited that increased neural noise disrupts neural synchronization and precise spike timing, which in turn impairs phonological awareness and, particularly relevant here, multisensory integration. The hypothesis further predicts that impaired multisensory integration may arise from disruptions in visual or auditory sensory areas. Beyond these two points of susceptibility, however, multimodal integration and the coordination of processing across cortical regions are particularly sensitive to the loss of spike-timing precision (Senkowski et al., 2007a,b). In summary, increased neural noise is hypothesized to cause imprecise orchestration of multisensory information, resulting in disrupted multisensory integration.

## The Current Study

Our overarching goal was to determine the relationship between neurometabolite concentrations and cross-modal integration in emergent readers. Based on the "Neural Noise Hypothesis of Developmental Dyslexia" (Hancock et al., 2017), we hypothesized that diminished multisensory integration would correspond to increased Glu levels, a proximal measure of increased glutamatergic signaling and hyperexcitability. Our full sample of emergent readers (n = 231) completed a behavioral cross-modal matching task. After validating this task, we used the subsample of participants who also contributed MRS data (n = 70) to determine whether emergent readers' neurochemistry (Glu, GABA, Cho, NAA) predicted differences in cross-modal matching. Next, we used structural equation modeling (SEM) to determine whether the relationship between emergent readers' neurochemistry and reading ability was mediated by their performance on the cross-modal integration task. Finally, follow-up analyses of our initial SEM mediation model investigated whether children's reading ability was driven by integration of specific cross-modal stimuli and predicted by specific neurometabolite concentrations.

## MATERIALS AND METHODS

### Participants

Researchers obtained parental informed consent and child assent in compliance with Yale University's Human Research Protection Program. Parental report indicated that all children were native speakers of American English with normal or corrected-to-normal vision, normal hearing, and no history of neurological or mood disorders. All children had a performance intelligence quotient (PIQ) within normal limits. Children were recruited through the Yale Reading Center to sample a diverse range of reading ability, from good to impaired. See **Table 2** for participant demographics and descriptive statistics.

Of the 231 participants [132 male, mean (M) age = 8.14 years, standard deviation (SD) = 1.42] who performed the cross-modal matching task, seven failed to respond during the task. An additional two participants failed to complete the word condition, and three additional participants failed to complete the pseudoword condition. All remaining participants scored above chance on the cross-modal matching task (chance = 50% accuracy). Thus, the full-sample analysis of the behavioral cross-modal matching task included 224 children for the letter stimulus condition, 222 children for the word stimulus condition, and 221 children for the pseudoword stimulus condition. Of this full sample, a subset of the participants reported in Pugh et al. (2014) also contributed MRS data. Of those 70 participants (44 male, M age = 7.70 years, SD = 0.71), one failed to respond during the cross-modal matching task. Of the remaining 69 children, one scored below chance in the word stimulus condition and two scored below chance in the pseudoword stimulus condition. Therefore, the subsample analyses included 69 children for the letter stimulus condition, 68 children for the word stimulus condition, and 67 children for the pseudoword stimulus condition. Of those 69 children, three contributed partial metabolite spectra [Cho (n = 67), Glu (n = 66), GABA (n = 69), and NAA (n = 66)] due to poor spectral quality (see MRS methods below for spectral quality details).

#### TABLE 1 | ¹H-MRS findings in reading and reading disability.

*NM, Neurometabolite; SVS, single voxel spectroscopy; RD, Reading Disability; TD, Typically Developing; Cho, Choline; Cr, Creatine; NAA, N-Acetylaspartate; GABA, gamma-Aminobutyric acid; Glu, Glutamate; L, Left; R, Right.* \**Pediatric readers from the NIH MRI Study of Normal Brain Development (http://pediatricmri.nih.gov, release 5).* §*Follow-up assessments took place 24 months after the initial assessment.* \*\**Category Fluency Task: native Japanese speakers had 1 min to write down as many Japanese nouns as possible belonging to each category: animal, fruit, and vehicle.*

#### Cross-Modal Integration Task

A two-alternative forced-choice task was designed to assess auditory-visual cross-modal matching (see **Figure 1**; Shaywitz et al., 2004). On each trial, a picture of an ear appeared in the center of the screen for 1,500 milliseconds (ms), followed by a spoken letter name, word, or pseudoword presented binaurally through headphones (e.g., "B"). The auditory stimulus was followed 1,000 ms later by two visual target stimuli (e.g., "B" and "T"), one offset 10° to the right and the other 10° to the left of center. The two visual stimuli remained on the screen until the child responded or 4,000 ms had passed. An interstimulus interval (ISI) of 1,000 ms immediately followed either the response or the 4,000 ms time-out. The experiment was divided into three blocks, one for each stimulus condition (letters, words, and pseudowords).

Children were instructed to respond as quickly as possible by pressing the button corresponding to the position (right or left) of the visual stimulus that matched the spoken stimulus. To avoid fatigue effects, children received a 1-to-2-min break between experiment blocks. In the first block children heard and saw letters, in the second words, and in the third pseudowords. To ensure that children understood the task directions, they viewed an instructional flipbook and completed practice items immediately before the cross-modal matching task. The flipbook and practice items included trial examples for each of the three conditions, and children received feedback on whether they were correct.

#### TABLE 2 | Demographic and descriptive statistics.

*Sample mean and (standard deviation) are reported. Performance IQ and Full Scale IQ are from the Wechsler Abbreviated Scale of Intelligence (WASI: Wechsler, 1999). Single word and pseudoword reading ability raw scores are reported for both timed (TOWRE: Torgesen et al., 1999) and untimed (Woodcock-Johnson Test of Achievement III; Woodcock et al., 2001) measures.*


#### Stimuli

The letter block included all 26 English letters as stimuli. The word and pseudoword blocks included consonant-vowel-consonant (CVC) stimuli (see **Appendix A** in Supplementary Material for stimuli). Two factors were manipulated within each experiment block. The first was degree of difficulty: all three blocks contained easier (14 stimulus pairs) and more difficult (14 stimulus pairs) visual stimulus pairs to match. Easy stimulus pairs had no overlap in orthography or phonology (e.g., BAM, ROG). Hard stimulus pairs overlapped in phonology (letters) or in both orthography and phonology (words and pseudowords; e.g., BAL, BAF). The second factor was repetition: in each experiment block, stimuli were fully randomized and presented once (first stimulus presentation), then re-randomized and presented a second time (second stimulus presentation).
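The text defines chance as 50% accuracy but does not say how "above chance" was decided for each child. One common option, assumed here rather than taken from the study, is a one-sided exact binomial test. The sketch further assumes 56 trials per block (14 easy plus 14 hard pairs, each presented twice) and finds the smallest number of correct responses that would be significant at alpha = 0.05.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_above_chance(n_trials: int, alpha: float = 0.05, p_chance: float = 0.5) -> int:
    """Smallest number correct whose one-sided binomial p-value is below alpha."""
    for k in range(n_trials + 1):
        if binomial_upper_tail(k, n_trials, p_chance) < alpha:
            return k
    return n_trials + 1  # no achievable score reaches significance


# Assumption: 14 easy + 14 hard pairs, each presented twice, gives 56 trials per block.
N_TRIALS_PER_BLOCK = 2 * (14 + 14)
threshold = min_correct_above_chance(N_TRIALS_PER_BLOCK)
print(f"{threshold}/{N_TRIALS_PER_BLOCK} correct ({threshold / N_TRIALS_PER_BLOCK:.0%}) needed")
```

A stricter criterion like this one demands noticeably more than 29 of 56 correct, which is the bare "more than 50%" rule. The two rules can therefore exclude different children.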

#### Counterbalancing

Two experiment versions (A and B) were created for counterbalancing. Matching targets that formed a "hard" stimulus pair in one version instead formed an "easy" stimulus pair in the other, and vice versa. For example, BAL, BAF (a hard pseudoword pair) in one version would become BAL, MOT (an easy pseudoword pair) in the other. Because only a limited set of letters are (or are not) confusable, counterbalancing was not possible for the letter condition; it therefore applied only to the word and pseudoword conditions.
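The counterbalancing rule can be sketched as follows. A target that receives its hard foil in one version receives its easy foil in the other. The `MatchItem` type, the half-and-half split, and the second item's foils are hypothetical illustrations. Only the BAL, BAF, MOT triple comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Trial = Tuple[str, str, str]  # (target, foil, difficulty)


@dataclass(frozen=True)
class MatchItem:
    target: str     # visual item that matches the spoken stimulus
    hard_foil: str  # foil overlapping in orthography and phonology
    easy_foil: str  # foil with no overlap


def build_versions(items: List[MatchItem]) -> Dict[str, List[Trial]]:
    """Give each target a hard foil in one version and an easy foil in the other."""
    half = len(items) // 2
    versions: Dict[str, List[Trial]] = {"A": [], "B": []}
    for i, item in enumerate(items):
        hard = (item.target, item.hard_foil, "hard")
        easy = (item.target, item.easy_foil, "easy")
        # Hypothetical split: first half is hard in A and easy in B; second half the reverse.
        if i < half:
            versions["A"].append(hard)
            versions["B"].append(easy)
        else:
            versions["A"].append(easy)
            versions["B"].append(hard)
    return versions


items = [
    MatchItem("BAL", "BAF", "MOT"),  # pair taken from the text
    MatchItem("ROG", "ROP", "BIM"),  # hypothetical foils for illustration
]
for name, trials in build_versions(items).items():
    print(name, trials)
```

With this assignment, each version keeps equal numbers of hard and easy pairs, and each target appears at both difficulty levels across the two versions.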

#### Magnetic Resonance Methods

MR spectroscopy was acquired on a 4T Bruker Avance magnetic resonance system. All participants watched a commercially available movie without sound to encourage stillness and relaxation. A spin-echo J-editing acquisition sequence (Rothman et al., 1993) was used to measure the metabolite basis signals for all neurometabolites: edited GABA and non-edited Cho, Cr, NAA, and Glu. A 7-cm ¹H-tuned surface coil was used to increase sensitivity. To position the voxel, gradient-echo scout images were acquired (slice thickness 1.5 mm with no gap; field of view 200 mm, 128 × 128 matrix). The volume of interest was a 3 × 3 × 1.5 cm voxel placed at the midline of the occipital cortex, including the lingual gyrus, calcarine sulcus, and cuneus (see Pugh et al., 2014 for an image of voxel placement and spectra). Eckert et al. (2008) and others (Zangenehpour and Zatorre, 2010; Murray et al., 2016) have shown that activity in this central occipital region correlates with activation in left Heschl's gyrus, results that are consistent with anatomical evidence from non-human primates (see van Wassenhove et al., 2012 for review). The water signal was used to calibrate the pulse power for MR spectroscopy.

#### Quantitative T1 Sequences

Quantitative T1 maps were obtained with rapid inversion-recovery sampling (Mason and Rothman, 2002), optimized for statistical sensitivity (Mason et al., 1997). A B1 map was acquired to correct for surface coil inhomogeneities. Quantitative T1 images were converted to graded segmented images giving percentage gray matter, white matter, and cerebrospinal fluid (Mason and Rothman, 2002). Based on the known dimensions and position of the MRS voxel (see MRS Sequences below), the composition of the MRS voxel was then determined from the segmented images as percentage gray matter, white matter, and cerebrospinal fluid (Mason and Rothman, 2002).

#### MRS Sequences

Shimming was performed using FASTERMAP (Shen et al., 1997). The water signal was suppressed by six applications of a chemical shift selective (CHESS) sequence using a 1,000 Hz offset swept-amplitude pulse. Volume excitation employed a slice-selective Shinnar-Le Roux pulse, followed by a 180° slice-selective pulse. The 3D volume selection was obtained using outer volume suppression and image-selected volume spectroscopy. Volume suppression outside the voxel used an adiabatic full-passage pulse in the x, y, and z directions. A J-editing sequence (Rothman et al., 1993) was used to acquire the GABA resonance. Subspectra with and without editing inversion of the GABA C3 resonance were acquired with 1,024 data points in 410 ms, a 3 s repetition time, and a 68 ms echo time. To eliminate contamination by macromolecules, the DANTE editing pulse, placed symmetrically about the refocusing pulse, was applied at 1.89 and 1.31 ppm on alternate 8-scan blocks (Henry et al., 2001). The total acquisition period was 22 min.
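The acquisition parameters above fix several derived quantities. This worked example assumes complex (quadrature) sampling and a nominal 4.0 T field. The proton gyromagnetic ratio (42.577 MHz/T) is a standard physical constant and is not stated in the text. The transient count is an upper bound that ignores any preparation overhead.

```python
GAMMA_1H_MHZ_PER_T = 42.577  # proton gyromagnetic ratio / (2*pi), MHz per tesla
B0_TESLA = 4.0               # nominal field strength of the 4T system

n_points = 1024              # data points per FID (assumed complex samples)
acq_time_s = 0.410           # acquisition window
tr_s = 3.0                   # repetition time
total_time_s = 22 * 60       # total acquisition period

dwell_time_s = acq_time_s / n_points                 # sampling interval
spectral_width_hz = 1.0 / dwell_time_s               # bandwidth covered by the spectrum
resolution_hz = 1.0 / acq_time_s                     # spacing between spectral points
larmor_mhz = GAMMA_1H_MHZ_PER_T * B0_TESLA           # proton frequency, about 170 MHz
spectral_width_ppm = spectral_width_hz / larmor_mhz  # Hz / MHz = ppm
max_transients = int(total_time_s // tr_s)           # upper bound, ignoring overheads

print(f"dwell {dwell_time_s * 1e3:.4f} ms, SW {spectral_width_hz:.1f} Hz "
      f"({spectral_width_ppm:.2f} ppm), resolution {resolution_hz:.2f} Hz, "
      f"<= {max_transients} transients")
```

Under these assumptions, the spectrum spans roughly 2.5 kHz (about 14.7 ppm) at a point spacing of about 2.4 Hz. This comfortably covers the 1.3 to 3.3 ppm region used for editing and fitting.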

#### Neurometabolite Analyses

Linear combination spectral fitting was applied to the subspectrum obtained with the DANTE pulse at 1.31 ppm to determine the resonance areas of Cho, Glu, NAA, and Cr. This unedited subspectrum was fitted using a basis set of metabolite spectra: aspartate, glutamate, glutamine, N-acetyl-aspartate, N-acetyl-aspartylglutamate (NAAG), creatine, phosphocreatine, myo-inositol, choline, phosphorylcholine, glycerophosphorylcholine, and scyllo-inositol. The metabolite basis signals were measured with the J-editing acquisition sequence, except for NAA and phosphocreatine, which were simulated. Reported NAA is the combination of N-acetyl-aspartate and NAAG, reported Cr the combination of creatine and phosphocreatine, and reported Cho the combination of choline, phosphorylcholine, and glycerophosphorylcholine. Three subjects contributed partial metabolite spectra due to poor spectral quality. Following spectral fitting, a Monte-Carlo analysis was used to assess the uncertainty of individual measurements. The least-squares spectral fits were perturbed with random Gaussian noise whose standard deviation equaled that of the raw data, then refitted 20 times to estimate the SD (uncertainty) of each metabolite measure. No data met the exclusion criterion, namely an SD greater than three times the average SD for the full set of studies (Valentine et al., 2011).
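The Monte-Carlo uncertainty procedure and the exclusion rule can be sketched as follows. This is a minimal illustration, not the study's fitting software. It uses toy Gaussian "basis spectra" instead of measured ones, and it estimates the raw-data noise SD from the fit residuals, which is an assumption. All function names are hypothetical.

```python
import numpy as np


def lcm_fit(basis: np.ndarray, spectrum: np.ndarray) -> np.ndarray:
    """Least-squares amplitudes of a linear combination of basis spectra.

    basis: (n_points, n_metabolites); spectrum: (n_points,).
    """
    amplitudes, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    return amplitudes


def monte_carlo_uncertainty(basis, spectrum, n_reps=20, rng=None):
    """Refit (fit + fresh Gaussian noise) n_reps times; return amplitudes and their SDs."""
    rng = np.random.default_rng() if rng is None else rng
    amplitudes = lcm_fit(basis, spectrum)
    fitted = basis @ amplitudes
    # Assumption: the raw-data noise SD is estimated from the fit residuals.
    noise_sd = np.std(spectrum - fitted, ddof=basis.shape[1])
    refits = np.array([
        lcm_fit(basis, fitted + rng.normal(0.0, noise_sd, size=fitted.shape))
        for _ in range(n_reps)
    ])
    return amplitudes, refits.std(axis=0, ddof=1)


def exceeds_criterion(sds: np.ndarray, factor: float = 3.0) -> np.ndarray:
    """Flag measurements whose SD exceeds `factor` times the mean SD across all studies."""
    return sds > factor * sds.mean()


# Toy example: Gaussian peaks near NAA, Glu, Cr, and Cho positions (ppm).
rng = np.random.default_rng(0)
ppm = np.linspace(0.5, 4.0, 1024)
centers = [2.01, 2.35, 3.03, 3.21]
basis = np.column_stack([np.exp(-((ppm - c) / 0.03) ** 2) for c in centers])
true_amplitudes = np.array([1.5, 0.8, 1.0, 0.3])
spectrum = basis @ true_amplitudes + rng.normal(0.0, 0.05, size=ppm.size)

amplitudes, sds = monte_carlo_uncertainty(basis, spectrum, rng=rng)
print(np.round(amplitudes, 3), np.round(sds, 4))
```

In the study, `exceeds_criterion` would be applied per metabolite to the SDs pooled across all subjects. According to the text, no measurement was flagged.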

GABA in the edited subtraction spectrum was analyzed with in-house software written in MATLAB (www.mathworks.com). Each free induction decay (FID) was phase-locked using the water FID and frequency-aligned using the NAA, Cr, and Cho resonances. Each pair of subspectra (27 per experiment) was subtracted to obtain the FID of the edited GABA signal, which was then apodized. For quality control, subspectrum pairs were excluded if the subtraction spectrum showed residual Cho or creatine intensity (absorptive or dispersive); this criterion also minimizes the effects of motion. The remaining spectra were then combined. The area of the GABA resonance at 3 ppm was determined using automated manual integration following automated baseline correction. GABA was determined in each subject. Two methods were used to evaluate macromolecular contamination: metabolite nulling (Behar et al., 1994; Rothman, 1994; Shen et al., 2004) and frequency switching symmetrically about the coupled macromolecular resonance (Henry et al., 2001). Neither method showed evidence of macromolecular contamination of the GABA resonance.

The area of Cr was used as an internal reference, controlling for potential drifts in the spectra during acquisition (Rothman et al., 1993). Glu, Cho, NAA, and GABA are reported as a ratio of their metabolite resonance area relative to the internal Cr reference, as recommended by Rothman et al. (1993).

#### Statistical Analyses

Multilevel mixed effect models were fitted with maximum likelihood estimation in R (https://www.r-project.org) using the lme4 package (Bates et al., 2015). In all models, subject was specified as a random intercept, which also accounts for the associated intraclass correlation (Pinheiro and Bates, 2000). We used forward-fitting procedures with likelihood ratio tests to select the best-fitting model. Additionally, structural equation models of mediation were fitted using the R lavaan package (Rosseel, 2012), also with maximum likelihood estimation. Standard errors were calculated using bootstrapping procedures.
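Each forward-fitting step compares two nested models fitted by maximum likelihood. The likelihood ratio test behind that comparison is sketched below. The study ran its tests in R; as we understand it, `anova()` on two nested lme4 models reports the same statistic. The log-likelihood values in this example are hypothetical.

```python
from scipy.stats import chi2


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df_diff: int):
    """LR statistic and p-value for nested models fitted by maximum likelihood."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    p_value = chi2.sf(statistic, df_diff)  # upper tail of chi-square with df_diff df
    return statistic, p_value


# Hypothetical log-likelihoods: the full model adds one term with two parameters.
stat, p = likelihood_ratio_test(loglik_reduced=-1520.4, loglik_full=-1512.1, df_diff=2)
print(f"chi2(2) = {stat:.2f}, p = {p:.4f}")
```

In a forward-fitting loop, a term is retained only if this test shows a significant improvement in fit. Terms such as the counterbalanced experiment version (p = 0.60) would therefore be dropped.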

#### Multilevel Mixed Effect Modeling Task Effects

In our initial mixed effect model analysis, we validated the cross-modal matching task. In this model, all effects and their interactions were tested for improvement in model fit. The dependent variable, cross-modal reaction time (CM-RT), was natural-log transformed. After transformation, there was no evidence that CM-RT violated homogeneity of variance across stimulus repetitions [full sample (n = 224): Bartlett's test K²(1) = 1.89, p = 0.17; subset sample (n = 69): Bartlett's test K²(1) = 0.0013, p = 0.97]. Because the easier and more difficult stimulus pairs were created differently within the three stimulus conditions (i.e., letters, words, and pseudowords), we treated degree of difficulty as nested within stimulus condition. Repetition was included as a crossed factor.
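Bartlett's test compares variances across groups; it does not test normality. Here the groups are the first and second stimulus presentations. The sketch below applies it to natural-log reaction times using SciPy's `scipy.stats.bartlett`. The reaction times are simulated and are not the study's data.

```python
import numpy as np
from scipy.stats import bartlett

rng = np.random.default_rng(7)
# Simulated RTs (ms) for first and second presentations; lognormal, so ln(RT) is normal.
rt_first = rng.lognormal(mean=np.log(1500.0), sigma=0.30, size=300)
rt_second = rng.lognormal(mean=np.log(1450.0), sigma=0.30, size=300)

# Natural-log transform, then test equality of variances across the two repetitions.
result = bartlett(np.log(rt_first), np.log(rt_second))
df = 2 - 1  # degrees of freedom = number of groups minus one
print(f"Bartlett K^2({df}) = {result.statistic:.2f}, p = {result.pvalue:.2f}")
```

Bartlett's test is itself sensitive to non-normality. Levene's test (`scipy.stats.levene`) is a common, more robust alternative when the log transform does not fully normalize the data.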

#### Multilevel Mixed Effects Modeling Task Effects Predicted by NT

After validating the cross-modal matching task, we used a second mixed effect model to investigate whether emergent readers' neurochemistry (Glu, GABA, Cho, NAA) predicted differences in cross-modal matching. Before entering them into the models, we computed pairwise correlations among the z-scored concentrations of Glu, Cho, NAA, and GABA. The purpose was to determine whether their overlap was large enough to require a separate model for each neurometabolite. The neurometabolite concentrations were correlated (see **Table 3**) but not so strongly that they eliminated one another's unique contributions. Thus, all neurometabolite concentrations were included as fixed effects in a single model.

#### TABLE 3 | Correlations between neurochemical concentrations.

*Spearman's pairwise correlations with p-values adjusted for multiple comparison (Holm's method). All neurometabolites are referenced to a Creatine (Cr) baseline. Significance:* \**p* < *0.05,* \*\**p* < *0.01,* \*\*\**p* < *0.001.*
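The Table 3 note specifies Spearman pairwise correlations with Holm-adjusted p-values. The sketch below shows that procedure with `scipy.stats.spearmanr` and a hand-written Holm step-down adjustment. The metabolite values are simulated, and the helper names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr


def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, (m - rank) * p[idx])  # enforce monotonicity
        adjusted[idx] = min(1.0, running_max)
    return adjusted


def pairwise_spearman_holm(data):
    """data: mapping of metabolite name -> 1-D array of Cr-referenced values."""
    pairs = list(combinations(data, 2))
    rhos, raw_p = [], []
    for a, b in pairs:
        rho, p = spearmanr(data[a], data[b])
        rhos.append(rho)
        raw_p.append(p)
    return pairs, np.array(rhos), holm_adjust(raw_p)


# Simulated values for 66 children; the shared component induces correlations.
rng = np.random.default_rng(1)
shared = rng.normal(size=66)
data = {name: shared * w + rng.normal(size=66)
        for name, w in [("Cho", 0.5), ("Glu", 0.8), ("GABA", 0.2), ("NAA", 0.6)]}
for (a, b), rho, p_adj in zip(*pairwise_spearman_holm(data)):
    print(f"{a}-{b}: rho = {rho:+.2f}, Holm p = {p_adj:.3f}")
```

Spearman's rho depends only on ranks. Z-scoring the concentrations beforehand therefore leaves these correlations unchanged.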

#### Structural Equation Modeling Mediation

Mediation analyses using structural equation modeling (SEM) determined whether CM-RT mediated the relationship between emergent readers' neurometabolite concentration (a latent variable) and reading ability, a relationship previously reported in Pugh et al. (2014). Mediation models tested this relationship for each cross-modal stimulus condition. Mediation assumes that the mediating variable (CM-RT) causes the outcome variable (reading ability). The initial assumptions of mediation were met (see Baron and Kenny, 1986): (a) neurometabolite concentrations (independent variable: IV) were significantly related to cross-modal integration (mediating variable: MV), and (b) neurometabolite concentrations (IV) significantly predicted reading ability (dependent variable: DV). Because our prior analysis led us to expect individual variation in CM-RT (MV), we used a latent variable approach to mediation (see Hayes, 2009 for review).
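The core quantity in this mediation analysis is the indirect effect, the product of path a (IV to MV) and path b (MV to DV, controlling for the IV). Its standard error is commonly obtained by bootstrapping. The sketch below is a deliberate simplification: it uses observed variables and ordinary least squares instead of the latent-variable SEM fitted in lavaan. The data are simulated and the function names are hypothetical.

```python
import numpy as np


def ols_coefs(y, *predictors):
    """OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def indirect_effect(x, m, y):
    a = ols_coefs(m, x)[1]     # path a: X -> M
    b = ols_coefs(y, m, x)[1]  # path b: M -> Y, controlling for X
    return a * b


def bootstrap_indirect(x, m, y, n_boot=1000, rng=None):
    """Point estimate, bootstrap SE, and 95% percentile CI of the indirect effect a*b."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample participants with replacement
        boots[i] = indirect_effect(x[idx], m[idx], y[idx])
    return indirect_effect(x, m, y), boots.std(ddof=1), np.percentile(boots, [2.5, 97.5])


# Simulated X (neurometabolite), M (CM-RT), Y (reading); true a*b = 0.6 * -0.5 = -0.3.
rng = np.random.default_rng(42)
n = 1000
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)
y = -0.5 * m + 0.1 * x + rng.normal(size=n)
estimate, se, ci = bootstrap_indirect(x, m, y, n_boot=500, rng=rng)
print(f"indirect = {estimate:.3f}, SE = {se:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

As a consistency check on the results reported below, the reported path estimates a = 0.61 and b = −0.47 multiply to about −0.287. This matches the reported indirect effect of −0.29.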

## RESULTS

#### Cross-Modal Matching Task Effects

We examined whether task effects (i.e., stimulus condition, stimulus repetition, and degree of difficulty) predicted CM-RT in the full sample of children who completed the cross-modal matching task (see **Table 4** for CM-RT by stimulus condition). As noted above, this included 224 children for the letter stimulus condition, 222 children for the word stimulus condition, and 221 children for the pseudoword stimulus condition. Multilevel mixed effect models were employed to examine individual differences in emergent readers' cross-modal matching. CM-RT showed significant variance in intercepts across participants and significant variance in slopes across stimulus conditions, χ²(5) = 1464.4, p < 0.001. Thus, in addition to subject as the random intercept, stimulus condition was included as a random slope. The best-fitting model included the fixed effects of stimulus condition (letters, words, and pseudowords), degree of difficulty (easy and hard stimulus pairs) nested within stimulus condition, and the two-way interaction of stimulus condition by repetition (first and second stimulus presentation), χ²(16) = 48.52, p < 0.001, marginal R² = 0.18, conditional R² = 0.89. Model fit did not improve with the inclusion of the counterbalanced experiment version factor (A and B; p = 0.60) or of the three-way interaction of stimulus condition by repetition by degree of difficulty (p = 0.49).

#### TABLE 4 | CM-RT by stimulus condition.

*Includes only those who scored above chance. Sample mean (M) and standard deviation (SD) are reported. Reaction time was measured in milliseconds (ms).*
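The text does not say how the marginal and conditional R² values were computed. Assuming the widely used definitions of Nakagawa and Schielzeth (2013) for linear mixed models, let σ²_f be the variance of the fixed-effect predictions, σ²_α the variance attributable to the random effects, and σ²_ε the residual variance. Then:

$$
R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
R^2_{\text{conditional}} = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}
$$

Under this assumption, the gap between the reported values (0.89 − 0.18 = 0.71) is the share of CM-RT variance attributed to the random subject intercepts and stimulus-condition slopes rather than to the fixed task effects. Because the model includes random slopes, σ²_α must be averaged over the design rather than read off a single variance component.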

As expected, there was a significant effect of stimulus condition, F(2, 220.39) = 192.74, p < 0.001. Bonferroni post-hoc tests confirmed that CM-RT for the letter stimulus condition was faster than CM-RT for both the word stimulus condition [b = −0.24, SE = 0.017, t(219.94) = 13.99, p < 0.001] and the pseudoword stimulus condition [b = −0.34, SE = 0.02, t(218.95) = 19.49, p < 0.001]. CM-RT for the word stimulus condition was also faster than for the pseudoword stimulus condition [b = −0.11, SE = 0.011, t(217.95) = 9.50, p < 0.001]. Thus, CM-RT was fastest for letters, intermediate for words, and slowest for pseudowords. Nested within stimulus condition, there was a significant effect of degree of difficulty, F(3, 1823.14) = 119.99, p < 0.001. This effect was driven by faster CM-RT for easy than for hard stimuli in all three stimulus conditions: letter [b = −0.02, SE = 0.009, t(1823.14) = 2.22, p < 0.05], word [b = −0.11, SE = 0.009, t(1823.14) = 11.70, p < 0.001], and pseudoword [b = −0.14, SE = 0.009, t(1823.14) = 14.77, p < 0.001] (**Figure 2A**). Repetition was not included as a main effect because adding it alone did not improve model fit (p = 0.21), but there was an interaction of stimulus condition by repetition, F(2, 1872.41) = 16.39, p < 0.001. In the letter condition, CM-RT was faster for the first than for the second stimulus presentation [b = 0.05, SE = 0.01, t(1864.87) = 4.71, p < 0.001]. Conversely, in the word [b = −0.04, SE = 0.01, t(1879.21) = 4.63, p < 0.001] and pseudoword [b = −0.02, SE = 0.01, t(1876.73) = 2.36, p < 0.05] conditions, CM-RT was faster for the second than for the first presentation (**Figure 3A**). The repetition effect in the letter condition is therefore reversed relative to the word and pseudoword conditions, which explains why repetition had no main effect. This pattern suggests that children benefit from stimulus repetition only in the word and pseudoword stimulus conditions.
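Because CM-RT was natural-log transformed, each contrast coefficient b is a difference on the ln(RT) scale. On that assumption, exp(b) is a ratio of geometric-mean reaction times, and (exp(b) − 1) × 100 is a percent change. The sketch below converts the full-sample stimulus-condition contrasts reported above.

```python
import math


def log_coef_to_percent(b: float) -> float:
    """Percent change in RT implied by a coefficient b on the natural-log RT scale."""
    return (math.exp(b) - 1.0) * 100.0


# Full-sample stimulus-condition contrasts reported above (ln-RT scale).
contrasts = {
    "letter vs word": -0.24,
    "letter vs pseudoword": -0.34,
    "word vs pseudoword": -0.11,
}
for label, b in contrasts.items():
    print(f"{label}: b = {b:+.2f} -> {log_coef_to_percent(b):+.1f}% RT")
```

Read this way, letter matching was roughly 21% faster than word matching and about 29% faster than pseudoword matching. Word matching was about 10% faster than pseudoword matching.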

#### Neurometabolite Concentrations Predict Cross-Modal Matching

Next, we examined the effect of neurometabolite concentrations on CM-RT. This analysis included only the subsample, that is, the children who scored above chance on the cross-modal matching task and contributed MRS data: Cho (n = 67), Glu (n = 66), GABA (n = 69), and NAA (n = 66). This comprised 69 children for the letter stimulus condition, 68 children for the word stimulus condition, and 67 children for the pseudoword stimulus condition.

CM-RT showed significant variance in intercepts across participants and significant variance in slopes across stimulus conditions, χ²(5) = 1349.45, p < 0.001. The best-fitting model included the following fixed effects: stimulus condition (letters, words, and pseudowords), neurometabolite concentrations (GABA and NAA), degree of difficulty (easy and hard stimuli) nested within stimulus condition, the two-way interaction of stimulus condition by neurometabolite concentrations, and the three-way interaction of Cho concentration by degree of difficulty nested within stimulus condition, χ²(25) = 80.07, p < 0.001, marginal R² = 0.30, conditional R² = 0.85. As in the full-sample analysis, there was a significant effect of stimulus condition, F(2, 64.10) = 55.85, p < 0.001. Bonferroni post-hoc tests confirmed that CM-RT for the letter stimulus condition was faster than CM-RT for both the word stimulus condition [b = −0.21, SE = 0.027, t(62.79) = 7.90, p < 0.001] and the pseudoword stimulus condition [b = −0.32, SE = 0.03, t(59.61) = 10.35, p < 0.001]. There was also a significant effect of degree of difficulty nested within stimulus condition, F(3, 499.97) = 26.15, p < 0.001. This effect was driven by faster CM-RT for easy than for hard stimuli in the word [b = 0.099, SE = 0.02, t(499.97) = 5.12, p < 0.001] and pseudoword [b = 0.14, SE = 0.02, t(499.97) = 7.21, p < 0.001] stimulus conditions (**Figure 2B**). There was no main effect of repetition, and adding the interaction of stimulus condition by repetition did not improve model fit (**Figure 3B**).

There were effects of GABA [F(1, 65.82) = 10.39, p < 0.01] and NAA [F(1, 66.30) = 8.62, p < 0.01] on CM-RT, with lower GABA and higher NAA concentrations predicting faster CM-RT (**Figure 4**). Moreover, there was a significant two-way interaction of stimulus condition by GABA [F(2, 62.04) = 3.57, p < 0.05], driven by the word stimulus condition [b = 0.08, SE = 0.033, t(61.40) = 2.53, p < 0.05] (**Figure 5**). These interactions again provide evidence, at least for the word condition, that lower GABA and higher NAA concentrations predict faster CM-RT. Additionally, there was a three-way interaction of stimulus condition by degree of difficulty by Cho [F(3, 499.97) = 2.86, p < 0.05]. This interaction was driven by the word condition [b = 0.04, SE = 0.019, t(161.97) = 2.28, p < 0.05] but not by the pseudoword (p = 0.85) or letter (p = 0.93) stimulus conditions (**Figure 6**). Therefore, CM-RT in the hard word condition was faster for children with lower Cho concentrations.

## Cross-Modal Matching Mediates the Effect of Neurometabolite Concentrations

Given that neurometabolite concentrations have previously been shown to have a negative association with reading ability (Pugh et al., 2014), we employed a mediation approach using SEM to test whether this relationship between neurometabolite concentration and reading ability was statistically mediated by cross-modal integration. Three latent variables were created. The latent variable (1) Reading Ability (RA) predicted Word Identification and Word Attack subtest scores (WJ-III: Woodcock et al., 2007), as well as Sight Word Efficiency and Phonemic Decoding Efficiency subtest scores (TOWRE: Torgesen et al., 1999). The latent variable (2) CM-RT predicted repetition 1 and repetition 2. Finally, the latent variable (3) neurometabolite predicted concentrations of Cho, Glu, GABA, and NAA. In addition, degree of difficulty was included as a categorical variable predicting CM-RT. The mediation model was a good fit [Maximum Likelihood χ<sup>2</sup> = 93.09, CFI = 0.937, RMSEA = 0.114 (90% CI: 0.083, 0.145), SRMR = 0.082; Robust (R.) Maximum Likelihood χ<sup>2</sup> = 91.426, R.CFI = 0.938, R.RMSEA = 0.113 (90% CI: 0.082, 0.144); and the scaling factor for the Yuan-Bentler correction was 1.018]. The initial assumptions of mediation were met. There was a direct effect of increased neurometabolite concentration on slower CM-RT (path a: b = 0.61, SE = 0.26, z = 2.31, p < 0.05), as well as an effect of more difficult stimuli leading to slower CM-RT (b = 0.41, SE = 0.20, z = 1.98, p < 0.05). There was also a direct effect of faster CM-RT leading to better reading performance (path b: b = −0.47, SE = 0.09, z = 5.28, p < 0.001). When the indirect pathway was included (path ab: b = −0.29, SE = 0.14, z = 2.05, p = 0.040), the direct pathway from neurometabolite concentration to reading ability (path c: b = −0.76, SE = 0.32, z = 2.39, p = 0.017) was no longer significant (path c-prime: p = 0.23). The statistical mediation model indicates that individual differences in neurometabolite concentration influenced CM-RT, which in turn influenced reading ability.
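The indirect (path ab) estimate reported above is consistent with the product-of-coefficients approach, in which the indirect effect equals a × b (0.61 × −0.47 ≈ −0.29). The sketch below multiplies the reported path estimates and computes a first-order (Sobel, delta-method) standard error, assuming the covariance between a and b is zero. It works from rounded published coefficients and is not the authors' SEM code, so its SE and z need not match the reported values exactly.

```python
import math


def indirect_effect(a: float, b: float) -> float:
    """Product-of-coefficients indirect effect (path a times path b)."""
    return a * b


def sobel_se(a: float, se_a: float, b: float, se_b: float) -> float:
    """First-order (Sobel) standard error of a*b, assuming cov(a, b) = 0."""
    return math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Path estimates for the overall model, as reported in the text.
a, se_a = 0.61, 0.26   # neurometabolite -> CM-RT
b, se_b = -0.47, 0.09  # CM-RT -> reading ability

ab = indirect_effect(a, b)
se_ab = sobel_se(a, se_a, b, se_b)
z = ab / se_ab
print(f"ab = {ab:.2f}, SE = {se_ab:.2f}, z = {z:.2f}, p = {two_sided_p(z):.3f}")
```

The Sobel test treats the sampling distribution of a × b as normal; bootstrap confidence intervals are a common alternative when that assumption is doubtful in small samples.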

#### Mediation Analyses by Stimulus Condition


Mediation models were then used to examine whether the relationship between neurometabolite concentration and reading ability was mediated by CM-RT in each stimulus condition separately. Two latent variables were created. The latent variable (1) Reading Ability (RA) predicted Word Identification and Word Attack subtest scores (WJ-III: Woodcock et al., 2007), as well as Sight Word Efficiency and Phonemic Decoding Efficiency subtest scores (TOWRE: Torgesen et al., 1999). The latent variable (2) neurometabolite predicted concentrations of Cho, Glu, GABA, and NAA. Degree of difficulty was included as a categorical variable predicting CM-RT.

#### **Letter condition**

The mediation model for the letter condition was a very good fit [Maximum Likelihood χ<sup>2</sup> = 57.59, CFI = 0.97, RMSEA = 0.073 (90% CI: 0.040, 0.104), SRMR = 0.042; R. Maximum Likelihood χ<sup>2</sup> = 56.88, R.CFI = 0.97, R.RMSEA = 0.073 (90% CI: 0.039, 0.104); and the scaling factor for the Yuan-Bentler correction was 1.013]. For the letter condition, there was a direct effect of faster CM-RT leading to better reading performance (path b: b = −0.34, SE = 0.08, z = 4.09, p < 0.001). However, there was no direct effect of neurometabolite concentration on CM-RT (path a: p = 0.30), nor was there an effect of stimulus difficulty (p = 0.85). Therefore, the mediating role of cross-modal matching was not driven by the letter condition.

#### **Word condition**

The mediation model for the word condition was a good fit [Maximum Likelihood χ<sup>2</sup> = 71.91, CFI = 0.96, RMSEA = 0.092 (90% CI: 0.063, 0.122), SRMR = 0.053; R. Maximum Likelihood χ<sup>2</sup> = 70.63, R.CFI = 0.96, R.RMSEA = 0.091 (90% CI: 0.062, 0.121); and the scaling factor for the Yuan-Bentler correction was 1.02]. For the word condition, the initial assumptions of mediation were met. There was a direct effect of faster CM-RT for words leading to better reading performance (path b: b = −0.51, SE = 0.08, z = 6.66, p < 0.001). There was a direct effect of increased neurometabolite concentration on slower CM-RT (path a: b = 0.50, SE = 0.18, z = 2.80, p < 0.01), but only a trend toward more difficult stimuli leading to slower CM-RT for words (p = 0.086). When the indirect pathway was included (path ab: b = −0.26, SE = 0.097, z = 2.67, p = 0.008), the direct pathway from neurometabolite concentration to reading ability (path c: b = −0.76, SE = 0.32, z = 2.39, p = 0.017) was no longer significant (path c-prime: p = 0.53), indicating full statistical mediation.

#### **Pseudoword condition**

The mediation model for the pseudoword condition was a good fit [Maximum Likelihood χ<sup>2</sup> = 69.90, CFI = 0.96, RMSEA = 0.090 (90% CI: 0.060, 0.119), SRMR = 0.057; R. Maximum Likelihood χ<sup>2</sup> = 69.60, R.CFI = 0.96, R.RMSEA = 0.090 (90% CI: 0.060, 0.119); and the scaling factor for the Yuan-Bentler correction was 1.004]. In the pseudoword condition, there was an effect of more difficult stimuli leading to slower CM-RT (b = 0.41, SE = 0.16, z = 2.22, p < 0.05). There was a direct effect of faster CM-RT leading to better reading performance (path b: b = −0.45, SE = 0.095, z = 4.78, p < 0.001). However, there was no direct effect of neurometabolite concentration on CM-RT (path a: p = 0.24). This suggests that the mediating role of cross-modal matching was not driven by the pseudoword condition.
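Each fit summary above reports a maximum-likelihood χ<sup>2</sup>, a robust χ<sup>2</sup>, and a Yuan-Bentler scaling factor. In the usual scaled-statistic correction, the robust statistic is the ML statistic divided by the scaling factor. The sketch below applies that relation to the values reported for the overall, letter, word, and pseudoword models; this is an illustration of the relation under that assumption, and small discrepancies are expected because the published scaling factors are rounded.

```python
# Reported fit statistics: (ML chi-square, robust chi-square, scaling factor).
REPORTED = {
    "overall": (93.09, 91.426, 1.018),
    "letter": (57.59, 56.88, 1.013),
    "word": (71.91, 70.63, 1.02),
    "pseudoword": (69.90, 69.60, 1.004),
}


def yuan_bentler_scaled(ml_chi2: float, scaling_factor: float) -> float:
    """Scaled (robust) test statistic: ML chi-square divided by the scaling factor."""
    return ml_chi2 / scaling_factor


for model, (ml, robust, c) in REPORTED.items():
    scaled = yuan_bentler_scaled(ml, c)
    print(f"{model:<10} ML = {ml:6.2f}  ML/c = {scaled:6.2f}  reported robust = {robust:6.2f}")
```

A scaling factor close to 1.0, as in all four models, means the robust and ML statistics barely differ, i.e., the correction for non-normality has little practical effect here.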

#### Word Mediation Analyses by Neurometabolite

SEM mediation models were then used to investigate whether word CM-RT mediated the relationship between reading ability and specific neurometabolites (Glu and Cho) that have previously been linked to reading ability. Only one latent variable was included: (1) Reading Ability (RA), which predicted Word Identification and Word Attack subtest scores (WJ-III: Woodcock et al., 2007), as well as Sight Word Efficiency and Phonemic Decoding Efficiency subtest scores (TOWRE: Torgesen et al., 1999). As in the previous mediation models, degree of difficulty was included as a categorical variable predicting CM-RT.

#### **Word CM-RT mediates the relationship between Glu and reading ability**

The mediation of the relationship between reading ability and Glu by word CM-RT was a good fit [Maximum Likelihood χ<sup>2</sup> = 27.86, CFI = 0.98, RMSEA = 0.098 (90% CI: 0.050, 0.146), SRMR = 0.033; R. Maximum Likelihood χ<sup>2</sup> = 26.73, R.CFI = 0.98, R.RMSEA = 0.096 (90% CI: 0.047, 0.146); and the scaling factor for the Yuan-Bentler correction was 1.042]. There was a direct effect of faster CM-RT for words leading to better reading performance (path b: b = −0.51, SE = 0.08, z = 6.66, p < 0.001). There was a direct effect of increased Glu concentration on slower CM-RT (path a: b = 0.24, SE = 0.09, z = 2.80, p < 0.01), but only a trend toward more difficult stimuli leading to slower CM-RT for words (p = 0.086). When the indirect pathway was included (path ab: b = −0.12, SE = 0.046, z = 2.65, p = 0.008), the direct pathway from Glu concentration to reading ability (path c: b = −0.22, SE = 0.072, z = 3.00, p = 0.003) was no longer significant (path c-prime: b = −0.10, SE = 0.07, z = 1.31, p = 0.19) (**Figure 7A**). This indicates that faster CM-RT for words mediated the relationship between reading ability and Glu.
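For a single-mediator linear model, the total effect decomposes as c = c′ + ab. The sketch below checks that identity against the rounded Glu estimates reported above (0.24 × −0.51 ≈ −0.12, and −0.10 + −0.12 ≈ −0.22) and computes the share of the total effect carried by the indirect path. The proportion-mediated summary is an illustrative addition, not a statistic reported in the text, and with latent variables and covariates the identity holds only approximately.

```python
from typing import NamedTuple


class MediationPaths(NamedTuple):
    a: float        # predictor -> mediator
    b: float        # mediator -> outcome (controlling for predictor)
    c_prime: float  # direct effect of predictor on outcome

    @property
    def indirect(self) -> float:
        return self.a * self.b

    @property
    def total(self) -> float:
        # In a linear single-mediator model, total effect c = c' + a*b.
        return self.c_prime + self.indirect

    @property
    def proportion_mediated(self) -> float:
        return self.indirect / self.total


# Rounded Glu estimates from the text: a = 0.24, b = -0.51, c' = -0.10.
glu = MediationPaths(a=0.24, b=-0.51, c_prime=-0.10)
print(f"ab = {glu.indirect:.2f}, c = c' + ab = {glu.total:.2f}, "
      f"proportion mediated = {glu.proportion_mediated:.2f}")
# ab = -0.12, c = c' + ab = -0.22, proportion mediated = 0.55
```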

#### **Word CM-RT mediates the relationship between Cho and reading ability**

Additionally, the mediation of the relationship between reading ability and Cho by word CM-RT was also a good fit [Maximum Likelihood χ<sup>2</sup> = 38.33, CFI = 0.97, RMSEA = 0.126 (90% CI: 0.083, 0.172), SRMR = 0.036; R. Maximum Likelihood χ<sup>2</sup> = 37.51, R.CFI = 0.97, R.RMSEA = 0.125 (90% CI: 0.081, 0.172); and the scaling factor for the Yuan-Bentler correction was 1.022]. There was a direct effect of faster CM-RT for words leading to better reading performance (path b: b = −0.51, SE = 0.08, z = 6.66, p < 0.001). There was a direct effect of increased Cho concentration on slower CM-RT (path a: b = 0.19, SE = 0.09, z = 1.99, p < 0.05), but no significant effect of stimulus difficulty on CM-RT for words (p = 0.094). When the indirect pathway was included (path ab: b = −0.10, SE = 0.053, z = 1.84, p = 0.066), the direct pathway from Cho concentration to reading ability (path c: b = −0.16, SE = 0.084, z = 1.90, p = 0.06) was no longer marginally significant (path c-prime: p = 0.147) (**Figure 7B**). This indicates that faster CM-RT for words fully mediated the relationship between reading ability and Cho.

### Percent Correct

Overall, children's performance was very high [full sample (n = 224): percent correct = 0.944, SD = 0.10 and subsample (n = 69): percent correct = 0.940, SD = 0.10]. **Table 5** includes task percent correct by condition for both the full sample and the subsample.

#### Percent Correct (Full Sample)

A non-parametric Friedman test was employed to compare the total percent correct for the four measures (easy-repetition1, hard-repetition1, easy-repetition2, and hard-repetition2) of each stimulus condition in the full sample of participants. This included 224 children for the letter stimulus condition, 222 children for the word stimulus condition, and 221 children for the pseudoword stimulus condition. There was a significant difference across the four measures of the three stimulus conditions [Friedman χ<sup>2</sup>(11) = 326.7, p < 0.0001]. We then investigated whether differences were due to stimulus condition using an adjusted critical alpha (0.05/3 ≈ 0.0167). Significant differences were found on the four measures of the word [χ<sup>2</sup>(3) = 141.38, p < 0.0001] and pseudoword [χ<sup>2</sup>(3) = 100.3, p < 0.0001] stimulus conditions, but not the letter stimulus condition (p = 0.32). Post-hoc tests were carried out to determine whether differences were due to repetition or degree of difficulty within each stimulus condition (adjusted alpha 0.05/12 ≈ 0.0042). In the word stimulus condition, there was a significant difference between the hard and easy word stimuli on both the first repetition [χ<sup>2</sup>(1) = 82.14, p < 0.0001] and the second repetition [χ<sup>2</sup>(1) = 60.19, p < 0.0001]. There was no difference due to repetition of the easy words (p = 0.17) or the hard words (p = 0.58). The same was true for pseudowords: there was a significant difference between the hard and easy pseudoword stimuli on both the first repetition [χ<sup>2</sup>(1) = 55.19, p < 0.0001] and the second repetition [χ<sup>2</sup>(1) = 42.82, p < 0.0001], and again no difference due to repetition of the easy pseudowords (p = 0.30) or the hard pseudowords (p = 0.18). Thus, for both the word and pseudoword conditions, differences in percent correct during cross-modal matching were driven by degree of difficulty.

FIGURE 7 | (A) … between Glu:Cr and reading ability. (B) Cross-modal reaction time significantly mediates the relationship between Cho and reading ability. Significance: \**p* < 0.05, \*\**p* < 0.01, \*\*\**p* < 0.001.

TABLE 5 | Cross-modal task accuracy by stimulus condition.

*The total number includes only those participants who scored above chance. Sample mean (M) and standard deviation (SD) are reported. Accuracy is the percent correct.*

#### Percent Correct (Subsample)

A non-parametric Friedman test was employed to compare the total percent correct for the four measures (easy-repetition1, hard-repetition1, easy-repetition2, and hard-repetition2) of each stimulus condition in the subsample of participants. This included 69 children for the letter stimulus condition, 68 children for the word stimulus condition, and 67 children for the pseudoword stimulus condition. There was a significant difference across the four measures of the three stimulus conditions [Friedman χ<sup>2</sup>(11) = 72.04, p < 0.0001]. As in the full sample analysis, we investigated whether differences were due to stimulus condition using an adjusted critical alpha (0.05/3 ≈ 0.0167). Significant differences were found on the four measures of the word [χ<sup>2</sup>(3) = 20.55, p < 0.001] and pseudoword [χ<sup>2</sup>(3) = 34.17, p < 0.0001] stimulus conditions, but not the letter stimulus condition (p = 0.26). Post-hoc tests were again carried out to determine whether differences were due to repetition or degree of difficulty within each stimulus condition (adjusted alpha 0.05/12 ≈ 0.0042). In the word stimulus condition, there was a significant difference between the hard and easy word stimuli on both the first repetition [χ<sup>2</sup>(1) = 18.67, p < 0.0001] and the second repetition [χ<sup>2</sup>(1) = 7.54, p < 0.01]. There was no difference due to repetition of the easy words (p = 0.65) or the hard words (p = 0.32). The same was true for pseudowords: there was a significant difference between the hard and easy pseudoword stimuli on both the first repetition [χ<sup>2</sup>(1) = 18.69, p < 0.0001] and the second repetition [χ<sup>2</sup>(1) = 14.24, p < 0.001], and again no difference due to repetition of the easy pseudowords (p = 0.51) or the hard pseudowords (p = 0.86). Therefore, even in the subsample, differences in percent correct during cross-modal matching were driven by degree of difficulty.
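The Friedman test used in both percent-correct analyses ranks each child's scores across the k measures and compares the column rank sums; its statistic has k − 1 degrees of freedom, so the omnibus χ<sup>2</sup>(11) corresponds to 12 measures (four per condition across three conditions). The sketch below computes the statistic by hand on a small, invented accuracy matrix with no tied ranks, cross-checks it against `scipy.stats.friedmanchisquare`, and derives the Bonferroni thresholds used in the text. The data are hypothetical and serve only to show the computation.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata


def friedman_statistic(scores: np.ndarray) -> float:
    """Friedman chi-square for an (n subjects x k measures) array, no tie correction."""
    n, k = scores.shape
    # Rank each subject's scores across the k measures (1 = lowest).
    ranks = np.vstack([rankdata(row) for row in scores])
    rank_sums = ranks.sum(axis=0)
    return 12.0 / (n * k * (k + 1)) * np.sum(rank_sums**2) - 3.0 * n * (k + 1)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test critical alpha under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical proportion-correct scores: 5 children x 4 measures
# (easy-rep1, hard-rep1, easy-rep2, hard-rep2); no ties within a row.
scores = np.array([
    [0.90, 0.80, 0.95, 0.85],
    [0.92, 0.70, 0.96, 0.75],
    [0.88, 0.78, 0.91, 0.80],
    [0.97, 0.85, 0.99, 0.83],
    [0.93, 0.74, 0.94, 0.81],
])

manual = friedman_statistic(scores)
# scipy expects one 1-D argument per measure (i.e., per column).
scipy_stat, scipy_p = friedmanchisquare(*scores.T)
print(f"manual = {manual:.2f}, scipy = {scipy_stat:.2f}, df = {scores.shape[1] - 1}")
print(f"alpha per condition = {bonferroni_alpha(0.05, 3):.4f}")   # 0.0167
print(f"alpha per post-hoc  = {bonferroni_alpha(0.05, 12):.4f}")  # 0.0042
```

With real accuracy data, ties within a child's scores are common; scipy applies a tie correction in that case, so the hand-computed version above would no longer match exactly.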

## DISCUSSION

We asked first-grade children to complete a cross-modal matching task with three language stimulus conditions (letter, word, and pseudoword). Degree of difficulty (hard and easy stimuli) and stimulus repetition (first and second presentation) were included as nested and crossed factors, respectively. As in previous multimodal studies, there was a high degree of individual variability on the cross-modal matching task. After accounting for individual differences in cross-modal performance, we found that CM-RT was fastest for letters, followed by words, and slowest for pseudowords. CM-RT was faster for easy than for hard stimuli in the word and pseudoword conditions, whereas no difference was found between hard and easy letters. There was also an effect of repetition in the word and pseudoword stimulus conditions in the full sample, but the effect was not robust enough to be detected in the subsample. Taken together, these results indicate that by first grade, prior knowledge of cross-modal matched stimulus pairs is already supported by information specific to real words (e.g., semantic information).

### Glu

The primary aim of our investigation was twofold. First, we aimed to determine whether neurochemical concentration predicted individual differences in readers' cross-modal integration. Second, given that neurometabolite concentrations have previously been shown to have a negative relationship with reading abilities (Bruno et al., 2013; Pugh et al., 2014), we aimed to provide insight into possible ways that cross-modal integration might influence the relationship between reading ability and neurometabolite concentration. Hancock et al. (2017) proposed that RD is the result of increased neural excitability, which leads to neural noise in cortical networks. Our colleagues suggested that increased neural noise would impair multisensory integration, because robust multisensory encoding requires stimuli to be spatially congruent and temporally synchronous (Meredith et al., 1987; Meredith and Stein, 1996; Kadunce et al., 1997). More specifically, Hancock et al. (2017) suggested that random and excessive variability in neuronal firing would disrupt neural synchronization and precise neural spike timing, and that this imprecision in synchronization would in turn impair multisensory integration. Our findings revealed that increased Glu (our proximal measure of increased glutamatergic signaling and hyperexcitability) was associated with slower CM-RT, which was in turn associated with diminished reading performance. Within the framework of Hancock et al. (2017), this finding suggests that increased neural noise, due to increased glutamatergic signaling, corresponds to decreased multimodal integration and thus to lessened reading ability.

### GABA

The emergence of multisensory integration relies on GABA circuit maturation (Allman et al., 2008; Gogolla et al., 2014; Balz et al., 2016). Evidence from animal studies suggests that reorganization of the GABAergic system in early development is what leads to impaired or unimpaired multisensory integration (Gogolla et al., 2014). We found that a lower concentration of GABA in the visual cortex predicted faster cross-modal matching. This indicates that a lower GABA concentration allows children to quickly integrate and match auditory and visual stimuli, resulting in faster CM-RT. In MRS studies, increased GABA is often found to indicate increased performance on speeded tasks (Boy et al., 2010). However, this is typically achieved through increased motor inhibition, which results in slower reaction time (Stagg et al., 2011a,b). Our cross-modal findings are in fact consistent with recent evidence from Nakai and Okanoya (2016), who reported that lower GABA predicted increased reading fluency in the left (but not right) inferior frontal gyrus (IFG). In their study, reading fluency was assessed by having adults quickly write down nouns belonging to a category (e.g., fruit).

GABA's role in metabolite energetics is often tightly coupled with that of glutamate (Patel et al., 2005; Ramadan et al., 2013). The time windows in which sensory stimuli are neurally encoded are tightly linked to neural excitability, and this excitation triggers a shadowing period of inhibition. It is during this period of inhibition that sensory input is integrated prior to the next excitatory neuronal spike. This largely explains why multimodal integration and coordination across cortical regions are particularly sensitive to the loss of spike timing precision: they occur over a restricted time window (Senkowski et al., 2007a,b). Greater GABA concentration leads to more selective cortical tuning, resulting in greater perceptual acuity (Kolasinski et al., 2017). Likewise, GABA correlates with unisensory visual perception and is highly predictive of individual performance (Edden et al., 2009). Therefore, our finding that lower GABA predicts faster CM-RT suggests that lower GABA allows children to integrate already learned cross-modal stimulus pairs more quickly, but that this may not be possible for readers who require more perceptual acuity or time to encode and differentiate sensory information. In other words, less proficient readers are more likely to require increased GABA to support cross-modal matching performance, whereas our skilled first-grade readers are proficient enough that increased GABA is not necessary for perceptual acuity and likely hinders their reaction time.

GABA negatively correlates with the functional Magnetic Resonance Imaging (fMRI) blood-oxygen-level-dependent (BOLD) signal (Northoff et al., 2007; Donahue et al., 2010). Cross-modal deactivation has been reported in fMRI, but deactivation was not present during paired stimulus presentations (Laurienti et al., 2002). This is supported by evidence from in vivo whole-cell recordings, in which Iurilli et al. (2012) found cross-modal influence of the auditory cortex on inhibitory GABAergic circuits in the primary visual cortex. Moreover, GABA contributes to the generation of gamma band oscillations (e.g., Traub et al., 2003; Bartos et al., 2007). During maturation, GABA signaling is a powerful regulatory mechanism of parvalbumin (PV) cell innervation patterns (Chattopadhyaya et al., 2007; Gogolla et al., 2014). Inhibition of PV interneurons suppresses gamma oscillations, whereas excitation of PV interneurons generates emergent gamma-frequency rhythmicity (Sohal et al., 2009). The rate of gamma oscillations is also highly predictive of multisensory integration (Kaiser and Lutzenberger, 2005; Hipp et al., 2011) and is abnormally fast in RD (Lehongre et al., 2011, 2013). Intriguingly, gamma-frequency modulation of excitatory input was found to enhance signal transmission through output to PV interneurons and to reduce neural circuit noise (Sohal et al., 2009). This mechanism likely accounts for the recent finding that GABA mediates the relationship between gamma band oscillation and audiovisual integration (Balz et al., 2016).

Terhune et al. (2014) reported that GABA concentrations in the motor and visual cortex were independent of each other. In our study, the MRS spectra were collected from a voxel on the midline of the occipital cortex. Thus, in considering the relationship between lower GABA and cross-modal matching speed, it is also critical to consider that GABA plays a nuanced role that depends on brain location. The single-voxel spectroscopy location is a limitation of the current study; indeed, studies are currently underway to examine more holistically the role that neurochemistry plays in reading ability (Pugh and Hoeft, 2017).

### Cho

Consistent with previous research findings that Cho negatively predicted reading ability (e.g., Bruno et al., 2013; Pugh et al., 2014 from the NIH MRI Study of Normal Brain Development: http://pediatricmri.nih.gov, release 5), we found that Cho predicted cross-modal word matching speed; specifically, lower Cho predicted faster cross-modal matching for hard words. Moreover, decreased Cho was associated with faster CM-RT in the word stimulus condition, which was in turn associated with better reading performance. The Cho signal measured in proton MRS corresponds largely to glycerophosphocholine (GPCho), phosphocholine (PCho), and free choline (Miller, 1991). These compounds are products and building blocks for membrane metabolism, and have been proposed to function in the osmotic regulation of cell volume, as well as support cell proliferation and differentiation (Brenner et al., 1993; Jackowski, 1994; Kwon et al., 1995). Proton MRS measures of Cho have been associated with myelination (Laule et al., 2007), neurodegeneration or inflammation due to membrane/phospholipid turnover (Roser et al., 1995), as well as cellular density (Miller et al., 1996).

Bruno et al. (2013) suggested that decreased Cho is linked to phonological processing. This conclusion was based on adult neurochemical concentrations of Cho accounting for additional variance in phonological decoding of pseudowords, beyond word reading. The authors "tentatively [suggest that this] indicates some specificity for the negative relationship between Cho and phonological decoding." However, they also reported that word and pseudoword decoding showed the same relationship with Cho, although the relationship was more robust for pseudoword decoding. Moreover, they reported a moderate degree of overlap in the variance accounted for by word and pseudoword decoding. Based on these findings, Bruno et al. (2013) offered some alternative explanations for their results. One such explanation was that the difficulty of the linguistic stimuli may drive these differences. In keeping with this alternative explanation, our work similarly suggests that the difficulty of the linguistic stimuli likely plays a role in the relationship to Cho. Thus, we hesitate to make any claims regarding specific linguistic constructs, but instead suggest that the degree of difficulty of the linguistic stimuli likely drives differences in the relationship between Cho concentration and letter, word, and pseudoword CM-RT.

### NAA

NAA is a marker of neuronal viability and is considered to be a neurochemical correlate of neuron-oligodendrocyte (axon-myelin) integrity (Moffett et al., 2007; Paslakis et al., 2014; Xu et al., 2016). NAA has been reported to correspond to measures from diffusion-weighted imaging (Caprihan et al., 2015). Here, higher concentrations of NAA predicted faster cross-modal matching. Individual developmental differences in cross-modal brain activation have been found to correspond to connectivity in the arcuate fasciculus (Gullick and Booth, 2014). Our interpretation of these findings is that a more intact white matter reading network likely corresponds to higher NAA. Several studies have now linked measures of the integrity of the left arcuate fasciculus to reading skill (Yeatman et al., 2011) and longitudinal reading change (Gullick and Booth, 2015). Future work examining individual differences in the white matter reading network and longitudinal changes in reading development may benefit from investigating the corresponding role played by NAA.

## CONCLUSION

In summary, this work provides supporting evidence for the Neural Noise Hypothesis of Developmental Dyslexia (Hancock et al., 2017) and helps clarify the role of neurochemistry in reading disability. Specifically, this work shows that Glu and Cho concentrations influence cross-modal matching, which in turn affects reading ability. This study is the first to demonstrate a direct relationship between individual differences in cross-modal matching and emergent readers' GABA and NAA neurochemical concentrations. Further, this work links behavioral studies of multisensory phonological and orthographic integration and reading performance with pediatric Magnetic Resonance Spectroscopy (MRS) studies.

## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of Yale University's Human Research Protection Program. The protocol was approved by Yale University's Human Research Protection Program. All participants' parents provided written informed consent, and children gave written assent, in accordance with the Declaration of Helsinki.

## AUTHOR CONTRIBUTIONS

SD, SF, PM, GM, DR, RF, and KP designed and performed the research. SD, PM, and GM analyzed the data. GM, DR, and RF contributed unpublished reagents and analytic tools. SD wrote the manuscript with assistance from SF, FH, LC, PM, GM, DR, RF, and KP.

## FUNDING

This study is supported by NICHD P01 HD001994 (PI: Carol Fowler) and R01 HD086168 (PIs: KP and FH) to Haskins Laboratories, and by NICHD R01 HD048830 (PI: KP) to Yale University. The 4T MR system was partially purchased by a generous gift from the W. M. Keck Foundation.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01507/full#supplementary-material

#### Del Tufo et al. Neurochemistry Predicts Cross-Modal Integration

#### REFERENCES


proton spectroscopic imaging study. Neuropsychopharmacology 40, 2248–2257. doi: 10.1038/npp.2015.72


**Conflict of Interest Statement:** GM is a consultant for Sumitomo Dainippon Pharma Co. Ltd. and UCB Pharma SA, and serves on the Scientific Advisory Board of Elucidata Inc.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Del Tufo, Frost, Hoeft, Cutting, Molfese, Mason, Rothman, Fulbright and Pugh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Performance in Sound-Symbol Learning Predicts Reading Performance 3 Years Later

Josefine Horbach<sup>1</sup>\*, Kathrin Weber<sup>2</sup>, Felicitas Opolony<sup>2</sup>, Wolfgang Scharke<sup>1</sup>, Ralph Radach<sup>3</sup>, Stefan Heim<sup>4,5</sup> and Thomas Günther<sup>1,6</sup>

<sup>1</sup> Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany, <sup>2</sup> Department of Neurology, Medical Faculty, RWTH Aachen University, Aachen, Germany, <sup>3</sup> Allgemeine und Biologische Psychologie, Bergische Universität Wuppertal, Wuppertal, Germany, <sup>4</sup> Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty, RWTH Aachen University, Aachen, Germany, <sup>5</sup> Research Centre Jülich, Institute of Neuroscience and Medicine (INM-1), Jülich, Germany, <sup>6</sup> Faculty of Health, Zuyd University, Heerlen, Netherlands

To master the task of reading, children need to acquire a coding system representing speech as a sequence of visual symbols. Recent research suggested that performance in the processing of artificial script that relies on the association of sound and symbol may be associated with reading skill. The current longitudinal study examined the predictive value of a preschool sound-symbol paradigm (SSP) for reading performance 3 years later. The Morse-like SSP, IQ, and letter knowledge (LK) were assessed in young preschool children. Reading outcome measures were examined 3 years later. Word reading, pseudoword reading, and reading comprehension were predicted with age, IQ, LK, and SSP. The results showed that SSP substantially predicted reading fluency and reading comprehension 3 years later. For reading fluency measures, the influence of further predictor variables was not significant and SSP served as the sole predictor. Reading comprehension was best explained by SSP and age. The amount of variance SSP explained in reading 3 years later was remarkably high, with an explained variance between 63 and 82%, depending on the outcome reading variable. SSP turned out to be a substantial predictor of later reading performance in a language with statistically reliable spelling-to-sound relations. As LK is highly dependent on educational support, we assume that children in our socioeconomically diverse sample did not have much opportunity to acquire LK in their home environment. In contrast, the SSP challenges students to acquire new spelling-to-sound relations, simulating a core aspect of natural reading acquisition. Future work will test this paradigm in less transparent languages like English and explore its potential as a future standard assessment in the study of early reading development.

#### Edited by:

Jurgen Tijms, University of Amsterdam, Netherlands

#### Reviewed by:

Sabine Heim, Rutgers University, The State University of New Jersey, United States

Thomas Lachmann, Technische Universität Kaiserslautern, Germany

> \*Correspondence: Josefine Horbach jhorbach@ukaachen.de

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 26 June 2018 Accepted: 24 August 2018 Published: 12 September 2018

#### Citation:

Horbach J, Weber K, Opolony F, Scharke W, Radach R, Heim S and Günther T (2018) Performance in Sound-Symbol Learning Predicts Reading Performance 3 Years Later. Front. Psychol. 9:1716. doi: 10.3389/fpsyg.2018.01716

Keywords: predictors of reading, sound-symbol learning, longitudinal study, letter knowledge, dynamic test

## INTRODUCTION

The current longitudinal study investigates how well preschoolers' performance in a sound-symbol paradigm (SSP) predicts later reading achievement. The paradigm is based on an earlier study by Horbach et al. (2015), which found that SSP predicted later reading in six-year-old monolingual kindergarteners over and above the established predictors phonological awareness (PA), rapid automatized naming (RAN), short-term memory (STM), and environmental factors. Because of the simple design of SSP, the present study was able to assess its predictive capacity in younger preschool children.

The ability to read is crucial for participation in our society. Reading difficulties start early in childhood and tend to persist throughout reading development (Cunningham and Stanovich, 1997; Landerl and Wimmer, 2008). Such difficulties can substantially limit academic performance and career choices (Esser et al., 2002). It is therefore important to improve diagnostic tools so that children at risk for reading difficulties can be identified, and difficulties prevented, as early as possible.

To master the task of reading, children need to acquire a coding system representing speech as a sequence of visual symbols (Ziegler and Goswami, 2005). This requires, first, learning the associations between sounds and symbols and, second, serially processing the learned correspondences. Recent studies examining the role of sound-symbol learning in reading have used paradigms that require the serial processing of newly learned visual–verbal correspondences and have related performance on these tasks to reading ability. Aravena et al. (2013) developed an artificial orthography and demonstrated that normal readers outperformed students with dyslexia in the serial application of the newly learned sound-symbol associations. Interestingly, normal and dyslexic readers differed in the serial processing of the new letter names even though they did not differ in their knowledge of the new letter names themselves. In a further study, the authors showed that 20 min of training on the artificial orthography was enough to differentiate dyslexic from non-dyslexic readers (Aravena et al., 2017).

Participants in the studies of Aravena et al. (2013, 2017) already had several years of reading experience when tested. Horbach et al. (2015) assessed the predictive power of a Morse-like SSP in monolingual kindergarteners who had not yet received formal reading instruction. This task was designed to simulate the process of learning to read schematically. First, children learned to associate verbal sounds with graphical symbols, similar to a classical paired-associate learning task. Afterward, they had to recall strings of the newly learned correspondences, similar to Morse code. The children learned only two associations to keep the influence of phonological processing and working memory load as low as possible. The authors found that SSP predicted word reading one year later in non-readers over and above PA, verbal STM, and RAN. Some children were already able to read before they received formal reading instruction. In these early readers, SSP did not predict reading in first grade, whereas early reading performance measured in kindergarten did. The authors concluded that SSP simulates the process of learning to read and is therefore especially appropriate for young preliterate children. Gellert and Elbro (2017) replicated these findings using a similar paradigm of artificial decoding in kindergarten children to predict reading in first grade. The children had to learn three sound-symbol pairs and blend them into new words. The test significantly predicted reading after controlling for several standard predictors, and the authors suggested that the learning aspect of the task is essential for predicting initial reading development.

Some years earlier, Elbro and colleagues had demonstrated a further advantage of SSPs in the prediction of reading (Elbro et al., 2012): the measure is language independent. From a global perspective, multilingualism is the norm (Riehl, 2014). In 2016, 38% of children under the age of 10 in Germany had a migration background (Statistisches Bundesamt, 2017). Diagnostic instruments are therefore needed that circumvent the influence of language skills on predictor variables. Elbro et al. (2012) found that their measure of artificial decoding was able to discriminate dyslexic from non-dyslexic adult second-language learners.

Against this background, the current study aimed to assess whether SSP measured at the young age of 4–5 years predicts reading performance 3 years later. As an auxiliary question, it was tested whether multilingual children differ in SSP performance from monolingual children.

## MATERIALS AND METHODS

#### Sample and Procedures

At the first measurement time point (T1), 56 preschool children took part (34 female: 17 multilingual, 17 monolingual; 22 male: 12 multilingual, 10 monolingual). All multilingual children had been exposed to German for at least 2 years. It was ensured that all children understood the instructions. Children were aged between 4.01 and 5.99 years (M = 5.00; SD = 0.50). At T1, children were tested individually in a quiet room of their day-care center. The SSP, letter knowledge (LK), and non-verbal intelligence (IQ) were assessed (Weber et al., 2014).

Three years later, 17 children were retested (10 girls, 11 multilingual). At the time of retesting, the children were in first (n = 4), second (n = 11), or third (n = 2) grade. Testing took place individually at each child's home. Reading fluency and reading comprehension were tested. As a further control variable, non-verbal IQ was measured again at T2.

This study was conducted in accordance with the Declaration of Helsinki. Informed written consent was obtained from all parents of participants. The study was approved by the Ethics Committee at the Medical Faculty of RWTH Aachen University.

#### Instruments

#### T1: Measures in Preschool

#### **SSP**

This task was a computer-based version of an existing paper-and-pencil task by Köhn and Voß (unpublished thesis) and was described in Horbach et al. (2015) as follows: the task was designed to simulate the reading process schematically. The first part of the task was a learning phase, similar to a classical paired-associate learning (PAL) task, in which the children learned to associate verbal sounds with graphical symbols. It was followed by a second learning phase. The main part of the task was the test phase, which required the serial application of the newly learned correspondences. To keep the influence of phonological processing and working memory load as low as possible, the children learned only two associations (**Figure 1**).

#### **Learning phase 1**

The task started with a voice introducing two symbols: a dot "•" and a dash "—". Each symbol was presented separately on the screen, and the voice explained that the dot is called /ta/ and the dash is called /ma:/. The children were instructed to name the symbols. Stimuli were presented in a fixed order on a 23-in TFT display. If the child responded correctly, he or she received positive feedback ("yes, this was /ta/"), and the next trial appeared. If the response was incorrect, the experimenter provided negative and corrective feedback (e.g., "no, this was /ma:/"), and the trial was repeated. Because of this repetition, the amount of exposure to the two stimuli differed between children. The task was performance sensitive: children reached the next learning phase only after solving at least 10 trials correctly. Performance was assessed as the percentage of correctly solved trials.

#### **Learning phase 2**

To prepare the children for the following test phase, they had to name the recently learned symbols in strings of two symbols (e.g., visual stimulus "• —", correct response "/ta ma:/"). Again, feedback was provided, and at least 10 items had to be solved correctly (abort criterion: maximum of 20 trials). Performance was assessed as the percentage of correctly solved trials.
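The criterion-based learning procedure of the two learning phases can be sketched as a simple loop. This is a minimal illustration only, assuming that every attempt (including a repeated trial after corrective feedback) counts as an administered trial; the function name `run_learning_phase` and the `respond` callback standing in for the child's answer are hypothetical. Because no abort criterion is described for learning phase 1, `max_trials` defaults to `None`; a simulated responder that never succeeds would then loop forever, so a cap should be passed in that case.

```python
from typing import Callable, Optional


def run_learning_phase(
    respond: Callable[[int], bool],
    min_correct: int = 10,
    max_trials: Optional[int] = None,
) -> dict:
    """Administer trials until `min_correct` correct answers are reached.

    `respond(trial_index)` is a hypothetical stand-in for the child: it returns
    True for a correct naming response. An incorrect response is followed by
    corrective feedback and the trial is repeated, so every attempt counts as
    an administered trial. If `max_trials` is set (e.g., 20 in learning phase 2),
    the phase is aborted once that many trials have been administered.
    """
    n_trials = 0
    n_correct = 0
    while n_correct < min_correct:
        if max_trials is not None and n_trials >= max_trials:
            break  # abort criterion reached before the learning criterion
        if respond(n_trials):
            n_correct += 1
        n_trials += 1
    return {
        "trials": n_trials,
        "correct": n_correct,
        "passed": n_correct >= min_correct,
        # performance = percentage of correctly solved trials
        "percent_correct": 100.0 * n_correct / n_trials if n_trials else 0.0,
    }


# Hypothetical child who answers every second trial correctly.
phase1 = run_learning_phase(lambda i: i % 2 == 0)                 # no abort criterion
phase2 = run_learning_phase(lambda i: i % 2 == 0, max_trials=20)  # max. 20 trials
print(phase1)
print(phase2)
```

In this sketch a child who needs more attempts receives more exposure to the symbols, mirroring the point above that exposure differed between children.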

#### **Test phase**

The test phase required the serial application of the newly learned correspondences. Twelve trials with strings of three or four symbols (six trials per string length) were presented in the same way as in the learning phase, except that feedback was no longer given. Performance on three- and four-symbol strings was highly correlated (r = 0.72). Since all analyses showed the same pattern for three and four symbols, the two scores were combined. The items of the task had high internal consistency (Cronbach's alpha = 0.87).
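The internal consistency reported above is Cronbach's alpha, α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₜₒₜₐₗ the variance of the summed scores. The sketch below is a generic implementation of that formula, not the authors' analysis script; the function name and the toy data are hypothetical. Population variances are used throughout; because the same denominator appears in numerator and denominator of the ratio, sample variances would give the same alpha.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a participants x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data (not study data): three participants, two items.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # ≈ 0.667
```

For the SSP test phase, each row would hold one child's scores on the 12 test trials.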

#### **Non-verbal intelligence measure**

Non-verbal IQ was measured using Raven's Colored Progressive Matrices (CPM; Bulheller and Häcker, 2002). The CPM is designed to measure the child's reasoning ability, which is taken as an index of general IQ.

#### **Letter knowledge (LK)**

In an individual letter naming task, the children were asked to name all 26 uppercase letters of the German alphabet, presented in random order on a white sheet of paper. One point was given for each correctly pronounced letter; both letter names and letter sounds were accepted as answers.

#### T2: Measures 3 Years Later

#### **Reading fluency**

Reading performance was measured using a standardized word reading fluency test, the Salzburg Reading and Spelling Test (SLRT-II; Moll and Landerl, 2010). The SLRT-II measures reading speed and accuracy for words and pseudowords in a 1 min reading fluency task. The number of correctly read words and the number of correctly read pseudowords were recorded.

#### **Reading comprehension**


The standardized reading comprehension test ELFE 1-6 (Lenhard and Schneider, 2006) was used to assess reading comprehension at the word, sentence, and text level. At the word level, the child decides which of four words best matches a given picture. At the sentence level, the child chooses which of four words best completes a given sentence. At the text level, short stories have to be read and questions about them answered. The sum of the z-scores of all three subtests was used as the reading comprehension score.
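A composite of this kind can be sketched as a sum of standardized subtest scores. The text does not say whether the z-scores were derived from the ELFE 1-6 norms or from the study sample; this minimal sketch assumes sample-based standardization (population SD), and the function names and scores are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize raw scores against the sample mean and (population) SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def composite_z(subtests: Dict[str, Sequence[float]]) -> List[float]:
    """Sum each child's z-scores over all subtests (word, sentence, text)."""
    standardized = [z_scores(scores) for scores in subtests.values()]
    return [sum(child) for child in zip(*standardized)]


# Hypothetical raw scores for three children (not study data).
elfe = {
    "word": [10, 14, 18],
    "sentence": [5, 9, 7],
    "text": [2, 4, 6],
}
print(composite_z(elfe))
```

If norm-based z-scores were used instead, `z_scores` would be replaced by a lookup in the published norm tables; the summation step stays the same.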

#### **Non-verbal intelligence measure**

Non-verbal IQ was measured with the short version of CFT1-R (Weiß and Osterland, 2013).

## RESULTS

**Table 1** shows performance on the predictor measures assessed at T1. Dropout from T1 to T2 was high for several reasons (participants moved, declined participation, or could not be contacted); only 30% of the participants (17 of 56) could be retested. To determine whether the reduced sample at T2 differed significantly from the full sample at T1, one-sample t-tests were computed for SSP, LK, IQ, and age at T1, using the full-sample T1 mean of each variable as the test value. The reduced sample assessed at T2 did not differ from these reference values in the predictor measures SSP, LK, and IQ (**Table 1**). Chi-squared tests showed that the proportion of boys was nearly identical, with 39% boys at T1 and 41% at T2 [χ²(1) = 0.03, p = 0.854]. The proportion of multilingual children at T2 was also comparable to that at T1 [χ²(1) = 1.90, p = 0.168]. These results suggest that the children participating at T2 were representative of the initial group at T1 and, consequently, that there was no systematic dropout between T1 and T2.
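The attrition check compares the mean of the retested subsample with a fixed test value, the full-sample T1 mean, via t = (x̄ − μ₀)/(s/√n) with n − 1 degrees of freedom. The sketch below computes only the statistic and its degrees of freedom; the function name and numbers are hypothetical. For p-values, `scipy.stats.ttest_1samp(a, popmean)` performs the same test.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def one_sample_t(sample: Sequence[float], test_value: float) -> Tuple[float, int]:
    """t statistic and degrees of freedom for H0: mean(sample) == test_value."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample SD (n - 1 denominator)
    return (mean(sample) - test_value) / standard_error, n - 1


# Hypothetical retested-subsample scores tested against a hypothetical T1 mean.
t, df = one_sample_t([4.8, 5.1, 5.3, 4.9, 5.0], test_value=5.0)
print(f"t({df}) = {t:.2f}")
```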

#### Performance on Predictor Measures at T1

In the first learning phase of SSP, the children responded accurately nearly 80% of the time. In the second learning phase, 70% of the responses were accurate. In the test phase, where task complexity increased and feedback was no longer given, the children responded with an average accuracy of 30–40%.

Letter knowledge showed floor effects: on average, the children of this young age were able to identify only 3.77 of 26 letters.

Forty percent of the children had a below-average IQ score. This was not unexpected, given that children were recruited in regions with relatively low socioeconomic status. It was ensured that all children understood the instructions.

To compare the performance of monolingual and multilingual children across the different levels of the SSP, a two-way repeated measures ANOVA (Greenhouse-Geisser corrected) was performed, with task level as within-subject factor and group as between-subject factor. There was a significant main effect of task level, F(2.10, 113.37) = 85.84, p < 0.001, showing that, across groups, performance decreased as task complexity increased (**Figure 2**). The main effect of group was not significant [F(1,54) = 0.71, p = 0.402], indicating that multilingual and monolingual children performed similarly overall.

#### Prediction of Reading Performance 3 Years Later

Monolingual and multilingual children did not differ in SSP performance at T1, and group sizes were small. Therefore, the longitudinal prediction analyses were performed on the total sample.

TABLE 2 | Pearson correlations between predictor measures (T1) and outcome measures (T2).

SSP, sound-symbol paradigm; LK, letter knowledge; word fluency, word reading fluency (SLRT II); pseudoword fluency, pseudoword reading fluency (SLRT II); comprehension, reading comprehension (ELFE 1-6); ∗p < 0.05, ∗∗p < 0.01.

SSP measured at T1 was a strong correlate of reading performance 3 years later (word reading fluency r = 0.86, p < 0.001; pseudoword reading fluency r = 0.81, p < 0.001; reading comprehension r = 0.82, p < 0.001). LK did not correlate significantly with reading 3 years later, presumably because of floor effects in LK. IQ (T1) correlated moderately with word fluency (r = 0.60, p = 0.011) and pseudoword fluency (r = 0.54, p = 0.030) measured at T2. All correlations are shown in **Table 2**.

Linear regression models with SSP, IQ, LK, and age as predictors were computed for each reading outcome variable (**Table 3**). To find the best model for each variable in terms of fit and parsimony, predictors that did not reach a threshold of p < 0.10 were removed. Adjusted R², which corrects the explained variance for the number of predictors, served as a first check of model robustness. Scatterplots of the final models are shown in **Figure 3**.

Model 1: For word reading fluency as the dependent variable, SSP was the only significant predictor and explained 71% of the variance [F(1,15) = 40.647, p < 0.001]. IQ, LK, and age did not contribute to the final model.

Model 2: The analysis with pseudoword fluency as the dependent variable revealed the same pattern. Again, SSP was the only significant predictor, explaining 63% of the variance [F(1,14) = 26.091, p < 0.001].

Model 3: In the third analysis, SSP and age contributed significantly to the variance of reading comprehension. The model explained 82% of the variance in reading comprehension [F(2,14) = 36.462, p < 0.001].
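The reported percentages can be cross-checked against the F statistics. For an overall regression F test, R² = F·df₁/(F·df₁ + df₂), and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), with k predictors and n = df₁ + df₂ + 1 observations. The short sketch below applies these standard formulas to the F values and degrees of freedom reported for Models 1-3; it is an arithmetic check, not the authors' analysis code. Under these formulas the reported 71%, 63%, and 82% match adjusted R² rather than plain R², and the degrees of freedom imply n = 16 for Model 2.

```python
def r2_from_f(f: float, df_model: int, df_resid: int) -> float:
    """R^2 implied by an overall regression F test: F*df1 / (F*df1 + df2)."""
    return f * df_model / (f * df_model + df_resid)


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# (F, df_model, df_resid) as reported for Models 1-3; n = df_model + df_resid + 1
reported = {
    "Model 1 (word fluency)": (40.647, 1, 15),
    "Model 2 (pseudoword fluency)": (26.091, 1, 14),
    "Model 3 (comprehension)": (36.462, 2, 14),
}

for name, (f, k, df_resid) in reported.items():
    n = k + df_resid + 1
    r2 = r2_from_f(f, k, df_resid)
    print(f"{name}: n={n}, R2={r2:.2f}, adjusted R2={adjusted_r2(r2, n, k):.2f}")
# Model 1 (word fluency): n=17, R2=0.73, adjusted R2=0.71
# Model 2 (pseudoword fluency): n=16, R2=0.65, adjusted R2=0.63
# Model 3 (comprehension): n=17, R2=0.84, adjusted R2=0.82
```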

To test the robustness of the models, a second form of cross-validation was applied. The sample was randomly split into a 60% and a 40% subsample, and in each subsample the Pearson correlation between the predicted values and the observed dependent variable was computed and compared. For the first model, with word reading as the dependent variable, the correlation was r = 0.878, p < 0.001 in the 60% subsample and r = 0.870, p = 0.024 in the 40% subsample. Similar patterns were found for the second and third models, with pseudoword reading and reading comprehension as dependent variables: for pseudoword reading, the correlations were r = 0.805, p = 0.005 and r = 0.893, p = 0.017; for reading comprehension, they were r = 0.962, p = 0.002 and r = 0.910, p < 0.001. The high and nearly equal correlations suggest that the models are robust.
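The split-sample check can be sketched for a one-predictor model such as Model 1. The text does not state whether the regression was re-estimated on the 60% subsample; this sketch assumes it was, then correlates predicted with observed values in both subsamples. The function names and the synthetic data are hypothetical and do not reproduce the study's results. Note that for a single predictor with a positive fitted slope, the correlation between predicted and observed values equals the plain predictor-outcome correlation within each subsample, so the check amounts to asking whether the SSP-reading correlation is similar in both parts of the sample.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_simple_ols(x, y):
    """Intercept and slope of an ordinary least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def split_sample_check(x, y, train_frac=0.6, seed=0):
    """Fit on a random 60% subsample (assumption), then correlate predicted
    with observed values separately in the 60% and the remaining 40%."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    intercept, slope = fit_simple_ols([x[i] for i in train], [y[i] for i in train])

    def r_in(subset):
        predicted = [intercept + slope * x[i] for i in subset]
        return pearson_r(predicted, [y[i] for i in subset])

    return r_in(train), r_in(test)


# Synthetic illustration only (not study data): 17 simulated children.
rng = random.Random(42)
ssp = [rng.uniform(0, 100) for _ in range(17)]
reading = [0.5 * s + rng.gauss(0, 2) for s in ssp]
r60, r40 = split_sample_check(ssp, reading)
print(f"r (60% subsample) = {r60:.3f}, r (40% subsample) = {r40:.3f}")
```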

To test whether SSP specifically predicts reading or also predicts general cognitive abilities, a further regression analysis was conducted with IQ measured at T2 (IQ<sub>T2</sub>) as the dependent variable and SSP as the predictor. SSP explained a significant but smaller proportion of the variance in non-verbal IQ<sub>T2</sub> (35%; β = 0.63, p = 0.007) than in the reading variables.

## DISCUSSION

This longitudinal study aimed to determine how well a Morse-code-like SSP, assessed in preliterate preschool children aged 4–5 years, predicts reading performance 3 years later.

The results showed that SSP substantially predicted reading fluency and reading comprehension 3 years later. For the reading fluency measures, the other predictors (age, IQ, and LK) did not contribute significantly, and SSP was the sole predictor. Reading comprehension was best explained by SSP and age. The finding that SSP explained considerably less variance in non-verbal IQ at T2 than in reading is consistent with a reading-specific predictive effect. The amount of variance in reading 3 years later explained by SSP was remarkably high, ranging from 63 to 82% depending on the reading outcome. We suggest that the prediction is this accurate because SSP requires children to acquire completely new sound-symbol relations, which simulates a core aspect of natural reading acquisition. Good or poor SSP performance may result from stable or unstable associations of sound-symbol pairs in the learning part of SSP. This corresponds to the hypothesis of Blomert and Willems (2010) that letter-speech sound binding plays a causal role in learning to read. In line with this hypothesis and with our findings, Karipidis et al. (2018) showed that performance in an artificial letter training predicts reading. Furthermore, they demonstrated that the neural underpinnings of this learning are significantly related to later reading performance.

Predictor variables included in each initial model were age, letter knowledge, IQ, and sound-symbol paradigm (SSP); ∗∗p < 0.01, ∗∗∗p < 0.001.

Previous studies that used paradigms comparable to SSP in preliterate children found smaller effect sizes; however, these studies included additional predictor variables and predicted reading only 1 year later (Horbach et al., 2015: R² = 0.36; Gellert and Elbro, 2017: R² = 0.55). We also observed a stronger correlation between SSP and reading performance (r = 0.80 to r = 0.86) than Horbach et al. (2015), who used the same paradigm (r = 0.36). A possible explanation is that SSP works especially well with the currently addressed age group of 4- to 5-year-old children. The paradigm is designed to be learned easily, because the children have to learn only two sound-symbol associations. After the learning phase, they simply have to string together the sounds of the displayed symbols; unlike in the paradigm of Gellert and Elbro (2017), they are not required to blend phonemes into one another. These low demands may make the paradigm especially useful for young children.

SSP's ability to predict later reading is presumably partly due to its dynamic nature. A dynamic test aims to measure a child's potential to learn, in contrast to static assessments (e.g., PA, RAN, and LK), which measure the child's current attainment (Lidz, 1983, 1996). Previous studies have also demonstrated the superiority of dynamic over static measures in the prediction of reading (Petersen et al., 2016; Gellert and Elbro, 2017, 2018). As reading acquisition is a learning process, it seems plausible that paradigms that include a learning component explain additional variance in reading beyond the specific cognitive demands of the predictor measure. Furthermore, a dynamic measure avoids the influence of environmental support, which is an inherent limitation of static measures (Petersen et al., 2016).

In the current study, the static assessment of LK did not contribute to the explained variance of reading, although LK is regarded as one of the strongest predictors of reading before formal reading instruction starts (Scarborough, 1998; Hammill, 2004). We assume that children in our socioeconomically diverse sample did not have much opportunity to acquire LK in their home environment. However, this lack of LK does not imply a later reading disorder: children with limited literacy experience due to a low socioeconomic background are at risk of being overdiagnosed with a learning disability (Artiles et al., 2002). In addition, LK may play a more important role in older children. Most studies identifying LK as an important predictor assessed older children in their last kindergarten year, who were on average 6 years old. The 4- and 5-year-old children of the current sample were rarely familiar with letters, so floor effects could also have contributed to the poor predictive value of LK in our study. The problem of floor effects in early pre-reading measures is well known (Catts et al., 2008). The advantage of SSP is that children learn the associations directly in the test situation, so performance is independent of prior knowledge, age, or educational support. In line with the findings of Gellert and Elbro (2017), we conclude that the learning aspect of SSP is an essential part of the task and accounts for its strong predictive value for reading.

A further question of the study was whether SSP is appropriate for multilingual children. In many of Germany's day-care centers, children of various origins grow up together. A method that is equally suitable for monolingual and multilingual children allows a fair, language-independent assessment. Second-language learners often show linguistic delays compared to monolingual children (Schwippert et al., 2008), as they have fewer opportunities to build up sufficient skills in the language of their environment. It is therefore not surprising that children with a migration background scored significantly lower than monolingual children on the language-dependent measures PA and RAN (Weber et al., 2007). It was also found that PA did not contribute to the prediction of reading in second-language learners, whereas it was the strongest predictor in monolingual children (Duzy et al., 2013). Hence, it is unclear whether these language-dependent abilities predict reading in multilingual children as reliably as in monolingual children, or whether their use leads to false risk diagnoses (for an overview, see Cline and Shamsi, 2000). This problem could be avoided by using language-independent measures such as the SSP task. Elbro et al. (2012) showed in adult second-language learners that performance on their dynamic measure of decoding was able to differentiate dyslexic from non-dyslexic readers. In line with this, the current study demonstrated that the SSP performance of monolingual and multilingual children was comparable: no differences were detected in learning the new sound-symbol pairs or in serial processing. Thus, the language-independent nature of the task makes it equally appropriate for multilingual children.

#### Limitations of the Current Study

The high dropout at T2, 3 years after T1, led to a small sample size. The comparison of the reduced sample at T2 with the full sample at T1 showed no significant differences in any measured characteristic, so it is reasonable to assume that no differential attrition took place. Nevertheless, the results should be generalized with caution. Although the prediction effects were substantial despite the relatively small sample, the predictive value of SSP should be examined separately in monolingual and multilingual children in further studies with larger samples.

The common predictor measures PA and RAN were not assessed at T1 in order to avoid confronting multilingual children with language-dependent measures. In addition, no control condition was implemented. Therefore, this study cannot speak to the specificity of SSP. Previous studies showed that comparable paradigms share variance with PA and RAN but also contribute uniquely to the variance of reading performance (Horbach et al., 2015; Gellert and Elbro, 2017). That previous studies explained less variance in reading although they included more predictor measures suggests that the specific part of the variance explained by SSP is relatively high.

## CONCLUSION

The present study extends the finding of the current literature that SSPs can predict reading to the young age group of 4- to 5-year-old preschoolers. Future work will test this paradigm in less transparent languages such as English and explore its potential as a future assessment in the study of early reading development.

## AUTHOR CONTRIBUTIONS

JH, TG, and SH designed the study. FO and KW performed the experiments and recruited the participants. JH and FO performed the calculations. WS helped to analyze the data. JH wrote the manuscript. RR, TG, and SH supervised the study. All authors provided critical feedback and commented on the manuscript.

## FUNDING

This research was funded by the German Research Foundation (DFG, Deutsche Forschungsgemeinschaft, Grant GU-1177/1-1 and GU-1177/1-3).

letter-speech sound training. J. Learn. Disabil. doi: 10.1177/0022219417715407 [Epub ahead of print].



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Horbach, Weber, Opolony, Scharke, Radach, Heim and Günther. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Longitudinal Task-Related Functional Connectivity Changes Predict Reading Development

#### Gregory J. Smith<sup>1</sup> \*, James R. Booth<sup>2</sup> and Chris McNorgan<sup>1</sup>

<sup>1</sup> Department of Psychology, State University of New York at Buffalo, Buffalo, NY, United States, <sup>2</sup> Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, United States

Longitudinal studies suggest developmentally dependent changes in lexical processing during reading development, implying a change in inter-regional functional connectivity over this period. The current study used functional magnetic resonance imaging (fMRI) to explore developmental changes in functional connectivity across multiple runs of a rhyming judgment task in young readers (8–14 years) over an average span of 2.5 years. Changes in functional segregation are correlated with, and predict, changes in the skill with which typically developing children learn to apply the alphabetic principle, as measured by pseudoword decoding. This indicates that a developmental shift in the proportion of specialized functional clusters is associated with changes in reading skill, and suggests that reading development depends on changes in particular neural pathways; specifically, decreases in transitivity are indicative of greater network integration. This work provides evidence that characteristics of these pathways, quantified using graph-theoretic metrics, can be used to predict individual differences in reading development.

Keywords: functional connectivity, functional magnetic resonance imaging (fMRI), graph theory, reading network, reading development, longitudinal, neural connectivity

## INTRODUCTION

Reading is a multisensory task requiring audiovisual integration of printed characters and speech sounds. This entails coordinated contributions of multiple functionally specialized brain regions implicated in reading, collectively termed the reading network, including the ventral occipito-temporal cortex (vOT), middle and superior temporal gyri (MTG, STG), inferior frontal gyrus (IFG), angular gyrus (AG), and inferior parietal lobule (IPL). These regions are connected by the arcuate fasciculus (AF), the inferior fronto-occipital fasciculus (IFOF), and other dense white-matter tracts that facilitate communication among them (Paus et al., 1999; Pugh et al., 2001; Marchina et al., 2011). Analysis of the blood-oxygen-level-dependent (BOLD) signal in developing readers shows that changes over time in cortically distributed processes in these regions predict changes in reading fluency (McNorgan et al., 2011). Together, these findings suggest that connectivity plays an important role in reading performance and that reading development presumably entails changes in interregional connectivity.

A body of research provides evidence for developmentally dependent, diffuse structural and functional changes in processing within the reading network of developing readers. Notably, considerable research provides evidence for a developmental shift from phonological to orthographic processing in developing readers (Sprenger-Charolles et al., 2003; Booth et al., 2007; Blomert, 2011). Reading fluency depends on changing anatomical connectivity between some of these regions: diffusion tensor imaging (DTI) of a white-matter tract passing through a core multisensory hub (AF) predicted cortical activity during an audiovisual phonological judgment task (Gullick and Booth, 2014), with the direct segment of the AF predicting longitudinal changes in reading development (Gullick and Booth, 2015). This shift is presumably accompanied by changes in how regions supporting phonological and orthographic processing communicate with each other during reading. Indeed, recent research using a rhyming judgment task supports this, showing that a decrease in connectivity of the dorsal stream, which is responsible for phonological processing, is important for the emergence of orthographic processing dependent on the ventral stream (Younger et al., 2017). Comparisons of task-related functional connectivity show disparate regional activation in children and adults, with children showing greater temporal functionality associated with phonological processing, and adults showing greater task-related sensitivity and reliance on occipital regions associated with orthographic processing (Liu et al., 2018). This provides convergent functional evidence for developmentally dependent changes across the reading network.

#### Edited by:

Silvia Brem, Psychiatrische Klinik der Universität Zürich, Switzerland

#### Reviewed by:

Mohamed L. Seghier, Emirates College for Advanced Education, United Arab Emirates

Tzipi Horowitz-Kraus, Cincinnati Children's Hospital Medical Center, United States

> \*Correspondence: Gregory J. Smith gjsmith4@buffalo.edu

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 31 March 2018 Accepted: 30 August 2018 Published: 19 September 2018

#### Citation:

Smith GJ, Booth JR and McNorgan C (2018) Longitudinal Task-Related Functional Connectivity Changes Predict Reading Development. Front. Psychol. 9:1754. doi: 10.3389/fpsyg.2018.01754

Connectivity studies further show that the nature of developmental changes in the reading network is related to reading outcomes. Greater dorsal striatum connectivity is associated with poorer adult reading performance, as measured using a pseudoword decoding task, which may reflect inefficient lexical processing (Achal et al., 2016). This is consistent with research showing less efficient processing (e.g., a high degree of integrated functional connectivity across a more diffuse network) within regions associated with a narrative comprehension task in children with reading difficulties compared to the task-dependent regions of normal readers (Horowitz-Kraus et al., 2016). Whole-brain functional connectivity analysis in individuals learning English as a second language (ESL) indicates more localized clusters during an English phonological rhyming task in reading-impaired second-language learners than in normal controls (Liu et al., 2016). In an analysis of a functionally defined reading network, similar to that in the present study, Wang et al. (2013) found that greater interaction between distal, hierarchically segregated network clusters is associated with better rhyming judgment performance. Together, these results suggest that language processing benefits from cooperative, integrative processing among processing centers in the reading network.

Collectively, these findings suggest that improvements in reading skill are predicted by the nature and degree of changes in connectivity patterns within the reading network. They are also consistent with general theories of neural development that implicate synaptic pruning and myelination as mechanisms that optimize processing efficiency by reducing the noise believed to be associated with excess connectivity (Huttenlocher, 1979; Tau and Peterson, 2010; Navlakha et al., 2015). Not all cognitive processes are likely to depend equally on the same structural connections between cortical and subcortical regions; nonetheless, changes in anatomical connectivity should be reflected in changes in functional connectivity, which describes patterns of interregional signaling during a functional MRI task. The present study capitalizes on this dynamic relationship between structural and functional connectivity by exploring how measures of task-related functional connectivity explain individual differences in reading ability, and thereby identifying candidate connectivity-related mechanisms on which reading development relies.

Our longitudinal study aims to identify the changes in patterns of functional connectivity among core nodes in the reading network that predict developmental changes in single word reading skill. The literature discussed above suggests several potential connectivity-dependent mechanisms within the reading network that are predictive of reading skill. In isolation, each of these findings provides insight into the dependence of reading on the development of particular constituents of this network. Though the existing literature clearly shows that connectivity strength among some of these regions is an important determinant of reading skill, we use graph-theoretic metrics that quantify how, rather than how much, the global reading network is connected, to test the hypothesis that reading skill depends on the degree to which these anatomically distributed and functionally specialized processing centers work in concert. For the analyses that follow, single word reading skill was assessed using the pseudoword decoding efficiency (PDE) subtest of the Test of Word Reading Efficiency (TOWRE). Performance for word and non-word reading has been shown to be highly correlated (Barker et al., 1992); however, pseudoword decoding is argued to be a more reliable measure of reading development for early readers (Curtis, 1980; Gough, 1983), because unknown words are equivalent to nonwords and require an online integration of orthographic and phonemic representations that is less influenced by memorized vocabulary (Seidenberg et al., 1994). Moreover, McNorgan et al. (2011) found that developmental changes in PDE were predicted by changes in the reading network over the same period, establishing the sensitivity of this measure of reading skill to developmental changes in neural dynamics. In that study, the authors noted a developmental shift from reliance on predominantly orthographic processing (e.g., sight word recognition) to interactions between phonological and orthographic processing regions as skilled readers develop.

We quantify changes in network connectivity characteristics over time using task-related fMRI data collected at two time points from young readers (aged 8–14 and 10–17 years, respectively). The fMRI data were acquired while participants made rhyming judgments about visually presented words, a task expected to engage the neural systems that underlie orthographic and phonological processing and the interactions between them. We chose this task over resting-state MRI (rs-MRI) because we are interested in changes in the dynamics of reading-related neural activity rather than in general changes in brain organization. Moreover, because we are interested in network-wide changes in connectivity, we use global measures of functional connectivity rather than seed-based correlational approaches, such as psychophysiological interaction (PPI) analyses, that measure connectivity to and from a specific region. Interregional functional connectivity matrices are derived from cross-correlated time series drawn from task-sensitive cortical regions for concatenated runs at both time points. Graph-theoretic metrics are applied to the connectivity matrices to quantify changes in regional specialization in terms of functional segregation, which reflects specialized processing clusters within interconnected networks, and integration, a class of measures quantifying the synthesis of distributed information processing. Reading entails the transformation of visually acquired orthographic representations into phonological representations, which requires the interaction of anatomically disparate and functionally specialized processing regions; we selected these network measures because they quantify this coordination and specialization of processing. The developmental changes in these connectivity patterns, which collectively reflect the presence of clusters and the propensity for information transfer, are then used to predict changes in reading skill, as assessed by standardized testing of pseudoword decoding at both time points (Torgesen et al., 1999). The literature discussed above suggests that more integrative processing within the reading network is associated with better reading performance, and this increased integrative processing is hypothesized to be reflected in changes in functional connectivity patterns. Accordingly, we use graph-theoretic connectivity measures to quantify not just the overall strength of connections within the reading network, but also the propensity of its constituent processing nodes to function interdependently versus with relative independence. Longitudinal changes in these measures that reflect a transition to more interdependent processing are predicted to be associated with greater improvements in reading skill.

## MATERIALS AND METHODS

## Participants

Nineteen right-handed native English speakers from the Chicago metropolitan area, aged 8–14 years (10 females) at the first scanning session (T1) and 10–17 years at the second scanning session (T2), participated in accordance with the Institutional Review Board of Northwestern University. The average interval between T1 and T2 scans was 2.5 years. Participant screening (described below) ensured that all children had average to above-average verbal and non-verbal intelligence scores. Parents reported that all children were free of neurological diseases and psychiatric disorders, had no history of intelligence, reading, attention, or oral language deficits, and were not taking medication affecting the central nervous system.

## Standardized Testing

A battery of standardized tests was administered at T1 and T2 to screen participants and establish reading skill for the analyses that follow. Participant characteristics are summarized in **Table 1**. Verbal and non-verbal (performance) IQ were assessed with the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 1999). Reading-related phonological processing was assessed using the elision and blending words (BW) subtests of the Comprehensive Test of Phonological Processing (CTOPP; Wagner et al., 2009), and word and pseudoword reading skill was measured with the TOWRE (Torgesen et al., 1999) sight word efficiency (SW) and PDE subtests and the Woodcock-Johnson III (WJ-III; Woodcock et al., 2001) Letter-Word Identification (WID) subtest. Raw PDE scores were used as our measure of reading skill because our previous work (McNorgan et al., 2011) found this measure to be sensitive to longitudinal changes in absolute (rather than peer-relative) reading ability.

## Rhyming Judgment Task

The rhyming judgment task was carried out in the MRI scanner and required participants to make rhyming judgments on 96 word pairs (24 per condition) during lexical trials, with words presented visually in the center of a screen. Each word was presented for 800 ms, followed by a 200 ms inter-stimulus interval (ISI). Immediately after the second word was presented, a red fixation cross signaled participants to respond within the next 2600 ms; responses outside of that time window were counted as errors. Participants were instructed to press a button with their index finger for rhyming pairs and a button with their middle finger for non-rhyming pairs using an optical response keypad. Orthography and phonology were manipulated such that word pairs matched or conflicted on one or both dimensions, resulting in a total of four conditions. In two conditions, both orthography and phonology matched (O+P+, e.g., dime–lime) or both conflicted (O-P-, e.g., staff–gain). In a third condition, orthography matched while phonology conflicted (O+P-, e.g., pint–mint), and in the fourth condition, phonology matched while orthography conflicted (O-P+, e.g., jazz–has). Response time (RT) and accuracy (ACC) were recorded for each trial.

Baseline data were collected with 24 fixation trials per run, during which participants pressed a button when the fixation cross displayed in the center of the screen changed color (from red to blue). Each run also included 12 trials of a perceptual condition that served as a baseline for an unrelated study. In these trials, two sets of three non-alphabetic glyphs were presented sequentially. The glyphs in each set increased, decreased, or remained constant in height, and participants indicated via button press whether the two glyph sequences matched (e.g., both increasing in height) or mismatched (e.g., one decreasing, the other increasing in height).

## Procedure

After written informed consent was obtained and standardized testing was completed, and in advance of the fMRI scan, participants were trained in a practice mock scan to keep their heads still, using feedback from an infrared head-position tracker. Participants also completed a practice rhyming task in the mock scanner to ensure that they understood the task. The T1 scanning session took place within 1 week of the practice session. T1 and T2 scanning sessions occurred approximately 2½ years apart, with the precise interval used as a nuisance regressor in the analyses that follow.

## MRI Data Acquisition

Images were acquired using a 3T Siemens Trio MRI scanner with a standard 16-channel head coil at the Northwestern University Center for Translational Neuroimaging. Foam padding was used to reduce head movements. BOLD functional images were acquired in an interleaved sequence from bottom to top using echo planar imaging (EPI) with the following parameters: TE = 20 ms, flip angle = 80°, matrix size = 128 × 120, field of view = 220 mm × 206.25 mm, slice thickness = 3 mm (0.48 mm gap), number of slices = 32, and TR = 2000 ms. Scanning sessions consisted of two functional runs, each approximately 6 min 44 s in duration. A high-resolution T1-weighted 3D structural image was acquired using the functional image orientation with the following parameters: TR = 1570 ms, TE = 3.36 ms, matrix size = 256 × 256, field of view = 240 mm, slice thickness = 1 mm, and number of slices = 160.

TABLE 1 | Means and standard deviations for verbal IQ (VIQ), CTOPP measures elision and blending words (BW), TOWRE sight word efficiency (SW), and WJ-III letter-word identification (WID).

#### Image Analysis

Imaging data were analyzed using the FreeSurfer (version 5.1.0¹) software suite (Fischl, 2012). The anatomical volume was segmented into white matter and gray matter volumes and rendered as a 3D structural surface mesh. Anatomical regions were partitioned using an automated parcellation of the gyri and sulci and mapped onto a template cortical surface mesh (Destrieux et al., 2010). Functional images were co-registered with the 3D anatomical surface for each subject and mapped onto a common structural template for group analysis with a voxel size of 3 mm³.

We conducted a subject-level general linear model (GLM) analysis of the four lexical, fixation, and perceptual conditions in the template surface space, including motion parameters as nuisance regressors. Separate GLM analyses were carried out for each study time point. For each subject, a contrast between lexical and fixation conditions evaluated the main effect of the lexical conditions. Group-level random-effects analyses evaluated the lexical vs. fixation contrast at each study time point across all participants. Group-level activation maps were thresholded at p < 0.05 using voxel-wise false discovery rate (FDR) correction to isolate significant clusters demonstrating greater lexical than fixation activity, as shown in **Figure 1** and **Table 2**.
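The voxel-wise FDR step can be illustrated with the Benjamini–Hochberg procedure. The sketch below is a minimal, generic implementation that assumes the group-level p-values are available as a flat array; the function name `fdr_bh_mask` is hypothetical, and the FDR implementation actually used in the study may differ in details such as the handling of signed or one-sided tests.

```python
import numpy as np

def fdr_bh_mask(pvals, q=0.05):
    """Boolean mask of tests that survive Benjamini-Hochberg FDR control at level q."""
    p = np.asarray(pvals, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)                      # indices that sort p in ascending order
    ranked = p[order]
    crit = np.arange(1, m + 1) / m * q         # BH critical values k/m * q for ranks k = 1..m
    below = ranked <= crit
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.max(np.nonzero(below)[0])   # largest rank that meets its criterion
        mask[order[: k_max + 1]] = True        # reject every test up to and including that rank
    return mask.reshape(np.shape(pvals))

# Usage on a handful of hypothetical voxel-wise p-values
print(fdr_bh_mask([0.01, 0.04, 0.03, 0.2]))   # [ True False False False]
```

Note that BH rejects every test ranked at or below the largest passing rank, even if an intermediate p-value exceeds its own critical value.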

The group-level conjunctions of clusters demonstrating a significant positive lexical vs. fixation contrast at both time points were used to create functional regions of interest (fROIs) for the subsequent connectivity analyses. In this way, we identified a stable core orthographic decoding network, that is, regions of the cortex that were reliably more active during the lexical condition at both study time points. Notably, the clusters appearing in the intersection of the T1 and T2 contrast maps varied in size and often spanned multiple anatomical regions. Because larger regions, or those spanning multiple anatomical landmarks, are more likely to include functionally distinct populations, we increased distinctiveness and homogeneity among our fROIs in a two-step process. In the first step, we projected the conjunction map onto the FreeSurfer template surface and overlaid the anatomical region boundaries from the Desikan-Killiany atlas (Desikan et al., 2006). Clusters spanning multiple anatomical regions were partitioned at the atlas-defined boundaries to produce subclusters that were each restricted to a single anatomical region. In the second step, we further subdivided each subcluster into regions of roughly equal size using the FreeSurfer mris\_divide\_parcellation utility, which divides a surface-based region perpendicular to its longest axis into roughly equally sized subdivisions covering a specified area. This procedure produced 99 fROIs, each covering approximately 400 mm² of cortical surface (**Figure 2**). Thus, the functionally defined reading network comprised 99 nodes that were approximately equal in size, were demonstrably more involved in the phonological decoding of printed words than a baseline task at both time points, and were individually restricted to single anatomical regions to increase functional homogeneity within each node.
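The second subdivision step can be sketched geometrically. The code below is a hypothetical approximation, not the algorithm inside `mris_divide_parcellation`: it assumes each subcluster is given as vertex coordinates plus a per-vertex surface area, takes the longest axis to be the first principal component of the vertex coordinates, and cuts perpendicular to that axis so that each part holds roughly `target_area` mm² (400 mm² here, matching the approximate fROI size reported above). The function name `divide_region` is illustrative.

```python
import numpy as np

def divide_region(coords, vertex_area, target_area=400.0):
    """Split one surface cluster into roughly equal-area parts along its longest axis.

    coords: (n_vertices, 3) vertex coordinates in mm.
    vertex_area: (n_vertices,) surface area attributed to each vertex, in mm^2.
    Returns an integer label (0 .. n_parts - 1) for each vertex.
    """
    coords = np.asarray(coords, dtype=float)
    area = np.asarray(vertex_area, dtype=float)
    total = area.sum()
    # Number of parts: total area / target, rounded (Python rounds halves to even), at least 1
    n_parts = max(1, int(round(total / target_area)))
    # Longest axis = first principal component of the centered coordinates
    centered = coords - coords.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]
    order = np.argsort(proj)
    # Cumulative area along the axis (midpoint of each vertex's share); cuts are perpendicular to it
    cum = np.cumsum(area[order]) - area[order] / 2.0
    part = np.minimum((cum / total * n_parts).astype(int), n_parts - 1)
    labels = np.empty(len(area), dtype=int)
    labels[order] = part
    return labels

# Toy "region": 80 vertices along a line, 10 mm^2 each (800 mm^2 in total)
coords = np.column_stack([np.arange(80.0), np.zeros(80), np.zeros(80)])
area = np.full(80, 10.0)
print(np.bincount(divide_region(coords, area)))   # [40 40]
```

Cutting on cumulative area rather than on distance along the axis keeps the parts roughly equal in surface area even when vertices are unevenly spaced.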

## Functional Connectivity Analysis

The BOLD time series, averaged across the voxels within the surface region defined by each fROI, was extracted for both functional runs at each of the T1 and T2 time points (four time series matrices in total). At each time point, the Run 1 and Run 2 time series were concatenated to yield a single time series. Inter-regional functional connectivity was then estimated at each time point as the pairwise zero-lag cross-correlation between each pair of fROIs, producing two 99 × 99 connectivity matrices, **M**, where each entry **M**ᵢⱼ provides the estimated functional connectivity between regions i and j. We computed common measures of functional integration and segregation within the functional connectivity matrices at T1 and T2 using the Brain Connectivity Toolbox (BCT; Rubinov and Sporns, 2010).
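The extraction-and-correlation step amounts to stacking the runs and computing pairwise Pearson correlations. This is a minimal sketch assuming each run is a (volumes × regions) array of fROI-averaged BOLD signal. Demeaning each run before concatenation and zeroing the diagonal are assumptions of this sketch rather than documented steps of the study, the function name is hypothetical, and the random arrays are placeholders, not study data (202 volumes per run is only an approximation from the reported run length of about 404 s at TR = 2 s).

```python
import numpy as np

def concatenated_connectivity(runs):
    """Zero-lag Pearson correlation matrix computed from concatenated runs.

    runs: list of (n_volumes, n_regions) arrays, one per functional run.
    """
    # Assumption: demean each run per region so run-to-run baseline offsets
    # do not masquerade as shared fluctuations after concatenation.
    demeaned = [r - r.mean(axis=0, keepdims=True) for r in runs]
    ts = np.vstack(demeaned)                  # (total_volumes, n_regions)
    m = np.corrcoef(ts, rowvar=False)         # correlation between every pair of regions
    np.fill_diagonal(m, 0.0)                  # drop self-connections before graph analysis
    return m

rng = np.random.default_rng(0)
runs = [rng.standard_normal((202, 99)) for _ in range(2)]   # placeholder noise, two runs
M = concatenated_connectivity(runs)
print(M.shape)   # (99, 99)
```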

Functional integration reflects a network's capacity for information transfer and was evaluated using Global Efficiency (GE) and Diameter. GE is the average of the inverse shortest path lengths between all pairs of nodes; higher GE indicates a network composed of clusters connected via shorter paths, allowing more efficient processing. Diameter is the longest of the shortest paths between any two nodes and reflects the breadth of the network.

Frontiers in Psychology | www.frontiersin.org

¹http://surfer.nmr.mgh.harvard.edu

Functional segregation was measured by Transitivity and Modularity. Functional segregation, as the name implies, is the division of a larger network into smaller, functionally specialized subnetworks. Functional subnetworks are evident among a cluster of nodes whose activity is less correlated with that of the rest of the network, suggesting that those nodes are engaged in a different processing task (Fornito et al., 2012). Transitivity is the ratio of triangles to connected triples in the network: it measures how often two neighbors of a node are also connected with each other, forming cliques. A high degree of transitivity is indicative of a dense, interconnected network, or of densely connected clusters within a network, whereas a low degree of transitivity is indicative of a sparser network organization with less functional specialization. Modularity is an indicator of the degree to which the network may be subdivided into clearly delineated and non-overlapping groups. Higher modularity indicates a more clustered network organization. These concepts are illustrated in **Figure 3**, which demonstrates extreme examples of changes in transitivity and modularity. Additional information about each of these measures can be found in Table A1 in Rubinov and Sporns (2010).
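The four graph metrics can be sketched for a binary, undirected graph. This is an illustration under stated assumptions, not the BCT code: the study does not specify whether the correlation matrices were thresholded or analyzed as weighted graphs, so the hypothetical `binarize` step and its threshold are assumptions, and the BCT also provides weighted variants of these measures. Modularity is computed here for a supplied partition; BCT community-detection routines additionally search for the partition that maximizes Q.

```python
from collections import deque
import numpy as np

def binarize(conn, threshold):
    """Adjacency matrix keeping edges whose correlation exceeds `threshold` (assumed rule)."""
    a = (np.asarray(conn) > threshold).astype(int)
    np.fill_diagonal(a, 0)
    return np.maximum(a, a.T)                   # enforce an undirected graph

def shortest_paths(a):
    """Unweighted shortest path lengths by breadth-first search (inf if unreachable)."""
    n = len(a)
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(a[u]):
                if dist[s, v] == np.inf:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist

def global_efficiency(a):
    """Mean inverse shortest path length over all ordered node pairs (1/inf counts as 0)."""
    d = shortest_paths(a)
    off = ~np.eye(len(a), dtype=bool)
    return float(np.mean(1.0 / d[off]))

def diameter(a):
    """Longest finite shortest path in the graph."""
    d = shortest_paths(a)
    return float(d[np.isfinite(d)].max())

def transitivity(a):
    """Ratio of triangles to connected triples: trace(A^3) / sum_i k_i (k_i - 1)."""
    a = np.asarray(a, dtype=float)
    a2 = a @ a
    closed = np.trace(a2 @ a)                   # 6 x number of triangles
    triples = a2.sum() - np.trace(a2)           # sum of k_i^2 minus sum of k_i
    return float(closed / triples) if triples > 0 else 0.0

def modularity(a, labels):
    """Newman's Q for a given partition of the nodes."""
    a = np.asarray(a, dtype=float)
    k = a.sum(axis=1)
    two_m = k.sum()
    same = np.equal.outer(labels, labels)       # True where nodes share a module
    return float(((a - np.outer(k, k) / two_m) * same).sum() / two_m)

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3
two_triangles = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    two_triangles[i, j] = two_triangles[j, i] = 1
labels = np.array([0, 0, 0, 1, 1, 1])
print(round(transitivity(two_triangles), 3))        # 0.6
print(round(modularity(two_triangles, labels), 3))  # 0.357
```

On this toy graph of two densely connected, weakly linked clusters, both transitivity and modularity are relatively high, while a simple three-node chain has zero transitivity because it contains no triangles.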

Changes in functional connectivity between the two study time points were measured within each participant by the difference (Δ) between the connectivity measures at the two scanning sessions (T2 − T1). Differences in measures of reading skill were likewise calculated to assess changes in reading skill over the same period. A linear regression model assessed whether changes in measures of functional integration or segregation accounted for changes in measures of reading skill.
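The change-score regression can be sketched with ordinary least squares. The helper names are hypothetical, and the numbers in the usage lines are made up for illustration; they are not study data. The study's actual model also included nuisance predictors, which are omitted here.

```python
import numpy as np

def change_scores(t1, t2):
    """Within-participant change scores (T2 - T1)."""
    return np.asarray(t2, dtype=float) - np.asarray(t1, dtype=float)

def ols(y, *predictors):
    """OLS with an intercept; returns (coefficients, R^2). Coefficient 0 is the intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y)] + [np.asarray(p, dtype=float) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, float(r2)

# Hypothetical values for four participants (not study data)
d_pde = change_scores([20, 30, 40, 50], [35, 40, 45, 52])
d_trans = change_scores([0.50, 0.45, 0.40, 0.42], [0.40, 0.40, 0.39, 0.42])
beta, r2 = ols(d_pde, d_trans)
print(beta, r2)
```

In these made-up numbers, larger drops in transitivity accompany larger PDE gains, so the slope on ΔTransitivity comes out negative, the direction reported in the Results.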

## RESULTS

## Behavioral Analysis

Means and standard deviations for RT and accuracy on lexical trials at T1 and T2, and the change in our measure of reading skill (PDE), are presented in **Table 3**. ΔSW and ΔPDE were significantly and positively correlated (r = 0.70, p = 0.001; uncorrected), consistent with their mutual dependence on orthographic and phonological integration and indicating that these skills developed in parallel in our sample. PDE scores ranged from 19 to 63 (out of a possible 63) at T1 and from 31 to 59 at T2.

## General Linear Model Analysis

Though evaluating developmental changes in regional activity during single-word reading was not a central goal of this study, cortical surface contrast maps were generated to identify the reading network at T1 and T2. This afforded the opportunity to compare activation maps within participants to identify developmental changes in fMRI activity for this task. We performed a random-effects within-subjects t-test of T1 vs. T2 on the Lexical vs. Fixation contrast. **Figure 4** shows a group-level map of cluster-size corrected (FWE) significant activation differences in this contrast, with a per-voxel significance threshold of p < 0.001. Peak voxel statistics are presented in **Table 4**. Overall, T1 was associated with more activation in left hemisphere phonological processing areas (STG; IFG), and T2 was associated with more activation in orthographic processing areas (FG).
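The within-subjects T1 vs. T2 comparison reduces, at each voxel, to a paired t-test on per-subject contrast estimates. This is a minimal sketch assuming (subjects × voxels) arrays of lexical vs. fixation contrast values; the function name is hypothetical, and the cluster-size correction applied to the map in Figure 4 is not shown. If SciPy is available, `scipy.stats.ttest_rel` computes the same statistic.

```python
import numpy as np

def paired_t(con_t1, con_t2):
    """Voxel-wise paired (within-subjects) t statistic for T2 - T1.

    con_t1, con_t2: (n_subjects, n_voxels) contrast estimates at each time point.
    Returns (t, df); positive t means larger contrast values at T2.
    """
    d = np.asarray(con_t2, dtype=float) - np.asarray(con_t1, dtype=float)
    n = d.shape[0]
    se = d.std(axis=0, ddof=1) / np.sqrt(n)    # standard error of the mean difference
    return d.mean(axis=0) / se, n - 1

rng = np.random.default_rng(0)
t1 = rng.standard_normal((19, 1000))           # placeholder data: 19 subjects, 1000 voxels
t2 = rng.standard_normal((19, 1000))
t, df = paired_t(t1, t2)
print(t.shape, df)                             # (1000,) 18
```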

#### TABLE 2 | Coordinates for regions demonstrating a significant task vs. baseline activation contrast at both time points (shown in Figure 1).


#### Functional Connectivity Analysis

An initial description of the relationship between connectivity and reading skill was obtained from uncorrected zero-order Pearson correlations between our measure of reading skill and each of the network metrics. Transitivity at T1 was not significantly correlated with PDE at T1 (r = 0.07, p = 0.79), but Transitivity at T2 was correlated with PDE at T2 (r = 0.48, p = 0.04). Thus, over the same period in which reading scores increased, the correlation between transitivity and reading scores also increased. Pearson correlations calculated between T2 − T1 changes (Δ) in functional connectivity (ΔGE, ΔDiameter, ΔTransitivity, and ΔModularity) and in reading skill (ΔPDE) indicated a significant relationship between the change in Transitivity and the change in PDE (r = −0.49, p = 0.03; uncorrected), as shown in **Figure 5**. Because transitivity is an index of functional segregation, this indicates that an increase in reading skill is accompanied by a reduction in the proportion of functionally distinct clusters of activity. In short, the children who made the greatest gains in reading skill were those whose reading network made the largest transition from a collection of relatively independently functioning nodes to a coherent (i.e., unified) network.

FIGURE 3 | … non-overlapping groups. Note, both changes may result in more efficient processing.

We assessed these relationships more rigorously by controlling for nuisance variables, including age, verbal ability (VIQ), phonological ability (Elision, BW), general letter and word recognition ability (WID, SW), and task performance (RT, ACC), at T1 and T2. Partial correlations between Transitivity and PDE at T1 (r = −0.41, p = 0.209) and at T2 (r = 0.10, p = 0.771) were non-significant, indicating that the significant zero-order correlation at T2 was largely driven by variables of non-interest. Our reanalysis of the relationships between developmental changes in connectivity and reading ability employed a stepwise linear regression. Nuisance variables, including age at enrollment and T2 − T1 changes in verbal ability (ΔVIQ), age (ΔAge), phonological ability (ΔElision, ΔBW), general letter and word recognition ability (ΔWID, ΔSW), and task performance (ΔRT, ΔACC), were entered first (F(1, 17) = 15.92, R² = 0.48, adjusted R² = 0.45, p < 0.01), before all four functional connectivity measures were entered into the model predicting changes in reading skill (ΔPDE). The final model included ΔSW (η²ₚ = 0.72, p = 0.001) and ΔTransitivity (η²ₚ = −0.55, p = 0.019) as significant predictors of ΔPDE, F(2, 16) = 14.14, R² = 0.64, adjusted R² = 0.59, ΔR² = 0.16, p < 0.001. No other variables were significant predictors of changes in PDE. These results indicate that, apart from changes in sight word reading, with which PDE has long been known to be highly correlated (Barker et al., 1992), changes in our measure of reading skill were predicted only by changes in transitivity within the reading network.
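Partial correlations and the ΔR² of a hierarchical regression step can both be computed by least-squares residualization. This sketch assumes covariates are supplied as an (n_subjects × n_covariates) array or a single 1-D covariate; it does not reproduce the stepwise selection used in the study, and the helper names are hypothetical. For a partial correlation with k covariates, the residual degrees of freedom are n − 2 − k.

```python
import numpy as np

def _design(n, covariates):
    """Intercept column plus covariate columns."""
    cov = np.asarray(covariates, dtype=float).reshape(n, -1)
    return np.column_stack([np.ones(n), cov])

def residualize(v, covariates):
    """Residuals of v after least-squares regression on an intercept plus covariates."""
    v = np.asarray(v, dtype=float)
    X = _design(len(v), covariates)
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

def partial_corr(x, y, covariates):
    """Pearson r between x and y after removing linear effects of the covariates."""
    rx, ry = residualize(x, covariates), residualize(y, covariates)
    return float(np.corrcoef(rx, ry)[0, 1])

def r_squared(y, predictors):
    """R^2 of an OLS model with an intercept."""
    y = np.asarray(y, dtype=float)
    resid = residualize(y, predictors)
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

def delta_r2(y, nuisance, added):
    """Increase in R^2 when `added` predictors join a nuisance-only model."""
    n = len(y)
    nuisance = np.asarray(nuisance, dtype=float).reshape(n, -1)
    added = np.asarray(added, dtype=float).reshape(n, -1)
    return r_squared(y, np.hstack([nuisance, added])) - r_squared(y, nuisance)

# Toy check: a is orthogonal to both the intercept and z
z = np.arange(8.0)
a = np.array([1, -1, -1, 1, 1, -1, -1, 1], dtype=float)
print(partial_corr(z + a, 3 * z + 2 * a, z))   # close to 1.0: only the shared a-component remains
```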

TABLE 3 | Means and standard deviations at T1 and T2 for reaction time (RT) and accuracy (ACC) on lexical trials, the change in the measure of reading skill, pseudoword decoding efficiency (PDE), and the change in transitivity.

PDE performance was significantly improved at T2 compared to T1. There was no significant change in transitivity between T1 and T2.

… medial surfaces of the left (A,C) and right (B,D) hemispheres.

TABLE 4 | Coordinates for size-corrected clusters (per-voxel p < 0.001) demonstrating significant T1 vs. T2 task-related activation differences (shown in Figure 4).

Though we took measures to restrict our analyses to brain regions that were preferentially engaged in reading at both time points, the connectivity analyses described above were performed on functional connectivity estimates computed from two intact concatenated time series. Because the experiment used a fast event-related design, both time series contained intermittent periods (e.g., the perceptual and fixation trials) during which participants were not actively engaged in reading. This introduces the possibility that the connectivity metrics within the functionally defined reading network were influenced by time series correlations associated with non-reading activity. Given the design of the experiment, it is impossible to isolate segments of the intact time series that can be confidently ascribed to reading or non-reading processing.

Rather, the best available means of addressing this concern was to first model the fixation and perceptual trials with the GLM and save the residuals as our best estimate of what the time series would look like had these conditions been excluded from the design. We then computed connectivity statistics on functional connectivity estimates derived from the residualized time series. When we did so, ΔTransitivity (η²ₚ = −0.40, p = 0.09) remained a marginally significant predictor of ΔPDE.
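The residualization step can be sketched as fitting a GLM that contains only the non-lexical (fixation and perceptual) regressors and keeping the residuals. This is a minimal sketch under assumptions: the HRF shape (an SPM-style double gamma, peaking near 5 s with an undershoot near 15 s), the 0.1 s modeling grid, and the event timings below are illustrative, and the function names are hypothetical. The study's GLM also included motion regressors and was fit in FreeSurfer, which may use a different HRF model.

```python
import math
import numpy as np

def double_gamma_hrf(dt=0.1, length=32.0):
    """SPM-style double-gamma HRF sampled every dt seconds, normalized to unit sum."""
    t = np.arange(0.0, length, dt)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)        # gamma(shape=6): mode at 5 s
    under = t ** 15 * np.exp(-t) / math.gamma(16)     # gamma(shape=16): mode at 15 s
    h = peak - under / 6.0
    return h / h.sum()

def event_regressor(onsets, durations, tr, n_vols, dt=0.1):
    """Boxcar of events on a fine grid, convolved with the HRF, sampled once per TR."""
    n_fine = int(round(n_vols * tr / dt))
    box = np.zeros(n_fine)
    for on, dur in zip(onsets, durations):
        start = int(round(on / dt))
        stop = max(start + 1, int(round((on + dur) / dt)))
        box[start:stop] = 1.0
    conv = np.convolve(box, double_gamma_hrf(dt))[:n_fine]
    step = int(round(tr / dt))
    return conv[::step][:n_vols]

def residualize_timeseries(ts, regressors):
    """Remove an intercept and the given regressors from each column of ts (volumes x regions)."""
    X = np.column_stack([np.ones(ts.shape[0]), *regressors])
    beta, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return ts - X @ beta

tr, n_vols = 2.0, 60
fix = event_regressor([10.0, 50.0], [4.4, 4.4], tr, n_vols)    # hypothetical fixation trials
perc = event_regressor([30.0, 80.0], [4.4, 4.4], tr, n_vols)   # hypothetical perceptual trials
rng = np.random.default_rng(0)
ts = rng.standard_normal((n_vols, 99))                         # placeholder fROI time series
residual_ts = residualize_timeseries(ts, [fix, perc])
```

The residuals are, by construction, orthogonal to the non-lexical regressors; connectivity would then be recomputed from `residual_ts` exactly as for the intact series.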

## DISCUSSION

We hypothesized that changes in reading skill should be reflected in changes in the dynamics of processing in the reading network; specifically, that changes indicative of more efficient processing would predict improvements in reading performance. We explored this hypothesis by examining longitudinal changes in neural connectivity within a functionally defined reading network in young readers. This research provides evidence that improvements in reading skill over time are predicted by the nature and degree of changes in connectivity patterns within the reading network. This is consistent with broader theories of neural development holding that cognitive performance improves as a result of connectivity that is optimized throughout the lifespan (Casey et al., 2005; Bassett et al., 2009), and with connectionist models of reading development that propose a shift in interregional dependency within the reading network (Koyama et al., 2011; Liu et al., 2018).

Lending critical support to our hypothesis, the hierarchical model showed that changes in functional segregation, specifically transitivity, predicted changes in pseudoword decoding. Transitivity is an index of functional segregation: large transitivity values indicate that a network contains many embedded cliques, each working somewhat independently of the others. Thus, a decrease in transitivity indicates a developmental decrease in the proportion of such clusters. Stated conversely, a reduction in transitivity reflects an overall increase in processing coherence among these regions, such that the nodes within the reading network work less independently and more cooperatively. Because a decrease in transitivity predicted an improvement in reading skill, this suggests that the unification of the reading network through cooperative processing is a critical driver of gains in reading skill. Transitivity has been identified as an important component of neural connectivity (Zalesky et al., 2012; Baronchelli et al., 2013; Goñi et al., 2014) and has been used in previous work to distinguish dysfunctional from normal processing, such as in schizophrenia (Anderson and Cohen, 2013) and Alzheimer's disease (Stam et al., 2006); however, this work appears to be the first to apply transitivity to task-related behavior.

We note here an apparent inconsistency between the positive zero-order correlation between PDE and Transitivity at Time 2 and the significant negative zero-order correlation between the change in PDE and the change in Transitivity between Time 1 and Time 2. We found that the zero-order correlations between PDE and Transitivity at these time points were confounded by several variables with obvious or uninteresting relationships with PDE, and that when these variables are accounted for, Transitivity is not significantly correlated with PDE at either time point. This lack of relationship indicates that, at some arbitrary baseline point in reading development, a child at a particular level of orthographic decoding skill could demonstrate either relatively high or relatively low Transitivity within the reading network. Importantly, the negative relationship between the change in Transitivity and the change in PDE remains significant even when these confounding variables are accounted for. This suggests that, regardless of baseline reading network Transitivity, a decrease in Transitivity across time within this network is important for ongoing reading development, as it marks the network's ongoing reorganization.

Reading entails coordinated contributions of multiple functionally specialized brain regions that process orthographic, phonological, and semantic information. As such, most theories of reading development characterize improvements in reading as the result of changes in the efficiency with which distributed components of the reading network interact. Recent work applying machine learning to DTI measures to predict reading skill demonstrates that connectivity within the putative reading network is critical for normal reading development (Cui et al., 2016). Whole-brain connectivity estimated from rhyming task data indicates decreased connectivity in the visual word form area (VWFA) and regions associated with visual processing, and increased connectivity among temporal regions, for dyslexics compared to normal readers (Finn et al., 2014), suggesting that dyslexics depend on a less developed reading mechanism reliant on phonological processing. Moreover, early connectivity has been shown to precede and predict the location of the developing VWFA (Saygin et al., 2016). The VWFA appears to be specifically tuned to the orthography of a reader's familiar language and is considered an important developmental milestone for reading fluency.

The changes in functional connectivity patterns presented here may reflect the refinement of the VWFA that emerges as a product of increased interaction among regions in the reading network. Price and Devlin (2011) suggest that increased responsiveness to orthography is driven by the development of integrated bottom-up sensory processing and top-down phonological and semantic processing, such that the interaction between orthographic, phonological, and semantic information becomes increasingly automatized. This account is consistent with most accounts of reading development and with more general explanations of cognition, whereby development is the product of an integrated bottom-up and top-down system.

Resting-state studies have been used to argue for a domain-general role of the purported reading network (Vogel et al., 2013). Koyama et al. (2011) have shown that increased resting-state connectivity between language processing regions and motor regions in both adults and children predicts reading skill, which they suggest reflects more automated processing. Our findings neither support nor refute the domain-general hypothesis; however, they provide an interesting context in which to interpret the resting-state literature within Price and Devlin's (2011) Interactive Account framework. The Interactive Account argues that functionally specialized modules within the reading network, such as the putative VWFA, are a consequence of the interactions between regions within this subnetwork. Resting-state data show more connectivity between temporal regions and the VWFA for adults than for children, indicating that development of the reading network entails increased functional interconnectivity. Connectivity within this network also appears to be related to reading impairment: resting-state magnetoencephalography (MEG) has provided evidence that reading-impaired children have altered global and localized temporoparietal connectivity, believed to reflect less efficient signal processing (Dimitriadis et al., 2013), and less uniform temporal and spatial patterns attributed to altered information exchange, in comparison to non-impaired readers (Dimitriadis et al., 2016). Improvements in reading following intervention for normal children and children with reading difficulties have been correlated with increased interactivity between core networks in reading and cognitive control (Horowitz-Kraus et al., 2015). Finally, González et al. (2016) show decreased diameter and increased eccentricity, characteristic of less efficient connectivity, in electroencephalogram (EEG) resting-state data for dyslexic readers compared to normal controls, and suggest that deficits observed in dyslexics are due to altered network connectivity rather than specific regional dysfunction. The resting-state connectivity literature collectively points to a model in which measures of connectivity within the core reading network, and between the reading network and other domain-general areas, can be used to differentiate readers at different skill levels. Whether functional connectivity is computed from task or resting-state data, it measures moment-to-moment processing dynamics within the brain. If functional specialization is an emergent property of these processing dynamics, it follows that reading network development will be intimately tied to whole-brain functional connectivity patterns that dictate the efficiency with which phonological and orthographic representations may be transformed and transmitted to support reading.

## Limitations and Future Directions

As noted earlier, one potential concern was that connectivity within the intact time series was estimated in part from non-lexical task activity. Because a fast event-related design was used, lexical and non-lexical processing cannot easily be disentangled. To address this potential confound, we regressed out non-lexical activity, applying the logic on which analyses of covariance (ANCOVA) are based. It is important to note, however, that the residualized time series should not be construed as reflecting reading-only processing. The general linear approach models an idealized hemodynamic response function (HRF) that is convolved with a vector of event onset times. This approach assumes that a single HRF describes the response characteristics of every voxel equally well. We know, however, that the HRF varies regionally (Lu et al., 2006). This is seldom a concern for conventional contrast-based approaches because all comparisons are performed within-voxel, and thus the spatial inhomogeneity of the HRF is equated across conditions. However, because the accuracy of the non-lexical activity estimate will vary across regions, functional connectivity estimates of reading-related brain activity will be confounded by inter-regional differences in how accurately the canonical HRF models non-reading brain activity. Because connectivity estimates depend directly on inter-regional correspondence in activity, this non-uniformity would be expected to influence connectivity metrics. The fast event-related design used in this experiment thus limits the confidence with which we can argue that our connectivity measures reflect purely reading-related functional connectivity. However, we found that changes in network transitivity predicted changes in reading skill on the residualized lexical event time series, from which non-lexical events had been regressed out. In conjunction with the rigorous data-driven approach to defining the network of interest, this suggests that the degree to which the constituent processing regions within the reading network function in tandem is relevant to the improvement of reading skill.
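The residualization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses a double-gamma HRF with SPM-style default shapes (response gamma shape 6, undershoot gamma shape 16, undershoot weighted by 1/6), snaps each event to the nearest scan, and the function names (`canonical_hrf`, `event_regressor`, `residualize`) are hypothetical. Note that it applies the same HRF to every region, which is exactly the limitation discussed in the text.

```python
import math

import numpy as np


def canonical_hrf(tr, duration_s=32.0):
    """Double-gamma HRF sampled every `tr` seconds.

    Assumed SPM-style defaults: unit-scale gamma densities with shapes 6
    (response) and 16 (undershoot), undershoot weighted by 1/6.
    """
    t = np.arange(0.0, duration_s, tr)

    def gamma_pdf(x, shape):
        # Gamma probability density with unit scale.
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()  # normalize to unit sum (one common convention)


def event_regressor(onsets_s, n_scans, tr):
    """Convolve a stick function of event onsets (in seconds) with the HRF."""
    sticks = np.zeros(n_scans)
    for onset in onsets_s:
        idx = int(round(onset / tr))  # snap each event to the nearest scan
        if 0 <= idx < n_scans:
            sticks[idx] += 1.0
    return np.convolve(sticks, canonical_hrf(tr))[:n_scans]


def residualize(ts, nuisance):
    """Remove nuisance regressors (plus an intercept) from ts by least squares.

    ts: (n_scans,) or (n_scans, n_regions); nuisance: (n_scans,) or (n_scans, k).
    """
    X = np.column_stack([np.ones(len(ts)), nuisance])
    beta, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return ts - X @ beta


# Usage with hypothetical data: three regional time series that contain
# non-lexical task activity of different amplitudes plus noise.
rng = np.random.default_rng(0)
n_scans, tr = 200, 2.0
nonlexical = event_regressor([10, 50, 90, 130, 250], n_scans, tr)
bold = rng.standard_normal((n_scans, 3)) + np.outer(nonlexical, [2.0, 0.5, 1.0])
lexical_residuals = residualize(bold, nonlexical)
```

The residuals are orthogonal to the non-lexical regressor by construction, but if a region's true response differs in shape from the assumed HRF, part of its non-lexical activity survives in the residuals; this is why the residualized series should not be read as reading-only activity.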

The whole-network connectivity measures that we compute numerically summarize characteristic connectivity patterns among all nodes within the network. They cannot, however, be decomposed to make claims about specific nodes or connections. For example, although the reading network is largely left-lateralized, our functional network included nodes in the right hemisphere. Increased transitivity may reflect increased interhemispheric communication in some individuals and increased posterior–anterior communication in others. The global network transitivity measure does not distinguish between the two, but this does not affect the main point that, regardless of the dominant direction of communication, increased cooperative activity among nodes is associated with improved reading ability. Likewise, our global network measure does not speak to connectivity motifs outside the reading network. It is thus possible that changes in other networks, such as the default mode network identified in resting-state connectivity studies, are additively predictive of reading skill as attention and memory skills continue to develop. However, we take the position that skilled reading depends on multisensory interactions between brain regions that support orthographic and phonological processing, and that these interactions are supported by skill-dependent connectivity between these regions. Our findings are consistent with this perspective and should be seen as providing a big-picture view that complements other network-level studies using seed-based approaches (e.g., dynamic causal modeling; Cao et al., 2008) or focusing on specific anatomical connectivity tracts (e.g., Gullick and Booth, 2014). Collectively, this body of literature suggests that skilled reading requires the coordinated effort of a network of brain regions, each contributing uniquely and importantly to the overall task of mapping between phonological and orthographic representations.
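Global transitivity, the whole-network measure discussed above, is the fraction of connected node triples that close into triangles. The sketch below assumes a binary, undirected network obtained by thresholding absolute Pearson correlations between node time series; the threshold rule and the function names are illustrative assumptions, not the network construction used in the study. For an undirected graph with adjacency matrix A and node degrees k_i, transitivity equals trace(A^3) / sum_i k_i(k_i - 1), which is the same quantity as 3 x triangles / connected triples.

```python
import numpy as np


def functional_network(ts, threshold):
    """Binary undirected graph from node time series (columns of ts).

    Assumed rule: an edge wherever |Pearson r| exceeds `threshold`.
    """
    r = np.corrcoef(ts, rowvar=False)
    adj = (np.abs(r) > threshold).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj


def transitivity(adj):
    """Global transitivity: 3 * triangles / connected triples.

    For an undirected binary graph, trace(A^3) counts each triangle 6 times
    and sum_i k_i (k_i - 1) counts each connected triple twice, so their
    ratio equals 3 * triangles / triples.
    """
    a = np.asarray(adj, dtype=float)
    degrees = a.sum(axis=1)
    twice_triples = np.sum(degrees * (degrees - 1))
    if twice_triples == 0:
        return 0.0
    return float(np.trace(a @ a @ a) / twice_triples)


# Usage: a triangle (nodes 0-1-2) with one pendant node (3) attached to node 0.
example = np.array([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])
print(transitivity(example))  # 3 * 1 triangle / 5 triples = 0.6
```

The single number returned says nothing about which edges (for example, interhemispheric or posterior–anterior) produced it, which is the decomposition limitation noted in the text.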

Here, we propose that differences in developmental trajectories of reading network connectivity may function as a mechanism for explaining individual differences in reading ability. These findings, along with McNorgan and Booth (2015) and McNorgan et al. (2013), suggest that learning to read involves promoting interactions among these nominally modality-specific processing areas to make them work more cooperatively. It remains unclear how cooperative processing across these regions underlies reading development. Segmentation of the functional network reported here into occipito-temporal and parietal nodes may allow for investigation into whether there is a developmental change in connectivity that corresponds to a shift from phonological to orthographic processing. Additionally, further elucidation of reading network processing dynamics might involve incorporating methodology with faster time resolution (e.g., near-infrared spectroscopy or event-related potentials) to identify the sequence of interactions among these regions.

#### Summary

Our longitudinal analysis examined whether changes in graph-theoretic metrics of task-related functional connectivity, derived from fMRI data collected from a sample of typically developing children over a period of several years, predicted changes in reading skill. Our analysis found that improvements in reading skill were predicted by a desegregation of nominally functionally specialized regions within the putative reading network. These findings provide additional insight into how developmental changes within reading network connectivity contribute to gains in reading skill in typically developing readers and suggest a model in which typical reading development entails increased interdependence among brain regions that support auditory and visual processing in other contexts.

## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Institutional Review Board of Northwestern University. The protocol was approved by the Institutional Review Board of Northwestern University. Parents of all subjects gave written informed consent in accordance with the Declaration of Helsinki.

## DATA AVAILABILITY

The data for this study will be made available upon request.

## AUTHOR CONTRIBUTIONS

GS, JB, and CM contributed conception and design of the study. JB oversaw data collection. GS performed the statistical analysis and wrote the first draft of the manuscript. CM and GS wrote sections of the manuscript. All authors contributed to manuscript revision, and read and approved the submitted version.

## FUNDING

This research was supported by internal faculty development funding from the University at Buffalo College of Arts and Sciences to CM and grants from the National Institute of Child Health and Human Development (HD042049) to JB.

## ACKNOWLEDGMENTS

The authors thank Erica S. Edwards for her early contributions preprocessing data for this project.

## REFERENCES

auditory spelling task. Dev. Sci. 10, 441–451. doi: 10.1111/j.1467-7687.2007.00598.x

in children and adults. J. Neurosci. 31, 8617–8624. doi: 10.1523/JNEUROSCI.4865-10.2011

reading-related phonological processes," in Practitioner's Guide to Assessing Intelligence and Achievement, eds J. A. Naglieri and S. Goldstein (Hoboken, NJ: John Wiley & Sons Inc), 367–387.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Smith, Booth and McNorgan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Reading Independently and Reading With a Narrator: Eye Movement Patterns of Children With Different Receptive Vocabularies

Zhuqing Su<sup>1,2</sup>, Yifang Wang<sup>1</sup>\*, Yadong Sun<sup>1</sup>, Jinhong Ding<sup>1</sup> and Zhuoya Ma<sup>1</sup>

<sup>1</sup> Department of Psychology, Capital Normal University, Beijing, China, <sup>2</sup> Department of Preschool Education, Yichun Early Childhood Teachers College, Yichun, China

This study examined the effects of two reading styles (i.e., reading with a narrator and reading independently), receptive vocabulary and literacy on children's eye movement patterns. The sample included 46 Chinese children (aged 4–6 years) who were randomly assigned to one of the two reading styles and read the same picture book on a screen. The results indicated that the higher the children's receptive vocabulary was, the sooner they fixated on the text. Overall, the children's fixation probability (i.e., the time spent viewing the text zones as a proportion of full-page viewing time during each period) decreased over time when they read independently but increased over time when they read with a narrator. For children in senior kindergarten, reading with a narrator is thought to help establish and consolidate the links between speech and text and thus promote reading acquisition.

#### Edited by:

Iliana I. Karipidis, Universität Zürich, Switzerland

#### Reviewed by:

Urs Maurer, The Chinese University of Hong Kong, Hong Kong

Angela Jocelyn Fawcett, Swansea University, United Kingdom

> \*Correspondence: Yifang Wang wangyifang6275@126.com

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 11 February 2018 Accepted: 30 August 2018 Published: 24 September 2018

#### Citation:

Su Z, Wang Y, Sun Y, Ding J and Ma Z (2018) Reading Independently and Reading With a Narrator: Eye Movement Patterns of Children With Different Receptive Vocabularies. Front. Psychol. 9:1753. doi: 10.3389/fpsyg.2018.01753

Keywords: children, picture books, eye movement, reading styles, receptive vocabulary

## INTRODUCTION

Substantial importance has been attached to improving the quantity and quality of reading among citizens in China, and Reading for All has twice been included in the "Report on the Work of the Government" (Qian, 2005). Reading can not only increase people's knowledge and enrich their spiritual lives but also improve the quality of national culture. Hence, the government promotes reading for all, and forms of reading are becoming more diverse and modern. Preschool is a critical period for cultivating children's reading habits, and numerous studies have demonstrated that reading during early childhood has a positive impact on children's language skills (Felton, 1992; Evans and Saint-Aubin, 2013), emotional intelligence (Doyle and Bramwell, 2006; Nikolajeva, 2013) and other aspects of development. Picture books combine two types of symbols, namely, pictures and short sections of text (which, in this paper, refers to Chinese characters), to tell a story. Such books are intuitive, easy to understand and interesting. Thus, children's early reading activities involving picture books have attracted increasing attention from children and parents.

Various reading styles may be employed when reading picture books. Sharolyn et al. (2016) described shared book reading and interactive reading, two terms that refer to adults' encouragement of children's active participation when they read books together. When employing these reading styles, adults ask children open-ended questions about the book and expand on their answers to help children learn to express themselves more accurately. Evans and Saint-Aubin (2013) discussed another "adults reading to children" style that, in contrast to shared book reading and interactive reading, does not involve adult–child interaction. In this style, adults read to children in a straightforward manner without pointing, emphasizing, or explaining. With the development of multimedia technology, printed books are no longer the only form of reading: children can read books on electronic devices, so they can read not only by "looking" with their eyes but also by "listening" with their ears, which substantially increases interest in reading among children who are initially unenthusiastic (Maynard and McKnight, 2001; Maynard, 2010). In this study, we simulated an electronic reading situation for children and defined two reading styles. For both styles, the picture book was presented on a computer screen, and the children were allowed to proceed at their own pace. In the first reading style, the participants heard a recording of the text after they turned each page by clicking the mouse; in the second, no recording accompanied the text after each page was turned. There were no other differences between the two styles. We defined the first reading style as reading with a narrator and the second as reading independently.

Studies of picture book reading have shown that children pay very little attention to text, regardless of whether they are reading independently (Evans et al., 2009) or reading with a narrator (Evans and Saint-Aubin, 2005; Justice et al., 2005). Unlike reading independently, reading with a narrator includes sound information. We therefore asked whether this added auditory information affects how much time children allocate to pictures versus text when reading picture books. Han et al. (2011) directly compared children's eye movements under these two reading styles and found that the time spent on pictures did not differ. For the text zones, however, children (aged 5–6 years) who read independently looked at the text longer than children who read with a narrator. According to Nikolajeva and Scott (2006), picture book reading is a process that involves matching text with pictures, and each component promotes understanding. When reading with a narrator, sound information is added, and the connection between sound and printed word is very direct (i.e., children can frequently pronounce a word while failing to grasp its meaning) (Siegel, 1983). Listening to adults therefore provides a scaffold for children with limited reading ability to explore a story by simultaneously listening to auditory information and looking at pictures (Evans and Saint-Aubin, 2005; Martin-Chang and Gould, 2012). This may explain why children pay less attention to text when reading with a narrator than when reading independently.

However, children do not show an overwhelming preference for pictures at all times. A longitudinal study of children's reading activities concluded that children progress "from pictures to text" (Sulzby, 1985). Specifically, children focus primarily on pictures when they begin to read, gradually attend to both pictures and text, and eventually focus actively on text as they become mature readers. This developmental process is also one of gradual improvement in reading skills. Similarly, Roy-Charland et al. (2007) found that with advancing grade level and reading skill (in their study, reading skill was considered to co-vary with grade level, at least to an extent), the duration of text viewing and fixation on text gradually increased. Roy-Charland et al. (2007) used the Stroop effect to explain these findings. In the classic Stroop task (Stroop, 1935), it takes less time to name an ink color when the ink color is consistent with the meaning of a word (e.g., when the word "blue" is written in blue ink) than when the ink color conflicts with the meaning of the word (e.g., when the word "blue" is written in red ink), which suggests that the connection between a printed word and its meaning is more direct than the connection between a printed word and its color. This effect is thought to lead children to focus on text automatically when they look at it. Moreover, once children form the connection between printed words and their meanings, the Stroop effect becomes very strong (Everatt et al., 1999; Wright and Wanley, 2003; but see Jerger et al., 1993). However, this connection usually forms at approximately 7 years of age (Comalli et al., 1962). Children in lower grades (i.e., kindergarten), whose reading skills are less developed, are unable to connect printed words and their meanings automatically. Thus, they primarily attend to pictures.

Picture book reading is not only a process of moving "from pictures to text," which reflects the development of children's reading skills; it is also a process of receiving information from the external environment. Children's receptive vocabulary and literacy are strong predictors of their reading ability (Coates and Lewis, 1984; Evans et al., 2009). Evans et al. (2009) tracked the eye movements of children aged 59–71 months while they read a simple, clear alphabet book (each page featured a single large uppercase letter, one prominent word in uppercase, a single corresponding object, and the same bear). Children with a larger receptive vocabulary fixated on letters and words more quickly. Together, vocabulary and letter knowledge accounted for 23–56% of the variance in the seven dependent variables. Similar results were reported by Evans and Saint-Aubin (2013), who analyzed the vocabulary acquisition of 36 children aged 50–62 months after repeated reading of a storybook over 7 days; the children's fixation on the text was significantly correlated with their letter knowledge. Thus, children's receptive vocabulary and letter knowledge are important influences on the reading process and warrant further study.

The eye tracker, which can record eye movements in real time, has become an important tool for investigating reading behavior. Previous research on preschool children's picture book reading, whether comparing eye movement patterns across reading styles or across children with different receptive vocabulary sizes, has often focused statically on interest zones across the whole page. However, we do not think that this static analysis method is sufficient. According to Nikolajeva and Scott (2006), picture book reading is a process that involves matching text with pictures; to test this, children's reading process must be examined in depth. In addition, in our practice, we have found that children distribute their reading time between text and pictures differently at different stages of reading a book. Therefore, we chose to subdivide the children's whole-page reading time to examine changes in eye movement patterns over the time course. Specifically, we divided the viewing time on each page (i.e., the duration from first fixation to last fixation) into 10 periods. For each period, the time spent viewing the text zones as a proportion of whole-page viewing time was calculated, and eye movement trends throughout the process of reading a picture book were examined. We thus viewed picture book reading as a continuous process; this dynamic analysis method may be a more accurate and detailed way to investigate the topic. The picture book used in this study was clear and simple, with an apple tree and a protagonist (a little mouse) appearing on every page. Thus, we defined the protagonist zone as one of two important zones of interest (the other being the text zone).
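The period-based measure described above can be sketched as follows. This is an illustrative reconstruction under assumptions, not the authors' analysis script: a page's viewing time is taken to run from the onset of the first fixation to the offset of the last, the 10 periods are equal in length, and a period's fixation probability is text-zone fixation time divided by all fixation time within that period (the text does not say whether saccades or off-page time enter the denominator). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_ms: float     # fixation onset, relative to page presentation
    end_ms: float       # fixation offset
    in_text_zone: bool  # True if the fixation landed in a text zone


def text_fixation_probability(fixations, n_periods=10):
    """Per-period share of fixation time spent in the text zones.

    The span from first fixation onset to last fixation offset is cut into
    `n_periods` equal periods; each fixation contributes only the part of
    its duration that overlaps a given period.
    """
    t0 = min(f.start_ms for f in fixations)
    t1 = max(f.end_ms for f in fixations)
    width = (t1 - t0) / n_periods
    probabilities = []
    for k in range(n_periods):
        lo, hi = t0 + k * width, t0 + (k + 1) * width
        total = text = 0.0
        for f in fixations:
            # Portion of this fixation that falls inside the current period.
            overlap = max(0.0, min(f.end_ms, hi) - max(f.start_ms, lo))
            total += overlap
            if f.in_text_zone:
                text += overlap
        probabilities.append(text / total if total > 0 else 0.0)
    return probabilities


# Usage: a page viewed for 1 s, picture first, text second.
page = [Fixation(0, 500, False), Fixation(500, 1000, True)]
print(text_fixation_probability(page, n_periods=2))  # [0.0, 1.0]
```

Averaging these per-period values over pages gives one curve per child, which is the kind of repeated-measures outcome the multilevel model in the Results is fitted to.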

The first purpose of this study was to determine whether children's eye movement patterns differ when they read picture books independently versus with a narrator. Considerable research has shown that children primarily look at pictures when reading picture books (Justice et al., 2005; Evans and Saint-Aubin, 2005, 2013; Evans et al., 2009). Therefore, regardless of reading style, children might be expected to show an overwhelming preference for the protagonist, with no difference in the time spent viewing the protagonist between the two reading styles. Given the exploratory nature of subdividing the entire viewing time into smaller periods, we propose no a priori hypothesis about children's reading patterns in the text zone.

A second purpose was to examine the influence of Peabody Picture Vocabulary Test (PPVT) scores and literacy skills on children's eye movement patterns while reading a picture book. According to Coates and Lewis (1984), children's receptive vocabulary and literacy skills are strong predictors of their reading ability. Consequently, we hypothesized that these two variables would mainly affect reading patterns in the text zone; in other words, the higher the children's PPVT scores and literacy levels were, the sooner and the longer they would fixate on the text. Through this research, we hope to deepen the understanding of early childhood reading and to provide a reference for parents who wish to choose an appropriate reading style for their children.

## MATERIALS AND METHODS

#### Participants

The participants included 54 children from a kindergarten in Beijing. All the children were native Chinese speakers with no history of gross motor, hearing, vision or language problems. The children were randomly assigned to two reading groups. In the formal experiment, eight children clicked the mouse repeatedly or made large body movements, so their eye movement data were invalid. The final sample consisted of 46 children, with 23 children in each group: the independent-reading group comprised 15 boys and 8 girls aged 55 to 76 months, and the reading-with-a-narrator group comprised 12 boys and 11 girls aged 54 to 78 months. See details in **Table 1**.

## Materials

A picture book entitled "The Little Mouse Wants Apples" (Zhongjiang and Shangye, 2014) was used. The book tells a simple and interesting story about a little mouse who eats apples with the help of friends. Besides the little mouse, six types of animals appear in turn. We confirmed that all the children knew these animals and had not previously read the book. The book is aimed at children aged 3–6 years. Thirteen pages were presented to the children. Three of them were excluded from the eye movement analysis: the first page was the cover, the last page contained no Chinese words, and the 12th page was structured very differently from the other pages. The remaining 10 pages were analyzed. Blue circles mark the text and protagonist zones; these circles did not appear on the pages the children saw. The boundary of each text zone was defined as a visual angle away from the text, with a circular radius of 75 pixels. The boundary of the protagonist zone was defined as a visual angle away from the protagonist, with a radius of 150 pixels. The text and protagonist zones did not overlap, and the text was always located on the right side of each page (see **Figure 1**).
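Assigning gaze positions to the circular zones described above can be sketched as a point-in-circle test. The radii (75 and 150 pixels) come from the text; the zone centers, the function name `classify_gaze`, and the rule that anything outside both circles counts as "other" are hypothetical.

```python
import math

TEXT_RADIUS_PX = 75          # from the text
PROTAGONIST_RADIUS_PX = 150  # from the text


def classify_gaze(x, y, text_center, protagonist_center):
    """Return the zone that a gaze point (screen pixels) falls in.

    Zone centers are hypothetical inputs; in practice they would be taken
    from each page's layout. Because the zones do not overlap, the order
    of the two tests does not matter.
    """
    if math.dist((x, y), text_center) <= TEXT_RADIUS_PX:
        return "text"
    if math.dist((x, y), protagonist_center) <= PROTAGONIST_RADIUS_PX:
        return "protagonist"
    return "other"


# Usage with made-up centers on a 1024 x 768 page (text on the right).
print(classify_gaze(300, 600, text_center=(900, 400), protagonist_center=(300, 500)))
# "protagonist" (100 px from the protagonist center)
```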

The PPVT (Dunn, 1965) used in this study was translated and revised by the Shanghai Institute of Pediatrics, and its test–retest reliability was 0.95. Its correlation coefficient with the Wechsler Preschool and Primary Scale of Intelligence (WPPSI) was 0.53 (Gong and Guo, 1984).

The audio used in the reading-with-a-narrator condition was recorded in advance by a non-professional voice artist majoring in psychology. The recording matched the picture book text. The voice artist read the book expressively, adjusting her tone, intonation and emotion to the plot. Each page contained only one sentence (e.g., "A bird comes and takes an apple"). Sentences were 7–13 words long, and each took 2–4 s to read aloud (M = 3, SD = 0.67). All children in the reading-with-a-narrator group heard the same recording.

#### Instruments

The digital version of the book was presented on a monitor with a resolution of 1024 × 768 pixels and a refresh rate of 120 Hz. Eye movements were recorded with an SMI RED250 eye tracker (SensoMotoric Instruments), which has a high sampling rate (250 Hz) and high accuracy (<1°). A program written in Visual Basic was used to present the stimuli and collect monocular data from the participants. The eye-gaze analysis was performed in Python.

#### Procedure

#### Vocabulary Tests

The PPVT was used to test the children's receptive vocabulary by asking them to point to one of four pictures that represented a spoken word.

The literacy measure was developed to test the children's knowledge of the words used in the picture book; the children were asked to read words from the book presented in random order. Each response was scored 1 (correct) or 0 (incorrect). The correlation coefficient between the children's PPVT scores and the literacy measure was 0.49 (p < 0.01). Detailed information is provided in **Table 1**.


TABLE 1 | Descriptive statistics of the children.

fpsyg-09-01753 September 20, 2018 Time: 16:36 # 4

PPVT refers to the Peabody Picture Vocabulary Test. "Independent" means independent reading, and "narrator" means reading with a narrator.

#### Eye Movements

The monitor was positioned approximately 80 cm from the participants' eyes, and the participants viewed the monitor through a square window to keep their eyes focused on the stimuli. The eye movement measurement process began with a familiarization period, in which the children read another picture book to learn how to turn the pages with the mouse. Once the experimenter had confirmed that the children understood the instructions, a standard 9-point calibration procedure was initiated, in which the participants looked directly at the screen and fixated a series of nine dots presented in sequence. After successful calibration, the experiment began. The children were told that they were going to look at a picture book on the computer and could turn the pages by clicking the mouse, as they had practiced, until they had read the whole book. During the process, they needed to sit very still and try not to move. Once the children were fully prepared, the system randomly selected a reading style for them. The sessions lasted approximately 30 min.

## RESULTS

Each page contained two zones: the protagonist zone and the text zone. The analysis primarily involved three measures. The first measure was the total time spent viewing each zone, which is sensitive to both slower and faster cognitive processes (Holmqvist et al., 2011). The second measure was the number of fixations in each zone. Fixations reflect the processing of information (Rayner, 1978): when reading materials are attractive or impose a high cognitive load, they elicit more fixations and longer viewing times. In this study, a fixation was coded for any gaze lasting 100 ms or longer. The third measure was the time to first fixation (TFF, i.e., the latency from page presentation to the first fixation on the zone of interest), which corresponds to the first-fixation measure used in previous research (Evans et al., 2009). A shorter TFF indicates that the target drew the participant's attention more quickly.
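The three measures can be computed per zone roughly as follows. This sketch assumes gaze events have already been assigned to zones and time-stamped relative to page onset; it applies the 100 ms fixation criterion from the text, treats viewing time as the summed duration of qualifying fixations (an assumption, since dwell-time definitions vary), and uses hypothetical names (`Gaze`, `zone_measures`).

```python
from dataclasses import dataclass

MIN_FIXATION_MS = 100  # from the text: gazes of 100 ms or longer count as fixations


@dataclass
class Gaze:
    onset_ms: float     # relative to page presentation
    duration_ms: float
    zone: str           # e.g., "text", "protagonist", "other"


def zone_measures(gazes, zone):
    """Viewing time, fixation count, and time to first fixation (TFF) for one zone."""
    fixations = [g for g in gazes
                 if g.zone == zone and g.duration_ms >= MIN_FIXATION_MS]
    return {
        "viewing_time_ms": sum(g.duration_ms for g in fixations),
        "fixation_count": len(fixations),
        # TFF is undefined (None) if the zone was never fixated on this page.
        "tff_ms": min((g.onset_ms for g in fixations), default=None),
    }


# Usage: the 80 ms gaze on the text is too short to count as a fixation.
page = [Gaze(0, 250, "protagonist"), Gaze(260, 80, "text"),
        Gaze(350, 300, "text"), Gaze(660, 120, "text")]
print(zone_measures(page, "text"))
# {'viewing_time_ms': 420, 'fixation_count': 2, 'tff_ms': 350}
```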

## The Influence of Vocabulary and Literacy on the Children's Reading Behavior

To evaluate the effect of the children's PPVT and literacy scores on their eye movement variables, a series of simultaneous multiple regression analyses was computed (enter method). The PPVT and literacy scores were entered together as predictor variables, and the eye movement indices (i.e., viewing time, fixation counts, and TFF) were the dependent variables. **Table 2** presents the results of the simultaneous multiple regression analyses. For the TFF of the text zone, the overall regression was significant, F(2,51) = 3.22, p < 0.05, R<sup>2</sup> = 0.13. To determine the unique contribution of each predictor to the overall regression model, hierarchical regression analyses were performed (enter method), with the PPVT score entered at Step 1 as the control variable and the literacy score at Step 2. **Table 3** presents the results of the hierarchical regression analyses. The children's PPVT scores made a significant, unique contribution to the TFF of the text zone and accounted for 12% of the variance, F(1,51) = 5.85, p < 0.05, R<sup>2</sup> = 0.12. After PPVT scores were controlled for, literacy did not explain a significant amount of additional variance (ΔR<sup>2</sup> = 0.01, p > 0.05). These findings indicate that the higher the children's PPVT scores were, the sooner they fixated on the text. In addition, the children's PPVT and literacy scores had a marginal effect on the viewing time for the text zone, F(2,51) = 3.04, p = 0.058, R<sup>2</sup> = 0.08, suggesting that children with higher PPVT and literacy scores tended to look at the text longer. No significant relationships between the children's receptive vocabulary or literacy scores and the other measures were found (all ps > 0.05).
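The hierarchical (two-step) logic, with PPVT entered at Step 1 and literacy at Step 2, amounts to comparing the R² of two nested ordinary least-squares models. The sketch below computes R² at each step, the R² change, and the usual F-change statistic for the added block; the function names are hypothetical, and it is not meant to reproduce the values reported above.

```python
import numpy as np


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the given predictors plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def hierarchical_step(y, step1, step2):
    """Return (R^2 at Step 1, R^2 change at Step 2, F-change for Step 2).

    step1 and step2 are lists of 1-D predictor arrays, each block entered
    together ("enter" method).
    """
    n = len(y)
    r2_step1 = r_squared(y, step1)
    r2_step2 = r_squared(y, step1 + step2)
    df_num = len(step2)
    df_den = n - len(step1) - len(step2) - 1
    f_change = ((r2_step2 - r2_step1) / df_num) / ((1.0 - r2_step2) / df_den)
    return r2_step1, r2_step2 - r2_step1, f_change
```

With one predictor per step, the F-change for Step 2 tests whether literacy adds anything once PPVT is already in the model, which is the comparison summarized by ΔR<sup>2</sup> in the text.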

## The Influence of Reading Styles on Children's Reading Behavior

To investigate the effect of reading style on children's reading behavior, an independent-samples t-test was conducted with reading style (independent reading vs. reading with a narrator) as the independent variable and the eye movement indices (i.e., viewing time, fixation counts, and TFF) as the dependent variables. The results revealed no significant difference in any of the eye movement indices (all ps > 0.05). See details in **Table 4**.
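Because each child was assigned to only one reading style, the group comparison involves two samples of different children (23 per group). A minimal sketch of the pooled-variance (Student's) independent-samples t statistic is shown below; the function name is hypothetical, and converting t to a p-value (using a t distribution with n1 + n2 - 2 = 44 degrees of freedom here) is left to a statistics library.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's independent-samples t statistic with pooled variance.

    Returns (t, degrees of freedom). `variance` is the sample (n - 1) variance.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Usage with toy numbers (not study data).
print(pooled_t([1, 2, 3], [4, 5, 6]))  # t is about -3.674 with 4 degrees of freedom
```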


PPVT, Peabody Picture Vocabulary Test. <sup>∗</sup>p < 0.05.


TABLE 3 | Hierarchical regression analyses predicting the Time to First Fixation (TFF) on the text zone.


PPVT score at Step 1 and literacy level at Step 2. <sup>∗</sup>p < 0.05.

Treating reading as a continuous process, we divided the viewing time on each page into 10 periods and performed further analyses. We conducted a multilevel analysis using hierarchical linear modeling (HLM) software with time point (TIMEP), reading style (GROUP), and PPVT score as predictor variables; the dependent variable was the probability of fixation on the text zones. The model was as follows:

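Level 1:

$$Y = B_0 + B_1 \cdot \mathrm{TIMEP} + R$$

Level 2:

$$B_0 = G_{00} + G_{01} \cdot \mathrm{PPVT} + G_{02} \cdot \mathrm{GROUP}$$

$$B_1 = G_{10} + G_{11} \cdot \mathrm{PPVT} + G_{12} \cdot \mathrm{GROUP}$$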

In the above equations, Y is the dependent variable (the probability of fixation on the text zones). B0 is the Level-1 intercept, and B1 is the slope, i.e., the increase in Y for each additional unit of TIMEP. TIMEP is the time point, ranging from 1 to 10. GROUP refers to the reading style, R is the Level-1 residual, and G00–G12 are the parameters to be estimated. The results of the parameter estimation are presented in **Table 5**.
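Substituting the two Level-2 equations into the Level-1 equation gives a single combined equation, which makes explicit that G11 and G12 are cross-level interactions with TIMEP (the PPVT × time and reading style × time effects reported below). This is a derivation from the model as written; no Level-2 random terms are listed, and if the HLM fit included them (written here with the hypothetical symbols U0 and U1) they would add U0 + U1 · TIMEP to the right-hand side.

$$
\begin{aligned}
Y &= G_{00} + G_{01} \cdot \mathrm{PPVT} + G_{02} \cdot \mathrm{GROUP} + \left(G_{10} + G_{11} \cdot \mathrm{PPVT} + G_{12} \cdot \mathrm{GROUP}\right) \cdot \mathrm{TIMEP} + R \\
  &= G_{00} + G_{01} \cdot \mathrm{PPVT} + G_{02} \cdot \mathrm{GROUP} + G_{10} \cdot \mathrm{TIMEP} + G_{11} \cdot \mathrm{PPVT} \cdot \mathrm{TIMEP} + G_{12} \cdot \mathrm{GROUP} \cdot \mathrm{TIMEP} + R
\end{aligned}
$$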

According to the HLM analysis, only reading style significantly affected B1 [GROUP: G12 = −0.007, SE = 0.0035, t-ratio(454) = −1.933, p < 0.05], indicating a significant interaction between time point and reading style: the rate of change in the fixation probability was significantly lower when the children read independently than when they read with a narrator (see **Figure 2**). In contrast, the interaction between time point and PPVT score was not significant [G11 = 0.000048, SE = 0.000078, t-ratio(454) = 0.607, p > 0.05], indicating that the children's receptive vocabulary had no significant effect on the rate of change in the fixation probability.
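Substituting the Level-2 equations into the Level-1 equation gives a single-equation form of the model that shows why G12 is interpreted as a time point × reading style interaction (this is a worked restatement of the model, not an additional analysis):

$$
Y = G_{00} + G_{01} \cdot \mathrm{PPVT} + G_{02} \cdot \mathrm{GROUP} + G_{10} \cdot \mathrm{TIMEP} + G_{11} \cdot (\mathrm{PPVT} \times \mathrm{TIMEP}) + G_{12} \cdot (\mathrm{GROUP} \times \mathrm{TIMEP}) + R
$$

The slope of Y on TIMEP is therefore $G_{10} + G_{11} \cdot \mathrm{PPVT} + G_{12} \cdot \mathrm{GROUP}$. If GROUP is coded 0/1 (an assumption, since the coding is not stated in this section), G12 is the difference between the two reading styles in how quickly the fixation probability changes across the 10 time points, and its sign depends on which style is coded 1.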

## DISCUSSION

This study compared the eye movement characteristics of children with different receptive vocabulary and literacy levels when they read independently and when they read with a narrator.

First, we explored the effect of receptive vocabulary and literacy levels on eye movement patterns from a static perspective. In the present study, the children's vocabulary and literacy levels influenced only the eye movement indices for the text zone. Specifically, the children with higher PPVT scores fixated on the text sooner. This suggests that children with a larger vocabulary may have formed a stronger connection between printed words and their meanings, a finding that corroborates the Stroop effect in reading (Roy-Charland et al., 2007). In addition, in the simultaneous multiple regression model, the PPVT and literacy scores had a marginal effect on the viewing time for the text zone, suggesting a tendency for children with higher PPVT and literacy scores to look at the text longer. This effect may have reached only marginal significance because of the small sample size; future studies should include more participants.

In addition, children with different vocabulary and literacy levels showed no significant difference in total viewing time or fixation counts for the entire page and the protagonist zone. Further analysis revealed that the children spent 33% of their total viewing time on the protagonist zone. This preference for the protagonist may be related to the picture book used in this study and to how the interest zones were defined. Specifically, although the picture book included both pictures and text, we did not use the whole picture zone in this study; instead, we defined the protagonist zone as a zone of interest because, as mentioned earlier, the protagonist appeared on every page. Similarly, in Evans et al. (2009), the same bear appeared on each page of the alphabet book, and the children fixated more quickly and for longer on the picture of the bear than on any of the text zones. These findings suggest that children prefer familiar and repeated features; thus, the way the zone of interest was defined and the repetition of the protagonist may have affected the children's attention.

TABLE 4 | Means (and standard errors) of eye-gaze indices for each area for different reading styles.

The units of means are milliseconds.

TABLE 5 | A two-level hierarchical regression predicting the children's fixation probability for the text zone.

Compared with Chinese-speaking children, children who read alphabetic languages appear to pay less attention to text. For example, the children in the present study spent 19.38% of their total viewing time on the text zone, similar to the 19.9% reported by Han et al. (2011). In contrast, Justice et al. (2005) found that children spent only approximately 2.7% of their viewing time on the text, and only 6% even with a print-salient storybook. However, factors such as the differences between Chinese and alphabetic languages and the use of different picture books limit the comparability of research on Chinese children and children from other countries, which points to the need for cross-cultural studies.

Second, we explored the effect of reading style on eye movement patterns. Fixation counts and viewing time for the protagonist zone did not differ significantly between the two reading styles, which is consistent with both the results of Han et al. (2011) and our hypothesis. However, whereas the children in Han et al. (2011) spent more time on the text zone when reading independently than when reading with a narrator, in the present study the children's viewing time and fixation counts for the text zone did not differ significantly between the two reading styles. Further analysis revealed that the primary difference between the two reading styles lay in the slope (the rate of change in the fixation probability over reading time), which was significantly higher when the children read with a narrator than when they read independently. As **Figure 2** shows, the fixation probability for the text zone decreased over time when the children read independently but increased over time when they read with a narrator.

One possible explanation is that, when children read with a narrator, listening to an adult might provide a scaffold for exploring the story by allowing them to match the sound directly with the pictures (Evans and Saint-Aubin, 2005; Martin-Chang and Gould, 2012), which leads them to attend closely to the pictures when they begin each page. Note that the picture book used in this study was relatively simple and each page included only one short sentence, so the children could easily remember the audio content. As time passed and the recording ended, the children's interest in the text gradually increased, possibly because they were matching the text with the pictures and with the sound held in memory to understand the story. Matching sound with text is considered conducive to forming a link between speech and text in children (Sulzby, 1985). Although children in senior kindergarten are not yet mature readers, they may have mastered some vocabulary and reading skills. In addition, textual information helps children grasp a story quickly. Therefore, when they read independently, they still strive to match text with pictures to support their understanding (Nikolajeva and Scott, 2006). Taken together, these considerations are consistent with the finding that the probability of fixation on the text decreased over time when the children read independently but increased over time when they read with a narrator, while the total viewing time did not differ significantly between the two reading styles.

In the present study, we simulated an electronic reading situation for children and defined two reading styles. These findings suggest that when children read an e-book with a narrator, the sound might provide a scaffold that helps pre-readers understand the story. In addition, listening to an adult and matching text with sound is thought to help children establish and consolidate links between speech and text (Sulzby, 1985). Considerable research has shown that establishing links between print and speech is a crucial step in learning to read (Blomert, 2011; Jost et al., 2013), and a failure to establish an effective link between print and speech can lead to reading disorders (Blomert, 2011). Reading with an adult may therefore gradually encourage children to attend to textual information. Attention to print is the first step in children's internalization of the forms and functions of print and a key process in forming a speech sound-print connection (Evans and Saint-Aubin, 2005; Justice et al., 2008). By contrast, children who have a high level of reading skill and have already established an adequate link between print and sound can obtain information quickly by reading the text directly, so reading with a narrator may be less necessary for them.

## Limitations and Directions for Future Research

First, this study examined only children's eye movement patterns during picture book reading; it did not consider reading comprehension or its relationship with eye movements. Future studies should explore the influence of picture book reading on emotions and behavior and/or investigate participants' comprehension and memory after reading a picture book. Second, the structure of the picture book used in this study was relatively simple; future studies should use more complex picture books to investigate children's eye movement patterns during reading. Third, e-books often include numerous media effects (such as music) and functions (such as a dictionary), so the effects of e-reader use on children's language development should be investigated further. Finally, we analyzed the data by subdividing the entire reading time into smaller time periods. This approach is still exploratory and has no precedent in previous studies; future studies can refine our design.

## CONCLUSION

From a static perspective, children with higher receptive vocabulary scores fixated on the text zone sooner, and children with higher receptive vocabulary and literacy scores tended to look at the text longer. From a dynamic perspective, the rate of change in the children's fixation probability was higher when they read with a narrator than when they read independently: the fixation probability for the text decreased over time when the children read independently but increased over time when they read with a narrator. For children in senior kindergarten, reading with a narrator may help establish and consolidate links between speech and text and thus promote reading acquisition.

## ETHICS STATEMENT

This study was conducted in accordance with the ethical standards of the Ethics Committee of the College of Education, Capital Normal University, with written informed consent for all subjects, and institutional review board approval was obtained. Because the subjects were children, all written informed consent forms were provided by their parents. All procedures performed in the study involving human participants were in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

## AUTHOR CONTRIBUTIONS

YW and YS designed the experiments. YS and ZS collected the data. JD and ZS analyzed the data. ZS and YS wrote the manuscript. ZS and ZM re-edited the manuscript according to the reviewers' and editors' comments.

### FUNDING

This research was supported by the National Natural Science Foundation of China (Grant No. 31371058 to YW) and State Administration of Press, Publication, Radio, Film and Television of China (Grant No. GD1608).

#### REFERENCES

storybook reading. Dev. Psychol. 44, 855–866. doi: 10.1037/0012-1649.44.3.855


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Su, Wang, Sun, Ding and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Letter and Speech Sound Association in Emerging Readers With Familial Risk of Dyslexia

Joanna Plewko<sup>1</sup>, Katarzyna Chyl<sup>1</sup>, Łukasz Bola<sup>2,3</sup>, Magdalena Łuniewska<sup>1,4</sup>, Agnieszka Dębska<sup>1</sup>, Anna Banaszkiewicz<sup>3</sup>, Marek Wypych<sup>3</sup>, Artur Marchewka<sup>3</sup>, Nienke van Atteveldt<sup>5</sup> and Katarzyna Jednoróg<sup>1</sup>\*

<sup>1</sup>Laboratory of Psychophysiology, Nencki Institute of Experimental Biology, Polish Academy of Sciences (PAS), Warsaw, Poland, <sup>2</sup> Institute of Psychology, Jagiellonian University, Krakow, Poland, <sup>3</sup>Laboratory of Brain Imaging, Neurobiology Center, Nencki Institute of Experimental Biology, Polish Academy of Sciences (PAS), Warsaw, Poland, <sup>4</sup>Faculty of Psychology, University of Warsaw, Warsaw, Poland, <sup>5</sup>Department of Clinical Developmental Psychology & Institute LEARN!, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, Netherlands

In alphabetic scripts, learning letter-sound (LS) associations (i.e., letter knowledge) is a strong predictor of later reading skills. LS integration is related to activity in the left superior temporal cortex (STC), and its disruption has previously been observed in dyslexia (DYS). Whether disrupted LS association is a cause of reading impairment or a consequence of decreased exposure to print remains unclear. Using fMRI, we compared activation for letters, speech sounds and LS associations in emerging readers with (FHD+, N = 50) and without (FHD−, N = 35) a familial history of DYS, of whom 17 developed DYS 2 years later. Despite having similar reading skills, the FHD+ and FHD− groups showed opposite patterns of activation in the left STC: in FHD− children, activation was higher for incongruent than for congruent LS pairs, whereas in FHD+ children it was higher for congruent LS pairs. Higher activation to congruent LS pairs was also characteristic of future DYS. The magnitude of the incongruency effect in the left STC was positively related to early reading skills, but only in FHD− children and (retrospectively) in typical readers. We show that alterations in brain activity during LS association can be detected at very early stages of reading acquisition, suggesting their causal involvement in later reading impairments. The increased response of the left STC to incongruent LS pairs in the FHD− group might reflect an early stage of automatizing LS associations, in which the brain responds actively to conflicting pairs. The absence of such a response in FHD+ children could lead to failures in suppressing incongruent information during reading acquisition, which could result in future reading problems.

Keywords: letter-speech sound association, audiovisual integration, dyslexia, reading fluency, familial risk

## INTRODUCTION

In alphabetic scripts, learning the association between letters and speech sounds (LS; i.e., letter knowledge) is a critical step in reading acquisition. LS knowledge is a strong predictor of later reading skills across many languages (Schatschneider et al., 2004; Caravolas et al., 2012). The pace of LS acquisition depends on the script, especially its orthographic transparency, i.e., the degree of regularity in LS correspondence (Seymour et al., 2003). In transparent orthographies, most children master LS associations within 1 year of reading instruction and acquire reading effortlessly (Blomert and Vaessen, 2009). Although LS associations are learned at the very start of reading acquisition, or even before it, fully integrating LS pairs requires practice and may take years to become automatic.

#### Edited by:

Silvia Brem, Psychiatrische Klinik der Universität Zürich, Switzerland

#### Reviewed by:

Chris McNorgan, University at Buffalo, United States

Maaike Vandermosten, KU Leuven, Belgium

\*Correspondence:

Katarzyna Jednoróg k.jednorog@nencki.gov.pl

Received: 23 April 2018

Accepted: 11 September 2018

Published: 02 October 2018

#### Citation:

Plewko J, Chyl K, Bola Ł, Łuniewska M, Dębska A, Banaszkiewicz A, Wypych M, Marchewka A, van Atteveldt N and Jednoróg K (2018) Letter and Speech Sound Association in Emerging Readers With Familial Risk of Dyslexia. Front. Hum. Neurosci. 12:393. doi: 10.3389/fnhum.2018.00393

However, around 10% of children struggle with reading acquisition and develop persistent reading difficulties, i.e., dyslexia (DYS; Shaywitz et al., 1998). The risk of developing DYS is substantially increased in children whose first-degree relatives have a history of reading problems (up to 30%–40%, compared with 10% in the general population; Snowling and Melby-Lervåg, 2016). According to a recent meta-analysis, children with a family history of DYS (FHD+) face challenges in acquiring letter knowledge in preschool, which might result in later reading difficulties (Snowling and Melby-Lervåg, 2016).

Several fMRI studies have examined brain responses to LS associations (congruent, where letters correctly denote speech sounds, and incongruent, with non-matching LS pairs) in typical and reading-disabled populations. In typically reading Dutch adults, the response in the superior temporal cortex (STC) was enhanced by congruent and suppressed by incongruent LS pairs (van Atteveldt et al., 2004; Blau et al., 2009). Furthermore, adults with DYS underactivated the STC, relative to typical readers, during the processing of congruent LS pairs (Blau et al., 2009). This decrease in activation was related to reduced processing of speech sounds, which in turn predicted the subjects' phonological skills. Alterations in neural activity in the STC were also observed in 10-year-old Dutch children with DYS (Blau et al., 2010). While typical readers showed a strong congruency effect (higher activation to congruent than to incongruent LS pairs), readers with DYS showed a weaker congruency effect in the left planum temporale/Heschl's sulcus (PT/HS) and bilateral superior temporal sulcus (STS). The weaker congruency effect was further related to poorer LS matching knowledge and reading skills. Furthermore, in unisensory conditions, DYS readers had lower activity than controls in the bilateral anterior STC for speech sounds and in the fusiform gyri (FG) for visual letters.

In less transparent orthographies, LS pairs induced a reversed congruency effect, namely stronger responses in the STC for incongruent compared to congruent grapheme-phoneme pairs in adult English typical readers (Holloway et al., 2015). Similarly, Swiss-German typical adolescent readers had enhanced brain activation to incongruent compared to congruent LS and consonant-vowel-consonant associations in left STC and FG, while the reversed pattern was observed in readers with DYS (Kronschnabel et al., 2014).

Although studies agree that LS integration, as reflected by the neural congruency effect, is deficient in struggling readers across different alphabetic orthographies, the results are mixed regarding the direction of the congruency effect. Orthographic transparency, stimulus properties (i.e., grain size) and developmental factors may contribute to this disparity. Moreover, because previous fMRI studies examined LS association in adults or in children with at least 3 years of reading experience, it remains unclear whether the neural disruption in LS association is a cause of DYS or a consequence of decreased exposure to print. Only one study (Nash et al., 2017) has addressed this issue, by comparing the degree of LS integration between DYS readers and reading-matched controls. The authors found no group differences, suggesting that the LS integration deficit is a consequence rather than a cause of the reading deficit.

We analyzed data from 85 Polish 7-year-old beginning readers with (FHD+) and without (FHD−) a familial history of DYS who had similar early reading skills; of these children, 17 were diagnosed with DYS 2 years later. If a different pattern of neural response to LS associations in the left STC is inherent to reading deficits, it should already be present at the beginning of literacy acquisition in FHD+ children, especially in those who later develop DYS. If, however, it is a consequence of impoverished reading experience, FHD+ children should not differ from their FHD− peers in brain responses to letters, speech sounds and LS pairs, as the groups do not yet differ in reading experience at this stage of literacy acquisition.

## MATERIALS AND METHODS

### Participants

We recruited 120 children from the last year of kindergarten and the first grade of primary school for a longitudinal study of DYS. First graders had on average 3.62 months (SD = 2.01, range 1.20–7.80) of formal reading instruction. Results from other fMRI tasks in the same sample have been described previously (Dębska et al., 2016; Chyl et al., 2018). The inclusion criteria were typical IQ (≥25th percentile in Raven's Colored Progressive Matrices), birth at term (≥37 weeks), right-handedness, monolingualism (speaking Polish as their native language), normal (or corrected-to-normal) vision, normal hearing, no history of neurological illness or brain damage, and no symptoms of ADHD. The study was approved by the Warsaw University Ethical Committee, and all children and their parents gave informed consent in accordance with the Declaration of Helsinki.

Thirty-five children were excluded from the current analyses because of excessive motion during fMRI scanning (n = 20), failure to complete two runs (n = 4) or dropping out of the study before DYS diagnosis (n = 11). Specifically, of the 109 children who remained in the longitudinal study until DYS diagnosis, we had to exclude 24: nine FHD+ children who developed DYS, and eight FHD+ and seven FHD− children who became typical readers. The final sample included 85 children: 35 FHD− (21 girls, 14 boys; mean age 6.89 years, range 5.93–8.04) and 50 FHD+ (30 girls, 20 boys; mean age 6.92 years, range 5.52–8.06). Children in the FHD+ group had at least one first-degree relative with a DYS diagnosis (65.6%) or at least one parent who scored more than 40 points on the Adult Reading History Questionnaire (ARHQ; Lefly and Pennington, 2000), as specified in previous studies (Maurer et al., 2003; Black et al., 2012).

To control for non-verbal IQ, Raven's Colored Progressive Matrices were used (Szustrowa and Jaworowska, 2003). Parental socioeconomic status (SES) was measured with Hollingshead's (1975) index of social status, based on parental education and profession; two families did not complete the SES questionnaire. For four children, the fathers could not be contacted, so their ARHQ scores could not be obtained. The two groups did not differ in age, sex, grade, IQ or parental SES (for details, see **Table 1**).

Two years after the fMRI experiment, we conducted a formal diagnosis of DYS using a dedicated battery of tests (Bogdanowicz et al., 2009), which enabled retrospective selection of children with DYS. The battery consisted of 10 tests: four assessing reading, two assessing writing, three measuring phonological skills, and a measure of rapid automatized naming (RAN). Children who achieved low scores (3rd sten or lower, corresponding to a percentile rank of 11.3) in at least two of the four reading subtests (sight word reading, pseudo-word reading, text reading and lexical decision) were identified as dyslexic. Based on these criteria, 17 children from the current sample were diagnosed with DYS (mean age = 6.74 years; nine girls, eight boys): 12 from the FHD+ group and five from the FHD− group. The remaining 68 children developed typical reading skills (TR group; mean age = 6.96 years; 38 FHD+; 42 girls, 26 boys). In the analyzed sample, the proportion of dyslexic children was therefore similar in the FHD− (14.3%) and FHD+ (24%) groups (χ²(1) = 1.21, p = 0.27), because a large number of FHD+ children with DYS did not have usable fMRI data from the LS association task. However, in the total sample of 109 children who took part in the longitudinal study, the proportion of dyslexic children was significantly larger in the FHD+ (31.3%) than in the FHD− (11.9%) group (χ²(1) = 5.37, p = 0.02). This prevalence of DYS is similar to that reported in a recent meta-analysis (Snowling and Melby-Lervåg, 2016). DYS and TR children did not differ in age, sex, grade, IQ or parental ARHQ; however, TR children had higher parental SES (see **Table 1**).
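Both χ² values can be checked from group counts derived from the text (analyzed sample: 5 of 35 FHD− and 12 of 50 FHD+ children with DYS; longitudinal sample: 35 + 7 = 42 FHD− children with 5 DYS and 50 + 9 + 8 = 67 FHD+ children with 12 + 9 = 21 DYS). The sketch below assumes a Pearson χ² test of independence without continuity correction, which appears to be what was used because it reproduces both reported values; the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper tail of a chi-square with 1 df.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # P(chi2_1 > x) = P(|Z| > sqrt(x))
    return chi2, p

# Rows: FHD-, FHD+; columns: DYS, TR.
analyzed = chi_square_2x2(5, 30, 12, 38)  # analyzed sample, N = 85
total = chi_square_2x2(5, 37, 21, 46)     # longitudinal sample, N = 109
print(analyzed, total)
```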

#### Behavioral Measures

All children underwent behavioral testing before the fMRI experiment (on average 46 days before, and no more than 4 months before). The Decoding Test (Szczerbinski and Pelc-Pekała, 2013) was used to assess early reading and phonological skills. It included tasks of letter knowledge (upper and lower case), sight word and pseudo-word reading (score: the number of words or pseudowords read correctly in a minute), phoneme elision (score: the number of items solved correctly in a minute), and phoneme analysis (score: the number of items solved correctly). Because psychometric norms were available only for first graders and our sample also included kindergartners, raw scores were used. Early print skills were measured with an orthographic awareness test in which children had to choose the letter string that exists in Polish (for instance, the trigraph DAG exists in Polish orthography, while DGA does not; Awramiuk and Krasowicz-Kupis, 2014). The outcome measure was the raw number of correctly assigned trigraphs. Passive vocabulary was tested with the Picture Vocabulary Test: comprehension (Haman et al., 2012), in which a child is asked to select which of four images corresponds to a given word. Because the test had been standardized and normed only for children aged 2;0 to 6;11 years, raw scores were used in the analyses. RAN was measured with the object- and color-naming subtests (Fecenec et al., 2013); the outcome measure was the average time (in seconds) needed to name all stimuli in the two subtests.

A formal diagnosis of DYS was conducted using a standardized battery of tests (Bogdanowicz et al., 2009), and children who achieved low scores (equal to or lower than the 3rd sten) in at least two of the four reading subtests (sight word reading, pseudo-word reading, text reading and lexical decision) were identified as DYS.

Independent-samples t-tests were used to investigate differences in behavioral performance between the FHD+ and FHD− groups. Because of the unequal group sizes, we used a bootstrap analysis to test which behavioral variables differed significantly between the DYS and TR groups. First, for each variable, the actual between-group difference was calculated. The values from both groups were then pooled into one dataset. Next, two subsets with sizes equal to those of the actual groups (e.g., N(DYS) = 17, N(TR) = 68) were drawn with replacement from this dataset, and the difference between the subset means was calculated. This step was repeated 10,000 times, and histograms represent the distributions of the obtained mean differences. We counted the number of draws in which the absolute difference exceeded the absolute value of the real between-group difference. The two-tailed p-value was estimated by dividing this number by the number of draws (i.e., 10,000).
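The resampling procedure described above can be sketched in plain Python. This is a minimal illustration, assuming that "exceeded" means strictly greater; the function names, the seed handling and the example scores are hypothetical and are not the study's data.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def bootstrap_mean_diff_p(group_a, group_b, n_draws=10_000, seed=0):
    """Two-tailed bootstrap p-value for a difference between two group means."""
    rng = random.Random(seed)
    observed = _mean(group_a) - _mean(group_b)  # actual between-group difference
    pooled = list(group_a) + list(group_b)      # both groups put into one dataset
    exceed = 0
    for _ in range(n_draws):
        # two subsets with the real group sizes, drawn with replacement
        sub_a = rng.choices(pooled, k=len(group_a))
        sub_b = rng.choices(pooled, k=len(group_b))
        if abs(_mean(sub_a) - _mean(sub_b)) > abs(observed):
            exceed += 1
    return exceed / n_draws

# Hypothetical scores (not study data): 17 "DYS" and 68 "TR" children.
dys_scores = [i % 10 for i in range(17)]
tr_scores = [100 + (i % 17) for i in range(68)]
p_value = bootstrap_mean_diff_p(dys_scores, tr_scores)
print(p_value)
```

Because both subsets are drawn from the pooled data, the loop builds the distribution of mean differences expected when group membership does not matter, while keeping the unequal group sizes (17 vs. 68) fixed.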

Mean (SD) are depicted. B: boys; G: girls; K: kindergarten; FG: first grade of elementary school; <sup>b</sup> Bootstrap statistics. <sup>∗</sup>marks significant effects.

#### fMRI Task

The experiment consisted of two runs, each with 12 stimulation blocks and 12 fixation periods. One block (15.6 s) consisted of three mini-blocks (5.2 s each) and contained 12 stimuli (four per mini-block). Each block type was repeated twice per run, resulting in 48 stimuli per condition. The order of blocks was pseudorandomized so that two blocks of the same kind were never displayed in a row. The procedure was adapted from van Atteveldt et al. (2004). In each block, stimuli from one of six conditions were presented using Presentation software (Neurobehavioral Systems). There were four experimental conditions, namely unisensory visual letters and speech sounds corresponding to selected Polish single letters (consonants: B, C, D, G, H, J, K, L, M, N, P, R, S, T, W, Z; vowels: A, E, I, O, U) and multisensory congruent and incongruent LS pairs, as well as two control conditions: symbols (Greek letters unknown to the children) and speech sounds transformed into noise-vocoded speech with an in-house script in Praat (Boersma and Weenink, 2001). This study focuses only on the four experimental conditions; comparisons with the control conditions will be presented in a separate publication. Children were instructed to attend carefully to the stimuli. To ensure that they did, we followed the procedure of Blau et al. (2010): a line drawing of a cat or a voice saying "cat" (in the unisensory blocks), or a combination of the two (in the multisensory blocks), was presented once per block (pseudo-randomized). Children were asked to press a button on a response pad with their left thumb every time they detected such a stimulus.
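One simple way to obtain a block order in which no two blocks of the same kind follow each other is rejection sampling: shuffle, and reshuffle until the constraint holds. The sketch below assumes two blocks per condition per run (12 blocks), ignores the interleaved fixation periods, and uses hypothetical condition labels and function names; it is not the authors' Presentation script.

```python
import random

# Hypothetical labels for the six conditions described in the text.
CONDITIONS = ["letters", "speech_sounds", "congruent", "incongruent", "symbols", "vocoded_speech"]

def pseudorandom_block_order(conditions=CONDITIONS, blocks_per_condition=2, seed=None):
    """Return a block order in which no condition appears in two consecutive blocks."""
    rng = random.Random(seed)
    blocks = [c for c in conditions for _ in range(blocks_per_condition)]
    while True:  # rejection sampling: reshuffle until the constraint is met
        rng.shuffle(blocks)
        if all(prev != nxt for prev, nxt in zip(blocks, blocks[1:])):
            return list(blocks)

run_order = pseudorandom_block_order(seed=1)
print(run_order)
```

With only 12 blocks, a valid order is typically found within a few shuffles, so the simplicity of rejection sampling outweighs its inefficiency here.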

#### fMRI Data Acquisition

All participants were familiarized with the MRI environment and procedure in a mock scanner before the experimental session in the 3T Siemens Trio MR system (Siemens AG, Munich, Germany). We used a sparse-sampling sequence so that the stimuli could be presented during the silent delay between volume acquisitions, which minimized the effects of scanner noise on experimental activation (van Atteveldt et al., 2004). Functional MRI data were acquired using a T2*-sensitive gradient-echo echo-planar imaging sequence covering the whole brain (29 slices, slice thickness: 4 mm, in-plane resolution: 3 × 3 mm, TR = 5.2 s (1.5 s of volume acquisition followed by a 3.7 s delay), TE = 25 ms, matrix size: 64 × 64). The task was presented in two fMRI runs, each lasting 6 min 17 s (73 volumes), for a total of 12 min 34 s (146 volumes). Anatomical data were acquired using a T1-weighted sequence (176 slices, slice thickness 1 mm, TR = 2.53 s, TE = 3.32 ms, flip angle = 7°, matrix size: 256 × 256, voxel size 1 × 1 × 1 mm).

#### fMRI Data Preprocessing

The imaging data were analyzed with BrainVoyager QX 2.2.0 (Brain Innovation, Maastricht, Netherlands; Goebel et al., 2006). Functional data were preprocessed to correct for 3D motion artifacts (trilinear interpolation), linear drifts and low-frequency non-linear drifts (high-pass filter: 3 cycles per time course). All functional images were co-registered to the anatomical image. The anatomical image was then transformed into Talairach stereotaxic space (Talairach and Tournoux, 1988), and this transformation was applied to the aligned functional data. The functional images were spatially smoothed with a 6-mm FWHM Gaussian kernel. Finally, the ART toolbox<sup>1</sup> was used to detect motion-affected functional volumes (thresholds were adapted from Raschle et al. (2012): movement threshold: 3 mm, rotation threshold: 0.05 mm). If more than 20% of a participant's volumes were motion-affected, the participant was excluded from the analysis.
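The 20% exclusion rule can be illustrated with a simplified stand-in for outlier detection. The sketch below is not the ART toolbox algorithm: it assumes that a volume is flagged when the scan-to-scan change in any translation parameter exceeds 3 mm or in any rotation parameter exceeds 0.05 (in whatever units the rotation parameters are stored), and the function names and example run are hypothetical.

```python
def flag_motion_volumes(translations, rotations, trans_thresh=3.0, rot_thresh=0.05):
    """Flag volume i when any parameter changes too much relative to volume i - 1.

    translations, rotations: one (x, y, z) tuple per volume.
    Simplified illustration only, not the ART toolbox's detection method.
    """
    flags = [False] * len(translations)
    for i in range(1, len(translations)):
        d_trans = max(abs(a - b) for a, b in zip(translations[i], translations[i - 1]))
        d_rot = max(abs(a - b) for a, b in zip(rotations[i], rotations[i - 1]))
        flags[i] = d_trans > trans_thresh or d_rot > rot_thresh
    return flags

def exclude_participant(flags, max_fraction=0.20):
    """True when more than 20% of the volumes are flagged as motion-affected."""
    return sum(flags) / len(flags) > max_fraction

# Hypothetical 10-volume run with a single 4 mm jump in x at volume 5.
trans = [(0.0, 0.0, 0.0)] * 10
trans[5] = (4.0, 0.0, 0.0)
rots = [(0.0, 0.0, 0.0)] * 10
flags = flag_motion_volumes(trans, rots)
print(flags, exclude_participant(flags))
```

A single jump flags two volumes (moving away and moving back), so this hypothetical run sits exactly at 20% and would not be excluded. In the study, flagged volumes were also modeled as separate regressors rather than simply dropped.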

#### MRI Whole-Brain Statistical Analyses

Both experimental and control conditions were modeled in a single-subject design matrix, together with the motion parameters and a separate regressor for each volume identified as motion-affected by the ART toolbox. Second-level statistical analyses were adapted from Blau et al. (2010) and were performed using the general linear model (GLM) approach. The first analysis was a single-factor model including the four experimental conditions (i.e., letters, speech sounds, congruent LS pairs and incongruent LS pairs) as separate predictors, and was used to determine the brain regions involved in the experimental tasks for the whole sample. The statistical map from this analysis (the contrast of all four experimental conditions vs. baseline, i.e., the rest period) was used as a mask (thresholded at p = 0.05) for the subsequent GLMs. Next, two separate GLMs (GLM1 and GLM2) were computed for FHD− and FHD+ children to evaluate the spatial pattern of activation for letters and speech sounds in each group separately (corrected for multiple comparisons using the false discovery rate, q(FDR) < 0.01).
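False discovery rate control is commonly implemented with the Benjamini-Hochberg step-up procedure. The sketch below shows that procedure on a list of voxel p-values; whether BrainVoyager's FDR option matches it in every detail is an assumption, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.01):
    """Mark which tests survive FDR control at level q (Benjamini-Hochberg step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    # Every test ranked at or below that cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, q=0.05))
```

In a whole-brain map, m is the number of voxels tested (here, those inside the mask), so the threshold adapts to how many voxels show strong effects.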

Direct between-group comparisons for the unisensory conditions (letters and speech sounds) were performed in GLM3. GLM4 was a 2 × 2 factorial model including FHD status and the multisensory conditions (congruent and incongruent pairs of letters and speech sounds). The congruency effect calculated in GLM4, i.e., the difference between congruent and incongruent letter-speech sound pairs, was used to identify multisensory integration sites (Van Atteveldt et al., 2007). We applied the same statistical threshold as the previous study on DYS children (Blau et al., 2010): a voxel-wise threshold of p < 0.01, corrected for multiple comparisons using a cluster extent threshold of p < 0.05 (Forman et al., 1995; Goebel et al., 2006). Clusters are reported in Talairach space and displayed on the average brain of all participants. Additionally, **Supplementary Table S1** reports the whole-brain results at a more stringent voxel-wise threshold of p < 0.005 with a cluster extent threshold of 50 voxels, as in other pediatric fMRI studies (e.g., Raschle et al., 2012; Wang et al., 2018; Yu et al., 2018).
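The supplementary criterion (voxel-wise p < 0.005 and at least 50 contiguous voxels) can be sketched as a connected-component filter. This sketch makes two assumptions. First, it uses face (6-) connectivity, although BrainVoyager's connectivity rule may differ. Second, it covers only the fixed-extent variant. It does not reproduce the Monte Carlo-based cluster-level correction of Forman et al. (1995) used in the main text. The names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]

# Face-adjacent neighbours (6-connectivity; an assumption for this sketch).
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def clusters_above_extent(
    p_map: Dict[Voxel, float], p_voxel: float = 0.005, min_voxels: int = 50
) -> List[Set[Voxel]]:
    """Return supra-threshold clusters containing at least `min_voxels` voxels."""
    supra = {v for v, p in p_map.items() if p < p_voxel}
    seen: Set[Voxel] = set()
    kept: List[Set[Voxel]] = []
    for start in supra:
        if start in seen:
            continue
        # Breadth-first search collects one connected cluster.
        cluster = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in OFFSETS:
                nb = (x + dx, y + dy, z + dz)
                if nb in supra and nb not in seen:
                    seen.add(nb)
                    cluster.add(nb)
                    queue.append(nb)
        if len(cluster) >= min_voxels:
            kept.append(cluster)
    return kept
```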

#### fMRI ROI Analyses

To further explore group differences in the unisensory and multisensory conditions in regions previously reported to differ between dyslexic and control subjects, ROI analyses were performed. Seven ROIs were examined: left and right FG (for the letter condition), left and right anterior superior temporal gyri (aSTG; for the speech sound condition), and left and right STS and left planum temporale/Heschl's sulcus (PT/HS; for the multisensory conditions). Each ROI was a 4-mm sphere around the peak coordinates reported by Blau et al. (2010). The percent signal change in these ROIs was compared between FHD+ and FHD− children. The statistical threshold was corrected for the number of ROIs: p < 0.025 for letters and speech sounds, and p < 0.016 for the multisensory conditions.

<sup>1</sup>http://www.nitrc.org/projects/artifact\_detect
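The spherical ROIs and the percent signal change measure can be sketched as follows. The sketch makes three assumptions. First, "4-mm sphere" is read as a 4-mm radius. Second, the Talairach-space grid is isotropic with 1-mm voxels. Third, percent signal change is expressed relative to the rest baseline. The helper names are hypothetical, and BrainVoyager's own ROI tools may define voxel inclusion differently.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def sphere_voxels(center: Coord, radius_mm: float = 4.0, voxel_mm: float = 1.0) -> List[Coord]:
    """Grid coordinates (mm) lying within `radius_mm` of `center`."""
    n = int(math.floor(radius_mm / voxel_mm))
    cx, cy, cz = center
    out: List[Coord] = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                # Keep grid points whose distance from the peak is within the radius.
                if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2:
                    out.append((cx + i * voxel_mm, cy + j * voxel_mm, cz + k * voxel_mm))
    return out


def percent_signal_change(condition_mean: float, baseline_mean: float) -> float:
    """Signal change of a condition relative to the baseline (rest) signal, in percent."""
    return 100.0 * (condition_mean - baseline_mean) / baseline_mean


# Example: the left PT/HS peak reported above.
left_pt_hs = sphere_voxels((-42.0, -28.0, 13.0))
```

The ROI value for a child would then be the percent signal change averaged over the voxels in `left_pt_hs`.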

Next, as for the behavioral variables, we retrospectively explored differences in brain activity between the DYS and TR groups by means of bootstrap analyses. These were run in the ROIs taken from Blau et al. (2010) and in the regions showing significant differences between FHD− and FHD+ children in the whole-brain analysis.
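The exact bootstrap scheme is not specified in this section. The sketch below therefore assumes one common variant that keeps the unequal group sizes (e.g., DYS = 17, TR = 58). Both groups are redrawn with replacement from the pooled data, so that the null hypothesis of no difference holds, and the observed mean difference is compared with this null distribution. Names and defaults are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence


def bootstrap_group_difference(
    group_a: Sequence[float],
    group_b: Sequence[float],
    n_boot: int = 10000,
    seed: int = 0,
) -> float:
    """Two-sided bootstrap p-value for a difference in group means.

    Null distribution: both groups are resampled with replacement from the
    pooled data, keeping the original (possibly unequal) group sizes.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_boot):
        a = rng.choices(pooled, k=len(group_a))
        b = rng.choices(pooled, k=len(group_b))
        if abs(mean(a) - mean(b)) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed statistic itself and keep p above zero.
    return (extreme + 1) / (n_boot + 1)
```

The resampled differences correspond to the kind of histogram shown in **Figure 4**, with the observed difference marked against them.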

Moreover, we calculated Pearson's correlations between scores on reading-related tests (word reading, orthographic awareness, phoneme analysis and elision) and the strength of the fMRI congruency effect. The correlations were computed in regions showing a significant group × congruency interaction in the current study and in Blau et al. (2010) (i.e., left and right STS and left PT/HS). They were computed separately for FHD+ and FHD− children and for the TR and DYS groups. The statistical threshold was corrected for the number of ROIs and behavioral measures (p < 0.007).

Finally, in the sample of first graders, we computed correlations between the duration of reading instruction, behavioral performance and congruency effects in ROIs taken from the whole-brain analysis of FHD status and from Blau et al. (2010). To further examine whether the same pattern of results is present in beginning readers and prereaders, we performed additional ROI analyses, which are reported in the **Supplementary Materials**.

## RESULTS

#### Behavioral Results

FHD+ children did not differ significantly from FHD− group with respect to performance on early reading, phonological awareness or orthographic awareness tests (for details see **Table 2**).

The bootstrap analyses revealed differences at the beginning of reading acquisition. Children who were classified as DYS 2 years later had lower scores than TR children in letter knowledge, word and pseudoword reading, phoneme analysis, elision, RAN and orthographic awareness (for detailed scores see **Table 2**). No significant differences in passive vocabulary were found between DYS and TR children.

## fMRI Results

#### Whole Brain Analyses

**Figure 1** depicts brain activity in FHD− and FHD+ children in response to letters and speech sounds presented unisensorily, as well as the overlap of activity for both conditions (GLM1 and GLM2).

When the two groups were directly compared for each unisensory condition (GLM3), significant differences in brain activity were found only for speech sounds. FHD+ children had higher activity for speech sounds than their FHD− peers in the right middle and inferior frontal gyri (see **Table 3** and **Figure 2**). No significant group differences were found for letter processing.

A significant interaction between group and multisensory condition (GLM4) was found in the left PT/STG and right inferior temporal gyrus (ITG; for details see **Table 3** and **Figure 3**). In the left PT/STG, the interaction was driven by higher activation to incongruent relative to congruent LS pairs in FHD− children (p = 0.021) and a reversed pattern (higher activity for congruent pairs) in FHD+ children (p = 0.037). The two groups differed for the incongruent condition (FHD− > FHD+; p = 0.029) but not for the congruent condition. In the right ITG the pattern was opposite. Activation was higher for congruent relative to incongruent LS pairs in FHD− children (p = 0.004), whereas the reversed effect (higher activity for incongruent pairs) was present in FHD+ children (p = 0.024). In this cluster the groups differed for the congruent condition (FHD− > FHD+; p = 0.008) but not for the incongruent condition.
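The group × congruency interaction in the 2 × 2 model corresponds to a linear contrast with weights +1, −1, −1, +1 over the four cells. Equivalently, it is the congruency effect in FHD− children minus the congruency effect in FHD+ children. The sketch below shows only this contrast estimate on hypothetical toy cell means. The significance test itself requires the full GLM with its error variance.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (group, condition)

# Weights of the 2 x 2 interaction contrast (group x congruency).
INTERACTION_WEIGHTS: Dict[Cell, float] = {
    ("FHD-", "congruent"): 1.0,
    ("FHD-", "incongruent"): -1.0,
    ("FHD+", "congruent"): -1.0,
    ("FHD+", "incongruent"): 1.0,
}


def contrast(means: Dict[Cell, float], weights: Dict[Cell, float]) -> float:
    """Weighted sum of cell means (a linear contrast estimate)."""
    return sum(weights[cell] * means[cell] for cell in weights)


def interaction_contrast(means: Dict[Cell, float]) -> float:
    """Congruency effect in FHD- minus congruency effect in FHD+."""
    effect_minus = means[("FHD-", "congruent")] - means[("FHD-", "incongruent")]
    effect_plus = means[("FHD+", "congruent")] - means[("FHD+", "incongruent")]
    return effect_minus - effect_plus


# Hypothetical cell means (not study data) with the left PT/STG pattern:
# incongruent > congruent in FHD-, congruent > incongruent in FHD+.
toy = {("FHD-", "congruent"): 1.0, ("FHD-", "incongruent"): 3.0,
       ("FHD+", "congruent"): 4.0, ("FHD+", "incongruent"): 2.0}
assert contrast(toy, INTERACTION_WEIGHTS) == interaction_contrast(toy)
```

A negative value indicates that the congruency effect is smaller, or reversed, in FHD− relative to FHD+ children. This matches the direction described for the left PT/STG, while the right ITG pattern would give a positive value.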

#### ROI Analyses

Further analysis of the seven ROIs based on regions distinguishing between DYS and control children in Blau et al. (2010) revealed a trend toward lower activation in FHD+ compared to FHD− children in the left fusiform gyrus for letter processing (x = −36, y = −51, z = −17; t = 1.95, p = 0.056). In the left PT/HS (x = −42, y = −28, z = 13) we found a significant interaction between group and multisensory condition (F(1,83) = 6.22, p = 0.012). The groups differed only in the incongruent condition (p = 0.009), where FHD− children had higher activation than FHD+ children. Additionally, FHD+ children showed higher activation for congruent compared to incongruent LS pairs (p = 0.029), while no differences between the conditions were observed in FHD− children. We did not find any FHD effects in the other ROIs.

Note: The results are reported at a voxel-wise threshold of p < 0.01, corrected for multiple comparisons using a cluster extent threshold of p < 0.05.

FIGURE 2 | Unisensory group effects for speech sounds with increased activation in FHD+ compared to FHD− children in the right inferior frontal gyrus (A) and in the right middle frontal gyrus (B). The clusters are displayed on the average brain of all participants at a voxel-wise threshold of p < 0.01, corrected for multiple comparisons using a cluster extent threshold of p < 0.05.

and FHD+ (horizontal lines illustrate significant post hoc tests) as well as in DYS and typical reading (TR) children (horizontal line illustrates significant bootstrap statistics).

#### Bootstrap Analyses (Comparisons Between TR And DYS Groups)

For the ROIs taken from Blau et al. (2010), a trend toward a group difference appeared in the left PT/HS: DYS children had higher activity than the TR group for congruent LS pairs (p = 0.039). Additionally, in the right aSTG, the DYS group had significantly higher activity in response to speech sounds than the TR group (p = 0.022). We did not find any DYS effects in the other ROIs taken from Blau et al. (2010). Furthermore, DYS children had significantly higher activity than the TR group for congruent LS pairs (p = 0.006) in the left PT/STG, an ROI showing a significant interaction between FHD status and congruency in the whole-brain analysis. **Figure 4** presents the histogram distributions from the bootstrap analysis together with the actual between-group differences in percent signal change in these three brain regions.

#### Correlations With Behavioral Variables

In FHD− children, we found several significant negative correlations between the congruency effect in the left STS ROI and early reading skills. Here the congruency effect is the higher response to congruent compared to incongruent LS pairs (word reading r = −0.59, p < 0.001; orthographic awareness r = −0.60, p < 0.001; phoneme analysis r = −0.52, p = 0.001; phoneme elision r = −0.55, p = 0.001; see **Figure 5**). None of these correlations were significant in the FHD+ group. We did not find significant correlations for the clusters showing significant FHD effects at the whole-brain level.

When the sample was retrospectively split into TR and DYS groups, we found a negative correlation in the TR group between the congruency effect in the left STS and word reading (r = −0.34, p = 0.005). The correlations with phoneme analysis (r = −0.30, p = 0.012) and elision (r = −0.27, p = 0.027) did not survive correction for multiple comparisons. None of the correlations were significant in the DYS group.

To test the relation between reading instruction, behavioral performance and brain activity, we correlated the number of months of reading instruction that the first-grade children (n = 66) had received with congruency effects in ROIs taken from the whole-brain analysis of FHD status and from Blau et al. (2010). These correlations were computed in the whole sample of first graders and separately in FHD− (n = 28), FHD+ (n = 38) and TR children (n = 56), but not in DYS children (n = 10) because of the small sample. Importantly, months of reading instruction did not differ between FHD− and FHD+ children or between TR and DYS children. Time of reading instruction was weakly positively correlated with word and pseudoword reading in the whole sample (r = 0.25, p = 0.047 and r = 0.32, p = 0.009). It was also correlated with both measures in FHD+ children (r = 0.33, p = 0.04 and r = 0.36, p = 0.025), and with pseudoword reading only in TR children (r = 0.34, p = 0.011). However, these correlations did not survive correction for multiple comparisons (seven behavioral measures). At the neural level, a weak negative correlation between months of reading instruction and the congruency effect in the right STS was found only in the whole sample of first graders and in FHD− children (r = −0.31, p = 0.012 and r = −0.46, p = 0.015, respectively). Again, these correlations were not significant after correction for multiple comparisons (five ROIs).
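The corrections mentioned above behave like Bonferroni thresholds, i.e., 0.05 divided by the number of tests in a family. We assume this rule here. It reproduces the ROI thresholds of 0.025 (two ROIs per unisensory condition) and about 0.0167 (three multisensory ROIs, reported as 0.016). It also explains why the reading-instruction correlations do not survive correction. The helper names are hypothetical.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at `alpha`."""
    return alpha / n_tests


def survives(p: float, alpha: float, n_tests: int) -> bool:
    """True if `p` is below the Bonferroni-corrected threshold."""
    return p < bonferroni_alpha(alpha, n_tests)


# ROI thresholds: two ROIs per unisensory condition, three multisensory ROIs.
print(round(bonferroni_alpha(0.05, 2), 4))  # 0.025
print(round(bonferroni_alpha(0.05, 3), 4))  # 0.0167

# Reading-instruction correlations reported above.
print(survives(0.009, 0.05, 7))  # pseudoword reading, whole sample: False
print(survives(0.012, 0.05, 5))  # right STS congruency, whole sample: False
```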

## DISCUSSION

We examined brain responses to letters, speech sounds and LS pairs in emerging readers with and without familial risk for DYS. We then retrospectively assessed which of the observed effects were present in children who developed DYS 2 years later. The FHD+ and FHD− groups did not differ behaviorally in early reading, phonological awareness and orthographic skills (as in Specht et al., 2009, and as is characteristic of transparent orthographies). Among the children qualified for the current analyses, the prevalence of DYS was also similar in the two FHD groups (see the "Participants" section for details). Nevertheless, we found differences in brain activation for both unisensory and multisensory conditions in regions previously implicated in DYS. Compared to typical readers, children who later developed DYS had lower early reading skills and an altered brain response in STC to speech sounds and congruent LS pairs.

In detail, for the multisensory conditions, we found an interaction between FHD group and LS congruency (congruent vs. incongruent LS pairs) in the left STC and in the right inferior temporal cortex. The cluster in the left STC was in close proximity to the left PT/HS cluster where a weaker congruency effect was previously found in DYS children (Blau et al., 2010). Curiously, in the current study the congruency effect in the left STC had a different direction. FHD− children had a higher brain response to incongruent than to congruent LS pairs, while FHD+ children showed the opposite pattern (higher brain activity for the congruent condition). The reversed direction of the congruency effect was further confirmed in the ROI analysis. There, activity in the left PT/HS for the incongruent condition was significantly higher in the FHD− than in the FHD+ group, while no group differences were found for the congruent condition. Additionally, children who developed DYS had a significantly higher response to congruent LS pairs than the typically reading group in the left STC (left PT/STG and left PT/HS ROIs). Finally, a stronger response to incongruent relative to congruent LS pairs (i.e., an incongruency effect) in the left STS ROI was positively related to early reading performance in FHD− children and, retrospectively, in typical readers. The congruency effect was related to performance measures neither in FHD+ nor in DYS children. However, the lack of correlation in the DYS group could be explained both by the smaller sample and by the more restricted range of behavioral performance at the lower end of the continuum.

In the left PT/STG and left PT/HS, we observed group differences related to DYS or risk of DYS, similarly to previous studies in adults (STG, Blau et al., 2009) and older children (PT/HS, Blau et al., 2010). Next, we found that the congruency effect in the left STS, but not in the PT/STG or PT/HS, was negatively related to reading and reading-related performance in typical readers and in children without a risk of DYS. Of the previous studies, only Blau et al. (2010) computed brain-behavior correlations. They found significant relations with the congruency effect in the left STS and PT/HS for the whole sample of children, but these were driven by group differences and became non-significant when the group factor was controlled for. PT/HS and the surrounding STG are sensitive to acoustic features, and the former does not distinguish speech from non-speech (Price, 2012). The STS, on the other hand, is more involved in speech than in non-speech processing and shows neural adaptation effects to phonological-level information (Vaden et al., 2010). Bilateral lesions of the STS are also often associated with word deafness (Stefanatos, 2008). Activity in the left STS in response to both print and speech is related to reading abilities in emerging readers (Chyl et al., 2018). That is why enhanced activation to congruent vs. incongruent LS pairs in the left HS and PT was putatively attributed to feedback from the STS and STG to the primary auditory cortex (van Atteveldt et al., 2004). Perhaps in typical beginning readers, the more efficient the reading skills, the more effective the feedback from the STS to the auditory cortex. This would result in a higher incongruency effect than in children at risk for DYS.

There might be several explanations for the reversed congruency effects in the left STC observed in the current study. First, the effect could be driven by differences in orthographic transparency. The direction of the congruency effect observed in Polish children is more comparable to results obtained in English and Swiss-German (Kronschnabel et al., 2014; Holloway et al., 2015), where typical readers showed a higher brain response to incongruent than to congruent stimuli. One could argue that this pattern is specific to irregular orthographies with high LS mapping inconsistency. Indeed, one comparative study reported the highest inconsistency in English, followed by German, while Dutch was at the other end of the LS ambiguity scale (Polish was not included; Borgwaldt et al., 2005). In that study, transparency was measured for single LS correspondences as well as for letter clusters (rimes and onsets), thereby modeling the knowledge of advanced readers. More recently, Schüppert et al. (2017) applied a similar approach, based only on single LS correspondences (modeling beginning readers), to 16 European languages, including Polish. English was the least predictable, while Dutch had higher inconsistency for reading than German or Polish. Thus, the argument that orthographic irregularity drives the incongruency effect would not hold for beginning readers.
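One common way to quantify the (in)consistency of letter-sound mappings is the Shannon entropy of the pronunciations observed for a grapheme. We assume this measure for illustration only, and the cited cross-linguistic studies may define their metrics differently. A grapheme that always maps to one sound has an entropy of 0 bits. One that maps to two sounds equally often has 1 bit. The counts passed to the hypothetical `mapping_entropy` helper below are made-up examples.

```python
import math
from typing import Dict


def mapping_entropy(counts: Dict[str, int]) -> float:
    """Shannon entropy (bits) of the pronunciations observed for one grapheme.

    `counts` maps each pronunciation to how often it occurs (e.g., in a
    word list). Higher values mean a less predictable letter-sound mapping.
    """
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


# Hypothetical counts: a fully consistent grapheme vs. an ambiguous one.
consistent = mapping_entropy({"/t/": 10})
ambiguous = mapping_entropy({"/a/": 5, "/ei/": 5})
```

Averaging such values over all graphemes of an orthography, optionally weighted by frequency, yields a single inconsistency score per language.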

On the other hand, developmental factors, reading skill or even processing effort might modulate the direction of the congruency effect, because the currently examined sample is much younger and has less reading experience than all previously studied samples. The observed pattern could reflect an early stage of LS integration in the FHD− group, in which the brain responds actively to conflicting pairs. Incongruent pairs might be suppressed only after automation. FHD+ children, on the other hand, do not show increased activation to conflicting pairs but instead show higher activity to congruent ones, especially those children who later develop DYS. This could later lead to failures in suppressing incongruent information. This explanation would be consistent with studies showing that automation of LS integration develops relatively slowly. For instance, reaction times of LS discrimination decisions steadily decreased throughout Dutch primary school reading instruction (Froyen et al., 2009). This extended development toward automatic LS integration has also been observed in studies measuring EEG responses in a passive crossmodal "oddball" paradigm (Froyen et al., 2008, 2009, 2011; Žarić et al., 2014). Readers with 4 years of reading experience showed an influence of letter presentation on speech sound processing, but in a different temporal window than experienced adult readers. In beginning readers (with only 1 year of reading experience) and in DYS children, on the other hand, there was no indication of an early and automatic influence of (conflicting) letters on speech sound processing. These results suggest that beginning or DYS readers merely actively associate letters with speech sounds, whereas increasing reading experience may lead to automatic LS integration (Froyen et al., 2008, 2009).

We propose that the higher incongruency effect observed in the left STC of beginning readers in the present study reflects this early stage of LS integration. This effect could reverse into a congruency effect with increasing reading experience, as observed in previous studies (Blau et al., 2009, 2010). The incongruency effect is also behaviorally relevant: in FHD− children and in those who become typical readers, the higher the incongruency effect in the left STS, the better the performance in reading and reading-related tasks. FHD+ children, in contrast, show a diminished incongruency effect, which is atypical for beginning readers. This result agrees with a recent EEG-fMRI study examining audiovisual association processes for artificial stimuli in kindergarten children with familial risk for DYS (Karipidis et al., 2017). In that study, higher familial risk for DYS correlated with a diminished incongruency effect, and children at a very high familial risk showed a congruency effect.

Conversely and rather unexpectedly, in the right inferotemporal cortex we found a congruency effect in FHD− children and the opposite effect (increased response to incongruent vs. congruent LS pairs) in the FHD+ group. Here the two groups differed only for the congruent condition. However, in this region we did not find any significant differences between children who later developed DYS and typical readers in the brain response to either congruent or incongruent LS pairs. Therefore, we suggest that the effects observed in the right inferotemporal cortex reflect early reading strategies. These strategies would be based mostly on perceptual analysis of text and on a non-lexical form recognition system, which might be altered in FHD+ children. It has been shown that, in the course of reading acquisition, greater exposure to text leads children to shift away from these strategies, as reflected by a progressive disengagement of the right ventral stream cortex (Turkeltaub et al., 2003).

In the current study, additional FHD effects as well as early DYS predictors were found for the unisensory conditions. When processing speech sounds, FHD+ children showed increased activation compared to the FHD− group in the right inferior and middle frontal gyri. This possibly reflects more effortful speech comprehension in children at risk for DYS (Monzalvo and Dehaene-Lambertz, 2013). Additional effects were found in the right aSTG ROI from Blau et al. (2010): children who developed DYS had higher activity in response to speech sounds than typically reading children. This result contrasts with Blau et al. (2010), who found weaker activation to speech sounds in dyslexic compared to control children. It is, however, in line with studies of literate and illiterate subjects, in which the response to speech in bilateral STG was weaker in literate than in illiterate participants (Dehaene et al., 2010).

Finally, for unisensory letters, there was a trend toward lower activity in FHD+ compared to FHD− children in the left fusiform gyrus ROI taken from Blau et al. (2010). This ROI lies in close proximity to the visual word form area (VWFA), which is implicated in the processing of letters and words (Cohen et al., 2002; McCandliss et al., 2003; Cohen and Dehaene, 2004). Since the two groups had similar reading experience, it is tempting to speculate that the abnormalities in the left occipitotemporal cortex in DYS (including hypoactivation in response to letters) are to some extent genetically driven and related to family risk. This is in agreement with anatomical studies in FHD+ prereaders showing lower gray matter volume in the left fusiform gyrus (Raschle et al., 2011) and atypical white matter organization in the left ventral tract (Vandermosten et al., 2015). Most importantly, however, we found no significant differences in the left fusiform gyrus between DYS and control children, even though DYS children behaviorally had lower letter knowledge and early reading skills. The differences in orthographic processing that emerge later between typical and DYS readers may therefore be a consequence of disordered crossmodal feedback into the VWFA in DYS readers (Žarić et al., 2017), since differences in VWFA activity are not present in children at the start of formal reading education.

#### LIMITATIONS

Even though we examined children with less than a year of formal reading education, only one third could be considered prereaders. However, this relatively short period of formal reading instruction did not have a significant impact on the pattern of the current findings. Additionally, we found a similar pattern of results in readers and prereaders (for details see **Supplementary Figure S1** in the **Supplementary Materials**), which supports pooling these two groups in the current study. Still, the present results should be treated with caution until they are replicated in even younger, pre-reading children.

Retrospective selection of children who developed DYS resulted in highly unequal sample sizes (DYS = 17; TR = 58), which makes heterogeneity of variance a problem. We therefore performed bootstrap analyses to test for group differences on each measure, i.e., performance on each behavioral test and brain activation in each ROI to each stimulus type (letters, speech sounds, and congruent or incongruent LS pairs). With this approach, however, it was not possible to test for an interaction between group and the multisensory (congruent and incongruent) conditions.

Moreover, this article was designed to follow the approach of Blau et al. (2010). Thus, in the main text we used an identical statistical threshold for whole-brain analyses, namely a voxel-wise threshold of p < 0.01, corrected for multiple comparisons using a cluster extent threshold of p < 0.05. Since such a threshold might nowadays be regarded as liberal (Eklund et al., 2016), the analyses were repeated with a voxel-wise threshold of p < 0.005 and an extent threshold of 50 voxels, which is common in pediatric studies (e.g., Wang et al., 2018). Reassuringly, all clusters reported in the main text survived this more stringent approach (see **Supplementary Table S1**).

## CONCLUSION

Our study shows that alterations in brain activity during LS integration can be detected at very early stages of reading acquisition, suggesting their fundamental involvement in later reading impairments. The left STC actively responds to conflicting LS pairs, which translates into better reading skills in children without a risk of developing DYS. FHD+ children lack such an active response, and children with DYS show an even higher response to congruent LS pairs in the left PT. Both patterns could lead to failures in suppressing incongruent information during reading acquisition, which could result in future reading problems.

#### DATA AVAILABILITY

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

## AUTHOR CONTRIBUTIONS

JP analyzed and interpreted the data, drafted the manuscript. KC collected the data, analyzed and interpreted the data and revised the manuscript. ŁB helped with BrainVoyager software and revised the manuscript. MŁ, AD and AB collected the data and revised the manuscript. MW helped with data analysis. AM helped with fMRI design and revised the manuscript. NA helped with analysis and interpretation of the data and revised the manuscript. KJ designed the experiments, interpreted the data and drafted the manuscript. All the authors read and approved the final version of the manuscript.

### FUNDING

This work was funded by grants from the Polish Ministry of Science and Higher Education (IP2011 020271) and National Science Centre grants (2011/03/D/HS6/05584, 2014/14/A/HS6/00294 and 2016/22/E/HS6/00119). The project was realized with the aid of CePT research infrastructure purchased with funds from the European Regional Development Fund as part of the Innovative Economy Operational Programme, 2007–2013.

#### ACKNOWLEDGMENTS

We would like to thank all the families that participated in the current study.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2018.00393/full#supplementary-material

## REFERENCES

Blau, V., Reithler, J., van Atteveldt, N., Seitz, J., Gerretsen, P., Goebel, R., et al. (2010). Deviant processing of letters and speech sounds as proximate cause of reading failure: a functional magnetic resonance imaging study of dyslexic children. Brain 133, 868–879. doi: 10.1093/brain/awp308


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Plewko, Chyl, Bola, Łuniewska, Dębska, Banaszkiewicz, Wypych, Marchewka, van Atteveldt and Jednoróg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Deficient Letter-Speech Sound Integration Is Associated With Deficits in Reading but Not Spelling

Ferenc Kemény <sup>1</sup> \*, Melanie Gangl <sup>1</sup> , Chiara Banfi<sup>1</sup> , Sarolta Bakos <sup>2</sup> , Corinna M. Perchtold<sup>1</sup> , Ilona Papousek <sup>1</sup> , Kristina Moll <sup>2</sup> and Karin Landerl <sup>1</sup>

<sup>1</sup>Institute of Psychology, University of Graz, Graz, Austria, <sup>2</sup>Department of Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy, Ludwig-Maximilian University, Munich, Germany

Efficient and automatic integration of letters and speech sounds is assumed to enable fluent word recognition and may in turn also underlie the build-up of high-quality orthographic representations, which are relevant for accurate spelling. While previous research showed that developmental dyslexia is associated with deficient letter-speech sound integration, these studies did not differentiate between subcomponents of literacy skills. To investigate whether deficient letter-speech sound integration is associated with deficits in reading and/or spelling, we recruited three groups of third graders: (1) children with combined deficits in reading and spelling (RSD, N = 10); (2) children with an isolated spelling deficit (ISD, N = 17); and (3) typically developing children (TD, N = 21). We assessed the neural correlates (EEG) of letter-speech sound integration using a Stroop-like interference paradigm, in which participants had to decide whether two visually presented letters looked identical. Among non-identical letter pairs, conflict items were the same letter in lower and upper case (e.g., "T t"), while non-conflict items were different letters (e.g., "T k"). Behaviorally, all three groups exhibited a comparable conflict-related reaction time (RT) increase, which may indicate the absence of a general inhibitory deficit. Event-related potentials (ERPs), on the other hand, revealed group differences: the amplitude of the centro-parietal conflict slow potential (cSP) was increased for conflicting items in typical readers as well as in the ISD group. Preliminary results suggest that this effect was missing in children with RSD. The results suggest that deficits in automatized letter-speech sound associations are associated with reading deficits, whereas no such impairment was observed in children with a spelling deficit.

Edited by: Silvia Brem, Psychiatrische Klinik der Universität Zürich, Switzerland

#### Reviewed by:

Milene Bonte, Maastricht University, Netherlands
Susana Araújo, Universidade de Lisboa, Portugal

> \*Correspondence: Ferenc Kemény ferenc.kemeny@uni-graz.at

Received: 29 March 2018 Accepted: 19 October 2018 Published: 14 November 2018

#### Citation:

Kemény F, Gangl M, Banfi C, Bakos S, Perchtold CM, Papousek I, Moll K and Landerl K (2018) Deficient Letter-Speech Sound Integration Is Associated With Deficits in Reading but Not Spelling. Front. Hum. Neurosci. 12:449. doi: 10.3389/fnhum.2018.00449

Keywords: reading deficit, spelling deficit, dyslexia, letter-speech sound integration, cross-modal integration, letter-speech sound interference

#### INTRODUCTION

A strong association between letters and speech sounds is a crucial component of literacy skills. However, knowing letters and their corresponding speech sounds is not sufficient for developing proficient reading, because these associations also need to be automatized (letter-sound integration hypothesis, Blomert, 2011). It has been suggested that letter-speech sound integration is deficient in poor readers (Bakos et al., 2017). The current experiment tests the automaticity of letter-speech sound associations with a Stroop-like interference task in three groups of 9-year-old children: children with developmental dyslexia (conceptualized as a combined reading and spelling deficit, RSD), a group with an isolated spelling deficit (ISD), and a group of typically developing (TD) children. Our aim is to identify whether the automatized nature of letter-speech sound associations is a crucial feature of reading development, of spelling development, or of both.

Although reading and spelling skills are generally treated as closely related (Perfetti et al., 1997), the relationship is far from obvious. Some interpret the high correlation between reading and spelling (0.77 < r < 0.86) as an indicator that they are two aspects of the same phenomenon (Ehri, 1997). Others highlight that such high correlations only appear in opaque orthographies like English, in which letters and speech sounds have various possible mappings (e.g., "o" is decoded differently in "womb," "wombat" and "women"). In orthographies with transparent letter-sound correspondences, like German, reading accuracy is close to ceiling and reading fluency is the main criterion for measuring reading skills. In these languages, reading fluency and spelling skills show only a moderate correlation (e.g., Moll and Landerl, 2009).

In line with these only moderate correlations, dissociations have been reported between impairments in spelling and in reading fluency (Wimmer and Mayringer, 2002; Moll and Landerl, 2009; Moll et al., 2014). Developmental studies showed that the prevalence of isolated deficits as well as of combined RSDs (the latter usually referred to as dyslexia) is around 6%–8% in German (Moll and Landerl, 2009), while the combined deficit has a somewhat higher prevalence in French (Fayol et al., 2009). Neuropsychological studies also reported a double dissociation of reading and spelling skills (De Renzi et al., 1987; Mochizuki and Ohtomo, 1988). These studies argue that the underlying core problems differ. An isolated reading fluency deficit may be a consequence of impaired access to orthographic representations, while these representations remain available for top-down spelling processes (Moll and Landerl, 2009). An ISD, on the other hand, may result from a reduced orthographic lexicon. Reading skills are not affected, because even a reduced orthographic lexicon is sufficient for word recognition, i.e., reading (Frith, 1980). Another explanation for the lack of reading problems in individuals with ISD is that they may use highly efficient decoding strategies, which compensate for deficient orthographic knowledge (Moll and Landerl, 2009). The dissociation between reading and spelling deficits is further supported by evidence of different cognitive profiles associated with impairments in reading fluency and in spelling (Wimmer and Mayringer, 2002). Specifically, spelling problems have been associated with phonological deficits. Reading fluency problems, by contrast, have been associated with difficulties in rapid automatized naming (i.e., the serial naming of repeated items presented in lines or columns), which is an indicator of visual-verbal access.

The distinct core deficits in reading vs. spelling impairment also suggest different patterns of automatized letter-speech sound associations. Since children with a spelling deficit have a reduced orthographic lexicon (Frith, 1980) but efficient decoding strategies (Moll and Landerl, 2009), they are expected to show preserved letter-speech sound associations. Children with combined RSDs (i.e., dyslexia), on the other hand, are expected to show atypical automatized letter-speech sound associations, due to deficient access to orthographic representations. Thus, atypical automatized letter-speech sound associations should be related to reading impairment, not to spelling impairment.

The following section reviews previous results on letter-speech sound associations in dyslexia and describes a classical method (Posner and Mitchell, 1967) that has recently been applied to assess letter-speech sound associations using event-related potentials (ERPs) (Bakos et al., 2017). Since the current study is an ERP study, the introduction then describes conflict-related ERP components, before turning to the current study.

## LETTER-SPEECH SOUND ASSOCIATIONS IN DYSLEXIA

Previous studies mainly used a passive oddball mismatch negativity (MMN) paradigm to assess the neural correlates of crossmodal letter-speech sound associations. MMNs are elicited by deviant stimuli (Näätänen et al., 1978, 1993) and are interpreted as correlates of memory functions, violation detection or predictive functions (for a review, see Winkler, 2007). Although MMN paradigms mainly use unimodal auditory or visual stimuli (Czigler, 2007), the method has also been adapted to crossmodal associations such as letter-speech sound correspondences (Froyen et al., 2008; Moll et al., 2016). In these crossmodal adaptations, visual (letter) and auditory (speech sound) stimuli were presented simultaneously. Auditory (Froyen et al., 2008) but not visual (Froyen et al., 2010) MMNs were boosted by simultaneous congruent information, but only in advanced and not in beginning readers (Froyen et al., 2009; Jones et al., 2016), and not in children with dyslexia (Froyen et al., 2011). These results have been integrated into the proposal that automatization of letter-speech sound associations develops with reading proficiency, but fails to do so in the case of a reading deficit (Blomert, 2011). Results further suggest that the crossmodal MMN deficit is most pronounced when the auditory and visual stimuli are presented simultaneously (Žarić et al., 2014). On the other hand, the impairment was found to be at least partially reversible (Žarić et al., 2015). Reading skills also correlated with MMN measures of letter-speech sound integration (Žarić et al., 2014, 2015). Others, however, did not replicate the absence of a crossmodal MMN effect in dyslexia, but observed a delayed effect instead (Moll et al., 2016).

While the MMN methodology has been successfully adapted to crossmodal events, a disadvantage of the method lies in its passive nature. Long passive observation tasks are difficult to administer to school-aged children. In addition, passive tasks provide no behavioral data that would allow performance and attention to be controlled. Using an active priming task, Nash et al. (2017) found that children with dyslexia show a pattern similar to that of a reading-age-matched control group, which suggests that letter-speech sound integration is a function of reading proficiency.

Similar results were obtained by Bakos et al. (2017), who used an adapted Stroop-like letter-speech sound interference paradigm (Posner and Mitchell, 1967). Throughout the task, participants saw two letters and decided, by pressing a response key, whether the two letters were visually identical. The letter pairs could be the same letter with the same visual features (e.g., "t t" or "T T": "yes" answer), different letters (e.g., "T k" or "t K": "no" answer), or the same letter in different cases (e.g., "T t": "no" answer). The critical comparison is between the two "no" conditions, which differ in whether they involve conflict. Conflict arises when the same letter is presented in different cases (e.g., "T t"): the two letters are visually different but are associated with the same phoneme. Bakos et al. (2017) used this task to compare the neural correlates of conflict in RSD and TD children and found a similar reaction time (RT) increase for conflict trials in both groups, whereas conflict-related ERP amplitude modulation was missing in the RSD group.

The current study replicates and extends the Bakos et al. (2017) study by using the same method but contrasting the effects of reading vs. spelling deficits on the automatization of letter-speech sound associations. Since the method is based on interference processing, it is important to differentiate between general inhibitory processes and processes related to letter-speech sound integration, and to determine how each is associated with dyslexia. Although a number of studies found atypical inhibitory performance in dyslexia (Everatt et al., 1997; Helland and Asbjørnsen, 2000), it is not clear whether the deficit stems from the overall longer response latencies of dyslexic children (Das, 1993; Protopapas et al., 2007; Faccioli et al., 2008), or from the fact that some experimental tasks loaded on reading skills or employed letter-based stimuli (Reiter et al., 2005; Bakos et al., 2017). To avoid such confounds, the current study examines both behavioral and ERP measures of conflict processing. The following section provides an overview of conflict-related ERP components and their realization in children with dyslexia.

#### CONFLICT-RELATED ERP COMPONENTS

Previous studies have mainly addressed conflict processing and conflict resolution with Stroop or Flanker tasks. These studies identified three components associated, respectively, with conflict identification, conflict monitoring and conflict resolution: an N1 (Yu et al., 2015), an N2 (Larson et al., 2014), and a late positive complex (West, 2003).

The N1 is a negative fronto-central component peaking between 100 and 200 ms. Although the N1 has been shown to be sensitive to conflict detection (Yu et al., 2015), both conflict-related amplitude increases (Johnstone et al., 2009) and decreases (Yu et al., 2015) have been reported. Previous studies have also found atypical N1 amplitude modulation in adults (Mahé et al., 2014) and children (Bakos et al., 2017) with dyslexia.

The second component, the N2, peaks between 250 and 350 ms and has a maximum over fronto-central electrodes. Previous studies argue that this component reflects conflict detection and monitoring and originates from the anterior cingulate cortex (Larson et al., 2009, 2014). As with the N1, some studies reported decreased (Yu et al., 2015) and others increased amplitudes for conflict vs. non-conflict trials (Johnstone et al., 2009). Yet others found no conflict-related N2 amplitude modulation in either TD children or children with dyslexia (Henkin et al., 2010; Bakos et al., 2017). A further study in adults observed conflict-related N2 amplitude modulation in the flanker task in typical readers, but not in individuals with dyslexia (Mahé et al., 2014).

The last conflict-related component is the conflict slow potential (cSP), which is a late positive complex. The complex begins approximately 500 ms after stimulus onset and is observable over the centro-parietal electrodes. The cSP has been hypothesized to originate from lateral and posterior cortices (West, 2003; Hanslmayr et al., 2008; Larson et al., 2014), and to reflect conflict resolution (West, 2003) or response selection (West et al., 2005). Bakos et al. (2017) found a marginally significant amplitude decrease to conflict trials in TD children, but not in children with dyslexia.

## THE CURRENT STUDY

The current study is a replication and extension of the Bakos et al. (2017) study. As described above, Bakos et al. (2017) tested typical readers as well as children with combined RSD in a letter-speech sound interference task and found a significant conflict-related decrease in N1 and cSP amplitudes in TD children. Both effects were missing in children with combined RSD. Neither group showed any sign of N2 conflict sensitivity.

We aimed to replicate these results and expected that children with dyslexia (combined RSD) would show deficient conflict processing with stimuli that rely on automatized letter-speech sound associations, rooted in deficient access to orthographic representations. Extending the previous design, we also tested children with ISD. ISD is associated with a reduced orthographic lexicon, which does not affect reading abilities, because underspecified orthographic representations suffice for reading (Frith, 1980) and/or because decoding skills are efficient (Moll and Landerl, 2009). Thus, we expected typical automatized letter-speech sound associations in ISD. We also expected that none of the groups would show an impairment in general inhibitory measures, which would be reflected in a comparable RT increase for conflict trials. Accordingly, we expected increased RTs for conflict trials in all three groups.

Note. 1: CFT-IQ (Weiß, 2006), 2: percentiles of reading speed on SLS 2–9 (Wimmer and Mayringer, 2014), 3: percentiles of 1-min word reading and 4: 1-min pseudoword reading on SLRT-II (Moll and Landerl, 2010), 5: percentiles of spelling on DRT 3 (Müller, 2004). For all group comparisons, p < 0.001.

## MATERIALS AND METHODS

## Participants

Altogether, 48 children participated in the study. Children were selected based on a screening of 3rd graders in primary schools in and around Graz, Austria. Initially, reading (Wimmer and Mayringer, 2014) and spelling skills (Müller, 2004) were assessed in a classroom setting. Reading skills were then reassessed individually using 1-min word and pseudoword reading tasks (Moll and Landerl, 2010). Three groups were selected based on the screening: children with combined RSD, children with ISD and TD children. Children with RSD scored at or below the 20th percentile in both reading and spelling. Children with ISD scored at or below the 20th percentile in spelling, but at or above the 25th percentile in reading. TD children scored at or above the 25th percentile in both reading and spelling. All children were monolingual German speakers, had an IQ ≥ 85 (Weiß, 2006), and had normal or corrected-to-normal vision. No child had a history of sensory or neurological deficits, a clinical diagnosis of ADHD, or an above-threshold score on a parental questionnaire for attention deficits (FBB-ADHS, DISYPS-II, Döpfner et al., 2008). The final sample comprised 10 children with RSD, 17 children with ISD and 21 TD children. Age, IQ, reading and spelling abilities for each group are provided in **Table 1**.
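The selection rule above can be expressed as a small decision function. This is a sketch only: `assign_group` is a hypothetical name, it assumes one reading percentile and one spelling percentile per child (the study combined several reading measures), and it leaves out the IQ, language, vision and ADHD criteria.

```python
from typing import Optional


def assign_group(reading_pct: float, spelling_pct: float) -> Optional[str]:
    """Hypothetical helper: map screening percentiles to a study group.

    Cut-offs follow the text: at or below the 20th percentile counts as
    impaired, at or above the 25th percentile as unimpaired.
    """
    reading_low, spelling_low = reading_pct <= 20, spelling_pct <= 20
    reading_ok, spelling_ok = reading_pct >= 25, spelling_pct >= 25

    if reading_low and spelling_low:
        return "RSD"  # combined reading and spelling deficit
    if spelling_low and reading_ok:
        return "ISD"  # isolated spelling deficit
    if reading_ok and spelling_ok:
        return "TD"   # typically developing
    return None       # falls into none of the three groups


for child in [(12, 8), (40, 15), (55, 60), (22, 10)]:
    print(child, assign_group(*child))
```

Children scoring between the 20th and 25th percentile on a relevant skill, and children with an isolated reading deficit, are returned as `None`, mirroring the fact that they belong to none of the three groups.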

EEG recording took place in an acoustically and electrically shielded examination room at the University of Graz. An examiner stayed with the children throughout the testing session to provide support and monitor adherence to the testing protocol. Children received €25 for their participation. This study was carried out in accordance with the recommendations of the Ethical Committee of the University of Graz. The protocol was approved by the University of Graz. Parents of all subjects gave written informed consent in accordance with the Declaration of Helsinki.

#### Stimuli and Procedure

Children were shown two letters simultaneously and were instructed to press one key when the two letters were visually identical (i.e., "looked the same") and another key when they were visually different (i.e., "looked different"). Identical pairs were both upper case or both lower case (e.g., "T T" or "k k"); different pairs were either different letters (Non-conflict items, e.g., "T k" or "t K," with one letter always in lower case and the other in upper case) or the same letter in upper case and lower case (Conflict items, e.g., "T t"). There were 45 lower-case and 45 upper-case Same items, and 60 Conflict and 60 Non-conflict Different items. Participants were instructed to respond as fast as possible by pressing "p" for Same and "q" for Different items on a QWERTZ keyboard<sup>1</sup>.

Each trial began with a fixation cross shown for 1,000 ms, followed by the letter pair, presented in 57-pt Arial in the middle of the screen until response. The response was followed by a blank screen for 1,000 ms. Altogether, 210 items were presented in two blocks of 105 items. The order of the stimuli was randomized, and participants took a self-paced break between the blocks.
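The trial structure can be sketched as follows. The letter set is hypothetical (the text does not list the letters used; letters whose cases look alike, such as c/C or o/O, are left out here by assumption), and shuffling all 210 trials before splitting them into two blocks is only one possible reading of "the order of the stimuli was randomized."

```python
import random

# Hypothetical letter set; the study's actual letters are not given in the text.
LETTERS = "abdefghnqrt"


def make_trials(seed=None):
    """Build 210 trials as (left, right, condition, correct_key) tuples."""
    rng = random.Random(seed)
    trials = []
    for _ in range(45):  # Same, lower case, e.g. "k k"
        c = rng.choice(LETTERS)
        trials.append((c, c, "same", "p"))
    for _ in range(45):  # Same, upper case, e.g. "T T"
        c = rng.choice(LETTERS).upper()
        trials.append((c, c, "same", "p"))
    for _ in range(60):  # Conflict: same letter, different case, e.g. "T t"
        c = rng.choice(LETTERS)
        pair = (c.upper(), c) if rng.random() < 0.5 else (c, c.upper())
        trials.append((*pair, "conflict", "q"))
    for _ in range(60):  # Non-conflict: different letters, one of each case
        a, b = rng.sample(LETTERS, 2)
        pair = (a.upper(), b) if rng.random() < 0.5 else (a, b.upper())
        trials.append((*pair, "non-conflict", "q"))
    rng.shuffle(trials)
    return trials[:105], trials[105:]  # two blocks of 105 items


block1, block2 = make_trials(seed=1)
print(block1[:3])
```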

## EEG Recording and Preprocessing

EEG was recorded from 19 channels placed according to the international 10-20 system, using a BrainVision BrainAmp Research Amplifier (Brain Products; sampling rate 500 Hz, resolution 0.1 µV) and a stretchable electrode cap. Signals were referenced to the nose and re-referenced offline to a mathematically averaged ears reference (Essl and Rappelsberger, 1998; Hagemann, 2004; Papousek et al., 2016). Impedance was kept below 5 kΩ. EOG was recorded to identify ocular artifacts: the vertical EOG from the supra- and sub-orbit of the right eye, and the horizontal EOG from the outer canthi, using adhesive Ag/AgCl electrodes. The continuous EEG was filtered (low cutoff: 0.1 Hz, time constant: 1.5915 s, 24 dB/Oct; high cutoff: 100 Hz, 24 dB/Oct; notch filter: 50 Hz), and EOG artifacts were removed by automatic ocular correction using an ICA algorithm as implemented in BrainVision Analyzer 2.0 (slope mean, over the whole data, ICA with infomax algorithm, total squared correlations to delete: 30%; Gratton et al., 1983). The data were then segmented into epochs from −100 to 700 ms relative to stimulus onset, with the −100 to 0 ms interval serving as the baseline. Only segments with a correct response outside the 0–700 ms time window were considered. Other artifacts were excluded automatically (gradient criteria: a difference of more than 50 µV between two successive data points or more than 200 µV within a 200 ms window; absolute amplitude criterion: amplitudes exceeding +100 or −100 µV; low-activity criterion: less than 0.5 µV of activity within a 100 ms window). All participants had at least 26 valid segments in each of the two "Different" conditions, so all children were included in the analyses. The mean number of included conflict and non-conflict epochs was 53.20 (SD = 8.59) and 51.40 (SD = 8.45) for the RSD group, 50.41 (8.19) and 49.35 (9.50) for the ISD group, and 52.10 (6.50) and 52.38 (6.99) for the TD group.
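The automatic artifact criteria can be sketched in NumPy. This is an illustration, not the BrainVision Analyzer implementation: the window-based criteria are read here as the max-min range within any sliding window, and rejecting a whole segment when any channel fails is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

FS = 500  # sampling rate in Hz, as reported in the text


def _window_ranges(x: np.ndarray, win: int) -> np.ndarray:
    """Max-min range of every sliding window of `win` samples."""
    w = sliding_window_view(x, win)
    return w.max(axis=1) - w.min(axis=1)


def is_artifact(x_uv, fs: int = FS) -> bool:
    """True if a single-channel segment (in µV) violates any criterion."""
    x = np.asarray(x_uv, dtype=float)
    if np.any(np.abs(np.diff(x)) > 50.0):  # > 50 µV between successive points
        return True
    if np.any(_window_ranges(x, int(round(0.2 * fs))) > 200.0):  # > 200 µV in 200 ms
        return True
    if np.any(np.abs(x) > 100.0):  # beyond +100 or -100 µV
        return True
    if np.any(_window_ranges(x, int(round(0.1 * fs))) < 0.5):  # < 0.5 µV in 100 ms
        return True
    return False


def reject_segment(segment_uv, fs: int = FS) -> bool:
    """Reject a (channels x samples) segment if any channel fails (assumption)."""
    return any(is_artifact(ch, fs) for ch in np.atleast_2d(segment_uv))


t = np.arange(400) / FS                   # one 800 ms epoch at 500 Hz
clean = 20 * np.sin(2 * np.pi * 10 * t)   # 10 Hz oscillation, +/-20 µV
print(is_artifact(clean), is_artifact(np.zeros(400)))  # False True
```

The flat line is rejected by the low-activity criterion, which is meant to catch disconnected or bridged electrodes rather than large deflections.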

The F3, Fz and F4 electrodes were pooled for the analyses of the N1 and N2 components. The time window for the N1 component was 90–170 ms after stimulus onset, and the time window for the N2 component was 310–380 ms. For the cSP, the Pz electrode was analyzed between 500 and 700 ms after stimulus onset. Regions of interest and time windows were based on a previous study using the same paradigm (Bakos et al., 2017). The EEG montage and regions of interest are shown in **Figure 1**.

<sup>1</sup>All other keys were removed from the keyboard.
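Assuming the components are measured on baseline-corrected condition averages, the quantification could look like the sketch below. The text speaks of N1 and N2 peak amplitudes, so the most negative value in each window is taken; for the cSP it does not say whether peak or mean amplitude was used, so the mean amplitude here is an assumption. All function names are hypothetical.

```python
import numpy as np

FS = 500               # sampling rate in Hz, as reported
EPOCH_START_MS = -100  # epochs span -100 to 700 ms


def ms_to_index(ms: float) -> int:
    """Sample index of a latency (ms) within an epoch starting at -100 ms."""
    return int(round((ms - EPOCH_START_MS) * FS / 1000))


def baseline_correct(erp: np.ndarray) -> np.ndarray:
    """Subtract the mean of the -100 to 0 ms pre-stimulus interval."""
    base = erp[..., ms_to_index(-100):ms_to_index(0)]
    return erp - base.mean(axis=-1, keepdims=True)


def pool(channels: dict, names=("F3", "Fz", "F4")) -> np.ndarray:
    """Average the waveforms of the pooled frontal electrodes."""
    return np.mean([channels[name] for name in names], axis=0)


def window(erp: np.ndarray, t0_ms: float, t1_ms: float) -> np.ndarray:
    """Samples from t0 to t1 ms, inclusive."""
    return erp[ms_to_index(t0_ms):ms_to_index(t1_ms) + 1]


def negative_peak(erp: np.ndarray, t0_ms: float, t1_ms: float) -> float:
    """Most negative value in the window (N1: 90-170 ms, N2: 310-380 ms)."""
    return float(window(erp, t0_ms, t1_ms).min())


def mean_amplitude(erp: np.ndarray, t0_ms: float, t1_ms: float) -> float:
    """Mean value in the window (e.g., cSP at Pz, 500-700 ms)."""
    return float(window(erp, t0_ms, t1_ms).mean())


# Synthetic example: a -5 µV dip at 130 ms on all three frontal channels.
times = EPOCH_START_MS + np.arange(400) * 1000 / FS
frontal = {n: -5 * np.exp(-((times - 130) / 20) ** 2) for n in ("F3", "Fz", "F4")}
n1 = negative_peak(baseline_correct(pool(frontal)), 90, 170)
print(round(n1, 3))
```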

### RESULTS

## Behavioral Measures: Accuracy and Reaction Times

First, behavioral measures were analyzed. Since accuracy exceeded 97.5% for both stimulus types in all three groups, accuracy was not analyzed directly. Only RTs for correct answers were considered. We calculated each participant's median RT for Conflict and Non-conflict items. To account for a possible speed-accuracy tradeoff, these RTs were corrected by dividing them by the corresponding accuracy. RTs by Stimulus-type and by Group are provided in **Figure 2**. We ran a 2 × 3 mixed ANOVA with Stimulus-type (Conflict vs. Non-conflict) as within-subject and Group (RSD vs. ISD vs. TD) as between-subject variable. The ANOVA revealed a significant main effect of Stimulus-type, F(1,45) = 33.002, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.423, with higher RTs for Conflict than for Non-conflict items. Neither the main effect of Group nor the Stimulus-type × Group interaction was significant (both ps > 0.278). To confirm that all three groups showed significantly higher RTs for conflict than for non-conflict items, we ran separate repeated-measures ANOVAs for each group with Stimulus-type as within-subject variable. A significant effect of Stimulus-type was confirmed for all three groups: F(1,9) = 15.838, p = 0.003, η<sub>p</sub><sup>2</sup> = 0.638 for the RSD group, F(1,16) = 22.382, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.583 for the ISD group, and F(1,20) = 4.842, p = 0.040, η<sub>p</sub><sup>2</sup> = 0.195 for the TD group.
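The RT correction described above (median RT of correct trials divided by proportion correct, a measure often called the inverse efficiency score) can be sketched per participant and condition as follows. The function names are hypothetical, and taking the conflict effect as a simple difference of corrected RTs is only for illustration; the study entered both conditions into an ANOVA.

```python
from statistics import median


def corrected_rt(rts_ms, correct):
    """Median RT of correct trials divided by proportion correct."""
    if len(rts_ms) != len(correct):
        raise ValueError("rts_ms and correct must have the same length")
    correct_rts = [rt for rt, ok in zip(rts_ms, correct) if ok]
    accuracy = sum(correct) / len(correct)
    return median(correct_rts) / accuracy


def conflict_effect(conflict, non_conflict):
    """Corrected-RT difference (Conflict minus Non-conflict) for one child.

    Each argument is a (rts_ms, correct) pair for one condition.
    """
    return corrected_rt(*conflict) - corrected_rt(*non_conflict)


# One error out of four trials: median of correct RTs 900 ms, accuracy 0.75.
print(corrected_rt([800, 900, 1000, 1200], [True, True, True, False]))  # 1200.0
```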

## Early Frontal Correlates of Conflict—N1 and N2


Next, conflict-related N1 peak amplitudes were analyzed. In accordance with previous studies (Moll et al., 2016; Bakos et al., 2017), signals pooled over the F3, Fz and F4 electrodes were entered into a 2 × 3 mixed ANOVA with Stimulus-type (Conflict vs. Non-conflict) as within-subject and Group (RSD vs. ISD vs. TD) as between-subject variable. The same analysis was carried out for both the N1 and the N2 components. N1 and N2 amplitudes by Stimulus-type and Group are provided in **Figure 3**. For the N1, the ANOVA revealed a significant main effect of Group, F(2,45) = 3.411, p = 0.042, η<sub>p</sub><sup>2</sup> = 0.132. No other effects were significant (all ps ≥ 0.236). Since no conflict-related effects were found, no further analyses were conducted.

Similarly, the ANOVA for N2 peak amplitudes showed only a significant main effect of Group, F(2,45) = 3.774, p = 0.031, η<sub>p</sub><sup>2</sup> = 0.144. No other effects were significant (all ps ≥ 0.266).

#### Late Parietal Correlates of Conflict—cSP

Next, conflict-related slow potentials were analyzed at the centro-parietal Pz electrode. Conflict-related amplitudes are provided in **Figure 4**. A 2 × 3 mixed ANOVA was run with Stimulus-type (Conflict vs. Non-conflict) as within-subject variable and Group (RSD vs. ISD vs. TD) as between-subject variable. The ANOVA revealed a significant Stimulus-type × Group interaction, F(2,45) = 3.284, p = 0.047, η<sub>p</sub><sup>2</sup> = 0.127. No other effects were significant (all ps ≥ 0.290).

To follow up the Stimulus-type × Group interaction, a separate repeated-measures ANOVA was conducted for each group with Stimulus-type (Conflict vs. Non-conflict) as within-subject variable. The ANOVAs revealed a significant effect for the TD group, F(1,20) = 6.801, p = 0.017, η<sub>p</sub><sup>2</sup> = 0.254, and for the ISD group, F(1,16) = 4.946, p = 0.041, η<sub>p</sub><sup>2</sup> = 0.236. In both the TD and ISD groups, amplitudes were more positive for conflict than for non-conflict items. The RSD group showed the reverse pattern; this difference, however, was not significant, F(1,9) = 0.907, p = 0.366, η<sub>p</sub><sup>2</sup> = 0.092.
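As a consistency check, partial eta squared can be recovered from each F ratio and its degrees of freedom using the standard identity η<sub>p</sub><sup>2</sup> = F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies it to the values reported in the Results; it reproduces each reported effect size to three decimals, although this does not show how the authors' software computed them.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return f * df_effect / (f * df_effect + df_error)


REPORTED = [
    # (F, df_effect, df_error, reported partial eta squared)
    (33.002, 1, 45, 0.423),  # RT: main effect of Stimulus-type
    (15.838, 1, 9, 0.638),   # RT: RSD group
    (22.382, 1, 16, 0.583),  # RT: ISD group
    (4.842, 1, 20, 0.195),   # RT: TD group
    (3.411, 2, 45, 0.132),   # N1: main effect of Group
    (3.774, 2, 45, 0.144),   # N2: main effect of Group
    (3.284, 2, 45, 0.127),   # cSP: Stimulus-type x Group
    (6.801, 1, 20, 0.254),   # cSP: TD group
    (4.946, 1, 16, 0.236),   # cSP: ISD group
    (0.907, 1, 9, 0.092),    # cSP: RSD group
]

for f, df1, df2, eta in REPORTED:
    print(f"F({df1},{df2}) = {f}: {partial_eta_squared(f, df1, df2):.3f} (reported {eta})")
```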

FIGURE 3 | N1 and N2 amplitudes by Stimulus-type and by Group. Highlighted areas indicate the N1 (90–170 ms after stimulus onset) and N2 (310–380 ms after stimulus onset) time windows. Scalp maps show the averaged activity in the highlighted time windows.

## DISCUSSION

The current study tested the automaticity of letter-speech sound associations using an interference task. Three groups of 3rd graders were tested: a group with combined RSD, a group with ISD, and a group of typical readers (TD). Results showed no general inhibitory deficit, as all three groups showed a comparable RT increase for conflict trials. However, the neural signatures of conflict processing differed between the groups. Whereas the N1 and N2 components showed no conflict-related modulation in any group, cSP amplitudes were modulated differently across the groups. In the TD and ISD groups, conflict trials elicited a more positive cSP than non-conflict trials. The RSD group, on the other hand, showed a non-significant reverse pattern. This suggests that the missing effect may not simply reflect low statistical power, but rather qualitatively different conflict processing. Note, however, that the RSD group comprised only 10 participants. Further supporting analyses are provided in **Supplementary Data Sheet 1 (Supplementary Analysis 1)**. These analyses compare the 10 RSD participants to 10 ISD children matched on spelling skills and to 10 TD children whose reading skills matched those of the selected ISD participants.

To sum up, the most important results are that all groups showed a comparable behavioral difference between non-conflict and conflict items, suggesting that general inhibition may not be deficient in any of the groups. ERP results, however, showed a typical pattern of conflict processing in the ISD group, but no effect in the RSD group. Thus, letter-speech sound integration deficits appear to be more closely associated with reading skills, whereas no similar effects were observed for spelling.

Based on the findings, the discussion focuses on three relevant issues: (1) how can the lack of a deficit in letter-speech sound integration in ISD be explained in terms of previous theoretical frameworks; (2) how can the current results of the RSD group be reconciled with Bakos et al.'s (2017) notion that deficient automatized letter-speech sound associations are related to reading impairment; and (3) how can the relation between cSP amplitude modulation and activation of phonological information be explained.

First, the typical conflict-related amplitude modulation in ISD might be associated with decoding skills. Previous theoretical frameworks by Moll and Landerl (2009) suggest that children with ISD have a reduced orthographic lexicon, which is compensated by highly efficient decoding strategies. The current results are in line with this hypothesis, as typical letter-speech sound associations were observed in ISD.

The current results are also in line with previous studies showing that sensitivity to letters is a crucial factor predicting reading performance in TD children (Kemény et al., 2018). Our findings corroborate earlier evidence that individuals with a reading deficit may be impaired in print sensitivity (Maurer et al., 2006, 2011; Hasko et al., 2013; Fraga González et al., 2014; Araújo et al., 2015), but this impairment might only appear from a certain age onwards (for discussion, see Kemény et al., 2018). Since the cited evidence does not stem from crossmodal processing, it remains an open question whether the core deficit is in fact associated with letter-speech sound associations or rather with letter-based effects. The current study was not designed to address this question, however.

## Replication Differences

The current study replicated the findings of Bakos et al. (2017) in a number of ways: both studies found RT effects of stimulus-type in all groups and interpreted those as a sign of intact general inhibitory mechanisms. Both studies found atypical conflict processing in RSD, although the temporal windows were different: Bakos et al. (2017) reported differences on the N1 as well as the cSP amplitudes, whereas the current study only found differences in the cSP amplitude modulation (see below).

RTs in the Bakos et al. (2017) study were also longer: TD children responded on average in 1,059 ms to Non-conflict and 1,097 ms to Conflict trials, and children with dyslexia in 1,167 ms and 1,185 ms, respectively. The RTs observed in the current study were 857 and 893 ms for the TD group and 879 and 948 ms for the RSD group. This amounts to a difference of around 200 ms (22%) in the TD group and 260 ms (29%) in the RSD group.
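The RT differences can be recomputed from the reported means. The text does not state how the percentages were derived; the sketch assumes the mean of the two conditions, expressed relative to the current study's RTs, and `rt_difference` is a hypothetical helper. Under that assumption the RSD difference comes out at about 29%, and the TD difference at about 23% rather than 22%, so the reported TD figure may rest on a slightly different basis.

```python
# Mean RTs in ms as reported in the text: (Non-conflict, Conflict)
BAKOS_2017 = {"TD": (1059, 1097), "RSD": (1167, 1185)}
CURRENT = {"TD": (857, 893), "RSD": (879, 948)}


def rt_difference(group: str) -> tuple[float, float]:
    """Difference in condition-averaged RT (ms) and % relative to current RTs."""
    previous = sum(BAKOS_2017[group]) / 2
    current = sum(CURRENT[group]) / 2
    diff = previous - current
    return diff, 100 * diff / current


for group in ("TD", "RSD"):
    diff, pct = rt_difference(group)
    print(f"{group}: {diff:.1f} ms ({pct:.1f}%)")
# TD: 203.0 ms (23.2%)
# RSD: 262.5 ms (28.7%)
```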

As the selection criteria were identical, the differences might be rooted in sample characteristics. As shown in **Table 2**, between-group differences in both word reading and spelling skills were smaller in our sample than in the sample of Bakos et al. (2017). Group differences in word reading skills may be crucial for integrating the results. Both the Bakos et al. (2017) study and the current study suggest that conflict-related amplitude modulations are associated with reading abilities, with only good readers exhibiting conflict sensitivity. The TD children in Bakos et al. (2017), however, were better readers than those in the current study (by more than 7.5 percentile points). It is therefore plausible that the differences in N1 amplitude modulation are rooted in reading skills: better readers show earlier sensitivity, weaker readers show only later effects (cSP), and children with a reading deficit show no effect at all. The current study was not designed for such an analysis; an individual-differences design with a larger sample could provide further insights.


Note. 1: CFT-IQ (Weiß, 2006), 2: percentiles of reading speed on SLS 2–9 (Wimmer and Mayringer, 2014), 3: percentiles of 1-min word reading and 4: 1-min pseudoword reading on SLRT-II (Moll and Landerl, 2010), 5: percentiles of spelling on DRT 3 (Müller, 2004).

### Activation of the Phonological Codes

Our results replicate those of Bakos et al. (2017) in showing no conflict-related modulation of cSP amplitudes in dyslexia. Bakos et al. (2017) offered two possible explanations for this phenomenon that could not be distinguished based on their results. On the one hand, phonological codes might not be activated in developmental dyslexia until 900 ms after stimulus onset. This was a reasonable assumption, given that the mean RT of RSD children was 1,167 ms for Non-conflict and 1,185 ms for Conflict items, whereas the cSP was only analyzed up to 900 ms. It is thus possible that the typical pattern emerges between 900 ms and the response. On the other hand, automatic phonological activation might not take place at all in children with developmental dyslexia.

In the current study, we found no conflict-related cSP amplitude modulation in the RSD group in the 500–700 ms time window. **Supplementary Data Sheet 2 (Supplementary Analysis 2)** provides an analysis of the cSP in the −200 to 0 ms time window before the response, where we still found no conflict-related amplitude modulation in the RSD group, which argues against late activation. This suggests that such an experimental design may not elicit automatic activation of phonological codes in RSD. While behavioral results (i.e., the RT increase for conflict trials) may not differentiate between automatic and controlled processes, cSP amplitudes may only be sensitive to the former, not the latter. This hypothesis should be tested in targeted ERP experiments contrasting automatic and controlled processes.

## CONCLUSION

The current study aimed to test whether automatized letter-speech sound associations contribute to reading skills, spelling skills or both. We tested children with combined RSD, children with ISD, and TD children. We used a letter-speech sound interference task, in which conflict was evoked by presenting the same letter in different cases (visually different, but representing the same phoneme). The results suggest that neither reading deficit nor spelling deficit is associated with impaired inhibition. At the same time, the conflict-related cSP amplitude modulation found in TD and ISD children was not observed in RSD. These results suggest that automatized letter-speech sound association is a crucial factor in the development of reading fluency skills (Blomert, 2011).

## AUTHOR CONTRIBUTIONS

KL and KM were involved in the conceptualization and design of the study. SB, IP and KM developed and applied the methodology. FK, MG, CB and CP acquired the data. FK performed the analyses and wrote the manuscript.

## FUNDING

This research was carried out within the D-A-CH international stand-alone project entitled "Identification of neurobiological causes and neuro-cognitive profiles in children with isolated reading deficits and isolated spelling deficits" supported by the Austrian Science Fund (FWF, grant no. I 1658-G22, principal investigator: KL) and the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG, grant no. MO 2569/2-1, principal investigator: KM).

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2018.00449/full#supplementary-material

## REFERENCES

eds C. A. Perfetti, L. Rieben and M. Fayol (Mahwah, NJ: Lawrence Erlbaum Associates Publishers), 237–269.

visual MMN study. Front. Integr. Neurosci. 4:9. doi: 10.3389/fnint.2010.00009


Hanslmayr, S., Pastötter, B., Bäuml, K.-H., Gruber, S., Wimber, M., and Klimesch, W. (2008). The electrophysiological dynamics of interference during the Stroop task. J. Cogn. Neurosci. 20, 215–225. doi: 10.1162/jocn.2008.20020

Hasko, S., Groth, K., Bruder, J., Bartling, J., and Schulte-Körne, G. (2013). The time course of reading processes in children with and without dyslexia: an ERP study. Front. Hum. Neurosci. 7:570. doi: 10.3389/fnhum.2013.00570


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Kemény, Gangl, Banfi, Bakos, Perchtold, Papousek, Moll and Landerl. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Cognitive Profiles of Developmental Dysgraphia

Diana Döhla<sup>1</sup>, Klaus Willmes<sup>2</sup> and Stefan Heim<sup>1,3</sup>\*

<sup>1</sup> Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty, RWTH Aachen University, Aachen, Germany, <sup>2</sup> Department of Neurology, Medical Faculty, RWTH Aachen University, Aachen, Germany, <sup>3</sup> Institute of Neuroscience and Medicine (INM-1), Forschungszentrum Jülich, Jülich, Germany

Developmental dysgraphia is a disorder of writing/spelling skills, closely related to developmental dyslexia. For developmental dyslexia, profiles with a focus on phonological, attentional, visual or auditory deficits have recently been established. Unlike for developmental dyslexia, however, there are only a few studies on dysgraphia, in particular on the variability of its causes. Research has demonstrated high similarity between developmental dyslexia and dysgraphia. Thus, the aim of the study was to investigate cognitive deficits as potential predictors of dysgraphia, analogously to those for dyslexia, in order to identify dysgraphia profiles depending on the particular underlying disorder. Children in 3rd and 4th grade were tested on spelling and reading as well as on phonological processing, auditory sound discrimination, visual attention and visual magnocellular functions. A group of 45 children with developmental dysgraphia was compared to a control group. The results showed that, besides phonological processing abilities, auditory skills and visual magnocellular functions also affected spelling ability. Accordingly, a two-step cluster analysis split the group of dysgraphic children into two distinct clusters, one with auditory deficits and the other with deficits in visual magnocellular functions. Visual attention was also related to spelling disabilities, but did not distinguish between the two clusters. Together, these findings demonstrate that a more fine-grained diagnostic view on developmental dysgraphia, which takes the underlying cognitive profiles into account, might be advantageous for optimizing the outcome of individualized intervention programs.

Keywords: developmental dysgraphia, spelling, profiles, comorbidities, phonological processing, auditory processing, visual attention, visual magnocellular functions

#### Edited by:

Gorka Fraga González, Universität Zürich, Switzerland

#### Reviewed by:

Hong-Yan Bi, Institute of Psychology (CAS), China

Patrick Snellings, University of Amsterdam, Netherlands

> \*Correspondence: Stefan Heim s.heim@fz-juelich.de; sheim@ukaachen.de

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 27 April 2018 Accepted: 01 October 2018 Published: 22 November 2018

#### Citation:

Döhla D, Willmes K and Heim S (2018) Cognitive Profiles of Developmental Dysgraphia. Front. Psychol. 9:2006. doi: 10.3389/fpsyg.2018.02006

**Abbreviations:** BAKO 1–4, Engl.: Basic competences for reading and spelling skills, German: Basiskompetenzen für Lese-Rechtschreibleistungen; CFT 20-R, Engl.: Cattell Culture Fair Test 20 – Revision, German: Grundintelligenztest Skala 2 - Revision; CVE, cue validity effect; DRT, Engl.: Diagnostic spelling test for 3rd/4th grade, German: Diagnostischer Rechtschreibtest für 3./4. Klassen; DSM-V, Diagnostic and Statistical Manual of Mental Disorders, fifth edition; H-LAD, Engl.: Heidelberger test for auditory sound discrimination, German: Heidelberger Lautdifferenzierungstest; ICD-10, International Statistical Classification of Diseases and Related Health Problems, tenth revision; KNUSPEL-L, Engl.: Knuspel's reading exercises, German: Knuspels Leseaufgaben; WHO, World Health Organization.

## INTRODUCTION

fpsyg-09-02006 November 20, 2018 Time: 19:8 # 2

Developmental dysgraphia is a disorder characterized by difficulties in the acquisition of writing/spelling skills despite adequate schooling, normal vision and normal IQ. It is closely related to developmental dyslexia, a disorder of the acquisition of reading skills, which has received considerably more research attention in recent years. As defined by the American Psychiatric Association (2014) and the World Health Organisation [WHO] (2018), dyslexia and dysgraphia can co-occur as well as occur in isolation. The prevalence of reading and writing impairments is reported to be about 7–17% (Shaywitz and Shaywitz, 2005; Hawke et al., 2009).

There are several parallels between dyslexia and dysgraphia with respect to their underlying cognitive abilities and relevant cognitive skills (for a detailed review see Döhla and Heim, 2016), which are briefly outlined here. (1) There is evidence linking reading and spelling to phonological processing abilities. For instance, Snowling (2000) describes a deficit in phonological awareness as the best-known underlying deficit of developmental dyslexia. Both phonological awareness and phonological working memory were reported to play an important role in dyslexia (Seigneuric and Ehrlich, 2005; Pennington et al., 2012) as well as in dysgraphia (Moll et al., 2009, 2012; Winkes, 2014; Capodieci et al., 2018, only for working memory). (2) The automatization of linguistic, motor and cognitive skills is supported by the cerebellum (Ito, 2008). Consequently, a dysfunctional cerebellum leads to problems with procedural learning, resulting in an automatization deficit that ultimately manifests itself in reading and writing deficits (dyslexia: e.g., Fawcett et al., 1996; Nicolson et al., 2001; Tiffin-Richards et al., 2001; dyslexia and dysgraphia: Nicolson and Fawcett, 2011; World Health Organisation [WHO], 2018). The relevance of automatization for reading and spelling, however, is disputed, since other studies failed to observe a deficit in a variety of automatization tasks in dyslexics (Heim et al., 2008 for children; Ramus et al., 2003 for adults). (3) There is ample evidence for the impact of magnocellular functions on reading, in particular of auditory processing skills (Ramus et al., 2003; Steinbrink et al., 2014) and visual processing (Stein, 2001; Tholen et al., 2011). A connection between auditory processing and spelling has also been demonstrated (Schaadt et al., 2015). (4) Moreover, a relation between reading deficits and deficits in orienting spatial attention has been demonstrated, e.g., by Facoetti et al. (2003). Bosse et al. (2007) reported that both visual attention deficits and phonological deficits can be associated with dyslexia, so that reading problems may arise for different reasons. Banfi et al. (2017) investigated visuo-spatial cueing effects in children with isolated reading or spelling problems and in children with a combined disorder. In contrast to children with an isolated reading or spelling disorder, children with a combined reading and spelling deficit showed a cueing deficit, i.e., no significant difference in reaction time between valid and invalid cues. Dyslexic and dysgraphic children also differed with respect to a position effect (Banfi et al., 2017): whereas poor readers had a strong right-over-left advantage, poor writers had no position effect. Connecting visual and auditory information is crucial for learning to read and spell, e.g., for grapheme–phoneme correspondences in reading and phoneme–grapheme correspondences in spelling. During speech perception, typically developing children profit from the bimodal presentation of stimuli, i.e., the combination of printed letters (visual stimulus) and speech sounds (auditory stimulus). Schaadt et al. (2018) revealed visual-auditory speech perception difficulties in children with spelling difficulties: although the combination of visual and auditory information usually supports processing, children with spelling deficits seem to fail to make use of this crossmodal integration (Schaadt et al., 2018). (5) Several studies revealed comorbidity between ADHD and spelling deficits (Adi-Japha et al., 2007; Capodieci et al., 2018). (6) Finally, the connection between SLD and later reading and writing performance has already been investigated (dyslexia: Pennington and Bishop, 2009; dysgraphia: Puranik and Lonigan, 2012).

Because of the heterogeneity of the underlying deficits, much research has been conducted to identify profiles of developmental dyslexia (e.g., Lachmann et al., 2005; Bosse et al., 2007; Reid et al., 2007). Heim et al. (2008) found three different dyslexic clusters: their Cluster A performed worse in phonological, visual and auditory tasks, Cluster B was characterized by a deficit only in phonological awareness, and Cluster C scored worse only in visuospatial attention. Interestingly, automatization skills did not seem to influence reading skills. Because of the known similarities between developmental dyslexia and developmental dysgraphia, it can be assumed that developmental dysgraphia might also be characterized by meaningful profiles.

As indicated in Döhla and Heim (2016), the close relationship between dyslexia and dysgraphia as disorders on the one hand, and the documented relationship between reading disability in dyslexia and deficits in the variables mentioned above (i.e., auditory processing, visual magnocellular functions and visual attention) on the other hand, suggest that the latter might also play a critical role in success or failure in acquiring spelling skills. The important influence of phonological processing on reading and spelling performance has already been established. Overall, however, the evidence is much richer for dyslexia. Therefore, the aim of the present study is to transfer existing knowledge about developmental dyslexia to dysgraphia, with a focus on spelling abilities, and consequently to investigate whether differential profiles of developmental dysgraphia exist and, if so, whether these potential profiles resemble those reported for dyslexia. Characterizing such profiles might help to tailor prevention and therapy methods later on. For the sake of comparability with the previous study on cognitive profiles of dyslexia by Heim et al. (2008), the methodological approach was kept as similar as possible: (1) The groups of dysgraphic children and normal writers were compared with each other, with sample sizes comparable to those that allowed Heim et al. (2008) to identify three clusters. (2) To identify profiles, the group of dysgraphic children was clustered with respect to performance in tests of phonological processing, auditory sound discrimination, visual processing, and visual attention. In contrast to the previous study of Heim et al. (2008), scores for phonological working memory were entered as a further variable, whereas automatization was excluded because it had not contributed to any of the dyslexia profiles (see also Ramus et al., 2003), and also to avoid exhausting the children with overly long testing sessions. The deficit profiles of these groups were then established by comparisons between the clusters obtained and between each of these clusters and the control group. (3) Finally, relationships between writing and reading were assessed. For the sake of simplicity, the comorbid abilities and skills are subsumed under the term "cognitive variables" in the remainder of this paper.

## MATERIALS AND METHODS

## Participants

One hundred and thirty-two children and their parents agreed to participate. Only children with a non-verbal IQ ≥ 70, who thus did not have a general learning disorder according to the ICD-10 (World Health Organisation [WHO], 2018), were included. Children with a T-score < 37 (percentile < 10) in a standardized German spelling assessment (see below) were assigned to the group of dysgraphic children. Children with a T-score ≥ 43 (i.e., percentile ≥ 25) were assigned to the group of normally spelling children. The T-scores of 37 and 43 were chosen because of the imperfect reliability of the test: the 90% confidence interval for the T-score of 40, which denotes the lower boundary of the normal range, is 40 ± 3.2. Out of the 132 children who volunteered, 25 did not fit the inclusion criteria. Moreover, the data sets of nine children were incomplete and thus could not be included in the analysis, yielding a total of 98 valid and complete data sets (65 boys and 33 girls). Fifty-four children were in 3rd grade and 44 in 4th grade. An overview of the sample is shown in **Table 1**. Twenty-one of the 45 dysgraphic children additionally had developmental dyslexia. We addressed this issue by running separate analyses for the entire sample and for the dysgraphic children without reading difficulties only (see below).
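The group assignment rule can be written out as a small Python sketch. It assumes that the DRT T-scores follow the usual T-metric (mean 50, SD 10) and a normal distribution, so the percentile conversion is only approximate and the test manual's own norm tables may differ slightly; the function and constant names are hypothetical.

```python
from statistics import NormalDist
from typing import Optional

# Cut-offs taken from the text (DRT T-scores).
DYSGRAPHIA_CUTOFF = 37  # T < 37 -> dysgraphic group
CONTROL_CUTOFF = 43     # T >= 43 -> normally spelling group

def t_to_percentile(t_score: float) -> float:
    """Percentile rank of a T-score (mean 50, SD 10), assuming a normal distribution."""
    return 100 * NormalDist(mu=50, sigma=10).cdf(t_score)

def assign_group(t_score: float) -> Optional[str]:
    """Return the spelling group for a T-score, or None if it falls in the excluded band."""
    if t_score < DYSGRAPHIA_CUTOFF:
        return "dysgraphic"
    if t_score >= CONTROL_CUTOFF:
        return "control"
    return None  # 37 <= T < 43: too close to the normal-range boundary of T = 40

# The cut-offs lie about one confidence half-width (3.2) on either side of T = 40.
print(round(40 - 3.2, 1), round(40 + 3.2, 1))  # 36.8 43.2
print(round(t_to_percentile(37), 1), round(t_to_percentile(43), 1))  # 9.7 24.2
```

Children falling between the two cut-offs are simply not assigned, which mirrors the exclusion of scores inside the confidence band around T = 40.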

The participants were recruited between March 2014 and April 2015 from six German primary schools and special education schools in Cologne and Mönchengladbach that agreed to take part, as well as from one speech therapy practice in Aachen. Parents were provided with detailed information about the content of the study in accordance with the Declaration of Helsinki (World Medical Association, 2000). Written informed consent was obtained from all parents and children before participation. The study was approved by the local Ethics Committee of the Medical Faculty, RWTH Aachen University.



#### Procedure

There were two test sessions. Spelling abilities, IQ and reading abilities were tested in a group setting on the 1st day. On the 2nd day, dysgraphic and non-dysgraphic children who fulfilled the inclusion criteria stated above were tested individually for their performance in phonological processing, auditory sound discrimination, visual magnocellular functions, and visual attention. The order of tests on the 2nd day was counterbalanced over participants to avoid order effects. All tests were administered in a quiet room in the schools or in the practice for speech therapy.

#### Tests

The tests of spelling ability and IQ were administered to check the inclusion criteria and are therefore described first. The tests of the cognitive variables served as the dependent variables in the search for clusters in the dysgraphic sample. An overview of the different tests and their settings is presented in **Table 2**.

#### Spelling Ability

Spelling skills were tested with the German DRT-3 (Müller, 2003, for grade 3) or DRT-4<sup>1</sup> (Grund et al., 2004, for grade 4). Sentences with a missing word were presented to the children and they were asked to write down the missing word, e.g., "Bert kauft das \_\_\_\_\_\_\_." ["Buch"] ("Bert buys the \_\_\_\_\_\_\_." ["book"]). This test provides T-scores for spelling accuracy.

#### Non-verbal Intelligence

Non-verbal intelligence was assessed with the CFT 20-R<sup>2</sup> (Weiß, 2006). The test was administered in its short form (Part 1; reliability of 0.92) to avoid exhausting the children, given the long overall testing time (for test details see Heim et al., 2008). This test provides age-related IQ scores.

#### Reading Ability

Reading competence was assessed with the KNUSPEL-L<sup>3</sup> (Marx, 1998). Children had to perform four different tasks: subtest 1, "Auditory comprehension" (German: "Hörverstehen"); subtest 2, "Recoding" (German: "Rekodieren"); subtest 3, "Decoding" (German: "Dekodieren"); and subtest 4, "Reading comprehension" (German: "Leseverstehen").<sup>4</sup> The test provides two different norms: one for "precursor skills," i.e., basic skills considered necessary for learning to read (subtests 1–3), and one for reading performance (subtests 2–4); the latter entered the analysis. Separate norms (T-scores and percentile ranks) are provided for monolingual and multicultural classes in grades 1 to 4, each for the middle and the end of the school year. Because the tested classes included children of many nationalities, the multicultural class norms were chosen.

TABLE 2 | Overview of the different tests and their settings.

<sup>1</sup>DRT: Engl.: Diagnostic spelling test for 3rd/4th grade, German: Diagnostischer Rechtschreibtest für 3./4. Klassen.

<sup>2</sup>CFT 20-R: Engl.: Cattell Culture Fair Test 20 – Revision, German: Grundintelligenztest Skala 2 - Revision.

<sup>3</sup>KNUSPEL-L: Engl.: Knuspel's reading exercises, German: Knuspels Leseaufgaben.

#### Phonological Processing


#### **Phonological awareness**

The ability to work with the phonological structure of words, such as recognizing, segmenting, synthesizing and manipulating phonemes, syllables, onsets and rhymes, was tested with the German test BAKO 1–4<sup>5</sup> (Stock et al., 2003). Two of its seven subtests were chosen: one receptive subtest, test 6 "Vowel length detection" (German: "Vokallängenbestimmung"), and one productive subtest, test 4 "Phoneme exchange" (German: "Phonemvertauschung"). In test 6, "Vowel length," participants had to identify which of four acoustically presented pseudowords had a vowel length different from the other three (e.g., "[mA:5] – [RA:s] – [dak] – [lA:t]": [dak] is pronounced with a short vowel, in contrast to the other three). Test 4, "Phoneme exchange," requires children to swap the first two phonemes of auditorily presented words and pseudowords (e.g., /iftak/ → /fitak/). Separate norms for grades 1 to 4 are provided for both tests. Because the scores of both tests were positively correlated in the previous study of Heim et al. (2008; r = 0.41; p < 0.001) as well as in the present study (r = 0.306; p = 0.002), the average T-score of the two tests was calculated for each child and used as the measure of phonological awareness in further analyses, as in Heim et al. (2008).
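The composite score described above can be sketched in a few lines of Python: a Pearson correlation checks that the two subtests covary, and the per-child mean of the two T-scores serves as the phonological awareness measure. The data and function names are hypothetical; Python 3.10 and later also provide `statistics.correlation`, but the correlation is written out here to make the formula explicit.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def phonological_awareness(vowel_length_t, phoneme_exchange_t):
    """Composite score: mean of the two BAKO subtest T-scores."""
    return (vowel_length_t + phoneme_exchange_t) / 2

# Hypothetical T-scores for five children (illustration only, not study data).
vowel_length = [38, 45, 52, 41, 60]
phoneme_exchange = [35, 47, 49, 44, 55]

r = pearson_r(vowel_length, phoneme_exchange)
composite = [phonological_awareness(v, p) for v, p in zip(vowel_length, phoneme_exchange)]
print(f"r = {r:.3f}", composite)
```

Averaging is only sensible because both subtests are on the same T-metric; raw scores with different ranges would need to be standardized first.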

#### **Phonological working memory**

Phonological working memory was tested with the "Mottier-Test" (Mottier, 1951). Children were asked to repeat 30 sequences of meaningless syllables (e.g., "lu-ri" or "bi-ga-do-na-fe-ra"). The new standardization by Wild and Fleck (2013) for children aged 5 to 17 years is valid for both mono- and bilingual children; its T-scores therefore formed the basis for the analysis.

#### Auditory Processing

Scores on subtest 1 of the H-LAD<sup>6</sup> (Brunner et al., 2005) were included in the analysis. Children had to decide whether pairs of real words or syllables were identical or different (e.g., [kUs] – [gUs], [bA:] – [bA:], [kEm@n] – [kEn@n]). T-scores and percentile ranks are provided for grades 1–4; for the present analysis, the T-scores for 3rd and 4th graders were used.

#### Visual Magnocellular Functions

In the computerized paradigm "Star field" (Wilms et al., 2005), children saw a moving random dot pattern and had to click the left mouse button as quickly as possible whenever its motion changed. The dot pattern changed its motion (expanding, static or contracting) after a variable interval of 1–3 s (for details see Wilms et al., 2005). Both the motion direction and the time intervals were pseudo-randomized. The average reaction time was used for subsequent analysis.

#### Visual Attention

In the Posner paradigm (Posner, 1980; Vossel et al., 2006), participants had to click the left or right mouse button as quickly as possible to indicate on which side of the screen a target stimulus appeared, and their reaction time was recorded. Before each target, either a neutral cue, a valid cue, an invalid cue, or no cue appeared in the middle of the screen. The neutral cue indicates that a target stimulus will appear and thereby prepares the participant for an imminent response. Unlike the neutral cue, valid and invalid cues point to a particular side. The valid cue points to the side where the target stimulus will appear and is therefore informative, allowing a faster button press. The invalid cue points to the side opposite the subsequent target stimulus; it is therefore misleading, and the participant has to shift the focus of attention to the correct side before pressing the button. Two measures of visual attention, the alertness effect and the CVE, were included in the analysis. Alertness is the general readiness of the brain to respond to an expected stimulus (Wiegand et al., 2017). It was calculated as the difference between the average reaction times of no-cue trials and neutral trials, i.e., trials with a cue alerting the participant to an upcoming stimulus but without directional information about where that stimulus will appear on the screen. The CVE is computed as the reaction time difference between invalidly cued and validly cued trials. This difference indicates how quickly attention can be shifted from one location to a new location; smaller CVE values indicate quicker and more effective reorienting of attention.
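The two attention measures are simple differences of condition means. The Python sketch below computes them for one child, assuming trials are stored as `(cue_type, reaction_time_ms)` pairs with only correct responses; the data layout, names and values are hypothetical, and no error-trial or outlier handling is included because the text does not describe any.

```python
from collections import defaultdict
from statistics import mean

def attention_scores(trials):
    """Return (alertness, CVE) in ms from Posner-paradigm trials.

    trials: iterable of (cue_type, reaction_time_ms) pairs, where cue_type is
    "none", "neutral", "valid" or "invalid". Only correct responses are assumed.
    """
    rts = defaultdict(list)
    for cue_type, rt in trials:
        rts[cue_type].append(rt)
    # Alertness: speed-up produced by a non-directional warning cue.
    alertness = mean(rts["none"]) - mean(rts["neutral"])
    # Cue validity effect: cost of reorienting after a misleading cue.
    cve = mean(rts["invalid"]) - mean(rts["valid"])
    return alertness, cve

# Hypothetical reaction times (ms) for one child.
trials = [
    ("none", 520), ("none", 540),
    ("neutral", 480), ("neutral", 500),
    ("valid", 450), ("valid", 470),
    ("invalid", 530), ("invalid", 550),
]
print(attention_scores(trials))  # (40, 80)
```

A positive alertness value means the warning cue speeded responses; a large CVE means slow reorienting after an invalid cue.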

All computerized tests were programmed and administered with Presentation® (version 0.7, Neurobehavioral Systems, Albany, CA, United States) run on an Acer Travelmate 5744 laptop under Windows 7.

<sup>4</sup>All tasks revolved around little fictional "Knuspel" creatures that lead children through the test. In the first subtest, children had to listen to questions and answer them in writing, e.g., "How many bad-tempered Knuspels and Knuspels in a good mood do you see on the previous page? Put a cross in the first box for each Knuspel." In subtest 2, children had to recognize homophones (e.g., German "mehr" [me5] and "Meer" [me5]) or non-homophones (e.g., "Stahl" [R tA:l] and "Stall" [R tal]). Subtest 3 required the children to identify pseudo-homophones, i.e., to indicate whether misspelled words would sound like a real word if read aloud (e.g., "SANDT" [zant] sounds like "Sand" [zant] (sand), whereas "ROTT" [ROt] only resembles the German word "rot" [Ro:t] (red) but is pronounced differently). Subtest 4 is similar to subtest 1, but this time, to test reading comprehension, children had to read the instructions themselves and answer them.

<sup>5</sup>BAKO 1–4: Engl.: Basic competences for reading and spelling skills, German: Basiskompetenzen für Lese-Rechtschreibleistungen.

<sup>6</sup>H-LAD: Engl.: Heidelberger test for auditory sound discrimination, German: Heidelberger Lautdifferenzierungstest.

## Data Analysis


Only participants with complete data sets were included in the data analysis (n = 98), which was performed with SPSS 22 for Mac OS (SPSS Inc., Chicago, IL, United States). To ensure comparability, the analysis closely followed that of the previous study on cognitive profiles of dyslexia (Heim et al., 2008). In that study, the two groups (dyslexic children vs. normally spelling children) were first compared in general. Next, a partitioning cluster analysis was performed for the group of impaired children, followed by discriminant analyses. A similar procedure was chosen in the present study, as explained in the following paragraphs.

#### Discriminant Analysis Part 1

In a first step, all children were entered into a linear discriminant analysis to determine which of the six variables considered (phonological awareness, phonological working memory, auditory sound discrimination, magnocellular function and visual attention: CVE and alertness) best separate the whole dysgraphic group from the group of normally spelling children. We chose discriminant analysis instead of a series of separate two-sample t-tests because, unlike the latter, it takes potential covariation among the dependent variables into account.
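To illustrate why a discriminant analysis accounts for covariation where separate t-tests do not, here is a minimal Python sketch of the textbook two-group (Fisher) linear discriminant with equal priors: the weights are the inverse pooled within-group covariance matrix applied to the difference of group means, so two correlated variables are not counted twice. This is not the SPSS stepwise procedure used in the study; the function names and data are hypothetical.

```python
from statistics import mean

def pooled_covariance(group_a, group_b):
    """Pooled within-group covariance matrix of two groups (rows = children)."""
    p = len(group_a[0])
    cov = [[0.0] * p for _ in range(p)]
    for group in (group_a, group_b):
        m = [mean(col) for col in zip(*group)]
        for row in group:
            d = [row[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    cov[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[c / dof for c in row] for row in cov]

def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_lda(group_a, group_b):
    """Fisher's two-group linear discriminant with equal priors."""
    mean_a = [mean(col) for col in zip(*group_a)]
    mean_b = [mean(col) for col in zip(*group_b)]
    # Weights account for covariation via the inverse pooled covariance matrix.
    w = solve(pooled_covariance(group_a, group_b), [x - y for x, y in zip(mean_a, mean_b)])
    # Equal priors: the cut-off lies midway between the two projected group means.
    cutoff = sum(wi * (x + y) / 2 for wi, x, y in zip(w, mean_a, mean_b))
    return w, cutoff

def classify(model, row):
    """Assign a child to group "a" or "b" by the side of the cut-off its score falls on."""
    w, cutoff = model
    return "a" if sum(wi * xi for wi, xi in zip(w, row)) > cutoff else "b"

# Hypothetical T-scores (phonological awareness, phonological working memory).
controls = [[50, 52], [55, 50], [48, 47], [53, 56]]
dysgraphic = [[38, 40], [35, 42], [41, 37], [36, 35]]
model = fit_lda(controls, dysgraphic)
print(classify(model, [52, 51]), classify(model, [37, 39]))  # a b
```

With uncorrelated variables the pooled covariance is diagonal and each variable is weighted by its own standardized mean difference; correlation between variables shrinks their joint weight, which is the property separate t-tests lack.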

#### Two-Step Cluster Analysis

Next, a two-step cluster analysis was conducted to identify the optimum number of profiles in the dysgraphic sample. As previously done by Heim et al. (2008), the analysis was run with the following specifications: maximum number of clusters: 15, distance estimation: log-likelihood, clustering criterion: Akaike's information criterion, outlier treatment: no noise-handling, initial distance change threshold: 0, depth levels: a maximum of three. All variables were standardized during the clustering procedure.
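The standardization step mentioned above matters because the cognitive variables are on very different scales (T-scores vs. reaction times in ms). A minimal Python sketch of z-standardization follows, assuming the sample standard deviation is used; it covers only this step and does not reproduce SPSS TwoStep's pre-clustering (cluster-feature tree), log-likelihood distance or AIC-based choice of the number of clusters. Names and data are hypothetical.

```python
from statistics import mean, stdev

def standardize(variables):
    """z-standardize each variable to mean 0 and SD 1 (sample SD)."""
    scaled = {}
    for name, values in variables.items():
        m, s = mean(values), stdev(values)
        scaled[name] = [(v - m) / s for v in values]
    return scaled

# Hypothetical scores on very different scales: T-scores vs. reaction times in ms.
raw = {
    "auditory_discrimination_T": [30, 40, 50],
    "star_field_rt_ms": [600, 700, 800],
}
print(standardize(raw))
```

After standardization, a 100 ms difference in reaction time no longer outweighs a 10-point T-score difference merely because of its units.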

#### Discriminant Analyses Parts 2 to 4

A series of linear discriminant analyses followed, in which the clusters were compared directly with each other and with the control group. For all discriminant analyses, the following settings were selected (in line with the procedure used by Heim et al., 2008): the dependent variables were entered stepwise (inclusion criterion: p ≤ 0.05; exclusion criterion: p ≥ 0.10), and priors were set equal. Wilks' lambda was calculated for each step. For the assignment of children to a particular group, the leaving-one-out method was used to avoid biased (i.e., underestimated) misclassification rates.
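For a single variable, Wilks' λ reduces to the ratio of the within-group sum of squares to the total sum of squares; in a stepwise analysis, the multivariate value det(W)/det(T) for all variables entered so far is reported at each step. The Python sketch below covers only the one-variable case and does not implement the entry and removal tests (p ≤ 0.05 / p ≥ 0.10); names and data are hypothetical.

```python
from statistics import mean

def wilks_lambda(groups):
    """Wilks' lambda for a single variable: within-group SS / total SS.

    Values near 0 indicate well-separated groups; values near 1 indicate that
    the variable hardly distinguishes them.
    """
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return ss_within / ss_total

# Hypothetical T-scores of two groups on one cognitive variable.
print(round(wilks_lambda([[35, 38, 41], [48, 52, 55]]), 3))
```

Smaller values are better for discrimination, which is why λ decreases as useful variables are added during stepwise entry.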

#### Additional Analysis: The Relationship of Reading and Spelling Skills

Next, several chi-square analyses were conducted to test for differences in the distribution of reading impairment among the dysgraphic participants, as well as for sex and grade differences across the clusters and between dysgraphic and normally spelling children. For age, a t-test for independent samples was carried out to assess mean age differences between dysgraphic and normally spelling children.
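A Pearson chi-square test on a 2 × 2 table (e.g., cluster by sex) can be sketched in Python as follows. With one degree of freedom, χ² is the square of a standard normal variable, so the p-value can be obtained from `statistics.NormalDist` without further libraries. The counts are hypothetical, since per-cluster sex and grade counts are not reported here, and no continuity correction is applied.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Returns (chi2, p). With df = 1, chi2 is the square of a standard normal
    variable, so p = 2 * (1 - Phi(sqrt(chi2))).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
    return chi2, p

# Hypothetical counts: rows = Cluster 1 / Cluster 2, columns = boys / girls.
chi2, p = chi_square_2x2([[12, 5], [18, 10]])
print(f"chi2(1) = {chi2:.2f}, p = {p:.3f}")
```

The normal-distribution shortcut only holds for df = 1; larger tables would need the chi-square distribution with more degrees of freedom.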

Furthermore, a series of t-tests for independent samples was run to compare the reading competence of the dysgraphic children in the two clusters with each other and with that of the normally spelling children. In addition, effect sizes (Cohen's d; Lenhard and Lenhard, 2016) were calculated for the average reading competence of Clusters 1 and 2 in comparison with the normally spelling children.
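The t-tests and effect sizes can be sketched in Python. Cohen's d is shown in its common pooled-SD form (the cited calculator may offer other variants), Student's t assumes equal variances with df = n₁ + n₂ − 2, and Welch's t uses the Welch–Satterthwaite degrees of freedom, which would explain the fractional degrees of freedom reported in the Results for the phonological awareness comparisons. This is an illustrative sketch, not the SPSS output; function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(x, y):
    """Cohen's d with the pooled standard deviation (sign follows mean(x) - mean(y))."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)

def student_t(x, y):
    """Independent-samples t-test assuming equal variances; returns (t, df)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def welch_t(x, y):
    """Welch's t-test for unequal variances; returns (t, Welch-Satterthwaite df)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical reading T-scores (not study data).
cluster = [35, 38, 40, 42, 45]
controls = [48, 50, 55, 57, 60]
print(cohens_d(cluster, controls), student_t(cluster, controls), welch_t(cluster, controls))
```

The sign of d simply reflects the order of the groups; entering the dysgraphic group first yields negative values when it scores lower.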

Finally, the cluster solution obtained from the two-step cluster analysis, which included data from all dysgraphic children (with and without reading deficits), was revalidated with only those dysgraphic children without diagnosed reading deficits. To this end, the same parameter settings were used with a predefined 2-cluster solution. The agreement between the assignment of children with pure dysgraphia to the original clusters and to the newly established clusters of this second analysis is reported in a 2 × 2 contingency table.
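The agreement check can be sketched in Python as a cross-tabulation of the two assignments of the same children, plus the percentage of each original cluster that is reproduced. A key assumption is that the labels of the rerun have already been matched to the original labels, because cluster numbers from separate runs are arbitrary. Names and data are hypothetical.

```python
from collections import Counter

def agreement_table(original, rerun):
    """Cross-tabulate two cluster assignments of the same children.

    Assumes the rerun's cluster labels have already been matched to the
    original ones (cluster numbers from separate runs are arbitrary).
    Returns the contingency table (rows = original, columns = rerun) and the
    percentage of each original cluster assigned to the same cluster again.
    """
    counts = Counter(zip(original, rerun))
    labels = sorted(set(original) | set(rerun))
    table = [[counts[(o, r)] for r in labels] for o in labels]
    retained = {
        o: 100 * counts[(o, o)] / sum(counts[(o, r)] for r in labels)
        for o in sorted(set(original))
    }
    return table, retained

# Hypothetical assignments for seven children.
table, retained = agreement_table([1, 1, 1, 2, 2, 2, 2], [1, 1, 2, 2, 2, 2, 1])
print(table, retained)
```

The diagonal of the table holds the children who kept their cluster; off-diagonal cells show switches between clusters.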

## RESULTS

## Cognitive Variables: Group Differences of Dysgraphic vs. Normally Spelling Children

The first discriminant analysis compared the whole group of dysgraphic children with the normally spelling children with respect to the six dependent variables, using stepwise forward selection to identify the best discriminating variables. The two groups differed significantly in phonological processing, namely in phonological working memory (Wilks' λ = 0.57; p < 0.001) and in phonological awareness (Wilks' λ = 0.66; p < 0.001). Using the leaving-one-out method, 38 of the 45 dysgraphic children (84.4%) and 41 of the 53 normally spelling children (77.4%) were correctly assigned to their spelling groups on the basis of the selected cognitive variables, i.e., the two phonological processing variables, resulting in an overall correct classification rate of 80.6%. **Table 3** gives an overview of the means and standard deviations of the T-scores and raw scores for the participants' performance in the cognitive variables.
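The classification percentages reported above follow from simple arithmetic on the counts given in the text. Note that the overall rate pools all children rather than averaging the two group percentages (the simple mean of 84.4% and 77.4% would be about 80.9%). The Python helper below is a hypothetical illustration of that arithmetic.

```python
def classification_rates(correct, totals):
    """Per-group and overall percentages of correctly classified children."""
    per_group = {g: 100 * correct[g] / totals[g] for g in totals}
    overall = 100 * sum(correct.values()) / sum(totals.values())
    return per_group, overall

per_group, overall = classification_rates(
    {"dysgraphic": 38, "control": 41}, {"dysgraphic": 45, "control": 53}
)
print({g: round(v, 1) for g, v in per_group.items()}, round(overall, 1))
# {'dysgraphic': 84.4, 'control': 77.4} 80.6
```

The same function reproduces the other reported rates, e.g., 16/17 and 27/28 correct assignments give an overall rate of 95.6% for the comparison of the two clusters.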

## Cognitive Clusters of Dysgraphia

The cluster analysis of the dysgraphic group based on the set of six cognitive variables (phonological awareness, phonological working memory, auditory sound discrimination, magnocellular function and visual attention: CVE and alertness) yielded two clusters (Cluster 1: n = 17; Cluster 2: n = 28). The average T-Scores for spelling competence of the clusters and of the normally spelling group are displayed in **Figure 1**.

In the subsequent discriminant analyses, the comparison of Clusters 1 and 2 revealed that, of the six variables, auditory sound discrimination (Wilks' λ = 0.33; p < 0.001) and visual magnocellular functions (Wilks' λ = 0.57; p < 0.001) contributed significantly to discriminating between the two clusters. Cluster 1 performed significantly worse in visual magnocellular functions, whereas Cluster 2 scored significantly worse in auditory sound discrimination. On this basis, 16/17 (94.1%) dysgraphic children were correctly assigned to Cluster 1 and 27/28 (96.4%) to Cluster 2, with an overall correct assignment rate of 95.6%.

TABLE 3 | Means (M) and standard deviations (SD) of the two Clusters and Controls for the cognitive variables.

Next, Cluster 1 was compared with the normally spelling children. Cluster 1 differed from the normally spelling children in phonological working memory (Wilks' λ = 0.55; p < 0.001) and visual magnocellular functions (Wilks' λ = 0.63; p < 0.001). The overall rate of correct classifications was 91.4%, with 50/53 (94.3%) of the normally spelling children and 14/17 (82.4%) of the dysgraphic children in Cluster 1 correctly assigned.

Children in Cluster 2 and control children differed in phonological working memory (Wilks' λ = 0.51; p < 0.001), auditory sound discrimination (Wilks' λ = 0.58; p < 0.001) and phonological awareness (Wilks' λ = 0.68; p < 0.001) with an overall classification rate of 87.7% (48/53 = 90.6% for the normally spelling children and 23/28 = 82.1% of the dysgraphic children of Cluster 2 correctly assigned to their groups).

**Figure 2** shows the average T-scores for phonological awareness, phonological working memory, auditory sound discrimination and the average reaction time (higher reaction times indicating worse performance) for visual magnocellular functions and visual attention separately for each cluster and the control group.

The two variables for visual attention revealed no significant differences, either between the clusters (Alerting: Wilks' λ = 0.31; p = 0.114; CVE: Wilks' λ = 0.32; p = 0.341) or between each cluster and the normally spelling children (Cluster 1 vs. normally spelling children: Alerting: Wilks' λ = 0.52; p = 0.517; CVE: Wilks' λ = 0.54; p = 0.551; Cluster 2 vs. normally spelling children: Alerting: Wilks' λ = 0.51; p = 0.602; CVE: Wilks' λ = 0.51; p = 0.492).

**Figure 3** shows the differing average profiles of the dysgraphic clusters, displayed as fingerprint plots for the six chosen cognitive variables.

Since the discriminant analysis revealed a significant difference from controls in phonological awareness only for Cluster 2 but not for Cluster 1, additional t-tests were conducted to validate the results, also reporting Cohen's d and the statistical power estimate. Both Cluster 1 (t(43.09) = 5.29, p < 0.001<sup>7</sup>, d = −1.31, power estimate = 0.996) and Cluster 2 (t(74.63) = 7.68, p < 0.001, d = −1.58, power estimate = 0.999999) differed significantly from normally spelling children after Bonferroni correction. The comparison of Clusters 1 and 2 revealed no mean difference (t(23.56) = 0.62, p = 0.543, d = −0.2, power estimate = 0.1) and thus confirmed and extended the previous results: phonological awareness deficits are a common factor of both dysgraphia clusters (as expressed in the significant t-tests) but explain independent variance to different degrees (as expressed in the only partly significant solutions of the discriminant analyses).

<sup>7</sup>p-values < 0.05 are reported already taking into account the Bonferroni correction, i.e., the alpha level against which the p-value is compared has already been divided by the number of comparisons.

FIGURE 2 | Comparison of the dysgraphic clusters and the group of normally spelling children with respect to the different cognitive variables, displayed as T-scores in (A–C) and as reaction times in ms in (D–F); higher values in the visual tests thus indicate longer reaction times (linear discriminant analysis with mean and SD; <sup>∗</sup>p < 0.05).

The chi-square analyses revealed no sex or grade differences across the clusters (sex: Pearson's χ²(1) = 0.37, p = 0.546; grade: Pearson's χ²(1) = 0.95, p = 0.758). The comparison of the whole group of dysgraphic vs. normally spelling children also revealed no sex differences (Pearson's χ²(1) = 3.48, p = 0.18) and no age differences (t(96) = −0.433, p = 0.666).

## The Relationship of Reading Ability and Dysgraphia

In an additional analysis, we tested whether the actual degree of reading competence differed across the clusters and the normally spelling children. Means and standard deviations of the reading skills of the two dysgraphic clusters and the normally writing children are displayed in **Figure 4**. To this end, we ran a series of t-tests for independent samples, comparing the reading scores of the children in the two clusters and the normally spelling children pairwise. The results revealed that the co-occurrence of developmental dyslexia with dysgraphia does not seem to have a substantial effect on the formation of the dysgraphia clusters in the present sample. The comparison of Cluster 1 vs. Cluster 2 revealed no significant difference (t(43) = −1.24, p = 0.222). In contrast, the comparison of each cluster with the normally spelling children revealed significant mean differences, also after Bonferroni correction: normally spelling children vs. Cluster 1 (t(68) = 6.41, p < 0.001) and normally spelling children vs. Cluster 2 (t(79) = 6.26, p < 0.001).

Next, the average reading competence scores of the two clusters were compared with those of the normally spelling children, revealing very large effect sizes (Field, 2014): Cluster 1 vs. normally spelling children (Cohen's d = −1.75) and Cluster 2 vs. normally spelling children (Cohen's d = −1.47).

Finally, re-running the two-step cluster analysis with only the purely dysgraphic children without reading difficulties (n = 27 instead of n = 45) revealed that most children were assigned to the same clusters as before (95% for Cluster 2 and 71.4% for Cluster 1; cf. **Table 4**). Together, these results further corroborate the analyses above, indicating that reading performance had no substantial effect on the cluster structure in the present sample.

#### DISCUSSION

The aim of this study was to identify cognitive deficit profiles of developmental dysgraphia based on the underlying deficits. In a further step, the new evidence about diverse patterns of impairment in developmental dysgraphia was compared to the existing knowledge about developmental dyslexia to point out commonalities and differences.

The present study provided evidence that three cognitive abilities may differentially characterize dysgraphia: phonological, auditory and visual magnocellular processing. Based on assessments of these abilities, two distinct clusters of children with different cognitive profiles could be identified. Whereas phonological awareness and phonological working memory are general characteristics of developmental dysgraphia, distinguishing dysgraphic children from normally spelling children, auditory processing and visual magnocellular functions were identified as differentiating variables that distinguished the dysgraphic clusters from normally spelling children as well as from each other, largely independently of the children's reading performance. These results are discussed in detail below.

## Underlying Cognitive Skills of Developmental Dysgraphia

In a first step, we examined on which cognitive variables the whole group of dysgraphic children differed from normally spelling children. Phonological processing skills, i.e., phonological awareness and phonological working memory, emerged as significant variables, correctly distinguishing dysgraphic from normally spelling children in 80.6% of cases. This confirms earlier findings that phonological processing is an important variable for dysgraphia (phonological awareness: e.g., Moll et al., 2009; phonological working memory: e.g., Steinbrink and Klatte, 2008; Steinbrink et al., 2008; Winkes, 2014, who found only an indirect influence of phonological working memory on spelling competence, mediated by phonological awareness). The other cognitive variables (auditory processing, visual magnocellular function and visual attention) did not distinguish the whole group of dysgraphic children from the normally spelling children.

#### Profiles of Developmental Dysgraphia

However, the two-step cluster analysis went beyond this initial finding: it revealed structure within the group of dysgraphic children, separating them into two clusters. Two clearly distinguishable profiles emerged, based on visual magnocellular vs. auditory processing abilities. Compared to normally spelling children, Cluster 1 was characterized, besides the already documented dysfunction in phonological working memory, by deficits in visual magnocellular function, alongside a numerical but non-significant reduction in visual attentional processing. In contrast, Cluster 2 showed significantly worse auditory performance than normally spelling children, together with significant deficits in both variables representing phonological processing abilities (phonological awareness and phonological working memory). The direct comparison of the two clusters confirmed this differential impairment pattern, with Cluster 1 showing a visual impairment and Cluster 2 an auditory impairment.
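The study used a two-step cluster analysis, which is not reproduced here. To illustrate only the general idea of grouping children by the similarity of their standardized cognitive profiles, the sketch below swaps in plain k-means (k = 2), a different and simpler algorithm. The two variables, the scores and all function names are hypothetical, and the code assumes that no variable is constant (its SD would be zero).

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each variable (column) to mean 0 and SD 1 across children."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, max_iter=100):
    """Lloyd's k-means; uses the first k points as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


# Hypothetical (auditory, visual magnocellular) scores; higher = better.
scores = [(2, 8), (8, 2), (3, 9), (9, 3), (2, 7), (8, 1)]
print(kmeans(zscore_columns(scores)))  # [0, 1, 0, 1, 0, 1]
```

Standardizing first keeps a variable with a larger raw scale from dominating the distance; unlike the two-step procedure, k-means also requires the number of clusters to be fixed in advance.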

Even after excluding dyslexic children, the two clusters remained similar, although Cluster 1 was reduced more extensively (7 dysgraphic children remaining) than Cluster 2 (20 dysgraphic children remaining). When the cluster analysis was re-run without the dyslexic children, 6 children fell into Cluster 1 with a visual magnocellular deficit and 21 into Cluster 2 with an auditory deficit. This suggests that, although visual magnocellular functions play a role in spelling, auditory deficits appear to affect dysgraphic children more often. The exact role that auditory and visual functions play throughout the course of literacy development needs to be addressed in longitudinal studies.

In conclusion, the present study revealed new evidence that children with developmental dysgraphia are not homogeneous and that diverse cognitive variables, i.e., phonological awareness, phonological working memory, auditory processing and visual magnocellular function (with some visual attention problems), are important for developmental dysgraphia. These findings show that spelling is a complex process influenced by diverse cognitive variables, which should be taken into account in diagnosis and remediation. Usually, the diagnosis of dyslexia and dysgraphia focuses only on whole-word and letter-by-letter reading/spelling. On the basis of the present data, additional variables such as those assessed in this study could be included in prediction and prevention programs. The question of how developmental dysgraphia and dyslexia are connected, including their commonalities and differences, remains open and is discussed below.

TABLE 4 | Assignment of purely dysgraphic children (n = 27) to the two clusters in the original two-step cluster analysis (n = 45) and the 2-cluster replication (n = 27).

## Cognitive Profiles of Dysgraphia and Their Relationship to Dyslexia

The distributions of dysgraphic participants with vs. without accompanying reading difficulties across the clusters did not differ significantly in the present sample. These data can be taken to indicate either that the type of cognitive profile of a dysgraphic child is not influenced by his/her reading ability, or that the observed cognitive profiles have a rather comparable impact on reading ability, at least in the sample studied here.

To examine the commonalities of developmental dysgraphia and dyslexia, the new evidence about dysgraphic children will now be compared to the existing knowledge about dyslexia. First, the skills underlying the spelling and the reading deficit are compared; second, the clusters obtained in the present study are compared to those previously found by Heim et al. (2008).

The present study corroborated the assumption that dyslexia and dysgraphia share several underlying deficits (Döhla and Heim, 2016): problems in phonological, auditory, and visual processing. Impaired phonological processing, already known to be closely related to dyslexia (phonological awareness deficit: e.g., Snowling, 2000; Steinbrink et al., 2008; Pennington et al., 2012; phonological working memory deficit: e.g., Seigneuric and Ehrlich, 2005; Steinbrink et al., 2008), has also been identified as an important characteristic of dysgraphia (phonological awareness deficit: e.g., Moll et al., 2009; phonological working memory deficit: e.g., Steinbrink and Klatte, 2008; Steinbrink et al., 2008; Winkes, 2014), and this was confirmed in the present study. There is also much evidence that dyslexics can have deficits in auditory processing (Ramus, 2003; Steinbrink et al., 2014 for children; Ramus et al., 2003; Christmann et al., 2015 for adults). The present study has shown similar results for dysgraphic children: the clusters reflect that auditory processing is impaired in some dysgraphic children but not in others. Besides their phonological processing deficits, children in Cluster 2 also performed worse in auditory processing than normally spelling children. The literature suggests two scenarios: on the one hand, deficits in auditory processing and phonological awareness can appear independently; on the other hand, severe auditory deficits can cause deficits in phonological skills (i.e., phonological awareness, phonological working memory, or rapid naming) and therefore affect reading as well (Ramus et al., 2003). In the present study, auditory processing deficits usually seemed to appear together with deficits in phonological processing (here: phonological awareness and phonological working memory).
Children in Cluster 2 scored worse on average than normally spelling children in auditory processing as well as in phonological awareness and phonological working memory; conversely, however, deficits in phonological awareness did not necessarily occur together with deficits in auditory processing. The whole group of dysgraphic children scored significantly worse than normally spelling children in phonological awareness and phonological working memory, but not all of them also performed worse in auditory processing. When the whole group was divided into clusters, the children in Cluster 1 were characterized, besides their visual deficits, by impaired phonological working memory, but on average did not perform worse in auditory processing. Consequently, the data indicate that phonological processing deficits may not always result from auditory deficits.

Apart from phonological and auditory processing deficits, visual deficits have also been discussed as relevant for reading and dyslexia. Stein (2001), Heim et al. (2008) and Tholen et al. (2011) showed a connection between visual magnocellular processing and reading competence. The results of the present study revealed that visual magnocellular functions can be impaired in poor writers as well: Cluster 1 was characterized by deficits in visual magnocellular function. The neurobiological model (Ramus, 2004) describes a phonological deficit as the main cause of dyslexia, sometimes co-occurring with deficits in magnocellular functions. Stein (2001), in his general magnocellular theory, argues for the reverse relationship: he sees the cause of dyslexia in a general magnocellular deficit that gives rise to diverse cognitive deficits. Findings of Heim et al. (2008, 2010) supported the theory of Ramus (2004) in that phonological awareness does not depend on magnocellular processing skills. For dysgraphia, a visual magnocellular dysfunction appears to be one possible deficit that occurs in some dysgraphic children but not in others, just like deficits in auditory processing.

Finally, visual attention requires additional consideration. In contrast to findings for dyslexics (e.g., Facoetti et al., 2003; Bosse et al., 2007), significantly worse performance in visual attention (CVE and alertness) was not found for the dysgraphic children. Although the numerical trend did not reach significance, the visual attention results are nevertheless of interest: Cluster 1, the cluster with deficits in visual magnocellular function, also scored numerically worse in visual attention than Cluster 2 and the normally spelling children. With respect to visual attention skills, these results resemble those of Banfi et al. (2017), who investigated visuo-spatial attention skills in dyslexic and dysgraphic children. In their study, poor readers and poor writers also differed: dyslexic children showed a right-over-left advantage (position effect), whereas dysgraphic children showed no position effect.

In summary, the present study revealed commonalities in the underlying cognitive deficits of developmental dyslexia and dysgraphia, i.e., in phonological, auditory and visual magnocellular processing. Both the whole group of dysgraphic children in our study and the dyslexic children in the study of Heim et al. (2008) differ from normally spelling/reading children in showing worse performance in phonological processing (i.e., phonological awareness and phonological working memory); dyslexics additionally perform worse in visual attention tasks (Heim et al., 2008).

Although dyslexic and dysgraphic children show different profiles, and although the variables assessed by Heim et al. (2008) and in the present study were not completely identical, developmental dyslexia and dysgraphia show similarities. Generally speaking, comparing profiles is informative for both disorders and helps to characterize them in a more fine-grained way. In more detail, Heim et al. (2008) reported three distinct profiles of developmental dyslexia: Cluster A showed worse phonological, auditory and magnocellular skills, Cluster B scored worse only in phonological awareness tasks, and Cluster C had impaired visual attention skills. The cluster analyses for dysgraphic children revealed Cluster 1 with deficits in visual magnocellular functions and Cluster 2 with deficits in auditory sound discrimination, both sharing deficits in phonological processing. Although partly different variables differentiated the profile groups, visual and auditory skills contribute to the characterization of the clusters found for both disorders. In contrast to the dyslexic clusters in Heim et al. (2008), children with developmental dysgraphia could be classified as showing either auditory or visual deficits. Moreover, impaired phonological processing was generally characteristic of both dysgraphia and dyslexia in contrast to normally spelling/reading children, but it was additionally identified as a variable differentiating between clusters only for dyslexia.

The findings of the present study suggest that developmental dyslexia and dysgraphia have a common basis: they (1) both have diverse underlying deficits, (2) share almost all of those deficits, (3) both allow impaired children to be subdivided into profiles, and (4) have in common that either impaired visual magnocellular functions or impaired auditory processing differentiate between profiles of dyslexic and of dysgraphic children. On closer inspection, they also differ (5) in the combination of those underlying deficits and (6) possibly in the resulting profile groups. Thus, developmental dysgraphia and developmental dyslexia might be regarded as similar but not homologous with respect to their underlying cognitive profiles.

Future research should adopt a broader diagnostic assessment approach, covering a large pool of different functions, to investigate possible further dysgraphic profiles. Furthermore, it would be worthwhile to substantiate the new evidence with imaging techniques and thus obtain more precise information, especially about the magnocellular system and the cerebellum with respect to visual and auditory processing. A more fine-grained analysis contrasting exclusively dyslexic and exclusively dysgraphic children with children who have a combined reading and spelling disorder may reveal interesting information about additional underlying deficits and possibly call into question the currently assumed commonalities of the two disorders.

#### CONCLUSION

The present study provided new insights into the underlying deficits and possible performance profiles of developmental dysgraphia and their commonalities with developmental dyslexia. In comparison to normally spelling children, dysgraphic children scored worse in phonological processing skills, i.e., phonological awareness and phonological working memory. Based on six variables measuring cognitive abilities and acquired skills, dysgraphic children could be subdivided into two profiles: one with a deficit in auditory and phonological processing (phonological awareness and phonological working memory) and another with an impairment in visual magnocellular functions and phonological working memory. In summary, the present study revealed evidence for underlying deficits of developmental dysgraphia, i.e., phonological and auditory processing impairments as well as deficits in visual magnocellular functions. Finally, a comparison revealed that developmental dysgraphia and dyslexia are similar but not homologous: they share a common basis with different individual characteristics. As a consequence, it is reasonable to transfer this new evidence about impaired underlying functions to everyday therapeutic practice, to conduct a more fine-grained diagnosis of dysgraphic children, and consequently to tailor remediation to the patient's individual resources and barriers.

## AUTHOR CONTRIBUTIONS

DD: data acquisition, first manuscript draft, data analysis. DD, SH, and KW: concept, data analysis strategy, interpretation of results, revision of the manuscript drafts, and visualization of results.

## ACKNOWLEDGMENTS

We wish to thank the primary schools and special education schools, in particular all children and their parents who participated in this study. Furthermore we want to thank the group of students of the speech and language therapy school in Cologne ("Internationaler Bund – Medizinische Akademie – Schule für Logopädie Köln"), who helped with the diagnostic assessments.

### REFERENCES

fpsyg-09-02006 November 20, 2018 Time: 19:8 # 11



Winkes, J. (2014). Isolierte Rechtschreibstörung. Eigenständiges Störungsbild oder leichte Form der Lese-Rechtschreibstörung? Eine Untersuchung der kognitiv-linguistischen Informationsverarbeitungskompetenzen von Kindern mit Schriftspracherwerbsstörungen. Doctoral dissertation, Universität Freiburg, Freiburg.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Döhla, Willmes and Heim. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

fpsyg-09-02510 December 12, 2018 Time: 17:15 # 1

# Evaluating the Effects of Metalinguistic and Working Memory Training on Reading Fluency in Chinese and English: A Randomized Controlled Trial

Tik-Sze Carrey Siu<sup>1</sup> , Catherine McBride<sup>2</sup> , Chi-Shing Tse<sup>3</sup> , Xiuhong Tong<sup>4</sup> and Urs Maurer<sup>2</sup> \*

<sup>1</sup> Department of Early Childhood Education, The Education University of Hong Kong, Hong Kong, Hong Kong, <sup>2</sup> Department of Psychology, The Chinese University of Hong Kong, Shatin, China, <sup>3</sup> Department of Educational Psychology, The Chinese University of Hong Kong, Shatin, China, <sup>4</sup> Department of Psychology, The Education University of Hong Kong, Hong Kong, Hong Kong

#### Edited by:

Gorka Fraga González, Universität Zürich, Switzerland

#### Reviewed by:

Pol Ghesquière, KU Leuven, Belgium

Caicai Zhang, Hong Kong Polytechnic University, Hong Kong

> \*Correspondence: Urs Maurer umaurer@psy.cuhk.edu.hk

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 13 July 2018 Accepted: 26 November 2018 Published: 12 December 2018

#### Citation:

Siu T-SC, McBride C, Tse C-S, Tong X and Maurer U (2018) Evaluating the Effects of Metalinguistic and Working Memory Training on Reading Fluency in Chinese and English: A Randomized Controlled Trial. Front. Psychol. 9:2510. doi: 10.3389/fpsyg.2018.02510

Children traditionally learn to read Chinese characters by rote, and thus stretching children's memory span could possibly improve their reading in Chinese. Nevertheless, 85% of Chinese characters are semantic-phonetic compounds that contain probabilistic information about meaning and pronunciation. Hence, enhancing children's metalinguistic skills might also facilitate reading in Chinese. In the present study, we tested whether training children's metalinguistic skills or their working-memory capacity over 8 weeks would produce reading gains, and whether these gains would be similar in Chinese and English. We recruited 35 second graders in Hong Kong and randomly assigned them to a metalinguistic training group (N = 13), a working-memory training group (N = 10), or a waitlist control group (N = 12). In the metalinguistic training, children were taught to analyze novel Chinese characters into phonetic and semantic radicals and novel English words into onsets and rimes. In the working-memory training, children were trained to recall increasingly long strings of Cantonese or English syllables in correct or reverse order. All children were tested on phonological skills, verbal working memory, and word reading fluency in Chinese and in English before and after training. Analyses of the pre- and post-test data revealed that only the metalinguistic training group, but not the other two groups, showed significant improvement in phonological skills in Chinese and English. Working-memory span in Chinese and English increased from the pre- to the post-test in the working-memory training group relative to the other two groups.
Despite these domain-specific training effects, the two training groups improved similarly in word reading fluency in Chinese and English compared to the control group. Our findings suggest that increased metalinguistic skills and a larger working-memory span appear equally beneficial to reading fluency, and that these effects are similar in Chinese and English.

Keywords: reading fluency, metalinguistic, working memory, phonological awareness, morphological awareness, reading training, reading intervention, literacy

## INTRODUCTION


Developmental dyslexia is defined as a specific learning disorder in reading and writing despite adequate intelligence, motivation, and educational opportunities (Snowling, 2000). Decades of research have shown that, of the various cognitive problems associated with dyslexia, difficulties in phonological processing are one of the core deficits (Ramus, 2003; Ziegler and Goswami, 2005). Accordingly, reading remediation programs predominantly focus on learning to manipulate different phonological units (e.g., Bus and van Ijzendoorn, 1999; Ehri et al., 2001; Kjeldsen et al., 2003; Blachman et al., 2004; Elbro and Peterson, 2004; Hatcher et al., 2004; Suggate, 2016), at least in alphabetic languages. When it comes to learning to read Chinese, questions arise as to whether such phonology-based training works equally well and whether other methods may also facilitate reading development in this non-alphabetic script.

Several recent studies have directly compared the efficacy of different training methods on improving literacy skills in Chinese (Zhou et al., 2012; Wang and McBride, 2017; Wang et al., 2017). For instance, Zhou et al. (2012) trained Chinese preschool children in phonological awareness, morphological awareness, and homophone awareness, and found that morphological training was more effective in improving their Chinese word reading compared to the other training types. Wang and McBride (2017) contrasted the effects of copying-only training, copying-plus-phonological training, and copying-plus-morphological training on learning to read and write in Chinese among Chinese preschoolers. Greater improvement in Chinese word reading and writing for those who received the copying-plus-morphological training demonstrated the importance of morphological awareness in literacy acquisition in Chinese. Note that these training sessions typically center around analyzing the unique structure and characteristics of Chinese, such as the phonological, morphological, and orthographic information available in Chinese characters/words. Therefore, we group them under the label of metalinguistic training in this paper.

In the present study, we chose to compare the effects of working-memory and metalinguistic training on reading fluency in Chinese, for two reasons. First, Chinese is morphosyllabic (DeFrancis, 1984), meaning that characters map onto morphemes and syllables rather than onto phonemes as in alphabetic languages. Hence, the alphabetic model of training letter-sound correspondences does not readily apply to Chinese-reading children, particularly those in Hong Kong, who do not use phonetic coding systems in learning Chinese (McBride-Chang et al., 2004). Indeed, drill-and-practice is the conventional pedagogy in teaching Chinese characters (Wu et al., 1999). Children in Hong Kong typically learn to read Chinese via rote memorization because they lack a coding system to represent character pronunciations. They are also subjected to extensive repetitive copying of characters, both at school and at home, in Chinese instruction (Lin et al., 2009). Such rote learning places heavy demands on working memory. It is thus unsurprising that children's working-memory capacity predicts their performance in word reading (Ho et al., 2004; Chung and McBride-Chang, 2011), text comprehension, and writing in Chinese (Guan et al., 2014). Chinese children with reading difficulties also show deficits in working memory (Ho et al., 2004; Peng et al., 2013). Given the relevance of working memory to learning Chinese, a reasonable prediction is that increasing children's working-memory span can enhance their word reading fluency in Chinese.

Second, we targeted metalinguistic training because recent work on Chinese literacy acquisition has taken a more analytic approach to understanding Chinese characters. We therefore aimed to gather corroborative evidence on whether an intervention that trained children to analyze features of Chinese characters would improve reading fluency. The most common type of Chinese character is the semantic-phonetic compound. It accounts for about 85% of the characters in the Chinese writing system (Zhu, 1988) and for 72% of the Chinese characters that children are expected to learn in primary school (Shu et al., 2003). This type of character contains two basic constituents: the semantic radical offers a hint about meaning, and the phonetic component provides a clue to pronunciation (Ho and Bryant, 1997; Shu and Anderson, 1997). For instance, the Chinese character 湖 /wu4/ (Cantonese; lake) is made up of the semantic radical 氵, which means water or liquid, and the phonetic component 胡, which is pronounced /wu4/. It is considered a fully regular and consistent character: it has exactly the same pronunciation as its phonetic part written as a character (胡 /wu4/), and all the compound characters that contain this phonetic radical are pronounced the same (e.g., /wu4/, /wu4/). However, such phonetic information is not entirely reliable. In inconsistent characters, compound characters sharing the same phonetic are not pronounced the same way (Yang et al., 2009). For instance, although they share the same phonetic radical /lei5/, the characters /lei5/, /lei5/, /leoi5/, and /maai4/ are pronounced quite differently in Cantonese. Sometimes the phonetic provides partial information about pronunciation (e.g., /leoi5/, whose phonetic /lei5/ shares some but not all of its phonological information: the same onset and tone but a different rime; Shu et al., 2003).
In extreme cases, the phonetic component offers only obscure information about the character's pronunciation (e.g., /maai4/, whose phonetic /lei5/ differs in all three phonological elements, i.e., onset, rime, and tone).
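The onset/rime/tone comparison described above can be made concrete with Jyutping romanization. This is a sketch under stated assumptions: it uses the standard Jyutping initials, treats the syllabic nasals m and ng as rimes without an onset, and defines a simple consistency ratio (the share of family members pronounced exactly like the phonetic) that is not necessarily the measure used by Yang et al. (2009) or by the authors. The function names are hypothetical, and only romanizations given in the text are used.

```python
import re

# Standard Jyutping initials; longest first so "ng", "gw", "kw" win over "n", "g", "k".
ONSETS = sorted(
    ["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "ng", "h",
     "gw", "kw", "w", "z", "c", "s", "j"],
    key=len,
    reverse=True,
)


def parse_jyutping(syllable):
    """Split a Jyutping syllable such as 'maai4' into (onset, rime, tone)."""
    match = re.fullmatch(r"([a-z]+)([1-6])", syllable)
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {syllable!r}")
    letters, tone = match.groups()
    if letters in ("m", "ng"):  # syllabic nasals: treated as a rime with no onset
        return "", letters, tone
    for onset in ONSETS:
        if letters.startswith(onset) and len(letters) > len(onset):
            return onset, letters[len(onset):], tone
    return "", letters, tone  # vowel-initial syllable, e.g. 'aa3'


def shared_with_phonetic(character, phonetic):
    """Report which of onset, rime and tone a character shares with its phonetic."""
    parts = zip(("onset", "rime", "tone"), parse_jyutping(character), parse_jyutping(phonetic))
    return {name: a == b for name, a, b in parts}


def consistency_ratio(family, phonetic):
    """Share of characters in a phonetic family pronounced exactly like the phonetic."""
    return sum(reading == phonetic for reading in family) / len(family)


# Romanizations from the text: four characters sharing the phonetic /lei5/.
family = ["lei5", "lei5", "leoi5", "maai4"]
print(shared_with_phonetic("leoi5", "lei5"))  # {'onset': True, 'rime': False, 'tone': True}
print(shared_with_phonetic("maai4", "lei5"))  # {'onset': False, 'rime': False, 'tone': False}
print(consistency_ratio(family, "lei5"))      # 0.5
```

The output reproduces the three cases in the text: a full match, a partial match (same onset and tone, different rime), and a mismatch in all three elements.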

Interestingly, despite the probabilistic nature of phonological information in Chinese characters, Chinese children still refer to phonetic components for character pronunciation (Ho and Bryant, 1997; Shu et al., 2000; Anderson et al., 2003). Specifically, they use the pronunciation of the phonetic in reading semantic-phonetic compound characters (e.g., read /maa5/ by referring to its phonetic radical /maa5/). When a character contains a bound phonetic that is unpronounceable on its own (e.g., ), children infer the character pronunciation by analogy with other characters sharing the same phonetic radical (e.g., /jiu4/, /jiu4/, /jiu4/; Ho and Ma, 1999; He et al., 2005). Previous intervention studies have demonstrated the effectiveness of using subcharacter information about pronunciation in improving children's literacy skills in Chinese. In a 5-day intensive training study, Ho and Ma (1999) taught Chinese dyslexic children the structure of compound characters and introduced the functions of phonetic components in regular, semi-regular, and irregular characters. Those who received such explicit phonetic instruction outperformed the control group on Chinese word reading after training. Packard et al. (2006) and Wu et al. (2009) similarly trained Chinese children to identify and analyze the semantic and phonetic components in Chinese characters. The training enhanced children's performance in reading fluency, vocabulary, and reading comprehension, as well as in writing Chinese characters from memory.

Anderson et al. (2003) claimed that partial information about character pronunciation helps children assimilate Chinese characters. We agree with this notion and see the less-than-perfect clue to character pronunciation as a starting point for children to encode and remember the characters. Beyond this point, however, children still have to learn by rote the correct pronunciations provided by teachers or listed in a dictionary. Hence, in reading such a phonologically opaque script, we argue that both the ability to reflect on the phonetic and semantic components of Chinese words and the ability to hold phonological information in working memory are indispensable resources. An intriguing question that follows is which of these two abilities, when trained, is more effective in improving reading in Chinese. To answer this question, we designed the current training study to contrast the effects of metalinguistic and working-memory training on children's reading fluency in Chinese.

Note that our reading test included separate lists of consistent and inconsistent Chinese characters. In this study, we hypothesized training-specific effects on reading different types of characters. We expected the metalinguistic training to improve children's reading of consistent characters more than their reading of inconsistent characters. Phonetic radicals in consistent characters offer reliable cues to character pronunciations, so children might use a more phonological strategy, relying on phonetic radicals, when reading consistent characters. Therefore, we expected our metalinguistic training, with its instruction on phonetic radicals, to specifically improve learning to read consistent characters. In contrast, phonetic radicals in inconsistent characters are unreliable hints to the correct pronunciation, so in learning inconsistent characters children might rely on rote learning to memorize and recall character pronunciations. Hence, we speculated that children trained in working memory might have an advantage in reading inconsistent characters, because their increased working-memory capacity would help them learn the inconsistent character pronunciations.

In addition, we also compared the two training methods in terms of their effects on learning to read in English. English is an alphabetic script which maps onto phonemes as well as other phonological units and, thus, is phonologically more transparent than Chinese. We speculated that learning to analyze word structure in English (i.e., metalinguistic training) is more effective than learning word pronunciations by rote (i.e., working memory training), because phonological information in English words is more reliable in cuing word pronunciations. We also expected that this effect would be more pronounced in learning to read consistent than inconsistent English words.

## MATERIALS AND METHODS

#### Participants

A total of 37 second graders (19 boys; mean age = 7.5 years, SD = 0.3) from 30 mainstream primary schools in Hong Kong participated in this training study. All children were native Cantonese speakers with reading skills in the normal range. Parents responded to advertisements posted in online parent-child forums and were contacted via follow-up phone calls. The children were randomly assigned to three groups: thirteen children were trained in metalinguistic skills, twelve children were trained in working memory, and twelve children formed a waitlist control group that received training only after the post-test. Two children from the working-memory group later dropped out. Parents gave written informed consent for their children's participation at the pre-test. All testing and training sessions were administered individually, either at the child's home or in a laboratory at the university. The children and parents received stationery gifts and a HKD 300 cash coupon upon completion of the entire training study. The research procedures and written consent form were approved by the Ethics Committee of the Social Science Panel at The Chinese University of Hong Kong.
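The paper does not describe how the random assignment was carried out. The sketch below shows one simple possibility, unstratified randomization into groups of fixed size (13/12/12); the function name, the child IDs and the seed are all hypothetical.

```python
import random


def assign_groups(child_ids, sizes, seed=None):
    """Randomly split children into groups of the given sizes (simple randomization)."""
    if sum(sizes.values()) != len(child_ids):
        raise ValueError("group sizes must add up to the number of children")
    shuffled = list(child_ids)
    random.Random(seed).shuffle(shuffled)  # a local RNG keeps the split reproducible
    groups, start = {}, 0
    for name, size in sizes.items():
        groups[name] = shuffled[start:start + size]
        start += size
    return groups


children = [f"child_{i:02d}" for i in range(1, 38)]  # 37 hypothetical IDs
groups = assign_groups(
    children,
    {"metalinguistic": 13, "working_memory": 12, "waitlist_control": 12},
    seed=2018,
)
print({name: len(ids) for name, ids in groups.items()})
# {'metalinguistic': 13, 'working_memory': 12, 'waitlist_control': 12}
```

Fixing the group sizes in advance guarantees the intended allocation; with only 37 children, simple randomization of this kind does not ensure balance on covariates, which is why group comparisons such as those in Table 1 are still reported.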

The group characteristics are listed in **Table 1**. The three groups did not significantly differ regarding gender, age, nonverbal intelligence, or in Chinese and English oral vocabulary (all p > 0.27).

#### Materials and Procedures

#### Pre-test and Post-test Battery

#### **Non-verbal intelligence**

At the pre-test, we used Raven's Standard Progressive Matrices (RSPM; Raven et al., 1996) to estimate children's non-verbal reasoning ability. Children completed Sets A to C of the RSPM, which comprise 36 black-and-white items. Each item presented a target geometric design with one missing part, and the children were instructed to pick, from six or eight option patterns, the piece that best completed the design. The maximum score on this test was 36.

TABLE 1 | Group characteristics for training and control groups.

#### **Oral vocabulary**

fpsyg-09-02510 December 12, 2018 Time: 17:15 # 4

At the pre-test, we administered two vocabulary definition tests, one in Chinese and one in English, to measure children's expressive vocabulary knowledge as a proxy for their general language proficiency in each language. The test design was based on the vocabulary subscale of the Stanford-Binet Intelligence Scale (Thorndike et al., 1986). The test items and scoring procedure have been used in previous studies on literacy acquisition in Chinese children (e.g., McBride-Chang et al., 2008; Li et al., 2012; Zhou et al., 2012, 2014). The tests consisted of 26 Chinese two-character words and 15 English words that are frequently used in locally published primary school textbooks (Zhuang, 2000). The words were arranged in order of increasing difficulty. For each item, the experimenter read a word aloud and the child was asked to explain its meaning. Answers were scored 0, 1, or 2 based on the accuracy and completeness of the definition. The test was discontinued when a child failed to define five consecutive words. The maximum scores of the Chinese and English tests were 52 and 30, respectively.
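The scoring rules above (0, 1, or 2 points per item, items in order of difficulty, discontinuation after five consecutive failures) can be expressed as a short scoring routine. The sketch below is illustrative only: it assumes that a "failed" item is one scored 0, and the function name `score_vocabulary` is hypothetical rather than part of the original materials.

```python
def score_vocabulary(item_scores, stop_after=5):
    """Sum item scores (0, 1, or 2) until `stop_after` consecutive zeros occur.

    Assumption: an item counts as "failed" when it is scored 0. Items listed
    after the discontinuation point are ignored, mirroring the stop rule.
    """
    total = 0
    consecutive_failures = 0
    for score in item_scores:
        if score not in (0, 1, 2):
            raise ValueError(f"invalid item score: {score}")
        total += score
        consecutive_failures = consecutive_failures + 1 if score == 0 else 0
        if consecutive_failures == stop_after:
            break
    return total


# Maximum possible scores: 2 points per item.
CHINESE_MAX = 2 * 26  # 52
ENGLISH_MAX = 2 * 15  # 30

# Hypothetical response pattern: testing stops after the fifth consecutive 0,
# so the final 2 is never administered and does not count.
print(score_vocabulary([2, 2, 1, 0, 0, 0, 0, 0, 2]))  # 5
```

Because the items are ordered by difficulty, the stop rule mainly saves testing time; a child who fails five items in a row is unlikely to define the harder items that follow.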

We administered the phonological tests, verbal working memory tests, and word reading fluency tests in Chinese and in English at both pre- and post-tests.

#### **Word reading fluency in Chinese**

We assessed children's ability to read Chinese at the pre- and post-tests to evaluate the training effects. Children were presented with two separate lists of Chinese single-character words. The first list comprised 80 consistent Chinese characters, each with a consistency value greater than 0.8 (mean consistency value<sup>1</sup> = 0.96). Excluding tone, the consistency value of every character in this list was 1; that is, characters sharing the same phonetic component sounded the same (e.g., /wu4/, /wu4/, /wu4/, /wu4/). The second list comprised 80 inconsistent Chinese characters (mean consistency value = 0.35), in which characters sharing a phonetic component are pronounced quite differently (e.g., /paau2/, /paau3/, /baau2/, /pou5/). Characters from the two lists were not intermixed. The pre- and post-tests used the same lists of test characters, which differed from those used in the training. Children were given 1 min per list to read the characters aloud as quickly as possible, skipping any unknown characters. The maximum score for each list was 80.
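Footnote 1 defines a character's consistency value as the proportion of characters sharing its phonetic radical that also share its pronunciation (Lee et al., 2005). A minimal sketch of that ratio follows. The radical families are toy examples built only from the pronunciations quoted above, the `ignore_tone` switch is a hypothetical convenience for the "excluding tone" comparison, and a real computation would require a complete character database.

```python
import re


def strip_tone(syllable):
    """Remove a trailing Jyutping-style tone digit, e.g. 'paau2' -> 'paau'."""
    return re.sub(r"[1-6]$", "", syllable)


def consistency_value(target, family, ignore_tone=False):
    """Share of characters in `family` (all sharing one phonetic radical)
    whose pronunciation matches `target`; the target itself is included."""
    norm = strip_tone if ignore_tone else (lambda s: s)
    matches = sum(norm(p) == norm(target) for p in family)
    return matches / len(family)


# Toy radical families using only the pronunciations quoted in the text.
consistent_family = ["wu4", "wu4", "wu4", "wu4"]
inconsistent_family = ["paau2", "paau3", "baau2", "pou5"]

print(consistency_value("wu4", consistent_family))                        # 1.0
print(consistency_value("paau2", inconsistent_family))                    # 0.25
print(consistency_value("paau2", inconsistent_family, ignore_tone=True))  # 0.5
```

The last line shows why the tone convention matters: in this toy family, ignoring tone doubles the consistency value of /paau2/ because /paau3/ then counts as a match.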

#### **Phonological skills in Chinese**

The phonological test comprised syllable deletion and phoneme deletion to assess children's ability to manipulate phonological units in Chinese. This test has been used in several prior studies (e.g., Shu et al., 2008; Cheung et al., 2010; Tong and McBride-Chang, 2010; Li et al., 2012), and the same set of items was used at the pre- and post-test. The syllable-deletion section consisted of two blocks of 10 three-syllable Chinese words: the first block contained real words and the second contained pseudowords. The experimenter read the words aloud one by one, and the child was asked to delete the initial, middle, or final syllable. The phoneme-deletion section contained 20 one-syllable words, 10 real words and 10 pseudowords. Children were asked to say the syllable that would remain when the initial phoneme was deleted. Each section began with four practice trials with modeling and corrective feedback. Testing in a section stopped if the child made errors on five consecutive items. The maximum score of the test was 40.

#### **Verbal working memory in Chinese**

We administered a non-word repetition test to measure children's verbal working memory in Chinese. Following previous work testing verbal working memory in Chinese children (e.g., Ho et al., 2002, 2004), the test comprised 12 non-word strings ranging from three to eight Cantonese syllables in length. The syllables were randomly combined so that no string carried any lexical meaning. The same set of non-words was used at the pre- and post-test. On each item, the experimenter read aloud a string of Cantonese syllables and the child was asked to repeat it. One point was awarded for each syllable correctly recalled and one for each consecutive pair recalled in the correct order. Two practice trials preceded the test items, and testing was discontinued when a child failed both items of the same span length. The maximum score of this test was 120.
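The reported maximum of 120 is consistent with two items at each span length from three to eight syllables: the 12 strings then contain 66 syllables and 54 consecutive pairs. The two-items-per-length layout is our inference from the discontinuation rule ("failed both items") and the item count, not something the text states outright. The sketch below reproduces that arithmetic and shows one possible scoring function. How partially correct responses were matched is not described, so the position-by-position syllable match and the adjacency rule for pairs are assumptions, and `score_item` and the syllables used are hypothetical.

```python
def max_score(span_lengths, items_per_length=2):
    """One point per syllable plus one per consecutive pair, summed over items."""
    return sum(items_per_length * (n + (n - 1)) for n in span_lengths)


def score_item(target, response):
    """Score one string (assumed rules, for illustration only).

    - Syllable points: syllables reproduced at the same serial position.
    - Order points: consecutive target pairs (a, b) that also appear as an
      adjacent pair, in that order, somewhere in the response.
    """
    syllable_points = sum(t == r for t, r in zip(target, response))
    response_pairs = set(zip(response, response[1:]))
    order_points = sum(pair in response_pairs for pair in zip(target, target[1:]))
    return syllable_points + order_points


print(max_score(range(3, 9)))  # 120 (66 syllables + 54 pairs)
print(score_item(["ma", "ti", "sou"], ["ma", "ti", "sou"]))  # 5
print(score_item(["ma", "ti", "sou"], ["ma", "sou", "ti"]))  # 1
```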

#### **Word reading fluency in English**

To examine the effect of training on word reading fluency in English, we used the same English word reading test at the pre- and post-tests. The test consisted of two separate lists of English monosyllabic words. The first list contained 80 consistent words, for which all other words with the same-spelling rime sound the same (e.g., boy-toy, luck-duck, feet-meet). The second list contained 80 inconsistent words, for which other words exist with the same-spelling rime but a different pronunciation (e.g., toe-shoe, five-live, home-come). The test words differed from those used in the training. For each list, children were asked to read the words aloud within 1 min, as quickly and accurately as possible, and were instructed to skip unknown words. The maximum score for each list was 80.
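The consistent/inconsistent distinction rests on comparing each word with its same-spelling rime neighbors. A minimal sketch follows, assuming a naive letter-based onset–rime split (the onset is every letter before the first vowel letter) and a toy lexicon of ARPAbet-style rime pronunciations supplied for illustration. Real stimulus selection would draw on a full pronunciation dictionary, and this splitting rule ignores complications such as *qu-* or vowel *y*.

```python
VOWEL_LETTERS = set("aeiou")

# Toy lexicon: spelling -> pronunciation of the rime (ARPAbet-style codes).
# Pronunciations are supplied for illustration; "live" is the verb /lɪv/.
RIME_PRONUNCIATION = {
    "boy": "OY", "toy": "OY",
    "luck": "AH K", "duck": "AH K",
    "feet": "IY T", "meet": "IY T",
    "toe": "OW", "shoe": "UW",
    "five": "AY V", "live": "IH V",
    "home": "OW M", "come": "AH M",
}


def split_onset_rime(word):
    """Naive split: the onset is everything before the first vowel letter."""
    for i, letter in enumerate(word):
        if letter in VOWEL_LETTERS:
            return word[:i], word[i:]
    return word, ""  # no vowel letter: treat the whole word as onset


def is_consistent(word, lexicon=RIME_PRONUNCIATION):
    """True if every word sharing the spelled rime shares its pronunciation."""
    _, rime = split_onset_rime(word)
    neighbors = [w for w in lexicon if split_onset_rime(w)[1] == rime]
    return len({lexicon[w] for w in neighbors}) == 1


print(split_onset_rime("shoe"))  # ('sh', 'oe')
print([w for w in RIME_PRONUNCIATION if is_consistent(w)])
# ['boy', 'toy', 'luck', 'duck', 'feet', 'meet']
```

The same onset–rime split also mirrors the English part of the metalinguistic training described below, in which children analyzed words into onset and rime.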

#### **Phonological skills in English**

This phonological test assessed children's ability to manipulate syllables and phonemes in English. The test items were adapted from those used in previous studies (e.g., Cheung et al., 2010; Li et al., 2012), and the same set of items was used at the pre- and post-test. The syllable-deletion section comprised 20 three-syllable English items, 10 real words and 10 pseudowords. On each item, the experimenter read aloud an English word and the child was asked to say the word without its initial, middle, or final syllable. The phoneme-deletion section included 20 single-syllable English items, 10 real words and 10 pseudowords. After the experimenter read a word aloud, the child was instructed to say the syllable without its initial phoneme. Each section began with four practice items, on which corrective feedback was given. Testing in a section was discontinued if the child made five consecutive errors. The maximum score of the test was 40.

<sup>1</sup>Consistency value was estimated by dividing the number of characters that share the same phonetic radical *and* have the same pronunciation by the total number of characters having the same phonetic radical (Lee et al., 2005).

#### **Verbal working memory in English**


We tested children's verbal working memory in English with a non-word repetition task. The test design was modeled on previous work (e.g., Ho et al., 2002, 2004), and the same set of non-words was used at the pre- and post-tests. The test included 12 non-word strings, each containing three to eight English syllables. On each item, the experimenter read aloud a string of English non-words, and the child was asked to recall the string in the correct order. One point was awarded for each non-word correctly recalled and one for each consecutive pair recalled in the correct order. Two practice items familiarized children with the test. Testing stopped when a child failed both items of the same span length. The maximum score of this test was 120.

#### Training Materials and Protocol

Children in the training groups received 8 weeks of one-to-one tutoring between the pre- and post-tests. The control children received the same training (either metalinguistic or working memory) after the post-test. The training involved both parents and undergraduate or postgraduate students (hereafter, experimenters) as tutors. Each week comprised four 30-min parent-led sessions and one 1-h experimenter-led session. Before training began, tutors completed a 3-h pre-training instruction given by the first author, which covered an overview of the reading training programs, the structure of the Chinese and English languages, and the specific teaching activities and strategies to be followed in the training sessions. Parents and experimenters kept a log book recording the actual training time and the children's learning performance (i.e., number of words correctly read and written). The first author also met the experimenters once a week to obtain individual feedback about each child's learning so as to review and adjust the training protocol.

#### **Training stimuli**

In the 8-week training, children were taught 96 Chinese and 96 English late-acquired words, half consistent and half inconsistent. According to a local database of Chinese and English words used in primary schools, all the selected words were unfamiliar to second graders in Hong Kong. A pilot study with 30 second graders also indicated that none of the children could read any of the selected words before training.

#### **Metalinguistic training protocol**

Parents taught three Chinese and three English words in each tutoring session, with training time split equally between Chinese and English. The instructions for each Chinese single-character word under the metalinguistic training were as follows:


For English words, we taught children to analyze the words into onset and rime. The instructions were as follows:


Each session ended with reinforcement practice where children were asked to read aloud and write the three Chinese and three English words learned in that session. Parents were instructed to praise correct responses and to correct incorrect responses.

Trained experimenters followed up the training after four parent-led sessions each week. Each experimenter-led session began with a review of all the 12 Chinese and 12 English words learned over the previous week. Children were asked to read aloud and write down the words, then corrective feedback was given by the experimenter. Next, in the Chinese session, children were instructed to name any characters that shared the same phonetic or the same semantic component as the 12 target characters (beyond those presented in the training booklet). If a child failed to recall any, the experimenter guided the child to search for these words in the child's school textbook. This activity trained the children to note and extract phonetic and semantic information from Chinese characters. Similarly, in the English session, children were guided to name any words that shared the same onset or the same rime as the 12 target words, with the aid of the child's school textbook.
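The weekly review of 12 words per language follows directly from the schedule described above, and the same arithmetic recovers the 96 training words per language and the total tutoring time. The worked figures below simply combine numbers stated in the text, assuming three words per parent-led session throughout; they describe the planned schedule, not the time actually logged by each family.

$$
\begin{aligned}
\text{words per language per week} &= 3\ \text{words/session} \times 4\ \text{parent-led sessions} = 12,\\
\text{words per language in total} &= 12 \times 8\ \text{weeks} = 96,\\
\text{tutoring time per week} &= 4 \times 30\ \text{min} + 1 \times 60\ \text{min} = 180\ \text{min} = 3\ \text{h},\\
\text{tutoring time in total} &= 3\ \text{h} \times 8\ \text{weeks} = 24\ \text{h}.
\end{aligned}
$$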

Children received morphological training in the last 2 weeks. Parents taught two-character Chinese words and English compound words, and asked the child to identify the common morpheme among two to three words. In experimenter-led training, after a review of the words learned, children were asked to name any two-character Chinese words and English compound words that shared the same morpheme as the target words. In addition, they were guided to use the common morphemes to create novel Chinese and English words.
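In the morphological weeks, children identified the morpheme shared by two or three words. As a rough illustration of that task, and only as an assumed heuristic that is not a procedure described in the study, the shared morpheme of compounds whose common part sits at the start or the end can be found as the longer of their common prefix and common suffix. The example words are our own, and the heuristic fails for words sharing a morpheme in different positions (e.g., *bookshelf* and *notebook*).

```python
def common_prefix(words):
    """Longest string that every word starts with."""
    prefix = words[0]
    for word in words[1:]:
        while not word.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


def common_suffix(words):
    """Longest string that every word ends with."""
    return common_prefix([w[::-1] for w in words])[::-1]


def shared_morpheme(words):
    """Heuristic: the longer of the common prefix and the common suffix."""
    prefix, suffix = common_prefix(words), common_suffix(words)
    return prefix if len(prefix) >= len(suffix) else suffix


print(shared_morpheme(["火車", "火山", "火柴"]))                 # 火
print(shared_morpheme(["sunlight", "moonlight", "flashlight"]))  # light
print(shared_morpheme(["bedroom", "bathroom", "classroom"]))     # room
```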

#### **Working memory training protocol**


The children that received the working memory training were exposed to the exact same Chinese characters and English words as those in the metalinguistic training group. The training activities were as follows:


out of three items in each span length, he/she proceeded to the next span length; otherwise, he/she repeated the same span length. This procedure was intended to stretch the children's working-memory span once they were ready for a longer span length. The English task followed the same procedure, except that English words were used.

Before the session ended, children were also asked to read aloud and write the words learned. Parents were asked to give corrective feedback on the children's responses.

In the experimenter-led follow-up session, children first read aloud and wrote all 12 Chinese and 12 English words learned as a review. The experimenter then introduced two cross-modal word span tasks. In the oral-to-visual word span task, the 12 target Chinese or English words were printed (font size = 150) on a large piece of cardboard (60 cm × 40 cm). The experimenter read aloud strings of Cantonese or English syllables (i.e., the target words), and the child was asked to tap the words on the cardboard in the correct or reverse order. In the visual-to-oral word span task, the experimenter presented strings of target words visually, one by one, on PowerPoint slides, and the child was asked to recall the words orally in the correct or reverse order. Again, the child stayed at a given span length until he/she succeeded on two of the three items.
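The span-progression rule (move to the next span length after succeeding on two of three items, otherwise repeat the current length) is a simple adaptive staircase. The sketch below simulates it from a sequence of item outcomes. The starting span length and the names `run_span_staircase` and `recall_correct` are hypothetical; the text does not report where training started or how partial recalls were handled, so a response is assumed to count as correct only if it reproduces the whole string in the required order.

```python
from itertools import islice


def recall_correct(target, response, reverse=False):
    """Whole-string criterion (assumed): exact forward or backward recall."""
    expected = list(reversed(target)) if reverse else list(target)
    return list(response) == expected


def run_span_staircase(outcomes, start_span=2, items_per_block=3, criterion=2):
    """Advance one span length after >= `criterion` successes in a block of
    `items_per_block` items; otherwise repeat the same length."""
    span = start_span
    history = []
    items = iter(outcomes)
    while True:
        block = list(islice(items, items_per_block))
        if len(block) < items_per_block:
            break  # not enough outcomes left for a complete block
        successes = sum(block)
        passed = successes >= criterion
        history.append((span, successes, passed))
        if passed:
            span += 1
    return span, history


print(recall_correct(["A", "B", "C"], ["C", "B", "A"], reverse=True))  # True
final_span, log = run_span_staircase([1, 1, 0, 1, 0, 0, 1, 1, 1])
print(final_span)  # 4
print(log)  # [(2, 2, True), (3, 1, False), (3, 3, True)]
```

Because a failed block never lowers the span, the staircase only holds or advances, which matches the description of stretching the span "when they were ready".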

#### Statistical Analyses

First of all, the training log books indicated that parents from the three groups completed all the designated exercises with their child. The total time spent on the training was also comparable across the three groups, F(2,32) = 0.33, p > 0.05. Therefore, we can reasonably assume that the training groups were equally compliant with the training.

TABLE 2 | Pre-test and post-test mean performances on all measures for training and control groups.



TABLE 3 | Results of repeated measures ANOVAs on word reading for combined and separate language analyses.


L, language; T, time; C, consistency; G, group; n.a., not applicable. Bold print indicates significant effects.

Next, we computed repeated measures ANOVAs to evaluate the effects of the different training methods. The ANOVAs on phonological skills and working memory included the between-subject factor group (metalinguistic vs. working memory vs. control) and the within-subject factors time (T1 pre-test vs. T2 post-test) and language (Chinese vs. English). The ANOVA on reading fluency included the additional within-subject factor consistency (consistent vs. inconsistent characters/words). Follow-up t-tests comparing pre- and post-test results were conducted to aid interpretation.
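Because time has only two levels here, the Time × Group interaction of such a mixed ANOVA has the same F value as a one-way ANOVA comparing the groups on gain scores (post-test minus pre-test). The sketch below computes that F from gain scores in plain Python to illustrate what the interaction tests. The input numbers are made up and are not the study's data, and a full analysis with the additional within-subject factors (language, consistency) would still require a dedicated statistics package.

```python
from statistics import mean


def gain_score_anova(groups):
    """One-way ANOVA F on gain scores.

    `groups` maps a group name to a list of (pre, post) pairs. Returns
    (F, df_between, df_within). With two time points, this F equals the
    Time x Group F of the corresponding mixed ANOVA.
    """
    gains = {g: [post - pre for pre, post in pairs] for g, pairs in groups.items()}
    all_gains = [x for xs in gains.values() for x in xs]
    grand_mean = mean(all_gains)
    ss_between = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in gains.values())
    ss_within = sum((x - mean(xs)) ** 2 for xs in gains.values() for x in xs)
    df_between = len(gains) - 1
    df_within = len(all_gains) - len(gains)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up scores: group A gains 1, 2, 3; group B gains 4, 5, 6.
example = {
    "A": [(10, 11), (10, 12), (10, 13)],
    "B": [(10, 14), (10, 15), (10, 16)],
}
print(gain_score_anova(example))  # (13.5, 1, 4)
```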

## RESULTS

**Table 2** summarizes the means and standard deviations of children's performance on all measures at the pre- and post-tests, by group. At the pre-test, the three groups were comparable in phonological skills [Chinese: F(2,34) = 0.134, p = 0.875; English: F(2,34) = 0.717, p = 0.495], verbal working memory [Chinese: F(2,34) = 0.076, p = 0.927; English: F(2,34) = 0.329, p = 0.722], and English word reading [F(2,34) = 0.575, p = 0.568]. The groups tended to differ in Chinese word reading [F(2,34) = 2.658, p = 0.085]. We submitted the pre- and post-test data to repeated measures ANOVAs, whose results are reported below. For Chinese word reading, we additionally included pre-test word reading as a covariate in one analysis to test whether post-test results could be explained by pre-test differences (see below).

#### Word Reading Fluency


Children's mean word reading fluency in Chinese and English before and after training is shown in **Figure 1** for both consistent and inconsistent stimuli. The results of the repeated measures ANOVAs are summarized in **Table 3**. Word reading fluency increased from the pre- to post-test (Time, p < 0.001), but more so for the two training groups than for the waitlist control group (Time × Group, p < 0.001). This training advantage for the metalinguistic and working memory groups was more pronounced for consistent than inconsistent items (Time × Group × Consistency, p < 0.05). The three-way interaction also qualified the main effect of consistency (p < 0.001), which reflected higher reading fluency for consistent than inconsistent items, and the Consistency × Group interaction (p < 0.01). It further qualified a Time × Consistency interaction, reflecting an increase in the consistency effect from the pre- to post-test. In addition, Chinese stimuli were read more fluently than English stimuli (Language, p < 0.05), and the consistency effect was larger in Chinese than in English (Language × Consistency, p < 0.001).

Given our strong a priori assumption that metalinguistic training would be more effective relative to working memory training in English than in Chinese, the absence of a modulation of the training effects by language was surprising. However, the lack of an interaction with language does not necessarily mean that the Time × Group interactions occur in both English and Chinese. We therefore computed follow-up repeated measures ANOVAs separately for each language. Pre–post t-tests were computed separately for each group and consistency condition to interpret the results of both the overall and the language-specific ANOVAs.

#### **Word Reading Fluency in Chinese**

The results of the repeated measures ANOVAs for Chinese are listed in **Table 3**. Reading fluency increased from the pre- to post-test (Time, p < 0.001), particularly in the metalinguistic and working-memory groups compared with the waitlist control group (Time × Group, p < 0.001). This training advantage of the metalinguistic and working memory groups was more pronounced for consistent than inconsistent Chinese characters (Time × Group × Consistency, p < 0.05). The three-way interaction also qualified the main effect of consistency (p < 0.001), which reflected higher reading fluency for consistent than inconsistent characters, as well as the Consistency × Time interaction (p < 0.01).

Post hoc t-tests revealed that both the metalinguistic and working memory training groups increased their reading fluency of consistent [metalinguistic: t(12) = −8.34, p < 0.001; working memory: t(9) = −7.58, p < 0.001] and inconsistent [metalinguistic: t(12) = −5.52, p < 0.001; working memory: t(9) = −4.05, p < 0.01] characters from before to after training. The pre–post change, however, was not significant for the control children [consistent: t(11) = −1.46, p = 0.171; inconsistent: t(11) = −1.53, p = 0.152].
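The post hoc comparisons are paired t-tests with n − 1 degrees of freedom, hence df = 12, 9, and 11 for groups of 13, 10, and 12 children. The negative t values reported are consistent with differences taken as pre-test minus post-test, so that an increase yields t < 0. The sketch below reproduces that sign convention with Python's standard `statistics` module on made-up scores. It computes only the t statistic and df; p-values would additionally require a t-distribution CDF (e.g., from SciPy), which is omitted here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic on differences pre - post, with df = n - 1.

    Improvements (post > pre) therefore give negative t values.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_value, n - 1


# Made-up fluency scores for four children.
t_value, df = paired_t([10, 12, 14, 16], [13, 14, 18, 19])
print(f"t({df}) = {t_value:.3f}")  # t(3) = -7.348
```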

Given the trend toward pre-training group differences in Chinese word reading, we also computed a repeated measures ANOVA on post-test reading fluency with the within-subject factor consistency (consistent vs. inconsistent), the between-subject factor group (metalinguistic vs. working memory vs. control), and pre-test reading fluency (averaged across the consistent and inconsistent conditions) as a covariate (Rausch et al., 2003). The results revealed significant main effects of group [F(2,31) = 22.46, p < 0.001, ηp² = 0.592] and consistency [F(1,31) = 9.17, p < 0.01, ηp² = 0.228], suggesting that post-training group differences in Chinese word reading fluency could not be explained by pre-training group differences.
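Partial eta squared can be recovered from an F statistic and its degrees of freedom as ηp² = (F × df_effect) / (F × df_effect + df_error). Applying this identity to the F values reported in the Results reproduces the reported effect sizes to three decimals, which offers a quick consistency check. The helper name `partial_eta_squared` is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# (F, df_effect, df_error, reported partial eta squared) from the Results.
reported = [
    (22.46, 2, 31, 0.592),  # Chinese reading, group (with covariate)
    (9.17, 1, 31, 0.228),   # Chinese reading, consistency (with covariate)
    (18.80, 1, 32, 0.370),  # phonological skills, time
    (7.43, 2, 32, 0.317),   # phonological skills, time x group
    (12.74, 1, 32, 0.285),  # working memory, time
    (10.02, 2, 32, 0.385),  # working memory, time x group
    (48.19, 1, 32, 0.601),  # working memory, language
]
for f_value, df1, df2, eta in reported:
    recovered = partial_eta_squared(f_value, df1, df2)
    print(f"F({df1},{df2}) = {f_value}: {recovered:.3f} (reported {eta})")
```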

#### **Word Reading Fluency in English**

The results of the repeated measures ANOVAs for English are also detailed in **Table 3**. Reading fluency increased from the pre- to post-test (Time, p < 0.001), especially for the metalinguistic and working memory groups compared with the waitlist control group (Time × Group, p < 0.01). This training advantage, however, was not significantly larger for consistent than inconsistent words, despite a pattern in the means (Time × Group × Consistency, p = 0.32). As in Chinese, the consistency effect (higher reading fluency for consistent than inconsistent words) increased from the pre- to post-test (Time × Consistency, p < 0.05). In addition, consistent words were read more fluently than inconsistent words in the metalinguistic and working-memory groups, whereas the control group showed a slightly reversed pattern (Consistency × Group, p < 0.05).

Post hoc t-tests revealed that reading fluency increased from the pre- to post-test in both the metalinguistic and working memory groups for both consistent [metalinguistic group: t(12) = −5.72, p < 0.001; working memory group: t(9) = −5.10, p < 0.001] and inconsistent items [metalinguistic group: t(12) = −3.77, p < 0.01; working memory group: t(9) = 3.32, p < 0.01]. Although the pre–post increases were smaller in the control group, they were still significant [consistent: t(11) = −2.51, p < 0.05; inconsistent: t(11) = −2.78, p < 0.05].

#### Phonological Skills

Children's mean performance on the Chinese and English phonological tests at the pre- and post-tests is presented in **Figure 2**. The repeated measures ANOVA revealed that performance increased from the pre- to post-test [Time, F(1,32) = 18.80, p < 0.001, ηp² = 0.370], particularly for the metalinguistic group [Time × Group, F(2,32) = 7.43, p < 0.01, ηp² = 0.317]. No other effects were significant (all ps > 0.14).

Post hoc t-tests revealed that phonological scores increased from the pre- to post-test in the metalinguistic group for both Chinese [t(12) = −4.46, p < 0.001] and English [t(12) = −4.22, p < 0.01], but no significant increase was found for the other two groups in either of the two languages (all ps > 0.21).

#### Verbal Working Memory

**Figure 2** also presents children's mean performance on the Chinese and English non-word repetition tasks at both time points. The repeated measures ANOVA revealed that working-memory performance increased from the pre- to post-test [Time, F(1,32) = 12.74, p < 0.01, ηp² = 0.285], particularly for the working-memory group [Time × Group, F(2,32) = 10.02, p < 0.001, ηp² = 0.385]. In addition, performance was better for Chinese than for English stimuli [Language, F(1,32) = 48.19, p < 0.001, ηp² = 0.601]. No other effects reached significance (ps > 0.15).

Post hoc t-tests revealed that working-memory performance increased from the pre- to post-test in the working memory group in both Chinese [t(9) = −2.30, p < 0.05] and English [t(9) = −2.91, p < 0.05], whereas the pre–post changes were not significant in the other two groups in either language (ps > 0.37).

## DISCUSSION

The present study aimed to evaluate two types of reading interventions to improve reading fluency in Chinese and English. Children were either taught to analyze the structure of Chinese and English words (i.e., identify information about pronunciation and meaning in Chinese characters and identify onset and rime in English words; identify morphemes in compound words), or trained to memorize increasingly long strings of Chinese and English syllables. The results indicated that both metalinguistic and working memory training could effectively enhance the respective skills they were meant to train within each language. Despite the training-specific effects, the two training groups improved more in Chinese reading fluency than the control group and this training effect was more pronounced for consistent characters. In other words, a better ability to analyze Chinese characters seems as beneficial to reading fluency in Chinese as an increased working memory capacity. Moreover, these beneficial effects seem not to be fundamentally different for reading Chinese and English, as they could also be found for the metalinguistic and working memory training in English.

#### Training Effects on Reading Fluency in Chinese

Our findings contribute to the literature on literacy intervention in Chinese in several ways. First, our study joins prior studies in demonstrating the causal influences of phonological (syllable awareness in particular) and morphological training on learning to read in Chinese (e.g., Chow et al., 2005, 2008; Packard et al., 2006; Wu et al., 2009; Zhou et al., 2012, 2015; Wang and McBride, 2017). Indeed, through our metalinguistic training, which combined phonological and semantic training, children learned to analyze character and word structure in Chinese for pronunciation and meaning. Thus, our results support the combined effects of phonological and semantic knowledge of Chinese characters in learning to read aloud in Chinese (Zhou et al., 2015).

Second, we evaluated whether training in verbal working memory could translate into reading gains in Chinese. This is an important addition to the literature: recent meta-analyses have cast doubt on the effectiveness of working-memory training in conferring benefits on general cognitive and scholastic performance (Melby-Lervåg and Hulme, 2013; Melby-Lervåg et al., 2016; though see Klingberg, 2010; Au et al., 2015, 2016, for the counter-argument). Our data nonetheless indicate that working-memory training produced both near- and far-transfer effects. Compared with waitlist controls, children who were trained with word span tasks for 8 weeks had a larger verbal working memory span in Chinese immediately after training. These children also improved in their ability to read Chinese characters after training. Our findings, together with other emerging evidence from working-memory training studies in Chinese (Luo et al., 2013; Yang et al., 2017), suggest that working-memory training is effective in facilitating reading aloud in Chinese.

Third, our study showed that reading training that included consistent and inconsistent characters led to a larger increase in reading fluency for consistent characters. This larger increase, however, was not restricted to the metalinguistic group; it also occurred in the working-memory group, which trained with the same stimulus material. Thus, mere exposure to consistent and inconsistent characters during training may be sufficient to sensitize children to consistency properties and enable them to draw on this information for reading.

Crucially, our study enriches the literature by contrasting the effects of two different teaching approaches to reading acquisition in Chinese. There has been separate evidence that metalinguistic and working-memory training can each improve reading skills in Chinese (e.g., Packard et al., 2006; Wu et al., 2009; Zhou et al., 2012; Luo et al., 2013; Wang and McBride, 2017; Yang et al., 2017). However, given the complex and ambiguous phonological information in the Chinese script, we wondered whether analyzing character and word structure or learning character pronunciations by rote would be more promising for learning to read in Chinese. We therefore directly compared their training effects in a single study. We found domain-specific training effects: the metalinguistic and working memory training each enhanced the skills it was intended to train. To our surprise, our data showed that an increased ability to reflect on character and word structure in Chinese is as beneficial as a larger working memory capacity in improving reading fluency in Chinese. Although the two types of training produced similar reading gains at the behavioral level, the mechanisms underlying the improvements remain unknown (see "Limitations and Directions for Future Research" below).

#### Training Effects on Reading Fluency in English

The training effects in English were similar to those in Chinese. Although the control children also improved in English reading from the pre- to post-test, this increase was larger for the metalinguistic and working-memory groups. The increase in phonological skills and reading fluency in the children trained in metalinguistic skills agrees with a large number of studies in English and other alphabetic languages (e.g., Bus and van Ijzendoorn, 1999; Ehri et al., 2001). More surprising is the effect found in the working-memory training group. As in Chinese, the working-memory group specifically increased in working-memory performance but also showed an improvement in reading fluency similar to that of the metalinguistic group. We speculate that this beneficial effect on reading fluency in English may stem from the use of linguistic material in our working-memory training. Among the studies in the meta-analysis by Melby-Lervåg and Hulme (2013), those that failed to find a large effect size on word decoding or vocabulary mostly used non-linguistic materials such as digits, shapes, pictures, and sounds of objects as training stimuli (e.g., Horowitz-Kraus and Breznitz, 2009; Holmes et al., 2010; Van der Molen et al., 2010; Shiran and Breznitz, 2011). Interestingly, when word endings and complete words were used in working memory training, a significant, large effect on children's vocabulary was reported (Alloway and Alloway, 2009). This suggests that working memory training may also benefit literacy acquisition in English, at least when linguistic material is used, as in the current study. Whether such beneficial effects of working-memory training are restricted to learning English as a second language, or more narrowly to learning English as a second language among native Chinese speakers, remains to be shown in future studies.

#### Limitations and Directions for Future Research

A limitation of the current study is the small group size: 13 and 12 children in the metalinguistic and control groups, respectively, and only 10 children in the working-memory group. Although we randomly assigned the children to the three groups, the small group size may have contributed to slight pre-existing group differences. For instance, the control group performed better than the other two groups in Chinese word reading at the pre-test, although the difference was not statistically significant. Considering all the abilities tested at the pre-test, we think the three groups are reasonably comparable. Nevertheless, larger groups would make equivalent groups more likely.

We should also be cautious in interpreting the results because our control group was a waitlist control who did not engage in comparable activity between the pre- and post-test. Without an active control group, some may speculate that the effects found in the metalinguistic and working memory groups were driven by the one-to-one attention given by the parents or experimenters during training. Having said that, the domain-specific effects with working memory gains in the working memory group and phonological gains in the metalinguistic group argue against this notion. The domain-specific effects in the two training groups show that our interventions indeed worked and lend the results credibility.

Another potential limitation is that training was partly provided by parents, who may have conducted the training less reliably. However, the training log books showed that all parents completed all the designated training activities with their child, and parents across the three groups spent comparable time on the training. Moreover, trained experimenters visited the children once per week to monitor the progress and quality of the parent training. We believe that these weekly experimenter-led sessions could have reduced differences in training quality and increased the reliability of the parent training.

Even though the current study did not find differences between metalinguistic and working memory training, such differences may exist but could have gone undetected for several reasons. First, the small group size may not have provided sufficient power to detect smaller effects between the training types. Second, both drop-outs were from the working memory training group, which suggests that the effect of the working memory training may be overestimated. Third, our participants were second-grade children who were beginning readers. Their limited vocabulary may restrict their ability to analyze and extract information from the phonetic and semantic components of characters and words. These beginning readers may rely more on rote learning, so the effect of our working memory training may again be overestimated. Future research should also test older children with a wider vocabulary so that the effects of metalinguistic and working memory training can be compared across age groups. Finally, even if the two training programs have similar effects in typically developing children, one of them may still have stronger effects in dyslexic children. It would therefore be worthwhile to replicate the current training study with Chinese dyslexic children. Findings from dyslexic children would be a strong addition to our evaluation of metalinguistic and working memory training in learning to read Chinese. After all, reading remediation programs are targeted mainly at dyslexic children, so direct evidence comparing the efficacy of training programs in dyslexic children is warranted.

The same arguments about the lack of differences between metalinguistic and working memory training also apply to the lack of differences between training in Chinese and English: more powerful studies may still find differences in the relative efficacy of the two training methods in the two languages. Also, because our participants are second-language learners of English, they may apply their learning strategies from L1 Chinese to learning L2 English. This transfer of learning strategies may give rise to similar results in Chinese and English. It is thus important to run follow-up studies with native English-speaking or more balanced bilingual children to re-examine the effects of different training types on learning to read in English. Moreover, future studies should find a way to make the working memory training more engaging to prevent training-specific drop-outs such as those in the current study.

Our metalinguistic training combined phonological and morphological training, so both phonological and semantic information about the characters and words was taught. Therefore, it remains unclear whether phonological training, morphological training, or both drove the improvement in learning to read. Future studies could include separate phonological and morphological training groups to disentangle their respective effects.

Another direction for future research is to conduct neural evaluations of reading training. Although the children who received metalinguistic and working memory training showed similar reading gains at the behavioral level, the effects at the neural level may still differ. In the case of dyslexia, similar behavioral improvements in reading may result either from remediation of deficient processes or from compensatory activity (Gabrieli, 2009). In future training studies, we suggest recording children's neural activity in response to word reading in Chinese in order to examine training-induced neural plasticity and to determine the mechanisms underlying reading improvements. Moreover, pretraining neural measures may predict training success (Gabrieli et al., 2015; Karipidis et al., 2017), which could offer new perspectives for developing individualized reading training for struggling readers.

## AUTHOR CONTRIBUTIONS

All authors contributed to the study design. CM, XT, and T-SS contributed test materials. T-SS collected the data. T-SS and UM collaborated on the data analyses and drafted the manuscript. CM provided critical revisions. All authors approved the final version of the manuscript.

## FUNDING

This work was supported by the Brain and Mind Institute (BMI) at The Chinese University of Hong Kong (4930740) and by the Health and Medical Research Fund (HMRF) of the Food and Health Bureau, Hong Kong (04152496).

## ACKNOWLEDGMENTS

We thank all the children and parents for participating in this training study. We are also grateful to the tutors and student research assistants for conducting the training and testing sessions.

## REFERENCES

Shu, H., and Anderson, R. C. (1997). Role of radical awareness in the character and word acquisition of Chinese children. Read. Res. Q. 32, 78–89. doi: 10.1598/

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Siu, McBride, Tse, Tong and Maurer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Functional Neuroanatomy of Letter-Speech Sound Integration and Its Relation to Brain Abnormalities in Developmental Dyslexia

#### Fabio Richlan\*

Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Salzburg, Austria

This mini-review compares the brain systems associated with developmental dyslexia with the brain systems associated with letter-speech sound (LSS) integration. First, the findings on the functional neuroanatomy of LSS integration are summarized in order to obtain a comprehensive overview of the brain regions involved in this process. To this end, neurocognitive studies investigating LSS integration in both normal and abnormal reading development are taken into account. The neurobiological basis of LSS integration is subsequently compared with existing neurocognitive models of functional and structural brain abnormalities in developmental dyslexia, focusing on key superior temporal and occipito-temporal (OT) regions. Finally, the commonalities and differences between the brain systems engaged by LSS integration and the brain systems showing abnormalities in developmental dyslexia are examined. This comparison will add to our understanding of the relation between LSS integration and normal and abnormal reading development.

#### Edited by:

Gorka Fraga González, University of Zurich, Switzerland

#### Reviewed by:

Nienke Van Atteveldt, VU University Amsterdam, Netherlands

Gaël Jobard, Université de Bordeaux, France

\*Correspondence:

Fabio Richlan fabio.richlan@sbg.ac.at

Received: 13 September 2018 Accepted: 18 January 2019 Published: 01 February 2019

#### Citation:

Richlan F (2019) The Functional Neuroanatomy of Letter-Speech Sound Integration and Its Relation to Brain Abnormalities in Developmental Dyslexia. Front. Hum. Neurosci. 13:21. doi: 10.3389/fnhum.2019.00021

Keywords: audiovisual integration, brain, development, dyslexia, grapheme-phoneme conversion, letter-speech sound integration, magnetic resonance imaging, reading

## DEVELOPMENTAL DYSLEXIA

Developmental dyslexia is a neurocognitive disorder characterized by a severe and persistent impairment in the acquisition of reading skills. According to the diagnostic criteria of DSM-IV (American Psychiatric Association, 2000) and ICD-10 (World Health Organization, 2007), performance in reading accuracy, fluency, comprehension and/or spelling is substantially below the performance expected from the person's chronological age, intelligence, motivation, sensory acuity and educational environment. In addition, these difficulties significantly interfere with academic achievement or activities in everyday life requiring reading skills.

During the last two decades, there have been significant advances in the neurobiological understanding of developmental dyslexia. Across many languages and writing systems, neurocognitive studies have identified brain regions critically involved in typical and dyslexic reading using functional magnetic resonance imaging (fMRI; e.g., Eden et al., 1996; Shaywitz et al., 1998; Temple et al., 2003; Siok et al., 2004; Gaab et al., 2007; Hoeft et al., 2007; van der Mark et al., 2009), electroencephalography (EEG; e.g., Duffy et al., 1980; Brandeis et al., 1994; Maurer et al., 2007), magnetoencephalography (MEG; e.g., Helenius et al., 1999; Simos et al., 2000; Salmelin, 2007), and positron-emission tomography (PET; e.g., Horwitz et al., 1998; Brunswick et al., 1999; Paulesu et al., 2001).

Qualitative narrative reviews and quantitative meta-analyses of neuroimaging studies have converged on a functional neuroanatomical model of developmental dyslexia. Specifically, altered brain activation in dyslexic readers has consistently been reported in left posterior temporo-parietal (TP) cortex (middle and superior temporal, supramarginal and angular gyri), left occipito-temporal (OT) cortex (inferior temporal and fusiform gyri), and left frontal cortex (inferior frontal and precentral gyri). For the posterior brain regions (i.e., TP and OT cortices), the dominant finding is underactivation in dyslexic compared with typical readers, while the picture is less clear for the anterior regions. Meta-analytic evidence points to dyslexic overactivation in the left precentral gyrus and underactivation in the left inferior frontal gyrus (IFG; Richlan et al., 2009, 2011; Martin et al., 2016; Hancock et al., 2017). In addition, there are occasional reports of dyslexic activation abnormalities in other bilateral cortical, subcortical, and cerebellar regions, but these findings are rarely consistent across studies.

## LIMITATIONS AND OPEN ISSUES

Importantly, dyslexic brain dysfunctions have predominantly been assessed in the context of whole-word studies in the visual modality (i.e., studies visually presenting words or nonwords) utilizing reading-related tasks (e.g., lexical decision, semantic judgment, rhyme judgment, etc.). Undoubtedly, these studies have contributed tremendously to our understanding of the neural mechanisms during visual word recognition in typical and dyslexic readers (for a recent overview see Mascheretti et al., 2017). It remains an open issue, however, to what extent these findings generalize to natural reading processes and especially to normal and abnormal reading development, which requires the initial integration of letters and speech sounds and the subsequent automation of this process.

Unfortunately, comparatively few studies have investigated brain responses of dyslexic readers to unimodal auditory stimulation (e.g., Corina et al., 2001; Gaab et al., 2007), and even fewer to multimodal audiovisual stimulation (e.g., Blau et al., 2009; Kronschnabel et al., 2014). Multimodal audiovisual integration, particularly the binding of letters (or graphemes) and speech sounds (phonemes), is a crucial process, especially during the early stages of literacy acquisition. Understanding these proximal (neuro-)cognitive functions at the core of learning to read is essential for a holistic understanding of typical and dyslexic reading development.

To this end, this mini-review summarizes the findings on the functional neuroanatomy of letter-speech sound (LSS) integration in order to obtain a comprehensive overview of the brain regions involved in this process. These brain regions are subsequently compared with existing neurocognitive models of reading-related functional and structural brain abnormalities in developmental dyslexia. Investigating the commonalities and differences between the brain systems engaged by LSS integration and the brain systems showing abnormalities in developmental dyslexia will add to our understanding of the relation between letter-speech sound integration and normal and abnormal reading development.

## LETTER-SPEECH SOUND INTEGRATION

It has been aptly argued that the development of automated LSS integration plays a crucial role in the acquisition of fluent reading skills (e.g., Blomert, 2011). Consequently, failure to develop automated LSS integration results in impaired reading fluency. A close link has therefore been suggested between the development of automated processing of LSS associations and the emergence of a functional neuroanatomical system for skilled reading. Both behavioral and functional neuroimaging studies have demonstrated less efficient LSS integration in children and adults with dyslexia compared with typically reading controls (e.g., Blau et al., 2009, 2010). In addition, recent intervention studies indicate that training LSS correspondences could be a promising way to remediate slow and effortful reading in developmental dyslexia (e.g., Fraga González et al., 2015).

As explained by Blomert (2011), learning to read in alphabetic orthographies starts with learning a script code consisting of LSS pairs. Typically developing children learn the associations between letters (or graphemes) and speech sounds (phonemes) within months, often even before the onset of formal reading instruction. It takes, however, considerably longer to process these LSS associations automatically as newly constructed audiovisual (AV) objects. In beginning dyslexic readers, possibly as the result of an independent deficit or as a consequence of other deficits, this fundamental coupling of letters and speech sounds is substantially disturbed, and the difficulties frequently persist into adulthood.

Blomert (2011) hypothesized that a specific deficit in the binding of sublexical orthographic and phonological information may not only constitute the immediate source of reading problems in developmental dyslexia, but may also explain the severe and persistent deficit regarding reading fluency—the lead symptom of dyslexia in shallow alphabetic orthographies (e.g., Wimmer, 1993; Torppa et al., 2010; Landerl et al., 2013). Undoubtedly, the proximal cause of developmental dyslexia is a highly controversial topic and the field certainly does not lack hypotheses about underlying (neuro-) cognitive deficits. The present mini-review is aimed at highlighting the possible role of an LSS integration deficit in dyslexia. In doing so, it does not deny or exclude other potentially relevant deficit explanations for the cause of developmental reading problems.

As explained in detail in the next section, in skilled readers LSS integration is linked to regions of the bilateral auditory cortex including the planum temporale (PT) and the bilateral heteromodal superior temporal sulcus (STS). The initial formation and subsequent automation of newly constructed grapheme-phoneme associations influence letter-specific processing and the build-up of visual-orthographic representations in the left ventral OT cortex. In developmental dyslexia, a neurocognitive deficit in the integration of letters and speech sounds is thought to impede the binding of orthographic and phonological information and, consequently, the emergence of the left ventral OT "reading skill zone" required for fast, fluent, and seemingly effortless reading.

Regarding the automation of LSS associations, important evidence comes from electrophysiological (i.e., EEG) studies (e.g., Froyen et al., 2008, 2009, 2011; Žarić et al., 2014, 2015). These studies use the mismatch negativity (MMN), an event-related potential component regarded as a valid indicator of automatic processing. For example, Froyen et al. (2009) showed that advanced readers (4 years of reading instruction) but not beginning readers (1 year of reading instruction) exhibited an enhanced MMN amplitude, indicating fast and automatic LSS integration. Furthermore, Froyen et al. (2011) reported that this response pattern was absent in 11-year-old dyslexic children. Interestingly, although lacking the early, automatic processing stage, the dyslexic children showed a late negativity effect, which was similar to that of beginning readers and was interpreted as reflecting non-automatic LSS matching.
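To make the MMN measure more concrete, the Python sketch below illustrates one common way of quantifying it: the averaged event-related potential (ERP) to standard stimuli is subtracted from the ERP to deviant stimuli, and the most negative point of this difference wave is located within a latency window. It is a simplified illustration under stated assumptions, not the analysis pipeline of the cited studies: the function name `mmn_peak`, the single-channel input, the default 100–250 ms search window, and the toy data are all hypothetical choices.

```python
from typing import Sequence, Tuple


def mmn_peak(standard: Sequence[float], deviant: Sequence[float],
             srate_hz: float,
             window_ms: Tuple[float, float] = (100.0, 250.0)) -> Tuple[float, float]:
    """Return (latency_ms, amplitude) of the most negative point of the
    deviant-minus-standard difference wave within window_ms.

    Assumes (hypothetically) a single channel, both ERPs starting at
    stimulus onset (t = 0 ms), and a shared sampling rate srate_hz.
    """
    if len(standard) != len(deviant):
        raise ValueError("ERPs must have the same number of samples")
    # Difference wave: deviant ERP minus standard ERP, sample by sample.
    diff = [d - s for s, d in zip(standard, deviant)]
    ms_per_sample = 1000.0 / srate_hz
    start = int(round(window_ms[0] / ms_per_sample))
    stop = min(len(diff), int(round(window_ms[1] / ms_per_sample)) + 1)
    if start >= stop:
        raise ValueError("search window lies outside the recorded epoch")
    # The MMN is a negativity, so search for the minimum in the window.
    idx = min(range(start, stop), key=lambda i: diff[i])
    return idx * ms_per_sample, diff[idx]


# Toy example: 100 Hz sampling (10 ms per sample), 0-300 ms epoch.
standard = [0.0] * 31
deviant = [0.0] * 31
deviant[5] = -3.0    # 50 ms: outside the window, ignored
deviant[15] = -2.0   # 150 ms: inside the window
print(mmn_peak(standard, deviant, srate_hz=100.0))
```

In a crossmodal design, the comparison of interest would be between MMN peaks obtained in different conditions (for instance, unimodal versus audiovisual presentation), with a more negative peak corresponding to a larger, i.e., enhanced, MMN.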

## FUNCTIONAL NEUROIMAGING STUDIES ON LETTER-SPEECH SOUND INTEGRATION

Functional neuroimaging studies have identified several brain regions associated with LSS integration. These include bilateral temporal, OT, and inferior frontal regions. Specifically, a major role in basic sensory AV integration is attributed to the bilateral heteromodal STS and adjacent superior temporal gyrus (STG) and PT. More specifically, evidence for crucial engagement of the bilateral STS in grapheme-phoneme conversion was provided by the presence of congruency effects (i.e., differences between LSS pairs with congruent or incongruent orthographic and phonological information) in typical readers (e.g., van Atteveldt et al., 2004, 2007). **Figure 1** provides a schematic overview of the most important brain regions discussed in this mini-review and their interconnections via the arcuate fasciculus.

In order to disentangle basic sensory aspects from higher-level associative (e.g., orthographic-phonological) aspects of AV integration, many functional neuroimaging studies on LSS integration use the following rationale (see Hocking and Price, 2008): activation in response to multisensory AV stimuli is compared with activation in response to unisensory auditory and unisensory visual stimuli to identify basic sensory aspects of AV integration. In some cases the multisensory AV stimulation results in higher activation than the summed unisensory auditory + visual stimulation (i.e., super-additivity effect), whereas in other cases it results in lower activation than the summed unisensory auditory + visual stimulation (i.e., sub-additivity effect). Both effects can be interpreted as indicating aspects of basic sensory AV integration. There are, however, limitations to this approach due to potential blood-oxygen-level-dependent (BOLD) saturation effects in fMRI (Goebel and van Atteveldt, 2009).
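The additivity criterion described above amounts to comparing the AV response with the sum of the unisensory responses, i.e., examining the sign of AV − (A + V). The Python sketch below expresses this comparison for single activation estimates. It is an illustrative simplification that assumes hypothetical per-condition estimates (e.g., GLM beta values relative to a common baseline); the function name and the `tol` parameter are hypothetical, and the tolerance is only a crude stand-in for a proper statistical test of the interaction.

```python
def additivity(av: float, a: float, v: float, tol: float = 0.0) -> str:
    """Classify a multisensory response relative to the summed unisensory ones.

    av, a, v: hypothetical activation estimates for audiovisual,
    auditory-only, and visual-only stimulation, all relative to the same
    baseline. tol: margin within which the response counts as additive.
    """
    summed = a + v
    if av > summed + tol:
        return "super-additive"   # AV > A + V
    if av < summed - tol:
        return "sub-additive"     # AV < A + V
    return "additive"             # AV is approximately A + V


print(additivity(3.0, 1.0, 1.5))  # AV exceeds A + V = 2.5
print(additivity(2.0, 1.0, 1.5))  # AV falls short of A + V = 2.5
```

Because BOLD responses can saturate, a sub-additive result in particular should not be read as evidence against integration without further checks, in line with the caveat noted above.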

In order to test for higher-level associative (e.g., orthographic-phonological) aspects of AV integration, activation in response to congruent LSS pairs is compared with activation in response to incongruent LSS pairs (i.e., congruency effect). Congruent means that the orthographic information represented by the visual letter stimulus matches the phonological information represented by the (simultaneously or sequentially presented) auditory speech sound stimulus; in incongruent LSS pairs, this information does not match. Usually, the presence of a congruency effect (regardless of whether congruent LSS pairs result in higher activation than incongruent LSS pairs or vice versa) is taken as an indicator of the engagement of a brain region in AV grapheme-phoneme conversion.
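A congruency effect is, in essence, a paired contrast between two conditions within each participant. The Python sketch below computes the mean congruent-minus-incongruent difference and a paired t statistic across participants for one region. The function name and toy values are hypothetical, and an actual fMRI analysis would be carried out voxel- or region-wise with appropriate correction for multiple comparisons. As in the text, the sign is preserved, because either direction may count as a congruency effect.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def congruency_effect(congruent: Sequence[float],
                      incongruent: Sequence[float]) -> Tuple[float, float]:
    """Return (mean difference, paired t) for congruent minus incongruent.

    Each input holds one hypothetical activation estimate per participant,
    in the same participant order. A positive mean difference means
    congruent > incongruent; a negative one means the reverse.
    """
    if len(congruent) != len(incongruent):
        raise ValueError("need one value per participant in both conditions")
    if len(congruent) < 2:
        raise ValueError("need at least two participants")
    diffs = [c - i for c, i in zip(congruent, incongruent)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    d_mean = mean(diffs)
    t = d_mean / (sd / math.sqrt(len(diffs)))  # degrees of freedom: n - 1
    return d_mean, t


print(congruency_effect([2.0, 3.0, 4.0], [1.0, 1.0, 1.0]))
```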

The tasks employed by the different functional neuroimaging studies vary considerably and, unsurprisingly, were shown to have a substantial effect on the degree of activation of the identified brain regions (van Atteveldt et al., 2007) and on the presence and/or direction of the congruency effect (Kronschnabel et al., 2014). The tasks employed include passive perception (viewing and/or listening; e.g., van Atteveldt et al., 2004), active matching (i.e., indicating via button press whether the letter and the speech sound match; e.g., van Atteveldt et al., 2007), specific speech sound target detection (i.e., detecting /a/; e.g., Blau et al., 2008), non-letter and non-speech sound target detection (i.e., detecting simple visual (###), auditory (piano sound), and AV targets among LSS pairs; e.g., Kronschnabel et al., 2014), and a one-back task (i.e., detecting repeated stimuli; e.g., Francisco et al., 2018).

Across studies and despite different functional activation tasks, age groups, and orthographies, the most consistently identified brain region associated with both basic sensory and higher-level associative AV integration seems to be the bilateral heteromodal STS. Here, the typical findings are (i) higher activation for multisensory compared with unisensory stimulation and (ii) higher activation for congruent compared with incongruent LSS pairs in skilled readers (e.g., van Atteveldt et al., 2004). As already mentioned, the exact locations and response profiles of the activated brain regions depend on the in-scanner functional activation task. In addition, the response might be blurred by the limited temporal resolution of the BOLD fMRI signal. In this respect, EEG or MEG studies (e.g., Herdman et al., 2006; Froyen et al., 2011; Žarić et al., 2015), which provide high temporal resolution, might be more informative.

Another way to circumvent specific limitations of the BOLD signal, namely saturation effects and spatial averaging, is to use an fMRI adaptation design. This design exploits the well-known phenomenon of repetition suppression (i.e., reduced neural activity in response to stimulus repetitions) to investigate the functional specificity of the neural populations within voxels. van Atteveldt et al. (2010) used such a design and identified several small clusters along the STG and STS showing stronger adaptation to repetitions of congruent than of incongruent LSS pairs. This finding was taken as evidence for the existence of multisensory neurons in the STG/STS that are tuned to AV content relatedness.
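The logic of this adaptation analysis can be sketched as follows. Here adaptation is expressed as the proportional reduction of the response from novel to repeated presentations, computed separately for congruent and incongruent LSS pairs; a positive difference between the two indices indicates stronger adaptation to congruent pairs, the pattern interpreted above as evidence for congruency-tuned neurons. The proportional index and the function names are hypothetical choices for illustration only; adaptation effects can be quantified in other ways, and the contrasts used by van Atteveldt et al. (2010) may differ.

```python
def adaptation_index(novel: float, repeated: float) -> float:
    """Proportional repetition suppression: (novel - repeated) / novel.

    Assumes a positive response to novel stimuli (e.g., a hypothetical
    percent signal change); 0 means no adaptation, larger means more.
    """
    if novel <= 0:
        raise ValueError("novel response must be positive")
    return (novel - repeated) / novel


def congruency_specific_adaptation(cong_novel: float, cong_repeated: float,
                                   incong_novel: float,
                                   incong_repeated: float) -> float:
    """Adaptation to congruent pairs minus adaptation to incongruent pairs.

    > 0: stronger adaptation to congruent pairs (content-specific tuning);
    about 0: similar adaptation to both (a more general effect).
    """
    return (adaptation_index(cong_novel, cong_repeated)
            - adaptation_index(incong_novel, incong_repeated))


print(congruency_specific_adaptation(2.0, 1.0, 2.0, 1.5))
```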

In addition to the specific adaptation effect in the STG and STS, van Atteveldt et al. (2010) identified a network of bilateral OT regions that showed a more general adaptation effect. That is, these regions adapted to repetitions of both congruent and incongruent LSS pairs, indicating sensitivity to letters, speech sounds, or both. Activation in other regions often identified in fMRI studies on LSS integration, such as the IFG, was assumed to be more related to the type of task employed and the corresponding explicit decision making in active matching paradigms (Blomert, 2011). Likewise, activation in the inferior parietal lobule (IPL) is often related to task demands requiring executive functions, particularly in the presence of ambiguity (Oberhuber et al., 2016; Vignali et al., 2019).

Based on the results from a carefully designed fMRI study, Hocking and Price (2008) postulated a more general role of the bilateral posterior STS in conceptual matching, not necessarily restricted to AV integration. Most importantly, they found that the bilateral posterior STS responds in the same way to crossmodal AV conceptual matching as to intramodal auditory or intramodal visual matching when task, attention and stimuli are controlled. They concluded that the posterior STS is not specifically dedicated to multimodal integration but is part of a bilateral brain network including OT, IFG and IPL regions subserving conceptual matching, irrespective of input modalities.

In line with the idea of a functional brain network supporting AV, auditory-auditory, or visual-visual conceptual matching, Blomert (2011) emphasized the importance of the gradual tuning of OT and IPL regions for increasingly automated LSS integration. This tuning and automation constitute one of the first milestones in reading acquisition and provide the basis for the emergence of an efficient functional neuroanatomical network for the integration of letters and speech sounds (van Atteveldt et al., 2009) and for skilled reading (Brem et al., 2010; Schurz et al., 2014a; Martin et al., 2015; Schuster et al., 2015). This very network has been shown to be disrupted in developmental dyslexia (e.g., Richlan, 2012), as discussed in detail in the next section.

## THE FUNCTIONAL NEUROANATOMY OF LETTER-SPEECH SOUND INTEGRATION AND ITS RELATION TO BRAIN ABNORMALITIES IN DEVELOPMENTAL DYSLEXIA

In the field of developmental dyslexia, functional neuroimaging studies on LSS integration are relatively recent (Blau et al., 2009, 2010; Holloway et al., 2013; Kronschnabel et al., 2014; Karipidis et al., 2017, 2018). Blau et al. (2009, 2010) followed up on the seminal fMRI studies by van Atteveldt et al. (2004, 2007) and used their AV LSS integration paradigm with dyslexic adults (Blau et al., 2009) and dyslexic children (Blau et al., 2010). In essence, the dyslexic readers did not exhibit the behavioral and neurofunctional congruency effects demonstrated by the typical readers. That is, the dyslexic readers did not show higher activation for congruent compared with incongruent LSS pairs in the brain regions (e.g., STS) identified as part of the AV integration network in skilled readers (see e.g., van Atteveldt et al., 2004).

Furthermore, strong evidence for structural abnormalities (i.e., less gray matter volume) in STG and STS regions in developmental dyslexia was reported in quantitative coordinate-based meta-analyses and multi-center studies across different laboratories and countries (Richlan et al., 2013; Eckert et al., 2016). Taken together, these findings were interpreted as indicating a disruption in the functional neuroanatomical network supporting automated AV integration and grapheme-phoneme conversion in developmental dyslexia. Interestingly, two structural MRI studies with pre-reading children found that children with a familial risk for developmental dyslexia exhibited reduced gray matter volume in bilateral STG/STS regions even before formal reading instruction (Raschle et al., 2011; Black et al., 2012). Importantly, in these young children the reduction in gray matter volume can hardly be attributed to a reduced amount of reading experience.

Similar to the findings in the Dutch readers of Blau et al. (2009, 2010), Kronschnabel et al. (2014) reported differences in congruency effects between typical and dyslexic readers in a sample of native German-speaking Swiss adolescents. Brain regions showing group differences included the STS, OT, IFG, and IPL. Interestingly, the direction of the congruency effect differed from that in the previous studies. This is most probably attributable to subtle differences in the experimental task (which avoided active monitoring of the congruency condition by directing the participants' attention away from the LSS pairs; see previous section on tasks), to the orthographic depth of the investigated language (see Holloway et al., 2013 for similar results in native English readers), and/or to developmental factors.

Recently, Karipidis et al. (2017, 2018) investigated the emergence of AV integration in pre-reading children at varying risk for developmental dyslexia by training artificial LSS correspondences. The artificial LSS pairs, consisting of unfamiliar false-font characters coupled with familiar phonemes, were familiarized in a single training session of about 10–30 min. The fMRI data acquired after the training session revealed associations of individual learning rate, phonological awareness, and familial history of developmental dyslexia with the degree of activation in a brain network consisting of bilateral STS/STG, OT, frontal, and parietal regions.

The results of these functional neuroimaging studies are fully compatible with the notion of a gradual tuning of a distributed brain network subserving increasingly automated AV binding postulated by Blomert (2011). The specific crossmodal binding deficit between letters and speech sounds in impaired readers is thought to be reflected in defective functional and structural connectivity between the brain regions constituting the reading network in skilled readers including occipital, temporal, parietal and frontal brain regions (Richlan, 2012, 2014). The disrupted connectivity between uni- and multisensory brain regions particularly in temporal and occipital cortices may hamper the incremental emergence of fast and efficient single- and multi-letter recognition in the putative "reading skill zone" of the left ventral OT cortex in developmental dyslexia.

The left ventral OT cortex has been identified as underactivated in dyslexic readers compared with age-matched controls across experimental tasks (Richlan et al., 2009), age groups (Richlan et al., 2011), and orthographies (Paulesu et al., 2001; Martin et al., 2016). It was proposed that in typical readers the left ventral OT cortex is engaged not only by fast and effortless visual word processing but even more so by the processing of unfamiliar letter strings, which relies on phonological decoding (Richlan et al., 2010; Schurz et al., 2010; Wimmer et al., 2010). Accordingly, the left ventral OT cortex in skilled readers serves as an interface area providing access from visual-orthographic information to phonological information (Price and Devlin, 2011).

In typical readers, left ventral OT, temporal and frontal regions are functionally connected, whereas in dyslexic readers this functional coupling is impaired. The reduced functional connectivity between left ventral OT and superior temporal/inferior frontal brain regions was shown for both reading-related (e.g., van der Mark et al., 2011; Olulade et al., 2015) as well as resting-state activation (e.g., Schurz et al., 2014b). Consistent with these observations are findings from neuroimaging studies on structural connectivity using diffusion tensor imaging. As evidenced by the meta-analysis by Vandermosten et al. (2012), dyslexic readers exhibit reduced integrity of the major white matter fiber tracts connecting the brain regions engaged during reading processes. Importantly, the main difference in structural integrity between typical and dyslexic readers was identified in the left TP white matter.
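Functional connectivity of this kind is commonly operationalized as the correlation between the BOLD time series of two regions. The Python sketch below computes a Pearson correlation between two regional time series. It assumes hypothetical, already preprocessed signals of equal length (e.g., mean left ventral OT and superior temporal time courses) and omits steps such as nuisance regression, filtering, and group-level statistics that a real analysis requires; the function name is likewise hypothetical.

```python
import math
from typing import Sequence


def pearson_connectivity(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two regional time series (one value per volume)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("time series must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant time series")
    return sxy / math.sqrt(sxx * syy)


print(pearson_connectivity([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

In this simplified framing, reduced functional connectivity in dyslexic readers would correspond to lower correlation values between such regions at the group level.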

Although it is not entirely resolved which of the various potential fiber tracts is specifically affected (see Ben-Shachar et al., 2007), convincing evidence points to the left arcuate fasciculus (Dehaene et al., 2015). It connects occipital, temporal, parietal, and frontal language regions and was shown to be among the first brain systems to change anatomically during reading acquisition. Specifically, an increase in fractional anisotropy and a decrease in perpendicular diffusivity indicated a microstructural improvement of the TP aspect of the arcuate fasciculus in response to learning to read (Thiebaut de Schotten et al., 2012; Yeatman et al., 2012). Based on these properties, the left arcuate fasciculus is assumed to play an important role, particularly during early stages of reading development, by subserving LSS integration and grapheme-phoneme conversion, which in turn constitute the prerequisite for self-reliant phonological word decoding.
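Fractional anisotropy (FA) and perpendicular (radial) diffusivity are both derived from the three eigenvalues λ1 ≥ λ2 ≥ λ3 of the diffusion tensor in each voxel: FA = sqrt(1/2) · sqrt((λ1 − λ2)² + (λ2 − λ3)² + (λ3 − λ1)²) / sqrt(λ1² + λ2² + λ3²), and radial diffusivity = (λ2 + λ3) / 2. The Python sketch below implements these standard definitions. It assumes that the eigenvalues have already been estimated by tensor fitting; the function name and example values are hypothetical. Higher FA together with lower radial diffusivity corresponds to the pattern described above as microstructural improvement.

```python
import math
from typing import Tuple


def dti_metrics(l1: float, l2: float, l3: float) -> Tuple[float, float]:
    """Return (fractional anisotropy, radial diffusivity) from tensor eigenvalues."""
    lam = sorted((l1, l2, l3), reverse=True)  # lam[0] is the axial eigenvalue
    num = ((lam[0] - lam[1]) ** 2 + (lam[1] - lam[2]) ** 2
           + (lam[2] - lam[0]) ** 2)
    den = lam[0] ** 2 + lam[1] ** 2 + lam[2] ** 2
    if den == 0:
        raise ValueError("all eigenvalues are zero")
    fa = math.sqrt(0.5 * num / den)   # 0 = isotropic, 1 = maximally anisotropic
    radial = (lam[1] + lam[2]) / 2    # perpendicular (radial) diffusivity
    return fa, radial


# Hypothetical eigenvalues (e.g., in units of 10^-3 mm^2/s) for a coherent tract.
print(dti_metrics(1.7, 0.3, 0.3))
```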

The idea that developmental dyslexia results from impaired connections between brain regions for vision and language was first put forward by Geschwind (1965a,b). This idea has received renewed support (see Ramus and Szenkovits, 2008; Boets et al., 2013) because, at least for shallow alphabetic orthographies, the dyslexic reading speed impairment could be sufficiently explained by a reformulation of the phonological deficit explanation postulating inefficient access from letters to otherwise intact phonemic information (Wimmer, 1993). As evidenced by modern-day neuroimaging, the visual-verbal speed deficit of dyslexic readers can be aptly attributed to functional and structural impairments in the TP and OT brain systems linking both lexical and sub-lexical orthographic and phonological information.

## CONCLUSIONS

According to the literature presented here, the development of automated LSS integration is thought to play a crucial role in the acquisition of fluent reading skills, and disturbance of this development has been shown to result in impaired reading fluency, the lead symptom of dyslexia in shallow alphabetic orthographies. Both behavioral and functional neuroimaging studies have provided evidence of less efficient LSS integration in children and adults with developmental dyslexia compared with typically reading controls, although more research on the potential causal role of LSS integration deficits in developmental dyslexia is certainly needed.

In skilled readers, successful LSS integration is linked to regions of the bilateral auditory cortex, including the PT, and to the bilateral heteromodal STS. The initial formation and subsequent automation of newly learned AV grapheme-phoneme associations influence letter-specific processing and the build-up of visual-orthographic representations in the left ventral OT cortex. In developmental dyslexia, a putative specific neurocognitive deficit in the crossmodal integration of letters and speech sounds is thought to impede the binding of orthographic and phonological information and, consequently, the emergence of the functional neuroanatomical brain system required for fast, fluent, and seemingly effortless reading, which includes the left ventral OT "reading skill zone," the heteromodal TP cortex, and frontal brain regions.

## AUTHOR CONTRIBUTIONS

FR conceived and wrote the manuscript.

#### ACKNOWLEDGMENTS

I would like to thank Angelika Basler and Marco Gareis for their assistance and Florian Hutzler for his feedback during the preparation of this manuscript.

#### REFERENCES

developmental dyslexia and acquired letter-by-letter reading? PLoS One 5:e12073. doi: 10.1371/journal.pone.0012073

dyslexia. Neurosci. Biobehav. Rev. 36, 1532–1552. doi: 10.1016/j.neubiorev.2012.04.002

speech sounds in dyslexic children scales with individual differences in reading fluency. PLoS One 9:e110337. doi: 10.1371/journal.pone.0110337

Žarić, G., Fraga González, G., Tijms, J., van der Molen, M. W., Blomert, L., and Bonte, M. (2015). Crossmodal deficit in dyslexic children: practice affects the neural timing of letter-speech sound integration. Front. Hum. Neurosci. 9:369. doi: 10.3389/fnhum.2015.00369

**Conflict of Interest Statement**: The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Richlan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Reading-Induced Shifts in Speech Perception in Dyslexic and Typically Reading Children

#### Linda Romanovska\*, Roef Janssen and Milene Bonte

Maastricht Brain Imaging Center, Department Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands

One of the proposed mechanisms underlying the reading difficulties observed in developmental dyslexia is impaired mapping of visual to auditory speech representations. We investigate these mappings in 20 typically reading children and 20 children with dyslexia, aged 8–10 years, using text-based recalibration. In this paradigm, the pairing of visual text and ambiguous speech sounds shifts (recalibrates) the participant's perception of the ambiguous speech in subsequent auditory-only post-test trials. Recent research in adults demonstrated this text-induced perceptual shift in typical, but not in dyslexic, readers. Our current results instead show significant text-induced recalibration in both typically reading children and children with dyslexia. The strength of this effect was significantly linked to the strength of perceptual adaptation effects in children with dyslexia but not in typically reading children. Furthermore, additional analyses in a sample of typically reading children of various reading levels revealed a significant link between recalibration and phoneme categorization. Taken together, our study highlights the importance of considering dynamic developmental changes in reading, letter-speech sound coupling, and speech perception when investigating group differences between typical and dyslexic readers.

#### Edited by:

Iliana I. Karipidis, University of Zurich, Switzerland

#### Reviewed by:

Jarmo Hamalainen, University of Jyväskylä, Finland; Susana Araújo, Universidade de Lisboa, Portugal; Katarzyna Chyl, Nencki Institute of Experimental Biology (PAS), Poland

#### \*Correspondence:

Linda Romanovska linda.romanovska@maastrichtuniversity.nl

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 10 July 2018; Accepted: 22 January 2019; Published: 07 February 2019

#### Citation:

Romanovska L, Janssen R and Bonte M (2019) Reading-Induced Shifts in Speech Perception in Dyslexic and Typically Reading Children. Front. Psychol. 10:221. doi: 10.3389/fpsyg.2019.00221

Keywords: reading development, dyslexia, letter-speech sound coupling, recalibration, adaptation

## INTRODUCTION

Reading is a complex cognitive skill most of us learn within the first decade of life. While there is some variability in how smoothly this learning process goes, most children learn to correctly associate corresponding letters and speech sounds after 1 year of reading instruction (Blomert, 2011) and continue refining the newly acquired skill over a protracted period throughout primary school (Maurer et al., 2006; Brem et al., 2009; Froyen et al., 2009; Ben-Shachar et al., 2011). However, 5–10% of children show particular difficulties in learning to read and are diagnosed with developmental dyslexia, a learning difficulty characterized by impaired reading fluency and spelling despite adequate intelligence, motivation, and schooling (Lyon et al., 2003).

A number of theories have been proposed to describe the mechanisms underlying developmental dyslexia, ranging from phonological (Snowling, 1980; Shaywitz et al., 1998; Lyon et al., 2003) to audio-visual (Blomert, 2011; Kronschnabel et al., 2014; Aravena, 2017), visual (Bosse et al., 2007; Vidyasagar and Pammer, 2010), auditory (Tallal, 2004; Vandermosten et al., 2010), magnocellular (Ramus, 2003), and cerebellar (Fawcett and Nicolson, 1999) deficits. However, most theories converge in acknowledging that dyslexic readers typically exhibit difficulties in phonological processing and that the formation of robust letter-speech sound mappings is essential to fluent reading acquisition. Here we explore letter-speech sound mappings in typically reading children and children with dyslexia using a newly developed short-term audio-visual learning paradigm called text-based recalibration.

Support for a deficit in letter-speech sound integration in dyslexic readers comes largely from studies comparing the processing of congruent versus incongruent letter-speech sound stimuli. Indeed, both behavioral (Snowling, 1980; Blomert and Willems, 2010; Aravena et al., 2013) and brain activity studies (Blau et al., 2009, 2010; Froyen et al., 2011; Žarić et al., 2014, 2015; Jones et al., 2016; Moll et al., 2016) have shown that children and adults with dyslexia process letter-speech sound pairs differently from typical readers (but see Nash et al., 2016; Clayton and Hulme, 2017). In a series of EEG studies in the relatively transparent Dutch orthography, these differences were observed in audio-visual mismatch negativity (MMN) and late negativity (LN) responses at latencies of 100–200 ms and 600–750 ms, respectively, following an audio-visual deviant stimulus in a sequence of standards (Froyen et al., 2009, 2011; Žarić et al., 2014). The audio-visual MMN and LN responses can be seen as an indirect measure of letter-speech sound integration, because a mismatch response arises only if the auditory and visual inputs have been properly processed and integrated. Studies by Froyen, Žarić, and colleagues have revealed that children with dyslexia show a reduced audio-visual MMN and/or LN response compared to typically reading children, pointing to reduced integration of letters and speech sounds. Furthermore, the latency of these responses has been found to scale with reading fluency and remediation, respectively (Žarić et al., 2014, 2015). Consistent with this, functional magnetic resonance imaging (fMRI) studies have found that superior temporal cortical (STC) activity in children (Blau et al., 2010) and adults (Blau et al., 2009) with dyslexia, as well as in pre-readers at familial risk of dyslexia (Karipidis et al., 2017), is less sensitive to letter-speech sound (in)congruency than in typical readers. Taken together, these findings indicate deviant letter-speech sound processing and integration in dyslexic readers.

However, the manner in which stimulus (in)congruency is processed may be influenced by a number of factors, including individual differences in reading skill (Plewko et al., 2018) or phoneme perception (Basu Mallick et al., 2015), but also more general factors such as attentional focus (Talsma and Woldorff, 2005), task characteristics (Basu Mallick et al., 2015), or familial risk for dyslexia (Maurer et al., 2003). A complementary approach to investigating letter-speech sound coupling is offered by (phonetic) recalibration paradigms, in which the perceived identity of an ambiguous speech sound is biased in the direction of previously presented disambiguating context information. This context information can consist of lipread speech (Bertelson et al., 2003; Vroomen and Baart, 2012), lexical (spoken word) context (Norris et al., 2003), overt or imagined speech articulation (Scott, 2016), or, most relevant for our current study, visual text (Bonte et al., 2017; Keetels et al., 2018). In the classical recalibration paradigm, an ambiguous speech sound /a?a/ midway between /aba/ and /ada/ is combined with a disambiguating video of a speaker articulating 'aba' or 'ada' to bias the perception of the ambiguous sound toward the video. Thus, repeated presentation of a speaker articulating 'aba' while the /a?a/ sound is played shifts participants' subsequent perception of this ambiguous sound toward /aba/. Similarly, a speaker articulating 'ada' shifts later perception toward /ada/. Recalibration thus involves an 'attracting' perceptual bias: the phoneme boundary shifts so that the ambiguous sound is perceived in line with the visual information. The induced bias (recalibration) is typically described as a multisensory perceptual effect that is minimally influenced by higher-level task demands (Baart and Vroomen, 2010).

In contrast, an opposite 'repulsive' perceptual bias (or auditory selective adaptation) is induced after repeated presentation of the same videos together with clear speech sounds. That is, after exposure to a speaker articulating 'aba' together with clear /aba/ speech sounds, the ambiguous /a?a/ sound is more likely to be perceived as /ada/ (and 'ada' articulation more often leads to /aba/ perception; Bertelson et al., 2003; Vroomen et al., 2004; Keetels et al., 2016). Phonetic recalibration with lipread speech has been reliably shown in typically reading adults (Bertelson et al., 2003) and 8-year-old children but not in 5-year-old children, suggesting a developmental build-up of the effect (van Linden and Vroomen, 2008). A similar but delayed developmental trend has been reported for the adaptation effect, with robust effects observed in adults (Bertelson et al., 2003; Vroomen et al., 2004, 2007; Baart and Vroomen, 2010) but not in 5–10-year-old children (Sussman and Carney, 1989; Sussman, 1993).

To investigate potential differences in letter-speech sound mappings between children with dyslexia and typically reading children, we use a recent modification of the recalibration paradigm that employs visual 'aba' or 'ada' text to bias the perception of ambiguous /a?a/ speech sounds (Keetels et al., 2016, 2018; Bonte et al., 2017). Notably, while both videos and text were recently shown to elicit significant recalibration effects in typically reading adults, adults with dyslexia showed significant recalibration only with videos, not with text (Keetels et al., 2018), suggesting a specific deficit in the audio-visual mapping of letters and speech sounds. Here, we use text-based recalibration to investigate letter-speech sound mapping in 8–10-year-old typically reading children and children with dyslexia. Although the study was exploratory, as text-based recalibration has not previously been studied in children, we expected to replicate the findings of Keetels et al. (2018) and to observe significant recalibration effects only in typical readers. We also explored potential links between recalibration effects and individual differences in reading proficiency (accuracy and fluency) and in categorical speech perception (phoneme categorization slope). In addition, we employ an adaptation task with clear /aba/ and /ada/ stimuli, which provides both a baseline with respect to potential response strategies and a test for potential developmental changes in speech adaptation (van Linden and Vroomen, 2008).

## MATERIALS AND METHODS


#### Participants

Twenty children with dyslexia (mean age 8.5 ± 0.82 years; 9 females) were recruited from a specialized institute for dyslexia and reading problems, and fifty-six typically reading children (mean age 8.4 ± 0.94 years; 34 females) were recruited from local elementary schools. Parents gave written informed consent for participation in the study. For group comparisons and statistical analyses, a subset of twenty typically reading children was matched to the children with dyslexia for age, gender, and scores on a non-verbal subtest (block design) of the Dutch version of the Wechsler Intelligence Scale for Children-III (WISC-III-NL; Kort et al., 2005). Group characteristics and comparisons using one-way ANOVA are shown in **Table 1**. All children were native Dutch speakers with no reported hearing impairments, normal or corrected-to-normal vision, and no history of diagnosed neurological disorders. The dyslexia diagnosis was given by the institute based on extensive cognitive psychodiagnostic testing and standardized reading measures. Children received a small present as a reward for participation. The experiment was approved by the ethics committee of the Faculty of Psychology and Neuroscience, Maastricht University.

#### Literacy Skills

Each participant performed a computerized reading task of the 3DM (Dyslexia Differential Diagnosis; Blomert and Vaessen, 2009). The task comprised three subtasks: reading of high-frequency words, low-frequency words, and pseudowords. Instructions for the reading task were presented simultaneously on the computer screen and aurally through over-ear headphones. The participant was asked to read the (pseudo)words as quickly and accurately as possible, with a time limit of 30 s per subtask. Reading accuracy was determined by calculating the proportion of correctly versus incorrectly read words within the given time limit. Reading fluency was calculated as the number of correctly read words within the given time limit, for the whole task as well as per subtask.
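The two reading scores can be sketched as follows. This is a minimal illustration, not the 3DM scoring software: it assumes that "proportion of correctly versus incorrectly read words" means correctly read items divided by all items attempted within the 30-s limit, and the `SubtaskResult` class and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    """Counts for one 30-s reading subtask (hypothetical structure)."""
    name: str
    correct: int    # items read correctly within the time limit
    incorrect: int  # items attempted but read incorrectly


def accuracy(results):
    """Assumed definition: correctly read items / all attempted items."""
    attempted = sum(r.correct + r.incorrect for r in results)
    return sum(r.correct for r in results) / attempted if attempted else 0.0


def fluency(results):
    """Number of correctly read items within the time limit."""
    return sum(r.correct for r in results)


# Made-up counts for illustration only.
session = [
    SubtaskResult("high-frequency words", correct=38, incorrect=2),
    SubtaskResult("low-frequency words", correct=25, incorrect=5),
    SubtaskResult("pseudowords", correct=17, incorrect=3),
]

per_subtask = {r.name: fluency([r]) for r in session}
print(per_subtask)
print(fluency(session), round(accuracy(session), 3))  # 80 0.889
```

Passing a single subtask in a list gives the per-subtask fluency score; passing all three gives the whole-task score.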

#### Experimental Design and Procedure

#### Stimuli

The speech stimuli consisted of recordings of a native male Dutch speaker pronouncing the speech sounds /aba/ and /ada/ (see Bertelson et al., 2003 for a detailed description). Both speech sounds lasted 650 ms and were used to create a nine-token continuum (BD1–BD9) ranging from a clear /aba/ sound to a clear /ada/ sound by changing the second formant (F2) in eight steps of 39 Mel using PRAAT software (Boersma and Weenink, 2001). The visual stimuli consisted of the written counterparts of the speech sounds, namely 'aba' and 'ada' text presented in white at the center of a black screen in Times New Roman font (font size 50). The auditory and visual stimuli were presented using Presentation software (Version 17.2, Neurobehavioral Systems, Inc., Berkeley, CA, United States).
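To make the continuum spacing concrete, the sketch below computes F2 targets for nine tokens separated by equal 39-mel steps (eight steps, 312 mel in total). It does not reproduce the PRAAT manipulation itself. Several things are assumptions: the mel formula 2595·log10(1 + f/700) is one common definition, and the text does not say which variant PRAAT applied; the 1100 Hz starting value is hypothetical; and F2 is assumed to rise from the /aba/ end toward the /ada/ end.

```python
import math


def hz_to_mel(f_hz):
    """Hz -> mel using 2595*log10(1 + f/700) (an assumed mel-scale variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def f2_continuum(start_hz, step_mel=39.0, n_tokens=9):
    """F2 targets (Hz) for tokens BD1..BDn, equally spaced in mel.

    Nine tokens means eight steps, i.e. a total span of 8 * 39 = 312 mel.
    """
    start = hz_to_mel(start_hz)
    return [mel_to_hz(start + i * step_mel) for i in range(n_tokens)]


# 1100 Hz is a hypothetical /aba/-end F2 value, not taken from the stimuli.
tokens = f2_continuum(1100.0)
for i, f2 in enumerate(tokens, start=1):
    print(f"BD{i}: {f2:7.1f} Hz")
```

Because the mel scale is compressive, equal mel steps correspond to progressively larger steps in Hz toward the /ada/ end.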

All children completed the pre-test, recalibration and adaptation tasks. The children with dyslexia completed these tasks in a quiet room at the specialized dyslexia institute, whereas the typically reading children were tested in a quiet room at their school. All tasks were performed on a laptop computer with the auditory stimuli presented at a comfortable listening level over noise-canceling headphones (SONY MDR-7509HD).

#### Pre-test

Prior to the main experimental tasks, all participants completed a pre-test in which all nine tokens of the /aba/-/ada/ continuum were presented a total of 98 times in randomized order. The children were instructed to listen to each sound carefully and to indicate which sound they heard by pressing the left (/aba/) or right (/ada/) shift button with the left or right index finger, respectively, following a response cue (**Figure 1**). The response cue consisted of 'aba' (left) and 'ada' (right) text held up by cartoon monsters created using the Monster Workshop content pack of the iClone 6 software<sup>1</sup>. No emphasis was put on speed, and it was further emphasized that there were no correct or incorrect responses. While the speech sounds were played, children viewed a black screen with a white fixation cross; the response screen (cartoon monsters) followed after 1 s and was terminated when the child provided a response. The next speech sound was presented 2 s after a response was given. The total duration of the pre-test was approximately 5 min.

The results of the pre-test were used to determine the most ambiguous speech sound for each participant, defined as the token along the /aba/-/ada/ continuum whose proportion of /aba/ responses was closest to 0.5. This individually determined most ambiguous sound was subsequently used in the audio-visual exposure blocks of the recalibration task as well as in the post-test trials of the recalibration and adaptation tasks. In the post-test trials, in addition to the most ambiguous sound, we also presented its flanking sounds /a?a/+1 and /a?a/−1 on the /aba/-/ada/ continuum.
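The selection rule can be sketched in a few lines. This is an illustration, not the authors' code. It assumes that "+1" means one step toward the /ada/ end, which fits the Results, where /a?a/+1 drew fewer /aba/ responses than /a?a/−1. The tie rule (first token wins) and the clamping at BD1/BD9 are also assumptions, because the text does not cover those cases. The pre-test proportions are hypothetical.

```python
def most_ambiguous(prop_aba):
    """Pick the individually most ambiguous token and its two flanking tokens.

    prop_aba: proportion of /aba/ responses for BD1..BD9 (list index 0..8).
    Returns (index, (index_minus_1, index_plus_1)). min() keeps the first
    token on ties; this tie rule is an assumption, as the text gives none.
    """
    idx = min(range(len(prop_aba)), key=lambda i: abs(prop_aba[i] - 0.5))
    # Clamping at BD1/BD9 is an assumption: the text does not say what happens
    # when the most ambiguous token lies at an end of the continuum.
    lower = max(idx - 1, 0)
    upper = min(idx + 1, len(prop_aba) - 1)
    return idx, (lower, upper)


# Hypothetical pre-test proportions for one child (BD1 = clear /aba/).
props = [1.0, 0.98, 0.95, 0.85, 0.62, 0.41, 0.15, 0.05, 0.0]
idx, (minus1, plus1) = most_ambiguous(props)
print(f"/a?a/ = BD{idx + 1}, /a?a/-1 = BD{minus1 + 1}, /a?a/+1 = BD{plus1 + 1}")
# /a?a/ = BD6, /a?a/-1 = BD5, /a?a/+1 = BD7
```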

The pre-test served two purposes: (1) to determine the most ambiguous sound for each participant, and (2) to allow for the investigation of the phoneme categorization slope in each group. Previous research has indicated that adult readers with dyslexia perceive speech sounds less categorically compared to typical readers (Ahissar, 2007; Baart et al., 2012). Thus, the results of the pre-test allow us to investigate whether these findings extend to our sample of children with dyslexia and typically reading children.

#### Recalibration Task

The text-based recalibration paradigm is composed of audiovisual exposure blocks and subsequent auditory-only post-test trials (**Figure 2**). During each audio-visual exposure block, the children were presented with 8 repetitions of either the text 'aba' or 'ada', paired with the individually determined ambiguous speech sound /a?a/. The speech sound and visual text were presented simultaneously (relative SOA of 0 ms) and auditory stimuli had a duration of 650 ms, while text was presented for 1 s. The inter-trial interval between subsequent audio-visual exposure trials was set to 2 s. During the audio-visual exposure blocks, children were instructed to pay close attention to the speech sounds and text without providing a response.

Each exposure block was followed by four auditory-only post-test trials. The four post-test sounds were presented in a randomized order with the individually determined most ambiguous /a?a/ sound presented twice and each of its flanking sounds /a?a/+1 and /a?a/-1 on the /aba/-/ada/ continuum, presented once. Each post-test sound was followed by a response cue consisting of 'aba' and 'ada' texts held by cartoon monsters (**Figure 2**).

Children were instructed to listen to each sound carefully and to make forced-choice /aba/-/ada/ judgments by pressing the left/right shift button with the left/right index finger, respectively, once the cartoon monsters appeared. As in the pre-test, no emphasis was put on speed, and it was further emphasized that there were no incorrect responses. All responses were self-paced. The onset of the response picture was jittered between 1 and 2 s relative to the post-test sound, and the picture was terminated upon the button press. Post-test trials were presented with an inter-trial interval of 2 s after the participant had provided a response.

<sup>1</sup>https://www.reallusion.com/

The recalibration task was divided into two 6-min runs, both consisting of 10 'aba' and 10 'ada' exposure blocks, each followed by 4 post-test trials, amounting to 40 post-test trials for each type of exposure block.
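The block structure described above can be sketched as a trial list. The helpers `make_block` and `make_session` are hypothetical. Setting `blocks_per_type=10` reproduces the stated total of 40 post-test trials per exposure type (10 blocks × 4 trials). How blocks were ordered and split across the two runs is not described, so the shuffled block order is an assumption, and timing (exposure duration, inter-trial intervals, jitter) is left out.

```python
import random
from collections import Counter


def make_block(text, amb, rng, n_exposures=8):
    """One block: audio-visual exposures, then four auditory-only post-test trials.

    Trials are (phase, exposure_text, token) tuples; token is an index on the
    BD continuum, amb being the individually most ambiguous token.
    """
    trials = [("exposure", text, amb) for _ in range(n_exposures)]
    post = [amb, amb, amb + 1, amb - 1]  # /a?a/ twice, /a?a/+1 and /a?a/-1 once
    rng.shuffle(post)                     # randomized post-test order
    # Post-test trials are tagged with the text of the exposure that preceded them.
    trials += [("post-test", text, tok) for tok in post]
    return trials


def make_session(amb, blocks_per_type=10, seed=0):
    """All 'aba' and 'ada' blocks in a shuffled order (block ordering is assumed)."""
    rng = random.Random(seed)
    order = ["aba"] * blocks_per_type + ["ada"] * blocks_per_type
    rng.shuffle(order)
    return [trial for text in order for trial in make_block(text, amb, rng)]


session = make_session(amb=5)
post = [(text, tok) for phase, text, tok in session if phase == "post-test"]
print(Counter(text for text, _ in post))  # 40 post-test trials per exposure type
```

The adaptation task would use the same structure, with the clear /aba/ or /ada/ token instead of the ambiguous token during exposure.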

#### Adaptation Task

The adaptation task was identical to the recalibration task in all aspects except for the speech sounds used in the exposure blocks. Here, the clear /aba/ and /ada/ sounds were combined with the corresponding 'aba' and 'ada' text, creating congruent audio-visual stimuli in the exposure blocks. The task instructions and stimulus timings were all identical to those of the recalibration task. The adaptation task allowed us to explore auditory adaptation and served as a control for potential response strategies that children may employ.

All children performed the pre-test followed by 2 runs of the recalibration task and 2 runs of the adaptation task. The task order was kept constant across all participants. The reason for this fixed order instead of counterbalancing was threefold: (1) because we were interested in audiovisual learning, the recalibration blocks were of primary interest, with the adaptation blocks serving as a control, (2) this behavioral experiment served as a preparation of a longitudinal fMRI project where we only included text-based recalibration, and (3) initial pilot results suggested interference from adaptation blocks to subsequent recalibration blocks but not vice versa. This finding is in line with the observation of short-lived audio-visual recalibration effects compared to longer lasting adaptation effects (Vroomen and Baart, 2012).

#### Statistical Analysis

The data were assessed for statistical significance using repeated measures ANOVA (SPSS version 24.0, IBM Corp., Armonk, NY, United States). The ANOVA model included type of task (recalibration vs. adaptation), type of exposure ('aba' text vs. 'ada' text), and post-test sound (/a?a/, /a?a/+1, /a?a/−1) as within-subjects factors and group (dyslexic vs. typically reading) as a between-subjects factor. The differences in the average proportion of /aba/ responses following the two types of exposure blocks (aftereffects) were further assessed using paired-samples t-tests. Where the sphericity assumption was violated, the degrees of freedom were adjusted using the Greenhouse-Geisser correction.
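The aftereffect and the paired-samples t statistic can be sketched with the standard library. The sign convention follows the Introduction: a positive aftereffect means perception shifted toward the exposure text (recalibration), and a negative one means it shifted away from it (adaptation). The first two calls use group means reported in the Results. The per-child proportions are hypothetical. The sketch stops at the t statistic; `scipy.stats.ttest_rel` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, stdev


def aftereffect(p_aba_after_aba, p_aba_after_ada):
    """Proportion of /aba/ responses after 'aba' exposure minus after 'ada' exposure.

    Positive: shift toward the exposure text (recalibration).
    Negative: shift away from it (selective adaptation).
    """
    return p_aba_after_aba - p_aba_after_ada


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom (no p-value)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Group means reported for the most ambiguous sound /a?a/:
print(round(aftereffect(0.55, 0.31), 2))  # 0.24  (recalibration task)
print(round(aftereffect(0.48, 0.57), 2))  # -0.09 (adaptation task)

# Hypothetical per-child proportions, for illustration only.
after_aba = [0.60, 0.50, 0.70, 0.40, 0.55]
after_ada = [0.30, 0.35, 0.40, 0.30, 0.25]
t, df = paired_t(after_aba, after_ada)
print(f"t({df}) = {t:.2f}")
```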

The fit of the pre-test slopes was estimated using the Slope Fitting Tool in MATLAB 2016a (The MathWorks, Inc., Natick, MA, United States). Based on previous literature, a custom logistic function (Function 1) was used to obtain partial R<sup>2</sup> values and evaluate the goodness of fit of individual as well as group-level categorization slopes (McMurray and Spivey, 2000; Ley et al., 2012). Subsequently, the non-linear least squares solver in MATLAB was employed to obtain the slope value (c in Function 1) that provided the best fit to the data, that is, the smallest sum of squares. To optimize the outcome, each parameter of Function 1 was restricted during fitting to 0 ≤ a ≤ 10, −10 ≤ b ≤ 10, −10 ≤ c ≤ 10, and −9 ≤ d ≤ 18. The best fit was determined by running 30 iterations of the slope-fitting procedure and taking the slope value with the smallest sum of squares. The number of iterations was verified by replicating the procedure multiple times.

$$\mathbf{y} = \frac{\mathbf{a}}{1 + \mathbf{e}^{\frac{-(\mathbf{x}-\mathbf{d})}{\mathbf{c}}}} + \mathbf{b} \tag{1}$$

Function 1: a, amplitude of the function; b, lowest asymptote of y-axis; c, slope of the function; d, location of the category boundary.
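A bounded least-squares fit of Function 1 can be sketched in pure Python. This uses a different technique from the authors' MATLAB procedure. Instead of 30 restarts of a nonlinear least-squares solver, it grid-searches c and d within the stated bounds. Because a and b enter Function 1 linearly, they are solved in closed form for each (c, d) and then clipped to their bounds, so a fit that lands on a bound is only approximate. The returned `r2` is an ordinary R², not necessarily the partial R² reported here. In this parameterization c divides the exponent, so a smaller |c| gives a steeper curve and a negative c gives a curve that falls from the /aba/ end to the /ada/ end. The 0 ≤ a bound also rules out the mirror-image solution with negative a and an opposite-sign c.

```python
import math


def logistic(x, a, b, c, d):
    """Function 1: y = a / (1 + exp(-(x - d) / c)) + b."""
    return a / (1.0 + math.exp(-(x - d) / c)) + b


def fit_logistic(xs, ys, step=0.1):
    """Bounded least-squares fit of Function 1 by grid search (a stand-in method).

    c and d are searched on a grid within -10 <= c <= 10 and -9 <= d <= 18;
    for each (c, d), a and b enter linearly and are solved in closed form,
    then clipped to 0 <= a <= 10 and -10 <= b <= 10.
    """
    n = len(xs)
    y_mean = sum(ys) / n
    c_grid = [round(-10 + i * step, 10) for i in range(round(20 / step) + 1)]
    d_grid = [round(-9 + i * step, 10) for i in range(round(27 / step) + 1)]
    best = None
    for c in c_grid:
        if c == 0:
            continue  # c divides the exponent
        for d in d_grid:
            z = [1.0 / (1.0 + math.exp(-(x - d) / c)) for x in xs]
            z_mean = sum(z) / n
            szz = sum((zi - z_mean) ** 2 for zi in z)
            if szz < 1e-12:
                continue  # curve flat over the data: a is not identifiable
            a = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, ys)) / szz
            a = min(max(a, 0.0), 10.0)
            b = min(max(y_mean - a * z_mean, -10.0), 10.0)
            sse = sum((a * zi + b - yi) ** 2 for zi, yi in zip(z, ys))
            if best is None or sse < best[0]:
                best = (sse, a, b, c, d)
    sse, a, b, c, d = best
    sst = sum((yi - y_mean) ** 2 for yi in ys)
    return {"a": a, "b": b, "c": c, "d": d, "sse": sse, "r2": 1.0 - sse / sst}


# Synthetic check: noiseless data generated from known parameters.
xs = list(range(1, 10))                       # tokens BD1..BD9
ys = [logistic(x, 1.0, 0.0, -0.8, 5.0) for x in xs]
fit = fit_logistic(xs, ys)
print({k: round(v, 3) for k, v in fit.items()})
```

In practice, a library solver with bounds would replace the grid search, for example SciPy's `scipy.optimize.curve_fit` with its `bounds` argument.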

To investigate potential links between recalibration/adaptation aftereffects, pre-test slope, and behavioral reading measures, linear regression analyses were performed in R 3.4.1 (R Development Core Team, 2013). In addition, all statistical analyses were repeated on the complete sample of controls to assess the reliability of our findings in a larger sample of typical readers of various reading levels.
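A single-predictor regression of this kind can be sketched as ordinary least squares. The analyses were run in R, where `lm(aftereffect ~ fluency)` would be the usual call. The variable names, the choice of predictor and outcome, and the values below are hypothetical, because this section does not state how the models were specified.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares line y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical values: reading fluency (predictor) and recalibration aftereffect.
fluency = [40, 55, 62, 70, 85]
aftereffect = [0.10, 0.20, 0.18, 0.30, 0.35]
slope, intercept, r2 = simple_ols(fluency, aftereffect)
print(f"slope = {slope:.4f}, intercept = {intercept:.3f}, R^2 = {r2:.2f}")
```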

## RESULTS

## Pre-test

The results of the pre-test were used to investigate the categorical perception of the nine auditory tokens employed in this study. **Figure 3A** shows the proportion of /aba/ responses per sound stimulus in children with dyslexia (dashed line) and the matched typically reading control children (solid line). The figure indicates similar categorical perception of speech sounds in the two groups. This observation was confirmed by a 9 (Token) × 2 (Group) repeated measures ANOVA. The ANOVA revealed the expected main effect of sound [F(2,100) = 135.03, p < 0.001, Greenhouse-Geisser corrected], indicating that participants were more likely to perceive tokens closer to the /aba/ end of the continuum (BD1–BD3) as /aba/ and tokens closer to the /ada/ end (BD7–BD9) as /ada/. Furthermore, no difference in the overall proportion of /aba/ responses was observed between the children with dyslexia (M = 0.51, SD = 0.06) and typically reading children (M = 0.54, SD = 0.11) [t(38) = −0.92, p = 0.36], consistent with similar categorization slopes in both groups. **Figure 3B** shows the corresponding slopes for children with dyslexia and all of the control participants tested (n = 56), again showing similar categorical perception in typically reading children and children with dyslexia. The goodness of fit of the slopes, reflected in partial R<sup>2</sup> values, was 0.99 in the dyslexic group, the matched control group, and the entire control group.

## Recalibration and Adaptation Tasks

#### Matched Groups

During the recalibration task, participants' perception of the three post-test sounds (the most ambiguous sound /a?a/ and its two closest neighbors /a?a/+1 and /a?a/−1) was influenced by the preceding exposure blocks, as seen in the proportion of /aba/ versus /ada/ responses during the post-test trials. Intriguingly, both the children with dyslexia and the typically reading children showed a recalibration effect (**Figure 4A**, middle and right columns, respectively). That is, both groups were more likely to perceive the ambiguous post-test sounds as /aba/ following 'aba' exposure blocks (solid line, **Figure 4A**). Similarly, 'ada' text shifted later perception toward /ada/ (dashed line, **Figure 4A**). This effect was particularly pronounced for the most ambiguous /a?a/ sound (proportion of /aba/ responses after 'aba' vs. 'ada' exposure: 0.54 vs. 0.31 in children with dyslexia and 0.57 vs. 0.31 in typical readers). Across both groups, participants seemed to show only a small adaptation effect, limited to the most ambiguous post-test sound: exposure to the clear /aba/ sound in combination with 'aba' text shifted the perception of the post-test trials toward /ada/ (dashed line, **Figure 4B**, left). Correspondingly, exposure to clear /ada/ in combination with 'ada' text led to a small shift in the perception of subsequent post-test trials toward /aba/ (solid line, **Figure 4B**, left).

A 2 (Task) × 2 (Exposure) × 3 (post-test sounds) × 2 (Group) repeated measures ANOVA showed a significant task × exposure × post-test sounds interaction [F(2,76) = 3.52, p < 0.05], confirming that the participants responded differently to the post-test sounds following the two types of exposure blocks in the recalibration and adaptation tasks. There were also significant main effects of task [F(1,38) = 31.27, p < 0.001], exposure ['aba' versus 'ada'; F(1,38) = 8.65, p = 0.006], and post-test sounds [F(1,48) = 117.05, p < 0.001, Greenhouse-Geisser corrected], as well as significant task × exposure [F(1,38) = 45.32, p < 0.001], task × post-test sounds [F(1,61) = 3.38, p < 0.05, Greenhouse-Geisser corrected], and exposure × post-test sounds [F(2,76) = 7.39, p < 0.005] interactions. No main effect of group was observed [F(1,38) = 1.06, p = 0.31], and none of the interactions with group were significant (all F ≤ 2.1), corroborating the absence of significant differences in recalibration and adaptation results between children with dyslexia and typically reading children.

The results of the recalibration task were further tested in a 2 (Exposure) × 3 (post-test sounds) × 2 (Group) repeated measures ANOVA. Main effects of exposure [F(1,38) = 51.43, p < 0.001] and post-test sounds [F(1,49) = 84.02, p < 0.001, Greenhouse-Geisser corrected], as well as a significant exposure × post-test sounds interaction [F(2,76) = 8.37, p = 0.001], again showed that the participants responded differently to the post-test sounds depending on the type of exposure block preceding them. There was no main effect of group [F(1,38) = 0.054, p = 0.81] and no interaction with group (all F ≤ 0.9).

A 2 (Exposure) × 3 (post-test sounds) × 2 (Group) repeated measures ANOVA was also run on the results of the adaptation task. It showed no reliable effect of exposure [F(1,38) = 2.35, p = 0.13], confirming the absence of an overall adaptation effect across sounds in both groups. The results revealed a main effect of post-test sounds [F(1,52) = 80.73, p < 0.001, Greenhouse-Geisser corrected] and a non-significant trend toward an exposure × post-test sounds interaction [F(2,76) = 2.90, p < 0.06]. No other main effects or interactions were significant (all F ≤ 1.2).

Post hoc paired-samples t-tests were run on the proportion of /aba/ responses for each of the three post-test sounds following both exposure blocks ('aba' versus 'ada') in both tasks across groups. In the recalibration task, the analyses yielded significant differences in the proportion of /aba/ responses following an 'aba' exposure block compared to an 'ada' exposure block across all post-test sounds (/a?a/: M = 0.55, SD = 0.16 vs. M = 0.31, SD = 0.15, t(39) = 6.99, p < 0.001; /a?a/+1: M = 0.26, SD = 0.19 vs. M = 0.14, SD = 0.14, t(39) = 3.98, p < 0.001; and /a?a/−1: M = 0.66, SD = 0.18 vs. M = 0.44, SD = 0.20, t(39) = 6.43, p < 0.001). In the adaptation task, only the proportion of /aba/ responses to the most ambiguous sound (/a?a/) was significantly different following 'aba' versus 'ada' exposure blocks [M = 0.48, SD = 0.21 vs. M = 0.57, SD = 0.19, t(39) = −2.06, p < 0.05].

To test for potential response strategies, a paired-samples t-test compared the aftereffects (the difference in the proportion of /aba/ responses following 'aba' versus 'ada' exposure) across all three post-test sounds in the recalibration task with those in the adaptation task (van Linden and Vroomen, 2008). The aftereffects differed significantly between the recalibration task (M = 0.57, SD = 0.50) and the adaptation task [M = −0.14, SD = 0.62; t(39) = 6.81, p < 0.001], indicating that the children did not employ a clear response strategy and thus confirming the reliability of the observed recalibration effect.

#### Entire Control Group

The same analyses were also performed on the data of the entire control group and yielded similar recalibration results. Five of the 56 participants did not complete the adaptation task; the statistical analyses including the task condition are therefore based on 51 participants. A 2 (Task) × 2 (Exposure) × 3 (post-test sounds) repeated measures ANOVA revealed significant main effects of task [F(1,50) = 99.53, p < 0.001], exposure ['aba' versus 'ada'; F(1,50) = 15.93, p < 0.001], and post-test sounds [F(1,69) = 155.55, p < 0.001, Greenhouse-Geisser corrected], as well as significant task × exposure [F(1,50) = 22.70, p < 0.001] and task × post-test sounds [F(1,67) = 0.82, p < 0.05, Greenhouse-Geisser corrected] interactions. The results are summarized in **Figure 5**, which illustrates that the participants showed a recalibration effect (solid line above the dashed line) but no adaptation effect (no separation between the lines). Separate 2 (Exposure) × 3 (post-test sounds) repeated measures ANOVAs for each task revealed significant main effects of exposure [F(1,50) = 41.09, p < 0.001] and post-test sounds [F(1,70) = 128.02, p < 0.001, Greenhouse-Geisser corrected] and a significant exposure × post-test sounds interaction [F(2,100) = 4.45, p < 0.05] in the recalibration task, and only a main effect of post-test sounds [F(1,68) = 97.39, p < 0.001, Greenhouse-Geisser corrected] in the adaptation task, highlighting the presence of a recalibration effect and the absence of an adaptation effect.
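Several F tests above carry a Greenhouse-Geisser correction, which is why their degrees of freedom are smaller than the nominal ones (e.g., F(1,69) for post-test sounds with 51 children, where the uncorrected test would be F(2,100); the numerator df in the text appears to be rounded to an integer). The correction multiplies both degrees of freedom by an estimate ε of how far the repeated-measures covariance departs from sphericity. The sketch below computes the standard estimate from a sample covariance matrix by double-centring; it is pure Python for illustration, the function names are hypothetical, and in practice the value comes from the ANOVA software.

```python
def covariance_matrix(rows):
    """Sample covariance (n - 1 denominator) of k repeated measures;
    rows holds one list of k scores per participant."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon: tr(S*)^2 / ((k - 1) * tr(S* S*)),
    where S* is the double-centred k x k covariance matrix."""
    k = len(cov)
    row = [sum(cov[i]) / k for i in range(k)]
    col = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand = sum(row) / k
    dc = [[cov[i][j] - row[i] - col[j] + grand for j in range(k)] for i in range(k)]
    tr = sum(dc[i][i] for i in range(k))
    tr_sq = sum(dc[i][j] * dc[j][i] for i in range(k) for j in range(k))
    return tr ** 2 / ((k - 1) * tr_sq)


def corrected_df(eps, df_effect, df_error):
    """Scale both degrees of freedom of the F test by epsilon."""
    return eps * df_effect, eps * df_error


# Compound symmetry satisfies sphericity, so epsilon is 1
print(gg_epsilon([[2, 1, 1], [1, 2, 1], [1, 1, 2]]))
# Unequal variances violate sphericity and pull epsilon toward 1 / (k - 1)
eps = gg_epsilon([[1, 0, 0], [0, 1, 0], [0, 0, 10]])
print(eps, corrected_df(eps, 2, 100))
```

ε equals 1 when sphericity holds and cannot fall below 1/(k − 1), i.e., 0.5 for the three post-test sounds.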

Post hoc paired-samples t-tests on the proportion of /aba/ responses for each of the post-test sounds per exposure block ('aba' versus 'ada') revealed significant differences following 'aba' compared to 'ada' recalibration exposure blocks for each sound [/a?a/: M = 0.52, SD = 0.15 vs. M = 0.33, SD = 0.16, t(50) = 5.80, p < 0.001; /a?a/−1: M = 0.62, SD = 0.17 vs. M = 0.46, SD = 0.19, t(50) = 5.30, p < 0.001; /a?a/+1: M = 0.24, SD = 0.17 vs. M = 0.14, SD = 0.13, t(50) = 4.34, p < 0.001]. None of the paired-samples t-tests for the adaptation task yielded a significant result.

#### Relation With Standardized Reading Measures

Given the absence of overall group differences in recalibration, an important question is whether the strength of this effect is related to individual differences in reading fluency, the magnitude of adaptation and/or the phoneme categorization slope. Accordingly, two separate linear regression analyses were performed in the matched groups. The first investigated potential links between the magnitude of the recalibration effect and the magnitude of the adaptation effect (both quantified as the proportion of /aba/ vs. /ada/ responses), the individual phoneme categorization slopes and standardized reading measures. The second investigated the relation between the individual phoneme categorization slopes and the magnitude of the recalibration and adaptation effects and the reading measures. Prior to running the regression analyses, the data were screened for outliers using boxplots. In the matched groups, this procedure identified two outliers in categorization slope values: one child with dyslexia (more than 3 times the inter-quartile range beyond the quartiles) and one typically reading participant (more than 1.5 times the inter-quartile range beyond the quartiles). Similarly, 7 participants in the entire control group were identified as outliers according to the same criteria. All outliers were excluded from the subsequent regression analyses. All linear regression models initially included main effects for group (dyslexia yes/no), recalibration and adaptation aftereffects, reading fluency and accuracy scores, and pre-test phoneme categorization slope values, as well as interactions between these main effects and dyslexia. Where applicable, the models were refined by removing interaction terms with a p-value exceeding 0.7, which improved model fit. The reading measures were centered on the overall average to facilitate interpretation.
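The outlier screening and centring steps can be sketched as follows, assuming the usual Tukey boxplot fences (values more than 1.5 or 3 inter-quartile ranges beyond the quartiles are flagged as mild or extreme outliers). Quartile definitions differ between software packages, so borderline cases may be classified differently from the original analysis; the helper names and values are hypothetical.

```python
import statistics


def tukey_fences(values, k):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def classify_outliers(values):
    """Label each value 'extreme' (beyond 3 IQR), 'mild' (beyond 1.5 IQR) or 'ok'."""
    mild_lo, mild_hi = tukey_fences(values, 1.5)
    ext_lo, ext_hi = tukey_fences(values, 3.0)
    labels = []
    for v in values:
        if v < ext_lo or v > ext_hi:
            labels.append("extreme")
        elif v < mild_lo or v > mild_hi:
            labels.append("mild")
        else:
            labels.append("ok")
    return labels


def center(values):
    """Subtract the sample mean so that 0 corresponds to the average score."""
    m = statistics.mean(values)
    return [v - m for v in values]


slopes = [1, 2, 3, 4, 5, 6, 7, 8, 16]  # hypothetical categorization slopes
print(classify_outliers(slopes))
print(center([80, 100, 120]))  # hypothetical reading scores
```

Centring leaves regression slopes unchanged but makes the intercept and the group terms refer to a child with an average reading score rather than a score of zero.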

The linear regression analysis of the magnitude of the recalibration effect showed a significant interaction between dyslexia and the adaptation effect (**Table 2** 'Recalibration effect'). Simple slope analyses of this interaction revealed a significant positive association between the strength of the recalibration and adaptation effects in children with dyslexia but not in typically reading children (**Figure 6**). Moreover, a trend toward a main effect of pre-test slope on recalibration was observed across groups. The regression analysis of the phoneme categorization slope values did not reveal significant main or interaction effects in the matched groups. However, the main effect of recalibration approached significance, suggesting a link between pre-test slope and the strength of the recalibration effect (**Table 2** 'Pre-test slope'). Slope values did not differ significantly between children with dyslexia (n = 19) and typically reading children (n = 19; t(36) = −0.54, p = 0.59, equal variances assumed).

The bold values indicate statistically significant results. <sup>∗</sup>p ≤ 0.05.
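A simple slope is the slope of one predictor at a fixed level of the moderator. With a binary dyslexia code D and adaptation aftereffect A, a model recalibration ~ D + A + D×A yields a slope of b_A for typical readers (D = 0) and b_A + b_D×A for children with dyslexia (D = 1); for this reduced model these coincide with ordinary least-squares slopes fitted within each group. The sketch below relies on that equivalence. It is a simplification: the models reported here contained further covariates, in which case the simple slopes must be read off the fitted coefficients and tested with their standard errors. Names and data are hypothetical.

```python
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simple_slopes(adaptation, recalibration, dyslexia):
    """Slope of recalibration on adaptation within each group (dyslexia coded 0/1).
    Equals b_A (group 0) and b_A + b_DxA (group 1) of recalibration ~ D + A + D:A."""
    slopes = {}
    for group in (0, 1):
        xs = [a for a, d in zip(adaptation, dyslexia) if d == group]
        ys = [r for r, d in zip(recalibration, dyslexia) if d == group]
        slopes[group] = ols_slope(xs, ys)
    return slopes


# Hypothetical scores: flat relation in typical readers, positive in dyslexia
adapt = [0.0, 0.1, 0.2, 0.3, 0.0, 0.1, 0.2, 0.3]
recal = [0.5, 0.5, 0.5, 0.5, 0.2, 0.3, 0.4, 0.5]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(simple_slopes(adapt, recal, group))
```

The difference between the two group slopes is the D × A interaction coefficient, which is why a significant interaction motivates inspecting each simple slope separately.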

The regression analyses of the strength of the recalibration effect and of phoneme categorization were also performed in the whole control group (N = 51), revealing a significant link between the strength of the recalibration effect and categorical perception of phonemes (**Table 3** 'Recalibration effect' and 'Pre-test slope'). Moreover, a significant association between reading accuracy and the steepness of the pre-test slope was observed, with reading fluency scores also approaching significance (**Table 3** 'Pre-test slope'). These findings complement and extend those of the matched groups, highlighting the influence of phoneme perception on recalibration, with the additional finding of a significant link between phoneme categorization and reading accuracy in the control group.

#### DISCUSSION

In the present study, we investigated reading-induced audiovisual plasticity in 8- to 10-year-old children with dyslexia and typically reading children by using written text to recalibrate their perception of ambiguous speech sounds. Contrary to findings reported in adults with dyslexia, our results revealed that both groups of children reliably show a recalibration effect. The magnitude of the effect was significantly related to the magnitude of the adaptation effect in children with dyslexia but not in typically reading children. Phoneme categorization slopes, in turn, revealed comparable categorization of /aba/ and /ada/ sounds in both groups of children. Furthermore, extending the analyses to a sample of typically reading children of various reading levels revealed an association between phoneme categorization slope and reading accuracy. These findings emphasize the importance of studying different age groups to investigate a potential developmental trend in short-term text-induced audiovisual learning, and to uncover possible differences in the mechanisms responsible for letter-speech sound coupling, phoneme perception and reading fluency in dyslexic and typical readers.

TABLE 3 | Results of the recalibration effect and pre-test slope regression analyses in the entire control group.

The bold values indicate statistically significant results. <sup>∗∗</sup>p ≤ 0.005, <sup>∗</sup>p ≤ 0.05.

Replicating our recent findings in typically reading adults (Bonte et al., 2017; Keetels et al., 2018), our current findings show that text stimuli can successfully be used to bias the perception of ambiguous speech in 8- to 10-year-old children. Recalibration is proposed to rely on short-term perceptual learning mechanisms that help resolve the discrepancy between context information (e.g., lip-read speech, text) and an ambiguous sound (Samuel and Kraljic, 2009; Vroomen and Baart, 2012). Unlike lip-read speech, which is rooted in biology (Kuhl and Meltzoff, 1982), letter-speech sound associations are by nature arbitrary and are learned through explicit instruction (Keetels et al., 2016; Fraga González et al., 2017). Our results suggest that already during the first years of reading acquisition, at least at the behavioral level, these learned associations lead to significant perceptual shifts similar to those induced by lip-read information (van Linden and Vroomen, 2008). That is, simple 'aba' and 'ada' syllables lead to perceptual recalibration in 8- to 10-year-old children learning the relatively transparent Dutch orthography, which is characterized by fairly consistent letter-speech sound mappings and a rather small grain size. In future studies it would be interesting to test whether similar syllables also yield significant text-based recalibration in less transparent orthographies and/or orthographies with larger grain sizes (see e.g., Paulesu et al., 2000; Brennan et al., 2012; Lallier and Carreiras, 2017).

The observation of significant recalibration in children with dyslexia is in line with a previous study indicating comparable context sensitivity during speech perception in 7- to 9-year-old children with dyslexia and typically reading children at auditory, phonetic and phonological levels (Blomert et al., 2004). But how can this observation be reconciled with the absence of a significant effect in adults with dyslexia (Keetels et al., 2018)? One possible explanation for the discrepancy between children and adults is that 8- to 10-year-old children presumably have a wider integration window for letter-speech sound coupling. EEG research on letter-speech sound integration in children within our age range indicates timing differences in the mismatch negativity (MMN) window in response to letter-speech sound pairs. Namely, unlike in adults, in children the audiovisual MMN effect is not restricted to simultaneous presentation of letters and speech sounds (Froyen et al., 2008) but is also seen when letters are presented 200 ms before the speech sounds. Furthermore, the MMN response peaks at a later time point, a pattern that gradually shifts to earlier and shorter integration windows with increasing reading experience (Froyen et al., 2009; Žarić et al., 2014). These changes have been proposed to reflect the automatization of letter-speech sound coupling (Froyen et al., 2008, 2009). A similar pattern, albeit with reduced sensitivity to letter-speech sound congruency and delayed with respect to age-matched peers, is also observed in children with dyslexia (Froyen et al., 2011; Žarić et al., 2014, 2015). A wider temporal integration window might be beneficial when resolving the conflict between the ambiguous sound and the disambiguating text, and may reflect how text-to-speech sound audiovisual learning mechanisms are still developing during the first few years of reading instruction.
Furthermore, developmental changes in sensitivity to text may follow an 'inverted U' trajectory, in which text is a more salient stimulus in the first few years of reading instruction and its salience decreases with increasing reading expertise (Maurer et al., 2008; Price and Devlin, 2011; Žarić et al., 2014, 2015; Fraga González, 2015). Because the children in our study fall within the age range of 'peak' text sensitivity, following the same children longitudinally may reveal interesting developmental trends in the text-based recalibration effect.

Another possibility that could explain the difference in results between adults and children with dyslexia is that inter-individual differences might be larger in adults. Thus, the adult dyslexic readers who do not show a text-based recalibration effect may have a more severe form of dyslexia and/or may have switched to different reading strategies that circumvent one-to-one mappings of letters and speech sounds. In contrast, reading is a daily activity for school-age children, who presumably rely predominantly on letter-sound decoding skills; this is especially true for the children with dyslexia in our study, who were at the initial phase of a dyslexia intervention focusing on these skills.

Our results also contrast with previous findings reporting reduced sensitivity to letter-speech sound (in)congruency in children and adults with dyslexia (Blau et al., 2009, 2010; Froyen et al., 2009, 2011; Žarić et al., 2014, 2015; Karipidis et al., 2017). A possible reason for these differences may lie in the paradigms employed. While the aforementioned studies used congruency manipulations and oddball paradigms to explore group differences between typical and dyslexic readers, we used a more implicit measure. Recalibration typically involves the disambiguation of ambiguous speech signals based on short-term perceptual (audiovisual) learning. It is possible that, at a purely behavioral level, the task is not sensitive enough to capture subtle group differences between children with dyslexia and typically reading children. Indeed, previous studies on audiovisual integration have revealed differences in underlying brain mechanisms using neuroimaging methods despite a lack of significant differences in behavioral measures (see Nash et al., 2016; Plewko et al., 2018). In future studies it will be important to further understand the specific role of task and stimulus characteristics, as well as of risk factors such as a family history of dyslexia (Raschle et al., 2012; Plewko et al., 2018), in yielding these audiovisual integration deficits. Moreover, an essential next step would be to combine our text-based recalibration paradigm with measurements of brain activity (e.g., Bonte et al., 2017) and investigate whether different or comparable neural mechanisms underlie the perceptual shifts in children with dyslexia and typically reading children.

Linear regression analyses of the magnitude of the recalibration effect revealed a significant association between recalibration and adaptation in dyslexic but not in typical readers: in dyslexic readers, stronger recalibration was associated with stronger adaptation effects. Furthermore, in the matched groups, the recalibration effect showed a tendency toward an association with pre-test slope across participants. This link reached statistical significance when the analyses were extended to the entire control group, suggesting a close link between the categorical perception of phonemes and short-term text-induced audiovisual learning, with sharper phoneme categorization linked to stronger recalibration effects. The findings of the matched groups were thus extended and complemented by those of the entire control sample. We would therefore speculate that this pattern of results would also replicate in a larger sample of both children with dyslexia and typically reading children.

Our study did not find support for the proposed differences in categorical perception of speech sounds between children with dyslexia and typically reading children. This finding is in line with previous research reporting a similar lack of group differences (Blomert and Mitterer, 2004; Snellings et al., 2010) or differences only in small subgroups of dyslexic readers (Manis et al., 1997; Joanisse et al., 2000), but not with studies reporting reduced categorical perception of phonemes in dyslexic readers (Boets et al., 2011; Baart et al., 2012). While no significant association between phoneme categorization and reading measures was observed in the matched groups, the association between phoneme categorization and recalibration did approach significance. This relationship was confirmed within the whole control group, which showed a significant link between the magnitude of the recalibration effect and the individual phoneme categorization slopes. Additionally, a significant link between reading accuracy and phoneme categorization emerged in the entire control group, corroborating previous findings indicating that the relation between speech perception and reading is mediated by children's phonological skills (McBride-Chang, 1996) and that speech perception and phonological awareness measured in preschoolers are significant predictors of first-grade reading accuracy (Boets et al., 2008). These findings warrant further investigation in a larger sample of dyslexic and typical readers.

Our data also revealed a small adaptation effect: the proportion of /aba/ responses to the most ambiguous sound in the adaptation task differed significantly between exposure conditions across dyslexic and typical readers. The main purpose of this task was to investigate potential response strategies and to ensure the reliability of the observed recalibration effect (van Linden and Vroomen, 2008). The finding that children showed a shift in the perceptual boundary of the ambiguous post-test sounds in the direction of the text in the recalibration but not the adaptation task reaffirms the robustness of the recalibration effect across groups. That is, if children had simply responded in line with the text seen during the exposure blocks in both tasks, there would be no significant difference in the proportion of /aba/ responses between recalibration and adaptation. The finding that the adaptation effect itself was only significant when both groups of children were pooled, and only for the most ambiguous sound, likely reflects the previously observed developmental trend in adaptation (Sussman and Carney, 1989; Sussman, 1993; van Linden and Vroomen, 2008). Another potential explanation for the lack of robust adaptation effects in our study may be the proposed more fragile nature of the effect. While recalibration effects can already be observed after a single exposure (Keetels et al., 2016), adaptation effects have been shown to develop over a longer time period, to require more exposure trials to emerge, and to be longer-lasting than recalibration effects (Vroomen et al., 2004, 2007).

## CONCLUSION

The present study investigated text-induced changes in the perception of ambiguous speech sounds in children by employing text-based recalibration. Our results indicate that both 8- to 10-year-old dyslexic and typical readers show significant text-induced shifts in their perception of ambiguous speech. This finding is likely rooted in the flexibility of the cortical systems for letter-speech sound integration, which have not yet been 'set in stone' at this age and are thus more flexible in terms of phonemic category perception. Furthermore, the magnitude of the recalibration effect was linked to the adaptation effect in children with dyslexia but not in typical readers. Extending these analyses to a larger sample of only typical readers revealed additional associations between recalibration and phoneme categorization, as well as between phoneme categorization and reading measures. Our findings highlight the importance of considering task demands and dynamic developmental changes in reading, speech perception and audiovisual learning when investigating group differences between typical and dyslexic readers. Future longitudinal research following the same children at different stages, using both behavioral and brain activity measures, is thus essential to understand the neurocognitive mechanisms underlying individual differences in acquired reading levels and dyslexia.

## AUTHOR CONTRIBUTIONS

fpsyg-10-00221 February 5, 2019 Time: 17:12 # 12

MB and LR designed the experiments. LR and RJ collected and analyzed the data. MB, LR, and RJ wrote the paper.

## FUNDING

This research was supported by The Netherlands Organization for Scientific Research (Vidi-Grant 452-16-004 to MB).

## REFERENCES


## ACKNOWLEDGMENTS

We would like to thank Giancarlo Valente and Fabian van den Berg for their advice on data analysis and all the children and parents for the time they took to participate in our research. Furthermore, we are grateful to the Regionaal Instituut voor Dyslexie (RID), and primary schools Kindcentrum Aloysius, Montessori Kindcentrum Maastricht, Basisschool Wyck, OBS de Spiegel, MBS de Poort, OBS de Regenboog for their support in acquiring participants.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Romanovska, Janssen and Bonte. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Early Brain Sensitivity to Word Frequency and Lexicality During Reading Aloud and Implicit Reading

*Luís Faísca<sup>1</sup>, Alexandra Reis<sup>1</sup> and Susana Araújo<sup>2</sup>\**

*<sup>1</sup> Department of Psychology and Educational Sciences and Centre for Biomedical Research (CBMR), University of Algarve, Faro, Portugal; <sup>2</sup> Faculdade de Psicologia, Universidade de Lisboa, Lisbon, Portugal*

The present study investigated the influence of lexical word properties on the early stages of visual word processing (<250 ms) and how the dynamics of lexical access interact with task-driven top-down processes. We compared the brain's electrical responses (event-related potentials, ERPs) of 39 proficient adult readers to manipulations of word frequency and lexicality during an explicit reading task versus a visual immediate-repetition detection task in which no linguistic intention is required. In general, we observed that left-lateralized processes linked to perceptual expertise for reading are task independent. Moreover, there was no hint of a word frequency effect in early ERPs, whereas there was a lexicality effect that was modulated by task demands: during implicit reading, we observed larger N1 negativity in the ERP to real words compared to pseudowords, but this modulation by stimulus type was absent in the explicit reading aloud task (where words yielded the same activation as pseudowords). Thus, the data indicate that the brain's response to the lexical properties of a word is open to influences from top-down processes according to the representations that are relevant for the task, and that this occurs from the earliest stages of visual recognition (within ~200 ms). We conjecture that the loci of these early top-down influences identified for implicit reading are probably restricted to lower levels of processing (such as whole-word orthography) rather than to the process of lexical access itself.

Keywords: N1 print tuning, early top-down modulation, reading aloud, implicit reading, word frequency, lexicality effects

## INTRODUCTION

People recognize written letters effortlessly and rapidly (<200 ms; Maurer and McCandliss, 2007) thanks to a universal, highly specialized network tuned to the recurrent properties of the orthographic code. This functional network comprises the left ventral occipitotemporal cortex and notably the visual word form area (VWFA; Cohen et al., 2002; McCandliss et al., 2003; Dehaene, 2010), whose responsivity to familiar letter strings (i.e., enhanced activation) originates from extensive experience with visual word forms. Event-related potential (ERP) studies have consistently identified the visual N1 (or N170) component as a neural correlate of fast visual specialization for print (e.g., Bentin et al., 1999; Maurer et al., 2005b, 2006), presumably linked to the VWFA (Brem et al., 2006, 2009).

#### *Edited by:*

*Manuel Perea, University of Valencia, Spain*

#### *Reviewed by:*

*Gorka Fraga González, University of Zurich, Switzerland; Urs Maurer, The Chinese University of Hong Kong, China*

#### *\*Correspondence:*

*Susana Araújo smaraujo@psicologia.ulisboa.pt*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 12 September 2018 Accepted: 28 March 2019 Published: 11 April 2019*

#### *Citation:*

*Faísca L, Reis A and Araújo S (2019) Early Brain Sensitivity to Word Frequency and Lexicality During Reading Aloud and Implicit Reading. Front. Psychol. 10:830. doi: 10.3389/fpsyg.2019.00830*


The N1 follows the P1 component and is indicated by an enhanced negative deflection around 150–200 ms after the presentation of printed letters versus symbol strings or false fonts. A selective functional response of the N1 emerges rapidly with literacy acquisition (Maurer et al., 2006, 2007; Eberhard-Moscicka et al., 2015) and, most impressively, even after a short grapheme-phoneme training in kindergarten (Brem et al., 2010), in neoliterate adults (Pegado et al., 2014), and in adults trained on a novel script (Maurer et al., 2010). It is related to word-reading fluency (Eberhard-Moscicka et al., 2015) and is reduced or absent in illiterates (illiterate vs. literate adults; Pegado et al., 2014) and poor readers (dyslexic vs. typically developing readers; e.g., Maurer et al., 2007; Araújo et al., 2012; Hasko et al., 2013). However, exactly which cognitive processes are involved in, and contribute to, the early neural tuning for words indexed by the N1 remains somewhat unclear; this is the focus of the present study.
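A common way to quantify the print-tuning effect described here is the mean ERP amplitude in the N1 window for letter strings minus that for symbol strings, with a more negative difference indicating stronger print sensitivity. The sketch below assumes a single-channel averaged waveform, a hypothetical 500 Hz sampling rate and a hypothetical epoch starting 100 ms before stimulus onset, and uses the 150–200 ms window mentioned above; electrode choice, baseline correction and the exact window vary between studies and are not specified here. Function names and waveforms are illustrative only.

```python
def mean_amplitude(erp, srate_hz, epoch_start_ms, win_start_ms, win_end_ms):
    """Mean voltage of one averaged ERP waveform within [win_start_ms, win_end_ms]."""
    step_ms = 1000.0 / srate_hz
    window = [v for i, v in enumerate(erp)
              if win_start_ms <= epoch_start_ms + i * step_ms <= win_end_ms]
    if not window:
        raise ValueError("analysis window lies outside the epoch")
    return sum(window) / len(window)


def n1_print_effect(erp_letters, erp_symbols, **timing):
    """Letters minus symbols; negative values mean a larger N1 for letters."""
    return mean_amplitude(erp_letters, **timing) - mean_amplitude(erp_symbols, **timing)


# Synthetic waveforms (microvolts), not real data: 400 samples at 500 Hz from -100 ms
timing = dict(srate_hz=500, epoch_start_ms=-100, win_start_ms=150, win_end_ms=200)
times = [-100 + i * 2 for i in range(400)]
letters = [-3.0 if 150 <= t <= 200 else 0.0 for t in times]
symbols = [-1.0] * 400
print(n1_print_effect(letters, symbols, **timing))
```

Averaging over a window rather than taking a single peak value makes the measure less sensitive to noise in individual waveforms, at the cost of depending on where the window is placed.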

Developmentally, a "coarse neural tuning" for print is established early in the course of learning to read (after only 1 year of reading instruction; e.g., Zhao et al., 2014; Eberhard-Moscicka et al., 2015), as indexed by the N1 difference in the ERPs between letter strings and visually similar nonletters (e.g., A vs. Ϫ). Although this N1 activation reflects a low-level specialization for visual aspects of print, it is linguistically modulated and hence tends to be left-lateralized in expert readers (Bentin et al., 1999; Pegado et al., 2014). This occurs because constant print-to-speech pairing during literacy acquisition establishes interconnections between left-hemisphere regions associated with phonological processing and occipito-temporal regions related to visual recognition of print (Maurer and McCandliss, 2007). The developmental trajectory of the enhanced sensitivity to visual words follows an inverted U-curve, with an initial increase and a subsequent decrease with age (Maurer et al., 2006; Brem et al., 2009). This change over time is probably due to acquired efficiency and full specialization involving more selective brain processes. For instance, a "fine-tuned" N1 for words emerges as reading acquisition progresses, strongly allied to reading ability, and the N1 then becomes responsive to familiar orthographic patterns within words (e.g., BSNEO vs. BESNO for Portuguese; Hauk et al., 2006; Zhao et al., 2014; Araújo et al., 2015).

An open question is whether the N1 merely reflects an automatic, bottom-up response to surface form features (e.g., the visual word form) or whether it is already sensitive to the activation of specific representations *within* the word recognition system. To address this, several studies have compared the brain's neurophysiological response along two psycholinguistic dimensions known to influence lexical dynamics: word frequency and word lexicality. Yet results have been mixed. Some studies found larger N1 negativity in the ERP to low-frequency compared to high-frequency words in adults (frequency effect; Sereno et al., 1998, 2003; Assadollahi and Pulvermüller, 2003; Hauk and Pulvermüller, 2004), reflecting the difficulty of accessing the lexical representations of low-frequency words, whereas others found no reliable effects in children (Araújo et al., 2012). Concerning lexical status, pseudowords have been shown to elicit stronger brain responses than words already in an early time window in adults (lexicality effect; Hauk et al., 2009, 2012) and adolescents (Taroyan and Nicolson, 2009). But again, lexicality effects on the N1 have not been reliably found in children (Kast et al., 2010; Araújo et al., 2012; Hasko et al., 2013; Eberhard-Moscicka et al., 2015). Altogether, these results seem to suggest that N1 sensitivity to word frequency and lexicality depends on the phase of reading development, as well as on reading expertise (Araújo et al., 2015; Eberhard-Moscicka et al., 2015, 2016). However, in other studies, neither adults nor children processed pseudowords differently from words in the N1 component (Maurer et al., 2005b), or adults did not exhibit an N1 specialization for words over pseudowords, in contrast to children, who showed larger amplitudes for words (Maurer et al., 2006). It is possible that, beyond developmental aspects, factors such as reading strategies and task characteristics contribute to, or conversely mask, differences in N1 sensitivity.

Previous studies have used different kinds of stimuli (real words of high and low frequency, pseudowords), but whether these stimuli trigger different reading strategies cannot be established based on such general stimulus categories. This is important given that differences in reading strategies (from letter-by-letter decoding to fluent whole-word reading), observed during the process of learning to read (Yoncheva et al., 2010; Ben-Shachar et al., 2011) or at different levels of proficiency (Zhao et al., 2014), potentially shape the N1 specialization for words. For example, when learning a new script, using grapheme-to-phoneme conversion for reading induces a more left-lateralized negativity in the N1 window relative to whole-word recognition (Yoncheva et al., 2010)<sup>1</sup>. Ben-Shachar et al. (2011) also provided longitudinal evidence (7- to 15-year-old children) that changes in BOLD signals in the left occipito-temporal sulcus, in the vicinity of the VWFA, correlate with changes in sight word efficiency (number of frequent words read in 45 s) but not with raw scores in phonemic decoding efficiency (pseudoword reading). Perhaps, however, when reading becomes highly automated, as in proficient adult readers, print tuning is no longer modulated by reading strategies (cf. Maurer et al., 2010). The present study followed up on this idea, aiming to test adults' N1 sensitivity to lexical word properties (word frequency and lexicality) within a paradigm in which the design and stimulus material were carefully selected to elicit the presumed use of different reading strategies, either whole-word recognition or piecewise grapheme-to-phoneme conversion (see below).

A related question is whether and how the linguistic intention of the reader (given the task goals) affects N1 sensitivity to the lexico-semantic properties of a written word. To date, mainly implicit word-processing tasks have been used to study early visual processing, such as repetition detection (e.g., Maurer et al., 2006; Eberhard-Moscicka et al., 2015, 2016), lexical decision (a general measure of "wordlikeness", e.g., Kast et al., 2010; Mahé et al., 2012), or other variants of implicit reading (Araújo et al., 2012, 2015). However, using these implicit tasks as a proxy for real-life reading may not be straightforward: in these tasks, participants have no conscious intention to engage in linguistic processing, and the focus is presumably on the visual word form rather than on grapheme-to-phoneme conversion. Moreover, although implicit reading is usually effective in activating the reading network (e.g., Ben-Shachar et al., 2011), different electrophysiological patterns emerge just after the low-level visual analysis when print stimuli are processed during implicit versus explicit reading tasks (with silent reading: Chen et al., 2013; with reading aloud: Mahé et al., 2015). This is (at least partly) expected given demonstrations that even automatic/unconscious perception of stimuli can be modulated by context (e.g., the Stroop effect, Besner et al., 1997; masked priming N400 effects, Kiefer and Martens, 2010).

<sup>1</sup> It is worth noting that Yoncheva et al. (2010) refrained from relating their findings to the debate regarding dual reading routes (Coltheart et al., 2001); instead, their focus was on the importance of explicitly directing readers' attention to small sublexical phonological units versus large (whole-word) units of representation in early reading acquisition.

Therefore, in recent years, a few electrophysiological studies have explored how task demands (e.g., tasks requiring grapheme-phoneme decoding versus simple visual recognition) affect the processing of surface features (e.g., word form; Wang and Maurer, 2017; Sánchez-Vincitore et al., 2018) and of the lexico-semantic properties of a word (Chen et al., 2015; Mahé et al., 2015; Strijkers et al., 2015) at the earliest latencies. For instance, Strijkers et al. (2015) observed an effect of word frequency as early as 120 ms after stimulus onset when readers consciously retrieved the meaning of the words (semantic categorization), but not until about 100 ms later (at around 220 ms) when participants categorized the font color of the same words (ink color categorization, where no linguistic processing is necessary). More recently, Wang and Maurer (2017) extended these findings by showing that task demands influence coarse neural tuning for print in the (late part of the) N1: the letter-symbol difference was more pronounced in delayed naming and color detection than in repetition detection. Taken together, these findings suggest that, although word recognition processes are largely automatic, visual-orthographic processing is flexible and penetrable to top-down influences very early on (N1 time window). However, little attention has been paid to how these findings extend to the intentional and conscious skill of reading, a more ecological task.

Only a few studies have used explicit reading tasks, mainly to evaluate coarse neural tuning for print (Yoncheva et al., 2010; Chen et al., 2013; Sánchez-Vincitore et al., 2018; but see also Chen et al., 2015 and Mahé et al., 2015). For example, a recent study reported stronger sensitivity to word frequency in a lexical decision task than in a silent reading task, reflected by enhanced activation of the ventral occipito-temporal cortex around 160 ms (Chen et al., 2015; but see Mahé et al., 2015). This result suggests that top-down modulation already affects information retrieval in visual word recognition, not only decision processes.

The present study thus aimed to further investigate (1) the influence of lexical word properties on the very early stages (< 250 ms) of visual processing of written words, and (2) whether the earliest modulation by lexico-semantic information retrieval (if any) interacts with task demands (i.e., the type of processing strategy required by the task: either grapheme-phoneme decoding for subsequent production or simple visual recognition for immediate-repetition detection). For (1), we manipulated the word-form frequency (high vs. low) and the lexical status (real words vs. pseudowords) of the written stimuli, all of which were well matched on important sublexical properties. Critically, we took this manipulation a step further by ensuring that the stimuli encouraged either alphabetic decoding or whole-word recognition. To this end, stimuli were selected after being tested in an independent reading task with eye-movement recordings, on the assumption that the reader's spatial and temporal approach to a word provides a proxy for the reading strategy used (Hawelka et al., 2010; Schattka et al., 2010; see Materials and Methods). In addition, we used a blocked list design in order to amplify early differences tied to reading strategies. The block-wise design plausibly favors lexical processing for words and grapheme-phoneme conversion, a smaller unit of phonological recoding, for pseudowords (Kinoshita et al., 2004; Pagliuca et al., 2007; Lima and Castro, 2010). For (2), we compared the brain's response to print in a task where conscious linguistic processing is not mandatory (a one-back task, as a measure of implicit reading) with a more ecological task (delayed reading aloud<sup>2</sup>) that required explicit reading while minimizing effects related to visual short-term memory or to task-dependent decision or verification processes; the same participants and material were tested in both tasks.
We argue that the most convincing evidence about specific word recognition processes will come from studies with complementary designs; this was the motivation for our study. Typically developing adult readers have already reached automaticity in reading; we therefore expected a predominantly left-lateralized N1 for all stimuli (words and pseudowords), irrespective of the task. Moreover, if lexical access during word recognition is instantiated automatically in adult readers, lexical effects should begin around the N1 time window. Any interaction with task at these latencies would provide evidence for top-down task modulation of the early retrieval of specific psycholinguistic information.

## MATERIALS AND METHODS

#### Participants

Thirty-nine adults (27 females) aged between 17 and 32 years (mean age [±SD] = 21.7 [±3.1] years) participated in this study. All were undergraduate students and native speakers of Portuguese, and none reported neurological or psychiatric disorders or a history of reading and/or spelling problems (Portuguese adaptation of the Adult Reading History Questionnaire; Alves and Castro, 2004).

<sup>2</sup> As we become competent readers, silent reading likely becomes the preferred reading mode. Furthermore, silent and oral reading do not necessarily rely on the same underlying processes and strategies (e.g., Krieber et al., 2017). That said, reading aloud tasks (as used in our study) can still provide a good index of the processes occurring during "reading," while minimizing effects related to visual short-term memory or to task-dependent decision or verification processes.

Additional inclusion criteria for all participants were a nonverbal IQ in the normal range (>85; Wechsler Adult Intelligence Scale, WAIS-III) and an adequate reading level, as determined by a reading decoding and comprehension test for dyslexia screening (Lobrot L3 > 25th percentile; 1-min time limit; five-alternative forced choice of the word that completes a sentence; 36 sentences in total; Portuguese adaptation for adults: Fernandes et al., 2017). In addition, the reading aloud fluency test of the Differential Diagnosis Dyslexia Battery (3-DM, Portuguese version: Pacheco et al., 2014) was administered. This test comprises three lists: high-frequency words, low-frequency words, and pseudowords. Performance is computed as the number of stimuli read correctly per list in 30 s (real-word reading composite: *M* = 2.0 items/s, *SD* = 0.28; pseudoword reading: *M* = 1.5 items/s, *SD* = 0.22). Data from ten additional participants were excluded because of poor reading level (three participants) or excessive movement and eye-blink artifacts or other technical problems during EEG recording (seven participants). All participants gave written informed consent and were paid for their participation.

#### Stimuli Material

The same material was used for both the one-back task and the reading aloud task. A total of 100 words (50 high-frequency words [HFW] and 50 low-frequency words [LFW]) were selected according to their word-form frequency (occurrences per million, *M* = 125.1 vs. 0.7 for high- vs. low-frequency words; P-PAL database; Soares et al., 2018). Fifty orthographically legal and pronounceable pseudowords (PW) were also created by exchanging at least two letters in the set of real words. Words and pseudowords were four to nine letters long, and all three conditions were matched (*F* tests, all *p*s > 0.2) on orthographic and phonological length, bigram frequency, and orthographic neighborhood density.

Importantly, the current study controlled, for the first time, for the reading strategies elicited by the different stimuli by means of eye-movement recording. That is, all stimuli (high- and low-frequency words and pseudowords) were selected after being tested in an independent reading task with 40 undergraduate students while their eye movements were recorded (SMI Hi-Speed eye-tracking system, 1,250 Hz; see Silva et al., 2016, for a detailed description of the paradigm). In this task, words were arranged in six sets of matrices corresponding to the orthogonal manipulation of familiarity (high- and low-frequency words and pseudowords) and word length (short, long); each matrix comprised 12 to 15 items arranged in a 3 × 4/5 layout, and five matrices were presented for each set (in total, 80 × 3 experimental stimuli plus fillers). Participants were instructed to read the words from left to right and top to bottom, and their spoken responses and eye movements were recorded. Eye-movement data provide a good indication of online cognitive processing during reading, such as the ease or difficulty of visual word recognition (Rayner, 1998), and may reveal the reader's processing strategy, either sublexical or lexical. For example, the well-documented word length effect for unfamiliar words is an important marker of sublexical strategies; it manifests in RTs and, notably, also in eye-tracking parameters (i.e., longer gaze durations and more fixations for long than for short items; e.g., Hawelka et al., 2010). The assumption here was therefore that prolonged gaze durations and higher fixation counts for an item reflect sublexical, decoding-based processes.
In contrast, single fixations and shorter gaze durations, expected for the easiest items (i.e., familiar visual words), suggest lexical reading *via* direct orthographic whole-word recognition (Hawelka et al., 2010; Schattka et al., 2010; Ablinger et al., 2014). For the present study, the selected PW received significantly more fixations and longer gaze durations (*M* ± *SD* = 3.26 ± 1.02 fixations and 877 ± 253 ms) than the selected LFW (2.48 ± 0.52 fixations and 610 ± 130 ms) and HFW (1.79 ± 0.31 fixations and 422 ± 66 ms), with stimulus length controlled; all differences between conditions were highly significant (*p* < 0.001). We thus assumed that participants relied on different reading strategies when processing these different types of stimuli. Moreover, because stimulus conditions were presented in separate blocks (see below), it is likely that the word-only lists biased processing toward lexical reading, while the pseudoword-only list elicited stronger reliance on smaller units of phonological recoding (e.g., Pagliuca et al., 2007; Lima and Castro, 2010).
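The two eye-movement markers used for stimulus selection can be made concrete with a small sketch. The Python function below computes a simplified first-pass gaze duration and fixation count for one item from a sequence of fixations. The fixation format and the function name are hypothetical, the study does not state whether its fixation counts were first-pass or total counts, and this is not the authors' analysis code.

```python
from typing import List, Tuple

# Hypothetical fixation record: (index of the fixated word, fixation duration in ms),
# listed in temporal order. This is an illustrative format, not an eye-tracker export.
Fixation = Tuple[int, float]


def first_pass_measures(fixations: List[Fixation], word: int) -> Tuple[float, int]:
    """Return (gaze duration in ms, fixation count) for the first pass on `word`.

    Simplified definition: the first pass is the first run of consecutive
    fixations on `word`; it ends as soon as a fixation lands on another word.
    Later refixations (e.g., after a regression) are not counted.
    """
    gaze_ms, count = 0.0, 0
    in_first_pass = False
    for idx, duration in fixations:
        if idx == word:
            in_first_pass = True
            gaze_ms += duration
            count += 1
        elif in_first_pass:
            break  # the eyes left the word, so the first pass is over
    return gaze_ms, count


# Example: word 1 is fixated three times in a row, then refixated after word 2.
fixations = [(0, 210.0), (1, 180.0), (1, 240.0), (1, 150.0), (2, 200.0), (1, 300.0)]
print(first_pass_measures(fixations, 1))  # (570.0, 3)
```

Under the assumptions stated in the text, a high count with a long gaze duration would point toward sublexical decoding, whereas a single short fixation would point toward whole-word recognition.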

#### Experimental Procedures

Each task was split into three blocks (HFW, LFW, and PW) presented in pseudorandom order, with specific instructions and brief training (eight practice trials) before each block (**Figure 1**). Block order was counterbalanced across participants. For the *explicit reading task*, we used a delayed reading aloud format to prevent the recordings from being contaminated by speech-related artifacts. This allowed ERPs to be calculated for each stimulus on its initial presentation without interference from any overt response, while the accuracy of the delayed responses ensured that participants were engaged in the task. Each trial began with a fixation cross (500 ms), which was replaced by a blank screen (100 ms), followed by the stimulus for 800 ms. Participants were then cued with question marks "???" (1,500 ms) to read aloud the preceding (pseudo)word. The next trial began after an intertrial interval of 1,500 ms (including a period for the participants to blink). Participants were asked to attend to the words and pseudowords displayed but to read them aloud only when they saw the question marks.
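The trial structure of the delayed reading aloud task can be written out as a simple event schedule. The Python sketch below computes each event's onset from the durations given above and checks the design rationale: the −125 to 700 ms ERP epoch (see EEG Recording and Analysis) closes before the "???" cue appears. The event labels and helper name are illustrative only; the experiment itself was run in Presentation, not in Python.

```python
# Event durations (ms) of one delayed reading-aloud trial, as reported in the text.
READING_ALOUD_TRIAL = [
    ("fixation cross", 500),
    ("blank", 100),
    ("stimulus", 800),
    ("??? cue", 1500),
    ("intertrial interval", 1500),
]

EPOCH_END_MS = 700  # ERP epochs end 700 ms after stimulus onset


def event_schedule(events):
    """Return (name, onset_ms, offset_ms) for each event, relative to trial start."""
    schedule, t = [], 0
    for name, duration in events:
        schedule.append((name, t, t + duration))
        t += duration
    return schedule


schedule = event_schedule(READING_ALOUD_TRIAL)
timing = {name: (on, off) for name, on, off in schedule}
stim_on = timing["stimulus"][0]
cue_on = timing["??? cue"][0]
print(stim_on, cue_on, schedule[-1][2])  # 600 1400 4400
# The epoch window closes before the naming cue appears, so no speech falls inside it.
print(stim_on + EPOCH_END_MS < cue_on)  # True
```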

For the *implicit reading task*, we used a one-back task, which is commonly used in EEG research on early visual word recognition. Participants were asked to watch sequences of words and pseudowords and to press a button whenever an immediate repetition occurred (on 17% of trials); they were not required to consciously read the stimuli. Each trial followed the same sequence: first, a fixation cross (500 ms) was displayed, which was then replaced by a blank screen (100 ms); then the stimulus appeared for 800 ms. Again, the next trial began after an intertrial interval of 1,500 ms. In both tasks, all (pseudo)words were displayed in lowercase black Arial font on a white background, at eye level in the center of the screen, and subtended 2.2° to 3.8° of visual angle.
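One way to build a one-back block with roughly 17% repetition trials is sketched below in Python. The numbers are an assumption for illustration: with 50 unique items per block, inserting 10 immediate repetitions yields 10/60 ≈ 16.7% target trials. The helper name and the random placement scheme are hypothetical; the text does not describe how repetitions were distributed.

```python
import random


def one_back_sequence(items, n_repeats, seed=None):
    """Return a shuffled copy of `items` with `n_repeats` immediate repetitions inserted.

    Hypothetical helper: the study reports only that ~17% of trials were
    immediate repetitions, not how many items were repeated or where.
    Assumes the items are unique, so the only adjacent duplicates are targets.
    """
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)
    repeated = set(rng.sample(range(len(order)), n_repeats))
    sequence = []
    for position, item in enumerate(order):
        sequence.append(item)
        if position in repeated:
            sequence.append(item)  # one-back target: same item twice in a row
    return sequence


items = [f"item{i}" for i in range(50)]  # placeholder strings, not the real stimuli
seq = one_back_sequence(items, n_repeats=10, seed=1)
targets = sum(seq[k] == seq[k - 1] for k in range(1, len(seq)))
print(len(seq), targets, round(targets / len(seq), 3))  # 60 10 0.167
```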

Participants were tested individually in a soundproof room, seated ~100 cm in front of a computer screen, and were instructed to remain still and relaxed. Presentation software (version 11; https://www.neurobs.com/) was used to display the stimuli and to record participants' responses in the one-back task. Spoken responses in the reading aloud task were digitally recorded for later accuracy scoring.
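The reported visual angles translate into physical stimulus widths through the standard relation w = 2·d·tan(θ/2), where d is the viewing distance and θ the visual angle. The Python sketch below applies it assuming d = 100 cm exactly; because the reported distance is approximate, the resulting widths (about 3.8 to 6.6 cm) are estimates, not values reported by the authors.

```python
import math


def size_for_angle(angle_deg, distance_cm):
    """Physical extent (cm) that subtends `angle_deg` at `distance_cm`."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def angle_for_size(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an extent of `size_cm` at `distance_cm`."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# Assumes a viewing distance of exactly 100 cm (the text reports ~100 cm).
for angle in (2.2, 3.8):
    print(angle, round(size_for_angle(angle, 100), 2))
# 2.2 3.84
# 3.8 6.63
```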

All participants completed both tasks<sup>3</sup> in counterbalanced order. Preliminary analyses with task order as a factor yielded no main effects or interactions, so task order was collapsed for the reported analyses.

#### EEG Recording and Analysis

The electroencephalogram (EEG) was recorded continuously using a BioSemi ActiveTwo amplifier (DC–67 Hz bandpass, 3 dB/octave, 24-bit resolution, 512 Hz sampling rate) from 64 Ag/AgCl scalp electrodes mounted in an elastic cap according to the International 10–20 system. The montage included 10 midline sites and 27 sites over each hemisphere (**Figure 2**). Additional electrodes served as ground and online reference (CMS/DRL near Pz; for a complete description, see biosemi.com) and recorded the electrooculogram (EOG; placed below the right eye).

The EEG data were analyzed using the open-source FieldTrip toolbox (Oostenveld et al., 2011). The continuous data were epoched from −125 ms to 700 ms relative to stimulus onset, time-locked to the onset of the target stimuli. Offline, the EEG data were low-pass filtered at 30 Hz and re-referenced to the average reference (eye electrodes were excluded from the computation of the common reference), and a baseline correction was applied by subtracting the mean pre-stimulus voltage from the entire waveform. Bipolar vertical EOG was computed from Fp2 and the electrode below the right eye, and bipolar horizontal EOG from F7 and F8. Before averaging, each participant's epochs were visually inspected, and those containing blinks, horizontal eye movements, muscle activity, or other artifacts were manually removed from the analysis. Artifact rejection was performed visually on a trial-by-trial basis for eye blinks and on a channel-by-channel basis for drift, blocking, and excessive alpha activity; the rejection procedure was blind to participant and condition. A minimum of 30 trials per condition and participant were included in the final analyses. ERP data were analyzed by computing the mean amplitude of the waveforms during specific time windows, relative to the −125 to 0 ms pre-stimulus baseline interval.

<sup>3</sup> We acknowledge that our experimental design resulted in differences between tasks: participants were required to respond on every trial in delayed reading but only on a limited number of trials in one-back detection. Nonetheless, in both tasks, ERPs were collected for each stimulus during a period in which participants were preparing to respond in any case (i.e., in advance of the repetition decision or of overt naming).
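The epoching, average-reference, and baseline steps described above can be sketched with NumPy. This is only an illustration of the operations in the text: the authors used FieldTrip, the function names here are hypothetical, and the 30 Hz low-pass filter and artifact rejection are omitted to keep the example minimal. At 512 Hz, the −125 ms epoch start falls exactly 64 samples before stimulus onset.

```python
import numpy as np

FS = 512  # sampling rate in Hz, as reported for the recording


def epoch(data, onsets, tmin=-0.125, tmax=0.700, fs=FS):
    """Cut continuous data (channels x samples) into epochs (trials x channels x samples).

    `onsets` are stimulus-onset sample indices; returns the epochs and their times in s.
    """
    data = np.asarray(data, dtype=float)
    offsets = np.arange(round(tmin * fs), round(tmax * fs) + 1)
    epochs = np.stack([data[:, onset + offsets] for onset in onsets])
    return epochs, offsets / fs


def average_reference(epochs, scalp_idx):
    """Re-reference scalp channels to their common average; other channels (e.g., EOG) are left as is."""
    out = epochs.copy()
    out[:, scalp_idx, :] -= epochs[:, scalp_idx, :].mean(axis=1, keepdims=True)
    return out


def baseline_correct(epochs, times):
    """Subtract each trial's and channel's mean pre-stimulus voltage (times < 0)."""
    return epochs - epochs[..., times < 0].mean(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 2000))  # synthetic: channels 0-2 "scalp", channel 3 "EOG"
ep, times = epoch(raw, onsets=[300, 900])
clean = baseline_correct(average_reference(ep, [0, 1, 2]), times)
print(ep.shape, times[0])  # (2, 4, 423) -0.125
```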

All artifact-free trials were first averaged within each experimental condition for each channel, time-locked to target onset and after baseline correction. To limit the number of statistical comparisons, a region-of-interest (ROI) approach (i.e., data averaged over a subset of electrodes selected *a priori* on the basis of theoretical considerations and visual inspection) was then used to compute a grand average over all participants for each condition and time window of interest.
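The ROI approach reduces each participant's condition average to one value per hemisphere and time window. A NumPy sketch follows; the ROI channel lists and window limits are those given in the text for the P1 and N1 analyses, while the function name and data layout are assumptions, and the study's values were computed in FieldTrip rather than with this code.

```python
import numpy as np

# Parieto-occipital ROIs from the text (odd-numbered = left, even-numbered = right in the 10-20 system).
LEFT_ROI = ["P7", "P9", "PO7", "PO3", "O1"]
RIGHT_ROI = ["P8", "P10", "PO8", "PO4", "O2"]
WINDOWS = {"P1": (0.090, 0.120), "N1": (0.160, 0.220)}  # seconds


def roi_mean_amplitude(erp, times, channel_names, roi, window):
    """Mean voltage of a condition-averaged ERP (channels x samples) over ROI channels and a window."""
    rows = [channel_names.index(name) for name in roi]
    cols = (times >= window[0]) & (times <= window[1])
    return erp[np.ix_(rows, cols)].mean()


times = np.arange(-64, 359) / 512  # -125 to ~700 ms at 512 Hz
names = LEFT_ROI + RIGHT_ROI
erp = np.zeros((len(names), times.size))
erp[:5] = -3.0  # synthetic left-hemisphere negativity
erp[5:] = -1.0
left = roi_mean_amplitude(erp, times, names, LEFT_ROI, WINDOWS["N1"])
right = roi_mean_amplitude(erp, times, names, RIGHT_ROI, WINDOWS["N1"])
print(left, right)  # -3.0 -1.0
```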

To investigate fine-tuning effects in early visual processing, we analyzed the brain's sensitivity to word-form frequency and lexicality in two time windows: 90 to 120 ms (P1 component), because prior studies have identified this component as the earliest index of specialized orthographic processing (e.g., Maurer et al., 2005a, 2006), and 160 to 220 ms (N1 component). Mean amplitudes for the Word frequency effect (high- vs. low-frequency words) and the Lexicality effect (high- or low-frequency words vs. pseudowords) at a set of representative sites (P7/P8, P9/P10, PO7/PO8, PO3/PO4, O1/O2) were subjected to omnibus repeated measures ANOVAs with the factors Task (implicit vs. explicit reading), Stimulus type (HFW vs. LFW; HFW vs. PW; LFW vs. PW), and Hemisphere (right vs. left parieto-occipital sites). Whenever two- or three-way interactions involving Task were significant, we tested each contrast of interest separately for each task.
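Because every factor in these ANOVAs has two levels and all are within-subject, each main effect or interaction has one numerator degree of freedom, and its F equals the squared one-sample t statistic of a per-participant contrast score (sphericity is automatically satisfied). The NumPy sketch below illustrates this textbook equivalence for a Task × Stimulus type × Hemisphere design; the function name and data layout are assumptions, and it is not the software the authors used.

```python
import numpy as np

AVERAGE = np.array([0.5, 0.5])  # weights that average over a factor
DIFFERENCE = np.array([1.0, -1.0])  # weights that contrast its two levels


def within_effect_F(cells, effect):
    """F statistic and df for a 1-df effect in a 2x2x2 within-subject design.

    cells: array (n_participants, 2, 2, 2) ordered as Task x Stimulus x Hemisphere.
    effect: indices of the factors in the effect, e.g. (1,) for the Stimulus main
    effect or (0, 1, 2) for the three-way interaction.
    """
    weights = [DIFFERENCE if f in effect else AVERAGE for f in range(3)]
    scores = np.einsum("nijk,i,j,k->n", cells, *weights)  # one contrast score per participant
    n = scores.size
    t = scores.mean() / (scores.std(ddof=1) / np.sqrt(n))
    return t ** 2, (1, n - 1)


rng = np.random.default_rng(1)
cells = rng.normal(size=(39, 2, 2, 2))  # synthetic mean amplitudes for 39 participants
F, df = within_effect_F(cells, (0, 1, 2))
print(df)  # (1, 38)
```

With 39 participants this yields F(1, 38), matching the degrees of freedom reported in the Results.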

As a complementary approach, we analyzed our main component of interest (the early N1) in a peak-centered window, using the mean amplitude over a ±30 ms interval around the maximum peak (determined per participant for each condition and each channel cluster of interest).
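The peak-centered measure can be sketched as follows: find the most negative point of a participant's ROI waveform, then average ±30 ms around it. The text does not state the range in which the peak was searched, so the default search window of 160 to 220 ms below is an assumption, as is the function name; the synthetic waveform only serves to show that a larger deflection outside the search range is ignored.

```python
import numpy as np


def peak_window_mean(waveform, times, search=(0.160, 0.220), half_width=0.030, polarity=-1):
    """Mean amplitude within +/- half_width (s) around a peak found inside `search`.

    waveform: ROI-averaged ERP (1-D) for one participant and condition.
    polarity=-1 picks the most negative point, as suited to the N1.
    The search range is an assumption; the text does not state it.
    """
    candidates = np.flatnonzero((times >= search[0]) & (times <= search[1]))
    peak = candidates[np.argmax(polarity * waveform[candidates])]
    peak_time = times[peak]
    window = (times >= peak_time - half_width) & (times <= peak_time + half_width)
    return waveform[window].mean(), peak_time


times = np.arange(-64, 359) / 512
erp = -np.exp(-((times - 0.19) / 0.02) ** 2)  # synthetic N1-like dip near 190 ms
erp[np.argmin(np.abs(times - 0.10))] = -10.0  # larger dip outside the search range
amplitude, peak_time = peak_window_mean(erp, times)
print(round(peak_time * 1000))  # 189
```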

## RESULTS

#### Behavioral Results

To assess differences in difficulty between the explicit and implicit reading tasks, we ran a repeated measures ANOVA on error percentages with Task (implicit vs. explicit reading) and Stimulus (HFW, LFW, and PW) as within-subject factors. Accuracy was close to ceiling in both tasks, although slightly higher in the explicit reading task (implicit reading task: *M* ± *SD* = 93.4% ± 7.8%; explicit reading task: *M* ± *SD* = 97.0% ± 2.1%; *F*(1, 38) = 7.2, *p* = 0.011, *partial-η*<sup>2</sup> = 0.16). Given this high accuracy, any differences in evoked brain responses between the tasks are unlikely to reflect poor task performance or difficulties in understanding the task. A significant Task × Stimulus interaction indicated that accuracy differences between stimulus types were not the same in both tasks, *F*(2, 76) = 7.6, *p* < 0.001, *partial-η*<sup>2</sup> = 0.17: whereas error rates were similar for the three stimulus types in the implicit reading task (*p* = 0.935), in the explicit reading task HFW were named correctly more often (*M* = 99.9%) than LFW (*M* = 97.6%), and both more often than PW (*M* = 93.5%), *F*(1.4, 51.4) = 62.5, *p* < 0.001, *partial-η*<sup>2</sup> = 0.62, with Greenhouse–Geisser correction for sphericity.
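The effect sizes reported throughout can be checked from the F statistics with the standard identity partial η² = (F · df_effect) / (F · df_effect + df_error). A minimal Python sketch (the function name is illustrative) reproducing two of the values above:

```python
def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (F * df_effect) / (F * df_effect + df_error)


# Values from the behavioral analysis above.
print(round(partial_eta_squared(7.2, 1, 38), 2))  # 0.16 (Task main effect)
print(round(partial_eta_squared(7.6, 2, 76), 2))  # 0.17 (Task x Stimulus interaction)
```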

#### Electrophysiological Results

#### Sensitivity to Word Form Frequency

To test the P1–N1 sensitivity to word-form frequency, we contrasted ERPs to letter strings that differ mainly in frequency of occurrence. An overall analysis was conducted with Task (implicit vs. explicit reading), Stimulus type (HFW vs. LFW), and Hemisphere (right vs. left parieto-occipital sites) as within-subject factors.

*P1 (90–120 ms):* Only the main effect of Hemisphere was significant in the P1 window, *F*(1, 38) = 13.3, *p* < 0.001, *partial-η*<sup>2</sup> = 0.26: at posterior sites, the P1 elicited by high- and low-frequency words was more positive over the right than over the left hemisphere. There were no reliable effects of Stimulus (*p* = 0.215) or Task (*p* = 0.908) (all interactions involving these factors, *p's* > 0.4).

*N1 (160–220 ms):* In the 2 (Task) × 2 (Word form frequency) × 2 (Hemisphere) omnibus ANOVA on N1 mean amplitude, the three-way Stimulus × Task × Hemisphere interaction showed a trend, *F*(1, 38) = 3.9, *p* = 0.055, *partial-η*<sup>2</sup> = 0.09. Planned comparisons were therefore performed for each task separately. The main effect of Hemisphere was robust for both implicit reading, *F*(1, 38) = 8.6, *p* = 0.006, *partial-η*<sup>2</sup> = 0.18, and explicit reading, *F*(1, 38) = 9.5, *p* = 0.004, *partial-η*<sup>2</sup> = 0.20. As expected, ERPs were more negative over left than over right parieto-occipital sites. The effect of Word frequency did not reach significance (explicit reading: *F*(1, 38) = 2.7, *p* = 0.107, *partial-η*<sup>2</sup> = 0.07; implicit reading: *p* = 0.676), indicating no significant difference in the processing of high- and low-frequency words in either task. Nor did the Word frequency × Hemisphere interaction reach significance (both tasks, *p's* > 0.2).

The same analysis was repeated using the window centered on the N1 peak. Again, only the main effect of Hemisphere was significant, *F*(1, 38) = 10.1, *p* = 0.003, *partial-η*<sup>2</sup> = 0.21. The main effect of Word frequency and the Word frequency × Task interaction remained nonsignificant (all *p's* > 0.4).

However, visual inspection of **Figure 3** suggested an effect of word frequency at later processing stages, around 300 ms, possibly beginning already during the N1. Indeed, in a later time window just after the N1 (220–340 ms after stimulus onset), word frequency did affect brain responses, *F*(1, 38) = 6.9, *p* = 0.012, *partial-η*<sup>2</sup> = 0.15, with high-frequency words yielding larger amplitudes than low-frequency words (main effect of Hemisphere, *F*(1, 38) = 4.7, *p* = 0.036, *partial-η*<sup>2</sup> = 0.11, indicating larger negativity at left posterior sites). No main effect of Task or interactions of interest were observed in this later window (all *p's* > 0.3).

#### Sensitivity to Lexicality

To investigate early effects of whole-word processing (sensitivity to lexicality), we contrasted brain activation to real words and pseudowords. We ran two separate ANOVAs, one contrasting HFW vs. PW and the other LFW vs. PW. In both analyses, the factors were Task (implicit vs. explicit reading), Stimulus type (HFW vs. PW or LFW vs. PW), and Hemisphere (right vs. left parieto-occipital sites).

*P1 (90–120 ms):* The Stimulus type × Hemisphere interaction was significant in both ANOVAs (HFW vs. PW: *F*(1, 38) = 5.7, *p* = 0.022, *partial-η*<sup>2</sup> = 0.13; LFW vs. PW: *F*(1, 38) = 6.0, *p* = 0.019, *partial-η*<sup>2</sup> = 0.14). Post hoc pairwise comparisons showed that over the right hemisphere, ERPs elicited by high- and low-frequency words were more positive than those elicited by pseudowords (*p* = 0.015 and *p* < 0.001, respectively), whereas there was no lexicality effect over the left hemisphere. The Stimulus effect was independent of Task (Stimulus × Task: HFW vs. PW, *F*(1, 38) = 1.5, *p* = 0.224, *partial-η*<sup>2</sup> = 0.04; LFW vs. PW, *F*(1, 38) = 2.7, *p* = 0.107, *partial-η*<sup>2</sup> = 0.07; three-way interactions, both *p's* > 0.3).

[Figure legend: high-frequency words (solid line), low-frequency words (dashed line), and pseudowords (dotted line).]

*N1 (160–220 ms):* The omnibus ANOVAs revealed that task demands interacted with the stimulus effect in the N1 window, as shown by the three-way Stimulus × Task × Hemisphere interaction (LFW vs. PW: *F*(1, 38) = 5.6, *p* = 0.023, *partial-η*<sup>2</sup> = 0.13) and by the marginal Stimulus × Task interaction (HFW vs. PW: *F*(1, 38) = 3.1, *p* = 0.088, *partial-η*<sup>2</sup> = 0.08) (see **Figures 3** and **4**). Planned comparisons by Task indicated a left-hemisphere Lexicality effect in implicit reading (HFW vs. PW: *F*(1, 38) = 5.8, *p* = 0.021, *partial-η*<sup>2</sup> = 0.13; LFW vs. PW: *F*(1, 38) = 11.4, *p* = 0.002, *partial-η*<sup>2</sup> = 0.23): HFW and LFW elicited more negative-going ERPs than PW over left occipito-parietal sites, whereas over right sites N1 mean amplitudes did not differ between stimulus types. In the explicit reading task, by contrast, we found no difference between real words and PW (HFW vs. PW: main effect of Stimulus, *p* = 0.736; Stimulus × Hemisphere, *F*(1, 38) = 2.3, *p* = 0.140, *partial-η*<sup>2</sup> = 0.06; LFW vs. PW: main effect of Stimulus, *F*(1, 38) = 1.1, *p* = 0.296, *partial-η*<sup>2</sup> = 0.03; Stimulus × Hemisphere, *p* = 0.998). In this task, only the main effect of Hemisphere was significant in both ANOVAs (HFW vs. PW: *F*(1, 38) = 11.3, *p* = 0.002, *partial-η*<sup>2</sup> = 0.23; LFW vs. PW: *F*(1, 38) = 8.4, *p* = 0.006, *partial-η*<sup>2</sup> = 0.18), with ERPs more negative over the left than over the right hemisphere.

We also performed the same repeated measures ANOVAs on the N1 peak window. Lexicality effects were again modulated by task: implicit reading was associated with greater left posterior activation for real words than for pseudowords (HFW vs. PW: *F*(1, 38) = 9.1, *p* = 0.005, *partial-η*<sup>2</sup> = 0.19; LFW vs. PW: *F*(1, 38) = 14.5, *p* < 0.001, *partial-η*<sup>2</sup> = 0.28), whereas no lexicality effects were observed for explicit reading (both ANOVAs, *p's* > 0.3).

## DISCUSSION

This study explored whether lexical information about a word (i.e., word frequency and lexical status) influences the early stages of visual word recognition and whether this influence depends on task demands. We recorded ERPs during two reading tasks that either necessarily involved linguistic processing (delayed reading aloud) or did not (one-back repetition detection), using exactly the same material (high-frequency *vs.* low-frequency words *vs.* pseudowords) and the same participants in both tasks. We did not test coarse neural tuning for print, as indexed by amplitude differences between letter and symbol strings (symbols were therefore not included in the material). Robust print tuning effects in the visual N1 have already been demonstrated elsewhere, at both the group level (e.g., Maurer et al., 2006, 2007; Brem et al., 2009; Araújo et al., 2012) and the individual level (Eberhard-Moscicka et al., 2016). However, studies disagree on whether the N1 differentiates between different kinds of letter strings, as reflected in lexicality and frequency effects. These effects were therefore the focus of the present study, and our core findings were (1) a robust left-lateralized N1 response in expert adult readers that generalizes across letter-string categories and tasks, (2) early lexicality effects that are task-dependent, and (3) no word frequency effect in the early P1–N1 time windows, irrespective of the task (a later frequency effect emerged instead, around 300 ms).

#### Lateralization of N1

In the N1 component, we found larger negativities at left than at right posterior sites across all types of letter strings and irrespective of the task. This left-lateralization of the N1 for word stimuli is expected for fast, automatic linguistic processes in skilled readers, as opposed to the right-hemispheric N1 topography in children and adults with low literacy skills, which is presumably more linked to visual familiarity effects (Maurer et al., 2005b; Sánchez-Vincitore et al., 2018).

Interestingly, these task-independent lateralization effects in skilled adult readers contrast with prior studies of adults learning a novel script: "words" trained through grapheme-to-phoneme conversion elicited left-lateralized N1 responses in a reading verification task (Yoncheva et al., 2010) but not in a one-back task (Maurer et al., 2010). Hence, the type of processing strategy required by the task (i.e., task demands) influences lateralized processes linked to perceptual expertise for reading within ~200 ms, but apparently only at an early acquisition stage. That is, explicit attention to orthography-to-phonology associations may be a necessary condition for a left-lateralized N1 response to visual words in early phases of learning (cf. Maurer et al., 2010). As readers become more expert, a predominantly left-lateralized response is elicited that is not modulated by attention and task demands, as observed here (see also Strijkers et al., 2011b).

#### Sensitivity to Lexicality and Word-Form Frequency

Regarding lexical dynamics, the effects of lexicality and word frequency in early stages of visual word recognition have been inconsistent: reports range from significant effects in adults but not in children (Hauk et al., 2009, 2012; Eberhard-Moscicka et al., 2016), or the reverse (Maurer et al., 2006), to null effects (Maurer et al., 2005b). Moreover, these effects have rarely been investigated with explicit reading tasks. Here, we replicated the finding that in adult readers, lexical processing already occurs within the first 220 ms of viewing words during implicit reading (e.g., Hauk et al., 2012; Araújo et al., 2015; Eberhard-Moscicka et al., 2016): in the one-back task, the N1 was larger for real words than for pseudowords, probably reflecting greater sensitivity to familiar orthographic patterns. Alternatively, although the one-back task minimizes deliberate higher-order processing, task-unrelated automatic phonological activation of words may still have occurred to some extent (Kronschnabel et al., 2013). Such early lexicality effects have not been seen in younger children (Eberhard-Moscicka et al., 2015, 2016). Note, however, that we did not observe a functional relation between the "lexical" N1 specialization in adult readers and (proficient) reading skills: the word-pseudoword N1 effects over the left hemisphere did not correlate with word reading fluency (HFW-PW difference: *r* = 0.04, *p* = 0.825; LFW-PW difference: *r* = −0.23, *p* = 0.157). This null finding might suggest that N1 sensitivity to word form in competent readers reflects a process that is already highly automatized. Accordingly, prior ERP data have shown that N1 specialization follows a nonlinear developmental trajectory (e.g., Brem et al., 2006).
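The fluency correlations relate to their test statistics through t = r·√((n − 2)/(1 − r²)), with n − 2 degrees of freedom. A minimal Python sketch follows; the helper names are illustrative, obtaining p-values would additionally require the t distribution (e.g., from a statistics library), and the example assumes that all 39 participants entered the correlation. Under that assumption, the LFW-PW correlation of r = −0.23 corresponds to t(37) ≈ −1.44, well short of the roughly ±2.03 needed for p < 0.05.

```python
import math


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a Pearson correlation against zero."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences, with its t statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r, t_from_r(r, n)


# With the reported LFW-PW correlation and 39 participants:
print(round(t_from_r(-0.23, 39), 2))  # -1.44
```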

On the other hand, when conscious linguistic processing was mandatory, as in explicit reading, words and pseudowords elicited similar activation in the N1 time window. This null lexicality effect replicates earlier findings (Mahé et al., 2015). Hence, we found evidence for a task effect on the early neural processes involved in reading: a psycholinguistic variable such as lexicality influenced early visual word processing, but the pattern of its influence was sensitive to the task demands placed on the reader. When the task did not require explicit reading (as in visual immediate-repetition detection), words elicited more negative-going amplitudes over the left hemisphere than pseudowords did. This suggests that when the task requires only shallow processing (simple visual recognition), real words may engage automatic reading-related processes to a greater degree than pseudowords (Maurer et al., 2005a; Eberhard-Moscicka et al., 2016), possibly because of extensive exposure and a tight link to phonology. However, these automatic reading processes seem flexible enough to accommodate task demands, such as when explicit reading is required. The absence of lexicality effects in our delayed reading aloud task suggests that the task goal (reading aloud both words and pseudowords) modulates reading processes by focusing participants' attention on the grapheme-phoneme decoding attributes of the stimulus.

Though less robust, some of the differences found in the N1 already began earlier, in the P1 time window (~100 ms), in agreement with some previous studies (Zhao et al., 2014). The P1 component has been associated with low-level visual processing but is also sensitive to attentional load (e.g., Araújo et al., 2015), independent of literacy level (Pegado et al., 2014). The lexicality effect at this first peak thus likely reflects greater allocation of perceptual resources to pseudowords than to words (as visual processing demands are greater for the former), whereas the neural signatures of lexical access proper occur slightly later, in the N1 window. Importantly, the lexicality effect in the P1 window was not modulated by task. The absence of early task modulation at the P1 level thus suggests that the interaction effects in the N1 window cannot be explained by an exogenous increase in attention toward word stimuli specifically in the one-back repetition detection task. Rather, the different intentional goals of explicit reading versus immediate-repetition detection in this study may have induced strategic top-down modulations of the processing of words versus pseudowords at early latencies of visual recognition. This early modulation may occur through facilitated access to word representations or, alternatively, be restricted to lower levels of processing such as whole-word orthography (see e.g., Katz et al., 2005, Experiment 3). In the former case, however, we would have expected an earlier lexicality effect for explicit reading than when no reading intention is present, because the requirement to speak aloud instigates faster access to the lexicon (Strijkers et al., 2011a), which in turn should be harder for pseudowords.

In a related study, a task-driven lexicality effect was not found: Mahé et al. (2015) reported that from about 140 ms after a word is perceived, adults' brain electrical responses dissociate between reading aloud and lexical decision (taken as a measure of implicit reading), but this dissociation did not depend on lexicality (a very late lexicality effect was found in both tasks). Thus, one factor that seems to be important is the depth of linguistic processing required by the implicit reading task, which may be greater in lexical decision than in visual immediate-repetition detection of letter strings. Using the latter task, we and others (e.g., Eberhard-Moscicka et al., 2016) did observe an early lexicality effect. Alternatively, differences in design between Mahé et al. (2015) and our study may explain this discrepancy. Specifically, ERPs derived from blocked presentations might be more affected by changes in attentional state between words and pseudowords than ERPs derived from randomized stimulus presentations (as used in Mahé et al.'s study). However, we have no reason to suspect that attentional effects were more strongly enhanced for blocked words than for blocked pseudowords depending on the specific task.

It is possible that, beyond task demands and their interaction with lexical dynamics, the extent to which reading strategies (lexical and sublexical) are engaged could per se modulate N1 specialization (see e.g., Maurer et al., 2010; Ben-Shachar et al., 2011; Zhao et al., 2014). Experimental manipulations involving familiar words (emphasizing whole-word, lexical processing) and pseudowords (requiring letter-by-letter decoding, as predicted by dual-route models; e.g., Coltheart et al., 2001) are commonly used in reading research. However, in prior studies we cannot rule out the possibility that, owing to shallow task demands (e.g., visual recognition in n-back), participants processed both types of stimuli alike, without recruiting different reading subprocesses (e.g., processing pseudowords as actual words and basing their decisions on "wordlikeness"). The originality of the present study rests on the methodological control it offered, ensuring that the processing of word and pseudoword stimuli was qualitatively distinct, based on external markers collected in an independent eye-tracking study (see Methods) and on a blocked-list design (see e.g., Lima and Castro, 2010). Overall, our data indicate that, in the expert adult reader, early print tuning is not modulated by reading strategies; accordingly, the effect of stimulus type was null in explicit reading (where one would expect the effects of reading strategies to be especially pronounced). However, our results support the notion that initial access to the linguistic system is influenced by task-driven top-down processes according to the behavioral goals relevant to specific tasks (Balota and Yap, 2006), namely whether or not there is an intention to produce overt speech.
This main outcome is at odds with the traditional view according to which any such influence comes into play only during late (post-)decision processes (e.g., Nobre et al., 1998; Bentin et al., 1999), whereas the observed effects can be accounted for in a number of ways within visual word recognition models, which our study cannot truly disentangle. In principle, the evidence favors the assertion that some degree of feedback occurs in the system during visual word recognition, modulating early ERP markers. In an "interactive account" of reading, higher-level top-down (e.g., phonological) information and bottom-up visual orthographic information interact reciprocally and automatically during visual word recognition (Price and Devlin, 2011). Accordingly, prior studies have provided evidence for early top-down effects from the lexical level to the abstract orthographic/letter level of encoding (e.g., case-match effects at around 200 ms interacted with lexicality in an identity priming paradigm: Vergara-Martínez et al., 2015). Alternatively, our results could be predicted by the Bayesian modeling framework (Norris, 2006; Norris and Kinoshita, 2012), by assuming that readers behave as "optimal" decision makers who take into account perceptual evidence framed by prior knowledge (lexicality effects as an index of the higher prior probability of real words) combined with their goal and the decision to be made. This view does not necessarily imply feedback mechanisms during visual word recognition; rather, task demands can tune some parameters of the visual word recognition system and, especially in a blocked design, shape the feedforward stream of information without requiring continuous adjustment through feedback control (Norris et al., 2000).
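The core computation behind the Bayesian account mentioned above can be illustrated with Bayes' rule. This is a simplified sketch under our own assumptions, not the full Bayesian Reader model: the posterior probability that the perceptual input $E$ corresponds to a lexical candidate $w$ combines the likelihood of the evidence given that candidate with a prior $P(w)$, which in Norris's (2006) framework reflects word frequency.

$$
P(w \mid E) = \frac{P(E \mid w)\,P(w)}{\sum_{w'} P(E \mid w')\,P(w')}
$$

Under this sketch, real words carry higher priors than pseudowords, and task demands would enter through the decision rule applied to the posterior (for instance, the criterion used to respond), which can change behavior without any feedback from higher to lower levels of the system.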
In a masked priming study, for example, Norris and colleagues showed an equivalent priming effect for *same* responses to nonwords in a same-different task and for *yes* decisions to words in lexical decision, in both behavioral and ERP data (Norris et al., 2018). This result was interpreted as indicating that priming effects were more a consequence of the perceptual and cognitive decision/computation that participants must perform on the stimulus than of automatic (specifically lexical or semantic) processing elicited by reading a word. A few other recent ERP studies have also revealed that different intentional goals influence the processing of surface properties (Wang and Maurer, 2017; Sánchez-Vincitore et al., 2018) and fine tuning for print (Chen et al., 2015; Strijkers et al., 2015), implying that a flexible lexical processing system may depend to some extent on the specific demands of the task. We extend these results to the intentional and conscious skill of reading aloud.

In this study, we did not find reliable N1 differences between high- and low-frequency words, as occasionally reported for adults (e.g., Araújo et al., 2015; Eberhard-Moscicka et al., 2016). A word frequency effect was seen only at a later stage of processing (~300 ms). Nor did we replicate the finding that linguistic intention leads to an earlier onset of word frequency effects (Strijkers et al., 2011a, 2015; Chen et al., 2015). A tentative explanation is that neural tuning for lexical familiarity follows an inverted U-shaped trajectory, like the typical development of coarse N1 print tuning (Maurer et al., 2006; Brem et al., 2009), and that ERP frequency effects are perhaps observed only under certain conditions, e.g., depending on stimulus repetition or list composition ("pure" lists of restricted frequency ranges vs. "mixed" word conditions modulate the word-frequency effect: Glanzer and Ehrenreich, 1979). It is also possible that the long words used in our study (mean length: 6.8 letters) led to a slightly delayed onset of the frequency effect, given that the amplitude and precise latency of this effect in early brain responses (including the N1) might critically depend on word length. For example, using MEG, Assadollahi and Pulvermüller (2003) found effects of word frequency as early as 120–170 ms for short, monosyllabic words only (low-frequency items eliciting stronger brain responses), whereas frequency effects for long words (5–7 letters) appeared later, at around 240 ms.

To summarize, our results indicate that even at the earliest stages of processing, visual word recognition is open to top-down influence from the intention to engage in linguistic processing (reading aloud) or not (n-back repetition detection). These task-driven modulations extend beyond general word activation, as seen previously using "coarse" contrasts (all words vs. resting period: Chen et al., 2013; words vs. symbols: Wang and Maurer, 2017), and also affect specific higher-order aspects of the word recognition process. In expert processing, this influence is apparently not modulated by reading strategies; it is reflected in the lexicality effects within the N1 in our study and extends to other psycholinguistic properties that affect lexical access (e.g., lexical frequency, imageability; Chen et al., 2015; Strijkers et al., 2015), as tested with other task designs (more or less close to natural reading). However, lateralized reading processes associated with visual expertise for print produced task-independent effects.

#### ETHICS STATEMENT

The study followed the Portuguese Regulation for the Code of Ethics and Conduct in Psychology. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

#### REFERENCES


#### AUTHOR CONTRIBUTIONS

LF, AR, and SA contributed to the conception and design of the study. LF and SA organized the database and performed the statistical analysis. SA wrote the first draft of the manuscript. LF and AR wrote sections of the manuscript. All authors contributed to manuscript revision, read and approved the submitted version.

#### FUNDING

This work was supported by the Portuguese Foundation for Science and Technology, FCT (project ref. EXPL/MHC-PCN/0299/2013, UID/BIM/04773/2013 CBMR, PTDC/PSI-GER/32602/2017), and IF 2015 Program (IF/00533/2015) to SA.

#### ACKNOWLEDGMENTS

We thank Loide Carvalho and Luís Casaca for their assistance with data collection.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Faísca, Reis and Araújo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Pitch as the Main Determiner of Italian Lexical Stress Perception Across the Lifespan: Evidence From Typical Development and Dyslexia

Martina Caccia<sup>1,2</sup>, Giorgio Presti<sup>3</sup>, Alessio Toraldo<sup>4,5</sup>, Anthea Radaelli<sup>6</sup>, Luca Andrea Ludovico<sup>3</sup>, Anna Ogliari<sup>6</sup> and Maria Luisa Lorusso<sup>2</sup>\*

<sup>1</sup> Center for Neurocognition, Epistemology and Theoretical Syntax, University School For Advanced Studies, Pavia, Italy, <sup>2</sup> Unit of Neuropsychology of Developmental Disorders, Department of Child Psychopathology, Scientific Institute IRCCS "E. Medea", Bosisio Parini, Italy, <sup>3</sup> Laboratory of Music Informatics (LIM), Department of Computer Science, University of Milan, Milan, Italy, <sup>4</sup> Department of Brain and Behavioural Sciences, University of Pavia, Pavia, Italy, <sup>5</sup> Milan Center for Neuroscience, Milan, Italy, <sup>6</sup> Developmental Psychopathology Unit, Vita-Salute San Raffaele University, Milan, Italy

#### Edited by:

Iliana I. Karipidis, Stanford University, United States

#### Reviewed by:

Elpis Pavlidou, University of York, United Kingdom

Norbert Maïonchi-Pino, UMR6024 Laboratoire de Psychologie Sociale et Cognitive (LAPSCO), France

\*Correspondence:

Maria Luisa Lorusso marialuisalorusso@lanostrafamiglia.it

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 19 November 2018

Accepted: 07 June 2019

Published: 26 June 2019

#### Citation:

Caccia M, Presti G, Toraldo A, Radaelli A, Ludovico LA, Ogliari A and Lorusso ML (2019) Pitch as the Main Determiner of Italian Lexical Stress Perception Across the Lifespan: Evidence From Typical Development and Dyslexia. Front. Psychol. 10:1458. doi: 10.3389/fpsyg.2019.01458

The study deals with the issue of lexical stress perception from both a developmental perspective (comparing children and adults with typical development) and a clinical perspective (comparing typically developing children and children with dyslexia). The three parameters characterizing the acoustic profiles of words and non-words in a given language are the duration, pitch and intensity of their syllables. Based on the (sparse) previous literature on Italian and other European languages, syllable duration was expected to be the parameter predominantly determining the perception of stress position. It was furthermore anticipated that children with dyslexia might show an altered perception of lexical stress, due to their impairments in auditory processing of pitch, duration or (more controversially) intensity. Systematic manipulation of the pitch, duration and intensity profiles of three Italian trisyllabic non-words produced a series of 81 stimuli, which were judged with respect to stress position (perceived on the ultimate, penultimate, or antepenultimate syllable) by the three groups of participants. Contrary to expectations, the results showed that pitch is the most reliable acoustic cue in stress perception both for adults, in whom this dominance is very strong, and for typically developing children, who showed a similar but quantitatively less marked pattern. Children with dyslexia, by contrast, did not seem to rely on any parameter for their judgments and rather gave random responses, pointing to a general inability to process the various acoustic modulations that normally contribute to stress perception.
Performance on the stress perception task correlated strongly with language (morphosyntactic) measures in the whole sample of children, and with reading abilities in the group with dyslexia, confirming the close relationship between the two sets of skills. These findings seem to support a language-specific approach, suggesting that the set of acoustic parameters required for the development of stress perception is language-dependent rather than universal.

#### Keywords: lexical stress, acoustic parameters, developmental dyslexia, developmental trajectories, Italian language, pitch, duration, intensity

**Abbreviations:** AP, PE, U, respectively, stress located on the AntePenultimate, Penultimate, Ultimate syllable; DAW, Digital Audio Workstation; DC, Duration Consistency score; DD, children with developmental dyslexia; IC, Intensity Consistency score; LSAC, language-specific auditory cue hypothesis; OC, Overall Consistency score; PC, Pitch Consistency score; RDH, rhythm detection hypothesis; TD, typically developing children.

## INTRODUCTION


Stress is an important prosodic feature which "makes one syllable in a word more prominent than its neighbors" (Himmelmann and Ladd, 2008, p. 248). Stress contributes to creating rhythm in speech, and each language is characterized by its own rhythmic pattern (Kuhn et al., 2010); rhythm can be defined as the alternation of strong and weak beats recurring in a sequence of auditory events (Huss et al., 2011).

Languages differ not only in their segmental possibilities, but also in their use of prosodic cues to convey differences in meaning. For example, tone languages such as Chinese use variations in pitch to distinguish among lexical items. These pitch differences seem difficult for adult speakers of non-tonal languages such as English to perceive (Wang et al., 1999). Moreover, in a cross-linguistic study, Dupoux et al. (1997) showed that native speakers of French have more difficulty perceiving word stress than native speakers of Spanish. Indeed, Spanish uses stress contrastively (to distinguish between words), but French does not. The authors concluded that French speakers probably use stress at a different level, for instance to find word or phonological phrase boundaries.

According to the rhythm detection hypothesis (RDH) (Goswami et al., 2002, 2011), phonological development seems to be closely tied to sensitivity to slower rather than rapid auditory events. However, Antoniou et al. (2015) suggested that the set of acoustic cues required for language development is language-specific (LSAC, language-specific auditory cue hypothesis) rather than universal as postulated by the RDH. Specifically, tone languages such as Chinese seem to rely on pitch movement (the rise and fall of pitch) (Chung et al., 2017); indeed, sensitivity to pitch contour (or intonation) appears to be fundamental to phonological, reading and language development in both Mandarin (Goswami et al., 2011) and Cantonese (Antoniou et al., 2015).

The issue of universal vs. language-specific phenomena in stress perception had already been addressed within other conceptual frameworks, such as studies of the so-called P-center. Morton et al. (1976) defined the "perceptual center" or "P-center" as the perceived moment of occurrence of a monosyllabic token. Languages differ greatly in their rhythmic organization. It has been proposed that they can be subdivided into three classes: stress-timed (e.g., English and German), syllable-timed (e.g., French and Spanish), and mora-timed (Japanese). In stress-timed languages, which are by far the most studied, the intervals between stressed syllables should be approximately isochronous. However, research has failed to confirm strict isochrony between acoustically defined intervals in speech produced in various conditions (Lehiste, 1977; Fox, 1987). Thus, the perception of rhythmicity does not seem to arise from isochronous acoustic onsets of linguistic elements such as stressed syllables, nor is it easily amenable to other common measures of acoustic energy. In stressed syllables, the P-center location can be affected by the duration of individual vowels and consonants, the presence of unstressed prefixes and/or suffixes (Fox and Lehiste, 1985, 1987), or vowel onset (Fowler, 1983). In Czech disyllabic words, the number of consonants as well as some speaker-related abilities were also found to influence the position of the P-center (Šturm and Volín, 2016). Thus, the phonetic structure of the whole word may contribute to P-center location. Hoequist (1983) examined the P-center effect in the production of English, Spanish, and Japanese monosyllables and found a significant Onset Type effect (same vs. different) but neither a specific Language effect nor any interaction with language, leading to the conclusion that the P-center effect is a universal phenomenon.

Fox (1987) also investigated whether the perceived location of the P-center generalizes across languages, comparing monolingual Japanese and American English speakers, and concluded that, for American English, vowel duration causes a shift in P-center location as a function of final consonant duration. The same held for Japanese speakers, although the absolute values of the parameters were not identical and, moreover, the contribution of the final consonant was irrelevant. These results support the hypothesis that, in spite of some minor differences in timing between languages, the P-center effect may be common to all (or at least many) languages. However, and most crucially, little is known about more complex syllabic structures and multisyllabic structures following highly language-specific stress rules. Moreover, reported data mainly concern stress-timed and mora-timed languages (Czech lying halfway between a stress-timed and a syllable-timed language, depending on the type of NP; see Šturm and Volín, 2016). For these reasons, a study on Italian would provide interesting information about syllable-timed languages.

Indeed, some authors propose that the phonological characteristics of languages could be more relevant than isochrony or other forms of temporal organization (Nespor et al., 2011). It may be relevant that different groups of languages are characterized by different amounts of variation: "syllable-timed" languages have a smaller variety of syllable types than "stress-timed" languages, and their syllables are more similar to each other in duration (Dauer, 1983). In Italian, 60 percent of the syllable types are of the CV type (Bortolini, 1976). Thus, because syllable durations vary relatively little, speakers/listeners of this language may use cues other than duration to support stress perception.

Sensitivity to stress patterns is particularly relevant in language learning, as it helps the initial segmentation of words from continuous speech (e.g., Mattys et al., 1999) and also provides information about the syntactic category of a word. Specifically, stress may allow listeners to discriminate between content words (stressed) and function words (unstressed) (Gleitman and Wanner, 1982), but also between different content words, such as nouns (stress on the first syllable) and verbs (stress on the second syllable) in many languages (Kelly, 1988).

The first studies on word stress perception in Italian suggested that duration, intensity and pitch all contribute to stress assignment (Panconcelli-Calzia, 1912). Gemelli (1950) was the first to propose a hierarchy of the acoustic parameters that contribute to stress perception: (1) duration, (2) pitch, and (3) intensity. Duration was later confirmed to be the most reliable cue in stress perception, first in disyllabic words (Ferrero, 1972; Fava and Magno-Caldognetto, 1976). Also using disyllabic words, Bertinetto (1980) proposed the perceptual hierarchy (1) duration, (2) intensity, and (3) pitch. Subsequent studies on disyllabic and trisyllabic words confirmed, with minor differences, the relevance of duration as the most reliable acoustic stress cue in Italian compared with other languages (even though the authors specified that duration alone is not sufficient to clearly define stress assignment) (e.g., D'Imperio and Rosenthal, 1999; Alfano, 2006; Alfano et al., 2007; Eriksson et al., 2016). Nonetheless, fundamental frequency seems to play a crucial role in lexical stress perception in English and French (Fant et al., 1991; Hasegawa and Hata, 1992) and in both Japanese and Chinese (Hasegawa and Hata, 1992; Antoniou et al., 2015).

Linguistic prosody seems to play a crucial role in enhancing the perception of single sounds in children's phonological representations during speech processing (Chiat, 1983; Pierrehumbert, 2003). Consequently, awareness of prosodic patterns (such as English stress and Mandarin tone/pitch) might be important for detecting segmentation cues in sound and, therefore, for reading acquisition, because children might use such patterns as segmentation cues to sound out words (Chung et al., 2017). Indeed, prosodic awareness seems to emerge early, since many studies have shown that infants are able to perceive the acoustic correlates of word stress from birth. For example, Italian newborns have been reported to discriminate different stress patterns in di- and trisyllabic pseudo-words (e.g., /'takala/ vs. /ta'kala/) and in lists of pseudo-words with consonantal variation (/'daga 'nata/ vs. /da'ga na'ta/) (Sansavini et al., 1997). Similarly, 2-month-old English infants can discriminate the stress patterns of disyllabic pseudo-words (/'bada 'gada/ vs. /ba'da ga'da/) (Jusczyk and Thompson, 1978).

Infants exposed to a language (such as Spanish) with contrastive lexical stress (i.e., where the placement of stress determines word meaning) have to process stress patterns not only at the acoustic level but also at a more abstract, phonological level, since stress can be located on more than one vowel, depending on the specific meaning (Skoruppa et al., 2009). Other studies suggest that stress perception at this abstract level may develop very early in infancy (Jusczyk et al., 1993). In English, a preference for the predominant stress-initial pattern typical of the language emerges only at 9 months of age. Moreover, a cross-linguistic study of tone perception in infants showed that between 6 and 9 months of age, English infants' ability to discriminate lexical tones declines relative to that of Chinese infants (Mattock and Burnham, 2006).

Skoruppa et al. (2009) found that language-specific differences in the perception of stress likewise arise during the first year of life. Specifically, 9-month-old Spanish infants successfully distinguish between stress-initial and stress-final pseudowords, while French infants of the same age show no sign of discrimination.

Sensitivity to stress patterns seems to be related to the development of skilled reading (Goswami et al., 2002; Orsolini et al., 2006; Wood, 2006 among others) and to reading-related disorders, specifically, developmental dyslexia.

Developmental dyslexia (DD henceforth) is a neurobiological condition with a genetic basis (Siegel and Lipka, 2008; Peterson and Pennington, 2012) that "is manifested in a continuum of specific learning difficulties related to the acquisition of basic skills in reading, spelling and/or writing, such difficulties being unexplained in relation to an individual's other abilities and educational experiences" (Report of the Task Force on Dyslexia, 2001).

The presence of a deficit at the phonological level and its role in reading disorders are well established. What is still under debate is the nature of these difficulties. Some researchers have proposed that, besides phonological impairments, there is a more basic auditory deficit. Tallal (1980) demonstrated that children with a specific reading impairment have difficulty making discrimination or temporal order judgments with either very brief tones or tones presented at short (<400 ms) interstimulus intervals (ISIs). On this basis, Tallal suggested that dyslexic children could have a non-linguistic deficit in the temporal resolution of short and rapidly changing auditory stimuli that affects speech perception. Frey et al. (2018) investigated the discrimination of phonetic features (syllables differing in voicing, place and manner of articulation) in noise, envelope and silence conditions, and found that children with DD showed longer RTs than their control group across all conditions, although they did not differ from TD children in accuracy. The authors proposed that the deficits found in the silence condition might support the hypothesis that internal neural noise disturbs the processing of the acoustic properties of stimuli in dyslexia.

A systematic review of basic auditory processing in DD by Hämäläinen et al. (2013) showed that rise time (defined as the time taken by a signal to go from sound onset to its maximum amplitude), slow frequency modulation (FM) rates, frequency discrimination with differences smaller than 10%, amplitude modulation (AM) and duration discrimination were most often impaired in individuals with dyslexia (with differences depending on the age of participants and the characteristics of the stimuli or procedures), whereas findings were less consistent for intensity discrimination and gap perception, which turned out to be unimpaired in dyslexia in most studies.
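The definition of rise time given above can be made concrete with a short sketch. The function name `rise_time` and the onset criterion (the first sample whose amplitude envelope reaches a fixed fraction of the peak) are our own illustrative assumptions, not a procedure taken from the studies reviewed; onset criteria may differ across studies, and the result depends on that choice.

```python
import numpy as np


def rise_time(envelope, sample_rate, onset_threshold=0.1):
    """Hypothetical helper: seconds from sound onset to maximum amplitude.

    Onset is operationalized (an assumption) as the first sample whose
    envelope reaches `onset_threshold` times the peak amplitude.
    """
    env = np.abs(np.asarray(envelope, dtype=float))
    peak_idx = int(np.argmax(env))  # first sample of maximum amplitude
    threshold = onset_threshold * env[peak_idx]
    onset_idx = int(np.argmax(env >= threshold))  # first sample at/above threshold
    return (peak_idx - onset_idx) / sample_rate


# Usage: a 50 ms linear ramp followed by a plateau, sampled at 1 kHz.
envelope = np.concatenate([np.linspace(0.0, 1.0, 51), np.ones(100)])
print(rise_time(envelope, sample_rate=1000, onset_threshold=0.15))
```

With an onset threshold of 15% of the peak, the onset falls 8 ms into the ramp, so the function returns 0.042 s rather than the full 50 ms ramp duration; this illustrates why the onset criterion matters when rise times are compared across studies.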

A number of studies on pitch processing suggest that pitch memory may not be as durable in children as in adults. These studies found declines in children's memory over the course of a few seconds (Keller and Cowan, 1994; Gomes et al., 1999; see also Trehub et al., 1984). In a behavioral study, Keller and Cowan (1994) showed that the accuracy of 6- to 7-year-old children in a pitch change detection task with variable ISIs decreased faster than that of adults. Furthermore, several studies showed that pitch processing is sensitive to language experience (e.g., Chinese speakers are more sensitive to pitch variations than English speakers). McAnally and Stein (1996) showed that individuals with DD are impaired in detecting audible changes of tone and in generating phase-locked discharges while decoding pitch variation. In addition, Baldeweg et al. (1999) found an abnormal mismatch negativity (MMN) during passive pitch discrimination in adults with DD but a normal MMN to tone duration deviants; at the behavioral level, they found an impairment in discriminating tone frequency, but not tone duration. The pitch discrimination and MMN deficits were correlated with the degree of impairment in word and non-word reading accuracy.

On the other hand, Cantiani et al. (2009) and Lorusso et al. (2014) found that children with DD were impaired in temporal processing tasks involving duration discrimination, specifically a task requiring discrimination between two rhythms differing in the interval between identical repeated tones. Discrimination of tone patterns differing in their inter-stimulus intervals had already been found by Schulte-Körne et al. (1999) and Kujala et al. (2000) to differentiate between dyslexic and non-dyslexic participants at the psychophysiological response level.

Many studies have shown that children with DD are impaired in processing rhythmic structures; specifically, they show a lack of sensitivity to lexical stress (Goswami et al., 2013), which seems to characterize adults with reading impairments as well. Barry et al. (2012) tested sensitivity to lexical stress in adult German-speaking students with a reading deficit and found that, despite having normal implicit knowledge of lexical stress rules, these students failed to show explicit metalinguistic awareness of them. Moreover, children (Goswami et al., 2002, 2010) and adults with DD (Law et al., 2014) show atypical processing of sound rise times and intensity. Studies on Finnish and English showed that the perception of duration in speech sounds is critical in DD (Leppänen et al., 1999, 2002; Richardson et al., 2004). In Finnish, duration plays a crucial role in differentiating words both orthographically and semantically; indeed, Hämäläinen et al. (2009) found that Finnish-speaking children with DD differed from TD children in duration discrimination but not in the perception of intensity modulation and rise time. Moreover, Ziegler et al. (2012) found that individuals with DD show a deficit in pitch contour perception.

Wang et al. (2012) found that in children with DD, accurate discrimination of variation in intensity and rise time was a significant predictor of reading accuracy in Chinese, even though intensity discrimination has not been found to be an important source of inter-individual differences in many alphabetic languages (Muneaux et al., 2004; Richardson et al., 2004; Goswami et al., 2010; Hämäläinen et al., 2013). Furthermore, Wang et al. (2012) found that duration and frequency discrimination contributed significant unique variance to tasks of onset and rhyme awareness.

Stress assignment in Italian polysyllabic words is neither diacritically marked (apart from words stressed on the final syllable) nor predictable by rules. Most three- and four-syllable words are stressed on the penultimate syllable, which is considered the dominant (or "regular") stress. A smaller proportion of polysyllabic words are stressed on the antepenultimate syllable (non-dominant or "irregular" stress; e.g., Toraldo et al., 2006). Although knowledge of the distributional properties of sound–spelling mappings is acquired quite early, it may vary as a function of age and reading/spelling experience, even in consistent orthographies like Italian. Indeed, Angelelli et al. (2010) and Paizi et al. (2011) showed that children with DD performed very poorly with low-frequency words, indicating a possible lack or unavailability of orthographic representations for this kind of material; they also highlighted reduced lexical processing ability compared to control readers in both spelling and reading tasks in Italian. Moreover, children with specific learning disabilities tend to omit the Italian diacritical stress mark, which is compulsory for Italian words stressed on the last syllable. However, these children also proved able to take into account the distributional properties of Italian sound–spelling mappings. This effect was present in both reading and spelling, although with notable differences as a function of word frequency. The distributional properties of sound–spelling mappings were detected by third grade, indicating early acquisition of this skill even in children with dyslexia and dysgraphia (Marinelli et al., 2017).

In the present study, the reliability of duration, pitch and intensity as predictors of stress perception was investigated from both a developmental and a clinical perspective. To our knowledge, no previous study has investigated the role of acoustic cues in Italian lexical stress perception in both children (typically and atypically developing) and adults. To avoid effects of familiarity, frequency and other lexical variables, only non-words were used. Furthermore, different syllable types were employed so as to obtain a larger variety of stimuli and a representative sample of the typical repertoire of Italian lexical strings. Moreover, trisyllabic non-words were used, so as to cover the three possible stress positions in Italian words: antepenultimate (AP), penultimate (PE), and ultimate (U), with stress on the first, second and third syllable, respectively.

The critical manipulation was the dissociation of the three relevant parameters, duration, intensity, and pitch, from one another. Using dedicated software, we built a balanced set of new acoustic stimuli that varied independently in their duration profile (AP, PE, U), intensity profile (AP, PE, U) and pitch profile (AP, PE, U). For instance, the set included a stimulus whose duration profile was that of an AP stimulus, whose intensity profile was that of a U stimulus and whose pitch profile was that of a PE stimulus. All possible combinations were used, which allowed us to derive "consistency" scores expressing the degree to which a given participant used duration, intensity, or pitch to determine his/her perceived stress position.

Based on the analysis of previous literature, we expected that:


#### MATERIALS AND METHODS

#### Participants

Typically developing children (TD), children with dyslexia (DD) and normotypical adults participated in this study. Selection criteria are detailed below.

Children with DD were selected among those diagnosed, at the Scientific Institute "E. Medea" or at the clinical services affiliated with San Raffaele-Ville Turro hospital, as having Specific Reading Disorder according to standard ICD-10 criteria (World Health Organization, 1992). We included only children who scored at least 2 SD below the mean in at least two reading tests (speed and/or accuracy parameters) and had an IQ score ≥ 80 (see below for further details).

TD children were recruited from local primary schools. As a selection criterion, these children were administered a battery of tests assessing their general intellectual and linguistic abilities (see the list below). Children who scored more than 1.5 SD below the mean in at least one test were excluded from the study.

Normotypical adults were recruited among the experimenters' acquaintances and students at S. Raffaele University. Participants with self-reported hearing impairments, learning disabilities and previous language impairments were excluded.

Before the experimental task, children in both groups (TD and DD) carried out a stress-perception pre-test to ascertain whether they were familiar with the task of identifying stress position. The pre-test consisted of a list of 24 trisyllabic Italian words with different stress positions (antepenultimate, AP; penultimate, PE; ultimate, U). The experimenter read each target word aloud and children were asked to say aloud the number 1, 2, or 3, corresponding to the syllable (first, second or third) they perceived as stressed. Children had to answer at least three consecutive items correctly; participants who did not reach this cut-off were excluded.

At the end of the selection process, 48 participants remained, and took part in the experiment: 18 TD children (mean age = 9.85, SD = 0.67, range 8.9–10.7; 10 males), 15 children with DD (mean age = 10.3, SD = 0.87, range 9.28–11.9; 5 males) and 15 normotypical adults (AC; mean age = 29.2, SD = 11.3, range 20.5–56.8; 6 males).

All participants were native Italian speakers, and all children were regularly attending school. All children's parents/legal guardians and all adult participants gave written informed consent. The study was approved by the Ethics Committee of the University of Pavia, in accordance with the standards of the Declaration of Helsinki (1964).

#### Materials

#### Standardized Language and Cognitive Tests

Here we list all the tests that were used either as selection criteria for, or to characterize, the TD and DD groups. All these tests were standardized on the Italian school-age population.

The following tests were administered to TD children in order to evaluate their linguistic and general cognitive abilities.

(i) A test of morphosyntactic comprehension and production (CoSiMo, described in Cantiani et al., 2015). This unpublished test has been standardized on a large, well-controlled normative sample from various regions of Italy. Three subtests were administered: a direct-to-indirect speech transformation task ("speech"), an active-to-passive voice transformation task ("voice") and a free-morphology task in which the use of clitic pronouns has to be judged and, when necessary, corrected so as to convey the same meaning as a target sentence ("clitics"). The battery relies on the implicit use of morphosyntactic transformations and avoids any reference to explicit rules, giving examples of transformations as instructions.


**Table 1** reports descriptive statistics for the listed tests in the TD and DD groups.

#### Experimental Stimuli

The stimuli of the experimental task were derived from three non-words: /dididi/, /gugugu/, and /tatata/. The vowels /e/, /ε/, /o/, and /ɔ/ were avoided because they tend to be pronounced differently in different regions of Italy and to change their characteristics depending on stress position, which could have introduced biases. Each non-word was recorded by the same native Italian speaker with each of the three stress patterns (AP, PE, and U): several instances were produced and recorded, and for each non-word and stress pattern the clearest and most recognizable recording (as judged by six adult listeners, with agreement of at least 4/6) was selected, producing a set of nine basic recordings. Recordings were made with an entry-level dynamic microphone in a quiet environment, keeping the distance from the device constant and trying to keep loudness consistent across takes. Recordings were performed with the PRAAT software (Boersma and Weenink, 2018) and stored in WAV files as a single-channel Pulse Code Modulation stream, with a sampling frequency of 44,100 Hz and a bit depth of 16 bits.

Each of the nine sounds was analyzed to extract the three features that, according to the literature, differentiate stressed from unstressed syllables:


TABLE 1 | Mean (±SD) standardized scores of the TD and DD groups.

IQ was obtained from the WISC-IV (Orsini et al., 2012) for the DD children, and from the CPM test (Raven, 1947; Belacchi et al., 2008) for the TD children. Z scores are reported for all other tests. Statistical tests are reported for the comparisons between groups (p-values are one-tailed in the direction of deficit in the DD group). Mann–Whitney tests were performed in cases of violation of homoscedasticity or normality assumptions (Z scores are reported in these cases).


**Figure 1** illustrates the different features of each original audio file.

The nine original files were then manipulated using dedicated software (Steinberg Cubase) to obtain 72 new stimuli, in which the duration, intensity and pitch stress patterns were dissociated and varied independently. For instance, some stimuli had the duration profile of an AP stimulus, the intensity profile of a U stimulus, and the pitch profile of a PE stimulus; others had U duration and pitch patterns but an AP intensity pattern; and so on. All possible combinations were generated, for an overall set of 81 stimuli (3 × 3 × 3 × 3, i.e., duration profile (AP/PE/U) × intensity profile (AP/PE/U) × pitch profile (AP/PE/U) × non-word (/dididi/, /gugugu/, /tatata/); the nine original stimuli belong to this overall set). This design allowed us to disentangle the contribution of each parameter to the perceived stress position from the contributions of the others.
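The 3 × 3 × 3 × 3 logic of the stimulus set can be made concrete with a short enumeration. This is an illustrative sketch only, assuming each stimulus is fully described by its non-word and by the stress pattern carried by each of the three parameters; the dictionary keys and label strings are hypothetical and do not correspond to the actual audio files produced with Cubase.

```python
from itertools import product

PATTERNS = ("AP", "PE", "U")
NONWORDS = ("dididi", "gugugu", "tatata")


def build_design():
    """Return every (non-word, duration, intensity, pitch) combination."""
    return [
        {"nonword": nw, "duration": d, "intensity": i, "pitch": p}
        for nw, d, i, p in product(NONWORDS, PATTERNS, PATTERNS, PATTERNS)
    ]


design = build_design()
# Stimuli whose three profiles coincide correspond to the nine original recordings.
originals = [s for s in design if s["duration"] == s["intensity"] == s["pitch"]]
manipulated = [s for s in design if s not in originals]
print(len(design), len(originals), len(manipulated))  # 81 9 72
```

Because every profile value is crossed with every other, each stress pattern appears equally often (27 times) for each parameter; this balance is what the Consistency scores described under Statistical Analyses rely on.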

Thanks to the VariAudio and Free Warp functionalities, all the manipulated audio files sounded very natural. We explicitly avoided producing artificial-sounding stimuli, as these might have biased the results (by giving listeners hints about the manipulation, with unpredictable effects on performance). Although synthetic in origin, our natural-sounding material was likely processed by listeners in the same way as real-world stimuli.

The exact software procedure applied to produce the stimuli is detailed in the **Supplementary Materials**.

## Procedure

All participants were tested individually in a quiet room, seated next to the experimenter, in front of the 11.6-inch screen of an Acer Aspire V5-131 × 64 laptop computer. They listened to the stimuli through AKG K518DJ headphones. Each participant was randomly assigned one of three pre-randomized item lists (the algorithms were implemented with the PsychoPy software: Peirce, 2007). A visual stimulus showing the target non-word in capital letters and without a diacritical stress mark (e.g., TATATA) appeared on the screen simultaneously with the audio stimulus. Participants judged the stress position by pressing the key "1", "2", or "3" (corresponding to AP, PE, U) with the index finger of their dominant hand. No feedback on response accuracy was given. The written non-word remained on the screen until the participant pressed one of the three keys; the next trial began 1 second later. There was no time limit for responding, but stimuli could not be replayed.

## Statistical Analyses

#### Consistency Scores

Several Consistency scores were derived from each participant's performance and analyzed. A Consistency score expresses the degree to which the participant's responses matched a given parameter of the stimulus (duration, intensity, or pitch). Taking Duration Consistency (DC) as an example, and supposing that on a given trial the duration pattern was that of an AP stress, the participant's response was scored as follows: an AP response, consistent with the Duration pattern, was given a score of 1; any other response was given a score of 0. If the Duration pattern was Penultimate (PE), 1 was given to a PE response and 0 otherwise; if the Duration pattern was Ultimate (U), 1 was given to a U response and 0 otherwise. This procedure was repeated across all trials, yielding a list of binary 0-1 scores; the proportion of 1 scores expresses the degree to which responses were consistent with Duration. This proportion ranges from 1/3 (chance level), that is, the expected proportion of Duration-consistent responses if Duration has no influence on responses, to 1, the ideal case in which Duration directly determines the response on every trial. To obtain a measure with a more transparent meaning, we rescaled the score, bringing its range from [0,1] with a chance level of 1/3 to [−0.5,1] with a chance level of 0 (this is achieved by applying the transformation 1.5x − 0.5 to the original value x). Thus, DC is expected to be zero if Duration is not considered at all in the judgements, 1 when it determines all responses, and to take intermediate values when its influence on responses is partial. A negative score indicates that some process led the participant to choose the 'correct' response according to Duration less often than chance would predict. For instance, if a participant tends to perceive or classify AP Duration stress patterns as PE, the Duration score will come out negative. The extreme value, −0.5, corresponds to the (purely theoretical) case in which the 'correct' response according to Duration is never given.
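The scoring rule and the 1.5x − 0.5 rescaling can be sketched as follows. This is a minimal illustration assuming that responses and stimulus profiles are coded as the strings "AP", "PE" and "U"; the function name is hypothetical and is not taken from the authors' analysis scripts.

```python
def consistency(responses, profile):
    """Rescaled consistency of responses with one acoustic parameter.

    responses: the stress position chosen on each trial ("AP", "PE" or "U").
    profile:   the stress pattern carried by the parameter (e.g. Duration)
               on the same trials.
    """
    if len(responses) != len(profile) or not responses:
        raise ValueError("need two non-empty sequences of equal length")
    # Binary 0-1 score per trial: 1 if the response matches the parameter.
    hits = sum(r == p for r, p in zip(responses, profile))
    x = hits / len(responses)  # raw proportion, chance level = 1/3
    return 1.5 * x - 0.5       # chance -> 0, always consistent -> 1, never -> -0.5


print(consistency(["AP", "PE", "U"], ["AP", "PE", "U"]))  # 1.0
print(consistency(["PE", "U", "AP"], ["AP", "PE", "U"]))  # -0.5
```

A raw proportion of 1/3, the value expected by chance, maps to (approximately, in floating point) 0.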

By applying the same procedure to Intensity and Pitch, we obtained three Consistency scores: Duration Consistency (DC), Intensity Consistency (IC), and Pitch Consistency (PC). The three scores constrain one another, because responses cannot be fully consistent (score = 1) with more than one criterion. Thus, a participant whose responses fully reflect Duration will have (DC, IC, PC) = (1, 0, 0), while a participant in whom Duration 'wins' in half the trials and Intensity 'wins' in the other half will obtain (0.5, 0.5, 0). The sum of the three scores, DC + IC + PC (which usually does not exceed 1), expresses the Overall Consistency (OC) of responses with any criterion. For instance, a participant whose responses are consistent with Duration in 1/4 of the trials, with Intensity in another 1/4, and given at random in the remaining 1/2 of the trials obtains (0.25, 0.25, 0), with OC = 0.25 + 0.25 + 0 = 0.5, correctly reflecting the fact that the participant relied on some criterion to generate the response in half the trials. If responses are totally unrelated to the three criteria and given at random, OC turns out to be 0. This does not require the participant to select the AP, PE, and U responses in equal proportions: thanks to the fully balanced design, any response bias (any preference for AP, PE, or U responses) cancels out and provides no spurious contribution to the Consistency scores. All these properties of the present measures were confirmed by means of simple Monte Carlo simulations.
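The worked examples above can also be checked exactly rather than by simulation. The sketch below computes the expected (DC, IC, PC) for a hypothetical listener who follows each parameter with a given probability and otherwise guesses with an arbitrary response bias, averaging over the 27 duration × intensity × pitch combinations (the non-word factor is omitted because it does not change the expectation). It is an exact-expectation check offered as an alternative to the authors' Monte Carlo simulations; the function name and the bias values are illustrative assumptions.

```python
from itertools import product

PATTERNS = ("AP", "PE", "U")
PARAMS = ("duration", "intensity", "pitch")


def expected_scores(weights, bias):
    """Expected rescaled (DC, IC, PC) for a mixture response strategy.

    weights: probability of responding according to each parameter.
    bias:    pattern -> probability, used on the remaining guessing trials.
    """
    w_guess = 1.0 - sum(weights.values())
    trials = list(product(PATTERNS, repeat=3))  # (duration, intensity, pitch)
    scores = {}
    for k, param in enumerate(PARAMS):
        total = 0.0
        for trial in trials:
            target = trial[k]  # pattern carried by the scored parameter
            # Following parameter q hits the target when q carries the same pattern.
            p_hit = sum(weights[q] * (trial[j] == target) for j, q in enumerate(PARAMS))
            p_hit += w_guess * bias[target]
            total += p_hit
        scores[param] = 1.5 * total / len(trials) - 0.5
    return scores


skewed = {"AP": 0.6, "PE": 0.3, "U": 0.1}  # hypothetical response bias
mix = expected_scores({"duration": 0.25, "intensity": 0.25, "pitch": 0.0}, skewed)
print({k: round(v, 3) for k, v in mix.items()}, round(sum(mix.values()), 3))
```

Even with a strongly skewed guessing bias, the guessing trials contribute nothing to any score, because each parameter carries every pattern equally often across the balanced design.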

As a further step, nine specific Consistency scores were derived, one for each combination of Parameter (Duration, Intensity, Pitch) and Stimulus Stress Pattern (AP, PE, U). This allowed us to determine whether the effect of a given parameter varies with stress position (e.g., Duration might have a greater impact on responses in the AP than in the PE and U configurations).
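A sketch of the nine Parameter × Stress Pattern scores, assuming each trial is stored as a dictionary holding the pattern carried by each parameter and the participant's response (a hypothetical data layout): each score is the same rescaled consistency, restricted to the trials in which the given parameter carried the given pattern.

```python
from collections import defaultdict
from itertools import product

PATTERNS = ("AP", "PE", "U")
PARAMS = ("duration", "intensity", "pitch")


def specific_consistency(trials):
    """Nine Parameter x Stress Pattern consistency scores.

    trials: iterable of dicts holding the pattern carried by each parameter
    ("duration", "intensity", "pitch") and the participant's "response".
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for t in trials:
        for param in PARAMS:
            key = (param, t[param])  # e.g. ("duration", "AP")
            counts[key] += 1
            hits[key] += t["response"] == t[param]
    return {key: 1.5 * hits[key] / counts[key] - 0.5 for key in counts}


# Hypothetical listener who always answers with the pitch pattern.
trials = [
    {"duration": d, "intensity": i, "pitch": p, "response": p}
    for d, i, p in product(PATTERNS, repeat=3)
]
scores = specific_consistency(trials)
print(scores[("pitch", "PE")], scores[("duration", "AP")])  # 1.0 0.0
```

For this pitch-only listener, every Pitch cell scores 1 and every Duration and Intensity cell scores 0, as expected.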

#### Statistical Tools

A General Linear Model (GLM) was used to analyze Consistency scores within groups. This proved adequate insofar as the histograms of the residuals obtained from the most complex GLM model did not show important departures from normality. By contrast, non-parametric tests were used to compare Groups, as their scores could show important departures from normality. Because of these violations, GLM interactions between Group and within-participant variables were of limited reliability and were either interpreted with caution or omitted from the present discussion.

In a first step, we determined whether the three groups (adults, TD and DD) differed in their overall ability to carry out the task. In a second step, we explored the relative use of the three parameters, Duration, Intensity and Pitch, in each of the three groups. In a third step, we explored whether each parameter had differential effects according to Stress Pattern.

Additionally, we carried out a set of GLM and Partial Correlation analyses to further explore the role of some predictors – age, morphosyntactic comprehension/production, sentence repetition, and reading parameters, on the Consistency scores.

Effect sizes are reported as partial eta-squared (η²) for GLM analyses. The Greenhouse–Geisser correction was applied to within-subjects effects involving three-level factors (hence the non-integer degrees of freedom).
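The reported partial η² values can be cross-checked against the F statistics with the standard identity η²p = F·df_effect / (F·df_effect + df_error). The sketch below is offered as a reader's consistency check, not as the authors' procedure; the function name is ours.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta-squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Values reported in the Results (Adults: effect of Parameter; Pitch vs. Duration).
print(round(partial_eta_squared(88.178, 1.304, 18.261), 3))  # 0.863
print(round(partial_eta_squared(93.763, 1, 14), 2))          # 0.87
```

Because the Greenhouse–Geisser epsilon multiplies both degrees of freedom by the same factor, it cancels in this ratio, so the corrected and uncorrected F tests yield the same partial η².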

No correction for multiple comparisons was applied, because the analyses were planned comparisons with explicit and clear-cut expectations. For the impact of various predictors on Stress Perception parameters, for which we had no clear expectations, no correction was applied either, owing to the high mutual correlations between the variables. Given the novelty of both the stimuli and the Consistency scores derived from them, we had no reliable way to estimate effect sizes before the experiment. Hence, we could not perform a power analysis, which is an acknowledged limitation of the present study.

#### RESULTS

#### Overall Consistency Score

When looking at the general ability to solve the task, as measured by OC, all three groups performed above chance, although DD children barely surpassed this threshold [Wilcoxon tests: DD children, z = 2.019, one-tailed p = 0.022; adults: z = 3.411, one-tailed p = 0.001; TD children: z = 3.436, one-tailed p = 0.001]. However, there were massive differences between groups [Kruskal–Wallis, χ²(2, N = 48) = 23.235, p < 0.001]. The left side of **Figure 2** shows the pattern. As expected, adults performed much better than TD children: the average OC score of the former, 0.733, was more than three times that of the latter, 0.228 [Mann–Whitney, z = 3.926, one-tailed p < 0.001]; in turn, TD children slightly outperformed dyslexic children, whose OC score was 0.096 [z = 1.701, one-tailed p = 0.044]. Thus, recalling that OC = 1 corresponds to a perfectly consistent performance (according to any criterion) and that OC = 0 corresponds to random responses, adults (0.733) were reasonably close to optimal performance, while children of both groups were very far from it (0.228 and 0.096).

While, as shown above, performance was on average above chance in all three groups, a substantial proportion of participants had a performance level compatible with random selection of responses, and this proportion differed greatly between groups. A Monte Carlo simulation study (N = 10,000) showed that a reasonable (95%) random-response range for OC lies between −0.259 and +0.259. Two of 15 adults (13%) had a score in the random range, compared with 9/18 (50%) TD children and 12/15 (80%) DD children. This picture, however, has a limitation: in principle, some participants might fall in the 'random' range, with an OC relatively close to zero, not because they responded randomly but because opposite (positive and negative) consistencies canceled each other out (e.g., using Pitch in the 'correct' way but Duration in an unexpected way, such as systematically selecting the PE response when the Duration pattern is AP). To address this concern, we also studied the inherent variation of the OC score. Indeed, OC is actually an aggregate of the consistency values of 27 atomic subscores (those identified by the 3 × 3 × 3 Non-word × Stress Position × Parameter design); the Monte Carlo study showed that, had participants been selecting responses at random, the standard deviation (SD) of the 27 consistency values would have been (95%) below 0.294. **Figure 3** plots the OC values against the SD values for all participants, and the random-response range is shown as a box with dashed borders. From this perspective, the picture is different: 1/15 adults (7%), 3/18 TD children (17%), and 8/15 DD children (53%) fell in the random-response region.
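A Monte Carlo check of the random-response ranges could look like the sketch below. It assumes that a random responder picks AP, PE or U with equal probability on each of the 81 trials, that each of the 27 atomic subscores is the rescaled consistency computed over the nine trials sharing a non-word and a parameter pattern, and that the SD is the sample standard deviation. These are our assumptions, since the simulation settings are not fully specified, so the bounds it returns should be close to, but need not exactly match, the reported ±0.259 and 0.294.

```python
import random
import statistics
from itertools import product

PATTERNS = ("AP", "PE", "U")
NONWORDS = ("dididi", "gugugu", "tatata")
# 81 trials: (non-word, duration, intensity, pitch)
DESIGN = list(product(NONWORDS, PATTERNS, PATTERNS, PATTERNS))
# 27 atomic cells: parameter position k in a trial tuple x non-word x pattern.
CELLS = [
    (pat, [n for n, t in enumerate(DESIGN) if t[0] == nw and t[k] == pat])
    for k in (1, 2, 3)
    for nw in NONWORDS
    for pat in PATTERNS
]


def random_listener(rng):
    """OC and SD of the 27 atomic subscores for one purely random responder."""
    responses = [rng.choice(PATTERNS) for _ in DESIGN]
    cells = [
        1.5 * sum(responses[n] == pat for n in idx) / len(idx) - 0.5
        for pat, idx in CELLS
    ]
    # Cells are equal-sized, so each parameter score is the mean of its nine
    # cells, and OC is the sum of the three parameter scores.
    oc = sum(sum(cells[9 * k:9 * k + 9]) / 9 for k in range(3))
    return oc, statistics.stdev(cells)


def random_ranges(n_sims=10_000, seed=1):
    """Central 95% range of OC and 95th percentile of SD under random responding."""
    rng = random.Random(seed)
    ocs, sds = zip(*(random_listener(rng) for _ in range(n_sims)))
    oc_cuts = statistics.quantiles(ocs, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    sd_cuts = statistics.quantiles(sds, n=20)  # cut points at 5%, ..., 95%
    return oc_cuts[0], oc_cuts[-1], sd_cuts[-1]


lo, hi, sd95 = random_ranges()
print(f"OC range: {lo:.3f} to {hi:.3f}; SD 95th percentile: {sd95:.3f}")
```

A participant whose (OC, SD) pair falls inside both limits is compatible with purely random responding, which corresponds to the dashed box in **Figure 3**.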

Overall, while DD and TD children are much closer to random performance than AC, there are hints that some non-random behaviors characterize all groups.

#### Main Effects of Parameter

As a second step, we studied to what degree each parameter, Duration, Intensity or Pitch, influenced responses. To do so, we looked at the variable Parameter (DC, IC, PC) within each group. The right side of **Figure 2** shows the patterns.

#### Adults

Adults showed a very large effect of Parameter [F(1.304,18.261) = 88.178, p < 0.001, η² = 0.863]. Pitch proved to be by far the most influential factor in determining the perceived stress pattern in this group, with a Consistency score of 0.614. Duration and Intensity were much less effective, with Consistency scores of 0.078 and 0.042, respectively. However, it is important to note that both of these contributions were significantly above chance [Wilcoxon: z = 2.737, one-tailed p = 0.003, and z = 2.367, one-tailed p = 0.009, respectively]. Pairwise comparisons confirmed the huge advantage of Pitch over Duration [F(1,14) = 93.763, p < 0.001, η² = 0.87] and over Intensity [F(1,14) = 99.554, p < 0.001, η² = 0.877]. Duration and Intensity influenced responses to a similar degree [F(1,14) = 2.029, p = 0.176, η² = 0.127].

#### Typically Developing Children

TD children also showed an effect of Parameter [F(1.998,33.967) = 4.108, p = 0.025, η² = 0.195]. Pitch yielded a Consistency score of 0.131, Duration 0.075 and Intensity 0.023. Only Pitch and Duration contributed significantly above chance [Wilcoxon: z = 3.198, one-tailed p < 0.001, and z = 2.669, one-tailed p = 0.004, respectively], while Intensity did not [z = 0.83, one-tailed p = 0.203]. Pairwise comparisons between Parameters showed that only the difference between Intensity and Pitch reached significance [F(1,17) = 8.462, p = 0.01, η² = 0.332], while the Pitch–Duration [F(1,17) = 2.122, p = 0.163, η² = 0.111] and Intensity–Duration comparisons [F(1,17) = 1.927, p = 0.183, η² = 0.102] did not.

#### Dyslexic Children

DD children did not show an effect of Parameter [F(1.617,22.634) = 0.224, p = 0.754, η² = 0.016]. Thus, there is no evidence that the (slightly) above-chance performance of this group depends on any specific parameter.

#### Overview

**Figure 2** clearly depicts this general pattern of results. Adults and TD children seem to show a qualitatively similar pattern: Duration and Intensity are used to a small degree<sup>1</sup>, and Pitch to a higher degree. For Pitch, a quantitative difference emerges, in that its influence is much greater in adults. By contrast, DD children seem to show a qualitatively different pattern<sup>2</sup>: although they perform slightly above chance, there is no hint as to which (average) combination of Parameters they are using; their average pattern appears flat.

#### The Effects of Parameters in Different Stress Positions

As a last step, we explored whether the Consistency scores of the various Parameters were modulated by Stress Position. Given the large overall-performance differences between groups, these analyses were again carried out on each group separately.

#### Adults

Adults showed a significant effect of Parameter [F(1.304,18.261) = 88.178, p < 0.001, η² = 0.863] and of Parameter × Stress Position [F(2.901,40.609) = 9.278, p < 0.001, η² = 0.399]. Inspection of the plot (**Figure 4**) clarifies the meaning of this interaction. While Pitch and Intensity influenced responses to a similar degree across the three Stress positions [Pitch: F(1.595,22.33) = 0.758, p = 0.452, η² = 0.051; Intensity: F(1.535,21.489) = 0.238, p = 0.732, η² = 0.017], Duration seemed to be most effective in PE position [F(1.409,19.733) = 3.961, p = 0.048, η² = 0.221].

#### Typically Developing Children

TD children also showed a significant Parameter × Stress Position interaction [F(2.982,50.698) = 4.311, p = 0.009, η² = 0.202].

<sup>1</sup>This is confirmed by the non-significant Group × Parameter interaction when only looking at Duration/Intensity and adults/TD children [F(1,31) = 0.124, p = 0.727, η² = 0.004]. In the same analysis, the Duration/Intensity comparison fell short of significance [F(1,31) = 3.469, p = 0.072, η² = 0.101]. However, because of the marked asymmetry in the distribution of adults' scores, interactions are not completely reliable.

<sup>2</sup>The Group × Parameter interaction, when comparing TD to DD children, gave F(1.947,60.348) = 2.932, p = 0.061, η² = 0.086.

Stress Position modulated the effect of Parameters as follows (**Figure 5**). The effect of Pitch was largest in U, intermediate in AP, and smallest in PE, where it was at chance level [F(1.815,30.852) = 3.936, p = 0.034, η² = 0.188]. By contrast, Duration seemed to mostly affect U stress patterns, while being at chance level for PE and AP [F(1.731,29.431) = 3.464, p = 0.051, η² = 0.169]. No Stress-Position effect was found for Intensity [F(1.946,33.079) = 2.261, p = 0.121, η² = 0.117].

#### Dyslexic Children

DD children showed a marginal Parameter × Stress Position modulation [F(2.973,41.622) = 2.93, p = 0.045, η² = 0.173]. However, none of the within-Parameter analyses revealed any significant effect.

#### Other Predictors

The question then arises whether this pattern of results is modulated by some of the predictors that were available for our TD and DD samples. Namely, we wondered whether age, morphosyntactic abilities and, especially, reading abilities (which were measured in the DD group) have an impact on performance on the stress perception task. Aware of the limited power of an analysis including all predictors at once, we explored the dataset in a stepwise fashion, including only variables that proved significant at a previous step; if OC was not affected by a given predictor, its specific effects on the Duration, Intensity and Pitch components were not studied.

#### TD and DD Children: Effects of Age and Linguistic Abilities

Age, Morphosyntactic abilities (the sum of the three subscores of the CoSiMo battery), and Sentence Repetition were available for both TD and DD children, so the effects of these predictors were analyzed by GLM to partial out possible group differences (**Table 2**).

Age and Age × Group (where Group is TD vs. DD children) were studied in a first step, and proved non-significant as predictors of the OC score of the stress task. Hence, at least in the short age range that we explored (8.95–11.87 years), age does not account for the tendency of some DD children to give random responses, which corresponds to an OC score close to zero.

In a second step, the morphosyntactic ability score (sum of CoSiMo subtests) was used as a predictor of OC (Age was entered as a covariate, which is equivalent to using age-standardized CoSiMo scores, but it again proved non-significant and was removed from the analysis). As shown in **Table 2**, CoSiMo significantly affected performance on the stress perception task in the expected direction: the better the morphosyntactic abilities, the better the OC score. More specifically, the Duration and Pitch components contributed to this effect, i.e., the components that were found to be relevant to the perception of stress position by TD children.

In a third step, Sentence Repetition was studied as a predictor (Age and CoSiMo scores were used as covariates, and only CoSiMo retained a reliable impact); however, Sentence Repetition failed to significantly predict the OC score.

#### DD Children: Effects of Reading Scores

The relationship between OC scores and reading was tested in the DD group. We focused on the word and non-word reading tests (DDE-2, Sartori et al., 2007) because these rely on identical material across the tested ages. Both reading accuracy and reading speed (seconds per syllable, see Toraldo and Lorusso, 2012, for theoretical justification) were analyzed. Partial correlations of such scores with stress perception parameters are reported in **Table 3**.

All four reading scores were rather robust predictors of OC; in most cases (see **Table 3**) Pitch was the component that was best predicted by reading performance. **Figure 6** shows the predictive pattern, which is rather tight, with (partial) correlations between stress perception and reading performance ranging from 0.544 to 0.788 in absolute value.
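The partial correlations in **Table 3** control for Age and CoSiMo scores (when significant). A minimal, dependency-free sketch of the residual-based computation follows: both variables are regressed on the covariates (with an intercept) and the residuals are correlated. The helper names are hypothetical, the toy numbers are invented for illustration and are not the study's data, and the one-tailed p-values reported in the table would require an additional t-test on the partial r that is not shown here.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _remove(v, basis):
    """Subtract from v its projection onto mutually orthogonal basis vectors."""
    for b in basis:
        coef = _dot(v, b) / _dot(b, b)
        v = [a - coef * c for a, c in zip(v, b)]
    return v


def partial_corr(x, y, covariates=()):
    """Pearson correlation of x and y after regressing out the covariates."""
    basis = []
    for z in covariates:  # Gram-Schmidt on the centred covariates
        z_res = _remove(_center(z), basis)
        if _dot(z_res, z_res) > 1e-12:  # skip redundant covariates
            basis.append(z_res)
    rx = _remove(_center(x), basis)
    ry = _remove(_center(y), basis)
    return _dot(rx, ry) / math.sqrt(_dot(rx, rx) * _dot(ry, ry))


# Toy numbers only, not the study's data.
reading_time = [0.9, 1.4, 1.1, 2.0, 1.7, 1.2]  # seconds per syllable
oc = [0.30, 0.05, 0.20, -0.10, 0.00, 0.15]
age = [10.1, 9.5, 11.2, 9.9, 10.8, 11.5]
print(round(partial_corr(oc, reading_time, [age]), 3))
```

With a single covariate this reproduces the textbook first-order formula r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)); the residual form extends directly to two covariates such as Age and CoSiMo.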

#### TABLE 2 | Impact of a set of predictors on Stress Perception parameters.


CoSiMo, Morphosyntax. SR, Sentence Repetition. P-values for main effects are one-tailed in the expected direction (the younger, the worse the performance; the worse the performance on the predictor, the worse the performance on stress perception); p-values from interactions are two-tailed. Significant effects are reported in bold. Effect sizes are reported as eta-squared (η²).

TABLE 3 | Impact of reading performance (DDE-2, Sartori et al., 2007) on Stress Perception parameters in the DD group.


Partial correlations are reported controlling for the contributions by Age and CoSiMo scores (when significant). All p-values are one-tailed in the expected direction (the worse the reading performance, the worse the performance on stress perception). Significant correlations are reported in bold.

Reading Time (seconds per syllable, right panel) on the DDE task, for children with Developmental Dyslexia. Dashed horizontal lines report chance level (OC = 0); solid black lines are the regressions for words (filled squares); solid gray lines are the regressions for non-words (open diamonds).

#### DISCUSSION

Systematic manipulation of the pitch, duration and intensity profiles of three Italian trisyllabic non-words produced a series of 81 stimuli. These stimuli were judged with respect to stress position (perceived on the ultimate, penultimate or antepenultimate syllable) by three groups of participants: children with dyslexia, TD children matched on age and gender, and normotypical adults.

We had a number of predictions based on the previous literature which we will now discuss in turn.

#### The Dominance of Pitch Over Duration

A first prediction, based on previous literature, was that duration would be the critical parameter in determining stress assignment while processing Italian non-words. This hypothesis was contradicted by our results, which showed that pitch is the most reliable acoustic cue in stress perception both for adults, in whom this dominance is very strong, and for TD children, who showed a similar but quantitatively less marked pattern. Although many studies on Italian stress perception have underlined the role of duration (e.g., Bertinetto, 1980), other studies have shown that pitch plays an important role in stress perception in many languages (e.g., Fant et al., 1991; Hasegawa and Hata, 1992; Antoniou et al., 2015). Moreover, the discrepancy between the results of the present work and those of previous studies on Italian lexical stress assignment may be due to differences in the stimuli: to produce them, we used software (Steinberg Cubase 5) that is more sophisticated than the tools typically employed in the literature. Most importantly, however, we used non-words controlled for semantic and phonological neighbors and for coarticulation effects, whereas words and pseudowords are typically used in the literature. In essence, the present study shows the "barebones" of the machinery of stress assignment, in a particularly pure condition where lexical processing can have no plausible influence. Stimuli were natural syllables, but the same syllable was repeated three times across the string, producing a stimulus that is neutral at both the semantic and the lexical levels of analysis. Thus, we may hypothesize that previous authors found duration, rather than pitch, to be the critical feature in stress perception because some interaction occurs between the lexical/semantic levels and the early acoustic analyses in this process (Arciuli and Colombo, 2016), which changes the relative weight of the parameters in determining performance.
Note that even studies using pseudowords may have involved an implicit lexical contribution, as pseudowords partially activate the phonological lexicon as a function of their orthographic/phonological similarity to words (Rosson, 1983) (e.g., the pseudoword /tavoga/ is very likely to activate the lexical node /tavola/, 'table'). Also, unlike in experiments using words and pseudowords, our participants listened to the same strings, /tatata/, /gugugu/, and /dididi/, over and over, which probably further swamped any lexical activation, however small. Overall, further research is needed to investigate the possible top–down effects of complete or partial lexical access on the acoustic processing that eventually leads to stress perception.

Another source of insight into the role of pitch comes from comparing the present results with those of Antoniou et al. (2015). These authors found that pitch perception was a stronger predictor of language ability in Chinese than rhythm perception (which had no impact at all on performance in their tasks); they suggested that the acoustic parameters predicting language development are language-specific, and that tone languages such as Chinese have different predictive patterns from Western languages. Setting aside for the moment the many differences between Antoniou et al.'s (2015) work and the present one, both in the experimental tasks and in the dependent variables (language tasks versus reading), we would (if anything) have predicted that Italian participants would behave more like English or other European speakers/readers than like Chinese speakers. Instead, pitch turned out, both in our study and in Antoniou et al.'s (2015), to be the most relevant parameter determining stress processing and perception. Even if Italian is not a tone language, it partially differs from most other European languages, especially Germanic ones, in the range of pitch variations produced by its speakers (Hirst and Di Cristo, 1998). More precisely, while tone determines lexical identity in Chinese, pitch variations in Italian are the vehicle for prosody-based pragmatic communication, conveying emotion and meaning (e.g., question versus statement: Hirst and Di Cristo, 1998; D'Imperio, 2002). This may suggest that Italian speakers are more used to processing pitch variations and are therefore more skilled at processing pitch than speakers of other languages. Indeed, since the variation in syllable duration is limited in Italian (Nespor et al., 2011), it is reasonable to hypothesize that speakers and listeners of this language base their production/perception of lexical stress on other parameters.
Furthermore, if pitch is processed in order to extract pragmatic cues, this might explain the impressive growth in sensitivity that we observed for this parameter across the lifespan: pragmatics is doubtlessly the linguistic component that develops most slowly and gradually throughout life, along with experience in interacting with other people across different contexts and conditions.

## Typically Developing Children vs. Adults

The second prediction derived from the literature was that TD children and adults should show similar performance when perceiving acoustic parameters. This hypothesis was also falsified: sensitivity to pitch turned out to be lower in children than in adults, although pitch was also the most relevant parameter for TD children. To our knowledge, no previous study has investigated the development of the processing of the acoustic parameters involved in stress assignment. There are studies on pitch perception in infants, but they mostly used musical rather than speech stimuli (Trehub, 2001; Plantinga and Trainor, 2005) or focused on the pitch characteristics of infant-directed speech and their influence on infants' discrimination ability (Marcos, 1987; Trainor and Desjardins, 2002).

## Dyslexic vs. Typically Developing Children

A final prediction was that children with DD should have been less sensitive than TD peers to changes in the acoustic parameters while processing stress position. This hypothesis was confirmed. Indeed, our DD children did not seem to rely on any parameter in their judgments, and rather gave random responses, which point to a general inability to process the various acoustic modulations that normally contribute to stress perception. Thus, in line with Goswami et al. (2013), our DD children showed an impaired sensitivity to syllable stress compared to their TD peers (and adults).

Interestingly, performance on lexical stress perception was found to correlate with morphosyntactic abilities in the sample of children (including TD and DD), and with reading abilities in the group with DD. Such correlations support the idea that perception of stress helps build more stable and well-defined phonological and orthographic representations of words, which will thus be more easily retrieved during reading (Elbro and Jensen, 2005; Perfetti, 2007). Even more crucially, these correlations highlight the close connections between prosodic skills and written text decoding, as well as between prosody and other language abilities. Indeed, a relationship is often described between reading and phonological abilities at the phonemic level (e.g., Bogliotti et al., 2008; Hämäläinen et al., 2009, 2013), possibly extending to the syllabic or onset/rime level (e.g., Goswami et al., 2010, 2013; Frey et al., 2018), but more rarely encompassing lexical prosody for multisyllabic words. An undifferentiated approach to prosody, however, one that does not distinguish between the various levels, can fail to capture the very specific and articulated links between functions both within and across linguistic domains (see Leong and Goswami, 2014). Furthermore, single and specific aspects of prosody seem to be involved in different, specific learning abilities: reading versus writing, speed versus accuracy, word versus non-word versus meaningful text reading, etc. In the context of Letter-Speech Sound Integration, it has been proposed (Zhang and McBride-Chang, 2010) that auditory sensitivity affects speech perception, with temporal processing especially influencing the segmental/phonemic level and rhythmic processing more specifically affecting the suprasegmental/prosodic level. In turn, segmental and suprasegmental processing would influence literacy acquisition through phonological processing on the one hand and morphological awareness on the other. Nonetheless, in the model proposed by these authors, sensitivity to speech prosody such as stress may also influence speech perception at the segmental level, by facilitating spoken word recognition and enhancing the perception of phonemes according to the Lexical Restructuring Hypothesis (Metsala and Walley, 1998; Wood et al., 2009).
In this perspective, sensitivity to speech rhythm could explain individual differences in reading ability beyond, and independently of, the contribution of phonological awareness.

This also suggests that efforts to train pre-school and early school children in phonemic awareness tasks could be even more effective in preventing or remediating reading failure if complemented by exercises requiring prosodic analysis at various levels (as also shown by Thomson et al., 2013) or by emphasizing the rhythmic structure of the linguistic stimuli (e.g., Bonacina et al., 2015). Moreover, better stress perception could contribute to the development of syntactic skills (in a bidirectional manner), both by providing clearer representations of lexical entries and by helping to disentangle ambiguous syntactic structures through prosodic cues (e.g., Frazier et al., 2006; Snedeker and Yuan, 2008; Caccia and Lorusso, 2019). With regard to writing skills, Angelelli et al. (2010) showed that Italian children with DD and, more generally, with specific learning disorders tend to omit the (compulsory) diacritic marks when writing words stressed on the final syllable. The present study suggests that such difficulties possibly lie in stress perception rather than in orthographic stress representation. Specifically, the DD children in our study seemed to lack awareness of lexical stress position, suggesting that the orthographic difficulties actually originate at the metaphonological level. In light of this result, it would be interesting to investigate whether this metaphonological deficit, in turn, arises at a low level of acoustic analysis or at a higher level of integration into an abstract stress-position representation. To this end, future studies might record ERPs while participants listen to stimuli with "swapped" acoustic parameters, thus allowing lower components of stress perception to be disentangled from higher ones. Moreover, ERP studies could also be conducted with younger children to define developmental trajectories and to identify possible early markers of language disorders.

In light of the above, our results suggest that DD children show defective processing of the acoustic parameters responsible for lexical stress assignment; therefore, their orthographic difficulties with diacritical marks should be supported and remediated with strategies that are not based on acoustic analysis. Since applying explicit grammatical rules is often challenging for children with DD (Pavlidou et al., 2009), more effective rehabilitation strategies should rely on, for example, visual memorization and recognition of grammatical morphemes (e.g., /-ò/, /-à/, as verb suffixes with diacritical marks), frequent noun suffixes (e.g., /-tà/, from Latin "-tas", as in "felicitas" – "felicità"), or exception words (e.g., città, rondò, perché, così, etc.). Some intervention programs, such as those based on stimulation of hemisphere-specific strategies according to the balance model of dyslexia (see Bakker, 2006), rely on such strategies (Lorusso et al., 2006, 2011).

## CONCLUSION

In conclusion, the present study shows that pitch plays a crucial role in Italian stress perception, unlike, for example, stress perception in Spanish and Finnish, which is characterized in terms of duration (Alfano et al., 2007; Eriksson et al., 2016). These findings are consistent with a language-specific approach: according to the LSAC hypothesis (Antoniou et al., 2015), the set of acoustic parameters required for the development of lexical stress perception (and possibly of other aspects of language development) is language-specific rather than universal, as postulated by the RDH (Goswami et al., 2011). This means that in languages that make extensive use of a specific acoustic cue (pitch, duration, etc.), that parameter would be more important than the others and would consequently play a crucial role in both language processing and development (see also McBride-Chang et al., 2008). A cross-linguistic study with the same experimental paradigm would help shed light on the role of acoustic parameters in determining lexical stress.

Beyond the role of single parameters for stress perception and language-specific differences, the present results confirm the role of prosody in reading and language development. More precisely, they highlight the need to extend the analysis of phonological abilities from a purely segmental to a broadly defined suprasegmental level, in order to detect and consider the subtle and likely bidirectional relationships linking low-level perceptual abilities to the development of increasingly complex oral and written language skills.

## Limitations


One limitation of the present study is that, given the relatively small sample sizes, statistical power is likely to be low. Because we used dedicated stimulus types that (to our knowledge) had never been used before, as well as Consistency scores that were developed mathematically for this purpose, we could not run a reliable power analysis before the experiment. However, the main effects emerging from the analyses, which were the object of our theoretical discussion, are very large (e.g., the dominance of Pitch over the other parameters, or the differences in Consistency scores between adults and children), so power limitations are unlikely to be an issue, at least in those cases.
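To illustrate why very large effects are less vulnerable to small samples, the following is a minimal sketch of the normal approximation to the power of a two-sided one-sample (or paired) test for a standardized effect size d and n participants. The function name and the effect sizes and sample size in the loop are hypothetical, chosen only for illustration; they are not this study's values. The normal approximation also somewhat overstates power at small n relative to the exact noncentral-t calculation.

```python
from statistics import NormalDist


def approx_power(d, n, alpha=0.05):
    """Normal-approximation power of a two-sided one-sample/paired test.

    d: standardized effect size (Cohen's d); n: number of participants.
    Hypothetical helper for illustration, not the authors' analysis.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = d * n ** 0.5  # expected test statistic under the alternative
    # Probability that the statistic falls in either rejection region.
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


# Illustrative values only: a small, a large, and a very large effect with n = 15.
for d in (0.3, 0.8, 1.5):
    print(f"d = {d}: approximate power = {approx_power(d, n=15):.2f}")
```

With a fixed small n, power rises steeply with d, which is the intuition behind treating very large effects as relatively safe from power limitations; smaller effects in the same design remain poorly powered.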

## ETHICS STATEMENT

The parents of all children signed informed consent. The study was approved by the Ethics Committee of the University of Pavia, in accordance with the standards of the Declaration of Helsinki (1964).

## AUTHOR CONTRIBUTIONS

MC contributed to the conception of the study, definition of the experimental design, organized and supervised data collection, and contributed to the writing of the manuscript. GP prepared the experimental stimuli and wrote a section of the manuscript. AT carried out all statistical analyses and took care of the interpretation and description of the results. AR administered the tests and coded them following the experimental protocol. LL participated in the definition of the experimental protocol. AO organized and supervised the recruitment of participants. ML contributed to the conception and definition of the experimental design, participated in the interpretation of results, contributed to the writing of the manuscript, and supervised the whole study. All authors contributed to manuscript revision, and read and approved the submitted version.

## FUNDING

This work was supported by the Italian Ministry of Health (Grant No. RC2018-2019).

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.01458/full#supplementary-material

## REFERENCES

with chronological age and reading level controls. J. Exp. Child Psychol. 101, 137–155. doi: 10.1016/j.jecp.2008.03.006

dyslexic and typically developing children. Cogn. Neuropsychol. 34, 163–186. doi: 10.1080/02643294.2017.1386168


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Caccia, Presti, Toraldo, Radaelli, Ludovico, Ogliari and Lorusso. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Implicit Statistical Learning Across Modalities and Its Relationship With Reading in Childhood

Elpis V. Pavlidou<sup>1,2</sup>\* and Louisa Bogaerts<sup>3</sup>

<sup>1</sup> Psychology in Education Research Centre, Department of Education, University of York, York, United Kingdom, <sup>2</sup> Haskins Laboratories, Yale University, New Haven, CT, United States, <sup>3</sup> Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel

Implicit statistical learning (ISL) describes our ability to tacitly pick up regularities from our environment, thereby shaping our behavior. A broad understanding of ISL incorporates a great range of possible computations, which render it highly relevant to reading. In light of this hypothesized relationship, ISL performance was explored in young (M = 8.47 years) typical readers (N = 31) across three different modalities (i.e., visual, auditory, and tactile) using the Artificial Grammar Learning (AGL) paradigm. Adopting repeated-measures and correlational designs, the obtained data revealed modality constraints: (1) above-chance performance was observed on the visual and tactile tasks but not on the auditory task, (2) there was no significant correlation of ISL performance across modalities, and (3) split-half reliability of the visual and auditory tasks was reasonably high, yet for the tactile task it was close to zero. Evaluating the relation between ISL ability and language skills, we observed a positive correlation between visual ISL performance and phonological awareness. We discuss these findings in view of current perspectives on the nature of ISL and its potential involvement in mastering successful (i.e., accurate and fluent) reading.

Keywords: implicit statistical learning, artificial grammar learning, modality specificity, reading, reading fluency, children

## INTRODUCTION

He who thus considers things in their first growth and origin, [...] will obtain the clearest view of them.

Aristotle, ca. 350 BC

It is widely accepted that successful reading requires the development of a repertoire of skills that feed into both accuracy and fluency, while failure to master such skills results in reading difficulties. Significant progress has been made in unpacking and understanding the key cognitive and linguistic factors that govern reading accuracy and, more recently, those involved in reading fluency (e.g., Pikulski and Chard, 2005). Successful reading is viewed here as a prototypical example of skill acquisition, which, like the acquisition and development of other skills, is supported by our ability to extract patterns and regularities from our environment (e.g., Kaufman et al., 2010).

#### Edited by:

Iliana I. Karipidis, Stanford University, United States

#### Reviewed by:

Chris McNorgan, University at Buffalo, United States

Milene Bonte, Maastricht University, Netherlands

> \*Correspondence: Elpis V. Pavlidou elpis.pavlidou@york.ac.uk

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 10 January 2019 Accepted: 24 July 2019 Published: 22 August 2019

#### Citation:

Pavlidou EV and Bogaerts L (2019) Implicit Statistical Learning Across Modalities and Its Relationship With Reading in Childhood. Front. Psychol. 10:1834. doi: 10.3389/fpsyg.2019.01834

In recent years, implicit statistical learning<sup>1</sup> (ISL hereafter), that is, our ability to pick up structure from our environment (over time) in an undirected fashion, has emerged as a strong candidate mechanism to explain, among other things, linguistic phenomena (e.g., Saffran and Wilson, 2003; Evans et al., 2009). This contemporary theoretical approach binds reading (and language acquisition overall) to a general (rather than language-specific) capacity to detect, store, and use statistical regularities in the input (e.g., Arciuli and Simpson, 2012a,b; Frost et al., 2013; Erickson and Thiessen, 2015). As expected by statistical-learning-based theories of language acquisition, ISL is established early in development (e.g., Saffran et al., 1996; Fisher and Aslin, 2002; Kirkham et al., 2002; Bulf et al., 2011): according to Goschke and Bolte (2007), the adaptation of our behavior to recurring and sequential patterns is a fundamental function of learning, and thus the encoding and exploitation of such regularities becomes an adaptive advantage (Conway and Pisoni, 2008).

Language constitutes a potent example of a learning environment that requires the exploitation of regularities: spoken words are characterized by idiosyncratic patterns of transitional probabilities (i.e., the conditional probability of one element given another element) that constrain their internal structure. Each writing system, in turn, is characterized by a set of correlations that determine the possible co-occurrences of letter sequences, and by grapheme (letter) to phoneme (speech sound) mappings whose strength of correlation, high or low, differs across writing systems. Typically, regular letter to speech sound (L-SS) associations are taught to children explicitly (e.g., by giving examples and/or activities to reinforce them) (Apfelbaum et al., 2012). However, many L-SS correlations do not abide by simple rules and thus are not explicitly taught; instead, they are picked up implicitly by the learner through increasing exposure to print (e.g., Cassar and Treiman, 1997; Pacton et al., 2001).
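As a minimal illustration of the transitional probabilities mentioned above, the sketch below computes TP(b | a) = count(a followed by b) / count(a) over adjacent elements of a syllable stream. The three-syllable "words" and their order are made up for illustration. Within-word TPs come out at 1.0, while TPs spanning word boundaries are lower, which is the cue a statistical learner is assumed to exploit when segmenting the stream.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Return {(a, b): P(b | a)} for every adjacent pair in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    # Count each element only where it is followed by something.
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical nonsense "words" concatenated without pauses.
words = {"A": ["tu", "pi", "ro"], "B": ["go", "la", "bu"], "C": ["bi", "da", "ku"]}
order = ["A", "B", "A", "C", "B", "C", "A"]
stream = [syllable for word in order for syllable in words[word]]

tps = transitional_probabilities(stream)
print(tps[("tu", "pi")])  # within a word
print(tps[("ro", "go")])  # across a word boundary
```

In this toy stream every within-word transition is fully predictable (TP = 1.0), whereas each word-final syllable is followed by two different word onsets, so boundary TPs drop to 0.5.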

#### ISL, Language Acquisition, and Reading

Starting from Saffran et al.'s (1996) seminal work on infants' ISL abilities, and amid avid critics and unapologetic fans, the prominent role of ISL in spoken language acquisition has been firmly established over the past two decades (e.g., Saffran and Wilson, 2003; Evans et al., 2009). Studies on infant learning (e.g., Marcus et al., 1999; Clohessy et al., 2001) show that implicit learning abilities are already well established in infancy, whereas explicit learning abilities are less well developed at this age. ISL is viewed as the vehicle the novice learner uses to parse language (e.g., Saffran et al., 1996; Gomez and Gerken, 2000; Gómez, 2002; Kirkham et al., 2002), and it is most closely associated with tracking the sequential statistics (typically transitional probabilities) in the incoming speech stream.

However, a broad understanding of ISL and its link to language incorporates a great range of possible computations (e.g., frequency of individual elements, frequency of co-occurrence, distributional cues, etc.; see Erickson and Thiessen, 2015, for a review), which render it highly relevant to reading as well. For example, like spoken language, written language contains different types and degrees of statistical information, such as distributional cues and co-occurrence across domains for L-SS mappings and non-adjacent dependencies for grammar. Sperling et al. (2004) state that reading fluency in particular is mastered with a mixture of explicit and implicit learning mechanisms, even in languages that are highly regular in grapheme-phoneme correspondences. Connectionist models of language learning (e.g., Seidenberg and McClelland, 1989; Plaut et al., 1996; McClelland and Patterson, 2002; Harm and Seidenberg, 2004), together with neuroimaging data (see Sawi and Rueckl, 2019), bolster the argument that any attempt to attain successful reading involves ISL procedures. ISL could mediate abilities that are directly involved in reading, such as phonological awareness (e.g., Spencer et al., 2015), accounting in turn for individual variation in children's reading performance.

It is well documented that phonological awareness (part of phonological processing) is involved not only in L-SS mappings but also in reading comprehension, by aiding phonological recovery during both reading aloud and silent reading (e.g., Ashby, 2006). ISL could shape not only visual word processing abilities but also phonological representations and the automatic access of such representations in long-term memory via the "exploitation" of the regularities inherent in spoken and written language. Reasoning along these lines, Mainela-Arnold and Evans (2014) hypothesized that the ability to track statistical sequential regularities in speech streams may be critical to the acquisition of lexical-phonological knowledge, and demonstrated a relationship between auditory ISL and lexical-phonological abilities not only in children with specific language impairment but also in typically developing children (ages 8–12): poor statistical learners, they found, were also poor at managing lexical-phonological competition. Relatedly, Spencer et al. (2015) tested, in a large sample of 4- to 10-year-old children, ISL abilities and a series of tasks tapping constructs crucial to the development of early literacy skills: oral language skill, vocabulary knowledge, and phonological processing. ISL abilities were measured with two different tasks: an auditory Saffran-style word segmentation task and a visual, interactive Simon-AGL task with colored squares after Conway et al. (2010). Using structural equation modeling, the authors showed that, generally speaking, ISL accounted for a unique portion of the variance in these literacy-related skills. Interestingly, the two ISL tasks did not load onto a single latent variable: whereas word segmentation had a stronger influence on oral language skills, the visual Simon-AGL task made a stronger contribution to phonological processing skills.
This result was interpreted in terms of the different SL mechanisms that the two ISL tasks differentially tap into [in line with the extraction and integration framework put forward by Thiessen et al. (2013)]. Importantly, the authors also suggest that SL abilities in both the auditory and visual modalities are related to early literacy acquisition, and that SL abilities in sensory modalities other than the auditory may play a role in the development of phonological skills (potentially because of shared underlying learning mechanisms).

<sup>1</sup>The term "implicit statistical learning" proposed by Conway and Christiansen (2009) is used here to denote the fusion of two traditions, implicit learning and statistical learning, on the assumption that they tap the same phenomena. However, the study does not take a strong stance on whether the resulting knowledge is available to consciousness; the term "implicit" refers to the undirected nature of the learning process.

A smaller set of studies on ISL and typical reading has demonstrated correlations between ISL abilities and reading skills in the first language (e.g., Apfelbaum et al., 2012; Arciuli and Simpson, 2012b; Qi et al., 2019; von Koss Torkildsen et al., 2019; but see Schmalz et al., 2019, for contrasting results) and in a second language (Frost et al., 2013). Most of these individual-differences studies indexed ISL with a single non-linguistic visual segmentation task (Arciuli and Simpson, 2012b; Frost et al., 2013; von Koss Torkildsen et al., 2019), yet without the (explicit) assumption that the observed relationship depends on the visual presentation modality or on the type of input statistics the chosen task taps. In theorizing, ISL is typically treated as a unified theoretical construct, a "general capacity for picking up regularities" that is predicted to correlate with measures of literacy (see Siegelman et al., 2017a, for a discussion).

## ISL: A Unified Construct or Not?

Originally, the domain-generality of ISL was invoked to argue against language modularity and innate theories of language acquisition (Chomsky, 1959; Fodor, 1984). The fact that ISL abilities were demonstrated in studies using different types of stimuli, including shapes (e.g., Pothos and Bailey, 2000; Pothos and Kirk, 2004; Bulf et al., 2011); alien figures (e.g., Arciuli and Simpson, 2012a); pure tones (e.g., Saffran et al., 1999); speech-like sounds (e.g., Gomez and Gerken, 2000) and syllables (e.g., Saffran et al., 2006); and tactile stimuli (i.e., finger vibrations) (e.g., Conway and Christiansen, 2005), led to the common belief that ISL is a unitary learning system (e.g., Bulf et al., 2011). Such a unitary learning system is thought to execute similar computations across stimuli and sensory modalities (Frost et al., 2015).

The theoretically "appealing" view of ISL as a single entity is, however, challenged first by data from adult populations suggesting modality- and stimulus-specific constraints (e.g., Conway and Christiansen, 2005, 2006; Mitchel and Weiss, 2011; see Frost et al., 2015, for a comprehensive review). A second finding that is puzzling for ISL as a unified construct is the virtually zero correlation between ISL performance in the auditory vs. visual modality (e.g., Siegelman and Frost, 2015). If there is something like a domain-general ISL faculty extracting patterns across modalities, why would someone who performs well on an ISL task with auditory stimuli not also do well on an ISL task with visual stimuli? These modality-specific effects were demonstrated predominantly in adult populations, but a third piece of evidence comes from a cross-sectional study testing the visual and auditory ISL performance of children aged 5–12 (Raviv and Arnon, 2018). Whereas visual SL performance improved linearly with age, auditory SL performance, albeit lower on average, was no better in older children. What is the nature of ISL (in these young populations) that can explain such differential developmental trajectories?
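One standard psychometric factor worth keeping in mind when interpreting near-zero cross-modal correlations is measurement reliability: under classical test theory, the observed correlation between two tasks equals the true-score correlation multiplied by the square root of the product of the two tasks' reliabilities. The sketch below illustrates this with hypothetical reliability values; it is not an analysis of the cited data, and the function names are illustrative.

```python
import math


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation under classical test theory:
    r_observed = r_true * sqrt(rel_x * rel_y)."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_r(observed_r, rel_x, rel_y):
    """Spearman's correction for attenuation: estimated true-score correlation."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Hypothetical reliabilities: one fairly reliable task and one very unreliable task.
ceiling = attenuated_r(1.0, rel_x=0.8, rel_y=0.05)
print(f"Maximum expected observed r: {ceiling:.2f}")
```

Even if two tasks measured exactly the same underlying ability, an unreliable task caps the observable correlation. Because the denominator in the correction shrinks toward zero as reliability falls, disattenuated estimates also become unstable for very unreliable tasks.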

Recently, Frost et al. (2015) offered a theoretical framework reconciling domain-generality and specificity. ISL, they argue, is "not a unitary mechanism, but a set of domain-general computational principles that operate in different modalities and, therefore, are subject to the specific constraints characteristic of their respective brain regions" (p. 1). This framework raises an interesting question regarding the link between ISL ability and reading ability: Is the association underpinned by a shared reliance on the ability for registering the statistical properties of the input or rather driven by the ability of our visual system specifically to efficiently encode and effectively internally represent visual stimuli (see also Bogaerts et al., 2016)? In other words, are ISL abilities in modalities other than the visual also predictive of reading performance?

Qi et al. (2019) very recently explored the association between reading skills and both visual and auditory ISL, with results suggesting that, perhaps surprisingly, auditory ISL contributes more strongly to certain aspects of reading than visual ISL does. Importantly, auditory ISL might predict reading either simply because it taps the same domain-general capacity for picking up sequential regularities as the visual task, or via its contribution to oral language skills (Spencer et al., 2015) and/or phonological processing abilities.

## The Present Study

Given the sparse empirical data from young populations on the nature of ISL per se and on the proposed mechanisms via which it could facilitate reading early in development, the purpose of this study is twofold: first, to explore, by looking at performance across modalities, whether ISL is best described as a unified ability or as a componential one [if one considers the different neurocognitive computations associated with how information is processed in specific modalities (Frost et al., 2015)]; and second, to systematically unpack the relationship of ISL with reading and reading-related abilities in childhood. Embracing, however, the possibility that ISL has both a general and a domain-specific component, we aim to shed light on which component underlies the hypothesized relation with reading skill. To address these questions, we used the Artificial Grammar Learning (AGL) framework.

Artificial grammar learning (Reber, 1989) is a paradigm widely used for studying ISL, and it has been used previously with young children (Pavlidou et al., 2009, 2010; Pavlidou and Williams, 2014). Its framework provides the theoretical and empirical grounds for exploring various hypotheses about how implicit learning mechanisms contribute to reading, as it is thought to draw on the mechanisms that recognize complex statistical regularities (e.g., Petersson et al., 2004). In a typical AGL task, participants are shown strings of letters constructed according to a particular rule system (the artificial grammar) and are then asked to identify, from a novel set of strings, those compatible with the old ones, that is, to make grammaticality judgments (Pothos and Bailey, 2000). Various explanations have been proposed to account for typical participants' behavior during the AGL learning episode (see Redington and Chater, 1996; Pothos, 2007, for reviews). Participants are found to be sensitive to specific item factors such as the similarity of test items to training items (Brooks, 1978; Brooks and Vokey, 1991; Vokey and Brooks, 1992), fragment (e.g., bigram or trigram) information (e.g., Perruchet and Pacteau, 1990, 1991; Witt and Vinter, 2011), and structure (rules or micro-rules) (e.g., Dulany et al., 1984; Manza and Reber, 1997). Sensitivity to the associative strength of the test stimuli relative to the training stimuli (chunk strength) is suggested to reflect a statistical, fragment-dependent learning mechanism (chunking models of implicit learning, e.g., Boucher and Dienes, 2003; but see Pothos, 2007, for a comprehensive review of available AGL models). Sensitivity to structure (the system used to create both training and test items), on the other hand, is thought to indicate a structure-based acquisition mechanism (for rule-based models of implicit learning, see e.g., Dulany et al., 1984; but see Pothos, 2007). These properties make AGL a suitable "analog" for some of the mechanisms that novice readers could capitalize on to master reading.
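To make the item-level factors above more concrete, the sketch below defines a small finite-state grammar, checks whether a string is grammatical, and computes one common operationalization of associative chunk strength: the mean training-set frequency of a test item's bigrams and trigrams. The grammar, its letter set, the training strings, and all function names are hypothetical and are not the materials used in this study.

```python
from collections import Counter

# Hypothetical finite-state grammar: state -> list of (symbol, next_state).
GRAMMAR = {
    0: [("M", 1), ("V", 2)],
    1: [("T", 1), ("V", 3)],
    2: [("X", 2), ("R", 3)],
    3: [("X", 4), ("M", 4)],
}
START, ACCEPT = 0, {4}


def is_grammatical(string):
    """Return True if the grammar can generate the string (tracks all reachable states)."""
    states = {START}
    for symbol in string:
        states = {nxt for st in states for sym, nxt in GRAMMAR.get(st, []) if sym == symbol}
        if not states:
            return False
    return bool(states & ACCEPT)


def chunks(string, sizes=(2, 3)):
    """All contiguous bigrams and trigrams of a string."""
    return [string[i:i + n] for n in sizes for i in range(len(string) - n + 1)]


def chunk_strength(test_item, training_items):
    """Mean frequency, across all training items, of the test item's chunks.

    Assumes the test item has at least two symbols.
    """
    freq = Counter(c for item in training_items for c in chunks(item))
    item_chunks = chunks(test_item)
    return sum(freq[c] for c in item_chunks) / len(item_chunks)


training = ["MTVX", "MVM", "VXRX", "VRM"]  # all generated by GRAMMAR
test_items = ["MTVM", "MVR", "VXRM"]
for item in test_items:
    print(item, is_grammatical(item), round(chunk_strength(item, training), 2))
```

Because grammaticality and chunk strength are computed independently, a test set can cross them (for example, grammatical items with low chunk strength), so that above-chance endorsement of such items is harder to explain by fragment memory alone.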

The choice to examine ISL with the AGL paradigm across three different modalities, within subjects, adds important behavioral data on whether this type of learning is served by a unified mechanism that "behaves" similarly across perceptual and item-level dimensions, given that stimulus modality is thought to affect the learning process itself (Silva et al., 2018). Previous studies found a quantitative advantage for the auditory modality (e.g., Conway and Christiansen, 2005); more recent data, however, do not confirm modality differences in ISL while still acknowledging the constraints inherent to each modality (e.g., Conway and Christiansen, 2009). AGL allows experimental manipulation of the type of information (i.e., verbal/non-verbal), the modality (i.e., visual, auditory, or tactile), and item-level properties (e.g., adherence to the grammar rules or associative strength), all of which support numerous testable hypotheses about the nature of the learning and the resulting knowledge. Every effort was made to tightly control experimental procedures and materials across the senses. To keep the input comparable across senses while maximizing learning by accounting for inherent modality constraints, the structure of the ISL stimuli was the same across sensory conditions, but presentation was spatial for the visual modality and temporal for the auditory (and tactile) modality. Following the work of Conway and Christiansen (2005, 2009) on typical adults, the present set of experiments provides an adequate comparison of learning across three modalities (vision, audition and touch) and insight into how modality constraints might affect ISL early in development.

In turn, combining the AGL tasks with a range of reading and cognitive measures provides a platform for examining the relationship between ISL performance and reading in more detail. To our knowledge, this is the first study to examine ISL performance across modalities and its relationship with reading in young typically developing children. A robust test of the hypothesis that ISL is related to reading ability involves (1) selecting an ISL task with non-verbal stimuli that has no predefined relationship with reading processes per se, and (2) using standardized tests of phonological awareness and reading ability that were not designed with the probabilistic links between letters and speech sounds, and among letters, in mind (Arciuli and Simpson, 2012b). This was the reasoning adopted in the present study.

## MATERIALS AND METHODS

#### Subjects

This study was carried out in accordance with the recommendations and approval of the Yale University Human Investigation Committee (HIC) and the University of Edinburgh Psychology Research Ethics Committee. Written informed consent and assent were obtained from parents and the participating children, respectively.

Thirty-one<sup>2</sup> typically developing children participated in the study and received monetary compensation for their participation. They were between 6 and 9 years old (M = 8.47, SD = 1.19 years; 15 female, 16 male). None of the children had a reported history of reading, speech or hearing impairment.<sup>3</sup> All had normal or corrected-to-normal vision. All testing sessions took place at Haskins Laboratories facilities in New Haven, CT (see section "Procedure" for more details).

## Materials

#### Background Measures

Children received a battery of general and reading-related cognitive measures (see **Table 1** for a summary of participants' scores):

#### **General intellectual ability and memory**

General intellectual ability was assessed using the Wechsler Abbreviated Scale of Intelligence (WASI II) (Wechsler, 2011) for verbal comprehension and perceptual reasoning. Children's working memory was assessed using the Digit Span subtest from CTOPP 2 (Wagner et al., 2013).

<sup>3</sup>All children underwent a hearing test.

TABLE 1 | Participants' descriptive statistics on standardized behavioral measures, all reported in standardized scores.


<sup>2</sup>Originally, 33 children were tested. However, two children were excluded from our analysis due to many missing values across all our tasks and measures. Please also note that from our final sample (31), one child did not complete the Auditory ISL task.

#### Pavlidou and Bogaerts Implicit Statistical Learning in Childhood

#### **Literacy**

Reading and spelling. To evaluate reading skills, the 3rd edition of the Woodcock–Johnson Test of Achievement (WJ-III; Woodcock et al., 2001) was administered. We measured the Broad Reading Composite (WJBR) score, which was a composite of scores on the following subtests: Letter-Word Identification (recognizing letters and reading real words of increasing difficulty), Reading Fluency (speeded reading of sentences), and Passage Comprehension (reading and understanding short passages). We also calculated the Basic Reading and Reading Fluency composite scores. Children also completed the Test of Word Reading Efficiency (TOWRE; Torgesen et al., 1999) (speeded reading of single words and non-words), which is composed of two subtests: the Sight Word Efficiency (SWE) and Phonetic Decoding Efficiency (PDE). We computed both subscores as well as the composite score. Finally, children's spelling ability was measured using the Spelling subtest from WJ-III.

Phonological processing and rapid automatized naming (RAN). To tap phonological awareness, we used the Elision and Blending Words subtests from the Comprehensive Test of Phonological Processing 2 (CTOPP 2) (Wagner et al., 2013). From the same test battery, we also used the composite score for the Memory for Digits and Non-word Repetition subtests to measure phonological memory, and RAN for digits to measure RAN.

#### The Implicit Statistical Learning Tasks

#### **Apparatus**

Three AGL tasks were developed in which the information and item levels were kept constant but the input was delivered via a different modality (visual, auditory, or tactile). In all three modalities the stimuli were non-verbal: (1) the visual task used unfamiliar shapes; (2) the auditory task used pure tones; (3) the tactile task used finger vibrations. Based on adult data (Conway and Christiansen, 2005) suggesting that participants learn predictive dependencies better when visual stimuli are presented simultaneously (spatially) rather than sequentially, we presented the visual stimuli spatially (whereas the auditory and tactile stimuli were presented sequentially, i.e., temporally) to induce maximum learning.

Each task consisted of a training phase, which exposed the children to stimuli that followed the permissible transitions of the grammar (i.e., they were not random), and a test phase. Presentation order of both training and test stimuli was randomized for each participant. The test stimuli controlled for (a) adherence to the grammar rules and (b) the fragment familiarity (item level) they shared with the training items. By manipulating both grammaticality and chunk strength, we can learn which learning mechanisms children employ in each modality.

Visual non-verbal task (VNT). The visual task was based on the experimental grammar of Knowlton and Squire (1996). Children were trained on 69 grammatical sequences (i.e., sequences that followed the rules of the grammar) composed of two to six items, i.e., "alien" shapes (see **Figure 1** for some example sequences); they were not informed about the structured nature of the sequences. The sequences (irrespective of their length) were presented one at a time and remained on screen for 5 s (inter-sequence interval: 3 s). Children were asked to pay close attention and to tap<sup>4</sup> their hand whenever they noticed a new sequence on the computer screen.

After the training phase, children were told that the sequences followed some very complicated rules and that they had to choose, from a new set of sequences, those that seemed to follow the same rules or looked "familiar" to them, responding verbally (yes if a sequence looked familiar; no if it did not). They were shown 32 novel test sequences (2 to 6 items long). Presentation parameters were identical to those in the training phase. Half of the sequences obeyed the rules of the grammar and were thus labeled grammatical (GR), while the remaining half violated those rules and were labeled ungrammatical (UG). We also manipulated associative strength to the training sequences (referred to as "chunk strength" hereafter), so that within both the GR and the UG items, half had high chunk strength (HCS) and half had low chunk strength (LCS). The chunk strength of each test sequence was calculated by dividing the total number of times its bigrams and trigrams (i.e., chunks) had appeared during training by the number of chunks it contained. Note that children's responses were scored according to grammaticality only: accepting GR items and rejecting UG items are correct responses; rejecting GR items and accepting UG items are incorrect responses.
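The chunk-strength measure can be sketched in a few lines of Python. This is a minimal illustration assuming the usual associative-chunk-strength definition (the mean number of times a test item's bigrams and trigrams occurred in the training set); the symbol strings and function names are hypothetical, not the authors' materials or code.

```python
from collections import Counter

def chunks(seq):
    """All bigrams and trigrams of a sequence, as tuples."""
    return [tuple(seq[i:i + n]) for n in (2, 3) for i in range(len(seq) - n + 1)]

def training_chunk_counts(training_set):
    """How often each bigram/trigram occurred across all training sequences."""
    counts = Counter()
    for seq in training_set:
        counts.update(chunks(seq))
    return counts

def chunk_strength(test_seq, counts):
    """Mean training frequency of the test sequence's chunks (0.0 if it has none)."""
    items = chunks(test_seq)
    if not items:
        return 0.0
    # Counter returns 0 for chunks never seen in training.
    return sum(counts[c] for c in items) / len(items)

# Hypothetical toy strings, not the study's stimuli.
training = ["ABC", "ABD", "BCD"]
counts = training_chunk_counts(training)
print(chunk_strength("ABCD", counts))  # (2 + 2 + 1 + 1 + 1) / 5 = 1.4
print(chunk_strength("DA", counts))    # unseen bigram -> 0.0
```

Averaging (rather than summing) keeps long and short test items on a comparable scale, which matters here because sequences ranged from two to six items.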

Auditory non-verbal task (ANT). The auditory task was designed by substituting the "alien" shapes with "alien" sounds, that is, pure tones (1 = 261.6 Hz, 2 = 277.2 Hz, 3 = 349.2 Hz, 4 = 370 Hz, and 5 = 493.9 Hz), using E-Prime<sup>5</sup>; the same training and testing items used for the VNT were used to create the auditory stimulus set. Each tone lasted 500 ms, and successive tones were separated by 100 ms. After the end of each tone sequence, there was an interval of 1700 ms followed by a fixation cross on the computer screen, which marked the beginning of a new sequence. Again, children were not informed about the structured nature of the tone sequences but were asked to pay attention to the "alien" sounds. They were then presented with new tone sequences and asked to indicate, by button press, which sequences sounded similar and which did not.

<sup>4</sup>This technique was chosen to maintain attention and ensure that children were engaged with the task. One child who showed clear behavioral signs of frequent disengagement was not included in the analysis.

<sup>5</sup>E-Prime Experiment version = 1.0.0.50; Runtime version = 2.0.10.242; Studio version = 147.

Tactile task (TT). The tactile task was designed as follows: a number was assigned to each of the four letters of the grammar, and each number was then mapped onto a specific finger [the Knowlton and Squire (1996) grammar had four positions, so the last finger (pinky) was mapped onto 0 and did not correspond to any vibration]. Following this "coding scheme," the letter items comprising the training and testing sets used for the VNT were replaced with their corresponding fingers (see **Box 1**).

Using an innovative tactile device (see **Box 1**), slight vibrations were delivered to each finger (i.e., fingers 1, 2, 3, and 4) during training and testing. Each vibration lasted 500 ms, and successive vibrations were separated by 100 ms. After the end of each vibration sequence, there was an interval of 1700 ms before a new sequence began. Otherwise, the tactile task (TT) followed the same design as the VNT and ANT. To minimize interference from the other senses, children were asked to wear an "astronaut glove" (see **Box 1**) and headphones playing white noise for the entire duration of the experiment.

#### Procedure

Children completed the tasks individually in three experimental sessions (spaced ∼1–4 weeks apart; mean interval = 2.2 weeks), with the following order of administration where possible: Session 1, WASI II/Visual Non-verbal Task (VNT)/TOWRE; Session 2, Hearing Test/Auditory Non-verbal Task (ANT)/WJ-III; and Session 3, Tactile Task (TT)/CTOPP 2.

## RESULTS

To recap, the current tightly controlled experimental design explored (1) whether AGL learning in one modality is linked with learning in other modalities, which would point to a domain-general ISL mechanism; (2) which statistics (induced, encouraged, or tested by our tasks across the different modalities) helped children learn the inherent regularities of our stimulus set; and (3) what relationship, if any, ISL in the different modalities has with our chosen standardized measures of reading and reading-related abilities.

### AGL Learning

**Figure 2** depicts children's mean correctness on the three AGL tasks (R Core Team, 2019). A one-sample t-test was used to compare performance against chance level (0.5, or 50%) in each of the three tasks, given that above-chance performance is taken in the AGL literature as an indicator that learning has taken place. Children performed above chance in the visual and tactile tasks but not in the auditory one (visual: M = 56.45%, SD = 0.13, t(30) = 2.72, padj = 0.01, d = 0.53; tactile: M = 55.34%, SD = 0.10, t(30) = 2.94, padj < 0.01, d = 0.50; auditory: M = 49.48%, SD = 0.16, t(29) = 0.22, padj = 0.57, d = −0.03; p-values adjusted using Holm's correction for multiple comparisons). We further analyzed the data with a repeated-measures ANOVA with mean correctness as the dependent variable and Modality (visual vs. auditory vs. tactile), Grammaticality status (grammatical vs. ungrammatical) and their two-way interaction (Grammaticality:Modality) as predictors. A marginal main effect of Modality (F(2,58) = 2.50, p = 0.09, η<sup>2</sup><sub>p</sub> = 0.08) again hints at modality differences. We also found a significant main effect of Grammaticality (F(1,29) = 7.37, p = 0.01, η<sup>2</sup><sub>p</sub> = 0.20), indicating that, across modalities, children gave more correct responses by correctly rejecting UG items (M = 57.47%, SD = 0.10) than by correctly accepting GR items (M = 50.14%, SD = 0.13). The interaction was not significant (F < 1, p = 0.53, η<sup>2</sup><sub>p</sub> = 0.02).

#### What Was Learnt During AGL?

The balanced design of the AGL test materials (in terms of the grammaticality and chunk strength of the items) allowed us to explore structural (i.e., grammaticality status) vs. familiarity-based (i.e., chunk strength) effects in the visual and tactile modalities (where we observed above-chance performance). We ran a repeated-measures ANOVA with mean acceptance rate as the dependent variable and, as predictors, Modality (visual vs. tactile), Grammaticality status (grammatical vs. ungrammatical), Chunk strength (high vs. low), the two-way interactions Grammaticality:Modality, Chunk strength:Modality and Grammaticality:Chunk strength, and the three-way interaction Grammaticality:Chunk strength:Modality. If children's acceptance responses are driven by learning of the grammar structure or rules, we expect an effect of grammaticality, whereas reliance on item familiarity irrespective of grammaticality status would produce an effect of chunk strength. If both learning mechanisms play a role, then we should see an interaction between grammaticality and chunk strength.
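The two dependent variables used above differ: proportion correct is scored by grammaticality alone, whereas the ANOVA here uses the acceptance ("yes") rate in each grammaticality × chunk-strength cell. A minimal sketch, assuming each trial is recorded as three booleans (a hypothetical data layout), shows how one set of responses yields both; a child who accepts every high-chunk-strength item has an acceptance rate of 1.0 for ungrammatical but familiar items, which is the pattern a chunk-strength effect picks up.

```python
from collections import defaultdict

def score_agl(trials):
    """trials: iterable of (grammatical, high_chunk_strength, accepted) booleans.

    Returns grammaticality-based proportion correct and the acceptance rate in
    each (grammatical, high_chunk_strength) cell.
    """
    trials = list(trials)
    # Correct = accepting a GR item or rejecting a UG item.
    correct = sum(1 for gram, _, acc in trials if acc == gram)
    cells = defaultdict(list)
    for gram, high_cs, acc in trials:
        cells[(gram, high_cs)].append(acc)
    acceptance = {cell: sum(v) / len(v) for cell, v in cells.items()}
    return correct / len(trials), acceptance

# Hypothetical child who accepts every high-chunk-strength item.
trials = [(True, True, True), (True, False, True),
          (False, True, True), (False, False, False)]
prop_correct, acceptance = score_agl(trials)
print(prop_correct)                  # 0.75
print(acceptance[(False, True)])     # 1.0: UG but familiar chunks -> accepted
```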

Our results revealed a main effect of Grammaticality (F(1,30) = 18.83, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.39), indicating, in line with the mean-correctness results above, a higher acceptance rate for GR items than for UG items. More importantly, we also observed a significant interaction between Chunk strength and Modality (F(1,30) = 4.52, p = 0.04, η<sup>2</sup><sub>p</sub> = 0.13) and a three-way interaction between Grammaticality, Chunk strength, and Modality (F(1,30) = 5.88, p = 0.02, η<sup>2</sup><sub>p</sub> = 0.16). This three-way interaction is illustrated in **Figure 3**. The figure suggests that whereas in the visual modality both grammaticality and chunk strength led to a higher acceptance rate, in the tactile modality only grammaticality had such a positive effect on acceptance rate (i.e., correctly accepting GR sequences and correctly rejecting UG sequences).

## Does AGL Performance Correlate Across Modalities?

To determine the relationship, if any, between ISL performance in the three modalities, we computed Pearson correlation coefficients between performance in each pair of modalities (see **Figure 4**). There were no significant correlations (visual-auditory: r = 0.33 with CI95 = [−0.11 0.67], padj = 0.21; visual-tactile: r = −0.18 with CI95 = [−0.50 0.19], padj = 0.34; auditory-tactile: r = 0.29 with CI95 = [−0.14 0.62], padj = 0.25; p-values and confidence intervals adjusted using Holm's correction for multiple comparisons).
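A Pearson correlation and its confidence interval can be sketched with the standard library using Fisher's z transformation. The text does not say exactly how the intervals were Holm-adjusted; our own inference is that widening the interval by splitting alpha across the three comparisons approximately reproduces the reported visual-auditory interval, so treat the `alpha / 3` call below as an illustrative assumption rather than the authors' procedure. n = 30 for that pair because one child did not complete the auditory task.

```python
import math
from statistics import NormalDist, mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for r via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Visual-auditory pair: r = 0.33, n = 30.
print(fisher_ci(0.33, 30))              # unadjusted 95% interval
print(fisher_ci(0.33, 30, 0.05 / 3))    # alpha split over three comparisons
```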

Confidence intervals for all estimated correlations include 0 but are wide; hence we also report Bayes factors (BFs)<sup>6</sup>, which can help determine whether these non-significant results support the null hypothesis or whether the data are simply insensitive (Dienes, 2011). The strength of evidence for one hypothesis (here, the null hypothesis that the correlation is zero) compared to a competing hypothesis (here, the alternative hypothesis that the correlation is positive) is by convention considered moderate if the BF is larger than 3 (e.g., Jeffreys, 1961; Lee and Wagenmakers, 2013). For the correlation between visual and tactile performance we observe such moderate evidence (BF0+ = 8.20), indicating that the data are about eight times more likely under the null hypothesis than under the alternative hypothesis. The other two BFs, however, are smaller than 1 (BF0+ = 0.48 for rvisual-auditory and BF0+ = 0.77 for rauditory-tactile), indicating inconclusive to weak evidence for a positive correlation.
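Reading a one-sided BF0+ in either direction amounts to taking its reciprocal when it is below 1 (BF+0 = 1/BF0+) and then applying the conventional evidence categories (anecdotal 1 to 3, moderate 3 to 10, strong 10 to 30, very strong 30 to 100, extreme above 100, as popularized by Lee and Wagenmakers, 2013). The helper below is a hypothetical reading aid applied to the three reported values; it does not compute Bayes factors, which came from JASP.

```python
def interpret_bf(bf0plus):
    """Label a one-sided Bayes factor BF0+ with conventional evidence categories.

    Returns (favoured hypothesis, evidence label, BF in favour of that hypothesis).
    """
    if bf0plus >= 1:
        favoured, bf = "H0", bf0plus
    else:
        favoured, bf = "H+", 1 / bf0plus   # BF+0 is the reciprocal of BF0+
    for bound, label in ((100, "extreme"), (30, "very strong"), (10, "strong"),
                         (3, "moderate"), (1, "anecdotal")):
        if bf > bound:
            return favoured, label, bf
    return "neither", "no evidence", bf

for name, value in [("visual-tactile", 8.20), ("visual-auditory", 0.48),
                    ("auditory-tactile", 0.77)]:
    print(name, interpret_bf(value))
```

Under these conventions, BF0+ = 0.48 corresponds to BF+0 ≈ 2.1, that is, only anecdotal evidence for a positive correlation.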

Since the correlation between two measures is bounded by their reliabilities (it cannot exceed the square root of the product of the two reliability coefficients), we also evaluated the split-half reliability of each task. Split-half reliability was obtained by correlating performance on odd and even test trials. Reliability correlations were reasonably high for the visual (r = 0.42 with CI95 = [0.21 0.63], Spearman-Brown corrected = 0.58) and auditory (r = 0.56 with CI95 = [0.38 0.73], Spearman-Brown corrected = 0.72) tasks, but substantially lower for the tactile task (r = 0.16 with CI95 = [−0.08 0.41], Spearman-Brown corrected = 0.25). These reliability estimates suggest that the lack of correlations between the tasks, at least between the visual and auditory tasks, is not simply the result of low reliability but rather points, to some extent, to modality specificity. Moreover, the numerically lower split-half reliability of the tactile task suggests that the psychometric properties of AGL tasks with the exact same underlying grammar are not identical, possibly pointing again to an important constraint of the sensory modality in which an artificial grammar is learned and/or tested. This result should be interpreted with caution, though, since the confidence interval for the tactile split-half correlation overlaps with those for the other modalities.
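The odd/even split, the Spearman-Brown step-up, and the reliability ceiling on an observed correlation can be sketched as follows. This assumes trials are ordered as presented to each child (the text does not specify whether odd/even refers to presentation or item order), and the function names are illustrative.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half(trial_scores):
    """Correlate children's mean accuracy on odd vs. even test trials.

    trial_scores: one list of 0/1 trial outcomes per child, in presentation order.
    """
    odd = [mean(t[0::2]) for t in trial_scores]    # trials 1, 3, 5, ...
    even = [mean(t[1::2]) for t in trial_scores]   # trials 2, 4, 6, ...
    return pearson(odd, even)

def spearman_brown(r_half):
    """Project a half-test correlation to full test length."""
    return 2 * r_half / (1 + r_half)

def max_observable_r(rel_x, rel_y):
    """Classical ceiling on an observed correlation given two reliabilities."""
    return math.sqrt(rel_x * rel_y)

print(round(spearman_brown(0.56), 2))          # 0.72, as reported for the auditory task
print(round(max_observable_r(0.58, 0.25), 2))  # 0.38, visual-tactile ceiling
```

With the reported corrected reliabilities, no visual-tactile correlation above roughly 0.38 could be observed even if the underlying abilities were identical, which is why the tactile task's reliability limits what its (non-)correlations can show.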

## Does AGL Performance Correlate With Reading Measures?

In evaluating the link between ISL performance, as measured in our three tasks, and reading, we were interested in three theoretical connections<sup>7</sup>: the connection with (1) phonological awareness, (2) basic reading skills, and (3) reading fluency. For phonological awareness and basic reading we simply used the standard scores of the CTOPP 2 and WJ-III Basic Reading, respectively. For fluency, we averaged the TOWRE total standard score and the WJ-III fluency score, as both tap speeded reading. The use of standard scores (with age norms) is particularly important because our participants ranged from 6 to 9 years of age, a dynamic period for language and early reading development.<sup>8</sup>

Given the lack of significant correlations across the three AGL tasks and the low split-half reliability of the tactile task<sup>9</sup>, we focused on the visual and auditory AGL tasks and examined them separately. All observed correlations between performance on the visual task and our reading-related measures were positive yet relatively small (see **Figure 5**), and only the correlation between phonological awareness and visual AGL performance was significant after Holm's correction for multiple comparisons (r = 0.45 with CI95 = [0.04 0.74], padj = 0.03). Controlling for general intelligence, we found a partial correlation coefficient of r = 0.52 (p < 0.01). The non-significant Pearson correlation between visual AGL performance and basic reading was estimated at r = 0.11, with CI95 = [−0.25 0.45] (padj = 0.81), and, similarly, that between visual AGL performance and reading fluency at r = 0.16, with CI95 = [−0.28 0.55] (padj = 0.81).
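The partial correlation reported above (controlling for general intelligence) follows the standard first-order formula, sketched below. The correlations of the covariate with each measure are not reported in the text, so the values in the example are hypothetical; they only illustrate that a covariate related to the two measures in opposite directions can raise the partial r above the zero-order r, as happened here (0.45 to 0.52).

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical covariate correlations, not taken from the study.
print(round(partial_r(0.45, 0.30, -0.20), 2))  # 0.55
print(partial_r(0.45, 0.0, 0.0))               # unrelated covariate: 0.45
```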

Finally, all correlations between performance on the auditory task and our reading-related measures were positive yet relatively small (see **Figure 6**), and none was significant (auditory AGL-Phon. awareness: r = 0.16 with CI95 = [−0.22 0.50], padj = 0.53; auditory AGL-Basic reading: r = 0.31 with CI95 = [−0.14 0.66], padj = 0.27; auditory AGL-Reading fluency: r = 0.22 with CI95 = [−0.23 0.59], padj = 0.53).

Given that, with the exception of the correlation between visual AGL performance and phonological awareness, the correlations were non-significant, we again report BFs. **Table 2** shows that all BFs for the correlations reported above as non-significant fall between 0.33 and 3 and are hence considered inconclusive, or only weak evidence for either hypothesis.

## DISCUSSION

Based on the hypothesized link between ISL and proficiency with written language, we explored in this study whether the ability to implicitly detect structure and fragment correlations in non-verbal spatial (i.e., visual symbols) or temporal (i.e., pure tones) arrays would correlate with phonological awareness and with performance on reading measures of accuracy and fluency. In parallel, we were interested in the nature of ISL (domain-general or domain-specific), given that it is still hotly debated in the literature. Hence, we looked at children's performance across three modalities, i.e., vision, audition and touch. We also investigated which type of knowledge children acquire and base their familiarity judgments on in the test phase of an AGL task (i.e., knowledge of the underlying grammar (rules) or chunk strength).

<sup>6</sup>These BFs, and all BFs reported subsequently, were obtained using JASP (JASP Team, 2018; Version 0.10) with the default stretched beta prior of width 1.

<sup>7</sup>Note that a large range of measures was included in the test battery (see **Table 1**); the selection of measures here is motivated by our theoretical considerations as outlined in the section "Introduction."

<sup>8</sup>Since ISL performance is not standardized, one could also argue for exploring the correlations with raw scores, which reflect absolute phonological/reading skills. None of the correlations between ISL performance and these raw scores reached significance when controlling for multiple comparisons (all p's > 0.11). Looking at the same correlations while controlling for age, we observed a pattern of results very similar to the one with standard scores reported in the main text, with only the partial correlation between the raw phonological awareness score and visual ISL being near-significant (r = 0.36, p = 0.56; p's for all other correlations > 0.36).

<sup>9</sup>With low test reliability, we cannot expect correlations with other outcome measures.


To summarize our results, we found:

FIGURE (caption partially lost in source) | …reading, and fluent reading (all in standard scores). Black lines represent regression lines and gray bands around them represent the standard error.

(3) Positive albeit small correlations between visual AGL performance and phonological awareness, basic reading and fluent reading. Only the correlation with phonological awareness reached significance. Similarly positive small correlations were observed between auditory AGL performance and our three reading-related measures but none of those were significant.

#### ISL Performance Across Domains

First, we explored whether children are able to show learning when faced with unfamiliar training items, short exposure times and novel test items. Children showed above-chance ISL for visual, spatially arrayed input and also for tactile, temporally presented input, confirming findings from adult populations (Conway and Christiansen, 2005). To our knowledge, no other study has demonstrated tactile ISL in young populations such as children. Contrary to adult data and the perceived supremacy of the auditory modality in ISL, children did not show learning for auditory input; in fact, there was a numeric advantage for the visual modality. Although some authors suggest that human visual statistical learning is similar to auditory learning (e.g., Aslin et al., 1998; Fiser and Aslin, 2002; Kirkham et al., 2002), such conclusions were based on studies that did not use comparable stimuli or procedures across modalities. Because our training and testing materials were identical in terms of their underlying structure (i.e., we used the same training and test items across modalities, all developed from the same grammar), we can make more "refined" comparisons across modalities. Accordingly, while we acknowledge that there are similarities in how infants and adults learn across modalities, the lack of evidence for learning in the auditory modality in our data provides a first indication of potentially important differences in children's learning that should be further explored using other ISL paradigms.

TABLE 2 | Bayes factors (BF0+) for each of the correlation pairs, quantifying the strength of evidence for the null hypothesis that the correlation is zero compared to the alternative hypothesis that the correlation is positive.

## One Modality Is Not the Other

The lack of correlation of performance across modalities (with one of the three BFs providing moderate evidence in favor of the null hypothesis) strengthens our argument for potentially important learning differences across modalities: ISL does not appear to be a unified entity but rather is subject to modality constraints in childhood, confirming data from adult populations (Siegelman and Frost, 2015). A child's performance in one modality might not generalize to other modalities; rather, children may be good at detecting structure and/or fragment correlations in one modality (with one type of stimulus) but not in another (with another type of stimulus). The lack of correlation across modalities was observed despite reasonable split-half reliability for the visual and auditory tasks. The tactile task, by contrast, displayed very low split-half reliability, even with above-chance performance. The differing psychometric qualities of the same test in different modalities are interesting in themselves and potentially further attest to modality differences and constraints. Note that split-half reliability is an important type of reliability but concerns only the internal consistency of the measure, not its stability over time. A full evaluation of reliability would also require investigating test-retest reliability, which is typically lower.

Overall, our data support the finding that ISL as measured by AGL taps mechanisms that discover both structure and fragment information (e.g., Knowlton and Squire, 1996; Pavlidou et al., 2010; Pavlidou and Williams, 2014). Yet again, the sensitivity children showed to both grammatical structure and chunk strength for the visual stimuli, but to grammaticality only for the tactile stimuli, provides evidence for such domain-specific constraints on learning mechanisms.

## Is ISL Associated With Phonological Awareness and Reading?

Importantly and pertinent to our main theoretical question, we explored the relationship of ISL (as tested by AGL) with phonological awareness, basic reading as well as fluent reading in typically developing children.

The first striking finding is that children who performed well in the visual task, that is, who appear to have picked up the implicit structure embedded in the spatially presented visual shapes, on average scored well on the phonological awareness task (as tested by CTOPP 2). Good phonological awareness is pivotal to the development of accurate and fluent reading, as it underpins the novice reader's ability to map letters onto their corresponding speech sounds. However, given the arbitrary L-SS mappings in English (and other deep orthographies), where one letter can map onto more than one speech sound, efficient associations are thought to result from both explicit and implicit learning processes. Our findings bolster this argument by adding data on the potential link between ISL and efficient reading in childhood, showing a positive association between visual ISL and phonological awareness. ISL could be a key mediating factor: a mechanism that helps the novice reader pick up not only the regular but, importantly, the irregular L-SS mappings, resulting in fast and effortless word retrieval. What is surprising, however, is that we did not observe a (significant) correlation between performance on the auditory task and phonological awareness. As we discuss in the section below, one possibility is that such a correlation does exist but that, given measurement error and our relatively small sample, our study could not reveal it. This remains an open question for future research.

Although we observed small positive correlations between AGL performance and both basic and fluent reading, these were not significant. BFs for the non-significant correlation pairs all fell between 0.33 and 3, leading to the conclusion that the data, rather than providing substantial evidence for the null hypothesis, are uninformative about whether the null hypothesis or the hypothesis of a positive correlation is supported. Whereas the theoretical link detailed in the introduction would have predicted a positive relationship not only between AGL performance and phonological awareness but also between AGL performance and reading skills, our result is in line with previous studies linking individual differences in ISL performance with individual differences in linguistic skills in children, which typically report correlations not exceeding r = 0.30 (e.g., Arciuli and Simpson, 2012b; Shafto et al., 2012; Mainela-Arnold and Evans, 2014; Spencer et al., 2015). It is worth noting that our split-half reliability was far from perfect, which is typical not just for AGL but for many tasks indexing ISL (e.g., Siegelman et al., 2017a; Bogaerts et al., 2018; West et al., 2018). Since the correlation between two measures is bounded by their reliabilities, these weak correlations could in fact reflect a stronger true correlation (Bogaerts et al., 2018; Conway et al., 2019).

Our result on phonological awareness is in line with the findings of Spencer et al. (2015) on early reading skills and statistical learning, which confirmed the relation between ISL and phonological processing (which includes phonological awareness) in a large sample. Taken together, our findings and those of Spencer et al. (2015) suggest that ISL supports reading-related skills such as phonological awareness at both early and later stages of mastering reading. Moreover, the correlation between visual AGL performance and phonological awareness remained significant when controlling for age or general intelligence, which suggests that there is a "legitimate" link between visual ISL (as manifested in AGL) and reading-related skills. Data on adult self-paced reading (Misyak and Christiansen, 2011) and adult second language learning (e.g., Frost et al., 2013) suggest that this link also persists for later reading, although this clearly requires further investigation.

## Limitations and Directions for Future Research

Our study is based on a relatively small sample (N = 31). This, admittedly, raises the concern of low power and of missing correlations that are in reality present (as well as less accurate estimates of correlation sizes in general). The correlations with reading in the current study were positive but not significant, and the BFs indicated that the data are insensitive rather than supporting the hypothesis of a zero correlation. This pattern calls for future work with larger developmental samples. A power analysis (G∗Power 3.1, Faul et al., 2009) assuming an expected effect size of 0.30, a desired power of 0.90 and a one-tailed test recommends a sample size of 88. Such large samples are a challenge in developmental research (and even more so in multi-session experiments), but they are clearly necessary. Future studies could also focus on a more restricted age group. Although the current investigation employed standard scores (with age norms) to control for age, we cannot exclude the existence of developmental effects, and our sample is not large enough to explore them systematically. Note that with a limited sample, results may also be more affected by deviant observations. The scatter plot depicted in **Figure 5** demonstrates, however, that this does not seem to be the case.
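The figure of 88 reported above was obtained with G∗Power. A common closed-form approximation based on Fisher's z transformation makes the logic of such a power analysis explicit. The required N grows with the squared sum of the critical z values for α and for the desired power, divided by the squared transformed effect size. The sketch assumes α = 0.05, which the text does not state, and the function name is hypothetical. Because this approximation differs from the routine G∗Power uses, it returns 93 rather than 88 for the same settings.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, power: float = 0.90, alpha: float = 0.05,
                      tails: int = 1) -> int:
    """Approximate N needed to detect a population correlation r
    (Fisher z approximation)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / tails)  # critical value for alpha
    z_beta = std_normal.inv_cdf(power)                # quantile for desired power
    effect = math.atanh(abs(r))                       # Fisher z of the effect size
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


if __name__ == "__main__":
    print(n_for_correlation(0.30))           # one-tailed, power .90: 93
    print(n_for_correlation(0.30, tails=2))  # a two-tailed test needs more children
```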

Another point to consider concerns our visual and tactile tasks, for which we observed above-chance mean group performance. Even in these tasks, 39% and 29% of children, respectively, did not perform above chance and hence did not display evidence for learning. This pattern of results is very common in both developmental and adult samples (e.g., Don et al., 2003; Gabay et al., 2015; see also Siegelman et al., 2017a). From an individual-differences viewpoint, at-chance performance might be meaningful, yet a substantial proportion of these data points simply reflects noise in terms of predictive validity (Siegelman et al., 2017a; see also Siegelman et al., 2017b for a discussion of issues arising from looking at individual differences in statistical learning). Future studies might hence want to develop methods optimized for measuring individual differences in developmental samples. Another possible approach would be to explore correlations only in the subset of children who show evidence for learning. However, this would require a larger sample than the one available in the present study.
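Classifying an individual child as performing "above chance" amounts to a one-sided binomial test on that child's test-phase accuracy. The sketch below finds the smallest number of correct responses that reaches significance and the share of children below that cut-off. It assumes a hypothetical test phase of 24 two-alternative forced-choice trials (chance = 0.5) and α = 0.05, because the trial counts are not given in this section. It is not necessarily the criterion used in the study.

```python
from math import comb
from typing import List, Optional


def binom_sf(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_above_chance(n_trials: int, p_chance: float = 0.5,
                             alpha: float = 0.05) -> Optional[int]:
    """Smallest score whose one-sided binomial p value is <= alpha."""
    for k in range(n_trials + 1):
        if binom_sf(k, n_trials, p_chance) <= alpha:
            return k
    return None  # too few trials for any score to reach significance


def share_at_chance(scores: List[int], n_trials: int) -> float:
    """Proportion of participants whose score does not reach the cut-off."""
    cutoff = min_correct_above_chance(n_trials)
    if cutoff is None:
        return 1.0
    return sum(s < cutoff for s in scores) / len(scores)


if __name__ == "__main__":
    print(min_correct_above_chance(24))  # 17 of 24 correct (about 71%)
```

With short test phases the cut-off lies well above 50% accuracy. As a result, children with modest but genuine learning can still fall in the "at chance" group, which is one reason individual-level classification is noisy.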

A third point that deserves attention is the distinction between modality and the specific stimuli we chose to employ. The use of non-verbal stimuli (e.g., pure tones) in our experiments has the advantage of indexing the net efficiency of ISL computations. However, as we used only one type of stimulus in each modality, we cannot disentangle modality effects from stimulus effects. It would therefore be interesting for future work to explore stimulus and modality effects by using multiple types of stimuli within the same perceptual dimension (e.g., for the auditory domain, pure tones and non-verbal everyday sounds; Siegelman et al., 2018).

Finally, the low reliability of the tactile AGL task highlights the need to examine the extent to which the various available paradigms are robust proxies of ISL. Subsequent studies should therefore improve our understanding of the psychometric properties of all available ISL tasks, both to inform theory and to guide research practice.
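Split-half reliability, the psychometric index discussed in this section, can be sketched in three steps. First, each participant's trials are split into odd and even halves. Second, the two half-scores are correlated across participants. Third, the Spearman–Brown formula steps the half-length correlation up to full test length. The odd/even split and the input format (one list of 0/1 accuracies per child) are assumptions for illustration. The study's exact splitting procedure is not described here, and many studies instead average over many random splits.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation (no external dependencies)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a half-score has zero variance")
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(trials_per_child: List[List[int]]) -> float:
    """Odd/even split-half reliability with Spearman-Brown correction."""
    odd = [sum(t[0::2]) for t in trials_per_child]   # trials 1, 3, 5, ...
    even = [sum(t[1::2]) for t in trials_per_child]  # trials 2, 4, 6, ...
    return spearman_brown(pearson(odd, even))


if __name__ == "__main__":
    # Hypothetical accuracies (1 = correct) for four children on eight trials.
    data = [
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1, 1, 0, 1],
        [0, 0, 1, 0, 0, 1, 0, 0],
    ]
    print(round(split_half_reliability(data), 2))  # 0.94
```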

## CONCLUSION

On the whole and from a theoretical point of view, our data on AGL performance across perceptual modalities suggest notable modality differences and constraints in the implicit assimilation of statistical regularities. We tested particular types of stimuli and a particular underlying grammar. Under these conditions, young children (6–9 years old) performed above chance as a group on a visual task with abstract shapes and on a tactile task with finger vibrations, but not on an auditory task with pure tones. Moreover, we observed no significant correlation of ISL performance across modalities, and suggestive differences in the psychometric properties of the different tasks.

Despite such modality differences, there might be shared computational principles for the extraction of statistical information (adjacent/non-adjacent dependencies) that operate across modalities (Frost et al., 2015). Because statistical regularities are inherent to each language system, these principles could also be implicated in reading-related skills (Frost et al., 2013). Our finding of a significant positive correlation between visual AGL performance and phonological skills provides support for such a theoretical link. At the same time, however, we observed surprisingly low and non-significant correlations between AGL performance and our measures of basic reading and reading fluency. These could indicate developmental effects, but they could also result from measurement limitations, or from a combination of both. Nevertheless, neuroimaging data (e.g., Kosslyn and Koenig, 1992; Paquier and Mariën, 2005; Shohamy and Turk-Browne, 2013) suggest that successful reading is driven by an interaction between domain-general and domain-specific mechanisms. These mechanisms support efficient learning of perceptual features and also implicit learning of statistical regularities idiosyncratic to each language. Clearly, however, additional larger-scale systematic investigations of ISL skills and reading skills are needed, at both the behavioral and neurobiological levels of analysis and across various populations.

## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Human Investigation Committee (HIC), Yale University, and the Psychology Research Ethics Committee, University of Edinburgh, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the HIC (Protocol No. 1304011782).

Research conducted with children at Haskins Laboratories, Yale University, and at the University of Edinburgh was subject to and covered by the respective human subjects protocols and institutional review boards. The policies in place define the standards for the participation of children in research studies conducted at or by Yale University, The University of Edinburgh, and partner institutions. As stated in the Yale Institutional Review Boards' protocol, "children participating in research constitute a special class of subjects for which special protections apply... All children considered for enrollment in, or enrolled as subjects in research must be treated in a manner commensurate with their special status as minors. Such research must be designed to ensure the appropriate enrollment of children and employ additional safeguards as described in this policy to ensure and protect their rights and welfare."

The research team made sure that all children's rights were met and that all experimental conditions were age-appropriate. The aim was for children to benefit as much as possible from the procedures and from the overall experience of participating in psychological research. A top priority of the research team was to guarantee maximum research quality, as defined and understood in ethical codes of research practice. Thus, following the ethical research guidelines for children as participants, all ethical issues were addressed successfully. More specifically, a number of essential criteria for good practice were adopted in relation to the study:

## Process of Consent/Assent

Initially, a letter was drafted and sent to parents/guardians/caregivers seeking written permission for the child to participate in the research. The parental permission form was written in language understandable to the parent/guardian/caregiver. It contained all elements of informed consent, including a description of the research study, the research procedures, and any potential risks or benefits (a copy of the form can be provided upon request). The default mode of informed consent was "opt-in": all consent forms contained two parts, an explanatory statement and the consent form (which was signed). The signed part asked the parent/guardian/caregiver to agree to their child's participation by signing the form and returning the signed part to the teacher and/or researcher. In this way, the parent/guardian/caregiver actively gave consent for participation.

In addition to parental consent, we asked children for written consent, or for oral consent in case of poor literacy skills. We also created information sheets providing more detailed information on the experimental conditions and procedures relating to the study. Because the study targeted children of different age groups, we developed separate age-appropriate information sheets. These ensured that all children fully understood what they were giving consent for. Additionally, before consent was given, the research team thoroughly explained to parents and children what the tasks entailed and what the children were expected to do during those tasks. This ensured that both children and parents/caregivers had a good understanding of the experimental procedure and of their input during this process, and that they were fully aware of the content of their consent. Even after parental/guardian and child consent had been obtained, children were frequently reminded that they could opt out of any experiment at any point during the project. Opting out had no adverse effects and did not change their compensation. In more detail:

#### Evaluation of Subjects' Capacity to Provide Informed Consent/Assent

For children, parents/guardians/caregivers were asked to sign the consent forms to allow their children to participate in the study. Parents were always encouraged to ask questions about each study or the consent form itself before signing. As a matter of course and as stated earlier, children who could follow verbal instructions were asked to provide written (where possible) or oral assent before participating. Children were given a brief introduction to the tasks so that they could state explicitly that they were happy to take part. Children were told that they did not have to complete the tasks if they did not wish to and that they could choose to stop at any time. Given that participation in research is voluntary, children had the right to withdraw at any time. Children may not always be able to communicate their desire to withdraw clearly. The research team therefore took responsibility for listening to them and was prepared to stop a session prematurely. Children were taken seriously when they began to show signs of discomfort or said "no." The research team also made every effort to make each child feel comfortable during the consent process (procedures discussed above) as well as during the study. All children were asked to summarize what they had been told about what they would be doing during each experiment, to ensure that they comprehended the procedures.

#### Safety and Data Monitoring Plan

The research team assessed the overall risk for children participating in this study as minimal, and adverse events were not anticipated. In the unlikely event that such events occurred, the experimental procedure would be terminated. Any serious adverse events would be reported within 48 h to all relevant stakeholders (including the European Commission and regulatory agencies).

In addition, all data collected during the study (both hard copies and electronic) were monitored periodically by EP to ensure maximum safety. Data would be destroyed during the project only if participants or their families specifically requested it, which they could do by contacting EP. To date, none of the participating families has asked for data destruction.

#### Confidentiality and Security of Data

Risks to subject confidentiality were minimized by adopting suitable data storage procedures. Data were/are kept in locked rooms, in locked file cabinets, and on password-protected computers. More specifically, the consent forms were/are locked in a filing cabinet. Hard copies of the testing protocols and clinical notes were/are also stored in a locked filing cabinet. All hard copies of research data and the clinical information containing personal health information were stored separately in locked file cabinets. Importantly, consent forms were/are NOT stored with the data, so that children cannot be identified by unauthorized persons.

All data for each child were identified by a numerical ID, which preserves the anonymity of the child. A master file links children's names to identification numbers (needed to ensure that children met eligibility criteria). This master file was/is stored as a password-protected Excel file on a password-protected computer. Only the research team had/has authorized access to the master file. All electronic data are stored on a secure server or on password-protected computers equipped with firewalls and anti-spyware and anti-virus software. Names have never appeared and will never appear in any publication, nor will they be mentioned in any public place in connection with this project. The database will be maintained within the existing data management system (i.e., password-protected secure databases), providing a high degree of security and quality monitoring.

#### Data Destruction

Finally, with regard to data destruction, the research team complied with the national guidelines. In most cases, records are kept for 10 years unless there is a specific request for the data to be destroyed earlier. Therefore, all data are kept for 10 years (unless participants request their destruction at an earlier point) and will be destroyed once this period has elapsed. Consent givers were informed that it is common research practice to keep data for this length of time to allow full scientific analyses to take place. They were reassured that the data will remain safe, confidential and anonymous during that time and in any form of dissemination. However, they were also informed of their right to request destruction of the acquired data at any time during or after the completion of the study.

#### Economic Considerations and Insurance of Participants

Participants received monetary compensation for participation in each session of the experiment. For each behavioral testing/questionnaire session they received 20 Euros (\$30). Participants were also eligible for a 40 Euros (\$50) bonus for completion of all study sessions. In detail, children were asked to participate in three visits during which they completed both the implicit learning tasks and the standardized screening tests, and they were compensated 60 Euros (\$90) in total. The study had minimal risks, but in the very unlikely event of an injury, children and their families would have been fully compensated for any medical costs. There were no such incidents.

#### Benefits of the Study

All participants were provided with the results of standardized behavioral assessment batteries. Other than that, they received no direct benefits to health or well-being and they were made fully aware of that fact before participating. However, our research on implicit learning across different modalities and its relationship with other cognitive abilities such as reading has implications for theories of learning and reading as well as for other didactic and pedagogical aspects of reading.

#### AUTHOR CONTRIBUTIONS

EP conceptualized the study, defined and prepared the experimental stimuli and design, organized and supervised the recruitment of participants, carried out the data collection, and contributed to the writing of the manuscript. LB carried out all statistical analyses, took care of the interpretation and description of the results, and contributed to the writing of the manuscript. Both authors contributed to the manuscript revision, and read and approved the submitted version.

#### FUNDING

EP received funding from the European Union's 7th Framework Programme under the Marie Skłodowska-Curie Grant Agreement No. 301704 (PIOF). LB received funding from the European Union's Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie Grant Agreement No. 743528 (IF-EF).

#### ACKNOWLEDGMENTS

We would like to thank the children and their families for participating in the study. We would also like to thank Ms. Stutzman for helping with data collection, Ms. Gumkowski for the development of the auditory task in E-Prime, Dr. Lympouridis for designing and developing Vibra F, and Ms. Burg for creating the astronaut glove. Finally, special thanks go to Prof. Ken Pugh for his scientific advice when designing the study and to Dr. Noam Siegelman for his insightful comments on the manuscript.

#### REFERENCES

Chomsky, N. (1959). Verbal behavior. Language 35, 26–58. doi: 10.2307/411334

Jeffreys, H. (1961). Theory of Probability, 3rd Edn. Oxford: Oxford University Press.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Pavlidou and Bogaerts. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.