Auditory Stimulation Training With Technically Manipulated Musical Material in Preschool Children With Specific Language Impairments: An Explorative Study

Auditory stimulation training (AST) has been proposed as a potential treatment for children with specific language impairments (SLI). The current study was designed to test this assumption by using an AST with technically modulated musical material (ASTM) in a randomized control group design. A total of 101 preschool children (62 males, 39 females; mean age = 4.52 years, SD = 0.62) with deficits in speech comprehension and poor working memory capacity were randomly allocated to one of two treatment groups or a control group. Children in the ASTM group (n = 40) received three 30-min sessions per week over 12 weeks, whereas children in the comparison group (n = 24) received pedagogical activities during these intervals. Children in the control group (n = 37) received no treatment. Working memory, phoneme discrimination and speech perception skills were tested prior to (baseline) and after treatment. Children in the ASTM group showed significantly greater working memory capacity, speech perception, and phoneme discrimination skills after treatment, whereas children in the other groups did not show such improvement. Taken together, these results suggest that ASTM can enhance auditory cognitive performance in children with SLI.


INTRODUCTION
The National Institute on Deafness and Other Communication Disorders (NIDCD) reported that 7 to 10% of 5-year-old children are diagnosed with specific language impairment (SLI). Typically, these children have problems with specific rules of grammar, such as the misuse of verb tense (Reilly et al., 2014). There appears to be consensus that SLI originates from deficits in low-level auditory processing of both linguistic and other sound materials (Fey et al., 2011; Murphy and Schochat, 2013). This hypothesis has led to the development of so-called auditory stimulation training (AST) for children with SLI. However, identifying key elements and evaluating them across different AST programs has remained a daunting task. Here, we explore the efficiency of modulated musical sound materials as produced by a commercially available system (AUDIVA®) to stimulate auditory processing in children with SLI. Specifically, we addressed how this training would affect the cognitive processing of linguistic materials and auditory working memory in this target group.
Murphy and Schochat (2013) conducted a systematic review and found a total of 29 studies eligible for review, which were categorized in terms of three different types of AST: software-based training, formal auditory temporal training, and music training. The first type of AST, representing a majority of these studies, entailed software programs including Fast ForWord, Earobics, AudioTraining or STAR (Cognitive Concepts, 1997; Scientific Learning Corporation, 1998). Briefly, these software programs represent different types of tasks that are related to linguistic processing (e.g., listen-and-repeat tasks) and, in a few cases, cognitive tasks (e.g., working memory training). Some of these programs manipulate the rate and intensity of speech components of frequently repeated linguistic and non-linguistic stimuli and/or manipulate language components, such as vowels, consonants and syllables, to improve language processing.
Auditory and language learning gains are indicated in these studies with respect to behavioral (Tallal et al., 1996; Strehlow et al., 2006; Fisher et al., 2009; Murphy and Schochat, 2013) and electrophysiological measures (Hayes et al., 2003; Russo et al., 2005; Heim et al., 2013). These findings support the hypothesis that AST leads to significant increases in language outcomes and enhanced auditory temporal processing. However, Murphy and Schochat (2013) identified risks of bias in some of the studies, which limit the interpretation of their findings.

Auditory Stimulation Training and Language Skills
Two of the studies cited above focused on children with SLI; thus, they appear most relevant to the present endeavor (Tallal et al., 1996; Heim et al., 2013). Tallal et al. (1996) tested the assumption that acoustically modified speech with rapidly changing elements in the acoustic waveform should facilitate acoustic processing in children with SLI. The authors indeed found significantly improved speech discrimination and language comprehension abilities, although there was no control or comparison group to substantiate this interpretation. Heim et al. (2013) examined whether the temporal precision of neural coding in children with SLI might be improved via intensive, software-based audio-visual training. The learners attended to visually presented exercises that were accompanied by increasingly complex acoustic events such as frequency sweeps, phonemes, words, and sentences. After 32 days of training, the authors observed behavioral improvements in measures of language comprehension. They also found that children improved their temporal auditory processing skills via AST. However, the absence of an alternative training group compromises this interpretation. In other words, whether the observed effects were specifically related to this type of AST remains to be demonstrated.
The second type of studies according to Murphy and Schochat (2013) evaluated formal auditory temporal training in an acoustic cabin. Researchers applied similar tasks including auditory closure, temporal ordering, sentence and non-verbal sounds or figure to ground for digits tasks (Miranda et al., 2008; Gil and Iorio, 2010; Megale et al., 2010; Schochat et al., 2010; Vilela et al., 2012). However, only one study addressed 7- to 12-year-old children with language disorders; children with SLI were not targeted (Filippini et al., 2012). This study suggested improved auditory temporal skills in these children after formal auditory training, whereas no such improvements were found in either typically developed children or children with language disorders. Fey et al. (2011) explored the efficacy of auditory and language interventions for school-aged children with auditory processing disorders in an evidence-based systematic review. The authors reviewed eleven studies that examined the effects of auditory interventions on language outcomes for children with spoken language disorders. A majority of these articles evaluated the effects of Fast ForWord or Fast ForWord-like interventions. Six of eleven studies were classified as exploratory research. Most of the studies reported overall positive outcomes, while two studies found high improvements on standardized measures of receptive language (e.g., Tallal et al., 1996; Stevens et al., 2008).
Moreover, Fey et al. (2011) reported different outcomes for five efficacy studies and two language-oriented auditory interventions. Studies by Alexander and Frost (1982), Tallal et al. (1996), Gillam et al. (2001), and Cohen et al. (2005) showed benefits on standardized measures of phonological awareness, grammar comprehension and language tests. However, those benefits were only reported for a few secondary measures (Gillam et al., 2001; Cohen et al., 2005), which calls into question the efficacy of the interventions. Pokorni et al. (2004) examined different types of interventions, such as Fast ForWord, LiPS (Lindamood Bell Learning Processes, 1999) and Earobics (Cognitive Concepts, 1996). The authors did not report any group differences for phonological awareness or language outcomes, apart from the Blending Phonemes subtest, in which children who received auditory stimulation training based on a phoneme sequencing computer program (LiPS) showed greater improvements than their controls. Finally, the two language-oriented auditory interventions by Bishop et al. (2006) and Segers and Verhoeven (2004) did not show any significant gains in auditory processing or language tests or in phonological awareness tasks. In summary, Fey et al. (2011) suggested that the effects of AST for children with language disorders had not been examined adequately.
Other approaches focused on computer-based multisensory learning programs in which the memory of phonemes and graphemes should be strengthened by visual and auditory associations to improve both reading and spelling skills in children with and without dyslexia (Kujala et al., 2001; Kast et al., 2007, 2011; Ecalle et al., 2009). For example, a study by Kast et al. (2007) used a spelling software called "Dybuster" that recodes words into multisensory representations with visual and auditory codes to alleviate writing errors in children with and without dyslexia. Results from a matched control group design with pre-post measurement showed that children with dyslexia strongly improved their writing skills after 3 months of visual-auditory multimedia training compared to children with dyslexia who did not receive training. Interestingly, children with dyslexia who received the training also improved their writing performance for words that were not part of the training software, indicating a strong transfer effect of the training. Even children without impaired writing or reading skills improved their writing skills through the training (Kast et al., 2007).
Taken together, AST appears to be a promising route in the rehabilitation of children with SLI. Nevertheless, the effectiveness of relatively new approaches to AST such as the use of software tools for speech manipulation, language-oriented auditory interventions or multisensory learning programs is still an ongoing debate (see Kujala et al., 2001;Kast et al., 2007;Fey et al., 2011;Kast et al., 2011;Murphy and Schochat, 2013). Moreover, the different types of AST and the small number of studies make it difficult to determine the efficacy of a particular intervention technique and its general benefits for language skills for children with SLI.

Auditory Stimulation Training and Music
Apart from those results from AST in speech rehabilitation, the effects of music on language-related skills are difficult to compare, which complicates any evaluation of their overall efficiency. For example, Kampfe et al. (2010) examined the effects of background music on reading skills, whereas the systematic review of Murphy and Schochat (2013) mainly focused on studies using acoustically modified speech or music processing algorithms. Other approaches examined the impact of music training on language and early literacy skills. Those studies were based on different types of musical interventions, such as playing a musical instrument or singing, and included a wide scope of participants, ranging from infants and typically developed children to children with dyslexia and cochlear implants (Dege and Schwarzer, 2011; Francois et al., 2013; Fonseca-Mora et al., 2015; Gordon et al., 2015; Wiens and Gordon, 2018; Steinbrink et al., 2019). In summary, those studies provide consistent support for the inclusion of music instruction in early childhood (Gordon et al., 2015; Wiens and Gordon, 2018).
Although there is an association between musical behaviors and cognitive performance with respect to auditory processing skills in different domains, it is less clear whether and how this association extends to specific forms of music instruction and to children with language comprehension deficits (see Murphy and Schochat, 2013). Despite the absence of studies focusing on the use of auditory stimulation training with technically modulated musical material (ASTM) to improve language outcomes, the approach is based on studies that were published in the 1970s and 1990s (Tallal and Piercy, 1973; Fitch et al., 1997). For example, Tallal and Piercy (1973) proposed that SLI might be related to deficits in auditory temporal processing. Further research in the 1980s and 1990s supported this hypothesis (e.g., Tallal, 1980; Fitch et al., 1997). In particular, Habib (2000) observed difficulties in the processing of temporal auditory stimuli when the stimuli were presented in rapid succession. The author suggested that a limited capacity to process short acoustical information such as vowels and consonants can lead to difficulties in the association of letters and their specific sounds. This lack of encoding could then compromise the sensory-motor mapping of letters and their conversion to phoneme production. Moreover, Steinbrink et al. (2014) concluded from their cohort study that auditory processing abilities causally influenced individual literacy and language development. Further indirect support for the use of ASTM could be found in studies focusing on speech rhythm in specific language-impaired or dyslexic children across languages. For example, an investigation by Cumming et al. (2015) showed a significant relationship between auditory processing skills and sensitivity to rhythmic timing, which is also a key aspect of music perception and processing (see Goswami, 2010; Huss et al., 2011; Tierney and Kraus, 2013; Woodruff Carr and White-Schwoch, 2014).
Up to this point, strategies to improve auditory speech comprehension at psychoacoustic levels are considered key to facilitating language encoding (e.g., Goh and Taib, 2006). As rhythmic and tonal information is explicit in music, auditory stimulation with musical material might have beneficial effects on the process of auditory perception in children with SLI and dyslexia. One of the first studies to investigate the effects of music listening in children with dyslexia was conducted by Overy and colleagues (Overy, 2000; Overy et al., 2003). They tested 15 dyslexic and 11 typically language-developed children with a specifically designed collection of musical aptitude tests, which made it possible to distinguish among a variety of musical skills (mainly pitch and rhythm). The results showed that children with dyslexia scored higher on pitch skills and lower on rhythm (timing) skills than their counterparts. In addition, Overy et al. (2003) reported a strong correlation between rhythm tapping and spelling abilities, suggesting that the impairment of phonological abilities for children with SLI is linked to the general deficit in processing timing information. Due to the quasi-experimental design, the relatively small sample size, and the non-matched controls, the results must be treated with caution. However, a study by Flaugnacco et al. (2014) supports these findings by examining rhythm perception and production skills as predictors of reading abilities in children with dyslexia.

Rhythm in Music and Language
In the past decade, a growing number of investigations have addressed the idea that rhythmic skills in particular might be important for the development of language and literacy skills in typically developed children and children with language disorders, including SLI and dyslexia (e.g., Goswami, 2010; Huss et al., 2011; Tierney and Kraus, 2013; Woodruff Carr and White-Schwoch, 2014). Furthermore, evidence has increasingly shown that children with SLI and dyslexia show impaired temporal processing in both language and music (Corriveau et al., 2007; Brandt et al., 2012; Flaugnacco et al., 2014; Clément et al., 2015; Planchou et al., 2015). In particular, children with dyslexia show deficits in perceiving tempo information and tapping to a default rhythm (Thomson and Goswami, 2008; Corriveau and Goswami, 2009). Sallat and Jentschke (2015) examined the melodic and rhythmic-melodic perception of short musical pieces in 5-year-old children with SLI and two control groups. Controls included children with typical language development (TLD) of the same age as the intervention group and a group of younger children (4 years old) with language skills comparable to those of the children with SLI. The authors hypothesized that the processing of prosodic information involves skills similar to those required in music perception. The results showed that children in the SLI group performed equally to the younger children and poorly compared with the children with TLD. Along these lines, the authors suggested the use of musical material in therapy for children with SLI. This interpretation is consistent with previous research reporting that rhythmic auditory stimulation has the potential to boost linguistic structure processing (Hannon and Johnson, 2005; Przybylski et al., 2013; Cumming et al., 2015; Planchou et al., 2015).

Auditory Stimulation Training With Music for Children With SLI
Despite anecdotal reports of positive effects of ASTM in children with SLI, there are no systematic reports available that address these issues in studies with a controlled longitudinal experimental design. Moreover, none of the reported studies focused on the effects of ASTM on auditory working memory performance. However, working memory lends itself as a cognitive system to be affected not only by music training but also by music listening. As a mental system responsible for temporary storage and simultaneous manipulation of information, it is involved in many kinds of conscious mental processes, such as auditory processing and language comprehension (Baddeley, 2006; Franssen et al., 2006; Roden et al., 2012). Hence, an ASTM that improves the perception and processing of auditory information might also be helpful in developing efficient articulatory rehearsal strategies and positively affect auditory working memory capacity in children with SLI. Finally, none of the previous studies focused on an ASTM to improve language outcomes. Therefore, the aim of the present study was to explore the effects of ASTM in preschool children with SLI on auditory working memory, speech perception and phoneme discrimination performance. Because certain consonant sounds have primary frequencies above 3,000 Hz, the musical stimuli were manipulated as follows. First, we only used high-frequency music by including classical pieces by Mozart, Bach and Vivaldi, which are rich in overtones. Second, the music stimuli were filtered to remove the low frequencies (<1,000 Hz) and boost the high frequencies (>2,000 Hz) of the music signal. Finally, sound delivery was lateralized such that frequencies presented to the left ear at higher levels were attenuated to the right ear and vice versa.
This approach rests on previous findings showing that the collaboration of both hemispheres of the brain is essential for phoneme detection and speech comprehension: the superior posterior part of the left temporal lobe (Broca's area) and adjacent partial regions, located in the parietal and temporal lobe of the Wernicke's area, and areas of the right hemisphere, which are responsible for speech perception and the production of prosodic qualities of speech (see Ryding et al., 1996; Ross et al., 1997). To attribute changes in the dependent variables clearly to our ASTM, we designed an explorative randomized control group study with two time points of measurement. The study protocol also included screening procedures to assess language impairments and specific speech comprehension deficits.
The present study aimed at investigating these tentative findings further by testing the following hypotheses. First, we hypothesized that children in the ASTM group would show a higher increase in auditory working memory performance after 12 weeks of intervention than would children in the control groups (H1). Second, we hypothesized that children in the ASTM group would outperform controls concerning their speech perception skills at high frequencies from 4,000 to 2,000 Hz (H2). Finally, we hypothesized that children in the ASTM group would show higher phoneme discrimination skills than controls would after the end of the intervention (H3).

Participants
A total of 141 children from 10 preschools located in rural areas of the state of Lower Saxony, Germany, were first selected based on deficits, as identified by interviewing their kindergarten teachers, in any of the following domains: poor listening and memory abilities, easily distractible attention, sensitivity to noise, and language that is difficult to understand. Each child then completed a test battery to assess their understanding of basic grammar rules (TROG-D; Von Fox-Boyer, 2009, German adaptation of TROG; Bishop, 1989) and a standardized auditory screening test (HASE; Schöler and Brunner, 2009). Both tests were used to identify children with SLI. Only children with low percentile ranks on the Test of Reception of Grammar (TROG-D) and with poor performance on one or more subtests of the Auditory Screening Test (HASE), which were defined separately for monolingual and multilingual children in the manual, were further included in the study. In contrast, children with mental disorders, psychological trauma and psychoses, children who used hearing aids, children who permanently replaced the phonemes |k| and |g| with |t| or |d|, and children with an acute illness were excluded from the study. Moreover, children from bilingual families who needed to acquire two formal language systems were also excluded from the study.
Based on these inclusion and exclusion criteria, a total of N = 101 preschool children (mean age = 4.52 years; SD = 0.59; 62 males, 39 females) from seven different kindergartens were randomly assigned to the ASTM group, the pedagogical activity group, or the control group. A power analysis using G*Power (Erdfelder et al., 1996) suggested that the sample size was sufficient to detect small to medium effects (f = 0.25) in a mixed within-/between-subject design (α = 0.05, power (1 − β) = 0.80, correlation between repeated measures: r = 0.50).
Both interventions (ASTM and PA) took place in kindergartens with different room sizes. Therefore, the group sizes were unequal. Forty children (24 male, 16 female) participated in the auditory stimulation group with technically manipulated musical material (ASTM), and twenty-four children (16 male, 8 female) joined the pedagogical activities group (PA). Finally, a total of thirty-seven children (22 male, 15 female) served as waiting controls (CG). Children in the CG had the opportunity to participate in either the ASTM or the PA training after the intervention phase of the present study ended. However, no additional data were collected from these children. One-way between-groups analyses of variance (ANOVA) and chi-square (χ²) tests were conducted to assess possible baseline differences between groups. In summary, there was no difference between groups for any independent (Age, Test of Reception of Grammar; all Fs < 1.99, all ps > 0.14; Gender: χ² = 0.37; df = 2; p = 0.83) or dependent variables (Digit Span, Non-word Recall, Recall of Sentences, Phoneme Discrimination and Speech Perception at high frequencies; all Fs < 1.33, all ps > 0.27) at baseline, indicating that the randomization successfully minimized or avoided systematic bias in the selection process. The proportion of children with a migration background was 13% in each group.
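The reported gender statistic can be checked directly from the group counts given above. The following is a minimal sketch in plain Python, not the authors' analysis script; the function name is illustrative, and the closed-form p-value used here holds only for df = 2.

```python
import math

# Gender counts per group as reported in the text: (male, female).
observed = {
    "ASTM": (24, 16),
    "PA": (16, 8),
    "CG": (22, 15),
}

def chi_square_independence(table):
    """Return the Pearson chi-square statistic and df for a groups x categories table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df

chi2, df = chi_square_independence(observed)
# For df = 2 the chi-square survival function has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.2f}")
# prints: chi2 = 0.37, df = 2, p = 0.83
```

The output agrees with the values reported for gender (χ² = 0.37; df = 2; p = 0.83).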

Measurement Instruments

Test of Reception of Grammar (TROG-D)
The German adaptation of Bishop's Test of Reception of Grammar (TROG; Bishop, 1989) by Von Fox-Boyer (2009) was administered to measure the verbal comprehension of syntax. Each of the 84 test stimuli was presented in a four-image multiple-choice format with lexical and grammatical foils. Three images were slightly different in terms of the grammar and lexicon of the respective target. For each of the 84 stimuli, a sentence was read to the child. The participants were asked to select the one of four images that matched the sentence. For example, for the sentence "The cats were looking at the ball," only one of the four images matched. The other three images showed (a) two people playing with a ball, (b) two cats looking at a butterfly, and (c) one cat looking at a ball (see Supplementary Appendix 2 for the corresponding image).
Twenty-one grammatical phenomena were administered in test blocks of four spoken sentences with 16 different images. Only if all four sentences of a test block were answered correctly was the test block considered solved. The results were assessed by a total raw score of correctly solved test blocks. The TROG-D is standardized for children between 3.0 and 10.11 years of age. According to the manual, 4-year-old children with a percentile rank ≤41, 5-year-old children with a percentile rank ≤40, and 6-year-old children with a percentile rank ≤39 were categorized as children with specific language impairments (SLI). The TROG-D reports a high Cronbach's alpha (α = 0.86) and split-half reliability measures (r = 0.87), representing good internal consistency.
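The block-wise scoring rule and the age-specific cutoffs described above can be sketched as follows. This is an illustrative sketch only: the function and constant names are hypothetical, and the conversion from raw score to percentile rank relies on the norm tables of the test manual, which are not reproduced here.

```python
# Percentile-rank cutoffs by age in years, as stated in the text.
SLI_CUTOFFS = {4: 41, 5: 40, 6: 39}

def trog_raw_score(blocks):
    """Count the test blocks in which all four sentences were answered correctly.

    `blocks` is a list of 21 lists, each holding four booleans (one per sentence).
    """
    return sum(1 for block in blocks if len(block) == 4 and all(block))

def meets_sli_criterion(age_years, percentile_rank):
    """Apply the age-specific cutoff; the percentile rank comes from the manual's norms."""
    return percentile_rank <= SLI_CUTOFFS[age_years]

# Hypothetical child: 20 blocks fully correct, one block with a single error.
example_blocks = [[True] * 4 for _ in range(20)] + [[True, True, False, True]]
print(trog_raw_score(example_blocks))  # 20
```

A single error within a block therefore costs the whole block, which makes the raw score stricter than a simple count of correct sentences.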

Auditory Screening Test (HASE)
Assuming that an insufficient understanding of language may be related to limited auditory memory abilities (Vukovic and Siegel, 2006), a standardized auditory screening test (HASE; Schöler and Brunner, 2009) for preschool children was used for children who performed poorly on the TROG-D test. The screening provides information on language comprehension and production and on auditory working memory performance. Three of four subtests were used: the Digit Span Test, the Non-word Recall Test, and the Recall of Sentences Test. Each test includes standardized instructions for both administrators and participants. Cronbach's alpha for the three subtests varied between α = 0.71 and α = 0.83.

Digit span test
The reproduction of number sequences (digit span) is considered a valid indicator of phonological loop capacity (Hasselhorn et al., 2012; Roden et al., 2014). Because the number of syllables significantly influences auditory working memory performance, only monosyllabic numbers from zero to ten were used, excluding the number seven (Schöler and Brunner, 2009). During the task, several sequences of one-syllable numbers were aurally presented via headphones (2 × 2 numbers, 2 × 3 numbers, 2 × 4 numbers, etc.). The children were required to recall the numbers in the correct chronological order directly after hearing the last number of the sequence. If the recall was correct for both trials of a sequence length, that length counted as two points. If only one trial of a sequence length was correct, it counted as one point. If the recall failed for four trials in succession, the test ended. The maximum score was ten points, and the longest sequence of numbers was six.
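The scoring and stopping rules of the digit span task can be sketched as below. This reflects one reading of the description, namely one point per correctly recalled trial, two trials for each sequence length from two to six, and termination after four consecutive failures; the function name is hypothetical.

```python
def digit_span_score(trial_results, stop_after_failures=4, max_trials=10):
    """Score a digit span run from per-trial outcomes in presentation order.

    `trial_results` holds one boolean per trial (two trials per sequence
    length, lengths 2 to 6, hence at most ten trials). Each correct trial
    earns one point; testing stops after four failures in a row.
    """
    score = 0
    consecutive_failures = 0
    for correct in trial_results[:max_trials]:
        if correct:
            score += 1
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == stop_after_failures:
                break  # later trials are never presented
    return score

# Hypothetical run: lengths 2 and 3 partly solved, then four failures in a row.
print(digit_span_score([True, True, True, False, False, False, False, True, True, True]))  # 3
```

Under this reading, ten trials with one point each reproduce the stated maximum of ten points.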

Non-word recall test
The immediate recall of non-words is a measure of phonological short-term memory. In particular, the Non-word Recall Test measured how accurately a participant can store unfamiliar words through the articulatory rehearsal mechanism (Gathercole, 1996). The use of non-words reduces the potential influence of long-term memory to reconstruct the sound or phoneme structure of a perceived word. During the task, ten three- to five-syllable non-words (e.g., "lufa" or "fodekina") were aurally presented via headphones (65 dB). The non-words systematically differed from each other in terms of vowel and consonant order and vowel position. After the presentation of each non-word, the participants were asked to reproduce the word as accurately as possible. Each correctly recalled non-word counted as one point. A maximum of ten points could be reached. No adaptive test design was applied here. Therefore, the test procedure was repeated after all ten non-words had been presented and recalled.

Recall of sentences test
The Recall of Sentences Test measures the production and understanding of language expressions. It also yields a measure of the short-term processing of language. The complete test consists of ten pairs of sentences (a and b). Each pair consists of two different sentences with the same sentence structure (e.g., "The shirt gets ironed" and "The dog gets fed"). The second sentence was presented only if the recall of the first sentence was incorrect. The level of difficulty was systematically increased and varied from short sentences with simple structures (such as "Peter is running") to longer sentences with more complex syntactical structures (such as "The big lamp hangs over the table in the living room"). Each pair in which a sentence (a or b) was recalled correctly counted as one point. The maximum task score was 10 points.
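The conditional presentation rule (sentence b is given only after a failed sentence a) can be sketched as follows. The function name and the `is_recalled` callback are hypothetical stand-ins for the examiner's judgment; the two example sentences are taken from the text.

```python
def recall_of_sentences_score(pairs, is_recalled):
    """Administer sentence pairs and return (score, list of sentences presented).

    `pairs` is a list of (sentence_a, sentence_b) tuples in order of difficulty.
    `is_recalled(sentence)` returns True if the child repeated it correctly.
    Each pair yields at most one point.
    """
    score = 0
    presented = []
    for sentence_a, sentence_b in pairs:
        presented.append(sentence_a)
        if is_recalled(sentence_a):
            score += 1
            continue  # sentence b is skipped after a correct first recall
        presented.append(sentence_b)
        if is_recalled(sentence_b):
            score += 1
    return score, presented

pairs = [("The shirt gets ironed", "The dog gets fed")]
# Hypothetical child who fails the first sentence but recalls the second.
score, presented = recall_of_sentences_score(pairs, lambda s: s == "The dog gets fed")
print(score, presented)  # 1 ['The shirt gets ironed', 'The dog gets fed']
```

With ten pairs and one point per pair, the sketch reproduces the maximum score of 10 points.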

Phoneme Discrimination Test
The Phoneme Discrimination Test measures the selective discrimination of phonemes entailing similar consonants in the German language, such as b/d, d/t, g/k, and f/w. During the first task, children were asked to immediately recall 32 different two-syllable non-words (such as "AFI," "IDA," "EBU"; for a full list of stimuli, see Supplementary Appendix 3) that were presented in alternation in the left and right ears via headphones. Each correctly recalled consonant of the presented non-word counted as one point, up to a maximum of 32 points. The test procedure continued after all 32 non-words had been presented and recalled. In addition, 32 different two-syllable non-words were presented with background noise, which impedes the correct identification of consonants. To adjust the noise to the hearing threshold level, the noise was filtered across a frequency range from 100 Hz to 20 kHz. Again, children were asked to recall the non-words immediately after hearing. Correlation between the two time points for the Phoneme Discrimination Test without background noise was r = 0.42 and was r = 0.31 for the Phoneme Discrimination Test with background noise.

Auditory Stimulation Training With Technically Manipulated Musical Material (ASTM)
The preschool teachers and child care workers who conducted the ASTM were trained by research assistants. Tests at both time points were administered by research assistants only. The commercial hardware system used for the ASTM is called "AUDIVA-HWT". It comprises a Discman (AEG, CDP109), a headphone interface for five headphones (KV-2) and full-sized headphones (QP 160) with a transmission range from 30 to 26,000 Hz. The volume level at the hardware device was fixed at 65 dB (SPL). Because certain consonant sounds have primary frequencies above 3,000 Hz, the musical stimuli were manipulated as described below.
First, only high-frequency music was used, namely classical pieces by Mozart, Bach and Vivaldi, which are rich in overtones. Second, the ASTM included electronic filters that removed low frequencies (<1,000 Hz) and boosted medium and high frequencies (>2,000 Hz) of the music signal to adjust its frequency levels to the primary frequency levels of speech comprehension (see Figure 1 for a visualization of the frequency spectrum of the ASTM with unmanipulated or manipulated musical material). Finally, the medium and high frequencies were lateralized such that they were strongly condensed on the left ear while being shut down on the right ear at the same time and vice versa. Thus, a lateral temporal change in an adjustable time (2-25 s) was given (see Supplementary Data Sheets 1, 2). The ASTM could be adjusted to six levels to determine both the frequency level of the filtering signal and the duration of the lateralization process in seconds. In a standard session of 30 min, a child heard acoustically modified music over headphones in a small group of 5-6 children. During the intervention phase, the frequency level and the duration of the lateralization process both increased from level 1 to level 6 to avoid habituation effects in the hearing process. At level 1 and level 2, lower frequencies (0 to 1 kHz) were separated from the tonic tones (3,000 Hz). In fact, more overtones were presented, whereas tonic tones became softer. At levels 3-4, the overtones started to dominate the acoustic sound pattern. The SPLs of lower frequencies were minimized, whereas the frequencies over 10 kHz became more compressed. Finally, the overtones dominated the acoustic sound pattern at level 5 and level 6 (the last 2 weeks of the intervention). Here, the low frequencies were no longer audible for participants, and the compression of the high frequencies over 10 kHz reached its maximum. During the ASTM, children were allowed to pursue a silent activity, such as painting or reading picture books.
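The filtering and lateralization rules described above can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the AUDIVA implementation: it assumes a hard alternation between ears that starts on the left, and it labels the 1,000 to 2,000 Hz band as unspecified because the text does not state how it is treated. All function names are hypothetical.

```python
def band_treatment(frequency_hz):
    """Classify a frequency by the filter rule given in the text."""
    if frequency_hz < 1000:
        return "removed"
    if frequency_hz > 2000:
        return "boosted"
    return "unspecified"  # assumption: the text gives no rule for 1,000-2,000 Hz

def boosted_ear(time_s, switch_interval_s):
    """Return the ear receiving the medium and high frequencies at `time_s`.

    Assumes a hard left/right alternation starting on the left; the text
    only states that the switching time is adjustable between 2 and 25 s.
    """
    if not 2 <= switch_interval_s <= 25:
        raise ValueError("switch interval must lie between 2 and 25 s")
    return "left" if int(time_s // switch_interval_s) % 2 == 0 else "right"

print(band_treatment(500), band_treatment(3000))    # removed boosted
print(boosted_ear(0, 5), boosted_ear(7, 5))         # left right
```

A gradual crossfade between ears would be equally compatible with the description; the hard switch is chosen here only for brevity.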

Pedagogical Activity Group and Controls
Children in the comparison group participated in a pedagogical activity program. Training lessons were organized in different group sizes, with a minimum of five and a maximum of ten children, depending on preschool capacities. The focus of the different activities was the preparation of primary school skills, such as visual differentiation and sequencing, set theory, the recognition of rhyming words and phoneme identification.
Children in the control group did not receive additional training at preschool. Nevertheless, after the intervention phase, those children had the choice to participate in the ASTM. No children received additional speech therapy before or during the intervention.

Design
FIGURE 1 | Visualization of the frequency spectrum of the ASTM. Condition (A) shows the frequency spectrum of the ASTM with unmanipulated music. The dark area represents the sound volume of the orchestra; the brighter areas are the solo instruments. The horizontal stripes are the natural overtones of the violins. Condition (B) shows the frequency spectrum with manipulated music, where high and medium frequencies (>2,000 Hz) are compressed at intervals. Moreover, the medium and high frequencies were lateralized such that they were strongly concentrated on the left ear while being switched off on the right ear, and vice versa.

Children in the ASTM group received three 30-min sessions per week over a period of 12 weeks. Children in the PA group received their program at comparable intervals. Moreover, the observation period between the two measurement time points for children in the PA and CG groups corresponded to that of the ASTM group. Sessions were organized in small groups (fewer than 10 children per session) in different rooms. Moreover, child care workers were instructed to keep a list recording the rate and duration of each intervention session to ensure that the exposure to either intervention was constant across groups. Data from children whose attendance was less than 80% were excluded from the analyses. Dependent variables were assessed at two different time points: before and after the intervention phase (pre-post design).
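As a worked illustration of the 80% attendance criterion: three sessions per week over 12 weeks gives 36 scheduled sessions, so a child would need to attend at least 29 of them. The sketch below assumes attendance is counted in whole sessions; the function names and the child identifiers are hypothetical.

```python
SESSIONS_PER_WEEK = 3
WEEKS = 12
TOTAL_SESSIONS = SESSIONS_PER_WEEK * WEEKS  # 36 scheduled sessions


def meets_attendance(attended: int, total: int = TOTAL_SESSIONS, percent: int = 80) -> bool:
    """True if attendance is at least `percent` of scheduled sessions.

    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    return attended * 100 >= total * percent


def retain_for_analysis(attendance: dict[str, int]) -> list[str]:
    """Return the (hypothetical) child IDs that satisfy the criterion."""
    return [child for child, n in attendance.items() if meets_attendance(n)]
```

For instance, `retain_for_analysis({"c01": 36, "c02": 29, "c03": 28})` keeps `c01` and `c02` and drops `c03`.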
The study design was approved by the institutional review board of the Carl von Ossietzky University of Oldenburg in Germany. Additional written informed consent for participation was obtained from preschool administration, parents and children included in the study.

Procedure
To ensure high ecological validity, children were tested in groups in their preschool rooms during morning classes. Child care workers and preschool teachers conducted the interventions, whereas qualified examiners administered the test battery for data collection. To avoid teaching bias from care workers and preschool teachers and to minimize variability beyond the main experimental manipulation, each child care worker and preschool teacher conducted both types of intervention (ASTM and pedagogical training). The order of the instruments used at baseline was as follows: First, the language comprehension and working memory capacity of each child were measured via the TROG-D and the HASE. Second, further tests of Speech Perception and Phoneme Discrimination skills were administered to a smaller number of children to ensure that the research protocol would not exceed 45 min.
Twelve weeks later (Post-Test), each child in the study cohort repeated the HASE and the tests of their high frequency audibility on Speech Perception and Phoneme Discrimination (with and without background noise). To avoid novelty effects, the order of the HASE, the Phoneme Discrimination Test and the Speech Perception Test was the same for each child at both time points.

Statistical Analyses
Repeated-measures analyses of variance (ANOVA) were conducted for the dependent measures of auditory working memory (Digit Span Test, Non-word Recall Test), the short-term processing of language (Recall of Sentences Test), Phoneme Discrimination (with and without noise) and the Speech Perception variables (4,000 Hz to 2,000 Hz) at two time points. The experimental design was a 3 × 2 mixed model, with Group (ASTM, PA and CG) as the between-subject factor and Time (pre-test (T1) vs. post-test (T2)) as the within-subject factor.
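With only two time points, the Group × Time interaction of such a mixed design is statistically equivalent to a one-way ANOVA on the gain scores (post-test minus pre-test). The following minimal sketch uses that equivalence; the function name and the toy scores are illustrative assumptions and are not data from the study.

```python
from statistics import mean


def interaction_f(groups: dict[str, list[tuple[float, float]]]) -> tuple[float, int, int]:
    """F statistic for Group x Time via one-way ANOVA on gain scores.

    `groups` maps a group label to a list of (pre, post) score pairs.
    Returns (F, df_between, df_within).
    """
    gains = {g: [post - pre for pre, post in pairs] for g, pairs in groups.items()}
    all_gains = [x for values in gains.values() for x in values]
    grand_mean = mean(all_gains)

    # Between-group and within-group sums of squares of the gain scores
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in gains.values())
    ss_within = sum((x - mean(v)) ** 2 for v in gains.values() for x in v)

    df_between = len(gains) - 1
    df_within = len(all_gains) - len(gains)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy data only (three children per group), not taken from the study
toy = {
    "ASTM": [(5, 8), (6, 10), (4, 9)],
    "PA": [(5, 6), (6, 8), (4, 7)],
    "CG": [(5, 5), (6, 7), (4, 6)],
}
f_value, df1, df2 = interaction_f(toy)
```

A significant F here indicates that the groups changed by different amounts between T1 and T2, which is the effect of interest in a pre-post design with random allocation.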
Preconditions for the ANOVAs with repeated measurement (normality, Box's M test of equality of covariance matrices and Levene's test of equality of variance) were tested for all dependent variables separately and were met in all cases. Moreover, post hoc comparisons of means were Bonferroni-adjusted for multiple testing. Table 1 reports the mean values and standard deviations for the independent and dependent variables for both the experimental and control groups.
Subsequent comparisons of means for within-subject effects showed that children in the ASTM group (t(39) = 7.92, p < 0.001, d = 0.98) and in the PA group (t(23) = 2.51, p = 0.02, d = 0.21) significantly improved their performance on the Recall of Sentences Test from T1 to T2. However, the observed effect size was much higher for the children in the ASTM group than for the children in the PA group. No such increase was found for the children in the CG (t(36) = 1.23, p = 0.23).
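The reported degrees of freedom (39, 23 and 36) equal n − 1 for the three group sizes (40, 24 and 37), which is what a paired-samples t-test on pre-post differences would give. The sketch below shows that computation under this assumption; the function name and the scores are illustrative, and the effect size d reported above may have been derived by a different formula.

```python
import math
from statistics import mean, stdev


def paired_t(pre: list[float], post: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom (n - 1)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of differences
    return mean(diffs) / standard_error, n - 1


# Toy scores only, not taken from the study
t_value, df = paired_t(pre=[4, 5, 6, 5, 5], post=[5, 7, 9, 7, 7])
```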
Finally, correlation analyses showed a positive relationship among the auditory working memory, the Phoneme Discrimination and the Speech Perception tasks (see Table 2).
However, calculating numerous correlations increases the risk of a type I error. To avoid this and to keep the overall α-level at 5%, we used the approach of Šidák (1967), in which the adjusted α-level can be calculated exactly: $\alpha_{\mathrm{adj}} = 1 - (1 - \alpha_{\mathrm{overall}})^{1/k}$, where k represents the number of correlation coefficients that were calculated from the data. Thus, the adjusted α-level for the multiple correlations reported in Table 2 was 0.002. In particular, Pearson's product-moment correlation coefficient revealed a moderate correlation between the Non-word Recall Test and the speech perception outcomes for all groups at T2 (4,000 Hz: r = 0.39; 3,000 Hz: r = 0.53; 2,000 Hz: r = 0.48, all p-values <0.002). ANCOVAs with repeated measurement for the Speech Perception outcomes, with the means of the Non-word Recall Test as covariate, were then conducted. The interaction effect remained significant at 3,000 Hz (F(2, 61) = 3.62, p = 0.03, $\eta_p^2$ = 0.11), whereas the interaction effect at 4,000 Hz and 2,000 Hz was no longer significant (4,000 Hz: F(2, 61) = 1.77, p = 0.18; 2,000 Hz: F(2, 61) = 2.45, p = 0.09). Complete correlation tables for each group are attached in Supplementary Appendix 1.

TABLE 2 | (Table body not recovered; surviving entry: Phoneme discrimination (PD) 0.55**.) PD n = Phoneme Discrimination with background noise. Because calculating numerous correlations carries the risk of a type I error, the significance threshold for all correlations was adjusted to p < 0.002 (see Cupples et al., 1984). **The correlation is significant at the level of 0.002.
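The Šidák adjustment can be checked numerically with a short sketch. The value of k is not stated in the text; k = 28 is used below purely as an assumption (the number of pairwise correlations among eight measures), and the function name is illustrative.

```python
def sidak_alpha(overall_alpha: float, k: int) -> float:
    """Per-test alpha that keeps the family-wise error rate at overall_alpha
    across k independent tests: 1 - (1 - overall_alpha) ** (1 / k)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1 - (1 - overall_alpha) ** (1 / k)


# Assumed k = 28 (hypothetical); rounds to the reported threshold of 0.002
adjusted = sidak_alpha(0.05, 28)
```

For comparison, the simpler Bonferroni bound 0.05 / 28 is slightly smaller than the Šidák value, which is why the Šidák approach is described as exact rather than conservative under independence.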

DISCUSSION
In the current study, we investigated whether AST with musical material influences auditory working memory performance, language processing, phoneme discrimination and high-frequency hearing abilities in preschool children with SLI. We hypothesized that children in the ASTM group would outperform controls in each of these tasks. A randomized controlled trial with pre-post measurements over a 12-week intervention period was designed to test these assumptions. We will discuss the findings for each of the three dependent measures in turn.

ASTM, Working Memory Capacity and Short-Term Processing of Language (H1)
First, we found that children in the ASTM group scored significantly higher on auditory working memory and short-term processing of language measures after the intervention than the controls. Specifically, children in the ASTM group showed significant increases in their scores on the Digit Span, the Non-word Recall, and the Recall of Sentences tests, whereas the comparison and control groups showed no such improvement (except the PA group for the Recall of Sentences). The percentage increase for the ASTM group was 26% for the Recall of Sentences Test, 27% for the Digit Span Test and 42% for the Non-word Recall Test. These findings suggest that ASTM enhances phonological working memory capacity. Moreover, our results are consistent with previous studies showing that typically developing children benefit from music training in terms of their auditory working memory abilities (Lee et al., 2007; Roden et al., 2012). They extend previous findings by suggesting that ASTM might also develop the phonological short-term memory and the phonological coding strategies for written words of children with SLI. Given the high relevance of phonological working memory for the development of both language and literacy, the observed results are quite remarkable, especially when we consider that deficits in phonological working memory usually persist in children with delayed language development even after successful speech therapy (Henry, 2012). They are also notable in light of previous research suggesting that children with dyslexia, in contrast to typically developing children, often rely on visual instead of phonological coding strategies for the mediation of words in working memory (Kast et al., 2011).

ASTM and Phoneme Discrimination (H2)
Second, children in the ASTM group also benefited in terms of phoneme discrimination abilities. Whereas there were no significant differences between groups at baseline, children in the ASTM group outperformed their peers in the control groups at the end of the intervention. In addition, children in the ASTM group increased their performance significantly over time, whereas controls did not. One reason for these findings might be the close relationship between phoneme discrimination abilities and formant detection on the one hand and the similarity of formants (in speech) and resonance (in music) on the other hand. This interpretation is consistent with previous research suggesting a link between sound discrimination deficits in children with SLI and the inability to perceive formant transitions (Overy et al., 2003; Filippini et al., 2012; Heim et al., 2013).

ASTM and Speech Perception (H3)
Finally, the data from the present study suggest an advantageous effect of AST with musical material on speech perception performance at high frequencies (4,000 Hz to 2,000 Hz), where children in the ASTM group outperformed controls. Moreover, children in the ASTM group showed a significantly higher increase from T1 to T2 in their speech perception performance at 4,000 Hz, 3,000 Hz and 2,000 Hz, whereas children in the PA group improved their scores only at 3,000 and 2,000 Hz. In contrast, children in the control group did not improve their performance over time. Several aspects of the ASTM might have caused those results. First, the intervention focused on training the perception of sound presented at high frequencies above 2,000 Hz, which are essential for speech perception. To this end, original music pieces were filtered to remove frequencies below a) 4,000 Hz, b) 3,000 Hz, and c) 2,000 Hz. Hence, the ASTM might have improved the perception of higher frequencies for the children in the intervention group. Second, the sound delivery was lateralized such that frequencies presented to the left ear at higher levels were attenuated in the right ear and vice versa. This processing might lead to an improved coordination of the cerebral hemispheres, which is crucial for speech processing. Furthermore, the lateralization training might have increased the precision of the processing time for sound and phoneme discrimination (Ryding et al., 1996; Ross et al., 1997).
We further explored whether the improved Speech Perception performance led to higher auditory working memory performance. Therefore, we investigated the relationship between phonological short-term memory and the Speech Perception outcomes for all groups. Pearson correlations revealed highly significant correlations of moderate size between Speech Perception at 4,000, 3,000, and 2,000 Hz and the Non-word Recall Test at T2. Finally, ANCOVAs with repeated measurement for the Speech Perception outcomes (at 4,000 Hz and 3,000 Hz), with the means of the Non-word Recall Test as covariate, were conducted. Interestingly, the results differed across frequencies: whereas the interaction term remained significant in the analysis for the Speech Perception Test at 3,000 Hz, the interaction effects at 4,000 Hz and 2,000 Hz failed to reach significance. Hence, further research will be necessary to clarify which training aspects of Speech Perception might affect phonological short-term memory. Moreover, the highly significant relationship between phoneme discrimination abilities and the perception of speech (all r's > 0.41, all p-values <0.001) might be explained by the fact that the detection of single formants and phonemes takes place between 4,000 and 3,000 Hz.
Considering the significant interaction effects with high effect sizes for all dependent variables (except for Speech Perception at 2,000 Hz) and the fact that the three groups did not differ at pre-test on any of the eight dependent measures, our results provide some support for a transfer effect of ASTM to more specific cognitive domains, including components of auditory working memory, short-term processing of language, phoneme discrimination and speech perception abilities. Moreover, this pattern of findings suggests that the randomly assigned groups of children were well balanced at the beginning of the intervention phase and that the superior performance of the ASTM group on the dependent variables at post-test can be attributed to the effects of the ASTM. To our knowledge, this study is the first to show the benefits of an ASTM in preschool children based on a randomized control group design, producing more persuasive evidence than prior quasi-experimental approaches.
Moreover, it supports previous findings from electrophysiological and behavioral research suggesting that the training of non-linguistic auditory and audiovisual stimuli might cause changes in the neural substrate of sound discrimination in children with dyslexia (Kujala et al., 2001; Kast et al., 2007, 2011).

LIMITATIONS AND CONCLUSION
The current study must be interpreted with some caveats in mind. First, one might argue that the reported benefits for children in the ASTM group over 12 weeks of intervention might reflect regression to the mean or greater statistical power for the ASTM group. However, because the current study is based on a randomized control group design and differences between groups in dependent measures at baseline were not significant, this issue appears negligible. Second, the ASTM includes several aspects that might have affected the observed variables, such as the training of the perception of high frequencies or the lateralization training of hemispheric coordination. To clarify which aspects of the ASTM might have caused the observed effects, further investigations are necessary that focus, for example, on neuropsychological mechanisms such as mismatch negativity or that use other neuroimaging procedures. Moreover, the improvement of language-related outcomes after ASTM might have been related to indirectly trained skills, such as cognitive skills, rather than auditory temporal processing (see Gaab et al., 2007).
Third, compared to individual speech therapies for children with SLI, in which the auditory and phonological deficits persist despite successful treatment (Henry, 2012), the ASTM appeared to be very efficient. However, whether the observed effects are sustainable requires further investigation with additional follow-up tests that include retention intervals after the ASTM intervention. Fourth, some might argue that children in the ASTM group were allowed to paint and read during the intervention, which might substantially alter the amount of attention the children devoted to listening to music. Nevertheless, children in the ASTM group showed significant correlations for all the dependent pre-post-test measures, which provides additional evidence that the advantage for the ASTM group children relies on the specific experience of listening to music. Fifth, group size differed between the interventions, ranging from 5-6 participants in the ASTM group to 5-10 participants in the PA group. However, group size for both intervention groups was relatively small, and none of the participants received individual training lessons. Thus, it seems unlikely that the different group sizes of the ASTM and PA groups would fully account for our results. Given the different sizes of the three groups (ASTM: n = 40; PA: n = 24; CG: n = 37), however, a balanced design would lead to a more powerful analysis by decreasing the type II error and strengthening its robustness to violations of assumptions (Milliken and Johnson, 1984). Finally, further studies might control for any differences in the musical aptitude of the participants to clarify additional influences on the observed effects. Moreover, future studies should consider an additional control group in which the unfiltered musical material is played, which could further clarify whether the observed effects are based on the ASTM or on music listening.
In the current study, we refrained from including such a comparison group for two main reasons, one theoretical and one pragmatic. First, the literature does not provide sufficient evidence to hypothesize that music listening per se could have positive effects in the rehabilitation of children with SLI. Second, at the time of the intervention, access to children who were eligible to participate in this study was limited. Therefore, we decided to run an alternative (pedagogical) intervention group rather than a music listening group.
In conclusion, the present study provides preliminary evidence that AST with musical material might positively affect auditory working memory capacities, phoneme discrimination performance and speech processing in children with SLI. Our study further strengthens previous results by confirming them in a randomized pre-post study design over 12 weeks of intervention. Although the mechanism that drives these effects remains unclear, the findings are consistent with previous approaches focusing on the relevance of auditory processing disorders in children with SLI. However, further studies should include a group of children who listen to music that was not acoustically modified to assess whether the reported effects are due to the acoustic modifications. Nevertheless, research over the last decade on the effects of listening to music on cognitive abilities reports effects that are either null or small, suggesting that music listening might have a much smaller potential than AST with technically manipulated musical material (see Kampfe et al., 2010).
In summary, our findings indicate an important first step toward showing that auditory cognitive processing and working memory affect language functioning and promote language development in children with SLI. Therefore, we suggest that ASTM could be used as a supplement to speech therapy in language development disorders or in educational institutions for language promotion.

ETHICS STATEMENT
The study was approved by the ethics committee of the Carl von Ossietzky University of Oldenburg, Germany.