Does musical enrichment enhance the neural coding of syllables? Neuroscientific interventions and the importance of behavioral data

A commentary on "Music enrichment programs improve the neural encoding of speech in at-risk children" by Kraus, N., Slater, J., Thompson, E. C., Hornickel, J., Strait, D. L., Nicol, T., et al. (2014). J. Neurosci. 34, 11913–11918. doi: 10.1523/JNEUROSCI.1881-14.2014.

Speech perception problems lead to many different forms of communication difficulties, and remediation for these problems remains of critical interest. A recent study by Kraus et al. (2014b), published in the Journal of Neuroscience, used a randomized controlled trial (RCT) approach to identify how low-intensity, community-based musical enrichment for "at-risk children" improved neural discrimination of "ba/ga" syllables. In the study, forty-four children aged six to nine years from "gang reduction zones" received 2 hours of musical training each week, arranged in two 1-hour sessions. A control group received a single year of training following a one-year delay, whilst the experimental group received two full years of training without delay. The authors found that auditory brainstem responses (ABRs) to the "ba/ga" syllables were changed in the experimental group, but only after more than one year of training. ABRs were not changed in the control group, either following the delay or after the first full year of training. We endorse the use of an RCT to evaluate this educational programme, but argue that several additional criteria must be met before firm conclusions can be drawn about the benefits of the intervention.
Kraus et al. argue that their results provide evidence that "community music programs may stave off certain language-based challenges" (Kraus et al., 2014b, p. 11915), but this claim is hard to sustain without behavioral data (e.g., of concomitant improvements in speech perception or literacy). For the current paper, it would be necessary to show group differences in behavior that relate to the educational program, and to explore how individual differences in neural and behavioral profiles vary with the speech and literacy measures. This is particularly important given that a meaningful musicianship advantage in speech perception can be hard to demonstrate: the size of the advantage shown for musicians (compared to non-musicians) is small (<1 dB) (Parbery-Clark et al., 2009) and has not been consistently replicated (Fuller et al., 2014; Ruggles et al., 2014). We also note that a more recent follow-up study (Kraus et al., 2014a) shows no improvement in literacy skills associated with active musical engagement.
There are other important issues: for example, Kraus et al. presented a single pair of synthesized "ba" and "ga" syllables 6000 times, at a rate of 4.35 repetitions per second, to each participant. Natural human speech never takes this form: speech tokens are never identical, and repetition itself is normally avoided, as it is low in informational value (change, not repetition, conveys information) and leads to illusory percepts (cf. the verbal transformation effect, Pitt and Shoaf, 2002).
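To give a sense of scale for the presentation protocol described above, the following minimal sketch works through the arithmetic. It uses only the two figures quoted in the text (6000 repetitions, 4.35 repetitions per second); whether the 6000 repetitions apply to each syllable separately is not stated here, so the result is best read as the duration of one block of 6000 back-to-back presentations. The function and variable names are illustrative.

```python
# Figures quoted in the commentary's description of the stimulus protocol.
REPETITIONS = 6000        # presentations in one block
RATE_PER_SECOND = 4.35    # presentation rate (repetitions per second)


def presentation_minutes(repetitions: int, rate_per_second: float) -> float:
    """Total time in minutes for back-to-back repetitions at a fixed rate."""
    return repetitions / rate_per_second / 60.0


# Time from one stimulus onset to the next, in milliseconds.
onset_interval_ms = 1000.0 / RATE_PER_SECOND
minutes = presentation_minutes(REPETITIONS, RATE_PER_SECOND)

print(f"{onset_interval_ms:.0f} ms between onsets; "
      f"{minutes:.1f} min per block of {REPETITIONS} repetitions")
```

Under these figures, an identical token recurs roughly every 230 ms for about 23 minutes, which illustrates how far the protocol departs from natural speech.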
In addition, these items were synthesized speech tokens in which a single acoustic cue (the trajectory of the second formant, F2) was manipulated. Notably, the frequency region where the F2s of "ba" and "ga" differ most (900-2480 Hz) is not fully investigated, as the cross-phaseogram measurements are restricted to 900-1500 Hz, due to a lack of phase locking above 1500 Hz (Aiken and Picton, 2008). This frequency "window" restricts the analysis to a range where the whole F2 sweep for "ba" is included, but most of that for "ga" is excluded (see Figure in Supplementary Materials, Hornickel et al., 2009). This suggests that the response is not specifically discriminant per se, and may instead reflect detection of the presence of "ba" stimuli. A contrast of "ba" with "da," which has a lower F2 sweep than "ga," would be one way to address this. To further develop our understanding of these ABR effects, it is also essential to understand how the measurements used in this study relate to the auditory brainstem and cortical measures used in other investigations of the effects of musical training. Table 1 shows a summary of the ABR papers on musical training in children, which illustrates the wide variety of measures used and their significance across studies.

Table 1

Musician criteria: Currently undergoing private instrumental training, began musical training by age 5 and had practiced ≥20 min at least 5 days weekly for last 4 years.
Measure: cABR peak timing
Results: First peak at the start of the formant transition (43 ms) faster in quiet and in noise. No significant differences between onset peak (9 ms) or steady-state vowel peak (63 ms) in quiet or noise. Less of a quiet-noise timing shift in formant peak (43 ms), but not in onset (9 ms) or steady-state (63 ms).

Musician criteria: Currently undergoing private or group music training for minimum of 12 consecutive months before the study. Attending weekly classes and used materials to practice 4 times a week at home.
Measure: cABR peak timing
Results: Onset peak (9 ms) and formant transition (43 ms) faster in quiet and in noise. No significant difference in steady-state peak (63 ms) in quiet or noise. Less of a quiet-to-noise timing shift for formant transition peak (43 ms), but no significant differences in quiet-noise timing shifts for onset (9 ms) or steady-state vowel (63 ms) peaks.
Measure: cABR peak amplitude
Results: No absolute amplitude differences in quiet or noise conditions, nor a difference in quiet-noise amplitude reductions for onset (9 ms) …

Bold indicates a significant difference between the musicians and non-musicians/control group in at least one measure. *Indicates the same ABR measurement used in Kraus et al. (2014b).
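The argument about the 900-1500 Hz analysis window can be made concrete with a rough calculation of how much of each syllable's F2 sweep falls inside that window. The sketch below is illustrative only. The onset frequencies (900 Hz for "ba", 2480 Hz for "ga") and the window come from the text above, but the common F2 endpoint of 1240 Hz is an assumption that is not stated in this commentary and should be checked against the stimulus description in Hornickel et al. (2009). Coverage is measured as a simple proportion of the swept frequency range, which is also an assumption.

```python
def fraction_in_window(f_start: float, f_end: float,
                       win_lo: float, win_hi: float) -> float:
    """Proportion of the swept frequency range [f_start, f_end] inside the window."""
    lo, hi = sorted((f_start, f_end))
    overlap = max(0.0, min(hi, win_hi) - max(lo, win_lo))
    return overlap / (hi - lo)


# Analysis window quoted in the commentary (Hz).
WINDOW = (900.0, 1500.0)

# ASSUMPTION: common F2 endpoint of the formant transition (Hz);
# not given in the commentary, to be verified against the original stimuli.
F2_STEADY = 1240.0

# F2 onset frequencies implied by the 900-2480 Hz range quoted in the text.
F2_ONSET = {"ba": 900.0, "ga": 2480.0}

coverage = {
    syllable: fraction_in_window(onset, F2_STEADY, *WINDOW)
    for syllable, onset in F2_ONSET.items()
}

for syllable, frac in coverage.items():
    print(f'"{syllable}": {frac:.0%} of the F2 sweep lies within '
          f"{WINDOW[0]:.0f}-{WINDOW[1]:.0f} Hz")
```

Under these assumptions the whole "ba" sweep lies inside the window, whereas only about a fifth of the "ga" sweep does, which is consistent with the concern that the measure may track the presence of "ba" rather than the contrast between the two syllables.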
RCTs involve certain design features, which Kraus et al. do not always fully exploit: for example, the difference in size between the control (n = 18) and the experimental (n = 26) groups is unexplained, and may require a different statistical approach (Keselman and Keselman, 1990). The lack of an active control group prevents us from understanding whether the reported neural changes could be induced by an alternative enrichment activity (which is acknowledged by the authors), or whether a more focused language or literacy intervention would have yielded more effective results. It is also important to stress that while the paper makes specific claims about treatment effects for "impoverished brains" (e.g., individuals from low socio-economic backgrounds), no direct evidence of this impoverishment is provided, nor is there evidence that the effects on "impoverished" brains are any different to the effects on non-impoverished brains, which could have been tested by including another control group. RCT methodology requires reporting the system used to generate the random allocation sequence, as well as participant drop-out rates, means, SDs, effect sizes and associated confidence intervals. Although an important first step, this paper falls some way short of suggested recommendations for the reporting of RCTs (Schulz et al., 2010).
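As an illustration of the kind of reporting discussed above, the following minimal sketch computes a between-group comparison that does not assume equal variances (Welch's t-test) together with a bias-corrected effect size (Hedges' g) and an approximate 95% confidence interval. This is one common option for groups of unequal size, not necessarily the approach recommended by Keselman and Keselman (1990), and it ignores the repeated-measures structure of the original design. The scores are synthetic and generated purely for demonstration; only the group sizes (26 and 18) come from the text.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def hedges_g(a, b):
    """Hedges' g with an approximate (normal-theory) 95% confidence interval."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    d = (mean(a) - mean(b)) / math.sqrt(pooled_var)
    g = d * (1.0 - 3.0 / (4.0 * (na + nb) - 9.0))  # small-sample correction
    se = math.sqrt((na + nb) / (na * nb) + g ** 2 / (2.0 * (na + nb)))
    return g, (g - 1.96 * se, g + 1.96 * se)


# SYNTHETIC change scores, for illustration only (not data from any study).
rng = random.Random(0)
experimental = [rng.gauss(0.5, 1.0) for _ in range(26)]  # n = 26
control = [rng.gauss(0.0, 1.5) for _ in range(18)]       # n = 18

t_stat, dof = welch_t(experimental, control)
g, (ci_lo, ci_hi) = hedges_g(experimental, control)
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
print(f"Hedges' g = {g:.2f}, 95% CI [{ci_lo:.2f}, {ci_hi:.2f}]")
```

Reporting the effect size and its interval alongside the test statistic lets readers judge the magnitude and precision of a group difference, rather than only whether it crossed a significance threshold.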
To conclude, we have critiqued a recent high-impact intervention study examining the effect of musical training on neural responses. Ineffective interventions provide false hope and waste financial resources (Strong et al., 2011), and intervention programmes therefore need to be evaluated rigorously. It is admirable to investigate the potential of community-based musical training to improve neural coding of speech, but we argue that a stronger standard of evidence is required before concluding that musical enrichment enhances speech, language and literacy skills.