<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2022.801263</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Music Perception Abilities and Ambiguous Word Learning: Is There Cross-Domain Transfer in Nonmusicians?</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Smit</surname>
<given-names>Eline A.</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
<xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1004957/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Milne</surname>
<given-names>Andrew J.</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/669709/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Escudero</surname>
<given-names>Paola</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/53507/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>The MARCS Institute for Brain, Behaviour and Development, Western Sydney University</institution>, <addr-line>Sydney, NSW</addr-line>, <country>Australia</country></aff>
<aff id="aff2"><sup>2</sup><institution>ARC Centre of Excellence for the Dynamics of Language</institution>, <addr-line>Canberra, ACT</addr-line>, <country>Australia</country></aff>
<author-notes>
<fn id="fn0001" fn-type="edited-by"><p>Edited by: Caicai Zhang, The Hong Kong Polytechnic University, Hong Kong SAR, China</p></fn>
<fn id="fn0002" fn-type="edited-by"><p>Reviewed by: Francesca Talamini, University of Padua, Italy; Mireille Besson, UMR7291 Laboratoire de Neurosciences Cognitives (LNC), France</p></fn>
<corresp id="c001">&#x002A;Correspondence: Eline A. Smit, <email>e.smit@westernsydney.edu.au</email></corresp>
<fn id="fn0003" fn-type="other"><p>This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>28</day>
<month>02</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>801263</elocation-id>
<history>
<date date-type="received">
<day>25</day>
<month>10</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>02</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2022 Smit, Milne and Escudero.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Smit, Milne and Escudero</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Perception of music and speech is based on similar auditory skills, and it is often suggested that those with enhanced music perception skills may perceive and learn novel words more easily. The current study tested whether music perception abilities are associated with novel word learning in an ambiguous learning scenario. Using a cross-situational word learning (CSWL) task, nonmusician adults were exposed to word-object pairings between eight novel words and visual referents. Novel words were either non-minimal pairs differing in all sounds or minimal pairs differing in their initial consonant or vowel. In order to be successful in this task, learners need to be able to correctly encode the phonological details of the novel words and have sufficient auditory working memory to remember the correct word-object pairings. Using the Mistuning Perception Test (MPT) and the Melodic Discrimination Test (MDT), we measured learners&#x2019; pitch perception and auditory working memory. We predicted that those with higher MPT and MDT values would perform better in the CSWL task and in particular for novel words with high phonological overlap (i.e., minimal pairs). We found that higher musical perception skills led to higher accuracy for non-minimal pairs and minimal pairs differing in their initial consonant. Interestingly, this was not the case for vowel minimal pairs. We discuss the results in relation to theories of second language word learning such as the Second Language Perception model (L2LP).</p>
</abstract>
<kwd-group>
<kwd>music perception</kwd>
<kwd>pitch</kwd>
<kwd>phonological processing</kwd>
<kwd>cross-situational word learning</kwd>
<kwd>auditory perception</kwd>
</kwd-group>
<counts>
<fig-count count="4"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="144"/>
<page-count count="14"/>
<word-count count="12172"/>
</counts>
</article-meta>
</front>
<body>
<sec id="sec1" sec-type="intro">
<title>Introduction</title>
<p>Music and language are universal to humans (<xref ref-type="bibr" rid="ref95">Patel, 2003</xref>) and the connection between the two has been an object of research for centuries, with early ideas even suggesting that music is a spin-off of language in evolution (<xref ref-type="bibr" rid="ref102">Pinker, 1997</xref>). While the precise origins of music and language remain unclear, there are many parallels that can be drawn between the two. Both use a rule-based hierarchical structure organized into discrete elements and sequences (<xref ref-type="bibr" rid="ref126">Tervaniemi et al., 1999</xref>; <xref ref-type="bibr" rid="ref125">Tervaniemi, 2001</xref>; <xref ref-type="bibr" rid="ref95">Patel, 2003</xref>; <xref ref-type="bibr" rid="ref26">Deg&#x00E9; and Schwarzer, 2011</xref>; <xref ref-type="bibr" rid="ref16">Burnham et al., 2015</xref>), such as syllables, words, and sentences for language and single notes, intervals, chords, and musical phrases for music (<xref ref-type="bibr" rid="ref90">Ong et al., 2016</xref>). When focusing on the acoustic characteristics of music and speech sounds, similarities can be found in the reliance on segments of rhythm and harmony alternated with silence, pitch, acoustic envelope, duration, and fundamental frequency (<xref ref-type="bibr" rid="ref130">Varnet et al., 2015</xref>). In order to understand music and speech, a listener needs to categorize sounds into meaningful units. For speech, perceptual skills are needed to distinguish sounds into separate vowels or consonants and for music into pitches (<xref ref-type="bibr" rid="ref52">Hallam, 2017</xref>). 
The auditory skills needed to process language are similar to those needed to discriminate between rhythms (<xref ref-type="bibr" rid="ref69">Lamb and Gregory, 1993</xref>), harmonies, and melodies (<xref ref-type="bibr" rid="ref3">Barwick et al., 1989</xref>; <xref ref-type="bibr" rid="ref69">Lamb and Gregory, 1993</xref>; <xref ref-type="bibr" rid="ref2">Anvari et al., 2002</xref>). Numerous studies support the overlap of auditory processes involved in music and speech perception (<xref ref-type="bibr" rid="ref93">Overy, 2003</xref>; <xref ref-type="bibr" rid="ref124">Tallal and Gaab, 2006</xref>; <xref ref-type="bibr" rid="ref96">Patel and Iversen, 2007</xref>; <xref ref-type="bibr" rid="ref108">Sammler et al., 2007</xref>; <xref ref-type="bibr" rid="ref135">Wong and Perrachione, 2007</xref>; <xref ref-type="bibr" rid="ref17">Chandrasekaran et al., 2009</xref>; <xref ref-type="bibr" rid="ref61">Kraus and Chandrasekaran, 2010</xref>; <xref ref-type="bibr" rid="ref5">Besson et al., 2011</xref>; <xref ref-type="bibr" rid="ref106">Rogalsky et al., 2011</xref>; <xref ref-type="bibr" rid="ref110">Schulze et al., 2011</xref>; <xref ref-type="bibr" rid="ref9">Bidelman et al., 2013</xref>; <xref ref-type="bibr" rid="ref50">Gordon et al., 2015</xref>; <xref ref-type="bibr" rid="ref62">Kraus and White-Schwoch, 2017</xref>) and individuals with musical training appear to be advantaged in these shared processes (<xref ref-type="bibr" rid="ref63">Krishnan et al., 2005</xref>; <xref ref-type="bibr" rid="ref10">Bigand and Poulin-Charronnat, 2006</xref>; <xref ref-type="bibr" rid="ref64">Krizman et al., 2012</xref>; <xref ref-type="bibr" rid="ref600">White-Schwoch et al., 2013</xref>; <xref ref-type="bibr" rid="ref32">Elmer et al., 2014</xref>).</p>
<p>Those that are expert listeners in either music or language have been found to show <italic>cross-domain transfer</italic> (<xref ref-type="bibr" rid="ref90">Ong et al., 2016</xref>), where an advantage is found for perception in the other domain; for example, in word segmentation (<xref ref-type="bibr" rid="ref47">Fran&#x00E7;ois et al., 2013</xref>), syllabic perception (<xref ref-type="bibr" rid="ref86">Musacchia et al., 2007</xref>; <xref ref-type="bibr" rid="ref92">Ott et al., 2011</xref>; <xref ref-type="bibr" rid="ref33">Elmer et al., 2012</xref>; <xref ref-type="bibr" rid="ref67">K&#x00FC;hnis et al., 2013</xref>; <xref ref-type="bibr" rid="ref20">Chobert et al., 2014</xref>; <xref ref-type="bibr" rid="ref8">Bidelman and Alain, 2015</xref>), receptive and productive phonological skills at the word, sentence and passage level (<xref ref-type="bibr" rid="ref111">Slevc and Miyake, 2006</xref>), and word dictation (<xref ref-type="bibr" rid="ref123">Talamini et al., 2018</xref>). It is suggested that long-term expertise in music, which is gained by years of practice, has led to a fine-tuning of the auditory system (<xref ref-type="bibr" rid="ref116">Strait and Kraus, 2011a</xref>,<xref ref-type="bibr" rid="ref117">b</xref>), as evidenced by enhanced neural responses to changes in acoustic elements, such as pitch, intensity, and voice onset time (<xref ref-type="bibr" rid="ref109">Sch&#x00F6;n et al., 2004</xref>; <xref ref-type="bibr" rid="ref78">Magne et al., 2006</xref>; <xref ref-type="bibr" rid="ref58">Jentschke and Koelsch, 2009</xref>; <xref ref-type="bibr" rid="ref80">Marie et al., 2011a</xref>,<xref ref-type="bibr" rid="ref81">b</xref>). 
Musicians indeed show enhanced cortical processing of pitch in speech compared to nonmusicians (<xref ref-type="bibr" rid="ref78">Magne et al., 2006</xref>; <xref ref-type="bibr" rid="ref6">Besson et al., 2007</xref>; <xref ref-type="bibr" rid="ref86">Musacchia et al., 2007</xref>; <xref ref-type="bibr" rid="ref61">Kraus and Chandrasekaran, 2010</xref>). These and numerous other studies support the idea of cross-domain transfer between music and speech perception (see <xref ref-type="bibr" rid="ref52">Hallam, 2017</xref> for an extensive list). The present study focuses on the potential auditory processing advantages in pitch perception and auditory working memory (<xref ref-type="bibr" rid="ref92">Ott et al., 2011</xref>; <xref ref-type="bibr" rid="ref67">K&#x00FC;hnis et al., 2013</xref>; <xref ref-type="bibr" rid="ref101">Pinheiro et al., 2015</xref>; <xref ref-type="bibr" rid="ref27">Dittinger et al., 2016</xref>, <xref ref-type="bibr" rid="ref28">2017</xref>, <xref ref-type="bibr" rid="ref29">2019</xref>) associated with music perception skills. Many examples of the effect of music training on speech processing have been reported. For instance, training in music has been associated with phonological perception in the native language (L1; <xref ref-type="bibr" rid="ref140">Zuk et al., 2013</xref>) and with fluency in a second language (L2; <xref ref-type="bibr" rid="ref120">Swaminathan and Gopinath, 2013</xref>; <xref ref-type="bibr" rid="ref136">Yang et al., 2014</xref>). As well, longitudinal studies in children&#x2019;s speech perception found positive effects of music training (<xref ref-type="bibr" rid="ref83">Moreno et al., 2009</xref>; <xref ref-type="bibr" rid="ref26">Deg&#x00E9; and Schwarzer, 2011</xref>; <xref ref-type="bibr" rid="ref47">Fran&#x00E7;ois et al., 2013</xref>; <xref ref-type="bibr" rid="ref127">Thomson et al., 2013</xref>). Regarding the transfer of music experience to word learning, <xref ref-type="bibr" rid="ref27">Dittinger et al. 
(2016</xref>, <xref ref-type="bibr" rid="ref28">2017</xref>, <xref ref-type="bibr" rid="ref29">2019</xref>) presented listeners with unfamiliar Thai monosyllabic words and familiar visual referents during a learning phase and tested them on their ability to match the words with their corresponding visual objects. Overall, they found that music training led to higher accuracy in both young adults and children. Additionally, a longitudinal effect of music training was shown, as musicians had the same advantage when tested 5&#x2009;months later (<xref ref-type="bibr" rid="ref27">Dittinger et al., 2016</xref>).</p>
<p>However, counter-examples to a positive association between music training and speech perception also exist (<xref ref-type="bibr" rid="ref107">Ruggles et al., 2014</xref>; <xref ref-type="bibr" rid="ref12">Boebinger et al., 2015</xref>; <xref ref-type="bibr" rid="ref121">Swaminathan and Schellenberg, 2017</xref>; <xref ref-type="bibr" rid="ref115">Stewart and Pittman, 2021</xref>). For instance, <xref ref-type="bibr" rid="ref121">Swaminathan and Schellenberg (2017)</xref> found that rhythm perception skills predicted English listeners&#x2019; discrimination of Zulu phonemic contrasts, but only for contrasts that closely resembled English phonemic contrasts. The authors found no association between non-native speech perception and other music perception skills, such as melody perception or general music training, suggesting that the rhythm effect reflects participants&#x2019; native language background rather than their music skills. Specifically, unlike tonal languages, English does not contrast pitch to signal lexical meaning; hence, it is likely that listeners focus on other cues, such as temporal cues, to distinguish one word from another.</p>
<p>Apart from the ability to perceive novel or familiar phonological contrasts, another important component involved in speech processing, including novel word learning, is working memory. Working memory, which is a short-term memory involved in immediate conscious perceptual and linguistic processing, plays an important role in novel word learning (<xref ref-type="bibr" rid="ref48">Gathercole et al., 1997</xref>; <xref ref-type="bibr" rid="ref134">Warmington et al., 2019</xref>). Mixed results have been found regarding a musician&#x2019;s advantage in working memory, with some studies finding no difference between musicians and nonmusicians (<xref ref-type="bibr" rid="ref53">Hansen et al., 2012</xref>), whereas others find improved auditory and verbal working memory for musicians compared to nonmusicians (<xref ref-type="bibr" rid="ref94">Parbery-Clark et al., 2011</xref>; <xref ref-type="bibr" rid="ref4">Bergman Nutley et al., 2014</xref>). A meta-analysis conducted by <xref ref-type="bibr" rid="ref122">Talamini et al. (2017)</xref> on different types of memory found a medium effect size for short-term and working memory with musicians performing better than nonmusicians, depending on the type of stimulus used.</p>
<p>Most studies examining the link between speech processing and musical abilities have compared professional musicians to nonmusicians (see <xref ref-type="bibr" rid="ref139">Zhu et al., 2021</xref>), with a large focus on explicit tasks when comparing linguistic and musical abilities (e.g., <xref ref-type="bibr" rid="ref27">Dittinger et al., 2016</xref>, <xref ref-type="bibr" rid="ref28">2017</xref>, <xref ref-type="bibr" rid="ref29">2019</xref>). In such tasks, there is no ambiguity during learning. In daily life, however, the link between words and meanings is far more ambiguous, with no immediately clear connections; studies show that pairings between words and their referent objects are learned by tracking co-occurrences across repeated exposures (e.g., <xref ref-type="bibr" rid="ref114">Smith and Yu, 2008</xref>; <xref ref-type="bibr" rid="ref42">Escudero et al., 2016b</xref>; <xref ref-type="bibr" rid="ref84">Mulak et al., 2019</xref>). Very little is known about the role of musical abilities in ambiguous word learning scenarios, which are the most common word learning scenarios in everyday life (<xref ref-type="bibr" rid="ref128">Tuninetti et al., 2020</xref>). In the realm of music perception, recent studies have shown that musical elements, such as musical grammar (<xref ref-type="bibr" rid="ref75">Loui et al., 2010</xref>), harmony (<xref ref-type="bibr" rid="ref59">Jonaitis and Saffran, 2009</xref>), musical expectation (<xref ref-type="bibr" rid="ref97">Pearce et al., 2010</xref>), and novel pitch distributions from unfamiliar musical scales (<xref ref-type="bibr" rid="ref89">Ong et al., 2017a</xref>; <xref ref-type="bibr" rid="ref73">Leung and Dean, 2018</xref>), can be learned through statistical learning. Statistical learning is a domain-general learning mechanism leading to the acquisition of statistical regularities in (in this case auditory) input. 
This type of learning may lead to cross-domain transfer between music and language, as learners show sensitivity toward particular acoustic cues (e.g., pitch; <xref ref-type="bibr" rid="ref90">Ong et al., 2016</xref>), which may result in improved ambiguous word learning. Despite the potential effect of music abilities on ambiguous word learning and the many types of learners considered in statistical word learning studies (such as young infants, children and adults, and L2 learners; <xref ref-type="bibr" rid="ref138">Yu and Smith, 2007</xref>; <xref ref-type="bibr" rid="ref114">Smith and Yu, 2008</xref>; <xref ref-type="bibr" rid="ref119">Suanda et al., 2014</xref>; <xref ref-type="bibr" rid="ref42">Escudero et al., 2016b</xref>,<xref ref-type="bibr" rid="ref43">c</xref>; <xref ref-type="bibr" rid="ref84">Mulak et al., 2019</xref>), participants&#x2019; musical experience or expertise has yet to be investigated. In sum, it has been established that music and language rely on similar general auditory processing skills and, although results are mixed, most studies find an advantage of music training for auditory and speech perception. By testing whether music abilities in a nonmusician population can help ambiguous word learning, we can further unravel the influence of music on language learning beyond what has previously been shown.</p>
<p>The current study tests the effect of specific music perception abilities on statistical learning of novel words in a nonmusician adult population. We tested musical abilities through two adaptive psychometric tests targeting specific music perception skills, namely, the ability to perceive fine-pitch mistuning, through the Mistuning Perception Test (MPT; <xref ref-type="bibr" rid="ref70">Larrouy-Maestri et al., 2018</xref>, <xref ref-type="bibr" rid="ref71">2019</xref>), and the ability to discriminate between pitch sequences, through the Melodic Discrimination Test (MDT; <xref ref-type="bibr" rid="ref54">Harrison et al., 2017</xref>; <xref ref-type="bibr" rid="ref55">Harrison and M&#x00FC;llensiefen, 2018</xref>). The MPT is an adaptive psychometric test measuring sensitivity to intonation accuracy in vocal musical performance (<xref ref-type="bibr" rid="ref70">Larrouy-Maestri et al., 2018</xref>, <xref ref-type="bibr" rid="ref71">2019</xref>). Perception of vocal mistuning is a core musical ability, as evidenced by its high correlation with other musical traits (<xref ref-type="bibr" rid="ref72">Law and Zentner, 2012</xref>; <xref ref-type="bibr" rid="ref68">Kunert et al., 2016</xref>; <xref ref-type="bibr" rid="ref71">Larrouy-Maestri et al., 2019</xref>), and its importance when judging the quality of a musical performance (<xref ref-type="bibr" rid="ref71">Larrouy-Maestri et al., 2019</xref>). The MDT aims to test melodic working memory, as it requires melodies to be held in auditory working memory in order for participants to compare and discriminate them correctly (<xref ref-type="bibr" rid="ref30">Dowling, 1978</xref>; <xref ref-type="bibr" rid="ref54">Harrison et al., 2017</xref>; <xref ref-type="bibr" rid="ref55">Harrison and M&#x00FC;llensiefen, 2018</xref>). To do well in these tasks, specific auditory processing skills, in particular pitch perception and auditory working memory, are required. 
A recent large-scale study across thousands of speakers of tonal, pitch-accented, and non-tonal languages using these two tasks (and a beat alignment task) has shown that language experience shapes music perception ability (<xref ref-type="bibr" rid="ref74">Liu et al., 2021</xref>). Here, we test the opposite, namely, whether the same music perception skills help with language learning, specifically when learning novel words with different degrees of phonological overlap. Our specific focus is on pitch processing abilities, but we acknowledge that rhythm processing is also an important component of music and language processing (see <xref ref-type="bibr" rid="ref121">Swaminathan and Schellenberg, 2017</xref>).</p>
<p>To test whether pitch perception and auditory working memory are helpful when learning words in ambiguous scenarios, we used a cross-situational word learning (CSWL) paradigm in which meanings of new words are learned through multiple exposures over time without explicit instruction, where learning of word-object pairings can only take place through their statistical co-occurrences (e.g., Escudero et al., under review; <xref ref-type="bibr" rid="ref138">Yu and Smith, 2007</xref>; <xref ref-type="bibr" rid="ref60">Kachergis et al., 2010</xref>; <xref ref-type="bibr" rid="ref113">Smith and Smith, 2012</xref>; <xref ref-type="bibr" rid="ref41">Escudero et al., 2016a</xref>,<xref ref-type="bibr" rid="ref43">b</xref>, <xref ref-type="bibr" rid="ref44">2021</xref>; <xref ref-type="bibr" rid="ref84">Mulak et al., 2019</xref>; <xref ref-type="bibr" rid="ref128">Tuninetti et al., 2020</xref>). Early CSWL experiments focused on words with very little phonological overlap (e.g., <xref ref-type="bibr" rid="ref114">Smith and Yu, 2008</xref>; <xref ref-type="bibr" rid="ref131">Vlach and Johnson, 2013</xref>), where a listener can rely on other cues to learn the novel words and does not have to focus on the fine phonological details of each word (<xref ref-type="bibr" rid="ref42">Escudero et al., 2016b</xref>). Therefore, <xref ref-type="bibr" rid="ref41">Escudero et al. (2016a</xref>,<xref ref-type="bibr" rid="ref42">b)</xref> and <xref ref-type="bibr" rid="ref84">Mulak et al. (2019)</xref> studied CSWL of monosyllabic non-minimal and minimal pairs, differing in only one vowel or consonant, to test whether listeners can encode sufficient phonological detail in a short time to learn these difficult phonological contrasts. It was found that accurate phonological encoding of vowel and consonant contrasts predicts high performance in CSWL tasks (<xref ref-type="bibr" rid="ref41">Escudero et al., 2016a</xref>; <xref ref-type="bibr" rid="ref84">Mulak et al., 2019</xref>).</p>
<p>In the present study, we thus tested whether musical ability impacts word learning of phonologically overlapping words using the CSWL paradigm of <xref ref-type="bibr" rid="ref42">Escudero et al. (2016b)</xref> and <xref ref-type="bibr" rid="ref84">Mulak et al. (2019)</xref>. Overall, we hypothesize that those with stronger musical abilities are better at perceiving speech sounds due to enhanced pitch perception and working memory, and that this will be reflected in higher overall accuracy in the CSWL task. We may also see differences in how well vowels and consonants are learned, due to higher acoustic variability in vowels compared to consonants (<xref ref-type="bibr" rid="ref87">Ong et al., 2015</xref>), which may favor learners with stronger pitch perception skills.</p>
</sec>
<sec id="sec2" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="sec3">
<title>Participants</title>
<p>Fifty-four participants took part in the study and were tested online, which has been our common practice since the start of the COVID-19 pandemic, using our validated online testing protocols (<xref ref-type="bibr" rid="ref44">Escudero et al., 2021</xref>). In <xref ref-type="bibr" rid="ref44">Escudero et al. (2021)</xref>, we compared online and face-to-face testing using the same CSWL design, and the online results were found to be very similar to those from the laboratory. Ten participants were excluded from the analysis due to technical difficulties, mostly internet dropouts during the experiment or excessive environmental noise, leading to a total participant sample of 44 (<italic>M</italic><sub>age</sub>&#x2009;=&#x2009;26.79, <italic>SD</italic><sub>age</sub>&#x2009;=&#x2009;11.12, 33 females). Participants were recruited through Western Sydney University&#x2019;s online research participation system (SONA) or <italic>via</italic> word-of-mouth; participation was rewarded with course credit for the former and was voluntary for the latter. Written informed consent was obtained online from all participants prior to the start of the experiment, and the study was approved by the Western Sydney University Human Research Ethics Committee (H11022).</p>
</sec>
<sec id="sec4">
<title>Materials</title>
<sec id="sec5">
<title>Questionnaires</title>
<p>The questionnaires conducted at the beginning of the experiment consisted of two parts: a language background questionnaire and a musical background questionnaire. The language background questionnaire consisted of questions aimed at obtaining detailed information about participants&#x2019; native (and other) languages, as well as the language background of their parents/caretakers. The musical background questionnaire was the Goldsmiths Musical Sophistication Index (GMSI; <xref ref-type="bibr" rid="ref85">M&#x00FC;llensiefen et al., 2014</xref>), which aims to collect wide-ranging data on one&#x2019;s engagement with music (e.g., music listening and music performance behavior). Both questionnaires were administered through Qualtrics (Qualtrics, Provo, UT). From the GMSI, 23 participants indicated having zero years of experience with playing an instrument, and seven had 10 or more years of experience. From the language questionnaire, we found that 17 were Australian English monolinguals and 27 were bi- or multilinguals.</p>
</sec>
<sec id="sec6">
<title>Cross-Situational Word Learning</title>
<p>All words and visual referents have been used in prior CSWL studies (<xref ref-type="bibr" rid="ref132">Vlach and Sandhofer, 2014</xref>; <xref ref-type="bibr" rid="ref41">Escudero et al., 2016a</xref>,<xref ref-type="bibr" rid="ref43">b</xref>; <xref ref-type="bibr" rid="ref84">Mulak et al., 2019</xref>; Escudero et al., under review). Novel words consisted of eight monosyllabic nonsense words recorded by a female native speaker of Australian English and followed a consonant-vowel-consonant (CVC) structure while adhering to English phonotactics. The stimuli were produced in <italic>infant-directed speech</italic> (IDS), as we replicated previous studies that used IDS to compare adult and infant listeners; two tokens of each word were included to match prosodic contours across all stimuli (<xref ref-type="bibr" rid="ref41">Escudero et al., 2016a</xref>,<xref ref-type="bibr" rid="ref43">b</xref>).</p>
<p>The eight words were combined into minimal pair sets to form specific consonant or vowel minimal pairs or non-minimal pairs. The two types of minimal pairs featured words that either differed in their initial consonant (consMPs; e.g., BON-TON) or in their vowel (vowelMPs; e.g., DIT-DUT). Non-minimal pairs were formed by pairing two words from each of the two minimal pair types in random order (nonMPs; e.g., BON-DIT).</p>
<p>Every novel word was randomly paired with a color picture of a novel item, which is not readily identifiable as a real-world object. These word-referent pairings were the same for all participants. An overview of the novel words and visual referents is presented in <xref rid="fig1" ref-type="fig">Figure 1</xref>.</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption><p>The eight novel words and their visual referents. The four words in the top row are minimally different in their initial consonant, whereas the words on the bottom are minimally different in their vowel. The vowel used for the consonant minimal pairs is /O/ as in POT. Vowels used for the vowel minimal pairs are /i/ as in BEAT, /I/ as in BIT, /u/ as in BOOT, and /U/ as in PUT.</p></caption>
<graphic xlink:href="fpsyg-13-801263-g001.tif"/>
</fig>
</sec>
<sec id="sec7">
<title>Mistuning Perception Test</title>
<p>The MPT, which is an adaptive psychometric test, uses short excerpts (6&#x2013;12&#x2009;s) of musical stimuli from pop music performances, which are representative of real-life music and therefore ecologically valid (from MedleyDB; <xref ref-type="bibr" rid="ref11">Bittner et al., 2014</xref>). The test correlates highly with low- and high-level pitch perception abilities, such as pitch discrimination and melody discrimination, and thus provides an assessment of important pitch processing abilities (<xref ref-type="bibr" rid="ref71">Larrouy-Maestri et al., 2019</xref>). In a two-alternative forced-choice task, participants were presented with a pitch-shifted version (out-of-tune) and the normal version (in-tune) of a stimulus and were asked to indicate which version was out-of-tune. Pitch shifting varied from 10 to 100 cents, both sharp and flat (for more details about the construction of the MPT, see <xref ref-type="bibr" rid="ref71">Larrouy-Maestri et al., 2019</xref>). Before starting the task, participants received an example of an out-of-tune and an in-tune version. A demo of the experiment can be found at <ext-link xlink:href="https://shiny.gold-msi.org/longgold_demo/?test=MPT" ext-link-type="uri">https://shiny.gold-msi.org/longgold_demo/?test=MPT</ext-link>.</p>
</sec>
<sec id="sec8">
<title>Melodic Discrimination Test</title>
<p>Similar to the MPT, the MDT is an adaptive psychometric test. The MDT was developed to test one&#x2019;s ability to discriminate between melodies (<xref ref-type="bibr" rid="ref54">Harrison et al., 2017</xref>; <xref ref-type="bibr" rid="ref55">Harrison and M&#x00FC;llensiefen, 2018</xref>). Participants are presented with a three-alternative forced-choice (3-AFC) paradigm in which they listen to three different versions of the same melody, each at a different pitch height (musical transposition), with one containing an <italic>altered</italic> note produced by changing its relative pitch compared to the base melody (<xref ref-type="bibr" rid="ref54">Harrison et al., 2017</xref>), resulting in a pitch height change for one note relative to the other melodies. Each melody can be altered subject to four pre-determined constraints: (1) melodies with five notes or fewer cannot have the first or last note altered, (2) melodies with six notes or more cannot have the first two or last two notes altered, (3) the note cannot be altered by more than six semitones, and (4) the altered note must be between an eighth note and a dotted half note in length (see <xref ref-type="bibr" rid="ref54">Harrison et al., 2017</xref>). Participants are asked to indicate which of the three melodies is the odd one out. Participants heard an implementation of the MDT with 20 items (see doi:10.5281/zenodo.1300951) using the shiny package in R (<xref ref-type="bibr" rid="ref18">Chang et al., 2020</xref>), which uses an adaptive item selection procedure, with each participant&#x2019;s performance level determining the difficulty of the items presented. Performance level is estimated using Item Response Theory (<xref ref-type="bibr" rid="ref25">de Ayala, 2009</xref>). A demo of the experiment can be found at <ext-link xlink:href="https://shiny.gold-msi.org/longgold_demo/?test=MDT" ext-link-type="uri">https://shiny.gold-msi.org/longgold_demo/?test=MDT</ext-link>. 
Tests scores for both the MDT as the MPT are computed as intermediate and final abilities with weighted-likelihood estimation (<xref ref-type="bibr" rid="ref133">Warm, 1989</xref>) and using Urry&#x2019;s rule for item selection (<xref ref-type="bibr" rid="ref77">Magis and Ra&#x00EE;che, 2012</xref>).</p>
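<p>The adaptive logic described above can be illustrated with a minimal sketch. This is a hypothetical toy version (invented item bank, invented responses, and a crude fixed-step ability update); the actual tests estimate ability with weighted-likelihood estimation and select items with Urry&#x2019;s rule.</p>

```python
import math

def p_correct(ability, difficulty, guess=1/3):
    # 3-AFC item: probability of a correct response has a guessing floor
    # of 1/3 and rises logistically with (ability - difficulty).
    return guess + (1 - guess) / (1 + math.exp(-(ability - difficulty)))

def next_item(ability, item_difficulties, used):
    # Adaptive selection: pick the unused item whose difficulty is
    # closest to the current ability estimate.
    candidates = [i for i in range(len(item_difficulties)) if i not in used]
    return min(candidates, key=lambda i: abs(item_difficulties[i] - ability))

bank = [-2.0, -1.0, 0.0, 1.0, 2.0]      # hypothetical item difficulties
ability, used = 0.0, set()
for correct in [True, True, False]:     # hypothetical response pattern
    i = next_item(ability, bank, used)
    used.add(i)
    ability += 0.5 if correct else -0.5  # crude step update, not WLE
print(round(ability, 2))                 # final toy ability estimate
```

After two correct answers the sketch presents progressively harder items (difficulties 0.0, 1.0, 2.0), mirroring how performance level determines item difficulty in the real tests.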
</sec>
</sec>
<sec id="sec9">
<title>Procedure</title>
<p>We followed our adult online testing protocol, validated in <xref ref-type="bibr" rid="ref44">Escudero et al. (2021)</xref>; for details, see <ext-link xlink:href="https://osf.io/nwr5d/" ext-link-type="uri">https://osf.io/nwr5d/</ext-link>. In short, participants signed up for a timeslot on SONA, after which they received an email with specific instructions for the experiment (e.g., wearing headphones and participating from a silent study space with no background noise were required) and an invitation to a Zoom call. Participants unable to meet the participation requirements were excluded from the analysis (see Section &#x201C;Participants&#x201D;). During the Zoom call, participants were first familiarized with the procedure and then sent links to the consent forms, background questionnaires, and the experiment. They were asked to share their screen and computer audio throughout the entire video call, except when filling out the questionnaires, to ensure privacy. Screen and audio sharing enabled the experimenter to verify appropriate presentation of the auditory stimuli and participants&#x2019; attention. The experimenter was muted, with their video off, during the experiment to avoid experimenter bias.</p>
<p>Participants first completed the language and musical background questionnaires and were then instructed to start the CSWL task. The CSWL task consisted of a learning and a test phase set up in PsychoPy 3 (<xref ref-type="bibr" rid="ref98">Peirce, 2007</xref>; <xref ref-type="bibr" rid="ref99">Peirce et al., 2019</xref>) hosted on <ext-link xlink:href="http://Pavlovia.org" ext-link-type="uri">Pavlovia.org</ext-link>. Following previous CSWL studies, minimal instruction was provided (i.e., &#x201C;Please listen to the sounds and look at the images&#x201D;) prior to the learning phase. During the learning phase, participants saw 24 trials, each consisting of two images accompanied by auditory presentations of two words, with no indication of which word corresponded to which image. The visual referents were presented for 0.5&#x2009;s before the onset of the first word. Both words lasted 1&#x2009;s and were separated by a 0.5&#x2009;s inter-stimulus interval (ISI). A 2&#x2009;s inter-trial interval (ITI) consisting of a blank screen followed, resulting in a total trial time of 5&#x2009;s. The learning phase was directly followed by a test phase of 24 trials, for which participants were told that they would be tested on what they had learned and should indicate their answers by pressing specific keys on the keyboard. Every test trial presented two possible visual referents simultaneously on the screen for 3&#x2009;s. During this time, participants heard one spoken target word four times (with alternating tokens of the word) and were asked to indicate which visual referent (the left or the right one) corresponded to the target word by pressing a key any time after the onset of the target word. Trial order was randomized across all participants. The left&#x2013;right placement of the visual referents was counterbalanced, resulting in two between-subject learning conditions. 
A blank screen of 2&#x2009;s was presented between test trials. Directly after the CSWL task, participants completed the MDT and the MPT to measure their music perception abilities.</p>
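<p>The learning-trial timing can be sanity-checked with a short sketch. The segment names are illustrative; a single ISI between the two words is the reading consistent with the stated 5&#x2009;s total.</p>

```python
# Hypothetical reconstruction of the learning-trial timeline (seconds);
# durations are taken from the text, segment names are illustrative.
segments = [
    ("referent preview", 0.5),  # images shown before first word onset
    ("word 1",           1.0),
    ("ISI",              0.5),  # inter-stimulus interval between words
    ("word 2",           1.0),
    ("ITI (blank)",      2.0),  # inter-trial interval
]
total = sum(duration for _, duration in segments)
print(total)  # 5.0, matching the stated total trial time
```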
</sec>
</sec>
<sec id="sec10">
<title>Statistical Analysis</title>
<p>We used a Bayesian Item Response Theory (IRT) model to analyze accuracy. IRT models predict the probability of an accurate answer as a function of an item&#x2019;s difficulty, its discriminability, a participant&#x2019;s latent ability, and a specified guessing parameter (<xref ref-type="bibr" rid="ref15">B&#x00FC;rkner, 2020</xref>), which provides a lower bound for the model&#x2019;s predicted probabilities. The statistical analyses were run in R (<xref ref-type="bibr" rid="ref105">R Core Team, 2020</xref>) with the brms package using Stan (<xref ref-type="bibr" rid="ref13">B&#x00FC;rkner, 2017</xref>, <xref ref-type="bibr" rid="ref14">2018</xref>; <xref ref-type="bibr" rid="ref105">R Core Team, 2020</xref>).</p>
<p>We used approximate leave-one-out (LOO) cross-validation to find the model that generalizes best to out-of-sample data. Including GMSI or participants&#x2019; language background as additional predictors did not improve the model&#x2019;s out-of-sample predictions.</p>
<p>The best model included only the interaction between Pair type and MPT. However, as we are interested in both MPT and MDT as main factors, we report the next-best model, which includes both; the difference in LOOIC between these two models is negligible. Prior to fitting the models, we tested for correlations between MPT, MDT, and GMSI. MPT and MDT were moderately positively correlated, <italic>r</italic>(1054)&#x2009;=&#x2009;0.39, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.005; MPT and GMSI were moderately positively correlated, <italic>r</italic>(1054)&#x2009;=&#x2009;0.30; and MDT and GMSI were weakly positively correlated, <italic>r</italic>(1054)&#x2009;=&#x2009;0.11.</p>
<p>Accuracy was modeled as a binary response variable, with 0 for inaccurate and 1 for accurate responses. We used a non-linear logistic IRT model on the Bernoulli distribution (cf. the four-parameter logistic model, 4PL; <xref ref-type="bibr" rid="ref1">Agresti, 2010</xref>) with an item, a person, and a guessing parameter. The item parameter models the difficulty of the tested items (here, the pair types); the person parameter models the individual ability of each participant; and the guessing parameter represents the probability of being accurate if participants were only guessing (<xref ref-type="bibr" rid="ref15">B&#x00FC;rkner, 2020</xref>). Because all of our trials are binary forced choice, we used a fixed guessing parameter of 0.5. An advantage of using IRT for modeling binary accuracy responses is that this probability can be taken into account as a type of baseline, so the model&#x2019;s estimates of the underlying probability of being correct cannot fall below the 0.5 threshold. We did not include a discrimination parameter, as all tested items are very similar.</p>
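<p>The effect of the fixed guessing parameter can be made concrete with a small sketch (a simplified predictor with no discrimination term, mirroring the description above; here <italic>eta</italic> is simply ability minus difficulty):</p>

```python
import math

def inv_logit(x):
    # Standard logistic function, mapping the real line to (0, 1).
    return 1 / (1 + math.exp(-x))

def p_accurate(ability, difficulty, guess=0.5):
    # Guessing floor: even for very low eta, the predicted probability
    # of a correct response cannot drop below `guess`.
    eta = ability - difficulty
    return guess + (1 - guess) * inv_logit(eta)

print(round(p_accurate(0.0, 0.0), 2))    # 0.75: halfway between floor and ceiling
print(round(p_accurate(-10.0, 0.0), 2))  # 0.5: pinned at the guessing floor
```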
<p>The categorical variable Pair type was turned into a factor and modeled using dummy coding, the default in R. For MPT and MDT, we used the raw ability scores computed from the tests&#x2019; underlying item response models, as recommended by the test designers (MPT: <xref ref-type="bibr" rid="ref70">Larrouy-Maestri et al., 2018</xref>, <xref ref-type="bibr" rid="ref71">2019</xref>; MDT: <xref ref-type="bibr" rid="ref54">Harrison et al., 2017</xref>; <xref ref-type="bibr" rid="ref55">Harrison and M&#x00FC;llensiefen, 2018</xref>). These scores range from &#x2212;4 to +4. GMSI was scaled and centered to a previously determined population mean from <xref ref-type="bibr" rid="ref55">Harrison and M&#x00FC;llensiefen (2018)</xref>.</p>
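<p>Dummy (treatment) coding, R&#x2019;s default for factors, represents a <italic>k</italic>-level factor with <italic>k</italic>&#x2212;1 indicator columns relative to a reference level. A minimal sketch (the level order here is hypothetical):</p>

```python
# Dummy (treatment) coding of a three-level factor, as R does by default:
# the first level is the reference; every other level gets one indicator column.
levels = ["nonMP", "consMP", "vowelMP"]  # hypothetical level order

def dummy_code(value, levels):
    # One 0/1 indicator per non-reference level.
    return [1 if value == lvl else 0 for lvl in levels[1:]]

print(dummy_code("nonMP", levels))    # [0, 0] -> reference level
print(dummy_code("vowelMP", levels))  # [0, 1]
```

Coefficients for the coded columns are then interpreted as differences from the reference level, which is why contrasts in the results are read against a baseline pair type.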
<p>For the IRT accuracy model, we included separate priors for the regression coefficients and for the item and person parameters. As detailed below, the priors on the coefficients and item effects were weakly informative, in that they weakly favor an effect of zero size and disfavor unfeasibly large effects; the standard deviation of the person parameter was fixed to 1. The following model formula (including priors) was run in R:</p>
<preformat>
formula &#x003C;- bf(
  Accuracy ~ 0.5 + 0.5 * inv_logit(eta),
  eta ~ 1 + Pair_type * (MDT_ability + MPT_ability) + (1 | item) + (1 | participant),
  nl = TRUE
)
family &#x003C;- brmsfamily("bernoulli", link = "identity")
priors &#x003C;-
  set_prior("normal(0, 5)", class = "b", nlpar = "eta") +
  set_prior("constant(1)", class = "sd", group = "participant", nlpar = "eta") +
  set_prior("normal(0, 3)", class = "sd", group = "item", nlpar = "eta")
</preformat>
<p>An important aspect of Bayesian regression is that it estimates the whole posterior distribution of each effect, which allows for the calculation of credibility intervals. In contrast with frequentist confidence intervals, credibility intervals indicate with 95% certainty that the reported effect falls within the range of the interval (<xref ref-type="bibr" rid="ref112">Smit et al., 2019</xref>). Evidence for a hypothesized effect is assessed through evidence ratios, which quantify the likelihood of a tested hypothesis against its alternative (<xref ref-type="bibr" rid="ref13">B&#x00FC;rkner, 2017</xref>, <xref ref-type="bibr" rid="ref14">2018</xref>). We consider evidence ratios &#x003E;10 to be strong evidence and &#x003E;30 to be very strong evidence [see <xref ref-type="bibr" rid="ref57">Jeffreys (1998)</xref>, as cited by <xref ref-type="bibr" rid="ref65">Kruschke (2018)</xref>]. For directional hypotheses, where the predicted direction of an effect is given, an evidence ratio &#x003E;19 is roughly equivalent to an alpha of 0.05 in null-hypothesis significance testing (NHST; <xref ref-type="bibr" rid="ref79">Makowski et al., 2019</xref>; <xref ref-type="bibr" rid="ref82">Milne and Herff, 2020</xref>).</p>
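<p>For a directional hypothesis, the evidence ratio corresponds to the posterior odds P(effect&#x2009;&#x003E;&#x2009;0)&#x2009;/&#x2009;P(effect&#x2009;&#x2264;&#x2009;0), which can be estimated from posterior draws. A minimal sketch with simulated draws (illustrative only, not the paper&#x2019;s posterior):</p>

```python
import random

random.seed(1)
# Simulated posterior draws for a hypothetical effect: normal(mean=1.0, sd=0.5).
draws = [random.gauss(1.0, 0.5) for _ in range(10_000)]

# Posterior probability that the effect is positive, from the draws.
p_dir = sum(d > 0 for d in draws) / len(draws)
# Evidence ratio: odds of the directional hypothesis vs. its alternative.
evidence_ratio = p_dir / (1 - p_dir)
print(evidence_ratio > 19)  # comparable to a one-sided alpha of 0.05
```

With a posterior centered well away from zero, the evidence ratio clears the &#x003E;19 criterion described above; a posterior straddling zero would yield a ratio near 1.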
<p>We expected high music perception abilities to transfer to stronger phonological processing and thus to higher performance in the CSWL task (as evidenced by higher accuracy) than for participants with weaker music perception abilities. With regard to the three tested pair types, we expected them to follow the same pattern as in previous CSWL studies, namely, higher performance for nonMPs and consMPs and lower performance for vowelMPs (<xref ref-type="bibr" rid="ref41">Escudero et al., 2016a</xref>). Additionally, we were interested in how the moderating effects of MPT and MDT differ per pair type. As the MPT tests the perception of fine pitch changes, one might expect participants with higher MPT scores to learn vowel contrasts more easily due to the acoustic similarities between musical pitch and vowels. As the MDT measures auditory short-term memory, we expected high MDT scores to correlate positively with accuracy in general.</p>
</sec>
<sec id="sec11" sec-type="results">
<title>Results</title>
<p><xref rid="fig2" ref-type="fig">Figure 2</xref> shows the overall percentage of accurate responses per pair type. Performance across pair types appears to be very similar, and participants were able to learn all pair types during the task, as evidenced by performance significantly above chance (see <xref rid="fig2" ref-type="fig">Figure 2</xref>). Accuracy for these learners is similar to, albeit slightly lower than, that found in a previous study (between 0.60 and 0.70 for all pair types) using the same design and online testing methodology (<xref ref-type="bibr" rid="ref44">Escudero et al., 2021</xref>).</p>
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption><p>Mean accuracy (in percentage) per pair type. Error bars represent the standard error over the mean accuracy responses per pair type. The dotted line represents accuracy by chance.</p></caption>
<graphic xlink:href="fpsyg-13-801263-g002.tif"/>
</fig>
<p>Hypothesis tests run on the results from the multilevel Bayesian model show strong evidence that, for participants with average MDT and MPT, accuracy for consMPs is lower than for nonMPs (see <xref rid="tab1" ref-type="table">Table 1</xref>, hypothesis 1). We did not find sufficient evidence to support a difference between the other pair types (hypotheses 2 and 3). We then tested whether performance per pair type is moderated by MPT and MDT ability. As shown in <xref rid="fig3" ref-type="fig">Figure 3</xref>, mean accuracy for nonMPs does not appear to be moderated by MPT ability, whereas for consMPs, higher MPT ability leads to higher accuracy, which was not expected. Also unexpectedly, the opposite occurs for vowelMPs, where higher MPT ability negatively impacts performance. In line with our predictions for MDT ability (see <xref rid="fig4" ref-type="fig">Figure 4</xref>), higher scores generally lead to improved accuracy, especially for nonMPs and vowelMPs. However, as visualized by the colored ribbons in <xref rid="fig3" ref-type="fig">Figures 3</xref>, <xref rid="fig4" ref-type="fig">4</xref>, the slopes&#x2019; credibility intervals overlap substantially, indicating that the evidence for these differences might not be decisive. We therefore conducted hypothesis tests to assess this (see hypotheses 4&#x2013;6 for MPT ability and 10&#x2013;12 for MDT ability in <xref rid="tab1" ref-type="table">Table 1</xref>). As can be seen in <xref rid="tab1" ref-type="table">Table 1</xref>, MDT ability influences accuracy in the expected direction (i.e., higher MDT leads to higher accuracy) for all pair types, but unexpectedly, MPT has a negative effect on accuracy for vowelMPs.</p>
<table-wrap position="float" id="tab1">
<label>Table 1</label>
<caption><p>Hypothesis testing&#x2014;accuracy model.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Hypothesis tests</th>
<th align="center" valign="top">Estimate</th>
<th align="center" valign="top">Est. Error</th>
<th align="center" valign="top">[90% CI]</th>
<th align="center" valign="top">Evid. Ratio</th>
<th align="center" valign="top">Post. Prob</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" colspan="6">For average MDT and MPT ability:</td>
</tr>
<tr>
<td align="left" valign="bottom">1. nonMP&#x2013;consMP &#x003E; 0</td>
<td align="char" valign="bottom" char=".">&#x2212;1.83</td>
<td align="char" valign="bottom" char=".">1.67</td>
<td align="char" valign="bottom" char=".">[&#x2212;4.88, 0.25]</td>
<td align="char" valign="bottom" char=".">11.11</td>
<td align="char" valign="bottom" char=".">0.92</td>
</tr>
<tr>
<td align="left" valign="top">2. nonMP&#x2013;vowelMP &#x003E; 0</td>
<td align="char" valign="top" char=".">&#x2212;0.63</td>
<td align="char" valign="top" char=".">1.80</td>
<td align="char" valign="top" char=".">[&#x2212;3.90, 1.21]</td>
<td align="char" valign="top" char=".">0.64</td>
<td align="char" valign="top" char=".">0.39</td>
</tr>
<tr>
<td align="left" valign="top">3. vowelMP&#x2013;consMP &#x003E; 0</td>
<td align="char" valign="top" char=".">1.20</td>
<td align="char" valign="top" char=".">2.47</td>
<td align="char" valign="top" char=".">[&#x2212;2.63, 4.87]</td>
<td align="char" valign="top" char=".">3.27</td>
<td align="char" valign="top" char=".">0.77</td>
</tr>
<tr>
<td align="left" valign="top" colspan="6"><italic>MPT ability &#x003E; 0</italic> in the following conditions and contrasts:</td>
</tr>
<tr>
<td align="left" valign="top">4. nonMP</td>
<td align="char" valign="top" char=".">0.41</td>
<td align="char" valign="top" char=".">0.70</td>
<td align="char" valign="top" char=".">[&#x2212;0.50, 1.76]</td>
<td align="char" valign="top" char=".">2.47</td>
<td align="char" valign="top" char=".">0.71</td>
</tr>
<tr>
<td align="left" valign="top">5. consMP</td>
<td align="char" valign="top" char=".">2.97</td>
<td align="char" valign="top" char=".">1.48</td>
<td align="char" valign="top" char=".">[0.80, 5.55]</td>
<td align="char" valign="top" char=".">91.78</td>
<td align="char" valign="top" char=".">0.99</td>
</tr>
<tr>
<td align="left" valign="top">6. vowelMP</td>
<td align="char" valign="top" char=".">&#x2212;0.88</td>
<td align="char" valign="top" char=".">0.85</td>
<td align="char" valign="top" char=".">[&#x2212;2.08, 0.17]</td>
<td align="char" valign="top" char=".">12.10</td>
<td align="char" valign="top" char=".">0.92</td>
</tr>
<tr>
<td align="left" valign="top">7. consMP&#x2013;nonMP</td>
<td align="char" valign="top" char=".">2.55</td>
<td align="char" valign="top" char=".">1.53</td>
<td align="char" valign="top" char=".">[0.28, 5.20]</td>
<td align="char" valign="top" char=".">30.61</td>
<td align="char" valign="top" char=".">0.97</td>
</tr>
<tr>
<td align="left" valign="top">8. nonMP&#x2013;vowelMP</td>
<td align="char" valign="top" char=".">1.30</td>
<td align="char" valign="top" char=".">1.05</td>
<td align="char" valign="top" char=".">[&#x2212;0.07, 3.04]</td>
<td align="char" valign="top" char=".">16.37</td>
<td align="char" valign="top" char=".">0.94</td>
</tr>
<tr>
<td align="left" valign="top">9. consMP&#x2013;vowelMP</td>
<td align="char" valign="top" char=".">3.85</td>
<td align="char" valign="top" char=".">1.69</td>
<td align="char" valign="top" char=".">[1.38, 6.68]</td>
<td align="char" valign="top" char=".">92.75</td>
<td align="char" valign="top" char=".">0.99</td>
</tr>
<tr>
<td align="left" valign="top" colspan="6"><italic>MDT ability &#x003E; 0</italic> in the following conditions and contrasts:</td>
</tr>
<tr>
<td align="left" valign="top">10. nonMP</td>
<td align="char" valign="top" char=".">0.95</td>
<td align="char" valign="top" char=".">0.42</td>
<td align="char" valign="top" char=".">[0.28, 1.63]</td>
<td align="char" valign="top" char=".">78.30</td>
<td align="char" valign="top" char=".">0.99</td>
</tr>
<tr>
<td align="left" valign="top">11. consMP</td>
<td align="char" valign="top" char=".">&#x2212;0.08</td>
<td align="char" valign="top" char=".">0.84</td>
<td align="char" valign="top" char=".">[&#x2212;1.45, 1.11]</td>
<td align="char" valign="top" char=".">0.92</td>
<td align="char" valign="top" char=".">0.48</td>
</tr>
<tr>
<td align="left" valign="top">12. vowelMP</td>
<td align="char" valign="top" char=".">1.16</td>
<td align="char" valign="top" char=".">0.93</td>
<td align="char" valign="top" char=".">[&#x2212;0.07, 2.65]</td>
<td align="char" valign="top" char=".">16.33</td>
<td align="char" valign="top" char=".">0.94</td>
</tr>
<tr>
<td align="left" valign="top">13. nonMP&#x2013;consMP</td>
<td align="char" valign="top" char=".">1.04</td>
<td align="char" valign="top" char=".">0.89</td>
<td align="char" valign="top" char=".">[&#x2212;0.24, 2.52]</td>
<td align="char" valign="top" char=".">10.06</td>
<td align="char" valign="top" char=".">0.91</td>
</tr>
<tr>
<td align="left" valign="top">14. vowelMP&#x2013;nonMP</td>
<td align="char" valign="top" char=".">0.21</td>
<td align="char" valign="top" char=".">0.98</td>
<td align="char" valign="top" char=".">[&#x2212;1.78, 1.14]</td>
<td align="char" valign="top" char=".">1.44</td>
<td align="char" valign="top" char=".">0.59</td>
</tr>
<tr>
<td align="left" valign="top">15. vowelMP&#x2013;consMP</td>
<td align="char" valign="top" char=".">1.24</td>
<td align="char" valign="top" char=".">1.23</td>
<td align="char" valign="top" char=".">[&#x2212;0.54, 3.29]</td>
<td align="char" valign="top" char=".">7.63</td>
<td align="char" valign="top" char=".">0.88</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Estimate&#x2009;=&#x2009;mean of the effect&#x2019;s posterior distribution. Estimate error&#x2009;=&#x2009;standard deviation of the posterior distribution. 90% CI&#x2009;=&#x2009;90% credibility interval. Evidence ratio&#x2009;=&#x2009;ratio of the posterior probability of the hypothesis to the posterior probability of its alternative</italic>.</p>
</table-wrap-foot>
</table-wrap>
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption><p>Conditional effects of MPT ability and pair type on mean accuracy with 95% credibility intervals.</p></caption>
<graphic xlink:href="fpsyg-13-801263-g003.tif"/>
</fig>
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption><p>Conditional effects of MDT ability and pair type on mean accuracy with 95% credibility intervals.</p></caption>
<graphic xlink:href="fpsyg-13-801263-g004.tif"/>
</fig>
<p>Regarding the extent to which the effects of MPT and MDT differ by pair type, unexpectedly, we find very strong evidence that MPT ability has a stronger impact on accuracy for consMPs than for nonMPs and vowelMPs (see <xref rid="tab1" ref-type="table">Table 1</xref>; hypotheses 7 and 9) and strong evidence for nonMPs compared to vowelMPs (see <xref rid="tab1" ref-type="table">Table 1</xref>; hypothesis 8). Thus, not only does MPT negatively influence the learning of vowelMPs as shown in hypothesis 6, but it also impacts the learning of vowelMPs less strongly than the learning of nonMPs and consMPs. Our finding of strong evidence suggesting that MDT ability has a stronger impact on accuracy for nonMPs and vowelMPs than for consMPs (see <xref rid="tab1" ref-type="table">Table 1</xref>; hypotheses 13 and 15) was also unexpected, as we expected MDT to influence the learning of all pair types equally.</p>
</sec>
<sec id="sec13" sec-type="discussions">
<title>Discussion</title>
<p>In this study, we tested whether music perception abilities impact the learning of novel word pairs in a CSWL paradigm that provides no explicit instruction during the learning phase. Overall, we found that participants were able to learn all novel word-object pairings regardless of the phonological overlap between the novel words, largely replicating (albeit with slightly lower accuracy) previously reported results using the same online protocol (<xref ref-type="bibr" rid="ref44">Escudero et al., 2021</xref>). That is, overall accuracy was comparable for novel words that had large phonological differences, forming non-minimal pairs (nonMPs), and for words that differed in a single consonant (consMPs) or a single vowel (vowelMPs). Regarding the relation between accuracy and music perception abilities, participants with average MPT and MDT scores had similar word learning scores across pair types, with performance for consMPs probably being slightly lower than for the other pair types. Crucially, we found unexpected results for how MPT and MDT influenced word learning performance in nonMPs versus consMPs and vowelMPs, which we discuss below.</p>
<p>As mentioned above, although we expected higher-MPT participants to learn vowel contrasts more easily due to the acoustic similarities between pitched musical sounds and vowels (consonants do not have a clear pitch), we found the opposite effect. It appears that stimuli containing variable pitch information (such as vowels) pose extra difficulty for listeners who are more attuned to such information. A plausible explanation for these results is proposed by <xref ref-type="bibr" rid="ref88">Ong et al. (2017b)</xref>, who suggest that listeners&#x2019; experience is important for their ability to learn new acoustic cues, whether this experience is linguistic (through a native language that distinguishes lexical tone contrasts, such as Cantonese, Mandarin, or Thai) or musical. In a <italic>distributional learning</italic> (a form of statistical learning) experiment with non-native lexical tones, they found that listeners without music or tonal language experience were able to discriminate lexical tones from ambiguous versions of the target tones after a short exposure (<xref ref-type="bibr" rid="ref87">Ong et al., 2015</xref>). In a follow-up study, they found mixed results for <italic>pitch experts</italic>, whom they define as listeners with extensive experience with pitch either through a tonal language or through musical training. Those with a tonal language background were able to learn non-native lexical tones distributionally, but those with a musical background were not. This was unexpected, as musical training has been found to have a positive effect on statistical learning (e.g., <xref ref-type="bibr" rid="ref47">Fran&#x00E7;ois et al., 2013</xref>; <xref ref-type="bibr" rid="ref20">Chobert et al., 2014</xref>), and musicians were expected to perform better due to an improved ability to extract regularities from the input. 
These results led Ong and colleagues to conclude that domain-specific experience with pitch influences the ability to learn non-native lexical tones distributionally (<xref ref-type="bibr" rid="ref88">Ong et al., 2017b</xref>), indicating no cross-domain transfer of music and linguistic abilities in distributional learning.</p>
<p>Ong and colleagues discussed their results in relation to the Second Language Perception (L2LP) model (<xref ref-type="bibr" rid="ref500">Escudero, 2005</xref>; <xref ref-type="bibr" rid="ref129">van Leussen and Escudero, 2015</xref>; <xref ref-type="bibr" rid="ref34">Elvin and Escudero, 2019</xref>; <xref ref-type="bibr" rid="ref35">Elvin et al., 2020</xref>, <xref ref-type="bibr" rid="ref36">2021</xref>; <xref ref-type="bibr" rid="ref137">Yazawa et al., 2020</xref>), suggesting that the tonal language speakers only had to shift their category boundaries to the novel tonal categories, whereas the musicians had to create new categories, which is more difficult (<xref ref-type="bibr" rid="ref88">Ong et al., 2017b</xref>). Another possible explanation is that musicians did not consider the stimuli as speech tones and thus may have processed them as musical stimuli resulting in them not learning the tonal categories (<xref ref-type="bibr" rid="ref88">Ong et al., 2017b</xref>), but this argument assumes that musical pitch cannot be learned distributionally. In a different study, <xref ref-type="bibr" rid="ref89">Ong et al., (2017a)</xref> tested distributional learning of musical pitch with nonmusicians and showed that they were able to acquire pitch from a novel musical system in this manner. This may be different for musicians, who were found to outperform nonmusicians in the discrimination and identification of Cantonese lexical tones (<xref ref-type="bibr" rid="ref91">Ong et al., 2020</xref>).</p>
<p>From studies on distributional learning of pitch and lexical tones, it can be concluded that cross-domain transfer between speech and music largely depends on the listener&#x2019;s musical or linguistic experience (<xref ref-type="bibr" rid="ref87">Ong et al., 2015</xref>, <xref ref-type="bibr" rid="ref90">2016</xref>, <xref ref-type="bibr" rid="ref89">2017a</xref>,<xref ref-type="bibr" rid="ref88">b</xref>, <xref ref-type="bibr" rid="ref91">2020</xref>). Nonmusicians without tonal language experience can learn novel pitch contrasts in both the speech and the music domain, but the situation is more complex for pitch experts, suggesting that those with extensive music experience may struggle more than those with tonal experience. However, an important difference between Ong et al.&#x2019;s studies and the current study is that they tested listeners at both ends of the experience spectrum, while we tested listeners ranging from the lower to middle end of the music experience spectrum based on their music perception skills. By using music perception tasks, we were able to classify participants using a continuous predictor rather than splitting them into groups, which allowed us to uncover more detailed information about what happens with speech learning as music perception skills increase. A further difference is in the stimuli used, as the lexical and musical tones used in <xref ref-type="bibr" rid="ref87">Ong et al. (2015</xref>, <xref ref-type="bibr" rid="ref90">2016</xref>, <xref ref-type="bibr" rid="ref89">2017a</xref>,<xref ref-type="bibr" rid="ref88">b</xref>, <xref ref-type="bibr" rid="ref91">2020</xref>) contained many variable pitches along a continuum, while our stimuli had limited and uncontrolled pitch variation. Specifically, we focused on word learning of naturally produced novel words, where pitch variability was not consistent among the different words and pair types. 
Thus, listeners in the present study may have used other acoustic cues that are not pitch-related to discriminate and learn the novel words.</p>
<p>Given that listeners with strong pitch perception abilities are more likely to use pitch as a cue to discriminate between stimuli (<xref ref-type="bibr" rid="ref100">Perfors and Ong, 2012</xref>; <xref ref-type="bibr" rid="ref88">Ong et al., 2017b</xref>, <xref ref-type="bibr" rid="ref91">2020</xref>), our vowelMP stimuli may have been particularly challenging for them due to the use of infant-directed speech (IDS). IDS is the speech style or register typically used by mothers and caregivers when speaking to babies and is characterized by larger pitch variations. Many studies have shown that IDS can facilitate word learning in infants (<xref ref-type="bibr" rid="ref76">Ma et al., 2011</xref>; <xref ref-type="bibr" rid="ref51">Graf Estes and Hurley, 2013</xref>) and adults (<xref ref-type="bibr" rid="ref49">Golinkoff and Alioto, 1995</xref>) due to higher salience leading to enhanced attentional processing (<xref ref-type="bibr" rid="ref49">Golinkoff and Alioto, 1995</xref>; <xref ref-type="bibr" rid="ref66">Kuhl et al., 1997</xref>; <xref ref-type="bibr" rid="ref56">Houston-Price and Law, 2013</xref>; <xref ref-type="bibr" rid="ref31">Ellis, 2016</xref>). Despite facilitating infant and adult speech learning, IDS may have a negative effect on those with strong music perception abilities, as varying pitch contours might make them think they are hearing different words when only one word is presented. Unexpectedly, MPT ability affected the learning of consMPs and nonMPs more strongly than that of vowelMPs. As vowelMPs naturally contain more pitch variation, they were expected to be the most difficult to learn; hence, the influence of IDS is likely stronger on consMPs and nonMPs than on vowelMPs. A similar result of hearing multiple words instead of one due to the use of IDS has been found in a prior CSWL study (Escudero et al., under review), where the target population consisted of native Mandarin speakers who were L2 English learners. 
Specifically, word pairs containing non-native vowel contrasts with IDS pitch fluctuations were difficult to learn for L1 Mandarin L2 English learners.</p>
<p>Thus, in populations where pitch variations indicate different lexical meanings, such as native speakers of Mandarin (<xref ref-type="bibr" rid="ref200">Han, 2018</xref>), IDS can be problematic and impair word learning, as participants might perceive multiple categories where only one is presented (<xref ref-type="bibr" rid="ref38">Escudero and Boersma, 2002</xref>; <xref ref-type="bibr" rid="ref400">Elvin et al., 2014</xref>; <xref ref-type="bibr" rid="ref129">van Leussen and Escudero, 2015</xref>). The impact of a learner&#x2019;s native language on novel language learning has been explained by L2 speech theories (e.g., <xref ref-type="bibr" rid="ref46">Flege, 1995</xref>; <xref ref-type="bibr" rid="ref500">Escudero, 2005</xref>; <xref ref-type="bibr" rid="ref7">Best and Tyler, 2007</xref>; <xref ref-type="bibr" rid="ref129">van Leussen and Escudero, 2015</xref>). In particular, the L2LP model (<xref ref-type="bibr" rid="ref500">Escudero, 2005</xref>; <xref ref-type="bibr" rid="ref129">van Leussen and Escudero, 2015</xref>; <xref ref-type="bibr" rid="ref34">Elvin and Escudero, 2019</xref>; <xref ref-type="bibr" rid="ref35">Elvin et al., 2020</xref>, <xref ref-type="bibr" rid="ref36">2021</xref>; <xref ref-type="bibr" rid="ref137">Yazawa et al., 2020</xref>) proposes three learning problems that arise when L1 and L2 categories differ in number or in phonetic realization. It is the only model of lexical development and word learning that treats hearing more category differences than the target language produces as a learning problem (<xref ref-type="bibr" rid="ref129">van Leussen and Escudero, 2015</xref>; <xref ref-type="bibr" rid="ref39">Escudero and Hayes-Harb, 2021</xref>). 
Specifically, listeners can categorize binary L2 contrasts into more than two L1 categories, which is referred to as Multiple Category Assimilation (MCA, L2LP; <xref ref-type="bibr" rid="ref38">Escudero and Boersma, 2002</xref>) and can lead to a <italic>subset problem</italic> (<xref ref-type="bibr" rid="ref38">Escudero and Boersma, 2002</xref>; <xref ref-type="bibr" rid="ref500">Escudero, 2005</xref>; <xref ref-type="bibr" rid="ref300">Elvin and Escudero, 2014</xref>, <xref ref-type="bibr" rid="ref34">2019</xref>). A subset problem occurs when an L2 category does not exist in a listener&#x2019;s L1 but is acoustically similar to two or more separate L1 categories and thus is perceived as more than one L1 sound, with no overt information from the target language that will allow the learner to stop hearing the extra category or stop activating <italic>irrelevant</italic> or <italic>spurious</italic> lexical items (<xref ref-type="bibr" rid="ref38">Escudero and Boersma, 2002</xref>; <xref ref-type="bibr" rid="ref500">Escudero, 2005</xref>; <xref ref-type="bibr" rid="ref300">Elvin and Escudero, 2014</xref>, <xref ref-type="bibr" rid="ref34">2019</xref>).</p>
<p>With regard to our CSWL task, we expect that using <italic>adult-directed speech</italic> (ADS) without these additional pitch fluctuations would improve learning of the nonMPs and consMPs for tonal language speakers, but not of the vowelMPs. With IDS, nonmusicians and non-tonal speakers show a pattern in which performance is lowest for the pair types with the highest pitch variability (i.e., vowelMPs). The use of IDS, which adds even more pitch variability than is naturally present in the vowelMPs, seems to pose problems for learners who are not music experts but have some music perception skills. For tonal language speakers, the use of IDS poses problems in general, as they consistently use pitch information to discriminate between all pair types. If pitch variability is the main predictor of performance in this CSWL task, then music experts (i.e., musicians) should struggle more with the vowelMPs than the nonmusicians tested here, but should perform better on the nonMPs and consMPs than the tonal language speakers discussed earlier in Escudero et al. (under review).</p>
<p>Regarding the results for the MDT, although not decisive, the evidence suggests that MDT ability more strongly influences accuracy for nonMPs and vowelMPs than for consMPs. The MDT relies heavily on auditory short-term memory (<xref ref-type="bibr" rid="ref30">Dowling, 1978</xref>; <xref ref-type="bibr" rid="ref54">Harrison et al., 2017</xref>; <xref ref-type="bibr" rid="ref55">Harrison and M&#x00FC;llensiefen, 2018</xref>). It has been suggested that auditory short-term memory for consonants is distinct from that for vowels (<xref ref-type="bibr" rid="ref104">Pisoni, 1975</xref>), as explained by the cue-duration hypothesis (<xref ref-type="bibr" rid="ref103">Pisoni, 1973</xref>), according to which the acoustic features needed to discriminate between two different consonants are shorter, and thus less well represented in auditory short-term memory, than those of vowels (<xref ref-type="bibr" rid="ref19">Chen et al., 2020</xref>). In addition, seminal studies on speech sounds have suggested that consonants may be stored differently in short-term memory than vowels (<xref ref-type="bibr" rid="ref21">Crowder, 1971</xref>, <xref ref-type="bibr" rid="ref22">1973a</xref>,<xref ref-type="bibr" rid="ref23">b</xref>), with the idea that vowels are processed at an earlier stage than consonants (<xref ref-type="bibr" rid="ref24">Crowder and Morton, 1969</xref>). It is possible that a different type of auditory memory is activated for nonMPs, one that does not rely as strongly on discriminating the acoustic features of the stimuli as is needed to distinguish between phonologically overlapping stimuli. As similarly suggested in <xref ref-type="bibr" rid="ref44">Escudero et al. (2021)</xref>, this could be tested using time-sensitive neurophysiological methods, such as electroencephalography (EEG).</p>
<p>Some limitations of this study must be noted. Even though we tested for perceptual skills, it is possible that accuracy also depends on other skills, such as how well a listener is able to form crossmodal associations. Likewise, it is possible that general cognitive abilities impact the learning of novel words in an ambiguous word learning paradigm. As we find some differences in accuracy for the different pair types between the current study and prior CSWL studies using the same paradigm (<xref ref-type="bibr" rid="ref90">Escudero et al., 2016</xref>; <xref ref-type="bibr" rid="ref84">Mulak et al., 2019</xref>), individual differences, such as the ability to form crossmodal associations or general cognitive abilities, may be the cause of these discrepancies. However, other methodological differences between the current study and prior CSWL studies might also have played a role, such as the number of trials and the number of responses used in the learning and test phases. We are currently replicating the learning and testing phases of those previous studies using online testing to determine whether the number of trials is the source of the difference. If it is not, future studies can look further into other possible sources, such as general cognitive abilities. Regarding the use of IDS, it is an empirical question whether adults in general perform better with stimuli characterized by shorter durations, non-enhanced differences between vowels, and neutral prosodic contours (as in ADS). In fact, we have previously found that enhanced vowel differences similar to those typical of IDS facilitate phonetic discrimination for adult listeners (<xref ref-type="bibr" rid="ref37">Escudero et al., 2011</xref>; <xref ref-type="bibr" rid="ref45">Escudero and Williams, 2014</xref>). Additionally, the degree of novelty of the auditory and visual stimuli may impact accuracy responses. 
Even though language background did not have an influence on accuracy, future studies could consider measuring participants&#x2019; familiarity with the stimuli. Another possible limitation is that we did not collect information regarding participants&#x2019; headphones. However, as part of our pre-registered protocol, we did check whether participants were able to hear the stimuli and were wearing headphones.</p>
<p>Overall, the results show that the tested music perception abilities affect the learning of words that differ in a single consonant or in a single vowel in distinct and complex ways. Pitch perception is an important factor in novel word learning, to the extent that those with stronger pitch perception skills are better at distinguishing consonant contrasts but apparently <italic>too</italic> good at distinguishing vowel contrasts. Using stimuli produced in adult-directed speech, our follow-up research will establish whether the negative correlation between pitch perception and accuracy for words distinguished by a single vowel is due to our use of IDS and its concomitant large pitch variations. We also find that consonants and vowels are learned differently by those with stronger melodic discrimination skills, which may reflect improved auditory short-term memory. In contrast to the MPT, an increase in MDT score leads to better learning of words distinguished by a single vowel than of those distinguished by a single consonant, which may be connected to better auditory short-term memory for vowels. The contrasting results for the two tested music perception skills may reflect different stages of processing. Our results have one clear implication for theories of cross-domain transfer between music and language: considering populations along the entire spectrum of musicality and linguistic pitch experience is the only way to uncover exactly where and when problems with word learning occur.</p>
</sec>
<sec id="sec14" sec-type="conclusions">
<title>Conclusion</title>
<p>We tested whether specific music perception abilities impact the learning of minimal pair types in adults who were not selected for their musical abilities. Using a CSWL paradigm, we have shown that pitch perception and auditory short-term memory affect the learning of vowel and consonant minimal word pairs, with vowels and consonants being impacted differently. We suggest this may be due to the pitch fluctuations characteristic of our stimuli, namely words produced in infant-directed speech (IDS). As with the patterns observed in native speakers of tonal languages, this speech register may lead listeners to perceive more distinctions than intended. In future studies, we aim to test the role of IDS compared to adult-directed speech, how specific levels of music training impact performance in CSWL, and the differential storage of vowels versus consonants.</p>
</sec>
<sec id="sec15" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="sec16">
<title>Ethics Statement</title>
<p>The studies involving human participants were reviewed and approved by the Western Sydney University Human Research Ethics Committee (H11022). The participants provided their written informed consent to participate in this study.</p>
</sec>
<sec id="sec17">
<title>Author Contributions</title>
<p>ES and PE conceived the initial experiments. ES was responsible for overseeing data collection and wrote the initial draft. ES and AM analyzed the data. ES, AM, and PE wrote the paper. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="sec41" sec-type="funding-information">
<title>Funding</title>
<p>Data collection was funded by a Transdisciplinary &#x0026; Innovation Grant from the Australian Research Centre for the Dynamics of Language (project number TIG1112020) awarded to ES. PE&#x2019;s and ES&#x2019; work and the article publication fees were funded by an Australian Research Council Future Fellowship (FT160100514) awarded to PE. AM&#x2019;s work was funded by an Australian Research Council Discovery Early Career Researcher Award (project number DE170100353).</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec240" sec-type="disclaimer">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack>
<p>We would like to acknowledge and thank Deeahn Sako, Madeleine Leehy, and Christopher Piller for their help with data collection and the participants for their time and participation.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="ref1"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Agresti</surname> <given-names>A.</given-names></name></person-group> (<year>2010</year>). <source>Analysis of Ordinal Categorical Data.</source> <publisher-loc>New York</publisher-loc>: <publisher-name>John Wiley &#x0026; Sons</publisher-name>.</citation></ref>
<ref id="ref2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anvari</surname> <given-names>S. H.</given-names></name> <name><surname>Trainor</surname> <given-names>L. J.</given-names></name> <name><surname>Woodside</surname> <given-names>J.</given-names></name> <name><surname>Levy</surname> <given-names>B. A.</given-names></name></person-group> (<year>2002</year>). <article-title>Relations among musical skills, phonological processing, and early reading ability in pre-school children</article-title>. <source>J. Exp. Child Psychol.</source> <volume>83</volume>, <fpage>111</fpage>&#x2013;<lpage>130</lpage>. doi: <pub-id pub-id-type="doi">10.1016/s0022-0965(02)00124-8</pub-id></citation></ref>
<ref id="ref3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barwick</surname> <given-names>J.</given-names></name> <name><surname>Valentine</surname> <given-names>E.</given-names></name> <name><surname>West</surname> <given-names>R.</given-names></name> <name><surname>Wilding</surname> <given-names>J.</given-names></name></person-group> (<year>1989</year>). <article-title>Relations between reading and musical abilities</article-title>. <source>Br. J. Educ. Psychol.</source> <volume>59</volume>, <fpage>253</fpage>&#x2013;<lpage>257</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.2044-8279.1989.tb03097.x</pub-id>, PMID: <pub-id pub-id-type="pmid">2789961</pub-id></citation></ref>
<ref id="ref4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bergman Nutley</surname> <given-names>S.</given-names></name> <name><surname>Darki</surname> <given-names>F.</given-names></name> <name><surname>Klingberg</surname> <given-names>T.</given-names></name></person-group> (<year>2014</year>). <article-title>Music practice is associated with development of working memory during childhood and adolescence</article-title>. <source>Front. Hum. Neurosci.</source> <volume>7</volume>:<fpage>926</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fnhum.2013.00926</pub-id>, PMID: <pub-id pub-id-type="pmid">24431997</pub-id></citation></ref>
<ref id="ref5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Besson</surname> <given-names>M.</given-names></name> <name><surname>Chobert</surname> <given-names>J.</given-names></name> <name><surname>Marie</surname> <given-names>C.</given-names></name></person-group> (<year>2011</year>). <article-title>Transfer of training between music and speech: common processing, attention, and memory</article-title>. <source>Front. Psychol.</source> <volume>2</volume>:<fpage>94</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2011.00094</pub-id></citation></ref>
<ref id="ref6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Besson</surname> <given-names>M.</given-names></name> <name><surname>Sch&#x00F6;n</surname> <given-names>D.</given-names></name> <name><surname>Moreno</surname> <given-names>S.</given-names></name> <name><surname>Santos</surname> <given-names>A.</given-names></name> <name><surname>Magne</surname> <given-names>C.</given-names></name></person-group> (<year>2007</year>). <article-title>Influence of musical expertise and musical training on pitch processing in music and language</article-title>. <source>Restor. Neurol. Neurosci.</source> <volume>25</volume>, <fpage>399</fpage>&#x2013;<lpage>410</lpage>. PMID: <pub-id pub-id-type="pmid">17943015</pub-id></citation></ref>
<ref id="ref7"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Best</surname> <given-names>C. T.</given-names></name> <name><surname>Tyler</surname> <given-names>M. D.</given-names></name></person-group> (<year>2007</year>). &#x201C;<article-title>Nonnative and second-language speech perception: commonalities and complementaries</article-title>,&#x201D; in <source>Language Experience in Second Language Speech Learning: In Honor of James Emil Flege.</source> eds. <person-group person-group-type="editor"><name><surname>Bohn</surname> <given-names>O.-S.</given-names></name> <name><surname>Munro</surname> <given-names>M. J.</given-names></name></person-group> (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>John Benjamins</publisher-name>), <fpage>13</fpage>&#x2013;<lpage>34</lpage>.</citation></ref>
<ref id="ref8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bidelman</surname> <given-names>G. M.</given-names></name> <name><surname>Alain</surname> <given-names>C.</given-names></name></person-group> (<year>2015</year>). <article-title>Musical training orchestrates coordinated neuroplasticity in auditory brainstem and cortex to counteract age-related declines in categorical vowel perception</article-title>. <source>J. Neurosci.</source> <volume>35</volume>, <fpage>1240</fpage>&#x2013;<lpage>1249</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.3292-14.2015</pub-id>, PMID: <pub-id pub-id-type="pmid">25609638</pub-id></citation></ref>
<ref id="ref9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bidelman</surname> <given-names>G. M.</given-names></name> <name><surname>Hutka</surname> <given-names>S.</given-names></name> <name><surname>Moreno</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>Tone language speakers and musicians share enhanced perceptual and cognitive abilities for musical pitch: evidence for bidirectionality between the domains of language and music</article-title>. <source>PLoS One</source> <volume>8</volume>:<fpage>e60676</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0060676</pub-id></citation></ref>
<ref id="ref10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bigand</surname> <given-names>E.</given-names></name> <name><surname>Poulin-Charronnat</surname> <given-names>B.</given-names></name></person-group> (<year>2006</year>). <article-title>Are we &#x201C;experienced listeners&#x201D;? A review of the musical capacities that do not depend on formal musical training</article-title>. <source>Cognition</source> <volume>100</volume>, <fpage>100</fpage>&#x2013;<lpage>130</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cognition.2005.11.007</pub-id>, PMID: <pub-id pub-id-type="pmid">16412412</pub-id></citation></ref>
<ref id="ref11"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Bittner</surname> <given-names>R.</given-names></name> <name><surname>Salamon</surname> <given-names>J.</given-names></name> <name><surname>Tierney</surname> <given-names>M.</given-names></name> <name><surname>Mauch</surname> <given-names>M.</given-names></name> <name><surname>Cannam</surname> <given-names>C.</given-names></name> <name><surname>Bello</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). MedleyDB: a multitrack dataset for annotation-intensive MIR research. Paper presented at the International Society for Music Information Retrieval (ISMIR), Taipei, Taiwan.</citation></ref>
<ref id="ref12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boebinger</surname> <given-names>D.</given-names></name> <name><surname>Evans</surname> <given-names>S.</given-names></name> <name><surname>Rosen</surname> <given-names>S.</given-names></name> <name><surname>Lima</surname> <given-names>C. F.</given-names></name> <name><surname>Manly</surname> <given-names>T.</given-names></name> <name><surname>Scott</surname> <given-names>S. K.</given-names></name></person-group> (<year>2015</year>). <article-title>Musicians and non-musicians are equally adept at perceiving masked speech</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>137</volume>, <fpage>378</fpage>&#x2013;<lpage>387</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.4904537</pub-id>, PMID: <pub-id pub-id-type="pmid">25618067</pub-id></citation></ref>
<ref id="ref13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>B&#x00FC;rkner</surname> <given-names>P.-C.</given-names></name></person-group> (<year>2017</year>). <article-title>Brms: an R package for Bayesian multilevel models using Stan</article-title>. <source>J. Stat. Softw.</source> <volume>80</volume>, <fpage>1</fpage>&#x2013;<lpage>28</lpage>. doi: <pub-id pub-id-type="doi">10.18637/jss.v080.i01</pub-id></citation></ref>
<ref id="ref14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>B&#x00FC;rkner</surname> <given-names>P.-C.</given-names></name></person-group> (<year>2018</year>). <article-title>Advanced Bayesian multilevel modeling with the R package brms</article-title>. <source>R Journal</source> <volume>10</volume>, <fpage>395</fpage>&#x2013;<lpage>411</lpage>. doi: <pub-id pub-id-type="doi">10.32614/RJ-2018-017</pub-id></citation></ref>
<ref id="ref15"><citation citation-type="other"><person-group person-group-type="author"><name><surname>B&#x00FC;rkner</surname> <given-names>P.-C.</given-names></name></person-group> (<year>2020</year>). Bayesian Item Response Modeling in R with brms and Stan. <italic>arXiv</italic> <comment>[Epub ahead of print]</comment></citation></ref>
<ref id="ref16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burnham</surname> <given-names>D.</given-names></name> <name><surname>Brooker</surname> <given-names>R.</given-names></name> <name><surname>Reid</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>The effects of absolute pitch ability and musical training on lexical tone perception</article-title>. <source>Psychol. Music</source> <volume>43</volume>, <fpage>881</fpage>&#x2013;<lpage>897</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0305735614546359</pub-id></citation></ref>
<ref id="ref17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chandrasekaran</surname> <given-names>B.</given-names></name> <name><surname>Krishnan</surname> <given-names>A.</given-names></name> <name><surname>Gandour</surname> <given-names>J. T.</given-names></name></person-group> (<year>2009</year>). <article-title>Relative influence of musical and linguistic experience on early cortical processing of pitch contours</article-title>. <source>Brain Lang.</source> <volume>108</volume>, <fpage>1</fpage>&#x2013;<lpage>9</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.bandl.2008.02.001</pub-id>, PMID: <pub-id pub-id-type="pmid">18343493</pub-id></citation></ref>
<ref id="ref18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>W.</given-names></name> <name><surname>Cheng</surname> <given-names>J.</given-names></name> <name><surname>Allaire</surname> <given-names>J. J.</given-names></name> <name><surname>Xi</surname> <given-names>Y.</given-names></name> <name><surname>McPherson</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Shiny: web application framework for R</article-title>. <source>Tech. Innov. Stat. Educ.</source> <volume>1</volume>:<fpage>7492</fpage>. doi: <pub-id pub-id-type="doi">10.5070/T591027492</pub-id></citation></ref>
<ref id="ref19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>S.</given-names></name> <name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>Wayland</surname> <given-names>R.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>How musical experience affects tone perception efficiency by musicians of tonal and non-tonal speakers?</article-title> <source>PLoS One</source> <volume>15</volume>:<fpage>e0232514</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0232514</pub-id>, PMID: <pub-id pub-id-type="pmid">32384088</pub-id></citation></ref>
<ref id="ref20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chobert</surname> <given-names>J.</given-names></name> <name><surname>Francois</surname> <given-names>C.</given-names></name> <name><surname>Velay</surname> <given-names>J. L.</given-names></name> <name><surname>Besson</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>Twelve months of active musical training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and voice onset time</article-title>. <source>Cereb. Cortex</source> <volume>24</volume>, <fpage>956</fpage>&#x2013;<lpage>967</lpage>. doi: <pub-id pub-id-type="doi">10.1093/cercor/bhs377</pub-id>, PMID: <pub-id pub-id-type="pmid">23236208</pub-id></citation></ref>
<ref id="ref21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crowder</surname> <given-names>R. G.</given-names></name></person-group> (<year>1971</year>). <article-title>The sound of vowels and consonants in immediate memory</article-title>. <source>J. Verbal Learn. Verbal Behav.</source> <volume>10</volume>, <fpage>587</fpage>&#x2013;<lpage>596</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0022-5371(71)80063-4</pub-id></citation></ref>
<ref id="ref22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crowder</surname> <given-names>R. G.</given-names></name></person-group> (<year>1973a</year>). <article-title>Representation of speech sounds in precategorical acoustic storage</article-title>. <source>J. Exp. Psychol.</source> <volume>98</volume>, <fpage>14</fpage>&#x2013;<lpage>24</lpage>. doi: <pub-id pub-id-type="doi">10.1037/h0034286</pub-id>, PMID: <pub-id pub-id-type="pmid">4704206</pub-id></citation></ref>
<ref id="ref23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crowder</surname> <given-names>R. G.</given-names></name></person-group> (<year>1973b</year>). <article-title>Precategorical acoustic storage for vowels of short and long duration</article-title>. <source>Percept. Psychophys.</source> <volume>13</volume>, <fpage>502</fpage>&#x2013;<lpage>506</lpage>. doi: <pub-id pub-id-type="doi">10.3758/BF03205809</pub-id></citation></ref>
<ref id="ref24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crowder</surname> <given-names>R. G.</given-names></name> <name><surname>Morton</surname> <given-names>J.</given-names></name></person-group> (<year>1969</year>). <article-title>Precategorical acoustic storage (PAS)</article-title>. <source>Percept. Psychophys.</source> <volume>5</volume>, <fpage>365</fpage>&#x2013;<lpage>373</lpage>. doi: <pub-id pub-id-type="doi">10.3758/BF03210660</pub-id></citation></ref>
<ref id="ref25"><citation citation-type="book"><person-group person-group-type="author"><name><surname>de Ayala</surname> <given-names>R. J.</given-names></name></person-group> (<year>2009</year>). <source>The Theory and Practice of Item Response Theory.</source> <publisher-loc>New York, NY</publisher-loc>: <publisher-name>The Guilford Press</publisher-name>.</citation></ref>
<ref id="ref26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deg&#x00E9;</surname> <given-names>F.</given-names></name> <name><surname>Schwarzer</surname> <given-names>G.</given-names></name></person-group> (<year>2011</year>). <article-title>The effect of a music program on phonological awareness in preschoolers</article-title>. <source>Front. Psychol.</source> <volume>2</volume>:<fpage>124</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2011.00124</pub-id>, PMID: <pub-id pub-id-type="pmid">21734895</pub-id></citation></ref>
<ref id="ref27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dittinger</surname> <given-names>E.</given-names></name> <name><surname>Barbaroux</surname> <given-names>M.</given-names></name> <name><surname>D&#x2019;Imperio</surname> <given-names>M.</given-names></name> <name><surname>J&#x00E4;ncke</surname> <given-names>L.</given-names></name> <name><surname>Elmer</surname> <given-names>S.</given-names></name> <name><surname>Besson</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>Professional music training and novel word learning: from faster semantic encoding to longer-lasting word representations</article-title>. <source>J. Cogn. Neurosci.</source> <volume>28</volume>, <fpage>1584</fpage>&#x2013;<lpage>1602</lpage>. doi: <pub-id pub-id-type="doi">10.1162/jocn_a_00997</pub-id>, PMID: <pub-id pub-id-type="pmid">27315272</pub-id></citation></ref>
<ref id="ref28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dittinger</surname> <given-names>E.</given-names></name> <name><surname>Chobert</surname> <given-names>J.</given-names></name> <name><surname>Ziegler</surname> <given-names>J. C.</given-names></name> <name><surname>Besson</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Fast brain plasticity during word learning in musically-trained children</article-title>. <source>Front. Hum. Neurosci.</source> <volume>11</volume>:<fpage>233</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fnhum.2017.00233</pub-id>, PMID: <pub-id pub-id-type="pmid">28553213</pub-id></citation></ref>
<ref id="ref29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dittinger</surname> <given-names>E.</given-names></name> <name><surname>Scherer</surname> <given-names>J.</given-names></name> <name><surname>J&#x00E4;ncke</surname> <given-names>L.</given-names></name> <name><surname>Besson</surname> <given-names>M.</given-names></name> <name><surname>Elmer</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Testing the influence of musical expertise on novel word learning across the lifespan using a cross-sectional approach in children, young adults and older adults</article-title>. <source>Brain Lang.</source> <volume>198</volume>:<fpage>104678</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.bandl.2019.104678</pub-id>, PMID: <pub-id pub-id-type="pmid">31450024</pub-id></citation></ref>
<ref id="ref30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dowling</surname> <given-names>W. J.</given-names></name></person-group> (<year>1978</year>). <article-title>Scale and contour: two components of a theory of memory for melodies</article-title>. <source>Psychol. Rev.</source> <volume>85</volume>, <fpage>341</fpage>&#x2013;<lpage>354</lpage>. doi: <pub-id pub-id-type="doi">10.1037/0033-295X.85.4.341</pub-id></citation></ref>
<ref id="ref31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ellis</surname> <given-names>N. C.</given-names></name></person-group> (<year>2016</year>). <article-title>Salience, cognition, language complexity, and complex adaptive systems</article-title>. <source>Stud. Second. Lang. Acquis.</source> <volume>38</volume>, <fpage>341</fpage>&#x2013;<lpage>351</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S027226311600005X</pub-id></citation></ref>
<ref id="ref32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elmer</surname> <given-names>S.</given-names></name> <name><surname>Klein</surname> <given-names>C.</given-names></name> <name><surname>K&#x00FC;hnis</surname> <given-names>J.</given-names></name> <name><surname>Liem</surname> <given-names>F.</given-names></name> <name><surname>Meyer</surname> <given-names>M.</given-names></name> <name><surname>J&#x00E4;ncke</surname> <given-names>L.</given-names></name></person-group> (<year>2014</year>). <article-title>Music and language expertise influence the categorization in musically trained and untrained subjects</article-title>. <source>Cereb. Cortex</source> <volume>22</volume>, <fpage>650</fpage>&#x2013;<lpage>658</lpage>. doi: <pub-id pub-id-type="doi">10.1093/cercor/bhr142</pub-id></citation></ref>
<ref id="ref33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elmer</surname> <given-names>S.</given-names></name> <name><surname>Meyer</surname> <given-names>M.</given-names></name> <name><surname>J&#x00E4;ncke</surname> <given-names>L.</given-names></name></person-group> (<year>2012</year>). <article-title>Neurofunctional and behavioral correlates of phonetic and temporal categorization in musically trained and untrained subjects</article-title>. <source>Cereb. Cortex</source> <volume>22</volume>, <fpage>650</fpage>&#x2013;<lpage>658</lpage>. doi: <pub-id pub-id-type="doi">10.1093/cercor/bhr142</pub-id>, PMID: <pub-id pub-id-type="pmid">21680844</pub-id></citation></ref>
<ref id="ref300"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Elvin</surname> <given-names>J.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2014</year>). &#x201C;<article-title>Perception of Brazilian Portuguese Vowels by Australian English and Spanish Listeners</article-title>,&#x201D; in <source>Proceedings of the International Symposium on the Acquisition of Second Language Speech Concordia Working Papers in Applied Linguistics.</source> <volume>5</volume>, <fpage>145</fpage>&#x2013;<lpage>156</lpage>.</citation></ref>
<ref id="ref34"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Elvin</surname> <given-names>J.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Cross-linguistic influence in second language speech: implications for learning and teaching</article-title>,&#x201D; in <source>Cross-Linguistic Influence: From Empirical Evidence to Classroom Practice.</source> eds. <person-group person-group-type="editor"><name><surname>Juncal Gutierrez-Mangado</surname> <given-names>M. J.</given-names></name> <name><surname>Mart&#x00ED;nez-Adri&#x00E1;n</surname> <given-names>M.</given-names></name> <name><surname>Gallardo-del-Puerto</surname> <given-names>F.</given-names></name></person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>20</lpage>.</citation></ref>
<ref id="ref400"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elvin</surname> <given-names>J.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name> <name><surname>Vasiliev</surname> <given-names>P.</given-names></name></person-group> (<year>2014</year>). <article-title>Spanish is better than English for discriminating Portuguese vowels: acoustic similarity versus vowel inventory</article-title>. <source>Front. Psychol.</source> <volume>5</volume>:<fpage>1188</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2014.01188</pub-id>, PMID: <pub-id pub-id-type="pmid">25866868</pub-id></citation></ref>
<ref id="ref35"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Elvin</surname> <given-names>J.</given-names></name> <name><surname>Williams</surname> <given-names>D.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). &#x201C;<article-title>Learning to perceive, produce and recognise words in a non-native language</article-title>,&#x201D; in <source>Linguistic Approaches to Portuguese as an Additional Language.</source> eds. <person-group person-group-type="editor"><name><surname>Molsing</surname> <given-names>K. V.</given-names></name> <name><surname>Perna</surname> <given-names>C. B. L.</given-names></name> <name><surname>Iba&#x00F1;os</surname> <given-names>A. M. T.</given-names></name></person-group> (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>John Benjamins Publishing Company</publisher-name>).</citation></ref>
<ref id="ref36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elvin</surname> <given-names>J.</given-names></name> <name><surname>Williams</surname> <given-names>D.</given-names></name> <name><surname>Shaw</surname> <given-names>J. A.</given-names></name> <name><surname>Best</surname> <given-names>C. T.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2021</year>). <article-title>The role of acoustic similarity and non-native categorisation in predicting non-native discrimination: Brazilian Portuguese vowels by English vs. Spanish listeners</article-title>. <source>Languages</source> <volume>6</volume>:<fpage>44</fpage>. doi: <pub-id pub-id-type="doi">10.3390/languages6010044</pub-id></citation></ref>
<ref id="ref37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Escudero</surname> <given-names>P.</given-names></name> <name><surname>Benders</surname> <given-names>T.</given-names></name> <name><surname>Wanrooij</surname> <given-names>K.</given-names></name></person-group> (<year>2011</year>). <article-title>Enhanced bimodal distributions facilitate the learning of second language vowels</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>130</volume>, <fpage>EL206</fpage>&#x2013;<lpage>EL212</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.3629144</pub-id>, PMID: <pub-id pub-id-type="pmid">21974493</pub-id></citation></ref>
<ref id="ref38"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Escudero</surname> <given-names>P.</given-names></name> <name><surname>Boersma</surname> <given-names>P.</given-names></name></person-group> (<year>2002</year>). &#x201C;The subset problem in L2 perceptual development: multiple- category assimilation by Dutch learners of Spanish.&#x201D; in <italic>Proceedings of the 26th Annual Boston University Conference on Language Development</italic>. eds. B. Skarabela, S. Fish, and A. H.-J. Do. November 2&#x2013;4, 2001. Somerville, MA: Cascadilla Press, 208&#x2013;219.</citation></ref>
<ref id="ref39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Escudero</surname> <given-names>P.</given-names></name> <name><surname>Hayes-Harb</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>The ontogenesis model may provide a useful guiding framework, but lacks explanatory power for the nature and development of L2 lexical representation</article-title>. <source>Biling. Lang. Congn.</source> <fpage>1</fpage>&#x2013;<lpage>2</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S1366728921000602</pub-id></citation></ref>
<ref id="ref41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Escudero</surname> <given-names>P.</given-names></name> <name><surname>Mulak</surname> <given-names>K. E.</given-names></name> <name><surname>Fu</surname> <given-names>C. S.</given-names></name> <name><surname>Singh</surname> <given-names>L.</given-names></name></person-group> (<year>2016a</year>). <article-title>More limitations to monolingualism: bilinguals outperform monolinguals in implicit word learning</article-title>. <source>Front. Psychol.</source> <volume>7</volume>:<fpage>1218</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2016.01218</pub-id>, PMID: <pub-id pub-id-type="pmid">27574513</pub-id></citation></ref>
<ref id="ref42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Escudero</surname> <given-names>P.</given-names></name> <name><surname>Mulak</surname> <given-names>K. E.</given-names></name> <name><surname>Vlach</surname> <given-names>H. A.</given-names></name></person-group> (<year>2016b</year>). <article-title>Cross-situational word learning of minimal word pairs</article-title>. <source>Cogn. Sci.</source> <volume>40</volume>, <fpage>455</fpage>&#x2013;<lpage>465</lpage>. doi: <pub-id pub-id-type="doi">10.1111/cogs.12243</pub-id>, PMID: <pub-id pub-id-type="pmid">25866868</pub-id></citation></ref>
<ref id="ref43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Escudero</surname> <given-names>P.</given-names></name> <name><surname>Mulak</surname> <given-names>K. E.</given-names></name> <name><surname>Vlach</surname> <given-names>H. A.</given-names></name></person-group> (<year>2016c</year>). <article-title>Infants encode phonetic detail during cross-situational word learning</article-title>. <source>Front. Psychol.</source> <volume>7</volume>:<fpage>1419</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2016.01419</pub-id>, PMID: <pub-id pub-id-type="pmid">27708605</pub-id></citation></ref>
<ref id="ref44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Escudero</surname> <given-names>P.</given-names></name> <name><surname>Smit</surname> <given-names>E. A.</given-names></name> <name><surname>Angwin</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>Investigating orthographic versus auditory cross-situational word learning with online and lab-based research</article-title>. <source>PsyArXive</source>. doi: <pub-id pub-id-type="doi">10.31234/osf.io/tpn5e</pub-id> <comment>[Epub ahead of print]</comment></citation></ref>
<ref id="ref500"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2005</year>). <italic>Linguistic Perception and Second Language Acquisition: Explaining the Attainment of Optimal Phonological Categorization</italic>. LOT Dissertation Series 113, Utrecht University.</citation></ref>
<ref id="ref45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Escudero</surname> <given-names>P.</given-names></name> <name><surname>Williams</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>Distributional learning has immediate and long-lasting effects</article-title>. <source>Cognition</source> <volume>133</volume>, <fpage>408</fpage>&#x2013;<lpage>413</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cognition.2014.07.002</pub-id>, PMID: <pub-id pub-id-type="pmid">25128798</pub-id></citation></ref>
<ref id="ref46"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Flege</surname> <given-names>J. E.</given-names></name></person-group> (<year>1995</year>). Second Language Speech Learning Theory, Findings, and Problems.</citation></ref>
<ref id="ref47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fran&#x00E7;ois</surname> <given-names>C.</given-names></name> <name><surname>Chobert</surname> <given-names>J.</given-names></name> <name><surname>Besson</surname> <given-names>M.</given-names></name> <name><surname>Sch&#x00F6;n</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>). <article-title>Music training for the development of speech segmentation</article-title>. <source>Cereb. Cortex</source> <volume>23</volume>, <fpage>2038</fpage>&#x2013;<lpage>2043</lpage>. doi: <pub-id pub-id-type="doi">10.1093/cercor/bhs180</pub-id>, PMID: <pub-id pub-id-type="pmid">22784606</pub-id></citation></ref>
<ref id="ref48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gathercole</surname> <given-names>S. E.</given-names></name> <name><surname>Hitch</surname> <given-names>G. J.</given-names></name> <name><surname>Marin</surname> <given-names>A. J.</given-names></name></person-group> (<year>1997</year>). <article-title>Phonological short-term memory and new word learning in children</article-title>. <source>Dev. Psychol.</source> <volume>33</volume>, <fpage>966</fpage>&#x2013;<lpage>979</lpage>. doi: <pub-id pub-id-type="doi">10.1037/0012-1649.33.6.966</pub-id></citation></ref>
<ref id="ref49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Golinkoff</surname> <given-names>R. M.</given-names></name> <name><surname>Alioto</surname> <given-names>A.</given-names></name></person-group> (<year>1995</year>). <article-title>Infant-directed speech facilitates lexical learning in adults hearing Chinese: implications for language acquisition</article-title>. <source>J. Child Lang.</source> <volume>22</volume>, <fpage>703</fpage>&#x2013;<lpage>726</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S0305000900010011</pub-id>, PMID: <pub-id pub-id-type="pmid">8789520</pub-id></citation></ref>
<ref id="ref50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gordon</surname> <given-names>R. L.</given-names></name> <name><surname>Shivers</surname> <given-names>C. M.</given-names></name> <name><surname>Wieland</surname> <given-names>E. A.</given-names></name> <name><surname>Kotz</surname> <given-names>S. A.</given-names></name> <name><surname>Yoder</surname> <given-names>P. J.</given-names></name> <name><surname>McAuley</surname> <given-names>J. D.</given-names></name></person-group> (<year>2015</year>). <article-title>Musical rhythm discrimination explains individual differences in grammar skills in children</article-title>. <source>Dev. Sci.</source> <volume>18</volume>, <fpage>635</fpage>&#x2013;<lpage>644</lpage>. doi: <pub-id pub-id-type="doi">10.1111/desc.12230</pub-id>, PMID: <pub-id pub-id-type="pmid">25195623</pub-id></citation></ref>
<ref id="ref51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Graf Estes</surname> <given-names>K.</given-names></name> <name><surname>Hurley</surname> <given-names>K.</given-names></name></person-group> (<year>2013</year>). <article-title>Infant-directed prosody helps infants map sounds to meanings</article-title>. <source>Infancy</source> <volume>18</volume>, <fpage>797</fpage>&#x2013;<lpage>824</lpage>. doi: <pub-id pub-id-type="doi">10.1111/infa.12006</pub-id>, PMID: <pub-id pub-id-type="pmid">24244106</pub-id></citation></ref>
<ref id="ref52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hallam</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>The impact of making music on aural perception and language skills: a research synthesis</article-title>. <source>Lond. Rev. Educ.</source> <volume>15</volume>, <fpage>388</fpage>&#x2013;<lpage>406</lpage>. doi: <pub-id pub-id-type="doi">10.18546/LRE.15.3.05</pub-id></citation></ref>
<ref id="ref200"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>M.</given-names></name> <name><surname>de Jong</surname> <given-names>N. H.</given-names></name> <name><surname>Kager</surname> <given-names>R.</given-names></name></person-group> (<year>2018</year>). <article-title>Lexical tones in mandarin Chinese infant-directed speech: age-related changes in the second year of life</article-title>. <source>Front. Psychol.</source> <volume>9</volume>:<fpage>434</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2018.00434</pub-id>, PMID: <pub-id pub-id-type="pmid">25866868</pub-id></citation></ref>
<ref id="ref53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hansen</surname> <given-names>M.</given-names></name> <name><surname>Wallentin</surname> <given-names>M.</given-names></name> <name><surname>Vuust</surname> <given-names>P.</given-names></name></person-group> (<year>2012</year>). <article-title>Working memory and musical competence of musicians and nonmusicians</article-title>. <source>Psychol. Music</source> <volume>41</volume>, <fpage>779</fpage>&#x2013;<lpage>793</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0305735612452186</pub-id></citation></ref>
<ref id="ref54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harrison</surname> <given-names>P. M. C.</given-names></name> <name><surname>Collins</surname> <given-names>T.</given-names></name> <name><surname>M&#x00FC;llensiefen</surname> <given-names>D.</given-names></name></person-group> (<year>2017</year>). <article-title>Applying modern psychometric techniques to melodic discrimination testing: item response theory, computerised adaptive testing, and automatic item generation</article-title>. <source>Sci. Rep.</source> <volume>7</volume>:<fpage>3618</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-017-03586-z</pub-id>, PMID: <pub-id pub-id-type="pmid">28620165</pub-id></citation></ref>
<ref id="ref55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harrison</surname> <given-names>P. M. C.</given-names></name> <name><surname>M&#x00FC;llensiefen</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>Melodic discrimination test (MDT), psychTestR implementation</article-title>. <source>Zenodo.</source> doi: <pub-id pub-id-type="doi">10.5281/zenodo.1300950</pub-id></citation></ref>
<ref id="ref56"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Houston-Price</surname> <given-names>C.</given-names></name> <name><surname>Law</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). &#x201C;<article-title>How experiences with words supply all the tools in the toddler&#x2019;s word &#x2013; learning toolbox</article-title>,&#x201D; in <source>Theoretical and Computational Models of Word Learning: Trends in Psychology and Artificial Intelligence.</source> eds. <person-group person-group-type="editor"><name><surname>Gogate</surname> <given-names>L.</given-names></name> <name><surname>Hollich</surname> <given-names>G.</given-names></name></person-group> (<publisher-loc>Hershey, PA</publisher-loc>: <publisher-name>IGI Global</publisher-name>), <fpage>81</fpage>&#x2013;<lpage>108</lpage>.</citation></ref>
<ref id="ref57"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Jeffreys</surname> <given-names>H.</given-names></name></person-group> (<year>1998</year>). <source>The Theory of Probability.</source> <publisher-loc>England</publisher-loc>: <publisher-name>OUP Oxford</publisher-name>.</citation></ref>
<ref id="ref58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jentschke</surname> <given-names>S.</given-names></name> <name><surname>Koelsch</surname> <given-names>S.</given-names></name></person-group> (<year>2009</year>). <article-title>Musical training modulates the development of syntax processing in children</article-title>. <source>NeuroImage</source> <volume>47</volume>, <fpage>735</fpage>&#x2013;<lpage>744</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuroimage.2009.04.090</pub-id>, PMID: <pub-id pub-id-type="pmid">19427908</pub-id></citation></ref>
<ref id="ref59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jonaitis</surname> <given-names>E. M.</given-names></name> <name><surname>Saffran</surname> <given-names>J. R.</given-names></name></person-group> (<year>2009</year>). <article-title>Learning harmony: the role of serial statistics</article-title>. <source>Cogn. Sci.</source> <volume>33</volume>, <fpage>951</fpage>&#x2013;<lpage>968</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1551-6709.2009.01036.x</pub-id>, PMID: <pub-id pub-id-type="pmid">21585492</pub-id></citation></ref>
<ref id="ref60"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Kachergis</surname> <given-names>G.</given-names></name> <name><surname>Yu</surname> <given-names>C.</given-names></name> <name><surname>Shiffrin</surname> <given-names>R. M.</given-names></name></person-group> (<year>2010</year>). &#x201C;Cross-situational statistical learning: implicit or intentional?&#x201D; in <italic>Proceedings of the Annual Meeting of the Cognitive Science Society</italic>. <italic>Vol 32</italic>. August 11&#x2013;14, 2010.</citation></ref>
<ref id="ref61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kraus</surname> <given-names>N.</given-names></name> <name><surname>Chandrasekaran</surname> <given-names>B.</given-names></name></person-group> (<year>2010</year>). <article-title>Music training for the development of auditory skills</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>11</volume>, <fpage>599</fpage>&#x2013;<lpage>605</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nrn2882</pub-id></citation></ref>
<ref id="ref62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kraus</surname> <given-names>N.</given-names></name> <name><surname>White-Schwoch</surname> <given-names>T.</given-names></name></person-group> (<year>2017</year>). <article-title>Neurobiology of everyday communication: what have we learned from music?</article-title> <source>Neuroscientist</source> <volume>23</volume>, <fpage>287</fpage>&#x2013;<lpage>298</lpage>. doi: <pub-id pub-id-type="doi">10.1177/1073858416653593</pub-id>, PMID: <pub-id pub-id-type="pmid">27284021</pub-id></citation></ref>
<ref id="ref63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krishnan</surname> <given-names>A.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Gandour</surname> <given-names>J.</given-names></name> <name><surname>Cariani</surname> <given-names>P.</given-names></name></person-group> (<year>2005</year>). <article-title>Encoding of pitch in the human brainstem is sensitive to language experience</article-title>. <source>Cogn. Brain Res.</source> <volume>25</volume>, <fpage>161</fpage>&#x2013;<lpage>168</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cogbrainres.2005.05.004</pub-id>, PMID: <pub-id pub-id-type="pmid">15935624</pub-id></citation></ref>
<ref id="ref64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizman</surname> <given-names>J.</given-names></name> <name><surname>Marian</surname> <given-names>V.</given-names></name> <name><surname>Shook</surname> <given-names>A.</given-names></name> <name><surname>Skoe</surname> <given-names>E.</given-names></name> <name><surname>Kraus</surname> <given-names>N.</given-names></name></person-group> (<year>2012</year>). <article-title>Subcortical encoding of sound in enhanced in bilinguals and relates to executive function advantages</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>109</volume>, <fpage>7877</fpage>&#x2013;<lpage>7881</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1201575109</pub-id>, PMID: <pub-id pub-id-type="pmid">22547804</pub-id></citation></ref>
<ref id="ref65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kruschke</surname> <given-names>J. K.</given-names></name></person-group> (<year>2018</year>). <article-title>Rejecting or accepting parameter values in Bayesian estimation</article-title>. <source>Adv. Methods Pract. Psychol. Sci.</source> <volume>1</volume>, <fpage>270</fpage>&#x2013;<lpage>280</lpage>. doi: <pub-id pub-id-type="doi">10.1177/2515245918771304</pub-id></citation></ref>
<ref id="ref66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuhl</surname> <given-names>P. K.</given-names></name> <name><surname>Andruski</surname> <given-names>J. E.</given-names></name> <name><surname>Chistovich</surname> <given-names>I. A.</given-names></name> <name><surname>Chistovich</surname> <given-names>L. A.</given-names></name> <name><surname>Kozhevnikova</surname> <given-names>E. V.</given-names></name> <name><surname>Ryskina</surname> <given-names>V. L.</given-names></name> <etal/></person-group>. (<year>1997</year>). <article-title>Cross-language analysis of phonetic units in language addressed to infants</article-title>. <source>Science</source> <volume>277</volume>, <fpage>684</fpage>&#x2013;<lpage>686</lpage>. doi: <pub-id pub-id-type="doi">10.1126/science.277.5326.684</pub-id>, PMID: <pub-id pub-id-type="pmid">9235890</pub-id></citation></ref>
<ref id="ref67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>K&#x00FC;hnis</surname> <given-names>J.</given-names></name> <name><surname>Elmer</surname> <given-names>S.</given-names></name> <name><surname>Meyer</surname> <given-names>M.</given-names></name> <name><surname>J&#x00E4;ncke</surname> <given-names>L.</given-names></name></person-group> (<year>2013</year>). <article-title>The encoding of vowels and temporal speech cues in the auditory cortex of professional musicians: an EEG study</article-title>. <source>Neuropsychologia</source> <volume>51</volume>, <fpage>1608</fpage>&#x2013;<lpage>1618</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuropsychologia.2013.04.007</pub-id>, PMID: <pub-id pub-id-type="pmid">23664833</pub-id></citation></ref>
<ref id="ref68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kunert</surname> <given-names>R.</given-names></name> <name><surname>Willems</surname> <given-names>R. M.</given-names></name> <name><surname>Hagoort</surname> <given-names>P.</given-names></name></person-group> (<year>2016</year>). <article-title>An independent psychometric evaluation of the PROMS measure of music perception skills</article-title>. <source>PLoS One</source> <volume>11</volume>:<fpage>e0159103</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0159103</pub-id>, PMID: <pub-id pub-id-type="pmid">27398805</pub-id></citation></ref>
<ref id="ref69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lamb</surname> <given-names>S. J.</given-names></name> <name><surname>Gregory</surname> <given-names>A. H.</given-names></name></person-group> (<year>1993</year>). <article-title>The relationship between music and reading in beginning readers</article-title>. <source>J. Educ. Psychol.</source> <volume>13</volume>, <fpage>19</fpage>&#x2013;<lpage>27</lpage>. doi: <pub-id pub-id-type="doi">10.1080/0144341930130103</pub-id></citation></ref>
<ref id="ref70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Larrouy-Maestri</surname> <given-names>P.</given-names></name> <name><surname>Harrison</surname> <given-names>P. M. C.</given-names></name> <name><surname>M&#x00FC;llensiefen</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>Mistuning perception test, psychTestR implementation</article-title>. <source>Zenodo.</source> doi: <pub-id pub-id-type="doi">10.5281/zenodo.1415363</pub-id></citation></ref>
<ref id="ref71"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Larrouy-Maestri</surname> <given-names>P.</given-names></name> <name><surname>Harrison</surname> <given-names>P. M. C.</given-names></name> <name><surname>M&#x00FC;llensiefen</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>The mistuning perception test: A new measurement instrument</article-title>. <source>Behav. Res. Methods</source> <volume>51</volume>, <fpage>663</fpage>&#x2013;<lpage>675</lpage>. doi: <pub-id pub-id-type="doi">10.3578/s13428-019-01225-1</pub-id>, PMID: <pub-id pub-id-type="pmid">30924106</pub-id></citation></ref>
<ref id="ref72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Law</surname> <given-names>L. N. C.</given-names></name> <name><surname>Zentner</surname> <given-names>M.</given-names></name></person-group> (<year>2012</year>). <article-title>Assessing musical abilities objectively: construction and validation of the profile of music perception skills</article-title>. <source>PLoS One</source> <volume>7</volume>:<fpage>e52508</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0052508</pub-id>, PMID: <pub-id pub-id-type="pmid">23285071</pub-id></citation></ref>
<ref id="ref73"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leung</surname> <given-names>Y.</given-names></name> <name><surname>Dean</surname> <given-names>R. T.</given-names></name></person-group> (<year>2018</year>). <article-title>Learning unfamiliar pitch intervals: a novel paradigm for demonstrating the learning of statistical associations between musical pitches</article-title>. <source>PLoS One</source> <volume>13</volume>:<fpage>e0203026</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0203026</pub-id>, PMID: <pub-id pub-id-type="pmid">30161174</pub-id></citation></ref>
<ref id="ref74"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Hilton</surname> <given-names>C. B.</given-names></name> <name><surname>Bergelson</surname> <given-names>E.</given-names></name> <name><surname>Mehr</surname> <given-names>S. A.</given-names></name></person-group> (<year>2021</year>). Language experience shapes music processing across 40 tonal, pitch-accented, and non-tonal languages. <italic>bioRxiv</italic>. doi: <pub-id pub-id-type="doi">10.1101/2021.10.18.464888</pub-id> <comment>[Epub ahead of print]</comment></citation></ref>
<ref id="ref75"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Loui</surname> <given-names>P.</given-names></name> <name><surname>Wessel</surname> <given-names>D. L.</given-names></name> <name><surname>Hudson Kam</surname> <given-names>C. L.</given-names></name></person-group> (<year>2010</year>). <article-title>Human rapidly learn grammatical structure in a new musical scale</article-title>. <source>Music. Percept.</source> <volume>27</volume>, <fpage>377</fpage>&#x2013;<lpage>388</lpage>. doi: <pub-id pub-id-type="doi">10.1525/mp.2010.27.5.377</pub-id>, PMID: <pub-id pub-id-type="pmid">20740059</pub-id></citation></ref>
<ref id="ref76"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>W.</given-names></name> <name><surname>Golinkoff</surname> <given-names>R. M.</given-names></name> <name><surname>Houston</surname> <given-names>D. M.</given-names></name> <name><surname>Hirsh-Pasek</surname> <given-names>K.</given-names></name></person-group> (<year>2011</year>). <article-title>Word learning in infant- and adult-directed speech</article-title>. <source>Lang. Learn. Dev.</source> <volume>7</volume>, <fpage>185</fpage>&#x2013;<lpage>201</lpage>. doi: <pub-id pub-id-type="doi">10.1080/15475441.2011.579839</pub-id>, PMID: <pub-id pub-id-type="pmid">29129970</pub-id></citation></ref>
<ref id="ref77"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Magis</surname> <given-names>D.</given-names></name> <name><surname>Ra&#x00EE;che</surname> <given-names>G.</given-names></name></person-group> (<year>2012</year>). <article-title>Random generation of response patterns under computerized adaptive testing with the R package catR</article-title>. <source>J. Stat. Softw.</source> <volume>48</volume>:<fpage>i08</fpage>. doi: <pub-id pub-id-type="doi">10.18637/jss.v048.i08</pub-id></citation></ref>
<ref id="ref78"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Magne</surname> <given-names>C.</given-names></name> <name><surname>Sch&#x00F6;n</surname> <given-names>D.</given-names></name> <name><surname>Besson</surname> <given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>Musician children detect pitch violations in both music and language better than nonmusician children: behavioral and electrophysiological approaches</article-title>. <source>J. Cogn. Neurosci.</source> <volume>18</volume>, <fpage>199</fpage>&#x2013;<lpage>211</lpage>. doi: <pub-id pub-id-type="doi">10.1162/jocn.2006.18.2.199</pub-id>, PMID: <pub-id pub-id-type="pmid">16494681</pub-id></citation></ref>
<ref id="ref79"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Makowski</surname> <given-names>D.</given-names></name> <name><surname>Ben-Shachar</surname> <given-names>M. S.</given-names></name> <name><surname>Chen</surname> <given-names>S. H. A.</given-names></name> <name><surname>L&#x00FC;decke</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Indices of effect existence and significance in the Bayesian framework</article-title>. <source>Front. Psychol.</source> <volume>10</volume>:<fpage>2767</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2019.02767</pub-id>, PMID: <pub-id pub-id-type="pmid">31920819</pub-id></citation></ref>
<ref id="ref80"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marie</surname> <given-names>C.</given-names></name> <name><surname>Delogu</surname> <given-names>F.</given-names></name> <name><surname>Lampis</surname> <given-names>G.</given-names></name> <name><surname>Belardinelli</surname> <given-names>M. O.</given-names></name> <name><surname>Besson</surname> <given-names>M.</given-names></name></person-group> (<year>2011a</year>). <article-title>Influence of musical expertise on segmental and tonal processing in Mandarin Chinese</article-title>. <source>J. Cogn. Neurosci.</source> <volume>23</volume>, <fpage>2701</fpage>&#x2013;<lpage>2715</lpage>. doi: <pub-id pub-id-type="doi">10.1162/jocn.2010.21585</pub-id>, PMID: <pub-id pub-id-type="pmid">20946053</pub-id></citation></ref>
<ref id="ref81"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marie</surname> <given-names>C.</given-names></name> <name><surname>Magne</surname> <given-names>C.</given-names></name> <name><surname>Besson</surname> <given-names>M.</given-names></name></person-group> (<year>2011b</year>). <article-title>Musicians and the metric structure of words</article-title>. <source>J. Cogn. Neurosci.</source> <volume>23</volume>, <fpage>294</fpage>&#x2013;<lpage>305</lpage>. doi: <pub-id pub-id-type="doi">10.1162/jocn.2010.21413</pub-id>, PMID: <pub-id pub-id-type="pmid">20044890</pub-id></citation></ref>
<ref id="ref82"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Milne</surname> <given-names>A. J.</given-names></name> <name><surname>Herff</surname> <given-names>S. A.</given-names></name></person-group> (<year>2020</year>). <article-title>The perceptual relevance of balance, evenness, and entropy in musical rhythms</article-title>. <source>Cognition</source> <volume>203</volume>:<fpage>104233</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cognition.2020.104233</pub-id>, PMID: <pub-id pub-id-type="pmid">32629203</pub-id></citation></ref>
<ref id="ref83"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moreno</surname> <given-names>S.</given-names></name> <name><surname>Marques</surname> <given-names>C.</given-names></name> <name><surname>Santos</surname> <given-names>A.</given-names></name> <name><surname>Santos</surname> <given-names>M.</given-names></name> <name><surname>Castro</surname> <given-names>S. L.</given-names></name> <name><surname>Besson</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>Musical training influences linguistic abilities in 8-year-old children: more evidence for brain plasticity</article-title>. <source>Cereb. Cortex</source> <volume>19</volume>, <fpage>712</fpage>&#x2013;<lpage>723</lpage>. doi: <pub-id pub-id-type="doi">10.1093/cercor/bhn120</pub-id>, PMID: <pub-id pub-id-type="pmid">18832336</pub-id></citation></ref>
<ref id="ref84"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mulak</surname> <given-names>K. E.</given-names></name> <name><surname>Vlach</surname> <given-names>H. A.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Cross-situational learning of phonologically overlapping words across degrees of ambiguity</article-title>. <source>Cogn. Sci.</source> <volume>43</volume>:<fpage>e12731</fpage>. doi: <pub-id pub-id-type="doi">10.1111/cogs.12731</pub-id></citation></ref>
<ref id="ref85"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>M&#x00FC;llensiefen</surname> <given-names>D.</given-names></name> <name><surname>Gingras</surname> <given-names>B.</given-names></name> <name><surname>Musil</surname> <given-names>J.</given-names></name> <name><surname>Stewart</surname> <given-names>L.</given-names></name></person-group> (<year>2014</year>). <article-title>The musicality of non-musicians: an index for assessing musical sophistication in the general population</article-title>. <source>PLoS One</source> <volume>9</volume>:<fpage>e89642</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0089642</pub-id></citation></ref>
<ref id="ref86"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Musacchia</surname> <given-names>G.</given-names></name> <name><surname>Sams</surname> <given-names>M.</given-names></name> <name><surname>Skoe</surname> <given-names>E.</given-names></name> <name><surname>Kraus</surname> <given-names>N.</given-names></name></person-group> (<year>2007</year>). <article-title>Musicians have enhanced subcortical auditory and audiovisual processing of speech and music</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>104</volume>, <fpage>15894</fpage>&#x2013;<lpage>15898</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.0701498104</pub-id>, PMID: <pub-id pub-id-type="pmid">17898180</pub-id></citation></ref>
<ref id="ref87"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ong</surname> <given-names>J. H.</given-names></name> <name><surname>Burnham</surname> <given-names>D.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2015</year>). <article-title>Distributional learning of lexical tones: a comparison of attended vs. unattended listening</article-title>. <source>PLoS One</source> <volume>10</volume>:<fpage>e0133446</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0133446</pub-id>, PMID: <pub-id pub-id-type="pmid">26214002</pub-id></citation></ref>
<ref id="ref88"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ong</surname> <given-names>J. H.</given-names></name> <name><surname>Burnham</surname> <given-names>D.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name> <name><surname>Stevens</surname> <given-names>C. J.</given-names></name></person-group> (<year>2017b</year>). <article-title>Effect of linguistic and musical experience on distributional learning of nonnative lexical tones</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>60</volume>, <fpage>2769</fpage>&#x2013;<lpage>2780</lpage>. doi: <pub-id pub-id-type="doi">10.1044/2016_JSLHR-S-16-0080</pub-id>, PMID: <pub-id pub-id-type="pmid">28975194</pub-id></citation></ref>
<ref id="ref89"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ong</surname> <given-names>J. H.</given-names></name> <name><surname>Burnham</surname> <given-names>D.</given-names></name> <name><surname>Stevens</surname> <given-names>C. J.</given-names></name></person-group> (<year>2017a</year>). <article-title>Learning novel musical pitch via distributional learning</article-title>. <source>J. Exp. Psychol. Learn. Mem. Cogn.</source> <volume>43</volume>, <fpage>150</fpage>&#x2013;<lpage>157</lpage>. doi: <pub-id pub-id-type="doi">10.1037/xlm0000286</pub-id>, PMID: <pub-id pub-id-type="pmid">27149394</pub-id></citation></ref>
<ref id="ref90"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ong</surname> <given-names>J. H.</given-names></name> <name><surname>Burnham</surname> <given-names>D.</given-names></name> <name><surname>Stevens</surname> <given-names>C. J.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2016</year>). <article-title>Na&#x00EF;ve learners show cross-domain transfer after distributional learning: the case of lexical and musical pitch</article-title>. <source>Front. Psychol.</source> <volume>7</volume>:<fpage>1189</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2016.01189</pub-id>, PMID: <pub-id pub-id-type="pmid">26941672</pub-id></citation></ref>
<ref id="ref91"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ong</surname> <given-names>J. H.</given-names></name> <name><surname>Wong</surname> <given-names>P. C. M.</given-names></name> <name><surname>Liu</surname> <given-names>F.</given-names></name></person-group> (<year>2020</year>). <article-title>Musicians show enhanced perception, but not production, of native lexical tones</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>148</volume>, <fpage>3443</fpage>&#x2013;<lpage>3454</lpage>. doi: <pub-id pub-id-type="doi">10.1121/10.0002776</pub-id>, PMID: <pub-id pub-id-type="pmid">33379922</pub-id></citation></ref>
<ref id="ref92"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ott</surname> <given-names>C. G. M.</given-names></name> <name><surname>Langer</surname> <given-names>N.</given-names></name> <name><surname>Oechslin</surname> <given-names>M. S.</given-names></name> <name><surname>Meyer</surname> <given-names>M.</given-names></name> <name><surname>J&#x00E4;ncke</surname> <given-names>L.</given-names></name></person-group> (<year>2011</year>). <article-title>Processing of voiced and unvoiced acoustic stimuli in musicians</article-title>. <source>Front. Psychol.</source> <volume>2</volume>:<fpage>195</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2011.00195</pub-id>, PMID: <pub-id pub-id-type="pmid">21922011</pub-id></citation></ref>
<ref id="ref93"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Overy</surname> <given-names>K.</given-names></name></person-group> (<year>2003</year>). <article-title>Dyslexia and music: from timing deficits to musical intervention</article-title>. <source>Ann. N. Y. Acad. Sci.</source> <volume>999</volume>, <fpage>497</fpage>&#x2013;<lpage>505</lpage>. doi: <pub-id pub-id-type="doi">10.1196/annals.1284.060</pub-id></citation></ref>
<ref id="ref94"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parbery-Clark</surname> <given-names>A.</given-names></name> <name><surname>Strait</surname> <given-names>D. L.</given-names></name> <name><surname>Anderson</surname> <given-names>S.</given-names></name> <name><surname>Hittner</surname> <given-names>E.</given-names></name> <name><surname>Kraus</surname> <given-names>N.</given-names></name></person-group> (<year>2011</year>). <article-title>Musical experience and the aging auditory system: implications for cognitive abilities and hearing speech in noise</article-title>. <source>PLoS One</source> <volume>6</volume>:<fpage>e18082</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0018082</pub-id>, PMID: <pub-id pub-id-type="pmid">21589653</pub-id></citation></ref>
<ref id="ref95"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patel</surname> <given-names>A. D.</given-names></name></person-group> (<year>2003</year>). <article-title>Language, music, syntax and the brain</article-title>. <source>Nat. Neurosci.</source> <volume>6</volume>, <fpage>674</fpage>&#x2013;<lpage>681</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nn1082</pub-id></citation></ref>
<ref id="ref96"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patel</surname> <given-names>A. D.</given-names></name> <name><surname>Iversen</surname> <given-names>J. R.</given-names></name></person-group> (<year>2007</year>). <article-title>The linguistic benefits of musical abilities</article-title>. <source>Trends Cogn. Sci.</source> <volume>11</volume>, <fpage>369</fpage>&#x2013;<lpage>372</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tics.2007.08.003</pub-id>, PMID: <pub-id pub-id-type="pmid">17698406</pub-id></citation></ref>
<ref id="ref97"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pearce</surname> <given-names>M. T.</given-names></name> <name><surname>Ruiz</surname> <given-names>M. H.</given-names></name> <name><surname>Kapasi</surname> <given-names>S.</given-names></name> <name><surname>Wiggins</surname> <given-names>G. A.</given-names></name> <name><surname>Bhattacharya</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>Unsupervised statistical learning underpins computational, behavioural, and neural manifestations of musical expectation</article-title>. <source>NeuroImage</source> <volume>50</volume>, <fpage>302</fpage>&#x2013;<lpage>313</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuroimage.2009.12.019</pub-id>, PMID: <pub-id pub-id-type="pmid">20005297</pub-id></citation></ref>
<ref id="ref98"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peirce</surname> <given-names>J. W.</given-names></name></person-group> (<year>2007</year>). <article-title>PsychoPy&#x2014;psychophysics software in Python</article-title>. <source>J. Neurosci. Methods</source> <volume>162</volume>, <fpage>8</fpage>&#x2013;<lpage>13</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jneumeth.2006.11.017</pub-id>, PMID: <pub-id pub-id-type="pmid">17254636</pub-id></citation></ref>
<ref id="ref99"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peirce</surname> <given-names>J. W.</given-names></name> <name><surname>Gray</surname> <given-names>J. R.</given-names></name> <name><surname>Simpson</surname> <given-names>S.</given-names></name> <name><surname>MacAskill</surname> <given-names>M. R.</given-names></name> <name><surname>H&#x00F6;chenberger</surname> <given-names>R.</given-names></name> <name><surname>Sogo</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>PsychoPy2: experiments in behavior made easy</article-title>. <source>Behav. Res. Methods</source> <volume>51</volume>, <fpage>195</fpage>&#x2013;<lpage>203</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13428-018-01193-y</pub-id>, PMID: <pub-id pub-id-type="pmid">30734206</pub-id></citation></ref>
<ref id="ref100"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Perfors</surname> <given-names>A.</given-names></name> <name><surname>Ong</surname> <given-names>J. H.</given-names></name></person-group> (<year>2012</year>). &#x201C;Musicians Are Better at Learning Non-native Sound Contrasts Even in Non-tonal Languages,&#x201D; in <italic>Proceedings of the 34th Annual Conference of the Cognitive Science Society</italic>. eds. N. Miyake, D. Peebles and R. P. Cooper. August 1-4, 2012. Cognitive Science Society, 839&#x2013;844.</citation></ref>
<ref id="ref101"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pinheiro</surname> <given-names>A. P.</given-names></name> <name><surname>Vasconcelos</surname> <given-names>M.</given-names></name> <name><surname>Dias</surname> <given-names>M.</given-names></name> <name><surname>Arrais</surname> <given-names>N.</given-names></name> <name><surname>Gon&#x00E7;alves</surname> <given-names>&#x00D3;. F.</given-names></name></person-group> (<year>2015</year>). <article-title>The music of language: An ERP investigation of the effects of musical training on emotional prosody processing</article-title>. <source>Brain Lang.</source> <volume>140</volume>, <fpage>24</fpage>&#x2013;<lpage>34</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.bandl.2014.10.009</pub-id>, PMID: <pub-id pub-id-type="pmid">25461917</pub-id></citation></ref>
<ref id="ref102"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Pinker</surname> <given-names>S.</given-names></name></person-group> (<year>1997</year>). <source>How the Mind Works.</source> <publisher-loc>New York</publisher-loc>: <publisher-name>Norton</publisher-name>.</citation></ref>
<ref id="ref103"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pisoni</surname> <given-names>D. B.</given-names></name></person-group> (<year>1973</year>). <article-title>Auditory and phonetic memory codes in the discrimination of consonants and vowels</article-title>. <source>Percept. Psychophys.</source> <volume>13</volume>, <fpage>253</fpage>&#x2013;<lpage>260</lpage>. doi: <pub-id pub-id-type="doi">10.3758/BF03214136</pub-id>, PMID: <pub-id pub-id-type="pmid">23226880</pub-id></citation></ref>
<ref id="ref104"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pisoni</surname> <given-names>D. B.</given-names></name></person-group> (<year>1975</year>). <article-title>Auditory short-term memory and vowel perception</article-title>. <source>Mem. Cogn.</source> <volume>3</volume>, <fpage>7</fpage>&#x2013;<lpage>18</lpage>. doi: <pub-id pub-id-type="doi">10.3758/BF03198202</pub-id>, PMID: <pub-id pub-id-type="pmid">24203819</pub-id></citation></ref>
<ref id="ref105"><citation citation-type="other"><person-group person-group-type="author"><collab id="coll1">R Core Team</collab></person-group> (<year>2020</year>). R: A Language and Environment for Statistical Computing [Computer Software Manual]. Vienna, Austria: R Foundation for Statistical Computing. Available at: <ext-link xlink:href="https://www.R-project.org/" ext-link-type="uri">https://www.R-project.org/</ext-link></citation></ref>
<ref id="ref106"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rogalsky</surname> <given-names>C.</given-names></name> <name><surname>Rong</surname> <given-names>F.</given-names></name> <name><surname>Saberi</surname> <given-names>K.</given-names></name> <name><surname>Hickok</surname> <given-names>G.</given-names></name></person-group> (<year>2011</year>). <article-title>Functional anatomy of language and music perception: temporal and structural factors investigated using functional magnetic resonance imaging</article-title>. <source>J. Neurosci.</source> <volume>31</volume>, <fpage>3843</fpage>&#x2013;<lpage>3852</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.4515-10.2011</pub-id>, PMID: <pub-id pub-id-type="pmid">21389239</pub-id></citation></ref>
<ref id="ref107"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ruggles</surname> <given-names>D. R.</given-names></name> <name><surname>Freyman</surname> <given-names>R. L.</given-names></name> <name><surname>Oxenham</surname> <given-names>A. J.</given-names></name></person-group> (<year>2014</year>). <article-title>Influence of musical training on understanding voiced and whispered speech in noise</article-title>. <source>PLoS One</source> <volume>9</volume>:<fpage>e86980</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0086980</pub-id>, PMID: <pub-id pub-id-type="pmid">24489819</pub-id></citation></ref>
<ref id="ref108"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sammler</surname> <given-names>D.</given-names></name> <name><surname>Grigutsch</surname> <given-names>M.</given-names></name> <name><surname>Fritz</surname> <given-names>T.</given-names></name> <name><surname>Koelsch</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>Music and emotion: electrophysiological correlates of the processing of pleasant and unpleasant music</article-title>. <source>Psychophysiology</source> <volume>44</volume>, <fpage>293</fpage>&#x2013;<lpage>304</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1469-8986.2007.00497.x</pub-id>, PMID: <pub-id pub-id-type="pmid">17343712</pub-id></citation></ref>
<ref id="ref109"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sch&#x00F6;n</surname> <given-names>D.</given-names></name> <name><surname>Magne</surname> <given-names>C.</given-names></name> <name><surname>Besson</surname> <given-names>M.</given-names></name></person-group> (<year>2004</year>). <article-title>The music of speech: music training facilitates pitch processing in both music and language</article-title>. <source>Psychophysiology</source> <volume>41</volume>, <fpage>341</fpage>&#x2013;<lpage>349</lpage>. doi: <pub-id pub-id-type="doi">10.1111/1469-8986.00172.x</pub-id>, PMID: <pub-id pub-id-type="pmid">15102118</pub-id></citation></ref>
<ref id="ref110"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schulze</surname> <given-names>K.</given-names></name> <name><surname>Zysset</surname> <given-names>S.</given-names></name> <name><surname>Mueller</surname> <given-names>K.</given-names></name> <name><surname>Friederici</surname> <given-names>A. D.</given-names></name> <name><surname>Koelsch</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Neuroarchitecture of verbal and tonal working memory in nonmusicians and musicians</article-title>. <source>Hum. Brain Mapp.</source> <volume>32</volume>, <fpage>771</fpage>&#x2013;<lpage>783</lpage>. doi: <pub-id pub-id-type="doi">10.1002/hbm.21060</pub-id>, PMID: <pub-id pub-id-type="pmid">20533560</pub-id></citation></ref>
<ref id="ref111"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Slevc</surname> <given-names>L. R.</given-names></name> <name><surname>Miyake</surname> <given-names>A.</given-names></name></person-group> (<year>2006</year>). <article-title>Individual differences in second-language proficiency: does musical ability matter?</article-title> <source>Psychol. Sci.</source> <volume>17</volume>, <fpage>675</fpage>&#x2013;<lpage>681</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1467-9280.2006.01765.x</pub-id></citation></ref>
<ref id="ref112"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smit</surname> <given-names>E. A.</given-names></name> <name><surname>Milne</surname> <given-names>A. J.</given-names></name> <name><surname>Dean</surname> <given-names>R. T.</given-names></name> <name><surname>Weidemann</surname> <given-names>G.</given-names></name></person-group> (<year>2019</year>). <article-title>Perception of affect in unfamiliar musical chords</article-title>. <source>PLoS One</source> <volume>14</volume>:<fpage>e0218570</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0218570</pub-id>, PMID: <pub-id pub-id-type="pmid">31226170</pub-id></citation></ref>
<ref id="ref113"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>A. D. M.</given-names></name> <name><surname>Smith</surname> <given-names>K.</given-names></name></person-group> (<year>2012</year>). &#x201C;<article-title>Cross-situational learning</article-title>,&#x201D; in <source>Encyclopedia of the Sciences of Learning.</source> ed. <person-group person-group-type="editor"><name><surname>Seel</surname> <given-names>N. M.</given-names></name></person-group> (<publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Springer US</publisher-name>), <fpage>864</fpage>&#x2013;<lpage>866</lpage>.</citation></ref>
<ref id="ref114"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>L. B.</given-names></name> <name><surname>Yu</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). <article-title>Infants rapidly learn word-referent mappings via cross-situational statistics</article-title>. <source>Cognition</source> <volume>106</volume>, <fpage>1558</fpage>&#x2013;<lpage>1568</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cognition.2007.06.010</pub-id>, PMID: <pub-id pub-id-type="pmid">17692305</pub-id></citation></ref>
<ref id="ref115"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stewart</surname> <given-names>E. C.</given-names></name> <name><surname>Pittman</surname> <given-names>A. L.</given-names></name></person-group> (<year>2021</year>). <article-title>Learning and retention of novel words in musicians and nonmusicians</article-title>. <source>J. Speech Lang. Hear. Res.</source> <volume>64</volume>, <fpage>2870</fpage>&#x2013;<lpage>2884</lpage>. doi: <pub-id pub-id-type="doi">10.1044/2021_JSLHR-20-00482</pub-id>, PMID: <pub-id pub-id-type="pmid">34185549</pub-id></citation></ref>
<ref id="ref116"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strait</surname> <given-names>D. L.</given-names></name> <name><surname>Kraus</surname> <given-names>N.</given-names></name></person-group> (<year>2011a</year>). <article-title>Can you hear me now? Musical training shapes functional brain networks for selective auditory attention and hearing speech in noise</article-title>. <source>Front. Psychol.</source> <volume>2</volume>:<fpage>113</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2011.00113</pub-id>, PMID: <pub-id pub-id-type="pmid">21716636</pub-id></citation></ref>
<ref id="ref117"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strait</surname> <given-names>D. L.</given-names></name> <name><surname>Kraus</surname> <given-names>N.</given-names></name></person-group> (<year>2011b</year>). <article-title>Playing music for a smarter ear: cognitive, perceptual and neurobiological evidence</article-title>. <source>Music. Percept.</source> <volume>29</volume>, <fpage>133</fpage>&#x2013;<lpage>146</lpage>. doi: <pub-id pub-id-type="doi">10.1525/mp.2011.29.2.133</pub-id>, PMID: <pub-id pub-id-type="pmid">22993456</pub-id></citation></ref>
<ref id="ref118"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Strange</surname> <given-names>W.</given-names></name></person-group> (ed.) (<year>1999</year>). &#x201C;Second language speech learning: theory, findings, and problems,&#x201D; in <italic>Speech perception and linguistic experience: issues in cross&#x2010;language research</italic>. Timonium, MD: York Press, 229&#x2013;273.</citation></ref>
<ref id="ref119"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Suanda</surname> <given-names>S. H.</given-names></name> <name><surname>Mugwanya</surname> <given-names>N.</given-names></name> <name><surname>Namy</surname> <given-names>L. L.</given-names></name></person-group> (<year>2014</year>). <article-title>Cross-situational statistical word learning in young children</article-title>. <source>J. Exp. Child Psychol.</source> <volume>126</volume>, <fpage>395</fpage>&#x2013;<lpage>411</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jecp.2014.06.003</pub-id>, PMID: <pub-id pub-id-type="pmid">25015421</pub-id></citation></ref>
<ref id="ref120"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Swaminathan</surname> <given-names>S.</given-names></name> <name><surname>Gopinath</surname> <given-names>J. K.</given-names></name></person-group> (<year>2013</year>). <article-title>Music training and second-language English comprehension and vocabulary skills in Indian children</article-title>. <source>Psychol. Stud.</source> <volume>58</volume>, <fpage>164</fpage>&#x2013;<lpage>170</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s12646-013-0180-3</pub-id></citation></ref>
<ref id="ref121"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Swaminathan</surname> <given-names>S.</given-names></name> <name><surname>Schellenberg</surname> <given-names>E. G.</given-names></name></person-group> (<year>2017</year>). <article-title>Musical competence and phoneme perception in a foreign language</article-title>. <source>Psychon. Bull. Rev.</source> <volume>24</volume>, <fpage>1929</fpage>&#x2013;<lpage>1934</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13423-017-1244-5</pub-id>, PMID: <pub-id pub-id-type="pmid">28204984</pub-id></citation></ref>
<ref id="ref122"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Talamini</surname> <given-names>F.</given-names></name> <name><surname>Alto&#x00E8;</surname> <given-names>G.</given-names></name> <name><surname>Carretti</surname> <given-names>B.</given-names></name> <name><surname>Grassi</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Musicians have better memory than nonmusicians: a meta-analysis</article-title>. <source>PLoS One</source> <volume>12</volume>:<fpage>e0186773</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0186773</pub-id>, PMID: <pub-id pub-id-type="pmid">29049416</pub-id></citation></ref>
<ref id="ref123"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Talamini</surname> <given-names>F.</given-names></name> <name><surname>Grassi</surname> <given-names>M.</given-names></name> <name><surname>Toffalini</surname> <given-names>E.</given-names></name> <name><surname>Santoni</surname> <given-names>R.</given-names></name> <name><surname>Carretti</surname> <given-names>B.</given-names></name></person-group> (<year>2018</year>). <article-title>Learning a second language: can music aptitude or music training have a role?</article-title> <source>Learn. Individ. Differ.</source> <volume>64</volume>, <fpage>1</fpage>&#x2013;<lpage>7</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.lindif.2018.04.003</pub-id></citation></ref>
<ref id="ref124"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tallal</surname> <given-names>P.</given-names></name> <name><surname>Gaab</surname> <given-names>N.</given-names></name></person-group> (<year>2006</year>). <article-title>Dynamic auditory processing, musical experience and language development</article-title>. <source>Trends Neurosci.</source> <volume>29</volume>, <fpage>382</fpage>&#x2013;<lpage>390</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tins.2006.06.003</pub-id>, PMID: <pub-id pub-id-type="pmid">16806512</pub-id></citation></ref>
<ref id="ref125"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Tervaniemi</surname> <given-names>M.</given-names></name></person-group> (<year>2001</year>). &#x201C;<article-title>Musical sound processing in the human brain: evidence from electric and magnetic recordings</article-title>,&#x201D; in <source>The Biological Foundations of Music.</source> <volume>Vol. 930</volume>. eds. <person-group person-group-type="editor"><name><surname>Zatorre</surname> <given-names>R. J.</given-names></name> <name><surname>Peretz</surname> <given-names>I.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>New York Academy of Sciences</publisher-name>), <fpage>259</fpage>&#x2013;<lpage>272</lpage>.</citation></ref>
<ref id="ref126"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tervaniemi</surname> <given-names>M.</given-names></name> <name><surname>Kujala</surname> <given-names>A.</given-names></name> <name><surname>Alho</surname> <given-names>K.</given-names></name> <name><surname>Virtanen</surname> <given-names>J.</given-names></name> <name><surname>Ilmoniemi</surname> <given-names>R. J.</given-names></name> <name><surname>N&#x00E4;&#x00E4;t&#x00E4;nen</surname> <given-names>R.</given-names></name></person-group> (<year>1999</year>). <article-title>Functional specialization of the human auditory cortex in processing phonetic and musical sounds: A magnetoencephalographic (MEG) study</article-title>. <source>NeuroImage</source> <volume>9</volume>, <fpage>330</fpage>&#x2013;<lpage>336</lpage>. doi: <pub-id pub-id-type="doi">10.1006/nimg.1999.0405</pub-id>, PMID: <pub-id pub-id-type="pmid">10075902</pub-id></citation></ref>
<ref id="ref127"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thomson</surname> <given-names>J. M.</given-names></name> <name><surname>Leong</surname> <given-names>V.</given-names></name> <name><surname>Goswami</surname> <given-names>U.</given-names></name></person-group> (<year>2013</year>). <article-title>Auditory processing interventions and developmental dyslexia: A comparison of phonemic and rhythmic approaches</article-title>. <source>Read. Writ.</source> <volume>26</volume>, <fpage>139</fpage>&#x2013;<lpage>161</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s11145-012-9359-6</pub-id></citation></ref>
<ref id="ref128"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tuninetti</surname> <given-names>A.</given-names></name> <name><surname>Mulak</surname> <given-names>K.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>Cross-situational word learning in two foreign languages: effects of native and perceptual difficulty</article-title>. <source>Front. Commun.</source> <volume>5</volume>:<fpage>602471</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fcomm.2020.602471</pub-id></citation></ref>
<ref id="ref129"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Leussen</surname> <given-names>J.-W.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2015</year>). <article-title>Learning to perceive and recognize a second language: The L2LP model revised</article-title>. <source>Front. Psychol.</source> <volume>6</volume>:<fpage>1000</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2015.01000</pub-id>, PMID: <pub-id pub-id-type="pmid">26300792</pub-id></citation></ref>
<ref id="ref130"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Varnet</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>T.</given-names></name> <name><surname>Peter</surname> <given-names>C.</given-names></name> <name><surname>Meunier</surname> <given-names>F.</given-names></name> <name><surname>Hoen</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>How musical expertise shapes speech perception: evidence from auditory classification images</article-title>. <source>Sci. Rep.</source> <volume>5</volume>:<fpage>14489</fpage>. doi: <pub-id pub-id-type="doi">10.1038/srep14489</pub-id>, PMID: <pub-id pub-id-type="pmid">26399909</pub-id></citation></ref>
<ref id="ref131"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vlach</surname> <given-names>H. A.</given-names></name> <name><surname>Johnson</surname> <given-names>S. P.</given-names></name></person-group> (<year>2013</year>). <article-title>Memory constraints on infants&#x2019; cross-situational statistical learning</article-title>. <source>Cognition</source> <volume>127</volume>, <fpage>375</fpage>&#x2013;<lpage>382</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cognition.2013.02.015</pub-id>, PMID: <pub-id pub-id-type="pmid">23545387</pub-id></citation></ref>
<ref id="ref132"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vlach</surname> <given-names>H. A.</given-names></name> <name><surname>Sandhofer</surname> <given-names>C. M.</given-names></name></person-group> (<year>2014</year>). <article-title>Retrieval dynamics and retention in cross-situational statistical word learning</article-title>. <source>Cogn. Sci.</source> <volume>38</volume>, <fpage>757</fpage>&#x2013;<lpage>774</lpage>. doi: <pub-id pub-id-type="doi">10.1111/cogs.12092</pub-id>, PMID: <pub-id pub-id-type="pmid">24117698</pub-id></citation></ref>
<ref id="ref133"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Warm</surname> <given-names>T. A.</given-names></name></person-group> (<year>1989</year>). <article-title>Weighted likelihood estimation of ability in item response theory</article-title>. <source>Psychometrika</source> <volume>54</volume>, <fpage>427</fpage>&#x2013;<lpage>450</lpage>. doi: <pub-id pub-id-type="doi">10.1007/BF02294627</pub-id></citation></ref>
<ref id="ref134"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Warmington</surname> <given-names>M. A.</given-names></name> <name><surname>Kandru-Pothineni</surname> <given-names>S.</given-names></name> <name><surname>Hitch</surname> <given-names>G. J.</given-names></name></person-group> (<year>2019</year>). <article-title>Novel-word learning, executive control and working memory: a bilingual advantage</article-title>. <source>Bilingualism</source> <volume>22</volume>, <fpage>763</fpage>&#x2013;<lpage>782</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S136672891800041X</pub-id></citation></ref>
<ref id="ref600"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>White-Schwoch</surname> <given-names>T.</given-names></name> <name><surname>Carr</surname> <given-names>K. W.</given-names></name> <name><surname>Anderson</surname> <given-names>S.</given-names></name> <name><surname>Strait</surname> <given-names>D. L.</given-names></name> <name><surname>Kraus</surname> <given-names>N.</given-names></name></person-group> (<year>2013</year>). <article-title>Older adults benefit from music training early in life: Biological evidence for long-term training-driven plasticity</article-title>. <source>J. Neurosci.</source> <volume>33</volume>, <fpage>17667</fpage>&#x2013;<lpage>17674</lpage>. doi: <pub-id pub-id-type="doi">10.1523/JNEUROSCI.2560-13.2013</pub-id>, PMID: <pub-id pub-id-type="pmid">22547804</pub-id></citation></ref>
<ref id="ref135"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wong</surname> <given-names>P. C. M.</given-names></name> <name><surname>Perrachione</surname> <given-names>T. K.</given-names></name></person-group> (<year>2007</year>). <article-title>Learning pitch patterns in lexical identification by native English-speaking adults</article-title>. <source>Appl. Psycholinguist.</source> <volume>28</volume>, <fpage>565</fpage>&#x2013;<lpage>585</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S0142716407070312</pub-id></citation></ref>
<ref id="ref136"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>H.</given-names></name> <name><surname>Ma</surname> <given-names>W.</given-names></name> <name><surname>Gong</surname> <given-names>D.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Yao</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>A longitudinal study on children&#x2019;s music training experience and academic development</article-title>. <source>Sci. Rep.</source> <volume>4</volume>:<fpage>5854</fpage>. doi: <pub-id pub-id-type="doi">10.1038/srep05854</pub-id>, PMID: <pub-id pub-id-type="pmid">25068398</pub-id></citation></ref>
<ref id="ref137"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yazawa</surname> <given-names>K.</given-names></name> <name><surname>Whang</surname> <given-names>J.</given-names></name> <name><surname>Kondo</surname> <given-names>M.</given-names></name> <name><surname>Escudero</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>Language-dependent cue weighting: An investigation of perception modes in L2 learning</article-title>. <source>Second. Lang. Res.</source> <volume>36</volume>, <fpage>557</fpage>&#x2013;<lpage>581</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0267658319832645</pub-id></citation></ref>
<ref id="ref138"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>C.</given-names></name> <name><surname>Smith</surname> <given-names>L. B.</given-names></name></person-group> (<year>2007</year>). <article-title>Rapid word learning under uncertainty via cross-situational statistics</article-title>. <source>Psychol. Sci.</source> <volume>18</volume>, <fpage>414</fpage>&#x2013;<lpage>420</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1467-9280.2007.01915.x</pub-id>, PMID: <pub-id pub-id-type="pmid">17576281</pub-id></citation></ref>
<ref id="ref139"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <article-title>Effects of amateur musical experience on categorical perception of lexical tones by native Chinese adults: an ERP study</article-title>. <source>Front. Psychol.</source> <volume>12</volume>:<fpage>611189</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2021.611189</pub-id>, PMID: <pub-id pub-id-type="pmid">33790832</pub-id></citation></ref>
<ref id="ref140"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zuk</surname> <given-names>J.</given-names></name> <name><surname>Ozernov-Palchik</surname> <given-names>O.</given-names></name> <name><surname>Kim</surname> <given-names>H.</given-names></name> <name><surname>Lakshminarayanan</surname> <given-names>K.</given-names></name> <name><surname>Gabrieli</surname> <given-names>J. D. E.</given-names></name> <name><surname>Tallal</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Enhanced syllable discrimination thresholds in musicians</article-title>. <source>PLoS One</source> <volume>8</volume>:<fpage>e80546</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0080546</pub-id>, PMID: <pub-id pub-id-type="pmid">24339875</pub-id></citation></ref></ref-list>
</back>
</article>